Database comparison

I am working on a Python batch process to validate data between source and target databases. The main validation process queries a REST API for the list of source/target database tables that are most important to validate. It then spawns two sub-processes, one to run a query against the source and one against the target, and collects the results for evaluation. Once both sub-processes have completed, the main validation script repeats: it asks the REST API for the next most important source/target database tables, and so on.

  1. Query the REST API for the most important source/target database tables to validate
  2. Spawn two processes to evaluate the source and target database tables in parallel
  3. Join point
  4. Compare results and report

Best Laid Plans

In some cases, one or both of the parallel processes will throw an exception and die. When that happens, I want to store the failing source/target database and table on a ‘naughty list’. If that pair shows up as most important in a future run, I can skip it because it has already been marked as failing. Someone will need to intervene, correct the underlying problem, and then remove the entry from the ‘naughty list’ before that pair is attempted again.
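One way to capture those failures is to wrap each worker so an exception records the pair instead of just killing the process. A small sketch under assumed names (`run_validation_query` is a hypothetical stand-in that fails on purpose to show the flow):

```python
def run_validation_query(db, table):
    """Stand-in for the real per-table query; raises to simulate a failure."""
    raise RuntimeError("connection refused")


naughty_list = []  # (database, table, reason) entries


def run_side(db, table):
    """Worker wrapper: record failures on the naughty list instead of crashing the run."""
    try:
        run_validation_query(db, table)
    except Exception as exc:
        naughty_list.append((db, table, str(exc)))


def should_skip(db, table):
    """Skip pairs already marked as failing until someone intervenes and clears them."""
    return any(d == db and t == table for d, t, _ in naughty_list)


run_side("source_dsn", "orders")
print(should_skip("source_dsn", "orders"))  # the failed pair is now skipped
```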

‘Naughty List’

I want the ‘Naughty List’ to be shared at the process level, so that many validators could run simultaneously, all sharing one list of previously unsuccessful source/target databases and tables.
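For sharing the list across processes on a single machine, the standard library's `multiprocessing.Manager` provides a proxy list that child processes can append to. A minimal sketch (the always-failing `worker` and the DSN/table names are illustrative, not the real validator; sharing across separate machines would need an external store instead):

```python
import multiprocessing


def worker(db, table, naughty_list):
    """Simulated validator: always fails and records the pair on the shared list."""
    try:
        raise RuntimeError("query timed out")  # stand-in for a real failure
    except Exception as exc:
        naughty_list.append((db, table, str(exc)))


def run_with_shared_list():
    """Run one failing worker in a child process and return the shared naughty list."""
    with multiprocessing.Manager() as manager:
        naughty = manager.list()  # proxy object visible to all child processes
        p = multiprocessing.Process(target=worker, args=("src_dsn", "orders", naughty))
        p.start()
        p.join()
        return list(naughty)  # copy out before the manager shuts down


if __name__ == "__main__":
    print(run_with_shared_list())
```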

See the Python Jupyter notebook on GitLab:

Naughty List for multiprocessor execution.