Commit
The Anonymous Synthesizer for Health Data (ASyH) now features a full pipeline that produces a synthetic dataset from input data. The pipeline chooses the best-fitting SDV synthesizer model according to how closely the data distribution of the synthetic data matches that of the input data. The SDV synthesizers are initialised with heuristic settings to prevent overfitting and underfitting. For the GaussianCopulaSynthesizer, the best-fitting distribution for each numerical data column is determined first and then used to initialise the synthesizer before it is scored. The resulting model can be saved to a pickle (.pkl) file and reused to produce more synthetic data on demand. ASyH 1.0.0 is based on SDV 1.0.0. Optional reporting gives an overview of data fit and of correlations between the synthetic and original input data.
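The per-column distribution choice described for the GaussianCopulaSynthesizer can be illustrated with a minimal, standard-library-only sketch. It is not ASyH's code: the commit message does not say how goodness of fit is measured, so the Kolmogorov-Smirnov statistic used here is an assumption, as are the two candidate families, the helper names (`ks_statistic`, `fit_candidates`, `best_distribution_per_column`) and the toy columns `age` and `dose`.

```python
from statistics import NormalDist, fmean, pstdev


def ks_statistic(values, cdf):
    """Largest gap between the empirical CDF of `values` and a fitted `cdf`."""
    xs = sorted(values)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, abs(f - i / n), abs((i + 1) / n - f))
    return gap


def fit_candidates(values):
    """Fit each (hypothetical) candidate family and return name -> CDF."""
    lo, hi = min(values), max(values)
    width = (hi - lo) or 1e-9  # guard against constant columns
    normal = NormalDist(fmean(values), pstdev(values) or 1e-9)
    return {
        "norm": normal.cdf,
        "uniform": lambda x: min(1.0, max(0.0, (x - lo) / width)),
    }


def best_distribution(values):
    """Name of the candidate with the smallest KS statistic."""
    scores = {
        name: ks_statistic(values, cdf)
        for name, cdf in fit_candidates(values).items()
    }
    return min(scores, key=scores.get)


def best_distribution_per_column(table):
    """Map every numerical column to its best-fitting candidate name."""
    return {column: best_distribution(vals) for column, vals in table.items()}


# Deterministic toy columns: evenly spaced quantiles of a normal and a uniform.
n = 200
table = {
    "age": [NormalDist(50, 10).inv_cdf((i + 0.5) / n) for i in range(n)],
    "dose": [5 * (i + 0.5) / n for i in range(n)],
}
choices = best_distribution_per_column(table)
print(choices)
```

In a pipeline like the one described, a mapping of this kind would be passed to the synthesizer at initialisation; how ASyH actually hands it to SDV is not stated in the commit message.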