dpone-datagenie is a Python library designed to facilitate the generation of synthetic datasets for machine learning and artificial intelligence applications. It provides functions to create synthetic dataframes with various features, including numerical and categorical variables, and offers utilities for visualization and data augmentation. With dpone-datagenie, users can quickly generate synthetic datasets tailored to their specific needs, whether for model training, testing, or exploration.
- Supervised Data Generation: Generate synthetic datasets with labeled data for supervised learning tasks.
- Numerical and Categorical Features: Create synthetic dataframes with both numerical and categorical features.
- Visualization Tools: Visualize correlations and distributions of features within the synthetic datasets.
- Data Augmentation: Apply techniques such as SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance in the datasets.
You can install Synthetic using pip:
pip install dpone-datagenie
import dpone-datagenie.generate as synthetic
# Generate a supervised numerical dataframe
numerical_df = synthetic.generate_supervised_numerical_dataframe(num_samples=1000, num_features=5)
# Generate a supervised dataframe with mixed numerical and categorical features
mixed_df = synthetic.generate_supervised_mixed_dataframe(num_samples=1000, num_numerical_features=3, num_categorical_features=2)
# Display the first few rows of the generated dataframes
print("Numerical DataFrame:")
print(numerical_df.head())
print("\nMixed DataFrame:")
print(mixed_df.head())
import dpone-datagenie
# Plot correlation heatmap
dpone-datagenie.plot_correlation(mixed_df)
# Plot distribution of features
dpone-datagenie.plot_distribution(mixed_df)
import dpone-datagenie
# Apply SMOTE for data augmentation
augmented_df = dpone-datagenie.apply_smote(mixed_df, target_column='label')
# Display the first few rows of the augmented dataframe
print("Augmented DataFrame:")
print(augmented_df.head())
For more examples and detailed usage instructions, please refer to the Examples directory in the repository.
We welcome contributions from the community. If you encounter any issues, have suggestions for improvements, or would like to contribute code, please feel free to open an issue or pull request on our GitHub repository.
This project is licensed under the MIT License - see the LICENSE file for details.