The project leverages Weka's data loading and preprocessing capabilities through several key components:
- `CSVLoader`: Used in the `DataLoader` class to convert CSV files to Weka's native ARFF format
- `ArffLoader`: Used in `ClassificationFramework` to load processed ARFF files
- Conversion Process:
  - CSV files are first loaded using `CSVLoader`
  - Data is then preprocessed (removing unnecessary attributes, setting class index)
  - Data can be saved as ARFF using `ArffSaver`
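To show what the target format of this conversion looks like, here is a minimal plain-Java sketch that renders an all-numeric table as ARFF text. It is not how `CSVLoader` or `ArffSaver` work internally and does not use Weka; the class name, method name, and sample data are hypothetical.

```java
import java.util.List;

// Hypothetical sketch: render an all-numeric table as ARFF text.
class ArffFormatSketch {

    static String toArff(String relation, List<String> attributeNames, double[][] rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String name : attributeNames) {
            // Every column is declared numeric here; real data may also need
            // nominal or string attributes, and names with spaces need quoting.
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        sb.append("\n@data\n");
        for (double[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(row[i]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] rows = {{1.5, 10.0}, {2.5, 20.0}};
        System.out.print(toArff("houses", List.of("area", "price"), rows));
    }
}
```

The header (`@relation`, `@attribute`) carries the type information that a CSV file lacks, which is why the ARFF file is the more convenient input for later steps.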
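- Standardization: Uses Weka's `Standardize` filter to rescale numeric attributes to zero mean and unit variance:

  ```java
  import weka.filters.Filter;
  import weka.filters.unsupervised.attribute.Standardize;

  Standardize standardize = new Standardize();
  standardize.setInputFormat(fullData);
  fullData = Filter.useFilter(fullData, standardize);
  ```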
- Automatic attribute removal
- Class index setting
- Optional data sampling for large datasets
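To make the effect of the standardization step concrete, the following plain-Java sketch applies the same kind of transformation (subtract the mean, divide by the standard deviation) to a single numeric column. It does not use Weka; the class and method names are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption made for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the project or of Weka.
class StandardizeSketch {

    // Returns a copy of the column rescaled to zero mean and unit variance.
    static double[] standardize(double[] column) {
        int n = column.length;
        double mean = Arrays.stream(column).average().orElse(0.0);

        // Sample variance (n - 1 denominator); an assumption for this sketch.
        double sumSq = 0.0;
        for (double v : column) {
            sumSq += (v - mean) * (v - mean);
        }
        double std = n > 1 ? Math.sqrt(sumSq / (n - 1)) : 0.0;

        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            // A constant column has std == 0; map it to 0 to avoid dividing by zero.
            out[i] = std == 0.0 ? 0.0 : (column[i] - mean) / std;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] z = standardize(new double[] {2.0, 4.0, 6.0});
        System.out.println(Arrays.toString(z)); // prints [-1.0, 0.0, 1.0]
    }
}
```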
- Interfaces and Abstract Classes:
  - `IModel` interface defines core method signatures
  - `BaseClassifier` provides common implementation for classifier methods
  - Specific classifier classes (LinearRegression, SVMRegression, etc.) extend `BaseClassifier`
- Supports multiple regression models:
  - Linear Regression
  - SVM Regression
  - M5P Decision Tree
  - Random Forest
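The class hierarchy described above might look roughly like the following. Only the names `IModel` and `BaseClassifier` come from the project; the method names (`train`, `predict`, `getName`), the `MeanBaseline` class, and the use of plain arrays instead of Weka's `Instances` are assumptions made to keep the sketch self-contained.

```java
// Hypothetical method signatures; the project's actual IModel may differ.
interface IModel {
    void train(double[][] features, double[] targets);
    double predict(double[] features);
    String getName();
}

// Shared behaviour (validation, trained-state tracking) lives in the base class.
abstract class BaseClassifier implements IModel {
    private boolean trained = false;

    @Override
    public final void train(double[][] features, double[] targets) {
        if (features.length != targets.length) {
            throw new IllegalArgumentException("features and targets differ in length");
        }
        fit(features, targets);
        trained = true;
    }

    @Override
    public final double predict(double[] features) {
        if (!trained) {
            throw new IllegalStateException(getName() + " has not been trained");
        }
        return predictOne(features);
    }

    // Subclasses supply only the model-specific parts.
    protected abstract void fit(double[][] features, double[] targets);
    protected abstract double predictOne(double[] features);
}

// Stand-in for a concrete model: always predicts the mean training target.
class MeanBaseline extends BaseClassifier {
    private double mean;

    @Override
    protected void fit(double[][] features, double[] targets) {
        double sum = 0.0;
        for (double t : targets) {
            sum += t;
        }
        mean = targets.length == 0 ? 0.0 : sum / targets.length;
    }

    @Override
    protected double predictOne(double[] features) {
        return mean;
    }

    @Override
    public String getName() {
        return "MeanBaseline";
    }
}

class ModelHierarchyDemo {
    public static void main(String[] args) {
        IModel model = new MeanBaseline();
        model.train(new double[][] {{1.0}, {2.0}, {3.0}}, new double[] {10.0, 20.0, 30.0});
        System.out.println(model.getName() + " predicts " + model.predict(new double[] {4.0}));
    }
}
```

Keeping the shared checks in the base class means each concrete model only has to implement its own fitting and prediction logic, which is the usual motivation for this kind of layering.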
- Train-Test Split:
  - 80% training, 20% testing data
  - Manual prediction and error calculation
- Cross-Validation:
  - 10-fold cross-validation using Weka's `Evaluation` class
  - Calculates metrics like:
    - Correlation coefficient
    - Mean absolute error
    - Root mean squared error
    - Relative error percentages
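The hold-out evaluation described above (80% training, 20% testing, with predictions and errors computed by hand) can be sketched without Weka as follows. The class and method names are hypothetical, the model is a stand-in that predicts the training mean, and splitting in order without shuffling is an assumption; the project may randomize the data first.

```java
import java.util.Arrays;

// Hypothetical sketch of a manual 80/20 hold-out evaluation.
class HoldoutSketch {

    // Index of the first test row for the given training fraction.
    static int splitIndex(int numRows, double trainFraction) {
        return (int) Math.round(numRows * trainFraction);
    }

    static double meanAbsoluteError(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    static double rootMeanSquaredError(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        double[] targets = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

        int cut = splitIndex(targets.length, 0.8);
        double[] train = Arrays.copyOfRange(targets, 0, cut);
        double[] test = Arrays.copyOfRange(targets, cut, targets.length);

        // Stand-in model: predict the mean of the training targets.
        double mean = Arrays.stream(train).average().orElse(0.0);
        double[] predicted = new double[test.length];
        Arrays.fill(predicted, mean);

        System.out.println("train=" + train.length + " test=" + test.length);
        System.out.println("MAE  = " + meanAbsoluteError(test, predicted));
        System.out.println("RMSE = " + rootMeanSquaredError(test, predicted));
    }
}
```

Cross-validation repeats the same measurement over several different splits and averages the results, which makes the estimate less dependent on which rows happen to land in the test set.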
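Each classifier is configured with specific Weka API parameters:

```java
import weka.classifiers.functions.LinearRegression;
import weka.classifiers.functions.SMOreg;
import weka.classifiers.functions.supportVector.PolyKernel;
import weka.core.SelectedTag;

// SVM regression with a degree-1 polynomial (linear) kernel
SMOreg svm = new SMOreg();
svm.setC(1.0); // Complexity parameter
svm.setFilterType(new SelectedTag(SMOreg.FILTER_NORMALIZE, SMOreg.TAGS_FILTER));
PolyKernel polyKernel = new PolyKernel();
polyKernel.setExponent(1.0);
svm.setKernel(polyKernel);

// Linear regression without attribute selection (tag 1 is SELECTION_NONE)
LinearRegression lr = new LinearRegression();
lr.setAttributeSelectionMethod(new SelectedTag(1, LinearRegression.TAGS_SELECTION));
lr.setRidge(1.0E-8);
lr.setEliminateColinearAttributes(true);
```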
- The GUI supports interactive:
  - Dataset loading
  - Classifier selection
  - Model training and evaluation
- Background threading for model training
- Real-time logging of training process
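Training on a background thread while streaming log messages back to the interface can be sketched with the standard `java.util.concurrent` API. Everything here is hypothetical: the project may use `SwingWorker` or another mechanism, and the training step is simulated with a short loop rather than a real Weka classifier.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch: run a training job off the caller's thread and report progress.
class BackgroundTrainingSketch {

    // Simulated training job; a real one would build a Weka classifier here.
    static String train(String modelName, int steps, Consumer<String> log) {
        log.accept("Training " + modelName + " started");
        for (int step = 1; step <= steps; step++) {
            log.accept("Step " + step + "/" + steps);
        }
        log.accept("Training " + modelName + " finished");
        return modelName + " trained";
    }

    // Submits the job to a single worker thread and waits for its result.
    static String trainInBackground(String modelName, int steps, Consumer<String> log) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Callable<String> job = () -> train(modelName, steps, log);
            Future<String> result = worker.submit(job);
            // Blocking keeps the sketch simple; a GUI would react to completion instead.
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for training", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Training failed", e.getCause());
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        // Thread-safe list standing in for the GUI's log panel.
        List<String> messages = new CopyOnWriteArrayList<>();
        String status = trainInBackground("RandomForest", 3, messages::add);
        messages.forEach(System.out::println);
        System.out.println(status);
    }
}
```

In a Swing application the log callback would normally hand each message to the event dispatch thread (for example via `SwingUtilities.invokeLater`) before touching any component.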
- `Instances`: Core data structure
- `Filter`: Data preprocessing
- `Classifier`: Model training
- `Evaluation`: Performance metrics
- Kernel and optimization configurations
This implementation demonstrates an end-to-end machine learning workflow built on Weka's APIs, covering data preprocessing, model training, and evaluation.