Detect outliers with 3 methods: LOF, DBSCAN and one-class SVM
- Required packages can be installed with the following command:
pip install -r requirements.txt
- consumption_data.xls is provided. There are 4 columns with 940 entries. The first column is the entry ID, which is ignored when detecting outliers, so each data entry is 3-dimensional.
- Get the data as a numpy array of size [940, 3] with the following code (check out dataset.py for the implementation):
from dataset import get_dataset
data = get_dataset()
- Data visualization:
For detailed descriptions, please see report.pdf.
- Check out lof.py for the implementation.
- Result:
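As an illustration of the technique (not the output of lof.py), here is a minimal LOF sketch. It assumes scikit-learn is available and uses a small synthetic array in place of the real [940, 3] data; the values of n_neighbors and contamination are illustrative guesses, and the repository's own implementation may differ.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in for the array returned by get_dataset() (assumption:
# the real data is a float array of shape [940, 3]).
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
outliers = np.array([[8.0, 8.0, 8.0], [-9.0, 7.0, -8.0]])  # planted anomalies
data = np.vstack([inliers, outliers])

# n_neighbors and contamination are illustrative choices, not the repo's.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
labels = lof.fit_predict(data)            # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_    # larger value = more anomalous

outlier_idx = np.where(labels == -1)[0]
print("flagged indices:", outlier_idx)
```

To try this on the real data, the synthetic array could be replaced with `data = get_dataset()`.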
- Check out dbscan.py for the implementation.
- Result:
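As an illustration of the technique (not the output of dbscan.py), here is a minimal DBSCAN sketch. It assumes scikit-learn and treats points that DBSCAN labels as noise (-1) as outliers; the synthetic data and the eps and min_samples values are illustrative assumptions and would need tuning for the real dataset.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic stand-in for the [940, 3] array from get_dataset() (assumption).
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
outliers = np.array([[8.0, 8.0, 8.0], [-9.0, 7.0, -8.0]])  # planted anomalies
data = np.vstack([inliers, outliers])

# eps and min_samples are illustrative; suitable values depend on the
# scale and density of the real data.
db = DBSCAN(eps=1.0, min_samples=5).fit(data)

# DBSCAN assigns the label -1 to points that belong to no cluster.
noise_idx = np.where(db.labels_ == -1)[0]
n_clusters = len(set(db.labels_.tolist()) - {-1})
print("clusters found:", n_clusters)
print("points labelled as noise:", noise_idx)
```

Unlike LOF, DBSCAN does not return a per-point score, so sparse inliers at the edge of a cluster may also be labelled as noise when eps is small.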
- Check out svdd.py for the implementation.
- Result with Gaussian kernel:
- Result with linear kernel:
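As an illustration of the technique (not the output of svdd.py), here is a minimal one-class SVM sketch comparing the Gaussian (RBF) and linear kernels. It assumes scikit-learn's OneClassSVM; the synthetic data, the standardization step, and the nu value are illustrative assumptions, and the repository's implementation may work differently.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Synthetic stand-in for the [940, 3] array from get_dataset() (assumption).
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
outliers = np.array([[8.0, 8.0, 8.0], [-9.0, 7.0, -8.0]])  # planted anomalies
data = np.vstack([inliers, outliers])

# Standardizing the features is an assumption made here, not a documented
# step of the repository.
scaled = StandardScaler().fit_transform(data)

results = {}
for kernel in ("rbf", "linear"):
    # nu upper-bounds the fraction of training points treated as outliers;
    # 0.05 is an illustrative value. gamma is ignored by the linear kernel.
    model = OneClassSVM(kernel=kernel, nu=0.05, gamma="scale")
    labels = model.fit_predict(scaled)          # -1 = outlier, 1 = inlier
    scores = model.decision_function(scaled)    # lower = more anomalous
    results[kernel] = (labels, scores)
    print(kernel, "flagged:", int((labels == -1).sum()))
```

With the linear kernel, the one-class SVM separates the data from the origin with a hyperplane, so its output depends strongly on how the data is centered and can differ substantially from the Gaussian-kernel result.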
Zhongyu Chen