Sanikommus/Pima_Indians_Diabetes_Dataset

Pima-Indians-Diabetes-Dataset

The Pima Indians Diabetes Database is provided as a CSV file. The dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective is to predict, based on diagnostic measurements, whether a patient has diabetes. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

The Data_Visualization code:

  • Loads the CSV file into the Spyder environment.
  • Calculates and prints statistical features of each attribute, such as mean, median, and mode.
  • Calculates and prints the correlation coefficient between the target attribute and each of the other columns.
  • Draws a scatter plot of two different attributes.
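
The statistics and correlation steps above can be sketched roughly as follows. This is an illustration only, not the repository's script: it uses just the Python standard library, the helper names (`column`, `describe`, `pearson`) are hypothetical, the column names are assumed to match the Kaggle CSV header, and the rows are invented toy values rather than real records. The scatter plot step is left out.

```python
import math
import statistics

# Toy rows invented for illustration; they are NOT taken from the dataset.
# Column names are assumed to match the Kaggle CSV header.
rows = [
    {"Glucose": 90, "BMI": 24.0, "Outcome": 0},
    {"Glucose": 110, "BMI": 27.5, "Outcome": 0},
    {"Glucose": 110, "BMI": 30.1, "Outcome": 1},
    {"Glucose": 150, "BMI": 33.0, "Outcome": 1},
    {"Glucose": 170, "BMI": 36.4, "Outcome": 1},
]


def column(rows, name):
    """Extract one attribute as a list of floats."""
    return [float(r[name]) for r in rows]


def describe(values):
    """Basic descriptive statistics for one attribute."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined if either column is constant


if __name__ == "__main__":
    target = column(rows, "Outcome")
    for name in ("Glucose", "BMI"):
        values = column(rows, name)
        print(name, describe(values))
        print(name, "vs Outcome:", round(pearson(values, target), 3))
```

With the real CSV, the rows could be read with `csv.DictReader` and the loop extended to every non-target column.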

The Data_Preprocessing code:

  • Imports PCA from sklearn.
  • Loads the CSV file into the Spyder environment.
  • Normalizes and standardizes every attribute except the target class.
  • Generates synthetic data in order to perform eigenvalue and eigenvector decomposition.
  • Applies PCA to the given dataset in order to reduce the dimensionality of the data.
  • Calculates and prints the covariance matrix after dimensionality reduction.
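
As a rough sketch of the steps above, the example below does min-max normalization, z-score standardization, an eigendecomposition of the covariance matrix of synthetic data, and a PCA projection. It is written under stated assumptions and is not the repository's code: PCA is done here with NumPy's `np.linalg.eigh` on the covariance matrix instead of `sklearn.decomposition.PCA`, the function names are hypothetical, and the synthetic data is a seeded random 2-D Gaussian chosen only for illustration.

```python
import numpy as np


def min_max_normalize(X):
    """Rescale each column to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero
    return (X - lo) / span


def standardize(X):
    """Shift each column to zero mean and scale to unit variance."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std = np.where(std > 0, std, 1.0)  # leave constant columns at zero
    return (X - X.mean(axis=0)) / std


def pca(X, n_components):
    """Project X onto the top principal components of its covariance."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric matrices
    order = np.argsort(eigvals)[::-1]       # sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :n_components]
    return Xc @ W, eigvals, W


if __name__ == "__main__":
    # Synthetic correlated 2-D data (an assumption for illustration).
    rng = np.random.default_rng(0)
    synthetic = rng.multivariate_normal(
        mean=[0.0, 0.0], cov=[[3.0, 1.0], [1.0, 2.0]], size=500
    )
    Z, eigvals, W = pca(standardize(synthetic), n_components=2)
    print("eigenvalues:", eigvals)
    print("eigenvectors (columns):")
    print(W)
    # After projection the covariance matrix is (numerically) diagonal.
    print("covariance after PCA:")
    print(np.cov(Z, rowvar=False))
```

For the real dataset, the feature matrix would be every column except the target, and `n_components` would be chosen smaller than the number of features. The projected covariance being diagonal is what the last step in the list is checking.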

Input Dataset

https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database

image

Output

Statistical Analysis:

image

image

image

Normalization and Standardization of each column:

image

image

Eigenvalue analysis of the Synthetic Data:

image

PCA on the main Dataset:

image

image

image
