Probability theory (crucial for modeling uncertainty in ML):
- Probability
- Random variables
- Probability distributions
- Conditional probability
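Bayes' rule is one direct use of conditional probability. Below is a minimal sketch, assuming a textbook diagnostic-test scenario. The helper names `conditional` and `bayes` are hypothetical, and the prevalence and error rates are made up for illustration.

```python
def conditional(p_a_and_b: float, p_b: float) -> float:
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b


def bayes(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """P(A | B) via Bayes' rule, expanding P(B) with the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return conditional(p_b_given_a * p_a, p_b)


# Made-up numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate.
posterior = bayes(p_b_given_a=0.95, p_a=0.01, p_b_given_not_a=0.05)
print(round(posterior, 3))  # 0.161: a positive test is far from conclusive
```

The low posterior shows why a model's uncertainty has to be reasoned about conditionally: the base rate P(A) dominates the result.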
Descriptive statistics:
- Measures of central tendency (mean, median, mode)
- Measures of dispersion (variance, standard deviation)
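A minimal sketch using Python's standard-library `statistics` module. The sample values are made up for illustration. Note the difference between the population variance (divide by n) and the sample variance (divide by n - 1).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

# Central tendency
data_mean = statistics.mean(data)      # 40 / 8 = 5
data_median = statistics.median(data)  # average of the two middle values: (4 + 5) / 2 = 4.5
data_mode = statistics.mode(data)      # most frequent value: 4

# Dispersion
pop_var = statistics.pvariance(data)    # population variance, divides by n: 32 / 8 = 4
pop_std = statistics.pstdev(data)       # square root of the population variance: 2.0
sample_var = statistics.variance(data)  # sample variance, divides by n - 1: 32 / 7

print(data_mean, data_median, data_mode, pop_var, pop_std, round(sample_var, 3))
```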
Inferential statistics (essential for making inferences and drawing conclusions from data samples):
- Hypothesis testing
- Confidence intervals
- P-values
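A confidence interval for a population mean can be sketched with a normal approximation, x̄ ± z · s/√n. This assumes a reasonably large sample; for small n a t critical value is more appropriate, but the standard library has no t distribution. The helper `mean_confidence_interval` and the sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation CI for the population mean: x_bar +/- z * s / sqrt(n)."""
    n = len(sample)
    x_bar = mean(sample)
    se = stdev(sample) / math.sqrt(n)                 # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return x_bar - z * se, x_bar + z * se


# Made-up sample of repeated measurements.
scores = [0.81, 0.79, 0.84, 0.80, 0.83, 0.82, 0.78, 0.85, 0.80, 0.81]
low, high = mean_confidence_interval(scores)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

The interval is symmetric around the sample mean and narrows as n grows, because the standard error shrinks with √n.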
Regression analysis:
- Linear regression and its variants are widely used in ML for modeling relationships between variables and making predictions.
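For a single predictor, ordinary least squares has a closed form: slope b1 = Sxy / Sxx and intercept b0 = ȳ - b1·x̄. Below is a minimal sketch with made-up data. The helper names are hypothetical, and multi-feature or regularized variants would normally use a linear-algebra library instead.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y ~ b0 + b1 * x with one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the fitted line passes through (x_bar, y_bar)
    return b0, b1


xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # made-up data, roughly y = 1 + 2x
b0, b1 = fit_simple_linear_regression(xs, ys)


def predict(x):
    """Prediction from the fitted line."""
    return b0 + b1 * x


print(f"y ~ {b0:.2f} + {b1:.2f}x; prediction at x = 6: {predict(6):.2f}")
```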
Probability distributions (beneficial for understanding the behavior of data and modeling assumptions):
- Gaussian (normal) distribution
- Binomial distribution
- Poisson distribution
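The three distributions can be evaluated directly from their textbook formulas. Below is a minimal sketch using only the `math` module (`math.comb` requires Python 3.8+). The function names are illustrative rather than a library API.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): k events when the average count is lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


print(binomial_pmf(3, 10, 0.5))  # 120 / 1024 = 0.1171875
print(poisson_pmf(2, 3.0))       # about 0.224
print(normal_pdf(0.0))           # about 0.3989, the peak of the standard normal
```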
Sampling techniques:
- Understanding different sampling techniques, such as random sampling and stratified sampling, is important for collecting representative training and test datasets.
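Stratified sampling draws a simple random sample within each class so that class proportions are preserved in both splits. Below is a minimal sketch of a stratified train/test split. `stratified_split` is a hypothetical helper, not a library API, and the labels are made up.

```python
import random
from collections import defaultdict


def stratified_split(items, labels, test_fraction=0.2, seed=0):
    """Split (item, label) pairs so each class keeps roughly the same proportion
    in the train and test sets (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)
    train, test = [], []
    for label, members in by_class.items():
        rng.shuffle(members)  # simple random sampling within the stratum
        n_test = round(len(members) * test_fraction)
        test += [(m, label) for m in members[:n_test]]
        train += [(m, label) for m in members[n_test:]]
    return train, test


# Imbalanced toy labels: 90 negatives, 10 positives.
items = list(range(100))
labels = [0] * 90 + [1] * 10
train, test = stratified_split(items, labels)
print(len(train), len(test), sum(lbl for _, lbl in test))  # 80 20 2
```

With plain random sampling, a 20-item test set drawn from this data could easily contain zero or four positives. Stratifying keeps the 10% positive rate in both splits.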
Statistical hypothesis testing (crucial for evaluating ML models):
- Performing hypothesis tests
- Interpreting the results
- Making decisions based on statistical significance
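The outline does not name a specific test. The sketch below uses a permutation test, a distribution-free choice, to ask whether two models' score difference could plausibly arise by chance. The scores are made up, `permutation_test` is a hypothetical helper, and alpha = 0.05 is a conventional rather than mandatory threshold.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    H0: both samples come from the same distribution, so group labels are exchangeable.
    The p-value is the fraction of label shuffles whose |mean difference| is at least
    as large as the observed one (+1 smoothing, so it is never exactly 0).
    """
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[: len(a)]) - _mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Made-up cross-validation accuracies for two models.
model_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.82, 0.78, 0.85]
model_b = [0.76, 0.78, 0.75, 0.79, 0.74, 0.77, 0.80, 0.73]
p = permutation_test(model_a, model_b)
alpha = 0.05
print(f"p = {p:.4f}; reject H0 at alpha = {alpha}: {p < alpha}")
```

Folds from the same cross-validation run are not fully independent, so in practice a p-value like this should be read as indicative rather than exact.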
Statistical modeling (helpful for parameter estimation and building probabilistic models):
- Maximum likelihood estimation (MLE)
- Bayesian inference
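For coin-flip (Bernoulli) data, both approaches have closed forms. The MLE is k/n. With a Beta(α, β) prior, the posterior is Beta(α + k, β + n − k). Below is a minimal sketch with made-up flips and an assumed Beta(2, 2) prior. A grid search over the log-likelihood confirms that k/n is the maximizer.

```python
import math


def bernoulli_mle(flips):
    """MLE of p for i.i.d. Bernoulli data: maximizing p^k (1-p)^(n-k) gives k / n."""
    return sum(flips) / len(flips)


def log_likelihood(p, flips):
    """Bernoulli log-likelihood: k log p + (n - k) log(1 - p)."""
    k, n = sum(flips), len(flips)
    return k * math.log(p) + (n - k) * math.log(1 - p)


def beta_bernoulli_posterior(flips, alpha=1.0, beta=1.0):
    """Conjugate update: Beta(alpha, beta) prior -> Beta(alpha + k, beta + n - k)."""
    k, n = sum(flips), len(flips)
    return alpha + k, beta + n - k


flips = [1, 0, 1, 1, 1, 0, 1, 1, 1, 1]  # made-up data: 8 heads in 10 flips
print(bernoulli_mle(flips))  # 0.8

grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda p: log_likelihood(p, flips))
print(best)  # 0.8, matching the closed form

a_post, b_post = beta_bernoulli_posterior(flips, alpha=2.0, beta=2.0)
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, round(posterior_mean, 3))  # 10.0 4.0 0.714, pulled toward the prior mean 0.5
```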
Experimental design:
- Understanding principles of experimental design, such as randomization, control groups, and factorial designs, helps in conducting rigorous experiments and A/B testing in ML.
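Below is a minimal A/B-testing sketch. Users are randomized into control and treatment, and conversion rates are compared with a two-proportion z-test. The z-test is one common choice for this setting, not one named in the outline. The helper names are hypothetical and the conversion counts are made up.

```python
import math
import random
from statistics import NormalDist


def assign_groups(user_ids, seed=42):
    """Randomly assign each user to 'control' or 'treatment' (randomization)."""
    rng = random.Random(seed)
    return {uid: rng.choice(["control", "treatment"]) for uid in user_ids}


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for H0: p_a == p_b, using the pooled proportion for the standard error."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


groups = assign_groups(range(10))
# Made-up conversion counts after the experiment has run.
z, p = two_proportion_z_test(successes_a=120, n_a=2400, successes_b=156, n_b=2400)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Randomization is what justifies attributing a significant difference to the treatment rather than to how users were selected.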
Multivariate statistics (tools for dimensionality reduction, feature selection, and pattern recognition in high-dimensional datasets):
- Principal component analysis (PCA)
- Factor analysis
- Cluster analysis
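Below is a minimal PCA sketch. It covers PCA only, not factor analysis or cluster analysis. It is restricted to 2-D data so that the eigendecomposition of the covariance matrix has a closed form without NumPy. `pca_2d` is a hypothetical helper and the points are made up.

```python
import math


def pca_2d(points):
    """PCA for 2-D data via the closed-form eigendecomposition of the 2x2 covariance matrix.

    Returns (variance along PC1, variance along PC2, unit vector of PC1, mean point).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    a = sum(x * x for x, _ in centered) / (n - 1)  # var(x)
    c = sum(y * y for _, y in centered) / (n - 1)  # var(y)
    b = sum(x * y for x, y in centered) / (n - 1)  # cov(x, y)
    mid = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b * b)
    lam1, lam2 = mid + radius, mid - radius  # eigenvalues, lam1 >= lam2
    if b != 0:
        vx, vy = lam1 - c, b  # satisfies (C - lam1 * I) v = 0
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)  # covariance already diagonal
    norm = math.hypot(vx, vy)
    return lam1, lam2, (vx / norm, vy / norm), (mx, my)


points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]  # made-up, roughly y = 2x
lam1, lam2, (vx, vy), (mx, my) = pca_2d(points)
ratio = lam1 / (lam1 + lam2)
scores = [(x - mx) * vx + (y - my) * vy for x, y in points]  # 1-D projection onto PC1
print(f"PC1 direction: ({vx:.3f}, {vy:.3f}); explained variance ratio: {ratio:.4f}")
```

Keeping only the first component reduces the data from two dimensions to one while retaining almost all of its variance.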
Exploratory data analysis:
- Scatter plot
- Pair plot
- Histogram
- Cumulative distribution function (CDF)
- Mean and standard deviation
- Median, percentiles, and quantiles
- Median absolute deviation (MAD), box plot, and violin plot
- EDA on a cancer dataset
- Gaussian (normal) distribution
- Skewness and kurtosis
- Sampling distribution, standard normal variate (z), and standardization
- Quantile-quantile (Q-Q) plot
- Chebyshev's inequality
- Uniform distribution
- Bernoulli vs. binomial vs. normal vs. Pareto distributions
- Box-Cox transformation
- Covariance
- Pearson correlation coefficient
- Spearman rank correlation coefficient
- Correlation vs. causation
- Confidence intervals given the underlying (e.g., Gaussian) distribution
- Hypothesis testing and p-values
- t-test vs. chi-square test vs. ANOVA
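Several of the EDA measures above fit in a few lines of standard-library Python. The sketch below covers MAD vs. standard deviation on data with an outlier, quartiles for a box plot, z-score standardization with a Chebyshev check, and Pearson vs. Spearman correlation on a monotonic but non-linear relationship. All helper names are hypothetical and the data are made up. `statistics.quantiles` requires Python 3.8+.

```python
import math
from statistics import mean, median, pstdev, quantiles


def mad(xs):
    """Median absolute deviation: median(|x - median(x)|), robust to outliers."""
    m = median(xs)
    return median([abs(x - m) for x in xs])


def standardize(xs):
    """z-scores (x - mean) / std, using the population standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by the product of the standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ranks(xs):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


values = [10, 12, 11, 13, 12, 11, 95]  # made-up sample with one outlier
q1, q2, q3 = quantiles(values, n=4)    # quartiles, as used for a box plot
print("std:", round(pstdev(values), 2), "MAD:", mad(values), "IQR:", q3 - q1)

z = standardize(values)
k = 2
within = sum(abs(v) < k for v in z) / len(z)
# Chebyshev: at least 1 - 1/k^2 of the data lies within k standard deviations of the mean.
print(f"within {k} std: {within:.2f} (Chebyshev bound: {1 - 1 / k**2:.2f})")

xs = list(range(1, 11))
ys = [x ** 3 for x in xs]  # monotonic but non-linear relationship
print("Pearson:", round(pearson(xs, ys), 3), "Spearman:", round(spearman(xs, ys), 3))
```

The outlier inflates the standard deviation but leaves the MAD at 1. Spearman reaches 1 for any strictly increasing relationship, whereas Pearson stays below 1 when the relationship is not linear.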