Created by Nickhil Tekwani and Esha Aggarwal
Data from https://www.kaggle.com/drgilermo/nba-players-stats?select=Seasons_Stats.csv
Scouting and recruiting in the NBA have been refined over many years, but the process has always relied on coaches, agents, and scouts predicting whether a player will be good just by watching them play. But what if this became more objective? Nothing can replace the experience and subjective judgment of a human expert, but our project idea is to supplement this process with a tool that lets scouts predict how good a player might be based on certain stats and demographics. The tool would also be fun to explore: based on historical data, it could show how a player with an arbitrary set of demographics might fare in the NBA. This problem is worth tackling because, as far as we know, there are no public tools for it, and such a tool would be useful to sports professionals as well as fun for sports fans. Its insights can help drive better choices in selecting players, for example by pointing scouts towards talent they may not have recognized before, or by confirming or challenging their thoughts on a recruit’s potential success in the NBA. The tool could also help coaches and team administration predict how a given player may perform in the next season and adjust that player's minutes and training accordingly.
The dataset that will be used to make these predictions holds information about NBA player efficiency dating back to the mid-1900s. It contains a variety of player data ranging from place of birth to number of wins. The specific features that will be used to predict the player efficiency rating (the target variable) are height, age, weight, position, and number of years played.
A few questions have arisen from this dataset: What is the ideal body type for a successful NBA player? Does player efficiency increase or decrease with the number of years played? Do height and weight affect a player's position and, through it, their overall efficiency? The player efficiency column in the table is calculated from many in-game statistics, such as minutes played. In this analysis, we are instead looking for correlations between efficiency and demographic factors that are not directly used to calculate it, but that may indirectly have some effect on it. From an initial look into the dataset, it seems that position, height, and number of years played will have the most influence on player efficiency. We hypothesize that a model can be built to accurately predict an NBA player’s efficiency rating from position, height, weight, and number of years played, even though none of these demographics are part of the statistics used to calculate the rating.
This project tackles a regression problem, since it predicts a continuous quantitative output, the player efficiency rating, from a given set of factors. The ML algorithm we will most likely use is multiple linear regression. We will fit a linear model to the feature variables and then apply Recursive Feature Elimination, which repeatedly refits the model and drops the least important feature, to see which features matter most.
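As a rough illustration of this plan, here is a minimal sketch of multiple linear regression combined with Recursive Feature Elimination in scikit-learn. It runs on synthetic stand-in data, not the Kaggle dataset: the feature names (height_cm, weight_kg, age, years_played, is_guard, is_forward), the one-hot encoding of position with center as the baseline, the made-up relationship used to generate the target, and the choice to keep three features are all assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the real dataset); every value here is made up.
rng = np.random.default_rng(0)
n = 500
height = rng.normal(200, 9, n)                        # hypothetical height in cm
weight = rng.normal(100, 12, n)                       # hypothetical weight in kg
age = rng.integers(19, 39, n).astype(float)
years_played = rng.integers(0, 16, n).astype(float)
position = rng.integers(0, 3, n)                      # 0 = guard, 1 = forward, 2 = center
is_guard = (position == 0).astype(float)              # one-hot columns, center is the baseline
is_forward = (position == 1).astype(float)

feature_names = ["height_cm", "weight_kg", "age", "years_played", "is_guard", "is_forward"]
X = np.column_stack([height, weight, age, years_played, is_guard, is_forward])

# Assumed toy relationship, used only so the example has a signal to recover.
per = (15 + 0.6 * years_played + 0.15 * (height - 200)
       + 2.0 * is_guard + rng.normal(0, 2, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, per, test_size=0.2, random_state=0
)

# Standardize so coefficient magnitudes are comparable across features,
# since RFE ranks features by the size of the fitted coefficients.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# RFE refits the linear model repeatedly, dropping the weakest feature each round.
selector = RFE(LinearRegression(), n_features_to_select=3)
selector.fit(X_train_s, y_train)

selected = [name for name, keep in zip(feature_names, selector.support_) if keep]
r2_test = selector.score(X_test_s, y_test)  # R^2 of the model refit on the kept features

print("Selected features:", selected)
print("Ranking (1 = kept):", dict(zip(feature_names, selector.ranking_)))
print(f"Held-out R^2: {r2_test:.3f}")
```

With real data, position would need the same kind of encoding before fitting, and the number of features to keep could be chosen by cross-validation (for example with scikit-learn's RFECV) rather than fixed in advance.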