Housing data analyses🏠🪴 #21

MaherAssaf19 · 2025-01-01T19:27:55Z

This script aims to predict housing prices based on features like the size of the house, number of bedrooms, bathrooms, year built, and location score. Here's a simplified breakdown of the process:

Loading the Data: It loads a small, six-row dataset embedded in the script, with one row per house.
Cleaning the Data: Rows with missing values are dropped so the data is ready for analysis.
Exploratory Data Analysis (EDA): The script prints summary statistics and draws a pair plot to show how the features relate to the price.
Training the Model: A linear regression model is trained to learn the relationship between the features and the house price.
Evaluating the Model: The model's performance on held-out rows is measured with Mean Squared Error and R-squared.
Visualizing Results: The script plots the actual prices against the predicted ones and charts the model coefficients as a rough indication of how each feature affects the price.

In short, this process builds a predictive model that estimates house prices and helps identify what factors most influence those prices.
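
For reference, the two metrics named in the evaluation step can be computed by hand. This is only a minimal sketch in plain Python, not the script's own code (the script uses scikit-learn's `mean_squared_error` and `r2_score`); the helper name `mse_and_r2` and the three example prices are made up for illustration.

```python
def mse_and_r2(y_true, y_pred):
    """Return (mean squared error, R-squared) for two equal-length lists."""
    n = len(y_true)
    # Residual sum of squares: squared gaps between actual and predicted values
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared gaps between actual values and their mean
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return ss_res / n, 1.0 - ss_res / ss_tot


# Hypothetical prices, chosen only to exercise the function
actual = [300000, 450000, 350000]
predicted = [310000, 440000, 350000]

mse, r2 = mse_and_r2(actual, predicted)
print(f"Mean Squared Error: {mse:.1f}")
print(f"R-squared: {r2:.4f}")
```

MSE is in squared price units, so it grows quickly with large errors, while R-squared compares the model against always predicting the mean price.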

Here is the full script: 👇

# housing_prices_analysis.py

def load_data():
    """
    Load a predefined housing dataset.

    Returns:
        pd.DataFrame: Loaded dataset as a Pandas DataFrame.
    """
    import pandas as pd
    from io import StringIO

    # Embedded dataset (CSV rows stay at column 0 so no whitespace leaks into the values)
    data = """SquareFeet,Bedrooms,Bathrooms,YearBuilt,LocationScore,Price
1500,3,2,2000,85,300000
2000,4,3,2010,90,450000
1800,3,2,2005,88,350000
2400,4,3,2020,92,500000
1600,3,2,1995,80,280000
1200,2,1,1980,70,200000
"""
    return pd.read_csv(StringIO(data))

def preprocess_data(data):
    """
    Preprocess the housing dataset by dropping rows with missing values.

    Parameters:
        data (pd.DataFrame): Raw dataset.

    Returns:
        pd.DataFrame: Preprocessed dataset.
    """
    # Drop every row that contains at least one missing value
    data = data.dropna()
    return data

def analyze_data(data):
    """
    Perform exploratory data analysis on the dataset.

    Parameters:
        data (pd.DataFrame): Dataset to analyze.

    Returns:
        None: Prints summary statistics and shows plots.
    """
    import matplotlib.pyplot as plt
    import seaborn as sns

    print("Dataset Summary:")
    print(data.describe())

    # Pairwise scatter plots of every feature and the price
    sns.pairplot(data[['SquareFeet', 'Bedrooms', 'Bathrooms', 'YearBuilt', 'LocationScore', 'Price']])
    plt.show()

def train_model(data):
    """
    Train a predictive model using the dataset.

    Parameters:
        data (pd.DataFrame): Preprocessed dataset.

    Returns:
        model: Trained model.
        X_test (pd.DataFrame): Test features.
        y_test (pd.Series): Test target values.
    """
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression

    features = ['SquareFeet', 'Bedrooms', 'Bathrooms', 'YearBuilt', 'LocationScore']
    target = 'Price'

    X = data[features]
    y = data[target]

    # Hold out 20% of the rows for testing; the fixed seed keeps the split reproducible
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LinearRegression()
    model.fit(X_train, y_train)

    return model, X_test, y_test

def evaluate_model(model, X_test, y_test):
    """
    Evaluate the trained model using Mean Squared Error and R-squared metrics.

    Parameters:
        model: Trained model.
        X_test (pd.DataFrame): Test features.
        y_test (pd.Series): Test target values.

    Returns:
        None: Prints evaluation metrics.
    """
    from sklearn.metrics import mean_squared_error, r2_score

    # Predict prices for the held-out rows and compare them with the actual prices
    y_pred = model.predict(X_test)

    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)

    print(f"Mean Squared Error: {mse}")
    print(f"R-squared: {r2}")

def visualize_results(model, X_test, y_test):
    """
    Visualize the actual vs predicted prices and the model coefficients.

    Parameters:
        model: Trained model.
        X_test (pd.DataFrame): Test features.
        y_test (pd.Series): Test target values.

    Returns:
        None: Displays plots.
    """
    import matplotlib.pyplot as plt
    import pandas as pd

    # Actual vs Predicted Prices (the dashed red line marks a perfect prediction)
    y_pred = model.predict(X_test)
    plt.figure(figsize=(10, 6))
    plt.scatter(y_test, y_pred, alpha=0.6, color='blue')
    plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], '--r', linewidth=2)
    plt.xlabel('Actual Prices')
    plt.ylabel('Predicted Prices')
    plt.title('Actual vs Predicted Prices')
    plt.show()

    # Feature Coefficients (raw values, each in price units per unit of its feature)
    coefficients = pd.Series(model.coef_, index=X_test.columns)
    plt.figure(figsize=(8, 4))
    coefficients.plot(kind='bar', color='skyblue')
    plt.title('Feature Coefficients')
    plt.ylabel('Coefficient Value')
    plt.show()

# Example Usage

if __name__ == "__main__":
    data = load_data()
    data = preprocess_data(data)
    analyze_data(data)
    model, X_test, y_test = train_model(data)
    evaluate_model(model, X_test, y_test)
    visualize_results(model, X_test, y_test)
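
A note on reading the coefficient chart: the raw coefficients are in different units (price per square foot, price per bedroom, price per year, and so on), so their bar heights are not directly comparable. One common way to put them on the same scale is to standardize the features before fitting. The sketch below is an assumption about how that could be done with scikit-learn's `StandardScaler`, not part of the script above; it refits on all six rows, and with so few rows the resulting numbers should be treated as an illustration only.

```python
from io import StringIO

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Same six rows as the embedded dataset in the script
CSV = """SquareFeet,Bedrooms,Bathrooms,YearBuilt,LocationScore,Price
1500,3,2,2000,85,300000
2000,4,3,2010,90,450000
1800,3,2,2005,88,350000
2400,4,3,2020,92,500000
1600,3,2,1995,80,280000
1200,2,1,1980,70,200000
"""

data = pd.read_csv(StringIO(CSV))
features = ['SquareFeet', 'Bedrooms', 'Bathrooms', 'YearBuilt', 'LocationScore']
X = data[features]
y = data['Price']

# Rescale every feature to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

# Each coefficient is now the price change per one standard deviation of its feature
model = LinearRegression().fit(X_scaled, y)
standardized = pd.Series(model.coef_, index=features)

# Order the features by absolute coefficient size, largest first
ranked = standardized.reindex(standardized.abs().sort_values(ascending=False).index)
print(ranked)
```

Even after standardizing, correlated features (for example, square footage and bedroom count) share their effect between them, so the ranking is a rough guide and not a definitive measure of importance.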


MaRia19280 · 2025-01-07T22:29:51Z

This script is an excellent end-to-end solution for predicting housing prices, covering data cleaning, EDA, model training, evaluation, and visualization. Its structured workflow ensures reliability, while linear regression provides interpretability. Adding cross-validation, handling outliers, or testing advanced models could further enhance performance. Overall, it's a solid foundation for real estate analytics. Well put, Maher 😊
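
The cross-validation suggestion could look roughly like the sketch below. It is an illustration under stated assumptions and not code from the original script: it uses scikit-learn's `cross_val_score` with leave-one-out splitting (chosen here because the dataset has only six rows) and scores with mean squared error, since R-squared is not defined for a single held-out row.

```python
from io import StringIO

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Same six rows as the embedded dataset in the script
CSV = """SquareFeet,Bedrooms,Bathrooms,YearBuilt,LocationScore,Price
1500,3,2,2000,85,300000
2000,4,3,2010,90,450000
1800,3,2,2005,88,350000
2400,4,3,2020,92,500000
1600,3,2,1995,80,280000
1200,2,1,1980,70,200000
"""

data = pd.read_csv(StringIO(CSV))
features = ['SquareFeet', 'Bedrooms', 'Bathrooms', 'YearBuilt', 'LocationScore']
X = data[features]
y = data['Price']

# Leave-one-out: train on five rows, test on the remaining one, six times over
scores = cross_val_score(
    LinearRegression(), X, y,
    cv=LeaveOneOut(),
    scoring='neg_mean_squared_error',
)

# scikit-learn reports negated MSE so that higher is better; flip the sign back
mse_per_fold = -scores
print(f"MSE per fold: {mse_per_fold}")
print(f"Mean MSE across folds: {mse_per_fold.mean():.1f}")
```

With five features and only five training rows per fold, the model has more parameters than data points to constrain them, so these scores mainly show the mechanics; a larger dataset would be needed for meaningful numbers.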

MuhannadGTR · 2025-01-07T23:06:54Z

Hi Maher,

Please follow the instructions for adding your file.

MaherAssaf19 added this to ET6 Foundations Group 17 Jan 1, 2025

MaherAssaf19 moved this to TODO in ET6 Foundations Group 17 Jan 1, 2025

abdoalsir removed this from ET6 Foundations Group 17 Jan 11, 2025
