Predicting Housing Prices with Statistical Learning

[Python][XGBoost][SHAP][Machine Learning]

§1. Abstract

I built a machine learning model that acts like an automated real estate appraiser. Trained on data from thousands of houses (square footage, quality, age), it learned to predict a home's sale price accurately and can explain which features added or subtracted value.

§2. Methodology & Implementation

Applied a structured statistical learning pipeline to predict housing prices: handled mixed variable types and missing values, engineered features, and evaluated six models using 10-fold cross-validation. The final XGBoost model achieved an RMSE of $16,840 (R² = 0.947), with SHAP values used to interpret individual feature contributions.
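To make the interpretation step concrete, here is a minimal sketch of the idea behind SHAP values: each feature's contribution is its average marginal effect on the prediction, taken over every order in which features could be switched from a baseline house to the house being explained. This is not the project's code and not the SHAP library's API. The model `toy_price_model`, its coefficients, and both example houses are hypothetical, and the brute-force enumeration only works for a handful of features (the SHAP library uses much faster algorithms for tree models).

```python
from itertools import permutations


def toy_price_model(house):
    """Hypothetical stand-in for a trained model; all coefficients are made up."""
    price = 50_000 + 100 * house["sqft"] + 10_000 * house["quality"] - 500 * house["age"]
    # Interaction term, so attributions are not just a readout of coefficients.
    price += 5 * house["sqft"] * (house["quality"] - 5)
    return price


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to a single `baseline`.

    Averages each feature's marginal contribution over every ordering in
    which features are switched from the baseline value to the instance value.
    """
    features = list(instance)
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        prev = model(current)
        for f in order:
            current[f] = instance[f]
            new = model(current)
            totals[f] += new - prev
            prev = new
    return {f: totals[f] / len(orderings) for f in features}


baseline = {"sqft": 1500, "quality": 5, "age": 30}  # hypothetical "typical" house
house = {"sqft": 2200, "quality": 8, "age": 5}      # hypothetical house to explain

contrib = shapley_values(toy_price_model, house, baseline)
for name, value in contrib.items():
    print(f"{name:>8}: {value:+,.0f}")

# Efficiency property: contributions sum to (prediction - baseline prediction).
gap = toy_price_model(house) - toy_price_model(baseline)
assert abs(sum(contrib.values()) - gap) < 1e-6
```

A positive value means the feature pushed the predicted price above the baseline prediction, a negative value means it pulled the price below. Because the contributions sum exactly to the gap between the two predictions, they can be read as a dollar-by-dollar breakdown of one appraisal.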

§3. Key Metrics

Model: XGBoost
RMSE: $16,840
R²: 0.947
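For readers unfamiliar with these metrics, the sketch below shows how RMSE and R² can be computed and averaged over 10 cross-validation folds. It is an illustration under stated assumptions, not the project's pipeline: a one-feature least-squares line stands in for XGBoost so the example needs only the standard library, and the data is synthetic. The helper names (`rmse`, `r2`, `fit_line`, `k_fold_scores`) are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (dollars)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def k_fold_scores(x, y, k=10, seed=0):
    """Shuffle once, split into k folds, and score the model on each held-out fold."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        preds = [slope * x[i] + intercept for i in held_out]
        truth = [y[i] for i in held_out]
        scores.append((rmse(truth, preds), r2(truth, preds)))
    return scores


# Synthetic toy data (NOT the project's dataset): price = 100 * sqft + 50,000 + noise.
rng = random.Random(42)
sqft = [rng.uniform(800, 3500) for _ in range(200)]
price = [100 * s + 50_000 + rng.gauss(0, 15_000) for s in sqft]

scores = k_fold_scores(sqft, price, k=10)
mean_rmse = sum(s[0] for s in scores) / len(scores)
mean_r2 = sum(s[1] for s in scores) / len(scores)
print(f"10-fold mean RMSE: ${mean_rmse:,.0f}, mean R^2: {mean_r2:.3f}")
```

Scoring only on held-out folds is what makes the reported numbers an estimate of performance on unseen houses rather than a measure of how well the model memorized its training data. RMSE reads as a typical error in dollars, while R² is the share of price variance the model explains.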

§4. Full Analysis & Code