Student-Habits-vs-Academic-Performance

Data cleaning and exploratory analysis using Python, pandas, and scikit-learn.

Project Overview

The Challenge

Identify which variables in the dataset correlate most strongly with test scores, so that they can be used as features in a linear regression model.

My Solution

Many of the columns hold categorical, non-numeric data, and a correlation matrix can only be computed on numeric values. I one-hot encoded those columns so that every variable could appear in the correlation heatmap.
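As a minimal sketch of this step, the snippet below one-hot encodes the non-numeric columns with `pandas.get_dummies`. The toy rows and the column names (`diet_quality`, `part_time_job`, and so on) are placeholders chosen for illustration, not the project's actual data or code.

```python
import pandas as pd

# Toy rows standing in for the real dataset; column names and values are hypothetical.
df = pd.DataFrame({
    "study_hours_per_day": [1.5, 3.0, 4.5, 2.0],
    "diet_quality": ["Poor", "Good", "Fair", "Good"],
    "part_time_job": ["Yes", "No", "No", "Yes"],
    "exam_score": [55.0, 72.0, 88.0, 61.0],
})

# Pick out every non-numeric column.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Replace each categorical column with one 0/1 indicator column per category.
encoded = pd.get_dummies(df, columns=categorical_cols, dtype=int)

# All columns are numeric now, so the correlation matrix covers every variable.
corr = encoded.corr()
```

Each category becomes its own column (for example `diet_quality_Good`), which is why the encoded frame is wider than the original.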

Impact

  • The correlation heatmap made the relationships between variables visible at a glance
  • Study hours per day turned out to be the variable most strongly associated with high test scores

Technical Details

Data Visualizations

Data Structure Analysis

Column information visualization

This visual shows the data columns with their types before and after one-hot encoding.
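A before-and-after view like this can be approximated with a small table of dtypes. The sketch below is only one possible way to build it; the `gender`, `age`, and `exam_score` columns are hypothetical examples.

```python
import pandas as pd

# Hypothetical raw data with one categorical column.
raw = pd.DataFrame({
    "age": [19, 21, 20],
    "gender": ["Female", "Male", "Female"],
    "exam_score": [71.5, 64.0, 90.2],
})
encoded = pd.get_dummies(raw, columns=["gender"], dtype=int)

# Align both dtype listings on column name. A column that exists on only
# one side (dropped or newly created by the encoding) shows up as NaN.
summary = pd.DataFrame({
    "before": raw.dtypes.astype(str),
    "after": encoded.dtypes.astype(str),
})
print(summary)
```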

Feature Importance

Feature importance visualization

The feature importance plot shows which factors are most strongly associated with test scores.
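This README does not say how feature importance was computed. One common option for a linear model, shown here purely as an assumption, is to standardize the features and rank them by the absolute value of their regression coefficients. The data below is synthetic and the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only; not the project's dataset.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "study_hours_per_day": rng.uniform(0, 8, n),
    "social_media_hours": rng.uniform(0, 6, n),
    "sleep_hours": rng.uniform(4, 10, n),
})
y = (
    40
    + 6.0 * X["study_hours_per_day"]
    - 2.0 * X["social_media_hours"]
    + 0.5 * X["sleep_hours"]
    + rng.normal(0, 3, n)
)

# Standardize so that coefficients are comparable across features
# measured on different scales.
X_scaled = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_scaled, y)

# Rank features by the magnitude of their standardized coefficient.
importance = pd.Series(np.abs(model.coef_), index=X.columns).sort_values(ascending=False)
print(importance)
```

Standardized coefficients are easy to read, but they can be misleading when features are strongly correlated with each other, so this ranking is best read alongside the correlation heatmap.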

Correlation Heatmap

Correlation heatmap visualization

Correlation heatmap showing relationships between different variables in the dataset.
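The sketch below shows one way to compute the correlation matrix, rank variables by the strength of their correlation with the score, and draw a heatmap with matplotlib. The helper `plot_heatmap`, the toy values, and the column names are illustrative assumptions; the original plot may have been made with a different library.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy numeric data; values and column names are hypothetical.
df = pd.DataFrame({
    "study_hours_per_day": [1, 2, 3, 4, 5],
    "social_media_hours": [5, 3, 4, 2, 1],
    "exam_score": [50, 58, 66, 79, 90],
})

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()

# Correlation of each variable with the target, strongest first (by magnitude).
target_corr = corr["exam_score"].drop("exam_score").sort_values(key=abs, ascending=False)


def plot_heatmap(corr):
    """Draw a correlation matrix as a colored grid and return the figure."""
    fig, ax = plt.subplots(figsize=(6, 5))
    image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(image, ax=ax, label="Pearson correlation")
    fig.tight_layout()
    return fig
```

Calling `plot_heatmap(corr)` followed by `plt.show()` displays the figure. Fixing the color scale to the range -1 to 1 keeps colors comparable between runs.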

Regression Analysis

Linear regression scatter plot

Scatter plot with a regression line showing how closely the predicted scores match the actual scores.
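A predicted-versus-actual check of this kind can be sketched as follows. The data is synthetic, the single feature `study_hours_per_day` and the helper `plot_predicted_vs_actual` are hypothetical, and the reference line drawn here is the diagonal y = x (where a perfect prediction would fall), which may differ from the line in the original plot.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({"study_hours_per_day": rng.uniform(0, 8, n)})
y = 35 + 7.0 * X["study_hours_per_day"] + rng.normal(0, 4, n)

# Hold out a test set so the plot reflects performance on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)


def plot_predicted_vs_actual(y_true, y_pred):
    """Scatter predicted against actual scores with a y = x reference line."""
    fig, ax = plt.subplots()
    ax.scatter(y_true, y_pred, alpha=0.6)
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())
    ax.plot([lo, hi], [lo, hi], color="red", label="perfect prediction")
    ax.set_xlabel("Actual score")
    ax.set_ylabel("Predicted score")
    ax.legend()
    return fig
```

Points close to the diagonal are well predicted; `r2` summarizes the same fit as a single number, with values near 1 indicating a close match.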