Machine Learning: Energy Consumption Prediction Model

Introduction

Energy consumption plays a crucial role in sustainability and cost management. With advancements in machine learning, predicting energy usage can help optimize efficiency and resource planning. In this project, I developed an Energy Consumption Prediction Model using machine learning techniques, trained on real-world data, and deployed as an interactive web application with Streamlit.

🔗 Live Demo: https://lnkd.in/dS8yNcvw
🔗 GitHub Repository: https://lnkd.in/dDyxPVMt
🔗 Kaggle Dataset: https://www.kaggle.com/datasets/govindaramsriram/energy-consumption-dataset-linear-regression

Libraries Used

The following libraries were utilized:

Pandas – Data handling and preprocessing
NumPy – Numerical computations
Matplotlib & Seaborn – Data visualization
Scikit-learn – Machine learning modeling and evaluation
Streamlit – Web-based model deployment

Dataset Overview

The dataset, sourced from Kaggle, contains 10,000 rows and 7 columns, covering various building features and their corresponding energy consumption levels. The key variables include:

Building Type: Categorical (Residential, Commercial, Industrial)
Square Footage: Continuous numerical feature
Number of Occupants: Numeric
Appliances Used: Numeric, representing the number of appliances
Average Temperature: Continuous numerical feature
Day of the Week: Categorical (Weekday, Weekend)
Energy Consumption: Target variable (continuous numeric value in kWh)

Exploratory Data Analysis (EDA)

Data Structure and Summary

The dataset contains no missing values, so no imputation was required before model training.
Categorical features (Building Type, Day of the Week) were encoded for machine learning compatibility.
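
The write-up does not say which encoding was used, so the snippet below is only a sketch that assumes one-hot encoding with pandas. The sample rows are made up, and the column names follow the list in Dataset Overview, which may not match the CSV headers exactly.

```python
import pandas as pd

# Tiny made-up sample; column names are assumed from the Dataset Overview list.
df = pd.DataFrame({
    "Building Type": ["Residential", "Commercial", "Industrial", "Residential"],
    "Square Footage": [1200, 25000, 40000, 1800],
    "Day of the Week": ["Weekday", "Weekend", "Weekday", "Weekend"],
    "Energy Consumption": [2100.0, 4300.0, 5600.0, 2500.0],
})

# Count missing values per column to confirm the data is complete.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# One-hot encode the two categorical columns (an assumption, not necessarily
# the author's method). drop_first=True drops one level per feature so the
# dummy columns are not perfectly collinear, which matters for linear models.
encoded = pd.get_dummies(
    df,
    columns=["Building Type", "Day of the Week"],
    drop_first=True,
    dtype=int,
)
print(encoded.columns.tolist())
```

Tree-based models such as Gradient Boosting would also work with a simple integer encoding; one-hot is used here because it is safe for the Linear Regression baseline as well.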

Key Visualizations and Insights

Distribution of Energy Consumption: The energy consumption data follows an approximately normal distribution.
Correlation Analysis:
- Square Footage is positively correlated with Energy Consumption, indicating larger buildings consume more energy.
- Appliances Used is positively correlated with Energy Consumption, meaning buildings with more appliances have higher energy usage.
- Average Temperature has a moderate positive correlation, suggesting warmer conditions lead to increased energy consumption, likely due to cooling requirements.
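
A correlation check like the one described above can be sketched with pandas. The data here is synthetic and the coefficients are invented purely so the example runs; they say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data; the relationship below is made up.
rng = np.random.default_rng(0)
n = 500
square_footage = rng.uniform(500, 50_000, n)
appliances = rng.integers(1, 50, n)
temperature = rng.uniform(10, 35, n)
energy = (
    0.05 * square_footage
    + 20 * appliances
    + 5 * temperature
    + rng.normal(0, 50, n)
)

df = pd.DataFrame({
    "Square Footage": square_footage,
    "Appliances Used": appliances,
    "Average Temperature": temperature,
    "Energy Consumption": energy,
})

# Pearson correlation of every feature with the target, strongest first.
corr_with_target = (
    df.corr()["Energy Consumption"]
    .drop("Energy Consumption")
    .sort_values(ascending=False)
)
print(corr_with_target)
```

Passing the full `df.corr()` matrix to `seaborn.heatmap` would give the usual visual version of the same analysis.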

Model Evaluation, Selection, and Training

The best-performing model was selected based on evaluation metrics from initial training. The following models were tested and assessed for predictive performance:

Linear Regression – Used as a baseline model but had limited accuracy.
Decision Trees – Captured non-linear patterns but were prone to overfitting.
Random Forest – Enhanced generalization but required higher computational resources.
Gradient Boosting – Demonstrated the highest accuracy and generalization, making it the final selected model.
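
The comparison of the four models can be sketched as a loop over scikit-learn estimators scored on a held-out split. This is an illustration on synthetic data from `make_regression`, not the project's actual training script; the split ratio, the seeds, and the choice of R² as the metric are assumptions, and the ranking on synthetic data carries no meaning for the real dataset.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the encoded feature matrix and the kWh target.
X, y = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# Print the models from best to worst held-out R2.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: R2 = {score:.4f}")
```

Scoring on data the models did not see during fitting is what exposes the overfitting tendency noted for single decision trees.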

After selecting Gradient Boosting as the best model, hyperparameter tuning was performed to further enhance its performance.

Hyperparameter Tuning and Final Evaluation

The Gradient Boosting model was fine-tuned using GridSearchCV, optimizing:

Learning Rate – Step size during updates.
Number of Estimators – The number of boosting stages.
Maximum Depth – Controls the complexity of the trees.
Minimum Samples Split – Minimum number of samples required to split an internal node.
Minimum Samples Leaf – Minimum number of samples required to be at a leaf node.
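
A grid search over the five parameters listed above might be set up as follows. The candidate values, the 3-fold cross-validation, and the synthetic data are all assumptions chosen to keep the example small; the grid actually used in the project is not given in the write-up.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real training data.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical grid: 2 values per parameter gives 32 combinations.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_grid=param_grid,
    scoring="r2",  # same metric the post reports
    cv=3,          # each combination is scored by 3-fold cross-validation
)
search.fit(X_train, y_train)

# After fitting, GridSearchCV refits the best combination on all training
# data, so score() evaluates that refit model on the held-out split.
test_r2 = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print(f"Cross-validated R2: {search.best_score_:.4f}")
print(f"Held-out R2: {test_r2:.4f}")
```

The cost grows multiplicatively with each added parameter value, so a coarse grid followed by a finer one around the best region is a common way to keep the search tractable.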

Before hyperparameter tuning, the Gradient Boosting model achieved an R² score of 0.9887. After tuning, the final model achieved an R² score of 0.9908, demonstrating strong predictive power.

Deployment and Application

The trained model was deployed using Streamlit, enabling users to input building details and obtain real-time energy predictions.

Application Features

Simple and intuitive UI for inputting building attributes.
Instantaneous energy consumption predictions using the trained model.
Web-based accessibility, ensuring ease of use.
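
The sketch below shows one way such an app could be wired together; it is not the deployed code. `FEATURE_COLUMNS`, `build_input_row`, and `run_app` are hypothetical names, the column layout assumes one-hot encoding with the first level dropped, and `model` stands for any fitted estimator with a `predict` method.

```python
import pandas as pd

# Hypothetical column layout; it must match the columns the model was trained on.
FEATURE_COLUMNS = [
    "Square Footage",
    "Number of Occupants",
    "Appliances Used",
    "Average Temperature",
    "Building Type_Industrial",
    "Building Type_Residential",
    "Day of the Week_Weekend",
]


def build_input_row(building_type, square_footage, occupants, appliances,
                    avg_temperature, day_type):
    """Turn raw form values into a one-row frame in training column order."""
    row = {
        "Square Footage": square_footage,
        "Number of Occupants": occupants,
        "Appliances Used": appliances,
        "Average Temperature": avg_temperature,
        f"Building Type_{building_type}": 1,
        f"Day of the Week_{day_type}": 1,
    }
    # reindex drops dummy columns for the baseline levels (Commercial, Weekday)
    # and fills the dummies that were not selected with 0.
    return pd.DataFrame([row]).reindex(columns=FEATURE_COLUMNS, fill_value=0)


def run_app(model):
    """Render the input form and show a prediction when the button is clicked."""
    import streamlit as st  # imported here so the helper above works without it

    st.title("Energy Consumption Prediction")
    building_type = st.selectbox(
        "Building Type", ["Residential", "Commercial", "Industrial"]
    )
    square_footage = st.number_input("Square Footage", min_value=0, value=1000)
    occupants = st.number_input("Number of Occupants", min_value=0, value=10)
    appliances = st.number_input("Appliances Used", min_value=0, value=5)
    avg_temperature = st.number_input("Average Temperature", value=20.0)
    day_type = st.selectbox("Day of the Week", ["Weekday", "Weekend"])

    if st.button("Predict"):
        features = build_input_row(
            building_type, square_footage, occupants,
            appliances, avg_temperature, day_type,
        )
        prediction = model.predict(features)[0]
        st.write(f"Predicted energy consumption: {prediction:.2f} kWh")
```

In a real script, the saved model would be loaded (for example with `joblib.load`) and `run_app(model)` called at the bottom of the file launched with `streamlit run`. Keeping the row-building logic in its own function makes it testable without starting the web server.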
