Machine Learning: Energy Consumption Prediction Model
Introduction
Energy consumption plays a crucial role in sustainability and cost management. With advancements in machine learning, predicting energy usage can help optimize efficiency and resource planning. In this project, I developed an Energy Consumption Prediction Model using machine learning techniques, trained on real-world data, and deployed as an interactive web application with Streamlit.
Live Demo: https://lnkd.in/dS8yNcvw
GitHub Repository: https://lnkd.in/dDyxPVMt
Kaggle Dataset: https://www.kaggle.com/datasets/govindaramsriram/energy-consumption-dataset-linear-regression

Libraries Used
The following libraries were utilized:
- Pandas: Data handling and preprocessing
- NumPy: Numerical computations
- Matplotlib & Seaborn: Data visualization
- Scikit-learn: Machine learning modeling and evaluation
- Streamlit: Web-based model deployment
Dataset Overview
The dataset, sourced from Kaggle, contains 10,000 rows and 7 columns, covering various building features and their corresponding energy consumption levels. The key variables include:
- Building Type: Categorical (Residential, Commercial, Industrial)
- Square Footage: Continuous numerical feature
- Number of Occupants: Numeric
- Appliances Used: Numeric, representing the number of appliances
- Average Temperature: Continuous numerical feature
- Day of the Week: Categorical (Weekday, Weekend)
- Energy Consumption: Target variable (continuous numeric value in kWh)
Exploratory Data Analysis (EDA)
Data Structure and Summary
- The dataset contains no missing values, ensuring completeness for model training.
- Categorical features (Building Type, Day of the Week) were encoded for machine learning compatibility.
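The source does not say which encoding was applied. As a minimal sketch, assuming one-hot encoding with pandas, the completeness check and the encoding step could look like the following. The column names follow the dataset overview above, and the rows are made up purely to mirror the layout.

```python
import pandas as pd

# Tiny illustrative frame; the values are invented and only mirror the column layout.
df = pd.DataFrame({
    "Building Type": ["Residential", "Commercial", "Industrial", "Residential"],
    "Square Footage": [1200, 25000, 40000, 1800],
    "Day of the Week": ["Weekday", "Weekend", "Weekday", "Weekend"],
})

# Completeness check: count missing values per column.
missing_per_column = df.isnull().sum()

# One-hot encode the two categorical columns; numeric columns pass through unchanged.
encoded = pd.get_dummies(df, columns=["Building Type", "Day of the Week"], dtype=int)

print(missing_per_column)
print(encoded.columns.tolist())
```

One-hot encoding avoids implying an order among the building types, which a plain integer label would do.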
Key Visualizations and Insights
- Distribution of Energy Consumption: The energy consumption data follows an approximately normal distribution.
- Correlation Analysis:
  - Square Footage is positively correlated with Energy Consumption, indicating larger buildings consume more energy.
  - Appliances Used is positively correlated with Energy Consumption, meaning buildings with more appliances have higher energy usage.
  - Average Temperature has a moderate positive correlation, suggesting warmer conditions lead to increased energy consumption, likely due to cooling requirements.
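A correlation analysis of this kind can be sketched with pandas. The data below is synthetic, and its coefficients are invented so that each feature is positively related to the target. It stands in for the Kaggle file and does not reproduce the project's actual numbers.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in data; the coefficients below are invented for illustration.
df = pd.DataFrame({
    "Square Footage": rng.uniform(500, 50_000, n),
    "Appliances Used": rng.integers(1, 50, n),
    "Average Temperature": rng.uniform(10, 35, n),
})
df["Energy Consumption"] = (
    0.05 * df["Square Footage"]
    + 20.0 * df["Appliances Used"]
    + 30.0 * df["Average Temperature"]
    + rng.normal(0, 50, n)
)

# Pearson correlation of each feature with the target, strongest first.
target_corr = (
    df.corr()["Energy Consumption"]
    .drop("Energy Consumption")
    .sort_values(ascending=False)
)
print(target_corr)
```

Passing `df.corr()` to Seaborn's `heatmap` function gives the usual visual form of the same table.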
Model Evaluation, Selection, and Training
The selection of the best-performing model was based on evaluation metrics obtained from initial training. The following models were tested and assessed for predictive performance:
- Linear Regression: Used as a baseline model but had limited accuracy.
- Decision Trees: Captured non-linear patterns but prone to overfitting.
- Random Forest: Enhanced generalization but required higher computational resources.
- Gradient Boosting: Demonstrated the highest accuracy and generalization, making it the final selected model.
After selecting Gradient Boosting as the best model, hyperparameter tuning was performed to further enhance its performance.
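A comparison like this can be sketched with scikit-learn by fitting each candidate on the same split and scoring it on held-out data. The features and target here are synthetic assumptions, and the model settings are library defaults rather than the author's configuration, so the resulting ranking says nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
n = 600

# Synthetic features: square footage, occupants, appliances, temperature.
X = np.column_stack([
    rng.uniform(500, 50_000, n),
    rng.integers(1, 100, n),
    rng.integers(1, 50, n),
    rng.uniform(10, 35, n),
])
# Invented target with a mild non-linearity plus noise.
y = (
    0.05 * X[:, 0] + 10 * X[:, 1] + 20 * X[:, 2]
    + 0.5 * X[:, 3] ** 2 + rng.normal(0, 50, n)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

# Fit each candidate and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: R^2 = {score:.4f}")
```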
Hyperparameter Tuning and Final Evaluation
The Gradient Boosting model was fine-tuned using GridSearchCV, optimizing:
- Learning Rate: Step size during updates.
- Number of Estimators: The number of boosting stages.
- Maximum Depth: Controls the complexity of the trees.
- Minimum Samples Split: Minimum number of samples required to split an internal node.
- Minimum Samples Leaf: Minimum number of samples required to be at a leaf node.
Before hyperparameter tuning, the Gradient Boosting model achieved an R² score of 0.9887. After tuning, the final model achieved an R² score of 0.9908, demonstrating strong predictive power.
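The tuning step can be sketched as a grid search over the five hyperparameters listed above. The grid values, the 3-fold cross-validation, and the synthetic data are all assumptions chosen to keep the example small. The source does not state the grid the author searched.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data standing in for the encoded building features.
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Small illustrative grid; these values are assumptions, not the author's grid.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 3],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="r2",
    cv=3,
)
search.fit(X_train, y_train)

# After fitting, the search refits the best configuration on the full training split.
print("Best parameters:", search.best_params_)
print("Held-out R^2:", search.score(X_test, y_test))
```

Scoring the refit model on a split that the search never saw keeps the reported R² from being inflated by the tuning itself.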
Deployment and Application
The trained model was deployed using Streamlit, enabling users to input building details and obtain real-time energy predictions.
Application Features
- Simple and intuitive UI for inputting building attributes.
- Instantaneous energy consumption predictions using the trained model.
- Web-based accessibility, ensuring ease of use.
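The prediction step behind such an app can be sketched as a function that turns user inputs into a one-row frame and passes it to a fitted pipeline. The helper name `predict_energy`, the pipeline layout, and the training rows are hypothetical. The author's actual app code is not shown in the source.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

CATEGORICAL = ["Building Type", "Day of the Week"]
NUMERIC = [
    "Square Footage",
    "Number of Occupants",
    "Appliances Used",
    "Average Temperature",
]

# Made-up training rows, only so the pipeline has something to fit.
train = pd.DataFrame({
    "Building Type": ["Residential", "Commercial", "Industrial",
                      "Residential", "Commercial", "Industrial"],
    "Square Footage": [1200, 25000, 40000, 1800, 15000, 48000],
    "Number of Occupants": [4, 60, 90, 5, 35, 80],
    "Appliances Used": [10, 30, 45, 12, 25, 48],
    "Average Temperature": [22.0, 28.5, 31.0, 18.0, 25.0, 33.0],
    "Day of the Week": ["Weekday", "Weekend", "Weekday",
                        "Weekend", "Weekday", "Weekend"],
    "Energy Consumption": [2500.0, 4200.0, 5800.0, 2700.0, 3600.0, 6100.0],
})

# Encoding and model live in one pipeline so raw inputs can be passed directly.
pipeline = Pipeline([
    ("encode", ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL)],
        remainder="passthrough",
    )),
    ("model", GradientBoostingRegressor(random_state=0)),
])
pipeline.fit(train[CATEGORICAL + NUMERIC], train["Energy Consumption"])


def predict_energy(building_type, square_footage, occupants,
                   appliances, temperature, day_type):
    """Build a one-row frame from user inputs and return the predicted kWh."""
    row = pd.DataFrame([{
        "Building Type": building_type,
        "Square Footage": square_footage,
        "Number of Occupants": occupants,
        "Appliances Used": appliances,
        "Average Temperature": temperature,
        "Day of the Week": day_type,
    }])
    return float(pipeline.predict(row[CATEGORICAL + NUMERIC])[0])


print(predict_energy("Residential", 1500, 4, 11, 21.0, "Weekday"))
```

In a Streamlit script, the arguments would typically come from widgets such as `st.selectbox` and `st.number_input`, with the result shown through `st.write`.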