Key Concepts and Insights from My Machine Learning Class
Introduction to Machine Learning
One of the things I love most about my master’s program is the exposure to a variety of fields—especially data analysis, machine learning, and everything in between. Before diving into machine learning, I was already comfortable with Python, crunching numbers, and creating cool visualizations. But machine learning? That felt like stepping into a sci-fi movie where computers magically predict outcomes. Spoiler alert: It’s not magic (though it sometimes feels like it).
This post is a mix of what I’ve learned so far and insights from my class. I’m still learning, and I don’t intend to stop anytime soon. Let’s dive in!
How Machine Learning Works
Machine learning follows a structured workflow that includes the following steps (a runnable sketch follows the list):
- Data Collection – Gathering structured (e.g., databases) or unstructured (e.g., images, text) data.
- Data Preprocessing – Cleaning and transforming data into a usable format.
- Model Selection – Choosing a suitable algorithm based on the problem type.
- Model Training – Teaching the model using historical data.
- Evaluation & Tuning – Assessing model performance and optimizing hyperparameters.
- Deployment – Implementing the model for real-world usage.
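To make the workflow concrete, here is a minimal sketch of the whole loop using scikit-learn’s bundled Iris dataset, so every step runs without external files. The dataset and the choice of logistic regression are placeholders for illustration, not a recommendation:

```python
# A minimal end-to-end pass through the workflow above.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data collection: a bundled dataset stands in for a real source.
X, y = load_iris(return_X_y=True)

# 2-4. Preprocessing, model selection, and training, chained in a pipeline.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# 5. Evaluation on held-out data.
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 6. Deployment would wrap model.predict behind an API
#    (see the deployment section later in this post).
```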
Types of Machine Learning
Understanding the type of data you’re working with is crucial in machine learning, as it determines the best approach for training a model. Whether the data is labeled or unlabeled, structured or unstructured, choosing the right learning technique—supervised, unsupervised, or reinforcement learning—plays a key role in achieving meaningful insights and accurate predictions.
| Type of Learning | Algorithm | Description / Use Cases |
| --- | --- | --- |
| Supervised Learning | Linear Regression | Predicting continuous values (e.g., stock prices, sales forecasts) |
| | Logistic Regression | Binary classification (e.g., spam detection, disease diagnosis) |
| | K-Nearest Neighbors (KNN) | Classification & regression based on proximity to labeled data |
| | Support Vector Machines (SVMs) | Classification & regression in high-dimensional spaces |
| | Decision Trees | Splitting data into branches for classification & regression |
| | Random Forest | Ensemble method to reduce overfitting & improve accuracy |
| | XGBoost | Optimized gradient boosting for structured data analysis |
| | Neural Networks (Deep Learning) | Complex problems (e.g., image recognition, NLP, forecasting) |
| Unsupervised Learning | K-Means Clustering | Grouping data points into distinct clusters |
| | Hierarchical Clustering | Building a hierarchy of clusters for multi-level analysis |
| | DBSCAN | Density-based clustering for discovering hidden patterns |
| | Principal Component Analysis (PCA) | Reducing dimensionality while retaining key information |
| Reinforcement Learning | Q-Learning | Learning optimal actions in an environment (e.g., game playing, robotics) |
| | Deep Q-Networks (DQN) | Combining deep learning with Q-learning for complex decision-making |
| | Policy Gradient Methods | Optimizing policies directly for reinforcement tasks |
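To make one row of the table concrete, here is a tiny unsupervised example: K-Means grouping synthetic, unlabeled points. The blob data is generated on the fly purely for illustration:

```python
# K-Means clustering on synthetic, unlabeled data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 300 unlabeled points scattered around 3 centers.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Ask K-Means to find 3 clusters; on real data the cluster count
# usually has to be chosen (e.g., with the elbow method).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("Cluster sizes:", [int((labels == k).sum()) for k in range(3)])
```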
Data Preparation & Feature Engineering
What could go wrong? A model can perform like a genius in training but act like a clueless intern when faced with real-world data. That’s overfitting—when your model memorizes instead of learning. On the flip side, if your model is as clueless as me trying to read ancient hieroglyphics, that’s underfitting—it didn’t learn enough, so it performs poorly on both the training and the test data.
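One simple way to catch both problems is to compare training and test scores. Here is a small sketch with a decision tree (the dataset and the depth limit are just illustrative choices): a near-perfect training score paired with a much lower test score points to overfitting, while low scores on both point to underfitting.

```python
# Diagnosing overfitting by comparing train vs. test accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An unpruned tree can memorize the training set (classic overfitting).
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("unpruned    train:", tree.score(X_train, y_train),
      " test:", tree.score(X_test, y_test))

# Limiting depth trades a little training accuracy for better generalization.
pruned = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
print("max_depth=3 train:", pruned.score(X_train, y_train),
      " test:", pruned.score(X_test, y_test))
```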
To avoid these issues, proper data preparation is key (a pandas sketch follows this list):
- Handling Missing Values – Fill in gaps or remove incomplete data to prevent misleading patterns.
- Encoding Categorical Variables – Convert text categories into numbers so the model can understand them.
- Feature Scaling – Normalize numerical values to ensure fair comparisons.
- Outlier Detection – Identify and handle extreme values that could distort predictions.
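Here is a short pandas and scikit-learn sketch of those four steps on a made-up DataFrame. The column names and values are invented purely for illustration; real data usually calls for more careful choices:

```python
# The four preparation steps above on a tiny invented DataFrame.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "age": [25, 32, None, 41, 29, 250],  # one missing value, one outlier
    "city": ["Nairobi", "Mombasa", "Nairobi", "Kisumu", "Mombasa", "Nairobi"],
})

# Handling missing values: fill the gap with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Encoding categorical variables: one-hot encode the text column.
df = pd.get_dummies(df, columns=["city"])

# Outlier detection: drop rows outside 1.5 * IQR of the age column.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Feature scaling: standardize the numeric column to mean 0, variance 1.
df[["age"]] = StandardScaler().fit_transform(df[["age"]])
print(df)
```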
Once the data is clean, feature engineering helps refine what the model learns from (a PCA example follows the list):
- Creating New Features – Combining existing ones to extract more useful information.
- Selecting Important Features – Removing unnecessary or redundant data to improve model efficiency.
- Reducing Complexity – Techniques like Principal Component Analysis (PCA) and feature selection methods (e.g., removing highly correlated variables) help simplify large datasets while keeping the most important information, making it easier for the model to learn.
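As an example of the reducing-complexity step, here is a short PCA sketch on scikit-learn’s digits dataset, keeping enough components to retain 95% of the variance (the 95% threshold is a common convention, not a rule):

```python
# Dimensionality reduction with PCA, keeping 95% of the variance.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 64 pixel features per image
pca = PCA(n_components=0.95)          # a fraction means "variance to retain"
X_reduced = pca.fit_transform(X)

print(f"{X.shape[1]} features reduced to {X_reduced.shape[1]}")
```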
Model Evaluation & Hyperparameter Tuning
Building a machine learning model is just the beginning—the real challenge is ensuring it performs well on new, unseen data. Model evaluation helps measure how well a model is learning, while hyperparameter tuning fine-tunes its performance to achieve the best results.
Model Evaluation Metrics (computed in code after the list):
- Classification Metrics (For models predicting categories):
- Accuracy – The percentage of correctly predicted labels.
- Precision – The share of positive predictions that were actually correct.
- Recall – The share of actual positives that were correctly identified.
- F1 Score – A balance between precision and recall, useful for imbalanced datasets.
- Regression Metrics (For models predicting continuous values):
- Mean Absolute Error (MAE) – The average absolute difference between predicted and actual values.
- Root Mean Squared Error (RMSE) – Penalizes large errors more than MAE, making it sensitive to outliers.
- R-Squared (R²) – How much of the variance in the data the model explains (1 = perfect fit, 0 = no better than predicting the mean).
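All of these metrics are one-liners in scikit-learn. The sketch below computes them on tiny made-up arrays, just to show the calls:

```python
# Classification and regression metrics on made-up toy values.
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, precision_score,
                             r2_score, recall_score)

# Classification: toy binary labels.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))

# Regression: toy continuous values.
y_true_r = [3.0, 5.0, 2.5]
y_pred_r = [2.8, 5.4, 2.9]
print("MAE: ", mean_absolute_error(y_true_r, y_pred_r))
print("RMSE:", mean_squared_error(y_true_r, y_pred_r) ** 0.5)
print("R²:  ", r2_score(y_true_r, y_pred_r))
```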
Hyperparameter Tuning:
Machine learning models have settings (hyperparameters) that significantly affect performance. Instead of guessing, we can use one of two scikit-learn search helpers (demonstrated below the list):
- GridSearchCV – Tests every combination in a user-specified parameter grid (exhaustive but time-consuming).
- RandomizedSearchCV – Randomly selects hyperparameter combinations for faster tuning.
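Here is a sketch of both searches wrapped around a random forest; the parameter grid is illustrative, not a recommendation:

```python
# Hyperparameter tuning with grid search and randomized search.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

# GridSearchCV: exhaustively tries all 9 combinations with 5-fold CV.
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
grid.fit(X, y)
print("Grid best:  ", grid.best_params_)

# RandomizedSearchCV: samples only n_iter combinations, faster on big grids.
rand = RandomizedSearchCV(RandomForestClassifier(random_state=42), param_grid,
                          n_iter=4, cv=5, random_state=42)
rand.fit(X, y)
print("Random best:", rand.best_params_)
```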
Deploying Machine Learning Models
After training, ML models can be deployed for real-world applications (a minimal API sketch follows the list):
- Flask/FastAPI – Creating web-based ML applications.
- Streamlit – Building interactive ML dashboards.
- Cloud Platforms – Deploying models on AWS, Google Cloud, or Azure.
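As a taste of deployment, here is a minimal FastAPI sketch that serves predictions from a saved model. The file name model.pkl and the request shape are assumptions for illustration:

```python
# Serving a trained model over HTTP with FastAPI (illustrative sketch).
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load a previously trained model (assumed saved with pickle during training).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

class Features(BaseModel):
    values: list[float]  # one row of input features

@app.post("/predict")
def predict(features: Features):
    # Wrap the single row in a list because predict expects a 2D array.
    prediction = model.predict([features.values])
    return {"prediction": prediction.tolist()}

# Run with: uvicorn app:app --reload  (assuming this file is app.py)
```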
My Machine Learning Flex Is Just Getting Started
The hardest part of machine learning? It’s not just writing the code—it’s figuring out which algorithm will actually work for the data and give the best accuracy. Sometimes, even the most promising model flops, and you’re left questioning if the data is broken or if the data scientist (me) needs debugging.
But after tackling real-life datasets, fine-tuning models, and learning from every misstep, I’m finally flexing my ML skills!
In addition to class projects, assignments, and proving my dominance in Kahoot challenges, I’m also growing my portfolio. Check out my Kaggle projects here:
🔗 https://www.kaggle.com/monicahnjiiri
Watch this space—more projects are on the way!