Key Concepts and Insights from My Machine Learning Class
Introduction to Machine Learning
One of the things I love most about my master’s program is the exposure to a variety of fields—especially data analysis, machine learning, and everything in between. Before diving into machine learning, I was already comfortable with Python, crunching numbers, and creating cool visualizations. But machine learning? That felt like stepping into a sci-fi movie where computers magically predict outcomes. Spoiler alert: It’s not magic (though it sometimes feels like it).
This post is a mix of what I’ve learned so far and insights from my class. I’m still learning, and I don’t intend to stop anytime soon. Let’s dive in!
How Machine Learning Works
Machine learning follows a structured workflow that includes the following steps (a runnable sketch follows the list):
- Data Collection – Gathering structured (e.g., databases) or unstructured (e.g., images, text) data.
- Data Preprocessing – Cleaning and transforming data into a usable format.
- Model Selection – Choosing a suitable algorithm based on the problem type.
- Model Training – Teaching the model using historical data.
- Evaluation & Tuning – Assessing model performance and optimizing hyperparameters.
- Deployment – Implementing the model for real-world usage.
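To make the workflow concrete, here is a minimal sketch of the whole loop using scikit-learn’s bundled Iris dataset, so every step runs without external files. The dataset and the choice of logistic regression are placeholders for illustration, not a recommendation:

```python
# A minimal end-to-end pass through the workflow above.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data collection: a bundled dataset stands in for a real source.
X, y = load_iris(return_X_y=True)

# 2-4. Preprocessing, model selection, and training, chained in a pipeline.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# 5. Evaluation on held-out data.
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 6. Deployment would wrap model.predict behind an API
#    (see the deployment section later in this post).
```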
Types of Machine Learning
Understanding the type of data you’re working with is crucial in machine learning, as it determines the best approach for training a model. Whether the data is labeled or unlabeled, structured or unstructured, choosing the right learning technique—supervised, unsupervised, or reinforcement learning—plays a key role in achieving meaningful insights and accurate predictions.
| Type of Learning | Algorithm | Description / Use Cases |
| --- | --- | --- |
| Supervised Learning | Linear Regression | Predicting continuous values (e.g., stock prices, sales forecasts) |
| | Logistic Regression | Binary classification (e.g., spam detection, disease diagnosis) |
| | K-Nearest Neighbors (KNN) | Classification & regression based on proximity to labeled data |
| | Support Vector Machines (SVMs) | Classification & regression in high-dimensional spaces |
| | Decision Trees | Splitting data into branches for classification & regression |
| | Random Forest | Ensemble method to reduce overfitting & improve accuracy |
| | XGBoost | Optimized gradient boosting for structured data analysis |
| | Neural Networks (Deep Learning) | Complex problems (e.g., image recognition, NLP, forecasting) |
| Unsupervised Learning | K-Means Clustering | Grouping data points into distinct clusters |
| | Hierarchical Clustering | Building a hierarchy of clusters for multi-level analysis |
| | DBSCAN | Density-based clustering for discovering hidden patterns |
| | Principal Component Analysis (PCA) | Reducing dimensionality while retaining key information |
| Reinforcement Learning | Q-Learning | Learning optimal actions in an environment (e.g., game playing, robotics) |
| | Deep Q-Networks (DQN) | Combining deep learning with Q-learning for complex decision-making |
| | Policy Gradient Methods | Optimizing policies directly for reinforcement tasks |
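To make one row of the table concrete, here is a tiny unsupervised example: K-Means grouping synthetic, unlabeled points. The blob data is generated on the fly purely for illustration:

```python
# K-Means clustering on synthetic, unlabeled data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 300 unlabeled points scattered around 3 centers.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Ask K-Means to find 3 clusters; on real data the cluster count
# usually has to be chosen (e.g., with the elbow method).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("Cluster sizes:", [int((labels == k).sum()) for k in range(3)])
```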
Data Preparation & Feature Engineering
What could go wrong? A model can perform like a genius in training but act like a clueless intern when faced with real-world data. That’s overfitting—when your model memorizes instead of learning. On the flip side, if your model is as clueless as me trying to read ancient hieroglyphics, that’s underfitting—it didn’t learn enough, so it performs poorly on both the training and the test data.
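One simple way to catch both problems is to compare training and test scores. Here is a small sketch with a decision tree (the dataset and the depth limit are just illustrative choices): a near-perfect training score paired with a much lower test score points to overfitting, while low scores on both point to underfitting.

```python
# Diagnosing overfitting by comparing train vs. test accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# An unpruned tree can memorize the training set (classic overfitting).
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("unpruned    train:", tree.score(X_train, y_train),
      " test:", tree.score(X_test, y_test))

# Limiting depth trades a little training accuracy for better generalization.
pruned = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
print("max_depth=3 train:", pruned.score(X_train, y_train),
      " test:", pruned.score(X_test, y_test))
```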
To avoid these issues, proper data preparation is key (a pandas sketch follows this list):
- Handling Missing Values – Fill in gaps or remove incomplete data to prevent misleading patterns.
- Encoding Categorical Variables – Convert text categories into numbers so the model can understand them.
- Feature Scaling – Normalize numerical values to ensure fair comparisons.
- Outlier Detection – Identify and handle extreme values that could distort predictions.
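Here is a short pandas and scikit-learn sketch of those four steps on a made-up DataFrame. The column names and values are invented purely for illustration; real data usually calls for more careful choices:

```python
# The four preparation steps above on a tiny invented DataFrame.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "age": [25, 32, None, 41, 29, 250],  # one missing value, one outlier
    "city": ["Nairobi", "Mombasa", "Nairobi", "Kisumu", "Mombasa", "Nairobi"],
})

# Handling missing values: fill the gap with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Encoding categorical variables: one-hot encode the text column.
df = pd.get_dummies(df, columns=["city"])

# Outlier detection: drop rows outside 1.5 * IQR of the age column.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Feature scaling: standardize the numeric column to mean 0, variance 1.
df[["age"]] = StandardScaler().fit_transform(df[["age"]])
print(df)
```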
Once the data is clean, feature engineering helps refine what the model learns from (a PCA example follows the list):
- Creating New Features – Combining existing ones to extract more useful information.
- Selecting Important Features – Removing unnecessary or redundant data to improve model efficiency.
- Reducing Complexity – Techniques like Principal Component Analysis (PCA) and feature selection methods (e.g., removing highly correlated variables) help simplify large datasets while keeping the most important information, making it easier for the model to learn.
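As an example of the reducing-complexity step, here is a short PCA sketch on scikit-learn’s digits dataset, keeping enough components to retain 95% of the variance (the 95% threshold is a common convention, not a rule):

```python
# Dimensionality reduction with PCA, keeping 95% of the variance.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)   # 64 pixel features per image
pca = PCA(n_components=0.95)          # a fraction means "variance to retain"
X_reduced = pca.fit_transform(X)

print(f"{X.shape[1]} features reduced to {X_reduced.shape[1]}")
```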
Model Evaluation & Hyperparameter Tuning
Building a machine learning model is just the beginning—the real challenge is ensuring it performs well on new, unseen data. Model evaluation helps measure how well a model is learning, while hyperparameter tuning fine-tunes its performance to achieve the best results.
Model Evaluation Metrics (computed in code after the list):
- Classification Metrics (For models predicting categories):
- Accuracy – The percentage of correctly predicted labels.
- Precision – The share of positive predictions that were actually correct.
- Recall – The share of actual positives that were correctly identified.
- F1 Score – A balance between precision and recall, useful for imbalanced datasets.
- Regression Metrics (For models predicting continuous values):
- Mean Absolute Error (MAE) – The average absolute difference between predicted and actual values.
- Root Mean Squared Error (RMSE) – Penalizes large errors more than MAE, making it sensitive to outliers.
- R-Squared (R²) – How much of the variance in the data the model explains (1 = perfect fit, 0 = no better than predicting the mean).
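All of these metrics are one-liners in scikit-learn. The sketch below computes them on tiny made-up arrays, just to show the calls:

```python
# Classification and regression metrics on made-up toy values.
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, precision_score,
                             r2_score, recall_score)

# Classification: toy binary labels.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))

# Regression: toy continuous values.
y_true_r = [3.0, 5.0, 2.5]
y_pred_r = [2.8, 5.4, 2.9]
print("MAE: ", mean_absolute_error(y_true_r, y_pred_r))
print("RMSE:", mean_squared_error(y_true_r, y_pred_r) ** 0.5)
print("R²:  ", r2_score(y_true_r, y_pred_r))
```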
Hyperparameter Tuning:
Machine learning models have settings (hyperparameters) that significantly affect performance. Instead of guessing, we can use one of two scikit-learn search helpers (demonstrated below the list):
- GridSearchCV – Tests every combination in a user-specified parameter grid (exhaustive but time-consuming).
- RandomizedSearchCV – Randomly selects hyperparameter combinations for faster tuning.
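Here is a sketch of both searches wrapped around a random forest; the parameter grid is illustrative, not a recommendation:

```python
# Hyperparameter tuning with grid search and randomized search.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

# GridSearchCV: exhaustively tries all 9 combinations with 5-fold CV.
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
grid.fit(X, y)
print("Grid best:  ", grid.best_params_)

# RandomizedSearchCV: samples only n_iter combinations, faster on big grids.
rand = RandomizedSearchCV(RandomForestClassifier(random_state=42), param_grid,
                          n_iter=4, cv=5, random_state=42)
rand.fit(X, y)
print("Random best:", rand.best_params_)
```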
Deploying Machine Learning Models
After training, ML models can be deployed for real-world applications (a minimal API sketch follows the list):
- Flask/FastAPI – Creating web-based ML applications.
- Streamlit – Building interactive ML dashboards.
- Cloud Platforms – Deploying models on AWS, Google Cloud, or Azure.
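As a taste of deployment, here is a minimal FastAPI sketch that serves predictions from a saved model. The file name model.pkl and the request shape are assumptions for illustration:

```python
# Serving a trained model over HTTP with FastAPI (illustrative sketch).
import pickle

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load a previously trained model (assumed saved with pickle during training).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

class Features(BaseModel):
    values: list[float]  # one row of input features

@app.post("/predict")
def predict(features: Features):
    # Wrap the single row in a list because predict expects a 2D array.
    prediction = model.predict([features.values])
    return {"prediction": prediction.tolist()}

# Run with: uvicorn app:app --reload  (assuming this file is app.py)
```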
My Machine Learning Flex Is Just Getting Started
The hardest part of machine learning? It’s not just writing the code—it’s figuring out which algorithm will actually work for the data and give the best accuracy. Sometimes, even the most promising model flops, and you’re left questioning if the data is broken or if the data scientist (me) needs debugging.
But after tackling real-life datasets, fine-tuning models, and learning from every misstep, I’m finally flexing my ML skills!
In addition to class projects, assignments, and proving my dominance in Kahoot challenges, I’m also growing my portfolio. Check out my Kaggle projects here:
🔗 https://www.kaggle.com/monicahnjiiri
Watch this space—more projects are on the way!