AWS Machine Learning Certification Notes
A comprehensive guide covering ML fundamentals, learning types, project phases, evaluation metrics, and Amazon SageMaker.
Training Data Types
Labelled Data
Contains pairs of input data + corresponding output/response labels.
Unlabelled Data
Contains only input features, without any output labels.
Structured vs Unstructured Data
Structured data is organized in rows and columns; unstructured data is text-heavy or multimedia content.
Datasets
- Training Dataset: 60% of the data
- Validation Dataset: 10-20% - to tune the model's hyperparameters
- Evaluation Dataset: 10-20% - to evaluate the final performance of model
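A minimal sketch of the 60/20/20 split (using scikit-learn's `train_test_split` twice; the toy data is only illustrative):

```python
# Hypothetical sketch: a 60/20/20 train/validation/test split,
# done by applying train_test_split twice.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)   # toy feature matrix (100 samples)
y = np.arange(100)                  # toy labels

# First cut: 60% training, 40% held out.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.4, random_state=42)

# Second cut: split the 40% hold-out evenly into validation and evaluation.
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```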
Supervised Learning
Supervised learning uses labelled data to train the ML model.
Techniques of Supervised Learning
I. Regression
Predicting a continuous numeric value from input data.
Examples: Forecasting housing prices, stock prices, weather data
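A minimal regression sketch (scikit-learn linear regression on toy data, not from the original notes): fit a line and predict a continuous value for a new input.

```python
# Sketch: regression predicts a continuous numeric value.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4]])       # e.g. house size (toy units)
y = np.array([10.0, 20.0, 30.0, 40.0])   # e.g. price (toy units)

reg = LinearRegression().fit(X, y)
pred = reg.predict([[5]])                # continuous output for unseen input
print(pred[0])  # ~50.0
```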
II. Classification
Predicting a discrete category (class) for an input, i.e. determining what kind of thing it is.
Examples: Email type (Spam/Non-Spam), movie genre
Algorithm: K-Nearest Neighbors ("Tell me about your neighbors and I will tell you who you are")
- Multi-class Classification: Assigns each instance to ONE of several possible classes.
- Multi-label Classification: Assigns each instance to one or more classes.
Support Vector Machines are supervised learning algorithms used in regression and classification.
Decision Trees are popular for their interpretability.
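A minimal K-Nearest Neighbors sketch (scikit-learn, toy 1-D data, not from the original notes): a point is classified by majority vote among its k nearest labelled neighbors.

```python
# Sketch: KNN classification by majority vote of the 3 nearest neighbors.
from sklearn.neighbors import KNeighborsClassifier

# Toy data: values below 5 are class 0, values above are class 1.
X = [[1], [2], [3], [7], [8], [9]]
y = [0, 0, 0, 1, 1, 1]

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
pred = clf.predict([[2.5], [8.5]])
print(pred)  # [0 1]
```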
Unsupervised Learning
Uses unlabelled data to train the ML model. It recognizes patterns, structures and relationships in input data.
Common Techniques of Unsupervised Learning
I. Clustering
Grouping similar data points together.
Examples: Grouping users based on purchasing behavior, similar habits
Algorithm: K-Means, where K is the number of clusters.
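A minimal K-Means sketch (scikit-learn, toy 1-D data, not from the original notes): points in the same natural group receive the same cluster id.

```python
# Sketch: K-Means with K=2 on two well-separated groups of points.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0], [1.2], [0.8],    # group near 1
              [8.0], [8.2], [7.8]])   # group near 8

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
print(labels)  # same id within each group, different ids across groups
```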
II. Association Rule Learning
Finding association between different features in dataset.
Examples: Market Basket Analysis, e.g. bread and eggs are bought together, TV shows watched together.
Algorithm: Apriori Algorithm.
III. Anomaly Detection
Identify outliers in a dataset. Outliers are items that do not fit the normal pattern.
Algorithm: Isolation Forest
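A minimal Isolation Forest sketch (scikit-learn, synthetic data, not from the original notes): the model labels inliers +1 and outliers -1.

```python
# Sketch: Isolation Forest flags a far-away point as an anomaly.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
normal = rng.normal(loc=0.0, scale=0.5, size=(100, 2))  # dense cluster
outlier = np.array([[8.0, 8.0]])                        # far-away point
X = np.vstack([normal, outlier])

iso = IsolationForest(random_state=42).fit(X)
pred = iso.predict(X)   # +1 = inlier, -1 = outlier
print(pred[-1])  # -1
```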
IV. Probability Density Estimation
Technique for estimating the probability that a new data point will fall within a particular range.
Principal Component Analysis (PCA)
Dimensionality reduction technique used in unsupervised learning. Reduces the number of features in your dataset.
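A minimal PCA sketch (scikit-learn, random toy data, not from the original notes): project a 5-feature dataset down to 2 principal components.

```python
# Sketch: PCA reduces 5 features to the 2 directions of greatest variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 5))   # 50 samples, 5 features

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (50, 2)
```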
Semi-supervised Learning
Small labelled data + large unlabelled data = used to train the model. Bridges supervised and unsupervised learning.
Use Cases:
- Pseudo-Labelling: The model trained on the small labelled set assigns (pseudo-)labels to the unlabelled data.
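A minimal pseudo-labelling sketch (scikit-learn logistic regression on toy data; the loop and data are illustrative assumptions, not from the original notes): train on the small labelled set, label the unlabelled pool with the model's own predictions, then retrain on both.

```python
# Sketch: pseudo-labelling to bridge labelled and unlabelled data.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Small labelled set (1-D toy data: negative -> class 0, positive -> class 1).
X_lab = np.array([[-2.0], [-1.5], [1.5], [2.0]])
y_lab = np.array([0, 0, 1, 1])
# Larger unlabelled pool.
X_unlab = np.array([[-3.0], [-2.5], [-0.5], [0.5], [2.5], [3.0]])

model = LogisticRegression().fit(X_lab, y_lab)
pseudo = model.predict(X_unlab)              # model labels the unlabelled data

# Retrain on labelled + pseudo-labelled data combined.
X_all = np.vstack([X_lab, X_unlab])
y_all = np.concatenate([y_lab, pseudo])
model = LogisticRegression().fit(X_all, y_all)

final = model.predict([[-4.0], [4.0]])
print(final)  # [0 1]
```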
Reinforcement Learning
Based on Reward Function
The agent learns by interacting with an external environment: the agent takes an action, and the environment gives feedback in the form of a reward or penalty. The goal is to maximize the cumulative reward over time.
RLHF: Reinforcement Learning from Human Feedback
Incorporate human feedback in reward function.
Phases of ML Project
- Define Business Goals
- Frame ML Problem
- Data Collection and Preparation
- Model Training, Evaluation, Deployment, Monitoring
Business Goals
Success criteria, key performance indicators, budget and value of project
Data Collection and Preparation
- Convert data into a usable format
- Make it centrally accessible
- Data visualization: Exploratory Data Analysis
- Feature engineering
Exploratory Data Analysis (EDA)
Stage in Data Preparation Phase of ML Project.
EDA is the process of examining and understanding dataset with help of graphs before applying it to ML models.
Goals:
- Finding correlations between variables.
- Detect missing values
- Discover important features
Helps to make decision about feature engineering, pre-processing.
Feature Engineering
Transforming raw data into meaningful features by using domain knowledge.
Common Steps in Feature Engineering:
- Feature Extraction: Deriving new features from raw data, e.g. extracting day/month from a timestamp
- Feature Selection: Choosing the most relevant features
- Feature Crossing: Combining two or more features to capture non-linear relationships
- Dimensionality Reduction: Reduce number of features by Principal Component Analysis.
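The feature-extraction step above can be sketched in a few lines (pandas on toy timestamps; not from the original notes):

```python
# Sketch: feature extraction - derive day/month/hour from a raw timestamp.
import pandas as pd

df = pd.DataFrame({"timestamp": pd.to_datetime(
    ["2024-01-15 08:30", "2024-06-03 17:45"])})

df["day"] = df["timestamp"].dt.day       # new feature: day of month
df["month"] = df["timestamp"].dt.month   # new feature: month
df["hour"] = df["timestamp"].dt.hour     # new feature: hour of day

print(df[["day", "month", "hour"]])
```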
Inferencing
Deployment and Monitoring Phase of ML Project. When trained ML model is making predictions about new, unseen data.
A. Real-time Inferencing
Used when you need Instant response and have predictable load.
B. Serverless Inferencing
Serverless inferencing is ideal for intermittent, sporadic traffic that can tolerate variations in latency. In serverless inferencing, the compute resources needed to run the ML model are managed by the cloud provider.
C. Asynchronous Inferencing
Individual Inference requests are processed in the background.
Best for: When Input data is less than 1GB and "Near real-time latency" is needed.
D. Batch Inferencing
Making multiple predictions on large dataset all at once. Data is more than 1GB.
Best for:
- Offline scenarios.
- High accuracy needed, latency is not an issue.
Inferencing at the Edge
A Small Language Model (SLM) runs directly on the edge device. Local inference, low latency.
Hyperparameter Tuning
Hyperparameters: Configuration settings that control the "learning process" of the model.
Tuning: Finding the best combination of hyperparameters.
A. Algorithms for HP Tuning: Grid Search, Random Search, Bayesian Optimization
B. Important Hyperparameters: learning rate, batch size, number of epochs
Under-fitting & Over-fitting
Under-fitting
The model does not predict correctly on either the training data or the test data.
Training Error = High, Test Error = High
Results in High-Bias.
Cause:
- When model has not been trained for the appropriate length of time.
- Model is too simple.
- Model cannot capture the underlying pattern between input and output data.
Example: Linear model for non-linear problem
Solutions:
- Train longer
- Add more features
Over-Fitting
Performs good on training data, but poor on test data. Training Error = Low, Test Error = High. Results in High Variance.
Causes:
- The model is too complex. It captures noise.
- It has been trained longer than required.
Example: Deep neural network with little data
Solution:
- Raise the regularization coefficient (regularization is a technique to reduce model complexity)
- Apply early stopping (limit the number of epochs)
- Data augmentation: Increase diversity of training data
Bias and Variance
Bias
Systematic difference between predicted values and actual values.
High bias means a large systematic difference → under-fitting → reduce bias by adding more features to the training data
Variance
Measures the consistency of the model: how will the model perform if we train it on a different sample of the same dataset?
High Variance means model is sensitive to changes in training data - it learns noise → over-fitting problem
Reduce variance by feature selection → reduce number of features, raise Regularization Coefficient to reduce complexity
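A minimal sketch of the regularization point above (scikit-learn Ridge regression on synthetic data; the alpha values are illustrative assumptions): raising the regularization coefficient shrinks the learned coefficients, reducing model complexity.

```python
# Sketch: a larger regularization coefficient (alpha) shrinks coefficients.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.normal(size=(30, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(scale=0.1, size=30)

weak = Ridge(alpha=0.01).fit(X, y)     # weak regularization
strong = Ridge(alpha=100.0).fit(X, y)  # strong regularization

# Stronger regularization -> smaller coefficient magnitudes (less complexity).
print(np.abs(weak.coef_).sum(), np.abs(strong.coef_).sum())
```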
Evaluation Metrics
Evaluation Metric for Classification Models
Confusion Matrix
Tool for visualizing the performance of a classification model.
I. Precision
Of all predicted positives, how many were actually positive? Best: when false positives are costly.
II. Recall
Of all actual positives, how many did the model correctly identify? Best: when false negatives are costly.
III. Accuracy
Measures proportion of correctly predicted instances out of total number of instances.
IV. F1
Harmonic mean of precision and recall. Gives a balanced measure on heavily imbalanced datasets.
Use Case: To determine whether model performance has improved after fine-tuning.
AUC-ROC: Area Under the Receiver Operating Characteristic curve
Plots "sensitivity" (recall) against "1 - specificity". This is a threshold-independent metric that gives a holistic view of model performance.
Use Case: Binary classification of fraud detection cases.
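The classification metrics above can be computed with scikit-learn (toy predictions, not from the original notes):

```python
# Sketch: confusion matrix, precision, recall, accuracy and F1 on toy labels.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]   # 2 TP, 1 FN, 1 FP, 4 TN

cm = confusion_matrix(y_true, y_pred)   # rows: actual, cols: predicted
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2/3
rec = recall_score(y_true, y_pred)      # TP / (TP + FN) = 2/3
acc = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 6/8
f1 = f1_score(y_true, y_pred)           # harmonic mean of prec and rec

print(cm, prec, rec, acc, f1)
```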
Evaluation Metric for Regression Models
I. Mean Absolute Error
Average of the absolute differences between predicted and actual values.
II. Mean Absolute Percentage Error
Average of the absolute percentage differences between predicted and actual values.
III. R Square
Proportion of the variance in the target that is explained by the model.
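The regression metrics above can likewise be computed with scikit-learn (toy values, not from the original notes):

```python
# Sketch: MAE, MAPE and R-squared on toy regression predictions.
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error, r2_score)

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

mae = mean_absolute_error(y_true, y_pred)              # mean |error| = 0.25
mape = mean_absolute_percentage_error(y_true, y_pred)  # mean |error| / |true|
r2 = r2_score(y_true, y_pred)                          # variance explained

print(mae, mape, r2)
```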
Amazon SageMaker
End-to-end AWS managed service for:
- Collecting and preparing data
- Building, training, deploying, monitoring custom Models from scratch.
Components/Tools of Amazon SageMaker
Studio: IDE for machine learning
Data Wrangler: Data preparation, transformation and feature engineering
Canvas: No-code tool to build models using visual interface.
Feature Store: Central repository to store, share and manage features for ML models.
TensorBoard: Visualize and compare model convergence of different training jobs
Different Services/Features in Amazon SageMaker
I. Automatic Model Tuning
You just define the objective metric. AMT finds the best set of hyper-parameters that maximize your chosen objective metric.
II. SageMaker Clarify
Detects bias in data and models. Explainability: Generates explainability reports that explain reasons behind specific decisions/predictions made by model.
Important Use-Case: To ensure fairness and transparency in ML models.
III. SageMaker GroundTruth
Data labelling service for ML projects. Allows you to use human labelers to annotate data.
IV. SageMaker JumpStart
Hub of pre-built ML models and Solutions. Accelerates the development.
V. SageMaker Model Monitor
Monitors AI models in production. Detects drifts
VI. SageMaker Governance Tools
Role Manager, Model Cards and Model Dashboard for access control, model documentation and oversight.
Other Concepts
Network Isolation Mode
Security feature in SageMaker: the training job container runs in complete isolation, without internet access.
DeepAR Algorithm
Popular forecasting algorithm in SageMaker for time series prediction. It is based on recurrent neural network - RNN. It can handle multiple time-series simultaneously.
SageMaker Model Parallelism
A feature designed to help training large deep-learning models that cannot fit into the memory of a single GPU.
Last Updated: January 2026 | Notes by Nadir Hussain