AWS Machine Learning Certification Notes

A comprehensive guide covering ML fundamentals, learning types, project phases, evaluation metrics, and Amazon SageMaker.

Training Data Types

Labelled Data

Contains pairs of input data + corresponding output/response labels.

Unlabelled Data

Contains only input features, without any output labels.

Structured vs Unstructured Data

(rows/columns vs text heavy/multimedia content)

Datasets

  • Training Dataset: 60% of the data - used to train the model
  • Validation Dataset: 10-20% - used to tune the model's hyper-parameters
  • Evaluation (Test) Dataset: 10-20% - used to evaluate the final performance of the model
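The split above can be sketched in plain Python (a minimal illustration; the `split_dataset` helper is an assumption, and real projects typically use a library routine such as scikit-learn's `train_test_split`):

```python
import random

def split_dataset(data, train=0.6, val=0.2, seed=42):
    """Shuffle and split data into train / validation / evaluation sets."""
    data = list(data)
    random.Random(seed).shuffle(data)  # deterministic shuffle for reproducibility
    n = len(data)
    n_train = int(n * train)
    n_val = int(n * val)
    return (data[:n_train],                 # training set (60%)
            data[n_train:n_train + n_val],  # validation set (20%)
            data[n_train + n_val:])         # evaluation/test set (20%)

train_set, val_set, eval_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(eval_set))  # 60 20 20
```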

Supervised Learning

Supervised learning uses labelled data to train the ML model.

Techniques of Supervised Learning

I. Regression

Predicting a continuous numeric value based on input data.

Examples: Forecasting housing prices, stock prices, weather data

II. Classification

Predicting a discrete category (class) for the input data.

Examples: Email type (Spam/Non-Spam), movie genre

Algorithm: K-Nearest Neighbors ("Tell me about your neighbors and I will tell you who you are")

  • Multi-class Classification: Assigns each instance to ONE of several possible classes.
  • Multi-label Classification: Assigns each instance to one or more classes.
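A from-scratch sketch of K-Nearest Neighbors in plain Python (the `knn_predict` helper and the toy points are invented for illustration; production code would use a library implementation):

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in dists[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5)))  # a
print(knn_predict(points, labels, (5.5, 5.5)))  # b
```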

Support Vector Machines are supervised learning algorithms used in regression and classification.

Decision Trees are popular for their interpretability.

Unsupervised Learning

Uses unlabelled data to train the ML model. It recognizes patterns, structures and relationships in input data.

Common Techniques of Unsupervised Learning

I. Clustering

Grouping similar data points together.

Examples: Grouping users based on purchasing behavior, similar habits

Algorithm: K-means clustering, where K is the number of clusters.
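A bare-bones 1-D K-means loop in plain Python (an illustrative sketch with naive initialization, not a production implementation):

```python
def kmeans_1d(points, k=2, iters=10):
    """Minimal 1-D K-means: alternate assignment and centroid-update steps."""
    centroids = points[:k]  # naive initialization: first k points
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])
print(sorted(round(c, 2) for c in centroids))  # [1.0, 9.0]
```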

II. Association Rule Learning

Finding associations between different features in a dataset.

Examples: Market Basket Analysis, e.g. bread and eggs are bought together; TV shows watched together.

Algorithm: Apriori Algorithm.

III. Anomaly Detection

Identifying outliers in a dataset. Outliers are items that do not fit the normal pattern.

Algorithm: Isolation Forest
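Isolation Forest itself requires a library implementation (e.g. scikit-learn's `IsolationForest`); as a simpler stand-in for the same idea, here is a z-score outlier check in plain Python:

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 95]  # 95 does not fit the normal pattern
print(zscore_outliers(readings, threshold=2.0))  # [95]
```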

IV. Probability Density Estimation

Technique for estimating the probability that a new data point will fall within a particular range.

Principal Component Analysis (PCA)

Dimensionality reduction technique used in unsupervised learning. Reduces the number of features in your dataset.

Semi-supervised Learning

A small amount of labelled data combined with a large amount of unlabelled data is used to train the model. Bridges supervised and unsupervised learning.

Use Cases:

  • Pseudo-Labelling: a model trained on the labelled data assigns (pseudo-)labels to the unlabelled data.

Reinforcement Learning

Based on Reward Function

The agent learns by interacting with an external environment: the agent takes an action, and the environment gives feedback in the form of a reward or penalty. The goal is to maximize the cumulative reward over time.

RLHF: Reinforcement Learning from Human Feedback

Incorporates human feedback into the reward function.

Phases of ML Project

  1. Define Business Goals
  2. Frame ML Problem
  3. Data Collection and Preparation
  4. Model Training, Evaluation, Deployment, Monitoring

Business Goals

Define success criteria, key performance indicators (KPIs), the budget, and the value of the project.

Data Collection and Preparation

  • Usable format
  • Make it centrally accessible
  • Data visualization: Exploratory Data Analysis
  • Feature engineering

Exploratory Data Analysis (EDA)

A stage in the Data Preparation phase of an ML project.

EDA is the process of examining and understanding a dataset with the help of graphs before applying it to ML models.

Goals:

  • Finding correlations between variables.
  • Detect missing values
  • Discover important features

Helps to make decisions about feature engineering and pre-processing.

Feature Engineering

Transforming raw data into meaningful features by using domain knowledge.

Common Steps in Feature Engineering:

  • Feature Extraction: Deriving new features from raw data, e.g. extracting day/month from a timestamp
  • Feature Selection: Choosing the most relevant features
  • Feature Crossing: Combining two or more features to capture non-linear relationships
  • Dimensionality Reduction: Reduce number of features by Principal Component Analysis.
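The feature-extraction step above (day/month from a timestamp) can be sketched with the standard library; the `extract_time_features` helper is assumed for illustration:

```python
from datetime import datetime

def extract_time_features(timestamp: str) -> dict:
    """Derive new features from a raw ISO timestamp."""
    ts = datetime.fromisoformat(timestamp)
    return {
        "day": ts.day,
        "month": ts.month,
        "weekday": ts.weekday(),          # 0 = Monday ... 6 = Sunday
        "is_weekend": ts.weekday() >= 5,  # useful e.g. for demand forecasting
    }

print(extract_time_features("2026-01-15T09:30:00"))
# {'day': 15, 'month': 1, 'weekday': 3, 'is_weekend': False}
```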

Inferencing

Part of the Deployment and Monitoring phase of an ML project: a trained ML model making predictions on new, unseen data.

A. Real-time Inferencing

Used when you need an instant response and have a predictable load.

B. Serverless Inferencing

Serverless inferencing is ideal for intermittent, sporadic traffic that can tolerate variations in latency. The compute resources needed to run the ML model are managed by the cloud provider.

C. Asynchronous Inferencing

Individual inference requests are queued and processed in the background.

Best for: input data smaller than 1 GB where near real-time latency is needed.

D. Batch Inferencing

Making multiple predictions on a large dataset (more than 1 GB) all at once.

Best for:

  • Offline scenarios.
  • High accuracy needed, latency is not an issue.

Inferencing at the Edge

A Small Language Model (SLM) runs directly on the edge device. Local inference, low latency.

Hyper-parameter Tuning

Hyper-parameters: Configuration settings that control "learning process" of the model.

Tuning: Finding the best combination of hyper-parameters

A. Algorithms for HP Tuning

I. Grid Search Algorithm: Tries all combinations of hyper-parameter values, then selects the best one.
II. Random Search Algorithm: Tries random combinations instead of trying all of them.
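Both search strategies sketched in plain Python over a hypothetical score function (the `score` function and the grid values are invented for illustration):

```python
import itertools
import random

# Hypothetical objective: a pretend validation score, peaking at lr=0.1, batch=32.
def score(lr, batch_size):
    return -abs(lr - 0.1) - abs(batch_size - 32) / 100

grid = {"lr": [0.001, 0.01, 0.1, 1.0], "batch_size": [16, 32, 64]}

# Grid search: evaluate every combination, keep the best.
best_grid = max(
    (dict(zip(grid, combo)) for combo in itertools.product(*grid.values())),
    key=lambda params: score(**params),
)
print(best_grid)  # {'lr': 0.1, 'batch_size': 32}

# Random search: evaluate only a fixed budget of random combinations.
rng = random.Random(0)
candidates = [{k: rng.choice(v) for k, v in grid.items()} for _ in range(5)]
best_random = max(candidates, key=lambda params: score(**params))
print(best_random)  # a good (not necessarily optimal) combination
```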

B. Important Hyper-parameters

I. Learning Rate: Controls the size of the steps taken when updating the model's weights during training.
II. Batch Size: Number of training examples used per weight update.
III. Number of Epochs: How many times the model iterates over the entire training dataset.
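How the three hyper-parameters fit together, shown on a toy 1-D linear model trained with mini-batch gradient descent (a from-scratch sketch; the data and values are invented):

```python
# Toy data: y = 2x, so the model should learn a weight of about 2.
data = [(x, 2 * x) for x in range(1, 9)]

learning_rate = 0.01   # size of each weight-update step
batch_size = 4         # training examples used per weight update
num_epochs = 50        # full passes over the training dataset

w = 0.0  # single model weight
for epoch in range(num_epochs):
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Gradient of mean squared error 0.5*(w*x - y)^2 with respect to w.
        grad = sum((w * x - y) * x for x, y in batch) / len(batch)
        w -= learning_rate * grad  # gradient-descent step

print(round(w, 3))  # 2.0
```

Raising the learning rate too far makes the updates overshoot and diverge; too few epochs leaves the weight short of 2 (under-fitting).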

Under-fitting & Over-fitting

Under-fitting

The model does not predict correctly on either the training data or the test data.

Training Error = High, Test Error = High

Results in High-Bias.

Cause:

  • The model has not been trained for long enough.
  • The model is too simple.
  • The model cannot capture the underlying pattern between the input and output data.

Example: Linear model for non-linear problem

Solutions:

  • Train longer
  • Add more features

Over-Fitting

Performs well on training data, but poorly on test data. Training Error = Low, Test Error = High. Results in High Variance.

Causes:

  • The model is too complex. It captures noise.
  • It has been trained longer than required.

Example: Deep neural network with little data

Solution:

  • Raise the regularization coefficient (regularization is a technique to reduce model complexity)
  • Apply early stopping - use a lower number of epochs
  • Data augmentation: Increase diversity of training data
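Early stopping, one of the solutions above, sketched in plain Python (the validation-loss sequence is hypothetical):

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch to stop at: when validation loss has not
    improved for `patience` consecutive epochs."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # stop here; the model has started to overfit
    return len(val_losses) - 1

# Hypothetical validation losses: improving at first, then rising (overfitting).
losses = [0.9, 0.7, 0.5, 0.45, 0.47, 0.50, 0.55]
print(early_stopping_epoch(losses))  # 5
```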

Bias and Variance

Bias

The difference between the predicted value and the actual value.

High bias means a large difference → under-fitting → reduce bias by adding more features to the training data

Variance

Measures the consistency of the model: how would the model perform if trained on a different sample of the same dataset?

High variance means the model is sensitive to changes in the training data - it learns noise → over-fitting problem

Reduce variance by feature selection (reduce the number of features) and by raising the regularization coefficient to reduce complexity

Evaluation Metrics

Evaluation Metric for Classification Models

Confusion Matrix

Tool for visualizing the performance of a classification model.

I. Precision

Of all instances predicted positive, how many were actually positive? Best: when false positives are costly.

II. Recall

Of all actual positive instances, how many did the model correctly identify? Best: when false negatives are costly.

III. Accuracy

Measures the proportion of correctly predicted instances out of the total number of instances.

IV. F1

Harmonic mean of precision and recall. Gives a balanced measure on highly imbalanced datasets.

Use Case: To determine whether model performance has improved after fine-tuning.
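The four metrics above computed from confusion-matrix counts (a plain-Python sketch; the counts are invented):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of all predicted positives, how many were correct
    recall = tp / (tp + fn)      # of all actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Invented counts: 80 true positives, 20 false positives,
# 10 false negatives, 90 true negatives.
acc, prec, rec, f1 = classification_metrics(tp=80, fp=20, fn=10, tn=90)
print(acc, prec, round(rec, 3), round(f1, 3))  # 0.85 0.8 0.889 0.842
```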

AUC-ROC: Area Under the Curve - Receiver Operating Characteristic

Uses "sensitivity" (true positive rate) and "1 - specificity" (false positive rate). This is a threshold-independent metric. It gives a holistic view of model performance.

Use Case: Binary classification of fraud detection cases.

Evaluation Metric for Regression Models

I. Mean Absolute Error

Average absolute difference between predicted and actual values.

II. Mean Absolute Percentage Error

Average of the absolute percentage differences between predicted and actual values.

III. R Square

Proportion of the variance in the target variable that is explained by the model.
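All three regression metrics computed in plain Python (the `regression_metrics` helper and the values are illustrative):

```python
def regression_metrics(actual, predicted):
    """Compute MAE, MAPE (in percent), and R-squared for paired values."""
    n = len(actual)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    mape = 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mape, r2

actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
mae, mape, r2 = regression_metrics(actual, predicted)
print(mae, round(mape, 2), round(r2, 3))  # 10.0 5.21 0.992
```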

Amazon SageMaker

End-to-end AWS managed service for:

  • Collecting and preparing data
  • Building, training, deploying, and monitoring custom models from scratch.

Components/Tools of Amazon SageMaker

Studio: IDE for machine learning

Data Wrangler: Data preparation, transformation and feature engineering

Canvas: No-code tool to build models using visual interface.

Feature Store: Central repository to store, share and manage features for ML models.

TensorBoard: Visualize and compare model convergence of different training jobs

Different Services/Features in Amazon SageMaker

I. Automatic Model Tuning

You just define the objective metric. AMT finds the set of hyper-parameters that best optimizes your chosen objective metric.

II. SageMaker Clarify

Detects bias in data and models. Explainability: Generates explainability reports that explain reasons behind specific decisions/predictions made by model.

Important Use-Case: To ensure fairness and transparency in ML models.

III. SageMaker GroundTruth

Data labelling service for ML projects. Allows you to use human labelers to annotate data.

IV. SageMaker JumpStart

Hub of pre-built ML models and Solutions. Accelerates the development.

V. SageMaker Model Monitor

Monitors ML models in production. Detects drift in data quality and model quality.

VI. SageMaker Governance Tools

A. Model Card: Documents key details of a model, such as its purpose, training data, ethical considerations, and deployment environment. This documentation helps with compliance.
B. Model Dashboard: Centralized view (visualization) of all of your ML models (their status, performance, bias)
C. Role Manager: Define roles, manage access
D. Model Registry: Central repo to store, approve and version ML models

Other Concepts

Network Isolation Mode

A security feature in SageMaker: the training job container runs in complete isolation, without internet access.

DeepAR Algorithm

Popular forecasting algorithm in SageMaker for time series prediction. It is based on recurrent neural network - RNN. It can handle multiple time-series simultaneously.

SageMaker Model Parallelism

A feature designed to help train large deep-learning models that cannot fit into the memory of a single GPU.

Last Updated: January 2026 | Notes by Nadir Hussain