AWS AI Practitioner Certification Notes
A comprehensive guide covering Artificial Intelligence fundamentals, Generative AI concepts, Responsible AI practices, Compliance & Governance, Security & Privacy, and AWS ML Services.
AI Introduction
Artificial Intelligence
Development of systems that perform tasks which would otherwise require human intelligence.
AI Layers
- Data Layer
- ML Algorithm Layer
- Model Layer
- Application Layer
Machine Learning
Branch of AI that focuses on building methods that allow machines to learn from data and make predictions based on it.
Deep Learning
Branch of ML that uses multi-layer (deep) neural networks to learn patterns from data.
Generative AI
Branch of deep learning in which pre-trained models generate new content, such as human-like text, images, audio, or code.
Hierarchy
AI → ML → Deep Learning (Neural Networks) → GenAI
Core Concepts
Transformer Architecture
Core of most GenAI models. Uses a self-attention mechanism to look at all the tokens in a sequence at once and work out which ones are most relevant to each other.
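The self-attention idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production transformer: the token embeddings are used directly as queries, keys, and values (real models apply learned projections first), and the embedding numbers are made up.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention: every token attends to every token."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Relevance score of this token against every token in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy 2-dimensional embeddings for three tokens (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens, tokens, tokens)
```

Each output row is a blend of all value vectors, weighted by how relevant the other tokens are to that token.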
Neural Networks
Computational models made up of layers of interconnected nodes that process data and learn patterns.
- Many hidden layers
- Back-propagation process
Important Neural Networks
GAN: Generative Adversarial Network
Approach in which two neural networks compete against each other.
- Generator: Creates synthetic data
- Discriminator: Detects whether data is fake or real
Use Case: Data augmentation
RNN: Recurrent Neural Network
Meant for processing sequential data, where the order of elements matters.
Use Cases: Speech recognition, Video Analysis
CNN: Convolutional Neural Network
- They process pixel data of an image
- They learn spatial hierarchies (edges, corners) of features from input images
Use Case: Single Image Analysis
ResNet: Residual Network
Advanced CNN architecture that uses skip (residual) connections so that very deep networks can be trained.
SVM: Support Vector Machine
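Classification and regression algorithm. Note: an SVM is a classical ML algorithm, not a neural network. It separates data points into distinct classes by finding the decision boundary, called a hyperplane, that leaves the widest margin between the classes.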
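Once a hyperplane is known, classification reduces to checking which side of it a point falls on. The sketch below assumes the weights and bias have already been learned. The numbers are hypothetical and no training is shown.

```python
def svm_predict(weights, bias, point):
    # The hyperplane is the set of points where w . x + b == 0.
    score = sum(w * x for w, x in zip(weights, point)) + bias
    # Points on or above the hyperplane get class +1, the rest class -1.
    return 1 if score >= 0 else -1

# Hypothetical learned parameters for a 2-feature problem.
weights, bias = [1.0, -1.0], 0.0
label_a = svm_predict(weights, bias, [2.0, 1.0])
label_b = svm_predict(weights, bias, [1.0, 3.0])
```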
WaveNet
Model that generates raw audio waveforms. Used in speech synthesis to produce high-quality, human-like audio.
Types of AI Models
Foundation Model
Large, general-purpose AI models trained on broad data.
Amazon Titan is a family of high-performing FMs built by AWS.
Large Language Models (LLMs)
Specialized in understanding and generating coherent human-like text. Example: GPT
Diffusion Models
Generate images from text prompts.
Examples: Stable Diffusion (Stability AI), DALL-E
- Diffusion models synthesize images by starting from pure noise and gradually de-noising it
- Classifier-free guidance: Technique to improve the quality of generated images by controlling how strongly the diffusion model follows your text prompt
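Classifier-free guidance is commonly written as a blend of two noise predictions, one made with the prompt and one without it. The sketch below shows that arithmetic on plain lists. The function name and the numbers are illustrative assumptions, not any specific library's API.

```python
def classifier_free_guidance(noise_uncond, noise_cond, guidance_scale):
    # guided = unconditional + scale * (conditional - unconditional)
    # scale 0 ignores the prompt, scale 1 uses the conditional prediction,
    # and larger scales push the result further toward the prompt.
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]

# Hypothetical noise predictions for two pixels.
uncond = [0.0, 1.0]
cond = [1.0, 3.0]
guided = classifier_free_guidance(uncond, cond, 2.0)
```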
Generative AI Concepts
Drawbacks of GenAI
I. Hallucinations
Confidently asserting something as true when it is actually incorrect.
Mitigation:
- Lower the Temperature value to produce more focused output
- Use RAG (Retrieval-Augmented Generation) to retrieve relevant information and ground the model output in authoritative data
- Use Automated Reasoning checks in Amazon Bedrock Guardrails
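The RAG mitigation above can be sketched as "retrieve, then ground the prompt". This toy version scores documents by simple word overlap, which is an assumption made to keep the example self-contained. Real systems typically use vector embeddings and a vector store. The documents and function names are hypothetical.

```python
def retrieve(question, documents, top_n=1):
    # Rank documents by how many words they share with the question.
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:top_n]

def build_grounded_prompt(question, documents):
    context = "\n".join(retrieve(question, documents))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical knowledge base.
docs = [
    "Amazon Polly converts text to speech.",
    "Amazon Textract extracts text from documents.",
]
prompt = build_grounded_prompt("Which service converts text to speech?", docs)
```

The grounded prompt is then sent to the foundation model in place of the bare question.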
II. Non-Determinism
The response can differ even for the same prompt. The key reason for non-determinism is the Temperature setting of the Foundation Model, which controls creativity.
III. Limited Context Windows
The context window is the maximum number of tokens an LLM can consider when generating text. Anything beyond that limit is not seen by the model.
IV. Recency: Cutoff Date
Responses might be outdated
V. Cost Intensive
VI. Data Challenges
- Scarcity of new data: We can run out of training data. Synthetic data is one solution
- Proliferation of AI-generated content may lead to degradation of models trained on it
Prompt Engineering: Overview
Designing, developing and optimizing prompts to enhance the output of Foundation Models.
Prompt engineering is often the first recommended step for customization of foundation models. It is the most economical solution - lowest cost.
Prominent Use Case: Effective for guiding tone and style of generated output.
Anatomy of a Prompt
Instruction, Context, Input data, Output Indicator
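A minimal sketch of assembling the four parts into one prompt string. The helper name and the example text are hypothetical. Only the four-part structure comes from the notes.

```python
def build_prompt(instruction, context, input_data, output_indicator):
    # Keep the parts in a fixed order, separated by blank lines.
    return "\n\n".join([
        f"Instruction: {instruction}",
        f"Context: {context}",
        f"Input: {input_data}",
        f"Output format: {output_indicator}",
    ])

prompt = build_prompt(
    instruction="Summarize the customer review.",
    context="The review is for a wireless headset sold online.",
    input_data="Battery lasts two days, but the ear cushions feel cheap.",
    output_indicator="One sentence, neutral tone.",
)
```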
Parameters for Controlling Output
Temperature (typically 0-1): Controls Creativity/Randomness
- A high temperature value means a more creative, less predictable response
- A non-zero temperature is a key reason GenAI models are non-deterministic
- Lower the temperature value to reduce Hallucinations
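The effect of temperature can be seen by dividing the model's raw scores (logits) by the temperature before applying softmax. This is a sketch with made-up logits. A temperature of exactly 0 is not handled here, since it would divide by zero.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Lower temperature sharpens the distribution, higher flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
focused = softmax_with_temperature(logits, 0.2)
creative = softmax_with_temperature(logits, 1.0)
```

With the lower temperature, almost all of the probability lands on the top token. With the higher one, the alternatives keep a meaningful share.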
Top-K
Limits the model to choosing the next token from the K most probable tokens. It first sorts all candidate tokens by probability, then keeps only the top K.
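A small sketch of Top-K filtering over a made-up next-token distribution: sort by probability, keep the K best, and renormalize so the kept probabilities sum to 1.

```python
def top_k_filter(token_probs, k):
    # Sort candidates from most to least probable and keep the first k.
    ranked = sorted(token_probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    # Renormalize so the surviving tokens form a valid distribution.
    return {tok: p / total for tok, p in ranked}

# Hypothetical next-token probabilities.
probs = {"cat": 0.5, "dog": 0.3, "car": 0.15, "sky": 0.05}
pool = top_k_filter(probs, 2)
```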
Top-P: Nucleus Sampling (Size of the pool)
Picks the next token from the smallest set of tokens whose cumulative probability is >= P. A high value of P means a broader pool is considered. In other words, it sets the share of the most likely candidates that the model considers for the next token.
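Top-P can be sketched the same way: walk down the sorted candidates and stop as soon as the running total reaches P. The distribution below is made up for illustration.

```python
def top_p_filter(token_probs, p):
    ranked = sorted(token_probs.items(), key=lambda item: item[1], reverse=True)
    pool, cumulative = [], 0.0
    for tok, prob in ranked:
        pool.append((tok, prob))
        cumulative += prob
        # Stop once the pool covers at least p of the probability mass.
        if cumulative >= p:
            break
    total = sum(prob for _, prob in pool)
    return {tok: prob / total for tok, prob in pool}

# Hypothetical next-token probabilities.
probs = {"cat": 0.5, "dog": 0.3, "car": 0.15, "sky": 0.05}
narrow = top_p_filter(probs, 0.7)
broad = top_p_filter(probs, 0.9)
```

Unlike Top-K, the pool size adapts to the distribution: a confident model yields a small pool, an uncertain one a larger pool.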
Prompt Engineering Techniques
I. Zero-Shot Prompting
A direct instruction without any examples. The model has to rely on its pre-trained knowledge. (Important point)
II. Few-Shot Prompting
Provide a few examples of the task to the model to guide its output.
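The difference between zero-shot and few-shot prompting is only whether worked examples are placed before the new input. A minimal sketch follows, with a hypothetical helper name and made-up examples.

```python
def few_shot_prompt(instruction, examples, new_input):
    lines = [instruction, ""]
    # Each example shows the model an input together with the desired output.
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    # The final input is left unanswered for the model to complete.
    lines.append(f"Input: {new_input}")
    lines.append("Output:")
    return "\n".join(lines)

examples = [("I love this phone", "Positive"), ("The screen broke in a week", "Negative")]
few_shot = few_shot_prompt("Classify the sentiment.", examples, "Delivery was fast")
zero_shot = few_shot_prompt("Classify the sentiment.", [], "Delivery was fast")
```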
III. Chain of Thought Prompting
Asking the model to break a problem down step by step, using human-like reasoning instead of just guessing the answer.
IV. Negative Prompting
Explicitly instruct the model what not to include in the response.
V. Adversarial Prompting (Important)
Deliberately crafting inputs designed to exploit weaknesses in the model's behavior.
Purpose and use case: To test robustness of models.
Risks in Prompt Engineering
I. Model Poisoning
Malicious data is inserted into the training data of an FM.
II. Hijacking and Prompt Injection
Influencing model output through malicious instructions in the prompt. It often involves untrusted external content. The focus is on manipulating model behavior, for example by asking it to ignore previous instructions.
III. Jailbreaking
Attempting to override the restrictions and constraints of an AI model. The focus is on bypassing safety controls applied at the system level.
IV. Exposure
The model may output sensitive or confidential data. Mitigation: use Guardrails.
Responsible AI
Core Dimensions of Responsible AI
Fairness, Transparency, Explainability, Governance, Controllability, Security & Privacy, Safety, Veracity & Robustness
Services to Ensure Responsible AI
- Guardrails in Amazon Bedrock
- SageMaker Clarify: Detects bias and provides explainability. Supports fairness and transparency
- SageMaker Model Monitor: Monitors model and data quality in production. Detects drift
Services for Governance
SageMaker Model Cards, Model Dashboard, Role Manager
Interpretability & Explainability
Interpretability
The degree to which a human can understand how a model works internally.
Decision trees have high interpretability.
Explainability
The degree to which a model's decisions/predictions can be explained.
Partial Dependence Plots
Global explanation: shows how a single feature affects the predicted outcome of an ML model, averaged across the whole dataset.
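A partial dependence curve can be computed by forcing one feature to each grid value for every row and averaging the predictions. The sketch below uses a hypothetical linear toy model and a two-row dataset, chosen so the numbers are easy to check by hand.

```python
def partial_dependence(predict, dataset, feature_index, grid):
    curve = []
    for value in grid:
        preds = []
        for row in dataset:
            modified = list(row)
            # Override only the feature of interest and keep the rest as observed.
            modified[feature_index] = value
            preds.append(predict(modified))
        # Average over the whole dataset: this is what makes it a global view.
        curve.append(sum(preds) / len(preds))
    return curve

def toy_model(features):
    # Hypothetical model: score = 2 * income - debt
    income, debt = features
    return 2.0 * income - debt

data = [[1.0, 1.0], [2.0, 3.0]]
curve = partial_dependence(toy_model, data, 0, [0.0, 1.0, 2.0])
```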
Shapley Values
Provide a local explanation of an individual prediction by showing how much each feature contributes to that specific model output.
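For a handful of features, Shapley values can be computed exactly by averaging each feature's marginal contribution over every ordering in which features are "switched on". This is a sketch with a hypothetical toy model and baseline. Practical tools approximate this, because the number of orderings grows factorially.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            # Switch feature i from its baseline value to its actual value.
            current[i] = instance[i]
            new = predict(current)
            totals[i] += new - previous
            previous = new
    return [t / len(orderings) for t in totals]

def toy_model(features):
    # Hypothetical model: score = 2 * income - debt
    income, debt = features
    return 2.0 * income - debt

contributions = shapley_values(toy_model, [3.0, 1.0], [0.0, 0.0])
```

The contributions add up to the difference between the prediction for this instance and the prediction for the baseline.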
Human Centered Design
Designing systems around human needs, values and context.
I. Design for Amplified Decision Making
AI systems should enhance/augment human decision-making capabilities by:
- Minimizing risks in high pressure environment
- Bringing more clarity and simplicity
Compliance & Governance
Compliance Challenges
To comply with regulatory frameworks, AI systems face the following challenges:
I. Complexity and Opacity
It is difficult to audit how AI systems make decisions.
II. Dynamism
ML models continue to change over time. They are not static.
III. Emergent Capabilities
Unintended capabilities a system may have.
IV. Algorithm Accountability
Algorithms should be transparent and explainable.
Compliance Tools
- SageMaker Model Cards
- AWS AI Service Cards
Governance
A. Governance Framework
- Establish Governance Board or Committee
- Define roles and responsibilities
- Implement Policies and Procedures
B. Governance Strategies
- Clear timelines for review
- Technical reviews, Non-technical reviews
- Testing and validation procedures
Data Lineage
Data lineage refers to the complete end-to-end journey of data across its lifecycle, from origin to consumption.
Where did this data come from → How did it get here → What happened along the way
What to Document
- Source citation
- Details of collection process
- Methods used to clean data
- Pre-processing and transforming data
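The items above could be captured in a small record kept alongside each dataset. This is only an illustrative data structure. The class name, field names, and sample values are assumptions, not an AWS API.

```python
from dataclasses import dataclass, field

@dataclass
class LineageRecord:
    source_citation: str                                   # where the data came from
    collection_process: str                                # how it was collected
    cleaning_methods: list = field(default_factory=list)   # methods used to clean it
    transformations: list = field(default_factory=list)    # pre-processing steps

    def describe(self):
        # Render the journey from origin through each processing step.
        steps = [self.source_citation] + self.cleaning_methods + self.transformations
        return " -> ".join(steps)

record = LineageRecord(
    source_citation="Public survey export (hypothetical)",
    collection_process="Downloaded as CSV",
    cleaning_methods=["dropped duplicate rows"],
    transformations=["normalized age column"],
)
summary = record.describe()
```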
Security & Privacy
Techniques for Security of AI Systems
I. Threat Detection
Identification of real-time active threats.
II. Vulnerability Management
Identifying, assessing and mitigating security weaknesses. Patch management and update process.
III. Infrastructure Protection
Access control lists, network segmentation, encryption.
Secure Data Engineering: Best Practices
I. Assess Data Quality
Evaluate Completeness, Accuracy, Consistency. Identify biases, errors, inconsistencies.
II. Apply Data Privacy Enhancing Technologies
Data masking, obfuscation, encryption, tokenization.
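Two of these techniques can be sketched briefly: masking hides most of a value in place, while tokenization swaps the value for a random token and keeps the original in a protected store. The regular expression and the in-memory vault below are simplified assumptions for illustration and are not production-grade.

```python
import re
import secrets

# Simplified pattern: 13 to 16 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

def mask_card_numbers(text):
    def _mask(match):
        digits = re.sub(r"\D", "", match.group())
        # Keep only the last four digits visible.
        return "*" * (len(digits) - 4) + digits[-4:]
    return CARD_PATTERN.sub(_mask, text)

class TokenVault:
    """In-memory stand-in for a secure token vault (hypothetical helper)."""

    def __init__(self):
        self._values_by_token = {}

    def tokenize(self, value):
        # The token is random, so it reveals nothing about the original value.
        token = "tok_" + secrets.token_hex(8)
        self._values_by_token[token] = value
        return token

    def detokenize(self, token):
        return self._values_by_token[token]

masked = mask_card_numbers("Card 4111 1111 1111 1111 on file")
vault = TokenVault()
token = vault.tokenize("4111111111111111")
```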
III. Data Access Control
Only authenticated and authorized users should have access to data.
IV. Data Integrity
Maintain a robust data backup and recovery strategy.
Security Scoping Matrix for GenAI
Deals with the security risks associated with the deployment of GenAI applications.
Order of Ownership (Low - High)
- Consumer App (Low Ownership): You just use a third-party app. At this scope, you do not own the model or the training data
- Enterprise App
- Pre-trained Models
- Fine-tuned Models
- Self-Trained Models (High Ownership)
AWS Security Services
I. Amazon Macie
Uses machine learning and pattern matching to identify sensitive data such as personally identifiable information (PII). It identifies patterns that match common types of sensitive information, such as credit card numbers.
II. AWS Config
Continuously monitors configuration of your AWS resources over time.
III. Amazon Inspector
Automated vulnerability assessment of EC2 instances, container images, and Lambda functions.
IV. AWS Artifact
Provides on-demand access to AWS compliance documentation and AWS agreements.
AWS ML Services
Amazon Rekognition
Computer Vision service.
Detects objects in images.
Content Moderation APIs: Analyze images and videos to detect inappropriate or offensive content.
Amazon Comprehend
NLP Service. Performs sentiment analysis on text. Extracts insights from unstructured text data.
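A minimal sketch of calling sentiment analysis through boto3. The `detect_sentiment` call and its `Sentiment` / `SentimentScore` response keys are part of the Comprehend API. The wrapper function and the fake client are assumptions added so the example runs without AWS credentials.

```python
def get_sentiment(comprehend_client, text, language_code="en"):
    # Thin wrapper around the Comprehend DetectSentiment operation.
    response = comprehend_client.detect_sentiment(Text=text, LanguageCode=language_code)
    return response["Sentiment"], response["SentimentScore"]

# With real AWS credentials this would be used as (not executed here):
#   import boto3
#   client = boto3.client("comprehend")
#   get_sentiment(client, "I love this product")

class FakeComprehendClient:
    """Offline stand-in that mimics the response shape (hypothetical values)."""

    def detect_sentiment(self, Text, LanguageCode):
        return {
            "Sentiment": "POSITIVE",
            "SentimentScore": {"Positive": 0.99, "Negative": 0.0, "Neutral": 0.01, "Mixed": 0.0},
        }

label, scores = get_sentiment(FakeComprehendClient(), "I love this product")
```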
Amazon Translate
Automatically translates text between languages. The batch translation feature allows translating large volumes of text.
Amazon Personalize
You provide data signals about user activity, such as history, page views, and preferences. It then returns personalized recommendations.
Amazon Polly
Text to natural sounding speech.
Amazon EMR: Elastic MapReduce
Big-data platform that enables large-scale data processing. Key strength: native support for Apache Spark, including PySpark. Runs complex feature engineering workflows in a highly distributed manner.
AWS Glue: ETL Service
Extract, Transform, Load operations before feature engineering.
Amazon Augmented AI
Human-in-the-loop service. Adds human review of model predictions. Human reviews are triggered under certain conditions, such as low prediction confidence.
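The "triggered under certain conditions" idea can be illustrated with a simple confidence threshold. This is a conceptual sketch only, not the Amazon A2I API. The function name, status strings, and threshold value are hypothetical.

```python
def route_prediction(label, confidence, threshold=0.8):
    # Low-confidence predictions go to a human, the rest pass through.
    if confidence < threshold:
        return {"label": label, "status": "sent_to_human_review"}
    return {"label": label, "status": "auto_approved"}

uncertain = route_prediction("invoice", 0.55)
confident = route_prediction("invoice", 0.97)
```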
Amazon Transcribe
Speech to Text Service.
Amazon Textract
OCR - Optical Character Recognition Service. Automatically extracts text and structured data (tables) from documents and images. It cannot process videos.
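A minimal sketch of pulling text lines out of a Textract response. `detect_document_text` and the `Blocks` / `BlockType` / `Text` response fields belong to the Textract API. The wrapper and the fake client with its sample blocks are assumptions added so the example runs offline.

```python
def extract_lines(textract_client, image_bytes):
    response = textract_client.detect_document_text(Document={"Bytes": image_bytes})
    # The response mixes PAGE, LINE, and WORD blocks. Keep only whole lines.
    return [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]

# With real AWS credentials this would be used as (not executed here):
#   import boto3
#   client = boto3.client("textract")
#   extract_lines(client, open("scan.png", "rb").read())

class FakeTextractClient:
    """Offline stand-in that mimics the response shape (hypothetical content)."""

    def detect_document_text(self, Document):
        return {"Blocks": [
            {"BlockType": "PAGE"},
            {"BlockType": "LINE", "Text": "Invoice 001"},
            {"BlockType": "WORD", "Text": "Invoice"},
            {"BlockType": "WORD", "Text": "001"},
        ]}

lines = extract_lines(FakeTextractClient(), b"fake-image-bytes")
```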
Summary
This guide covers the fundamental concepts of AI, Generative AI, Responsible AI practices, compliance and governance frameworks, security best practices, and AWS ML services. It serves as a comprehensive reference for AI practitioners working with AWS cloud services.
Last Updated: January 2026 | Notes by Nadir Hussain