Databases

Relational Databases #
A relational database is a type of database that organizes data into rows and columns, which collectively form a table where the data points are related to each other.
Data is typically structured across multiple tables, which can be joined together via a primary key or a foreign key. These unique identifiers demonstrate the different relationships which exist between tables, and these relationships are usually illustrated through different types of data models.
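As a minimal sketch of how a primary key and a foreign key link two tables, the example below uses Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their columns are invented for illustration; they are not part of any AWS service.

```python
import sqlite3

# In-memory SQLite database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " item TEXT NOT NULL)"
)

conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob")],
)
conn.executemany(
    "INSERT INTO orders (id, customer_id, item) VALUES (?, ?, ?)",
    [(10, 1, "keyboard"), (11, 1, "mouse"), (12, 2, "monitor")],
)

# Join the two tables through the foreign key orders.customer_id -> customers.id.
rows = conn.execute(
    "SELECT c.name, o.item FROM orders AS o"
    " JOIN customers AS c ON c.id = o.customer_id"
    " ORDER BY o.id"
).fetchall()
print(rows)
```

Each order row stores only the customer's `id`; the join recovers the customer's name from the related table.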
No-SQL Databases #
- No-SQL = non-relational databases
- No-SQL databases are purpose built for specific data models and have flexible schemas for building modern applications
Benefits:
- Flexibility (easy to evolve data model)
- Scalability (designed to scale out by using distributed clusters)
- High-Performance (optimized for a specific data model)
- Highly functional (types optimized for the data model)
Use cases: key-value, document, graph, in-memory and search databases
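To illustrate what a flexible schema means in practice, here is a rough sketch in which a plain Python dict stands in for a key-value or document store. `put_item` and `get_item` are hypothetical helpers, not a real database client.

```python
# A plain dict stands in for a key-value / document store (illustration only).
store = {}


def put_item(key, document):
    """Store a document under a key; no table schema is declared anywhere."""
    store[key] = document


def get_item(key):
    """Return the document for a key, or None if the key does not exist."""
    return store.get(key)


put_item("user#1", {"name": "Alice", "email": "alice@example.com"})
# A later item can carry different fields without any schema migration.
put_item("user#2", {"name": "Bob", "tags": ["admin"], "address": {"city": "Berlin"}})

print(get_item("user#2")["address"]["city"])
```

In a relational table, adding `tags` or `address` would usually require altering the table first; here each item simply carries the fields it needs.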
RDS and Aurora #
Amazon RDS #
RDS stands for Relational Database Service. It is a managed DB service.
It allows creating databases in the cloud that are managed by AWS:
- Postgres
- MySQL
- MariaDB
- Oracle
- Microsoft SQL Server
- IBM DB2
- Aurora (AWS Proprietary)
Advantage of using RDS vs deploying DB on EC2:
- RDS is a managed service
- Automated Provisioning and OS patching
- Continuous backups and restore to specific timestamp (Point in Time Restore)
- Monitoring dashboards
- Read replicas for improved read performance
- Multi-AZ setup for DR
- Maintenance windows for upgrades
- Scaling capability (both vertical and horizontal)
- Storage backed by EBS
- Not possible to SSH into DB instances (managed service)
Example RDS application architecture #
Amazon Aurora #
- Aurora is proprietary technology from AWS (not open sourced)
- PostgreSQL and MySQL are both supported
- Aurora is “AWS Optimized”: AWS claims up to 5x the performance of MySQL on RDS and 3x the performance of PostgreSQL on RDS
- Aurora storage automatically grows in increments of 10 GB (up to 128 TB)
- Aurora costs about 20% more than RDS
Amazon Aurora Serverless #
- Automated (on-demand) database with autoscaling based on actual usage
- PostgreSQL and MySQL are both supported as Aurora Serverless DB
- No capacity planning required
- Least management overhead
- Pay-per-second, which can be more cost-effective
Use cases: infrequent, intermittent or unpredictable workloads.
Aurora with no management overhead = Aurora Serverless. #
Create RDS database #
Aurora and RDS > Create a database
RDS Deployment options #
- Read Replicas
  - Scale the read workload of your DB
  - Can create up to 15 replicas
  - Data is only written to the main DB
- Multi-AZ
  - Failover in case of AZ outage (High-Availability)
  - Data only read/written to the main DB
  - Can only have 1 AZ as a failover
- Multi-Region (Read Replicas)
  - Writes only to the main database
  - Local performance for global reads
  - Additional replication cost
  - Use case: DR in another region
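The read replica idea (all writes go to the main DB, reads are spread across replicas) can be sketched as a toy model. The `Database` and `ReplicatedCluster` classes are hypothetical and only mimic the routing; real RDS replication is asynchronous and handled by AWS.

```python
import itertools


class Database:
    """Toy database node holding key/value data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedCluster:
    """Toy model: one primary takes all writes, replicas serve reads."""

    def __init__(self, replica_count):
        self.primary = Database("primary")
        self.replicas = [Database(f"replica-{i}") for i in range(replica_count)]
        # Round-robin over the replicas for read traffic.
        self._next_replica = itertools.cycle(self.replicas)

    def write(self, key, value):
        self.primary.data[key] = value
        # Simplification: copy to replicas immediately instead of asynchronously.
        for replica in self.replicas:
            replica.data[key] = value
        return self.primary.name

    def read(self, key):
        replica = next(self._next_replica)
        return replica.name, replica.data.get(key)


cluster = ReplicatedCluster(replica_count=2)
print(cluster.write("order-1", "shipped"))
print(cluster.read("order-1"))
print(cluster.read("order-1"))
```

Adding replicas raises read capacity only; write capacity is still bounded by the single primary.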
Other Database Types #
Amazon ElastiCache #
ElastiCache does for Redis and Memcached what RDS does for relational databases: it provides them as a managed service.
Caches are in-memory databases with high performance and low latency.
They help reduce the load on databases with read-intensive workloads.
AWS takes care of OS maintenance, patching, optimizations, setup, configuration, monitoring, failure recovery and backups.
More: https://docs.aws.amazon.com/elasticache/
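One common way a cache takes read load off a database is the cache-aside pattern: look in the cache first and query the database only on a miss. The sketch below uses a dict as the cache and a stub class as the database; `SlowDatabase`, `CacheAside` and the TTL value are assumptions for illustration, not ElastiCache APIs.

```python
import time


class SlowDatabase:
    """Stub database that counts how many queries reach it."""

    def __init__(self, rows):
        self.rows = rows
        self.query_count = 0

    def query(self, key):
        self.query_count += 1
        return self.rows[key]


class CacheAside:
    """Check the in-memory cache first; fall back to the database on a miss."""

    def __init__(self, database, ttl_seconds=60.0):
        self.database = database
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # key -> (value, expiry time)

    def get(self, key):
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: the database is not touched
        value = self.database.query(key)  # cache miss: read from the database
        self._entries[key] = (value, now + self.ttl_seconds)
        return value


db = SlowDatabase({"product-1": "laptop"})
cache = CacheAside(db)
for _ in range(3):
    cache.get("product-1")
print(db.query_count)
```

Three reads reach the database only once; the other two are served from memory.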
DynamoDB #
- Fully managed, highly available with replication across 3 AZs
- No-SQL database - not a relational DB
- Scales to massive workloads, distributed, “serverless”
- Millions of requests per second, trillions of rows, 100s of TB of storage
- Fast and consistent performance
- Single-digit millisecond latency
- Integrated with IAM for security, authorization and administration
- Low cost and auto scaling capabilities
- Standard & Infrequent Access (IA) Table Class
DynamoDB Accelerator (DAX) #
- Fully Managed in-memory cache for DynamoDB
- 10x performance improvement when accessing DynamoDB tables
DAX is only used for DynamoDB, whereas ElastiCache can be used for other databases.
DynamoDB Global Tables #
- Makes a DynamoDB table accessible with low latency in multiple Regions
- Active-Active replication (read/write to any AWS Region)
Redshift #
- Redshift is based on PostgreSQL
- It’s OLAP - Online Analytical Processing (analytics and data warehousing)
- Load data once every hour, not every second
- AWS claims up to 10x better performance than other data warehouses
- Scales to PBs of data
- Columnar storage of data (instead of rows)
- Massively Parallel Query Execution (MPP)
- Pay-as-you-go based on the instances provisioned
- Has a SQL interface for performing queries
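To show why columnar storage suits analytics, the sketch below stores the same records row by row and column by column. The sample `rows` data and the `to_columns` helper are made up for illustration and say nothing about Redshift's actual storage engine.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "eu", "amount": 20.0},
    {"order_id": 2, "region": "us", "amount": 35.5},
    {"order_id": 3, "region": "eu", "amount": 12.5},
]


def to_columns(records):
    """Pivot a list of records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one column touches every whole record in the row layout...
total_from_rows = sum(record["amount"] for record in rows)
# ...but only a single contiguous list in the columnar layout.
total_from_columns = sum(columns["amount"])
print(total_from_rows, total_from_columns)
```

Both totals are identical; the difference is how much data has to be read to compute them, which matters at warehouse scale.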
Redshift Serverless #
- Auto Scaling
- Run analytics workload without managing data warehouse infrastructure
- Pay only for what you use
- Use cases: Reporting, real-time analytics
Amazon EMR #
- EMR stands for “Elastic MapReduce”
- EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
- The clusters can be made of hundreds of EC2 instances
- EMR takes care of all the provisioning and configuration
- Auto-scaling and integrated with Spot instances
- Use cases: data processing, machine learning, web indexing, big data
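The "MapReduce" in the name refers to a processing model that a Hadoop cluster distributes across many machines. A single-process word count gives a rough idea of its three phases; the function names below are hypothetical and this is not how you would submit a job to EMR.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))


print(word_count(["big data", "Big clusters process data"]))
```

On a real cluster the map and reduce calls run in parallel on different nodes, which is what lets the approach scale to very large inputs.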
Athena #
- Serverless query service to perform analytics against S3 objects
- Uses standard SQL language to query the files
- Supports CSV, JSON, ORC, Avro, Parquet
- Pricing: $5 per TB of data scanned
- Use cases: Business intelligence, analytics, reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail logs, etc.
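Because pricing depends on the amount of data scanned, a quick estimate is simple arithmetic. The sketch below uses the $5 per TB figure from these notes and assumes 1 TB = 1024^4 bytes; actual pricing, rounding rules and minimum charges may differ, so treat `athena_query_cost` as a hypothetical helper.

```python
PRICE_PER_TB_USD = 5.0  # figure from these notes; check current pricing for your Region
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabyte


def athena_query_cost(bytes_scanned):
    """Estimate the cost in USD of a query that scans the given number of bytes."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


full_scan = athena_query_cost(2 * BYTES_PER_TB)
# If a columnar format lets the query read only a quarter of the data:
partial_scan = athena_query_cost(2 * BYTES_PER_TB // 4)
print(full_scan, partial_scan)
```

This is why columnar formats such as Parquet or ORC tend to be cheaper to query than CSV: a query that needs few columns scans fewer bytes.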
Exam tip: analyze data in S3 using serverless SQL = Athena #
QuickSight #
Allows creating dashboards for services used in AWS. Per-session pricing.
- Serverless machine-learning powered business intelligence service to create interactive dashboards
- Use cases:
  - Business analytics
  - Building visualisations
  - Ad-hoc analysis
  - Get business insights using data
- Integrated with RDS, Aurora, Athena, Redshift, S3
More: https://docs.aws.amazon.com/quicksight/
DocumentDB #
DocumentDB is the “Aurora for MongoDB” (a NoSQL database): an AWS implementation that is compatible with MongoDB.
- MongoDB is used to store, query and index JSON data
- Fully managed, highly available with replication across 3 AZs
- DocumentDB storage automatically grows in increments of 10 GB
Neptune #
- Fully managed graph database
- A popular graph dataset would be a social network
  - Users have friends
  - Posts have comments
  - Comments have likes from users
  - Users share and like posts
- Highly available across 3 AZs with up to 15 replicas
- Build and run applications that work with highly connected datasets; Neptune is optimized for these complex queries
- Can store up to billions of relations and query the graph with millisecond latency
- Use cases: knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking
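A typical graph query on a social network is "who is within two hops of this user?". The sketch below answers it with a breadth-first search over an adjacency dict. The `friends` data and the `within_hops` function are invented for illustration; Neptune itself is queried through graph query languages, not through this code.

```python
from collections import deque

# Hypothetical friendship graph: user -> set of direct friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}


def within_hops(graph, start, max_hops):
    """Return {user: hop distance} for everyone reachable within max_hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]  # the starting user is not part of the answer
    return distances


print(within_hops(friends, "alice", 2))
```

In a relational database the same question needs one self-join per hop, which is the kind of query a graph database is built to avoid.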
Amazon Timestream #
- Serverless time series database
- Automatically scales up and down to adjust capacity
- Store and analyze trillions of events per day
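Time series workloads usually mean grouping timestamped measurements into fixed windows and aggregating each window. A minimal sketch, assuming timestamps in whole seconds and a made-up `average_per_window` helper (not a Timestream API):

```python
from collections import defaultdict


def average_per_window(points, window_seconds):
    """Average (timestamp, value) points per fixed-size time window."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each timestamp to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append(value)
    return {
        start: sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }


# Hypothetical sensor readings: (seconds since start, temperature).
points = [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0), (125, 7.0)]
print(average_per_window(points, window_seconds=60))
```

A time series database performs this kind of windowed aggregation natively and at far larger scale.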
Amazon Managed Blockchain #
- Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority
- Amazon Managed Blockchain is a managed service that allows you to:
  - Join public Blockchain networks
  - Create your own scalable, private network
- Compatible with:
  - Hyperledger Fabric
  - Ethereum
AWS Glue #
Managed Extract, Transform and Load (ETL) service.
- Useful to prepare and transform data for analytics
- Fully serverless service
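The three ETL stages can be sketched with the standard library alone. Everything below (the sample CSV, the cleaning rules and the function names) is an assumption for illustration; Glue jobs are defined and run differently.

```python
import csv
import io
import json

# Hypothetical raw export with stray whitespace and a missing amount.
RAW_CSV = "id,amount,currency\n1, 10.50 ,usd\n2,,usd\n3,7.25,eur\n"


def extract(text):
    """Extract: parse the raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: drop rows without an amount, fix types, normalise currency."""
    cleaned = []
    for record in records:
        amount = record["amount"].strip()
        if not amount:
            continue  # skip incomplete rows
        cleaned.append(
            {
                "id": int(record["id"]),
                "amount": float(amount),
                "currency": record["currency"].strip().upper(),
            }
        )
    return cleaned


def load(records):
    """Load: serialise as JSON Lines, ready to be written to a target store."""
    return "\n".join(json.dumps(record) for record in records)


print(load(transform(extract(RAW_CSV))))
```

The cleaned output is the kind of analytics-ready data that would then be queried by a service such as Athena or Redshift.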
DMS #
DMS - Database Migration Service
Quickly and securely migrate databases to AWS
The source database remains available during the migration
Homogeneous migrations: e.g. Oracle to Oracle
Heterogeneous migrations: e.g. MSSQL to Aurora
Database Summary #
- Relational Databases - OLTP: RDS & Aurora (SQL)
- Differences between Multi-AZ, Read Replicas, Multi-Region
- In-memory Database: ElastiCache
- Key/Value Database: DynamoDB (serverless) & DAX (cache for DynamoDB)
- Warehouse - OLAP: Redshift (SQL)
- Hadoop Cluster: EMR
- Athena: query data on Amazon S3 (serverless & SQL)
- QuickSight: dashboards on your data (serverless)
- DocumentDB: “Aurora for MongoDB” (JSON – NoSQL database)
- Amazon QLDB: Financial Transactions Ledger (immutable journal, cryptographically verifiable)
- Amazon Managed Blockchain: managed Hyperledger Fabric & Ethereum blockchains
- Glue: Managed ETL (Extract Transform Load) and Data Catalog service
- Database Migration: DMS
- Neptune: graph database
- Timestream: time-series database
» Sources « #
- Amazon RDS and Aurora Documentation: https://docs.aws.amazon.com/rds/
- ElastiCache: https://docs.aws.amazon.com/elasticache/
- QuickSight: https://docs.aws.amazon.com/quicksight/
» Disclaimer « #
Disclaimer: Content for educational purposes only, no rights reserved.
Most of the content in this series comes from Stephane Maarek’s Ultimate AWS Certified Cloud Practitioner CLF-C02 2025 course on Udemy.
I highly encourage you to take Stephane’s courses, as they are excellent and really help with understanding the subject.
This article is just a summary and has been published to help me study for and pass the practitioner exam.