
Databases


Relational Databases #

A relational database is a type of database that organizes data into rows and columns, which collectively form a table where the data points are related to each other.

Data is typically structured across multiple tables, which can be joined together via a primary key or a foreign key. These unique identifiers demonstrate the different relationships which exist between tables, and these relationships are usually illustrated through different types of data models.
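As a minimal sketch of how a primary key and a foreign key link two tables, the example below uses Python's built-in sqlite3 module. The customers and orders tables and their contents are hypothetical and only serve as an illustration.

```python
import sqlite3

# Hypothetical schema for illustration: customers and their orders.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " item TEXT NOT NULL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Linus")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "keyboard"), (11, 1, "monitor"), (12, 2, "laptop")],
)

# Join the two tables through the foreign key orders.customer_id.
rows = conn.execute(
    "SELECT c.name, o.item FROM customers AS c"
    " JOIN orders AS o ON o.customer_id = c.id"
    " ORDER BY o.id"
).fetchall()
print(rows)
```

Each order row points at exactly one customer through customer_id, which is what lets the JOIN combine the two tables.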

No-SQL Databases #

  • No-SQL = non-relational databases
  • No-SQL databases are purpose-built for specific data models and have flexible schemas, which suits modern applications

Benefits:

  • Flexibility (easy to evolve data model)
  • Scalability (designed to scale out by using distributed clusters)
  • High-Performance (optimized for a specific data model)
  • Highly functional (types optimized for the data model)

Types: key-value, document, graph, in-memory, search databases
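To make the "flexible schema" point concrete, here is a small sketch in which a plain Python dict stands in for a key-value/document store. The store, put_item and find_by names are made up for this illustration and do not correspond to any AWS API.

```python
# A plain dict stands in for a key-value/document store (assumption for illustration).
store = {}

def put_item(key, document):
    """Store a document under a key; no fixed schema is enforced."""
    store[key] = document

put_item("user#1", {"name": "Ada", "email": "ada@example.com"})
# A later item can carry new attributes without any schema migration.
put_item("user#2", {"name": "Linus", "tags": ["kernel"], "address": {"city": "Portland"}})

def find_by(attribute, value):
    """Return keys of documents whose attribute equals value; items lacking it are skipped."""
    return sorted(k for k, doc in store.items() if doc.get(attribute) == value)

print(find_by("name", "Ada"))
```

In a relational table, adding the tags or address attributes would first require changing the table definition; here each item simply carries the attributes it needs.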

RDS and Aurora #

Amazon RDS #

RDS stands for Relational Database Service. It is a managed DB service.

It allows creating databases in the cloud that are managed by AWS:

  • Postgres
  • MySQL
  • MariaDB
  • Oracle
  • Microsoft SQL Server
  • IBM DB2
  • Aurora (AWS Proprietary)

Advantage of using RDS vs deploying DB on EC2:

  • RDS is a managed service
    • Automated Provisioning and OS patching
    • Continuous backups and restore to specific timestamp (Point in Time Restore)
    • Monitoring dashboards
    • Read replicas for improved read performance
    • Multi-AZ setup for DR
    • Maintenance windows for upgrades
    • Scaling capability (both vertical and horizontal)
    • Storage backed by EBS
  • Not possible to SSH into DB instances (managed service)

Example RDS application architecture #

Source: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Welcome.html

Amazon Aurora #

  • Aurora is proprietary technology from AWS (not open sourced)
  • PostgreSQL and MySQL are both supported
  • Aurora is “AWS Optimized”: AWS claims 5x the performance of MySQL on RDS and 3x the performance of Postgres on RDS
  • Aurora storage automatically grows in increments of 10 GB (up to 128 TB)
  • Aurora costs about 20% more than RDS

Amazon Aurora Serverless #

  • Automated (on-demand) database with autoscaling based on actual usage
  • PostgreSQL and MySQL are both supported as Aurora Serverless DB
  • No capacity planning required
  • Least management overhead
  • Pay-per-second, which can be more cost-effective

Use cases: infrequent, intermittent or unpredictable workloads.

Aurora with no management overhead = Aurora Serverless. #

Create RDS database #

Aurora and RDS > Create a database

RDS Deployment options #

  • Read Replicas
    • Scale the read workload of your DB
    • Can create up to 15 replicas
    • Data is only written to the main DB
  • Multi-AZ
    • Failover in case of AZ outage (High-Availability)
    • Data only read/written to the main DB
    • Can only have 1 AZ as a failover
  • Multi-Region
    • Multi-Region (Read Replicas)
    • Writes only to the main database
    • Local performance for global reads
    • Additional replication cost
    • Use case: DR in another region
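The difference between read replicas and a Multi-AZ standby can be sketched with a toy in-memory model. The Cluster class below is purely hypothetical: it copies data instantly, whereas real replication involves lag and networking, and it is not how RDS is implemented.

```python
import itertools

class Cluster:
    """Toy model: one primary, N read replicas, and one standby in another AZ."""

    def __init__(self, replica_count):
        self.primary = {}
        self.standby = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # Writes only ever go to the primary; copies are then propagated.
        self.primary[key] = value
        self.standby[key] = value
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads are spread across the replicas in round-robin order.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)

    def fail_over(self):
        # On an AZ outage the standby is promoted and a fresh standby is created.
        self.primary = self.standby
        self.standby = dict(self.primary)

cluster = Cluster(replica_count=2)
cluster.write("order#1", "paid")
value_from_replica = cluster.read("order#1")
cluster.fail_over()
value_after_failover = cluster.primary["order#1"]
```

Read replicas add read capacity, while the standby receives no traffic until a failover happens.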

Other Database Types #

Amazon ElastiCache #

  • Just as RDS provides managed relational databases, ElastiCache provides managed Redis or Memcached.

  • Caches are in-memory databases with high performance and low latency

  • Helps reduce the load on databases for read-intensive workloads

  • AWS takes care of OS maintenance, patching, optimizations, setup, configuration, monitoring, failure recovery and backups
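One common way an application uses such a cache is the cache-aside pattern: check the cache first and query the database only on a miss. The sketch below uses two dicts as stand-ins for the cache and the database; all names are hypothetical.

```python
database = {"user:1": "Ada"}   # stands in for a relational database
cache = {}                     # stands in for Redis or Memcached
db_reads = 0                   # counts how often the database is actually hit

def slow_db_read(key):
    global db_reads
    db_reads += 1
    return database.get(key)

def get(key):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    if key in cache:
        return cache[key]
    value = slow_db_read(key)
    if value is not None:
        cache[key] = value     # populate the cache for later reads
    return value

first = get("user:1")    # cache miss, reads the database
second = get("user:1")   # cache hit, database untouched
print(first, second, db_reads)
```

A real setup would also need an expiry or invalidation strategy so that cached values do not go stale; that part is left out here.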

More: https://docs.aws.amazon.com/elasticache/

DynamoDB #

  • Fully managed, highly available with replication across 3 AZs
  • No-SQL database - not a relational DB
  • Scales to massive workloads, distributed, “serverless”
  • Millions of requests per second, trillions of rows, 100s of TB of storage
  • Fast and consistent performance
  • Single-digit millisecond latency
  • Integrated with IAM for security, authorization and administration
  • Low cost and auto scaling capabilities
  • Standard & Infrequent Access (IA) Table Class

DynamoDB Accelerator (DAX) #

  • Fully Managed in-memory cache for DynamoDB
  • 10x performance improvement when accessing DynamoDB tables

DAX can only be used with DynamoDB, whereas ElastiCache can be used with other databases.

DynamoDB Global Tables #

  • Makes a DynamoDB table accessible with low latency in multiple Regions
  • Active-Active replication (read/write to any AWS Region)
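With active-active replication, two Regions may update the same item at nearly the same time, so the copies need a rule to converge. The sketch below assumes a "last writer wins" rule based on timestamps, which is one common approach. The data layout and the merge function are hypothetical and heavily simplified.

```python
def merge(local, remote):
    """Merge two replicas; each value is (timestamp, data). The newest timestamp wins."""
    merged = dict(local)
    for key, (timestamp, data) in remote.items():
        if key not in merged or timestamp > merged[key][0]:
            merged[key] = (timestamp, data)
    return merged

# Two Regions accepted writes independently (timestamps are made-up integers).
us_east = {"cart#1": (100, ["book"])}
eu_west = {"cart#1": (105, ["book", "pen"]), "cart#2": (90, ["lamp"])}

converged = merge(us_east, eu_west)
print(converged)
```

Merging in either direction yields the same result, which is what lets every Region end up with identical data.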

Redshift #

  • Redshift is based on PostgreSQL
  • It’s OLAP - Online Analytical Processing (analytics and data warehousing)
  • Load data once every hour, not every second
  • AWS claims 10x better performance than other data warehouses
  • Scales to PBs of data
  • Columnar storage of data (instead of rows)
  • Massively Parallel Query Execution (MPP)
  • Pay-as-you-go based on the instances provisioned
  • Has a SQL interface for performing queries
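Why columnar storage suits analytics can be shown with a tiny sketch: an aggregate over one column only needs that column's values, not every full row. The sample orders below are invented, and the lists only mimic the two layouts conceptually.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 20.0},
    {"order_id": 2, "region": "US", "amount": 35.5},
    {"order_id": 3, "region": "EU", "amount": 12.5},
]

# Column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

total_row_store = sum(row["amount"] for row in rows)   # visits every full row
total_column_store = sum(columns["amount"])            # reads a single column

print(total_row_store, total_column_store)
```

Both totals are identical; the difference is how much data has to be read to compute them, which matters when tables hold billions of rows.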

Redshift Serverless #

  • Auto Scaling
  • Run analytics workload without managing data warehouse infrastructure
  • Pay only for what you use
  • Use cases: Reporting, real-time analytics

Amazon EMR #

  • EMR stands for “Elastic MapReduce”
  • EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
  • The clusters can be made of hundreds of EC2 instances
  • EMR takes care of all the provisioning and configuration
  • Auto-scaling and integrated with Spot instances
  • Use cases: data processing, machine learning, web indexing, big data
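The MapReduce model behind the service name can be sketched on a single machine with a word count. This runs in plain Python without Hadoop or EMR; the function names and input lines are illustrative only. On a real cluster the map and reduce steps would run in parallel on many nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data on AWS", "big clusters process big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```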

Athena #

  • Serverless query service to perform analytics against S3 objects
  • Uses standard SQL language to query the files
  • Supports CSV, JSON, ORC, Avro, Parquet
  • Pricing: $5 per TB of data scanned
  • Use cases: Business intelligence, analytics, reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail logs, etc.

Exam tip: analyze data in S3 using serverless SQL = Athena #
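Because pricing is per TB scanned, the cost of a query depends on how much data it reads. The estimate below uses the $5 per TB figure from the list above and assumes a binary terabyte; actual pricing can differ by Region and over time, so treat it as rough arithmetic only.

```python
PRICE_PER_TB_USD = 5.0     # figure quoted in the notes; check current pricing
BYTES_PER_TB = 1024 ** 4   # assumption: binary terabyte

def query_cost(bytes_scanned):
    """Estimated cost in USD for a query that scans the given number of bytes."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD

full_scan = query_cost(2 * BYTES_PER_TB)      # scanning 2 TB of raw data
partial_scan = query_cost(BYTES_PER_TB // 4)  # scanning only 0.25 TB
print(full_scan, partial_scan)
```

This is why compressing data or storing it in a columnar format such as Parquet tends to lower the bill: the same question is answered by scanning fewer bytes.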

QuickSight #

Allows creating dashboards for services used in AWS. Per-session pricing.

  • Serverless machine-learning powered business intelligence service to create interactive dashboards
  • Use cases:
    • Business analytics
    • Building visualisations
    • Ad-hoc analysis
    • Get business insights using data
  • Integrated with RDS, Aurora, Athena, Redshift, S3

More: https://docs.aws.amazon.com/quicksight/

DocumentDB #

DocumentDB is the “Aurora version” for MongoDB: an AWS-managed, MongoDB-compatible NoSQL database.

  • MongoDB is used to store, query and index JSON data
  • Fully managed, highly available with replication across 3 AZs
  • DocumentDB storage automatically grows in increments of 10 GB

Neptune #

  • Fully managed graph database
  • A popular graph dataset would be a social network
    • Users have friends
    • Posts have comments
    • Comments have likes from users
    • Users share and like posts
  • Highly available across 3 AZs with up to 15 read replicas
  • Build and run applications that work with highly connected datasets; Neptune is optimized for the complex queries these require
  • Can store up to billions of relations and query the graph with millisecond latency
  • Use cases: knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking
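A typical graph query on a social network is "who is within two hops of this user?". The sketch below answers it with a breadth-first search over a small, made-up friendship graph held in a dict. A graph database would express this in a graph query language rather than hand-written Python.

```python
from collections import deque

# Hypothetical friendship graph: each user maps to a set of friends.
friends = {
    "ana": {"bob", "cy"},
    "bob": {"ana", "dee"},
    "cy": {"ana"},
    "dee": {"bob", "eli"},
    "eli": {"dee"},
}

def within_hops(start, max_hops):
    """Breadth-first search: everyone reachable from start in at most max_hops edges."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in friends[person]:
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return sorted(p for p in distance if p != start)

print(within_hops("ana", 2))
```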

Amazon Timestream #

  • Serverless time series database
  • Automatically scales up and down to adjust capacity
  • Store and analyze trillions of events per day
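Time series data is usually queried by time window, for example "average value per minute". The sketch below buckets a few invented sensor readings by minute in plain Python; it illustrates the kind of query involved, not the Timestream API.

```python
from collections import defaultdict

# (unix_timestamp_seconds, temperature) readings from a hypothetical sensor
readings = [(0, 20.0), (30, 22.0), (60, 21.0), (119, 23.0), (120, 25.0)]

def average_per_minute(points):
    """Group readings into one-minute buckets and average each bucket."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        buckets[timestamp // 60].append(value)
    return {minute: sum(values) / len(values) for minute, values in sorted(buckets.items())}

averages = average_per_minute(readings)
print(averages)
```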

Amazon Managed Blockchain #

  • Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority
  • Amazon Managed Blockchain is a managed service that allows you to:
    • Join public Blockchain networks
    • Create your own scalable, private network
  • Compatible with:
    • Hyperledger Fabric
    • Ethereum

AWS Glue #

Managed Extract, Transform and Load (ETL) service.

  • Useful to prepare and transform data for analytics
  • Fully serverless service
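The three ETL stages can be sketched with the Python standard library: extract rows from CSV, transform them (clean and type-convert), and load them as JSON Lines. The sample data and function names are hypothetical; Glue jobs themselves are written against its own runtime, which is not shown here.

```python
import csv
import io
import json

raw_csv = "id,name,amount\n1, Ada ,10.50\n2,Linus,\n3,Grace,7\n"

def extract(text):
    """Extract: parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: drop rows without an amount, trim names, convert types."""
    cleaned = []
    for record in records:
        if not record["amount"].strip():
            continue
        cleaned.append({
            "id": int(record["id"]),
            "name": record["name"].strip(),
            "amount": float(record["amount"]),
        })
    return cleaned

def load(records):
    """Load: serialize to JSON Lines, a format analytics tools commonly read."""
    return "\n".join(json.dumps(record) for record in records)

output = load(transform(extract(raw_csv)))
print(output)
```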

DMS #

DMS - Database Migration Service

  • Quickly and securely migrate databases to AWS

  • The source database remains available during the migration

  • Homogeneous migrations: e.g. Oracle to Oracle

  • Heterogeneous migrations: e.g. MSSQL to Aurora

Database Summary #

  • Relational Databases - OLTP: RDS & Aurora (SQL)
  • Differences between Multi-AZ, Read Replicas, Multi-Region
  • In-memory Database: ElastiCache
  • Key/Value Database: DynamoDB (serverless) & DAX (cache for DynamoDB)
  • Warehouse - OLAP: Redshift (SQL)
  • Hadoop Cluster: EMR
  • Athena: query data on Amazon S3 (serverless & SQL)
  • QuickSight: dashboards on your data (serverless)
  • DocumentDB: “Aurora for MongoDB” (JSON – NoSQL database)
  • Amazon QLDB: Financial Transactions Ledger (immutable journal, cryptographically verifiable)
  • Amazon Managed Blockchain: managed Hyperledger Fabric & Ethereum blockchains
  • Glue: Managed ETL (Extract Transform Load) and Data Catalog service
  • Database Migration: DMS
  • Neptune: graph database
  • Timestream: time-series database

» Table of contents (CLF-C02) « #

1. What is Cloud Computing
2. IAM
3. Budget
4. EC2
5. Security Groups
6. Storage
7. AMI
8. Scalability & High Availability
9. Elastic Load Balancing
10. Auto Scaling Group
11. S3
12. Databases
13. Other Compute Services
14. Deployments
15. AWS Global Infrastructure
16. Cloud Integrations
17. Cloud Monitoring
18. VPC
19. Security and Compliance
20. Machine Learning
21. Account Management and Billing
22. Advanced Identity
23. Other Services
24. AWS Architecting & Ecosystem
25. Preparing for AWS Practitioner exam

» Disclaimer « #

Disclaimer: Content for educational purposes only, no rights reserved.

Most of the content in this series comes from Stephane Maarek’s Ultimate AWS Certified Cloud Practitioner CLF-C02 2025 course on Udemy.

I highly encourage you to take Stephane’s courses, as they are awesome and really help with understanding the subject.

More about Stephane Maarek:

This article is just a summary and has been published to help me learn the material and pass the practitioner exam.