Cloud Monitoring

CloudWatch Metrics #

CloudWatch provides metrics for every service in AWS.

It is possible to create CloudWatch Dashboards that consist of different metrics.

CloudWatch Console: https://console.aws.amazon.com/cloudwatch

Important Metrics #

  • EC2 instances: CPU Utilization, Status Checks, Network (not RAM!)
    • Default metrics every 5 minutes
    • Option for Detailed Monitoring ($!): metrics every 1 minute
  • EBS Volumes: Disk Read / Writes
  • S3 Buckets: Bucket size, Number of Objects, All requests
  • Billing: Total Estimated Charge (only in us-east-1)
  • Service Limits: how much of a service's API limits you have used (e.g. Lambda)
  • Custom metrics: push your own metrics
# See All metrics
CloudWatch > Metrics > All metrics

# Create a custom alarm
CloudWatch > Alarms > All alarms
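
Since RAM is not one of the default EC2 metrics, it is a common candidate for a custom metric. Below is a minimal sketch of what pushing one could look like with boto3's `put_metric_data`. The namespace `Custom/EC2`, the metric name `MemoryUtilization` and the instance ID are made up for illustration, and the actual API call is wrapped in a function that is not invoked here because it needs boto3 and AWS credentials.

```python
from datetime import datetime, timezone


def build_memory_metric(instance_id: str, used_percent: float) -> dict:
    """Build keyword arguments for CloudWatch's PutMetricData API."""
    if not 0.0 <= used_percent <= 100.0:
        raise ValueError("used_percent must be between 0 and 100")
    return {
        # Hypothetical namespace; custom namespaces must not start with "AWS/".
        "Namespace": "Custom/EC2",
        "MetricData": [
            {
                "MetricName": "MemoryUtilization",  # hypothetical metric name
                "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                "Timestamp": datetime.now(timezone.utc),
                "Value": used_percent,
                "Unit": "Percent",
            }
        ],
    }


def publish(params: dict) -> None:
    """Send the metric. Requires boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    boto3.client("cloudwatch").put_metric_data(**params)


params = build_memory_metric("i-0123456789abcdef0", 72.5)
print(params["Namespace"], params["MetricData"][0]["MetricName"])
```

Once published, the metric would show up under its custom namespace in CloudWatch > Metrics > All metrics.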

CloudWatch Alarms #

  • Alarms are used to trigger notifications for any metric
  • Alarms actions
    • Auto Scaling - increase or decrease EC2 instances “desired” count
    • EC2 Actions - stop, terminate, reboot or recover an EC2 instance
    • SNS notifications - send a notification into an SNS topic
  • Various statistics to alarm on (sample count, %, max, min, etc…)
  • Can choose the period on which to evaluate an alarm
  • Example: create a billing alarm on the CloudWatch Billing metric
  • Alarm States: OK, INSUFFICIENT_DATA, ALARM

Billing alarms are only available in us-east-1
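
A minimal sketch of the billing alarm example, assuming boto3 and an existing SNS topic. The alarm name, the threshold and the topic ARN are placeholders. The sketch only builds the parameters for `put_metric_alarm`; the call itself sits in a function that is not invoked here.

```python
def build_billing_alarm(threshold_usd: float, sns_topic_arn: str) -> dict:
    """Build keyword arguments for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": f"billing-over-{threshold_usd:g}-usd",  # hypothetical name
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        # Billing data is only published a few times a day, so a long
        # period (6 hours here, an assumption) is a reasonable choice.
        "Period": 21600,
        "EvaluationPeriods": 1,
        "Threshold": threshold_usd,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify this SNS topic on ALARM
    }


def create_alarm(params: dict) -> None:
    """Create the alarm. Requires boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    # Billing metrics live in us-east-1, so the client is pinned to that region.
    boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)


alarm = build_billing_alarm(50.0, "arn:aws:sns:us-east-1:123456789012:billing-alerts")
print(alarm["AlarmName"])
```

The alarm starts in INSUFFICIENT_DATA until the first data point arrives, then moves between OK and ALARM depending on the threshold.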

CloudWatch Logs #

CloudWatch Logs can collect logs from:

  • Elastic Beanstalk - collection of logs from application
  • ECS - collection of logs from containers
  • AWS Lambda - collection of logs from functions
  • CloudTrail based on filter
  • CloudWatch log agents - on EC2 machines or on-premises servers
  • Route53 - DNS logs

CloudWatch Logs for EC2 / on-premises #

By default, no logs go from an EC2 instance to CloudWatch. The CloudWatch agent must be installed and configured on the instance to push the log files that are needed.

The CloudWatch agent can also be installed on on-premises servers.

The instance (or server) needs IAM permissions that allow it to write to CloudWatch Logs.
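
To make "configured" a bit more concrete, here is a rough sketch of the two pieces involved: the `logs` section of a CloudWatch agent configuration file and an IAM policy document that allows writing log events. The file path and log group name are hypothetical, and the policy is a simplified assumption (with a wide-open `Resource`), not a recommended production policy.

```python
import json


def build_agent_config(file_path: str, log_group: str) -> dict:
    """Build the 'logs' section of a CloudWatch agent configuration file."""
    return {
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": file_path,
                            "log_group_name": log_group,
                            # The agent replaces this placeholder with the instance ID.
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        }
    }


# Simplified IAM policy (an assumption for illustration) that lets the
# instance role create log groups/streams and write log events.
LOG_WRITE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:DescribeLogStreams",
            ],
            "Resource": "*",
        }
    ],
}

# Hypothetical log file and log group name.
config = build_agent_config("/var/log/myapp/app.log", "myapp-logs")
print(json.dumps(config, indent=2))
```

The printed JSON is what would go into the agent's configuration file, while the policy would be attached to the IAM role of the EC2 instance.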

Amazon EventBridge #

  • Schedule - cron jobs (scheduled scripts)
  • Event Pattern - event rules that react to a service doing something
  • Trigger Lambda functions, send SQS / SNS messages

  • Default Event Bus - for AWS Services
  • Partner Event Bus - for external partners (e.g. Zendesk) sending events to your AWS account
  • Custom Event Bus - for events from your own applications

EventBridge > Create rule
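
The two rule types can be sketched as follows: a schedule expression and an event pattern that reacts to EC2 instances being terminated. The rule names are made up, and the `matches` helper is a deliberately simplified stand-in for EventBridge's matching logic (exact values only), included just to show how a pattern relates to an incoming event.

```python
import json

# Schedule rule: run every 5 minutes.
SCHEDULE_EXPRESSION = "rate(5 minutes)"

# Event pattern rule: react when an EC2 instance is terminated.
EC2_TERMINATED_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["terminated"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must exist in the event and the
    event's value must be one of the listed values (nested dicts recurse)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def create_rules() -> None:
    """Create both rules. Requires boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    events = boto3.client("events")
    # Hypothetical rule names.
    events.put_rule(Name="every-five-minutes", ScheduleExpression=SCHEDULE_EXPRESSION)
    events.put_rule(Name="ec2-terminated", EventPattern=json.dumps(EC2_TERMINATED_PATTERN))


sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "terminated"},
}
print(matches(EC2_TERMINATED_PATTERN, sample_event))
```

A rule on its own does nothing until a target (Lambda function, SQS queue, SNS topic) is attached to it.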

AWS CloudTrail #

AWS CloudTrail provides governance, compliance and auditing for your AWS account and it is ENABLED BY DEFAULT. #

It records a history of events / API calls made within your AWS account through:

  • Console
  • SDK
  • CLI
  • AWS Services

Logs from CloudTrail can be stored in CloudWatch Logs or S3. A trail can be applied to all Regions (default) or to a single Region.

If a resource is deleted in AWS, investigate CloudTrail first! #

# Check Events History
CloudTrail > Event history
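
The same investigation can be sketched in code, assuming boto3's `lookup_events`. The event name `DeleteBucket` is just one example of a deletion call, and the sample records below are invented to show the shape of the result. The real API call is not invoked here.

```python
from datetime import datetime, timedelta, timezone


def build_lookup(event_name: str, days_back: int = 7, now=None) -> dict:
    """Build keyword arguments for CloudTrail's LookupEvents API."""
    now = now or datetime.now(timezone.utc)
    return {
        "LookupAttributes": [
            {"AttributeKey": "EventName", "AttributeValue": event_name}
        ],
        "StartTime": now - timedelta(days=days_back),
        "EndTime": now,
        "MaxResults": 50,
    }


def who_did_it(events: list) -> list:
    """Reduce event records to (time, user, action) tuples."""
    return [
        (e["EventTime"], e.get("Username", "unknown"), e["EventName"])
        for e in events
    ]


def lookup(params: dict) -> list:
    """Run the lookup. Requires boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    return boto3.client("cloudtrail").lookup_events(**params)["Events"]


# Invented sample records, shaped like entries of the "Events" list.
sample = [
    {
        "EventName": "DeleteBucket",
        "Username": "alice",
        "EventTime": datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc),
    }
]
for when, user, action in who_did_it(sample):
    print(when.isoformat(), user, action)
```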

AWS X-Ray #

Visual analysis of our applications. Common view of entire architecture. #
  • Troubleshooting performance
  • Understanding dependencies in a microservices architecture
  • Pinpoint service issues
  • Review request behavior
  • Find Errors and Exceptions
  • Check whether time SLAs are being met
  • Identify impacted users
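
To give a feel for what a trace contains, here is a small sketch that works on a hand-written list of segments. The service names and timings are invented; only the field names (`name`, `start_time`, `end_time`, `error`, `fault`) follow the X-Ray segment document format. It shows how per-service timings and error flags can point to the slow or failing part of a request.

```python
def durations_ms(segments: list) -> dict:
    """Map each service name to the time it spent on the request, in milliseconds."""
    return {
        s["name"]: round((s["end_time"] - s["start_time"]) * 1000)
        for s in segments
    }


def failing_services(segments: list) -> list:
    """Names of services whose segment is flagged as an error or a fault."""
    return sorted({s["name"] for s in segments if s.get("error") or s.get("fault")})


# Invented trace: one request passing through three hypothetical services.
trace = [
    {"name": "api-gateway", "start_time": 100.00, "end_time": 100.45},
    {"name": "orders-service", "start_time": 100.05, "end_time": 100.40, "error": True},
    {"name": "orders-db", "start_time": 100.10, "end_time": 100.35},
]

print(durations_ms(trace))
print(failing_services(trace))
```

In the X-Ray console the same information is presented visually as a service map and a timeline per trace.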

Amazon CodeGuru #

ML-powered service for automated code reviews and application performance recommendations. #

Provides 2 functionalities:

  • CodeGuru Reviewer - automated code reviews for static code analysis (development)
    • Identify critical issues, security vulnerabilities and hard to find bugs
    • Common coding best practices, resource leaks, security detection, input validation
    • Uses Machine Learning and automated reasoning
    • Hard-learned lessons across millions of code reviews on 1000s of open-source repositories
    • Supports Java and Python
    • Integrates with GitHub, Bitbucket and AWS CodeCommit
  • CodeGuru Profiler - visibility / recommendations about application performance during runtime (production)
    • Helps understand the runtime behavior of an application
    • Identify if application is consuming excessive CPU, etc…
    • Features:
      • Identify and remove code inefficiencies
      • Improve application performance (e.g. reduce CPU utilization)
      • Decrease compute costs
      • Provide heap summary (identify which objects are using up memory)
      • Anomaly detection
    • Supports applications running on AWS as well as on-premises
    • Minimal overhead on the application

AWS Health Dashboard #

Shows the health of all services in all regions. General information, not specific to you. #

Shows historical information for each day.

Has an RSS feed that can be subscribed to.

AWS Health Dashboard - Your Account #

AWS Account Health Dashboard provides alerts and remediation guidance when AWS is experiencing events that may impact you.

While the Service Health Dashboard displays the general status of AWS services, Account Health Dashboard gives you a personalized view into the performance and availability of the AWS services underlying your AWS resources.

It can aggregate data from an entire AWS Organization.

Summary #

  • CloudWatch
    • Metrics - monitor the performance of AWS services and billing metrics
    • Alarms - automate notifications, perform EC2 actions, notify SNS based on a metric
    • Logs - collects log files from EC2 instances, servers, Lambda functions
  • Events (EventBridge) - react to events in AWS or trigger a rule on a schedule
  • CloudTrail - audit API calls made within your AWS account
    • CloudTrail Insights - automated analysis of your CloudTrail Events
  • X-Ray - trace requests made through your distributed applications (analyze flow)
  • AWS Health Dashboard - general status of ALL AWS services across all regions
  • AWS Account Health Dashboard - AWS events that impact your infrastructure
  • Amazon CodeGuru - automated code reviews and application performance recommendations

» Sources « #

CloudWatch / CloudWatch Logs / CloudWatch Events (EventBridge):

» Disclaimer « #

Disclaimer: Content for educational purposes only, no rights reserved.

Most of the content in this series comes from Stephane Maarek’s Ultimate AWS Certified Cloud Practitioner CLF-C02 2025 course on Udemy.

I highly encourage you to take Stephane’s courses, as they are awesome and really help with understanding the subject.

More about Stephane Maarek:

This article is just a summary and has been published to help me learn for and pass the practitioner exam.