Why Monitor
Monitoring is important for maintaining the reliability and stability of our instances. When we can collect and compare monitoring data, it is easier to debug a multi-point failure and to tune performance.
Things to Consider Before Setting Up EC2 Monitoring
First of all, we need to be clear about our monitoring objectives. That is, what do we want to achieve with our monitoring?
Once the goals are clear, we can consider:
- What resources do we need to monitor?
- How long should we monitor the resources?
- How often do we need to track our resources?
To achieve the goals above, we also need to decide:
- What is the most suitable monitoring tool?
- Who is going to monitor the resources?
- What should be done when something goes wrong?
Automated Monitor Setup
When an EC2 instance is launched, AWS enables basic monitoring by default, which publishes metric data in 5-minute periods. Detailed monitoring, which has to be enabled explicitly, publishes data in 1-minute periods.
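The two periods matter most when reading metrics back, because the query period should match the monitoring mode. The following is a minimal sketch that only assembles the arguments for CloudWatch's `get_metric_statistics` call; the helper names `build_cpu_query` and `expected_datapoints` and the instance ID are hypothetical, and the actual boto3 call is left as a comment because it needs credentials.

```python
from datetime import datetime, timedelta, timezone

# Period lengths in seconds for the two EC2 monitoring modes.
BASIC_PERIOD = 300     # basic monitoring: 5-minute data
DETAILED_PERIOD = 60   # detailed monitoring: 1-minute data


def build_cpu_query(instance_id, detailed=False, hours=1, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics.

    Hypothetical helper: it only assembles the request, it does not call AWS.
    """
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": DETAILED_PERIOD if detailed else BASIC_PERIOD,
        "Statistics": ["Average"],
    }


def expected_datapoints(query):
    """Number of datapoints the query window can hold at its period."""
    window = (query["EndTime"] - query["StartTime"]).total_seconds()
    return int(window // query["Period"])


if __name__ == "__main__":
    q = build_cpu_query("i-0123456789abcdef0")  # placeholder instance ID
    print(q["Period"], expected_datapoints(q))
    # With boto3 installed and credentials configured, the query could be sent as:
    #   boto3.client("cloudwatch").get_metric_statistics(**q)
```

A one-hour window holds at most 12 datapoints under basic monitoring and 60 under detailed monitoring, which is worth keeping in mind when deciding how often the resources need to be tracked.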
Status Check Metrics
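The system status check monitors the AWS systems that the instance runs on, such as the underlying hardware. If this check fails, the problem is on the AWS side, so AWS needs to be involved in the solution.
The instance status check monitors the software and network configuration of the individual instance. If this check fails, the fix is on our side. To debug the problem, first confirm that we can communicate with the instance, for example by reviewing the IAM role settings, security groups, and VPC settings.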
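The split of responsibility between the two checks can be sketched as a small classifier. This is an illustration only: it assumes entries shaped like the `InstanceStatuses` items returned by EC2's `describe_instance_status`, the helper name `classify_status` and its return labels are hypothetical, and the sample entries are hand-written rather than real output.

```python
def classify_status(entry):
    """Decide who needs to act for one InstanceStatuses entry.

    Returns one of the hypothetical labels "aws", "us", "healthy", "pending".
    """
    system = entry["SystemStatus"]["Status"]
    instance = entry["InstanceStatus"]["Status"]
    if system == "impaired":
        # Underlying host problem: AWS has to be involved.
        return "aws"
    if instance == "impaired":
        # Software or network configuration problem inside our instance.
        return "us"
    if system == "ok" and instance == "ok":
        return "healthy"
    # Any other value, such as "initializing" or "insufficient-data".
    return "pending"


if __name__ == "__main__":
    # Hand-written sample entries, not real API output.
    samples = [
        {"InstanceId": "i-aaa", "SystemStatus": {"Status": "ok"},
         "InstanceStatus": {"Status": "ok"}},
        {"InstanceId": "i-bbb", "SystemStatus": {"Status": "impaired"},
         "InstanceStatus": {"Status": "ok"}},
        {"InstanceId": "i-ccc", "SystemStatus": {"Status": "ok"},
         "InstanceStatus": {"Status": "impaired"}},
    ]
    for entry in samples:
        print(entry["InstanceId"], classify_status(entry))
    # With boto3 and credentials, real entries could come from:
    #   boto3.client("ec2").describe_instance_status()["InstanceStatuses"]
```

Checking the system status first is a design choice: when the host itself is impaired, instance-level symptoms are usually a consequence rather than a separate problem.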
Instance Level Metrics
CPU Utilization identifies the processing power required to run the application on the instance. For T (burstable) instances, the instance earns CPU credits while it is idle or running tasks that need less than the baseline CPU performance. Those credits are spent when the instance needs to do more work than the baseline allows.
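The credit mechanism can be made concrete with a little arithmetic. One CPU credit corresponds to one vCPU running at 100% for one minute. The sketch below uses hypothetical function names, and the figures in the example (1 vCPU earning 6 credits per hour) are illustrative assumptions; the real earn rate depends on the instance type and should be taken from its documentation.

```python
def credits_spent_per_hour(vcpus, avg_utilization_percent):
    """Credits consumed in one hour at a steady average utilization.

    One credit = one vCPU at 100% for one minute, so an hour at
    utilization u on n vCPUs costs n * u * 60 credits.
    """
    return vcpus * (avg_utilization_percent / 100.0) * 60


def net_credits_per_hour(earn_rate, vcpus, avg_utilization_percent):
    """Positive result: the balance grows. Negative: the balance drains."""
    return earn_rate - credits_spent_per_hour(vcpus, avg_utilization_percent)


if __name__ == "__main__":
    # Assumed figures for illustration: 1 vCPU, 6 credits earned per hour.
    for utilization in (5, 10, 50):
        net = net_credits_per_hour(6, 1, utilization)
        print(f"{utilization}% -> net {net:+.1f} credits/hour")
```

With these assumed figures the break-even point is 10% utilization, which is the baseline: below it the balance grows, above it the balance is drawn down.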
Network Utilization identifies the volume of incoming and outgoing traffic.
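Disk performance metrics report the read and write operations and the bytes read from and written to the instance's disks, which indicates how fast the disks are serving the application.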
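Network and disk metrics such as `NetworkIn` report a total for each period, so a rate has to be derived by dividing the period's Sum by the period length. A minimal sketch of that conversion follows; the helper names and the sample value are hypothetical.

```python
def per_second(period_sum, period_seconds=300):
    """Convert a per-period Sum (bytes or operations) into a per-second rate.

    The default of 300 seconds matches basic (5-minute) monitoring;
    pass 60 when detailed monitoring is enabled.
    """
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return period_sum / period_seconds


def to_megabits_per_second(bytes_per_sec):
    """Convert bytes per second into megabits per second (10**6 bits)."""
    return bytes_per_sec * 8 / 1_000_000


if __name__ == "__main__":
    network_in_sum = 150_000_000  # hypothetical Sum of NetworkIn over 5 minutes
    rate = per_second(network_in_sum)
    print(rate, to_megabits_per_second(rate))
```

The same division applies to operation counts, so a Sum of disk read operations over a period yields read operations per second.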
Custom System-level Metrics
We can use the CloudWatch monitoring scripts written in Perl, or install the CloudWatch agent, to collect the system-level metrics below. The scripts are the older approach and AWS now recommends the agent.
- Memory Utilization
- Disk Swap Utilization: swap space is a portion of a disk that is used for virtual memory. When RAM is full, the operating system moves data from RAM to swap space, so high swap usage indicates that the instance is running low on physical memory. "Page File Utilization" and "Disk Swap Utilization" are interchangeable; "page file" is the term used more often on Windows systems.
- Disk Space Utilization: tracks the usage of the instance store volumes and EBS volumes attached to the instance.
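For the CloudWatch agent route, these three metrics map onto the `mem`, `swap`, and `disk` sections of the agent's JSON configuration file. The sketch below generates such a configuration for a Linux instance; the helper name `build_agent_config`, the 60-second interval, and the `/` mount point are assumptions for illustration, and a real setup would normally contain more settings than shown here.

```python
import json


def build_agent_config(interval_seconds=60, disk_paths=("/",)):
    """Build a minimal CloudWatch agent metrics configuration (Linux names)."""
    return {
        "metrics": {
            "metrics_collected": {
                # Memory Utilization
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
                # Disk Swap Utilization
                "swap": {
                    "measurement": ["swap_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
                # Disk Space Utilization for the listed mount points
                "disk": {
                    "measurement": ["used_percent"],
                    "resources": list(disk_paths),
                    "metrics_collection_interval": interval_seconds,
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON so it can be reviewed before being given to the agent.
    print(json.dumps(build_agent_config(), indent=2))
```

Generating the file from code rather than editing it by hand keeps the collection interval and mount points consistent when the same configuration is rolled out to several instances.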
Log Collection