
AWS monitor heartbeat

SkyperTHC edited this page Jun 22, 2022 · 14 revisions

How to monitor the health of the GSRN - The AWS way (?)

Objective: Continuously use gs-netcat to test all GSRN servers (within and outside of AWS) and notify the admin on failure. This is a fully functional test. For non-functional tests and for general system metrics (such as CPU and memory usage), use NetData instead.

The functional test heartbeat.sh is embedded within a docker image.

  • Use AWS Elastic Container Registry (ECR) to store the Docker image
  • Use AWS Fargate to run the Docker image
  • Use AWS CloudWatch to send a notification if GSRN goes bad

We use AWS region us-east-2.

Create an ECR policy

Select IAM -> Policies

  1. Select Create Policy.
  2. Under Service select Elastic Container Registry.
  3. Select All Elastic Container Registry actions (ecr:*)
  4. Under Resources select Specific, click Add ARN, and for Repository name select Any.
  5. Click Next: Tags and Next: Review.
  6. Under Name specify ECR_FullAccess (or any other name you like).
  7. Click Create policy.
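The same policy can be created from the command line. The sketch below assumes ACCOUNT_ID holds your AWS account ID (the default is a placeholder) and scopes the resource to us-east-2; if you chose Any for the region in the console, use `*` there instead. Note that ecr:GetAuthorizationToken, which docker login needs, cannot be limited to a repository ARN; in this setup it comes from AmazonEC2ContainerRegistryPowerUser, which is attached to the user later.

```shell
#!/usr/bin/env bash
# Sketch: create the ECR_FullAccess policy with the AWS CLI instead of the console.
# ASSUMPTION: ACCOUNT_ID is your AWS account ID; the default is only a placeholder.
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
REGION="us-east-2"

# All ECR actions (ecr:*) on any repository in this account and region,
# mirroring "Specific -> Add ARN -> Repository name: Any" in the console.
ECR_POLICY_DOC=$(cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ecr:*",
      "Resource": "arn:aws:ecr:${REGION}:${ACCOUNT_ID}:repository/*"
    }
  ]
}
EOF
)

# Create the customer managed policy from the JSON document above.
create_ecr_policy() {
  aws iam create-policy \
    --policy-name ECR_FullAccess \
    --policy-document "$ECR_POLICY_DOC"
}
```

Run it with an admin profile (not fargate_user): `ACCOUNT_ID=<your id> create_ecr_policy` after sourcing the snippet.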

Create an IAM User and assign policies

Create a new user under IAM -> Users

  1. Select Add users and name the new user fargate_user.
  2. Select Programmatic access and AWS Management Console Access.
  3. Deselect User must create a new password at next sign-in.
  4. Click Next: Permissions.
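A command-line sketch of the same user setup, assuming you run it with an admin profile. It gives fargate_user both console access (without a forced password reset) and an access key for programmatic access. The password argument is a placeholder you supply; passing it on the command line leaves it in your shell history.

```shell
#!/usr/bin/env bash
# Sketch: create fargate_user with console and programmatic access via the AWS CLI.
IAM_USER="fargate_user"

create_fargate_user() {
  local password="$1"   # console password for the new user (your choice)
  # Create the IAM user itself.
  aws iam create-user --user-name "$IAM_USER" &&
  # AWS Management Console access, without a password change at next sign-in.
  aws iam create-login-profile --user-name "$IAM_USER" \
    --password "$password" --no-password-reset-required &&
  # Programmatic access: prints AccessKeyId and SecretAccessKey (shown only once).
  aws iam create-access-key --user-name "$IAM_USER"
}
```

Usage: `create_fargate_user 'a-strong-password'`, then note down the printed access key pair.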

Attach the policy

  1. Select Attach existing policies directly and select AmazonECS_FullAccess and AmazonEC2ContainerRegistryPowerUser and ECR_FullAccess.
  2. Click Next: Tags and then Next: Review and then Create user.
  3. Note down the Account ID, Access key ID, Secret access key and Password.
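The policy attachment can also be scripted. This sketch assumes ACCOUNT_ID is your account ID (placeholder default) and that the customer managed ECR_FullAccess policy from the first step exists; the two other ARNs are the AWS managed policies named in step 1.

```shell
#!/usr/bin/env bash
# Sketch: attach the three policies from step 1 to fargate_user via the AWS CLI.
# ASSUMPTION: ACCOUNT_ID is your AWS account ID; the default is only a placeholder.
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
IAM_USER="fargate_user"

POLICY_ARNS=(
  "arn:aws:iam::aws:policy/AmazonECS_FullAccess"                 # AWS managed
  "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser"  # AWS managed
  "arn:aws:iam::${ACCOUNT_ID}:policy/ECR_FullAccess"             # customer managed
)

# Attach each policy in turn; stop at the first failure.
attach_fargate_policies() {
  local arn
  for arn in "${POLICY_ARNS[@]}"; do
    aws iam attach-user-policy --user-name "$IAM_USER" --policy-arn "$arn" || return 1
  done
}
```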

Deploying a Docker Container to ECS

Create an ECR repository

Select Elastic Container Registry.

  1. Select Create repository and fill in the name of the repository as gsrn_heartbeat and leave everything else default.
  2. Note down the URI (e.g. [account ID].dkr.ecr.us-east-2.amazonaws.com/gsrn_heartbeat).
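The repository can also be created with the AWS CLI. A minimal sketch, keeping all defaults as in the console step and printing only the repository URI:

```shell
#!/usr/bin/env bash
# Sketch: create the gsrn_heartbeat ECR repository and print its URI.
REGION="us-east-2"
REPO_NAME="gsrn_heartbeat"

create_heartbeat_repo() {
  # --query/--output extract just the repository URI from the JSON response.
  aws ecr create-repository \
    --repository-name "$REPO_NAME" \
    --region "$REGION" \
    --query 'repository.repositoryUri' \
    --output text
}
```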

Upload the docker image into the ECR registry

Configure the AWS CLI with the access key of fargate_user:

aws configure

Retrieve an ECR login password and pipe it to docker login to sign in to the ECR registry:

aws ecr get-login-password --region us-east-2 | docker login --username AWS --password-stdin [account ID].dkr.ecr.us-east-2.amazonaws.com

Build the Docker image from the directory that contains its Dockerfile:

docker build -t gsrn_hb .

Tag the Docker image with the repository URI:

docker tag gsrn_hb [account ID].dkr.ecr.us-east-2.amazonaws.com/gsrn_heartbeat

Push the tagged image:

docker push [account ID].dkr.ecr.us-east-2.amazonaws.com/gsrn_heartbeat
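The login, build, tag and push commands above can be wrapped into one reusable function. This sketch assumes ACCOUNT_ID is set (the default is a placeholder) and that it is run from the directory holding the heartbeat Dockerfile.

```shell
#!/usr/bin/env bash
# Sketch: the login/build/tag/push steps above as one function.
# ASSUMPTION: ACCOUNT_ID is your AWS account ID (placeholder default) and the
# current directory contains the Dockerfile for the heartbeat image.
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
REGION="us-east-2"
REGISTRY="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com"
REPO_URI="${REGISTRY}/gsrn_heartbeat"

push_heartbeat_image() {
  # Log Docker in to the registry with a short-lived ECR token.
  aws ecr get-login-password --region "$REGION" \
    | docker login --username AWS --password-stdin "$REGISTRY" || return 1
  docker build -t gsrn_hb . || return 1        # build locally as gsrn_hb
  docker tag gsrn_hb "$REPO_URI" || return 1   # add the ECR repository tag
  docker push "$REPO_URI"                      # upload to ECR
}
```

Rebuilding and redeploying after a change to heartbeat.sh then becomes a single `push_heartbeat_image` call.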

Create a Fargate Cluster

With your AWS user (not fargate_user):

  1. Go to Elastic Container Service (ECS).
  2. Select Create Cluster and Networking only.
  3. Name the cluster fargate-gsrn-heartbeat and leave the rest as is.
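The CLI equivalent is a single call. A plain cluster without EC2 capacity is enough here because tasks are started with the FARGATE launch type.

```shell
#!/usr/bin/env bash
# Sketch: create the (networking only) ECS cluster used for the heartbeat task.
REGION="us-east-2"
CLUSTER="fargate-gsrn-heartbeat"

create_heartbeat_cluster() {
  aws ecs create-cluster --cluster-name "$CLUSTER" --region "$REGION"
}
```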

Create an ECS Task

  1. Select Create new Task Definition and Fargate.
  2. Enter a name for the task definition (GSRNHeartbeat; spaces are not allowed).
  3. Select 0.5 GB for Task memory and 0.25 vCPU for Task CPU.

Under Container definition select Add Container.

  1. We use GSRN-Heartbeat as Container name.
  2. Under Image enter the URI of the Docker image, [account ID].dkr.ecr.us-east-2.amazonaws.com/gsrn_heartbeat.
  3. Select Linux for Operating system family.
  4. Under Command enter gs1.thc.org gs2.thc.org gs3.thc.org gs4.thc.org gs5.thc.org.
  5. Select Add and then Create.
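A sketch of the same task definition registered with the AWS CLI. Several values are assumptions not stated in the console steps: the execution role ecsTaskExecutionRole (which the console normally creates for pulling from ECR and writing logs), and an awslogs configuration pointing at the /ecs/GSRN-heartbeat log group used later for the metric filter; that log group must already exist. 256 CPU units and 512 MiB correspond to 0.25 vCPU and 0.5 GB.

```shell
#!/usr/bin/env bash
# Sketch: register the GSRNHeartbeat Fargate task definition via the AWS CLI.
# ASSUMPTIONS: ACCOUNT_ID placeholder default, role ecsTaskExecutionRole exists,
# and log group /ecs/GSRN-heartbeat exists (awslogs does not create it here).
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
REGION="us-east-2"
IMAGE="${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/gsrn_heartbeat"
EXEC_ROLE_ARN="arn:aws:iam::${ACCOUNT_ID}:role/ecsTaskExecutionRole"

# Container GSRN-Heartbeat; the GSRN servers to test are passed as the command.
CONTAINER_DEFS=$(cat <<EOF
[
  {
    "name": "GSRN-Heartbeat",
    "image": "${IMAGE}",
    "essential": true,
    "command": ["gs1.thc.org", "gs2.thc.org", "gs3.thc.org", "gs4.thc.org", "gs5.thc.org"],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/GSRN-heartbeat",
        "awslogs-region": "${REGION}",
        "awslogs-stream-prefix": "ecs"
      }
    }
  }
]
EOF
)

register_heartbeat_task() {
  # 256 CPU units = 0.25 vCPU, 512 MiB = 0.5 GB (a valid Fargate combination).
  aws ecs register-task-definition \
    --family GSRNHeartbeat \
    --requires-compatibilities FARGATE \
    --network-mode awsvpc \
    --cpu 256 \
    --memory 512 \
    --runtime-platform operatingSystemFamily=LINUX \
    --execution-role-arn "$EXEC_ROLE_ARN" \
    --container-definitions "$CONTAINER_DEFS" \
    --region "$REGION"
}
```

Each call registers a new revision (GSRNHeartbeat:1, :2, ...), so reference the revision you want when running the task.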

Run the Task

Select the GSRNHeartbeat:1 task definition and under Actions select Run Task.

  1. Set Launch Type to FARGATE
  2. Select any Cluster VPC and any Subnets.
  3. Select Run Task
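The task can also be started from the CLI. In this sketch the subnet ID is a placeholder and the VPC's default security group is used because none is given. assignPublicIp=ENABLED is an assumption that lets the task reach ECR and the GSRN servers without a NAT gateway, which is the trade-off listed in the TODO at the end of this page.

```shell
#!/usr/bin/env bash
# Sketch: run one GSRNHeartbeat task on Fargate.
# ASSUMPTION: SUBNET_ID is a subnet in your VPC; the default is a placeholder.
REGION="us-east-2"
CLUSTER="fargate-gsrn-heartbeat"
SUBNET_ID="${SUBNET_ID:-subnet-0123456789abcdef0}"
# Public IP so the task can pull from ECR and reach GSRN without a NAT gateway.
NETWORK_CONFIG="awsvpcConfiguration={subnets=[${SUBNET_ID}],assignPublicIp=ENABLED}"

run_heartbeat_task() {
  aws ecs run-task \
    --cluster "$CLUSTER" \
    --launch-type FARGATE \
    --task-definition GSRNHeartbeat:1 \
    --network-configuration "$NETWORK_CONFIG" \
    --region "$REGION"
}
```

run-task starts a single task that is not restarted if it exits; an ECS service (aws ecs create-service) would keep one copy running instead.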

Get Notifications

The steps are:

  1. Create a Topic under AWS Simple Notification Service (SNS) and add an email address to it.
  2. Create a metric filter that matches a string in the log and records every match as a metric.
  3. Create an alarm that fires when the metric falls below a threshold.

Create an AWS SNS topic

Go to Simple Notification Service -> Create Topic

  1. Use Standard
  2. Set the Name to GSRN-Heartbeat
  3. Click Create topic at the bottom right.
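From the CLI, create-topic creates a standard (non-FIFO) topic unless FIFO attributes are given. A minimal sketch that prints the new topic's ARN:

```shell
#!/usr/bin/env bash
# Sketch: create the standard SNS topic GSRN-Heartbeat and print its ARN.
REGION="us-east-2"
TOPIC_NAME="GSRN-Heartbeat"

create_heartbeat_topic() {
  aws sns create-topic --name "$TOPIC_NAME" --region "$REGION" \
    --query TopicArn --output text
}
```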

Create a subscription

  1. Set Protocol to Email
  2. Under Endpoint enter the email address. I use root@thc.org.
  3. Select Create subscription at the bottom right, then confirm the subscription via the link in the email that SNS sends.
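The email subscription as a CLI sketch. The topic ARN is built from the assumed ACCOUNT_ID (placeholder default) and the topic name above; replace ALERT_EMAIL with your own address. The subscription stays pending until the link in the confirmation email is clicked.

```shell
#!/usr/bin/env bash
# Sketch: subscribe an email address to the GSRN-Heartbeat topic.
# ASSUMPTION: ACCOUNT_ID is your AWS account ID; the default is only a placeholder.
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
REGION="us-east-2"
TOPIC_ARN="arn:aws:sns:${REGION}:${ACCOUNT_ID}:GSRN-Heartbeat"
ALERT_EMAIL="${ALERT_EMAIL:-root@thc.org}"   # replace with your own address

subscribe_alert_email() {
  aws sns subscribe \
    --topic-arn "$TOPIC_ARN" \
    --protocol email \
    --notification-endpoint "$ALERT_EMAIL" \
    --region "$REGION"
}
```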

Notification if OK stops

It is not directly possible to match a pattern in a log file and raise an alarm (e.g. send an email). Instead, a metric filter matches the pattern and records each match in a metric. An alarm is then triggered based on that metric.

The heartbeat container writes a log entry containing "OK_COUNT=" every 60 seconds (unless GSRN is failing). The metric filter records one data point each time this pattern appears (one every 60 seconds).
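The original heartbeat.sh is not shown on this page. To illustrate the log contract the metric filter relies on, here is a hypothetical loop that prints OK_COUNT=<n> once per 60-second round, and only when every server passes. The per-server check (check_gsrn_server, a plain TCP connect to an assumed port 443) is a simplified stand-in and not the gs-netcat functional test the real script performs.

```shell
#!/usr/bin/env bash
# Hypothetical heartbeat loop illustrating the OK_COUNT= log contract.
# NOT the original heartbeat.sh. ASSUMPTION: a TCP connect to an assumed
# relay port stands in for the real gs-netcat functional test.
GSRN_PORT="${GSRN_PORT:-443}"   # assumed port; adjust to your GSRN setup
INTERVAL="${INTERVAL:-60}"      # seconds between rounds

# Hypothetical check: succeeds if a TCP connection to host:port opens within 5 s.
check_gsrn_server() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$GSRN_PORT" 2>/dev/null
}

# One round: test every server; print OK_COUNT=<n+1> only if all of them passed.
heartbeat_round() {
  local ok_count="$1"; shift
  local host
  for host in "$@"; do
    if ! check_gsrn_server "$host"; then
      echo "FAIL $host"
      return 1
    fi
  done
  echo "OK_COUNT=$((ok_count + 1))"
}

# Loop forever; the hosts come from the container's command line.
heartbeat_loop() {
  local count=0
  while true; do
    if heartbeat_round "$count" "$@"; then
      count=$((count + 1))
    fi
    sleep "$INTERVAL"
  done
}
```

Usage: `heartbeat_loop gs1.thc.org gs2.thc.org gs3.thc.org gs4.thc.org gs5.thc.org`. Because a failing round prints no OK_COUNT= line, the metric receives no match for that minute, which is what the alarm detects.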

Go to CloudWatch -> Logs -> Log groups

  1. Select the log group /ecs/GSRN-heartbeat
  2. Click Action -> Create Metric Filter
  3. Under Filter Pattern enter OK_COUNT= and click Next.
  4. Under Filter Name write OK-COUNT-FILTER

Under Metric Details set

  1. Metric namespace to GSRN Heartbeat
  2. Metric name to OK-COUNT
  3. Metric value to 1
  4. Default value to 0
  5. Click on Next and then Create metric filter at the bottom right.
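The same metric filter via the CLI, as a sketch. In CloudWatch Logs filter syntax, a term containing characters other than letters, digits and underscores (here "=") is written inside double quotes, hence '"OK_COUNT="'. The metric transformation mirrors the values above: 1 per matching event, default 0.

```shell
#!/usr/bin/env bash
# Sketch: create the OK-COUNT-FILTER metric filter on the heartbeat log group.
REGION="us-east-2"
LOG_GROUP="/ecs/GSRN-heartbeat"
# Namespace "GSRN Heartbeat", metric OK-COUNT, value 1 per match, default 0.
METRIC_TRANSFORMATIONS='[{"metricName":"OK-COUNT","metricNamespace":"GSRN Heartbeat","metricValue":"1","defaultValue":0}]'

create_ok_count_filter() {
  # The term is quoted because it contains "=".
  aws logs put-metric-filter \
    --log-group-name "$LOG_GROUP" \
    --filter-name OK-COUNT-FILTER \
    --filter-pattern '"OK_COUNT="' \
    --metric-transformations "$METRIC_TRANSFORMATIONS" \
    --region "$REGION"
}
```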

Go to CloudWatch -> Alarms -> All alarms -> Create alarm -> Select metric

  1. Under Custom namespace click on GSRN Heartbeat.
  2. Click on Metrics with no dimensions.
  3. Select OK-COUNT
  4. Click on Select metric at the bottom right.
  5. Under Conditions select Lower and under than... write 1.
  6. Under Additional configuration -> Missing data treatment select Treat missing data as bad (breaching threshold).
  7. Select Next at the bottom right.
  8. Under Send a notification to... select GSRN-Heartbeat (the SNS topic created above).
  9. Click Next at the bottom right.

Under Add name and description:

  1. Set Alarm Name to FAILED GSRN HEARTBEAT.
  2. Set Alarm description to GSRN Server may be down. Check CloudWatch log.
  3. Click Next at the bottom right and then Create alarm.
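A CLI sketch of the same alarm. The steps above do not state the statistic or period, so Sum over a 5-minute window is an assumption here: Sum stays at or above 1 as long as at least one OK_COUNT= line arrived in the window, and missing data is treated as breaching, matching step 6. ACCOUNT_ID is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: alarm when fewer than 1 OK-COUNT data point arrives per period.
# ASSUMPTIONS: ACCOUNT_ID placeholder default; Sum statistic; 300 s period.
ACCOUNT_ID="${ACCOUNT_ID:-123456789012}"
REGION="us-east-2"
TOPIC_ARN="arn:aws:sns:${REGION}:${ACCOUNT_ID}:GSRN-Heartbeat"
ALARM_PERIOD=300   # seconds per evaluation window (assumed)

create_heartbeat_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "FAILED GSRN HEARTBEAT" \
    --alarm-description "GSRN Server may be down. Check CloudWatch log." \
    --namespace "GSRN Heartbeat" \
    --metric-name OK-COUNT \
    --statistic Sum \
    --period "$ALARM_PERIOD" \
    --evaluation-periods 1 \
    --threshold 1 \
    --comparison-operator LessThanThreshold \
    --treat-missing-data breaching \
    --alarm-actions "$TOPIC_ARN" \
    --region "$REGION"
}
```

A shorter period reports failures sooner but is more sensitive to a single delayed log line.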

TODO:

  • How to allow task to have NAT access to Internet without assigning a public IP and without creating my own NAT GW?
  • How to include the log entry that triggered the metrics alarm?

Helpful links: