How to monitor your service

In GDS, we follow the Service Manual guidance on how to monitor the status of services and set performance metrics.

We recommend using Pingdom to monitor your service’s availability from outside our network. To check further that your service is working, you should:

run regular smoke tests using a browser automation app such as Selenium
implement a tool to ensure user journeys are working as you expect
monitor applications for errors using an error tracking app such as Sentry
implement configuration management to set up repeatable monitoring
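
As an illustration of the kind of check a smoke test performs, the following is a minimal sketch in Python using only the standard library. It is not a Selenium test and does not drive a browser: it only confirms that a URL answers with an expected HTTP status. The function name `check_endpoint` and its parameters are hypothetical, chosen for this example.

```python
import urllib.error
import urllib.request


def check_endpoint(url: str, expected_status: int = 200, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` answers with `expected_status`.

    Hypothetical helper for illustration; a real smoke test would usually
    also exercise page content and user journeys, not just the status code.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == expected_status
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx and 5xx responses, so compare its code.
        return error.code == expected_status
    except OSError:
        # Covers URLError, refused connections, DNS failures and timeouts.
        return False
```

A scheduled job could call this against a handful of important pages and raise an alert when any call returns `False`.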

Using metrics-based monitoring

Collecting metrics on the performance of your service is useful for capacity planning and autoscaling. You should use metrics-based monitoring to measure aggregated numerical data about your service, and create Grafana dashboards to view metrics from your data source, for example metrics about your infrastructure or application.

We recommend using Prometheus to gather metrics and monitor your service, as many GDS teams already use it. Amazon provides a Managed Prometheus service, which you may wish to use instead of hosting Prometheus yourself. For simpler use cases that do not need the complexity of Prometheus, you could consider using CloudWatch Metrics instead.
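
To show what a Prometheus metric looks like when it is scraped, the following is a minimal sketch that renders a counter in the Prometheus text exposition format using only the Python standard library. The class `SketchCounter` and the metric and label names are hypothetical. In practice you would most likely use an official Prometheus client library rather than writing this yourself.

```python
from collections import defaultdict


def _escape(value: str) -> str:
    """Escape backslashes, double quotes and newlines in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class SketchCounter:
    """Hypothetical in-memory counter, keyed by label values (illustration only)."""

    def __init__(self, name: str, help_text: str, label_names: tuple = ()):
        self.name = name
        self.help_text = help_text
        self.label_names = label_names
        self._values = defaultdict(float)

    def inc(self, *label_values: str, amount: float = 1.0) -> None:
        if len(label_values) != len(self.label_names):
            raise ValueError("expected one value per label name")
        if amount < 0:
            raise ValueError("counters can only increase")
        self._values[label_values] += amount

    def render(self) -> str:
        """Return the counter as Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for label_values, value in sorted(self._values.items()):
            pairs = zip(self.label_names, label_values)
            labels = ",".join(f'{key}="{_escape(val)}"' for key, val in pairs)
            # Omit the braces entirely when the metric has no labels.
            suffix = f"{{{labels}}}" if labels else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


requests_total = SketchCounter(
    "http_requests_total", "Total HTTP requests.", ("method", "status")
)
requests_total.inc("GET", "200")
requests_total.inc("GET", "200")
requests_total.inc("POST", "500")
print(requests_total.render())
```

An application would serve this text from an HTTP endpoint for Prometheus to scrape, and a Grafana dashboard could then graph the rate of change of the counter.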

This page was last reviewed on 20 November 2024. It needs to be reviewed again on 20 May 2025 by the page owner #gds-way.