How to monitor your service
In GDS, we follow the Service Manual guidance on how to monitor the status of services and set performance metrics.
We recommend using Pingdom to monitor your service’s availability from outside our network. To check further that your service is working, you should:
- run regular smoke tests using a browser automation app such as Selenium
- use a tool to check that user journeys work as you expect
- monitor applications for errors using an error tracking app such as Sentry
- implement configuration management to set up repeatable monitoring
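As a rough illustration of the smoke test idea above, the following sketch checks that a page responds with HTTP 200 and contains some expected text. It uses only the Python standard library rather than Selenium, so it does not drive a real browser or run JavaScript. The function names, the URL and the expected text are hypothetical and only there for illustration.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return (status_code, body_text) for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses raise HTTPError; report them as a failed check.
        # Connection failures (URLError) are not caught here and will propagate.
        return error.code, ""


def smoke_check(url, expected_text, fetcher=fetch):
    """Return a list of problems found; an empty list means the check passed."""
    status, body = fetcher(url)
    problems = []
    if status != 200:
        problems.append(f"{url} returned HTTP {status}")
    if expected_text not in body:
        problems.append(f"{url} did not contain {expected_text!r}")
    return problems
```

You could call `smoke_check("https://service.example/start", "Start now")` on a schedule and alert when the returned list is not empty. The `fetcher` parameter is there so the check can be exercised without network access.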
Using metrics-based monitoring
Collecting metrics on the performance of your service is useful for capacity planning and autoscaling. Use metrics-based monitoring to measure aggregated numerical data about your service, and create Grafana dashboards to view metrics from your data source, for example metrics about your infrastructure or application.
We recommend using Prometheus to gather metrics and monitor your service, as many GDS teams already use it. Amazon provides a Managed Prometheus service, which you may wish to use instead of hosting Prometheus yourself. For simpler use cases that do not need the complexity of Prometheus, consider using CloudWatch Metrics instead.
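Prometheus gathers metrics by scraping an HTTP endpoint (conventionally `/metrics`) that returns plain text in its exposition format. The sketch below shows roughly what that text looks like for a single counter. It is an illustration of the format only: a real service would normally use an official Prometheus client library, and the `render_counter` function, the metric name and the labels are hypothetical.

```python
def render_counter(name, help_text, samples):
    """Render one counter in the Prometheus text exposition format.

    samples maps a tuple of (label, value) pairs to the counter's current value.
    Assumption: label values contain no quotes, backslashes or newlines,
    so this sketch does no escaping.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_text = ",".join(f'{key}="{val}"' for key, val in labels)
        # Metrics without labels are written without the braces
        suffix = f"{{{label_text}}}" if label_text else ""
        lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical request counts, keyed by HTTP method and status code
requests_total = {
    (("method", "GET"), ("status", "200")): 42,
    (("method", "POST"), ("status", "500")): 3,
}

metrics_text = render_counter(
    "http_requests_total", "Total HTTP requests handled.", requests_total
)
```

Serving `metrics_text` from an endpoint would let Prometheus scrape these values, and a Grafana dashboard could then graph them over time.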