What we would need, in order of priority:
- Metric collection with a dashboard of graphs
  - Prometheus + Grafana?
  - Example metrics:
    - Number of requests received
    - Number of responses received
    - Number of libp2p messages sent/received
- Fix Helm deployment
  - We need to add a dependency so the nodes deploy after the Rendezvous Server
- Automated deployments after merges to
- Error reporting
  - Reports to a Discord channel
  - Which service do we use?
- Durable, searchable logs
  - CloudWatch Logs? Elasticsearch? Papertrail?
- End-to-end liveness test for Medusa (running every 5 minutes)
  - We can run a script in GitHub Actions that checks the most recent request and triggers an error alert if no response has been received for 5 minutes.
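
For the metrics item, here is a minimal sketch of what a Prometheus scrape would see, written against the Python standard library only. It keeps counters for the example metrics listed above and renders them in the Prometheus text exposition format at `/metrics`. The metric names, the `Counters`/`MetricsHandler`/`serve` helpers and the port are hypothetical. In practice the official Prometheus client library for whatever language the nodes are written in would normally replace this.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical metric names covering the example metrics; all are counters.
METRICS = {
    "medusa_requests_received_total": "Number of requests received.",
    "medusa_responses_received_total": "Number of responses received.",
    "medusa_libp2p_messages_sent_total": "Number of libp2p messages sent.",
    "medusa_libp2p_messages_received_total": "Number of libp2p messages received.",
}


class Counters:
    """Thread-safe counters that only ever increase (Prometheus 'counter' semantics)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {name: 0 for name in METRICS}

    def inc(self, name, amount=1):
        if amount < 0:
            raise ValueError("a counter can only increase")
        with self._lock:
            self._values[name] += amount

    def render(self):
        """Return every counter in the Prometheus text exposition format."""
        with self._lock:
            snapshot = dict(self._values)
        lines = []
        for name, help_text in METRICS.items():
            lines.append(f"# HELP {name} {help_text}")
            lines.append(f"# TYPE {name} counter")
            lines.append(f"{name} {snapshot[name]}")
        return "\n".join(lines) + "\n"


COUNTERS = Counters()


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves GET /metrics for a Prometheus scrape job."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = COUNTERS.render().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever serving /metrics; the port is an arbitrary placeholder."""
    ThreadingHTTPServer(("", port), MetricsHandler).serve_forever()
```

A node would call something like `COUNTERS.inc("medusa_requests_received_total")` wherever a request arrives. A Grafana panel could then plot a per-second rate with a PromQL query such as `rate(medusa_requests_received_total[5m])`.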
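
For the Helm ordering problem, note that Helm creates a release's resources without waiting for one Deployment to become ready before creating the next. A common pattern is therefore an init container in the node pods that blocks until the Rendezvous Server accepts connections. Below is a minimal sketch of such a wait script. It assumes the Rendezvous Server is reachable over TCP and that the chart passes its address in the hypothetical `RENDEZVOUS_HOST` / `RENDEZVOUS_PORT` environment variables. A readiness probe on the Rendezvous Server itself would make "accepts connections" a closer proxy for "ready".

```python
import os
import socket
import time


def wait_for_tcp(host, port, timeout_s=300.0, interval_s=2.0):
    """Return True once host:port accepts a TCP connection, False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused, DNS name not resolvable yet, etc.: retry until deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


def main():
    """Entry point for the init container; a non-zero exit makes Kubernetes retry it."""
    # Hypothetical environment variables set on the init container by the chart.
    host = os.environ["RENDEZVOUS_HOST"]
    port = int(os.environ["RENDEZVOUS_PORT"])
    return 0 if wait_for_tcp(host, port) else 1
```

The init container's command would run something like `sys.exit(main())`. The node container only starts after that exits successfully.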
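
Whichever log backend is chosen, writing one JSON object per line to stdout keeps the choice open and makes logs searchable by field. CloudWatch Logs Insights, for example, discovers fields in JSON log events. Below is a minimal sketch using Python's standard `logging` module. The field names and the `extra={"fields": {...}}` convention are hypothetical.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record):
        entry = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Hypothetical convention: searchable fields are passed via extra={"fields": {...}}.
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            entry.update(fields)
        if record.exc_info:
            entry["exc"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def make_logger(name="medusa", stream=sys.stdout):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

For example, `make_logger().info("response sent", extra={"fields": {"request_id": "abc"}})` emits a line whose `request_id` can be filtered on directly.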
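
The liveness check and the Discord error reporting could share one script. Below is a minimal sketch that decides whether the most recent request has gone unanswered for 5 minutes and, if so, posts a message to a Discord channel webhook. A webhook accepts a JSON `POST` with a `content` field. How the script obtains the latest request and response timestamps is left open, so the function names, the message text and the User-Agent string are hypothetical. It could run from a GitHub Actions workflow with an `on: schedule` trigger and the cron expression `*/5 * * * *`. GitHub documents 5 minutes as the shortest schedule interval, and scheduled runs can be delayed under load, so "every 5 minutes" is best-effort.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

MAX_RESPONSE_DELAY = timedelta(minutes=5)


def is_stalled(request_time, response_time, now, max_delay=MAX_RESPONSE_DELAY):
    """True if the most recent request has gone unanswered for at least max_delay.

    Times are timezone-aware datetimes; response_time is None when no response
    to that request has been observed yet.
    """
    if response_time is not None:
        return False
    return now - request_time >= max_delay


def build_discord_alert(webhook_url, message):
    """Build (but do not send) a POST of a plain-text message to a Discord webhook."""
    body = json.dumps({"content": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        # Explicit User-Agent rather than relying on urllib's default one.
        headers={"Content-Type": "application/json", "User-Agent": "medusa-liveness-check"},
        method="POST",
    )


def check_and_alert(request_time, response_time, webhook_url, now=None):
    """Send a Discord alert if Medusa looks stalled; return True if an alert was sent."""
    if now is None:
        now = datetime.now(timezone.utc)
    if not is_stalled(request_time, response_time, now):
        return False
    minutes = (now - request_time).total_seconds() / 60
    message = f"Medusa liveness check failed: latest request unanswered for {minutes:.1f} min"
    with urllib.request.urlopen(build_discord_alert(webhook_url, message), timeout=10) as resp:
        resp.read()
    return True
```

The webhook URL acts as a credential, so it would normally be stored as a repository secret rather than committed.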