Observability is the ability to understand the internal state and behavior of a system by analyzing its outputs, without requiring knowledge of its internal workings. In the context of DevOps, this means having a holistic view of your applications and infrastructure, including their health, performance, and any potential issues.
# Tools we use
We use Prometheus to monitor the health of our applications and infrastructure. It collects metrics from several sources, including our applications, Kubernetes, and AWS. In particular, we monitor:
- Application Metrics: Prometheus collects metrics from our applications, including HTTP requests, database queries, and background jobs.
- Kubernetes Metrics: Prometheus collects metrics from Kubernetes, including CPU and memory usage, pod status, and network traffic.
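To make the application metrics above more concrete, here is a minimal sketch of the plain-text exposition format that Prometheus scrapes from a `/metrics` endpoint. The metric name, help text, and labels are hypothetical, and `render_counter` is an illustrative helper rather than part of any library; a real service would normally rely on an official Prometheus client library instead of formatting this by hand.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter metric in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs. This helper is a
    hypothetical illustration, not an API of Prometheus itself.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            # Sort labels so the output is stable between scrapes.
            label_str = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric: total HTTP requests, split by method and status.
text = render_counter(
    "http_requests_total",
    "Total HTTP requests handled.",
    [({"method": "GET", "status": "200"}, 42)],
)
print(text, end="")
```

Each sample line carries the metric name, an optional label set, and the current value; Prometheus attaches the timestamp itself when it scrapes the endpoint.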
Loki is a log aggregation system designed to work seamlessly with Prometheus. We use Loki to collect, store, and query logs from our applications and infrastructure. It complements Prometheus by providing a way to analyze logs alongside metrics.
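As a rough illustration of how logs reach Loki, the sketch below builds the JSON body accepted by Loki's HTTP push endpoint (`/loki/api/v1/push`): a list of streams, each identified by a label set and carrying `[timestamp in nanoseconds, log line]` pairs. The label names and log lines are hypothetical, and `build_loki_push_payload` is an illustrative helper; in practice a log shipping agent usually does this on the application's behalf.

```python
import json
import time


def build_loki_push_payload(labels, lines, now_ns=None):
    """Build a JSON-serialisable body for Loki's push API.

    `labels` identifies the stream (hypothetical names in the example below).
    Each log line gets a nanosecond timestamp, encoded as a string.
    """
    if now_ns is None:
        now_ns = time.time_ns()
    # Offset each line by one nanosecond so entries keep their order.
    values = [[str(now_ns + offset), line] for offset, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


# Hypothetical labels and log lines, with a fixed timestamp for readability.
payload = build_loki_push_payload(
    {"app": "example-service", "env": "staging"},
    ["request started", "request finished"],
    now_ns=1_700_000_000_000_000_000,
)
print(json.dumps(payload, indent=2))
```

Because streams are keyed by labels, reusing the same label names in Prometheus metrics and Loki logs is what makes it practical to view the two side by side.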
Grafana is a popular open-source platform for creating, sharing, and managing dashboards. It complements Prometheus and Loki by providing a unified interface for visualizing and analyzing observability data. Key features include:
- Data Source Integration: Grafana supports various data sources, including Prometheus and Loki, making it an ideal choice for aggregating and visualizing metrics and logs in one place.
- Customizable Dashboards: Grafana offers extensive customization options, enabling you to build tailored dashboards that provide the insights you need.
- Alerting: You can set up alerting rules in Grafana to proactively monitor your systems based on your metric and log data.
# AWS CloudWatch
AWS CloudWatch is a monitoring and observability service that provides data and actionable insights for AWS, hybrid, and on-premises applications and infrastructure resources. We use it to monitor our infrastructure-related resources in AWS.
# Uptime Robot
Uptime Robot monitors our public-facing sites, including FlowFuse Cloud. It polls each endpoint at regular intervals and raises an alarm if an error is detected. Alerts are sent to the #ops-uptime-alerts channel in Slack and emailed to the CTO.
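The polling behaviour described here can be sketched in a few lines. This is not how Uptime Robot is implemented; it is a simplified illustration of a single polling pass, assuming that an HTTP status of 400 or above, or no response at all, counts as an error. The URLs and the `fetch_status` and `check_endpoints` helpers are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for `url`, or None if there was no response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx or 5xx).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def check_endpoints(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> Dict[str, str]:
    """Poll each URL once and return a mapping of failing URL to reason."""
    failures = {}
    for url in urls:
        status = fetch(url)
        if status is None:
            failures[url] = "no response"
        elif status >= 400:
            failures[url] = f"HTTP {status}"
    return failures


# Example with a stubbed fetcher, so nothing is requested over the network.
fake_statuses = {
    "https://a.example": 200,
    "https://b.example": 503,
    "https://c.example": None,
}
print(check_endpoints(fake_statuses, fetch=fake_statuses.get))
```

A monitoring service would run a pass like this on a schedule and forward any non-empty result to its alert channels.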