# Business Continuity & Disaster Recovery Policy
|Policy owner||Effective date|
The purpose of this business continuity plan is to prepare FlowFuse in the event of service outages caused by factors beyond our control (e.g., natural disasters, man-made events), and to restore services to the widest extent possible in a minimum time frame.
This policy applies to the FlowFuse Cloud production environment, and any individual instances of the platform FlowFuse manages on behalf of customers.
It also applies to any systems used in support of the business-critical operations of the company.
# Continuity of Critical Services
As an all-remote company, our business-critical systems are provided by third-party vendors and we rely on their SLAs in the event of any outage.
# FlowFuse Platform General Disaster Recovery Procedures
In the event of a situation that has a material impact on the FlowFuse Cloud platform, or on any equivalent system managed by FlowFuse on behalf of our customers, the following procedures apply.
# Notification Phase
This phase deals with the initial identification of a situation that impacts a production system. The procedure is as follows:
- An incident is identified. Full details are relayed to the CTO.
- The CTO announces an incident in Slack and directs the engineering/devops teams to gather information and assess the impact and estimated recovery time.
- If the incident will result in a prolonged outage, or the CTO otherwise judges it necessary, the CTO will activate the Recovery phase.
- The CTO will notify the necessary teams of this decision, including the CEO.
# Recovery Phase
The goal is to fully restore the affected system within 24 hours of the outage and to minimise further disruption for the affected users.
The following steps should be taken. The CTO will coordinate these actions with the required teams.
- Notify affected users to begin initial communication
- Assess damage to the environment
- Create a new production environment
- Ensure new environment is properly secured
- Deploy platform code to new environment
- Restore data backup to new environment
- Verify platform deployment
- Verify logging, monitoring and alerting functionality
- Update DNS and other necessary records to point to the new environment
- Notify affected users through established channels
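
The steps above can be tracked against the 24-hour goal with a small checklist helper. The following is a minimal sketch, not part of FlowFuse's tooling: the `RecoveryRun` class, its method names, and the choice to measure the target from the start of the outage are all assumptions made for illustration. Only the step wording and the 24-hour figure come from this policy.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Recovery goal stated in this policy: restore within 24 hours of the outage.
RECOVERY_TARGET = timedelta(hours=24)

# Steps copied from the Recovery Phase list, in the order given.
RECOVERY_STEPS = [
    "Notify affected users to begin initial communication",
    "Assess damage to the environment",
    "Create a new production environment",
    "Ensure new environment is properly secured",
    "Deploy platform code to new environment",
    "Restore data backup to new environment",
    "Verify platform deployment",
    "Verify logging, monitoring and alerting functionality",
    "Update DNS and other necessary records to point to the new environment",
    "Notify affected users through established channels",
]


@dataclass
class RecoveryRun:
    """Hypothetical tracker for one recovery effort (illustrative only)."""

    outage_start: datetime
    completed: dict = field(default_factory=dict)  # step -> completion time

    def complete(self, step: str, at: datetime) -> None:
        # Reject anything that is not one of the policy's listed steps.
        if step not in RECOVERY_STEPS:
            raise ValueError(f"Unknown recovery step: {step!r}")
        self.completed[step] = at

    def remaining(self) -> list:
        # Outstanding steps, kept in the order the policy lists them.
        return [s for s in RECOVERY_STEPS if s not in self.completed]

    def time_remaining(self, now: datetime) -> timedelta:
        # Assumption: the 24-hour clock starts when the outage starts.
        return self.outage_start + RECOVERY_TARGET - now

    def within_target(self, now: datetime) -> bool:
        return self.time_remaining(now) >= timedelta(0)


# Example usage with made-up timestamps.
start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
run = RecoveryRun(outage_start=start)
run.complete(RECOVERY_STEPS[0], start + timedelta(minutes=30))
print(run.remaining()[0])                             # next outstanding step
print(run.time_remaining(start + timedelta(hours=6)))  # time left before the target
```

Keeping the steps as plain data makes it easy for whoever is coordinating the recovery to see what is outstanding and how much of the 24-hour window is left, without prescribing how each step is carried out.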
# Resolution Phase
Once the system has been restored, the CTO will initiate a post-mortem of the event so that lessons from the outage and the subsequent recovery can be reviewed and captured.