Documentation Index
Fetch the complete documentation index at: https://private-7c7dfe99-port-beta-badge-galaxy-extend.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Data resiliency
This page covers the disaster recovery recommendations for ClickHouse Cloud, and guidance for customers to recover from an outage. ClickHouse Cloud doesn’t currently support automatic failover, or automatic syncing across multiple geographical regions.Definitions
It is helpful to cover some definitions first. RPO (Recovery Point Objective): The maximum acceptable data loss measured in time following a disruptive event. Example: An RPO of 30 mins means that in the event of a failure the DB should be restorable to data no older than 30 mins. This, of course, depends on how frequently backups are taken. RTO (Recovery Time Objective): The maximum allowable downtime before normal operations must resume following an outage. Example: An RTO of 30 mins means that in the event of a failure, the team is able to restore data and applications and get normal operations going within 30 mins. Database Backups and Snapshots: Backups provide durable long-term storage with a separate copy of the data. Snapshots don’t create an additional copy of the data, are usually faster, and provide better RPOs.Database backups
Having a backup of your primary service is an effective way to utilize the backup and restore from it in the event of primary service downtime. ClickHouse Cloud supports the following capabilities for backups.- Default backups
- External backups (in the customer’s own storage bucket)
This feature isn’t currently available in PCI/ HIPAA services
- Configurable backups
Restoring from a Backup
- Default backups, in the ClickHouse Cloud bucket, can be restored to a new service in the same region.
- External backups (in customer object storage) can be restored to a new service in the same or different region.
Backup and restore duration guidance
Backup and restore durations depend on several factors such as the size of the database as well as the schema and the number of tables in the database. In our testing, we have seen smaller backups ~1 TB take up to 10-15 minutes or longer to back up. Backups less than 20 TB usually complete within an hour, and backing up ~50 TB of data should take 2-3 hours. Backups get economies of scale at larger sizes, and we have seen backups of up to 1 PB for some internal services complete in under 10 hours. We recommend testing with your own database or sample data to get better estimates as the actual duration depends on several factors as outlined above. Restore durations are similar to back up durations for similar size. As mentioned above, we recommend testing with your own database to have an idea of how long it will take to restore the backup.There is currently NO support for automatic failover between 2 ClickHouse Cloud instances whether in the same or different region.
There is currently NO automatic syncing of data between different ClickHouse Cloud services in the same or different regions .i.e. Active-Active replication
Recovery process
This section explains the various recovery options and the process that can be followed in each case.Primary service data corruption
In this case, the data can be restored from the backup to another service in the same region. The backup could be up to 24 hours old if using the default backup policy, or up to 6 hours old (if using configurable backups with 6 hours frequency).Restoration steps
To restore from an existing backup- Go to the “Backups” section of the ClickHouse Cloud console.
- Click on the three dots under “Actions” for the specific backup that you want to restore from.
- Give the new service a specific name and restore from this backup
Primary region downtime
You can export backups to your own cloud provider bucket. If you’re concerned about regional failures, we recommend exporting backups to a different region. Keep in mind that cross-region data transfer charges will apply. If the primary region goes down, the backup in another region can be restored to a new service in a different region. Once the backup has been restored to another service, you will need to ensure that any DNS, load balancer, or connection string configurations are updated to point to the new service. This may involve:- Updating environment variables or secrets
- Restarting application services to establish new connections
Backup / restore to an external bucket is currently not supported for services utilizing Transparent Data Encryption (TDE).
Additional options
There are some additional options to consider.- Dual-writing to separate clusters
- Utilize CSP replication