A backup of a data source is useful only if data can be restored from it. If backups aren’t tested, you might find that an event has impacted your workload and you need to recover data, but the backups are faulty and restoring data either is not possible or takes longer than your Recovery Time Objective (RTO). To avoid such situations, always test backups to ensure they can be used to recover data.
In this lab, you will use AWS Lambda to automatically test every backup that is created to ensure recovery is successful, and to clean up any resources created during the restore test to save on cost. This makes you aware of any faulty backups that cannot be used to recover data. Automating this process with notifications enabled keeps operational overhead minimal and keeps the Operations teams informed of backup and restore statuses.
When testing recovery, it is important to define the criteria for successful data recovery from the restored resource. These criteria depend on a variety of factors such as the data source, the type of data, and the margin for error. Organizations and workload owners are responsible for defining these success criteria.
The EC2 Instance that was created as part of this lab is running a simple web application. For this use case, data recovery is considered successful if the application is also running on the restored resource. If the restored resource is missing any application-critical files, the health checks made against the restored resource will fail, indicating an issue with the backup.
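The success criterion described above can be sketched as a small HTTP health check. This is a minimal illustration, not the lab's actual Lambda code: the function name `is_restore_healthy`, the choice of HTTP 200 as the passing status, and the injectable `opener` parameter are all assumptions made for the example.

```python
import urllib.request


def is_restore_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if an HTTP GET to `url` answers with status 200.

    Hypothetical helper: treats any connection error, timeout, or
    non-200 response as a failed health check.
    """
    try:
        # `opener` defaults to urllib's urlopen; it is a parameter only so
        # the check can be exercised without a real restored instance.
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError (raised for 4xx/5xx) both derive from OSError.
        return False
```

A caller would pass the address of the restored instance, for example `is_restore_healthy("http://<restored instance address>/")`, and treat a `False` result as a sign that the backup may be missing application-critical files.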
For the purpose of this lab, we will simulate the action performed by AWS Backup when creating backups of data sources by creating an on-demand backup and checking that it succeeds. Once the backup is completed, you will receive a notification stating that the backup job has completed, and the Lambda function will be invoked. The Lambda function will make API calls to start restoring data from the backup that was created, which helps confirm that the backup is usable. Once the restore process has completed, you will receive another notification confirming this, and the Lambda function will be invoked again to clean up the new resources that were created as part of the restore. Once the cleanup has completed, you will receive one last notification confirming it.
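The notification-driven flow described above (backup completes, restore starts; restore completes, cleanup starts) can be sketched as a small dispatcher. This is an illustration only, not the lab's Lambda function. It assumes the function is invoked by SNS and that the notification carries `EventType` and `State` message attributes with values such as `BACKUP_JOB`, `RESTORE_JOB`, and `COMPLETED`; the helper names and the returned action strings are hypothetical.

```python
def attribute(sns_record, name):
    """Read one SNS message attribute value, or None if it is absent."""
    return sns_record.get("MessageAttributes", {}).get(name, {}).get("Value")


def next_action(event):
    """Decide what the function should do for one SNS-delivered notification.

    Assumed attribute names and values (EventType, State, BACKUP_JOB,
    RESTORE_JOB, COMPLETED) should be checked against a real notification.
    """
    sns = event["Records"][0]["Sns"]
    event_type = attribute(sns, "EventType")
    state = attribute(sns, "State")

    if state != "COMPLETED":
        return "ignore"          # only completed jobs drive the workflow
    if event_type == "BACKUP_JOB":
        return "start_restore"   # backup finished: test it by restoring
    if event_type == "RESTORE_JOB":
        return "cleanup"         # restore finished: remove restored resources
    return "ignore"
```

In a real handler, `start_restore` would lead to a restore API call against the recovery point and `cleanup` would delete the restored resources; both are left out here to keep the sketch independent of any SDK.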
Access AWS Management Console
In the RESOURCE TYPE section, select EC2 and paste the Instance ID from the Outputs of the CloudFormation stack.
Return to AWS Management Console interface
In the CloudWatch interface
/aws/lambda/RestoreTestFunction-<YOUR CLOUDFORMATION STACK NAME>
In the Log groups interface