When it comes time for moving a database to the cloud, or to another cloud provider, there are several approaches to consider. Over the past ten years there has been an immense amount of innovation that continues to challenge our existing approaches, while guiding us into alternative methods, of managing information. Yet, storing data for applications is still required. While there are more options now than ever, everyone’s good old friend, the relational database management system (RDBMS), is still a powerhouse in this arena and MySQL is an undisputed heavyweight. The major cloud providers all offer a managed version of MySQL and chances are good that there is a MySQL database somewhere in your stack and if you are like many of Caserta’s clients, those databases play an important role in your business.
Downtime, complexity, level of effort, and the rollback strategy should all influence the approach to moving a database, whether from on-prem to the cloud or from one cloud provider to another. The choice of using a managed database will also inform and guide this decision. Managed databases such as Azure Database for MySQL, Amazon RDS, Amazon RDS Aurora, and Google Cloud SQL deliver ease of use as well as scalability, performance, and convenience.
Instead of having an internal team provisioning cloud hardware, installing software, configuring the database and operating system, adding monitoring, managing patching and adding hardware capacity to handle growth, the cloud provider handles all of these tasks in an automated and convenient way. Cloud providers will offer a version of the database that has been optimized for operation in that cloud. Using managed databases is highly recommended in most situations and there are only a few cases where using them may not be advantageous.
There are three traditional ways of migrating MySQL data: Backup and Restore, Export/Import, and Replication.
Backup and Restore
One of the oldest solutions is to safely make a copy of the files that constitute the database and then copy those files to another location where the database software can access them, allowing a new database instance to be started and used. In a disaster recovery solution, the backup files are brought in to restore the database to its previous state, hence the name “backup and restore.” In a managed cloud environment, providers usually do not allow access to the underlying files due to the nature of the service. Part of the management approach is to allow controlled access to the underlying system along with strict controls governing configuration of database parameters and options. Therefore, in a managed solution, the traditional “Backup and Restore” approach is not going to be an available option. Even if access is available to the required files on-prem, it is unlikely that a facility will exist to allow them to be loaded into a managed cloud instance.
This is the recommended approach if downtime can be scheduled. Data that is stored in the database files is written out via the database engine using a format that can be read into another database. In MySQL the mysqldump program is used to perform this extract and the data is typically written as text-based insert statements. Most managed database solutions provide a managed import where the extract file can be read from storage and then applied to the database. Most managed database offerings also offer administrators the ability to apply this extract file against the managed database instance directly using the mysql command line interface. When working with certain managed database solutions, it may not be possible to put the database into a read only mode, therefore requiring the extract to be conducted without user interaction. A simple approach would be to shutdown or block application server access to the database. Another option would be to take an extract from a copy of the database that is not being used in the application stack. However, this will mean that the copy that the extract is based on and the actual production database users use, will drift and be out of sync. The recommended approach is to schedule a maintenance window or an outage for the time it takes to backup and restore the database.
Although a bit more complex than the other options described above, replication provides a great deal of flexibility and may be what is needed if export/import does not fit the bill. It is normally set up by taking a copy of a source database (called the master) and, after starting up that copy (called the replica), connecting the replica to the master to read and apply changes (inserts, updates, deletes) made on the master. These changes will eventually be replicated to the replicas and even though the term ‘eventually’ is used, this is normally measured in milliseconds. It is possible to leverage multiple replicas at a time. One reason for using replication in this fashion is to provide horizontal scalability where changes can be made against the master while read queries can be spread across all replicas and the master allowing for better response time, or to dedicate a replica for certain analytics activities while maintaining performance benchmarks. Another great use of replication is to prepare a database to be moved, setting up replication in a target environment and letting replication sync up with the master. At that time or some time in the future that node can be provisioned as the new master. Most managed MySQL instances currently use version 5.x and therefore binary logging (with or without GTID) is used. As replication is a bit more complex, what is supported in the managed solution will dictate migration options. For example, at the time of writing, AWS does not support GTID with RDS Aurora or any RDS MySQL version other than 5.7, while Google Cloud SQL insists on it. In this case, it is not possible to establish direct replication between managed MySQL servers between these two providers unless AWS RDS MySQL 5.7 is in use. It should be noted that RDS Aurora is generally preferred on AWS for performance reasons. It is also important to note that the cloud providers may use a proprietary mechanism for providing replicas. Such is the case with AWS Aurora which leverages volume management. While this uses a different replication method as outlined in the MySQL documentation the end result of having a performant replica is the same.
With so many options available such as using managed databases or not, the type of data transfer strategy employed, the frequency selected (near real time, daily, monthly, etc.) multiplied by the number of different configurations (internal or external to the cloud, external on-prem or external cloud), there are numerous ways to configure a straightforward or more sophisticated hybrid approach for moving data where the cloud is involved.
Real World Use Case
On a recent project, a leading manufacturer had specific operations goals around a data migration from one cloud provider to another. Caserta was engaged to migrate managed databases from AWS to GCP factoring in the most important goal: minimum downtime. Another important goal was to stand up and run the solution stack in GCP in parallel with the stack in AWS with full functionality until such time as the AWS stack was no longer in operation. One of the proposals Caserta brought forward was to use replication between AWS and GCP, but since direct replication between AWS RDS and GCP Cloud SQL is not possible, a MySQL instance installed on Google Compute Engine (GCE) was introduced to interface with RDS. With the self-managed MySQL on GCE instance in place, multi-master replication (i.e. bidirectional replication) between AWS and GCP was achieved. Since AWS RDS does not support GTID, the self-managed MySQL instance on GCE in GCP was not configured to use GTID-based replication. This configuration allowed for full functionality between the two stacks. Once the AWS solution was shutdown during a small maintenance window, GTID was set up on the self-managed MySQL server and configured as a master node with a managed Cloud SQL replica, enabling the movement of the data to GCP Cloud SQL. At a later maintenance window, the GCP Cloud SQL instance was promoted to master, thus completing the entire journey in only a few steps with minimal downtime during regular maintenance windows.