Cloud and Intercloud Migration: Assessment and Failure Modes
With the financial industry in general and the data world specifically becoming ever more remote and mobile, an increasing number of firms are seeing the benefits of hosting their data capabilities in the cloud.
Cloud-hosted data makes it easier for information to flow through your organization and helps break down silos and remove inefficiencies. This is especially true in the post-COVID world, where an increasing number of employees are working from home or other remote locations instead of in a traditional office setting.
However, migrating to a new cloud server is not always a straightforward process and requires as much care and strategizing as any other business process. This is true whether you are migrating from physical infrastructure to the cloud, or from one cloud provider/server to another.
Assessment
Like every solid corporate strategy, cloud migration must begin with a comprehensive assessment of current capabilities and a clear vision of what you want to achieve with the process.
You need to ensure your inventory is up to date: which workloads and data sources you have, how fresh and dependable those data sources are, and what gaps might exist. You also need to assess how much downtime each workload can tolerate during the process and weigh the benefit of zero downtime against the added migration complexity. This is a trade-off: the less downtime you are willing to accept, the more complex the migration becomes, so you can’t have your cake and eat it when it comes to these factors.
"Assess which workloads can afford a downtime, and the maximum length of time that those downtimes can be,” advises Google Cloud. "Migrating workloads while experiencing zero or nearly zero downtimes is harder than migrating workloads that can afford downtimes. To complete a zero-downtime migration, you need to design for and implement redundancy for each workload to migrate. You also need to coordinate these redundant instances.”
If a workload supports clustering and redundancy, you can deploy multiple instances of that workload, even across different environments, such as the source environment and the target environment. Assessing which workloads support clustering and redundancy will therefore help your migration run far more smoothly. Likewise, you need to assess how your workloads are configured and how that configuration might differ post-migration.
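As a rough illustration of this assessment, the sketch below records each workload's downtime tolerance and redundancy support and suggests a migration approach from them. The field names, the example workloads and the three approach labels are all assumptions made for illustration; they do not come from Google Cloud or any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    # Hypothetical inventory attributes, chosen for this sketch only.
    name: str
    max_downtime_minutes: int   # 0 means no downtime can be tolerated
    supports_redundancy: bool   # can run clustered across source and target


def migration_approach(w: Workload) -> str:
    """Suggest an approach from downtime tolerance and redundancy support."""
    if w.max_downtime_minutes > 0:
        # Some downtime is acceptable, so a simpler cutover is possible.
        return "scheduled cutover"
    if w.supports_redundancy:
        # No downtime allowed, but redundant instances can bridge the move.
        return "zero-downtime with redundant instances"
    # No downtime allowed and no redundancy: the hardest case.
    return "redesign or negotiate a downtime window"


# Example inventory; names and values are invented.
inventory = [
    Workload("reporting-batch", max_downtime_minutes=240, supports_redundancy=False),
    Workload("pricing-api", max_downtime_minutes=0, supports_redundancy=True),
    Workload("legacy-ledger", max_downtime_minutes=0, supports_redundancy=False),
]

for w in inventory:
    print(f"{w.name}: {migration_approach(w)}")
```

A real inventory would carry many more attributes, such as dependencies and configuration sources, but even a table this small makes the hard cases visible early: workloads that can neither tolerate downtime nor run redundantly.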
"Consider how you roll out updates to the configuration of each workload that you want to migrate,” continues Google Cloud. "This consideration is critical for the success of your migration because you might have to update the configuration of your workloads while you migrate them to the target environment.”
Failure Modes
Migrating to the cloud doesn’t always run smoothly and you need to be prepared for things to go awry. Adequately anticipating these bumps in the road can help ensure you don’t expose your workloads to conditions from which they may not be able to recover. For example:
- What happens if a workload loses connectivity to the network?
- Is a workload able to resume its work from where it left off after being stopped?
- What happens if the performance of a workload or its dependencies is inadequate?
- What happens if there are two workloads that have the same identifier in the architecture?
- What happens if a scheduled task doesn't run?
- What happens if two workloads process the same request?
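Two of these questions, duplicate identifiers and duplicate request processing, lend themselves to a small sketch. The code below is a minimal illustration under stated assumptions: `duplicate_identifiers` and `IdempotentProcessor` are hypothetical names, the in-memory dictionary stands in for whatever shared store the real workloads would use, and the `upper()` call is a placeholder for real work.

```python
from collections import Counter


def duplicate_identifiers(workload_ids):
    """Return the identifiers that appear more than once, sorted."""
    counts = Counter(workload_ids)
    return sorted(i for i, n in counts.items() if n > 1)


class IdempotentProcessor:
    """Does the work for each request ID at most once, even if delivered twice."""

    def __init__(self):
        # request_id -> result. An in-memory stand-in (assumption): workloads
        # running in separate environments would need a shared store instead.
        self._results = {}
        self.executions = 0

    def handle(self, request_id, payload):
        if request_id in self._results:
            # Duplicate delivery: return the stored result, do no new work.
            return self._results[request_id]
        self.executions += 1
        result = payload.upper()  # placeholder for the real processing
        self._results[request_id] = result
        return result


# Source and target environments both registered a workload called "etl-1".
print(duplicate_identifiers(["etl-1", "api-1", "etl-1", "api-2"]))

processor = IdempotentProcessor()
processor.handle("req-42", "settle trade")
processor.handle("req-42", "settle trade")  # same request arrives again
print(processor.executions)
```

Keying on a request identifier is one common way to make duplicate processing harmless; whether it suits a given workload depends on whether its requests carry a stable identifier.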
Gather information about the expected modes of failure and their potential effects, and assess which can be recovered from and which would represent an irreversible fail state. By preparing for failure, you reduce both the chances of it occurring and the damage it causes when it does. This can be achieved by running simulations of these occurrences and testing methods of countering them in a consequence-free environment.
"After you assess those failure modes and their effects, validate your findings in a non-critical environment by simulating failures and injecting faults that emulate those failure modes,” continues Google. "For example, if a workload is designed to automatically recover after a network connectivity loss, validate the automatic recovery by forcibly interrupting its connectivity and restoring it afterwards.”
Of course, it’s impossible to prepare for all outcomes and eventualities, but by adding these considerations into your planning stage, you can make sure you are as prepared as humanly possible for failure modes and remain agile and effective should they occur.
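The connectivity example from Google's guidance can be sketched as a small fault-injection test. Everything here is a simplified assumption: `FakeNetwork` is a test double rather than a real network, and `QueueingWorkload` is a hypothetical workload that holds messages while the connection is down and retries them once it returns.

```python
class FakeNetwork:
    """Test double whose connectivity can be cut and restored on demand."""

    def __init__(self):
        self.connected = True

    def send(self, message):
        if not self.connected:
            raise ConnectionError("network unreachable")
        return f"ack:{message}"


class QueueingWorkload:
    """Hypothetical workload that queues messages while the network is down."""

    def __init__(self, network):
        self.network = network
        self.pending = []
        self.delivered = []

    def submit(self, message):
        self.pending.append(message)
        self.flush()

    def flush(self):
        # Send pending messages in order; stop at the first failure so
        # nothing is lost or reordered.
        while self.pending:
            try:
                self.delivered.append(self.network.send(self.pending[0]))
            except ConnectionError:
                return
            self.pending.pop(0)


def run_connectivity_fault_test():
    network = FakeNetwork()
    workload = QueueingWorkload(network)
    workload.submit("m1")              # delivered normally
    network.connected = False          # inject the fault
    workload.submit("m2")              # should be queued, not lost
    during_outage = list(workload.pending)
    network.connected = True           # restore connectivity
    workload.flush()                   # recovery step
    return during_outage, workload.delivered, workload.pending


during_outage, delivered, pending = run_connectivity_fault_test()
assert during_outage == ["m2"]
assert delivered == ["ack:m1", "ack:m2"]
assert pending == []
```

In a real non-critical environment the fault would be injected at the infrastructure level rather than with a flag, but the shape of the check is the same: cut the connection, confirm nothing is lost, restore it, and confirm the workload catches up.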
Final Thoughts
Initial assessment and failure mode planning are two of the best practices you can implement when embarking on any cloud or intercloud migration strategy. Business continuity should be your keenest focus. By covering these bases as thoroughly as possible, you give yourself the best chance of a smooth transition in the first place and of responding effectively should things not go to plan.