Migrating your data warehouse workloads is one of the most challenging yet important undertakings for any organization. Whether the motivation is business growth and scalability requirements or reducing the high license and hardware costs of your current legacy systems, migrating is not as simple as moving files. At Databricks, our Professional Services (PS) team has worked with hundreds of customers and partners on migration projects and has a rich track record of successful migrations. This blog post explores best practices and lessons learned that any data professional should consider when scoping, designing, building, and executing a migration.
Five phases for a successful migration
At Databricks, we have developed a five-phase process for our migration projects based on our experience and expertise.
Before starting any migration project, we begin with the discovery phase. During this phase, we aim to understand the reasons behind the migration and the challenges of the current legacy system. We also highlight the benefits of migrating workloads to the Databricks Data Intelligence Platform. The discovery phase involves collaborative Q&A sessions and architectural discussions with key stakeholders from the customer and Databricks. Additionally, we use an automated discovery profiler to gain insight into the legacy workloads and estimate the consumption costs of the Databricks Platform in order to calculate the TCO reduction.
After completing the discovery phase, we move on to a more in-depth assessment. During this stage, we use automated analyzers to evaluate the complexity of the existing code and obtain a high-level estimate of the effort and cost required. This process provides valuable insight into the architecture of the current data platform and the applications it supports. It also helps us refine the scope of the migration, eliminate outdated tables, pipelines, and jobs, and begin considering the target architecture.
In the migration strategy and design phase, we finalize the details of the target architecture and the detailed design for data migration, ETL, stored procedure code translation, and report and BI modernization. At this stage, we also map the technology between the source and target estates. Once we have finalized the migration strategy, including the target architecture, migration patterns, tooling, and chosen delivery partners, Databricks PS, together with the chosen SI partner, prepares a migration Statement of Work (SOW) for the pilot (Phase I) or for multiple phases of the project. Databricks has several certified Brickbuilder migration SI partners who provide automated tooling to ensure successful migrations. Additionally, Databricks Professional Services can provide migration assurance services alongside an SI partner.
After the Statement of Work (SOW) is signed, Databricks Professional Services (PS) or the chosen delivery partner carries out a production pilot phase. In this phase, a clearly defined end-to-end use case is migrated from the legacy platform to Databricks. The data, code, and reports are modernized on Databricks using automated tools and code converter accelerators. Best practices are documented, and a sprint retrospective captures all the lessons learned to identify areas for improvement. A Databricks onboarding guide is created to serve as the blueprint for the remaining phases, which are typically executed in parallel sprints by agile Scrum teams.
Finally, we progress to the full-fledged migration execution phase. We repeat our pilot execution approach, incorporating all the lessons learned. This helps establish a Databricks Center of Excellence (CoE) within the organization and scale the teams by collaborating with customer teams, certified SI partners, and our Professional Services team to ensure migration expertise and success.
Lessons learned
Think Big, Start Small
During the strategy phase, it is crucial to thoroughly understand your business's data landscape. Equally important is testing a few specific end-to-end use cases during the production pilot phase. No matter how well you plan, some issues may only surface during implementation, and it is better to face them early and find solutions. A great way to choose a pilot use case is to start from the end goal: for example, select a reporting dashboard that is critical for your business, work out the data and processes needed to create it, and then try building the same dashboard on your target platform as a test. This will give you a good idea of what the migration process will involve.
Automate the discovery phase
We begin with questionnaires and interviews with the database administrators to understand the scope of the migration. Additionally, our automated platform profilers scan the data dictionaries of databases and Hadoop system metadata to give us actual data-driven numbers on CPU utilization, the split between ETL and BI usage, and usage patterns across users and service principals. This information is very useful for estimating Databricks costs and the resulting TCO savings. Code complexity analyzers are also valuable, as they give us the number of DDLs, DMLs, stored procedures, and other ETL jobs to be migrated, along with their complexity classification. This helps us determine migration costs and timelines.
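As a rough illustration, the simplest form of a discovery profiler is a script that counts objects in the legacy warehouse's catalog. The sketch below is a minimal example, assuming an ODBC-accessible warehouse that exposes an ANSI information_schema; the DSN and schema names are illustrative, and real profilers also gather query logs and resource utilization.

```python
# Minimal discovery-profiler sketch: count objects in a legacy warehouse's
# catalog to help size the migration. The DSN and the information_schema
# layout are illustrative and vary by vendor.
import pyodbc  # assumes an ODBC driver for the legacy warehouse is installed

conn = pyodbc.connect("DSN=legacy_edw")  # hypothetical data source name
cursor = conn.cursor()

# Count tables and views per schema to scope the data migration.
cursor.execute("""
    SELECT table_schema, table_type, COUNT(*) AS object_count
    FROM information_schema.tables
    GROUP BY table_schema, table_type
""")
for schema, obj_type, count in cursor.fetchall():
    print(f"{schema:30s} {obj_type:15s} {count}")

# Count stored procedures, a rough proxy for code-conversion effort.
cursor.execute("""
    SELECT COUNT(*)
    FROM information_schema.routines
    WHERE routine_type = 'PROCEDURE'
""")
print("stored procedures:", cursor.fetchone()[0])
```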
Leverage Automated Code Converters
Using automated code conversion tools is essential to expedite the migration and minimize costs. These tools help convert legacy code, such as stored procedures or ETL, to Databricks SQL. This ensures that no business rules or functions implemented in the legacy code are overlooked due to a lack of documentation. Additionally, the conversion process often saves developers over 80% of development time, enabling them to promptly review the converted code, make the necessary adjustments, and focus on unit testing. It is crucial to ensure that the automated tooling can convert not only the database code but also the ETL code from legacy GUI-based platforms.
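To make this concrete, the open-source sqlglot library gives a taste of what dialect transpilation looks like; commercial converters go much further, covering procedural code and GUI-based ETL. This is a minimal sketch, with an invented legacy query and table names, and any converted output should still be reviewed and unit tested.

```python
# A small taste of automated SQL dialect conversion with the open-source
# sqlglot library (pip install sqlglot). This only transpiles a single
# query; full migration converters also handle stored procedures and
# GUI-based ETL. The query and table names are illustrative.
import sqlglot

legacy_sql = """
    SELECT TOP 10
           customer_id,
           ISNULL(region, 'UNKNOWN') AS region,
           DATEDIFF(day, order_date, GETDATE()) AS order_age_days
    FROM dbo.orders
"""

# Transpile from T-SQL (SQL Server/Synapse) into the Databricks SQL dialect.
converted = sqlglot.transpile(
    legacy_sql, read="tsql", write="databricks", pretty=True
)[0]
print(converted)
```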
Beyond Code Conversion: Data Matters Too
Migrations often create the misleading impression of a clearly defined project. When we think about migration, we usually focus on converting code from the source engine to the target. However, it is important not to overlook the other details that are critical to making the new platform usable.
For example, it is important to finalize the approach for data migration, just as for code migration and conversion. Data migration can be carried out effectively using Databricks LakeFlow Connect where applicable, or by choosing one of our CDC ingestion partner tools. Initially, during the development phase, it may be necessary to run historical and catch-up loads from the legacy EDW while simultaneously building data ingestion from the actual sources into Databricks. Additionally, it is important to have a well-defined orchestration strategy using Databricks Workflows, Delta Live Tables, or similar tools. Finally, your migrated data platform should align with your software development and CI/CD practices before the migration is considered complete.
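As an illustration of the historical-load step, the sketch below bulk-loads legacy tables that were exported to Parquet in cloud storage into Unity Catalog Delta tables. The paths, catalog, and table names are assumptions made for the example; ongoing catch-up loads would typically use LakeFlow Connect, Auto Loader, or a CDC partner tool instead.

```python
# Minimal sketch of a one-time historical load: legacy EDW tables exported
# to Parquet in cloud storage are written into Unity Catalog Delta tables.
# All paths and names below are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

EXPORT_ROOT = "s3://legacy-edw-export"   # hypothetical export location
TARGET_CATALOG = "main"                  # hypothetical Unity Catalog names
TARGET_SCHEMA = "bronze_legacy"

for table in ["orders", "customers", "line_items"]:
    (spark.read.parquet(f"{EXPORT_ROOT}/{table}/")
        .write
        .format("delta")
        .mode("overwrite")
        .saveAsTable(f"{TARGET_CATALOG}.{TARGET_SCHEMA}.{table}"))
```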
Don't ignore governance and security
Governance and security are other components that are often overlooked when designing and scoping a migration. Regardless of your current governance practices, we recommend using Unity Catalog on Databricks as your single source of truth for centralized access control, auditing, lineage, and data discovery. Migrating to and enabling Unity Catalog increases the effort required for the overall migration, so account for it when scoping. Also, explore the unique capabilities that some of our governance partners provide.
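For a flavor of what centralized access control looks like once Unity Catalog is enabled, the sketch below grants a group read access through Unity Catalog's SQL privilege model. The catalog, schema, table, and group names are illustrative, and it assumes a Databricks notebook or SQL warehouse where `spark` is predefined.

```python
# Minimal sketch of centralized access control in Unity Catalog. The
# catalog, schema, table, and group names are illustrative; `spark` is
# predefined in Databricks notebooks.
grants = [
    "GRANT USE CATALOG ON CATALOG main TO `data_analysts`",
    "GRANT USE SCHEMA ON SCHEMA main.reporting TO `data_analysts`",
    "GRANT SELECT ON TABLE main.reporting.sales_daily TO `data_analysts`",
]
for stmt in grants:
    spark.sql(stmt)
```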
Data Validation and User Testing are essential for a successful migration
It’s essential for the success of the challenge to have correct information validation and energetic participation from enterprise Topic Matter Specialists (SMEs) throughout Person Acceptance Testing section. The Databricks migration crew and our licensed System Integrators (SIs) use parallel testing and information reconciliation instruments to make sure that the information meets all the information high quality requirements with none discrepancies. Robust alignment with executives ensures well timed and centered participation of enterprise SMEs throughout user-acceptance testing, facilitating a fast transition to manufacturing and settlement on decommissioning older techniques and reviews as soon as the brand new system is in place.
Make It Real – operationalize and observe your migration
Implement good operational best practices, such as data quality frameworks, exception handling, reprocessing, and data pipeline observability controls, to capture and report process metrics. This will help identify and report any deviations or delays, allowing for quick corrective action. Databricks features like Lakehouse Monitoring and our system billing tables help with observability and FinOps monitoring.
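As a small example of the FinOps side, the sketch below aggregates recent DBU consumption from the system.billing.usage system table. It assumes system tables are enabled in the workspace, and the exact column names should be verified against your workspace's schema.

```python
# Aggregate the last 30 days of DBU consumption by SKU from Databricks
# system tables. Assumes a Databricks notebook (where `spark` and
# `display` are predefined) and that system tables are enabled; verify
# the column names (sku_name, usage_date, usage_quantity) in your
# workspace.
usage_by_sku = spark.sql("""
    SELECT sku_name,
           usage_date,
           SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_date >= date_sub(current_date(), 30)
    GROUP BY sku_name, usage_date
    ORDER BY usage_date, dbus DESC
""")
display(usage_by_sku)
```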
Trust the experts
Migrations can be challenging. There will always be tradeoffs to balance and unexpected issues and delays to manage. You need proven partners and solutions for the people, process, and technology aspects of the migration. We recommend trusting the experts at Databricks Professional Services and our certified migration partners, who have extensive experience delivering high-quality migration solutions in a timely manner. Reach out to get your migration assessment started.