Uber Reduces Database Lock Time by 94% with Major MySQL Fleet Upgrade
Disclaimer: The details in this post have been derived from the Uber Engineering Blog. All credit for the technical details goes to the Uber engineering team. The links to the original articles are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

MySQL serves as the backbone for Uber’s vast and complex operations. For many years, Uber relied upon MySQL version 5.7 to support business-critical features. However, in 2023, they decided to upgrade from MySQL version 5.7 to version 8.

In this post, we’ll look at the need for this and the challenges Uber faced in such a large-scale upgrade. We will also investigate the solutions Uber used to achieve the upgrade without violating the Service-Level Objective (SLO).

The Need for the Upgrade

The decision to upgrade Uber's MySQL infrastructure from version 5.7 to 8.0 was driven by several critical factors. First, MySQL 5.7 was reaching its end-of-life, meaning it would no longer receive security updates or bug fixes, leaving Uber's infrastructure vulnerable to potential security risks and operational instability. Upgrading to MySQL 8.0 mitigated these risks by ensuring ongoing support and security improvements.

Additionally, MySQL 8.0 offered significant performance and concurrency enhancements such as:
Beyond performance, MySQL 8.0 introduced several new functionalities such as:
Overall, these performance, security, and operational benefits made the transition to MySQL 8.0 a critical move for Uber's data infrastructure.

The Scale of The Upgrade

Uber’s MySQL infrastructure is vast, operating at a scale that supports its global platform operations. Here are some stats about the overall scale that show the critical role of MySQL in Uber’s services:
Also, to ensure high availability and data redundancy, Uber employs a primary-secondary replication architecture. It works as follows:
Challenges with the Upgrade

Several challenges had to be addressed during the upgrade of Uber’s MySQL fleet from version 5.7 to 8.0. Some of the major ones are as follows:
Uber conducted thorough regression checks and validation tests to ensure all existing systems and applications continued to work seamlessly with the upgraded database. This process included testing in a staging environment before making production upgrades. By validating every aspect of the system, Uber was able to mitigate the risk of any unexpected issues after the upgrade.

Finally, Uber implemented automated rollback mechanisms to safeguard the upgrade process. In the event of any failures or compatibility issues during the upgrade, these mechanisms could automatically revert the changes, ensuring the maintenance of service continuity and data integrity.

For instance, in the pre-maintenance stage, where the new MySQL 8.0 nodes operated as replicas, if performance issues or system degradation were detected, Uber could instantly roll back to MySQL 5.7 without any risk of data loss. The rollback capability was crucial for addressing any latency, resource consumption, or service degradation issues, allowing Uber to revert to a stable state until the issues were resolved.

However, once a MySQL 8.0 node was promoted to the primary status, rolling back to MySQL 5.7 became more complex because replication from a MySQL 8.0 primary back to MySQL 5.7 nodes is not supported. In other words, Uber had to ensure everything was functioning correctly before promoting the new nodes to avoid irreversible complications.

Upgrade Strategy

When upgrading its massive MySQL infrastructure from version 5.7 to 8.0, Uber had two possible strategies to choose from: side-by-side upgrade and in-place upgrade.

In-Place Upgrade

An in-place upgrade involves directly upgrading the existing MySQL installation to the new version (MySQL 8.0) on the same nodes. The process typically requires stopping the MySQL service, upgrading the software, and restarting it. While this method can be simpler in terms of setup, it also comes with significant drawbacks:
Due to these limitations, Uber decided against the in-place upgrade method.

Side-by-Side Upgrade

Uber chose a side-by-side upgrade approach, which allowed for a smoother and less risky transition. In this method, the new MySQL 8.0 nodes were set up and operated alongside the existing MySQL 5.7 nodes. This approach was more suitable for Uber’s infrastructure due to the following reasons:
Scaling the Upgrade Process with Automation

To manage the complexity of upgrading such a large infrastructure, Uber implemented an automated workflow. With more than 2,100 clusters and over 16,000 nodes, upgrading each node manually was impractical. Automation made the process scalable, efficient, and far less prone to human error. Two main aspects of this automation are:
Four-Stage Upgrade Process for MySQL

Uber’s MySQL upgrade from version 5.7 to 8.0 was carefully planned and executed in a four-stage process. This approach ensured minimal service disruption and allowed Uber to transition its massive data infrastructure safely. Let’s break down the four stages in simple terms:

1. Pre-Maintenance Stage

In the pre-maintenance stage, new MySQL 8.0 nodes were added as replicas to the existing MySQL 5.7 clusters. A "node" here is a server running a MySQL instance. By adding these MySQL 8.0 nodes as replicas, they could work alongside the old 5.7 nodes without disrupting any operations. This setup ensured that the old system (MySQL 5.7) continued functioning normally while the new system (MySQL 8.0) was being integrated, allowing Uber to keep everything running smoothly.

2. System Monitoring (Soak Period)

After setting up the MySQL 8.0 nodes, Uber entered the system monitoring stage, also known as the "soak period." This stage lasted for about a week and was crucial for testing the new system under real-world conditions. During this time, Uber monitored the MySQL 8.0 nodes as they handled real production traffic (read operations), checking for issues such as slow performance, errors, or increased resource usage. This period was essential to detect potential problems before making the final switch to MySQL 8.0.

3. Maintenance Stage

Once the soak period confirmed that everything was working smoothly, Uber moved to the maintenance stage. In this phase, the MySQL 8.0 node was promoted to primary status, meaning it now handled all write operations and became the main database for that cluster. This promotion marked the point where MySQL 8.0 officially became the main database, while the MySQL 5.7 nodes were demoted or turned off for write traffic.

4. Post-Maintenance Stage

Finally, in the post-maintenance stage, Uber removed all the old MySQL 5.7 nodes that were no longer needed.
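The four stages, together with the rollback rule described earlier (easy before promotion, hard after it), can be sketched as a small state machine. This is only an illustrative model, not Uber's actual tooling: the `Stage` and `ClusterUpgrade` names, the `healthy` flag, and the cluster name are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    PRE_MAINTENANCE = 1   # 8.0 nodes join the 5.7 cluster as replicas
    SOAK = 2              # 8.0 replicas serve production reads for about a week
    MAINTENANCE = 3       # an 8.0 node is promoted to primary
    POST_MAINTENANCE = 4  # the remaining 5.7 nodes are removed


@dataclass
class ClusterUpgrade:
    """Tracks one cluster through the four stages (hypothetical model)."""
    name: str
    stage: Stage = Stage.PRE_MAINTENANCE
    rolled_back: bool = False

    def can_roll_back(self) -> bool:
        # Rollback is straightforward only while 5.7 is still the primary,
        # that is, before the maintenance stage promotes an 8.0 node.
        return (
            self.stage in (Stage.PRE_MAINTENANCE, Stage.SOAK)
            and not self.rolled_back
        )

    def advance(self, healthy: bool) -> Stage:
        """Move to the next stage if checks pass; otherwise roll back or fail."""
        if self.rolled_back:
            raise RuntimeError(f"{self.name}: upgrade was already rolled back")
        if self.stage is Stage.POST_MAINTENANCE:
            return self.stage  # nothing left to do
        if not healthy:
            if self.can_roll_back():
                self.rolled_back = True  # detach the 8.0 replicas, keep 5.7
                return self.stage
            raise RuntimeError(f"{self.name}: cannot roll back after promotion")
        self.stage = Stage(self.stage.value + 1)
        return self.stage


if __name__ == "__main__":
    upgrade = ClusterUpgrade("example-cluster")
    upgrade.advance(healthy=True)   # PRE_MAINTENANCE -> SOAK
    upgrade.advance(healthy=True)   # SOAK -> MAINTENANCE
    print(upgrade.can_roll_back())  # False
```

Modeling promotion as a one-way step makes the key constraint explicit: every health check has to pass during the soak period, because a failure after promotion can no longer be handled by simply detaching the new replicas.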
With the old nodes removed, the new MySQL 8.0 nodes were fully operational, and all traffic (both read and write) was being handled by the new system. By completing this step, Uber successfully transitioned to the new version, ensuring that the system was upgraded without any data loss or significant service disruptions.

Issues During Upgrade

During the upgrade of Uber’s MySQL infrastructure to version 8.0, several issues were encountered that required careful handling and technical solutions to ensure the system continued to run smoothly. Here’s a breakdown of the key problems and how they were addressed:

Query Execution Plan Changes

One of the major issues that Uber faced was related to changes in the query execution plans in MySQL 8.0. A query execution plan is the path the database system uses to retrieve data. In some clusters, MySQL 8.0 chose different paths compared to version 5.7, leading to increased latencies (delays) and higher resource consumption. These changes could slow down certain operations, affecting the performance of dashboards and other tools that relied on quick access to data. For instance, clusters powering key dashboards at Uber experienced noticeable slowdowns.

Uber worked with Percona, a database consulting company, to develop a patch that optimized the execution plans for the affected clusters. By applying this patch, Uber was able to restore performance and reduce resource consumption, bringing the system back to optimal operation.

Unsupported Queries and Configurations

MySQL 8.0 introduced new syntax rules and stricter configurations, which caused some queries that worked in MySQL 5.7 to fail after the upgrade. Specifically, some clusters didn’t have the STRICT_TRANS_TABLES SQL mode enabled, which is a default setting in MySQL 8.0. This mode enforces stricter rules on handling invalid or missing data. Uber had to carefully adjust configurations and rewrite certain queries to align with MySQL 8.0’s new syntax and rules.
For example, they enabled the STRICT_TRANS_TABLES and ONLY_FULL_GROUP_BY modes, which made the system more robust but required changes to some of the legacy queries and applications.

Collation and Character Set Changes

MySQL 8.0 also brought changes to the default character set and collation. The character set controls how text is stored, and the collation determines how text is compared. In MySQL 5.7, Uber had been using the utf8mb4_unicode_520_ci collation, but MySQL 8.0 switched to the new utf8mb4_0900_ai_ci collation. This change in the default character set and collation caused issues with sorting and comparing text data across different clusters, particularly when dealing with different languages or special characters. The system needed consistency in collation settings to function correctly, but this shift created mismatches.

Uber had to align the collation settings across its systems to ensure all nodes used the same character set and collation. This required detailed configuration changes and testing to ensure compatibility and proper sorting behavior across all clusters.

Client Library Incompatibility

Many client libraries that Uber used to connect to the MySQL database were not initially compatible with MySQL 8.0. Client libraries are essential for applications to communicate with the database, and outdated versions of these libraries did not support some of the new features and functions introduced in MySQL 8.0. Without updating these libraries, Uber’s applications couldn’t fully utilize the benefits of MySQL 8.0, and some applications experienced failures or errors when trying to connect to the upgraded database.

Uber upgraded these client libraries across its systems. This process involved rigorous testing in a staging environment to ensure that all client libraries worked properly with MySQL 8.0 before the full upgrade. Once the testing was complete, the libraries were deployed in production, ensuring a smooth transition.
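The SQL mode and collation issues described above both come down to configuration drift between nodes. A minimal sketch of how such drift could be detected is shown below. It assumes the `sql_mode` and `collation_server` values have already been read from each node (for example with `SELECT @@GLOBAL.sql_mode, @@GLOBAL.collation_server`). The function names, node names, and the choice of target collation are hypothetical, since the source does not say which collation Uber standardized on.

```python
from __future__ import annotations

# Modes the article says Uber enabled across clusters.
REQUIRED_SQL_MODES = {"STRICT_TRANS_TABLES", "ONLY_FULL_GROUP_BY"}

# Assumed target for illustration only; use whatever the fleet standardizes on.
TARGET_COLLATION = "utf8mb4_0900_ai_ci"


def missing_sql_modes(sql_mode: str) -> set[str]:
    """Return the required modes absent from a comma-separated sql_mode value."""
    enabled = {m.strip().upper() for m in sql_mode.split(",") if m.strip()}
    return REQUIRED_SQL_MODES - enabled


def collation_mismatches(
    node_collations: dict[str, str], target: str = TARGET_COLLATION
) -> dict[str, str]:
    """Return the nodes whose server collation differs from the target."""
    return {node: c for node, c in node_collations.items() if c != target}


if __name__ == "__main__":
    # Hypothetical settings collected from two nodes of one cluster.
    print(missing_sql_modes("ONLY_FULL_GROUP_BY,NO_ZERO_DATE"))
    # {'STRICT_TRANS_TABLES'}
    print(collation_mismatches({
        "node-a": "utf8mb4_0900_ai_ci",
        "node-b": "utf8mb4_unicode_520_ci",
    }))
    # {'node-b': 'utf8mb4_unicode_520_ci'}
```

Running a check like this during the soak period would flag clusters that need configuration changes before an 8.0 node is promoted, rather than after legacy queries start failing.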
Improvements After The Upgrade

The upgrade to MySQL 8.0 brought significant performance improvements to Uber’s infrastructure, both on the server side and client side. Let’s look at both.

Server-Side Performance:
Client-Side Performance:
Conclusion

Through careful planning, automation, and a phased rollout strategy, Uber successfully transitioned its vast data systems with minimal downtime and disruption. The new version brought significant benefits in terms of performance, security, and functionality, helping Uber improve its operational efficiency and user experience. Some key learnings are as follows:
References:
by "ByteByteGo" <bytebytego@substack.com> - 11:35 - 22 Oct 2024