How can US manufacturing and construction address skilled-worker shortages?
On Point
6 figures, no college
Brought to you by Liz Hilton Segel, chief client officer and managing partner, global industry practices, & Homayoun Hatami, managing partner, global client capabilities
—Edited by Jana Zabkova, senior editor, New York
This email contains information about McKinsey's research, insights, services, or events. By opening our emails or clicking on links, you agree to our use of cookies and web tracking technology. For more information on how we use and protect your information, please review our privacy policy.
You received this newsletter because you subscribed to the Only McKinsey newsletter, formerly called On Point.
Copyright © 2024 | McKinsey & Company, 3 World Trade Center, 175 Greenwich Street, New York, NY 10007
by "Only McKinsey" <publishing@email.mckinsey.com> - 11:06 - 30 Apr 2024 -
How to Execute End-to-End Tests at Scale
Running E2E tests reliably and efficiently is a critical piece of the puzzle for any software organization.
There are mainly two expectations software teams have when it comes to testing:
Ship as fast as possible without introducing (or reintroducing) bugs
Run tests as cheaply as possible without compromising on quality.
In today’s issue, we are fortunate to host guest author John Gluck, Principal Testing Advocate at QA Wolf. He’ll be sharing insights into QA Wolf’s specialized infrastructure capable of running thousands of concurrent E2E tests in just a few minutes and meeting the expectations of their customers.
QA Wolf is a full-service solution for mid-to-large product teams who want to speed up their QA cycles and reduce the cost of building, running, and maintaining comprehensive regression test coverage.
Also, Mufav Onus of QA Wolf spoke at KubeCon 2024 in Paris about how they automatically resume pods on spot instances after unexpected shutdowns. Take a look.
The Challenge of Running E2E Tests
Running E2E tests efficiently is challenging for any organization. The runners tend to cause resource spikes, which cause tests and applications to behave unpredictably. That’s why it’s fairly common for large product teams to strategically schedule their test runs. As the number of tests and the number of runs increases, the challenges become exponentially more difficult to overcome.
While the largest companies in the world may run 10,000 end-to-end tests each month, and a handful run 100,000, QA Wolf runs more than 2 million. At our scale, to support the number of customers that we do, our infrastructure has to address three major concerns:
Availability - Customers can execute their tests at any time with no restrictions on the number or frequency of parallel runs. The system must be highly available. We can’t use scheduling tricks to solve this.
Speed - Tests need to run fast. DORA recommends 30 minutes as the maximum time for test suite execution, and customers want to follow this principle.
Reliability - Node contention issues, instance hijacking, and test system execution outages (the things in-house test architects deal with on a regular basis) are simply not tolerable when people are paying you to execute their tests.
For better or worse, StackOverflow didn’t have blueprints for the kind of test-running infrastructure we needed to build. Success came from lots of experimentation and constant refinement.
In this post, we discuss the problems we faced and the decisions we made so that we could solve them through experimentation.
The Tech Stack Breakdown
To set the stage, we are completely cloud-native and built our infrastructure on the Google Cloud Platform (GCP).
We went with GCP for its GKE (Kubernetes) implementation and cluster autoscaling capabilities, which are critical for handling the demand for test execution nodes. There are similar tools out there, but our engineers also had previous experience with GCP, which helped us get started.
We adopted a GitOps approach so we could run lots of configuration experiments on our infrastructure quickly and safely without disrupting ongoing operations.
Argo CD was a good choice because of its support for GitOps and Kubernetes. A combination of Helm and Argo Workflows helps make the deployment process consistent and organized. We used Argo CD Application Sets and App of Apps patterns which are considered best practices.
For IaC, we chose Pulumi because it’s open source, and unlike Terraform, it doesn’t force developers to adopt another DSL (Domain-Specific Language).
Lastly, we used TypeScript to write the tests. Our customers look at the test code written for them, and TypeScript makes it easy to understand. We chose Playwright as the test executor and test framework for multiple reasons, such as:
Playwright can handle the complex tests that customers may need to automate.
Simpler APIs and an easier install prevent customers from being locked into our solution.
It’s backed by Microsoft, and more active development is expanding the list of native capabilities.
The Ecosystem
For the infrastructure ecosystem, we went with one VPC and three main clusters.
Each of the three clusters has a specific role:
The application cluster
The test instance or runner cluster
Operations cluster
The operations cluster is the primary cluster and manages the other two clusters. Argo CD runs within this cluster.
See the diagram below that shows this arrangement.
At the time of startup, the operations cluster creates both the application and runner clusters. It provisions warm nodes on the runner cluster, each containing two pods, and each pod is built on a single container image.
See the diagram below for reference:
This structure is fully expendable. Our developers can tear down the entire system and rebuild it from scratch with the touch of a button, which increases predictability for developers and is also great for supporting disaster recovery.
GKE’s cluster autoscaler scales the warm nodes on the runner cluster up and down based on demand.
The Customer-Facing Application
The customer-facing application is a specialized IDE where our QA engineers can write, run, and maintain Playwright tests. It has views for managing configuration and third-party integrations with visualization dashboards.
Writing Tests
The tests built and maintained by our in-house QA engineers are autonomous, isolated, idempotent, and atomic, so they can run predictably in a fully parallelized context.
When a QA engineer saves a test, the application persists the code for the test onto GCS with its corresponding helpers and any associated parsed configuration needed to run it in Playwright. This is the Run Data File for the test. In case you don’t know, GCS is the GCP equivalent of AWS S3.
In the initial implementation, we tried passing the Run Data File as a payload in HTML, but the payload containing the test code for all tests in a run was too large for Kubernetes etcd. To get around this, we took the path of least resistance by writing all the code to a central file and giving the client a reference to the file location to pass back to the application.
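The pass-by-reference idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not QA Wolf's implementation: `InMemoryBucket` is a hypothetical stand-in for a GCS bucket, and the `run-data/...` path layout, function names, and field names are invented for the example.

```python
import json


class InMemoryBucket:
    """Hypothetical stand-in for an object store such as a GCS bucket."""

    def __init__(self):
        self._objects = {}

    def write(self, path, data):
        self._objects[path] = data

    def read(self, path):
        return self._objects[path]


def save_run_data_file(bucket, test_id, code, helpers, config):
    """Persist everything a runner needs and return only its location."""
    path = f"run-data/{test_id}.json"  # assumed path layout
    payload = json.dumps({"code": code, "helpers": helpers, "config": config})
    bucket.write(path, payload)
    return path


def load_run_data_file(bucket, path):
    """What a runner would do: resolve the reference back into the file."""
    return json.loads(bucket.read(path))


bucket = InMemoryBucket()
location = save_run_data_file(
    bucket,
    test_id="checkout-flow",
    code="await page.goto('/checkout');",
    helpers=["login.ts"],
    config={"browser": "chromium"},
)
run_data = load_run_data_file(bucket, location)
```

Only `location`, a short string, has to travel through the orchestration layer; the test code itself stays in object storage until a runner needs it.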
The Execution Flow
As mentioned earlier, we orchestrate runs with Argo Workflows because it can run on a Kubernetes cluster without external dependencies.
Customers or QA engineers can use a scheduler in the application or an API call to start a test run. When a test run is triggered, the application gathers the locations of all necessary Run Data Files. It also creates a new database record for each test run, including a unique build number that acts as an identifier for the test run request. The application uses the build number later to associate system logs and video locations.
Lastly, it passes the Run Data file locations list to the Run Director service.
The below diagram depicts the entire execution flow at a high level.
The Run Director
The Run Director is a simple, long-living, horizontally-scalable HTTP service.
When invoked, the Run Director reports the initial test run status to the application via a webhook and the build number. For each location in the list, the Run Director invokes an Argo Workflows template and hydrates it with the Run Data file at that location. By performing both actions simultaneously, individual test runs can be started faster so that all the tests in the run can finish more quickly.
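A rough sketch of that fan-out, assuming a thread pool for concurrency. Both `report_status` and `submit_workflow` are hypothetical placeholders: the first stands in for the webhook call, the second for submitting one workflow from an Argo Workflows template. Neither reflects a real client API.

```python
from concurrent.futures import ThreadPoolExecutor


def report_status(build_number, status, sent):
    """Hypothetical webhook call; here it only records what would be sent."""
    sent.append({"build": build_number, "status": status})


def submit_workflow(build_number, location):
    """Hypothetical stand-in for submitting one workflow from a template."""
    return {"build": build_number, "run_data_file": location, "state": "submitted"}


def direct_run(build_number, locations):
    """Report the initial status and submit one workflow per Run Data File."""
    sent = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        # The status report and the submissions are scheduled together
        # rather than one after the other.
        status_future = pool.submit(report_status, build_number, "started", sent)
        workflows = list(
            pool.map(lambda loc: submit_workflow(build_number, loc), locations)
        )
        status_future.result()
    return sent, workflows


sent, workflows = direct_run(
    42, ["run-data/a.json", "run-data/b.json", "run-data/c.json"]
)
```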
The Argo Workflow then provisions a Kubernetes pod for each test run requested from the available warm nodes. It attaches the code for each test to a volume on a corresponding container on the pod. This approach allows us to use the same container build for every test execution. If there aren’t enough pods for the run on warm nodes, GKE uses cluster autoscaling to meet demand.
Each test runs in its own pod and container, which isolates the tests and makes it easier for the developers to troubleshoot them. Running tests like this also confines resource consumption issues to the node where the specific tests are having trouble.
The test code runs from the container entry point. Argo Workflows drives the provisioning process and starts each container with the help of Kubernetes.
The application runs all the tests in headed browsers. This is important because the container is destroyed after the test finishes, and the headed browser makes it possible to capture videos of tests. The videos are an essential debugging tool for seeing what happened at the moment of failure, especially when a particular failure is difficult to recreate.
Due to the high standard for test authorship and the infrastructure reliability, the primary cause of test failure is when the system-under-test (SUT) is not optimized for testing. It makes sense when you think about it. The slower the SUT, the more the test is required to poll, increasing the demand on the processor running the test. Though we can’t tell the customer how to build their application to improve test performance, we can isolate each test’s resource consumption to prevent it from impacting other tests.
The Flake Detection
We maintain a very high standard of test authorship, which allows us to make certain assumptions.
Since the tests are expected to pass, we can safely assume that a test failure or error is due to an anomaly, such as a temporarily unavailable SUT. The application schedules such failures for automatic retry. It flags any other failure – such as a suspected infrastructure problem – for investigation and doesn’t retry. Argo Workflows will attempt to re-run a failed test three times.
If the tests pass on retry, the application resumes as usual and assumes the failure was anomalous. In case all retries fail, the system creates a defect report, and a QA engineer investigates to confirm whether the failure is due to a bug in the application or some other issue.
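The retry policy above can be illustrated with a small classifier. This is a sketch under assumptions: the outcome labels, the `InfrastructureError` type, and the shape of `run_test` are all hypothetical, and in the article the retries are driven by Argo Workflows rather than by application code like this.

```python
MAX_RETRIES = 3  # the article says a failed test is re-run three times


class InfrastructureError(Exception):
    """Hypothetical marker for a suspected infrastructure problem."""


def run_with_flake_detection(run_test):
    """Run a test callable and classify the outcome.

    run_test() returns True on pass and False on failure, and may raise
    InfrastructureError. Returns "passed", "flaky", "defect", or "investigate".
    """
    attempts = 1 + MAX_RETRIES  # the first attempt plus the retries
    for attempt in range(attempts):
        try:
            passed = run_test()
        except InfrastructureError:
            return "investigate"  # flagged for investigation, not retried
        if passed:
            # A pass on a later attempt means the earlier failure was anomalous.
            return "passed" if attempt == 0 else "flaky"
    return "defect"  # every retry failed: a QA engineer takes over


outcomes = iter([False, False, True])
result = run_with_flake_detection(lambda: next(outcomes))
```

Here the test fails twice and passes on the second retry, so `result` is `"flaky"` and no defect report would be created.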
The Run Shard Clusters
One of the most significant advantages of the Run Director service is the concept of Run Shard Clusters.
The sharding strategy allows us to spread the various test runs across clusters located worldwide. We have a GCP global VPC with a bunch of different subnets in different regions. This makes it possible to provision sharding clusters in different regions that can be accessed privately via the Run Director service.
Shard clusters provide several advantages, such as:
Replicated high availability - If one region goes down, not everything grinds to a halt.
Closer-to-home testing - Running customer tests close to their home region gives a more accurate picture of how their applications and systems perform.
Experimentation - We can experiment with different versions of our Argo Workflows implementation or different run engines without cutting all traffic over to the same version. This also allows us to experiment with cost-saving measures such as spot instances.
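One way to picture how a request could be routed across shard clusters is the sketch below. The article does not describe the actual routing logic, so the selection rule (prefer a healthy shard in the customer's home region, otherwise fall back to any healthy shard), the shard names, and the dictionary fields are all assumptions made for illustration.

```python
def pick_shard(shards, home_region):
    """Prefer a healthy shard in the home region, else any healthy shard."""
    healthy = [shard for shard in shards if shard["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy run shard cluster available")
    for shard in healthy:
        if shard["region"] == home_region:
            return shard
    return healthy[0]  # regional failover


shards = [
    {"name": "shard-us", "region": "us-central1", "healthy": True},
    {"name": "shard-eu", "region": "europe-west1", "healthy": False},
    {"name": "shard-asia", "region": "asia-east1", "healthy": True},
]

close_to_home = pick_shard(shards, home_region="asia-east1")
failed_over = pick_shard(shards, home_region="europe-west1")
```

In this example the Asian customer is served by `shard-asia`, while the European customer falls back to `shard-us` because the European shard is marked unhealthy.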
Reporting
Of course, our customers also want to see test results, so we needed to create a reliable system that allowed them to do so.
Once the test finishes running and retrying (if needed), the Argo Workflow template uploads any run artifacts saved by Playwright back to GCS using the build number. Some of this information will be aggregated and appear on our application’s dashboard. Other pieces of information from these artifacts are displayed at the test level, such as logs and run history.
On the infrastructure side, the Argo Workflow triggers Kubernetes to shut down the container and detach the volume, ensuring that the system doesn’t leave unnecessary resources running. This helps keep down operational costs.
Conclusion
Our unique approach was developed to meet customer needs for speed, availability, and reliability. We are one of the few companies running E2E tests at this scale, so we had to discover through trial and error how to build a system that supports it, and we designed that system to support fast iteration as well. Our cost-efficient, fully parallel test execution is the backbone of our application, and we see it delivering value for our customers on a daily basis.
If you’d like to learn more about QA Wolf’s test run infrastructure or how it can help you ship faster with fewer escapes, visit their website to schedule a demo.
by "ByteByteGo" <bytebytego@substack.com> - 12:29 - 30 Apr 2024 -
Join me on Thursday for troubleshooting applications and back end performance with APM 360
Hi MD,
It's Liam Hurrell, Manager of Customer Training at New Relic University, here. Are you ready to revolutionise your application monitoring and troubleshooting practices with our latest offering, New Relic APM 360? If so, you can register for the free online workshop that I'll be hosting on Thursday, 2nd May at 10 AM BST/11 AM CEST.
APM 360 allows you to get a unified view of critical telemetry data across your stack and development lifecycle. Prevent issues before they escalate and troubleshoot faster with integrated infrastructure monitoring, error user impact analysis, and distributed tracing. In this workshop, I’ll show you how New Relic APM 360 can help eliminate blind spots with guided workflows.
You can find the full agenda on the registration page here. While we recommend attending the hands-on workshop live, you can also register to receive the recording.
Hope to see you then,
Liam Hurrell
Manager, Customer Training
New Relic
by "Liam Hurrell, New Relic" <emeamarketing@newrelic.com> - 05:02 - 30 Apr 2024 -
What motivates gen AI talent and keeps them in their jobs?
On Point
Gen AI users’ most wanted skills
Brought to you by Liz Hilton Segel, chief client officer and managing partner, global industry practices, & Homayoun Hatami, managing partner, global client capabilities
• The right skills. When it comes to crafting an effective gen AI talent strategy, organizations have focused mostly on increasing productivity. But to match the right talent to the right jobs, leaders should first know how gen AI is changing the way employees view their work experience, McKinsey senior partner Aaron De Smet and coauthors share. To that end, they recently surveyed about 12,800 workers across 16 sectors. One surprising finding? Heavy users and creators of gen AI overwhelmingly feel they need higher-level cognitive and social–emotional skills to do their jobs.
—Edited by Belinda Yu, editor, Atlanta
by "Only McKinsey" <publishing@email.mckinsey.com> - 01:12 - 30 Apr 2024 -
Remote is #1 Global Employment Platform, freshly landed in Korea, new webinar alert, and more!
Your monthly global update is here from Remote. Dive in to see the latest.
Featured news
Results are in! 🚨
Remote is the best Global Employment Platform 🎉
Remote has received the #1 spot in 28 different categories in G2's Spring Report including:
- Global Employment Platform
- Employer of Record
- Payroll provider in Europe
This is a testament to the hard work of our team and the love from our customers. 🌟 Discover why we are #1 in so many categories!
The #1 global HR platform Remote has landed in South Korea 🚀
We’re here to help you through the entire HR process so you can expand globally with confidence in your own language. Start your global journey with confidence!
Your one-stop shop for US expansion 👋
Expanding into the US market is a significant step for every ambitious business, but it comes with its share of challenges. We’ve combined everything you need for effortless expansion, including local payroll, health insurance benefits, state compliance, and PEO services — in one platform, with exclusive discounts.
Growing stateside has never been easier or more cost-effective.
Featured webinar
Mark your calendar!🎙
Webinar on May 14: Complexities in Global Hiring
Don’t miss out! Hear from a panel of global employment and tax experts from Deloitte, Bytez, and Remote.
Gain expert perspectives on international employment laws, recruiting strategies, and managing payroll across multiple countries.
New product feature
Contractor invoice management now on Remote Mobile App 📱
Introducing the ultimate on-the-go invoicing solution. Contractors can now enjoy invoice management on the Remote Mobile App, empowering them to create, set up recurring invoices, upload invoice PDFs, track invoice status, and more—all in one place. Download or update our app today and experience the freedom of mobile invoicing!
Download Remote Mobile: App Store | Google Play
Contractor Management Masterclass
Take Remote’s Contractor Management Masterclass to learn the fundamentals of Contractor Management and earn your certification badge to show your mastery.
Discover the best locations in the world to work remotely 🗺️
This is not your average list. We considered a wide range of factors to create a tool that works for everyone. You will find national capitals and major destinations here, but you will also discover several surprises.
Download the 2024 Remote Influencer Report
Our fourth annual Remote Influencer Report is the ultimate guide to the most inspiring and influential thought leaders in the world of remote work, packed full of free resources and expert advice.
📍 See you at HR Leaders Forum, Sydney on 30th April - 1st May 🇦🇺
We're thrilled to be part of the HR Leaders Forum in Sydney! Swing by our booth to discover how our global HR platform can revolutionize your organization's processes. Don't forget to catch our speaking session by Paula Dieli, VP of EOR Operations at Remote on day two, 1st May at 11.20am.
📍 Free Happy Hour and Networking event at Unleash, Las Vegas on 8 May 🇺🇸
Traveling to Vegas for Unleash next week? Join Leapsome, Remote, and your fellow HR professionals for an exclusive After Party on May 8th.
Remote is the global HR platform you deserve
Onboard, pay, and manage employees and contractors around the world with Remote. You focus on finding the best hires — we'll handle the rest.
You received this email because you are subscribed to News & Offers from Remote Europe Holding B.V
Copyright © 2024 Remote Europe Holding B.V All rights reserved.
Kraijenhoffstraat 137A 1018RG Amsterdam The Netherlands
by "Remote" <hello@remote-comms.com> - 12:02 - 30 Apr 2024 -
Are you sure? A leader’s guide to strategizing during uncertainty
Almost perfect
Brought to you by Liz Hilton Segel, chief client officer and managing partner, global industry practices, & Homayoun Hatami, managing partner, global client capabilities
The ancient Stoic philosophers believed that it was essential to be prepared for downturns—whether contemplating war, shipwrecks, torture, or exile, the “premeditation of the evils and troubles that might lie ahead” was a way to manage life’s inevitable disasters. Modern leaders may be unlikely to endure shipwrecks, but they are beset with uncertainty of all kinds, from geopolitical upheavals to global supply chain shocks. Operating in a near-permanent state of uncertainty may require its acceptance in ways that traditional strategists may not be prepared for. This week, we explore this concept in more detail.
“Imperfectionism sounds like a bad thing, but what we mean is accepting the ambiguity of not having perfect knowledge before making strategic moves,” says investor and McKinsey alumnus Charles Conn in an episode of our Inside the Strategy Room podcast. In a world marked by disruptions, conventional approaches to strategy can “yield either incomplete or misleading results,” he says; instead, dynamic, real-time, and nonlinear actions have a better chance of success. Adopting a few different mindsets can help leaders go with the flow rather than wait for certainty. For example, an “Ever Curious” mindset “starts with an audacious question or a long-term vision” that leaders should encourage their teams to cultivate, Conn suggests. “Your people have fantastic ideas, but there’s nothing more boring than being told to stay in your lane. You need to change incentive structures so that people aren’t penalized for working on the side.”
That’s the percentage of respondents to our March 2024 global economic conditions survey who view geopolitical instability and conflict as the top threat to economic growth. With more than 60 countries holding national elections this year, concerns about political leadership transitions have displaced worries about inflation and supply chain disruptions. Nevertheless, respondents are more hopeful about the global economic outlook than they were in December 2023. The mixed economic picture—for example, unemployment is inching up in some countries—means that leaders may need to be more attuned to uncertainty in the coming months.
The COVID-19 pandemic led to widespread operational disruptions and a heightened sense of uncertainty about the future. “Our normal style of operating tends to lock in assumptions,” says McKinsey senior partner Patrick Finn in an episode of our Inside the Strategy Room podcast. “It is challenging for executives to admit that their direction of travel is wrong because the underlying information has changed.” When faced with information instability, leaders need to recognize that “the solutions to the crisis may have to be invented, not only implemented,” adds partner Mihir Mysore. “You have to go all the way to the basic tenets of the problem you are trying to solve. Many companies are not set up to do that.” Strategies may need to be flexible and include constant testing of assumptions. “In extreme uncertainty, there is no such thing as a forecast; there are only scenarios,” says Finn.
Do you generally expect the worst? You may be on to something. While a positive outlook often contributes to success, thinking negatively does not necessarily lead to defeat—and may help you deal better with uncertainty, according to research. For example, a business leader may deliberately encourage negative thinking among employees, asking them to imagine what could or will go wrong to prevent them from being blindsided during tough times or to manage stakeholder expectations. And high-stakes uncertainty—where you’re waiting for critical decisions or outcomes—can help you build business resilience as well as the strength to face geopolitical tensions.
Lead by managing uncertainty.
– Edited by Rama Ramaswami, senior editor, New York
You received this email because you subscribed to the Leading Off newsletter.
by "McKinsey Leading Off" <publishing@email.mckinsey.com> - 04:42 - 29 Apr 2024 -
What are executives’ top concerns about the world economy?
On Point
How consumer confidence is trending
Brought to you by Liz Hilton Segel, chief client officer and managing partner, global industry practices, & Homayoun Hatami, managing partner, global client capabilities
• Economic resilience. Despite uncertainty fueled by geopolitical and economic concerns, consumption among surveyed economies is holding up strongly, except in the eurozone. This is encouraging, given that consumer confidence globally looks somewhat downbeat, with the OECD indicator trending below the long-term average. The latest McKinsey Global Survey on economic conditions, however, paints a rosier picture. Views of the global economy are the most positive they’ve been since March 2022, McKinsey Global Institute chair Sven Smit and coauthors find.
• Top concerns. In a year packed with national elections, executives are concerned about political uncertainty. They increasingly view transitions of political leadership as a primary hazard to the global economy, particularly in Asia–Pacific, Europe, and North America. They also regard policy and regulatory changes as a top threat to their companies’ performance, and they offer more muted optimism than in December about their companies’ prospects. Read the latest McKinsey Global Economics Intelligence executive summary for key economic trends and risks.
—Edited by Belinda Yu, editor, Atlanta
by "Only McKinsey" <publishing@email.mckinsey.com> - 01:39 - 29 Apr 2024 -
The week in charts
The Week in Charts
Nursing retention, agtech investments, and more
You received this email because you subscribed to The Week in Charts newsletter.
by "McKinsey Week in Charts" <publishing@email.mckinsey.com> - 03:32 - 27 Apr 2024 -
EP109: Top 6 Tools to Turn Code into Beautiful Diagrams
This week’s system design refresher:
Top 9 Must-Read Blogs for Engineers
Top 6 Tools to Turn Code into Beautiful Diagrams
What is DevSecOps?
Top 5 Trade-offs in System Designs
Top 8 Cache Eviction Strategies
POST/CON 24 | April 30 - May 1 (Sold Out)(Sponsored)
POST/CON 24 is sold out, but you can still join the waitlist here for a complimentary ticket and be the first to know if tickets become available!
You’ll hear keynote speakers, see demos of new Postman features, watch an incredible panel discussion on AI—and so much more. If you can’t join us in person, mark your calendar and save this link to check out the livestream of it all on May 1, from 9 am - 11:30 am PDT.
Top 9 Must-Read Blogs for Engineers
Top 6 Tools to Turn Code into Beautiful Diagrams
Diagrams
Go Diagrams
Mermaid
PlantUML
ASCII diagrams
Markmap
Over to you: Did we miss anything? What's your favorite?
What is DevSecOps?
DevSecOps emerged as a natural evolution of DevOps practices with a focus on integrating security into the software development and deployment process. The term "DevSecOps" represents the convergence of Development (Dev), Security (Sec), and Operations (Ops) practices, emphasizing the importance of security throughout the software development lifecycle.
The diagram below shows the important concepts in DevSecOps.
Automated Security Checks
Continuous Monitoring
CI/CD Automation
Infrastructure as Code (IaC)
Container Security
Secret Management
Threat Modeling
Quality Assurance (QA) Integration
Collaboration and Communication
Vulnerability Management
Top 5 Trade-offs in System Designs
Everything is a trade-off.
Everything is a compromise.
There is no right or wrong design.
The diagram below shows some of the most important trade-offs.
Cost vs. Performance
Reliability vs. Scalability
Performance vs. Consistency
Security vs. Flexibility
Development Speed vs. Quality
Over to you: What trade-offs have you made in the past?
Top 8 Cache Eviction Strategies
LRU (Least Recently Used)
LRU eviction strategy removes the least recently accessed items first. This approach is based on the principle that items accessed recently are more likely to be accessed again in the near future.
MRU (Most Recently Used)
Contrary to LRU, the MRU algorithm removes the most recently used items first. This strategy can be useful in scenarios where the most recently accessed items are less likely to be accessed again soon.
SLRU (Segmented LRU)
SLRU divides the cache into two segments: a probationary segment and a protected segment. New items are initially placed into the probationary segment. If an item in the probationary segment is accessed again, it is promoted to the protected segment.
LFU (Least Frequently Used)
LFU algorithm evicts the items with the lowest access frequency.
FIFO (First In First Out)
FIFO is one of the simplest caching strategies, where the cache behaves in a queue-like manner, evicting the oldest items first, regardless of their access patterns or frequency.
TTL (Time-to-Live)
While not strictly an eviction algorithm, TTL is a strategy where each cache item is given a specific lifespan.
Two-Tiered Caching
In the Two-Tiered Caching strategy, an in-memory cache serves as the first layer and a distributed cache as the second layer.
RR (Random Replacement)
Random Replacement algorithm randomly selects a cache item and evicts it to make space for new items. This method is also simple to implement and does not require tracking access patterns or frequencies.
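To make the difference between some of these strategies concrete, here is a minimal sketch of LRU, MRU, and FIFO eviction in one small class. The class name `EvictingCache`, its `policy` argument, and the capacity of 2 used below are illustrative assumptions, not something from the newsletter or a specific library.

```python
from collections import OrderedDict


class EvictingCache:
    """Tiny fixed-capacity cache. Hypothetical sketch, not a production cache."""

    POLICIES = ("lru", "mru", "fifo")

    def __init__(self, capacity, policy="lru"):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.capacity = capacity
        self.policy = policy
        # For lru/mru the order is "least recently used first".
        # For fifo the order is plain insertion order.
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        if self.policy != "fifo":
            self._items.move_to_end(key)  # record the access
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items[key] = value
            if self.policy != "fifo":
                self._items.move_to_end(key)
            return
        if len(self._items) >= self.capacity:
            # lru and fifo drop the entry at the front; mru drops the back.
            self._items.popitem(last=(self.policy == "mru"))
        self._items[key] = value

    def keys(self):
        return list(self._items)
```

With a capacity of 2, inserting `a` and `b`, reading `a`, and then inserting `c` leaves `a` and `c` under LRU, but `b` and `c` under both MRU and FIFO: MRU drops `a` because it was just used, and FIFO drops `a` because it is the oldest entry.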
SPONSOR US
Get your product in front of more than 500,000 tech professionals.
Our newsletter puts your products and services directly in front of an audience that matters - hundreds of thousands of engineering leaders and senior engineers - who have influence over significant tech decisions and big purchases.
Space Fills Up Fast - Reserve Today
Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing hi@bytebytego.com.
© 2024 ByteByteGo
548 Market Street PMB 72296, San Francisco, CA 94104
by "ByteByteGo" <bytebytego@substack.com> - 11:46 - 27 Apr 2024 -
Get out of your comfort zone
Readers & Leaders
And more ways to live up to your potential
THIS MONTH’S PAGE-TURNERS ON BUSINESS AND BEYOND
Imagine taking a moment to peer into the future, five years from now, and discovering that your life has remained exactly as it is today. Would this unaltered state bring you genuine happiness and fulfillment?
In this edition of Readers & Leaders, get clear about what you want your future life to look like and discover practical tools that can help you get there. Bonnie Wan, a 30-plus-year career brand strategist, offers a creative practice to help you realign your path with your dreams—personally, professionally, culturally, and spiritually; Harvard University professor Cass Sunstein investigates why we stop noticing both the great and not-so-great things around us and how to “dishabituate” at the office, in the bedroom, at the store, on social media, and everywhere else; William Ury, expert negotiator and coauthor of the best-selling classic Getting to Yes, shares time-tested practices to constructively engage and transform conflict; and Andy Cohen and Diane Hoskins, global cochairs of architecture and design firm Gensler, shed light on how design affects our everyday lives.
Want early access to these interviews? Download the McKinsey Insights app to read the latest Author Talks now.
IT BEARS REPEATING
Only on the insights app
TURN BACK THE PAGE
Looking for more life advice? Revisit some of our most popular interviews on personal development.
1. The world’s longest study of adult development finds the key to happy living
2. How to conquer fear, prepare for death, and embrace your power
3. Melody Wilding on turning sensitivity into a superpower
4. Actor Terry Crews wants you to open up
BUSINESS BESTSELLERS TOP 8
It’s time to treat yourself to a good read. Explore March’s business bestsellers, prepared exclusively for McKinsey by Circana. Check out the full selection on McKinsey on Books.
BUSINESS OVERALL
BUSINESS HARDCOVER
ECONOMICS
DECISION MAKING
ORGANIZATIONAL BEHAVIOR
WORKPLACE CULTURE
COMPUTERS & AI
SUSTAINABILITY
You Can’t Market Manure at Lunchtime: And Other Lessons from the Food Industry for Creating a More Sustainable Company by Maisie Ganzler (Two Rivers Distribution)
BOOKMARK THIS
The Journey of Leadership: How CEOs Learn to Lead From the Inside Out
The Journey of Leadership (Portfolio/Penguin Group), McKinsey’s next major book, will publish in the United States and the United Kingdom on September 10. The book, by Dana Maor, Kurt Strovink, Ramesh Srinivasan, and senior partner emeritus Hans-Werner Kaas, is the first-ever explanation of McKinsey’s step-by-step approach to transforming leaders both professionally and personally, including revealing lessons from its legendary CEO leadership program, “The Bower Forum,” which has counseled 500-plus global CEOs over the past decade. It is a journey that helps leaders hone the psychological, emotional, and, ultimately, human attributes that result in success in today’s most demanding top job.
If you’d like to propose a book or author for #McKAuthorTalks, please email us at Author_Talks@McKinsey.com. Due to the high volume of requests, we will respond only to those being considered.
—Edited by Eleni Kostopoulos, managing editor, New York
SHARE THESE INSIGHTS
Did you enjoy this newsletter? Forward it to colleagues and friends so they can subscribe too. Was this issue forwarded to you? Sign up for it and sample our 40+ other free email subscriptions here.
This email contains information about McKinsey's research, insights, services, or events. By opening our emails or clicking on links, you agree to our use of cookies and web tracking technology. For more information on how we use and protect your information, please review our privacy policy.
You received this email because you subscribed to the Readers & Leaders newsletter.
Copyright © 2024 | McKinsey & Company, 3 World Trade Center, 175 Greenwich Street, New York, NY 10007
by "McKinsey Readers & Leaders" <publishing@email.mckinsey.com> - 11:04 - 27 Apr 2024 -
Stay Ahead of Maintenance Needs: Explore Our Vehicle Health Monitoring Solution
Optimize Fleet Operations and Reduce Maintenance Costs.
Impact of not having and having an ADAS on your Fleet Safety
Uffizio Technologies Pvt. Ltd., 4th Floor, Metropolis, Opp. S.T Workshop, Valsad, Gujarat, 396001, India
by "Sunny Thakur" <sunny.thakur@uffizio.com> - 08:00 - 26 Apr 2024 -
How can CFOs be at their best?
On Point
6 ways to raise your game
Brought to you by Liz Hilton Segel, chief client officer and managing partner, global industry practices, & Homayoun Hatami, managing partner, global client capabilities
The CFO conundrum. CFOs know it’s their most important mission to create outsize value for their companies. That means establishing themselves as the CEO’s thought partner along with managing the finance function. But between dealing with challenges such as the COVID-19 pandemic and fulfilling other demands, including cash management and scenario planning, many finance leaders wonder how they can find the time to perform their daily duties and also be a thoughtful, tactical leader, McKinsey partner Matthew Maloney and coauthors share.
—Edited by Belinda Yu, editor, Atlanta
You received this newsletter because you subscribed to the Only McKinsey newsletter, formerly called On Point.
by "Only McKinsey" <publishing@email.mckinsey.com> - 01:48 - 26 Apr 2024 -
You’re invited! Join us for a McKinsey Live webinar on productivity through tech investment
New from McKinsey & Company
Productivity is the foundation of prosperity. We need productivity growth now, more than ever. Today, there is more at stake: new seemingly limitless opportunities from technologies like generative AI, but also the need to address rising inflation, fund the energy transition, and raise living standards as the population ages.
On Monday, May 20, at 10:30 a.m. EDT / 4:30 p.m. CET, join us for a McKinsey Live session where we explore the most important features of productivity growth across countries and sectors, why it slowed, and the critical role of investment in accelerating it.
Olivia White, a senior partner and director of the McKinsey Global Institute, will share insights from MGI’s latest research. She will be joined by Rodney Zemmel, a senior partner and global leader of McKinsey Digital, who will discuss how leaders can rewire their organizations to make the most of technology investments and contribute to a new wave of productivity growth.
You received this email because you subscribed to our McKinsey Global Institute alert list.
by "McKinsey Global Institute" <publishing@email.mckinsey.com> - 12:19 - 25 Apr 2024 -
What Happens When a SQL is Executed?
Latest articles
If you’re not a subscriber, here’s what you missed this month.
To receive all the full articles and support ByteByteGo, consider subscribing:
Have you ever wondered how a simple SQL command unlocks database power? SQL, or Structured Query Language, is the backbone of modern data management. It allows efficient retrieval, manipulation, and management of information in relational databases.
Behind every query we run, there’s a complex sequence of processes. These transform our commands into actions performed by the database management system (DBMS). Mastering these processes lets us harness their full potential.
As developers, it’s crucial to understand the journey of a SQL statement. This journey takes us through the SQL parser, query optimizer, execution engine, and underlying storage engine. With this insight, we can:
Enhance query performance by understanding how data is stored, indexed, and accessed.
Choose effective indexing strategies, informed by our understanding of the DBMS architecture.
Improve resource management, from memory allocation to caching to query execution parallelism.
Diagnose and address performance bottlenecks effectively by identifying potential areas of contention, resource constraints, or inefficiencies within the system.
Join us as we explore MySQL. We will use it to demonstrate database architectures and how queries are processed.
SQL Standards
SQL standards are developed and maintained by international standards organizations, such as the International Organization for Standardization (ISO) and the American National Standards Institute (ANSI). These standards are shaped with contributions from industry experts and database vendors. Their goal is to ensure interoperability, portability, and consistency across various SQL implementations. This enables developers to write SQL code that can run on multiple database platforms.
The diagram below presents a brief history of SQL standards.
There are 4 stages in SQL standard development. It began with the early versions (SQL-86, SQL-89, and SQL-92), which formed the foundation and introduced major keywords. SQL:1999 is often recognized as a pivotal point in modern SQL development. However, later standards have continued to introduce valuable features and improvements. These address ongoing challenges in data management and adapt to technological advancements.
Each SQL standard builds on the work of its predecessors, contributing to the ongoing evolution and refinement of the language.
SQL Statement
A SQL statement is a command interpreted and executed by the SQL engine. SQL is primarily considered a declarative language: a statement specifies what data should be retrieved or manipulated rather than how to retrieve it.
SQL abstracts the implementation details of data retrieval and manipulation. Developers use SQL's high-level syntax to express their data requirements without worrying about low-level details such as disk I/O operations, data access paths, or index usage.
The portability of SQL across different database platforms is one of its key advantages. SQL queries written for one system often require minimal or no modifications to run on another, as long as they adhere to the SQL standards.
Consider the following example of SQL syntax. The SELECT statement retrieves specific data from one or more tables. It’s paired with the FROM clause, which specifies the tables involved in the query.
To incorporate data from multiple tables, the JOIN clause establishes relationships between them based on shared keys or conditions. The WHERE clause allows for the filtering of rows based on specified conditions, further refining the result set.
For aggregating data based on common attributes, the GROUP BY clause is employed. This facilitates summary statistics or grouping operations. Finally, the ORDER BY clause sorts the query results in a specified order based on columns.
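As a small illustration of these clauses working together, the sketch below runs one query that uses SELECT, FROM, JOIN, WHERE, GROUP BY, and ORDER BY. It uses Python's built-in sqlite3 module with an in-memory SQLite database rather than MySQL, purely so the example is self-contained. The `customers` and `orders` tables and their rows are made up for this example.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount INTEGER NOT NULL
);
INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Linus');
INSERT INTO orders (id, customer_id, amount)
VALUES (1, 1, 50), (2, 1, 5), (3, 2, 30), (4, 2, 40);
"""

QUERY = """
SELECT c.name, SUM(o.amount) AS total      -- columns to return
FROM customers AS c                        -- base table
JOIN orders AS o ON o.customer_id = c.id   -- relate the two tables by key
WHERE o.amount > 10                        -- filter rows before grouping
GROUP BY c.name                            -- one result row per customer
ORDER BY total DESC                        -- largest total first
"""


def run_example():
    """Build the hypothetical tables in memory and return the query result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA_AND_DATA)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, total in run_example():
        print(name, total)
```

The query only states the desired result. Whether the engine scans `orders` first, uses an index, or sorts in memory is left to the database, which is the declarative property described above.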
SQL Execution (Query)
Let’s explore the lifecycle of a SQL statement in a relational database, using MySQL as an example.
Broadly speaking, MySQL is structured into two main tiers: the Server tier and the Storage Engine tier.
Server Tier: This tier includes connectors, query caches (the query cache was removed in MySQL 8.0), analyzers, optimizers, and executors. It covers most of the core service functions of MySQL, including all built-in functions such as date, time, mathematical, and cryptographic functions. All cross-storage-engine functionalities, like stored procedures, triggers, and views, are implemented at this layer.
Storage Engine Tier: This layer is responsible for data storage and retrieval. Its pluggable architecture allows for the use of various storage engines, such as InnoDB, MyISAM, and Memory. InnoDB is the most popular and has been the default storage engine since MySQL version 5.5.5.
When executing a CREATE TABLE statement, MySQL defaults to using InnoDB. However, users can specify another engine with an option such as ENGINE=MEMORY for the in-memory engine, or any other supported engine, according to their needs. Each storage engine offers different methods of accessing table data and supports different features.
The diagram below illustrates the high-level architecture of MySQL. Let's walk through it step by step.
by "ByteByteGo" <bytebytego@substack.com> - 11:36 - 25 Apr 2024 -
Better health outcomes for women could bring immense economic benefits
On Point
4 areas to focus on
Brought to you by Liz Hilton Segel, chief client officer and managing partner, global industry practices, & Homayoun Hatami, managing partner, global client capabilities
Longer, not healthier. Contrary to popular belief, women’s health includes more than just reproductive health; it includes any condition that affects women “uniquely, differently, or disproportionately,” McKinsey Global Institute director Kweilin Ellingrud and McKinsey Health Institute affiliated leader Lucy Pérez share in an episode of The McKinsey Podcast. While women have a longer life expectancy than men, on average they spend 25% more of their lives in poor health than men do. The health gap is most prevalent during women’s prime working years, resulting in significant economic repercussions.
—Edited by Jana Zabkova, senior editor, New York
by "Only McKinsey" <publishing@email.mckinsey.com> - 01:13 - 25 Apr 2024 -
How should European companies catch up on cloud efforts?
On Point
5 strategic cloud priorities
Brought to you by Liz Hilton Segel, chief client officer and managing partner, global industry practices, & Homayoun Hatami, managing partner, global client capabilities
Untapped potential. A recent McKinsey survey found that 95% of European companies are capturing value from the cloud. By and large, however, European enterprises—like their counterparts in Asia and in North America—are struggling to attain the full value of cloud technologies. Many companies use cloud tools to improve IT, which tends to generate lower value. Enhancing business operations, however, could create significantly more value, McKinsey partner Bernhard Mühlreiter and coauthors reveal.
A critical opportunity. It is critical for European companies to accelerate their cloud ambitions. McKinsey estimates that up to $3 trillion is up for grabs for Forbes Global 2000 companies that go beyond cloud adoption and embrace innovation. The good news is that in Europe, the cloud is seen as pivotal for operational efficiency and overall transformation. In fact, according to our survey, more than 90% of European companies rank their cloud programs a priority. Explore five strategic priorities for European cloud journeys, and visit McKinsey Digital for more on creating value with technology.
—Edited by Belinda Yu, editor, Atlanta
by "Only McKinsey" <publishing@email.mckinsey.com> - 01:13 - 24 Apr 2024 -
Advanced Driver Assistance System - Drive your Fleets Smarter, and Safer
Your Gateway to Safer Roads and Higher Profits System (ADAS)
Impact of not having and having an ADAS on your Fleet Safety
Uffizio Technologies Pvt. Ltd., 4th Floor, Metropolis, Opp. S.T Workshop, Valsad, Gujarat, 396001, India
by "Sunny Thakur" <sunny.thakur@uffizio.com> - 08:00 - 23 Apr 2024 -
Your employees are disengaged. Here’s how to turn it around.
Intersection
Get your briefing
Employee disengagement is costing companies to the tune of $90 million in lost productivity annually. Prioritizing six factors can help recapture some of that potential lost value, say McKinsey senior partner Aaron De Smet and coauthors. To learn more about how to keep your employees engaged, check out the latest edition of the Five Fifty.
Share these insights
Did you enjoy this newsletter? Forward it to colleagues and friends so they can subscribe too. Was this issue forwarded to you? Sign up for it and sample our 40+ other free email subscriptions here.
You received this email because you subscribed to our McKinsey Quarterly Five Fifty alert list.
by "McKinsey Quarterly Five Fifty" <publishing@email.mckinsey.com> - 04:40 - 23 Apr 2024 -
How Uber Built Real-Time Chat to Handle 3 Million Tickets Per Week?
Integrate API users 50% faster (Sponsored)
Creating a frictionless API experience for your partners and customers no longer requires an army of engineers. Speakeasy’s platform makes crafting type-safe, idiomatic SDKs for enterprise APIs easy. That means you can unlock API revenue while keeping your team focused on what matters most: shipping new products. Make SDK generation part of your API’s CI/CD and distribute libraries that users love at a fraction of the cost of maintaining them in-house.
Uber has a diverse customer base consisting of riders, drivers, eaters, couriers, and merchants.
Each user persona has different support requirements when reaching out to Uber’s customer agents through various live and non-live channels. The live channels are chat and phone, while the non-live channel is Uber’s in-app messaging.
For the users, the timely resolution of issues takes center stage. However, Uber’s main concern revolves around customer satisfaction and the cost of resolution of tickets. To keep costs in control, Uber needs to maintain a low CPC (cost-per-contact) with a good customer satisfaction rating.
Based on their analysis, they found that the live chat channel offers the most value when compared to other channels. It allows for:
Higher automation rate
Higher staffing efficiency since agents can handle multiple chats at the same time
High FCR (first contact resolution)
However, from 2019 to 2024, only 1% of all support interactions (also known as contacts) were served via the live chat channel because the chat infrastructure at Uber wasn’t capable of meeting the demand.
In this post, we look at how Uber built their real-time chat channel to work at the required scale.
The Legacy Chat Architecture
The legacy architecture for live chat at Uber was built using the WAMP protocol. WAMP or Web Application Messaging Protocol is a WebSocket subprotocol that is used to exchange messages between application components.
It was primarily used for message passing and PubSub over WebSockets to relay contact information to the agent’s machine.
The diagram below shows the high-level flow of a chat contact, from creation to being routed to an agent on the front end.
This architecture had some core issues as follows:
1 - Reliability
Once the traffic scaled to 1.5X, the system started to face reliability issues.
Almost 46% of the events from the backend were not getting delivered to the Agent’s browser, adding to the customer’s wait time to speak to an agent. It also created delays for the agent resulting in wastage of bandwidth.
2 - Scale
Once the load exceeded 10 requests per second, the system’s performance deteriorated due to high memory usage and file descriptor leaks.
Also, it wasn’t possible to horizontally scale the system due to limitations with older versions of the WAMP library.
3 - Observability and Debugging
There were major issues related to observability and debugging:
It wasn’t possible to track the health of the chat contacts. This made it difficult to understand whether chat contacts were missing SLAs due to engineering or staffing issues.
Almost 8% of the chat volume wasn’t getting routed to any customer agent.
The WAMP protocol and its libraries were deprecated and didn’t provide sufficient insight into their inner workings. This made debugging difficult.
4 - Stateful
The services in the architecture were stateful resulting in maintenance and restart complications.
This caused frequent spikes in message delivery time and losses.
Goals of the New Chat Architecture
Due to these challenges, Uber decided to build a new real-time chat infrastructure with the following goals:
Scale up the chat traffic from 1% to 80% of the overall contact volume by the end of 2023. This came to around 3 million tickets per week.
The process of connecting a customer to an agent after identification should have a greater than 99.5% success rate on the first attempt.
Build end-to-end observability and debuggability over the entire chat flow.
Build stateless services that can be scaled horizontally.
The New Live Chat Architecture
It was important for the new architecture to be simple to improve transparency and scalability.
The team at Uber decided to go with the Push Pipeline. It is a simple WebSocket server that agent machines connect to, sending and receiving messages through one generic socket channel.
The diagram below shows the new architecture.
Below are the details of the various components:
Front End UI
This is used by the agents to interact with the customers.
Widgets and different actions are made available to the agents to take appropriate actions for the customer.
Contact Reservation
The router is the service that finds the most appropriate match between the agent and contact depending on the contact details.
An agent is selected based on the concurrency settings in the agent’s profile, such as the number of chats the agent can handle simultaneously. Other considerations include:
SLA-based routing: Chat contacts are prioritized based on SLA.
Sticky routing: Reopened contacts are sent to the same agents that handled the ticket earlier.
Priority routing: Prioritizing based on different rules
On finding the match, the contact is pushed into a reserved state for the agent.
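The routing rules above can be sketched roughly as follows. This is not Uber's router: the `Agent` and `Contact` classes, their field names, and the tie-breaking rule (pick the least loaded agent) are assumptions made for illustration. The sketch handles contacts in SLA order, tries sticky routing first, and respects each agent's concurrency limit.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    agent_id: str
    max_concurrent_chats: int          # concurrency limit from the agent profile
    online: bool = True
    reserved: list = field(default_factory=list)

    def has_capacity(self):
        return self.online and len(self.reserved) < self.max_concurrent_chats


@dataclass
class Contact:
    contact_id: str
    sla_deadline: float                # hypothetical: seconds left, smaller is more urgent
    previous_agent_id: Optional[str] = None   # set for reopened contacts


def reserve_contacts(contacts, agents):
    """Return {contact_id: agent_id}; contacts with no free agent stay unassigned."""
    by_id = {agent.agent_id: agent for agent in agents}
    assignments = {}
    # SLA-based routing: the most urgent contact is handled first.
    for contact in sorted(contacts, key=lambda c: c.sla_deadline):
        # Sticky routing: prefer the agent who handled the contact before.
        previous = by_id.get(contact.previous_agent_id)
        if previous is not None and previous.has_capacity():
            chosen = previous
        else:
            candidates = [a for a in agents if a.has_capacity()]
            if not candidates:
                continue  # nobody is free; the contact stays queued
            # Assumed tie-breaker: the agent with the fewest reserved chats.
            chosen = min(candidates, key=lambda a: len(a.reserved))
        chosen.reserved.append(contact.contact_id)   # contact enters "reserved" state
        assignments[contact.contact_id] = chosen.agent_id
    return assignments
```

A real router would also apply the priority rules mentioned above and persist the reserved state, both of which are left out here to keep the sketch short.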
Push Pipeline
When the contact is reserved for an agent, the information is published to Kafka and is received by the GQL Subscription Service.
On receiving the information through the socket via GraphQL subscriptions, the Front End loads the contact for the agent along with all the necessary widgets and actions.
Agent State
When agents start working, they go online via a toggle on the Front End.
This updates the Agent State service, allowing the agent to be mapped to a contact.
GQL Subscription Service
The front-end team was already using GraphQL for HTTP calls to the services. Due to this familiarity, the team selected GraphQL subscriptions for pushing data from the server to the client.
The diagram below shows how GraphQL subscriptions work at a high level.
In GraphQL subscriptions, the client sends messages to the server via subscription requests. The server matches the queries and sends back messages to the client machines. In this case, the client machines are the agent machines.
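The subscribe-then-push pattern described above can be illustrated with a small in-memory stand-in. This sketch does not use GraphQL, WebSockets, or Kafka. The `SubscriptionHub` class and the event fields (`agent_id`, `contact_id`) are hypothetical and only show the shape of the flow: a client registers interest once, and the server pushes matching events to it afterwards.

```python
from collections import defaultdict


class SubscriptionHub:
    """In-memory stand-in for a subscription service (illustrative only)."""

    def __init__(self):
        # agent_id -> callbacks registered by that agent's client
        self._subscribers = defaultdict(list)

    def subscribe(self, agent_id, on_event):
        """Register a callback and return a function that cancels it."""
        self._subscribers[agent_id].append(on_event)

        def unsubscribe():
            self._subscribers[agent_id].remove(on_event)

        return unsubscribe

    def publish(self, event):
        """Push an event to every subscriber of event['agent_id'].

        Returns the number of subscribers the event was delivered to.
        """
        callbacks = list(self._subscribers.get(event["agent_id"], []))
        for callback in callbacks:
            callback(event)
        return len(callbacks)
```

In the architecture described here, the publish side would be fed by the reservation events arriving from Kafka, and each callback would correspond to an agent's open socket.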
Uber’s engineering team used GraphQL over WebSockets by leveraging the graphql-ws library. The library had almost 2.3 million weekly downloads and was also recommended by Apollo.
To improve the availability, they used a few techniques:
Enabled bidirectional ping pong messages to prevent hung sockets and disconnect them automatically in case of an unreliable connection.
Backed-off reconnects are automatically attempted after any disconnects. On successful reconnection, all the reserved or assigned contacts are fetched so the agent can receive them.
For a reserved contact, if the front end doesn’t send an acknowledgment to the chat service, the contact is reserved for another available agent. The health of the WebSocket and HTTP connections is checked by sending a heartbeat over the GraphQL subscriptions.
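A backed-off reconnect of the kind listed above might look like the sketch below. The delay values, the `connect` and `fetch_pending` callables, and the function names are assumptions for illustration. The article does not state which backoff schedule Uber uses, and graphql-ws has its own retry options that are not modeled here.

```python
import random
import time


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=6, jitter=0.0, rng=random):
    """Yield capped exponential delays in seconds.

    With jitter > 0, each delay is scaled by a random factor in
    [1 - jitter, 1 + jitter] so that many clients do not retry in lockstep.
    """
    for attempt in range(attempts):
        delay = min(cap, base * factor ** attempt)
        if jitter:
            delay *= 1 + rng.uniform(-jitter, jitter)
        yield delay


def reconnect(connect, fetch_pending, sleep=time.sleep, **backoff):
    """Retry connect() with backoff, then re-fetch reserved/assigned contacts.

    connect and fetch_pending are hypothetical callables supplied by the caller.
    """
    for delay in backoff_delays(**backoff):
        try:
            connection = connect()
        except ConnectionError:
            sleep(delay)  # wait longer after each failed attempt
            continue
        # On success, pull contacts reserved while the socket was down.
        return connection, fetch_pending(connection)
    raise ConnectionError("gave up after repeated reconnect attempts")
```

Passing `sleep` as a parameter keeps the retry loop testable without real waiting, and fetching pending contacts right after a successful reconnect mirrors the behavior described in the list above.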
Test Results from the New Chat Architecture
Uber performed functional and non-functional tests to ensure that both customers and agents received the best experience.
Some results from the tests were as follows:
Each machine was able to establish 10K socket connections. They were able to horizontally scale and test for 20 times the number of events supported by the previous architecture.
Shadow traffic flows were tested with a capacity of 40K contacts and 2000 agents daily. The process didn’t reveal any problems and the latency and availability were satisfactory.
Existing traffic was directed through the new system with the old user interface for agents to serve as a reliability test. They were able to successfully manage the traffic while maintaining latency within the defined SLAs.
The earlier system had bugs where agents who finished work for the day remained online in the system if they closed their tabs. This resulted in increased customer wait times since the pipeline tried to push events to these offline agents. With the new system, they started to automatically log out agents based on recent acknowledgment misses.
The chat channel has been scaled to around 36% of the overall Uber Contact volume that is routed to agents. Over the next few months, the plan is to increase the share further.
The error rate of delivering the contact has gone down from 46% to just 0.45% in the new architecture.
The new architecture has fewer services and protocols, and better observability, thereby improving the overall simplicity of the setup.
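The automatic logout based on acknowledgment misses, mentioned in the results above, could be tracked along these lines. The `AckTracker` class and the threshold of three consecutive misses are assumptions for illustration. The article does not say what threshold or data structure Uber uses.

```python
class AckTracker:
    """Log an agent out after too many consecutive missed acknowledgments."""

    def __init__(self, max_misses=3):   # hypothetical threshold
        self.max_misses = max_misses
        self.online = set()
        self._misses = {}

    def go_online(self, agent_id):
        self.online.add(agent_id)
        self._misses[agent_id] = 0

    def record_ack(self, agent_id):
        # Any acknowledgment proves the agent's tab is still alive.
        if agent_id in self.online:
            self._misses[agent_id] = 0

    def record_miss(self, agent_id):
        """Count a missed ack; return True if the agent was logged out."""
        if agent_id not in self.online:
            return False
        self._misses[agent_id] += 1
        if self._misses[agent_id] >= self.max_misses:
            self.online.discard(agent_id)   # stop pushing contacts to this agent
            return True
        return False
```

Resetting the counter on every acknowledgment means only consecutive misses count, so an agent with an occasional dropped message is not logged out.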
by "ByteByteGo" <bytebytego@substack.com> - 11:36 - 23 Apr 2024 -
Re: Audit Report Overview !
Please let me know, if you are interested in our services or not. A prompt reply will be highly appreciated.
Please provide some response even if you are not interested.
Thanks
From: Rose Williams
Sent: Monday, April 08, 2024 9:07 AM
To: info@learn.odoo.com
Subject: WEB Accessibility report !
I have found your website via Google.
I was going through your website www.learn.odoo.com & I personally see a lot of potential in your website & business.
With your permission, I would like to send you an Audit report of your website with Prices showing you a few things to greatly improve these search results for you.
These things are not difficult, and my report will be very specific. It will show you exactly what needs to be done to move you up in the rankings dramatically.
We can place your website on Google's 1st page.
May I send you a “Quote & Report?” If you are interested.
Thanks & Regards,
Rose Williams
Marketing Manager
by "Rose Williams" <rosewilliams091@outlook.com> - 08:51 - 23 Apr 2024