- Mailing Lists
- in
- A Brief History of Airbnb’s Architecture
Archives
- By thread 3808
-
By date
- June 2021 10
- July 2021 6
- August 2021 20
- September 2021 21
- October 2021 48
- November 2021 40
- December 2021 23
- January 2022 46
- February 2022 80
- March 2022 109
- April 2022 100
- May 2022 97
- June 2022 105
- July 2022 82
- August 2022 95
- September 2022 103
- October 2022 117
- November 2022 115
- December 2022 102
- January 2023 88
- February 2023 90
- March 2023 116
- April 2023 97
- May 2023 159
- June 2023 145
- July 2023 120
- August 2023 90
- September 2023 102
- October 2023 106
- November 2023 100
- December 2023 74
- January 2024 75
- February 2024 75
- March 2024 78
- April 2024 74
- May 2024 108
- June 2024 98
- July 2024 116
- August 2024 134
- September 2024 130
- October 2024 141
- November 2024 171
- December 2024 64
Join me on Thursday for Level up your Kubernetes Observability with New Relic
How could cities improve health for all?
A Brief History of Airbnb’s Architecture
A Brief History of Airbnb’s Architecture
Boost your AI projects with Brave (Sponsored)You probably know Brave as the privacy browser. But did you also know it has a search API that can power your AI projects? Introducing, the Brave Search API.
Ready to tinker? Try it for free for up to 2,000 queries per month. Brian Chesky and Joe Gabbia moved to San Francisco in 2007. While they were looking to raise money for their business ideas, they needed to pay their rent. Incidentally, there was a design conference coming to town at the time which meant lots of designers would be looking for accommodation. They came up with the idea of placing an air mattress in their living room and turning it into a bed and breakfast. In 2008, Nathan Blecharczyk joined Brian and Joe as CTO and Co-founder and they started a venture known as AirBed and Breakfast. This was the birth of Airbnb. Fast forward to today and Airbnb is operating in 200+ countries and 4 million hosts have welcomed over 1.5 billion guests across the globe. As Airbnb grew by leaps and bounds, their software architecture has also evolved to keep pace with the changing times. In this post, we will look at the evolution of Airbnb’s architecture over the years, the lessons they learned along the way and the tools they developed to support the vision. The Initial VersionAs with most startups, the first version of Airbnb’s web application was a monolithic application. It was built using Ruby-on-Rails and was internally known as the monorail. Monolith in Airbnb’s view was a single tier unit that was responsible for both client and server-side functionality. What does this mean in practice? It means that the model, view and controller layers are combined together in a single repository. The below diagram shows this approach. There were several advantages to this monolithic approach:
The problems started as Airbnb’s engineering team grew rapidly. Airbnb was doubling year over year which meant more and more developers were adding new code as well as changing existing code within the monolithic application. Over time, the code base started to get more tightly coupled and ownership of data became unclear. For example, it was difficult to determine which tables were owned by which application functionality. Any developer could make changes to any part of the application and it became difficult to track and coordinate the changes. This situation led to multiple issues such as:
To alleviate these pain points, Airbnb embarked on a migration journey to move from monolithic to a Service-Oriented Architecture or SOA. SOA at AirbnbHow did Airbnb view Service-Oriented Architecture or SOA? For them, SOA is a network of loosely coupled services where clients make their requests to some sort of gateway and the gateway routes these requests to multiple services and databases. Adopting SOA allowed Airbnb to build and deploy services separately. Also, these services can be scaled independently and ownership becomes more clearly defined. However, building services is one thing. It’s also extremely important to design these services with a disciplined approach. Airbnb decided on some key principles to design these services with a disciplined approach:
In Airbnb’s view, these principles are extremely important and they help create a logical path that all engineers can follow in order to build a shared understanding of the service architecture. Airbnb’s Migration JourneyWith the above principles and goals, Airbnb began the migration journey from the monorail to a brand-new service-oriented approach. It was a long migration process that went through multiple iterations along the way. Version 1In this version, every request went through the monorail. The monorail was basically responsible for the presentation view, business logic and also, the data access. You can consider it as the initial state of Airbnb’s architecture. Version 2The next version was a hybrid where the monorail coexisted with the service-oriented architecture. The main difference was that the monorail only handled the routing and view layer. Its job was to send incoming API traffic to the network of new services that were responsible for the business logic, data model and access. What kind of services are we talking about over here? Airbnb classified their services into four different types as shown in the below diagram. Here are the details of the various types of services:
With these service definitions, Airbnb started by building the data service layer. For example, they started with the home data service that acted as the foundational layer of the Airbnb business. The current monorail setup accessed data from the table using the Active Record data access library in Rails. They intercepted the incoming requests at the Active Record level and instead of routing to the database, they sent these requests to the new homes data service. The homes data service was then responsible for routing to the data store. The below diagram shows this approach. After creating the core data services, Airbnb also migrated the core business logic as well to the SOA approach. For example, migrating services like the pricing derived data service that needed some information about a home from the homes data service as well as other stores such as offline price and trends. The next step involved migrating presentation services such as the checkout presentation service that depends on pricing information and homes information from the derived data service and the core data service. All of these changes were part of Version 2 where both the monorail and the new services co-existed within the same request cycle. Version 3In this version of the migration journey, the monorail was completely eliminated. The client makes a request to the API gateway, which acts as a service layer responsible for middleware and routing. The gateway populates the request context and routes the requests to the SOA network where the various services are responsible for the presentation logic and data access logic. The web client is handled a little differently. There’s a specific service to handle the web requests. Why is it needed? This service returns the HTML to the web client by calling the API Gateway and populating the response received in the required format. The API Gateway takes care of all the middleware functions and propagates the request through the SOA network. The below diagram tries to show this scenario: Migrating Reads and WritesBy now, you may have realized that going from a monolithic architecture to a service-oriented one is not an overnight process. For Airbnb, a lot of time was spent in the middle phase where both the monorail and the new services had to be supported as first-class citizens. A request could go through the monorail or through the services. This means that it was a critical requirement to ensure that functionality didn’t break apart for both the routes and the responses were equal. To support this, Airbnb built comparison frameworks for reads as well as writes. ReadsThe first use of these comparison frameworks was in read operations because reads are idempotent. You can issue multiple read requests and get the same response. The idea was to issue dual reads and compare the response from read path A that went through the monorail with the response from read path B that went through the new services. The captured responses were then emitted as standard events that were consumed and sent to an offline comparison framework. The comparison framework was placed behind an admin tool so that the traffic can be controlled without the need for code changes and deployments. Once the responses are adjusted, Airbnb engineers could slowly ramp up traffic through the service path and monitor the comparison for differences. Once the comparison looks clean, all read requests are moved to the new service. WritesFor writes, things had to be done differently because one cannot dual write to the same database. Instead, a shadow database was utilized. Let’s say the monorail is making a call to a presentation service that’s hitting the production database. This is write path A. Now, a middle-tier service is introduced to offload some validations from the presentation service. Initially, this middle-tier service will write to a shadow database instead of the main production database. At this point, it becomes easy to issue strongly consistent read requests to both the production and shadow database and compare the results. Once the comparison is clean, we can move the writes via the new service and to the production database. Pros and Cons of SOAAs Airbnb went through the migration journey from the monorail to a SOA-based architecture, some pros and cons started to become obvious. Some pros were as follows:
However, there were some cons as well:
Tools and Techniques to Support SOAAs we saw from the previous section, the migration to SOA unlocked multiple challenges for the Airbnb engineering team. For example, single requests now fan out to multiple services and increase the chances of failure. Also, separating the data model in multiple databases is good for service-level consistency but it makes transactionality more difficult to enforce. Service orchestration also became more complex over time. With hundreds of engineers building services, Airbnb needed many more EC2 instances. Ultimately, this warranted a move towards using Kubernetes. To make it easy for the engineering team to build services, the infrastructure team at Airbnb created a lot of building blocks along the journey. API FrameworkAirbnb created an in-house API framework built using the Thrift language. This framework is used by all Airbnb services to define clean APIs that can talk to each other. For example, let’s say Service A wants to talk to Service B. The Service B engineer only has to define the endpoint in simple Thrift language and the framework will auto-generate the endpoint logic to handle common stuff such as schema validations, observability metrics and so on. Also, it creates a multi-threaded RPC client that Service A can use to talk to Service B. The client handles various functionalities such as retry logic, error propagation and transport. What’s the advantage of this? Engineers can focus on handling the core business logic and not spend any time worrying about the details of inter-service communication. To boost developer productivity, the Airbnb infra team also developed the API Explorer where engineers can browse different services, figure out which endpoints to call and even use an API playground to figure out how to call those endpoints. Automated Canary Analysis with SpinnakerAirbnb also leverages Spinnaker which is an open-source continuous delivery platform. Spinnaker is used for application management and deployment across various cloud platforms. It supports all major platforms such as AWS, Azure, Kubernetes and so on, making it extremely easy to spin up new deployment environments. You can create pipelines in Spinnaker representing a particular delivery process that starts with the creation of a build artifact and goes all the way to deploying the artifact in an environment. With Spinnaker, Airbnb was able to easily set up the environments for performing automated canary analysis. Basically, they deploy both the old and new snapshots to two temporary environments followed by routing a small percentage of traffic to both of them. Based on the traffic analysis and error rates, an aggregate score is generated for the canary environment that helps decide whether to fail or promote the canary to the next stage in the deployment process. PowergridAirbnb also built an in-house library called Powergrid that makes running tasks in parallel easy. With Powergrid, they were able to organize the code execution as a Directed Acyclic Graph (DAG). Each node of this DAG is a function or a task. Using this, the Airbnb engineers can model each service endpoint as a data flow with requests as the input and response as the output. Since Powergrid supports multithreading and concurrency, it can be used to run tasks in parallel. The below diagram shows the concept of Powergrid. For example, consider that the host wants to send a special offer to the guest. However, the process has multiple checks and validations to be performed before this can be done. With Powergrid, these validations can be performed by the respective services in parallel. After aggregating the responses, the special offer can be sent to the guest. Simplifying Service DependenciesOnce Airbnb started down the path of SOA, there was no turning back. However, initially there was a lack of service governance and dependency management that led to a complicated service interaction graph. It’s always a risky situation in a service-oriented approach where the call graph becomes extremely complex.
To handle this situation, Airbnb decided to simplify the service dependencies using the concept of Service Blocks. Basically, you can think of each block as a collection of services related to a particular business functionality. For example, the Listing Block encapsulates both the data and business logic related to the core listing attributes. Similarly, you can have other blocks such as the User Block and Availability Block. The Block can then expose a nice and clean facade with consistent read and write endpoints to upstream clients. Under the hood, the facade orchestrates the coordination between data and business logic services as needed. Also, strict checks are implemented to prevent direct calls to any internal services within the block. This approach greatly reduces the complexity of a service-oriented call graph. ConclusionTo conclude things, Airbnb had several important lessons during the whole journey of migration from the monorail to SOA. Here are a few important ones to takeaway:
References:
© 2024 ByteByteGo |
by "ByteByteGo" <bytebytego@substack.com> - 11:39 - 20 Feb 2024