Facebook’s Database Handling Billions of Messages (Cassandra Deep Dive)
Disclaimer: The details in this post have been derived from the Cassandra research paper and other sources. All credit for the technical details goes to the Facebook engineering team. The links to the original articles are present in the references section at the end of the post. We’ve attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

Cassandra is a distributed database system designed to store and manage massive amounts of data across many computers. Facebook originally developed it to support a feature called Inbox Search, which allows users to quickly search through their messages. The goal was to support the billions of messages sent by Facebook users every day.

Storing and efficiently searching through that much data is a big challenge. Traditional databases, like MySQL, struggled to handle this workload because they were not designed to scale out easily across many machines.

To solve this, Facebook engineers took inspiration from two existing technologies:
- Amazon’s Dynamo, for its decentralized, peer-to-peer approach to partitioning and replication
- Google’s Bigtable, for its column-family data model and log-structured storage
By combining the best parts of these two systems, Facebook created Cassandra, a decentralized, highly scalable, and fault-tolerant database. It was later released as open-source software, allowing companies like Netflix, Twitter, and Apple to use and improve it.

In this article, we’ll take a deep dive into Cassandra and understand what makes it special.

The Key Features of Cassandra

Some key features of Cassandra are as follows:
Cassandra’s Data Model

Cassandra’s data model is quite different from that of traditional relational databases like MySQL. At its core, it is a multi-dimensional map (or dictionary) in which each piece of data is indexed by a row key. Instead of rigidly defining tables and columns in advance, an application can store data in whatever shape suits its needs.

The data is organized into column families that are of two types:
- Simple column families, which map column names directly to values
- Super column families, which group columns under a named super column, adding one more level of nesting
Columns can be sorted by timestamp or by name, depending on the application’s needs. Primary key lookup is the main way to retrieve data: instead of running complex queries as in SQL databases, Cassandra retrieves data by directly accessing the row key.

The structure of a column consists of the following parts:
- Name: identifies the column within its row
- Value: the data stored in the column
- Timestamp: records when the value was written
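To make the map-of-maps idea concrete, here is a minimal Python sketch of a single column family held in memory. The class names `Column` and `ColumnFamily` are hypothetical and chosen only for illustration; the three methods mirror the `insert`, `get`, and `delete` operations the article covers in the API overview. This is a toy model under those assumptions, not Cassandra’s actual storage code.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    """One cell: a name, a value, and the timestamp of the write."""
    name: str
    value: bytes
    timestamp: float


@dataclass
class ColumnFamily:
    """Hypothetical in-memory stand-in for a simple column family."""
    # row key -> {column name -> Column}
    rows: dict = field(default_factory=dict)

    def insert(self, key, row_mutation):
        """Apply a batch of column writes to one row; the newest timestamp wins."""
        row = self.rows.setdefault(key, {})
        for col in row_mutation:
            current = row.get(col.name)
            if current is None or col.timestamp >= current.timestamp:
                row[col.name] = col

    def get(self, key, column_name):
        """Look up one column of one row by key; returns None if absent."""
        col = self.rows.get(key, {}).get(column_name)
        return None if col is None else col.value

    def delete(self, key, column_name=None):
        """Drop one column, or the whole row when no column is named.

        Simplification: a real distributed store cannot just forget data
        like this; the sketch only shows the shape of the call.
        """
        if column_name is None:
            self.rows.pop(key, None)
        else:
            self.rows.get(key, {}).pop(column_name, None)

    def columns_by_name(self, key):
        """Return a row's columns sorted by column name."""
        return sorted(self.rows.get(key, {}).values(), key=lambda c: c.name)


inbox = ColumnFamily()
inbox.insert("user:42", [Column("subject", b"hello", timestamp=1.0)])
print(inbox.get("user:42", "subject"))  # b'hello'
```

Every operation starts from the row key, which is why no query planner or join logic appears anywhere in the sketch.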
Cassandra API Overview

Cassandra follows a key-based lookup approach, meaning every operation revolves around the row key. Unlike relational databases that support complex queries (like JOINs or subqueries), Cassandra prioritizes speed and scalability by keeping its API lightweight. Applications interact with the database through three main operations.

1 - Insert Data

The interface is insert(table, key, rowMutation). This command adds new data to Cassandra. The “table” is where the data will be stored and the “key” uniquely identifies the row. The rowMutation represents the changes made to the row, such as adding new columns or updating existing ones.

2 - Retrieve Data

The interface is get(table, key, columnName). It fetches data from the database. The “table” specifies where to look and the “key” identifies which row to retrieve. The “columnName” specifies which part of the row is needed.

3 - Delete Data

The interface is delete(table, key, columnName). This command removes data from the database. It can delete an entire row or just a specific column within a row.

Cassandra System Architecture

Cassandra is designed as a highly scalable and fault-tolerant distributed database. It does not rely on a single central server but instead follows a peer-to-peer model, where all nodes in the system are equal.

Cassandra organizes its nodes (servers) in a ring structure. Each piece of data is assigned to a node using consistent hashing, which spreads keys across the nodes. When new nodes are added, Cassandra rebalances the data automatically without requiring a complete reorganization. See the diagram below that shows how consistent hashing works.

There is no master node, meaning any node can handle read and write requests. Since all nodes are equal, there is no single point of failure. If a node fails, other nodes in the system can continue handling requests without disruption.
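The ring assignment can be sketched in a few lines of Python. The `Ring` class and `ring_position` helper below are hypothetical names, MD5 is an arbitrary choice of hash, and each node gets a single position on the ring for simplicity. Treat it as an illustration of consistent hashing in general rather than Cassandra’s partitioner.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes=()):
        self._positions = []  # sorted node positions on the ring
        self._owner = {}      # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owner[pos] = node

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the first node."""
        idx = bisect.bisect_left(self._positions, ring_position(key))
        # Wrap around to the first node when the key hashes past the last one.
        return self._owner[self._positions[idx % len(self._positions)]]


ring = Ring(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to the new node")
```

Adding `node-d` only takes over the arc of the ring directly before its position, so every key that changes owner moves to the new node and the rest stay where they were. That is the property behind rebalancing without a full reorganization.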
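Replication Mechanisms

Cassandra copies data across multiple nodes to prevent data loss and improve availability. Developers can choose between different replication strategies:
- Rack Unaware: replicas are placed on the nodes that follow the coordinator on the ring
- Rack Aware: replicas are spread across different racks within a data center
- Datacenter Aware: replicas are spread across different data centers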
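As a small illustration of replica placement, the sketch below follows the “Rack Unaware” policy as the Cassandra paper describes it: with a replication factor of N, the coordinator node for a key keeps one copy and its N-1 successors on the ring keep the rest. The function name and the list-based ring are assumptions made for this example.

```python
def rack_unaware_replicas(ring_order, coordinator, n):
    """Return the coordinator plus its n-1 clockwise successors.

    ring_order: node names listed in their order around the ring.
    coordinator: the node that owns the key's position.
    n: replication factor (total number of copies).
    """
    if n > len(ring_order):
        raise ValueError("replication factor exceeds the number of nodes")
    start = ring_order.index(coordinator)
    return [ring_order[(start + i) % len(ring_order)] for i in range(n)]


print(rack_unaware_replicas(["a", "b", "c", "d"], coordinator="c", n=3))
# ['c', 'd', 'a']
```

The modulo wraps the walk around the end of the list, which is what makes the node order behave like a ring.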
Gossip Protocols in Cassandra

Cassandra uses a gossip protocol to allow nodes (servers) in the system to communicate with each other efficiently. This protocol is inspired by how rumors spread in real life. Instead of requiring a central system to keep track of everything, information is passed from one node to another in small, periodic updates.

Gossip protocols have a low network overhead. Instead of flooding the system with updates, nodes exchange small bits of information at regular intervals. Even if some nodes go offline, the others can still function because they share information across the network.

Cassandra uses Scuttlebutt, a specialized gossip protocol, to keep track of which nodes are active or inactive. Each node periodically exchanges information about itself and other nodes with its neighbors, so the entire cluster stays up to date.

Instead of a simple "up or down" status, Cassandra assigns a suspicion level to each node.
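The suspicion level can be sketched as an accrual failure detector, which is the approach the Cassandra paper describes: the detector emits a value Φ that grows the longer a node stays silent, relative to how often it usually reports in. The class below is a hypothetical, simplified version that assumes heartbeat gaps follow an exponential distribution; the class name, window size, and any threshold an operator might pick are illustrative assumptions.

```python
import math
from collections import deque


class AccrualFailureDetector:
    def __init__(self, window=1000):
        self._intervals = deque(maxlen=window)  # recent gaps between heartbeats
        self._last = None                       # arrival time of the last heartbeat

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now` (in seconds)."""
        if self._last is not None:
            self._intervals.append(now - self._last)
        self._last = now

    def phi(self, now):
        """Suspicion level at time `now`; higher means more likely down."""
        if not self._intervals:
            return 0.0
        mean = sum(self._intervals) / len(self._intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self._last
        # With exponentially distributed gaps, the chance that the next
        # heartbeat is still on its way after `elapsed` is exp(-elapsed / mean).
        # phi = -log10 of that probability = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))


detector = AccrualFailureDetector()
for t in (0.0, 1.0, 2.0, 3.0):
    detector.heartbeat(t)

print(round(detector.phi(4.0), 3))   # one interval late: low suspicion
print(round(detector.phi(13.0), 3))  # ten intervals late: high suspicion
```

Because Φ scales with the observed mean interval, a node on a slow network link earns suspicion more slowly than one that normally reports every few milliseconds.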
In other words, Cassandra’s failure detection is probabilistic: it adapts to network conditions instead of relying on rigid timeout rules. This helps prevent false alarms caused by temporary delays or slow responses.

Query Execution in Cassandra

Cassandra is designed to handle high-speed data writes and efficient reads while ensuring durability and fault tolerance. Unlike traditional relational databases, which typically update data in place on disk, Cassandra follows a log-structured storage model that optimizes speed and reliability.

How Cassandra Handles Writes

Cassandra follows a multi-step process when writing data. The process consists of three main components:
- Commit Log: an append-only log on disk where every write is recorded first, for durability
- Memtable: an in-memory structure that holds recent writes
- SSTable: an immutable file on disk, created when a full Memtable is flushed
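A log-structured write path can be sketched as follows: record the write in an append-only log, apply it to an in-memory table, and flush that table to an immutable sorted snapshot once it grows past a limit. The `WritePath` class is hypothetical, uses Python lists and dicts in place of real files, and clears the whole log on flush, which is a simplification chosen for this example.

```python
class WritePath:
    def __init__(self, memtable_limit=3):
        self.commit_log = []   # stand-in for an append-only file on disk
        self.memtable = {}     # in-memory and mutable
        self.sstables = []     # immutable sorted snapshots, newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record for durability first
        self.memtable[key] = value            # 2. then update the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. spill to disk when full

    def flush(self):
        """Freeze the memtable into a sorted, immutable snapshot."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Simplification: flushed writes no longer need the log for recovery.
        self.commit_log.clear()

    def recover(self):
        """Rebuild the memtable by replaying the log after a crash."""
        self.memtable = dict(self.commit_log)


store = WritePath(memtable_limit=3)
store.write("a", 1)
store.write("b", 2)

store.memtable = {}    # simulate losing memory in a crash
store.recover()
print(store.memtable)  # {'a': 1, 'b': 2}

store.write("c", 3)    # reaches the limit and triggers a flush
print(store.sstables)  # [(('a', 1), ('b', 2), ('c', 3))]
```

Note that nothing already flushed is ever rewritten: new data always goes to the end of the log or into a fresh snapshot, so all disk writes are sequential.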
This write process is efficient because, unlike traditional databases that modify data in place (causing random disk writes), Cassandra writes data sequentially, which is much faster. Since SSTables are never modified, Cassandra avoids the overhead of the complex locking mechanisms found in relational databases. Cassandra can also recover lost data if a node crashes, because every write is first recorded in the Commit Log.

How Cassandra Handles Reads

Unlike traditional databases that rely on complex indexing, Cassandra optimizes read performance using a combination of in-memory lookups and efficient disk scans. Here’s a step-by-step look at the read process:
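A read along these lines can be sketched in Python: check the in-memory table first, then the on-disk tables from newest to oldest, using a Bloom filter to skip tables that cannot contain the key. The `BloomFilter`, `SSTable`, and `read` names, the filter size, and the use of SHA-256 are all assumptions made for this illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))


class SSTable:
    """Immutable table with a Bloom filter built over its keys."""

    def __init__(self, data):
        self.data = dict(data)
        self.bloom = BloomFilter()
        for key in self.data:
            self.bloom.add(key)


def read(key, memtable, sstables):
    """Return the newest value for key; sstables is ordered oldest to newest."""
    if key in memtable:                 # freshest data lives in memory
        return memtable[key]
    for table in reversed(sstables):    # then newest file first
        if table.bloom.might_contain(key) and key in table.data:
            return table.data[key]
    return None


memtable = {"a": "new"}
sstables = [SSTable({"a": "old", "b": "1"}), SSTable({"b": "2"})]
print(read("a", memtable, sstables))  # new
print(read("b", memtable, sstables))  # 2
```

Searching newest to oldest means the first hit is always the most recent version, so older copies of the same key in earlier tables are simply never reached.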
Facebook Inbox Search Use Case

As mentioned, Cassandra was originally developed at Facebook to solve the challenge of storing and searching billions of messages efficiently. Before Cassandra, Facebook used MySQL for storing these messages, but as the platform grew, MySQL struggled to handle the increasing volume of data and the high query load.

To address this, Facebook deployed Cassandra on a 150-node cluster, which stored over 50 terabytes (TB) of messages. The system needed to support fast and scalable searches while handling constant write operations as users sent and received messages.

Facebook’s Inbox Search allows users to find messages using two types of queries:
- Term search: find the messages that contain a given word
- Interactions: given a person’s name, find all the messages exchanged with that person
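One way to picture the per-user index behind such searches is a pair of nested maps keyed by user ID: one from words to message IDs, and one from the other party’s ID to message IDs. The Cassandra paper describes a similar layout using super columns; the `InboxIndex` class below is a hypothetical in-memory sketch of that idea, with naive whitespace tokenization as a further simplification.

```python
from collections import defaultdict


class InboxIndex:
    def __init__(self):
        # user id -> word -> set of message ids
        self.terms = defaultdict(lambda: defaultdict(set))
        # user id -> other party's id -> set of message ids
        self.interactions = defaultdict(lambda: defaultdict(set))

    def index_message(self, msg_id, sender, recipient, text):
        """Index one message under both participants' row keys."""
        for owner, other in ((sender, recipient), (recipient, sender)):
            self.interactions[owner][other].add(msg_id)
            for word in text.lower().split():
                self.terms[owner][word].add(msg_id)

    def term_search(self, user, word):
        """Messages in the user's inbox that contain the word."""
        return sorted(self.terms.get(user, {}).get(word.lower(), ()))

    def interaction_search(self, user, other):
        """Messages the user exchanged with another person."""
        return sorted(self.interactions.get(user, {}).get(other, ()))


index = InboxIndex()
index.index_message(1, "alice", "bob", "Lunch tomorrow")
index.index_message(2, "bob", "alice", "lunch moved")
index.index_message(3, "carol", "alice", "hello")

print(index.term_search("alice", "lunch"))        # [1, 2]
print(index.interaction_search("alice", "carol"))  # [3]
```

Both searches begin with a lookup on the user ID, so each query touches one user’s row rather than scanning the whole message store.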
One of the biggest challenges in Facebook’s messaging system was ensuring low-latency searches across a massive dataset. Cassandra’s highly optimized architecture allowed it to achieve impressive performance:
Conclusion

Cassandra is a highly scalable, distributed database system designed to handle large volumes of data while ensuring fault tolerance and high availability. Its peer-to-peer architecture and ring-based design make it particularly well-suited for applications that require continuous uptime and seamless scaling across multiple data centers.

One of Cassandra’s key strengths is its ability to handle high write throughput efficiently, making it a good fit for real-time applications such as messaging platforms, recommendation systems, and IoT data storage.

However, Cassandra is not a replacement for traditional relational databases. It is not optimized for complex queries, joins, or transactional consistency, which makes it less suitable for applications requiring strong relational integrity.

For businesses and developers building large-scale, distributed systems, Cassandra provides a robust, flexible, and highly available solution that can grow with demand while maintaining performance and reliability.

References:
by "ByteByteGo" <bytebytego@substack.com> - 11:43 - 11 Mar 2025