EP135: Big Data Pipeline Cheatsheet for AWS, Azure, and Google Cloud



This week’s system design interview:

  • Big Data Pipeline Cheatsheet for AWS, Azure, and Google Cloud

  • A Cheatsheet on Comparing API Architectural Styles

  • 10 Key Data Structures We Use Every Day

  • A Cheatsheet to Build Secure APIs

  • SPONSOR US


Big Data Pipeline Cheatsheet for AWS, Azure, and Google Cloud

Each platform offers a comprehensive suite of services that cover the entire big data pipeline lifecycle:

  1. Ingestion: Collecting data from various sources

  2. Data Lake: Storing raw data

  3. Computation: Processing and analyzing data

  4. Data Warehouse: Storing structured data

  5. Presentation: Visualizing and reporting insights

AWS uses services like Kinesis for data streaming, S3 for storage, EMR for processing, Redshift for warehousing, and QuickSight for visualization.

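Azure’s pipeline includes Event Hubs for ingestion, Data Lake Store for storage, Databricks for processing, Cosmos DB for warehousing, and Power BI for presentation. Note that Cosmos DB is primarily an operational NoSQL database; Azure Synapse Analytics is Azure’s dedicated data warehouse service.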

GCP offers Pub/Sub for data streaming, Cloud Storage for data lakes, Dataproc and Dataflow for processing, BigQuery for warehousing, and Data Studio (now Looker Studio) for visualization.
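
The five stages can be sketched without tying them to any one provider. The toy example below is only an illustration under stated assumptions: an in-memory dict stands in for the data lake, a list stands in for the warehouse, and the function names and the `user`/`amount` event fields are hypothetical, not part of any cloud SDK.

```python
import json
from collections import defaultdict


def ingest(source):
    """Ingestion: collect raw records from a source (here, any iterable of JSON strings)."""
    return list(source)


def store_raw(data_lake, batch_id, raw_records):
    """Data lake: keep the records exactly as received, keyed by batch."""
    data_lake[batch_id] = list(raw_records)


def compute(raw_records):
    """Computation: parse the raw records and total the amount per user."""
    totals = defaultdict(float)
    for line in raw_records:
        event = json.loads(line)  # hypothetical event shape: {"user": ..., "amount": ...}
        totals[event["user"]] += event["amount"]
    return dict(totals)


def load_warehouse(warehouse, totals):
    """Data warehouse: store the results as structured rows."""
    for user, total in sorted(totals.items()):
        warehouse.append({"user": user, "total": total})


def present(warehouse):
    """Presentation: turn warehouse rows into a human-readable report."""
    return [f"{row['user']}: {row['total']:.2f}" for row in warehouse]


def run_pipeline(source, data_lake, warehouse, batch_id="batch-1"):
    """Run the five stages in order and return the report lines."""
    raw = ingest(source)
    store_raw(data_lake, batch_id, raw)
    load_warehouse(warehouse, compute(data_lake[batch_id]))
    return present(warehouse)
```

Calling `run_pipeline(events, {}, [])` with a list of JSON strings runs all five stages in order. In a real deployment each function would be replaced by one of the managed services named above.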

Over to you: What else would you add to the pipeline?


A Cheatsheet on Comparing API Architectural Styles

This cheatsheet covers six of the most popular API architectural styles:

  1. SOAP

  2. REST

  3. GraphQL

  4. gRPC

  5. WebSocket

  6. Webhook

Over to you: Which other architectural styles have you used?



10 Key Data Structures We Use Every Day

  • list: store a feed of posts, such as your Twitter feed

  • stack: support undo/redo in a word processor

  • queue: hold printer jobs, or send user actions in order in a game

  • hash table: caching systems

  • array: math operations

  • heap: task scheduling

  • tree: represent an HTML document, or drive AI decision-making

  • suffix tree: search for a string in a document

  • graph: track friendships, or find paths

  • R-tree: find the nearest neighbor

  • vertex buffer: send data to the GPU for rendering
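
A few of the uses above can be sketched with Python's standard library alone. This is a minimal illustration, not a production design; the class and function names (`UndoRedo`, `run_print_queue`, `Cache`, `schedule`) are hypothetical and chosen only for this example.

```python
import heapq
from collections import deque


class UndoRedo:
    """Stack: two stacks hold the undo and redo history of a document."""

    def __init__(self, text=""):
        self.text = text
        self._undo = []
        self._redo = []

    def edit(self, new_text):
        self._undo.append(self.text)
        self._redo.clear()  # a fresh edit invalidates the redo history
        self.text = new_text

    def undo(self):
        if self._undo:
            self._redo.append(self.text)
            self.text = self._undo.pop()

    def redo(self):
        if self._redo:
            self._undo.append(self.text)
            self.text = self._redo.pop()


def run_print_queue(jobs):
    """Queue: printer jobs are processed in arrival (FIFO) order."""
    queue = deque(jobs)
    printed = []
    while queue:
        printed.append(queue.popleft())
    return printed


class Cache:
    """Hash table: a dict remembers values so the loader runs once per key."""

    def __init__(self, loader):
        self._loader = loader
        self._store = {}
        self.misses = 0

    def get(self, key):
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._loader(key)
        return self._store[key]


def schedule(tasks):
    """Heap: run (priority, name) tasks lowest priority number first."""
    heap = list(tasks)
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

For example, `schedule([(3, "backup"), (1, "deploy"), (2, "email")])` returns the task names ordered by priority number, and calling `Cache.get` twice with the same key invokes the loader only once.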

Over to you: Which additional data structures have we overlooked?


A Cheatsheet to Build Secure APIs

An insecure API can compromise your entire application. Follow these strategies to mitigate the risk:

  1. Using HTTPS
    HTTPS encrypts data in transit and protects against man-in-the-middle attacks.
    It also ensures that data hasn’t been tampered with during transmission.

  2. Rate Limiting and Throttling
    Rate limiting helps mitigate DoS attacks by limiting requests from a single IP or user.
    The goal is to ensure fairness and prevent abuse.

  3. Validation of Inputs
    Input validation defends against injection attacks and unexpected data formats.
    Validate headers, inputs, and payloads.

  4. Authentication and Authorization
    Don’t use basic auth for authentication. Instead, use a standard authentication approach like JWTs.
    Use a random key that is hard to guess as the JWT secret.
    Keep token expiration short.
    For authorization, use OAuth.

  5. Using Role-based Access Control
    RBAC simplifies access management for APIs and reduces the risk of unauthorized actions.
    It gives granular control over user permissions based on roles.

  6. Monitoring
    Monitoring the APIs is the key to detecting issues and threats early.
    Use tools like Kibana, CloudWatch, and Datadog for monitoring, with alerts routed to a channel such as Slack.
    Don’t log sensitive data like credit card info, passwords, or credentials.
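
Two of the strategies above, rate limiting (item 2) and short-lived JWTs signed with a random secret (item 4), can be sketched with the standard library. This is an illustrative sketch under stated assumptions, not a hardened implementation: a token bucket is only one possible rate-limiting algorithm, the HS256 signing is hand-rolled for clarity where a maintained JWT library would normally be used, and all names (`TokenBucket`, `issue_token`, `verify_token`) are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time


class TokenBucket:
    """Rate limiter for one client: bursts up to `capacity`, refilled at `rate` tokens/second."""

    def __init__(self, capacity, rate, now=None):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def _b64url(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input, secret):
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(subject, secret, ttl_seconds=300, now=None):
    """Build an HS256-signed JWT that expires after `ttl_seconds`."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": subject, "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token, secret, now=None):
    """Return the claims if the signature is valid and the token is unexpired, else None."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
        # Always verify as HS256; the header's "alg" value is never trusted.
        expected = _sign(header_b64 + "." + claims_b64, secret)
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(claims_b64))
    except ValueError:
        return None
    if not isinstance(claims, dict) or claims.get("exp", 0) <= now:
        return None
    return claims
```

A secret generated with `secrets.token_bytes(32)` is hard to guess, and the default `ttl_seconds=300` keeps expiration short. In a service, one `TokenBucket` would typically be kept per client IP or user ID.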

Over to you: What else would you do to build a secure API?


SPONSOR US

Get your product in front of more than 1,000,000 tech professionals.

Our newsletter puts your products and services directly in front of an audience that matters - hundreds of thousands of engineering leaders and senior engineers - who have influence over significant tech decisions and big purchases.

Space Fills Up Fast - Reserve Today

Ad spots typically sell out about 4 weeks in advance. To ensure your ad reaches this influential audience, reserve your space now by emailing sponsorship@bytebytego.com

 
 

© 2024 ByteByteGo
548 Market Street PMB 72296, San Francisco, CA 94104


by "ByteByteGo" <bytebytego@substack.com> - 11:36 - 26 Oct 2024