

How to pass the data architecture interviews in big tech
Trading off complexity, latency and correctness
The architecture interview is often what stands between you and a fancy senior+ data engineering role in big tech. The part of this interview that people fear the most is its open-ended nature and the depth you need to have in order to not sound like an idiot. In this newsletter, I’ll be covering data architecture in depth so you can ace these interviews with a little bit more ease!
What does the data architecture interview look like?
If you’ve made it to the data architecture interview, congratulations! This interview round is usually one of the very last rounds before you get an offer.
This interview most often consists of a 60-90 minute discussion (possibly including some whiteboarding) about the tradeoffs of various technical decisions and how you could overcome or compensate for these tradeoffs.
What is tested in the data architecture interview
The key thing to remember about these interviews is tradeoffs!
A part of this will be considering the differences between Lambda and Kappa data architectures.
Some key points I want to talk about here:
Lambda (used by companies like Meta and Airbnb)
More complex since you have a “speed layer” and a “batch layer”
More likely to be correct because the batch layer can “true up” the speed layer; the batch layer is more trustworthy (see the sketch after this list)
Picks up low latency and correctness at the expense of complexity
Kappa (used by companies like Uber)
Simpler since it’s a “streaming-only” paradigm
Has a harder time with data quality alerts
Picks up low latency and simplicity at the expense of correctness
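To make the “true up” idea concrete, here’s a minimal sketch of the Lambda batch layer, assuming hypothetical PySpark table names: the speed layer has been writing approximate values all day, and the batch job recomputes the exact metric from the immutable event log and overwrites them.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("true_up_speed_layer").getOrCreate()

# Batch layer: recompute the metric from the trustworthy, immutable event log
# (table and column names here are hypothetical)
events = spark.read.table("events.page_views").where("ds = '2023-10-30'")
exact = events.groupBy("page_id", "ds").agg(F.count("*").alias("views"))

# "True up": overwrite whatever approximate values the speed layer wrote
exact.write.mode("overwrite").saveAsTable("serving.daily_page_views")
```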
The serving layer
This is often a low-latency store like Redis, Memcached, or Druid. Or it could be a NoSQL store like Cassandra or MongoDB.
It can be a higher-latency store like Iceberg too though
A key thing to remember here for the serving layer is picking the right-sized database. Redis, for example, keeps everything in RAM, so it can’t store huge data sets!
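“Right-sized” is mostly a back-of-envelope exercise: sanity-check that what you’re serving actually fits in the store before committing to it. A rough sketch with entirely made-up numbers:

```python
# Back-of-envelope sizing for a Redis serving layer (all numbers hypothetical)
listings = 10_000_000      # one counter key per listing per day
days_retained = 30
bytes_per_key = 100        # rough per-key overhead for a small counter

aggregate_bytes = listings * days_retained * bytes_per_key
print(f"Aggregates: ~{aggregate_bytes / 1e9:.0f} GB of RAM")  # ~30 GB: fits

raw_events = 5_000_000_000  # billions of raw view events
bytes_per_event = 500
print(f"Raw events: ~{raw_events * bytes_per_event / 1e12:.1f} TB")  # does NOT fit
# The raw events belong in a higher-latency store like Iceberg;
# only the small, hot aggregate gets served out of Redis.
```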
Knowing where you can insert data quality checks
Data contract patterns like signal table vs. write-audit-publish
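Here’s a hedged sketch of write-audit-publish (table names and audit rules are assumptions): data lands in a staging table that no consumer reads, quality checks run against the staged copy, and only if they pass does the data get published to production.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("write_audit_publish").getOrCreate()

# 1) WRITE: land the new data in a staging table no consumer reads
new_data = spark.read.table("raw.listing_view_events")  # hypothetical source
new_data.write.mode("overwrite").saveAsTable("staging.listing_views")

# 2) AUDIT: run quality checks against the staged copy
staged = spark.read.table("staging.listing_views")
if staged.count() == 0:
    raise ValueError("Audit failed: staged data is empty")
if staged.filter("listing_id IS NULL").count() > 0:
    raise ValueError("Audit failed: null listing_ids found")

# 3) PUBLISH: only after the audits pass does data reach production
staged.write.mode("overwrite").saveAsTable("prod.listing_views")
```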
How to test streaming pipelines for errors
Options (in order from most taxing to least):
Fail the job on egregious errors to troubleshoot
Output error rows to a separate Kafka topic (sketched after this list)
Ignore the error rows with filter conditions
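A minimal sketch of the middle option using the kafka-python client; the topic names and the validation rule are assumptions. Bad rows go to a dead-letter topic instead of failing the job or being silently dropped, so they stay inspectable for debugging.

```python
import json
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

consumer = KafkaConsumer("listing_views", bootstrap_servers="localhost:9092")
producer = KafkaProducer(bootstrap_servers="localhost:9092")

for message in consumer:
    try:
        event = json.loads(message.value)
        if "listing_id" not in event:
            raise ValueError("missing listing_id")
        # ... happy path: process the valid event ...
    except (json.JSONDecodeError, ValueError):
        # Route the bad row to a dead-letter topic and keep consuming
        producer.send("listing_views_errors", message.value)
```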
Knowing the tradeoffs of different database choices
Alex’s book and newsletter on the System Design interview will serve you well here and with a lot of these other topics
CAP Theorem is very important when determining which database to use
You have consistent and available databases like Postgres, which have a hard time scaling horizontally
You have available and partition-tolerant databases like Cassandra, which are “eventually consistent,” meaning they will return a consistent result given enough time
You have consistent and partition-tolerant databases like HBase and MongoDB, which will give you consistent results but sacrifice availability in the event of a failure!
Example: Airbnb guest viewing counter problem
When I was interviewing for Airbnb, I was asked the following question in my data architecture interview, “How would you design a counter on a web page that would count the number of guests that have viewed a listing in a given day?”
Your initial instinct as a data engineer is to “log the event data and update with some sort of batch process.”
But that’s not the point of these interviews! The point is to engage with the interviewer and delve deeper into what the actual business requirements are to find an architecture that would work best.
Asking good follow-up questions is really important in these interviews. My immediate set of questions was:
How quickly does this data need to be updated?
Asking latency-based questions will be critical. Remember that in this case there are actually a few latencies to consider:
The latency between client and server
This should always be kept low since slow server response times destroy revenue for companies
In other words, you should probably not keep this data in a data lake but in a low-latency store like Redis or Memcached
The latency between the correct data generation and what the client sees
Intuitively, this latency comes from how often the logged data gets processed. If the count isn’t materialized and is aggregated on the fly instead, it will increase the client-server latency above. If the count is materialized, there will be a lag between the correct number and what’s currently displayed.
How accurate does this data need to be?
Accuracy and latency often trade-off in data architectures. Asking about accuracy expectations will show that you know there’s no such thing as zero-latency, one-hundred-percent accurate data.
The interviewer responded with, “It needs to be at least 90% accurate all the time”
After understanding the latency and correctness requirements of this problem, my initial thought was a diagram that looked like this:
Every time a guest asks for a listing page, it triggers the database to update a counter for that listing which can be served to the user as well!
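In code, that first design is roughly this (a sketch using Redis; the key naming is made up):

```python
import datetime
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)

def on_listing_view(listing_id: str) -> int:
    day = datetime.date.today().isoformat()
    # One O(1) round trip: bump the counter and return the value to render.
    # The flaw: the same guest refreshing the page inflates the count.
    return r.incr(f"guest_views:{listing_id}:{day}")
```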
The interviewer wasn’t happy with this design because he said that the correctness of the number of guests would slowly drift as guests refreshed the page.
I fumbled a bit with this pushback. Initially, I was thinking of a trigger that updated a unique array of guests. But this wouldn’t be performant on the app side.
Then I remembered we don’t have to do everything in the “real-time” path and we could solve this problem with a batch job.
I added a small piece where the requests get logged to Kafka and then there’s an hourly process that updates the counters with their correct, deduped value.
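A rough sketch of that hourly fix, assuming the Kafka events have already been landed into a table in the lake (all names hypothetical): it recomputes the distinct-guest counts and overwrites the drifting real-time counters.

```python
import redis
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hourly_counter_true_up").getOrCreate()

# View events landed from the Kafka topic by an ingestion job
views = spark.read.table("logs.listing_view_events").where("ds = '2023-10-30'")

# Dedupe: count each guest at most once per listing per day
exact = (
    views.groupBy("listing_id")
         .agg(F.countDistinct("guest_id").alias("distinct_guests"))
)

# Overwrite the drifting counters with the correct, deduped values
r = redis.Redis(host="localhost", port=6379)
for row in exact.toLocalIterator():
    r.set(f"guest_views:{row['listing_id']}:2023-10-30", row["distinct_guests"])
```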
This was something that really pleased the interviewer. I came up with this design in about 20-25 minutes and we spent the next 30 minutes talking about Airbnb and what the role looked like!
Data architecture interviews can be much more complicated than this example. I’ve failed a few architecture interviews with Clubhouse and Robinhood because they were asking for something very specific that I couldn’t figure out! But that’s for another newsletter!
Conclusion
If you enjoyed this newsletter, please share it with your friends! Did I miss anything else that has helped you pass these interviews?
We’ll be launching the next iteration of the EcZachly boot camp on November 6th and sign-ups will start this week so get ready for another email for those!
People who buy the self-paced course here will get a large discount and first priority for the November 6th boot camp seats!