forked from Jeevan-kumar-Raj/Grokking-System-Design
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
8aa637f
commit c04c97a
Showing
51 changed files
with
722 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
[Grokking System Design Interview](https://www.educative.io/collection/5668639101419520/5649050225344512) | ||
==== | ||
Source: [educative](https://www.educative.io) | ||
|
||
## Interview Process | ||
- Scope the problem | ||
- Don’t make assumptions. | ||
- Ask clarifying questions to understand the constraints and use cases. | ||
- Steps | ||
- Requirements clarifications | ||
- System interface definition | ||
- Sketch up an abstract design | ||
- Building blocks of the system | ||
- Relationships between them | ||
- Steps | ||
- Back-of-the-envelope estimation | ||
- Defining data model | ||
- High-level design | ||
- Identify and address the bottlenecks | ||
- Use the fundamental principles of scalable system design | ||
- Steps | ||
- Detailed design | ||
- Identifying and resolving bottlenecks | ||
|
||
## Distributed System Design Basics | ||
- [Key Characterics](basics/key-characteristics.md) | ||
- [Load balancing](basics/load-balancing.md) | ||
- [Caching](basics/caching.md) | ||
- [Sharding](basics/sharding.md) | ||
- [Indexes](basics/indexes.md) | ||
- [Proxies](basics/proxies.md) | ||
- [Queues](basics/queues.md) | ||
- [Redundancy](basics/redundancy.md) | ||
- [SQL vs. NoSQL](basics/sql-vs-nosql.md) | ||
- [CAP Theorem](basics/cap-theorem.md) | ||
- [Consistent Hashing](basics/consistent-hashing.md) | ||
- [Client Server Communication](basics/client-server-communication.md) | ||
|
||
## System Designs | ||
- [Short URL Service](designs/short-url.md) | ||
- [Pastebin](designs/pastebin.md) | ||
- [Instagram](designs/instagram.md) | ||
- [Dropbox](designs/dropbox.md) | ||
- [Twitter](designs/twitter.md) | ||
- [Youtube](designs/youtube.md) | ||
- [Twitter Search](designs/twitter-search.md) | ||
- [Web Crawler](designs/web-crawler.md) | ||
- [Facebook Newsfeed](designs/facebook-newsfeed.md) | ||
- [Yelp](designs/yelp.md) | ||
- [Uber Backend](designs/uber-backend.md) | ||
- [Ticketmaster](designs/ticketmaster.md) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
* [Contents](README.md) | ||
|
||
* Basics | ||
- [Key Characterics](basics/key-characteristics.md) | ||
- [Loading balancing](basics/load-balancing.md) | ||
- [Caching](basics/caching.md) | ||
- [Sharding](basics/sharding.md) | ||
- [Indexes](basics/indexes.md) | ||
- [Proxies](basics/proxies.md) | ||
- [Queues](basics/queues.md) | ||
- [Redundancy](basics/redundancy.md) | ||
- [SQL vs. NoSQL](basics/sql-vs-nosql.md) | ||
- [CAP Theorem](basics/cap-theorem.md) | ||
- [Consistent Hashing](basics/consistent-hashing.md) | ||
- [Client Server Communication](basics/client-server-communication.md) | ||
|
||
* Designs | ||
- [Short URL Service](designs/short-url.md) | ||
- [Pastebin](designs/pastebin.md) | ||
- [Instagram](designs/instagram.md) | ||
- [Dropbox](designs/dropbox.md) | ||
- [Twitter](designs/twitter.md) | ||
- [Youtube](designs/youtube.md) | ||
- [Twitter Search](designs/twitter-search.md) | ||
- [Web Crawler](designs/web-crawler.md) | ||
- [Facebook Newsfeed](designs/facebook-newsfeed.md) | ||
- [Yelp](designs/yelp.md) | ||
- [Uber Backend](designs/uber-backend.md) | ||
- [Ticketmaster](designs/ticketmaster.md) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
Caching | ||
==== | ||
|
||
- Take advantage of the locality of reference principle: recently requested data is likely to be requested again. | ||
- Exist at all levels in architecture, but often found at the level nearest to the front end. | ||
|
||
## Application server cache | ||
- Cache placed on a request layer node. | ||
- When a request layer node is expanded to many nodes | ||
- Load balancer randomly distributes requests across the nodes. | ||
- The same request can go to different nodes. | ||
- Increase cache misses. | ||
- Solutions: | ||
- Global caches | ||
- Distributed caches | ||
|
||
## Distributed cache | ||
- Each request layer node owns part of the cached data. | ||
- Entire cache is divided up using a consistent hashing function. | ||
- Pro | ||
- Cache space can be increased easily by adding more nodes to the request pool. | ||
- Con | ||
- A missing node leads to cache lost. | ||
|
||
## Global cache | ||
- A server or file store that is faster than original store, and accessible by all request layer nodes. | ||
- Two common forms | ||
- Cache server handles cache miss. | ||
- Used by most applications. | ||
- Request nodes handle cache miss. | ||
- Have a large percentage of the hot data set in the cache. | ||
- An architecture where the files stored in the cache are static and shouldn’t be evicted. | ||
- The application logic understands the eviction strategy or hot spots better than the cache | ||
|
||
## Content distributed network (CDN) | ||
- For sites serving large amounts of static media. | ||
- Process | ||
- A request first asks the CDN for a piece of static media. | ||
- CDN serves that content if it has it locally available. | ||
- If content isn’t available, CDN will query back-end servers for the file, cache it locally and serve it to the requesting user. | ||
- If the system is not large enough for CDN, it can be built like this: | ||
- Serving static media off a separate subdomain using lightweight HTTP server (e.g. Nginx). | ||
- Cutover the DNS from this subdomain to a CDN later. | ||
|
||
## Cache invalidation | ||
- Keep cache coherent with the source of truth. Invalidate cache when source of truth has changed. | ||
- Write-through cache | ||
- Data is written into the cache and permanent storage at the same time. | ||
- Pro | ||
- Fast retrieval, complete data consistency, robust to system disruptions. | ||
- Con | ||
- Higher latency for write operations. | ||
- Write-around cache | ||
- Data is written to permanent storage, not cache. | ||
- Pro | ||
- Reduce the cache that is no used. | ||
- Con | ||
- Query for recently written data creates a cache miss and higher latency. | ||
- Write-back cache | ||
- Data is only written to cache. | ||
- Write to the permanent storage is done later on. | ||
- Pro | ||
- Low latency, high throughput for write-intensive applications. | ||
- Con | ||
- Risk of data loss in case of system disruptions. | ||
|
||
## Cache eviction policies | ||
- FIFO: first in first out | ||
- LIFO: last in first out | ||
- LRU: least recently used | ||
- MRU: most recently used | ||
- LFU: least frequently used | ||
- RR: random replacement |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
[CAP Theorem](https://en.wikipedia.org/wiki/CAP_theorem) | ||
==== | ||
|
||
- Consistency: every read receives the most recent write or an error. | ||
- Availability: every request receives a response that is not an error. | ||
- Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes | ||
- CAP theorem implies that in the presence of a network partition, one has to choose between consistency and availability | ||
- CAP is frequently misunderstood as if one has to choose to abandon one of the three guarantees at all times. In fact, the choice is really between consistency and availability only when a network partition or failure happens; at all other times, no trade-off has to be made. | ||
- [ACID](https://en.wikipedia.org/wiki/ACID) databases choose consistency over availability. | ||
- [BASE](https://en.wikipedia.org/wiki/Eventual_consistency) systems choose availability over consistency. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
Client-Server Communication | ||
==== | ||
|
||
## Standard HTTP Web Request | ||
1. Client opens a connection and requests data from server. | ||
2. Server calculates the response. | ||
3. Server sends the response back to the client on the opened request. | ||
|
||
## Ajax Polling | ||
The client repeatedly polls (or requests) a server for data, and waits for the server to respond with data. If no data is available, an empty response is returned. | ||
|
||
1. Client opens a connection and requests data from the server using regular HTTP. | ||
2. The requested webpage sends requests to the server at regular intervals (e.g., 0.5 seconds). | ||
3. The server calculates the response and sends it back, like regular HTTP traffic. | ||
4. Client repeats the above three steps periodically to get updates from the server. | ||
|
||
Problems | ||
- Client has to keep asking the server for any new data. | ||
- A lot of responses are empty, creating HTTP overhead. | ||
|
||
## HTTP Long-Polling | ||
The client requests information from the server exactly as in normal polling, but with the expectation that the server may not respond immediately. | ||
|
||
1. The client makes an initial request using regular HTTP and then waits for a response. | ||
2. The server delays its response until an update is available, or until a timeout has occurred. | ||
3. When an update is available, the server sends a full response to the client. | ||
4. The client typically sends a new long-poll request, either immediately upon receiving a response or after a pause to allow an acceptable latency period. | ||
|
||
Each Long-Poll request has a timeout. The client has to reconnect periodically after the connection is closed, due to timeouts. | ||
|
||
## WebSockets | ||
- A persistent full duplex communication channels over a single TCP connection. Both server and client can send data at any time. | ||
- A connection is established through WebSocket handshake. | ||
- Low communication overhead. | ||
- Real-time data transfer. | ||
|
||
## Server-Sent Event (SSE) | ||
1. Client requests data from a server using regular HTTP. | ||
2. The requested webpage opens a connection to the server. | ||
3. Server sends the data to the client whenever there’s new information available. | ||
|
||
- Use case: | ||
- When real-time traffic from server to client is needed. | ||
- When server generates data in a loop and sends multiple events to client. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
Consistent Hashing | ||
==== | ||
|
||
## Simple hashing | ||
Problems of simple hashing function `key % n` (`n` is the number of servers): | ||
- It is not horizontally scalable. Whenever a new cache host is added to the system, all existing mappings are broken. | ||
- It may not be load balanced, especially for non-uniformly distributed data. Some servers will become hot spots. | ||
|
||
## Consistent Hashing | ||
- Consistent hashing maps a key to an integer. | ||
- Imagine that the integers in the range are placed on a ring such that the values are wrapped around. | ||
- Given a list of servers, hash them to integers in the range. | ||
- To map a key to a server: | ||
- Hash it to a single integer. | ||
- Move clockwise on the ring until finding the first cache it encounters. | ||
- When the hash table is resized (a server is added or deleted), only `k/n` keys need to be remapped (`k` is the total number of keys, and `n` is the total number of servers). | ||
- To handle hot spots, add “virtual replicas” for caches. | ||
- Instead of mapping each cache to a single point on the ring, map it to multiple points on the ring (replicas). This way, each cache is associated with multiple portions of the ring. | ||
- If the hash function is “mixes well,” as the number of replicas increases, the keys will be more balanced. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
Indexes | ||
==== | ||
|
||
- Improve the performance of search queries. | ||
- Decrease the write performance. This performance degradation applies to all insert, update, and delete operations. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# Key Characteristics of Distributed Systems | ||
|
||
## Scalability | ||
- The capability of a system to grow and manage increased demand. | ||
- A system that can continuously evolve to support growing amount of work is scalable. | ||
- Horizontal scaling: by adding more servers into the pool of resources. | ||
- Vertical scaling: by adding more resource (CPU, RAM, storage, etc) to an existing server. This approach comes with downtime and an upper limit. | ||
|
||
## Reliability | ||
- Reliability is the probability that a system will fail in a given period. | ||
- A distributed system is reliable if it keeps delivering its service even when one or multiple components fail. | ||
- Reliability is achieved through redundancy of components and data (remove every single point of failure). | ||
|
||
## Availability | ||
- Availability is the time a system remains operational to perform its required function in a specific period. | ||
- Measured by the percentage of time that a system remains operational under normal conditions. | ||
- A reliable system is available. | ||
- An available system is not necessarily reliable. | ||
- A system with a security hole is available when there is no security attack. | ||
|
||
## Efficiency | ||
- Latency: response time, the delay to obtain the first piece of data. | ||
- Bandwidth: throughput, amount of data delivered in a given time. | ||
|
||
## Serviceability / Manageability | ||
- Easiness to operate and maintain the system. | ||
- Simplicity and spend with which a system can be repaired or maintained. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
Load Balancing (LB) | ||
==== | ||
|
||
Help scale horizontally across an ever-increasing number of servers. | ||
|
||
## LB locations | ||
- Between user and web server | ||
- Between web servers and an internal platform layer (application servers, cache servers) | ||
- Between internal platform layer and database | ||
|
||
## Algorithms | ||
- Least connection | ||
- Least response time | ||
- Least bandwidth | ||
- Round robin | ||
- Weighted round robin | ||
- IP hash | ||
|
||
## Implementation | ||
- Smart clients | ||
- Hardware load balancers | ||
- Software load balancers |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
Proxies | ||
==== | ||
|
||
- A proxy server is an intermediary piece of hardware / software sitting between client and backend server. | ||
- Filter requests | ||
- Log requests | ||
- Transform requests (encryption, compression, etc) | ||
- [Cache](caching.md) | ||
- Batch requests | ||
- Collapsed forwarding: enable multiple client requests for the same URI to be processed as one request to the backend server | ||
- Collapse requests for data that is spatially close together in the storage to minimize the reads |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
Queues | ||
==== | ||
|
||
- Queues are used to effectively manage requests in a large-scale distributed system, in which different components of the system may need to work in an asynchronous way. | ||
- It is an abstraction between the client’s request and the actual work performed to service it. | ||
- Queues are implemente on the asynchronious communication protocol. When a client submits a task to a queue they are no longer required to wait for the results | ||
- Queue can provide protection from service outages and failures. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
Redundancy | ||
==== | ||
|
||
- Redundancy: **duplication of critical data or services** with the intention of increased reliability of the system. | ||
- Server failover | ||
- Remove single points of failure and provide backups (e.g. server failover). | ||
- Shared-nothing architecture | ||
- Each node can operate independently of one another. | ||
- No central service managing state or orchestrating activities. | ||
- New servers can be added without special conditions or knowledge. | ||
- No single point of failure. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
Sharding / Data Partitioning | ||
==== | ||
|
||
## Partitioning methods | ||
- Horizontal partitioning | ||
- Range based sharding. | ||
- Put different rows into different tables. | ||
- Con | ||
- If the value whose range is used for sharding isn’t chosen carefully, the partitioning scheme will lead to unbalanced servers. | ||
- Vertical partitioning | ||
- Divide data for a specific feature to their own server. | ||
- Pro | ||
- Straightforward to implement. | ||
- Low impact on the application. | ||
- Con | ||
- To support growth of the application, a database may need further partitioning. | ||
- Directory-based partitioning | ||
- A lookup service that knows the partitioning scheme and abstracts it away from the database access code. | ||
- Allow addition of db servers or change of partitioning schema without impacting application. | ||
- Con | ||
- Can be a single point of failure. | ||
|
||
## Partitioning criteria | ||
- Key or hash-based partitioning | ||
- Apply a hash function to some key attribute of the entry to get the partition number. | ||
- Problem | ||
- Adding new servers may require changing the hash function, which would need redistribution of data and downtime for the service. | ||
- Workaround: [consistent hashing](https://en.wikipedia.org/wiki/Consistent_hashing). | ||
- List partitioning | ||
- Each partition is assigned a list of values. | ||
- Round-robin partitioning | ||
- With `n` partitions, the `i` tuple is assigned to partition `i % n`. | ||
- Composite partitioning | ||
- Combine any of above partitioning schemes to devise a new scheme. | ||
- Consistent hashing is a composite of hash and list partitioning. | ||
- Key -> reduced key space through hash -> list -> partition. | ||
|
||
## Common problems of sharding | ||
Most of the constraints are due to the fact that operations across multiple tables or multiple rows in the same table will no longer run on the same server. | ||
|
||
- Joins and denormalization | ||
- Joins will not be performance efficient since data has to be compiled from multiple servers. | ||
- Workaround: denormalize the database so that queries can be performed from a single table. But this can lead to data inconsistency. | ||
- Referential integrity | ||
- Difficult to enforce data integrity constraints (e.g. foreign keys). | ||
- Workaround | ||
- Referential integrity is enforced by application code. | ||
- Applications can run SQL jobs to clean up dangling references. | ||
- Rebalancing | ||
- Necessity of rebalancing | ||
- Data distribution is not uniform. | ||
- A lot of load on one shard. | ||
- Create more db shards or rebalance existing shards changes partitioning scheme and requires data movement. |
Oops, something went wrong.