#0. Requirements:
1. Single-server key-value store
# Problem: can't fit everything in memory
Solutions:
- Compress the data
- Store the data on disk and keep only frequently used data in an in-memory hash table (see the sketch below)
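A minimal sketch of the disk-plus-cache idea, assuming an LRU policy decides which keys stay in the in-memory hash table; the class name `DiskBackedKV`, the cache capacity, and the use of Python's `shelve` module for the on-disk part are illustrative choices, not something prescribed by these notes.

```python
import shelve
from collections import OrderedDict

class DiskBackedKV:
    """Keep only the most recently used keys in an in-memory hash table;
    everything else lives on disk (here: a shelve file)."""

    def __init__(self, path: str, cache_capacity: int = 1024):
        self.disk = shelve.open(path)          # on-disk key-value storage
        self.cache = OrderedDict()             # in-memory hash table, kept in LRU order
        self.capacity = cache_capacity

    def put(self, key: str, value: str) -> None:
        self.disk[key] = value                 # always persist to disk
        self._cache_set(key, value)

    def get(self, key: str) -> str:
        if key in self.cache:                  # hot key: served from memory
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.disk[key]                 # cold key: read from disk, then cache it
        self._cache_set(key, value)
        return value

    def _cache_set(self, key: str, value: str) -> None:
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:    # evict the least recently used entry
            self.cache.popitem(last=False)
```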
2. Distributed key-value store
#1 CAP theorem
refer to previous notes.
#2 Design Goals
- Ability to store "big data" (> 1 TB)
- High availability: always respond quickly, even during failures
- Scalability:
  - The system can be scaled to support thousands of servers easily.
  - Addition/deletion of servers should be easy.
- Heterogeneity: the work distribution must be proportional to the capability of individual servers.
- Tunable tradeoffs between consistency and latency
- Low latency read and write operations (On average < 10 ms for reads, < 20 ms for writes)
- Comprehensible conflict resolution
- Robust failure detection and resolution techniques
#3 System Architecture
- Distribute data across multiple servers evenly.
- Minimize data movement when nodes are added or removed.
- Consistent hashing (see the consistent-hashing sketch after this list):
  - Approach:
    - Servers are placed on a hash ring as virtual nodes (e.g., 100 virtual nodes per server).
    - A key is hashed onto the same ring and stored on the first server it encounters while traveling clockwise.
  - Advantages of consistent hashing:
    - High scalability: adding or removing a server only remaps the keys between it and its neighboring virtual nodes.
    - Heterogeneity: make the number of virtual nodes proportional to the server's capacity.
- Data Replication: data replicated asynchronously over N servers (N is a configurable parameter, N < number of servers in the system)
- Consistency: Quorum consensus can be used to guarantee consistency for both read and write operations (choose W + R > N so read and write quorums overlap; see the quorum sketch below).
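A minimal sketch of consistent hashing with virtual nodes, as described above. The MD5 hash, the `vnodes` parameter, and the sorted-ring-plus-`bisect` layout are illustrative choices. The `lookup(key, n)` helper also hints at one common way to pick the N replica servers for Data Replication: walk clockwise from the key's position and take the first N distinct servers (an assumption here, not spelled out in these notes).

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes: each physical server appears on the ring
    many times, so keys spread evenly and heterogeneous capacities can be
    expressed by giving bigger servers more virtual nodes."""

    def __init__(self):
        self._ring = []  # sorted list of (hash position, server) pairs

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_server(self, server: str, vnodes: int = 100) -> None:
        for i in range(vnodes):
            pos = self._hash(f"{server}#vn{i}")
            bisect.insort(self._ring, (pos, server))

    def remove_server(self, server: str) -> None:
        self._ring = [(pos, s) for pos, s in self._ring if s != server]

    def lookup(self, key: str, n: int = 1) -> list:
        """Return n distinct servers, starting from the first virtual node
        reached while moving clockwise from the key's hash position."""
        if not self._ring:
            return []
        start = bisect.bisect(self._ring, (self._hash(key), ""))
        owners = []
        for i in range(len(self._ring)):
            _, server = self._ring[(start + i) % len(self._ring)]
            if server not in owners:
                owners.append(server)
                if len(owners) == n:
                    break
        return owners

ring = ConsistentHashRing()
ring.add_server("s0")
ring.add_server("s1", vnodes=200)   # bigger server -> more virtual nodes
ring.add_server("s2")
print(ring.lookup("user:42"))       # server that owns the key
print(ring.lookup("user:42", n=3))  # N=3 replica candidates, chosen clockwise
```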
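A minimal sketch of quorum consensus: with N replicas, a write succeeds once W replicas acknowledge it and a read collects R responses; choosing W + R > N makes every read quorum overlap at least one replica that holds the latest write. The coordinator/replica classes and the monotonically increasing version number are assumptions for illustration, not any specific system's API.

```python
class Replica:
    """One replica server; stores (version, value) per key."""
    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        current = self.data.get(key, (0, None))
        if version > current[0]:
            self.data[key] = (version, value)
        return True                      # acknowledge the write

    def read(self, key):
        return self.data.get(key, (0, None))


class QuorumCoordinator:
    """Writes go to all N replicas and succeed once at least W acknowledge;
    reads query R replicas and return the value with the highest version.
    Requiring W + R > N is what guarantees the read quorum overlaps the
    latest successful write."""

    def __init__(self, replicas, w, r):
        assert w + r > len(replicas), "need W + R > N for strong consistency"
        self.replicas, self.w, self.r = replicas, w, r
        self.version = 0

    def put(self, key, value):
        self.version += 1
        acks = sum(rep.write(key, value, self.version) for rep in self.replicas)
        if acks < self.w:
            raise RuntimeError("write quorum not reached")

    def get(self, key):
        responses = [rep.read(key) for rep in self.replicas[: self.r]]
        return max(responses)[1]         # freshest value among the read quorum


replicas = [Replica() for _ in range(3)]          # N = 3
store = QuorumCoordinator(replicas, w=2, r=2)     # W + R = 4 > N
store.put("name", "alice")
print(store.get("name"))                          # -> 'alice'
```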
#4 Failure Handling
#1. Learn the techniques to detect failures.
#2. Learn common failure scenarios and failure-resolution strategies.
#1 All-to-all multicasting - not efficient when there are lots of servers in the system.
#2 Decentralized failure detection methods such as the gossip protocol for inter-node communication (see the sketch below).
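A minimal sketch of gossip-based failure detection: each node keeps a membership list with a heartbeat counter and a last-updated time per peer, periodically increments its own counter and pushes the list to a few random peers, and marks a peer as failed if its counter has not advanced within a timeout. The fanout, the timeout, and the in-process `peers` dictionary standing in for the network are assumptions for illustration.

```python
import random
import time

FANOUT = 2          # how many random peers each node gossips to per round
FAIL_TIMEOUT = 5.0  # seconds without heartbeat progress before a peer is considered down


class GossipNode:
    def __init__(self, node_id, peers):
        self.node_id = node_id
        self.peers = peers  # node_id -> GossipNode (in-process stand-in for the network)
        # membership list: node_id -> [heartbeat counter, last time the counter advanced]
        self.members = {node_id: [0, time.time()]}

    def tick(self):
        """One gossip round: bump own heartbeat, then push the membership
        list to a few randomly chosen peers."""
        self.members[self.node_id][0] += 1
        self.members[self.node_id][1] = time.time()
        targets = random.sample(list(self.peers.values()), min(FANOUT, len(self.peers)))
        for peer in targets:
            peer.receive(dict(self.members))

    def receive(self, remote_members):
        """Merge a peer's membership list: keep the larger heartbeat counter."""
        now = time.time()
        for node_id, (beat, _) in remote_members.items():
            local = self.members.get(node_id)
            if local is None or beat > local[0]:
                self.members[node_id] = [beat, now]

    def failed_nodes(self):
        """Peers whose heartbeat has not advanced within FAIL_TIMEOUT."""
        now = time.time()
        return [nid for nid, (_, seen) in self.members.items()
                if nid != self.node_id and now - seen > FAIL_TIMEOUT]
```

Because each node contacts only a handful of random peers per round, the per-node message load stays roughly constant as the cluster grows, which is why gossip scales better than all-to-all multicasting.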