Measurements for data systems: reliability, scalability and maintainability
1. Reliability: tolerating hardware and software faults, human errors
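- [NOTE] fault != failure (part vs. whole): a fault is one component deviating from its spec; a failure is the system as a whole no longer providing the required service.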
2. Scalability: measuring load and performance, latency, percentiles, throughput
- Load: Load can be described with a few numbers, which we call load parameters. The best choice of parameters depends on the architecture of your system: it may be requests per second to a web server, the ratio of reads to writes in a database, the number of simultaneously active users in a chat room, the hit rate on a cache, or something else. Perhaps the average case is what matters for you, or perhaps your bottleneck is dominated by a small number of extreme cases.
- Performance:
-- In a batch processing system, we usually care about throughput - the number of records we can process per second, or the total time it takes to run a job on a dataset of a certain size.
-- In online systems, we usually care more about the service’s response time - that is, the time between a client sending a request and receiving a response.
- Latency vs. Throughput
3. Maintainability: operability, simplicity and evolvability
- Operability: Making Life Easy for Operations
- Simplicity: Managing Complexity
- Evolvability: Making Change Easy
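
Example for the performance measures under Scalability: a minimal sketch of how throughput and response-time percentiles might be computed from raw measurements. The sample data, the function names (`percentile`, `summarize`) and the nearest-rank percentile definition are assumptions made for illustration, not something these notes prescribe. The sample is chosen so that two slow requests pull the mean far away from the median, which is the "small number of extreme cases" situation mentioned under Load.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not sorted_values:
        raise ValueError("need at least one sample")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(response_times_ms, window_seconds):
    """Summarize one observation window of an online service."""
    ordered = sorted(response_times_ms)
    return {
        # Throughput: requests handled per second in this window.
        "throughput_rps": len(ordered) / window_seconds,
        # The mean is included only to contrast it with the percentiles.
        "mean_ms": sum(ordered) / len(ordered),
        "p50_ms": percentile(ordered, 50),  # median
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
    }


# Hypothetical sample: ten response times (ms) seen in a 2-second window.
samples = [12, 15, 11, 14, 13, 250, 16, 12, 18, 900]
stats = summarize(samples, window_seconds=2)
print(stats)
```

With this sample the median is 14 ms while the mean is about 126 ms, so the mean describes neither the typical request nor the slow ones. For real workloads, the standard library's `statistics.quantiles` or a metrics library would be the more usual choice; those interpolate between samples, so their results can differ slightly from the nearest-rank values here.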