ARTS Week1
by
Background: ARTS activity
Algorithm
Construct Binary Search Tree from Preorder Traversal
Solution: A binary search tree is given in this problem where every node satisfy the conditions that node.left.val < node.val and node.right.val > node.val. Therefore, every BST node has an implicit attached interval which defines the range of node value falls in between. Let’s take the example of the following tree and see how these intervals look like:
8
/ \
5 10
/ \ \
1 7 12
(-INF, INF)
/ \
(-INF, 8) (8, INF)
/ \ \
(-INF, 5) (5, 8) (10, INF)
Each BST node is attached with an interval and the root node is attached with (-INF, INF). Except root node, each attached interval is determined by the attached interval of its parent node and whether it is its parent’s left child or right child.
- Left child’s interval: (lower bound of parent, parent’s value)
- Right child’s interval: (parent’s value, upper bound of parent)
Therefore, given the preorder traversal of a BST, we can rearrange the tree structure by this property of BST. The main idea is to traverse the preorder nodes from the input, find the parent of the current node. If a node satisfies the following condition, then it is the current node’s parent:
- It lies in front of the current node in the pre-order.
- If current value is within the node’s left interval and the node’s left child is undefined,
- or if current value is within the node’s right interval and node’s right child is undefined.
Since we need to keep trace of the nodes from top to bottom and then go to another branch, stack is a good choice to hold the nodes and the corresponding intervals during the reconstruction. Iterate the preorder input, find the parent in the stack, then recalculate the interval and push node’s value, the interval into the stack and continue the iteration. Here is the solution:
Review
The Google File System
- Design Assumption: The GFS design is based on the observations file system workloads from Google.
- The system is built from many inexpensive commodity Linux machines – failures are very common.
- FS stores modest number of large files (size in MB or even GB).
- Two types of common workloads: large amount of streaming read and small amount of random reads; large sequential write(append). GFS is to optimize those two workloads rather than providing an more efficient FS on all utility.
- FS should provide well-defined semantics for conncurrent write.
- High throughput rather than low latency is the most important requirement of GFS applications.
- Interface
- GFS doesn’t provide APIs that are compliant with POSIX but adapts to hierachical FS and CRUD operations.
(GFS abstracts a layer above Linux file system to achieve their goals on performance, availability, reliability and scalability, so they provide GFS-style APIs. That means the clients are expected to be cooperate. However, the design of GFS guarantees the simplicity of client’s implementation.)
- Low cost snapshots of files / directories.
- GFS doesn’t provide APIs that are compliant with POSIX but adapts to hierachical FS and CRUD operations.
- Architecture
- Cluster: only one master and hundreds of chunk servers
(here it is chunk server not slave server because chunk is not a simple copy of master).
- Single master:
- Benefits: simplicity consideration
- Precondition: must minimize its involvement in reads and writes.
- Role: control center and store meta data.
- Chunk server:
- Benefits: scalable
- Role: data storage, consistency and atomicity of record append is acquired here.
- use large chunk size to reduce numbers of requests of clients and reduce network overhead when operations are batched and reduce the size of meta data stored in the master.
- Single master:
- Meta data:
- What is included: files and chunk namespace, mapping from files to chunks, location of chunk replicas
The first and second type of meta data is persistent as operation logs on master and remote machine. The third one is synchronized from chunk servers to master when starting up. The third one is synchronized from chunk servers to master when starting up. The third one is synchronized from chunk servers to master when starting up. The third one is synchronized from chunk servers to master when starting up. This makes sure that the meta data is small enough to fit in the master’s memory to achieve fast and efficient performance and enable the master to scan the state of the cluster periodically (like garbage collection, re-replication, rebalance)
- Chunk location: synchronize through heart-beat messages.
- Operation log:
- persisitent
- changes are not visible to clients until the logs are flushed.
- logs are batched to process to reduce the impact on the system throughput.
- What is included: files and chunk namespace, mapping from files to chunks, location of chunk replicas
- Consistency model:
- file namespace mutations are atomic.
- consistency and definedness.
- Cluster: only one master and hundreds of chunk servers
- System Interaction
- Lease and Muation Order: master assigns a primary from the chunk servers with a lease and primary decides the concurrent mutations order to apply on all chunks.
lease is a common tools to remain a system in some states for a while in stateless system.
- primary decides the sequential order of concurrent mutations.
- primary can ask for extension before expiration if ongoing mutations on this chunks exist.
- a new primary is granted if the current primary fails. Heart-beat messages can bring the state of chunks to the master.
- Data Flow:
- data is pushed along a linear chain to fully utilize the outbound bandwidth of a chunk server.
- the chain is formed by chunk servers and they all link to the closest chunk server by IP address.
- pipelining - start forwarding as soon as data is received.
something like pipeline in Linux
- Atomic Record Appends:
- usage: multiple producers/single consumer model; merged result from clients.
- lazy allocation for chunk space:
- Snapshots
- Lease and Muation Order: master assigns a primary from the chunk servers with a lease and primary decides the concurrent mutations order to apply on all chunks.
- Master Operation
- Namespace Management and Locking:
- no data structure for directory
linear table for all files and directories
- requires many locks recursively
the table is in the memory, so it is fast enough to apply locks recursively.
- when creating files, read locks are applied on upper level directories and write lock on the creating file.
no other update to directories is permitted then and duplicate creation of file is not allowed.
- locks are applied at some orders to prevent deadlock
break the occurrence of circular wait to prevent deadlock
- no data structure for directory
- Replica Replacement Policy
- to maximize data reliablity and availability, maximize network bandwidth utilization.
- spread data across racks rather than only across machine
- can exploit the aggregate bandwidth of multi-racks.
- Creation, Re-replication, Rebalancing
- Chunk Creation
- equalize disck utility
- prevent imbalance of chunk creation over time.
further for load balance, the more chunks on a chunk server, the higher possibility of the server be affected by the upcoming mutations.
- replica across racks
- Replica Re-replication
- happens when the availability falls to a specific level
- a chunk has been corrupted.
- prioritize the procedure
something really bad happens, make replication for those servers first.
- Rebalancing
- Chunk Creation
- Garbage Collection
- Mechanism
- deletion of a file is logged
- rename the deleted file to a hidden file
- removed it when master scans the metadata periodically (3 days)
- Reliable and simple solution:
- failures happen as chunk is created on some replica, or heart-beat messages from chunk server get lost.
now some chunk server is out of date or not known to master
- reclaimation is merged into regular background scanning of master to amortize the cost.
- when frequent deletions is the common workload or the disk resource is tight, the number of replicas and garbage collection are configurable.
- failures happen as chunk is created on some replica, or heart-beat messages from chunk server get lost.
- Stale Replica Detection
- Use chunk version!
- Mechanism
- Namespace Management and Locking:
- Fault Tolerance and Diagnosis
- High Availablity
- Fast Recovery: resend requests or reconnected whenever experience something goes wrong
- Replication
- Chunk Replication
- Replication re-replication
- checksum, parity code
- Master Replication
- If the process of primary master can’t be restarted, monitoring infrastructure outside GFS will start to look for another master
- Master has some “shadow” duplicates.
- “Shadow” master is read-only during recovery. They have to replay the logs to keep track of all metadata until the new primary master is booted.
- Chunk Replication
- Data integrity
- High Availablity
Tips
Sharing
Chaos engineering: Chaos Engineering: Why the World Needs More Resilient Systems And google SRE book on monitoring
tags: