Architecture of Distributed Systems

202401181241
Status: #idea
Tags: DC

Architecture of Distributed Systems

Layered Architecture Style

Each layer provides a specific service
There should be NO dependency between the layers (with the exception of the interface)
Very common in Distributed Information Systems
TODO image from slide 4
Hybrid layering
- Allows you to skip a layer for performance layers
An upcall can be similar to an interrupt. Hence, the handle would be ISR. Upcall is raising the interrupt.
- OS defines placeholders called handlers, which it calls
- The OS doesn't know who it calls.
$Layer N$ does not know anything about the implementation of $Layer N - 1$ , so you can change the implementation
Eg. Sockets in NetProg. How TCP is implemented is hidden. All you know is that you selected you want to use TCP on a particular port
Eg. Search engine (TODO pic from slide 6)

Service Oriented Architecture

Object-based

TODO image form slide 7
Performs 2 compilations
1. Normal
2. Second compiler
  - Creates code for proxy on client. It has same method as the remote object. The code is code for network communication.
  - Creates a skeleton for the server code
  - Eg. RMI compiler in Java

Resource-based

Used in Azure
REST architecture
- Trivia: Created by a university student as a part of PhD thesis
All services offer the same/similar interface
Server is stateless. It does not remember previous requests. Helps in scaling
Operations are similar to CRUD (in DBMS)
- PUT - Create
- GET - Read
- POST - Update
- DELETE - Delete
Eg. Amazon S3
- The objects are files
- Buckets are similar to directory. But you can only put an object in a bucket. No bucket in a bucket
- Not using REST APIs allows you to use parameters, where you can have compile-time checks to catch errors
- Non-REST APIs are easier to understand the function of each API.
- Published interface -> More stable and generic interface, which does not change. So server can change a lot of things, and overall structure remains the same
- Neither is strictly better

Microservice Architecture

Develop an application as a suite of small components
- Implements a business capability and is independently deployable
- Independently replaceable and upgradable, hence loosely coupled
Published interface (does not change)
Each microservice can be replicated a different number of times.

Design Principle

Create a service that is based on business functionality
Smart endpoint, dump pipe
- Put all your intelligence in the module itself.
- Do not put intelligence in the pipe (where the data flows)
- Do not assume that there is any intelligence in the pipe. Keep the communication as simple as possible

Design of Data Persistence

Multiple types of data store (not necessarily relational)
TODO Slide 13 of arch

System as a collection of processes
Event based - Events are produced and subscribed to

	Temporally Coupled	Temporally Decoupled
Referentially Coupled	Direct (Phone call)	Mailbox (E-Mail)
Referentially Decoupled	Event-based	Shared data space

Info

Temporally coupled processes need to be running at the same time

Note

Kafka is a light-weight and extremely scalable event-based data streaming tool

Example: Linda Tuple Space

3 simple opertations
- in(t): Remove tuple matching template t
- rd(t): Obtain a copy of a tuple matching template t
- out(t): Add tuple t to the tuple space. Calling it twice will create 2 copies
in and rd are blocking operations. Caller will block till tuple is found/becomes available

Middleware

A software layer, on top of the OS of distributed nodes to assist distributed applications
Provides
- Inter-app communication
- Security
- Accounting
- Failure masking (recovery)
Design patterns
- Wrapper: Component that offers and interface for the clinet and solves incompatible interface problem
  - Amazon S3: Storage is RESTful API or boto library
- 1-1 communication
  - Legacy software use this
  - With $N$ applications $O (N^{2})$ adapters
- Brokers
  - With $N$ applications: $O (N)$ adapters
  - Sophisticated brokers are called application servers

Interceptors

TODO: Image of Slide 19

Example: Network File System (NFS)

Each NFS server provides a standardised view of its local file system
TODO Image from slide 4

Example: Web Server

TODO Image from slide 5

Common Gateway Interface (CGI)
Can also have server side scripting

<strong><?php ech $_SERVER['REMOTE_ADDR']; ?></strong>

Different Types of Distribution

Vertical distribution
- Layers are distributed in multiple machines
- Client server
Horizontal distribution
- Both client and servers are physically split and distributed in different machines
- Peer to peer
  - All processes are equal
  - Each process will act as a clinet and server

Attention

Each machine need NOT be running identical functions

Peer to Peer

Structured

Node adheres to a particular topology (ring, grid, tree)
- Efficient data lookup

Hypercube

Distributed hashing example
- P2P node stores <K, V> pair
- Key is the index to route to the node with the data
  TODO: Image of n-dimensional cube topology (Slide 7)

Chord/Ring

Each node has m bit ID
Only nodes in bold actually exist
Store data with the smallest ID $\geq K$
Each directed edge is a shortcut.
- They are created in a way S.T the shortest distance between $(u, v)$ is $O (\log N)$
Go to the node which is preceding 3, and closest to 3. It can only pick its neighbour, hence 1 is not picked.
$join (u)$
1. $v = lookup (u)$
2. Insert $u$ before $v$
3. $succ (u) = v$
4. Links to $v$ now go to $u$
5. Copy data from $v$ to $u$ whose $succ (k) = u$
$leave (u)$
1. $u$ informs departure to its $pred$ and $succ$
2. Copy its data to $succ (u)$

Unstructured

Random graph - Each node maintains a random list of neighbours with probability $P [< u, v >]$
Searching
- Flooding
  1. $u$ sends request for $d$ for all neighbours
  2. If a neighbour $v$ has seen it, it ignores
  3. If $v$ has $d$ , it sends it to the node who requested the data $d$ from it (need not be $u$ )
  4. Else, $v$ forwards the request to its neighbours
  - Flood to $d$ randomly choses neighbours
    - After $k$ steps, $R (k) = d \cdot (d - 1)^{k - 1}$ nodes will have been reached
    - With having $\frac{r}{N}$ fraction of nodes having data
    - We will have found the data if $\frac{r}{N} \cdot R (k) \geq 1$
- Random walk
  1. $u$ passes request for $d$ to randomly chooses neighbour $v$
  2. If $v$ does not have $d$ , it forwards it to one of its random neighbours
  - Probability that data is found after $k$ attempts $P [k] = \frac{n}{R} {(1 - \frac{r}{N})}^{k - 1}$
  - Expected number of nodes to be probed before finding the data $E [X] = Σ k P [k] \approx \frac{N}{r} when 1 << r << N$

Example: BitTorrent Example

TODO: Image from slide 11
Uses the tracker model
In an alternative implementation of BitTorrent, a node also joins a DHT to assist in tracking file downloads
- Each node acts as a tracker for a small set of files
- New peer will communicate only with those peers and not with the initial tracker
- Initial tracker for the requested file is looked up in the DHT through a magnet link
- DHT indexes the torrent file (and not the pieces of the file)

Hybrid Architecture: Edge-Cloud

Computation happens close to the cyber-physical system
Industry automation
- Each factory - Edge
- Cloud is used to connect multiple factories

References

Published Interface - Article by Martin Fowler in IEEE Software
Network File SystemPreviously on "The Bold Type." You are the future
Trackerless Torrents
How does DHT get bootstrapped?

Architecture of Distributed Systems

Layered Architecture Style

Service Oriented Architecture

Object-based

Resource-based

Microservice Architecture

Design Principle

Design of Data Persistence

Publish-Subscribe Style

Example: Linda Tuple Space

Middleware

Interceptors

Example: Network File System (NFS)

Example: Web Server

Different Types of Distribution

Peer to Peer

Structured

Hypercube

Chord/Ring

Unstructured

Example: BitTorrent Example

Hybrid Architecture: Edge-Cloud

References