Process, Virtualisation, Client, Server
202401301229
Status: #idea
Tags: DC
- Abstractions
- To the process
- Contiguous memory is available
- API calls and system calls
- To the OS
- ISA abstracts the hardware away
- Privileged ISA
- Uses control registers
- Traps and interrupts
- System clock
- MMU access (TLB and page tables)
- I/O device access
Process, Threads
- Virtual processors in software
- Processor
- Provides a set of instructions
- TODO: Slide 7 (context)
- Thread
- Minimal software processor in whose context a series of instructions can be executed
- TODO: Slide 7 (context)
- Process
- A software processor in whose context one or more threads can be executed
- TODO: Slide 7 (context)
- Context switching
- Threads
- Share same address space
- Switching can be done independently of the OS
- Processes
- Kernel trapping
- OS is involved
Client- and Server-side Threading
Client-side threading
- Hide network latency
- Each call is a blocking HTTP request
- Multiple RPCs
- Several calls to several servers
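A minimal sketch of hiding latency on the client side, assuming each call blocks for a fixed time. `blocking_call` is a hypothetical stand-in for a blocking HTTP request or RPC (it only sleeps), and `fetch_all` is an illustrative name, not something from the slides.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_call(server: str) -> str:
    """Hypothetical stand-in for a blocking HTTP request or RPC."""
    time.sleep(0.05)  # simulated network latency
    return f"reply from {server}"

def fetch_all(servers):
    """Issue one blocking call per server, each on its own thread."""
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        # map() keeps replies in the same order as the servers list
        replies = list(pool.map(blocking_call, servers))
    return dict(zip(servers, replies))
```

With several servers the calls overlap, so the client waits roughly one latency period instead of one per call.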
Server-side threading
- Improve performance
- Hide latency
- Dispatcher-worker pattern
- TODO: Image from slide 8
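The dispatcher-worker pattern can be sketched as follows, assuming an in-memory queue instead of a real network socket. `handle` and `serve` are hypothetical names, and the "request" is just a string.

```python
import queue
import threading

def handle(request: str) -> str:
    """Hypothetical request handler; real work (disk, DB) would go here."""
    return request.upper()

def serve(requests, n_workers=3):
    """Dispatcher puts requests on a queue; worker threads take them off."""
    work = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: no more work for this worker
                break
            req_id, request = item
            reply = handle(request)
            with lock:
                results[req_id] = reply

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    # The dispatcher role: hand each incoming request to the pool.
    for req_id, request in enumerate(requests):
        work.put((req_id, request))
    for _ in workers:
        work.put(None)
    for w in workers:
        w.join()
    return [results[i] for i in range(len(requests))]
```

While one worker blocks on a slow request, the dispatcher can keep accepting new ones and the other workers keep running.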
Virtualisation
TODO: image from slide 9
Hypervisor -> Also called VMM (Virtual Machine Monitor)
Benefits
- Server consolidation
- Data center management
- High availability -> Automatic restart
- Disaster recovery
- Fault tolerance
- Easy to test and develop in
VMs vs Containers
TODO image from slide 11
Hypervisor (VMM)
- Properties it must have
- Equivalence
- App should have same behaviour as it does when running on native platform directly
- VMMs should not change the functionality of the app
- Efficiency
- Performance of the app should remain more or less the same
- A significant part of instructions should be executed directly by the hardware without any VMM supervision, to make its execution comparable to bare metal execution
- Resource control
- VMM should have full control of the resources allocated
- Additional properties
- Isolation
- If a VM fails, it should not impact another VM running on the same VMM
- If a VM needs too many resources, it should not impact other VMs in the environment
- It is done using checkpointing
- Encapsulation
- Availability: Through live migration
TODO: @Arnav Gupta
Binary Translation
- Translating instructions from one ISA (of the VM) to another (of the host machine)
- Why not do binary translation before execution (static)?
- In a CISC ISA (x86), instructions have variable length
- Data is interspersed with instructions
- Computing the target of a jump is difficult
- Dynamic translation
- Translate during execution
- Cache the translated code for repeated use
- Ideally, translation is done before PC reaches the block (so that it is already present in the cache)
A block is a sequence of instructions such that, once control enters it, every instruction in it is guaranteed to execute
Eg. Jump and branch instructions will end a block
- TODO: Images from slide 16,17 (2/6/24)
- SPC is Source Program Counter
- Identity translation
- No change in the instruction. Copied as is.
- Inline translation
- Replace an instruction with a call to a special routine
- Eg. `cli` -> `call HANDLE_CLI`
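A minimal sketch of dynamic translation with a translation cache, assuming a toy ISA where each instruction is a string. The set of block-ending opcodes, `find_block`, and the `Translator` class are illustrative assumptions; only the `cli` to `call HANDLE_CLI` rewrite comes from the notes.

```python
# Opcodes that end a block in this toy ISA (assumption for illustration).
BLOCK_ENDERS = ("jmp", "jz", "jnz", "call", "ret")
# Inline translation table: instruction -> replacement call.
INLINE = {"cli": "call HANDLE_CLI"}

def find_block(program, spc):
    """Return the block starting at source program counter `spc`."""
    end = spc
    while end < len(program):
        opcode = program[end].split()[0]
        end += 1
        if opcode in BLOCK_ENDERS:
            break
    return program[spc:end]

class Translator:
    def __init__(self, program):
        self.program = program
        self.cache = {}   # SPC -> translated block
        self.misses = 0   # how many blocks had to be translated

    def translated(self, spc):
        """Translate the block at `spc` on first use, then reuse the cache."""
        if spc not in self.cache:
            self.misses += 1
            block = find_block(self.program, spc)
            # Identity translation unless the instruction is in INLINE.
            self.cache[spc] = [INLINE.get(ins, ins) for ins in block]
        return self.cache[spc]
```

Calling `translated(0)` twice translates the block once; the second call is served from the cache.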
More efficient code: VMWare's Trick
- Replacing privileged instructions with a callout
- Replaces trap-and-emulate with a callout mechanism, as trap-and-emulate is expensive
- (Bad) Alternative solution is:
- VM is in ring 3
- `cli` will fail
- So the VMM needs a trap handler which will run every time to handle this issue
- Binary translation avoids this
- Emulator routine
- Change mode to privileged
- Check privilege level in VM
- Emulate instruction
- Compute target
- Restore mode to user
- Jump to target
- VMWare's approach summary:
- Perform simple translation in translation blocks
- For direct jumps and function calls -> Transfer control to translated address
- For indirect jumps and function returns -> Use the `lookup()` function, which converts an address in guest code to a translated address
- For privileged instructions -> Replace privileged instructions with callouts to the VMM
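The six emulator-routine steps listed above can be sketched as a callout for `cli`, assuming a toy virtual CPU. `VirtualCPU`, its fields, and `handle_cli` are hypothetical names for illustration and do not reflect VMware's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VirtualCPU:
    guest_ring: int = 0              # privilege level the guest believes it has
    interrupts_enabled: bool = True  # virtual interrupt flag
    pc: int = 0                      # address in translated code
    host_mode: str = "user"          # mode the translated code runs in
    fault: Optional[str] = None      # fault to deliver to the guest, if any

def handle_cli(vcpu: VirtualCPU, return_addr: int) -> VirtualCPU:
    vcpu.host_mode = "privileged"        # 1. change mode to privileged
    if vcpu.guest_ring == 0:             # 2. check privilege level in VM
        vcpu.interrupts_enabled = False  # 3. emulate the instruction
    else:
        # A real VMM would inject the fault into the guest; here we only record it.
        vcpu.fault = "general protection"
    target = return_addr                 # 4. compute target
    vcpu.host_mode = "user"              # 5. restore mode to user
    vcpu.pc = target                     # 6. jump to target
    return vcpu
```

The point of the callout is that the virtual interrupt flag changes, not the physical one, and no hardware trap is taken along the way.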
Overheads of Full Virtualisation
- Binary translation requires multiple calls by the hypervisor to translate a single guest call
- Guest OS is optimised with the assumption of direct hardware access
- Memory virtualisation requires software handling of page tables, which is inefficient
- IO virtualisation is complex
- Forfeit Direct Memory Access (DMA) and other hardware optimisations available on bare metal
Paravirtualization
- What if we require that guest OS is modified to run in the VMM?
- This is called paravirtualization
- Requires a modified guest OS to make it hypervisor-aware
- Modified device drivers are aware of virtualisation
- Instead of privileged system calls, guest OS makes hypercalls directly to the hypervisor (No need for BT)
- Gives excellent performance
- Challenge -> Difficult with proprietary OSes and VMMs
- Modifying guest OSes for Xen
- Linux: 1.36% of kernel (including device drivers)
- Windows XP: 0.04% of the kernel
- Xen pioneered the idea of paravirtualization
- Replace privileged operations with hypercalls
- Defer most memory management to the VMM
- If the guest executes a privileged instruction, crash it -> Makes Xen very lightweight
- Some interrupts are forwarded directly, without Xen's intervention (i.e. there is no equivalent hypercall)
- Eg. `int 0x80` -> System calls
TODO image from "Solution 2 - Paravirtualization"
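A minimal sketch of the hypercall idea, assuming a toy hypervisor with a dispatch table. The class, the hypercall name `update_page_table`, and its arguments are hypothetical and are not Xen's real interface.

```python
class GuestCrashed(Exception):
    """Raised when the toy hypervisor kills a misbehaving guest."""

class ToyHypervisor:
    def __init__(self):
        self.page_tables = {}  # guest id -> {virtual page: physical frame}
        # Dispatch table: hypercall name -> handler (names are illustrative).
        self.hypercalls = {"update_page_table": self._update_page_table}

    def hypercall(self, name, *args):
        """Entry point a modified guest OS calls instead of a privileged instruction."""
        return self.hypercalls[name](*args)

    def _update_page_table(self, guest_id, vpage, frame):
        # Memory management is deferred to the VMM, which owns the mapping.
        self.page_tables.setdefault(guest_id, {})[vpage] = frame
        return 0

    def privileged_instruction(self, guest_id, instr):
        # No trap-and-emulate path: the guest is simply crashed.
        raise GuestCrashed(f"{guest_id} executed privileged instruction {instr!r}")
```

A guest that has been ported calls `hypercall(...)`; one that still issues a raw privileged instruction is terminated, which is what keeps the hypervisor small.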
Hardware Assistance for Virtualization
- Allows virtualization of unmodified guest OS without resorting to BT
- Introduces 'root' and 'non-root' CPU operating modes
- Each mode has its own rings 0-3
- VMM runs in root mode (below ring 0)
- Guest OS runs in ring 0 of non-root mode
- All of its privileged system calls are trapped to the hypervisor
- CPU is aware of memory virtualization
- Memory mapping is done by the MMU (hardware)
- Page faults are also handled by MMU
- The privileged x86 instructions are now allowed only in VMX root mode, i.e. only the VMM can execute them
- CPU understands the VM Control Structure (VMCS on Intel) or VM Control Block (VMCB on AMD)
- A data structure in memory to maintain the CPU state
- 1 VMCS for each VM, one for the monitor
- CPU is told the physical address of each VMCS
- Problem
- Some operations are much slower when using `VMEXIT` vs BT
Compute intensive workloads have similar performance in both hardware virtualisation and BT (because most calls are identity translated)
Memory operations are much faster with hardware virtualization
IO is significantly worse with full virtualization. It requires hardware-assisted IO virtualisation (like SR-IOV)
- TODO image from slide 30 (2/6/24)
Intel VT-x: VM Entry & Exit
- VM Entry: VMM -> Guest OS
- Enters VMX non-root operation
- Loads guest state and exit criteria from VMCS
- `VMLAUNCH` instruction used on initial entry
- `VMRESUME` used on subsequent entries
- VM Exit: Guest OS -> VMM
- `VMEXIT` instruction used
- Enters VMX root operation
- Saves guest state and loads VMM state from VMCS
- Mostly handled by the hardware
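The entry/exit cycle above can be sketched as a toy state machine, assuming the VMCS is a plain record with a guest area and a host (VMM) area. `VMCS`, `ToyCPU`, and the string labels are illustrative; real hardware performs these steps itself and the actual VMCS layout is far richer.

```python
from dataclasses import dataclass

@dataclass
class VMCS:
    guest_state: dict      # registers loaded on VM entry
    host_state: dict       # VMM registers loaded on VM exit
    launched: bool = False

class ToyCPU:
    def __init__(self, host_state):
        self.mode = "root"
        self.state = dict(host_state)
        self.trace = []    # which entry/exit events happened, in order

    def vm_entry(self, vmcs: VMCS):
        # Initial entry uses VMLAUNCH, later entries use VMRESUME.
        self.trace.append("VMRESUME" if vmcs.launched else "VMLAUNCH")
        vmcs.launched = True
        self.state = dict(vmcs.guest_state)  # load guest state from VMCS
        self.mode = "non-root"

    def vm_exit(self, vmcs: VMCS, reason: str):
        vmcs.guest_state = dict(self.state)  # save guest state into VMCS
        self.state = dict(vmcs.host_state)   # load VMM state from VMCS
        self.mode = "root"
        self.trace.append(f"exit:{reason}")
```

A VMM built on this would loop: enter the guest, handle the exit reason in root mode, then re-enter.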
Need for Ongoing Dynamic VM Placement
TODO: Image from slide 39 (8/2/24)
Scenario 1: New VMs
- Where to place the new VM in the cluster
Scenario 2: Consolidation
- Move a VM from one system to one which is very lightly loaded
- Optimises energy usage
Scenario 3: Load balancing
- Move VM from one heavily loaded machine to another with more available resources
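A minimal sketch of two of these placement decisions, assuming each host is described by a single `(capacity, used)` pair. The "most free capacity" policy and the function names are assumptions for illustration; the slides do not prescribe a specific algorithm.

```python
def place_new_vm(hosts, demand):
    """Scenario 1: pick the host with the most free capacity that still fits the VM."""
    free = {name: cap - used for name, (cap, used) in hosts.items()
            if cap - used >= demand}
    if not free:
        return None  # no host can take the VM
    return max(free, key=free.get)

def pick_migration(hosts):
    """Scenario 3: suggest moving a VM from the most to the least utilised host."""
    utilisation = {name: used / cap for name, (cap, used) in hosts.items()}
    source = max(utilisation, key=utilisation.get)
    target = min(utilisation, key=utilisation.get)
    return source, target
```

For example, with `hosts = {"h1": (16, 14), "h2": (16, 4), "h3": (8, 1)}`, a new VM needing 4 units goes to `h2`, and load balancing suggests moving work from `h1` to `h3`.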
Containers
- Lightweight version of a VM
- Often used to wrap an application
- Namespaces -> Processes in a container are given their own view of identifiers (like PIDs)
- Union file system ->
- Control groups -> Resource restrictions can be imposed on a collection of processes
- TODO Image from slide 34 (2/8/24)
As containers depend directly on the underlying OS, migrating them in a heterogeneous environment ranges from far from trivial to simply impractical, just as with process migration.
Stateless Server
- Never keep track of the status of the clients
- Consequences
- Extremely scalable
- State inconsistencies are reduced
- Possible loss of performance
- Web server is stateless but it maintains state indirectly
- Maintains logs of the clients' requests
- Let the client (browser) remember -> Cookies
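A minimal sketch of a stateless handler that keeps state on the client via cookies, assuming cookies are passed in and out as plain dictionaries. `handle_request` and the `visits` cookie are hypothetical names, not a real web framework API.

```python
def handle_request(path, cookies):
    """Return (body, cookies_to_set) without storing anything on the server."""
    # All per-client state arrives with the request itself.
    visits = int(cookies.get("visits", "0")) + 1
    body = f"{path}: visit {visits}"
    # The updated state is handed back for the browser to remember.
    return body, {"visits": str(visits)}
```

Because the function holds no per-client data, any server replica can answer any request, which is where the scalability comes from.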