Introduction to DataSpaces
202501101109
Status: #idea
Tags: DataSpaces
Introduction to DataSpaces
- Consists of a set of nodes whose memory is pooled and used as a shared buffer.
- Allows applications to share memory directly.
Architecture

Communication Layer (Margo RPC Library)
- Uses Margo
- Handles the actual data movement, serialisation, etc.
- Also handles RPC
Distributed In-Memory Object Store Layer
- Allocates memory buffers on distributed compute nodes and uses them as an in-memory object store
- A distributed hash table (DHT) is built to index the data
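To make the idea of a DHT-indexed in-memory store concrete, here is a minimal toy sketch. It is not DataSpaces code and makes no claim about how DataSpaces actually hashes or partitions data: `ToyStagingStore`, its methods, and the choice of hashing a name/version pair are all assumptions for illustration. Each "node" is just a Python dict standing in for a memory buffer.

```python
import hashlib


class ToyStagingStore:
    """Toy model: each 'node' is a dict acting as that node's memory buffer."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _owner(self, name, version):
        # Hash the key to pick the node responsible for it (the DHT lookup).
        digest = hashlib.sha256(f"{name}:{version}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, name, version, data):
        # Store the object in the buffer of the node that owns the key.
        self.nodes[self._owner(name, version)][(name, version)] = data

    def get(self, name, version):
        # The same hash locates the object again; no global lookup table needed.
        return self.nodes[self._owner(name, version)].get((name, version))


store = ToyStagingStore(num_nodes=4)
store.put("temperature", 0, [20.5, 21.0, 19.8])
print(store.get("temperature", 0))
```

The point of the sketch is that any client can compute which node holds an object from the key alone, which is what lets the index scale with the number of nodes.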
Question
What is the ‘query engine’ that is described?
Part of DataSpaces v1, not v2
Core Service Layer
Coordination and Data Sharing Service
- Creates the in-memory object store
- Manages the memory buffers
Scalable Messaging Service
- Pub-Sub service
- Users can:
  - subscribe to events in regions of interest
  - define event-triggered actions
  - be notified when an event occurs
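A rough sketch of what region-based pub-sub with event-triggered actions could look like. Everything here (`ToyBroker`, `Subscription`, `overlaps`, and representing a region of interest as an inclusive bounding box) is a hypothetical illustration, not the DataSpaces messaging API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A region is (lower corner, upper corner), both inclusive.
Box = Tuple[Tuple[int, ...], Tuple[int, ...]]


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes intersect in every dimension."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi))


@dataclass
class Subscription:
    region: Box
    action: Callable[[str, Box], None]  # the event-triggered action


class ToyBroker:
    def __init__(self):
        self.subs: List[Subscription] = []

    def subscribe(self, region: Box, action: Callable[[str, Box], None]) -> None:
        self.subs.append(Subscription(region, action))

    def publish(self, var: str, region: Box) -> int:
        # Notify every subscriber whose region of interest overlaps the event.
        fired = 0
        for sub in self.subs:
            if overlaps(sub.region, region):
                sub.action(var, region)
                fired += 1
        return fired


notifications = []
broker = ToyBroker()
broker.subscribe(((0, 0), (9, 9)), lambda var, box: notifications.append((var, box)))
broker.publish("pressure", ((5, 5), (14, 14)))    # overlaps the subscription: action runs
broker.publish("pressure", ((20, 20), (29, 29)))  # outside the region: ignored
```

Here the subscriber's action is just appending to a list; in a real workflow it would be whatever processing should be triggered by new data in that region.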
Mapping & Scheduling Service
- Manages the placement of data (both in-situ and in-transit)
  - In-situ data processing occurs on the same processor cores that run the simulation
  - In-transit data processing runs on dedicated compute nodes of the staging area
- Supports data-centric mapping and scheduling of tasks
  - If the same data is used in subsequent steps, the tasks are scheduled on the same node (in order to reduce network usage)
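The data-centric placement rule can be sketched as a small function. This is only a toy under stated assumptions: `schedule`, the task tuples, the node names, and the round-robin fallback are all made up for illustration and do not reflect the actual DataSpaces scheduler.

```python
def schedule(tasks, data_location, nodes):
    """Assign each task to the node that already holds its input data.

    tasks: list of (task_name, input_name, output_name)
    data_location: dict mapping data name -> node; updated as outputs are produced
    nodes: fallback nodes, used round-robin when the input is not staged anywhere
    """
    placement = {}
    fallback = 0
    for task, inp, out in tasks:
        node = data_location.get(inp)
        if node is None:
            # Input is not staged yet: pick any node (hypothetical fallback policy).
            node = nodes[fallback % len(nodes)]
            fallback += 1
        placement[task] = node
        # The output stays where it was produced, so later tasks follow the data.
        data_location[out] = node
    return placement


placement = schedule(
    tasks=[
        ("filter", "raw", "filtered"),
        ("stats", "filtered", "summary"),
        ("render", "mesh", "image"),
    ],
    data_location={"raw": "staging-1"},
    nodes=["staging-0", "staging-1"],
)
print(placement)
```

`filter` and `stats` both land on `staging-1` because the data they consume is already there, so nothing has to cross the network between those two steps.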
Programming Abstraction Layer
- Builds on existing parallel programming paradigms like MPI and PGAS
- Core APIs
  - put-get operators
  - pub-sub operators
    - User can register events
    - Event-triggered actions can be defined
- Workflows are defined as a DAG
  - Each node represents a data operation
  - Each edge represents dataflow
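As a small illustration of a workflow expressed as a DAG, the sketch below uses Python's standard `graphlib` module to derive a valid execution order. The operation names are hypothetical, and this says nothing about how DataSpaces itself represents or executes workflows.

```python
from graphlib import TopologicalSorter

# Each key is a data operation (a DAG node); its set lists the operations
# whose output it consumes (the incoming dataflow edges).
workflow = {
    "simulate": set(),
    "reduce": {"simulate"},
    "visualize": {"reduce"},
    "checkpoint": {"simulate"},
}

# static_order() yields the operations so that every producer comes before
# its consumers; it raises graphlib.CycleError if the graph is not a DAG.
order = list(TopologicalSorter(workflow).static_order())
print(order)
```

Operations with no path between them, such as `reduce` and `checkpoint` here, have no ordering constraint and could in principle run concurrently.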