Request based pub/sub architecture
Pub/sub
Pub/sub makes it easy to get updates on specific topics as soon as they happen.
A pub/sub system need to keep track of all subscribers (S) and their keys (K) that they subscribe to.
When a value (V) is updated the subscribers are directly notified.
A data structure like the following are required.
Subscribers: [ K -> [S] ]
When a value is published ...
K -> V => K -> V'
The new value V' is sent to subscribers [S] if K exist in Subscribers.
This is very powerful and fast if you have seldom updates. But there might be issues.
- Updates occur on update and not on request i.e. every update will result in a search in the subscribers list
- Change requests cannot be done after updates has occurred
- Typically subscribers need to notified directly which require subscribers to be always connected (e.g. over web socket)
- State is stored in the server
Request based change query
The proposed alternative is to let clients keep the state of the subscriptions and request changes on demand based on local state instead.
The system consist of the following
- An unique id ("key") which represent the asset (K)
- The asset referenced by the key ("value") (V)
- A hash of the value which allow changes to be detected more efficiently (H)
The idea is to retrieve all values that have changed its values for a set of keys.
This means that you don't need to subscribe to get updates. If you have a set of keys and corresponding hash values you are able to get all keys that have changed on demand request.
Keys (K), values (V) and hashes (H) of values are actually stored on the server.
Values: [ K -> V ]
Hashes: [ K -> H ]
When a value is published ...
K -> V => K -> V'
The new value is stored and a new hash of the value is calculated.
Changes are only determined on request from clients.
A request is able to determine which keys (K2) with values changed for a set of keys (K1) and its locally stored hashes.
[(K, H)] -> [K]
All hash values for requested keys need to looked up.
Each hash is compared and ones that differ is returned.
The advantages are the following:
- Changes are controlled by clients rather than server
- Multiple changes can be consolidated in one operation which improves efficiency
- Data is stored on the server, but subscriptions are unnecessary
- Many updates and seldom change queries should be efficient
The disadvantages are the following:
- Even no updates will require the same effort
- Server need to store values
All keys must be sorted to be efficient.
Pub/sub with changes queued for clients
Yet another middle ground is keep subscriptions i.e. a list of keys and hashes for each client and update result on update instead of on request.
A subscription for a client (C) is a set of keys and its hashes from last update request.
Subsription: C -> [(K, H)]
When a value is updated and the changed key and hash is stored for each client that subscribes to it.
Each key has a list of clients that are to be noted. Most often numer of clients are low.
Subscribers of key: K -> [C]
Updates are stored as a list of keys and hashes for updated keys.
Updates: C -> [(K, H)]
In this variant a request for changes keys will only return the pre-generated list of changed keys for the specific client.
If most keys only have one or two clients this should be quite efficient. A value update only require one extra lookup of clients interested in the key update.
This solution makes change requests very efficient. So if change requests happen more often than value changes this may be more efficient.
References
- Redis Keyspace Notifications
- Benchmarking LevelDB vs. RocksDB vs. HyperLevelDB vs. LMDB Performance for InfluxDB
- LMDB vs. LevelDB
- Badger
- Sanakrirja DB - New, fast rust based database
- ALICE
- Database of databases
- The details of Figma’s multiplayer system - not really directly related, but interesting from distributed system perspective
Tags: MQTT, IoT, Gateway