Enhancement Proposal
Abstract
Ella Core is easy to deploy and operate, but it currently cannot scale its user plane capacity or survive faults. This specification outlines an approach to implement scaling in Ella Core. The recommended approach uses Raft to share persistent data between nodes, and a modified PFCP protocol that lets the Ella Core leader unit instruct another unit to forward packets.
State sharing and Leadership
Nodes share persistent data via the Raft consensus algorithm, which also elects the leader node. dqlite will be used as the embedded, replicated SQL engine that stores this persistent data across the cluster.
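Raft commits a write only once a majority of voting nodes has stored it, which is what lets the cluster keep working while some nodes are down. The arithmetic below is a minimal sketch of that majority rule; the function names are illustrative and are not part of Ella Core or dqlite.

```go
package main

import "fmt"

// quorum returns how many voters must acknowledge a log entry before
// Raft considers it committed: a strict majority.
func quorum(voters int) int {
	return voters/2 + 1
}

// tolerableFailures returns how many voters can be unavailable while the
// remaining ones still form a majority and can elect a leader.
func tolerableFailures(voters int) int {
	return voters - quorum(voters)
}

func main() {
	for _, n := range []int{1, 2, 3, 4, 5} {
		fmt.Printf("voters=%d quorum=%d tolerates=%d\n", n, quorum(n), tolerableFailures(n))
	}
}
```

An even number of voters adds no fault tolerance over the next smaller odd number (four voters tolerate one failure, like three), which is why Raft clusters are usually sized at three or five voting nodes.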
API changes
Raft Cluster
- PUT api/v1/cluster: Edit cluster configuration
  - enabled
  - n2_vip
- POST api/v1/cluster/stepdown: Step down from leadership
Raft peers
- POST api/v1/cluster/peers: Add a Raft peer
- GET api/v1/cluster/peers: List Raft peers
- DELETE api/v1/cluster/peers/<peer id>: Delete a peer
UI changes
We should add a new Cluster page to the UI.
A new cluster communication endpoint
Ella Core will expose a new network endpoint (using a dedicated cluster address/port) for inter-node communication.
All communication between cluster nodes will be secured using mutual TLS. Each node will have its own certificate and private key, and nodes will validate each other's certificates before accepting connections.
User Plane Selection with a modified PFCP protocol
As the number of Core nodes increases, the user plane capacity should increase with it. To implement user plane scaling, the leader unit selects which unit will handle the user plane traffic for a given session.
In 5G networks, the SMF uses the PFCP protocol to manage PDU session tunnels in the UPF. Here, we propose a simplified PFCP protocol over HTTPS between the leader and follower units. Only the session procedures need to be implemented; PFCP associations are unnecessary because nodes already know each other as cluster members.
Further Information
Load Balancing
Load balancing is an optional part of scaling. Users may place an HTTPS load balancer in front of the nodes' API services so that they always reach the leader node through the same address, even if the leader changes. An NGAP load balancer can also be placed between the radios and the Ella Core units so that gNodeBs always send signaling to the leader node.
Reference
- Raft: https://raft.github.io/
- dqlite: https://github.com/canonical/dqlite
- Vault raft configuration: https://developer.hashicorp.com/vault/docs/configuration/storage/raft