Add High Availability (HA) support #164

@gruyaume

Enhancement Proposal

Abstract

Ella Core is easy to deploy and operate, but it currently cannot scale its user plane capacity or survive node faults. This specification outlines an approach to implementing high availability and scaling in Ella Core. The recommended approach uses Raft to replicate persistent data between nodes and a modified PFCP protocol that lets the Ella Core leader unit instruct a different unit to forward packets.

State sharing and Leadership

Nodes share persistent data via the Raft consensus algorithm. dqlite will be used as the embedded, replicated SQL engine to back the persistent data across the cluster.

API changes

Raft Cluster

  • PUT api/v1/cluster: Edit cluster configuration
    • enabled
    • n2_vip
  • POST api/v1/cluster/stepdown: Step down from leadership

Raft peers

  • POST api/v1/cluster/peers: Add a Raft peer
  • GET api/v1/cluster/peers: List Raft peers
  • DELETE api/v1/cluster/peers/<peer id>: Delete a peer

UI changes

We should add a new Cluster page to the UI.

A new cluster communication endpoint

Ella Core will expose a new network endpoint (using a dedicated cluster address/port) for inter-node communication.
All communication between cluster nodes will be secured using mutual TLS. Each node will have its own certificate and private key, and nodes will validate each other’s certificates before accepting connections.

User Plane Selection with a modified PFCP protocol

As the number of Core nodes increases, the user plane capacity should also increase. To implement User Plane scaling, the leader unit should select which unit will handle the user plane traffic for a given session.

In 5G networks, the PFCP protocol is used between the SMF and the UPF to manage PDU tunnels in the UPF. Here, we propose a simplified PFCP protocol over HTTPS, used between the leader and follower units. Only the "session" part of the protocol needs to be implemented, as "associations" can be assumed from nodes already being part of the cluster.

Further Information

Load Balancing

Load balancing is an optional part of scaling. Users may place an HTTPS load balancer in front of the nodes' API services so that they always reach the leader node at the same address, even if the leader changes. An NGAP load balancer can also be placed between the radios and the Ella Core units so that gNodeBs always send signaling information to the leader node.
