Integrating AI and data to monetize 5G Cloud Platforms

As of Q1 2022 for 5G adoption we have just passed 600M mark and expected to hit 2.5B connections by 2025 that is almost 500M every year . If we combine IoT and Device ecosystem the scale can go horrendously big .One of the biggest advantage and also challenge that comes from this once in a life time opportunity is “scale” .

Simply putting in to context “automation” and using ML/AI is a must to achieve both Network SLA’s ,efficiency and Optimizing Network TCO

AI has potential in creating value in terms of enhanced workload availability and improved performance and efficiency for 5G and Telco Cloud . However the biggest problem when it comes to use “AI” and Machine learning Telco’s is “Data” and “data Models” because simply there is no standardization or model definition on how Telco systems including Infrastructure expose the “Information” to upper layers Since data sets are huge in this domain with n permutations therefore first step to normalize is the Use case driven normalization of data that can be consumed both by Network and Data science domains . This will enable to develop a future Telco that can detect and also self heal itself .

Understanding Data Integration Architecture

Considering the 5G architecture which is based on Open API and Horizontal services design A.K.A SBA the Data integration and using AI should be an easy problem that can be divided in following to define a pipeline

  1. Telemetry and data
  2. Each layer data exposure as an API starting from Baremetal and then extending upwards towards Cloud , SDN , NFVO , Assurance etc
  3. Data models and engines to disseminate information

However it is easier said than done because of many reasons including

  1. What will be key data sets
  2. how FCAPS of each layer can be dis-aggregated i.e dropping one layer data without confirming dependency is a kill

Business Architecture

In order to address this we need to understand and gain experience from other industries and SDO’s and to see how it can both be agreed and integrated in Telco Networks , this lead us to approach this as a use case driven approach and select those domains and business challenges that can deliver quick results

"Follow the Money to deliver use cases that can monetize 5G

We have analyzed lot of use cases both from academia and industry and compiled a complete list here

From this we infer there are just too many ways Telco’s are solving same problems and this is what make us understand that there should be clear definition of “data Models” and use cases that should be defined at first steps .

The most important of which are :

  1. Using Machine Learning to Detect Noisy Neighbors in 5G Networks.

2.Towards Black-Box Anomaly Detection in Virtual Network Functions

3. Causality Inference for Failure in NFV

4. Self Adaptive Deep Learning Based System for Anomaly Detection in 5G

5. Correlating multiple Events and Data in an Ethernet Network

This leads us to define following as first steps for AI and Intelligence as applied to Telco’s

Source: LFN acumos

Analysis

Data Lakes , Log analysis and correlation

Detection

Anomaly detection including pattern detection , trend and Multi layer correlation

Prediction

Intelligent prediction including capacity ,SLA , Scaling and Cloud KPIs

Generation

Measure data and Synthetize it using frameworks like eBPF

Data Monetization is first to make 5G Profitable

Adressing both the Data Architecture and Business Architecture is vital as different Telco’s including in cases different BU’s in same Customer take it differently and what makes it worst is manipulate and store data lakes using different forms i.e Infrastructure metrics , Agents , Databases which is hard to apply between different data sets and hence it is biggest issue to Monetize one key assets of 5G which is “data” and hence to define a pipeline that can be shared between all of tenants including vertical industry

The latest White Paper: Intelligent Networking, AI and Machine Learning

Next Steps of “Thoth”

As said before we are defining few key use cases in LFN project “Thoth” to learn and elaborate from there applying concepts of Events , Anomaly and Prediction across layers and first phase use cases are

  1. VM failure
  2. Container Failure
  3. Node Failure
  4. Link Failure
  5. Middle layer Link failure

The detailed list can be seen here Use cases

Advertisement

Improving Telco’s Enterprise reach through Slicing and RAN control

Growing business in 5G era largely depends on ecosystem enablement and on idea that Telco’s can build a future Infrastructure that can deliver business outcomes for not only traditional Telco customers but also for broader verticals may it be Manufacturing ,Mining, Retail , Finance , Public safety , Tourism or eventually anything .

This means “Programmable” and “Automated” infrastructure is the base to achieve any such business outcome . Applying this to Telco’s 5G and Transformation journey will mean both “Network Slicing” and the “Private Networks” and although i totally agree with idea that both will co-exist and proliferate but to be fair it is a fact that although Network slicing has delivered outcomes in Labs and Demos it has still number of challenges when apply to practice .

Recently i have heard many views from many reputable names including my friend Dean Bubley and Karim so i thought to share my views on this topic highlighting some work we have been doing with our partners and customers in APAC as well in the GSMA and to allude to some improvements we have achieved in last year since i shared my views with industry on Network Slicing and its Delivery .

Today Network slicing has been live in a number of customers including Singtel that has achieved substantial outcomes with Slicing however the bigger challenge still remain un-answered

How will Network slicing address RAN resources ?

How will Network Slicing can help to monetize low hanging fruits of Edge together with Telco domain slicing

In 3GPP Release17 we have been doing some exciting progress on later with a new architecture and API exposure for co-deployment of Edge with RAN but again prior to this we need to extend the Slicing towards the Access Networks both from Technology and Business Architecture Perspective and this is what i will share in this paper

Experience Learnt from 5G Networks rollout

As many Telco’s in 2021+ accelerated 5G rollout and built 5G SA Core Networks one think proved more than before that limitation on

  • RAN resources are always scarce
  • AI need to be enabled to intelligently modify slicing in real time
  • Spectrum and RAN layers will be a top pressings time for Telco’s to deliver value
  • RAN resource isolation must follow performance: cost baseline
  • How to handle resources in peak time and pre-empt some over others is vital
  • Regulatory and GDPR is vital to achieve anything big in this domain

Orchestration must precede Network Slicing

From Above experience we can infer that it is really not about Network slicing but rather “Open” , “Control” and “API” to enable End to end Network slicing LCM and Orchestration all the way from UE to RAN to Edge to Core to Cloud

  • Dynamic control of resources with Telco level visibility in Key
  • RAN automation is first step to be done before Slicing change RAN resources
  • Cloud operations model is vital to support Network slicing because although there are many business verticals the Telco’s really have to build an efficient and Multi tenant operational model to win it

Cloud Operations Model that is secure and Multi tenant must be enabled across all Telecom Infrastructure

RAN SLA’s for vertical industry

The notion of Network slicing still lies in selling a SLA vs Selling a Network .

First of all RAN resources are always limited and secondly each vertical industry has its own traffic profile and trajectory which can never be planned using old Telecom simulation tools it means dynamic learning and resource adjustment is key . This all alludes to the fact that changing network while ensuring network KQI remain intact is something that require Full visibility and programmable Control

It leads us to consider following architecture first before Slicing is full enabled for the RAN

  • Slice LCM must be supported by automatic Infrastructure that is elastic and Telco grade at the same time
  • Scale out architecture must be enabled in RAN
  • RF and Spectrum resource scheduling is the most expensive and intricate resources for services and we must enable their dynamic control

Intelligent Networking ML/AI must be enabled first

Automation can deliver a myriad of outcomes including better control , real time changes , optimization , compliance and FCAPS for each tenant however it is not sufficient

Intent driven networks that uses power of data , ML and AI to orchestrate and adapt network is a capability that should be enabled on a network scale before network slicing can deliver a business outcome

It leads us to consider following architecture first before Slicing is full enabled for the RAN

  • Slice LCM must be supported by automatic Infrastructure that is elastic and Telco grade at the same time
  • Scale out architecture must be enabled in RAN
  • RF and Spectrum resource scheduling is the most expensive and intricate resources for services and we must enable their dynamic control

Components of a RAN Slice

Although the Core Slicing capability still exists on OSS and SMO layers that are outside the RAN still the real power of Slicing will come as we address the RT capability of RAN slicing which enables us to deliver following for a business tenant

  • RRM
  • Connection management
  • MM
  • Spectrum layers

All of this must be available to package as a NSSF functional instance as alluded below

Partnerships and Ecosystem

According to the latest GSMA report one use case enablement for any vertical will require at least 7 Players to work together , so RAN slicing or in other words Slicing Business outcomes is not a matter of one body or business to solve . Today to incrementally deliver the business outcomes following are key organizations collaborating to adress those challenges

  • ETSI NFV
  • GSMA
  • MEF
  • IETF
  • O-RAN and TIP
  • ONAP
  • 5GAA , ZVEI etc

We are also taking an aggregation approach where we are summing all the knowledge from these bodies and deliver as a outcome for our customers . you can reach out for more details .

How APAC Telco’s are applying Data, Cloud and Automation to define Future Mode of Operations

As CSPs continue to evolve to be a digital player they are facing some new challenges like size and traffic requirements are increasing at an exponential pace, the networks which were previously only serving telco workloads are now required to be open for a range of business, industrial and services verticals.  These factors necessitate the CSPs to revamp their operations model that is digital, automated, efficient and above all services driven. Similarly, the future operations should support innovation rather than relying on offerings from existing vendor operations models, tool and capabilities.

As CSPs will require to operate and manage both the legacy and new digital platforms during the migrations phase hence it is also imperative that operations have a clear transition strategy and processes that can meet both PNF and VNF service requirements with optimum synergy where possible.

In work done by our team with our customers specifically in APAC , future network should address the following challenges for its operations transformations.

  • Fault Management: Fault management in the digital era is more complex as there are no dedicated infrastructure for the applications. The question therefore arises how to demarcate the fault and corelate cross layer faults to improve O&M troubleshooting of ICT services.
  • Service Assurance: The future operations model requires being digital in nature with minimum manual intervention, fully aligned with ZTM (Zero Touch Management) and data driven using principles of closed loop feedback control.
  • Competency: To match operation requirements of future digital networks the skills of engineers and designers will play a pivotal role in defining and evolving to future networks. The new roles will require network engineers to be more technologist rather than technicians
  • IT Technology: IT technology including skills in data centers, cloud and shared resources will be vital to operate the network. Operation teams need to understand impacts of scaling elasticity and network healing on operational services

Apply ODA (Open Digital Architecture) for Future Operations Framework (FMO)

TM Forum ODA A.K.A Open Digital Architecture is a perfect place to start but since it is just an architecture and can lead to different implementation and application architecture so below i will try to share how in real brown field networks it is being applied . I will cover all modules except for AI and intelligent management which i shall be discussed in a separate paper .

Lack of automation in legacy telco networks is an important pain point that needs to be addressed promptly in the future networks. It will not only enable CSPs to avoid the toil of repetitive tasks but also allow them to reduce risks of man-made mistakes.

In order to address the challenges highlighted above it is vital to  develop an agile operations models that improves customer experience , optimize CAPEX , AI operations and Business process transformation

Such a strategic vision will be built on an agile operations model that can fulfill the following:

  • Efficiency and Intelligent Operation:  Telecom efficiency is based on data driven architectures using AI and providing actionable information and automation, Self-Healing Network capability and automation of network and as follows
    •  Task Automation & Established foundation
    •  Proactive Operation & Advanced Operation
    •  Machine managed & intelligent Operation.
  • Service Assurance: Building a service assurance framework to achieve an automated Network Surveillance, Service Quality Monitoring, Fault Management, Preventive Management and Performance Management to ensure close loop feedback control for delivery of zero touch incident handling.
  • Operations Support: Building a support framework to achieve automated operation acceptance, change & configuration management.

Phased approach to achieve Operational transformation

Based on the field experience we achieved with our partners and customers through Telecom transformation we can summarize the learning as follows

  • People transformation: Transforming teams and workforce that matches the DevOps concept to streamline organization and hence to deliver services in an agile and efficient manner. This is vital because 5G , Cloud and DevOps is a journey of experience not deployment of solutions , start quickly to embark the digital journey
  • Business Process transformation: Working together with its partners for unification, simplification and digitization of end-to-end processes. The new process will enable Telco’s  to quickly adapt the network to offer new products and to reduce time for troubleshooting.
  • Infrastructure transformation: Running services on digital platforms and cloud, matching a clear vision to swap the legacy infrastructure.

If PNF to VNF/CNF migration is vital the Hybrid Network management is critical

  • Automation and tools: Operations automation using tools like  workforce management, ticket management etc is vital but not support vision of full automation . The services migration to cloud will enable automated delivery of services across the whole life cycle. Programming teams should   join operations to start a journey where the network can be managed through power of software like Python, JAVA, GOLANG and YANG models. It will also enable test automation, a vision which will enable operations teams to validate any change before applying it to live network.

Having said this i hope it shall serve as a high level guide for architects adressing operational transformation , as we can see AI and Intelligent Managmeent is vital piece of it and i shall write on this soon .

Building Smart and Green Telecom Infrastructure using AI and Data

source: Total Telecom green summit

During last year industry has witnessed Telco’s increased spend and maturity in Cloud and Automation Platforms . During Pandemic it is proven that Digital and Cloud is the answer our customers require to design , build and Operate Future Telecom Networks .

The Second key Pillar forcing Telecom industry towards Autonomous networks is to deliver business outcomes while doing business responsibly .

Getting Business outcomes and doing a sustainable business that supports Green Vision has been a not related discussion in Telecom Industry

But now infusion of Data and Cloud is really enabling it , it is expected that we as industry can cutdown at least 50% of Power emissions in coming decade but how it will become possible . According to Pareto’s law the last 20% will be most difficult .

This is where my team main focus has been to build robust AI and Automation use cases that are intelligent enough and that solves broader issues . Today the biggest focus for ML/AI for Telco’s that can really put them lead such outcomes are

  1. Smart Capacity management
  2. O&M of networks that reduces emissions and improves availability
  3. Service assurance based on data

The biggest Challenge in Transformation is Fragmentation

The biggest bottleneck is making such outcomes is related to data . Intricately “Data” is both the problem and the Solution because of so many sources of truth and different ingestion mechanisms . Do check details on #Dell Streaming data platforms and how we are solving this problem

https://www.delltechnologies.com/en-au/storage/streaming-data-platform.htm

Today under the umbrella of Anuket , 3GPP , TMF and ITU are all collaborating to come a validated and composite solution to deliver those use cases . So in a nutshell it is vital to build a holistic and unified view to deliver data driven AI use cases

Scope and Scale of Intelligent automation

The biggest bottleneck is coming from the fact that in real world Telco Apps can never be fully cloud native , at some level both the state and resiliency requirements and App requirements has to be kept and to come with intelligent work load driven decisions . The decade long journey of Telecom Transformation has revealed that just building everything as a code and expecting it to work and Telco’s can rollback their NOC sizes simply not works .

This is where intelligence from layers above the Orchestration and SDN will be of help like google does in the Internet era .

The second biggest issue is in the Scalable Telco solutions itself , it is proven that Telco’s face unique challenges as they move from hundred’s to thousand of nodes . So imagine running AI for heterogenous environments each coming with different outcomes can never deliver power and scale Telco’s need in the new era .

Telco grade AIOPS models

It is true that with 5G and Business digital transformation the industry really want to ramp up to build an improved user experience and unified model to expand portfolio towards vertical markets as well , this is only possible if we can have a coordinated system , workflow management and data sharing and exposure with strict TSR security measures . Similarly this model should cover full LCM including FCAPS model .

Building Intelligent Telco’s

Although using AI and ML is an exciting ambition for a Telco still the bottom line is how to build these platforms on top of NFVI and Existing Orchestration and Automation frameworks . In other words really business case to build an intelligent networks starts with using Data and ML to automate the entire network . Although in this aspect the scope can extend not just to service domain but also to business domains i.e automate business process including event correlation , anomaly and RCA

Building a Unified AI Platform

Although this intention or target is clear however in context of networks this is complex as we need to solve challenge of data security , regulation as well as what it really means to do the certification of an AI platform because focus should be given that allow this layer to be built from solution from many vendors so a loose coupling with more focus on Network service and AI algorithms is a key to build this platform

Instead of focusing on network element certification focus of AI platform is service level compatibility , data models and AI algorithms

However lack of unified standard specially on trusted data normalization , sharing and exposure is certainly forcing operators to adopt a Be-Spoke solutions to build AI platforms and that itself is a big impediment to wide scale adoption of AI and ML in the Networks

To move forward more close collaboration between different standard bodies and governance by more Telco centric organization like TMF is the answer with immediate focus to be given to Data standardization , labs integration and to enable shared data sets and algorithms to evolve and support wide deployments of ML and AI in Telecom Networks

Latest Industry progress and standardization

Although this is the early time of AI platforms standardization still we need to aggregate the progress between different bodies lest we can only expect the plethora of silo solutions each with a different specifications

  • ONAP as baseline of automation platform has components like DCAE and AI engine that makes sense to make it the defacto baseline standard
  • Anuket is the Cloud Infrastructure reference and it has recently launched a new project “Thoth” to look in to AI network standardization
  • ETSI ZSM is E2E automation platform across full LCM of a Telecom network and certainly an important direction
  • ETSI ENI or enhanced network intelligence is another body that closely defines AI specifications in the context of Telecom
  • TMF as a broader Telecom body is defining architectures including ODA and AIOPS that really breaks down on how a Telco can take a phased approach to build these platforms

Above all early involvement and support from Telecom operators and partners is very important to realize this goal . I hope in this year we will see more success and standardization on these initiatives so lets work together and stay tuned .

How to Migrate applications to the cloud

Building #Cloud and #Applications on #Cloud is only half a journey , biggest issues surface as we traverse the last 10% which is #Migration and #Handoff to #Operations ?

Questions like #Coexistence ,#NewOnM models and #Processes reengineering required to address following challenges:
a. Theoretically zero downtime by building E2E Architecture e.g UPF pool
b. Delivery pipeline for migration through unified tools solving touching every NE one by one an improve TTM
c. Service consistency automatic verification between Cloud and Legacy
d. 5 9’s reliability and TSR gold standard security

Customers expect #partners who offer #Operational tools and services optimized for
1) E2E whole solution support ownership in MVI
2) SPOC support services
3) Tools and Platform services specially on PaaS and NFVO to co-develop
4) Cloud assurance services specially for #business readiness , #Transition , #TaaS ,#Solution emergency support and CSR

And Partner must work with #Telco to #remodel processes starting with
1) Incident handling for L1
2) Config management supporting multi layers
3) Change management with focus on scaling , SDN and Policy enforcement
4) Release management supporting #devops viz staging to prod

#Cloud#Migration#Operations#Services#Applications#Networking#5G#Edge

Understanding Openshift-4 installation for Developer and Lab Environments

As Linux is the defacto OS for innovation in the Datacenters sameway the OpenSHift is proving to be a Catalyst for both Enterprise and Telco’s Cloud transformation . In this blog i will like to share my experience with two environments one is minishift that is a home brew environment for developers and others based on Pre-existing infrastructure .

As you know Openshift is a cool platform as a part of these two modes it support a wide variety of deployment options including hosted platforms on

  • AWS
  • Google
  • Azure
  • IBM

However for hosted platforms we will use full installers with out any customization so this is simply not complex provided you must use only Redhat guide for deployment.

Avoid common Mistakes

  • As a pre requisite you must have a bastion host to be used as bootstrap node
  • Linux manifest NTP , registry ,key should be available while for Full installation the DNS is to be prepared before cloud installer kicks in .
  • Making ignition files on your own (Always use and generate manifest from installers)
  • FOr Pre-existing the Control plane is based on Core OS while workers can be RHel or COreOS while for full stack everything including workers must be based on CoreOS
  • Once started installation whole cluster must be spinned within 24hours otherwise you need to generate new keys before proceed as controller will stop ping as license keys have a 24hour validity
  • As per my experience most manifest for full stack installation is created by installers viz. Cluster Node instances , Cluster Networks and bootstrap nodes

Pain points in Openshift3 installation

Since most openshift installation is around complex Ansible Playbooks , roles and detailed Linux files configuration all the way from DNS , CSR etc so it was a dire need to make it simple and easy for customers and it is what RedHat has done by moving to Opinionated installation which make it simple to install with only high level information and later based on each environment the enterprise can scale as per needs for Day2 requirements , such a mode solves three fundamental issues

  • Installer customization needs (At least this was my experience in OCP3)
  • Full automation of environment
  • Implement CI/CD

Components of installation

There are two pieces you should know for OCP4 installation

Installer

Installers is a linux manifest coming from RedHat directly and need very less tuning and customization

Ignition Files

Ignition files are first bootstrap configs needed to configure both the bootstrap , control and compute nodes .If you have managed the Openstack platform before you know we need separate Kickstart and cloud-init files and in Ignition files process RedHat makes simple both steps . For details on Ignition process and cluster installation refer to nice stuff below

Minishift installation:

Pre-requisites:

Download the CDK (RedHat container development Kit) from below :
https://developers.redhat.com/products/cdk/hello-world/#fndtn-windows

  1. copy CDK in directory C:/users/Saad.Sheikh/minishift and in CMD go in that directory
  2. minishift setup-cdk
  3. It will create .minishift in your path C:/users/Saad.Sheikh
  4. set MINISHIFT_USERNAME=snasrullah.c
  5. minishift start –vm-driver virtualbox
  6. Add the directory containing oc.exe to your PATH
    1. FOR /f “tokens=*” %i IN (‘minishift oc-env’) DO @call %i
  7. minishift stop
  8. minishift start
  9. Below message will come just ignore it and enjoy
    error: dial tcp 192.168.99.100:8443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. – verify you have provided the correct host and port and that the server is currently running.
    Could not set oc CLI context for ‘minishift’ profile: Error during setting ‘minishift’ as active profile: Unable to login to cluster
  10. oc login -u system:admin

The server is accessible via web console at:
https://192.168.99.100:8443/console

You are logged in as:
User: developer
Password:

To login as administrator:
oc login -u system:admin

Openshift installation based on onprem hosting

This mode is also known as UPI (User provided infrastructure) and it has the following the key steps for OCP full installation

Step1: run the redhat installer

Step2: Based on manifests build the ignition files for the bootstrap nodes

Step3: The control node boots and fetches information from the bootstrap server

Step4: The etcd provisioned on control node scales to 3 nodes to build a 3 control nore HA cluster

Finally the bootstrap node is depleted and removed

Following is the scripts i used to spin my OCP cluster

1#@Reboot the machine bootstrap during reboot go to PXE and install CoreOS

2#openshift-install --dir=./ocp4upi

3@rmeove the bootstrap IP's entries from /etc/haproxy/haproxy.cfg 
4# systemctl reload haproxy

5#set the kubeconfig ENV variables 
6# export kubeconfig=~/ocp4upi/auth/kubeconfig

7# verify the installation 
8# oc get pv
9# oc get nodes
10# oc get custeroperator

11#approve any CSR and certificates 
12# oc get csr -o go-template='{{range.items}}{{if no .status}}{{.metadata .name}}{{""\n""}}{{end}} | xargs oc adm certificate approve

13#login to OCP cluster GUI using 
https://localhost:8080

Do try it out and share your experience what you think about OCP4.6 installation .

Disclaimer: All commands and processes i validated in my home lab environment and you need tune and check your environment before apply as some tuning may be needed .

Evaluating Gaps and Solutions to build Open 5G Platforms and Capabilities

Since the release of much awaited 3GPP Release-16   in June last year lot of vendors have proliferated their products and brought their 5G SA A.K.A Standalone products to market and with promises like support of Slicing , massive IoT , uRLLC and improved , Edge capability ,NPN and IAB backhauling it is just natural all big Telco’s in APAC and globally have already started their journey towards 5G Standalone core . However, most of the commercial deployments are based on vendor E2E stack which is a good way to start journey and offer services quickly however with the type of services and versatility of solution specially on the industry verticals required and expected from both 3GPP Release16 and SA Core it is just a matter of time when one vendor cannot fulfill all the solutions and that is when a dire need to build a Telco grade Cloud platform will become a necessity.

During the last two years we have done a lot of work and progress in both better understanding of what will be the Cloud platforms  for 5G era , it is correct that as of now the 5G Core container platform  from open cloud perspective is not fully ready but we are also not too far from making it happen . From community Anuket  Kali that we are targeting in June is expecting to fulfill many gaps and our release cycle for XGVELA will try to close many gaps , so in a nutshell 2021 is the year where we expect a Production ready open cloud platforms avoiding all sorts of vendor lock ins .

Let’s try to understand top issues enlisted based on  5G SA deployments in Core and Edge  Vendors are mostly leveraging existing NFVI to evolve to CaaS by using a middle layer  shown Caas on Iaas , the biggest challenge is this interface is not open which means  there are many out of box enhancements done by each vendor and this is one classic case of “When open became the closed “

https://cntt-n.github.io/CNTT/doc/ref_model/chapters/chapter04.html

The most enhancement done on the adaptors for container images  are as follows

  • Provides container orchestration, deployment, and scheduling capabilities.
  • Provides container Telco  enhancement capabilities: Hugepage memory, shared memory, DPDK, CPU core binding, and isolation
  • Supports container network capabilities, SR-IOV+DPDK, and multiple network planes.
  • Supports the IP SAN storage capability of the VM container.
  1. Migration path from Caas on IaaS towards BMCaaS is not smooth and it will involve complete service deployment, it is true with most operators investing heavily in last few years to productionize the NFVi no body is really considering to empty pockets again to build purely CaaS new and stand-alone platform however smooth migration must be considered
  2. We are still in early phase of 5G SA core and eMBB is only use case so still we have not tested the scaling of 5G Core with NFVi based platforms
  3. ETSI Specs for CISM are not as mature as expected and again there are lot of out of box customizations done by each vendor VNFM to cater this.

Now lets come to point where the open platforms are lacking and how intend to fix it

Experience #1: 5G Outgoing traffic from PoD

The traditional Kubernetes and CaaS Platforms today handles and scales well with ingress controller however 5G PoD’s and containers outgoing traffic is not well addressed as both N-S and E-W traffic follows same path and it becomes an issue of scaling finally.

We know some vendors like Ericsson  who already bring products like ECFE and LB in their architecture to address these requirements.

Experience#2: Support for non-IP protocols

PoD is natively coming with IP and all external communication to be done by Cluster IP’s it means architecture is not designed for non-IP protocols like VLAN, L2TP, VLAN trunking

Experience#3: High performance workloads

Today all high data throughputs are supported CNI plugin’s which natively are like SR-IOV means totally passthrough, an Operator framework to enhance real time processing is required something we have done with DPDK in the open stack world

Experience#4: Integration of 5G SBI interfaces

The newly defined SBI interfaces became more like API compared to horizontal call flows, however today all http2/API integration is based on “Primary interfaces” .

It becomes a clear issue as secondary interfaces for inter functional module is not supported

Experience#5: Multihoming for SCTP and SI is not supported

For hybrid node connectivity at least towards egress and external networks still require a SCTP link and/or SIP endpoints which is not well supported

Experience#6: Secondary interfaces for CNF’s

Secondary interfaces raise concerns for both inter-operability, monitoring and O&M, secondary interfaces is very important concept in K8S and 5G CNF’s as it is needed during

  • For all Telecom protocols e.g BGP
  • Support for Operator frameworks (CRD’s)
  • Performance scenarios like CNI’s for SR-IOV

today only viable solution is by NSM i.e service mesh that solves both management and monitoring issues

Experience#7: Platform Networking Issues in 5G

Today in commercial networks for internal networking most products are using Multus+VLAN while for internal based on Multus+VxLAN it requires separate planning for both underlay and overlay and that becomes an issue for large scale 5G SA Core Network

Similarly, top requirements for service in 5G Networks are

  • Network separation on each logical interface e.g VRF and each physical sub interface
  • Outgoing traffic from PoD
  • NAT and reverse proxy

Experience#8: Service Networking Issues in 5G

For primary networks we are relying on Calico +IPIP while for secondary network we are relying ion Multus

Experience#9: ETSI specs specially for BM CaaS

Still I believe the ETSI specs for CNF’s are lacking compared to others like 3GPP and that is enough to make a open solution move to a closed through adaptors and plugin’s something we already experienced during SDN introduction in the cloud networks today a rigorous updates are expected on

  • IFA038 which is container integration in MANO
  • IFA011 which is VNFD with container support
  • Sol-3 specs updated for the CIR (Container image registry) support

Experience#10: Duplication of features on NEF/NRM and Cloud platforms  

In the 5G new API ecosystem operators look at their network as a platform opening it to application developers. API exposure is fundamental to 5G as it is built into the architecture natively where applications can talk back to the network, command the network to provide better experience in applications however the NEF and similarly NRF service registry are also functions available on platforms. Today it looks a way is required to share responsibility for such integrations to avoid duplicates

Reference Architectures for the Standard 5G Platform and Capabilities

Cap#1: Solving Data Integration issues   

Real AI is the next most important thing for Telco’s as they evolve in their automation journey from conditional #automation to partial autonomy . However to make any fully functional use case will require first to solve #Data integration architecture as any real product to be successful with #AI in Telco will require to use Graph Databases and Process mining and both of it will based on assumption that all and valid data is there .

Cap#2: AI profiles for processing in Cloud Infra Hardware profiles    

With 5G networks relying more on robust mechanisms to ingest and use data of AI , it is very important to agree on hardware profiles that are powerful enough to deliver AI use cases to deliver complete AI pipe lines all the way from flash base to tensor flow along with analytics .  

Cap#3: OSS evolution that support data integration pipeline    

To evolve to future ENI architecture for use of AI in Telco and ZSM architecture for the closed loop to be based on standard data integration pipeline like proposed in ENI-0017 (Data Integration mechanisms)

Cap#4: Network characteristics      

A mature way to handle outgoing traffic and LB need to be included in Telco PaaS

Cap#5: Telco PaaS     

Based on experience with NFV it is clear that IaaS is not the Telco service delivery model and hence use cases like NFVPaaS has been in consideration for the early time of NFV . With CNF introduction that will require a more robust release times it is imperative and not optional to build a stable Telco PaaS that meet Telco requirements. As of today, the direction is to divide platform between general PaaS that will be part of standard cloud platform over release iterations while for specific requirements will be part of Telco PaaS.

The beauty of this architecture is no ensure the multi-vendor component selection between them. The key characteristics to be addressed are

Paas#6: Telco PaaS Tools    

The agreement on PaaS tools over the complete LCM , there is currently a survey running in the community to agree on this and this is an ongoing study

https://wiki.anuket.io/display/HOME/Joint+Anuket+and+XGVELA+PaaS+Survey

Paas#2: Telco PaaS Lawful interception

During recent integrations for NFV and CNF we still rely on Application layer LI characteristics as defined by ETSI and with open cloud layer ensuring the necessary LI requirements are available it is important that PaaS include this part through API’s

Paas#3: Telco PaaS Charging Characteristics

The resource consumption and reporting of real time resources is very important as with 5G and Edge we will evolve towards the Hybrid cloud  

Paas#4: Telco PaaS Topology management and service discovery

A single API end point to expose both the topology and services towards Application is the key requirement of Telco PaaS

Paas#5: Telco PaaS Security Hardening

With 5G and critical services security hardening has become more and more important, use of tools like Falco and Service mesh is important in this platform

Paas#6: Telco PaaS Tracing and Logging

Although monitoring is quite mature in Kubernetes and its Distros the tracing and logging is still need to be addressed. Today with tools like Jaeger and Kafka /EFK needs to be include in the Telco PaaS

Paas#7: Telco PaaS E2E DevOps

For IT workloads already the DevOps capability is provided by PaaS in a mature manner through both cloud and application tools but with enhancements required by Telco workloads it is important the end-to-end capability of DevOps is ensured. Today tools like Argo need to be considered and it need to be integrated with both the general PaaS and Telco PaaS

Paas#8: Packaging

Standard packages like VNFD which cover both Application and PaaS layer

Paas#8: Standardization of API’s

API standardization in ETSI fashion is the key requirement of NFV and Telco journey and it needs to be ensured in PaaS layer as well. For Telco PaaS it should cover VES , TMForum,3GPP , ETSI MANO etc . Community has made following workings to standardize this

  • TMF 641/640
  • 3GPP TS28.532 /531/ 541
  • IFA029 containers in NFV
  • ETSI FEAT17 which is Telco DevOps
  • ETSI TST10 /13 for API testing and verification  

Based on these features there is an ongoing effort with in the LFN XGVELA community and I hope more and more users, partners and vendors can join to define the Future Open 5G Platform

https://github.com/XGVela/XGVela/wiki/XGVela-Meeting-Logistics

Network Slicing and Automation for 5G (Rel-15+) – A RAN Episode

Figure 1. 5G network slices running on a common underlying multi-vendor and multi-access network. Each slice is independently managed and addresses a particular use case.
Courtesy of IEEE

Network Slicing is a great concept which has always been an attractive jargon for vendors who wish to bundle it with products to sell their products and solutions . However with the arrival of 3GPP Release16 and subsequent products arriving in market things are starting to change ,with so many solutions and requirements finding a novel slicing architecture that fits all is both technically complex and business wise not making lot of ROI sense . Today we will try to analyze and answer the latest progress and directions to solve this dilemma

Slicing top challenges

Based on our recent work in GSMA and 3GPP we believe below are the top questions both to evolve and proliferate slicing solutions

  • Can a Public Slicing solution fulfill vertical industry requirements
  • How to satisfy vertical industry that Slicing solution can fulfil their needs like data sovereignty , SLA , Security , Performance
  • Automation and Intelligence , can a public slicing solution flexible enough to provide all intelligence for each industry
  • Slicing for cases of 5G Infra sharing

Solution baseline principles

When we view Slicing or any tenant provisioning solution it is very important as E2E all layers including business fulfillment , network abstraction and Infrastructure including wireless adhere to the same set of principles .

This image has an empty alt attribute; its file name is image.png

A nice description of it can be found in 3GPP TS28.553 about management and orchestration for Network slicing and 3GPP TS28.554 KPI for 5G solutions and slicing . In summary once we take the systems view for Network Slicing the principles can be summarized to following

  • Slice Demarcation: A way to isolate each tenant and a possibility to offer different features of slicing bundle to different tenants , for example a Large enterprise with 10 features and 20 SLA while for small businesses 5 features and 5 SLA will do
  • Performance: A way to build a highly performant system , the postulate is once we engineer and orchestrate it will it work E2E
  • Observability : With 4B+ devices added every year and with industry setting a futuristic target of a Million Private networks by 2025 its just a pressing issue how to observe and handle such networks in real time

I think when we talk about Slicing mostly we speak about key Technology enablers like NFV , Cloud , MEC , SDN which is obviously great since a software of Network and Infra is vital . However not speaking about RAN #wireless and WNV (Wireless Network virtualization) is not a just . In this paper i just want to shed some light from RAN perspective , consider the fact still today around 65% of customers CAPEX/OPEX pumping in RAN and Transport it is vital to see this view for both conformant and realistic solution . if NFV/SDN/Cloud demands sharing among few 100’s tenants the RAN demands sharing among Million so resource sharing , utilization and optimization is vital

RAN Architecture

From E2E perspective the RAN part of slice is selected based on GST and NSSAI which is done by UE or the Core Network however its easier said than done when we need to view E2E Slicing following should be considered to build a scalable slicing solution

RAN#1: Spectrum and Power resources

The massive requirements for business towards services and slices require a highly efficient Radio resources , luckily low,mid and high bands combined with Massive Mimo is handling this part however not just spectrum and how to utilize this in efficient manner in form of form factor and power is vital .

When we need view RAN view of Slicing its not just the Spectrum it self or RF signal but also the Spectrum management like Macro , Femto and Het Nets including Open cellular . In summary still this part we are not able to understand well as it require some novel algorithms like MINLP (mixed integer non linear) programming which focus to optimize cost while increase resource usage at same time . As per latest trend a tiered RAN architecture combined by new algo like game matching through ML/AI is the answer to standardize this

RAN#2: RAN Dis-aggregation

Just like how NFV/SDN and Orchestration did for Core similarly Open-RAN and RIC (RAN intelligence controller ) will do for RAN . If you want to know more may be you need check author’s writeup about RAN evolution

RAN#3 RAN resource optimize

Based on our Field trials we find the use of Edge and MEC with RAN and specially for CDN will save around 25% of resources , the RAN caching is vital combined with LBO( Local break out) will help Telco’s fulfill the very pressing requirements from verticals . Again this is not just a cloudlet and software issue as different RAN architectures require a different approach like D2D RAN solution , Het Net and macro etc

RAN#4 Mid Haul optimize

Mid haul and Back haul capacity optimization is vital for slicing delivery and today this domain is still in a R&D funnel . A TIP project CANDI Converged Architectures for Network Disaggregation & Integration is some how evolving to understand this requirement

RAN#5 Edge Cost model

Edge solution for Slicing in context of RAN is cost model problem e.g how many MEC servers and location and it can relieve RAN and RF layer processing is the key , our latest work with Telco Edge cloud with different models for different site configuration is the answer

RAN#6 Isolation , elasticity and resource limitation

This is the most important issue for RAN slicing primarily due the the fact that they are different conflicting dimensions viz. extra resource isolation may make impossible to share resources and will limit services during peak and critical times , similarly much elasticity will make isolation and separation practically impossible , solutions for matching algorithms is the answer as it will help to build a RAN system which is not only less complex but also highly conformant . This is a make and break for RAN architecture for slicing

RAN#7 RAN infrastructure sharing for 5G

Today already the Infra sharing has started between ig players in Europe , the one questions that comes what about if a use purchase a slice and service from a tenant ,consider a whole sale view where the Infra is processed by sharing and bundling of resources from all national carriers due to reason that obviously the 5G infra from single operator is not sufficient from both coverage and capacity perspective

RAN#8 RAN Resource RAGF problem

In case of service mobility or congestion how UE can access the resources quickly may be in other sector or sites

RAN#9 Slice SLA

SLA of slices and its real time monitoring is the key requirements of business , however imagine a situation where shortage of shared resource pool make impossible to deliver the SLA

RAN#10 Slice Operations

Slice operations is not just the view of BSS and Operations as real time RAN resource usage and optimization is necessary , Have you ever thought how perfectly managed slice can exist with normal Telco service specially when you find there is a key event and many users will use service . I think this so some dimension still not well addressed . I have no hesitate to say when many CXO’s of enterprise convince them they should opt to build their own 5G private network this is exactly the problem they fear .

Summary

In today’s writeup i have tried to explain both the current progress, challenges and steps to build a successful slicing solution keeping the hat of a RAN architect , i believe this is very important to see the Radio view point which somehow i firmly believe has not gotten its due respect and attention in both standard bodies and by vendors , in my coming blog i shall summarize some key gaps and how we can approach it as still the slicing products and solutions are not carrier grade and it need further tuning to ensure E2E slicing and services fulfillment .

5G Site Solutions Disaggregation using Open RAN

According to Latest Market insights the RAN innovation for Telecom Lags behind others initiative by 7years which means call for more innovative and Disruptive delivery models for the Site solutions specially for next Wave of 5G Solutions .

However to reach the goal of fully distribute and Open RAN there needs to build a pragmatic view of brown fields and finding the Sweet Spot for its introduction and wide adoption .

Here are my latest thoughts on this and how Telecom Operators should adopt it . There is a still a time for industry wide adoption of Open RAN but as yo will find time to act is now .

What you will

What you will learn

  • Building Delivery Models for Open RAN in a brownfield
  • Understand what,when and how of Open RAN
  • What is Open RAN and its relation with 5G
  • Current Industry solutions
  • Define phases of Open RAN delivery
  • Present and Next Steps 5. Architecture and State of Play