Industry and experience-based knowledge sharing about Cloud, 5G, Edge, O-RAN, Network Disaggregation and Industry 4.0 Transformation ~ A Cloud Architect's Journey
As of Q1 2022, 5G adoption has just passed the 600M-connection mark and is expected to reach 2.5B connections by 2025, almost 500M new connections every year. If we add the IoT and device ecosystem, the scale becomes enormous. One of the biggest advantages, and also one of the biggest challenges, of this once-in-a-lifetime opportunity is “scale”.
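A quick check of the “almost 500M every year” figure, assuming “by 2025” means the end of 2025, i.e. roughly 3.75 years after the end of Q1 2022:

```latex
\frac{2.5\,\mathrm{B} - 0.6\,\mathrm{B}}{3.75\ \text{years}} = \frac{1.9\,\mathrm{B}}{3.75\ \text{years}} \approx 0.51\,\mathrm{B} \text{ (about 507M) per year}
```

If “by 2025” is instead read as the start of 2025 (about 3 years), the same arithmetic gives roughly 633M per year.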
Simply put, “automation” and the use of ML/AI are a must to meet network SLAs, improve efficiency and optimize network TCO.
AI has the potential to create value through enhanced workload availability and improved performance and efficiency for 5G and the Telco Cloud. However, the biggest problem Telcos face in using AI and machine learning is “data” and “data models”, simply because there is no standardization or model definition of how Telco systems, including the infrastructure, expose “information” to upper layers. Since data sets in this domain are huge, with countless permutations, the first step is use case driven normalization of data that can be consumed by both the network and data science domains. This will enable us to build a future Telco network that can detect problems and heal itself.
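As a minimal sketch of use case driven normalization, assuming each layer already exposes its metrics as simple records, the Python below maps layer-specific field names onto one shared schema that both network and data science teams could consume. The layer names, field names, `FIELD_MAP` and the unit conventions are hypothetical examples, not a standard Telco data model.

```python
from dataclasses import dataclass

# Hypothetical field names per layer; real layers expose different formats.
FIELD_MAP = {
    "baremetal": {"host": "resource_id", "cpu_pct": "cpu_util", "ts": "timestamp"},
    "cloud": {"vm_uuid": "resource_id", "cpu_utilization": "cpu_util", "time": "timestamp"},
}

# Assumption: bare metal reports CPU in percent, the cloud layer as a 0..1 fraction.
CPU_SCALE = {"baremetal": 0.01, "cloud": 1.0}


@dataclass(frozen=True)
class Sample:
    """One telemetry sample in the shared (normalized) model."""
    layer: str
    resource_id: str
    cpu_util: float  # fraction 0..1
    timestamp: int   # epoch seconds


def normalize(layer: str, record: dict) -> Sample:
    """Rename layer-specific fields to the shared names and harmonize units."""
    mapping = FIELD_MAP[layer]
    common = {mapping[key]: value for key, value in record.items() if key in mapping}
    return Sample(
        layer=layer,
        resource_id=str(common["resource_id"]),
        cpu_util=float(common["cpu_util"]) * CPU_SCALE[layer],
        timestamp=int(common["timestamp"]),
    )


if __name__ == "__main__":
    print(normalize("baremetal", {"host": "srv-01", "cpu_pct": 72, "ts": 1650000000}))
    print(normalize("cloud", {"vm_uuid": "vm-9", "cpu_utilization": 0.4, "time": 1650000000}))
```

Keeping the mapping in data rather than code means a new layer (SDN, NFVO, assurance) is added by extending `FIELD_MAP` and `CPU_SCALE`, not by writing a new parser.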
Understanding Data Integration Architecture
Considering that the 5G architecture is based on open APIs and a horizontal service design, a.k.a. SBA (Service Based Architecture), data integration and the use of AI should in principle be an easy problem that can be divided into the following stages to define a pipeline:
Telemetry and data
Data exposure from each layer as an API, starting from bare metal and extending upwards to Cloud, SDN, NFVO, Assurance, etc.
Data models and engines to disseminate information
However, it is easier said than done, for many reasons, including:
What the key data sets will be
How the FCAPS of each layer can be disaggregated, i.e. dropping one layer's data without confirming its dependencies is a kill
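The FCAPS dependency point can be made concrete with a small check: before a layer's data set is dropped, list every use case that still consumes it. The use case names and their layer dependencies below are hypothetical and only illustrate the check.

```python
# Hypothetical mapping: which layers' FCAPS data each use case consumes.
USE_CASE_DEPENDENCIES = {
    "noisy_neighbor_detection": {"baremetal", "cloud"},
    "vnf_anomaly_detection": {"cloud", "nfvo", "assurance"},
    "ethernet_event_correlation": {"sdn", "assurance"},
}


def blocked_by(layer: str, deps: dict[str, set[str]] = USE_CASE_DEPENDENCIES) -> list[str]:
    """Return the use cases that would break if `layer`'s data were dropped."""
    return sorted(use_case for use_case, layers in deps.items() if layer in layers)


def can_drop(layer: str) -> bool:
    """A layer's data set is safe to drop only if no use case depends on it."""
    return not blocked_by(layer)


if __name__ == "__main__":
    print(blocked_by("cloud"))  # ['noisy_neighbor_detection', 'vnf_anomaly_detection']
    print(can_drop("sdn"), can_drop("ran"))  # False True
```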
Business Architecture
To address this, we need to understand and learn from the experience of other industries and SDOs, and see how their approaches can be agreed on and integrated into Telco networks. This led us to a use case driven approach, selecting those domains and business challenges that can deliver quick results.
“Follow the money” to deliver use cases that can monetize 5G
We have analyzed a lot of use cases from both academia and industry and compiled a complete list here.
From this we infer that there are just too many ways Telcos are solving the same problems, which makes us understand that clear definitions of “data models” and use cases should be the very first step.
The most important of these are:
1. Using Machine Learning to Detect Noisy Neighbors in 5G Networks
2. Towards Black-Box Anomaly Detection in Virtual Network Functions
3. Causality Inference for Failure in NFV
4. Self Adaptive Deep Learning Based System for Anomaly Detection in 5G
5. Correlating multiple Events and Data in an Ethernet Network
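To illustrate the detection idea behind use cases such as noisy-neighbor and black-box VNF anomaly detection, here is a deliberately simple statistical baseline: a rolling z-score over a single KPI. This is not the method used in the works listed above, which rely on ML and deep learning; the window size, the threshold and the latency samples are assumptions.

```python
import statistics


def zscore_anomalies(values: list[float], window: int = 5, threshold: float = 3.0) -> list[int]:
    """Return the indices of samples that deviate from the mean of the previous
    `window` samples by more than `threshold` population standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        # Guard against a perfectly flat history (standard deviation of zero).
        stdev = statistics.pstdev(history) or 1e-9
        if abs(values[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical per-VNF latency samples (ms); a noisy neighbor causes the spike.
    latency = [10, 11, 10, 12, 11, 10, 11, 30, 11, 10]
    print(zscore_anomalies(latency))  # [7]
```

A baseline like this is useful mainly as a reference point: an ML detector that cannot beat it on labeled incidents is not yet adding value.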
This leads us to define the following as the first steps for AI and intelligence as applied to Telcos:
Source: LFN Acumos
Analysis
Data lakes, log analysis and correlation
Detection
Anomaly detection, including pattern detection, trends and multi-layer correlation
Prediction
Intelligent prediction of capacity, SLAs, scaling and Cloud KPIs
Generation
Measuring data and synthesizing it using frameworks like eBPF
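For the Prediction stage, a least-squares linear trend is about the simplest capacity forecast one can build: fit recent peak utilization and estimate when it crosses a planning threshold. This is a sketch under strong assumptions (linear growth, evenly spaced samples, a hypothetical 80% threshold), not a replacement for the ML models this stage refers to.

```python
from typing import Optional


def fit_line(ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * t + intercept for t = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return slope, y_mean - slope * t_mean


def periods_until(ys: list[float], threshold: float) -> Optional[float]:
    """Periods after the last sample until the linear trend reaches `threshold`;
    None if the trend is flat or falling."""
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None
    t_cross = (threshold - intercept) / slope
    return max(0.0, t_cross - (len(ys) - 1))


if __name__ == "__main__":
    weekly_peak_util = [50, 52, 54, 56, 58]  # hypothetical % utilization per week
    print(periods_until(weekly_peak_util, 80))  # 11.0
```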
Data Monetization Is the First Step to Make 5G Profitable
Addressing both the data architecture and the business architecture is vital, because different Telcos, and in some cases different BUs within the same customer, approach it differently. What makes it worse is that they manipulate and store data lakes in different forms, i.e. infrastructure metrics, agents and databases, which are hard to apply across different data sets. This is the biggest obstacle to monetizing one of the key assets of 5G, namely “data”, and to defining a pipeline that can be shared among all tenants, including vertical industries.
As mentioned before, we are defining a few key use cases in the LFN project “Thoth” to learn from and elaborate on, applying the concepts of events, anomalies and prediction across layers, and the first-phase use cases are:
Growing business in the 5G era largely depends on ecosystem enablement and on the idea that Telcos can build a future infrastructure that delivers business outcomes not only for traditional Telco customers but also for broader verticals, be it manufacturing, mining, retail, finance, public safety, tourism or eventually anything.
This means a “programmable” and “automated” infrastructure is the basis for achieving any such business outcome. Applied to a Telco's 5G and transformation journey, this means both “Network Slicing” and “Private Networks”. Although I totally agree with the idea that both will co-exist and proliferate, to be fair, while network slicing has delivered outcomes in labs and demos, it still faces a number of challenges when applied in practice.
Recently I have heard many views from many reputable names, including my friend Dean Bubley and Karim, so I thought I would share my views on this topic, highlighting some work we have been doing with our partners and customers in APAC as well as in the GSMA, and alluding to some improvements we have achieved in the last year since I shared my views with the industry on Network Slicing and its delivery.
Today network slicing is live at a number of customers, including Singtel, which has achieved substantial outcomes with slicing; however, the bigger challenges still remain unanswered:
How will network slicing address RAN resources?
How can network slicing help to monetize the low-hanging fruit of Edge together with Telco domain slicing?
In 3GPP Release 17 we have been making some exciting progress on the latter, with a new architecture and API exposure for co-deploying Edge with RAN. But before this, we need to extend slicing towards the access networks from both a technology and a business architecture perspective, and this is what I will share in this paper.
Lessons Learnt from 5G Network Rollout
As many Telcos accelerated their 5G rollout from 2021 onwards and built 5G SA core networks, the following limitations became clearer than ever:
RAN resources are always scarce
AI needs to be enabled to intelligently modify slicing in real time
Spectrum and RAN layers will be the top pressure points for Telcos in delivering value
RAN resource isolation must follow a performance:cost baseline
How to handle resources at peak time and pre-empt some over others is vital
Regulatory compliance and GDPR are vital to achieving anything big in this domain
Orchestration must precede Network Slicing
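The peak-time pre-emption point can be sketched as a strict-priority allocation of physical resource blocks (PRBs) among slices. The slice names, priorities and PRB numbers are hypothetical, and strict priority is only one possible policy; a real RAN scheduler would combine it with guaranteed minimum shares per slice, i.e. the performance:cost baseline mentioned above.

```python
from dataclasses import dataclass


@dataclass
class SliceDemand:
    name: str
    priority: int     # higher value = more important (hypothetical convention)
    demand_prbs: int  # physical resource blocks requested in this interval


def allocate_prbs(capacity: int, slices: list[SliceDemand]) -> dict[str, int]:
    """Grant PRBs in strict priority order. When demand exceeds capacity
    (e.g. at peak time), lower-priority slices are pre-empted: they receive
    whatever is left, possibly nothing."""
    grants: dict[str, int] = {}
    remaining = capacity
    for demand in sorted(slices, key=lambda d: d.priority, reverse=True):
        grant = min(demand.demand_prbs, remaining)
        grants[demand.name] = grant
        remaining -= grant
    return grants


if __name__ == "__main__":
    demands = [
        SliceDemand("embb", priority=1, demand_prbs=80),
        SliceDemand("urllc", priority=3, demand_prbs=30),
        SliceDemand("miot", priority=2, demand_prbs=20),
    ]
    print(allocate_prbs(100, demands))  # {'urllc': 30, 'miot': 20, 'embb': 50}
```

Off-peak, when capacity covers total demand, every slice simply receives what it asked for.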
From the above experience we can infer that it is really not about network slicing as such, but rather about “Open”, “Control” and “API” to enable end-to-end network slicing LCM and orchestration all the way from UE to RAN to Edge to Core to Cloud:
Dynamic control of resources with Telco-level visibility is key
RAN automation is the first step, to be done before slicing changes RAN resources
A cloud operations model is vital to support network slicing because, although there are many business verticals, Telcos really have to build an efficient, multi-tenant operational model to win them
A secure, multi-tenant cloud operations model must be enabled across all telecom infrastructure
RAN SLA’s for vertical industry
The notion of network slicing still lies in selling an SLA vs selling a network.
First, RAN resources are always limited, and second, each vertical industry has its own traffic profile and trajectory, which can never be planned using old telecom simulation tools; this means dynamic learning and resource adjustment are key. All of this points to the fact that changing the network while keeping network KQIs intact requires full visibility and programmable control.
This leads us to consider the following architecture first, before slicing is fully enabled for the RAN:
Slice LCM must be supported by automated infrastructure that is elastic and Telco grade at the same time
A scale-out architecture must be enabled in the RAN
RF and spectrum resource scheduling involves the most expensive and intricate resources for services, and we must enable their dynamic control
Intelligent Networking ML/AI must be enabled first
Automation can deliver a myriad of outcomes, including better control, real-time changes, optimization, compliance and FCAPS for each tenant; however, it is not sufficient.
Intent-driven networks that use the power of data, ML and AI to orchestrate and adapt the network are a capability that should be enabled at network scale before network slicing can deliver business outcomes.
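A minimal sketch of one iteration of such an intent-driven closed loop, assuming an intent is expressed as a latency target for a slice and the only actions are scaling up, scaling down or doing nothing. The `Intent` fields, the 50% headroom rule and the action names are illustrative, not taken from any intent-based networking standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """Hypothetical intent: keep a slice's latency under a target."""
    slice_name: str
    max_latency_ms: float
    headroom: float = 0.5  # release resources only below 50% of the target


def decide(intent: Intent, observed_latency_ms: float) -> str:
    """One closed-loop iteration: compare the observation with the intent
    and pick an action for the orchestrator."""
    if observed_latency_ms > intent.max_latency_ms:
        return "scale_up"
    if observed_latency_ms < intent.max_latency_ms * intent.headroom:
        return "scale_down"
    return "no_op"


if __name__ == "__main__":
    factory = Intent("factory-urllc", max_latency_ms=10.0)
    for sample in (12.0, 7.0, 4.0):
        print(sample, decide(factory, sample))
```

The gap between the target and the scale-down point acts as hysteresis, so the loop does not oscillate when latency hovers near the target.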
Components of a RAN Slice
Although the core slicing capability still resides in the OSS and SMO layers outside the RAN, the real power of slicing will come as we address the real-time (RT) capability of RAN slicing, which enables us to deliver the following for a business tenant:
RRM (radio resource management)
Connection management
MM (mobility management)
Spectrum layers
All of this must be available to be packaged as an NSSF functional instance, as illustrated below.
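To make the components listed above concrete, here is a hypothetical descriptor that packages them for one tenant. Only the S-NSSAI part follows 3GPP (an 8-bit SST, where 1 = eMBB, 2 = URLLC and 3 = MIoT, plus an optional 24-bit SD); the RRM, connection management, mobility and spectrum fields and the hex rendering of the S-NSSAI are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RanSliceProfile:
    """Hypothetical packaging of the RAN slice components for one tenant."""
    sst: int                           # Slice/Service Type, 8 bits (1 = eMBB, 2 = URLLC, 3 = MIoT)
    sd: Optional[int] = None           # optional Slice Differentiator, 24 bits
    rrm_min_prb_share: float = 0.0     # RRM: guaranteed share of radio resources (0..1)
    max_connections: int = 0           # connection management limit (0 = unlimited)
    mobility: str = "full"             # mobility management profile, e.g. "full" or "stationary"
    spectrum_layers: list[str] = field(default_factory=list)  # e.g. ["n78"]

    def __post_init__(self) -> None:
        if not 0 <= self.sst <= 0xFF:
            raise ValueError("SST must fit in 8 bits")
        if self.sd is not None and not 0 <= self.sd <= 0xFFFFFF:
            raise ValueError("SD must fit in 24 bits")
        if not 0.0 <= self.rrm_min_prb_share <= 1.0:
            raise ValueError("PRB share must be between 0 and 1")

    def s_nssai(self) -> str:
        """Render the S-NSSAI as SST or SST-SD in hex (display format is an assumption)."""
        if self.sd is None:
            return f"{self.sst:02x}"
        return f"{self.sst:02x}-{self.sd:06x}"


if __name__ == "__main__":
    tenant = RanSliceProfile(sst=2, sd=0x1A2B3C, rrm_min_prb_share=0.2, spectrum_layers=["n78"])
    print(tenant.s_nssai())  # 02-1a2b3c
```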
Partnerships and Ecosystem
According to the latest GSMA report, enabling one use case for any vertical will require at least 7 players to work together, so RAN slicing, or in other words slicing business outcomes, is not a matter for one body or business to solve. Today, to incrementally deliver these business outcomes, the following key organizations are collaborating to address those challenges:
ETSI NFV
GSMA
MEF
IETF
O-RAN and TIP
ONAP
5GAA, ZVEI, etc.
We are also taking an aggregation approach, combining the knowledge from all these bodies and delivering it as an outcome for our customers. You can reach out for more details.
As CSPs continue to evolve into digital players, they are facing new challenges: size and traffic requirements are increasing at an exponential pace, and networks that previously served only telco workloads are now required to be open to a range of business, industrial and service verticals. These factors require CSPs to revamp their operations model so that it is digital, automated, efficient and, above all, service driven. Similarly, future operations should support innovation rather than relying on offerings from existing vendor operations models, tools and capabilities.
As CSPs will need to operate and manage both legacy and new digital platforms during the migration phase, it is also imperative that operations have a clear transition strategy and processes that can meet both PNF and VNF service requirements, with optimum synergy where possible.
Based on work done by our team with our customers, specifically in APAC, future networks should address the following challenges in their operations transformation:
Fault Management: Fault management in the digital era is more complex, as there is no dedicated infrastructure per application. The question therefore arises how to demarcate faults and correlate cross-layer faults to improve O&M troubleshooting of ICT services.
Service Assurance: The future operations model needs to be digital in nature with minimum manual intervention, fully aligned with ZTM (Zero Touch Management), and data driven, using the principles of closed-loop feedback control.
Competency: To match the operational requirements of future digital networks, the skills of engineers and designers will play a pivotal role in defining and evolving future networks. The new roles will require network engineers to be technologists rather than technicians.
IT Technology: IT technology, including skills in data centers, cloud and shared resources, will be vital to operate the network. Operations teams need to understand the impact of scaling, elasticity and network healing on operational services.
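The cross-layer demarcation problem in Fault Management can be sketched as a walk down the resource stack: if a VNF, its VM and the underlying host all raise alarms, the lowest alarmed layer is the likely root cause. The topology, resource names and this grouping rule are assumptions; production RCA also needs timing, alarm types and learned correlations.

```python
# Hypothetical topology: child resource -> the resource it runs on (assumed acyclic).
RUNS_ON = {
    "vnf-amf-1": "vm-7",
    "vnf-smf-1": "vm-8",
    "vm-7": "host-2",
    "vm-8": "host-2",
}


def root_causes(alarmed: set[str], runs_on: dict[str, str] = RUNS_ON) -> dict[str, str]:
    """Map every alarmed resource to the lowest-layer alarmed resource beneath it,
    so that VNF and VM alarms caused by a failed host are grouped under that host."""
    result = {}
    for resource in alarmed:
        root = resource
        parent = runs_on.get(resource)
        while parent is not None:
            if parent in alarmed:
                root = parent
            parent = runs_on.get(parent)
        result[resource] = root
    return result


if __name__ == "__main__":
    alarms = {"vnf-amf-1", "vm-7", "host-2", "vnf-smf-1"}
    print(sorted(set(root_causes(alarms).values())))  # ['host-2']
```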
TM Forum ODA, a.k.a. Open Digital Architecture, is a perfect place to start, but since it is just an architecture and can lead to different implementations and application architectures, below I will try to share how it is being applied in real brownfield networks. I will cover all modules except AI and intelligent management, which I shall discuss in a separate paper.
Lack of automation in legacy telco networks is an important pain point that needs to be addressed promptly in future networks. Automation will not only enable CSPs to avoid the toil of repetitive tasks but also reduce the risk of human error.
To address the challenges highlighted above, it is vital to develop an agile operations model that improves customer experience and optimizes CAPEX, enabling AI operations and business process transformation.
Such a strategic vision will be built on an agile operations model that can fulfill the following:
Efficiency and Intelligent Operation: Telecom efficiency is based on data-driven architectures that use AI to provide actionable information and automation, self-healing network capability and network automation, evolving through the following stages:
Task automation and an established foundation
Proactive operation and advanced operation
Machine-managed and intelligent operation
Service Assurance: Building a service assurance framework to achieve automated network surveillance, service quality monitoring, fault management, preventive management and performance management, ensuring closed-loop feedback control for the delivery of zero-touch incident handling.
Operations Support: Building a support framework to achieve automated operation acceptance, change & configuration management.
Based on the field experience we gained with our partners and customers through telecom transformation, we can summarize the learnings as follows:
People transformation: Transforming teams and the workforce to match the DevOps concept, streamlining the organization to deliver services in an agile and efficient manner. This is vital because 5G, Cloud and DevOps are a journey of experience, not a deployment of solutions, so start quickly to embark on the digital journey.
Business Process transformation: Working together with partners on the unification, simplification and digitization of end-to-end processes. The new processes will enable Telcos to quickly adapt the network to offer new products and to reduce troubleshooting time.
Infrastructure transformation: Running services on digital platforms and cloud, with a clear vision for swapping out the legacy infrastructure.
As PNF to VNF/CNF migration is vital, hybrid network management is critical.
Automation and tools: Operations automation using tools like workforce management and ticket management is vital but does not support the vision of full automation. Migrating services to the cloud will enable automated delivery of services across the whole life cycle. Programming teams should join operations to start a journey in which the network is managed through the power of software such as Python, Java, Go and YANG models. It will also enable test automation, which lets operations teams validate any change before applying it to the live network.
Having said this, I hope this will serve as a high-level guide for architects addressing operational transformation. As we can see, AI and intelligent management is a vital piece of it, and I shall write about it soon.
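The test-automation idea of validating any change before it reaches the live network can start as a simple gate in Python. In this sketch the change format, the field names and the MTU policy limits are hypothetical; only the 1-4094 usable VLAN ID range is standard (IEEE 802.1Q), and `apply` stands in for whatever actually pushes the config, for example a NETCONF/YANG client.

```python
from typing import Callable

REQUIRED_FIELDS = ("interface", "vlan_id", "mtu")
MTU_RANGE = (1280, 9216)  # hypothetical operator policy


def validate_change(change: dict) -> list[str]:
    """Return a list of problems; an empty list means the change may be applied."""
    errors = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in change]
    vlan = change.get("vlan_id")
    if vlan is not None and not 1 <= vlan <= 4094:
        errors.append(f"vlan_id {vlan} outside 1-4094")
    mtu = change.get("mtu")
    if mtu is not None and not MTU_RANGE[0] <= mtu <= MTU_RANGE[1]:
        errors.append(f"mtu {mtu} outside {MTU_RANGE[0]}-{MTU_RANGE[1]}")
    return errors


def apply_if_valid(change: dict, apply: Callable[[dict], None]) -> bool:
    """Gate: call `apply` only when validation passes; return whether it ran."""
    if validate_change(change):
        return False
    apply(change)
    return True


if __name__ == "__main__":
    print(validate_change({"interface": "eth1", "vlan_id": 4095, "mtu": 1500}))
```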
During the last year the industry has witnessed Telcos' increased spending on, and maturity in, Cloud and automation platforms. The pandemic has proven that digital and cloud are the answer our customers require to design, build and operate future telecom networks.
The second key pillar pushing the telecom industry towards autonomous networks is delivering business outcomes while doing business responsibly.
Achieving business outcomes and running a sustainable business that supports the green vision have long been treated as unrelated discussions in the telecom industry.
But now the infusion of data and cloud is really enabling this. It is expected that, as an industry, we can cut power emissions by at least 50% in the coming decade, but how will this become possible? According to Pareto's law, the last 20% will be the most difficult.
This is where my team's main focus has been: building robust AI and automation use cases that are intelligent enough to solve broader issues. Today the biggest focus areas for ML/AI that can really put Telcos in the lead on such outcomes are:
Smart Capacity management
O&M of networks that reduces emissions and improves availability
Service assurance based on data
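Smart capacity management and emission-aware O&M often start with a simple rule: in low-traffic hours, switch off capacity cells whose traffic the always-on coverage layer can absorb. A greedy sketch of that rule, assuming equal-size cells, loads expressed as fractions of one cell's capacity and a hypothetical 70% load ceiling on the coverage cell:

```python
def cells_to_sleep(coverage_load: float, capacity_loads: dict[str, float],
                   max_load: float = 0.7) -> list[str]:
    """Greedily switch off the least-loaded capacity cells, moving their traffic
    onto the always-on coverage cell, while its load stays at or below `max_load`."""
    asleep = []
    load = coverage_load
    for cell, cell_load in sorted(capacity_loads.items(), key=lambda item: item[1]):
        if load + cell_load <= max_load:
            load += cell_load
            asleep.append(cell)
    return asleep


if __name__ == "__main__":
    night = {"cell-a": 0.1, "cell-b": 0.25, "cell-c": 0.5}
    print(cells_to_sleep(0.3, night))  # ['cell-a', 'cell-b']
```

Starting with the least-loaded cells maximizes the number of cells that can sleep under the ceiling; an ML-based version would replace the fixed loads with predicted ones.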
The biggest Challenge in Transformation is Fragmentation
The biggest bottleneck in achieving such outcomes is data. Interestingly, “data” is both the problem and the solution, because of the many sources of truth and different ingestion mechanisms. Do check the details on #Dell Streaming data platforms and how we are solving this problem.
Today, under the umbrella of Anuket, 3GPP, TMF and ITU are all collaborating to come up with a validated, composite solution to deliver those use cases. So, in a nutshell, it is vital to build a holistic and unified view to deliver data-driven AI use cases.
Scope and Scale of Intelligent automation
The biggest bottleneck comes from the fact that in the real world Telco apps can never be fully cloud native: at some level, state, resiliency and application requirements have to be preserved, and we need intelligent, workload-driven decisions. The decade-long journey of telecom transformation has revealed that simply building everything as code and expecting it to work, so that Telcos can shrink their NOCs, does not work.
This is where intelligence from the layers above orchestration and SDN will help, as Google did in the Internet era.
The second biggest issue lies in scaling Telco solutions themselves; it has been proven that Telcos face unique challenges as they move from hundreds to thousands of nodes. Running AI separately for heterogeneous environments, each producing different outcomes, can never deliver the power and scale Telcos need in the new era.
Telco grade AIOPS models
It is true that with 5G and business digital transformation, the industry really wants to ramp up to build an improved user experience and a unified model to expand its portfolio towards vertical markets as well. This is only possible with a coordinated system, workflow management, and data sharing and exposure under strict TSR security measures. Similarly, this model should cover the full LCM, including the FCAPS model.
Building Intelligent Telco’s
Although using AI and ML is an exciting ambition for a Telco, the bottom line is how to build these platforms on top of the NFVI and existing orchestration and automation frameworks. In other words, the real business case for building an intelligent network starts with using data and ML to automate the entire network. In this respect, the scope can extend not just to the service domain but also to business domains, i.e. automating business processes, including event correlation, anomaly detection and RCA.
Building a Unified AI Platform
Although this intention or target is clear, in the context of networks it is complex, as we need to solve the challenges of data security and regulation, as well as what it really means to certify an AI platform. The focus should be on allowing this layer to be built from solutions from many vendors, so loose coupling, with more focus on network services and AI algorithms, is key to building this platform.
Instead of focusing on network element certification, the focus of an AI platform is service-level compatibility, data models and AI algorithms.
However, the lack of a unified standard, especially on trusted data normalization, sharing and exposure, is forcing operators to adopt bespoke solutions to build AI platforms, and that itself is a big impediment to wide-scale adoption of AI and ML in networks.
To move forward, closer collaboration between different standards bodies and governance by a more Telco-centric organization like TMF is the answer, with immediate focus on data standardization, lab integration and enabling shared data sets and algorithms to evolve and support wide deployments of ML and AI in telecom networks.
Latest Industry progress and standardization
Although these are still early days for AI platform standardization, we need to aggregate the progress made by different bodies; otherwise we can only expect a plethora of siloed solutions, each with different specifications.
ONAP, as the baseline automation platform, has components like DCAE and an AI engine that make it sensible to adopt it as the de facto baseline standard
Anuket is the Cloud Infrastructure reference, and it has recently launched a new project, “Thoth”, to look into AI network standardization
ETSI ZSM is an E2E automation platform across the full LCM of a telecom network and certainly an important direction
ETSI ENI (Experiential Networked Intelligence) is another body that closely defines AI specifications in the context of telecom
TMF, as a broader telecom body, is defining architectures including ODA and AIOps that break down how a Telco can take a phased approach to building these platforms
Above all, early involvement and support from telecom operators and partners is very important to realize this goal. I hope that this year we will see more success and standardization in these initiatives, so let's work together and stay tuned.
Questions like #Coexistence, #NewOnM models and #Processes reengineering must be answered to address the following challenges: a. theoretically zero downtime by building an E2E architecture, e.g. a UPF pool; b. a delivery pipeline for migration through unified tools, instead of touching every NE one by one, to improve TTM; c. automatic verification of service consistency between Cloud and legacy; d. five 9's reliability and TSR gold-standard security.
Customers expect #partners who offer #Operational tools and services optimized for: 1) E2E whole-solution support ownership in MVI; 2) SPOC support services; 3) tools and platform services, especially on PaaS and NFVO, to co-develop; 4) Cloud assurance services, especially for #business readiness, #Transition, #TaaS, #Solution emergency support and CSR.
And partners must work with the #Telco to #remodel processes, starting with: 1) incident handling for L1; 2) configuration management supporting multiple layers; 3) change management with a focus on scaling, SDN and policy enforcement; 4) release management supporting #devops, i.e. staging to prod.
Just as Linux is the de facto OS for innovation in data centers, OpenShift is proving to be a catalyst for both enterprise and Telco cloud transformation. In this blog I would like to share my experience with two environments: one is Minishift, a home-brew environment for developers, and the other is based on pre-existing infrastructure.
As you know, OpenShift is a cool platform; beyond these two modes, it supports a wide variety of deployment options, including hosted platforms on:
AWS
Google
Azure
IBM
However, for hosted platforms we use the full installers without any customization, so this is simply not complex, provided you use only the Red Hat guide for deployment.
Avoid common Mistakes
As a prerequisite, you must have a bastion host to be used as the bootstrap node
The Linux manifests, NTP, registry and keys should be available, while for a full installation DNS must be prepared before the cloud installer kicks in
Making Ignition files on your own (always generate the manifests with the installer and use them)
For pre-existing infrastructure the control plane is based on CoreOS, while workers can be RHEL or CoreOS; for full stack, everything including the workers must be based on CoreOS
Once installation has started, the whole cluster must come up within 24 hours; otherwise you need to regenerate the Ignition files before proceeding, because the certificates they contain are valid for only 24 hours and the control plane will stop responding
In my experience, most manifests for a full-stack installation are created by the installers, viz. cluster node instances, cluster networks and bootstrap nodes
Pain points in OpenShift 3 installation
Since most OpenShift installation revolved around complex Ansible playbooks, roles and detailed Linux file configuration, all the way from DNS to CSRs, there was a dire need to make it simple and easy for customers. That is what Red Hat has done by moving to an opinionated installation, which makes it simple to install with only high-level information; later, based on each environment, the enterprise can scale as needed for Day 2 requirements. Such a mode solves three fundamental issues:
Installer customization needs (at least this was my experience in OCP3)
Full automation of the environment
Implementing CI/CD
Components of installation
There are two pieces you should know about for an OCP4 installation:
Installer
The installer is a Linux binary that comes directly from Red Hat and needs very little tuning or customization
Ignition Files
Ignition files are the first-boot configs needed to configure the bootstrap, control and compute nodes. If you have managed an OpenStack platform before, you know we need separate Kickstart and cloud-init files; with the Ignition process, Red Hat simplifies both steps. For details on the Ignition process and cluster installation, refer to the nice material below
Copy the CDK into the directory C:/users/Saad.Sheikh/minishift and, in CMD, go to that directory
minishift setup-cdk
This creates .minishift in your path C:/users/Saad.Sheikh
set MINISHIFT_USERNAME=snasrullah.c
minishift start --vm-driver virtualbox
Add the directory containing oc.exe to your PATH
FOR /f "tokens=*" %i IN ('minishift oc-env') DO @call %i
minishift stop
minishift start
You may see the message below; just ignore it and enjoy: error: dial tcp 192.168.99.100:8443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. – verify you have provided the correct host and port and that the server is currently running. Could not set oc CLI context for ‘minishift’ profile: Error during setting ‘minishift’ as active profile: Unable to login to cluster
To login as administrator: oc login -u system:admin
OpenShift installation based on on-prem hosting
This mode is also known as UPI (user-provisioned infrastructure), and the following are the key steps for a full OCP installation:
Step 1: Run the Red Hat installer
Step 2: Based on the manifests, build the Ignition files for the bootstrap nodes
Step 3: The control nodes boot and fetch their configuration from the bootstrap server
Step 4: The etcd provisioned on the control nodes scales to 3 members to build a 3-control-node HA cluster
Finally, the bootstrap node is decommissioned and removed
Following are the scripts I used to spin up my OCP cluster:
# Reboot the bootstrap machine; during reboot, PXE-boot it and install CoreOS
# Wait for the bootstrap process to complete
openshift-install --dir=./ocp4upi wait-for bootstrap-complete --log-level=info
# Remove the bootstrap node's IP entries from /etc/haproxy/haproxy.cfg, then reload HAProxy
systemctl reload haproxy
# Set the kubeconfig environment variable (the variable name is upper case)
export KUBECONFIG=~/ocp4upi/auth/kubeconfig
# Verify the installation
oc get pv
oc get nodes
oc get clusteroperators
# Approve any pending CSRs (certificate signing requests)
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
# Log in to the OCP cluster GUI using
https://localhost:8080
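The go-template in the CSR approval step keeps only CSRs whose status is still empty, i.e. neither approved nor denied. The same filter expressed in Python, run against the output of `oc get csr -o json`, can be handy when scripting around it; the sample JSON below is made up and heavily trimmed.

```python
import json

# Made-up, trimmed sample of `oc get csr -o json` output.
SAMPLE = """{"items": [
  {"metadata": {"name": "csr-abc12"}, "status": {}},
  {"metadata": {"name": "csr-def34"}, "status": {"conditions": [{"type": "Approved"}]}}
]}"""


def pending_csr_names(oc_json: str) -> list[str]:
    """Names of CSRs with an empty or missing status, i.e. still pending."""
    data = json.loads(oc_json)
    return [item["metadata"]["name"]
            for item in data.get("items", [])
            if not item.get("status")]


if __name__ == "__main__":
    print(pending_csr_names(SAMPLE))  # ['csr-abc12']
```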
Do try it out and share your experience and what you think about the OCP 4.6 installation.
Disclaimer: I validated all commands and processes in my home lab environment; you need to tune and check your environment before applying them, as some tuning may be needed.
Since the release of the much-awaited 3GPP Release 16 in June last year, many vendors have brought their 5G SA (Standalone) products to market, with promises like support for slicing, massive IoT, URLLC, improved Edge capability, NPN and IAB backhauling, so it is only natural that all the big Telcos in APAC and globally have already started their journey towards a 5G Standalone core. However, most commercial deployments are based on a vendor E2E stack, which is a good way to start the journey and offer services quickly. Yet given the types of services and the versatility of solutions required and expected from both 3GPP Release 16 and the SA core, especially for industry verticals, it is just a matter of time before one vendor cannot fulfill all the requirements, and that is when building a Telco-grade cloud platform will become a necessity.
During the last two years we have made a lot of progress in understanding what the cloud platforms for the 5G era will look like. It is correct that, as of now, the 5G core container platform is not fully ready from an open cloud perspective, but we are also not too far from making it happen. In the community, Anuket Kali, which we are targeting for June, is expected to fill many gaps, and our release cycle for XGVELA will try to close many more, so in a nutshell 2021 is the year in which we expect production-ready open cloud platforms that avoid all sorts of vendor lock-in.
Let's try to understand the top issues seen in 5G SA deployments in Core and Edge. Vendors are mostly leveraging existing NFVI to evolve to CaaS using a middle layer, i.e. CaaS on IaaS. The biggest challenge is that this interface is not open, which means each vendor has made many out-of-box enhancements, and this is one classic case of “when open became closed”.
The main enhancements made in the adaptors for container images are as follows:
Provides container orchestration, deployment, and scheduling capabilities.
Provides container Telco enhancement capabilities: Hugepage memory, shared memory, DPDK, CPU core binding, and isolation
Supports container network capabilities, SR-IOV+DPDK, and multiple network planes.
Supports the IP SAN storage capability of the VM container.
The migration path from CaaS on IaaS towards BMCaaS is not smooth and will involve complete service redeployment. Since most operators have invested heavily in the last few years to productionize NFVI, nobody is really considering emptying their pockets again to build a new, purely CaaS, standalone platform; however, a smooth migration must be considered
We are still in the early phase of the 5G SA core, and eMBB is the only use case, so we have not yet tested the scaling of the 5G core on NFVI-based platforms
The ETSI specs for CISM are not as mature as expected, and again each vendor's VNFM makes a lot of out-of-box customizations to cater for this.
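On an open Kubernetes-based platform, the Telco enhancements that vendor adaptors add (hugepages, CPU core binding, multiple network planes) are normally requested declaratively in the pod spec instead. The sketch below builds such a manifest as a Python dict; it assumes nodes with pre-allocated 1Gi hugepages, the kubelet static CPU manager policy and Multus installed, and the network attachment names (`n3-net`, `n6-net`) and the image are hypothetical.

```python
def telco_pod_spec(name: str, image: str, cpus: int, memory: str,
                   hugepages_1gi: str, extra_networks: list[str]) -> dict:
    """Build a Pod manifest (as a dict) requesting whole CPUs with requests equal
    to limits (Guaranteed QoS, which the static CPU manager needs for exclusive
    cores), 1Gi hugepages, and Multus secondary networks."""
    resources = {"cpu": str(cpus), "memory": memory, "hugepages-1Gi": hugepages_1gi}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # Multus reads the secondary network attachments from this annotation.
            "annotations": {"k8s.v1.cni.cncf.io/networks": ",".join(extra_networks)},
        },
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "resources": {"requests": dict(resources), "limits": dict(resources)},
                "volumeMounts": [{"name": "hugepages", "mountPath": "/dev/hugepages"}],
            }],
            "volumes": [{"name": "hugepages", "emptyDir": {"medium": "HugePages"}}],
        },
    }


def is_guaranteed(pod: dict) -> bool:
    """Simplified Guaranteed-QoS check: every container sets cpu and memory,
    and its requests equal its limits."""
    for container in pod["spec"]["containers"]:
        requests = container["resources"]["requests"]
        limits = container["resources"]["limits"]
        if requests != limits or "cpu" not in limits or "memory" not in limits:
            return False
    return True


if __name__ == "__main__":
    pod = telco_pod_spec("upf-0", "registry.example.com/upf:1.0", 4, "8Gi", "4Gi",
                         ["n3-net", "n6-net"])
    print(is_guaranteed(pod))  # True
```

Serializing the dict with a YAML or JSON library gives a manifest that can be applied with `oc` or `kubectl`; keeping these settings in the open pod spec is what avoids the closed adaptor layer described above.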
Now let's come to the points where the open platforms are lacking and how we intend to fix them.
Experience #1: 5G outgoing traffic from the pod
Traditional Kubernetes and CaaS platforms today handle and scale ingress traffic well with an ingress controller; however, outgoing traffic from 5G pods and containers is not well addressed, as both N-S and E-W traffic follow the same path, which eventually becomes a scaling issue.
We know that some vendors, like Ericsson, already bring products such as ECFE and LB into their architecture to address these requirements.
Experience #2: Support for non-IP protocols
A pod natively comes with IP, and all external communication is done through cluster IPs, which means the architecture is not designed for non-IP protocols like VLAN, L2TP and VLAN trunking
Experience #3: High performance workloads
Today all high data throughputs are supported through CNI plugins that are natively like SR-IOV, i.e. total passthrough; an Operator framework to enhance real-time processing is required, something we have done with DPDK in the OpenStack world
Experience #4: Integration of 5G SBI interfaces
The newly defined SBI interfaces have become more like APIs compared to horizontal call flows; however, today all HTTP/2/API integration is based on “primary interfaces”.
This clearly becomes an issue, as secondary interfaces for inter-functional modules are not supported
Experience#5: Multihoming for SCTP and SIP is not supported
Hybrid node connectivity, at least towards egress and external networks, still requires SCTP links and/or SIP endpoints, which are not well supported.
Experience#6: Secondary interfaces for CNF’s
Secondary interfaces raise concerns for interoperability, monitoring, and O&M, yet they are a very important concept in K8S and 5G CNFs, as they are needed for:
All telecom protocols, e.g. BGP
Support for Operator frameworks (CRDs)
Performance scenarios such as SR-IOV CNIs
Today the only viable solution is NSM (Network Service Mesh), which solves both the management and the monitoring issues.
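To make the secondary-interface discussion concrete, here is a minimal Python sketch that emits the two Kubernetes objects a CNF typically needs to get a secondary interface through Multus: a NetworkAttachmentDefinition wrapping the reference `vlan` CNI plugin, and a Pod whose `k8s.v1.cni.cncf.io/networks` annotation requests it. The network name, master NIC `eth1`, VLAN ID, subnet, and image are hypothetical; check the fields against the Multus and CNI plugin versions you actually run.

```python
import json


def vlan_attachment(name, master, vlan_id, subnet):
    # NetworkAttachmentDefinition (Multus CRD). The embedded "config" is a CNI
    # config for the reference "vlan" plugin with host-local IPAM.
    # All concrete values passed in are hypothetical examples.
    cni = {
        "cniVersion": "0.3.1",
        "name": name,
        "type": "vlan",
        "master": master,
        "vlanId": vlan_id,
        "ipam": {"type": "host-local", "subnet": subnet},
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name},
        "spec": {"config": json.dumps(cni)},
    }


def cnf_pod(name, image, networks):
    # The primary interface (eth0) still comes from the default cluster CNI;
    # each network listed in the Multus annotation adds a secondary interface.
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {"k8s.v1.cni.cncf.io/networks": ",".join(networks)},
        },
        "spec": {"containers": [{"name": name, "image": image}]},
    }


if __name__ == "__main__":
    nad = vlan_attachment("sig-vlan100", "eth1", 100, "192.0.2.0/24")
    pod = cnf_pod("smf-0", "example.com/smf:latest", [nad["metadata"]["name"]])
    print(json.dumps([nad, pod], indent=2))
```

The Pod keeps its primary cluster interface and Multus adds the VLAN attachment next to it. Standard Kubernetes Services and NetworkPolicies generally do not cover such secondary interfaces, which is where the monitoring and O&M concerns above come from.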
Experience#7: Platform Networking Issues in 5G
Today in commercial networks, most products use Multus+VLAN for internal networking, while other internal networks are based on Multus+VxLAN. This requires separate planning for both underlay and overlay, which becomes an issue for a large-scale 5G SA Core Network.
Similarly, the top service requirements in 5G networks are:
Network separation on each logical interface (e.g. VRF) and on each physical sub-interface
Outgoing traffic from Pods
NAT and reverse proxy
Experience#8: Service Networking Issues in 5G
For primary networks we rely on Calico+IPIP, while for secondary networks we rely on Multus.
Experience#9: ETSI specs, especially for BM CaaS
I still believe the ETSI specs for CNFs are lacking compared to others like 3GPP, and that is enough to turn an open solution into a closed one through adaptors and plugins, something we already experienced when SDN was introduced in cloud networks. Today, rigorous updates are expected on:
IFA038 which is container integration in MANO
IFA011 which is VNFD with container support
Sol-3 specs updated for the CIR (Container image registry) support
Experience#10: Duplication of features on NEF/NRF and Cloud platforms
In the new 5G API ecosystem, operators look at their network as a platform and open it to application developers. API exposure is fundamental to 5G because it is built natively into the architecture: applications can talk back to the network and command it to provide a better application experience. However, the NEF and, similarly, the NRF service registry are also functions already available on cloud platforms. A way to share responsibility for such integrations is required to avoid duplication.
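The duplication can be illustrated with a small Python sketch: the same SMF instance is described once as an NRF NFProfile and once as a Kubernetes Service, so the "where do I reach this NF" fact lives in two registries. The NFProfile attributes used (`nfInstanceId`, `nfType`, `nfStatus`, `ipv4Addresses`) follow our reading of 3GPP TS 29.510 and are only a minimal subset; the addresses, port, and the `registries_in_sync` helper are hypothetical.

```python
import uuid


def nrf_profile(nf_type, ipv4):
    # Minimal NFProfile body as sent in an NRF registration
    # (PUT .../nnrf-nfm/v1/nf-instances/{nfInstanceId}, TS 29.510).
    return {
        "nfInstanceId": str(uuid.uuid4()),
        "nfType": nf_type,
        "nfStatus": "REGISTERED",
        "ipv4Addresses": [ipv4],
    }


def k8s_service(name, cluster_ip, port):
    # Platform-level service discovery for the same workload.
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "clusterIP": cluster_ip,
            "selector": {"app": name},
            "ports": [{"name": "sbi", "port": port, "targetPort": port}],
        },
    }


def registries_in_sync(profile, service):
    # Hypothetical check: if the platform re-allocates the Service IP and
    # nobody updates the NRF, the two registries silently drift apart.
    return service["spec"]["clusterIP"] in profile.get("ipv4Addresses", [])
```

Whether the NRF, the platform, or a sync component between them owns this record is exactly the responsibility split that needs to be agreed.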
Reference Architectures for the Standard 5G Platform and Capabilities
Cap#1: Solving Data Integration issues
Real AI is the next most important thing for Telcos as they evolve in their automation journey from conditional automation to partial autonomy. However, any fully functional use case will first require solving the data integration architecture, because any real product that succeeds with AI in Telco will need Graph Databases and Process Mining, and both rely on the assumption that all the data is there and is valid.
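As a sketch of what use-case-driven normalization might look like, the Python below maps telemetry records from different layers (bare metal, CaaS, CNF) into one common schema and sets aside records that are incomplete or invalid, reflecting the "all and valid data" assumption. The layer names, source field names, and the target `Metric` schema are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    layer: str     # "infra", "caas" or "cnf"
    resource: str  # host, pod or NF instance identifier
    name: str      # metric / KPI name
    value: float
    ts: int        # epoch seconds


# Hypothetical per-layer field mappings: source field -> common field.
MAPPINGS = {
    "infra": {"host": "resource", "sensor": "name", "reading": "value", "time": "ts"},
    "caas": {"pod": "resource", "metric": "name", "val": "value", "timestamp": "ts"},
    "cnf": {"nfInstanceId": "resource", "kpi": "name", "measurement": "value", "t": "ts"},
}


def normalize(layer, record):
    """Return a Metric, or None if a required field is missing or invalid."""
    fields = {common: record.get(src) for src, common in MAPPINGS[layer].items()}
    if any(v is None for v in fields.values()):
        return None
    try:
        return Metric(layer, str(fields["resource"]), str(fields["name"]),
                      float(fields["value"]), int(fields["ts"]))
    except (TypeError, ValueError):
        return None


def normalize_batch(batch):
    """Split (layer, record) pairs into normalized metrics and rejects."""
    good, rejected = [], []
    for layer, record in batch:
        metric = normalize(layer, record)
        if metric is None:
            rejected.append((layer, record))
        else:
            good.append(metric)
    return good, rejected
```

Keeping the rejects, rather than dropping them, makes data gaps visible to both the network and data science teams before any model is trained on them.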
Cap#2: AI profiles for processing in Cloud Infra Hardware profiles
With 5G networks relying more on robust mechanisms to ingest and use data for AI, it is very important to agree on hardware profiles powerful enough to deliver AI use cases and complete AI pipelines, all the way from flash-based storage to TensorFlow, along with analytics.
Cap#3: OSS evolution that support data integration pipeline
OSS needs to evolve towards the future ENI architecture for the use of AI in Telco, and towards the ZSM architecture for closed loops, both based on a standard data integration pipeline such as the one proposed in ENI-0017 (Data Integration Mechanisms).
Cap#4: Network characteristics
A mature way to handle outgoing traffic and load balancing (LB) needs to be included in the Telco PaaS.
Cap#5: Telco PaaS
Based on experience with NFV, it is clear that IaaS is not the Telco service delivery model, which is why use cases like NFVPaaS have been under consideration since the early days of NFV. With CNF introduction requiring more robust release times, it is imperative, not optional, to build a stable Telco PaaS that meets Telco requirements. As of today, the direction is to divide the platform between a general PaaS, which will become part of the standard cloud platform over release iterations, and a Telco PaaS that covers the specific requirements.
The beauty of this architecture is that it ensures multi-vendor component selection between them. The key characteristics to be addressed are:
Paas#1: Telco PaaS Tools
Agreement on PaaS tools across the complete LCM is still open: a survey is currently running in the community to agree on this, and it is an ongoing study.
Paas#2: Telco PaaS LI Characteristics
During recent NFV and CNF integrations we still rely on application-layer LI (Lawful Interception) characteristics as defined by ETSI. With the open cloud layer ensuring that the necessary LI requirements are available, it is important that the PaaS includes this part through APIs.
Paas#3: Telco PaaS Charging Characteristics
Real-time resource consumption and reporting are very important, as with 5G and Edge we will evolve towards the hybrid cloud.
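Real-time consumption reporting usually starts from periodic usage samples. A minimal Python sketch, assuming each sample carries a tenant, the CPU cores in use, and the sampling interval in seconds (hypothetical fields), aggregates them into core-hours per tenant as an input for charging:

```python
from collections import defaultdict


def core_hours(samples):
    """Aggregate (tenant, cores_in_use, interval_seconds) samples into core-hours."""
    totals = defaultdict(float)
    for tenant, cores, interval_s in samples:
        totals[tenant] += cores * interval_s / 3600.0
    return dict(totals)


if __name__ == "__main__":
    usage = [("ent-a", 4, 1800), ("ent-a", 2, 1800), ("ent-b", 1, 3600)]
    print(core_hours(usage))  # {'ent-a': 3.0, 'ent-b': 1.0}
```

The same pattern extends to memory, accelerators, or bandwidth; in a hybrid cloud the point is that every site reports the same units so the totals can be combined.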
Paas#4: Telco PaaS Topology management and service discovery
A single API endpoint that exposes both the topology and the services to applications is the key requirement of the Telco PaaS.
Paas#5: Telco PaaS Security Hardening
With 5G and critical services, security hardening has become more and more important, and tools like Falco and a service mesh are important in this platform.
Paas#6: Telco PaaS Tracing and Logging
Although monitoring is quite mature in Kubernetes and its distros, tracing and logging still need to be addressed. Tools like Jaeger and Kafka/EFK need to be included in the Telco PaaS.
Paas#7: Telco PaaS E2E DevOps
For IT workloads, the DevOps capability is already provided by PaaS in a mature manner through both cloud and application tools, but with the enhancements required by Telco workloads it is important to ensure end-to-end DevOps capability. Today, tools like Argo need to be considered and integrated with both the general PaaS and the Telco PaaS.
Paas#8: Packaging
Standard packages, like the VNFD, that cover both the application and PaaS layers
Paas#9: Standardization of APIs
API standardization in ETSI fashion is a key requirement of the NFV and Telco journey, and it needs to be ensured in the PaaS layer as well. For the Telco PaaS it should cover VES, TM Forum, 3GPP, ETSI MANO, etc. The community has produced the following work to standardize this:
TMF 641/640
3GPP TS 28.532/531/541
IFA029 containers in NFV
ETSI FEAT17 which is Telco DevOps
ETSI TST10/13 for API testing and verification
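For a flavour of what a standardized event looks like, the Python sketch below builds a fault event in the ONAP VES (VNF Event Stream) JSON format. The field names follow our reading of the VES 7.x event listener specification; the version strings, source name, and alarm text are assumptions to verify against the spec release you target.

```python
import json
import time
import uuid


def ves_fault_event(source_name, alarm, severity, sequence=0):
    """Build a minimal VES fault event; severity e.g. CRITICAL, MAJOR, MINOR."""
    now_us = int(time.time() * 1_000_000)
    return {
        "event": {
            "commonEventHeader": {
                "domain": "fault",
                "eventId": str(uuid.uuid4()),
                "eventName": f"Fault_{source_name}_{alarm}",
                "priority": "High" if severity in ("CRITICAL", "MAJOR") else "Normal",
                "reportingEntityName": source_name,
                "sourceName": source_name,
                "sequence": sequence,
                "startEpochMicrosec": now_us,
                "lastEpochMicrosec": now_us,
                "version": "4.1",                  # assumed header version
                "vesEventListenerVersion": "7.1",  # assumed listener version
            },
            "faultFields": {
                "faultFieldsVersion": "4.0",       # assumed
                "alarmCondition": alarm,
                "eventSeverity": severity,
                "eventSourceType": "CNF",
                "specificProblem": alarm,
                "vfStatus": "Active",
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(ves_fault_event("upf-0", "linkDown", "MAJOR"), indent=2))
```

If both the application and the PaaS emit events in one such format, the assurance layer does not need a per-vendor adaptor.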
Based on these features, there is an ongoing effort within the LFN XGVELA community, and I hope more and more users, partners, and vendors will join to define the future Open 5G Platform.
Network slicing is a great concept that has always been an attractive piece of jargon for vendors who wish to bundle it with their products and solutions. However, with the arrival of 3GPP Release 16 and the subsequent products arriving in the market, things are starting to change: with so many solutions and requirements, finding a novel slicing architecture that fits all is both technically complex and, business-wise, does not make much ROI sense. Today we will try to analyze the latest progress and directions to solve this dilemma.
Slicing top challenges
Based on our recent work in GSMA and 3GPP, we believe the following are the top questions to answer in order to both evolve and proliferate slicing solutions:
Can a public slicing solution fulfill vertical industry requirements?
How can we convince vertical industries that a slicing solution fulfils their needs, such as data sovereignty, SLA, security, and performance?
Automation and intelligence: is a public slicing solution flexible enough to provide all the intelligence each industry needs?
Slicing in cases of 5G infrastructure sharing
Solution baseline principles
When we view slicing, or any tenant provisioning solution, it is very important that all layers E2E, including business fulfillment, network abstraction, and infrastructure (including wireless), adhere to the same set of principles.
A good description can be found in 3GPP TS 28.553, on management and orchestration for network slicing, and 3GPP TS 28.554, on KPIs for 5G solutions and slicing. In summary, once we take a systems view of network slicing, the principles can be summarized as follows:
Slice Demarcation: a way to isolate each tenant and to offer different features of the slicing bundle to different tenants; for example, a large enterprise may need 10 features and 20 SLAs, while for small businesses 5 features and 5 SLAs will do.
Performance: a way to build a highly performant system; the key question is whether, once we engineer and orchestrate it, it will work E2E.
Observability: with 4B+ devices added every year and the industry setting a futuristic target of a million private networks by 2025, how to observe and handle such networks in real time is a pressing issue.
When we talk about slicing, we mostly speak about key technology enablers like NFV, Cloud, MEC, and SDN, which is obviously great, since softwarization of the network and infrastructure is vital. However, not talking about the RAN (wireless) and WNV (Wireless Network Virtualization) is not fair. In this paper I want to shed some light from the RAN perspective: considering that still today around 65% of customers' CAPEX/OPEX is pumped into RAN and Transport, this view is vital for a solution that is both conformant and realistic. If NFV/SDN/Cloud demands sharing among a few hundred tenants, the RAN demands sharing among millions, so resource sharing, utilization, and optimization are vital.
RAN Architecture
From an E2E perspective, the RAN part of a slice is selected based on the GST and NSSAI, which is done by the UE or the Core Network. However, this is easier said than done: when we view slicing E2E, the following should be considered to build a scalable slicing solution.
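Since RAN slice selection keys off the NSSAI, it helps to see how compact a single S-NSSAI is: an 8-bit Slice/Service Type (SST) plus an optional 24-bit Slice Differentiator (SD), per 3GPP TS 23.501. The Python sketch below validates one and renders it in the JSON shape used on SBI interfaces (the `Snssai` type of TS 29.571, as we read it). The `pack` helper is our own illustration, not a 3GPP-defined encoding.

```python
# Standardized SST values from TS 23.501 (subset).
STANDARD_SST = {1: "eMBB", 2: "URLLC", 3: "MIoT", 4: "V2X"}


def snssai(sst, sd=None):
    """Validate an S-NSSAI and return it as {"sst": int, "sd": "6 hex digits"}."""
    if not 0 <= sst <= 0xFF:
        raise ValueError("SST is 8 bits")
    out = {"sst": sst}
    if sd is not None:
        if not 0 <= sd <= 0xFFFFFF:
            raise ValueError("SD is 24 bits")
        out["sd"] = f"{sd:06X}"
    return out


def pack(sst, sd=None):
    # Illustrative only: SST in the top 8 bits, SD in the low 24 bits.
    # 0xFFFFFF is used for "no SD", the value 3GPP reserves for that meaning.
    return (sst << 24) | (0xFFFFFF if sd is None else sd)


if __name__ == "__main__":
    print(snssai(1, 0x000001), STANDARD_SST[1])
```

The small identifier space is the point: the RAN only sees a handful of bits per slice, so everything else (SLA, isolation, resources) has to be resolved from configuration and policy behind it.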
RAN#1: Spectrum and Power resources
The massive business requirements for services and slices demand highly efficient radio resources. Luckily, low, mid, and high bands combined with Massive MIMO handle this part; however, it is not just about spectrum: how to utilize it efficiently in terms of form factor and power is vital.
When we look at the RAN view of slicing, it is not just the spectrum itself or the RF signal but also spectrum management across macro, femto, and HetNets, including open cellular. In summary, we are still not able to understand this part well, as it requires novel algorithms like MINLP (mixed-integer nonlinear programming), which aim to optimize cost while increasing resource usage at the same time. As per the latest trend, a tiered RAN architecture combined with new algorithms such as ML/AI-driven matching games is the answer to standardize this.
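The paragraph does not name a specific matching algorithm; as one well-known matching-game technique, here is a Python sketch of slice-proposing deferred acceptance (many-to-one Gale-Shapley) that assigns slices to cells with limited slice capacity. The slices, cells, preference lists, and capacities are hypothetical; in a real RAN the preferences would be derived from radio and load metrics, possibly learned with ML.

```python
def deferred_acceptance(slice_prefs, cell_prefs, capacity):
    """Slice-proposing deferred acceptance (many-to-one Gale-Shapley).

    slice_prefs: slice -> list of cells, most preferred first
    cell_prefs:  cell  -> list of slices, most preferred first
    capacity:    cell  -> how many slices the cell can host
    Returns slice -> cell for every matched slice.
    """
    rank = {c: {s: i for i, s in enumerate(p)} for c, p in cell_prefs.items()}
    next_choice = {s: 0 for s in slice_prefs}
    held = {c: [] for c in cell_prefs}  # tentative acceptances per cell
    free = list(slice_prefs)
    while free:
        s = free.pop()
        if next_choice[s] >= len(slice_prefs[s]):
            continue                    # list exhausted: s stays unmatched
        c = slice_prefs[s][next_choice[s]]
        next_choice[s] += 1
        if s not in rank[c]:
            free.append(s)              # cell c does not accept s at all
            continue
        held[c].append(s)
        held[c].sort(key=rank[c].get)   # keep the cell's preferred slices first
        if len(held[c]) > capacity[c]:
            free.append(held[c].pop())  # reject the least preferred slice
    return {s: c for c, slices in held.items() for s in slices}


if __name__ == "__main__":
    print(deferred_acceptance(
        {"A": ["X", "Y"], "B": ["X", "Y"], "C": ["X", "Y"]},
        {"X": ["B", "A", "C"], "Y": ["A", "B", "C"]},
        {"X": 1, "Y": 2},
    ))
```

The result is stable: no slice and cell would both rather be matched with each other than with their current partners, which is the property that makes matching games attractive for distributed resource assignment.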
RAN#2: RAN Dis-aggregation
Just as NFV/SDN and orchestration did for the Core, Open RAN and the RIC (RAN Intelligent Controller) will do for the RAN. If you want to know more, you may want to check the author's write-up on RAN evolution.
RAN#3: RAN resource optimization
Based on our field trials, we find that using Edge and MEC with the RAN, especially for CDN, saves around 25% of resources. RAN caching combined with LBO (Local Breakout) is vital and will help Telcos fulfill the very pressing requirements from verticals. Again, this is not just a cloudlet and software issue, as different RAN architectures, such as D2D RAN solutions, HetNets, and macro, each require a different approach.
RAN#4: Mid-haul optimization
Mid-haul and backhaul capacity optimization is vital for slicing delivery, and today this domain is still in the R&D funnel. The TIP project CANDI (Converged Architectures for Network Disaggregation & Integration) is evolving to understand this requirement.
RAN#5: Edge cost model
Edge for slicing in the context of the RAN is a cost-model problem: the key is how many MEC servers to deploy, in which locations, and how much RAN and RF layer processing they can relieve. Our latest work on Telco Edge cloud, with different models for different site configurations, is the answer.
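The "how many MEC servers, and where" question can be framed as a classic uncapacitated facility-location problem. The Python below is a brute-force sketch for a handful of candidate sites, assuming hypothetical fixed costs per MEC site and per-RAN-site serving costs (which could fold in transport and the RAN/RF processing offload). It is not the model from our Telco Edge cloud work; real planning would use an ILP solver and capacity constraints.

```python
from itertools import combinations


def best_mec_placement(open_cost, serve_cost):
    """Brute-force uncapacitated facility location.

    open_cost:  candidate MEC site -> fixed cost of deploying it
    serve_cost: RAN site -> {MEC site: cost of serving that RAN site from it}
    Returns (total_cost, chosen_mec_sites).
    """
    candidates = list(open_cost)
    best = (float("inf"), ())
    for k in range(1, len(candidates) + 1):
        for chosen in combinations(candidates, k):
            total = sum(open_cost[m] for m in chosen)
            # Each RAN site is served by the cheapest opened MEC site.
            total += sum(min(costs[m] for m in chosen) for costs in serve_cost.values())
            if total < best[0]:
                best = (total, chosen)
    return best


if __name__ == "__main__":
    print(best_mec_placement(
        {"edge1": 10, "edge2": 10, "central": 4},
        {"ranA": {"edge1": 1, "edge2": 9, "central": 6},
         "ranB": {"edge1": 9, "edge2": 1, "central": 6}},
    ))  # (16, ('central',))
```

With these toy numbers one central site wins; raising the serving cost from the central site (for example, latency penalties for URLLC slices) flips the answer towards distributed edges, which is why different site configurations need different models.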
RAN#6: Isolation, elasticity, and resource limitation
This is the most important issue for RAN slicing, primarily because these are conflicting dimensions: extra resource isolation may make it impossible to share resources and will limit services during peak and critical times, while too much elasticity makes isolation and separation practically impossible. Matching algorithms are the answer, as they help build a RAN system that is not only less complex but also highly conformant. This is make-or-break for the RAN slicing architecture.
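The isolation/elasticity trade-off can be made tangible with a small Python sketch that shares a hypothetical RAN resource pool (for example PRBs) between slices. Each slice first gets its guaranteed share up to its demand (isolation); leftover capacity is then water-filled equally among slices with unmet demand, capped at each slice's maximum (elasticity). The slice names, numbers, and the `strict_isolation` switch are illustrative assumptions, not a standardized scheduler.

```python
def allocate(capacity, slices, strict_isolation=False):
    """Share a resource pool between slices.

    slices: name -> {"demand": d, "guaranteed": g, "max": m}, with g <= m.
    Phase 1 (isolation): each slice gets min(demand, guaranteed).
    Phase 2 (elasticity): leftover is split equally among slices with unmet
    demand, never beyond their "max". With strict_isolation=True, unused
    guaranteed capacity is held back instead of being lent to others.
    """
    alloc = {n: min(s["demand"], s["guaranteed"]) for n, s in slices.items()}
    reserved = (sum(s["guaranteed"] for s in slices.values())
                if strict_isolation else sum(alloc.values()))
    if reserved > capacity:
        raise ValueError("guarantees exceed pool capacity")
    leftover = capacity - reserved

    def headroom(n):
        s = slices[n]
        return min(s["demand"], s["max"]) - alloc[n]

    active = [n for n in slices if headroom(n) > 0]
    while leftover > 1e-9 and active:
        share = leftover / len(active)
        for n in active:
            give = min(share, headroom(n))
            alloc[n] += give
            leftover -= give
        active = [n for n in active if headroom(n) > 1e-9]
    return alloc


if __name__ == "__main__":
    pool = {"urllc": {"demand": 10, "guaranteed": 30, "max": 30},
            "embb": {"demand": 90, "guaranteed": 40, "max": 100}}
    print(allocate(100, pool))                         # eMBB borrows idle URLLC share
    print(allocate(100, pool, strict_isolation=True))  # idle URLLC share stays reserved
```

With elastic sharing the eMBB slice reaches 90 units by borrowing the idle URLLC reservation; with strict isolation it is capped at 70 even though 20 units sit unused, which is exactly the tension described above.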
RAN#7: RAN infrastructure sharing for 5G
Infrastructure sharing has already started between big players in Europe. One question that arises is what happens if a user purchases a slice and service from a tenant. Consider a wholesale view where the infrastructure is provided by sharing and bundling resources from all national carriers, because the 5G infrastructure of a single operator is obviously not sufficient from either a coverage or a capacity perspective.
RAN#8: RAN Resource RAGF problem
In case of service mobility or congestion, how can the UE quickly access resources, possibly in another sector or site?
RAN#9: Slice SLA
Slice SLAs and their real-time monitoring are key business requirements; however, imagine a situation where a shortage in the shared resource pool makes it impossible to deliver the SLA.
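Real-time SLA monitoring boils down to comparing measured KPIs against the slice contract. A minimal Python sketch, assuming a hypothetical SLA expressed as a latency percentile bound plus a minimum throughput, checks one measurement window using the nearest-rank percentile:

```python
def percentile_nearest_rank(samples, p):
    """p-th percentile (integer p in 1..100) by the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = -(-p * len(ordered) // 100)  # integer ceil(p * n / 100)
    return ordered[max(rank, 1) - 1]


def check_slice_sla(latency_ms, throughput_mbps, sla):
    """sla (hypothetical fields): latency_percentile, max_latency_ms, min_throughput_mbps."""
    observed_latency = percentile_nearest_rank(latency_ms, sla["latency_percentile"])
    worst_throughput = min(throughput_mbps)
    latency_ok = observed_latency <= sla["max_latency_ms"]
    throughput_ok = worst_throughput >= sla["min_throughput_mbps"]
    return {
        "latency_ms": observed_latency,
        "latency_ok": latency_ok,
        "min_throughput_mbps": worst_throughput,
        "throughput_ok": throughput_ok,
        "compliant": latency_ok and throughput_ok,
    }


if __name__ == "__main__":
    sla = {"latency_percentile": 95, "max_latency_ms": 20, "min_throughput_mbps": 90}
    print(check_slice_sla(list(range(1, 21)), [100, 80, 120], sla))
```

The harder part, as noted above, is what to do when the check fails because the shared pool itself is exhausted: detection is cheap, remediation needs the resource-sharing policy.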
RAN#10: Slice operations
Slice operations are not just a BSS and operations view, as real-time RAN resource usage and optimization are necessary. Have you ever thought about how a perfectly managed slice can coexist with normal Telco services, especially when there is a key event and many users use the service? I think this is one dimension that is still not well addressed. I have no hesitation in saying that when many enterprise CXOs are convinced they should build their own private 5G network, this is exactly the problem they fear.
Summary
In today's write-up I have tried to explain the current progress, the challenges, and the steps to build a successful slicing solution while wearing the hat of a RAN architect. I believe it is very important to see the radio viewpoint, which I firmly believe has not gotten its due respect and attention from either standards bodies or vendors. In my coming blog I shall summarize some key gaps and how we can approach them, as slicing products and solutions are still not carrier grade and need further tuning to ensure E2E slicing and service fulfillment.
According to the latest market insights, RAN innovation in Telecom lags behind other initiatives by 7 years, which calls for more innovative and disruptive delivery models for site solutions, especially for the next wave of 5G solutions.
However, to reach the goal of a fully distributed and Open RAN, we need to build a pragmatic view of brownfields and find the sweet spot for its introduction and wide adoption.
Here are my latest thoughts on this and on how Telecom operators should adopt it. There is still time for industry-wide adoption of Open RAN, but as you will find, the time to act is now.
What you will learn
Building Delivery Models for Open RAN in a brownfield
Understand the what, when, and how of Open RAN
What is Open RAN and its relation with 5G
Current Industry solutions
Define phases of Open RAN delivery
Present and Next Steps
Architecture and State of Play