As of Q1 2022, 5G adoption has just passed the 600M mark and is expected to hit 2.5B connections by 2025, which is almost 500M new connections every year. If we combine the IoT and device ecosystems, the scale becomes enormous. One of the biggest advantages, and also one of the biggest challenges, of this once-in-a-lifetime opportunity is “scale”.
Put simply, “automation” using ML/AI is a must to meet network SLAs, improve efficiency and optimize network TCO.
AI has the potential to create value through enhanced workload availability and improved performance and efficiency for 5G and Telco Cloud. However, the biggest problem in applying AI and machine learning in Telcos is “data” and “data models”, simply because there is no standardization or model definition for how Telco systems, including infrastructure, expose information to the upper layers. Since data sets in this domain are huge, with n permutations, the first step is a use case driven normalization of data that can be consumed by both the network and data science domains. This will enable us to build a future Telco that can detect faults and heal itself.
Understanding Data Integration Architecture
Considering that the 5G architecture is based on open APIs and a horizontal services design, a.k.a. SBA, data integration and the use of AI should be an easy problem that can be divided into the following parts to define a pipeline:
Telemetry and data
Data exposure of each layer as an API, starting from bare metal and extending upwards to Cloud, SDN, NFVO, Assurance, etc.
Data models and engines to disseminate information
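To make the idea of per-layer exposure and normalization a little more concrete, here is a minimal sketch. The record fields, the layer names used as keys and the raw field names in `FIELD_MAPS` are all hypothetical and only illustrate the idea that each layer's native payload is mapped onto one shared record that both network and data science teams can consume.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class TelemetryRecord:
    """Common record that every layer is normalized into (illustrative fields)."""
    layer: str        # e.g. "baremetal", "cloud", "sdn", "nfvo", "assurance"
    source: str       # element that produced the sample
    metric: str       # metric name
    value: float      # numeric reading
    unit: str         # unit of measure
    timestamp: float  # epoch seconds


# Hypothetical mapping from each layer's native field names to the common ones.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "baremetal": {"source": "host", "metric": "sensor", "value": "reading",
                  "unit": "unit", "timestamp": "ts"},
    "cloud": {"source": "instance_id", "metric": "name", "value": "val",
              "unit": "uom", "timestamp": "time"},
}


def normalize(layer: str, raw: Dict[str, Any]) -> TelemetryRecord:
    """Translate one raw sample from a given layer into the common record."""
    mapping = FIELD_MAPS[layer]  # raises KeyError for a layer with no mapping
    return TelemetryRecord(
        layer=layer,
        source=str(raw[mapping["source"]]),
        metric=str(raw[mapping["metric"]]),
        value=float(raw[mapping["value"]]),
        unit=str(raw[mapping["unit"]]),
        timestamp=float(raw[mapping["timestamp"]]),
    )
```

Adding a new layer in this sketch only means adding one more entry to the mapping table, which is the kind of loose coupling the pipeline needs.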
However, it is easier said than done, for many reasons including:
What the key data sets will be
How the FCAPS of each layer can be disaggregated, i.e. dropping one layer’s data without confirming dependencies is a killer
To address this, we need to learn from the experience of other industries and SDOs and see how it can be agreed on and integrated in Telco networks. This led us to take a use case driven approach and to select those domains and business challenges that can deliver quick results.
“Follow the money” to deliver use cases that can monetize 5G
We have analyzed a lot of use cases from both academia and industry and compiled a complete list here
From this we infer that there are just too many ways in which Telcos are solving the same problems, which tells us that “data models” and use cases should be clearly defined as a first step.
The most important of these are:
1. Using Machine Learning to Detect Noisy Neighbors in 5G Networks
2. Towards Black-Box Anomaly Detection in Virtual Network Functions
3. Causality Inference for Failure in NFV
4. Self Adaptive Deep Learning Based System for Anomaly Detection in 5G
5. Correlating multiple Events and Data in an Ethernet Network
This leads us to define the following as first steps for AI and intelligence as applied to Telcos:
Data lakes, log analysis and correlation
Anomaly detection, including pattern detection, trending and multi-layer correlation
Intelligent prediction, including capacity, SLA, scaling and Cloud KPIs
Measuring data and synthesizing it using frameworks like eBPF
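As an illustration of the anomaly detection step, below is a minimal sketch using a rolling z-score over a single KPI series. This is a generic textbook technique chosen for illustration, not the method used in the papers or projects listed above, and the window size and threshold are assumed values.

```python
from collections import deque
from statistics import mean, pstdev
from typing import List, Sequence


def detect_anomalies(samples: Sequence[float], window: int = 5,
                     threshold: float = 3.0) -> List[int]:
    """Return indexes of samples that deviate strongly from the recent window.

    A sample is flagged when it lies more than `threshold` standard
    deviations away from the mean of the previous `window` samples.
    """
    history = deque(maxlen=window)  # sliding window of the most recent samples
    anomalies = []
    for index, value in enumerate(samples):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # Skip perfectly flat windows to avoid dividing by zero.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(index)
        history.append(value)
    return anomalies


# Example: a latency series with one obvious spike at index 6.
latency_ms = [10, 11, 10, 12, 11, 10, 50, 11, 10]
print(detect_anomalies(latency_ms))
```

In a real deployment the same idea would run per metric and per layer, and the multi-layer correlation mentioned above would then work on the flagged events rather than on raw samples.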
Data Monetization comes first to make 5G Profitable
Addressing both the data architecture and the business architecture is vital, because different Telcos, and in some cases different BUs within the same customer, approach it differently. What makes it worse is that they manipulate and store data lakes in different forms (infrastructure metrics, agents, databases), which makes it hard to apply the same approach across data sets. This is the biggest obstacle to monetizing one of the key assets of 5G, which is “data”, and hence to defining a pipeline that can be shared between all tenants, including vertical industries.
As said before, we are defining a few key use cases in the LFN project “Thoth” to learn and elaborate from there, applying the concepts of events, anomaly and prediction across layers. The first-phase use cases are
As CSPs continue to evolve into digital players, they are facing new challenges: size and traffic requirements are increasing at an exponential pace, and networks which previously served only telco workloads are now required to be open to a range of business, industrial and services verticals. These factors require CSPs to revamp their operations model into one that is digital, automated, efficient and above all services driven. Similarly, future operations should support innovation rather than rely on offerings from existing vendor operations models, tools and capabilities.
As CSPs will need to operate and manage both legacy and new digital platforms during the migration phase, it is also imperative that operations have a clear transition strategy and processes that can meet both PNF and VNF service requirements, with optimum synergy where possible.
Based on work done by our team with our customers, specifically in APAC, the future network should address the following challenges for its operations transformation.
Fault Management: Fault management in the digital era is more complex, as there is no dedicated infrastructure for the applications. The question therefore arises how to demarcate faults and correlate cross-layer faults to improve O&M troubleshooting of ICT services.
Service Assurance: The future operations model requires being digital in nature with minimum manual intervention, fully aligned with ZTM (Zero Touch Management) and data driven using principles of closed loop feedback control.
Competency: To match the operational requirements of future digital networks, the skills of engineers and designers will play a pivotal role in defining and evolving to future networks. The new roles will require network engineers to be technologists rather than technicians.
IT Technology: IT technology, including skills in data centers, cloud and shared resources, will be vital to operate the network. Operations teams need to understand the impact of scaling, elasticity and network healing on operational services.
TM Forum ODA, a.k.a. Open Digital Architecture, is a perfect place to start. But since it is just an architecture and can lead to different implementations and application architectures, below I will try to share how it is being applied in real brownfield networks. I will cover all modules except AI and intelligent management, which I shall discuss in a separate paper.
Lack of automation in legacy telco networks is an important pain point that needs to be addressed promptly in the future networks. It will not only enable CSPs to avoid the toil of repetitive tasks but also allow them to reduce risks of man-made mistakes.
In order to address the challenges highlighted above, it is vital to develop an agile operations model that improves customer experience, optimizes CAPEX and enables AI operations and business process transformation.
Such a strategic vision will be built on an agile operations model that can fulfill the following:
Efficiency and Intelligent Operation: Telecom efficiency is based on data driven architectures that use AI to provide actionable information, automation and self-healing network capability, evolving through the following stages:
Task Automation & Established foundation
Proactive Operation & Advanced Operation
Machine managed & intelligent Operation.
Service Assurance: Building a service assurance framework to achieve automated network surveillance, service quality monitoring, fault management, preventive management and performance management, ensuring closed-loop feedback control for the delivery of zero touch incident handling.
Operations Support: Building a support framework to achieve automated operation acceptance, change & configuration management.
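The closed-loop feedback control mentioned for service assurance can be sketched in a few lines: observe a KPI, compare it with a threshold, trigger a remediation, and observe again. The function and parameter names below are hypothetical, and `read_kpi` and `remediate` stand in for whatever monitoring and automation hooks an operator actually has.

```python
from typing import Callable


def run_closed_loop(read_kpi: Callable[[], float],
                    remediate: Callable[[], None],
                    threshold: float,
                    max_iterations: int = 10) -> int:
    """Run observe -> compare -> act until the KPI is back within threshold.

    Returns the number of remediation actions that were triggered.
    `max_iterations` is a safety stop so the loop cannot run forever.
    """
    actions = 0
    for _ in range(max_iterations):
        if read_kpi() <= threshold:
            break  # KPI is healthy again, no further action needed
        remediate()
        actions += 1
    return actions


# Simulated example: packet loss drops by 2 points with every remediation.
state = {"packet_loss_pct": 5.0}


def read_loss() -> float:
    return state["packet_loss_pct"]


def scale_out() -> None:
    state["packet_loss_pct"] -= 2.0


print(run_closed_loop(read_loss, scale_out, threshold=1.0))
```

The safety stop matters in practice: a zero touch loop that cannot fix the problem should hand the incident to a human instead of retrying indefinitely.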
Based on the field experience we gained with our partners and customers through Telecom transformation, we can summarize the learnings as follows:
People transformation: Transforming teams and the workforce to match the DevOps concept, to streamline the organization and hence deliver services in an agile and efficient manner. This is vital because 5G, Cloud and DevOps are a journey of experience, not a deployment of solutions, so start the digital journey quickly.
Business Process transformation: Working together with partners on unification, simplification and digitization of end-to-end processes. The new processes will enable Telcos to quickly adapt the network to offer new products and to reduce troubleshooting time.
Infrastructure transformation: Running services on digital platforms and cloud, matching a clear vision to swap the legacy infrastructure.
If PNF to VNF/CNF migration is vital, then hybrid network management is critical.
Automation and tools: Operations automation using tools like workforce management, ticket management, etc. is vital but does not by itself support the vision of full automation. Migrating services to the cloud will enable automated delivery of services across the whole life cycle. Programming teams should join operations to start a journey where the network is managed through the power of software, using languages like Python, Java and Go together with YANG models. It will also enable test automation, a vision in which operations teams can validate any change before applying it to the live network.
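To illustrate what validating a change before applying it to the live network can look like, here is a minimal sketch. The `SCHEMA` table is a hypothetical stand-in for the kind of type and range constraints a YANG model would express; it is not real YANG and the parameter names and limits are assumptions.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical constraints: parameter -> (expected type, minimum, maximum).
SCHEMA: Dict[str, Tuple[type, int, int]] = {
    "mtu": (int, 576, 9216),
    "vlan_id": (int, 1, 4094),
}


def validate_change(change: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means safe to apply."""
    errors = []
    for key, value in change.items():
        if key not in SCHEMA:
            errors.append(f"unknown parameter: {key}")
            continue
        expected_type, low, high = SCHEMA[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            errors.append(f"{key}: expected {expected_type.__name__}")
            continue
        if not low <= value <= high:
            errors.append(f"{key}: {value} outside range {low}..{high}")
    return errors


proposed = {"mtu": 1500, "vlan_id": 5000}
problems = validate_change(proposed)
if problems:
    print("change rejected:", problems)
else:
    print("change can be applied")
```

Running such a check in the delivery pipeline, before any element is touched, is what turns test automation from a vision into a gate.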
Having said this, I hope it serves as a high-level guide for architects addressing operational transformation. As we can see, AI and intelligent management is a vital piece of it, and I shall write on this soon.
During the last year the industry has witnessed Telcos’ increased spend and maturity in Cloud and automation platforms. The pandemic proved that digital and Cloud are the answer our customers need to design, build and operate future Telecom networks.
The second key pillar pushing the Telecom industry towards autonomous networks is the need to deliver business outcomes while doing business responsibly.
Getting business outcomes and running a sustainable business that supports a green vision have so far been unrelated discussions in the Telecom industry.
But now the infusion of data and Cloud is really enabling it. It is expected that we as an industry can cut at least 50% of power emissions in the coming decade, but how will that become possible? According to Pareto’s law, the last 20% will be the most difficult.
This is where my team’s main focus has been: building robust AI and automation use cases that are intelligent enough to solve broader issues. Today the biggest focus areas for ML/AI that can really put Telcos in the lead on such outcomes are:
Smart Capacity management
O&M of networks that reduces emissions and improves availability
Service assurance based on data
The biggest Challenge in Transformation is Fragmentation
The biggest bottleneck in achieving such outcomes is related to data. “Data” is both the problem and the solution, because there are so many sources of truth and different ingestion mechanisms. Do check the details on #Dell Streaming data platforms and how we are solving this problem.
Today, under the umbrella of Anuket, 3GPP, TMF and ITU are all collaborating to come up with a validated and composite solution to deliver those use cases. So, in a nutshell, it is vital to build a holistic and unified view to deliver data driven AI use cases.
Scope and Scale of Intelligent automation
The biggest bottleneck comes from the fact that in the real world Telco apps can never be fully cloud native. At some level, state, resiliency and application requirements have to be kept, and intelligent workload driven decisions have to be made. The decade-long journey of Telecom transformation has revealed that just building everything as code, and expecting it to work so that Telcos can shrink their NOCs, simply does not work.
This is where intelligence from layers above orchestration and SDN will help, as Google does in the Internet era.
The second biggest issue lies in scalable Telco solutions themselves. It is proven that Telcos face unique challenges as they move from hundreds to thousands of nodes. So running AI for heterogeneous environments, each producing different outcomes, can never deliver the power and scale Telcos need in the new era.
Telco grade AIOPS models
It is true that with 5G and business digital transformation the industry really wants to ramp up to build an improved user experience and a unified model, and to expand its portfolio towards vertical markets as well. This is only possible if we have a coordinated system, workflow management, and data sharing and exposure with strict TSR security measures. Similarly, this model should cover the full LCM, including the FCAPS model.
Building Intelligent Telco’s
Although using AI and ML is an exciting ambition for a Telco, the bottom line is still how to build these platforms on top of NFVI and existing orchestration and automation frameworks. In other words, the real business case for building intelligent networks starts with using data and ML to automate the entire network. The scope can extend beyond the service domain to business domains as well, i.e. automating business processes, including event correlation, anomaly detection and RCA.
Building a Unified AI Platform
Although this target is clear, in the context of networks it is complex, as we need to solve the challenges of data security and regulation, as well as what it really means to certify an AI platform. The focus should be on allowing this layer to be built from the solutions of many vendors, so loose coupling, with more focus on network services and AI algorithms, is key to building this platform.
Instead of focusing on network element certification, the focus of an AI platform is service level compatibility, data models and AI algorithms.
However, the lack of a unified standard, especially on trusted data normalization, sharing and exposure, is forcing operators to adopt bespoke solutions to build AI platforms, and that itself is a big impediment to wide-scale adoption of AI and ML in networks.
To move forward, the answer is closer collaboration between different standards bodies and governance by a more Telco-centric organization like TMF, with immediate focus on data standardization, labs integration and enabling shared data sets and algorithms to evolve and support wide deployment of ML and AI in Telecom networks.
Latest Industry progress and standardization
Although these are early days for AI platform standardization, we still need to aggregate the progress across different bodies, otherwise we can only expect a plethora of siloed solutions, each with different specifications.
ONAP, as the baseline automation platform, has components like DCAE and an AI engine, so it makes sense to make it the de facto baseline standard
Anuket is the Cloud infrastructure reference, and it has recently launched a new project, “Thoth”, to look into AI network standardization
ETSI ZSM is an E2E automation platform across the full LCM of a Telecom network and certainly an important direction
ETSI ENI, or Experiential Networked Intelligence, is another body that closely defines AI specifications in the context of Telecom
TMF, as a broader Telecom body, is defining architectures including ODA and AIOps that break down how a Telco can take a phased approach to building these platforms
Above all, early involvement and support from Telecom operators and partners is very important to realize this goal. I hope this year we will see more success and standardization on these initiatives, so let’s work together and stay tuned.
Cloud computing has been increasingly adopted in the Telecom industry during the last decade. It has gone through many shapes and architectures since the first phase of NFV that started back in 2012. In today’s data hungry world there is an increasing demand to move Cloud architectures from central clouds to loosely coupled distributed clouds, both from a cost perspective, by slashing the transport cost of anchoring all user traffic back to central data centers, and from a security perspective, where major customers prefer to keep data on premises. Similarly, with the hyperscalers and public cloud providers targeting the Telco industry, it is evident that the future Cloud will be fully distributed and multi-cloud, made up of many on-premises and public cloud offerings.
Since 5G by design is based on Cloud concepts like
Service based architectures
Hence it is evident that many operators are embarking on a journey to build open and scalable 5G clouds that are capable of handling future business requirements from both Telco and industry verticals. The purpose of this paper is to highlight the key characteristics of such Clouds and how we must collaborate with a rich ecosystem to make 5G a success and achieve Industry 4.0 targets.
Cloud Native Infrastructure for 5G Core and Edge
Cloud native does not refer to a particular technology but to a set of principles that ensure future applications are fully decoupled from the infrastructure. At the atomic level it can be a VM or a container, or maybe futuristic serverless and unikernels. As of today, the only community accepted Cloud native standard for 5G and Cloud is an OCI compliant infrastructure. In general, cloud native for Telco means a Telecom application, as per 3GPP, IETF and related standards, that meets the criteria of the Cloud native principles shared in this paper and supports the vision of immutable and declarative infrastructure and native DevSecOps for the whole infrastructure.
Cloud native is the industry de facto approach for developing and delivering applications in the Cloud. Since 5G is by design service based and microservice enabled, the basic principle for 5G infrastructure is Cloud native, which supports scalability, portability, openness and, most importantly, the flexibility to onboard a wide variety of applications.
As per the latest industry studies, data in the 5G era will quadruple every year. This makes Cloud native a necessity for provisioning infrastructures that are fully automated, support common SDKs and, above all, enable CI/CD across the full application life cycle.
Scalability to deploy services in many PoPs is another key requirement for 5G, along with the possibility to build or tear down a service on the fly. As 5G deployments scale, so will cloud instances, and it is a necessity that future Cloud infrastructure can be scaled and managed automatically.
Application portability is another key characteristic of the 5G cloud. As 5G use cases mature, there is an increasing requirement to deploy different applications in different clouds and to connect them in loosely coupled meshes. In addition, as network capacity and usage increase, applications must be capable of moving across different clouds.
What Cloud means for Telco 5G
Telco operators, through their mission critical infrastructure, hold a seminal place in the post COVID-19 digital economy. The use of Telecom networks directly impacts the economy, society, commerce, and law and order, which is why Telecom networks are designed for higher availability, reliability and performance.
The biggest challenge for Cloud Native Infrastructure for
Granularity of Telco App decomposition
O&M and Operational frameworks
Telco 5G applications need to fulfill special SLA based performance and functional requirements which are somehow not possible on today’s containerized, Kubernetes based Cloud platforms, so we must define a Telco definition of Cloud. Similarly, how we connect workloads E-W is very important. These questions become more pressing as we move towards the edge. The downside is that any deviation from standard Cloud native means we cannot achieve the promise of scaling, performance and distribution, the very purpose for which we have built these platforms.
Any tweak to the cloud principles means we cannot provision and manage a truly automated Cloud infrastructure following DevSecOps, which is so vital to deliver continuous updates and new software in the 5G infrastructure. Lacking such functions means we cannot meet the fast-paced innovation requirements of the new 5G use cases, especially for the vertical markets.
The last and most important factor is leveraging advances from hyperscalers to achieve Cloud and 5G deployments. Today we already see a movement in the market where carrier grade Clouds from well-known distros like IBM can be deployed on top of public clouds, but the top question here is whether “abstraction will impact performance”. The top reason the first wave of NFV was not so disruptive is that we defined so many models, and used one model to define another, which obviously added complexity and deployment issues. Cloud native for 5G Telco needs to address and harmonize this as well.
Applications for 5G
Application economy is vital for the success of 5G and Edge. However, T1 operators’ deployments of open 5G platforms have revealed that just deploying an open infrastructure is not enough, as adherence to Cloud principles by application vendors will vary. To truly take advantage of Cloud it is vital to define principles for an infrastructure-led delivery by devising frameworks and tools to test and benchmark 5G applications, classifying them as Gold, Silver or Bronze, with the common direction of achieving fully gold standard applications in the 5G era. Although Cloud native principles support the vision of a common, shared and automated infrastructure, it is easier said than done in real practice, as achieving Telco grade conformance for Telco services is complex and requires rigorous validation and testing.
Based on real Open 5G cloud deployments and the corresponding CNF benchmarking, there are still certain gaps in standards that need both standardization and testing:
Application resources over committing
Application networking dependence that slows scaling
Use of SDN in 5G Cloud
Lack of Open Telemetry which makes customized EMS mandatory
Hybrid management of VNFs and CNFs
Luckily there are a number of industry initiatives like CNCF Conformance, CNTT RI2, NFV NOC and OPNFV which fundamentally address these very issues, and we have already seen the results. It is vital that 5G Cloud infrastructures are capable of supporting easy to use SDKs and tools that vendors and developers can use flexibly to offer and deploy different applications in the 5G era.
In the next part I shall try to elaborate on how Open Telemetry and automation are driving the next era of growth using ML and AI driven solutions.
Questions like #Coexistence, #NewOnM models and #Processes reengineering need to be answered to address the following challenges: a. Theoretically zero downtime by building an E2E architecture, e.g. UPF pool b. A delivery pipeline for migration through unified tools, instead of touching every NE one by one, to improve TTM c. Automatic verification of service consistency between Cloud and legacy d. Five 9s reliability and TSR gold standard security
Customers expect #partners who offer #Operational tools and services optimized for 1) E2E whole solution support ownership in MVI 2) SPOC support services 3) Tools and platform services, especially on PaaS and NFVO, to co-develop 4) Cloud assurance services, especially for #business readiness, #Transition, #TaaS, #Solution emergency support and CSR
And partners must work with the #Telco to #remodel processes, starting with 1) Incident handling for L1 2) Config management supporting multiple layers 3) Change management with focus on scaling, SDN and policy enforcement 4) Release management supporting #devops, viz. staging to prod
Recent PoCs of P4 programming models with multiple registers, alongside Intel 3rd gen processors and Agilex (10nm), gave critical indications about the future delivery model of 5G edge infrastructure and networks by addressing the following Telco requirements
Telco Key Edge Infrastructure Requirements
Workload placement at the edge, with the possibility to make workloads portable based on real-time use
Enhancing Infrastructure models and form factors for the Edge
Evaluating P4 as a baseline to build a unified model to program Compute/Network and VNF/CNF for real-time use cases like traffic steering
Results with SmartNICs and Latest x86 Chips
Below are key findings
The hardware can deliver consistent throughput (the required 10 Gbps per core) up to 10 register pipelines; beyond that it degrades exponentially, at 12% for each additional pair of registers
The impact on latency becomes prominent as we increase registers, with around 40-50% variation beyond 10 registers
The P4 architecture with Intel 3rd generation processors can help us solve and optimize for these issues dramatically
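The throughput finding above can be turned into a rough back-of-the-envelope model. This sketch assumes my reading of the result: throughput stays flat at 10 Gbps per core up to 10 registers and then loses 12% per additional pair of registers, compounding. The function name and the compounding interpretation are assumptions, not measured curves from the PoC.

```python
def estimated_throughput_gbps(registers: int,
                              base_gbps: float = 10.0,
                              knee: int = 10,
                              loss_per_pair: float = 0.12) -> float:
    """Estimate per-core throughput for a given number of registers.

    Flat at `base_gbps` up to `knee` registers, then a compounding loss of
    `loss_per_pair` for every two registers beyond the knee.
    """
    if registers <= knee:
        return base_gbps
    extra_pairs = (registers - knee) / 2
    return base_gbps * (1 - loss_per_pair) ** extra_pairs


for n in (8, 10, 12, 14, 20):
    print(n, round(estimated_throughput_gbps(n), 2))
```

Under these assumptions, 12 registers give about 8.8 Gbps and 14 registers about 7.74 Gbps per core, which shows why the register budget matters when sizing edge pipelines.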
Key Achievements with P4 in last year
Unified programming model from Core to Cloud to Edge demo
With most Tier 1 operators getting the ball rolling on early experience of 5G Standalone based on 3GPP Release 16, which can offer new and unique 5G experiences around uRLLC and mIoT along with improved QoS for broadband, the whole industry is looking forward to accelerating 5G adoption in 2021.
This is an ideal time for the industry to find new ways to improve human life and the economy in the post-COVID world. However, the biggest problem with the 5G experience so far has been finding a delivery model that can offer both cost and scale advantages. With the hyperscalers eyeing the Telco billion dollar market, it is necessary for all vendors and Telcos themselves to analyze and find ways towards new business models that are based on
By coherence I mean: when we go to a 5G radio site, which components will be residing there?
5G radio site components
By data I mean that until now, despite delivering thousands of 5G sites, we cannot offer DaaS (Data as a Service) to verticals; the only visibility we have is on horizontal interfaces in a 3GPP way.
The third and most important piece is the RF and RAN service. Talk to any RAN guy and he is not interested in either coherence or data unless we can offer him a RAN service that is at least the same as or better than the legacy one.
This makes the story of Open RAN very exciting to analyze, and this is the story of my experience leading such projects during the last years in both my company and the industry. It is my humble opinion that Open RAN and other such innovative solutions must not be analyzed only through technology but through an end-to-end view, where some use cases specifically require such solutions to be successful.
Why Open RAN
For me, Open RAN is not about Cloud but about finding a new and disruptive delivery model to offer RaaS (RAN as a Service). Have you ever imagined what will happen if a hyperscaler like AWS or Azure acquires a couple of RAN companies that can build Cloud native RAN applications? The Telco could order them online and have them spun up in the data center or on an edge device in a PnP (Plug and Play) manner.
If you think I am exaggerating, there are already discussions and PoCs happening around this. So Open RAN is not about Cloud but about the Telco industry doing something similar in an open and carrier grade fashion.
This is where Open RAN is gaining momentum, bringing the power of open, data driven and Cloud based disaggregated solutions to the RAN site. The future of Telco is a well-designed software stack that extends all the way from the core to the last-mile site of the network. Crucially, it also allows for placing more compute, network and storage closer to the source of the unrelenting volume of data: devices, applications and end-users.
There is another aspect which is often overlooked, which is transport cost. As per field trial results, use of the Open RAN 7-2 fronthaul interface improved fronthaul capacity utilization by at least 30-40%, primarily because there is a lot of proprietary overhead in the CPRI world.
“For the CXO it means at least 30-40% direct savings on metro and transport costs”
What is Open RAN
5G RAN has huge capacity requirements, and naturally disaggregation and a layered network architecture are the key to scaling this network. The Open RAN architecture can be depicted as follows.
Radio Unit (RU) handles the digital front end (DFE) and parts of the PHY layer, as well as the digital beamforming functionality. It is almost the same architecture as the DBS (Distributed Base Station) architecture offered by many legacy vendors. Conceptually, the AAU (Active Antenna Unit) is considered together with the radio unit
Distributed Unit (DU) handles the real-time L1 and L2 scheduling functions, mainly the MAC split. It is the real-time part of the BBU (Baseband Unit)
Centralized Unit (CU) is responsible for the non-real-time, higher L2 and L3 functions. It covers the non-real-time functions of the BBU, like resource pooling, optimization, etc.
RAN Intelligent Controller (RIC) is the intelligence component of Open RAN that collects all data and offers insights and innovation through xApps. It also cooperates with the Non-RT RIC (non-real-time RIC), which is part of ONAP/MANO, to offer end-to-end SMO (Service Management and Orchestration) functions
Interfaces of Open RAN
To understand Open RAN we need to understand both the interfaces defined by O-RAN and those defined by 3GPP. This is important because it requires cross-SDO liaison
A1 interface is the interface between the non-real-time RIC/Orchestrator and the near-real-time RIC in the RAN
E2 interface is the interface between the near-real-time RIC (RAN Intelligent Controller) and the CU/DU
Open Fronthaul is the interface between the RU and DU; we are mostly focusing on eCPRI 7-2 to standardize it
O2 interface is the interface between the NFVI/CISM and the Orchestrator
E1 interface is the interface between CU-CP (control plane) and CU-UP (user plane)
F1 interface is the interface between CU and DU
NG-c is the interface between the gNB CU-CP and the AMF in the 5G Core network
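The interface list can also be captured as a small lookup table, which is handy when checking which interfaces a given component has to implement in a multi-vendor integration. This is only an illustrative data structure: the endpoint names follow the O-RAN and 3GPP definitions as I understand them, and the helper function is hypothetical.

```python
from typing import Dict, FrozenSet, List

# Interface name -> components that terminate it (illustrative naming).
INTERFACES: Dict[str, FrozenSet[str]] = {
    "A1": frozenset({"Non-RT RIC", "Near-RT RIC"}),
    "E2": frozenset({"Near-RT RIC", "CU", "DU"}),
    "Open Fronthaul": frozenset({"RU", "DU"}),
    "O2": frozenset({"Orchestrator", "NFVI/CISM"}),
    "E1": frozenset({"CU-CP", "CU-UP"}),
    "F1": frozenset({"CU", "DU"}),
    "NG-C": frozenset({"CU-CP", "AMF"}),
}


def interfaces_of(component: str) -> List[str]:
    """Return the sorted names of all interfaces a component terminates."""
    return sorted(name for name, ends in INTERFACES.items() if component in ends)


print(interfaces_of("DU"))           # interfaces a DU vendor must support
print(interfaces_of("Near-RT RIC"))  # interfaces a RIC vendor must support
```

A table like this makes the integration scope per vendor explicit, which is exactly where multi-vendor Open RAN projects tend to need clarity.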
To address all interface and use case issues, the O-RAN Alliance is working in a number of streams.
WG1: Use Cases and Overall Architecture
WG2: The Non-real-time RAN Intelligent Controller and A1 Interface
WG3: The Near-real-time RIC and E2 Interface
WG4: The Open Fronthaul Interfaces
WG5: The Open F1/W1/E1/X2/Xn Interface
WG6: The Cloudification and Orchestration
WG7: The White-box Hardware Workgroup
WG8: Stack Reference Design
WG9: Open X-haul Transport
Standard Development Focus Group (SDFG): Strategizes standardization effort. Coordinates and liaises with other standard organizations.
Test & Integration Focus Group (TIFG): Defines test and integration specifications across workgroups.
Open Source Focus Group (OSFG): Successfully established O-RAN SC to bring developers into the Open RAN ecosystem
Early Playground of Open RAN
The change in the world economy driven by geopolitical factors, like the drive to replace Chinese vendors in networks such as in Australia for national security reasons, naturally builds momentum to find systems that are both less costly and high-performance. Naturally, such networks will be among the first where Open RAN gets introduced. It is true that there are still some gaps in Open RAN performance, mainly in baseband processing and fronthaul, but there are some use cases in which Open RAN has already proved successful, as shown below. The key point is that although there are some issues, with some use cases ready it is important to introduce Open RAN now and to evolve it in a pragmatic way so that it can coexist with traditional RAN solutions.
Private 5G Networks
Rural Deployment e.g 8T8R FDD
In-building solutions
TIP Open RAN Project Progress
TIP, a.k.a. Telecom Infra Project, is an open-source project looking after a number of disruptive solutions that can make 5G networks both cost efficient and innovative. Below are some highlights of our progress in the community up to 2021.
A1: Built the Reference architecture for Multi vendor validations
With the support of operators, vendors and partners, we built a complete reference architecture to test and validate the complete stack end to end, including SI integration
A2: Built the Reference architecture for Multi vendor validations
Worked to define the complete RFX requirements for Open RAN, which can be retrieved as below
A4: Success through RIC (RAN Intelligence controller)
There are two advantages of introducing the RIC in Open RAN architectures: first, RAN automation for both management and KPI optimization, and second, new and disruptive use cases through xApps and data driven operations, including
Smart troubleshooting of RAN
RAN parameter optimization using AI
New use cases by exposing APIs to third-party developers
A5: RAN configuration standardizations
Benchmarked a number of RAN configurations to be deployed in the field including
A6: Chip advancements brought prime time for Open RAN with 64T64R trial in 2021
“Customers rely on 5G massive multiple-input and multiple-output (MIMO) to increase capacity and throughput. With Intel’s newest Intel Xeon processors, Intel Ethernet 800 series adapters and Intel vRAN dedicated accelerators, customers can double massive MIMO throughput in a similar power envelope for a best-in-class 3x100MHz 64T64R vRAN configuration”
Challenges and Solutions
In the last two years we have come far in innovating and experimenting with many Open RAN solutions. The list of issues we solved is huge, so let’s focus only on the top challenges and how we are solving them today. As a 2021 target, by the time the O-RAN Alliance freezes the coming releases, Dawn (June 2021) and the E Release (Dec 2021), we as a community aim to have fixed the top issues below. I apologize for leaving out some less complex issues, such as how to deploy Open RAN and the status of the focus groups' key interface specifications, especially the momentum around 7-2. I feel I ran out of time, pages and energy, and I fear I would test your patience with a bigger tome
P#1: Ecosystem issues in Open RAN
Based on our field trial, we found that for 8T8R we can achieve around 40% cost reduction with Open RAN, and with future transport architectures like “Open ROADM” we can build the whole RAN network in an open manner and achieve great cost and efficiency advantages. However, when components come from different vendors, the non-technical factors needed to make it successful are really challenging, e.g.
How to convince all teams, including the RAN guys, that what we are doing is right :) and that we should get rid of the black boxes
How to get partners who are historically competitors to work together as one team
Building software integration teams
P#2: Radio Issues in Open RAN
Site swap scenarios in most brownfield environments require efficient antennas and radios that offer
2G + 3G + 4G + 5G support
Massive MIMO and beamforming support
It is a fact that until now most of the effort has gone into the DU/CU part, but now we need to pay more attention to solving the radio and antenna issues.
The lesson we learnt in 2020-2021 is that not everything can be solved by software, as every piece of software finally needs to run on hardware. An increased focus on radios, antennas and COTS hardware is a must to accelerate any software innovation
P#3: Improve RAN KPI
No disruptive Open RAN solution is acceptable unless it can deliver performance comparable to legacy systems in areas like coverage, speed and KPIs.
To make Open RAN mainstream, DT, tools and RAN benchmarking all need to be addressed, not only the cloud and automation part
P#4: SI and certification process
We already witnessed a number of SI forms and capabilities during NFV PSI. However, a disruptive solution like Open RAN needs a different approach, and the SI should possess the following
Complete vertical stack validation
It is not just the cloud or the hardware but the end-to-end working solution that is required
Stack should include Radios and Hardware
Certification should consider RF/radio and hardware validation
Software capability and automation
To make it successful, it is very important that the SI is rich in both tools and capabilities for automation, data and AI
P#5: Impacts of Telco PaaS and ONAP
To make Open RAN a real success, it is very important to consider it while building the capabilities and specifications of other reference architectures, chiefly Telco PaaS and ONAP. If I go on to explain this part, I fear the paper will become too long and may skew towards something that is not necessarily a RAN issue.
However, just to summarize, the ONAP community has been working closely with Open RAN to bring the reference architecture into an upcoming release
Finally, for Telco PaaS we are also working to include telemetry, packaging and test requirements for the Open RAN stack. Those who are interested in these details, kindly check my earlier paper below
Open RAN a necessity for 5G era
Early experience with 5G proved that it is about scale and agility, with cost factors driving operators towards an efficient delivery model that is agile, innovative and that can unleash the true potential of network through Data and AI.
In addition, as time passes and more and more use cases require integration of the RAN with third-party xApps, there will be a definite need to evolve to an architecture like Open RAN, which will not only support coexistence and integration with legacy systems but also support fast innovation and flexibility over time. With early successful deployments of Open RAN already done in APAC and the US, it is important for the whole industry to catch the momentum
Proponents of closed RAN systems often say that an open system can never match the performance of monolithic, ASIC-based systems. Similarly, they claim that the SI and integration challenge of stitching those systems together far outweighs any other advantage.
Recent advances in silicon, like the Intel FLEX architecture with ready libraries like OpenNESS and OpenVINO, have really made it possible to achieve almost the same performance as monolithic RAN systems.
In addition, the latest launch of Intel 3rd generation Xeon processors is proving to be a game changer in bringing COTS to the last-mile sites.
Above all, the involvement of SIs in the ecosystem means the industry is approaching a phase where all integration can be done through open APIs and in no time, making the true vision of a Level 4 autonomous network possible.
Since the release of the much-awaited 3GPP Release 16 in June last year, many vendors have brought their 5G SA (Standalone) products to market. With promises like support for slicing, massive IoT, uRLLC, improved edge capability, NPN and IAB backhauling, it is only natural that all the big Telcos in APAC and globally have already started their journey towards a 5G Standalone core. However, most commercial deployments are based on a single vendor's E2E stack. This is a good way to start the journey and offer services quickly, but given the type of services and the versatility of solutions, especially for industry verticals, that are required and expected from both 3GPP Release 16 and the SA core, it is just a matter of time before one vendor cannot fulfill all the solutions, and that is when building a Telco-grade cloud platform will become a necessity.
During the last two years we have made a lot of progress in better understanding what the cloud platforms for the 5G era will be. It is correct that, as of now, the 5G Core container platform is not fully ready from an open cloud perspective, but we are also not too far from making it happen. On the community side, Anuket Kali, which we are targeting for June, is expected to fill many gaps, and our release cycle for XGVELA will try to close many more. So, in a nutshell, 2021 is the year where we expect production-ready open cloud platforms that avoid all sorts of vendor lock-in.
Let’s try to understand the top issues, listed based on 5G SA deployments in Core and Edge.
Vendors are mostly leveraging the existing NFVI to evolve to CaaS by using a middle layer, shown as CaaS on IaaS. The biggest challenge is that this interface is not open, which means each vendor makes many out-of-box enhancements, and this is one classic case of “When open became the closed”
The main enhancements made on the adaptors for container images are as follows
Provides container orchestration, deployment, and scheduling capabilities.
Provides container Telco enhancement capabilities: Hugepage memory, shared memory, DPDK, CPU core binding, and isolation
Supports container network capabilities, SR-IOV+DPDK, and multiple network planes.
Supports the IP SAN storage capability of the VM container.
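To make the Telco enhancements in this list a bit more concrete, here is a minimal sketch that builds a Kubernetes Pod manifest as a plain Python dictionary. It is only an illustration under assumptions, not any vendor's adaptor: it assumes 1Gi hugepages are pre-allocated on the node, the kubelet CPU manager runs with the static policy (so integer CPU requests in a Guaranteed QoS Pod get exclusive cores), and an SR-IOV device plugin advertises a resource called intel.com/sriov_netdevice. The Pod name, image and network name are hypothetical.

```python
import json


def build_cnf_pod(name, image, cpus, hugepages_gi, memory_gi,
                  sriov_resource="intel.com/sriov_netdevice",  # assumed resource name
                  secondary_networks=("sriov-n3",)):           # hypothetical network name
    """Return a Pod manifest (dict) for a data-plane CNF.

    Requests equal limits, which gives the Pod Guaranteed QoS. With the
    kubelet static CPU manager policy, integer CPU counts are then bound
    to exclusive cores (CPU core binding and isolation).
    """
    resources = {
        "cpu": str(cpus),
        "memory": f"{memory_gi}Gi",
        "hugepages-1Gi": f"{hugepages_gi}Gi",          # hugepage memory for DPDK
        sriov_resource: str(len(secondary_networks)),  # one VF per extra plane
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "annotations": {
                # Multus annotation: attach additional network planes
                "k8s.v1.cni.cncf.io/networks": ",".join(secondary_networks),
            },
        },
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "resources": {"requests": dict(resources),
                              "limits": dict(resources)},
                "volumeMounts": [{"name": "hugepages",
                                  "mountPath": "/dev/hugepages"}],
            }],
            "volumes": [{"name": "hugepages",
                         "emptyDir": {"medium": "HugePages"}}],
        },
    }


pod = build_cnf_pod("upf-dp", "registry.example/upf:1.0",
                    cpus=4, hugepages_gi=2, memory_gi=4)
print(json.dumps(pod, indent=2))
```

The point of the sketch is that every one of these knobs is expressible in the open Kubernetes API, so in principle none of them needs a closed adaptor.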
The migration path from CaaS on IaaS towards BMCaaS is not smooth and will involve a complete service deployment. It is true that, with most operators having invested heavily over the last few years to productionize the NFVI, nobody is really considering emptying their pockets again to build a new, purely standalone CaaS platform; however, smooth migration must be considered
We are still in the early phase of the 5G SA core and eMBB is the only use case, so we have not yet tested the scaling of the 5G Core on NFVI-based platforms
The ETSI specs for CISM are not as mature as expected, and again each vendor makes a lot of out-of-box customizations in its VNFM to cater for this.
Now let’s come to the points where the open platforms are lacking and how we intend to fix them
Experience#1: 5G outgoing traffic from Pod
Traditional Kubernetes and CaaS platforms today handle and scale incoming traffic well with an ingress controller. However, outgoing traffic from 5G Pods and containers is not well addressed, as both N-S and E-W traffic follow the same path, and this finally becomes a scaling issue.
We know some vendors, like Ericsson, already bring products like ECFE and LB into their architecture to address these requirements.
Experience#2: Support for non-IP protocols
A Pod natively comes with IP, and all external communication is done through cluster IPs. This means the architecture is not designed for non-IP protocols like VLAN, L2TP and VLAN trunking
Experience#3: High performance workloads
Today all high data throughput is supported through CNI plugins that natively work like SR-IOV, which means total passthrough. An Operator framework to enhance real-time processing is required, something we have done with DPDK in the OpenStack world
Experience#4: Integration of 5G SBI interfaces
The newly defined SBI interfaces have become more like APIs compared to horizontal call flows; however, today all HTTP/2 API integration is based on “primary interfaces”.
This becomes a clear issue, as secondary interfaces between functional modules are not supported
Experience#5: Multihoming for SCTP and SIP is not supported
Hybrid node connectivity, at least towards egress and external networks, still requires an SCTP link and/or SIP endpoints, which are not well supported
Experience#6: Secondary interfaces for CNF’s
Secondary interfaces raise concerns for interoperability, monitoring and O&M. They are a very important concept in K8S and 5G CNFs, as they are needed for
All Telecom protocols, e.g. BGP
Support for Operator frameworks (CRDs)
Performance scenarios like CNIs for SR-IOV
Today the only viable solution is NSM, i.e. the service mesh, which solves both management and monitoring issues
Experience#7: Platform Networking Issues in 5G
Today in commercial networks most products use Multus + VLAN for internal networking, while those based on Multus + VXLAN require separate planning for both underlay and overlay, and that becomes an issue for a large-scale 5G SA Core network
Similarly, top requirements for service in 5G Networks are
Network separation on each logical interface e.g VRF and each physical sub interface
Outgoing traffic from Pod
NAT and reverse proxy
Experience#8: Service Networking Issues in 5G
For primary networks we rely on Calico + IPIP, while for secondary networks we rely on Multus
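As a small illustration of the Multus side of this setup, the sketch below builds a NetworkAttachmentDefinition for a VLAN-based secondary network, again as a plain Python dictionary. It assumes the reference CNI vlan plugin with host-local IPAM; the attachment name, host interface, VLAN ID and subnet are hypothetical values chosen only for the example.

```python
import json


def vlan_attachment(name, master_if, vlan_id, subnet):
    """Build a Multus NetworkAttachmentDefinition using the CNI 'vlan' plugin."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be in 1..4094")
    cni_config = {
        "cniVersion": "0.3.1",
        "type": "vlan",        # reference CNI vlan plugin (assumed to be installed)
        "master": master_if,   # host interface that carries the tagged traffic
        "vlanId": vlan_id,
        "ipam": {"type": "host-local", "subnet": subnet},
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name},
        # Multus expects the CNI configuration as a JSON string
        "spec": {"config": json.dumps(cni_config)},
    }


nad = vlan_attachment("n3-vlan100", "eth1", 100, "10.10.100.0/24")
print(json.dumps(nad, indent=2))
```

A Pod would then reference this attachment by name in its k8s.v1.cni.cncf.io/networks annotation, while its primary interface stays on the cluster network.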
Experience#9: ETSI specs specially for BM CaaS
I still believe the ETSI specs for CNFs are lacking compared to others like 3GPP, and that is enough to make an open solution turn into a closed one through adaptors and plugins, something we already experienced during SDN introduction in cloud networks. Today rigorous updates are expected on
IFA038 which is container integration in MANO
IFA011 which is VNFD with container support
Sol-3 specs updated for the CIR (Container image registry) support
Experience#10: Duplication of features on NEF/NRF and Cloud platforms
In the new 5G API ecosystem, operators look at their network as a platform and open it to application developers. API exposure is fundamental to 5G, as it is built natively into the architecture: applications can talk back to the network and command it to provide a better experience. However, exposure (NEF) and, similarly, service registry (NRF) functions are also available on cloud platforms. Today it looks like a way is required to share responsibility for such integrations to avoid duplication
Reference Architectures for the Standard 5G Platform and Capabilities
Cap#1: Solving Data Integration issues
Real AI is the next most important thing for Telcos as they evolve in their automation journey from conditional automation to partial autonomy. However, making any use case fully functional will first require solving the data integration architecture, as any real product that is to be successful with AI in Telco will need to use graph databases and process mining, and both are based on the assumption that all the data is there and valid.
Cap#2: AI profiles for processing in Cloud Infra Hardware profiles
With 5G networks relying more on robust mechanisms to ingest and use data for AI, it is very important to agree on hardware profiles that are powerful enough to deliver AI use cases and complete AI pipelines, all the way from flash-based storage to TensorFlow, along with analytics.
Cap#3: OSS evolution that support data integration pipeline
To evolve towards the future ENI architecture for the use of AI in Telco, and the ZSM architecture for closed loop, both need to be based on a standard data integration pipeline like the one proposed in ENI-0017 (Data Integration mechanisms)
Cap#4: Network characteristics
A mature way to handle outgoing traffic and LB needs to be included in Telco PaaS
Cap#5: Telco PaaS
Based on experience with NFV, it is clear that IaaS is not the Telco service delivery model, and hence use cases like NFV PaaS have been under consideration since the early days of NFV. With the introduction of CNFs, which require more robust release cycles, it is imperative and not optional to build a stable Telco PaaS that meets Telco requirements. As of today, the direction is to split the platform: general PaaS capabilities will become part of the standard cloud platform over release iterations, while Telco-specific requirements will be part of Telco PaaS.
The beauty of this architecture is that it ensures multi-vendor component selection between them. The key characteristics to be addressed are
Paas#1: Telco PaaS Tools
Agreement on PaaS tools over the complete LCM. There is currently a survey running in the community to agree on this, and it is an ongoing study
During recent integrations for NFV and CNF we still rely on application-layer LI characteristics as defined by ETSI. With the open cloud layer ensuring that the necessary LI requirements are available, it is important that PaaS includes this part through APIs
Paas#3: Telco PaaS Charging Characteristics
Resource consumption and real-time reporting of resources is very important, as with 5G and Edge we will evolve towards the hybrid cloud
Paas#4: Telco PaaS Topology management and service discovery
A single API endpoint to expose both the topology and services towards applications is the key requirement of Telco PaaS
Paas#5: Telco PaaS Security Hardening
With 5G and critical services, security hardening has become more and more important; the use of tools like Falco and service mesh is important in this platform
Paas#6: Telco PaaS Tracing and Logging
Although monitoring is quite mature in Kubernetes and its distros, tracing and logging still need to be addressed. Today tools like Jaeger and Kafka/EFK need to be included in the Telco PaaS
Paas#7: Telco PaaS E2E DevOps
For IT workloads, DevOps capability is already provided by PaaS in a mature manner through both cloud and application tools, but with the enhancements required by Telco workloads it is important that end-to-end DevOps capability is ensured. Today tools like Argo need to be considered, and they need to be integrated with both the general PaaS and Telco PaaS
Standard packages like VNFD, which cover both the application and PaaS layers
Paas#8: Standardization of APIs
API standardization in ETSI fashion is a key requirement of the NFV and Telco journey, and it needs to be ensured in the PaaS layer as well. For Telco PaaS it should cover VES, TM Forum, 3GPP, ETSI MANO etc. The community has done the following work to standardize this
3GPP TS28.532 /531/ 541
IFA029 containers in NFV
ETSI FEAT17 which is Telco DevOps
ETSI TST10 /13 for API testing and verification
Based on these features there is an ongoing effort within the LFN XGVELA community, and I hope more and more users, partners and vendors can join to define the future open 5G platform
Network slicing is a great concept that has always been attractive jargon for vendors who wish to bundle it with their products and solutions to sell them. However, with the arrival of 3GPP Release 16 and subsequent products arriving in the market, things are starting to change. With so many solutions and requirements, finding a novel slicing architecture that fits all is both technically complex and, business-wise, does not make a lot of ROI sense. Today we will try to analyze the latest progress and directions to solve this dilemma
Slicing top challenges
Based on our recent work in GSMA and 3GPP, we believe the following are the top questions for both evolving and proliferating slicing solutions
Can a public slicing solution fulfill vertical industry requirements?
How to satisfy vertical industries that a slicing solution can fulfil their needs, like data sovereignty, SLA, security and performance
Automation and intelligence: can a public slicing solution be flexible enough to provide all the intelligence each industry needs?
Slicing for cases of 5G infra sharing
Solution baseline principles
When we view slicing, or any tenant provisioning solution, it is very important that E2E all layers, including business fulfillment, network abstraction and infrastructure (including wireless), adhere to the same set of principles.
A nice description can be found in 3GPP TS28.553 on management and orchestration for network slicing and 3GPP TS28.554 on KPIs for 5G solutions and slicing. In summary, once we take the systems view of network slicing, the principles can be summarized as follows
Slice Demarcation: A way to isolate each tenant, and the possibility to offer different features of the slicing bundle to different tenants. For example, a large enterprise may get 10 features and 20 SLAs, while for a small business 5 features and 5 SLAs will do
Performance: A way to build a highly performant system. The postulate is: once we engineer and orchestrate it, will it work E2E?
Observability: With 4B+ devices added every year, and with the industry setting a futuristic target of a million private networks by 2025, how to observe and handle such networks in real time is a pressing issue
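The slice demarcation principle can be sketched as a simple catalogue check. The snippet below is only an illustration: the tier names, the feature and SLA identifiers and the validate_order helper are all hypothetical, and the 10/20 versus 5/5 split simply mirrors the example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceBundle:
    tier: str
    features: frozenset
    slas: frozenset


# Hypothetical catalogue: 10 features and 20 SLAs in total
ALL_FEATURES = [f"feature-{i}" for i in range(1, 11)]
ALL_SLAS = [f"sla-{i}" for i in range(1, 21)]

BUNDLES = {
    "large-enterprise": SliceBundle("large-enterprise",
                                    frozenset(ALL_FEATURES),
                                    frozenset(ALL_SLAS)),
    "small-business": SliceBundle("small-business",
                                  frozenset(ALL_FEATURES[:5]),
                                  frozenset(ALL_SLAS[:5])),
}


def validate_order(tier, requested_features, requested_slas):
    """Return the requested items that fall outside the tenant's bundle."""
    bundle = BUNDLES[tier]
    return {
        "features": sorted(set(requested_features) - bundle.features),
        "slas": sorted(set(requested_slas) - bundle.slas),
    }


# A small business asking for a feature outside its bundle
print(validate_order("small-business", ["feature-2", "feature-8"], ["sla-1"]))
```

In a real system the same demarcation would have to be enforced consistently in every layer, from business fulfillment down to the RAN, which is exactly the point of the principle.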
I think when we talk about slicing we mostly speak about key technology enablers like NFV, Cloud, MEC and SDN, which is obviously great, since softwarization of the network and infra is vital. However, not speaking about RAN, wireless and WNV (Wireless Network Virtualization) is not fair. In this paper I just want to shed some light from the RAN perspective. Considering the fact that still today around 65% of customers' CAPEX/OPEX is pumped into RAN and transport, it is vital to take this view for a solution that is both conformant and realistic. If NFV/SDN/Cloud demands sharing among a few hundred tenants, the RAN demands sharing among millions, so resource sharing, utilization and optimization are vital
From an E2E perspective, the RAN part of a slice is selected based on GST and NSSAI, which is done by the UE or the Core Network. However, this is easier said than done: when we view E2E slicing, the following should be considered to build a scalable slicing solution
RAN#1: Spectrum and Power resources
The massive business requirements for services and slices demand highly efficient radio resources. Luckily, low, mid and high bands combined with Massive MIMO are handling this part; however, it is not just about spectrum: how to utilize it efficiently in terms of form factor and power is vital.
When we take the RAN view of slicing, it is not just the spectrum itself or the RF signal, but also spectrum management across Macro, Femto and HetNets, including open cellular. In summary, this is a part we still do not understand well, as it requires novel algorithms like MINLP (mixed-integer nonlinear programming), which focus on optimizing cost while increasing resource usage at the same time. As per the latest trend, a tiered RAN architecture combined with new algorithms like game matching through ML/AI is the answer to standardize this
RAN#2: RAN Dis-aggregation
Just as NFV/SDN and orchestration did for the Core, Open RAN and RIC (RAN Intelligent Controller) will do for the RAN. If you want to know more, you may check the author’s writeup on RAN evolution
RAN#3: RAN resource optimization
Based on our field trials, we find that the use of Edge and MEC with RAN, especially for CDN, will save around 25% of resources. RAN caching is vital and, combined with LBO (Local Break Out), will help Telcos fulfill the very pressing requirements from verticals. Again, this is not just a cloudlet and software issue, as different RAN architectures such as D2D RAN, HetNet and macro require different approaches
RAN#4: Mid-haul optimization
Mid-haul and backhaul capacity optimization is vital for slicing delivery, and today this domain is still in the R&D funnel. A TIP project, CANDI (Converged Architectures for Network Disaggregation & Integration), is evolving to understand this requirement
RAN#5: Edge cost model
An edge solution for slicing in the context of RAN is a cost model problem: how many MEC servers, at which locations, and how much they can relieve RAN and RF layer processing are the key questions. Our latest work on Telco Edge Cloud, with different models for different site configurations, is the answer
RAN#6: Isolation, elasticity and resource limitation
This is the most important issue for RAN slicing, primarily because these are conflicting dimensions: too much resource isolation may make it impossible to share resources and will limit services during peak and critical times, while too much elasticity will make isolation and separation practically impossible. Matching algorithms are the answer, as they will help to build a RAN system that is not only less complex but also highly conformant. This is make or break for a RAN architecture for slicing
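To make the isolation versus elasticity trade-off concrete, here is a deliberately simple sketch. It is not the matching-algorithm approach mentioned above, only a greedy scheme I am assuming for illustration: each slice gets a guaranteed floor of radio resource blocks (isolation), and whatever is left in the cell is lent out to the slices with the largest unmet demand, up to a per-slice cap (elasticity). The slice names and numbers are hypothetical.

```python
def allocate_prbs(total_prbs, slices):
    """Split a cell's resource blocks between slices.

    slices maps a slice name to a dict with:
      guaranteed: floor the slice can always claim (isolation)
      cap:        upper bound even when the cell is idle (assumed >= guaranteed)
      demand:     blocks the slice wants in this scheduling interval
    """
    if sum(s["guaranteed"] for s in slices.values()) > total_prbs:
        raise ValueError("guaranteed shares exceed cell capacity")

    # Step 1 (isolation): serve each slice up to its guaranteed floor
    alloc = {name: min(s["demand"], s["guaranteed"])
             for name, s in slices.items()}
    spare = total_prbs - sum(alloc.values())

    # Step 2 (elasticity): hand out spare blocks one at a time to the
    # slice with the largest unmet demand, never beyond its cap
    while spare > 0:
        unmet = {name: min(s["demand"], s["cap"]) - alloc[name]
                 for name, s in slices.items()}
        name = max(unmet, key=unmet.get)
        if unmet[name] <= 0:
            break  # every slice is satisfied or capped
        alloc[name] += 1
        spare -= 1
    return alloc


result = allocate_prbs(100, {
    "embb":  {"guaranteed": 40, "cap": 90, "demand": 80},
    "urllc": {"guaranteed": 30, "cap": 40, "demand": 10},
    "miot":  {"guaranteed": 10, "cap": 20, "demand": 25},
})
print(result)
```

Raising the guaranteed floors pushes the scheme towards hard isolation and leaves less to share at peak times, while raising the caps pushes it towards elasticity, which is exactly the tension described above.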
RAN#7: RAN infrastructure sharing for 5G
Infra sharing has already started between big players in Europe. One question that arises is what happens if a user purchases a slice and service from a tenant. Consider a wholesale view where the infra is built by sharing and bundling resources from all national carriers, for the obvious reason that the 5G infra of a single operator is not sufficient from either a coverage or a capacity perspective
RAN#8: RAN Resource RAGF problem
In case of service mobility or congestion, how can the UE quickly access resources, maybe in another sector or site?
RAN#9: Slice SLA
The SLA of slices and its real-time monitoring is a key business requirement. However, imagine a situation where a shortage in the shared resource pool makes it impossible to deliver the SLA
RAN#10: Slice operations
Slice operations are not just the view of BSS and Operations, as real-time RAN resource usage and optimization are necessary. Have you ever thought about how a perfectly managed slice can coexist with normal Telco services, especially when there is a key event and many users will use the service? I think this is a dimension that is still not well addressed. I have no hesitation in saying that when many enterprise CXOs convince themselves they should opt to build their own 5G private network, this is exactly the problem they fear.
In today’s writeup I have tried to explain the current progress, challenges and steps to build a successful slicing solution while wearing the hat of a RAN architect. I believe it is very important to see the radio viewpoint, which I firmly believe has not gotten its due respect and attention from either standards bodies or vendors. In my coming blog I shall summarize some key gaps and how we can approach them, as slicing products and solutions are still not carrier grade and need further tuning to ensure E2E slicing and service fulfillment.