Integrating AI and data to monetize 5G Cloud Platforms

As of Q1 2022, 5G adoption has just passed the 600M mark and is expected to hit 2.5B connections by 2025, which is almost 500M new connections every year. If we add the IoT and device ecosystem, the scale becomes enormous. The biggest advantage of this once-in-a-lifetime opportunity, and also its biggest challenge, is “scale”.

Put simply, automation using ML/AI is a must to meet network SLAs, improve efficiency and optimize network TCO.

AI has the potential to create value through enhanced workload availability and improved performance and efficiency for 5G and Telco Cloud. However, the biggest problem when it comes to using AI and machine learning in Telcos is “data” and “data models”, simply because there is no standardization or model definition for how Telco systems, including infrastructure, expose information to upper layers. Since data sets in this domain are huge, with n permutations, the first step is use-case-driven normalization of data so that it can be consumed by both the network and data science domains. This will enable a future Telco that can detect problems and heal itself.
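
To make the idea of normalization concrete, here is a minimal Python sketch of mapping records from two layers into one common shape. The layer names, the source field names and the `NormalizedMetric` schema are all hypothetical and chosen only for illustration; no standard body defines them in this form.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class NormalizedMetric:
    """Common record shape shared by network and data science consumers (hypothetical)."""
    layer: str        # e.g. "baremetal" or "cloud"
    resource: str     # host, VM or container identifier
    metric: str       # metric name
    value: float
    timestamp: float  # seconds since epoch


# Hypothetical per-layer mappings: source field name -> common field name.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "baremetal": {"host": "resource", "sensor": "metric", "reading": "value", "ts": "timestamp"},
    "cloud": {"vm_id": "resource", "kpi": "metric", "val": "value", "time": "timestamp"},
}


def normalize(layer: str, record: Dict[str, Any]) -> NormalizedMetric:
    """Translate one layer-specific record into the common schema."""
    mapping = FIELD_MAPS[layer]  # raises KeyError for an unknown layer
    common = {target: record[source] for source, target in mapping.items()}
    return NormalizedMetric(
        layer=layer,
        resource=str(common["resource"]),
        metric=str(common["metric"]),
        value=float(common["value"]),
        timestamp=float(common["timestamp"]),
    )
```

The point of the sketch is that each new use case only adds a mapping, while everything downstream keeps consuming the same record type.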

Understanding Data Integration Architecture

Considering that the 5G architecture is based on open APIs and a horizontal services design, a.k.a. SBA, data integration and the use of AI should be an easy problem that can be divided into the following parts to define a pipeline:

  1. Telemetry and data
  2. Data exposure of each layer as an API, starting from bare metal and extending upwards to Cloud, SDN, NFVO, Assurance, etc.
  3. Data models and engines to disseminate information

However, it is easier said than done for many reasons, including:

  1. What the key data sets will be
  2. How the FCAPS of each layer can be disaggregated, i.e. dropping one layer's data without confirming dependencies is fatal

Business Architecture

To address this, we need to learn from other industries and SDOs and see how their experience can be agreed on and integrated into Telco networks. This leads us to a use-case-driven approach: select those domains and business challenges that can deliver quick results.

"Follow the money" to deliver use cases that can monetize 5G

We have analyzed a lot of use cases from both academia and industry and compiled a complete list here

From this we infer that there are just too many ways in which Telcos are solving the same problems, which makes it clear that “data models” and use cases should be clearly defined as a first step.

The most important of these are:

  1. Using Machine Learning to Detect Noisy Neighbors in 5G Networks
  2. Towards Black-Box Anomaly Detection in Virtual Network Functions
  3. Causality Inference for Failure in NFV
  4. Self Adaptive Deep Learning Based System for Anomaly Detection in 5G
  5. Correlating multiple Events and Data in an Ethernet Network

This leads us to define the following as first steps for AI and intelligence as applied to Telcos:

Source: LFN acumos

  • Analysis: Data lakes, log analysis and correlation
  • Detection: Anomaly detection including pattern detection, trend and multi-layer correlation
  • Prediction: Intelligent prediction including capacity, SLA, scaling and Cloud KPIs
  • Generation: Measure data and synthesize it using frameworks like eBPF
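
As a small illustration of the detection step, the sketch below flags samples that deviate strongly from a trailing window of history. It is a simple rolling z-score, assumed here purely for illustration; it is not the method of any of the papers or projects named above, and the window size and threshold are arbitrary defaults.

```python
from statistics import mean, pstdev
from typing import List


def zscore_anomalies(samples: List[float], window: int = 5, threshold: float = 3.0) -> List[int]:
    """Return indexes whose value lies more than `threshold` standard
    deviations away from the mean of the preceding `window` samples."""
    anomalies: List[int] = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat history: any change at all is treated as an anomaly.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

For a KPI series such as `[10, 11, 10, 12, 11, 10, 50, 11]`, only the spike at index 6 is flagged. Multi-layer correlation would then look for such flags appearing close together in time across layers.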

Data Monetization is first to make 5G Profitable

Addressing both the data architecture and the business architecture is vital, because different Telcos, and in some cases different BUs within the same customer, approach it differently. What makes it worse is that data lakes are manipulated and stored in different forms, i.e. infrastructure metrics, agents and databases, which makes it hard to apply the same approach across data sets. This is the biggest obstacle to monetizing one of the key assets of 5G, which is “data”, and hence to defining a pipeline that can be shared between all tenants, including vertical industries.

The latest White Paper: Intelligent Networking, AI and Machine Learning

Next Steps of “Thoth”

As said before, we are defining a few key use cases in the LFN project “Thoth” to learn and elaborate from there, applying the concepts of events, anomalies and prediction across layers. The first-phase use cases are:

  1. VM failure
  2. Container Failure
  3. Node Failure
  4. Link Failure
  5. Middle layer Link failure

The detailed list can be seen here: Use cases

Improving Telco’s Enterprise reach through Slicing and RAN control

Growing business in the 5G era largely depends on ecosystem enablement and on the idea that Telcos can build a future infrastructure that delivers business outcomes not only for traditional Telco customers but also for broader verticals, be it manufacturing, mining, retail, finance, public safety, tourism or eventually anything.

This means that “programmable” and “automated” infrastructure is the basis for achieving any such business outcome. Applying this to the Telcos' 5G and transformation journey means both “Network Slicing” and “Private Networks”. I totally agree that both will coexist and proliferate, but to be fair, although network slicing has delivered outcomes in labs and demos, it still faces a number of challenges when applied in practice.

Recently I have heard many views from reputable names, including my friends Dean Bubley and Karim, so I thought I would share my views on this topic, highlighting some work we have been doing with our partners and customers in APAC as well as in the GSMA, and alluding to some improvements we have achieved in the last year since I shared my views with the industry on network slicing and its delivery.

Today network slicing is live at a number of customers, including Singtel, which has achieved substantial outcomes with slicing. However, the bigger challenges still remain unanswered:

How will network slicing address RAN resources?

How can network slicing help to monetize the low-hanging fruit of Edge together with Telco domain slicing?

In 3GPP Release 17 we have been making some exciting progress on the latter, with a new architecture and API exposure for co-deployment of Edge with RAN. But prior to this we need to extend slicing towards the access networks from both a technology and a business architecture perspective, and this is what I will share in this paper.

Experience Learnt from 5G Networks rollout

As many Telcos accelerated their 5G rollout from 2021 onwards and built 5G SA core networks, several limitations became clearer than before:

  • RAN resources are always scarce
  • AI needs to be enabled to intelligently modify slicing in real time
  • Spectrum and RAN layers will be a top pressing item for Telcos to deliver value
  • RAN resource isolation must follow a performance-to-cost baseline
  • Handling resources at peak time, and pre-empting some over others, is vital
  • Regulatory and GDPR compliance is vital to achieve anything big in this domain
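
The peak-time point in the list above can be sketched as a strict-priority allocation of scarce RAN resources. This is only a minimal illustration under my own assumptions: the slice names, the priority values and the use of PRBs as the unit are hypothetical, and a real scheduler would also honor guaranteed minimums and fairness.

```python
from typing import Dict, List, NamedTuple


class SliceDemand(NamedTuple):
    name: str
    priority: int     # lower number = more important (assumed convention)
    demand_prbs: int  # requested physical resource blocks


def allocate_prbs(demands: List[SliceDemand], total_prbs: int) -> Dict[str, int]:
    """Serve slices in priority order; lower-priority slices are pre-empted
    (receive less than they asked for) once the cell runs out of PRBs."""
    allocation: Dict[str, int] = {}
    remaining = total_prbs
    for demand in sorted(demands, key=lambda d: d.priority):
        granted = min(demand.demand_prbs, remaining)
        allocation[demand.name] = granted
        remaining -= granted
    return allocation
```

With 100 PRBs and demands of 40 (priority 1), 50 (priority 2) and 80 (priority 3), the lowest-priority slice is squeezed down to 10 PRBs while the other two are served in full.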

Orchestration must precede Network Slicing

From the above experience we can infer that it is really not about network slicing itself but rather about “open”, “control” and “API” capabilities that enable end-to-end network slicing LCM and orchestration all the way from UE to RAN to Edge to Core to Cloud.

  • Dynamic control of resources with Telco-level visibility is key
  • RAN automation is the first step and must be done before slicing changes RAN resources
  • A cloud operations model is vital to support network slicing because, although there are many business verticals, Telcos have to build an efficient, multi-tenant operational model to win them

Cloud Operations Model that is secure and Multi tenant must be enabled across all Telecom Infrastructure

RAN SLA’s for vertical industry

The notion of network slicing still lies in selling an SLA versus selling a network.

First of all, RAN resources are always limited. Secondly, each vertical industry has its own traffic profile and trajectory, which can never be planned using old Telecom simulation tools; this means dynamic learning and resource adjustment are key. All of this points to the fact that changing the network while keeping network KQIs intact requires full visibility and programmable control.

It leads us to consider the following architecture first, before slicing is fully enabled for the RAN:

  • Slice LCM must be supported by automated infrastructure that is elastic and Telco grade at the same time
  • Scale-out architecture must be enabled in the RAN
  • RF and spectrum are the most expensive and intricate resources to schedule for services, and we must enable their dynamic control

Intelligent Networking ML/AI must be enabled first

Automation can deliver a myriad of outcomes, including better control, real-time changes, optimization, compliance and FCAPS for each tenant. However, it is not sufficient.

Intent-driven networks that use the power of data, ML and AI to orchestrate and adapt the network are a capability that should be enabled at network scale before network slicing can deliver a business outcome.

Components of a RAN Slice

Although the core slicing capability still sits in the OSS and SMO layers, which are outside the RAN, the real power of slicing will come as we address the real-time (RT) capability of RAN slicing, which enables us to deliver the following for a business tenant:

  • RRM
  • Connection management
  • MM
  • Spectrum layers

All of this must be available for packaging as an NSSF functional instance.

Partnerships and Ecosystem

According to the latest GSMA report, enabling one use case for any vertical requires at least 7 players to work together, so RAN slicing, or in other words slicing business outcomes, is not a matter for one body or business to solve. Today the following key organizations are collaborating to address those challenges and incrementally deliver the business outcomes:

  • ETSI NFV
  • GSMA
  • MEF
  • IETF
  • O-RAN and TIP
  • ONAP
  • 5GAA , ZVEI etc

We are also taking an aggregation approach, in which we combine the knowledge from these bodies and deliver it as an outcome for our customers. You can reach out for more details.

How APAC Telco’s are applying Data, Cloud and Automation to define Future Mode of Operations

As CSPs continue to evolve into digital players, they are facing new challenges: size and traffic requirements are increasing at an exponential pace, and networks that previously served only telco workloads are now required to be open to a range of business, industrial and services verticals. These factors require CSPs to revamp their operations model into one that is digital, automated, efficient and above all services driven. Similarly, future operations should support innovation rather than rely on offerings from existing vendor operations models, tools and capabilities.

As CSPs will need to operate and manage both legacy and new digital platforms during the migration phase, it is also imperative that operations have a clear transition strategy and processes that can meet both PNF and VNF service requirements, with optimum synergy where possible.

Based on work done by our team with our customers, specifically in APAC, the future network should address the following challenges for its operations transformation.

  • Fault Management: Fault management in the digital era is more complex as there is no dedicated infrastructure for the applications. The question therefore arises how to demarcate the fault and correlate cross-layer faults to improve O&M troubleshooting of ICT services.
  • Service Assurance: The future operations model needs to be digital in nature with minimum manual intervention, fully aligned with ZTM (Zero Touch Management) and data driven, using the principles of closed-loop feedback control.
  • Competency: To match the operational requirements of future digital networks, the skills of engineers and designers will play a pivotal role in defining and evolving to future networks. The new roles will require network engineers to be technologists rather than technicians.
  • IT Technology: IT technology, including skills in data centers, cloud and shared resources, will be vital to operate the network. Operations teams need to understand the impact of scaling, elasticity and network healing on operational services.
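
The closed-loop feedback principle mentioned under Service Assurance can be sketched in a few lines: observe a KPI, compare it with the target, act, and repeat. The proportional scaling rule, the function names and the replica limits below are my own assumptions for illustration, not part of any ZTM specification.

```python
import math
from typing import List


def desired_replicas(current_replicas: int, observed_load: float, target_load: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """One control step: scale so that the average load per replica moves
    toward the target, clamped to the allowed replica range."""
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    wanted = math.ceil(current_replicas * observed_load / target_load)
    return max(min_replicas, min(max_replicas, wanted))


def run_loop(observed_loads: List[float], target_load: float, replicas: int = 1) -> List[int]:
    """Feed successive per-replica load observations through the controller
    and return the replica count chosen after each observation."""
    history: List[int] = []
    for load in observed_loads:
        replicas = desired_replicas(replicas, load, target_load)
        history.append(replicas)
    return history
```

Here `observed_load` stands for the average load per replica. The same observe, compare and act shape applies whether the action is scaling, healing or re-routing; only the measured KPI and the actuator change.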

Apply ODA (Open Digital Architecture) for Future Operations Framework (FMO)

TM Forum ODA, a.k.a. Open Digital Architecture, is a perfect place to start. But since it is just an architecture and can lead to different implementations and application architectures, below I will try to share how it is being applied in real brownfield networks. I will cover all modules except AI and intelligent management, which I shall discuss in a separate paper.

Lack of automation in legacy telco networks is an important pain point that needs to be addressed promptly in future networks. Automation will not only enable CSPs to avoid the toil of repetitive tasks but also allow them to reduce the risk of human error.

To address the challenges highlighted above, it is vital to develop an agile operations model that improves customer experience, optimizes CAPEX, and enables AI operations and business process transformation.

Such a strategic vision will be built on an agile operations model that can fulfill the following:

  • Efficiency and Intelligent Operation: Telecom efficiency is based on data-driven architectures that use AI to provide actionable information, automation and self-healing network capability, evolving through the following stages:
    •  Task Automation & Established foundation
    •  Proactive Operation & Advanced Operation
    •  Machine managed & intelligent Operation.
  • Service Assurance: Building a service assurance framework to achieve automated Network Surveillance, Service Quality Monitoring, Fault Management, Preventive Management and Performance Management, ensuring closed-loop feedback control for delivery of zero-touch incident handling.
  • Operations Support: Building a support framework to achieve automated operation acceptance, change & configuration management.

Phased approach to achieve Operational transformation

Based on the field experience we gained with our partners and customers through Telecom transformation, we can summarize the learnings as follows:

  • People transformation: Transforming teams and the workforce to match the DevOps concept, streamlining the organization and hence delivering services in an agile and efficient manner. This is vital because 5G, Cloud and DevOps are a journey of experience, not a deployment of solutions; start quickly to embark on the digital journey.
  • Business Process transformation: Working together with partners on unification, simplification and digitization of end-to-end processes. The new processes will enable Telcos to quickly adapt the network to offer new products and to reduce troubleshooting time.
  • Infrastructure transformation: Running services on digital platforms and cloud, matching a clear vision to swap the legacy infrastructure.

If PNF to VNF/CNF migration is vital, then hybrid network management is critical

  • Automation and tools: Operations automation using tools like workforce management, ticket management, etc. is vital but does not by itself support the vision of full automation. Migrating services to the cloud will enable automated delivery of services across the whole life cycle. Programming teams should join operations to start a journey where the network can be managed through the power of software, using languages like Python, Java and Go together with YANG models. It will also enable test automation, which allows operations teams to validate any change before applying it to the live network.

Having said this, I hope it serves as a high-level guide for architects addressing operational transformation. As we can see, AI and intelligent management is a vital piece of it, and I shall write on this soon.

Building Smart and Green Telecom Infrastructure using AI and Data

source: Total Telecom green summit

During the last year the industry has witnessed Telcos' increased spend and maturity in Cloud and Automation platforms. The pandemic proved that Digital and Cloud are the answer our customers require to design, build and operate future Telecom networks.

The second key pillar pushing the Telecom industry towards autonomous networks is delivering business outcomes while doing business responsibly.

Getting business outcomes and running a sustainable business that supports a green vision have been unrelated discussions in the Telecom industry.

But now the infusion of Data and Cloud is really enabling it. It is expected that we as an industry can cut power emissions by at least 50% in the coming decade, but how will that become possible? According to Pareto's law, the last 20% will be the most difficult.

This is where my team's main focus has been: building robust AI and automation use cases that are intelligent enough to solve broader issues. Today the biggest focus areas for ML/AI that can really put Telcos in the lead on such outcomes are:

  1. Smart Capacity management
  2. O&M of networks that reduces emissions and improves availability
  3. Service assurance based on data

The biggest Challenge in Transformation is Fragmentation

The biggest bottleneck in achieving such outcomes is related to data. Intricately, “data” is both the problem and the solution, because there are so many sources of truth and different ingestion mechanisms. Do check the details on Dell Streaming Data Platform and how we are solving this problem:

https://www.delltechnologies.com/en-au/storage/streaming-data-platform.htm

Today, under the umbrella of Anuket, 3GPP, TMF and ITU are all collaborating to come up with a validated and composite solution to deliver those use cases. In a nutshell, it is vital to build a holistic and unified view to deliver data-driven AI use cases.

Scope and Scale of Intelligent automation

The biggest bottleneck comes from the fact that in the real world Telco apps can never be fully cloud native. At some level the state, resiliency and application requirements have to be kept, and intelligent, workload-driven decisions have to be made. The decade-long journey of Telecom transformation has revealed that just building everything as code and expecting it to work, so that Telcos can roll back their NOC sizes, simply does not work.

This is where intelligence from layers above orchestration and SDN will help, as Google does in the Internet era.

The second biggest issue lies in scaling Telco solutions themselves. It is proven that Telcos face unique challenges as they move from hundreds to thousands of nodes. So running AI for heterogeneous environments, each coming with different outcomes, can never deliver the power and scale Telcos need in the new era.

Telco grade AIOPS models

It is true that with 5G and business digital transformation the industry really wants to build an improved user experience and a unified model to expand its portfolio towards vertical markets as well. This is only possible with a coordinated system, workflow management, and data sharing and exposure with strict TSR security measures. Similarly, this model should cover the full LCM, including the FCAPS model.

Building Intelligent Telco’s

Although using AI and ML is an exciting ambition for a Telco, the bottom line is how to build these platforms on top of NFVI and existing orchestration and automation frameworks. In other words, the real business case for building an intelligent network starts with using data and ML to automate the entire network. In this respect the scope can extend beyond the service domain to business domains, i.e. automating business processes, including event correlation, anomaly detection and RCA.

Building a Unified AI Platform

Although this target is clear, in the context of networks it is complex, as we need to solve the challenges of data security and regulation, as well as what it really means to certify an AI platform. This layer should be allowed to be built from solutions from many vendors, so loose coupling, with more focus on network services and AI algorithms, is key to building this platform.

Instead of focusing on network element certification, the focus of an AI platform is service-level compatibility, data models and AI algorithms.

However, the lack of a unified standard, especially on trusted data normalization, sharing and exposure, is forcing operators to adopt bespoke solutions to build AI platforms, and that itself is a big impediment to wide-scale adoption of AI and ML in networks.

To move forward, closer collaboration between different standards bodies and governance by a more Telco-centric organization like TMF is the answer, with immediate focus on data standardization, labs integration, and enabling shared data sets and algorithms to evolve and support wide deployment of ML and AI in Telecom networks.

Latest Industry progress and standardization

Although these are early days for AI platform standardization, we still need to aggregate the progress across different bodies; otherwise we can only expect a plethora of siloed solutions, each with a different specification.

  • ONAP, as the baseline automation platform, has components like DCAE and an AI engine, which makes it sensible as the de facto baseline standard
  • Anuket is the Cloud Infrastructure reference and has recently launched a new project, “Thoth”, to look into AI network standardization
  • ETSI ZSM is an E2E automation platform across the full LCM of a Telecom network and certainly an important direction
  • ETSI ENI, or Experiential Networked Intelligence, is another body that closely defines AI specifications in the context of Telecom
  • TMF, as a broader Telecom body, is defining architectures including ODA and AIOps that break down how a Telco can take a phased approach to build these platforms

Above all, early involvement and support from Telecom operators and partners is very important to realize this goal. I hope that this year we will see more success and standardization on these initiatives, so let's work together and stay tuned.

Telco Cloud and Application characteristics for 5G and Edge Native World

@ETSI

The use of cloud computing in the Telecom industry has grown steadily during the last decade. It has gone through many shapes and architectures since the first phase of NFV, which started back in 2012. In today's data-hungry world there is increasing demand to move Cloud architectures from central clouds to loosely coupled distributed clouds, both from a cost perspective, by slashing the transport cost of anchoring all user traffic back to central data centers, and from a security perspective, where major customers prefer to keep data on premises. Similarly, with the hyperscalers and public cloud providers targeting the Telco industry, it is evident that the future Cloud will be fully distributed and multi-cloud, constituted of many on-premises and public cloud offerings.

5G is by design based on Cloud concepts like:

  • Service based architectures
  • Microservices
  • Scalability
  • Automation

Hence many operators are embarking on a journey to build open and scalable 5G clouds that are capable of handling future business requirements from both Telco and industry verticals. The purpose of this paper is to highlight the key characteristics of such clouds and how we must collaborate with a rich ecosystem to make 5G a success and achieve Industry 4.0 targets.

Cloud Native Infrastructure for 5G Core and Edge

Cloud native does not refer to a particular technology but to a set of principles that ensure future applications are fully decoupled from the infrastructure; at the atomic level this can be a VM, a container, or maybe futuristic serverless functions and unikernels. As of today, the only community-accepted cloud native standard for 5G and Cloud is an OCI-compliant infrastructure. In general, cloud native for Telco means a Telecom application, as per 3GPP, IETF and related standards, that meets the cloud native principles shared in this paper and supports the vision of immutable infrastructure, declarative configuration and native DevSecOps for the whole infrastructure.

Cloud native is the industry de facto approach for developing and delivering applications in the Cloud. Since 5G by design is service based and microservice enabled, the basic principle for 5G infrastructure is cloud native, which supports scalability, portability, openness and, most importantly, the flexibility to onboard a wide variety of applications.

As per the latest industry studies, data in the 5G era will quadruple every year. This makes cloud native a necessity for provisioning infrastructures that are fully automated, support common SDKs and above all enable CI/CD across the full application life cycle.

Scalability to deploy services in many PoPs is the other key requirement for 5G, along with the possibility to build or tear down a service on the fly. As 5G deployments scale, so will cloud instances, and it is a necessity that future Cloud infrastructure can be scaled and managed automatically.

Application portability is the other key characteristic of a 5G cloud. As 5G use cases mature, there is an increasing requirement to deploy different applications in different clouds and to connect them in loosely coupled meshes. In addition, as network capacity and usage increase, applications must be capable of moving across different clouds.

What Cloud means for Telco 5G

Telco operators, through their mission-critical infrastructure, hold a seminal place in the post-COVID-19 digital economy. The use of Telecom networks directly impacts the economy, society, commerce, and law and order; this is why Telecom networks are designed for higher availability, reliability and performance.

The biggest challenges for Cloud Native Infrastructure for Telco lie in:

  • Granularity of Telco App decomposition
  • Networking
  • Performance acceleration
  • O&M and Operational frameworks

Because Telco 5G applications need to fulfill special SLA-based performance and functional requirements that are somehow not possible on today's containerized, Kubernetes-based cloud platforms, we must define a Telco definition of Cloud. Similarly, how we connect workloads east-west (E-W) is very important. These questions become more pressing as we move towards the edge. The downside is that any deviation from standard cloud native means we cannot achieve the promise of scaling, performance and distribution, the very purpose for which we built these platforms.

Any tweak to the cloud principles means we cannot provision and manage a truly automated Cloud infrastructure following DevSecOps, which is so vital for delivering continuous updates and new software into the 5G infrastructure. Lacking such functions means we cannot meet the fast-paced innovation requirements of new 5G use cases, especially for vertical markets.

The last and most important factor is leveraging advances from hyperscalers to achieve Cloud and 5G deployments. Today we already see a movement in the market where carrier-grade clouds from well-known distros like IBM can be deployed on top of public clouds, but the top question here is whether “abstraction will impact performance”. The top reason the first wave of NFV was not so disruptive is that we defined so many models, and used one model to define another, which obviously added to complexity and deployment issues. Cloud native for 5G Telco needs to address and harmonize this as well.

Applications for 5G

The application economy is vital for the success of 5G and Edge. However, T1 operators' deployments of open 5G platforms have revealed that just deploying an open infrastructure is not enough, as adherence to Cloud principles by application vendors will vary. To truly take advantage of Cloud, it is vital to define principles for infrastructure-led delivery by devising frameworks and tools to test and benchmark 5G applications, classifying them as Gold, Silver or Bronze, with the common direction of achieving fully Gold-standard applications in the 5G era. Although cloud native principles support the vision of a common, shared and automated infrastructure, it is easier said than done in practice, as achieving Telco-grade conformance for Telco services is complex and requires rigorous validation and testing. Based on real open 5G cloud deployments and the corresponding CNF benchmarking, there are still certain gaps that need both standardization and testing:

  • Application resource overcommitment
  • Application networking dependence that slows scaling
  • Use of SDN in 5G Cloud
  • Lack of Open Telemetry, which makes customized EMS mandatory
  • Hybrid management of VNFs and CNFs
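
The Gold, Silver and Bronze classification described above could be approached along the following lines. This is only a sketch: the check names and the pass-ratio thresholds are hypothetical and invented here for illustration; they are not taken from any conformance program.

```python
from typing import Dict

# Hypothetical conformance checks, loosely mirroring the gaps listed above.
CHECKS = (
    "no_resource_overcommit",
    "scales_without_network_rewiring",
    "exposes_open_telemetry",
    "declarative_lifecycle",
)


def classify(results: Dict[str, bool]) -> str:
    """Map benchmark results to a tier. Thresholds are assumptions:
    all checks passed = Gold, at least half = Silver, otherwise Bronze."""
    passed = sum(1 for check in CHECKS if results.get(check, False))
    ratio = passed / len(CHECKS)
    if ratio == 1.0:
        return "Gold"
    if ratio >= 0.5:
        return "Silver"
    return "Bronze"
```

A missing result counts as a failed check, so a vendor cannot reach a higher tier simply by skipping a test.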

Luckily there are a number of industry initiatives, like CNCF Conformance, CNTT RI2, NFV NOC and OPNFV, which fundamentally address these very issues, and we have already seen results. It is vital that 5G Cloud infrastructures support easy-to-use SDKs and tools that vendors and developers can use flexibly to offer and deploy different applications in the 5G era.

In the next part I shall try to elaborate how Open Telemetry and automation are driving the next era of growth using ML- and AI-driven solutions.

How to Migrate applications to the cloud

Building #Cloud and #Applications on #Cloud is only half the journey. The biggest issues surface as we traverse the last 10%, which is #Migration and #Handoff to #Operations.

Questions like #Coexistence, #NewOnM models and #Processes re-engineering need to be answered to address the following challenges:
a. Theoretically zero downtime by building an E2E architecture, e.g. a UPF pool
b. A delivery pipeline for migration through unified tools, which avoids touching every NE one by one and improves TTM
c. Automatic verification of service consistency between Cloud and legacy
d. Five-nines (99.999%) reliability and TSR gold-standard security
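
Point c above, automatic verification of service consistency, can be sketched as a comparison of the same KPIs measured on the legacy and the cloud side. The KPI names and the 5% relative tolerance are assumptions made for this example only.

```python
from typing import Dict, List


def consistency_report(legacy: Dict[str, float], cloud: Dict[str, float],
                       tolerance: float = 0.05) -> List[str]:
    """List every KPI that is missing on the cloud side or that deviates
    from the legacy value by more than the relative tolerance."""
    issues: List[str] = []
    for kpi, expected in legacy.items():
        if kpi not in cloud:
            issues.append(f"{kpi}: missing on cloud")
            continue
        actual = cloud[kpi]
        allowed = abs(expected) * tolerance
        if abs(actual - expected) > allowed:
            issues.append(f"{kpi}: legacy={expected} cloud={actual}")
    return issues
```

An empty report would be the gate for moving to the next migration batch; anything else stops the pipeline before more traffic is shifted.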

Customers expect #partners who offer #Operational tools and services optimized for:
1) E2E whole-solution support ownership in MVI
2) SPOC support services
3) Tools and platform services, especially on PaaS and NFVO, to co-develop
4) Cloud assurance services, especially for #business readiness, #Transition, #TaaS, #Solution emergency support and CSR

Partners must also work with the #Telco to #remodel processes, starting with:
1) Incident handling for L1
2) Config management supporting multiple layers
3) Change management with a focus on scaling, SDN and policy enforcement
4) Release management supporting #devops, i.e. staging to prod


P4 programming with Intel 3rd Gen can help build standard Telco Edge Infrastructure Models

Retrieving data in the edge computing environment.

Recent PoCs of P4 programming models with multiple registers, alongside Intel 3rd Gen processors and Agilex (10nm), gave critical indications about the future delivery model of 5G Edge infrastructure and networks by addressing the following Telco requirements:

Telco Key Edge Infrastructure Requirements

  1. Workload placements at edge with possibility to make them portable based on real time use
  2. Enhancing Infrastructure models and form factors for the Edge
  3. Evaluate P4 as a baseline to build a unified model to program Compute/Network and VNF/CNF for real-time use cases like traffic steering

Results with Smart NICS and Latest X86 Chips

Below are key findings

  1. The hardware delivers consistent throughput (the required 10 Gbps per core) up to 10 register pipelines; beyond that, throughput degrades exponentially, at 12% for each additional pair of registers
  2. The impact on latency becomes prominent as registers increase, with 40-50% variation beyond 10 registers
  3. The P4 architecture with Intel 3rd generation processors can help us solve and optimize for these issues dramatically
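
The first finding can be read as a simple compound-degradation model. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, and it assumes that throughput stays flat up to 10 registers and then loses 12% of its current value for every complete pair of registers added. It is not fitted to the PoC measurements.

```python
def estimated_throughput_gbps(registers: int,
                              baseline_gbps: float = 10.0,
                              knee: int = 10,
                              loss_per_pair: float = 0.12) -> float:
    """Estimate per-core throughput for a given number of register pipelines.

    Assumption: flat at baseline_gbps up to `knee` registers, then a
    compounding loss of `loss_per_pair` for each complete extra pair.
    """
    if registers < 0:
        raise ValueError("registers must be non-negative")
    extra_pairs = max(0, registers - knee) // 2
    return baseline_gbps * (1.0 - loss_per_pair) ** extra_pairs


if __name__ == "__main__":
    for n in (8, 10, 12, 14, 20):
        print(f"{n:2d} registers -> {estimated_throughput_gbps(n):.2f} Gbps per core")
```

A model like this is mainly useful for sizing: it gives a quick estimate of how many register pipelines a core can carry before it falls below a target rate.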

Key Achievements with P4 in last year

  1. Demo of a unified programming model from Core to Cloud to Edge
  2. Vendors such as Xilinx, Intel, Dell and Barefoot fully supporting acceleration
  3. 12 Tbps programmable chips available for PoC, at least in labs
  4. P4 can build CI/CD for hardware infrastructure to ensure infra resilience

Current Challenges and Targets in 2021

Below are some gaps that still need to be addressed to ensure a Telco-grade Edge

  1. P4 is good in the Core, but at the Edge it needs to be improved, especially the common API models
  2. Latency performance is key to building Edge infrastructure
  3. P4 is not a modeling language but a switching-model language; how to abstract it at the service level is an issue
  4. The VNF partner ecosystem, especially drivers on the Cloud and VNF side
  5. Whether GPUs can help with multiple register pipelines
  6. How P4 can work with IPv4 to build use cases like Slicing

Finally, the most important need, and one that requires a more cohesive community effort, is telemetry and topology. So far we have only a few references, such as deepinsights on P4; refer to the list below.

References

  • Advanced Information Networking and Applications: Proceedings , Volume 1
  • Toyota Edge Proceedings
  • IEEE

Delivering Edge Architecture Standardization in Edge user group

Edge deployments are gaining momentum in Australia, APAC and the rest of the markets. Due to the sheer size of the Edge there are new challenges and opportunities for all players in the ecosystem, including

  • Hardware infrastructure suppliers, e.g. Dell, HPE, Lenovo
  • On-prem Cloud vendors like RedHat, VMware
  • Hybrid Cloud companies like IBM, Mirantis
  • Public Cloud and Hyperscalers like AWS, Azure, Google
  • SIs like Southtel, Tech Mahindra

However, one thing the Telco community needs to do is define a standard architecture and specifications for the Edge, which will not only help build a thriving ecosystem but also deliver on the promises of global scale and developer experience. Within the Open Infrastructure community we have been working in the Open Infra Edge Computing Group to achieve exactly this.

Focus Areas

Following are the scope and areas we are enabling today

  • Defining the Edge delivery model in the form of Reference Architectures, a Reference Model and a Certification process, where we are working together with #GSMA and #Anuket in the Linux Foundation
  • Defining use cases based on real RFX and Telco customer requirements
  • Prioritizing requirements for each half year
  • Enabling the Edge ecosystem
  • Publishing a white paper, especially on implementation and testing frameworks

Edge Architectures

Alongside the Linux Foundation Akraino blueprints, we are enabling blueprints and best practices in the Edge user group. However, we emphasize that the architecture should remain as vendor agnostic as possible, with different flavors and vendors solving the following challenges, as listed by the Edge Computing Group – OpenStack

  • Life-cycle Management. A virtual-machine/container/bare-metal manager in charge of managing machine/container lifecycle (configuration, scheduling, deployment, suspend/resume, and shutdown). (Current Projects: TK)
  • Image Management. An image manager in charge of template files (a.k.a. virtual-machine/container images). (Current Projects: TK)
  • Network Management. A network manager in charge of providing connectivity to the infrastructure: virtual networks and external access for users. (Current Projects: TK)
  • Storage Management. A storage manager, providing storage services to edge applications. (Current Projects: TK)
  • Administrative. Administrative tools, providing user interfaces to operate and use the dispersed infrastructure. (Current Projects: TK)
  • Storage latency. Addressing storage latency over WAN connections.
  • Reinforced security at the edge. Monitoring the physical and application integrity of each site, with the ability to autonomously enable corrective actions when necessary.
  • Resource utilization monitoring. Monitor resource utilization across all nodes simultaneously.
  • Orchestration tools. Manage and coordinate many edge sites and workloads, potentially leading toward a peering control plane or “self-organizing edge.”
  • Federation of edge platforms orchestration (or cloud-of-clouds). Must be explored and introduced to the IaaS core services.
  • Automated edge commission/decommission operations. Includes initial software deployment and upgrades of the resource management system’s components.
  • Automated data and workload relocations. Load balancing across geographically distributed hardware.
  • Synchronization of abstract state propagation. Needed at the “core” of the infrastructure to cope with discontinuous network links.
  • Network partitioning with limited connectivity. New ways to deal with network partitioning issues due to limited connectivity, coping with short disconnections and long disconnections alike.
  • Manage application latency requirements. The definition of advanced placement constraints in order to cope with latency requirements of application components.
  • Application provisioning and scheduling. In order to satisfy placement requirements (initial placement).
  • Data and workload relocations. According to internal/external events (mobility use-cases, failures, performance considerations, and so forth).
  • Integration location awareness. Not all edge deployments will require the same application at the same moment. Location and demand awareness are a likely need.
  • Dynamic rebalancing of resources from remote sites. Discrete hardware with limited resources and limited ability to expand at the remote site needs to be taken into consideration when designing both the overall architecture at the macro level and the administrative tools. The concept of being able to grab remote resources on demand from other sites, either neighbors over a mesh network or from core elements in a hierarchical network, means that fluctuations in local demand can be met without inefficiency in hardware deployments.
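
To make the placement-constraint items above concrete, here is a minimal sketch of latency-aware initial placement. Everything in it is an assumption for illustration: the `EdgeSite` fields, the site names and the selection rule (lowest latency among the sites that meet the latency budget and have enough spare CPUs) are hypothetical and are not taken from any of the projects mentioned here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EdgeSite:
    name: str
    latency_ms: float  # latency between the users and this site
    free_cpus: int     # spare CPU capacity at the site


def place_workload(sites: Sequence[EdgeSite],
                   max_latency_ms: float,
                   cpus_needed: int) -> Optional[EdgeSite]:
    """Pick a site that satisfies both the latency and the capacity constraint."""
    candidates = [s for s in sites
                  if s.latency_ms <= max_latency_ms and s.free_cpus >= cpus_needed]
    if not candidates:
        return None  # no site qualifies; the caller may relax constraints or rebalance
    # Prefer the lowest latency; break ties by the most spare capacity.
    return min(candidates, key=lambda s: (s.latency_ms, -s.free_cpus))


if __name__ == "__main__":
    sites = [EdgeSite("far-core", 40.0, 64),
             EdgeSite("metro", 12.0, 8),
             EdgeSite("cell-site", 4.0, 2)]
    print(place_workload(sites, max_latency_ms=15.0, cpus_needed=4))
```

Returning `None` instead of raising keeps the "no feasible site" case explicit, which is where relocation or borrowing resources from a neighbouring site would come in.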

Edge Standards under Review

Owing to carrier-grade Telco service requirements at the Edge, the preference has always been StarlingX, and this is what is maturing to GA. However, there are many other standards we are standardizing at the Edge, as follows

StarlingX

  • Complete cloud infrastructure solution for edge and IoT
  • Fusion between Kubernetes and OpenStack
  • Integrated stack
  • Installation package for the whole stack
  • Distributed cloud support

K3S and Minimal Kubernetes

  • Lightweight Kubernetes distribution
  • Single binary
  • Basic features added, like local storage provider, service load balancer, Traefik ingress controller
  • Tunnel Proxy

KubeEdge, especially for IoT

  • Kubernetes distribution tailored for IoT
  • Has orchestration and device management features
  • Basic features added, like storage provider, service loadbalancer, ingress controller
  • CloudCore and EdgeCore

Submariner

  • Cross Kubernetes cluster L3 connectivity over VPN tunnels
  • Service discovery across clusters
  • Connects clusters with overlapping CIDRs
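
Before connecting clusters it helps to know whether their address ranges actually clash. The following is a minimal sketch using Python's standard `ipaddress` module; the function name, cluster names and CIDR ranges are made up for illustration, and the code does not use or represent Submariner's own API.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_cidrs(cluster_cidrs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return the pairs of cluster names whose CIDR ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr)
                for name, cidr in cluster_cidrs.items()}
    return [(a, b)
            for a, b in combinations(sorted(networks), 2)
            if networks[a].overlaps(networks[b])]


if __name__ == "__main__":
    clusters = {
        "east": "10.0.0.0/16",
        "west": "10.0.128.0/17",   # sits inside the "east" range
        "north": "10.1.0.0/16",
    }
    print(overlapping_cidrs(clusters))
```

Any pair reported here is a candidate for the overlapping-CIDR handling listed above; clusters that do not appear can be connected with plain L3 routing.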

Call for Action

  • Weekly meeting on Mondays at 6am PDT / 1300 UTC: https://wiki.openstack.org/wiki/Edge_Computing_Group#Meetings
  • Join our mailing list and IRC channel for more edge discussions: http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing
  • #edge-computing-group channel on Freenode

Procedure to Join MIRC Channel

Following are the steps to join. Many people reported issues with mIRC versions after 7.5, so I wanted to give a summary here.

Step1: Registration and Nickname settings

You may see some notices from NickServ that the nick you use is already taken by someone else. In this case you need to choose another nickname. You can do this easily by typing

/nick nick_of_your_choice

/nick john_doe

NickServ will keep sending you this notice until you find a nick that is not registered by someone else. If you want to use the same nick every time you connect, you may register it. The service called NickServ handles the nicks of all registered users of the network. Nick registration is free and you just need an email address to confirm that you are a real person. To register the nick you currently use, type

/nickserv register password email

/nickserv register supersecret myemail@address.net

Note: Your email address will be kept confidential. We will never send you spam mails or mails where we request private data (like passwords, banking accounts, etc). After this you will see a notice from NickServ telling you this:

– NickServ – A passcode has been sent to myemail@address.net, please type /msg NickServ confirm <passcode> to complete registration

Check your email account for new mail. Some email providers, like Hotmail, may drop the mail sent by our services into your spam folder. Open the mail and you will find a text like this:

Hi, You have requested to register the following nickname some_nickname. Please type ” /msg NickServ confirm JpayrtZSx ” to complete registration. If you don’t know why this mail is sent to you, please ignore it silently. PLEASE DON’T ANSWER TO THIS MAIL! irchighway.net administrators.

Just copy and paste the part /msg NickServ confirm JpayrtZSx into the status window of your mIRC. Then press the Enter key. A text like:

– *NickServ* confirm JpayrtZSx – 
– NickServ – Nickname some_nickname registered under your account: *q@*.1413884c.some.isp.net –
– NickServ – Your password is supersecret – remember this for later use.
– * some_nickname sets mode: +r

should appear after this. This means you have finished your registration and the nick can only be used by you; if someone else uses your nick, you can force them to give it back to you. If you disconnect, you need to tell NickServ that the nick is yours when you reconnect. You can do that with:

/nickserv identify password e.g. /nickserv identify supersecret

If the password is correct it should look like this:

* some_nickname sets mode: +r – 
– NickServ – Password accepted – you are now recognized.

In mIRC you can automate the identification process so you don’t have to care about this anymore. Open the mIRC Options by pressing the key combination Alt + O, then select the category Options and click on Perform to open the Perform window.

Check Enable perform on connect and add the following line in the edit box called Perform commands:

if ($network == irchighway) { /nickserv identify password }

Close the options by clicking OK. Now your mIRC will automatically identify you every time you connect to IRCHighway.

Step2: Setting SASL/CAP authentication

mIRC added built-in SASL support in version 7.48, released April 2017. The below instructions were written for version 7.51, released September 2017. Earlier versions of mIRC have unofficial third-party support for SASL, which is not documented here. freenode strongly recommends using the latest available version of your IRC client so that you are up-to-date with security fixes.

In the File menu, click Select Server…
In the Connect -> Servers section of the mIRC Options window, select the correct server inside the Freenode folder, then click Edit
In the Login Method dropdown, select SASL (/CAP)
In the second Password box at the bottom of the window, enter your NickServ username, then a colon, then your NickServ password. For example, dax:hunter2
Click the OK button
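
For readers curious about what the client does with the username:password string entered above, here is a minimal sketch. It assumes the SASL PLAIN mechanism, where the message is the authorization identity, the authentication identity and the password separated by NUL bytes, and then base64-encoded for the IRC AUTHENTICATE command. The function name is hypothetical, and reusing the username as the authorization identity is an assumption; this is not mIRC's actual code.

```python
import base64


def sasl_plain_payload(credentials: str) -> str:
    """Build the base64 SASL PLAIN payload from a 'username:password' string."""
    # Split on the first colon only, so passwords may contain colons.
    username, sep, password = credentials.partition(":")
    if not sep or not username or not password:
        raise ValueError("expected credentials in the form username:password")
    # Assumption: the same name is used as authzid and authcid.
    raw = f"{username}\0{username}\0{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Example credentials taken from the step above.
    print(sasl_plain_payload("dax:hunter2"))
```

Because the payload is only base64-encoded and not encrypted, SASL PLAIN should be used over a TLS connection.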

Step3: Joining Channel

Use the following commands to join the channel. Best of luck.

/connect chat.freenode.net 6667 SID_SAAD:XYZPASSWORD
/join #edge-computing-group

References

  1. https://gist.github.com/xero/2d6e4b061b4ecbeb9f99
  2. https://irchighway.net/14-blog/gaming/14-i-m-new-to-irc
  3. https://freenode.net/kb/answer/mirc
  4. https://www.delltechnologies.com/en-au/solutions/edge-computing/index.htm?gacd=9685819-7002-5761040-271853941-0&dgc=ST&gclid=CjwKCAjwqIiFBhAHEiwANg9szpqx5CQ3z_Q5oeI1eTXLtfXVNDBJSj_vNinJFO7667YIywxAQIlPARoCIogQAvD_BwE&gclsrc=aw.ds
  5. https://www.redhat.com/en/topics/edge-computing/approach
  6. https://aws.amazon.com/edge/
  7. KubeCon Europe April 2021 session by Ildikó Váncsa (Open Infrastructure Foundation) – ildiko@openinfra.dev and colleague Gergely Csatári (Nokia) – gergely.csatari@nokia.com

Understanding Openshift-4 installation for Developer and Lab Environments

Just as Linux is the de facto OS for innovation in the datacenter, OpenShift is proving to be a catalyst for both Enterprise and Telco Cloud transformation. In this blog I would like to share my experience with two environments: one is Minishift, a home-brew environment for developers, and the other is based on pre-existing infrastructure.

OpenShift is a cool platform: besides these two modes it supports a wide variety of deployment options, including hosted platforms on

  • AWS
  • Google
  • Azure
  • IBM

However, for hosted platforms we use the full installers without any customization, so this is not complex, provided you follow only the RedHat guide for deployment.

Avoid common Mistakes

  • As a prerequisite you must have a bastion host to be used as the bootstrap node
  • For the Linux manifest, NTP, the registry and the key should be available; for a full installation, DNS must be prepared before the cloud installer kicks in
  • Do not craft Ignition files on your own; always generate manifests and Ignition files with the installer
  • For pre-existing infrastructure the control plane is based on CoreOS while workers can be RHEL or CoreOS; for a full-stack installation everything, including workers, must be based on CoreOS
  • Once the installation has started, the whole cluster must be spun up within 24 hours; otherwise you need to regenerate the Ignition files before proceeding, as the controller will stop responding because the certificates embedded in them are valid for only 24 hours
  • In my experience, most manifests for a full-stack installation are created by the installer, viz. cluster node instances, cluster networks and bootstrap nodes
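
The 24-hour window mentioned above is easy to lose track of in a lab where installs get paused. The helper below is a minimal sketch for tracking it; the function names are hypothetical, and it assumes you record the time at which the installer generated its files. It does not inspect the files themselves.

```python
from datetime import datetime, timedelta, timezone

# Validity window stated in the list above.
IGNITION_VALIDITY = timedelta(hours=24)


def ignition_time_left(generated_at: datetime, now: datetime) -> timedelta:
    """Time remaining before the installer output should be regenerated."""
    return generated_at + IGNITION_VALIDITY - now


def ignition_still_valid(generated_at: datetime, now: datetime) -> bool:
    """True while the 24-hour window has not yet elapsed."""
    return ignition_time_left(generated_at, now) > timedelta(0)


if __name__ == "__main__":
    generated = datetime(2021, 5, 1, 9, 0, tzinfo=timezone.utc)  # example timestamp
    check = datetime(2021, 5, 2, 8, 30, tzinfo=timezone.utc)
    print(ignition_still_valid(generated, check), ignition_time_left(generated, check))
```

Passing `now` explicitly keeps the helper easy to test; in a real script you would pass `datetime.now(timezone.utc)`.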

Pain points in Openshift3 installation

In OpenShift 3, most of the installation revolved around complex Ansible playbooks, roles and detailed Linux file configuration, all the way from DNS to CSRs, so there was a dire need to make it simple and easy for customers. This is what RedHat has done by moving to an opinionated installation, which makes it simple to install with only high-level information; later, based on each environment, the enterprise can scale as needed for Day 2 requirements. Such a mode solves three fundamental issues

  • Installer customization needs (At least this was my experience in OCP3)
  • Full automation of environment
  • Implement CI/CD

Components of installation

There are two pieces you should know for OCP4 installation

Installer

The installer is a Linux manifest that comes directly from RedHat and needs very little tuning and customization

Ignition Files

Ignition files are the first bootstrap configs needed to configure the bootstrap, control and compute nodes. If you have managed an OpenStack platform before, you know we need separate Kickstart and cloud-init files; with Ignition files RedHat simplifies both steps. For details on the Ignition process and cluster installation, refer to the material below

Minishift installation:

Pre-requisites:

Download the CDK (RedHat container development Kit) from below :
https://developers.redhat.com/products/cdk/hello-world/#fndtn-windows

  1. Copy the CDK into the directory C:/users/Saad.Sheikh/minishift and, in CMD, go to that directory
  2. minishift setup-cdk
  3. This creates .minishift in your path C:/users/Saad.Sheikh
  4. set MINISHIFT_USERNAME=snasrullah.c
  5. minishift start --vm-driver virtualbox
  6. Add the directory containing oc.exe to your PATH
    1. FOR /f "tokens=*" %i IN ('minishift oc-env') DO @call %i
  7. minishift stop
  8. minishift start
  9. The message below will appear; just ignore it and carry on
    error: dial tcp 192.168.99.100:8443: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. - verify you have provided the correct host and port and that the server is currently running.
    Could not set oc CLI context for 'minishift' profile: Error during setting 'minishift' as active profile: Unable to login to cluster
  10. oc login -u system:admin

The server is accessible via web console at:
https://192.168.99.100:8443/console

You are logged in as:
User: developer
Password:

To login as administrator:
oc login -u system:admin

Openshift installation based on onprem hosting

This mode is also known as UPI (User Provisioned Infrastructure) and it has the following key steps for a full OCP installation

Step1: Run the RedHat installer

Step2: Based on the manifests, build the Ignition files for the bootstrap nodes

Step3: The control node boots and fetches information from the bootstrap server

Step4: The etcd provisioned on the control node scales to 3 nodes to build a 3-control-node HA cluster

Finally, the bootstrap node is shut down and removed

Following are the commands I used to spin up my OCP cluster

# Reboot the bootstrap machine; during reboot go to PXE and install CoreOS

# Wait for the bootstrap phase to complete
openshift-install --dir=./ocp4upi wait-for bootstrap-complete

# Remove the bootstrap IP entries from /etc/haproxy/haproxy.cfg, then reload
systemctl reload haproxy

# Set the KUBECONFIG environment variable (the name is case sensitive)
export KUBECONFIG=~/ocp4upi/auth/kubeconfig

# Verify the installation
oc get pv
oc get nodes
oc get clusteroperators

# Approve any pending CSRs and certificates
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve

# Log in to the OCP cluster GUI using
https://localhost:8080
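
The CSR approval step selects the requests that have no status yet, i.e. the pending ones. The sketch below applies the same filter in Python to JSON of the shape produced by `oc get csr -o json`. The function name is hypothetical and the sample document is hand-written for illustration, not captured from a real cluster.

```python
import json
from typing import List


def pending_csr_names(csr_list_json: str) -> List[str]:
    """Return the names of CSRs whose status is missing or empty (pending)."""
    doc = json.loads(csr_list_json)
    return [item["metadata"]["name"]
            for item in doc.get("items", [])
            if not item.get("status")]


if __name__ == "__main__":
    # Hand-written sample: one pending CSR and one already approved.
    sample = json.dumps({
        "kind": "List",
        "items": [
            {"metadata": {"name": "csr-aaaaa"}, "status": {}},
            {"metadata": {"name": "csr-bbbbb"},
             "status": {"conditions": [{"type": "Approved"}]}},
        ],
    })
    for name in pending_csr_names(sample):
        print(name)
```

Listing the pending names first, before piping them to the approve command, is a cheap way to review what is about to be approved in a shared lab.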

Do try it out and share your experience and what you think about the OCP 4.6 installation.

Disclaimer: I validated all commands and processes in my home lab environment; you need to check them against your own environment before applying them, as some tuning may be needed.