To implement the key strategic initiatives of building a "cyberpower, digital China, and smart society" outlined in the 14th Five-Year Plan, and to meet users' demands for high-quality cloud-network services, telecom operators are at the forefront of digital and intelligent transformation. However, the rapid expansion of IT infrastructure and high user expectations for business continuity pose new operations and maintenance challenges for the sector. Accelerating business value delivery, minimizing business interruption during updates, and ensuring high-quality, efficient, and secure service cutovers have become major hurdles. A professional subsidiary of a leading Chinese telecom operator tackled these challenges with the CanWay Application Release Center, achieving integrated DevOps management across all business lines and full-link grayscale releases for core systems without downtime.
Stage One: Agile at Scale
01 Pain Points
Complex technology stack due to multi-cloud integration: As businesses gradually transition to cloud-native architectures, traditional and containerized architectures coexist. With IT scale expanding, tens of thousands of servers and containers are distributed across public clouds and on-premises data centers, making unified management difficult.
Fragmented tooling capabilities and inefficiencies: Some teams lack mature DevOps and application release capabilities, with many processes still manual and inefficient. Other teams have repeatedly built and maintained tools, leading to inconsistent technology stacks, high maintenance costs, low standardization, and increased complexity in advancing digital and intelligent initiatives.
Lack of standards and weak change control: There are no unified specifications for artifact management, pipeline standards, or application release. Change and cutover activities are implemented independently by each team, leading to unstable release success rates. Meanwhile, effective governance and control over change behaviors are missing.
02 Solution
The operator introduced the CanWay Application Release Center to establish an end-to-end, full-process, enterprise-level, and integrated DevOps system. By enabling comprehensive quantitative measurement, agile development, and intelligent operations, it empowers the entire network business, reduces operational burdens, lowers costs, and improves efficiency. Key achievements include unified management of massive heterogeneous hosts, support for multiple release types and scenarios, unified tools, integrated artifact management, fine-grained production permission control, systematic tool governance, and standardized specifications for service deployment.


Build Compilation Automation: Pipeline assembly capabilities are introduced to lower the barrier to use, integrating code inspection and build compilation to enhance development efficiency. From a management perspective, non-compliant code and unsafe operations are intercepted during the process, ensuring secure and compliant build flows for both processes and delivered applications.
Artifact Management: The artifact repository is further enhanced by incorporating security scanning capabilities to detect vulnerabilities and non-compliant licenses, proactively blocking problematic artifacts and improving security control. All packages from local, private, and central repositories are managed centrally, including dependency packages in the processes, significantly reducing security risks associated with open-source dependencies. The solution supports cluster-based deployment across multiple data centers and horizontal multi-node scaling, enabling efficient sharing of IT R&D assets across clusters and synchronized delivery of multi-node application value.
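The security gate described above can be sketched as a small admission check that blocks artifacts with known vulnerabilities or non-compliant licenses before promotion. This is an illustrative sketch only; the scan-result fields and allowed-license list are hypothetical, not the product's actual data model:

```python
# Hypothetical artifact admission gate: block artifacts that carry critical
# vulnerabilities or licenses outside an approved list. Field names and the
# license allowlist are illustrative assumptions.

ALLOWED_LICENSES = {"Apache-2.0", "MIT", "BSD-3-Clause"}

def admit_artifact(scan):
    """Return (admitted, reasons) for a scanned artifact."""
    reasons = []
    if scan.get("critical_vulns", 0) > 0:
        reasons.append(f"{scan['critical_vulns']} critical vulnerabilities")
    bad = [lic for lic in scan.get("licenses", []) if lic not in ALLOWED_LICENSES]
    if bad:
        reasons.append("non-compliant licenses: " + ", ".join(bad))
    return (not reasons, reasons)

# A clean artifact passes; one with vulnerabilities and a bad license is blocked.
print(admit_artifact({"critical_vulns": 0, "licenses": ["MIT"]}))      # (True, [])
print(admit_artifact({"critical_vulns": 2, "licenses": ["GPL-3.0"]}))  # blocked
```

In practice such a gate would run as part of repository promotion, so problematic packages never reach the environments that downstream pipelines consume.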
Environment Management: It provides connectivity across complex network environments, enabling centralized management of deployment units in multi-cloud setups. Application configuration management capabilities are provided to record deployment architectures and configuration parameters related to release activities, providing a data foundation for automated application releases and ensuring configuration updates after release changes.
Host-Based Release: Establishes automated release capabilities for steady-state applications deployed on hosts. It provides fundamental functions such as file distribution, script execution, load balancing management, and application status detection, while supporting control logic including serial, parallel, conditional, and loop operations. This enables flexible orchestration in complex host release scenarios, with the system managing over 10,000 hosts.
Container-Based Release: Builds automated release capabilities for agile business applications deployed on Kubernetes (K8s). It delivers container cluster management under the K8s architecture, including status views of container objects and visual configuration management, strengthening foundational capabilities for agile business releases; over 80 container clusters are managed.
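The serial, parallel, and conditional control logic mentioned for host-based releases can be sketched as follows. The step names, host names, and predicate are hypothetical; this is not the CanWay Application Release Center's actual orchestration API:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch of release-step orchestration across hosts.
# Step and host names are hypothetical placeholders.

def run_step(name, hosts):
    # Placeholder for file distribution / script execution on each host.
    return {h: f"{name}: ok" for h in hosts}

def serial(steps, hosts):
    # Execute steps one after another, e.g. stop service before updating.
    return [run_step(step, hosts) for step in steps]

def parallel(steps, hosts):
    # Execute independent steps concurrently to shorten the release window.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda s: run_step(s, hosts), steps))

def conditional(step, hosts, predicate):
    # Run a step only on hosts that satisfy a condition,
    # e.g. hosts already drained from the load balancer.
    return run_step(step, [h for h in hosts if predicate(h)])

hosts = ["app-01", "app-02", "app-03"]
serial(["distribute_package", "stop_service"], hosts)
parallel(["update_config", "warm_cache"], hosts)
out = conditional("start_service", hosts, lambda h: h != "app-03")
print(sorted(out))  # start_service runs only on app-01 and app-02
```

A real orchestrator would add loop constructs, rolling batches, and failure handling on top of these primitives, but the composition pattern is the same.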
03 Effects
Tool Integration: The Application Release Center now covers all business lines of the company, serving over 1,000 technical staff. It has configured more than 6,000 CI pipelines, executed over 230,000 builds, set up more than 5,000 CD pipelines, and completed more than 170,000 releases across all environments.
Unified Standards: The project team has established a series of standards, including role permissions, pipeline specifications, construction environment standards, and release script specifications. These standards have been promoted company‑wide via the Application Release Center, with regular measurement and assessment.

Cost Reduction and Efficiency Improvement: For small and medium-sized businesses, the transition from manual to automated processes has reduced the average release time from 30 minutes to 3 minutes, a tenfold efficiency gain, saving an estimated 3,000-plus person-days annually.
The Application Release Center integrates capabilities across IT R&D, testing, and operations, solving the problem of redundant, siloed construction across the entire network. By automating the entire software development lifecycle, collaboration and automated processes have significantly accelerated software delivery speed, improving application release efficiency by more than five times while ensuring higher quality product output. Additionally, it provides business personnel with a streamlined, standardized, normalized, and visualized full-process management platform for IT R&D and O&M.
Stage Two: Innovation and Practical Application
The first phase of the Application Release Center was well received by business teams. However, new pain points emerged around the inefficiency and broad impact of nighttime changes.
01 Pain Points
Broad Impact of Failures: The current practice of halting all services during a change means any issue affects all users across the entire network. The platform cannot verify production functions with a small percentage of production traffic, so the impact of a failure cannot be contained.
Long Hotfix Cycles: Even minor flaws in production must wait until the next day’s change window, prolonging their impact. When major issues occur, rollbacks inevitably interrupt business operations.
High Cost of Nighttime Changes: Nighttime changes for core services require the joint participation of R&D, testing, and operations teams, leading to high manpower consumption, and low efficiency. Frequent nighttime changes also overburden the teams.
02 Solution
Grayscale Release (also known as Canary Release) is a deployment strategy that enables a smooth transition between the old version and the new version. It supports A/B testing, where most users continue using the old version (feature A), while a subset of users (the grayscale group) begin using the new version (feature B). If the new version functions normally, the traffic scope gradually expands until all users are migrated to version B. If issues are detected in the new version, a quick rollback to version A can be performed. Grayscale release ensures overall system stability, allowing issues to be detected, analyzed, and resolved as traffic gradually shifts, thus minimizing impact.
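The staged expansion and rollback behavior described above can be sketched as a simple rollout loop. The stage percentages, health threshold, and error-rate probe below are hypothetical illustrations, not the platform's actual release logic:

```python
# Illustrative sketch of a staged canary (grayscale) rollout: traffic to the
# new version expands stage by stage, and any health regression triggers a
# rollback to the old version. Stages and threshold are assumptions.

ROLLOUT_STAGES = [1, 5, 25, 50, 100]  # percent of traffic on the new version

def healthy(error_rate, threshold=0.01):
    # A stage passes if the observed error rate stays under 1%.
    return error_rate <= threshold

def canary_rollout(observe_error_rate):
    """Expand traffic stage by stage; roll back to the old version on failure."""
    for percent in ROLLOUT_STAGES:
        error_rate = observe_error_rate(percent)
        if not healthy(error_rate):
            return ("rolled_back", percent)  # all traffic returns to version A
    return ("completed", 100)  # all users migrated to version B

# A healthy release progresses through every stage:
print(canary_rollout(lambda p: 0.001))  # ('completed', 100)
# A release that degrades once it reaches 25% of traffic is rolled back:
print(canary_rollout(lambda p: 0.05 if p >= 25 else 0.001))  # ('rolled_back', 25)
```

The key property is that a defect discovered at the 25% stage only ever affected a quarter of users for the duration of one observation window, rather than the whole network.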
The CanWay Application Release Center leverages Application Service Mesh technology to provide proxyless mesh traffic coloring, isolation, and grayscale traffic control capabilities without intrusive modifications. It enables new features to be validated in production grayscale environments before being promoted across the entire network, thus reducing the impact scope, and progressively achieving fully automated, full-link grayscale releases without downtime.

1. Traffic Coloring: In the grayscale release process, users can be precisely identified based on business policies, such as designated internal test accounts, users with specific client versions, or users with ID numbers ending in particular digits. At the ingress service gateway, traffic can be colored and processed according to business-defined traffic labels and then routed to different versions of ingress microservices, enabling ingress traffic governance.
2. Proxyless Mesh: To enable label propagation, traffic control, conditional routing, and other service capabilities for grayscale releases across the entire microservices architecture, a service mesh architecture is adopted. The service mesh control plane also provides advanced capabilities such as traffic throttling, service degradation, traffic replay, and mirrored traffic testing. Service meshes are typically implemented through data-plane sidecar proxies (such as Envoy), Java Agent bytecode injection, or SDK embedding. The Envoy approach consumes significant resources, while SDK embedding requires code modifications on the business side. Given the high proportion of Java applications in this project, the Java Agent approach was adopted. It dynamically enhances the original traffic-processing logic when the service process starts, aligning with the full lifecycle of the service runtime. Additionally, it shortens the traffic forwarding path (eliminating the application-process-to-sidecar hop), delivering higher performance.

3. Coloring Propagation: To enable full-link grayscale releases across microservices, coloring labels must be passed along the entire invocation chain. Via the Java Agent, the system identifies coloring labels in HTTP headers or RPC protocol contexts within ingress traffic, determines the traffic characteristics, and appends the coloring identifiers to egress traffic after completing its own logic processing. This ensures transparent propagation of traffic coloring markers throughout the complete service invocation link.
4. Conditional Routing: When traffic reaches version V1 of Service B, it needs to route release-tagged traffic to version V1 of Application C and gray-tagged traffic to version V2 of Application C based on the propagated coloring labels. At this point, the Java Agent accesses the Nacos registry to obtain service addresses for Application C versions V1/V2 and dynamically distributes traffic according to the traffic labels. This enables dynamic routing of traffic between services during grayscale releases.
5. Logical Swimlane Isolation: Through coloring propagation and conditional routing, a group of services consuming the same coloring markers and associated through release dependencies can be considered a logical swimlane. Traffic within each swimlane is relatively independent and can take effect in parallel. Multiple logical swimlanes can simultaneously support parallel grayscale validation of multiple product features. Additionally, during online issue troubleshooting, swimlanes can lock down upstream and downstream services for fault demarcation, improving problem diagnosis efficiency.
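The steps above — coloring at the gateway, propagating the label, and routing by label at each hop — can be sketched end to end as follows. The header name, coloring rules, and registry contents are hypothetical illustrations; a real deployment would resolve endpoints from a registry such as Nacos rather than a dictionary:

```python
# Illustrative sketch of full-link grayscale routing: traffic is colored at
# the ingress gateway, the label rides along in a header, and each downstream
# hop routes by that label. All names and rules here are assumptions.

GRAY_HEADER = "x-traffic-color"  # hypothetical propagation header

def color_at_gateway(user):
    """Gateway-side coloring: internal testers and IDs ending in 7 go gray."""
    if user.get("internal_tester") or user.get("id", "").endswith("7"):
        return "gray"
    return "release"

# Stand-in for a service registry mapping (service, label) to an endpoint.
REGISTRY = {
    ("app-c", "release"): "app-c-v1:8080",
    ("app-c", "gray"):    "app-c-v2:8080",
}

def call_downstream(service, headers):
    """Route by the propagated color label; fall back to the release lane."""
    label = headers.get(GRAY_HEADER, "release")
    return REGISTRY.get((service, label), REGISTRY[(service, "release")])

# End-to-end: color at ingress, propagate the header, route at the next hop.
headers = {GRAY_HEADER: color_at_gateway({"id": "330107"})}
print(call_downstream("app-c", headers))  # gray user  -> app-c-v2:8080
headers = {GRAY_HEADER: color_at_gateway({"id": "330102"})}
print(call_downstream("app-c", headers))  # release user -> app-c-v1:8080
```

Because every hop applies the same label-based lookup with a release-lane fallback, services that have no gray version simply serve both lanes, which is what makes the swimlanes composable across a long invocation chain.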