China Mobile Communications Group Beijing Co., Ltd. (hereinafter "Beijing Mobile") was established in 1999 and is affiliated with China Mobile Communications Group Corporation. It adheres to the core corporate value of "being a world-class enterprise and becoming a mobile information expert" and stays closely focused on its strategic positioning of "being a world-class enterprise." It aims to deliver first-class information services of excellent quality, to achieve a new leap from excellent to outstanding through innovation, and to promote the vision of "mobile changing life."
01. Initial exploration of automation: transformation urgently needs new momentum
Since its establishment, Beijing Mobile has maintained its position as an informatization leader in the industry, adhering to the mission of "pursuing excellence and making digital life better" and aspiring to become the "leader in customers' preferred digital services." To respond quickly to business needs, the IT team began by building a technical platform and used open source software to set up a basic automated operations system that achieved batch automation of scripts and files. However, problems remained: configuration data coverage was incomplete, job execution lacked control mechanisms, operations scenarios were few and their capabilities insufficient, and the technical architecture was relatively simple.
To improve system stability and keep the business running steadily, Beijing Mobile partnered with Jiawei Blue Whale to create an intelligent operations platform. The platform covers operations scenarios such as configuration data management, monitoring management, log management, unified alert management, automated inspection, and operations service reports. It brings innovation to management processes, improves intelligent operations capabilities, and moves toward an operations model that delivers business value.
02. "Platform + application" to efficiently expand application scenarios
An integrated operations management system is built using a "PaaS platform + SaaS scenario" approach. The basic platform consolidates common capabilities to support the operation, collaboration, and servicing of scenario applications. There are five major scenario applications: configuration management (CMDB), IT monitoring and alerting, the log platform, automated inspection, and operations service reporting. Further applications can be added quickly and flexibly in the future.
Platform scale: The production environment currently manages more than 200 server nodes and eight application systems, and runs sixteen operations scenario tools;
Platform integration: The platform is integrated with the 4A system, and message notification is implemented through integration with the email system;
Unified control: The management and control platform is used to manage hosts in different network domains.

03. Accelerate operations efficiency and ensure stable business operation
1) CMDB automatic collection, unified resource management
As the cornerstone of cloud operations, the CMDB provides centralized management of asset data and supplies configuration data services to the various operations scenario applications. Through the configuration management service, data and models are combined to map the relationships between applications and to assure the accuracy and consistency of the data. The work is driven by the idea of integration and is ultimately oriented toward consumption by applications, so that the configuration service delivers its full value and IT resources are managed and consumed in a centralized, standardized way.
Cloud platform A: More than 30 models were created, covering hosts, businesses, cloud virtual resources, security devices, infrastructure, network, servers, cloud platforms, and so on; VMware and Huawei cloud resources were fully connected; more than 120 hosts are managed and six services are connected; configuration discovery and collection tasks have run 270 times;
Cloud platform B: More than 30 models were created, including data center, machine room, and cabinet, and model data entry was completed;
Nearly 1,600 virtual machine instances were collected and entered automatically, along with more than 160 compute servers, more than 110 storage servers, and more than 1,200 virtual resource volumes.

2) Massive object management, all-in-one monitoring and escalation
Building on the platform's rich data collection, data processing, and plug-in extension capabilities, the existing monitoring platform is integrated to provide monitoring and alerting for network devices, hosts, storage devices, middleware, databases, and key application processes. Alert information is managed centrally, and unified convergence, masking, correlation analysis, and automated handling are applied to make alerts more effective and to reduce false and missed alarms. This closes the loop on alert management, from access and convergence through handling and assignment, improves alert handling efficiency, and keeps the system running steadily.
Monitoring access: Four business systems are managed, with monitoring for more than 120 hosts, dial tests on four network ports, more than 20 processes, and three custom scripts, for a total of more than 70 monitoring metrics.
Alert access: Alerts are ingested from thirteen alert sources, including Huawei Cloud, Zabbix, and VMware. Layer-one and layer-two hardware monitoring is fully covered, and layer-three monitoring covers four business systems.
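The convergence and masking steps described above can be illustrated with a small sketch. This is not the Blue Whale platform's implementation or API. The `Alert` fields, the `converge` function, the 300-second window, and the masking rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str       # alert source, e.g. "Zabbix" (illustrative field)
    target: str       # host or device the alert refers to
    metric: str       # what was measured, e.g. "cpu"
    timestamp: float  # seconds since an arbitrary epoch


def converge(alerts, window=300.0, masked_targets=frozenset()):
    """Drop alerts for masked targets, then keep at most one alert per
    (target, metric) pair within each `window` seconds.

    Assumed rule: an alert is suppressed if the last *kept* alert with the
    same key is less than `window` seconds older.
    """
    last_kept = {}  # (target, metric) -> timestamp of the last kept alert
    kept = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if alert.target in masked_targets:
            continue  # masking: e.g. a host under planned maintenance
        key = (alert.target, alert.metric)
        previous = last_kept.get(key)
        if previous is not None and alert.timestamp - previous < window:
            continue  # convergence: repeat of an alert already raised
        last_kept[key] = alert.timestamp
        kept.append(alert)
    return kept
```

Because the key ignores `source`, the same fault reported by two alert sources converges to a single alert, which is one plausible reading of "unified convergence" across sources.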

3) Log-linked alerting, network health protection at all times
Once alerting is configured on the log monitoring data, the health status of network devices can be derived by analyzing their logs. In addition, the number of log entries at each level can be counted per network device and shown on a monitoring dashboard. In total, more than 70 network log data sources are connected, ten network access tasks have been created, five network log dashboards have been built, and three log keyword alert tasks have been created.
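As a rough illustration of log level counting and keyword alerting, the sketch below works on plain text log lines. The log format, the level names, and the function names `count_levels` and `keyword_hits` are assumptions; the actual format of the network device logs is not given in this article.

```python
import re
from collections import Counter

# Assumed level names; real network devices often use syslog severities.
LEVEL_PATTERN = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|CRITICAL)\b")


def count_levels(lines):
    """Count how many log lines carry each level keyword."""
    counts = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def keyword_hits(lines, keywords):
    """Return the lines containing any keyword (case-insensitive).

    Each hit would be a candidate for a log keyword alert.
    """
    lowered = [keyword.lower() for keyword in keywords]
    return [line for line in lines if any(k in line.lower() for k in lowered)]
```

The counts returned by `count_levels` are the kind of per-level figures a dashboard panel could plot, while `keyword_hits` corresponds to a keyword alert task.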

4) Automated inspection for more efficient security precautions
With customized inspection scripts and inspection objects, scheduled and periodic inspections can be executed automatically, replacing the daily manual inspections carried out by operations personnel. Inspection plans can be tailored to different operations roles, and standard visual reports can be generated. This helps the operations team fully understand the health of the software and hardware resources in production, discover hidden dangers across all production systems in advance, and keep the business stable.
Inspection template: Four standardized host inspection scripts and three Linux inspection templates were completed, covering 62 inspection metrics.
Inspection tasks: Three inspection tasks were configured, covering thirty-two host inspection objects.
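The idea of running a set of inspection checks against a list of hosts and producing a report can be sketched as follows. This is a minimal illustration, not the platform's inspection engine. The `Check` structure, the threshold rule, and the `read` callbacks are hypothetical. A real inspection would collect values from the hosts through scripts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Check:
    name: str                     # inspection metric name (illustrative)
    read: Callable[[str], float]  # hypothetical reader: host -> metric value
    threshold: float              # values above this are reported as FAIL


def run_inspection(hosts: List[str], checks: List[Check]) -> List[Dict]:
    """Run every check on every host and return one report row per pair."""
    report = []
    for host in hosts:
        for check in checks:
            value = check.read(host)
            status = "FAIL" if value > check.threshold else "OK"
            report.append(
                {"host": host, "check": check.name, "value": value, "status": status}
            )
    return report


def failures(report: List[Dict]) -> List[Dict]:
    """Select the rows that need attention from operations personnel."""
    return [row for row in report if row["status"] == "FAIL"]
```

Scheduling (for example daily or weekly runs) is left out. In practice a scheduler would call `run_inspection` periodically and pass the rows to a report renderer.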

5) Data visualization, full command of IT operations
Previously, Beijing Mobile had no dedicated reporting tool and recorded resource and asset information manually in Excel. The workload was heavy, timeliness was poor, accuracy was hard to assure, and reports were inconvenient to produce, maintain, and view. Based on the Blue Whale platform, this project created a lightweight, self-service report production tool that connects to various data sources and helps operations personnel fully grasp the operating status and usage of IT resources.
Cloud Platform A: Integration with the Huawei and VMware data interfaces was completed to obtain data. In the operations service report application, nine Huawei report data sources and seven VMware data sources were created, and eight Huawei operations reports and five VMware operations reports were prepared, including daily, weekly, and monthly reports, with field filtering and display of report data;
Cloud Platform B: Five data files were parsed and encapsulated as report data source interfaces, and a total of eight automated operations reports in five categories were generated as required. These can be filtered and displayed by time or object name.
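Filtering report data by time or object name, as mentioned above, might look like the following sketch. The row layout (`date`, `object`, and other fields held in a dictionary) and the function name `filter_rows` are assumptions for illustration. The actual report data source interface is not described here.

```python
from datetime import date


def filter_rows(rows, start=None, end=None, name_contains=None):
    """Return the report rows that match every filter that was supplied.

    rows          -- iterable of dicts with at least "date" (datetime.date)
                     and "object" (str) keys; this layout is assumed
    start, end    -- inclusive date bounds, or None for no bound
    name_contains -- case-insensitive substring of the object name, or None
    """
    selected = []
    for row in rows:
        if start is not None and row["date"] < start:
            continue
        if end is not None and row["date"] > end:
            continue
        if name_contains is not None and name_contains.lower() not in row["object"].lower():
            continue
        selected.append(row)
    return selected
```

For example, `filter_rows(rows, start=date(2023, 1, 1), name_contains="vm")` would keep only rows from that date onward whose object name contains "vm", assuming rows in the layout described above.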

In the wave of digital transformation, Beijing Mobile took the lead in creating an intelligent operations platform for cloud computing operations scenarios. The platform promotes a more refined, automated, and intelligent operations system and strengthens the early detection, early location, and early handling of system risks and faults. It keeps the business running steadily, builds complete operations development capabilities, and enables the transformation from traditional operations to operations development.



























