Data science projects are known to fail at rates as high as 85% despite their important role in business. Integrating data analytics into core Information Technology (IT) capabilities can be elusive and daunting.
“If we consider IT projects two-dimensional with requirements versus implementation, data projects are three-dimensional. The third dimension is needed to uncover data gems, even though the requirements don’t know where the gems are and what they look like,” says Xiaolin Li, director of Data Engineering and Analytics at T-Rex.
The concept of Data Operations (DataOps) was introduced in 2014, and in recent years it has grown into an emerging branch of IT operations. The key components of DataOps include:
- Information Architecture: Understanding data context and usage environment; developing data taxonomy and methodology for incremental data analysis.
- Data Integration and Automation: Integrating data results for continuous delivery; automating process flow and validating output through iterative measurement and feedback.
- Data Governance: Maintaining a collaborative data catalog with end-to-end data support from scientists, analysts, engineers, and clients; incorporating data knowledge and analysis into Agile development.
- Data Security: Integrating security into each stage of the data lifecycle; addressing the need to protect a wide variety of data workloads.
In a recent national data analytics project, T-Rex’s 2020 Census Technical Integration (TI) implementation, the team adopted a DataOps-based approach and achieved the project’s goals. The project was the first of its kind because of the new response data collection. At the start of the project, the team faced technical challenges such as data evaluation and performance enhancement. Those issues were resolved effectively through a data-centric, iterative, and highly cooperative process. When the 2020 Census ended last year, the team had completed the complex, multi-faceted analysis of 300 million national responses critical to the mission.
Throughout the course of the project, the team treated data as a software component and integrated it into the coding cycles. Test data were treated like code, built and released bi-weekly. The data were adjusted in each cycle to follow the requirement evolution and calibrated to improve model accuracy. Daily integration tests, which included both code and data, were fully automated such that team members could address issues early. “The DataOps recipe was a key contributing factor to the project’s success,” Xiaolin Li points out; “we created the analytics model incrementally, built and deployed it automatically, and reviewed it constantly.”
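As a loose illustration of the “data as code” idea described above, a daily integration test can validate each test-data release the same way unit tests validate code, so that a bad data build fails the pipeline as early as a bad commit would. This sketch is hypothetical: the column names, status values, and checks are invented for illustration and are not drawn from the Census project.

```python
# Hypothetical sketch of a data-as-code integration check. The dataset,
# columns, and rules below are illustrative assumptions, not the actual
# T-Rex 2020 Census data model.
import csv
import io

# A versioned "test data" release; in practice this would be built and
# released alongside the code each cycle, then pulled by the CI job.
TEST_DATA_CSV = """response_id,state,status
1001,MD,complete
1002,VA,complete
1003,MD,partial
"""

REQUIRED_COLUMNS = {"response_id", "state", "status"}
VALID_STATUSES = {"complete", "partial"}


def load_responses(raw_csv: str) -> list[dict]:
    """Parse a test-data release into a list of row records."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def validate(records: list[dict]) -> list[str]:
    """Return a list of data-quality issues; an empty list means the release passes."""
    issues = []
    if not records:
        return ["empty data release"]
    # Schema check: every required column must be present.
    missing = REQUIRED_COLUMNS - set(records[0].keys())
    if missing:
        issues.append(f"missing columns: {sorted(missing)}")
    # Row-level checks: unique IDs and valid status values.
    seen_ids = set()
    for row in records:
        rid = row.get("response_id")
        if rid in seen_ids:
            issues.append(f"duplicate response_id: {rid}")
        seen_ids.add(rid)
        if row.get("status") not in VALID_STATUSES:
            issues.append(f"invalid status in row {rid}: {row.get('status')}")
    return issues


if __name__ == "__main__":
    problems = validate(load_responses(TEST_DATA_CSV))
    assert not problems, problems
    print("data release OK")
```

Run as part of the daily automated test suite, a check like this surfaces data issues in the same feedback loop as code defects, which is the early-detection effect the team describes.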
Global data volume is predicted to grow 23% annually, according to a recent IDC report.1 Those who understand data better will gain a competitive edge and be better positioned for project success. DataOps is expected to play an ever-larger role in the data industry moving forward.
Learn more about T-Rex’s Data Engineering and Analytics capability.
1 International Data Corporation, “Data Creation and Replication Will Grow at a Faster Rate than Installed Storage Capacity, According to the IDC Global DataSphere and StorageSphere,” March 24, 2021, https://www.idc.com/getdoc.jsp?containerId=prUS47560321