DataOps: Toward an Incremental Data Process

Date: July 8, 2021

Data science projects are known to have a high failure rate, as high as 85%, despite their important role in business. Integrating data analytics into core Information Technology (IT) capabilities can be an elusive and daunting goal.

“If we consider IT projects two-dimensional with requirements versus implementation, data projects are three-dimensional. The third dimension is needed to uncover data gems, even though the requirements don’t know where the gems are and what they look like,” says Xiaolin Li, director of Data Engineering and Analytics at T-Rex.

The concept of Data Operations (DataOps) was introduced in 2014, and in recent years it has emerged as a branch of IT operations in its own right. The key components of DataOps include:

  • Information Architecture: Understanding data context and usage environment; developing data taxonomy and methodology for incremental data analysis.
  • Data Integration and Automation: Integrating data results for continuous delivery; automating process flow and validating output through iterative measurement and feedback.
  • Data Governance: Maintaining a collaborative data catalog with end-to-end data support from scientists, analysts, engineers, and clients; incorporating data knowledge and analysis into Agile development.
  • Data Security: Integrating security into each stage of the data lifecycle; addressing the need to protect a wide variety of data workloads.
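As a rough illustration of "validating output through iterative measurement and feedback," the sketch below measures each pipeline run and compares it with the previous run, flagging regressions before results are delivered. The metric (share of records with all required fields present), the record layout, the `state` field, and the tolerance are hypothetical choices for illustration only; the article does not describe T-Rex's actual checks.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    """Quality metrics recorded for one pipeline iteration (hypothetical metric set)."""
    row_count: int
    completeness: float  # share of records whose required fields are all present


def measure(records: list[dict], required: tuple[str, ...]) -> Measurement:
    """Compute simple quality metrics for one batch of output records."""
    if not records:
        return Measurement(row_count=0, completeness=0.0)
    complete = sum(1 for r in records if all(r.get(f) is not None for f in required))
    return Measurement(row_count=len(records), completeness=complete / len(records))


def validate(current: Measurement, previous: Optional[Measurement],
             tolerance: float = 0.01) -> list[str]:
    """Compare this iteration with the last one; an empty list means it passes."""
    issues = []
    if current.row_count == 0:
        issues.append("output is empty")
    if previous is not None and current.completeness < previous.completeness - tolerance:
        issues.append(
            f"completeness dropped from {previous.completeness:.2%} "
            f"to {current.completeness:.2%}"
        )
    return issues


# Usage: the previous run was fully complete; the current run lost a field value.
prev = measure([{"id": 1, "state": "VA"}, {"id": 2, "state": "MD"}], ("id", "state"))
curr = measure([{"id": 1, "state": "VA"}, {"id": 2, "state": None}], ("id", "state"))
print(validate(curr, prev))  # ['completeness dropped from 100.00% to 50.00%']
```

Keeping the last accepted measurement as the baseline is what turns a one-off check into a feedback loop: each iteration is judged against the one before it rather than against a fixed, possibly stale threshold.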

In a recent national data analytics project within T-Rex's 2020 Census Technical Integration (TI) implementation, the team adopted a DataOps-based approach and successfully achieved the project goal. The project was the first of its kind because of the new method of collecting response data. At the start of the project, the team faced technical challenges such as data evaluation and performance enhancement. These issues were resolved effectively through a data-centric, iterative, and highly cooperative process. When the 2020 Census ended last year, the team had completed the complex, multifaceted analysis of 300 million national responses that was critical to the mission.

Throughout the project, the team treated data as a software component and integrated it into the coding cycles. Test data were treated like code: built and released bi-weekly. In each cycle, the data were adjusted to follow the evolving requirements and calibrated to improve model accuracy. Daily integration tests covering both code and data were fully automated so that team members could address issues early. "The DataOps recipe was a key contributing factor to the project's success," Xiaolin Li points out. "We created the analytics model incrementally, built and deployed it automatically, and reviewed it constantly."
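One way to picture test data "built and released" like code is a release manifest: each data build records a version label and a SHA-256 hash of every fixture file, and the automated daily job verifies the fixtures against that manifest before running integration tests. This is a minimal sketch under that assumption; the function names, file layout, and the `sprint-12` label are hypothetical and not drawn from the project itself.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large fixtures need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_release(data_dir: Path, version: str) -> dict:
    """Snapshot every fixture under data_dir into a versioned manifest, like tagging code."""
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    return {
        "version": version,
        "files": {p.relative_to(data_dir).as_posix(): file_sha256(p) for p in files},
    }


def verify_release(data_dir: Path, manifest: dict) -> list[str]:
    """Report fixtures that are missing, modified, or absent from the released manifest."""
    current = build_release(data_dir, manifest["version"])["files"]
    expected = manifest["files"]
    problems = [f"missing: {name}" for name in expected if name not in current]
    problems += [f"changed: {name}" for name in expected
                 if name in current and current[name] != expected[name]]
    problems += [f"unreleased: {name}" for name in current if name not in expected]
    return problems


# Usage: release the data, then let the daily job check it before running tests.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "responses.csv").write_text("id,state\n1,VA\n2,MD\n")
    manifest = build_release(data_dir, "sprint-12")  # hypothetical release label
    print(verify_release(data_dir, manifest))  # [] -> safe to run integration tests
    (data_dir / "responses.csv").write_text("id,state\n1,VA\n")
    print(verify_release(data_dir, manifest))  # ['changed: responses.csv']
```

Committing such a manifest alongside the code would make each bi-weekly data release reviewable and reproducible in the same way as a code release, and an unexpected data change would fail the daily run instead of silently altering test results.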

Global data volume is predicted to grow 23% annually, according to a recent IDC report.[1] Those who understand data better will gain a competitive edge and be better positioned for project success. DataOps is expected to play an ever larger role in the data industry going forward.
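For a sense of scale, the arithmetic below assumes, for illustration only, that the 23% rate holds constant. The symbols are introduced here rather than taken from the IDC report: V_0 is today's volume and V_n is the volume after n years. Because growth compounds, volume roughly doubles in about three and a third years and grows about 2.8-fold in five.

```latex
V_n = V_0\,(1 + 0.23)^n, \qquad
\frac{V_5}{V_0} = 1.23^5 \approx 2.82, \qquad
n_{\text{double}} = \frac{\ln 2}{\ln 1.23} \approx \frac{0.693}{0.207} \approx 3.35 \text{ years}
```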

Learn more about T-Rex’s Data Engineering and Analytics capability.

[1] International Data Corporation, "Data Creation and Replication Will Grow at a Faster Rate than Installed Storage Capacity, According to the IDC Global DataSphere and StorageSphere," March 24, 2021, https://www.idc.com/getdoc.jsp?containerId=prUS47560321

 

