DataOps: Toward an Incremental Data Process

Date: July 8, 2021

Data science projects are known to fail at rates as high as 85%, despite the important role they play in business. Integrating data analytics into core Information Technology (IT) capabilities can be elusive and daunting.

“If we consider IT projects two-dimensional with requirements versus implementation, data projects are three-dimensional. The third dimension is needed to uncover data gems, even though the requirements don’t know where the gems are and what they look like,” says Xiaolin Li, director of Data Engineering and Analytics at T-Rex.

The concept of Data Operations (DataOps) was introduced in 2014, and in recent years it has grown into an emerging branch of IT operations. The key components of DataOps include:

  • Information Architecture: Understanding data context and the usage environment; developing a data taxonomy and a methodology for incremental data analysis.
  • Data Integration and Automation: Integrating data results for continuous delivery; automating process flow and validating output through iterative measurement and feedback.
  • Data Governance: Maintaining a collaborative data catalog with end-to-end support from data scientists, analysts, engineers, and clients; incorporating data knowledge and analysis into Agile development.
  • Data Security: Integrating security into each stage of the data lifecycle; addressing the need to protect a wide variety of data workloads.
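To make the "validating output through iterative measurement and feedback" idea concrete, here is a minimal sketch of a data-quality gate that a pipeline could run on every build. It is purely illustrative and not the project's actual tooling; the field names, threshold, and records are hypothetical.

```python
# Hypothetical data-quality gate: measure null rates per required
# field and fail the batch if any rate exceeds a threshold.

def validate_batch(records, required_fields, max_null_rate=0.05):
    """Return (passed, metrics) for a batch of response records."""
    total = len(records)
    if total == 0:
        return False, {"total": 0, "null_rates": {}}
    null_counts = {f: 0 for f in required_fields}
    for rec in records:
        for f in required_fields:
            if rec.get(f) in (None, ""):
                null_counts[f] += 1
    null_rates = {f: null_counts[f] / total for f in required_fields}
    passed = all(rate <= max_null_rate for rate in null_rates.values())
    return passed, {"total": total, "null_rates": null_rates}

# Example batch with one missing "response" value (illustrative data):
batch = [
    {"id": 1, "state": "MD", "response": "online"},
    {"id": 2, "state": "VA", "response": None},
    {"id": 3, "state": "TX", "response": "mail"},
]
passed, report = validate_batch(batch, ["id", "state", "response"])
# The 1/3 null rate for "response" exceeds the 5% threshold,
# so this batch fails the gate and feedback goes to the next cycle.
```

Running such a check automatically on each data release is one way the "measurement and feedback" loop can be closed without manual review.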

In a recent national data analytics project of T-Rex's 2020 Census Technical Integration (TI) implementation, the team adopted a DataOps-based approach and achieved the project goal successfully. The project was the first of its kind because of the new response data collection. At the start of the project, the team faced technical challenges such as data evaluation and performance enhancement. Those issues were resolved effectively through a data-centric, iterative, and highly cooperative process. When the 2020 Census ended last year, the team completed the complex, multi-faceted analysis of 300 million national responses critical to the mission.

Throughout the course of the project, the team treated data as a software component and integrated it into the coding cycles. Test data were treated like code, built and released bi-weekly. The data were adjusted in each cycle to follow the evolving requirements and calibrated to improve model accuracy. Daily integration tests, which exercised both code and data, were fully automated so that team members could address issues early. "The DataOps recipe was a key contributing factor to the project's success," Xiaolin Li points out; "we created the analytics model incrementally, built and deployed it automatically, and reviewed it constantly."
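The "test data as code" practice described above can be sketched as follows. This is an assumption-laden illustration, not the team's actual pipeline: the dataset, version label, metric, and expected value are all hypothetical, and it simply shows how versioned test data and application code can be exercised together in one automated check.

```python
# Illustrative sketch: test data versioned alongside code, with an
# automated check run on every integration cycle. A failure flags
# either a code regression or a bad data build early.

TEST_DATA_VERSION = "r2"  # bumped with each bi-weekly data release

# Versioned test fixture (hypothetical records):
test_responses = [
    {"case_id": "A1", "channel": "internet", "complete": True},
    {"case_id": "A2", "channel": "mail", "complete": True},
    {"case_id": "A3", "channel": "phone", "complete": False},
]

def completion_rate(responses):
    """Code under test: fraction of complete responses."""
    done = sum(1 for r in responses if r["complete"])
    return done / len(responses)

def test_completion_rate_against_released_data():
    # Code and data are tested together on each cycle.
    rate = completion_rate(test_responses)
    assert 0.0 <= rate <= 1.0
    assert abs(rate - 2 / 3) < 1e-9

test_completion_rate_against_released_data()
```

Because the fixture carries its own version label, a failing check can be traced either to a code change or to the most recent data release.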

Global data volume is predicted to grow 23% annually, according to a recent IDC report.1 Those who understand data better will gain a competitive edge and be better positioned for project success. DataOps is expected to play an ever-larger role in the data industry moving forward.

Learn more about T-Rex’s Data Engineering and Analytics capability.

1  International Data Corporation, “Data Creation and Replication Will Grow at a Faster Rate than Installed Storage Capacity, According to the IDC Global DataSphere and StorageSphere,” March 24, 2021, https://www.idc.com/getdoc.jsp?containerId=prUS47560321

 

