Integrating Data Quality Management into Continuous Integration and Continuous Delivery (CI/CD)

With digital transformation, we are stepping through a wormhole into a different timeline, one that redefines how we do business and consume services. Most of us now prefer digital transactions and home delivery of goods.

As we increasingly move towards distributed work environments (perhaps our homes), firms will look at embracing distributed agile delivery practices for information technology solutions.

One such practice is continuous integration and continuous delivery (CI/CD), which enables the delivery of quality software at frequent intervals by automating the detection, pulling, building, and unit testing of code changes.

Integrating data quality into the organization's change lifecycle is important for getting better operational outcomes from solution builds.

In continuous integration, code review is often optional, but enabling it as a best practice makes it possible to include certain data quality pre-checks that are performed once during test builds. Problems with validation routines such as precision and format conformance can be easily spotted in this review. Code review tools like Gerrit support this workflow.
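As a minimal sketch of such a pre-check, the Python snippet below tests one record for precision and format conformance. The field names (`amount`, `order_date`), the two-decimal limit, and the date pattern are assumptions made for illustration; they do not come from Gerrit or from any particular data quality tool.

```python
import re
from decimal import Decimal, InvalidOperation

# Assumed format rule for illustration: dates written as YYYY-MM-DD.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def decimal_places(value):
    """Return the number of digits after the decimal point of a numeric string."""
    number = Decimal(str(value))
    if not number.is_finite():
        raise InvalidOperation("value is not a finite number")
    return max(0, -number.as_tuple().exponent)


def precheck_record(record, max_scale=2):
    """Return a list of findings for one record; an empty list means it passed."""
    findings = []
    try:
        if decimal_places(record["amount"]) > max_scale:
            findings.append(f"amount has more than {max_scale} decimal places")
    except (InvalidOperation, KeyError):
        findings.append("amount is missing or not numeric")
    if not DATE_PATTERN.match(str(record.get("order_date", ""))):
        findings.append("order_date does not match YYYY-MM-DD")
    return findings
```

A reviewer or a pre-merge job could call `precheck_record` on a handful of sample records and attach the findings to the review.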

Another aspect of continuous integration is continuous unit testing, where smaller builds are tested in isolation for basic functions. In a typical data lifecycle such as POSMAD (Plan, Obtain, Store and share, Maintain, Apply, Dispose), planning includes data modeling, so that the organization gets better outcomes from the data it acquires.

Data quality checks during data modeling catch costly errors at the planning stage of product or solution development. Even in modern graph databases, one needs to decide which entities are modeled as nodes and which as edges. Unit-testing frameworks such as JUnit can be used for these unit tests when coding in Java.

In continuous unit testing, there can be test cases specific to the following dimensions of data quality:

  1. Consistency routines: structural consistency between data structures, to avoid loss of data
  2. Precision routines: the precision of fields, including the number of decimal places
  3. Validity routines: conformance to a data type and a specific format
  4. Integrity routines: the structural or relational quality of data sets
  5. Uniqueness routines: non-duplicate values and identifiers
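
Test cases for these five dimensions might look like the sketch below. It uses Python's built-in `unittest` module rather than JUnit, purely to keep the example short; the sample records, field names, and email pattern are hypothetical stand-ins for data in a small, isolated test build.

```python
import re
import unittest
from decimal import Decimal

# Hypothetical sample data for illustration only.
CUSTOMERS = [
    {"customer_id": "C001", "email": "ann@example.com"},
    {"customer_id": "C002", "email": "bob@example.com"},
]
ORDERS = [
    {"order_id": 1, "customer_id": "C001", "amount": "19.99"},
    {"order_id": 2, "customer_id": "C002", "amount": "5.00"},
]
EXPECTED_ORDER_FIELDS = {"order_id", "customer_id", "amount"}
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


class DataQualityTests(unittest.TestCase):
    def test_consistency(self):
        # Every order has exactly the fields of the target structure.
        for order in ORDERS:
            self.assertEqual(set(order), EXPECTED_ORDER_FIELDS)

    def test_precision(self):
        # Amounts are stored with exactly two decimal places.
        for order in ORDERS:
            self.assertEqual(Decimal(order["amount"]).as_tuple().exponent, -2)

    def test_validity(self):
        # Emails conform to the expected format.
        for customer in CUSTOMERS:
            self.assertRegex(customer["email"], EMAIL_PATTERN)

    def test_integrity(self):
        # Every order refers to a customer that exists.
        known_ids = {customer["customer_id"] for customer in CUSTOMERS}
        for order in ORDERS:
            self.assertIn(order["customer_id"], known_ids)

    def test_uniqueness(self):
        # Order identifiers are not duplicated.
        order_ids = [order["order_id"] for order in ORDERS]
        self.assertEqual(len(order_ids), len(set(order_ids)))
```

Saved in a test module, these cases can be run with `python -m unittest` as one step of a build.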

Continuous delivery encompasses continuous integration and continuous testing. These concepts are translated into features that are made available through an integrated framework and toolset. A single data quality solution can be used to provide complete test coverage, including unit, integration, and functional tests.

The available test environment often holds contextual data that can be profiled. The profiling results from this quality assessment provide a basis for exploring and analyzing data quality using data quality validation routines or checks. The selection of validation routines varies across the data lifecycle and the software development lifecycle.
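
To make the idea of profiling concrete, here is a small sketch that summarizes each column of a test data set. It assumes the data is available as a list of dictionaries, and the summary fields (`rows`, `missing`, `distinct`, `most_common`) are illustrative choices rather than the output of any specific profiling tool.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, missing values, distinct values, top value."""
    values = [row.get(column) for row in rows]
    present = [value for value in values if value not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


def profile(rows):
    """Profile every column that appears in at least one row."""
    columns = sorted({key for row in rows for key in row})
    return {column: profile_column(rows, column) for column in columns}
```

A high `missing` count or an unexpected `distinct` count in such a summary is a hint about which validation routines are worth adding for that column.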

Summarizing the key aspects:

- If your organization is actively embracing agile practices and toolsets, should its data quality practices mature as well?

- In a distributed and disciplined agile environment, data quality management can be integrated with DevOps integration tools to support continuous integration and delivery.

- In such fast-paced deployments, data quality test automation is required in the unit and integration test automation stages.

- Data quality tools often have their own code repository, versioning, integration, and deployment capabilities, so integrating them with CI/CD toolsets can be a challenge.

- Automating data quality management by running pre-built or templated rules helps developers incorporate feedback faster.

- Integration with test management tools would also be beneficial for raising data quality issues and assigning them to developers and data owners.
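
As a rough illustration of the templated rules summarized above, the sketch below defines a few reusable rule factories and a runner whose result can fail a build. The rule names, the row format (a list of dictionaries), and the exit-code convention are all assumptions; a real data quality tool will have its own rule syntax and its own way of reporting to the CI server.

```python
import re


def not_null(column):
    """Template: the column must be present and non-empty."""
    def check(rows):
        return [i for i, row in enumerate(rows) if row.get(column) in (None, "")]
    return check


def matches(column, pattern):
    """Template: the column must fully match a regular expression."""
    compiled = re.compile(pattern)

    def check(rows):
        return [i for i, row in enumerate(rows)
                if not compiled.fullmatch(str(row.get(column, "")))]
    return check


def unique(column):
    """Template: the column must not repeat a value seen in an earlier row."""
    def check(rows):
        seen, duplicates = set(), []
        for i, row in enumerate(rows):
            value = row.get(column)
            if value in seen:
                duplicates.append(i)
            seen.add(value)
        return duplicates
    return check


def run_rules(rows, rules):
    """Run each named rule; return {rule name: failing row indexes} for failures only."""
    report = {}
    for name, check in rules.items():
        failing = check(rows)
        if failing:
            report[name] = failing
    return report


def exit_code(report):
    """Return a non-zero code when any rule failed, so a CI step can fail the build."""
    return 1 if report else 0
```

In a pipeline step, a script could build a `rules` dictionary from these templates, print the report for the developers, and finish with `sys.exit(exit_code(report))`.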


Integrating Data Quality Management into Continuous integration and continuous delivery (CI/CD) was originally published in Becoming Human: Artificial Intelligence Magazine on Medium, where people are continuing the conversation by highlighting and responding to this story.

Via https://becominghuman.ai/integrating-data-quality-management-into-continuous-integration-and-continuous-delivery-ci-cd-9e12beacc3fe?source=rss----5e5bef33608a---4

source https://365datascience.weebly.com/the-best-data-science-blog-2020/integrating-data-quality-management-into-continuous-integration-and-continuous-delivery-cicd

Published by 365Data Science
