

Big Data and You: Data Agility

Welcome to “Big Data and You (the enterprise IT leader),” the Enterprise Content Intelligence group’s demystification of “Big Data.” A critical piece of any DevOps or DataOps initiative is the set of processes, technology, and coordination that supports it. The Agile methodology, supported by the right tools, can integrate technical teams with business processes and priorities. These groups are often siloed, but aligning them around a shared set of priorities creates better approaches to data management and application development.

Agile works well with Big Data initiatives because it is iterative, incremental, and evolutionary. Agile methods foster communication among technology groups, as well as with the business. This integration builds trust that risk is being reduced, and that both data integrity and product quality remain high. Agile supports the iterative improvement a data scientist requires to refine models over time, and it lets a developer rapidly meet an incremental business need, affording both groups the flexibility to improve their work later.

The work of extracting value from the Data Lake belongs to the data scientists. Hypotheses are formulated and validated using the structured and unstructured data available to them, and the analysis is improved over time as additional data is obtained and/or hypotheses are adjusted. The ultimate goal is to accelerate the generation of quality analytics, and there are a few baseline requirements to ensure this can take place:

  1. Provide a laboratory space: a test environment and secure access to all appropriate data,
  2. Use tools that help users get to a general impression quickly, such as Kibana, Jupyter, and Zeppelin (a brief exploratory sketch follows this list), and
  3. Keep all the data, don’t throw anything away – use simple storage and security (Ranger and Atlas; see the previous article in this series on DataOps) to ensure that no connection is missed.
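As an illustration of item 2, here is a minimal sketch of the kind of first-pass exploration a data scientist might run in a Jupyter notebook cell to get a general impression of a new dataset; the file path and column names are hypothetical placeholders, not anything prescribed by this series.

```python
# Hypothetical first-pass exploration in a Jupyter notebook cell.
# The file path and column names below are illustrative assumptions.
import pandas as pd

# Pull a sample of the raw data out of the lake (a local CSV stands in here).
events = pd.read_csv("customer_events_sample.csv", parse_dates=["event_time"])

# Get the general impression quickly: shape, types, and summary statistics.
print(events.shape)
print(events.dtypes)
print(events.describe(include="all"))

# A rough trend check: event volume per day, plotted inline in the notebook.
events.set_index("event_time").resample("D").size().plot(title="Events per day")
```

The point is speed over polish: a handful of lines yields shape, types, distributions, and a trend line, which is usually enough to accept, reject, or refine a hypothesis before investing in a formal pipeline.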

From a process perspective, Agile allows an initial version to be released so that value is achieved early on, and subsequent versions add incremental value soon after their requirements are discovered or refined by the scientists. From a technology perspective, the goal becomes organizing the system and structuring the code so that when a change is made, the relevant updates downstream from that change also occur. Otherwise, complex computations can fail, or rapid change can force an undesirable amount of rework on a regular basis. This type of ‘aware’ application can respond to changes in real time by combining technologies like Storm (processing unbounded streams of data), Kafka (a distributed commit log and messaging queue), and Cassandra (a highly scalable, available, and performant database).
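To make the streaming path concrete, below is a minimal Python sketch of the Kafka-to-Cassandra flow such an application might follow. Storm topologies are typically written in Java, so a plain consumer loop stands in for the Storm bolt’s role here; the topic, keyspace, table, and field names are all illustrative assumptions.

```python
# A minimal sketch of the streaming path described above, assuming a local
# Kafka broker and Cassandra node. In a real deployment a Storm topology
# would host this logic in a bolt; a plain consumer loop stands in here.
# The topic, keyspace, table, and field names are hypothetical.
import json

from kafka import KafkaConsumer            # pip install kafka-python
from cassandra.cluster import Cluster      # pip install cassandra-driver

# Kafka acts as the distributed commit log / messaging queue.
consumer = KafkaConsumer(
    "sensor-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

# Cassandra acts as the scalable, available store for processed results.
session = Cluster(["127.0.0.1"]).connect("analytics")
insert = session.prepare(
    "INSERT INTO events_by_device (device_id, event_time, reading) "
    "VALUES (?, ?, ?)"
)

for message in consumer:
    event = message.value
    # The 'processing' step is trivial here; a Storm bolt would carry the
    # real computation before the write lands in Cassandra.
    session.execute(
        insert, (event["device_id"], event["event_time"], event["reading"])
    )
```

Because the queue, the processing, and the store are decoupled, the processing logic can be revised iteratively without disturbing ingestion or storage, which is exactly the kind of downstream-aware change the paragraph above calls for.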

To get the most value out of your operations, you must create a culture that prizes continuous process improvement and is flexible enough to account for shifting organizational priorities. An analytic model can begin to provide value by demonstrating insights into business decisions or product creation within the first few iterations, and become increasingly valuable over time. Waiting for an analytic solution to be developed via waterfall methods and tools may delay a revelation that a minimally viable sliver could have provided in real time.

[Thanks to Eric Walk for his contributions]
