5 Ways to Build a Data Lake and not a Data Swamp

In the last 6 months, my customers and I have been on a journey with some of the largest cloud data lake vendors, open source Big Data vendors, and a team of the smartest Big Data architects I’ve worked with in my career. Along the way, our clients have often been looking to build a data lake. Here are the most common use cases:

  • Capture healthcare EMR/EHR datasets
  • Restaurant point of sale, menu, and feedback
  • Capture claims and patient information for provider data
  • Benchmarks and social media (Twitter) data integrated into a data warehouse

When clients treat data lakes and the surrounding technologies as a “cool new toy” to challenge their status quo, the data lake quickly becomes a data swamp, full of “dark data” that has to be cleaned up before anyone can consume it. Sure, storage is cheap and you can usually afford to store that data swamp. But if you truly see data as an asset and a source of competitive advantage for your business, it’s in your company’s best interest to create some structure and planning around your data lake.

Here are 5 ways to think about a Big Data project, whether you are just a beginner willing to invest in Big Data or you are halfway through an implementation:

  1. Start with the Business Value: Never think of a Big Data project as a technology project. There are many technologies to choose from, and Perficient can help you choose among them. However, without a business value proposition, this is yet another IT project.
  2. Think Long Term: Big Data projects are never a one-time investment. With large volumes of data in your hands, think of investing in data scientists and data stewards who can test your data and find new ways of changing your business models (such as sales, marketing, service, and manufacturing). As Bernard Marr mentioned, “a good data scientist is a good journalist.”
  3. It’s All About Analytics: Big Data projects should ALWAYS end up with Analytics. A picture speaks a thousand words.
  4. Learn Storytelling: Visualization is a fantastic way of representing data. However, the story needs to be coherent. If the charts jump from sales per territory to customer service for a single customer to manufacturing and logistics (all in one chart), there is no synchronization, and little value to management.
  5. Wrap All of the Above with Governance: What is data without a lineage of where it is coming from, without good quality, and without a clear glossary of definitions?
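
To make the governance point a little more concrete, here is a minimal sketch of what checking a dataset for lineage, quality, and glossary coverage could look like. Everything in it is an assumption made for illustration: the `DatasetRecord` fields, the `governance_gaps` helper, and the example dataset names are hypothetical and do not come from any particular catalog or governance product.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical catalog entry for one dataset landed in the lake."""
    name: str
    source_system: str = ""   # lineage: where the data came from
    owner: str = ""           # data steward accountable for the dataset
    glossary_terms: dict = field(default_factory=dict)  # column -> business definition
    quality_checks_passed: bool = False


def governance_gaps(record: DatasetRecord) -> list:
    """Return the governance items this dataset is still missing."""
    gaps = []
    if not record.source_system:
        gaps.append("lineage")
    if not record.owner:
        gaps.append("steward")
    if not record.glossary_terms:
        gaps.append("glossary")
    if not record.quality_checks_passed:
        gaps.append("quality")
    return gaps


# Illustrative entries only: one governed dataset, one "swamp" candidate.
catalog = [
    DatasetRecord(
        name="pos_transactions",
        source_system="restaurant_pos",
        owner="sales_data_steward",
        glossary_terms={"ticket_total": "Order total before tax and tip"},
        quality_checks_passed=True,
    ),
    DatasetRecord(name="twitter_dump"),
]

for record in catalog:
    gaps = governance_gaps(record)
    status = "governed" if not gaps else "missing: " + ", ".join(gaps)
    print(f"{record.name}: {status}")
```

A dataset that reports any gaps is a candidate for the swamp. In practice these checks would likely live in whatever catalog or governance tooling you adopt, but the questions stay the same: where did this come from, who owns it, what do its fields mean, and can it be trusted?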

Perficient has extensive experience with industry-leading technologies for Big Data. In addition, Perficient can help you govern your data with an emphasis on business value using our proven Enable Methodologies. Reach out to us for more information.


Arvind Murali, Chief Data Strategist

Arvind Murali is the Chief Data Strategist for Data Governance with Perficient. His role includes defining data strategy and governance to deliver transformative data platforms. Arvind has served as an executive advisor for data strategy and governance to organizations across several industries. Arvind’s dedication to solving challenges and identifying new opportunities has provided valuable business-focused results for clients, such as providing self-service access to data for global sales teams; helping physicians create informed wellness plans; and delivering insights about current supply chain inventories. He is a passionate Vlogger on YouTube and discusses real-world insights, data platform trends, and the importance of governance as big data continues its exponential growth.
