Avoiding Metadata Contention in Unity Catalog
https://blogs.perficient.com/2025/04/07/avoiding-metadata-contention-in-unity-catalog/ (Mon, 07 Apr 2025)

Metadata contention in Unity Catalog can occur in high-throughput Databricks environments, slowing down user queries and impacting performance across the platform. Our FinOps strategy shifts left on performance. However, we have found scenarios where clients still experience intermittent query slowdowns, even on optimized queries. As our clients' lakehouse footprints grow, we are seeing an emerging pattern where stress on Unity Catalog can have a downstream drag on performance across the workspace. In some cases, we have identified metadata contention in Unity Catalog as a contributor to unexpected increases in response times, even after controlling for more targeted optimizations.

How Metadata Contention Can Slow Down User Queries

When data ingestion and transformation pipelines rely on structural metadata changes, they introduce several stress points across Unity Catalog’s architecture. These are not isolated to the ingestion job—they ripple across the control plane and affect all users.

  • Control Plane Saturation – Control plane saturation, often seen in distributed systems like Databricks, refers to the state when administrative functions (like schema updates, access control enforcement, and lineage tracking) overwhelm their processing capacity. Every structural table modification—especially those via CREATE OR REPLACE TABLE—adds to the metadata transaction load in Unity Catalog. This leads to:
    • Delayed responses from the catalog API
    • Increased latency in permission resolution
    • Slower query planning, even for unrelated queries
  • Metastore Lock Contention – Each table creation or replacement operation requires exclusive locks on the underlying metastore objects. When many jobs concurrently attempt these operations, they queue on those locks, serializing metadata updates and delaying one another (see the sketch after this list).
  • Query Plan Invalidation Cascade – CREATE OR REPLACE TABLE invalidates the current logical and physical plan cache for all compute clusters referencing the old version. This leads to:
    • Increased query planning time across clusters
    • Unpredictable performance for dashboards or interactive workloads
    • Reduced cache utilization across Spark executors
  • Schema Propagation Overhead – Structural changes to a table (e.g., column additions, type changes) must propagate to every service that relies on schema consistency, including query planning, lineage tracking, and cached table metadata on running clusters.
  • Multi-tenant Cross-Job Interference – Unity Catalog is a shared control plane. When one tenant (or set of jobs) aggressively replaces tables, the metadata operations can delay or block unrelated tenants. This leads to:
    • Slow query startup times for interactive users
    • Cluster spin-up delays due to metadata prefetch slowness
    • Support escalation from unrelated teams
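
To make this concrete, here is a deliberately bad, hypothetical PySpark sketch of the pattern that produces this contention: many concurrent jobs issuing CREATE OR REPLACE TABLE against the same catalog. The catalog, schema, and table names are illustrative assumptions, and the snippet assumes a Databricks context where spark is predefined; on a real workspace you would observe the effect as rising latency on queries that have nothing to do with these tables.

# Illustrative anti-pattern only: simulates many ingestion jobs issuing
# CREATE OR REPLACE TABLE concurrently. Names are hypothetical placeholders.
# Assumes a Databricks notebook/job context where `spark` is predefined.
from concurrent.futures import ThreadPoolExecutor

def replace_table(i: int) -> None:
    # Each call is a metadata transaction in Unity Catalog and invalidates
    # cached logical/physical plans for the replaced table.
    spark.sql(f"""
        CREATE OR REPLACE TABLE demo_catalog.demo_schema.ingest_{i}
        AS SELECT * FROM demo_catalog.demo_schema.staging_{i}
    """)

# Hundreds of concurrent structural changes contend for metastore locks.
with ThreadPoolExecutor(max_workers=32) as pool:
    list(pool.map(replace_table, range(500)))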

The CREATE OR REPLACE Reset

In other blogs, I have said that predictive optimization is the reward for investing in good governance practices with Unity Catalog. One of the key enablers of predictive optimization is a current, cached logical and physical plan. Every time a table is created, a new logical and physical plan for this and related tables must be built. This means that every time you execute CREATE OR REPLACE TABLE, you are back to step one for performance optimization. The DROP TABLE + CREATE TABLE pattern has the same net result.

This is not to say that CREATE OR REPLACE TABLE is inherently an anti-pattern. It only becomes a potential performance issue at scale (think thousands of jobs rather than hundreds). It's also not the only culprit: ALTER TABLE with structural changes has a similar effect. CREATE OR REPLACE TABLE is ubiquitous in data ingestion pipelines, and it typically doesn't cause a noticeable issue until it is already deeply ingrained in your developers' muscle memory. There are alternatives, though.

Summary of Alternatives

There are different techniques you can use that will not invalidate the plan cache.

  • Use CREATE TABLE IF NOT EXISTS + INSERT OVERWRITE. This is probably my first choice because there is a straight code migration path.
CREATE TABLE IF NOT EXISTS catalog.schema.table (
  id INT,
  name STRING
) USING DELTA;

INSERT OVERWRITE catalog.schema.table
SELECT * FROM staging_table;
  • Both MERGE INTO and COPY INTO have the metadata advantages of the prior solution and support schema evolution as well as concurrency-safe ingestion.
MERGE INTO catalog.schema.table t
USING (SELECT * FROM staging_table) s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
COPY INTO catalog.schema.table
FROM '/mnt/source/'
FILEFORMAT = PARQUET
FORMAT_OPTIONS ('mergeSchema' = 'true');
  • Consider whether you need to persist the data beyond the life of the job. If not, use temporary views or tables; these avoid Unity Catalog entirely, as there is no metadata overhead.
df.createOrReplaceTempView("job_tmp_view")
  • While I prefer Unity Catalog to handle partitioning strategies in the silver and gold layers, you can implement a partitioning scheme in your ingestion logic to keep the metadata stable. This is helpful for high-concurrency workloads.
CREATE TABLE IF NOT EXISTS catalog.schema.import_data (
  id STRING,
  source STRING,
  load_date DATE
) PARTITIONED BY (source, load_date);
INSERT INTO catalog.schema.import_data
PARTITION (source = 'job_xyz', load_date = current_date())
SELECT * FROM staging;

I have summarized the different techniques you can use to minimize plan invalidation in the table below. In general, I think INSERT OVERWRITE usually works well as a drop-in replacement. You get schema evolution with MERGE INTO and COPY INTO. I am often surprised at how many tables that should be considered temporary are persisted anyway; auditing your jobs for these is a worthwhile exercise. Finally, there are occasions when the Partition + INSERT paradigm is preferable to INSERT OVERWRITE, particularly for high-concurrency workloads.

Technique               | Metadata Cost | Plan Invalidation | Concurrency-Safe | Schema Evolution | Notes
CREATE OR REPLACE TABLE | High          | Yes               | No               | Yes              | Use with caution in production
INSERT OVERWRITE        | Low           | No                | Yes              | No               | Fast for full refreshes
MERGE INTO              | Medium        | No                | Yes              | Yes              | Ideal for idempotent loads
COPY INTO               | Low           | No                | Yes              | Yes              | Great with Auto Loader
TEMP VIEW / TEMP TABLE  | None          | No                | Yes              | N/A              | Best for intermediate pipeline stages
Partition + INSERT      | Low           | No                | Yes              | No               | Efficient for batch-style jobs

Conclusion

Tuning the performance characteristics of a platform is more complex than single-application performance tuning. Distributed performance is even more complicated at scale, since strategies and patterns may start to break down as volume and velocity increase.

Contact us to learn more about how to empower your teams with the right tools, processes, and training to unlock Databricks’ full potential across your enterprise.

End-to-End Lineage and External Raw Data Access in Databricks
https://blogs.perficient.com/2025/03/31/eference-architecture-end-to-end-lineage-external-raw-data-access-databricks/ (Mon, 31 Mar 2025)

Achieving end-to-end lineage in Databricks while allowing external users to access raw data can be a challenging task. In Databricks, leveraging Unity Catalog for end-to-end lineage is a best practice. However, enabling external users to access raw data while maintaining security and lineage integrity requires a well-thought-out architecture. This blog outlines a reference architecture to achieve this balance.

Key Requirements

To meet the needs of both internal and external users, the architecture must:

  1. Maintain end-to-end lineage within Databricks using Unity Catalog.
  2. Allow external users to access raw data without compromising governance.
  3. Secure data while maintaining flexibility for different use cases.

Recommended Architecture

1. Shared Raw Data Lake (Pre-Bronze)

The architecture starts with a shared data lake as a landing zone for raw, unprocessed data from various sources. This data lake is located in external cloud storage, such as AWS S3 or Azure Data Lake, and is independent of Databricks. Access to this data is managed using IAM roles and policies, allowing both Databricks and external users to interact with the data without overlapping permissions.
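
A hedged illustration of that access pattern: granting an external consumer's IAM role read-only access to the raw landing prefix with boto3. The bucket name, prefix, and role ARN are hypothetical placeholders, and in practice this policy would more likely live in Terraform or CloudFormation than in ad hoc code.

# Illustrative sketch: read-only raw-zone access for an external consumer.
# Bucket, prefix, and role ARN are hypothetical placeholders.
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListRawLandingZone",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/external-consumer"},
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::raw-landing-zone",
        },
        {
            "Sid": "ReadRawLandingZone",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/external-consumer"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::raw-landing-zone/landing/*",
        },
    ],
}

boto3.client("s3").put_bucket_policy(
    Bucket="raw-landing-zone",
    Policy=json.dumps(policy),
)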

Benefits:

  • External users can access raw data without direct entry into the Databricks Lakehouse.
  • Secure and isolated raw data management.
  • Maintains data availability for non-Databricks consumers.

2. Bronze Layer (Managed by Databricks)

The bronze layer ingests raw data from the shared data lake into Databricks. Using Delta Live Tables (DLT), data is processed and stored as managed or external Delta tables. Unity Catalog governs these tables, enforcing fine-grained access control to maintain data security and lineage. End-to-end lineage in Databricks begins with the bronze layer and can easily be maintained through the silver and gold layers by using DLT.
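
As a minimal sketch of this ingestion step, here is what the bronze table might look like using the Delta Live Tables Python API with Auto Loader. The source path, file format, and table name are assumptions for illustration; schema hints and data quality expectations are omitted for brevity.

# Hypothetical DLT pipeline: land raw files from the shared (pre-bronze)
# data lake into a Unity Catalog-governed bronze table. Names are placeholders.
# `spark` is provided by the DLT runtime.
import dlt
from pyspark.sql import functions as F

@dlt.table(
    name="bronze_events",
    comment="Raw events ingested from the shared data lake landing zone.",
)
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")           # Auto Loader
        .option("cloudFiles.format", "json")            # assumed source format
        .load("s3://raw-landing-zone/events/")          # assumed landing path
        .withColumn("_ingested_at", F.current_timestamp())
    )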

Governance:

  • Permissions are enforced through Unity Catalog.
  • Data versioning and lineage tracking are maintained within Databricks.

3. Silver and Gold Layers (Processed Data)

Subsequent data processing transforms bronze data into refined (silver) and aggregated (gold) tables. These layers are exclusively managed within Databricks to ensure lineage continuity, leveraging Delta Lake’s optimization features.

Access:

  • Internal users access data through Unity Catalog with appropriate permissions.
  • External users do not have direct access to these curated layers, preserving data quality.

Access Patterns

  • External Users: Access raw data from the shared data lake through configured IAM policies. No direct access to Databricks-managed bronze tables.
  • Internal Users: Access the full data pipeline from bronze to gold within Databricks, leveraging Unity Catalog for secure and controlled access.

Why This Architecture Works

  • Security: Separates raw data from managed bronze, reducing exposure.
  • Governance: Unity Catalog maintains strict access control and lineage.
  • Performance: Internal data processing benefits from Delta Lake optimizations, while raw data remains easily accessible for external systems.

End-to-end lineage in Databricks

This reference architecture offers a balanced approach to handling raw data access while maintaining governance and lineage within Databricks. By isolating raw data in a shared lake and managing processed data within Databricks, organizations can effectively support both internal analytics and external data sharing.

Contact us to learn more about how to empower your teams with the right tools, processes, and training to unlock Databricks’ full potential across your enterprise.

Perficient Listed in Forrester Now Tech: Data Management Service Providers, Q4 2021
https://blogs.perficient.com/2021/10/08/perficient-listed-in-forrester-now-tech-data-management-service-providers-q4-2021/ (Fri, 08 Oct 2021)

Data plays an essential role in today’s digital economy and keeping up with modern data management processes is key to staying competitive. In fact, enterprises with advanced data practices are more productive, innovate faster, are able to enter new markets quickly, and are more likely to be directly monetizing their data compared to their less-mature peers. But achieving that level of competency often requires the assistance of a data management service provider.

Partnering with a data management service provider can help your organization:

  • Establish a strategy and operating model for data
  • Build an enterprise data foundation
  • Mature and scale data governance

In the Now Tech: Data Management Service Providers, Q4 2021 report, Forrester defines data management service providers as:

“Service firms that provide talent, technology, and best practices through strategy and deployment partnerships in order to improve an enterprise’s use of data management to drive insights and business results.”

Forrester Now Tech: Data Management Service Providers, Q4 2021 Report

Identifying the right service provider to partner with can help you realize the benefits of data and increase your data competency.

In the report, Forrester segmented vendors based on market presence and functionality. Market presence was determined by data management service revenue and vendors were placed into one of three categories: large established players, midsize players, and small players. Functionality was based on varying capabilities and broken down into four segments: platform providers, data and analytics services, specialized service providers, and system integrators.

Each vendor was asked a number of detailed questions about their services, including geographic presence, industry expertise, and data management experience and expertise. Based on the responses, Forrester supplies information about these service providers to help you determine the best vendor for your data management needs.

Perficient’s Primary Functionality Lies in Specialized Services

Forrester listed Perficient in its midsize category ($100M to $1B in annual category revenue) as a specialized service provider. According to the report, “Specialized service providers concentrate on data and governance foundations. These firms have extensive data engineering, data management, data security, and data governance expertise for data-driven initiatives prioritized by CIOs, chief data officers, and enterprise architects. Engagements focus on data strategy, data architecture, data operations (DataOps), and data governance, helping enterprises transition into insight-driven businesses.”

Perficient’s listing in this Now Tech includes our geographic presence (100% North America); industry focus areas (healthcare and life sciences, financial services, and retail); and sample customers (Novant Health, StorageMart, and United Wholesale Mortgage).

Perficient’s Approach to Data Management

One of the greatest attributes of data is that it becomes more valuable the more you use it. But seeing that value is difficult if you’re not managing your data properly.

As a specialized service provider, we're helping leading companies create actionable business insights based on accurate, scalable, and comprehensive data. We bring the thought leadership, technology expertise, and processes to help our customers become data-driven organizations capable of leveraging data for competitive advantage. We do this through:

  • Taking the time to understand your business
  • Collecting, organizing and managing data from all over your organization
  • Delivering insights via intelligent applications
  • Deploying data and insights to any user via any interface

Learn More about Perficient

We’re ready to help you realize the benefits of data no matter where you are on your journey. Our experience, our technology partnerships, and most importantly our people are what make us a great partner. Visit us on Perficient.com and learn more about how we can help you master the realities of the data-driven world. And listen to the Intelligent Data Podcast where we interview thought leaders on a variety of topics around using data and technology to reshape your business.

You can read the entire Forrester Now Tech: Data Management Service Providers, Q4 2021 report via the Forrester website where it’s available to Forrester subscribers and for purchase.

Change Your Mindset to Create a Better Data and Analytics Program
https://blogs.perficient.com/2020/09/09/change-your-mindset-to-create-a-better-data-and-analytics-program/ (Wed, 09 Sep 2020)

Self-service for data is not a new concept. Even in the early 2000s, companies struggled with giving "power users" or "information workers" access to data to develop value-based insights. One of my past customers had something as simple as a web page with hundreds of CSV-formatted data extracts that people could use to "self-service data" based on their assigned role (security group). Although crude, it was one of the more popular pages on the corporate portal. The data extracts were indexed like a table of contents, with descriptions that made data easy to locate. Data was downloaded with a point and click, so it was easy to access.

Challenges Facing Data Leaders

Although this previous example is 15 years old, it illustrates some of the critical challenges data leaders face in providing data self-service, or data as a service. Specifically, these challenges include how to:

  1. Publish curated data
  2. Organize data so it can be found
  3. Provide easy access to data
  4. Describe data in business terms

Today we have a bevy of tools that enable self-service data, data integration/preparation, analytics, BI, and machine learning capabilities.  From the architecture perspective, containerization, microservices, DevOps, DataOps, and cloud services all provide infrastructure and processes to enable scalable and cost-effective self-service data and analytics. Weaving all these tools and technologies into an enterprise’s data ecosystem can be daunting, even for large, well-resourced companies.

Self-service data and analytics (think AI, ML, and model building) and self-service business intelligence require different mindsets. With self-service BI, we had the luxury of buying a single tool like MicroStrategy or Tableau and simply enabling self-service during implementation. Success depended primarily on how well you or your consulting partner implemented the tool and how well it was governed.

A Different Mindset

However, with data and analytics, we have a set of complexities that we did not have with self-service BI. These include enabling and governing direct data access, providing tools to transform, prep, and cleanse data, facilitating the deployment of analytical models to production, creating sandboxes in the cloud, and helping users connect a wide variety of analytical and AI tools to enterprise data.

At Perficient, we have had the opportunity to guide a large number of organizations through the process of specifying and implementing data and analytics architectures. Through this experience, we have observed that companies that gain a significant return on their analytics and data investments have changed their mindset from "let's implement a tool" to "let's provide a service."

Whether we call this Data as a Service or Analytics as a Service or any other name du jour doesn't really matter. What matters is the mindset: define a set of enabling services (that involve data and analytics), then continually improve those critical services. This mindset revolves around approaching your program from your customer's perspective. Talking to your data consumers, understanding how they access and use data, and knowing which challenges impede their productivity are all things the leaders of a data and analytics program should be familiar with.

Approaching the data and analytics program from the consumer's point of view will undoubtedly change your perspective. Instead of implementing tools that make IT happy, successful programs view tools as a way of making users happy; that rarely happens without a service- or capability-driven implementation.

5 Ways to Consider Digital and Data and An End-to-End Architecture
https://blogs.perficient.com/2020/07/07/5-ways-to-consider-digital-and-data-and-an-end-to-end-architecture/ (Tue, 07 Jul 2020)

Digital and data are like TV and movies.

Imagine the following three scenarios of watching a movie during a long weekend with different types of technology. Note that all of these movies are rated either high or low on IMDb.

  1. Watching an amazing movie like Avengers End Game or Avatar on a laptop or iPad
  2. Now, that same amazing movie watched in a theater with surround sound or a home theater
  3. How about The Last Airbender or Guardians (NOT Guardians of the Galaxy, Google this movie)

Chances are you’d find that watching movies on a big screen with surround sound would provide a better cinematic experience than on a small screen. Basically, a great movie watched on an iPad is pointless because the visualization and the sound effects are key components that make it great. On the other hand, a low-rated movie with a horrible plot and no visual treats watched in a home theater is kind of a waste of time.

Digital and data have a similar relationship. A beautifully designed mobile app displaying incorrect information is going to lose its users' trust. However, a thorough analysis using a machine learning algorithm, without visualization or explainability, is not going to drive action. The art of "design thinking" is relevant in this context: you need to empathize with the user community you are going after, and then ask them about the "value of information" in a language that matters to them (business value).

5 Digital and Data Considerations for End-to-End Architecture

Here are five ways to think about digital and data when you are designing an application to solve a business problem.

  1. Start with the business value (the why): This is so important, because every dollar that an organization is spending at this time should help them compete in the marketplace and should have a tangible impact in the bottom line.
  2. Empathize with the experience: Put yourself in the business users' shoes and understand their experience with this application. If possible, create a journey map to articulate it.
  3. Find the data that will deliver meaning to that experience: Data is such a critical aspect of the experience, because it articulates the meaning. If the objective of an app is to do eCommerce, it is important to have data that reflects the parts of eCommerce, such as products, customers, transactions, and so on.
  4. Solution the digital and data components together: The solution should include both application architecture and data architecture to deliver a seamless yet informative experience that earns the end user's buy-in and loyalty.
  5. Deliver in increments: I'm still seeing companies try to push waterfall in parts of the solution, isolating digital and data as two separate components. It's not going to work. You are going to have shadow IT (which is easier than ever to adopt, especially with open technology and large communities of users), data will lose its meaning and value, and the result is a governance nightmare for the organization trying to provide a holistic picture.

Talk to us and we can help you start thinking about the digital and data pieces and how they fit into the intelligent enterprise model.

Data Architecture: 2.5 Types of Modern Data Integration Tools
https://blogs.perficient.com/2020/02/10/data-architecture-2-5-types-of-modern-data-integration-tools/ (Mon, 10 Feb 2020)

As we move into the modern cloud data architecture era, enterprises are deploying two primary classes of data integration tools to handle the traditional ETL and ELT use cases.

The first type of data integration tool is the GUI-based data integration solution.

Talend, InfoSphere DataStage, Informatica, and Matillion are good examples. These tools leverage a UI either to configure a data integration engine or to compile code for data integration. GUI integration tools promise fast, friendly user interfaces to rapidly create new data pipelines, and they have a proven record of increasing developer productivity. They are good for organizations that have:

  1. Many data integration pipelines to manage.
  2. Complex MDM requirements and business rules that need to integrate into data pipelines.
  3. A ubiquitous relational database ecosystem.
  4. Requirements to move data to and from cloud platforms (e.g., AWS, Azure, GCP).

The second type is the script/code-based data integration solution.

Script/code-based data integration leverages a series of tools to develop a data pipeline. This capability usually requires:

  1. A programming language like Python or Scala
  2. A data processing framework such as Spark
  3. An orchestration tool similar to Apache Airflow.

Code and scripts are constructed as vertices, or nodes, using a programming language and framework. These vertices are then structured into Directed Acyclic Graphs (DAGs) by the orchestration tool. DAGs can scale to handle very large (think tens of terabytes per day) data pipelines. DAGs are also extremely useful for handling the customized or complex processing that one would see in artificial intelligence or machine learning use cases.
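
For illustration, here is a minimal sketch of that structure, assuming Airflow 2.x: three vertices wired into a DAG by the orchestrator. The task callables are stand-ins; in a real pipeline they would typically submit Spark jobs rather than run Python locally.

# Minimal illustrative Airflow DAG: three vertices wired into a pipeline.
# Callables are stand-ins for Spark submissions or framework operators.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(): ...    # e.g., pull source files to a staging location
def transform(): ...  # e.g., run a Spark job over the staged data
def load(): ...       # e.g., publish results to the warehouse

with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_extract >> t_transform >> t_load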

The 0.5: Cloud Native

When I was initially socializing the idea for this two-types-of-cloud-ETL blog, a counterpart asked, "What about cloud-native?" Good question! The cloud-native options are just flavors of the two types of data integration. For instance, AWS Glue and Google Dataproc have UIs that generate code (e.g., Python and Scala). Unlike their legacy counterparts with rich UI functionality, these cloud-native tools still require editing the generated code (usually Python or Scala). The cloud-native tools are quickly catching up, but they still need to add significant functionality to their UIs to garner the same productivity gains as traditional GUI-based solutions.

OLAP and Hadoop: The 4 Differences You Should Know
https://blogs.perficient.com/2019/10/31/olap-and-hadoop-the-4-differences-you-should-know/ (Thu, 31 Oct 2019)

OLAP and Hadoop are not the same. OLAP is a technology for performing multi-dimensional analytics like reporting and data mining; it has been around since 1970. Hadoop is a technology for performing massive computation on large data; it has been around since 2002. They can be used together, but there are differences to weigh when choosing between Hadoop/MapReduce data processing and classic OLAP. For this discussion, let's set aside the concern of price and assume the business needs have been thought through.

1 Processing Type

For transactions and data mining, use OLAP; for analytics and data discovery, use Hadoop. For known, cleaned data and processes that yield definitive results of high integrity, use OLAP. For unknown, messier data and processes that yield suggestive results, use Hadoop. For example, use OLAP for weather sensors but Hadoop for weather models. OLAP can perform fast reads on high-end servers. Hadoop can perform fast reads and writes on distributed servers.

2 Data Size

OLAP is meant to operate on pre-aggregated data from a massive number of records, and it has good throughput across many records in a data warehouse. Hadoop is meant to operate on massive un-aggregated data from a lower number of objects, and it has high throughput on larger objects in a data lake (Harris, n.d.). Does the business need more, smaller objects or fewer, larger objects? For example, if summing records is important, then OLAP is good; but if audio analysis is important, then Hadoop is good. Overall, Hadoop has superior throughput.

3 Interaction

OLAP runs on SQL, following database normalization principles. Hadoop runs on HQL, following object-oriented concepts. SQL is based on a relational DB model, while HQL combines object-oriented programming with relational DB concepts (Jeyakanth, 2017). OLAP is good for update, insert, select, and delete. Hadoop is good for handling any other manner of object.

4 Data Structure

OLAP is meant for a structured dimensional model and scales well vertically; OLAP likes more of the same things in a relational table. Hadoop is meant for unstructured data and scales well horizontally; Hadoop likes more of different things as key/value pairs. Thus, the source of the data is an important consideration. For example, use OLAP for police ticket transactions and Hadoop for body-cam data. Overall, Hadoop will be better for maximum total storage needs.
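
To make the key/value contrast concrete, here is a small PySpark sketch in the MapReduce style: counting values per key across a distributed dataset, the kind of horizontally scaled aggregation the Hadoop model favors. The inline data echoes the police-ticket example above and is purely illustrative.

# Illustrative key/value aggregation in the MapReduce style with PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kv-demo").getOrCreate()

# (key, value) pairs -- the shape horizontally scaled processing favors
tickets = spark.sparkContext.parallelize([
    ("speeding", 1), ("parking", 1), ("speeding", 1), ("dui", 1),
])

# Map and reduce by key across the cluster
counts = tickets.reduceByKey(lambda a, b: a + b)
print(counts.collect())  # e.g., [('speeding', 2), ('parking', 1), ('dui', 1)]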

Conclusion, OLAP and Hadoop

In most cases, Hadoop can do what OLAP does. OLAP might be needed if there is a legacy system to consider. Or you only need reporting. Or tech maturity is a driver. However, generally, I lean toward Hadoop/MapReduce over OLAP.

For more information:

Perficient Data

Perficient Cloud

Perficient Analytics

Perficient Big Data

Meet Perficient’s Chief Strategists: Arvind Murali
https://blogs.perficient.com/2019/09/09/meet-perficients-chief-strategists-arvind-murali/ (Mon, 09 Sep 2019)

Thrilling our clients with innovation and impact – it’s not just rhetoric. This belief is instrumental for our clients’ success. In 2018, we introduced our Chief Strategists, who provide vision and leadership to help our clients remain competitive. Get to know each of our strategists as they share unique insights on their areas of expertise.

The current digital age has generated an exponential amount of data. From mobile devices and online activity to enterprise performance metrics and operational considerations, data is all around us. By 2025, worldwide data volumes are projected to grow to 175 ZB, creating more opportunities for business leaders to develop informed strategies and make decisions based on their data.

We recently spoke with Arvind Murali, Data Governance Chief Strategist, to get his perspective on data governance, building data strategies to optimize business outcomes, and his life beyond the world of strategy.

What does your role as a Chief Strategist entail?

Arvind Murali: AI and digital technologies have launched the Fourth Industrial Revolution, focused on machines supporting people and increasing efficiency and effectiveness. Companies across every industry are feeling the impact of digital and data transformation, and leaders are rethinking how technology and data can reshape their businesses.

As a Chief Strategist, I support my clients in thinking of data as an asset and in using data in ways they haven't considered. I'm constantly listening to and learning from our clients about their business outcomes and opportunities. Then, I translate that feedback to help them build the data strategy and governance to support and manage their data infrastructure.

By focusing on the business outcomes, we can build and implement data solutions that are broad enough to meet clients’ current needs and nimble enough to scale for future iterations.

What do you hope to accomplish as a Chief Strategist?

AM: Among my aspirations, I will continue supporting organizations with their digital and data transformation journeys. This support includes moving clients towards Data-as-a-Service (DaaS) models that directly create bottom-line opportunities. Some clients begin with a clean slate yet want to capitalize on their legacy data assets. In that process, we determined that marrying legacy data and enriching it with competitive and benchmarking data truly gives clients an edge. In the end, we've developed modern data platforms for them, allowing them to analyze their businesses, identify patterns, and adjust (as necessary) to compete in a digital market.

Above all, I always want to find purpose in my work by creating data solutions that make a difference for our clients and the customers and communities they serve. For instance, if our data-driven work provides a hospital with the ability to reduce a patient’s rate of readmission, that’s a meaningful end result. Or, if our manufacturing clients can simulate their components digitally and use analytics to enhance productivity, that increases their efficiency.

Strategically Speaking

Why does data governance matter for today’s enterprises?

AM: How often have you heard advertisements from Exxon, Best Buy, and Amazon stating that they use data and analytics for competitive advantage? Can you think of organizations making these statements five years ago? Technology has truly enabled data to become an asset that’s vital for any organization’s growth. It allows businesses to manage their supply chain, understand buying behavior, create personalized marketing, impact people’s lives, or streamline operations. If properly managed, businesses can use it to create a tangible return on their investments.

Data governance ultimately allows you to monitor, understand, measure, and own your data assets. This will lead to organizations creating competitive advantage based on their data assets.

How does implementing data governance impact businesses?

AM: Data governance [involves managing] data, culture, process, and technology. On one hand, companies rely on technology, such as AI and MDM, automation, and mastering customer data. However, the fundamental process of data ownership requires cultural acceptance [from the organization].

For example, some organizations have relied on employees to compile dense spreadsheet databases that contain massive amounts of data. The process of creating them is time consuming and mundane, but it’s a familiar process that executives have come to expect of their reports. When working with clients on data governance, we design and build a centralized, modern data platform to house and self-service their data. Once established, it’s a shift for employees because they’re now tasked to focus more on data analytics rather than building the database. Overall, the change will improve our clients’ productivity, but it’s upending long-held employee expectations.

By incorporating organizational change management with data governance, we can prepare workforces for the future and improve effectiveness and efficiency. If we’re working with industry-specific data assets like healthcare or financial services, we can also integrate our thought leadership in those areas to influence the process.

“Intelligent automation is already present in our daily lives, so it’s changing individuals’ perception about the technology. However, unifying an enterprise and shifting the business perception about intelligent automation [for data] is imperative for success.”

Why do businesses need a data strategy?

AM: By next year, businesses strategically using data will realize $430 billion in productivity benefits compared to competitors that aren’t using data. This translates into untapped potential of available data that can advance business growth, which is why developing a data strategy can mobilize those assets. Companies such as Facebook, Google, Salesforce, and Exxon have already implemented a strategy to convert data to information to insights, which has effectively differentiated their firms as dominant players in the digital space.

Setting a strategy for how you use data is essential because the technologies involved are constantly evolving. For most industries, impactful solutions will incorporate some form of automation, such as machine learning, AI, bots, or some other innovation. Being adaptable to the shifting landscape will only improve the final solution and future-proof your organization.

Think Like a Chief Strategist

Tell us about a recent project you’ve tackled. How did we help the client achieve success?

AM: We recently began work with a large hospital to build an end-to-end Data and AI platform. This work supports the client’s objective to become more patient-centric. Ultimately, we hope to improve patient outcomes, physician interaction, and overall efficiency.

A digital transformation journey for any organization takes time, and this situation is no different. A few months ago, the client had nothing established as far as a modern data platform or supporting processes. In fact, multiple departments had established their own analytics and attempted to make decisions using data silos. Now, a centralized data platform allows for self-service, collaboration, and cross-departmental insights that weren't previously possible. Although the project isn't yet finished, the client is already realizing some significant benefits.

What questions do you ask a client when developing a data strategy?

AM: The top five questions every client needs to ask of their enterprise:

  1. What data do we have?
  2. What data do we need to have?
  3. How do we use our data today?
  4. How do we want to use it in the future?
  5. How do we want to access our data?

These questions define our approach to creating data governance solutions that meet clients' specific goals. Beyond that, these questions guide any enterprise that seeks any form of digital transformation; such enterprises must embrace the startup mentality from the beginning.

How can businesses take a strategic approach to their data?

AM: A data strategy enables organizations to make informed decisions based on their data insights. Every data strategy focuses on three areas to optimize business outcomes:

  1. Identifying which data sets are available for analysis and – more importantly – which are not
  2. Building a modern data platform to host existing and targeted data
  3. Developing data governance to make intelligent decisions based on data that’s been collected

This process should not revolve around departmental silos within an organization. Instead, developing a data strategy should start at the executive level and involve stewards from different business units. A strategy with visibility across the organization can help prioritize goals by identifying shared pain points, strategic objectives, and situations where overlaps exist.

“Always have a data strategy aligned to your business outcome. Data without outcomes is like a business without goals. It can be exciting to grow quickly at first, but it’s not a sustainable approach.”

Beyond the World of Strategy

What are your interests or hobbies when you’re not wearing the Chief Strategist hat?

AM: My two sons, a nine-year-old and a three-year-old, keep me busy outside of work between their activities and spending quality time with them. I often joke that I’ve played cricket since I was born. It’s something that’s in my blood. My sons have also grown to love cricket, so we enjoy playing together. I also really enjoy boxing, which is my favorite outlet for fitness.

Additionally, I’m an avid vlogger and discuss topics pertaining to technology, data, and being a “Smarchitect,” a term I’m hoping to trademark.

A Smarchitect is a smart-architect who doesn’t limit him/herself to one specialty and chooses to wear multiple architect hats. Being able to switch from one discipline to another at any point during a solution process, Smarchitects can define an end-to-end solution that prioritizes the business outcome by being agnostic on technology or capabilities needed to implement it.


Learn more about each of our Chief Strategists by following this series. 

Data Architecture and Design Thinking
https://blogs.perficient.com/2019/06/27/data-architecture-and-design-thinking/ (Thu, 27 Jun 2019)

Simplicity is the ultimate sophistication. – Leonardo da Vinci

Simplicity is a very important strategy as people think about designing their modern data platforms. Design is often a complex task, so I recommend applying strong design thinking to simple goals as you lead your data architecture teams. I have stood before many architecture review boards (ARBs) and found that many teams overcomplicate things. Here are three common issues that ARBs run into, and how to simplify data platform design:

ONE: “Let us release the well versed, fully thought through, over-complicated version of this architecture in 3 months to support our use case.” How to improve?

  • Break down this architecture into smaller components and release frequently
  • Prioritize the design and release appropriately
  • Know the goal of this design and what business purpose it serves

TWO: “I have my AI component, which is trained and set for business requirements. Let me release it into production.” How to improve?

  • Have you thought through the operations components such as meeting SLA, cost of support, cost of hardware and software?
  • Have you thought about the data that is required for this AI component and the data model that will support it?
  • What are the phases of improvement for this AI?

THREE: “I have a meeting with my business counterparts to discuss these new cool tools that my team has implemented.” How to improve? 

  • Business teams have many things to worry about and technology is not one of them. So stop harassing them with new technologies to prove a point.
  • Business teams do care about their data and how it’s served to them to solve their problem. So think data governance when you’re presenting a solution to them.
  • Start pulling the business team into the solution when you start the design and build phase.

Final Recommendations

My recommendations on a simplistic data architecture focused on design thinking with best practices are as follows:

  • Consistently think synergy between data architecture and business outcome
  • Create an architecture review board (ARB) that will align capability models to business strategy
  • Utilize people who know the domain, and listen to what they can offer, before trying to be the hero of your organization (this is a very common problem I see in the industry, and it needs to be managed using organizational change management)
  • Focus on simple, purposeful, lightweight architecture while leveraging stable systems of record for baseline data
  • Empower citizen developers within your team to build more, and to build with agility
Refine Your Enterprise’s Approach to AI in 2019
https://blogs.perficient.com/2018/11/27/ai-approach-in-2019/ (Tue, 27 Nov 2018)

2019 is the year to stop letting obstacles get in the way of enterprise AI adoption. Of course, implementing AI tech comes with its challenges, but addressing two big obstacles may be the key to success. According to our AI expert, practice director and chief strategist Christine Livingston, these obstacles are perception and data. Addressing them can help you find a more pragmatic approach to AI.

Livingston recently did a Q&A for AI Business where she discusses, among other things, these key obstacles and how to overcome them. Below is an excerpt from that article. You can read the full feature, Building Seamless Customer Experiences with AI, on the AI Business website.

Christine Livingston, Artificial Intelligence CoE Director and Chief Strategist, Perficient


What are the key obstacles to making AI work for global enterprises?

There are two big obstacles: perception and data. There’s a perception that if an enterprise is going to implement AI it’s going to be a tremendous undertaking that has to be applied in a really big way. There’s also a sense that you have to do a lot of preliminary things in order to prepare to implement AI.

However, a lot of the time, we’re seeing really effective implementations of AI that actually include filling in the gaps that enterprises need to be fully optimized for AI.

An example of this is when we worked with a retailer to create a personalized shopping advisor – a virtual agent that actually incorporated customer DNA and made personalized recommendations.

One of the things we found out through the course of that implementation was that they didn’t actually have all of the necessary metadata about their products to make recommendations. So, we applied visual recognition to the images of their products to help create and generate the metadata they needed to make the larger term initiative work. There’s an opportunity to apply AI to some of the smaller problems to help realize the larger transformation.

The other key obstacle is knowing where your data is and you have to be able to establish a ground truth. You can’t have multiple versions of the truth. You need to know where your data is stored and what it means essentially.

A Pragmatic AI Approach

Instead of avoiding enterprise AI altogether, keep your road map grounded. Be realistic about your enterprise's capabilities, but don't overthink it. Oftentimes, deficiencies can be addressed during the adoption process. And make efforts to strengthen your data environment. Even Forrester predicts that good old-fashioned information architecture will see continued investment in 2019 in an effort to create AI-worthy data environments.

Finding a pragmatic approach to AI implementation may be the strategy you need for successful AI adoption in 2019. For more on this, find Christine Livingston at The AI Summit New York December 5-6, where she'll expand upon the pragmatic approach to implementing AI and speak to lessons learned, best practices, and tangible steps to adopting an AI solution.

Connect with Us at The AI Summit New York

Make sure to connect with us at The AI Summit! Our experts will be at booth #617 discussing how we can help you maximize your investment in AI tools.

 

Responsibility of Data Architecture in Data Governance
https://blogs.perficient.com/2018/09/27/responsibility-data-architecture-data-governance/ (Thu, 27 Sep 2018)

The data architecture capability supplies the components and standards necessary to implement other capabilities coherently and enable them to work together. A primary responsibility of data architecture is to define, and gain acceptance of, an enterprise-wide set of models, standards, glossaries, and hierarchies that allow a standard description of data across business lines, products, and functional areas.

An enterprise data model provides a common, well-understood classification of data. This needs to be abstract enough to gain acceptance across the financial institution, yet maintain an appropriate level of detail required to support clear ownership.

A standard business glossary is required to define business terms along with links or mappings to the various technical data dictionaries that define the production management of these items as data attributes. Together, these will ensure an underlying commonality between authoritative sources, data lineage, and data quality, and in turn data contracts.
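
As a simple, hypothetical illustration of that glossary-to-dictionary linkage, the sketch below models a business term linked to the technical attributes that implement it across systems. The field and attribute names are assumptions; in practice this mapping would live in a metadata or governance tool rather than in code.

# Hypothetical sketch: a business glossary term mapped to the technical
# data-dictionary attributes that implement it. All names are illustrative.
from dataclasses import dataclass, field

@dataclass
class GlossaryTerm:
    name: str                 # business-facing term
    definition: str           # agreed enterprise-wide definition
    owner: str                # accountable business owner
    technical_attributes: list = field(default_factory=list)  # system.table.column

notional = GlossaryTerm(
    name="Notional Amount",
    definition="Face value of a contract used to calculate payments.",
    owner="Market Risk",
    technical_attributes=[
        "trading_db.trades.notional_amt",
        "risk_dw.positions.notional_value",
    ],
)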

Data architecture will also be involved in the review and assessment of tooling and platforms for use by each of the various capabilities. Fragmented, duplicative and aging systems should be evaluated in favor of simpler, enterprise-wide approaches.

In addition, the establishment of common metadata principles is required to enable interoperability of tools across the capability areas.

We recently published a guide that explores the building blocks (i.e., data governance components) of data governance, which can help drive better business decisions, enhance regulatory compliance, and improve risk management. You can download it here.

Driving Better Decisions with Data Governance
https://blogs.perficient.com/2018/07/19/driving-better-decisions-data-governance/ (Thu, 19 Jul 2018)

The business capabilities presented in our new guide demonstrate how forward-thinking financial services companies are leveraging data governance to create value for the enterprise. Accurate and timely information continues to be a key driver of better decision making.

Capabilities such as data principles and strategy, data architecture, organizational roles, authoritative sources, data lineage, data quality, and data contracts can be used individually or in concert to create new value for financial management, regulators, or risk management. Leading firms are leveraging these capabilities to maintain excellence in a highly competitive marketplace.

Through technological advances and well-defined business capabilities, new paradigms have been created for leveraging data governance to accelerate value for financial services organizations.
