data quality | Articles / Blogs / Perficient | Expert Digital Insights
https://blogs.perficient.com/tag/data-quality/

Delta Live Tables and Great Expectations: Better Together
https://blogs.perficient.com/2025/03/19/delta-live-tables-and-great-expectations/ | Wed, 19 Mar 2025

Modern data platforms like Databricks enable organizations to process massive volumes of batch and streaming data—but scaling reliably requires more than just compute power. It demands data observability: the ability to monitor, validate, and trace data through its lifecycle.

This blog compares two powerful tools—Delta Live Tables and Great Expectations—that bring observability to life in different but complementary ways. Delta Live Tables (DLTs) provide built-in enforcement and lineage within Databricks pipelines, while Great Expectations (GX) offers deep validation and anomaly detection.

In my experience, Delta Live Tables and Great Expectations are better together. Together, they form a robust observability stack, enabling teams to deliver trusted, production-grade data pipelines across batch and streaming workflows.

  • Use Delta Live Tables to automate pipelines, enforce rules, and track lineage natively in Databricks.
  • Use Great Expectations for in-depth validation, anomaly detection, and schema profiling.

I’m not a fan of taking both sides of an argument. Let’s look at our core responsibilities as data engineers from the ground up and follow the solutions where the requirements take us.

Data Asset

A data asset is a managed, valuable dataset.  A valuable dataset is not just data—it is data with purpose, invested in through processes and controls, and justified by the business value provided. A managed dataset is actively governed, monitored, and maintained to ensure it delivers sustained value to stakeholders.

Data Asset Management

Fundamentally, a data asset is considered managed when it is under governance. Proper data governance, such as that applied to Unity Catalog-managed datasets, has at least these fundamental characteristics:

  • Ownership & Stewardship – Who is responsible for maintaining the data asset? Who can answer questions about it?
  • Access Control – Who can read, write, or modify this data? Are permissions aligned with roles and rules?
  • Lineage – Where does this data come from? What transformations has it gone through?
  • Compliance & Privacy – Is sensitive data (e.g., PII, PHI) addressed? Are retention and masking policies enforced?
  • Auditability – Can we trace who accessed or modified the data and when?

Unity Catalog is the foundation for a well-managed lakehouse. I have written about migrating to Unity Catalog and highlighted some bonus features. If you have not migrated yet, I recommend getting started immediately and then focusing on data value.

Data Asset Valuation

The business will communicate the value of a data asset primarily through a Service Level Agreement (SLA). The SLA defines agreed-upon expectations around reliability, performance, and quality.

Reliability describes the resiliency, freshness, and correctness of a data asset. 

  • Freshness (Liveness) – How up-to-date the data is.
  • Accuracy (Correctness) – How well the data aligns with expected values or business rules.
  • Availability (Resiliency) – How robust the data pipeline is to failures and recovery.

Performance describes the efficiency of data from ingestion to processing to consumption.

  • Latency – Time taken for data to travel from source to consumption (e.g., ingestion-to-dashboard delay).
  • Throughput – Volume of data processed over time (e.g., rows/sec, MB/min).
  • Responsiveness – How quickly queries and pipelines respond under load or concurrency.

Quality describes the degree to which data meets defined rules, expectations, and standards.

  • Completeness – All required data is present (e.g., no missing rows or fields).
  • Validity – Data conforms to defined formats, ranges, or types.
  • Consistency – Data is uniform across systems and time (e.g., no contradictory values).
  • Uniqueness – No unintended duplicates exist.
  • Accuracy – Same definition as in reliability; it’s important enough to be listed twice!

These business expectations are fulfilled by IT through People, Processes, and Technology.

For a price.

Servicing Data Assets

Service Level Objectives (SLOs) represent operational domains within the SLAs and can be met progressively. This concept helps align cost to value within your budget. The dials being tuned here are the Software Development Lifecycle (SDLC) and the Databricks Medallion Architecture. SLAs define commitments for reliability, performance, and quality, and SLOs enforce those commitments throughout the data lifecycle. Each layer of the Medallion architecture strengthens one or more of these domains. Across the SDLC, IT teams progressively validate and enforce these guarantees to ensure production-grade data assets.

The workspace is the primary environment for working with data assets in Unity Catalog. Value is typically proportional within the layers from left to right.

SLA Domain | Dev | Test | Prod
Reliability | Monitor source connectivity and pipeline triggers | Validate pipeline scheduling, retries | SLAs ensure on-time delivery for consumers
Performance | Baseline performance benchmarks | Load testing, profiling | Optimize for SLAs: query latency, data delivery speed
Quality | Create GE/DQX test suites | Enforce checks with alerts | Blocking rules and alerting on quality failures

  • In Dev, you prototype and measure against reliability and performance goals.
  • In Test, you simulate production load and validate SLA thresholds.
  • In Prod, you enforce SLAs and alert on violations with automated monitoring and remediation (GE, DQX, Airflow, Unity Catalog audits, etc.).

The catalog is the primary unit of data isolation in the Databricks data governance model. Value is typically proportional within the layers from right to left.

SLA Domain | Bronze (Raw) | Silver (Cleaned) | Gold (Curated)
Reliability | Data lands on time; raw source integrity is monitored | DLT jobs run consistently; schema evolution is managed | Timely delivery of business-critical data
Performance | Ingest processes optimized for load handling | Transformations are performant; no bottlenecks | Dashboards and queries load quickly
Quality | Basic data profiling and source rule checks | DQ rules (e.g., null checks, constraints) enforced | Golden datasets meet business expectations for data quality

  • In Bronze, you focus on reliability and baseline quality.
  • In Silver, you begin to emphasize quality and start optimizing performance.
  • In Gold, you implement high reliability, optimized performance, and strong quality.

Data becomes a true asset as it progresses through these layers, accruing value while incurring costs to meet increasing SLA expectations.

Delta Live Tables and Great Expectations

We are back to where we started, with a little more context. Great Expectations (GX) is focused on data validation and profiling, while Delta Live Tables (DLT) handles schema enforcement and transformations. While DLTs may not have sophisticated rule and profiling capabilities, their native integration with Unity Catalog gives them similar performance characteristics across both batch and streaming, whereas GX can struggle with streaming from a performance perspective.
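
To make this concrete, here is a minimal sketch of DLT expectations declared inline on a pipeline table. The table name, column names, and constraints are hypothetical; only the decorator pattern reflects the DLT Python API.

```python
import dlt

@dlt.table(comment="Silver orders with quality rules enforced inline.")
@dlt.expect("non_negative_amount", "amount >= 0")                   # log violations, keep the rows
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")       # drop rows that fail the rule
@dlt.expect_or_fail("known_currency", "currency IN ('USD', 'EUR', 'GBP')")  # fail the update
def silver_orders():
    # Reads a hypothetical bronze table defined elsewhere in the same pipeline;
    # lineage between the two tables is tracked automatically by DLT.
    return dlt.read_stream("bronze_orders")
```

Expectation results land in the DLT pipeline event log, where the data quality metrics can then be queried alongside lineage.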

The exercise of defining the progression of value across the SDLC and Medallion Architecture now pays dividends. DLTs stand out for end-to-end data management with automatic lineage management and schema evolution. Great Expectations can then be run as a separate process for more advanced data quality checks and profiling. This could be incorporated as part of a more advanced CI/CD process or just managed manually.
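
A complementary GX pass can then run as that separate step. The sketch below assumes the fluent, pandas-backed validator found in recent GX releases (the exact API differs across versions), and the file and column names are hypothetical:

```python
import great_expectations as gx
import pandas as pd

# Hypothetical sample pulled from the Silver layer for deeper profiling and validation.
df = pd.read_parquet("silver_orders_sample.parquet")

context = gx.get_context()
validator = context.sources.pandas_default.read_dataframe(dataframe=df)

# Richer, rule-based checks than the inline DLT expectations.
validator.expect_column_values_to_not_be_null("order_id")
validator.expect_column_values_to_be_between("amount", min_value=0, max_value=1_000_000)
validator.expect_column_values_to_be_in_set("currency", ["USD", "EUR", "GBP"])

results = validator.validate()
print("Validation passed:", results.success)
```

Because this runs outside the streaming path, the heavier profiling work does not slow the pipeline itself, and the results can gate promotions toward Gold.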

The key is not to focus on a tool in isolation with the idea of picking a winner. I believe most developers could become cross-trained on both technologies. Neither should be outside the scope of a junior data engineer. People are not the problem. I wish that DLTs were integrated with Great Expectations so I didn’t need two technologies, but a little Process goes a long way toward resolving that Technology issue.

Conclusion

Integrating Delta Live Tables and Great Expectations within the Software Development Lifecycle (SDLC) and the Medallion Architecture helps teams reduce operational costs while continuously delivering business value.

  • Early Validation Reduces Rework: Embedding GX expectations in development and staging environments enables early detection of schema and data issues, minimizing costly reprocessing and production downtime.
  • DLTs Automate Operational Efficiency: With declarative pipelines and built-in monitoring, DLTs reduce manual orchestration and troubleshooting, saving engineering hours and compute costs.
  • Incremental Value Delivery: By combining GX’s detailed validation in Bronze and Silver layers with DLT’s managed lineage and enforcement, teams can release high-quality data incrementally—delivering trusted datasets to stakeholders faster.
  • FinOps-Aligned Observability: Monitoring volume, freshness, and anomalies with GX and DLT enables better cost attribution and prioritization, allowing data teams to optimize for quality and budget.

This hybrid approach supports robust data engineering practices and empowers organizations to scale with confidence, optimize their cloud spend, and maximize the return on data investments.

Contact us to learn more about how to empower your teams with the right tools, processes, and training to unlock Databricks’ full potential cost-consciously.

Risk Management Data Strategy – Insights from an Inquisitive Overseer
https://blogs.perficient.com/2024/08/19/risk-management-data-strategy/ | Mon, 19 Aug 2024

We are witnessing a sea change in the way data is managed by banks and financial institutions all over the world. Data being commoditized and, in some cases, even monetized by banks is the order of the day. Adoption in the risk management function, though, still needs a further push. Traditional risk managers, by the very definition of their job, are highly cautious of the result sets provided by the analytics teams. I have even heard the phrase “Please check the report, I don’t understand the models and hence trust the number”.

So, in the risk function, while this is a race for data aggregation, structured data, unstructured data, data quality, data granularity, news feeds, and market overviews, it’s also a challenge from an acceptance perspective. The vision is that all of this data can be aggregated, harmonized, and used for better, faster, and more informed decision-making for Financial and Non-Financial Risk Management. The interdependencies between risks were factors that were not considered in the “Good Old Days” of risk management (pun intended).

Based on my experience, here are the common issues that are faced by banks running a risk of not having a good risk data strategy.

1. The IT-Business tussle (“YOU don’t know what YOU are doing”)

This, in my view, is the biggest challenge facing traditional banks, especially in the risk function. “The Business”, in traditional banks, is treated like a larger-than-life entity that needs to be supported by IT. This notion of IT as the service provider while the business is the “bread-earner”, especially in traditional banks’ risk departments, no longer holds. It has been proven time and again that the two cannot function without each other, and that is the mindset management needs to cultivate for the strategic data management effort as well. This is a culture change, but it is happening slowly and will have to be adopted industry-wide. It has also been shown that the financial institutions with the most organized data have a significant market advantage.

2. Data Overload (“Dude! where’s my Insight”)

The primary goal of the data management, sourcing, and aggregation effort has to be converting data into informational insights. The teams analyzing the data warehouses and data lakes and supporting the analytics must keep this one major organizational goal in mind. Banks have silos; these silos have been created by mergers, regulations, entities, risk types, Chinese walls, data protection, land laws, or sometimes just technological challenges over time. The solution to most of this is to start with a clean slate. A management mandate for getting the right people to talk and be vested in this change is crucial: challenging, but crucial. Good old analysis techniques and brainstorming sessions for weeding out what is unnecessary and settling on the right set of elements are the key. This requires an overhaul in the way the banking business has traditionally looked at data, i.e., as something needed only for reporting. Understanding the data lineage and touchpoint systems is most crucial.

3. The CDO Dilemma (“To meta or not to meta”)

The CDO’s role in most banks is now well defined. The risk and compliance analytics and reporting division depends almost solely on the CDO function for insights on regulatory reporting and other forms of innovative data analytics. The key success factor for the CDO organization lies in allocating the right set of analysts to the business areas. A CDO analyst on the market risk side, for instance, will have to be well versed in market data, bank hierarchies, VaR calculation engines, Risk not in VaR (RNiV), and supporting reference data, in addition to the trade systems data that these data elements directly or indirectly affect, not to mention the critical data elements themselves. An additional understanding of how this impacts other forms of risk reporting, like credit risk and non-financial risk, is definitely a nice-to-have. Defining a metadata strategy for the full lineage, its touchpoints, and transformations is a strenuous effort in analyzing systems owned by disparate teams with siloed implementation patterns built up over time. One fix that I have seen work is for every significant application group or team to have a senior representative for the CDO interaction. Vested stakeholder interest is turning out to be the one major success factor in the programs that have succeeded. It ensures completeness of the critical data element definitions and hence supports the data governance strategy in a wholesome way.

4. The ever-changing nature of financial risk management (“What did they change now?”)

The Basel Committee recommendations have consistently driven the urge to reinvent processes in the risk management area. With the Fundamental Review of the Trading Book (FRTB), the focus has been very clearly realigned to data processes in organizations. While the big banks had already demonstrated a sound understanding of modellable risk factors based on scenarios, this time the Basel Committee has also asked banks to focus on Non-Modellable Risk Factors (NMRF). Add the standardized approach (sensitivities defined by the regulator) and the internal models approach (IMA – bank-defined enhanced sensitivities), and the change from entity-based risk calculations to desk-based ones is a significant paradigm shift. A single golden-source definition for transaction data, along with desk structure validation, appears to be a major area of concern among banks.

Add climate risk to the mix with the Paris Accord, and the RWA calculations will now need additional data points, additional models, and additional investment in external data defining the associated physical and transition risk. Data lake / Big Data solutions with defined critical data elements and a full log of transformations with respect to lineage are a significant investment, but they will only work in favor of any further changes that come through on the regulations side. There have always been banks that have been consistently great at this and banks that lag significantly.

All in all, risk management happens to be a great use case for a greenfield CDO data strategy implementation, and these hurdles have to be handled before reaching the ultimate Zen goal of a perfect risk data strategy. Believe me, the first step is to get the bank’s consolidated risk data strategy right, and everything else will follow.

 

This is a 2021 article, also published here –  Risk Management Data Strategy – Insights from an Inquisitive Overseer | LinkedIn

iCEDQ – An Automation Testing Tool
https://blogs.perficient.com/2024/07/23/icedq-an-automation-testing-tool/ | Tue, 23 Jul 2024

Data Warehouse/ETL Testing

Data warehouse testing is a process of verifying data loaded in a data warehouse to ensure the data meets the business requirements. This is done by certifying data transformations, integrations, execution, and scheduling order of various data processes.

Extract, transform, and load (ETL) testing is the process of verifying the data that is combined from multiple sources into a large, central repository called a data warehouse.

Conventional testing tools are designed for UI-based applications, whereas a data warehouse testing tool is purpose-built for data-centric systems and designed to automate data warehouse testing and result generation. It is also used during the development phase of a data warehouse (DWH).

iCEDQ

Integrity Check Engine for Data Quality (iCEDQ) is one of the tools used for data warehouse testing. It aims to overcome some of the challenges associated with conventional methods of data warehouse testing, such as manual testing, time-consuming processes, and the potential for human error.

It is an Automation Platform with a rules-based auditing approach enabling organizations to automate various test strategies like ETL Testing, Data Migration Testing, Big Data Testing, BI Testing, and Production Data Monitoring.

It tests data transformation processes and ensures compliance with business rules in a Data Warehouse.

Qualities of iCEDQ

Let us look at some of the traits that make iCEDQ useful across testing scenarios.

Automation

It is a data testing and monitoring platform for all sizes of files and databases. It automates ETL Testing and helps maintain the sanctity of your data by making sure everything is valid.

Design

It is designed with a greater ability to identify any data issues in and across structured and semi-structured data.

Uniqueness

Testing And Monitoring:

Its unique in-memory engine with support for SQL, Apache Groovy, Java, and APIs allows organizations to implement end-to-end automation for Data Testing and Monitoring.

User Friendly Design:

This tool gives customers an easy way to set up an automated solution for end-to-end testing of their data-centric projects, and it provides email support.

Supported Platforms:

It is mostly used by enterprises and business users, on platforms such as web apps and Windows. It does not support macOS, Android, or iOS.

Execution Speed:

The new Big Data Edition tests 1.7 billion rows in less than 2 minutes and runs a Recon Rule with around 20 expressions against 1.7 billion rows in less than 30 minutes.

With a myriad of capabilities, iCEDQ seamlessly empowers users to automate data testing, ensuring versatility and reliability for diverse data-centric projects.

Features:

  • The Performance Metrics and Dashboard provide a comprehensive overview of system performance and visualize key metrics for enhanced monitoring and analysis.
  • Data analysis, testing, and data quality management ensure the accuracy, reliability, and effectiveness of data within a system.
  • Testing approaches such as requirements-based testing and parameterized testing involve passing new parameter values during the execution of rules.
  • Supports moving and copying test cases, as well as parallel execution.
  • The Rule Wizard automatically generates a set of rules through a simple drag-and-drop feature, reducing user effort by almost 90%.
  • Highly scalable in-memory engine to evaluate billions of records.
  • Connect to Databases, Files, APIs, and BI Reports. Over 50 connectors are available.
  • Enables DataOps by allowing integration with any Scheduling, GIT, or DevOps tool.
  • Integration with enterprise products like Slack, Jira, ServiceNow, Alation, and Manta.
  • Single Sign-On, Advanced RBAC, and Encryption features.
  • Use the built-in Dashboard or enterprise reporting tools like Tableau, Power BI, and Qlik to generate reports for deeper insights.
  • Deploy anywhere: On-Premises, AWS, Azure, or GCP.

Testing with iCEDQ:

ETL Testing:

A few kinds of data validation and reconciliation of business data can be performed in ETL/Big Data testing; a conceptual sketch follows the list below.

  • ETL Reconciliation – Bridging the data integrity gap
  • Source & Target Data Validation – Ensuring accuracy in the ETL pipeline
  • Business Validation & Reconciliation – Aligning data with business rules
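
iCEDQ expresses these as rules inside the platform; purely as a conceptual illustration (not iCEDQ syntax), a reconciliation check reduces to comparing source and target on counts, totals, and keys. Here is a rough pandas sketch with hypothetical extracts:

```python
import pandas as pd

# Hypothetical extracts from the source system and the data warehouse target.
source = pd.read_csv("source_orders.csv")
target = pd.read_csv("dw_orders.csv")

checks = {
    # Row-count reconciliation between source and target.
    "row_count_match": len(source) == len(target),
    # Business reconciliation: total order amount should agree to the cent.
    "amount_total_match": abs(source["amount"].sum() - target["amount"].sum()) < 0.01,
}

# Key-level validation: orders present in the source but missing from the target.
missing_keys = set(source["order_id"]) - set(target["order_id"])
checks["no_missing_keys"] = len(missing_keys) == 0

for name, passed in checks.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```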

Migration Testing:

iCEDQ ensures accuracy by validating all data migrated from the legacy system to the new one.

Production Data Monitoring:

iCEDQ is mainly used in support projects to monitor data after migration to the PROD environment. It continuously monitors ETL jobs and notifies teams of data issues through an email trigger.

Why iCEDQ?

iCEDQ reduces project timelines by 33%, increases test coverage by 200%, and improves productivity by 70%.

Pros & Cons

In addition to its automation capabilities, iCEDQ offers unparalleled advantages, streamlining data testing processes, enhancing accuracy, and facilitating efficient management of diverse datasets. Moreover, the platform empowers users with comprehensive data quality insights, ensuring robust and reliable Data-Centric project outcomes.

Rule Types:

Users can create different types of rules in iCEDQ to automate the testing of their Data-Centric projects. Each rule performs a different type of test cases for the different datasets.


By leveraging iCEDQ, users can establish diverse rules, enabling testing automation for their Data-Centric projects. Tailoring each rule within the system to execute distinct test cases caters to the specific requirements of different datasets.

iCEDQ System Requirements

Review iCEDQ’s technical specifications and system requirements to determine whether it is compatible with your operating system and other software.


To successfully deploy iCEDQ, it is essential to consider its system requirements. Notably, the platform demands specific configurations and resources, ensuring optimal performance. Additionally, adherence to these requirements guarantees seamless integration, robust functionality, and efficient utilization of iCEDQ for comprehensive data testing and quality assurance.

Hence, iCEDQ is a powerful data migration and ETL/data warehouse testing automation solution designed to give users total control over how they verify and compare data sets. With iCEDQ, they can build various types of tests or rules for data set validation and comparison.

Resources related to iCEDQ – https://icedq.com/resources

Don’t Let Poor Data Quality Derail Your AI Dreams
https://blogs.perficient.com/2023/07/24/dont-let-poor-data-quality-derail-your-ai-dreams/ | Mon, 24 Jul 2023

AI is reliant upon data to acquire knowledge and drive decision-making processes. Therefore, the quality of the data used to train AI models is vital to their accuracy and dependability. Data noise in machine learning refers to errors, outliers, or inconsistencies within the data, which can degrade the quality and reliability of AI models. When an algorithm interprets noise as a meaningful pattern, it may mistakenly draw generalized conclusions, giving rise to erroneous outcomes. Consequently, it is vital to identify and remove data noise from the dataset before initiating the training process to ensure accurate and reliable results. Below is a set of guidelines to mitigate data noise and enhance the quality of the training datasets used in AI models.

Data Preprocessing

AI collects the relevant data from various sources. Data Quality checks and supporting rules will make sure all the details needed for organizing and formatting it are there so that AI algorithms can easily understand and learn from it. There are several widely used techniques for eliminating data noise. One such technique is outlier detection, which involves identifying and eliminating data points that deviate significantly from the rest of the data. Another technique is data smoothing, where moving averages or regressions are applied to reduce the impact of noisy data and make it more consistent. Additionally, data cleaning plays a crucial role in removing inconsistent or incorrect values from the dataset, ensuring its integrity and reliability. Data professionals can perform Data profiling to understand the data and then integrate the cleaning rules within data engineering pipelines.
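
As a minimal pandas illustration of the techniques above (the file and column names are hypothetical), outlier removal, smoothing, and cleaning might look like this:

```python
import pandas as pd

# Hypothetical sensor readings destined for an AI training set.
df = pd.read_csv("sensor_readings.csv")  # columns: timestamp, value

# Outlier detection: drop points more than 3 standard deviations from the mean.
mean, std = df["value"].mean(), df["value"].std()
df = df[(df["value"] - mean).abs() <= 3 * std]

# Data smoothing: 5-point centered moving average to damp residual noise.
df["value_smoothed"] = df["value"].rolling(window=5, center=True, min_periods=1).mean()

# Data cleaning: remove rows with missing or duplicated timestamps.
df = df.dropna(subset=["timestamp"]).drop_duplicates(subset=["timestamp"])
print(df.head())
```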

Data Validation

Proper data validation is mandatory for even the most performant algorithms to predict accurate results. Once the data is collected and preprocessed, validation against reference values that have been tried and tested on various occasions would enhance confidence in data quality. This step involves checking the training data for accuracy, completeness, and relevance. Any missing, incorrect, or irrelevant data found is to be corrected or removed.

One such check is the field length check, which restricts the number of characters entered within a specific field. Phone numbers are an example where any number entered with more than ten digits needs correction before being used for prediction models. Another important check is the range check, where the entered number must fall within a specified range. Consider a scenario where you possess a dataset containing the blood glucose levels of individuals diagnosed with diabetes. As customary, blood glucose levels are quantified in milligrams per deciliter (mg/dL). To validate the data, it is imperative to ascertain that the entered blood glucose levels fall within a reasonable range, let’s say between 70 and 300 mg/dL.

A range check would establish a restriction, enabling only values within this range to enter the blood glucose level field. Any values that fall outside this range would be promptly flagged and corrected before being used in the training dataset. This meticulous validation process ensures the accuracy and reliability of the blood glucose data for further analysis and decision-making. Additionally, a presence check must ensure data completeness, meaning a field cannot be left empty. For example, a Machine Learning algorithm to predict package delivery performance must implement a completeness check to verify that each package record in the training data contains valid values for the customer’s name, shipment origin, and destination address.
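
Here is a hedged pandas sketch of those three checks, the field length check, the range check, and the presence check, using hypothetical field names that mirror the examples above:

```python
import pandas as pd

# Hypothetical training extract combining the examples discussed above.
df = pd.read_csv("training_records.csv")

# Field length check: phone numbers must contain exactly 10 digits.
digits = df["phone"].astype(str).str.replace(r"\D", "", regex=True)
length_ok = digits.str.len() == 10

# Range check: blood glucose readings must fall between 70 and 300 mg/dL.
range_ok = df["glucose_mg_dl"].between(70, 300)

# Presence (completeness) check: required fields may not be empty.
presence_ok = df[["customer_name", "origin", "destination"]].notna().all(axis=1)

# Flag rows that fail any check so they can be corrected or removed before training.
df["needs_correction"] = ~(length_ok & range_ok & presence_ok)
print(df["needs_correction"].value_counts())
```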

Monitoring Data Quality Over Time

Regular monitoring is crucial for maintaining data quality in an AI system. As new data is collected and integrated into the system, it is essential to continuously assess the data’s accuracy, completeness, and consistency. The AI system can operate with precision and dependability by upholding high data quality standards. Various metrics for monitoring data quality, including accuracy, completeness, and consistency, highlight any change in data quality.

The accuracy metric evaluates the alignment between the data and reality, such as verifying the correctness and currency of customer addresses. The completeness metric measures the extent to which the required data is present in the dataset, such as ensuring that each order record contains all the necessary field values. The consistency metric examines the adherence of the data to established rules or standards, such as checking if dates follow a standard format. The AI system can maintain its accuracy and reliability over time by consistently monitoring these metrics and others.
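
A small sketch of computing such metrics per ingested batch (file names, column names, and thresholds are all hypothetical):

```python
import pandas as pd

batch = pd.read_csv("customer_orders_2025_03.csv")  # latest ingested batch
reference = pd.read_csv("verified_addresses.csv")   # trusted reference data

# Completeness: every record carries the required field values.
completeness = batch[["order_id", "customer_id", "address"]].notna().all(axis=1).mean()

# Consistency: order dates adhere to the agreed YYYY-MM-DD standard.
consistency = pd.to_datetime(batch["order_date"], format="%Y-%m-%d", errors="coerce").notna().mean()

# Accuracy: recorded addresses agree with the verified reference source.
merged = batch.merge(reference, on="customer_id", suffixes=("", "_ref"))
accuracy = (merged["address"] == merged["address_ref"]).mean()

thresholds = {"completeness": 0.99, "consistency": 0.98, "accuracy": 0.95}
for name, value in {"completeness": completeness, "consistency": consistency, "accuracy": accuracy}.items():
    print(f"{name}: {value:.2%} [{'OK' if value >= thresholds[name] else 'ALERT'}]")
```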

Implementing these techniques allows AI teams to improve the quality of the training datasets used for AI models, resulting in more accurate, reliable outcomes and superior decision-making.

A MDM Success Story: Streamlining Claims Processing and Payments With Reliable, Centralized Data
https://blogs.perficient.com/2022/06/29/a-mdm-success-story-streamlining-claims-processing-and-payments-with-reliable-centralized-data/ | Wed, 29 Jun 2022

Does this sound familiar: you’re running multiple point-of-care systems, each collecting redundant data points. But what happens when there are small variations in how data is captured and processed across these systems? As a healthcare provider client of ours discovered, inconsistent data can reduce revenue realization. It caused claims to be improperly matched and processed… then denied – a stressful, frustrating, and business-impacting experience.

We partnered with the provider to consolidate their siloed data. Our Agile practitioners recognized the organization’s challenge as a prime use case for Informatica, an automated solution that would compare, streamline, and harmonize patient data in order to establish a single, reliable source of truth and boost claims processing efficiencies.

READ MORE: We’re A Cloud and On-Premises Certified Informatica Partner | Master Data Management: Modern Use Cases for Healthcare

Increasing Revenue and Improving Cycle Processing With a Single Source of Truth

To deliver a single source of truth, our team of industry and technical experts partnered with the provider to achieve the following:

  • Created a shared vision and enabled teams for the new solution’s rollout.
  • Built an automated solution on Informatica’s Data Quality and Master Data Management Hub to ingest, cleanse, compare, and consolidate data between various internal and external sources, proactively identifying mismatches for quick resolution.
  • Established a Center for Enablement (C4E) to oversee information governance and resolve process gaps across multiple workstreams.

Data quality improvements achieved with this solution significantly reduced the cost and complexity associated with integrating newly acquired hospitals, increased the frequency of successfully billed patient claims, and limited confusion and frustration associated with providers’ attempts to understand and reverse denied claims.

And the results are impressive. Want to learn more? Check out the complete success story here.

DISCOVER EVEN MORE SOLUTIONS: Master Data Management: Modern Use Cases for Healthcare

Healthcare MDM Solutions

With Perficient’s expertise in life sciences, healthcare, Informatica, and Agile leadership, we equipped this multi-state integrated health network with a modern, sustainable solution that vastly accelerated the revenue realization cycle.

Have questions? We help healthcare organizations navigate healthcare data, business processes and technology, and solution integration and implementation. Contact us today, and let’s discuss your specific needs and goals.

[Part 2 of 2] How to Refresh Your Marketo Data for the New Year
https://blogs.perficient.com/2022/01/18/how-to-refresh-your-marketo-marketing-automation-data-for-the-new-year-2/ | Tue, 18 Jan 2022

New Year, New You… continued!

Last week, I asked you to take a moment and think about your Marketo instance. When was the last time you focused purely on data hygiene for marketing automation? It’s a fresh new year, and now is the perfect time to refresh your data! As such, I posted a list of quick wins to get you rolling. Below, I’ve included a few additional tasks to help take you even further.

Your order of operations in cleaning the database is important. If done in the wrong order, it can turn into some redundant work. Be sure to complete my previously suggested steps before moving on to these tasks. When you’re ready, let’s jump in.

4. Normalize Data

Why this matters: Normalizing data means improving your data quality by creating standardization, making it easier to reference or report on. Some of the most important reasons to do this for Marketo include ensuring proper lead routing, preventing mailing or dialing errors (thereby reducing costs), ensuring the right leads are included in segmentation and ensuring that reports come out accurate the first time. You’ll love it when your boss says, “Wow – thanks for the fast reply!” with the latest quarterly data ahead of their big meeting.

Pro Tip: There are several ways to normalize data for marketing automation, ranging from using smart campaigns in Marketo, to tools like Microsoft Access, SQL Server, Python, or Excel. If you are working in Marketo and have a list of related items that need normalization but feel you can build out what you need in one smart campaign, put the item with the most leads attached at the top. This will help your system process these sequences more optimally, reducing performance drag. That said, you will want to be mindful of what data is syncing back and forth between your Marketo instance and your CRM, as the CRM could wipe out all your hard work on the next sync. If you’re concerned about this, work with your CRM Admin to ensure standardization there as well. For example, instead of allowing an open text field for country or state, use a drop-down list instead.
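
If you like to prototype the mapping outside Marketo before committing it to a smart campaign, a rough Python sketch might look like this (the export file, field names, and mapping values are hypothetical):

```python
import pandas as pd

# Hypothetical export of leads with free-text country values.
leads = pd.read_csv("marketo_leads_export.csv")

# Map common variants to a single standard value; check the volume per variant
# before ordering the choices in your smart campaign flow.
country_map = {
    "us": "United States", "usa": "United States", "u.s.": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom", "u.k.": "United Kingdom", "great britain": "United Kingdom",
}

cleaned = leads["country"].str.strip().str.lower().map(country_map)
leads["country_normalized"] = cleaned.fillna(leads["country"].str.strip())

# Review the values that did not match a mapping entry.
print(leads.loc[cleaned.isna(), "country"].value_counts().head(20))
```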


Sample custom Marketo smart campaign’s smart list for Country data normalization process


Sample custom Marketo smart campaign’s flow for Country data normalization process

 

 

5. Find Missing Data

Why this matters: If you have multiple data import sources, naturally, some data may be missing. If you know that certain data is key for your upcoming year’s campaigns, running tests to find missing data (field is empty) can help you locate or update that info in advance. You might just have added a few thousand new sales-ready leads to your upcoming offer!

Pro Tip: Create a custom “Missing Data Lookup” smart list for just this purpose and keep it in your testing folder. In tandem, create a custom view to see the results. If you find a pattern of missing data, see about enacting a change in your organization to collect that data at the right touchpoint — such as adding progressive profiling to gather that information via Marketo forms or making certain fields required.


Sample custom Marketo Missing Data Lookup smart list


Sample custom Marketo Missing Data Lookup view

6. Clean Up Fields

Why this matters: Hide old and confusing fields so that your team is sure to use the right ones in those complex, expensive campaigns you’ll be running this year. Also, if you have any updates scheduled for your APIs, you and your developer will have an easier time mapping everything. You’ll thank yourself when you’re working with your team to get that quick campaign out the door on Friday just before 5pm.

Pro Tip: If you want to hide a field from a sync between Marketo and Salesforce, you will need to pause the Salesforce/Marketo sync, remove visibility of the field in Salesforce, hide the field on the Marketo end, refresh the schema in Marketo, then turn the sync back on.


Sample hidden Marketo Custom Field

7. Name Consistently 

Why this matters: There’s nothing more frustrating than pulling a report and seeing a scattered list of items that should have the same naming convention, but don’t, which leads you to spend time combining data points in Excel. Eliminate this as best you can by creating a naming structure for your assets and going through to ensure everything is named correctly.

Pro Tip: You can use Marketo APIs to pull program names, landing page names, list names, etc. Review which ones need naming attention and click through directly to adjust their naming using the exported URL. After you finish the clean-up process, be sure to document the naming convention and ensure others are using it as well.

Screenshot of Perficient Standard Program Naming Convention

If you need help with any of the above, or something much more complicated, please reach out! My team and I would love to see how we can help you and your organization with marketing automation.

[Part 1 of 2] How to Refresh Your Marketo Data for the New Year
https://blogs.perficient.com/2022/01/11/how-to-refresh-your-marketo-marketing-automation-data-for-the-new-year-1/ | Tue, 11 Jan 2022

New Year, New You!

It’s 2022! Take a moment and think about your Marketo instance. When was the last time you focused purely on data hygiene for marketing automation?

If you’re like most Marketo Admins, it’s tough to find the time – but you know how important it is. Never fear, I’m not judging you! However, if you’re looking for a New Year’s Resolution, then now is the perfect time to refresh your data.

Why now? You’ll want to wait until all the data comes in and settles for December 2021, so you have a full year’s worth of data to compare reporting. We’ll call this your baseline. This approach will help demonstrate an improvement of campaign results and increased income thanks to your January 2022 labor investment. Make a note of this for your year-end review!

If your boss is particularly numbers-driven, you might want to share that, according to a 2017 Gartner study — bad data contributes to an estimated average loss of $15 million per organization, per year! Related, the Gartner Marketing Data and Analytics Survey 2020 found that “fifty-four percent of senior marketing respondents in the survey indicate that marketing analytics has not had the influence within their organizations that they expected.” This is primarily because of poor data quality and data findings conflicting with intended courses of action.

Let’s face it: bad data not only means your efforts might not reach intended audiences, but it also contributes to mistrust of data accuracy for decision-making. This can reduce your boss’ perceived value of marketing automation tools like Marketo — possibly leading to a decrease in the value of your input.

To get ahead of this issue, I’m posting a two-part blog series to share useful strategies that I recommend to improve your data quality.

Your order of operations in cleaning the marketing automation database is important. If done in the wrong order, it can turn into some redundant work. Here is my first set of suggested tasks for a quick win:

1. Merge Duplicates

Why this matters: If a lead’s email appears within the system more than once, Marketo will email the most recently updated one. However, activity could be attributed to the previously created account. This can cause issues downstream. For example, perhaps Sales looks at the wrong version of a profile and then reaches out to the customer only to feel embarrassed when the customer corrects the Salesperson — leading to an angry call to your desk phone to complain. Or, lead scores split between the old and new profiles, thereby not moving a potentially valuable lead through the lifecycle to MQL or SQL at the right time… possibly leading to loss of revenue. Don’t let this be you. Merge those dupes! And while you’re at it, investigate why dupes are being created in the first place.

Pro Tip: If you don’t have a CRM, use the Marketo System Smart List called ‘Possible Duplicates’ to review and merge dupes. If you have a CRM, such as Salesforce or Microsoft Dynamics 365, you might want to perform the merge in the CRM and allow de-duplication to sync back to Marketo. This is especially key for Microsoft Dynamics.


Screenshot from Marketo Database – System Smart Lists – Possible Duplicates


Screenshot from Salesforce documentation to Merge Duplicate Leads in Lightning Experience


Screenshot from Microsoft Dynamics 365 documentation to Merge duplicate records for accounts, contacts, or leads

2. Correct Incorrect Emails

Why this matters: Even if you think that leads that stumbled across your content were ‘organic and free,’ that’s not totally true. It costs money to operate systems and teams. So, you want to make sure you’re optimizing every lead generation opportunity that you have so you can accurately trace back the cost per lead and even add in the revenue per lead if you’re able. For example, what if John Smith was a major purchaser at his company, but he mistakenly filled out an intake form with his email as ‘john.smith@gmail.comm.’ Your Marketo system will eventually mark his email invalid… only for you to find out months later. Think of the potential loss of revenue! Also, be sure to do this task before database deletions, because some of these invalid emails might be salvageable.

Pro Tip: To automate this process, you can add custom code to your Marketo forms to prevent spam or mistyped email addresses from entering the system. Or, to handle email addresses already in your system, look through the System Smart List ‘Bounced Email Addresses’, and manually or automatically (smart campaign) fix the errors. During this process, create a “cheat sheet” smart list of the email errors you most commonly encounter — such as “@gmail.comm” or “@yahho.com.” Then, run this smart list to look at the full database for any other errors that come up.
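
If you want to prototype that cheat sheet outside Marketo first, say against an exported list of bounced addresses, a rough Python sketch can flag and suggest fixes for common typo domains (the file name, field names, and patterns are illustrative):

```python
from typing import Optional

import pandas as pd

# Hypothetical export of bounced or suspect email addresses.
leads = pd.read_csv("bounced_emails_export.csv")

# Common typo domains and their corrections; extend this as new patterns appear.
typo_fixes = {
    "@gmail.comm": "@gmail.com",
    "@gmail.con": "@gmail.com",
    "@yahho.com": "@yahoo.com",
    "@hotmial.com": "@hotmail.com",
}

def suggest_fix(email: str) -> Optional[str]:
    """Return a corrected address if the email ends with a known typo domain."""
    for typo, fix in typo_fixes.items():
        if email.lower().endswith(typo):
            return email[: -len(typo)] + fix
    return None

leads["suggested_email"] = leads["email"].astype(str).map(suggest_fix)
print(leads.loc[leads["suggested_email"].notna(), ["email", "suggested_email"]].head())
```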


Sample custom Marketo smart list to find email errors

3. Database Deletions

Why this matters: Your Marketo instance pricing is tied to the lead volume in your database. While leads might still need to live in your CRM, you just might not need them in Marketo. Bad leads might be dragging down your marketing analytics reports. Imagine having a clean database full of only marketable leads! Some examples of types of data that should be deleted include people with no email address, those who have invalid emails, or customers who have unsubscribed. Once you perform this task, you can then set up smart campaigns to automate approved database deletions regularly, such as every month or every quarter.

Pro Tip: Ahead of deleting leads, you will want to ensure that any connected database, such as Salesforce, does not re-sync these people back over into Marketo. You will need to create a Custom Field in Salesforce and Marketo, if you’re using Salesforce. Then, contact Marketo Support to add a rule to not sync those leads back into Marketo. Of note, rules can be set up on the CRM side that can unmark the Custom Field so leads can sync back over if needed. For more information on this, here’s a great post from the Marketo Nation community.


Sample custom Marketo smart list to find emails recommended for deletion

If you need help with any of the above, or something much more complicated for your marketing automation project, please reach out! My team and I would love to see how we can help you and your organization.

Next week, I will post more tips for sprucing up your Marketo instance for the new year! Stay tuned!

[Technology Tapas] Data Assets – Simple Things You Can Do to Get Started
https://blogs.perficient.com/2021/12/01/technology-tapas-data-assets-simple-things-you-can-do-to-get-started/ | Wed, 01 Dec 2021

In a recent Technology Tapas article, I mentioned identifying your data assets to understand how you might use data, what it can tell you about your business, and where there might be gaps in information. You may find your organization has lots of ideas, but you’re unsure where to start.  That’s okay!  The best way to get started is to pick one business challenge you know the organization is passionate about and where you would like to see better data.  Then, ask some questions:

Why is this data valuable?  What does it tell you about your business? 

Sometimes we look at a financial number or an operational metric because it is what the organization has looked at for a long time, but we have lost the reason for looking at it.  Go back to the ‘why’. Does it tell you about profitability, efficiency, or growth?  You may uncover some hidden gems when you do this.

Who owns this data asset? 

It is easy to say that something is wrong with data, but many organizations have limited knowledge about how it is produced and maintained. If you can’t put a name (not an organization) to the data asset, start thinking about why that is. You may find that no one owns it or is accountable for it.

How is the quality of the data asset? If it is not good, can you identify where it gets corrupted in your business processes?

Much like having a specific owner for data assets, it is important to know how the data is used in business processes. Sometimes data looks good at the department level but does not roll up well to the corporate level. It’s possible that expectations about data quality differ.
For example, if your organization receives data from a third party and it regularly contains incorrect spellings of state names, but your data ingestion team fixes it before you see it in your reports, does that meet quality standards? Can you get the third party to fix it before it hits your door, eliminating some work for the ingestion team? If you didn’t have this data asset, what would you use instead to provide this same value, the same ‘why’?

Asking this may lead to other assets or discoveries of data with which you are not so familiar. One company found their focus on widgets was no longer giving the pulse of the business. They needed to shift to customers, and not just customers overall, but customers by business segment, as they had quite different demographics.

At the end of the day, you want to be able to tell a story about your data and what it means to your company. As you do that, you are starting to understand the value, lineage, and governance around the data. We will talk more on these topics in a future Technology Tapas blog post.

Check out our podcast series on Intelligent Data to learn more about how data is being used in the industry and how you can leverage your data assets.

[Podcast] Data Realities for Implementing AI
https://blogs.perficient.com/2021/01/28/data-realities-for-implementing-ai/ | Thu, 28 Jan 2021

According to the IDC’s Worldwide Artificial Intelligence Spending Guide, implementing AI has become a necessity for businesses to become more agile, innovate, and scale. And it appears that more companies are coming to terms with this as global spending is expected to reach more than $110 billion by 2024.

In season 1 episode 3 of the Intelligent Data Podcast, host Arvind Murali and his guest Christine Livingston, Perficient’s former Managing Director and Chief Strategist of AI, discuss trends in AI, the value of big data and data quality, supervised learning, AI ethics, and more.

Data is the single most important element to the success of your machine learning program. – Christine Livingston, Managing Director and Chief AI Strategist, Perficient

Listening Guide

Data Realities for Implementing AI

  • Trends in AI and machine learning [2:29]
  • How cloud adoption is helping accelerate AI adoption [4:10]
  • Real versus hype [7:45]
  • Value of data in AI and machine learning [10:55]
  • Data quality and governance [12:10]
  • Three Vs: Volume, variety and veracity of training data [ 15:47]
  • Avoiding inherent human bias [18:27]
  • Data scientists and engineering skillsets [20:53]
  • AI ethics and implications [25:22]
  • Why are software vendors adding AI to their platforms [29:41]
  • Intelligent automation [32:20]
  • Conversation commerce use cases (Automotive, Retail, Healthcare) [34:37]
  • Intelligent automation loan processing use case [37:14]
  • Affordability and ROI [38:02]
  • How to accelerate your AI journey [39:10]

Get This Episode Where You Listen

And don’t forget to subscribe, rate and review!

Apple | Google | Spotify | Amazon | Stitcher | Pocket Casts

Connect with the Host

Arvind Murali, Perficient Principal and Chief Strategist

LinkedIn | Perficient

 

 

Learn More About Our AI Solutions

If you are interested in learning more about Perficient’s AI services capabilities or would like to contact us, click here. Our experts can help you start implementing AI in a meaningful way no matter where you fall on the AI maturity model.

10 Data and Analytics Trends in 2020
https://blogs.perficient.com/2020/01/07/10-data-and-analytics-trends-in-2020/ | Tue, 07 Jan 2020

The importance of data and analytics will continue to grow in 2020, and there are ten trends your organization should take note of to stay competitive. In the video below, I’ve outlined these ten trends and what you can do to stay on top of them.

10 Data and Analytics Trends in 2020 [Video]

  1. Data Literacy
  2. Data and AI Ethics
  3. Deep Learning
  4. AI for AI
  5. Data Privacy and Regulations
  6. Cloud Data Warehousing
  7. Human in the Loop
  8. Augmented Analytics
  9. Total Data Quality Management
  10. Healthcare AI

Data Quality Improvement is Key to Successful Data Governance
https://blogs.perficient.com/2018/08/16/data-quality-improvement-key-successful-data-governance/ | Thu, 16 Aug 2018

The goal of any data quality program is to improve quality of data at the source. Once a financial institution’s data lineage capabilities are in place, a key starting point for data quality initiatives is the confirmation of critical data attributes for each major business line and functional area.

The data quality program should define data rules across multiple categories – completeness, validity, consistency, timeliness, and accuracy. These core attributes should be measured against these rules through a data quality monitoring capability on a real-time basis.

When the data quality rules are breached, the reasons for the breach should be investigated proactively, a fix should be identified, and the issue remediated. For quick fixes, the data owners can be notified to correct and provide re-feeds or to make the necessary updates for the next delivery.

We recently published a guide that explores the building blocks (i.e., data governance components) of data governance, which can help drive better business decisions, enhance regulatory compliance, and improve risk management. You can download it here.

Driving Better Decisions with Data Governance
https://blogs.perficient.com/2018/07/19/driving-better-decisions-data-governance/ | Thu, 19 Jul 2018

The business capabilities presented in our new guide demonstrate how forward-thinking financial services companies are leveraging data governance to create value for the enterprise. Accurate and timely information continues to be a key enabler of better decision-making.

Capabilities such as data principles and strategy, data architecture, organizational roles, authoritative sources, data lineage, data quality, and data contracts can be used individually or in concert to create new value for financial management, regulators, or risk management. Leading firms are leveraging these capabilities to maintain excellence in a highly competitive marketplace.

Through technological advances and well-defined business capabilities, new paradigms have been created for leveraging data governance to accelerate value for financial services organizations.
