Modern data platforms like Databricks enable organizations to process massive volumes of batch and streaming data—but scaling reliably requires more than just compute power. It demands data observability: the ability to monitor, validate, and trace data through its lifecycle.
This blog compares two powerful tools—Delta Live Tables and Great Expectations—that bring observability to life in different but complementary ways. Delta Live Tables (DLTs) provide built-in enforcement and lineage within Databricks pipelines, while Great Expectations (GX) offers deep validation and anomaly detection.
In my experience, Delta Live Tables and Great Expectations are better together. Together, they form a robust observability stack, enabling teams to deliver trusted, production-grade data pipelines across batch and streaming workflows.
I’m not a fan of taking both sides of an argument. Let’s look at our core responsibilities as data engineers from the ground up and follow the solutions where the requirements take us.
A data asset is a managed, valuable dataset. A valuable dataset is not just data—it is data with purpose, invested in through processes and controls, and justified by the business value provided. A managed dataset is actively governed, monitored, and maintained to ensure it delivers sustained value to stakeholders.
Fundamentally, a data asset is considered managed when it is under governance. Proper data governance, such as that applied to Unity Catalog-managed datasets, has at least these fundamental characteristics.
Characteristic | Key Questions |
---|---|
Ownership & Stewardship | Who is responsible for maintaining the data asset? Who can answer questions about it? |
Access Control | Who can read, write, or modify this data? Are permissions aligned with roles and rules? |
Lineage | Where does this data come from? What transformations has it gone through? |
Compliance & Privacy | Is sensitive data (e.g., PII, PHI) addressed? Are retention and masking policies enforced? |
Auditability | Can we trace who accessed or modified the data and when? |
Unity Catalog is the foundation for a well-managed lakehouse. I have written about migrating to Unity Catalog and highlighted some bonus features. If you have not migrated yet, I recommend getting started immediately and then focusing on data value.
The business will communicate the value of a data asset primarily through a Service Level Agreement (SLA). The SLA defines agreed-upon expectations around reliability, performance, and quality.
Reliability describes the resiliency, freshness, and correctness of a data asset.
Performance describes the efficiency of data from ingestion to processing to consumption.
Quality describes the degree to which data meets defined rules, expectations, and standards.
These business expectations are fulfilled by IT through People, Processes, and Technology.
For a price.
Service Level Objectives (SLOs) represent operational domains within the SLAs and can be met progressively. This concept helps align cost to value within your budget. The dials being tuned here are the Software Development Lifecycle (SDLC) and the Databricks Medallion Architecture. SLAs define commitments for reliability, performance, and quality; SLOs enforce those commitments throughout the data lifecycle. Each layer of the Medallion Architecture strengthens one or more of these domains, and across the SDLC, IT teams progressively validate and enforce these guarantees to deliver production-grade data assets.
The workspace is the primary environment for working with data assets in Unity Catalog. Value typically increases across the environments from left (Dev) to right (Prod).
SLA Domain | Dev | Test | Prod |
---|---|---|---|
Reliability | Monitor source connectivity and pipeline triggers | Validate pipeline scheduling, retries | SLAs ensure on-time delivery for consumers |
Performance | Baseline performance benchmarks | Load testing, profiling | Optimize for SLAs: query latency, data delivery speed |
Quality | Create GX/DQX test suites | Enforce checks with alerts | Blocking rules and alerting on quality failures |
The catalog is the primary unit of data isolation in the Databricks data governance model. Value typically increases across the layers from left (Bronze) to right (Gold).
SLA Domain | Bronze (Raw) | Silver (Cleaned) | Gold (Curated) |
---|---|---|---|
Reliability | Data lands on time; raw source integrity is monitored | DLT jobs run consistently; schema evolution is managed | Timely delivery of business-critical data |
Performance | Ingest processes optimized for load handling | Transformations are performant; no bottlenecks | Dashboards and queries load quickly |
Quality | Basic data profiling and source rule checks | DQ rules (e.g., null checks, constraints) enforced | Golden datasets meet business expectations for data quality |
Data becomes a true asset as it progresses through these layers, accruing value while incurring costs to meet increasing SLA expectations.
We are back to where we started, with a little more context. Great Expectations (GX) is focused on data validation and profiling, while Delta Live Tables (DLT) handles schema enforcement and transformations. While DLT may not have sophisticated rule and profiling capabilities, its native integration with Unity Catalog gives it similar performance characteristics across both batch and streaming, whereas GX can struggle with streaming from a performance perspective.
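To make the DLT side concrete, here is a minimal sketch of declarative expectations in a Delta Live Tables pipeline. The expectation decorators are the standard dlt API; the table and column names (bronze_orders, silver_orders, order_id, amount, currency, order_ts) are hypothetical.

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Cleaned orders with basic quality gates")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")        # drop failing rows
@dlt.expect("non_negative_amount", "amount >= 0")                    # record violations, keep rows
@dlt.expect_or_fail("known_currency", "currency IN ('USD','EUR','GBP')")  # fail the update
def silver_orders():
    # Stream from a hypothetical bronze table managed by the same pipeline
    return dlt.read_stream("bronze_orders").select(
        col("order_id"), col("amount"), col("currency"), col("order_ts")
    )
```

Each expectation's pass/fail counts land in the pipeline event log, which is what gives DLT its built-in observability and lineage story inside Unity Catalog.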
The exercise of defining the progression of value across the SDLC and Medallion Architecture now pays dividends. DLTs stand out for end-to-end data management with automatic lineage management and schema evolution. Great Expectations can then be run as a separate process for more advanced data quality checks and profiling. This could be incorporated as part of a more advanced CI/CD process or just managed manually.
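As a sketch of what that separate GX process might look like, run against a curated extract after the pipeline completes. This uses the classic pandas-flavored GX API; the exact API surface differs across GX versions, and the file and column names are hypothetical.

```python
import great_expectations as ge
import pandas as pd

# A hypothetical Gold-layer extract pulled down for deeper validation and profiling
df = pd.read_parquet("gold_orders.parquet")
gdf = ge.from_pandas(df)

# Checks that go beyond simple schema enforcement
gdf.expect_column_values_to_not_be_null("order_id")
gdf.expect_column_values_to_be_between("amount", min_value=0, max_value=100_000)
gdf.expect_column_values_to_match_regex("currency", r"^[A-Z]{3}$")
gdf.expect_column_mean_to_be_between("amount", min_value=10, max_value=5_000)

result = gdf.validate()
print("GX validation passed:", result.success)
```

Wiring a step like this into CI/CD, or into a scheduled job that runs after the DLT update completes, is one way to fold the deeper checks into the release process rather than managing them manually.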
The key is not to focus on a tool in isolation with the idea of picking a winner. I believe most developers could become cross-trained on both technologies. Neither should be outside the scope of a junior data engineer. People are not the problem. I wish that DLTs were integrated with Great Expectations so I didn’t need two technologies, but a little Process goes a long way toward resolving that Technology issue.
Integrating Delta Live Tables and Great Expectations within the Software Development Lifecycle (SDLC) and the Medallion Architecture helps teams reduce operational costs while continuously delivering business value.
This hybrid approach supports robust data engineering practices and empowers organizations to scale with confidence, optimize their cloud spend, and maximize the return on data investments.
Contact us to learn more about how to empower your teams with the right tools, processes, and training to unlock Databricks’ full potential cost-consciously.
We are witnessing a sea change in the way data is managed by banks and financial institutions all over the world. Data being commoditized, and in some cases even monetized, by banks is the order of the day, though adoption in the risk management function still needs a push. Traditional risk managers, by their job definition, are highly cautious of the result sets provided by the analytics teams. I have even heard the phrase “Please check the report, I don’t understand the models and hence trust the number”.
So, in the risk function, while there is a race for data aggregation, structured data, unstructured data, data quality, data granularity, news feeds, and market overviews, it’s also a challenge from an acceptance perspective. The vision is that all of this data can be aggregated, harmonized, and used for better, faster, and more informed decision-making for financial and non-financial risk management. The interdependencies between the risks were factors that were not considered in the “Good Old Days” of risk management (pun intended).
Based on my experience, here are the common issues faced by banks that run the risk of not having a good risk data strategy.
1. The IT-Business tussle (“YOU don’t know what YOU are doing”)
This, in my view, is the biggest challenge facing traditional banks, especially in the risk function. “The Business”, in traditional banks, is treated like a larger-than-life entity that needs to be supported by IT. This notion of IT being the service provider while the business is the “bread-earner”, especially in traditional banks’ risk departments, no longer holds. It has been proven time and again that the two cannot function without each other, and that is what needs to be cultivated as a management mindset for the strategic data management effort as well. This is a culture change, but it is happening slowly and will have to be adopted industry-wide. It has been proven that the financial institutions with the most organized data have a significant market advantage.
2. Data Overload (“Dude! where’s my Insight”)
The primary goal of the data management, sourcing, and aggregation effort has to be converting data into informational insights. The teams analyzing the data warehouses and data lakes and aiding the analytics will have to keep this one major organizational goal in mind. Banks have silos, and these silos have been created by mergers, regulations, entities, risk types, Chinese walls, data protection, laws of the land, or sometimes just technological challenges over time. The solution to most of this is to start with a clean slate. The management mandate for getting the right people to talk and be vested in this change is crucial; challenging, but crucial. Good old analysis techniques and brainstorming sessions for weeding out what is unnecessary and getting to the right set of elements are the key. This needs an overhaul in the way the banking business has traditionally looked at data, i.e., as something needed only for reporting. Understanding the data lineage and the touchpoint systems is most crucial.
3. The CDO Dilemma (“To meta or not to meta”)
The CDO’s role in most banks is now well defined. The risk and compliance analytics and reporting division depends almost solely on the CDO function for insights on regulatory reporting and other forms of innovative data analytics. The key success factor of the CDO organization lies in allocating the right set of analysts to the business areas. A CDO analyst on the market risk side, for instance, will have to be well versed with market data, bank hierarchies, VaR calculation engines, Risk not in VaR (RNiV), and the supporting reference data, in addition to the trade systems data that these data elements directly or indirectly impact, to say nothing of the critical data elements themselves. An additional understanding of how this would impact other forms of risk reporting, like credit risk and non-financial risk, is definitely a nice-to-have. Defining a metadata strategy for the full lineage, its touchpoints, and its transformations is a strenuous effort in analyzing systems owned by disparate teams with siloed implementation patterns built up over time. One fix that I have seen work is for every significant application group or team to have a senior representative for CDO interaction. Vested stakeholder interest is turning out to be the one major success factor in the programs that have been successful. This ascertains completeness of the critical data element definitions and hence aids the data governance strategy in a holistic way.
4. The ever-changing nature of financial risk management (“What did they change now?”)
The Basel Committee recommendations have been consistent in driving the urge to reinvent processes in the risk management area. With the Fundamental Review of the Trading Book (FRTB), the focus has been very clearly realigned to data processes in organizations. Whilst the big banks had already demonstrated a sound understanding of modellable risk factors based on scenarios, this time the Basel Committee has also asked banks to focus on non-modellable risk factors (NMRF). Add the standardized approach (regulator-defined sensitivities) and the internal models approach (IMA, bank-defined enhanced sensitivities), and the change from entity-based to desk-based risk calculations is a significant paradigm shift. A single golden-source definition for transaction data, along with desk structure validation, seems to be a major area of concern amongst banks.
Add climate risk to the mix with the Paris Accord, and RWA calculations will now need additional data points, additional models, and additional investment in external data defining the associated physical and transition risks. Data lake and big data solutions with defined critical data elements and a full log of transformations with respect to lineage are a significant investment, but they will only work in a bank’s favor as more changes come through on the regulatory side. There have always been banks that are consistently great at this and banks that lag significantly.
All in all, risk management happens to be a great use case for a greenfield CDO data strategy implementation, and these hurdles have to be handled before reaching the ultimate Zen goal of a perfect risk data strategy. Believe me, the first step is to get the bank’s consolidated risk data strategy right, and everything else will follow.
This is a 2021 article, also published here – Risk Management Data Strategy – Insights from an Inquisitive Overseer | LinkedIn
Data warehouse testing is a process of verifying data loaded in a data warehouse to ensure the data meets the business requirements. This is done by certifying data transformations, integrations, execution, and scheduling order of various data processes.
Extract, transform, and load (ETL) testing is the process of verifying data as it is combined from multiple sources and loaded into a large, central repository called a data warehouse.
Conventional testing tools are designed for UI-based applications, whereas a data warehouse testing tool is purpose-built for data-centric systems and designed to automate data warehouse testing and generate results. It is also used during the development phase of a data warehouse (DWH).
Integrity Check Engine For Data Quality (iCEDQ) is one of the tools used for data warehouse testing. It aims to overcome some of the challenges associated with conventional methods of data warehouse testing, such as manual testing, time-consuming processes, and the potential for human error.
It is an Automation Platform with a rules-based auditing approach enabling organizations to automate various test strategies like ETL Testing, Data Migration Testing, Big Data Testing, BI Testing, and Production Data Monitoring.
It tests data transformation processes and ensures compliance with business rules in a Data Warehouse.
Let us look at some of the areas where this kind of testing extends its usefulness.
It is a data testing and monitoring platform for all sizes of files and databases. It automates ETL Testing and helps maintain the sanctity of your data by making sure everything is valid.
It is designed with a greater ability to identify any data issues in and across structured and semi-structured data.
Its unique in-memory engine with support for SQL, Apache Groovy, Java, and APIs allows organizations to implement end-to-end automation for Data Testing and Monitoring.
This tool provides customers an easy way to set up an automated solution for end-to-end testing of their data-centric projects, and it provides email support.
It is mostly used by enterprises and business users, on platforms such as web apps and Windows; it does not support macOS, Android, or iOS.
The new Big Data Edition tests 1.7 billion rows in less than 2 minutes and runs a Recon Rule with around 20 expressions against 1.7 billion rows in less than 30 minutes.
With a myriad of capabilities, iCEDQ seamlessly empowers users to automate data testing, ensuring versatility and reliability for diverse data-centric projects.
There are several kinds of data validation and reconciliation of business data that can be performed in ETL/Big Data testing.
iCEDQ ensures accuracy by validating all data migrated from the legacy system to the new one.
iCEDQ is mainly used in support projects for monitoring after migration to the PROD environment. It continuously monitors ETL jobs and notifies teams of data issues through an email trigger.
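To make the reconciliation idea concrete, here is a deliberately tool-agnostic sketch (this is not iCEDQ rule syntax) of what a migration or ongoing-monitoring recon rule typically verifies: row counts and column-level aggregates agreeing between the legacy and target systems. The database files, table, and column names are hypothetical.

```python
import sqlite3  # stand-in for the legacy and target databases; swap in your real drivers

def reconcile(legacy_conn, target_conn, table, key_col, amount_col):
    """Compare row counts and simple checksums between source and target."""
    checks = {
        "row_count": f"SELECT COUNT(*) FROM {table}",
        "distinct_keys": f"SELECT COUNT(DISTINCT {key_col}) FROM {table}",
        "amount_sum": f"SELECT ROUND(SUM({amount_col}), 2) FROM {table}",
    }
    mismatches = []
    for name, sql in checks.items():
        legacy_val = legacy_conn.execute(sql).fetchone()[0]
        target_val = target_conn.execute(sql).fetchone()[0]
        if legacy_val != target_val:
            mismatches.append((name, legacy_val, target_val))
    return mismatches

# Hypothetical usage against two SQLite extracts of the legacy and new warehouses
legacy = sqlite3.connect("legacy_dw.db")
target = sqlite3.connect("new_dw.db")
for check, old, new in reconcile(legacy, target, "orders", "order_id", "amount"):
    print(f"RECON FAILURE on {check}: legacy={old}, target={new}")
```

Scheduling the same checks after each load is what turns a one-time migration test into production data monitoring.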
It reduces project timelines by 33%, increases test coverage by 200%, and improves productivity by 70%.
In addition to its automation capabilities, iCEDQ offers unparalleled advantages, streamlining data testing processes, enhancing accuracy, and facilitating efficient management of diverse datasets. Moreover, the platform empowers users with comprehensive data quality insights, ensuring robust and reliable Data-Centric project outcomes.
Users can create different types of rules in iCEDQ to automate the testing of their data-centric projects, with each rule type performing a different kind of test case tailored to the requirements of different datasets.
Review iCEDQ’s technical specifications and system requirements to determine whether it is compatible with your operating system and other software.
To successfully deploy iCEDQ, it is essential to consider its system requirements. Notably, the platform demands specific configurations and resources, ensuring optimal performance. Additionally, adherence to these requirements guarantees seamless integration, robust functionality, and efficient utilization of iCEDQ for comprehensive data testing and quality assurance.
Hence, iCEDQ is a powerful data migration and ETL/data warehouse testing automation solution designed to give users total control over how they verify and compare data sets. With iCEDQ, users can build various types of tests or rules for data set validation and comparison.
Resources related to iCEDQ – https://icedq.com/resources
AI is reliant upon data to acquire knowledge and drive decision-making processes. Therefore, the data quality utilized for training AI models is vital in influencing their accuracy and dependability. Data noise in machine learning refers to the occurrence of errors, outliers, or inconsistencies within the data, which can degrade the quality and reliability of AI models. When an algorithm interprets noise as a meaningful pattern, it may mistakenly draw generalized conclusions, giving rise to erroneous outcomes. Therefore, it is vital to identify and remove data noise from the dataset before initiating the training process for the AI model to ensure accurate and reliable results. Below is a set of guidelines to mitigate data noise and enhance the quality of training datasets utilized in AI models.
AI systems collect relevant data from various sources. Data quality checks and supporting rules ensure that all the details needed for organizing and formatting the data are present, so that AI algorithms can easily understand and learn from it. There are several widely used techniques for eliminating data noise. One such technique is outlier detection, which involves identifying and eliminating data points that deviate significantly from the rest of the data. Another technique is data smoothing, where moving averages or regressions are applied to reduce the impact of noisy data and make it more consistent. Additionally, data cleaning plays a crucial role in removing inconsistent or incorrect values from the dataset, ensuring its integrity and reliability. Data professionals can perform data profiling to understand the data and then integrate the cleaning rules within data engineering pipelines.
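As a minimal illustration of the outlier detection and smoothing techniques described above (the sensor_reading values are made-up sample data, and the z-score threshold is an assumption):

```python
import pandas as pd

df = pd.DataFrame({"sensor_reading": [10.1, 10.3, 9.8, 10.2, 55.0, 10.0, 9.9, 10.4]})

# Outlier detection: drop points more than two standard deviations from the mean
z_scores = (df["sensor_reading"] - df["sensor_reading"].mean()) / df["sensor_reading"].std()
cleaned = df[z_scores.abs() <= 2.0].copy()

# Data smoothing: a 3-point rolling average dampens the remaining noise
cleaned["smoothed"] = cleaned["sensor_reading"].rolling(window=3, min_periods=1).mean()

print(cleaned)
```

In a real pipeline these steps would run inside the data engineering layer, after profiling has confirmed which columns actually need them.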
Proper data validation is mandatory for even the most performant algorithms to predict accurate results. Once the data is collected and preprocessed, validation against reference values that have been tried and tested on various occasions would enhance confidence in data quality. This step involves checking the training data for accuracy, completeness, and relevance. Any missing, incorrect, or irrelevant data found is to be corrected or removed.
One such check is the field length check, which restricts the number of characters entered within a specific field. Phone numbers are an example: any number entered with more than ten digits needs correction before being used for prediction models. Another important check is the range check, where the entered number must fall within a specified range. Consider a scenario where you possess a dataset containing the blood glucose levels of individuals diagnosed with diabetes. As is customary, blood glucose levels are quantified in milligrams per deciliter (mg/dL). To validate the data, it is imperative to ascertain that the entered blood glucose levels fall within a reasonable range, let’s say between 70 and 300 mg/dL.
A range check would establish a restriction, enabling only values within this range to enter the blood glucose level field. Any values that fall outside this range would be promptly flagged and corrected before being used in the training dataset. This meticulous validation process ensures the accuracy and reliability of the blood glucose data for further analysis and decision-making. Additionally, a presence check must ensure data completeness, meaning a field cannot be left empty. For example, a machine learning algorithm to predict package delivery performance must implement a completeness check to verify that each package detail in the training data records valid values for the customer’s name, shipment origin, and destination address.
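A small sketch of the range and presence checks described above, using pandas and the blood-glucose example (column names and sample values are hypothetical):

```python
import pandas as pd

records = pd.DataFrame({
    "patient_id": ["p1", "p2", "p3", "p4"],
    "glucose_mg_dl": [95.0, 410.0, 150.0, None],   # 410 is out of range; None is missing
    "customer_name": ["A. Jones", "B. Smith", "", "C. Lee"],
})

# Range check: glucose must fall between 70 and 300 mg/dL
out_of_range = records[
    records["glucose_mg_dl"].notna() & ~records["glucose_mg_dl"].between(70, 300)
]

# Completeness (presence) check: required fields cannot be null or empty
incomplete = records[
    records["glucose_mg_dl"].isna() | (records["customer_name"].str.strip() == "")
]

print("Out-of-range rows:\n", out_of_range)
print("Incomplete rows:\n", incomplete)
```

Flagged rows can then be routed for correction or exclusion before the training set is assembled.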
Regular monitoring is crucial for maintaining data quality in an AI system. As new data is collected and integrated into the system, it is essential to continuously assess the data’s accuracy, completeness, and consistency. The AI system can operate with precision and dependability by upholding high data quality standards. Various metrics for monitoring data quality, including accuracy, completeness, and consistency, highlight any change in data quality.
The accuracy metric evaluates the alignment between the data and reality, such as verifying the correctness and currency of customer addresses. The completeness metric measures the extent to which the required data is present in the dataset, such as ensuring that each order record contains all the necessary field values. The consistency metric examines the adherence of the data to established rules or standards, such as checking if dates follow a standard format. The AI system can maintain its accuracy and reliability over time by consistently monitoring these metrics and others.
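A minimal sketch of how those three metrics might be computed on a batch of incoming records (the column names, reference list, and date format are assumptions):

```python
import pandas as pd

def data_quality_metrics(df: pd.DataFrame) -> dict:
    """Compute simple completeness, accuracy, and consistency scores for a batch."""
    # Completeness: share of rows with all required fields populated
    completeness = df[["customer_id", "order_date", "address"]].notna().all(axis=1).mean()

    # Accuracy (proxy): share of addresses matching a known-good reference list
    reference_addresses = {"10 Main St", "22 Oak Ave"}          # hypothetical reference data
    accuracy = df["address"].isin(reference_addresses).mean()

    # Consistency: share of dates that parse against the standard YYYY-MM-DD format
    parsed = pd.to_datetime(df["order_date"], format="%Y-%m-%d", errors="coerce")
    consistency = parsed.notna().mean()

    return {"rows": len(df), "completeness": round(completeness, 3),
            "accuracy": round(accuracy, 3), "consistency": round(consistency, 3)}

batch = pd.DataFrame({
    "customer_id": [1, 2, None],
    "order_date": ["2024-01-05", "05/01/2024", "2024-01-07"],
    "address": ["10 Main St", "22 Oak Ave", "unknown"],
})
print(data_quality_metrics(batch))
```

Tracking these scores over time is what turns point-in-time validation into monitoring: a sudden drop in any metric signals a change in an upstream feed before it degrades the model.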
Implementing these techniques allows AI teams to improve the quality of training datasets for AI models, resulting in more accurate and reliable outcomes and superior decision-making.
Does this sound familiar: you’re running multiple point-of-care systems, each with redundant data points collected. But what happens when there are small variations in how data is captured and processed across these systems? As a healthcare provider client of ours discovered, inconsistent data can reduce revenue realization. It caused claims to be improperly matched and processed… then denied – a stressful, frustrating, and business-impacting experience.
We partnered with the provider to consolidate their siloed data. Our Agile practitioners recognized the organization’s challenge as a prime use case for Informatica, an automated solution that would compare, streamline, and harmonize patient data in order to establish a single, reliable source of truth and boost claims processing efficiencies.
READ MORE: We’re A Cloud and On-Premises Certified Informatica Partner
To deliver a single source of truth, our team of industry and technical experts partnered with the provider to achieve the following:
Data quality improvements achieved with this solution significantly reduced the cost and complexity associated with integrating newly acquired hospitals, increased the frequency of successfully billed patient claims, and limited confusion and frustration associated with providers’ attempts to understand and reverse denied claims.
And the results are impressive. Want to learn more? Check out the complete success story here.
DISCOVER EVEN MORE SOLUTIONS: Master Data Management: Modern Use Cases for Healthcare
With Perficient’s expertise in life sciences, healthcare, Informatica, and Agile leadership, we equipped this multi-state integrated health network with a modern, sustainable solution that vastly accelerated the revenue realization cycle.
Have questions? We help healthcare organizations navigate healthcare data, business processes and technology, and solution integration and implementation. Contact us today, and let’s discuss your specific needs and goals.
Last week, I asked you to take a moment and think about your Marketo instance. When was the last time you focused purely on data hygiene for marketing automation? It’s a fresh new year, and now is the perfect time to refresh your data! As such, I posted a list of quick wins to get you rolling. Below, I’ve included a few additional tasks to help take you even further.
Your order of operations in cleaning the database is important. If done in the wrong order, it can turn into some redundant work. Be sure to complete my previous suggested steps before moving onto these tasks. When you’re ready, let’s jump in.
Why this matters: Normalizing data means improving your data quality by creating standardization, making it easier to reference or report on. Some of the most important reasons to do this for Marketo include ensuring proper lead routing, preventing mailing or dialing errors (thereby reducing costs), ensuring the right leads are included in segmentation and ensuring that reports come out accurate the first time. You’ll love it when your boss says, “Wow – thanks for the fast reply!” with the latest quarterly data ahead of their big meeting.
Pro Tip: There are several ways to normalize data for marketing automation, ranging from using smart campaigns in Marketo, to tools like Microsoft Access, SQL Server, Python, or Excel. If you are working in Marketo and have a list of related items that need normalization but feel like you can build out what you need in one smart campaign, put the item at the top with the most leads attached. This will help your system process these sequences more optimally, reducing performance drag. That said, you will want to be mindful of what data is syncing back and forth between your Marketo instance and your CRM, as the CRM could wipe out all your hard work next sync. If you’re concerned about this, work with your CRM Admin to ensure standardization there as well. For example, instead of allowing an open text field for country or state, use a drop-down list instead.
Sample custom Marketo smart campaign’s smart list for Country data normalization process
Sample custom Marketo smart campaign’s flow for Country data normalization process
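If you prefer to handle normalization outside of smart campaigns, here is a minimal Python sketch of the same Country standardization idea. The mapping values are examples rather than a complete list, and the field names are hypothetical.

```python
import pandas as pd

# Map the messy variants you actually see in your database to standard values
COUNTRY_MAP = {
    "us": "United States", "usa": "United States", "u.s.": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom", "u.k.": "United Kingdom", "great britain": "United Kingdom",
    "deutschland": "Germany",
}

def normalize_country(value):
    if not isinstance(value, str) or not value.strip():
        return value                      # leave blanks for the missing-data pass
    key = value.strip().lower()
    return COUNTRY_MAP.get(key, value.strip().title())

leads = pd.DataFrame({"email": ["a@x.com", "b@y.com"], "country": ["USA ", "u.k."]})
leads["country"] = leads["country"].apply(normalize_country)
print(leads)
```

As with the smart campaign approach, confirm what your CRM sync will do with the updated values before writing them back into Marketo.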
Why this matters: If you have multiple data import sources, naturally, some data may be missing. If you know that certain data is key for your upcoming year’s campaigns, running tests to find missing data (field is empty) can help you locate or update that info in advance. You might just have added a few thousand new sales-ready leads to your upcoming offer!
Pro Tip: Create a custom “Missing Data Lookup” smart list for just this purpose and keep it in your testing folder. In tandem, create a custom view to see the results. If you find a pattern of missing data, see about enacting a change in your organization to collect that data at the right touchpoint — such as adding progressive profiling to gather that information via Marketo forms or making certain fields required.
Sample custom Marketo Missing Data Lookup smart list
Sample custom Marketo Missing Data Lookup view
Why this matters: Hide old and confusing fields so that your team is sure to use the right ones in those complex, expensive campaigns you’ll be running this year. Also, if you have any updates scheduled for your APIs, you and your developer will have an easier time mapping everything. You’ll thank yourself when you’re working with your team to get that quick campaign out the door on Friday just before 5pm.
Pro Tip: If you want to hide a field from a sync between Marketo and Salesforce, you will need to pause the Salesforce/Marketo sync, remove visibility of the field in Salesforce, hide the field on the Marketo end, refresh the schema in Marketo, then turn the sync back on.
Sample hidden Marketo Custom Field
Why this matters: There’s nothing more frustrating than pulling a report and seeing a scattered list of items that should have the same naming convention, but don’t, which leads you to spend time combining data points in Excel. Eliminate this as best you can by creating a naming structure for your assets and going through to ensure everything is named correctly.
Pro Tip: You can use Marketo APIs to pull program names, landing page names, list names, etc. Review which ones need naming attention and click through directly to adjust their naming using the exported URL. After you finish the clean-up process, be sure to document the naming convention and ensure others are using it as well.
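A hedged sketch of that export step using the Marketo REST Asset API. The endpoint paths follow Marketo’s public documentation, but the Munchkin ID, credentials, and naming convention check are placeholders you would adapt to your instance.

```python
import requests

BASE = "https://<munchkin-id>.mktorest.com"          # replace with your instance
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"

# 1. Exchange API credentials for an access token
token = requests.get(
    f"{BASE}/identity/oauth/token",
    params={"grant_type": "client_credentials",
            "client_id": CLIENT_ID, "client_secret": CLIENT_SECRET},
).json()["access_token"]

# 2. Page through program names (the Asset API returns up to 200 per call)
headers = {"Authorization": f"Bearer {token}"}
offset, programs = 0, []
while True:
    resp = requests.get(f"{BASE}/rest/asset/v1/programs.json",
                        params={"maxReturn": 200, "offset": offset},
                        headers=headers).json()
    batch = resp.get("result") or []
    programs.extend(p["name"] for p in batch)
    if len(batch) < 200:
        break
    offset += 200

# 3. Flag names that break a hypothetical "YYYY-MM - Channel - Name" convention
nonconforming = [n for n in programs if n.count(" - ") < 2]
print(f"{len(nonconforming)} of {len(programs)} programs need naming attention")
```

The same pattern works for landing pages and static lists by swapping the asset endpoint.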
If you need help with any of the above, or something much more complicated, please reach out! My team and I would love to see how we can help you and your organization with marketing automation.
It’s 2022! Take a moment and think about your Marketo instance. When was the last time you focused purely on data hygiene for marketing automation?
If you’re like most Marketo Admins, it’s tough to find the time – but you know how important it is. Never fear, I’m not judging you! However, if you’re looking for a New Year’s Resolution, then now is the perfect time to refresh your data.
Why now? You’ll want to wait until all the data comes in and settles for December 2021, so you have a full year’s worth of data to compare reporting. We’ll call this your baseline. This approach will help demonstrate an improvement of campaign results and increased income thanks to your January 2022 labor investment. Make a note of this for your year-end review!
If your boss is particularly numbers-driven, you might want to share that, according to a 2017 Gartner study — bad data contributes to an estimated average loss of $15 million per organization, per year! Related, the Gartner Marketing Data and Analytics Survey 2020 found that “fifty-four percent of senior marketing respondents in the survey indicate that marketing analytics has not had the influence within their organizations that they expected.” This is primarily because of poor data quality and data findings conflicting with intended courses of action.
Let’s face it: bad data not only means your efforts might not reach intended audiences, but it also contributes to mistrust of data accuracy for decision-making. This can reduce your boss’ perceived value of marketing automation tools like Marketo, possibly leading to a decrease in the value of your input.
Your order of operations in cleaning the marketing automation database is important. If done in the wrong order, it can turn into some redundant work. Here is my first set of suggested tasks for a quick win:
Why this matters: If a lead’s email appears within the system more than once, Marketo will email the most recently updated one. However, activity could be attributed to the previously created account. This can cause issues downstream. For example, perhaps Sales looks at the wrong version of a profile and then reaches out to the customer only to feel embarrassed when the customer corrects the Salesperson — leading to an angry call to your desk phone to complain. Or, lead scores split between the old and new profiles, thereby not moving a potentially valuable lead through the lifecycle to MQL or SQL at the right time… possibly leading to loss of revenue. Don’t let this be you. Merge those dupes! And while you’re at it, investigate why dupes are being created in the first place.
Pro Tip: If you don’t have a CRM, use the Marketo System Smart List called ‘Possible Duplicates’ to review and merge dupes. If you have a CRM, such as Salesforce or Microsoft Dynamics 365, you might want to perform the merge in the CRM and allow de-duplication to sync back to Marketo. This is especially key for Microsoft Dynamics.
Screenshot from Marketo Database – System Smart Lists – Possible Duplicates
Screenshot from Salesforce documentation to Merge Duplicate Leads in Lightning Experience
Screenshot from Microsoft Dynamics 365 documentation to Merge duplicate records for accounts, contacts, or leads
Why this matters: Even if you think that leads that stumbled across your content were ‘organic and free,’ that’s not totally true. It costs money to operate systems and teams. So, you want to make sure you’re optimizing every lead generation opportunity that you have so you can accurately trace back the cost per lead and even add in the revenue per lead if you’re able. For example, what if John Smith was a major purchaser at his company, but he mistakenly filled out an intake form with his email as ‘john.smith@gmail.comm’? Your Marketo system will eventually mark his email invalid… only for you to find out months later. Think of the potential loss of revenue! Also, be sure to do this task before database deletions, because some of these invalid emails might be salvageable.
Pro Tip: To automate this process, you can add custom code to your Marketo forms to prevent spam or mistyped email addresses from entering the system. Or, to handle email addresses already in your system, look through the System Smart List ‘Bounced Email Addresses’, and manually or automatically (smart campaign) fix the errors. During this process, create a “cheat sheet” smart list of the email errors you most commonly encounter — such as “@gmail.comm” or “@yahho.com.” Then, run this smart list to look at the full database for any other errors that come up.
Sample custom Marketo smart list to find email errors
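For the “cheat sheet” idea, here is a small Python sketch that scans an exported lead list for common typo domains. The domain list, file name, and column header are illustrative; extend them with whatever your bounce reports actually show.

```python
import csv

TYPO_DOMAINS = {
    "gmail.comm": "gmail.com",
    "gmial.com": "gmail.com",
    "yahho.com": "yahoo.com",
    "hotmial.com": "hotmail.com",
}

def suggest_fix(email):
    """Return a corrected address if the domain matches a known typo, else None."""
    local, _, domain = email.strip().lower().partition("@")
    fixed = TYPO_DOMAINS.get(domain)
    return f"{local}@{fixed}" if fixed else None

# Hypothetical export of email addresses from a Marketo static list
with open("lead_export.csv", newline="") as f:
    for row in csv.DictReader(f):
        fix = suggest_fix(row["Email Address"])
        if fix:
            print(f'{row["Email Address"]} -> {fix}')
```

The output doubles as a review list before you apply corrections manually or via a smart campaign.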
Why this matters: Your Marketo instance price includes lead volume in your database. While leads might still need to live in your CRM, you just might not need them in Marketo. Bad leads might be dragging down your marketing analytics reports. Imagine having a clean database full of only marketable leads! Some examples of types of data that should be deleted include people with no email address, those who have invalid emails, or customers who have unsubscribed. Once you perform this task, you can then set up smart campaigns to automate approved database deletions regularly, such as every month or every quarter.
Pro Tip: Ahead of deleting leads, you will want to ensure that any connected database, such as Salesforce, does not re-sync these people back over into Marketo. You will need to create a Custom Field in Salesforce and Marketo, if you’re using Salesforce. Then, contact Marketo Support to add a rule to not sync those leads back into Marketo. Of note, rules can be setup on the CRM side that can unmark the Custom Field so leads can sync back over if needed. For more information on this, here’s a great post from the Marketo Nation community.
Sample custom Marketo smart list to find emails recommended for deletion
If you need help with any of the above, or something much more complicated for your marketing automation project, please reach out! My team and I would love to see how we can help you and your organization.
Next week, I will post more tips for sprucing up your Marketo instance for the new year! Stay tuned!
In a recent Technology Tapas article, I mentioned identifying your data assets to understand how you might use data, what it can tell you about your business, and where there might be gaps in information. You may find your organization has lots of ideas, but you’re unsure where to start. That’s okay! The best way to get started is to pick one business challenge you know the organization is passionate about and where you would like to see better data. Then, ask some questions:
Sometimes we look at a financial number or an operational metric because it is what the organization has looked at for a long time, but we have lost the reason for looking at it. Go back to the ‘why’. Does it tell you about profitability, efficiency, or growth? You may uncover some hidden gems when you do this.
It is easy to say that something is wrong with data, but many organizations have limited knowledge about how it is produced and maintained. If you can’t put a name (not an organization) to the data asset, start thinking about why that is. You may find that no one owns it or is accountable for it.
Much like having a specific owner for data assets, it is important to know how the data is used in business processes. Sometimes data looks good at the department level but does not roll up well to the corporate level. It’s possible that expectations about data quality differ.
For example, if your organization receives data from a third party and it regularly contains incorrect spellings of state names, but your data ingestion team fixes it before you see it in your reports, does that meet quality standards? Can you get the third party to fix it before it hits your door, eliminating some work for the ingestion team? If you didn’t have this data asset, what would you use instead to provide this same value, the same ‘why’?
Asking this may lead to other assets or discoveries of data with which you are not so familiar. One company found their focus on widgets was no longer giving the pulse of the business. They needed to shift to customers, and not just customers overall, but customers by business segment, as they had quite different demographics.
At the end of the day, you want to be able to tell a story about your data and what it means to your company. As you do that, you are starting to understand the value, lineage, and governance around the data. We will talk more on these topics in a future Technology Tapas blog post.
Check out our podcast series on Intelligent Data to learn more about how data is being used in the industry and how you can leverage your data assets.
According to the IDC’s Worldwide Artificial Intelligence Spending Guide, implementing AI has become a necessity for businesses to become more agile, innovate, and scale. And it appears that more companies are coming to terms with this as global spending is expected to reach more than $110 billion by 2024.
In season 1 episode 3 of the Intelligent Data Podcast, host Arvind Murali and his guest Christine Livingston, Perficient’s former Managing Director and Chief Strategist of AI, discuss trends in AI, the value of big data and data quality, supervised learning, AI ethics, and more.
Data is the single most important element to the success of your machine learning program. – Christine Livingston, Managing Director and Chief AI Strategist, Perficient
And don’t forget to subscribe, rate and review!
Apple | Google | Spotify | Amazon | Stitcher | Pocket Casts
Arvind Murali, Perficient Principal and Chief Strategist
If you are interested in learning more about Perficient’s AI services capabilities or would like to contact us, click here. Our experts can help you start implementing AI in a meaningful way no matter where you fall on the AI maturity model.
The importance of data and analytics will continue to grow in 2020 and there are ten trends your organization should take note of to stay competitive. In the video below, I’ve outlined these ten trends and what you can do to stay on top of them.
The goal of any data quality program is to improve the quality of data at the source. Once a financial institution’s data lineage capabilities are in place, a key starting point for data quality initiatives is the confirmation of critical data attributes for each major business line and functional area.
The data quality program should define data rules across multiple categories: completeness, validity, consistency, timeliness, and accuracy. These critical attributes should be measured against those rules in real time through a data quality monitoring capability.
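As a minimal sketch of what rules in those categories can look like when expressed in code (the column names, formats, and freshness threshold are assumptions):

```python
import pandas as pd
from datetime import datetime, timedelta

def evaluate_rules(trades: pd.DataFrame) -> dict:
    """Return pass rates for a few illustrative rules across the core DQ categories."""
    now = datetime.utcnow()
    return {
        # Completeness: counterparty must be populated on every trade
        "completeness_counterparty": trades["counterparty_id"].notna().mean(),
        # Validity: notional must be a positive number
        "validity_notional_positive": (trades["notional"] > 0).mean(),
        # Consistency: currency codes follow the ISO 4217 three-letter pattern
        "consistency_currency_iso": trades["currency"].str.fullmatch(r"[A-Z]{3}").mean(),
        # Timeliness: records must have arrived within the last 24 hours
        "timeliness_within_24h": (now - trades["load_ts"] < timedelta(hours=24)).mean(),
    }

trades = pd.DataFrame({
    "counterparty_id": ["CP1", None, "CP3"],
    "notional": [1_000_000, -5, 250_000],
    "currency": ["USD", "usd", "EUR"],
    "load_ts": [datetime.utcnow()] * 3,
})
print(evaluate_rules(trades))
```

Accuracy is the harder category to automate, since it usually requires comparison against an authoritative reference source rather than a standalone rule.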
When the data quality rules are breached, the reasons for the breach should be investigated proactively and a proposed fix should be identified and remediated. For quick fixes, the data owners can be notified to correct and provide re-feeds or to make the necessary updates for the next delivery.
We recently published a guide that explores the building blocks (i.e., data governance components) of data governance, which can help drive better business decisions, enhance regulatory compliance, and improve risk management. You can download it here.
The business capabilities presented in our new guide demonstrate how forward-thinking financial services companies are leveraging data governance to create value for the enterprise. Accurate and timely information continues to be a key driver of better decision making.
Capabilities such as data principles and strategy, data architecture, organizational roles, authoritative sources, data lineage, data quality, and data contracts can be used individually or in concert to create new value for financial management, regulators, or risk management. Leading firms are leveraging these capabilities to maintain excellence in a highly competitive marketplace.
Through technological advances and well-defined business capabilities, new paradigms have been created for leveraging data governance to accelerate value for financial services organizations.