data governance Articles / Blogs / Perficient

SAP and Databricks: Better Together

Across industries like manufacturing, energy, life sciences, and retail, data drives decisions on durability, resilience, and sustainability. A significant share of this critical data resides in SAP systems, which is why so many businesses have invested in SAP Datasphere. SAP Datasphere is a comprehensive data service that enables seamless access to mission-critical business data across SAP and non-SAP systems. It acts as a business data fabric, preserving the semantic context, relationships, and logic of SAP data. Datasphere empowers organizations to unify and analyze their enterprise data landscape without the need for complex extraction or rebuilding processes.

No single platform architecture can satisfy all the needs and use cases of large, complex enterprises, so SAP partnered with a small handful of companies to enhance and enlarge the scope of its offering. Databricks was selected to deliver bi-directional integration with its Databricks Lakehouse platform. This blog explores the key features of SAP Datasphere and Databricks, their complementary roles in modern data architectures, and the business value they deliver when integrated.

What is SAP Datasphere?

SAP Datasphere is designed to simplify data landscapes by creating a business data fabric. It enables seamless and scalable access to SAP and non-SAP data with its business context, logic, and semantic relationships preserved. Key features of the data fabric include:

  • Data Cataloging
    Centralized metadata management and lineage.
  • Semantic Modeling
    Retaining relationships, hierarchies, and KPIs for analytics.
  • Federation and Replication
    Choose between connecting or replicating data.
  • Data Pipelines
    Automated, resilient pipelines for SAP and non-SAP sources.

What is Databricks?

Databricks pioneered the data lakehouse: a unified platform that combines the scalability and flexibility of a data lake with the structure and performance of a data warehouse. It is designed to store all types of data (structured, semi-structured, unstructured) and support diverse workloads, including business intelligence, real-time analytics, machine learning, and artificial intelligence. Key characteristics of the lakehouse include:

  • Unified Data Storage
    Combines the scalability and flexibility of a data lake with the structured capabilities of a data warehouse.
  • Supports All Data Types
    Handles structured, semi-structured, and unstructured data in a single platform.
  • Performance and Scalability
    Optimized for high-performance querying, batch processing, and real-time analytics.
  • Simplified Architecture
    Eliminates the need for separate data lakes and data warehouses, reducing duplication and complexity.
  • Advanced Analytics and AI
    Provides native support for machine learning, predictive analytics, and big data processing.
  • ACID Compliance
    Ensures reliability and consistency for transactional and analytical workloads using features like Delta Lake (see the short sketch after this list).
  • Cost-Effectiveness
    Reduces infrastructure and operational costs by consolidating data architectures.
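
To make the ACID point concrete, here is a minimal PySpark sketch of a Delta Lake MERGE (upsert). It assumes a Databricks notebook where `spark` is already provided, and the catalog, schema, table, and column names are illustrative placeholders rather than part of any specific SAP or Databricks integration.

```python
# Minimal illustration of Delta Lake's transactional MERGE (upsert).
# Assumes a Databricks notebook where `spark` is provided; names are placeholders.
from delta.tables import DeltaTable

# Incoming changes, e.g. landed from an upstream system
updates = spark.createDataFrame(
    [(1, "active"), (2, "churned")],
    ["customer_id", "status"],
)

target = DeltaTable.forName(spark, "main.crm.customers")  # hypothetical Unity Catalog table

(
    target.alias("t")
    .merge(updates.alias("u"), "t.customer_id = u.customer_id")
    .whenMatchedUpdateAll()       # update rows that already exist
    .whenNotMatchedInsertAll()    # insert rows that are new
    .execute()                    # commits as a single ACID transaction
)
```

Readers querying the table never see a half-applied merge, which is the reliability guarantee the bullet above refers to.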

How do they complement each other?

While each architecture has pros and cons, the point of this partnership is that these two architectures are better together. Consider a retail company that combines SAP Datasphere’s enriched sales and inventory data with Databricks Lakehouse’s real-time analytics capabilities. By doing so, they can optimize pricing strategies based on demand forecasts while maintaining a unified view of their data landscape. Data-driven enterprises can achieve the following goals by combining these two architectures.

  • Unified Data Access Meets Unified Processing Power
    A data fabric excels at connecting data across systems while retaining semantic context. Integrating with a lakehouse allows organizations to bring this connected data into a platform optimized for advanced processing, AI, and analytics, enhancing its usability and scalability.
  • Advanced Analytics on Connected Data
    While a data fabric ensures seamless access to SAP and non-SAP data, a lakehouse enables large-scale processing, machine learning, and real-time insights. This combination allows businesses to derive richer insights from interconnected data, such as predictive modeling or customer 360° analytics.
  • Data Governance and Security
    Data fabrics provide robust governance by maintaining lineage, metadata, and access policies. Integrating with a lakehouse ensures these governance frameworks are applied to advanced analytics and AI workflows, safeguarding compliance while driving innovation.
  • Simplified Data Architectures
    Integrating a fabric with a lakehouse reduces the complexity of data pipelines. Instead of duplicating or rebuilding data in silos, organizations can use a fabric to federate and enrich data and a lakehouse to unify and analyze it in one scalable platform.
  • Business Context for Data Science
    A data lakehouse benefits from the semantic richness provided by the data fabric. Analysts and data scientists working in the lakehouse can access data with preserved hierarchies, relationships, and KPIs, accelerating the development of business-relevant models. On top of that, additional use cases enabled by Generative AI are still emerging.

Conclusion

The integration of SAP Datasphere and the Databricks Lakehouse represents a transformative approach to enterprise data management. By uniting the strengths of a business data fabric with the advanced analytics and scalability of a lakehouse architecture, organizations can drive better decisions, foster innovation, and simplify their data landscapes. Whether it’s unifying SAP and non-SAP data, enabling real-time insights, or scaling AI initiatives, this partnership provides a roadmap for the future of data-driven enterprises.

Contact us to learn more about how SAP Datasphere and Databricks Lakehouse working together can help supercharge your enterprise.

 

A New Era of AI Agents in the Enterprise?

In a move that has sparked intense discussion across the enterprise software landscape, Klarna announced its decision to drop both Salesforce Sales Cloud and Workday, replacing these industry-leading platforms with its own AI-driven tools. This announcement, led by CEO Sebastian Siemiatkowski, may signal a paradigm shift toward using custom AI agents to manage critical business functions such as customer relationship management (CRM) and human resources (HR). While mostly social media fodder at this point, this very public bet on SaaS replacement has raised important questions about the future of enterprise software and how Agentic AI might reshape the way businesses operate.

AI Agents – Impact on Enterprises

Klarna’s move may be a one-off internal pivot, or it may signal broader shifts that impact enterprises worldwide. Here are three ways this transition could affect the broader market:

  1. Customized AI Over SaaS for Competitive Differentiation Enterprises are always on the lookout for ways to differentiate themselves from the competition. Klarna’s decision may reflect an emerging trend: companies developing custom Agentic AI solutions to better tailor workflows and processes to their specific needs. The advantage here lies in having a system that is purpose-built for an organization’s unique requirements, potentially driving innovation and efficiencies that are difficult to achieve with out-of-the-box software. However, this approach also raises challenges. Building Agentic AI solutions in-house requires significant technical expertise, resources, and time. Not all companies will have the bandwidth to undertake such a transformation, but for those who do, it could become a key differentiator in terms of operational efficiency and personalized customer experiences.
  2. Shift in Vendor Relationships and Power Dynamics If more enterprises follow Klarna’s lead, we could see a shift in the traditional vendor-client dynamic. For years, businesses have relied on SaaS providers like Salesforce and Workday to deliver highly specialized, integrated solutions. However, AI-driven automation might diminish the need for comprehensive, multi-purpose platforms. Instead, companies might lean towards modular, lightweight tech stacks powered by AI agents, allowing for greater control and flexibility. This shift could weaken the power and influence of SaaS providers if enterprises increasingly build customized systems in-house. On the other hand, it could also lead to new forms of partnership between AI providers and SaaS companies, where AI becomes a layer on top of existing systems rather than a full replacement.
  3. Greater Focus on Data and Compliance Risks With AI agents handling sensitive business functions like customer management and HR, companies like Klarna must ensure that data governance, compliance, and security are up to the task. This shift toward Agentic AI requires robust mechanisms to manage customer and employee data, especially in industries with stringent regulatory requirements, like finance and healthcare. Marc Benioff, Salesforce’s CEO, raised these concerns directly, questioning how Klarna will handle compliance, governance, and institutional memory. AI might automate many processes, but without the proper safeguards, it could introduce new risks that legacy SaaS providers have long addressed. Enterprises looking to follow Klarna’s example will need to rethink how they manage these critical issues within their AI-driven frameworks.

AI Agents – SaaS Vendors Respond

As enterprises explore the potential of Agentic AI-driven systems, SaaS providers like Salesforce and Workday must adapt to a new reality. Klarna’s decision could be the first domino in a broader shift, forcing these companies to reconsider their own offerings and strategies. Here are three possible responses we could see from the SaaS giants:

  1. Doubling Down on AI Integration Salesforce and Workday are not standing still. In fact, both companies are already integrating AI into their platforms. Salesforce’s Einstein and the newly introduced Agentforce are examples of AI-powered tools designed to enhance customer interactions and automate tasks. We might see a rapid acceleration of these efforts, with SaaS providers emphasizing Agentic AI-driven features that keep businesses within their ecosystems rather than prompting them to build in-house solutions. However, as Benioff pointed out, the key might be blending AI with human oversight rather than replacing humans altogether. This hybrid approach will allow Salesforce and Workday to differentiate themselves from pure AI solutions by ensuring that critical human elements—like decision-making, customer empathy, and regulatory knowledge—are never lost.
  2. Building Modular and Lightweight Offerings Klarna’s move underscores the desire for flexibility and control over tech stacks. In response, SaaS companies may offer more modular, API-driven solutions that allow enterprises to mix and match components based on their needs. This would enable businesses to take advantage of best-in-class SaaS features without being locked into a monolithic platform. By offering modular systems, Salesforce and Workday could cater to enterprises looking to integrate AI while maintaining the core advantages of established SaaS infrastructure—such as compliance, security, and data management.
  3. Strengthening Data Governance and Compliance as Key Differentiators As AI grows in influence, data governance, compliance, and security will become critical battlegrounds for SaaS providers. SaaS companies like Salesforce and Workday have spent years building trusted systems that comply with various regulatory frameworks. Klarna’s AI approach will be closely scrutinized to ensure it meets these same standards, and any slip-ups could provide an opening for SaaS vendors to argue that their systems remain the gold standard for enterprise-grade compliance. By doubling down on their strengths in these areas, SaaS vendors could position themselves as the safer, more reliable option for enterprises that handle sensitive or regulated data. This approach could attract companies that are hesitant to take the AI plunge without fully understanding the risks.

What’s Next?

Klarna’s decision to replace SaaS platforms with a custom AI system may represent a significant shift in the enterprise software landscape. While this move highlights the growing potential of AI to reshape key business functions, it also raises important questions about governance, compliance, and the long-term role of SaaS providers. As organizations worldwide watch Klarna’s big bet play out, it’s clear that we are entering a new phase of enterprise software evolution—one where the balance between AI, human oversight, and SaaS will be critical to success.

What do you think? Is Klarna’s move a sign of things to come, or will it encounter challenges that reaffirm the importance of traditional SaaS systems? Let’s continue the SaaS replacement conversation in the comments below!

Agentic AI: The New Frontier in GenAI

In the rapidly evolving landscape of digital transformation, businesses are constantly seeking innovative ways to enhance their operations and gain a competitive edge. While Generative AI (GenAI) has been the hot topic since OpenAI introduced ChatGPT to the public in November 2022, a new evolution of the technology is emerging that promises to revolutionize how businesses operate: Agentic AI. 

What is Agentic AI? 

Agentic AI represents a fundamental shift in how we approach intelligence within digital systems.  

Unlike the first wave of Generative AI solutions that rely heavily on prompt engineering, agentic AI possesses the ability to make autonomous decisions based on predefined goals, adapting in real time to changing environments. This enables a deeper level of interaction, as agents are able to “think” through the steps of a task in a more structured and planned way. With access to web search, outputs are more researched and comprehensive, transforming both efficiency and innovation potential for businesses.

Key characteristics of Agentic AI include: 

  • Autonomy: Ability to perform tasks independently based on predefined goals or dynamically changing circumstances.
  • Adaptability: Learns from interactions, outcomes, and feedback to make better decisions in the future.
  • Proactivity: Not only responds to commands but can anticipate needs, automate tasks, and solve problems proactively.

As technology evolves at an unprecedented rate, agentic AI is positioned to become the next big thing in tech and business transformation, building upon the foundation laid by generative AI while enhancing automation, resource utilization, scalability, and specialization across various tasks. 

Leveraging Agentic Frameworks 

Central to this transformation is the concept of the Augmented Enterprise, which leverages advanced technologies to amplify human capabilities and business processes. Agentic Frameworks provide a structured approach to integrating autonomous systems and artificial intelligence (AI) into the enterprise. 

Agentic Frameworks refer to the strategic models and methodologies that enable organizations to deploy and manage autonomous agents—software entities that perform tasks on behalf of users or other systems. Use cases include code development, content creation, and more.  

Unlike traditional approaches that require explicit programming for each sequence of tasks, Agentic Frameworks provide the business integrations to the model and allow it to decide what system calls are appropriate to achieve the business goal.  
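
As a rough illustration of that idea, the sketch below shows a bare-bones agent loop in Python. It is not any particular framework's API: `call_model` is a hypothetical stand-in for an LLM call that returns either a tool request or a final answer, and the two registered "business integrations" are placeholder functions.

```python
# A bare-bones, framework-agnostic agent loop. `call_model` is a hypothetical
# stand-in for an LLM call that returns either a tool request or a final answer;
# the registered tools are placeholder business integrations.
from typing import Callable, Dict, List

def get_open_invoices(customer_id: str) -> str:
    return f"2 open invoices found for {customer_id}"      # placeholder integration

def send_payment_reminder(customer_id: str) -> str:
    return f"reminder queued for {customer_id}"            # placeholder integration

TOOLS: Dict[str, Callable[[str], str]] = {
    "get_open_invoices": get_open_invoices,
    "send_payment_reminder": send_payment_reminder,
}

def run_agent(goal: str, call_model: Callable, max_steps: int = 5) -> str:
    history: List[str] = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        decision = call_model(history, list(TOOLS))          # the model plans the next step
        if decision["type"] == "final":
            return decision["answer"]                        # goal reached
        result = TOOLS[decision["tool"]](decision["argument"])  # agent-chosen system call
        history.append(f"{decision['tool']}({decision['argument']}) -> {result}")
    return "Stopped: step limit reached without completing the goal."
```

The point of the sketch is the control flow: the model, not the developer, decides which registered integration to invoke at each step, with a step limit and an accumulated history acting as simple guardrails.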

“The integration of agentic AI through well-designed frameworks marks a pivotal moment in business evolution. It’s not just about automating tasks; it’s about creating intelligent systems that can reason, learn, and adapt alongside human workers, driving innovation and efficiency to new heights.” – Robert Bagley, Director 

Governance and Ethical Considerations 

As we embrace the potential of agentic AI and our AI solutions begin acting on our behalf, developing robust AI strategy and governance frameworks becomes more essential. With the increasing complexity of regulatory environments, Agentic Frameworks must include mechanisms for auditability, compliance, and security, ensuring that the deployment of autonomous agents aligns with legal and ethical standards. 

“In the new agentic era, the scope of AI governance and building trust should expand from ethical compliance to include procedural compliance. As these systems become more autonomous, they must both operate within ethical boundaries and align with our organizational values. This is where thoughtful governance becomes a competitive advantage.” – Robert Bagley, Director 

To explore how your enterprise can benefit from Agentic Frameworks, implement appropriate governance programs, and become a truly Augmented Enterprise, reach out to Perficient’s team of experts today. Together, we can shape the future of your business in the age of agentic AI. 

 

Maximize Your Data Management with Unity Catalog

Databricks Unity Catalog is a unified and open governance solution for data and AI, built into the Databricks Data Intelligence Platform.

Unity Catalog offers a comprehensive solution for enhancing data governance, operational efficiency, and technological performance. By centralizing metadata management, access controls, and data lineage tracking, it simplifies compliance, reduces complexity, and improves query performance across diverse data environments. The seamless integration with Delta Lake unlocks advanced technical features like predictive optimization, leading to faster data access and cost savings. Unity Catalog plays a crucial role in machine learning and AI by providing centralized data governance and secure access to consistent, high-quality datasets, enabling data scientists to efficiently manage and access the data they need while ensuring compliance and data integrity throughout the model development lifecycle.

Unity Catalog brings governance to data across your enterprise. Lakehouse Federation capabilities in Unity Catalog allow you to discover, query, and govern data across data platforms including MySQL, PostgreSQL, Amazon Redshift, Snowflake, Azure SQL Database, Azure Synapse, Google’s BigQuery, and more from within Databricks without moving or copying the data, all within a simplified and unified experience. Unity Catalog supports advanced data-sharing capabilities with Delta Sharing, enabling secure, real-time data sharing across organizations and platforms without the need for data duplication. Additionally, Unity Catalog facilitates the creation of secure data Clean Rooms, where multiple parties can collaborate on shared datasets without compromising data privacy. Its support for multi-cloud and multi-region deployments ensures operational flexibility and reduced latency, while robust security features, including fine-grained access controls, automated compliance auditing, and encryption, help future-proof your data infrastructure.
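
As a rough sketch of what Lakehouse Federation setup can look like in practice, the Python cell below registers a PostgreSQL connection and a foreign catalog and then queries it in place. The connection, host, secret scope, and catalog names are placeholders, and the exact option keys should be verified against current Databricks documentation.

```python
# Hedged sketch: federate a PostgreSQL database into Unity Catalog and query it
# without copying the data. All names and option values are placeholders.
spark.sql("""
    CREATE CONNECTION IF NOT EXISTS postgres_sales TYPE postgresql
    OPTIONS (
        host 'pg.example.internal',
        port '5432',
        user secret('federation', 'pg_user'),
        password secret('federation', 'pg_password')
    )
""")

spark.sql("""
    CREATE FOREIGN CATALOG IF NOT EXISTS sales_federated
    USING CONNECTION postgres_sales
    OPTIONS (database 'sales')
""")

# Federated tables are then governed and queried like any other Unity Catalog object.
spark.sql("SELECT COUNT(*) AS order_count FROM sales_federated.public.orders").show()
```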

These capabilities position your organization for scalable, secure, and efficient data management, driving innovation and maintaining a competitive edge. However, this fundamental transition will need to be implemented with minimal disruption to ongoing operations. This is where the Unity Catalog Migration Tool comes into play.

Unity Catalog Migration Tool

UCX, or the Unity Catalog Migration Tool, is an open-source project from Databricks Labs designed to streamline and automate the Unity Catalog migration process. UCX automates much of the work involved in transitioning to Unity Catalog, including migrating metadata, access controls, and governance policies. Migrating metadata ensures the enterprise will have access to its data and AI assets after the transition. In addition to data, the migration tool ensures that security policies and access controls are accurately transferred and enforced in Unity Catalog. This capability is critical for maintaining data security and compliance during and after migration.

Databricks is continually developing UCX to better ensure that all your data assets, governance policies, and security controls are seamlessly transferred to Unity Catalog with minimal disruption to ongoing operations. Tooling and automation help avoid costly downtime or interruptions in data access that could impact business performance, thereby maintaining continuity and productivity. While it is true that automating these processes significantly reduces the time, effort, and cost required for migration, the process is not automatic. Evaluation, planning, quality control, change management, and additional coding and development tasks need to be performed along with, and outside of, the tool. This need for knowledge and expertise is where Unity Catalog migration partners come into play.

Unity Catalog Migration Partner

An experienced Unity Catalog migration partner leads the process of transitioning your data assets, governance policies, and security controls by planning, executing, and managing the migration process, ensuring that it is smooth, efficient, and aligned with your organization’s data governance and security requirements. Their duties typically include assessing the current data environment, designing a custom migration strategy, executing the migration while minimizing downtime and disruptions, and providing post-migration support to optimize Unity Catalog’s features. Additionally, they offer expertise in data governance best practices and technical guidance to enhance your organization’s data management capabilities.

Databricks provides its system integrators with tools, guidance and best practices to ensure a smooth transition to Unity Catalog. Perficient has built upon those valuable resources to enable a more effective pipeline with our Unity Catalog Migration Accelerator.

Unity Catalog Migration Accelerator

Our approach to Unity Catalog migration is differentiated by our proprietary Accelerator, which includes a suite of project management artifacts and comprehensive code and data quality checks. This Accelerator streamlines the migration process by providing a structured framework that ensures all aspects of the migration are meticulously planned, tracked, and executed, reducing the risk of errors and delays. The built-in code and data quality checks automatically identify and resolve potential issues before they become problems, ensuring a seamless transition with minimal impact on business operations. By leveraging our Accelerator, clients benefit from a more efficient migration process, higher data integrity, and enhanced overall data governance, setting us apart from other Unity Catalog migration partners who may not offer such tailored and robust solutions.

In summary, Unity Catalog provides a powerful solution for modernizing data governance, enhancing performance, and supporting advanced data operations like machine learning and AI. With our specialized Unity Catalog migration services and unique Accelerator, we offer a seamless transition that optimizes data management and security while ensuring data quality and operational efficiency. If you’re ready to unlock the full potential of Unity Catalog and take your data infrastructure to the next level, contact us today to learn how we can help you achieve a smooth and successful migration. Contact us for a complimentary Migration Analysis and let’s work together on your data and AI journey!

Risk Management Data Strategy – Insights from an Inquisitive Overseer

We are witnessing a sea change in the way data is managed by banks and financial institutions all over the world. Data being commoditized and, in some cases, even monetized by banks is the order of the day, though adoption still needs a push in the risk management function. Traditional risk managers, by their job definition, are highly cautious of the result sets provided by the analytics teams. I have even heard the phrase “Please check the report, I don’t understand the models and hence trust the number”.

So, in the risk function, while this is a race for data aggregation, structured data, unstructured data, data quality, data granularity, news feeds, and market overviews, it’s also a challenge from an acceptance perspective. The vision is that all of the data can be aggregated, harmonized, and used for better, faster, and more informed decision making for financial and non-financial risk management. The interdependencies between the risks were factors that were not considered in the “Good Old Days” of risk management (pun intended).

Based on my experience, here are the common issues faced by banks that run the risk of not having a good risk data strategy.

1. The IT-Business tussle (“YOU don’t know what YOU are doing”)

This, in my view, is the biggest challenge facing traditional banks, especially in the risk function. “The Business”, in traditional banks, is treated like a larger-than-life entity that needs to be supported by IT. This notion of IT being the service provider, whilst business is the “bread-earner”, especially in traditional banks’ risk departments, no longer holds. It has been proven time and again that the two cannot function without each other, and that is the mindset that needs to be cultivated in management for the strategic data management effort as well. This is a culture change, but it’s happening slowly and will have to be adopted industry-wide. It has been proven that the financial institutions with the most organized data have a significant market advantage.

2. Data Overload (“Dude! where’s my Insight”)

The primary goal of any data management, sourcing, and aggregation effort has to be converting data into informational insights. The teams analyzing the data warehouses and data lakes and aiding the analytics will have to keep this one major organizational goal in mind. Banks have silos; these silos have been created by mergers, regulations, entities, risk types, Chinese walls, data protection, land laws, or sometimes just technological challenges over time. The solution to most of this is to start with a clean slate. A management mandate for getting the right people to talk and be vested in this change is crucial; challenging, but crucial. Good old analysis techniques and brainstorming sessions for weeding out what is unnecessary and getting to the right set of elements are the key. This needs an overhaul in the way the banking business has traditionally looked at data, i.e., as something that is needed for reporting. Understanding the data lineage and touchpoint systems is most crucial.

3. The CDO Dilemma (“To meta or not to meta”)

The CDO’s role in most banks is now well defined. The risk and compliance analytics and reporting division depends almost solely on the CDO function for insights on regulatory reporting and other forms of innovative data analytics. The key success factor for the CDO organization lies in allocating the right set of analysts to the business areas. A CDO analyst on the market risk side, for instance, will have to be well versed with market data, bank hierarchies, VaR calculation engines, Risk not in VaR (RNiV), and supporting reference data, in addition to the trade systems data that these data elements will have a direct or indirect impact on, not to mention the critical data elements. An additional understanding of how this would impact other forms of risk reporting, like credit risk and non-financial risk, is definitely a nice-to-have. Defining a metadata strategy for the full lineage, its touchpoints, and transformations is a strenuous effort in analysis of systems owned by disparate teams with siloed implementation patterns over time. One fix that I have seen work is for every significant application group or team to have a senior representative for the CDO interaction. Vested stakeholder interest is turning out to be the one major success factor in the programs that have been successful. This ascertains completeness of the critical data element definitions and hence aids the data governance strategy in a wholesome way.

4. The ever-changing nature of financial risk management (“What did they change now?”)

The Basel Committee recommendations have been consistent in driving the urge to reinvent processes in the risk management area. With the Fundamental Review of the Trading Book (FRTB), the focus has been very clearly realigned to data processes in organizations. Whilst the big banks had already demonstrated a sound understanding of modellable risk factors based on scenarios, this time the Basel Committee has also asked banks to focus on non-modellable risk factors (NMRF). Add the standardized approach (sensitivities defined by the regulator) and the internal models approach (IMA, bank-defined enhanced sensitivities), and the change from entity-based risk calculations to desk-based ones is a significant paradigm shift. A single golden-source definition for transaction data, along with desk structure validation, seems to be a major area of concern amongst banks.

Add climate risk to the mix with the Paris accord, and the RWA calculations will now need additional data points, additional models, and additional investment in external data defining the associated physical and transition risk. Data lake / big data solutions with defined critical data elements and a full log of transformations with respect to lineage are a significant investment, but one that will only work in your favor as more changes come through on the regulation side. There have always been banks that have been great at this consistently and banks that lag significantly.

All in all, risk management happens to be a great use case for a greenfield CDO data strategy implementation, and these hurdles have to be handled before reaching the ultimate Zen goal of a perfect risk data strategy. Believe me, the first step is to get the bank’s consolidated risk data strategy right, and everything else will follow.

 

This is a 2021 article, also published here –  Risk Management Data Strategy – Insights from an Inquisitive Overseer | LinkedIn

Data Lake Governance with Tagging in Databricks Unity Catalog

The goal of Databricks Unity Catalog is to provide centralized security and management to data and AI assets across the data lakehouse. Unity Catalog provides fine-grained access control for all the securable objects in the lakehouse; databases, tables, files and even models. Gone are the limitations of the Hive metadata store. The Unity Catalog metastore manages all data and AI assets across different workspaces and storage locations. Providing this level of access control substantially increases the quality of governance while reducing the workload involved. There is an additional target of opportunity with tagging.
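
As a small illustration of that fine-grained model, the snippet below grants a group read access through Unity Catalog's three-level namespace. The catalog, schema, table, and group names are placeholders, and it assumes a Databricks notebook where `spark` is already provided.

```python
# Illustrative Unity Catalog grants; all object and group names are placeholders.
# Access requires privileges at each level of the catalog.schema.table hierarchy.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.sales TO `data_analysts`")
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `data_analysts`")

# Review what has been granted on the table.
spark.sql("SHOW GRANTS ON TABLE main.sales.orders").show()
```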

Tagging Overview

Tags are metadata elements structured as key-value pairs that can be attached to any asset in the lakehouse. Tagging can make these assets more searchable, manageable, and governable. A well-structured, well-executed tagging strategy can enhance data classification, enable regulatory compliance, and streamline data lifecycle management. The first step is to identify a use case that could be used as a proof of value in your organization. A well-structured tagging strategy means that you will need buy-in and participation from multiple stakeholders, including technical resources, SMEs, and a sponsor. These are five common use cases for tagging that might find some traction in a regulated enterprise because they can usually be piggy-backed off an existing or upcoming initiative:

  • Data Classification and Security
  • Data Lifecycle Management
  • Data Cataloging and Discovery
  • Compliance and Regulation
  • Project Management and Collaboration

Data Classification and Security

There is always room for an additional mechanism to help safely manage PII (personally identifiable information). A basic initial implementation of tagging could be as simple as applying a PII tag to classify data based on sensitivity. These tags can then be integrated with access control policies in Unity Catalog to automatically grant or restrict access to sensitive data. Balancing the promise of data access in the lakehouse with the regulatory realities surrounding sensitive data is always difficult. Additional tools are always welcome here.

Data Lifecycle Management

Some organizations struggle with the concept of managing different environments in Databricks. This is particularly true when they are moving from a data landscape where there were specific servers for each environment. Tags can be used to identify stages (ex: dev, test, and prod). These tags can then be leveraged to implement policies and practices around moving data through different lifecycle stages. For example, masking policies or transformation steps may be different between environments. Tags can also be used to facilitate rules around deliberate destruction of sensitive data. Geo-coding data with tags to comply with European regulations is also a possible target of opportunity.

Data Cataloging and Discovery

There can be a benefit in attaching descriptive tags directly to the data for cataloging and discovery even if you are already using an external tool. Adding descriptive tags like ‘customer’ or ‘marketing’ directly to the data assets themselves can make it more convenient for analysts and data scientists to perform searches, making the data more likely to actually be used.

Compliance and Regulation

This is related to, and can be used in conjunction with, data classification and security. Applying tags such as ‘GDPR’ or ‘HIPAA’ can make performing audits for regulators much simpler. These tags can be used in conjunction with security tags. In an increasingly regulated data environment, it pays to make your data assets easy to regulate.

Project Management and Collaboration

This tagging strategy can be used to organize data assets based on projects, teams, or departments. This can facilitate project management and improve collaboration by identifying which organizational unit owns or is working with a particular data asset.

Implementation

There are some practical considerations when implementing a tagging program:

  • each securable object has a limit of twenty tags
  • the maximum length of a tag is 255 characters, with no special characters allowed
  • you can only search by using exact match (pattern-matching would have really been nice here)

A well-executed tagging strategy will involve some level of automation. It is possible to manage tags in the Catalog Explorer. This can be a good way to kick the tires in the very beginning but automation is critical for a consistent, comprehensive application of the tagging strategy. Good governance is automated. While tagging is available to all securable objects, you will likely start out applying tags to tables.

The information schema tables will have the tag information. However, Databricks Runtime 13.3 and above allows tag management through SQL commands. This is the preferred mechanism because it is so much easier to use than querying the information schema. Regardless of the mechanism used, a user must have the APPLY TAG privilege on the object, the USE SCHEMA privilege on the object’s parent schema and the USE CATALOG privilege on the object’s parent catalog. This is pretty typical with Unity Catalog’s three-tiered hierarchy. If you are using SQL commands to manage tags, you can use the SET TAGS and UNSET TAGS clauses in the ALTER TABLE command.

You can use a fairly straightforward PySpark script to loop through a set of tables, look for a certain set of column names and then apply tags as appropriate. This can be done as an initial one-time run and then automated by creating a distinct job to check for new tables and/or columns or include in existing ingestion processes. There is a lot to be gained by augmenting this pipeline from just using a script that checks for columns named ‘ssn’ to creating an ML job that looks for fields that contain social security numbers.
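
Below is a minimal sketch of that kind of script, assuming Databricks Runtime 13.3+ and the APPLY TAG, USE SCHEMA, and USE CATALOG privileges mentioned above. The catalog and schema names, the column list it searches for, and the tag keys are all illustrative assumptions.

```python
# Minimal sketch: scan tables in one schema for likely-PII column names and apply
# a table-level tag. Catalog/schema names, the PII column list, and the tag taxonomy
# are illustrative; requires DBR 13.3+ and APPLY TAG / USE SCHEMA / USE CATALOG.
PII_COLUMNS = {"ssn", "social_security_number", "tax_id", "email", "phone"}

catalog, schema = "main", "customer_data"   # placeholder locations

tables = [
    row.tableName
    for row in spark.sql(f"SHOW TABLES IN {catalog}.{schema}").collect()
    if not row.isTemporary
]

for table in tables:
    full_name = f"{catalog}.{schema}.{table}"
    columns = {field.name.lower() for field in spark.table(full_name).schema.fields}
    matches = columns & PII_COLUMNS
    if matches:
        spark.sql(f"ALTER TABLE {full_name} SET TAGS ('classification' = 'PII')")
        print(f"Tagged {full_name}: matched columns {sorted(matches)}")
```

Run once as a backfill, this same logic can then be folded into ingestion jobs or scheduled to catch new tables and columns, and the simple name match can be swapped for the ML-based detection described above.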

Conclusion

I’ve seen a lot of companies struggle with populating their Databricks Lakehouse with sensitive data. In their prior state, databases had a very limited set of users, so only people who were authorized to see certain data, like PII, had access to the database that stored this information. However, the utility of a lakehouse is dramatically reduced if you don’t allow sensitive data. In most cases, it just won’t get any enterprise traction. Leveraging all of the governance and security features of Unity Catalog is a great, if not mandatory, first step. Enhancing governance and security, as well as utility, with tagging is probably going to be necessary to one degree or another in your organization to get broad usage and acceptance.

Contact us to learn more about how to build robustly governed solutions in Databricks for your organization.

Let’s Meet at Informatica World 2023 #InformaticaWorld

Informatica World takes place May 8-11 at the Venetian Resort Las Vegas and we can’t wait to meet you there! Perficient is a proud sponsor of Informatica’s largest event, which brings together customers and partners from across the globe.

Perficient is a global digital consultancy, an Informatica Platinum Enterprise Partner, and the 2022 Cloud Modernization Channel Partner of the Year. We’ve delivered strategy and implementation for on-premises, cloud, and hybrid solutions. Our investment and commitment to our Informatica partnership is extensive, with hundreds of dedicated consultants and more than 80 partner certifications.

Perficient’s Informatica experts will be at the show, will you? Visit us at booth #B12 to meet with subject matter experts and thought leaders and learn how we’ve leveraged our extensive expertise in data integration, data governance, and master data management to deliver transformational initiatives for our customers.

If you haven’t yet made the decision to attend Informatica World, we invite you to join us and 2,000+ Informatica enthusiasts in person in Las Vegas. Don’t miss this opportunity to collaborate, learn, and be inspired! The agenda this year will not disappoint. PowerCenter to Cloud Modernization Summit is a can’t miss session for customers wanting to hear from other customers that have made the leap and haven’t looked back. Learn best practices on how to modernize workloads from PowerCenter to cloud.

Make sure to connect with us at Informatica World 2023. Our experts will be at booth #B12 discussing how we can help you maximize your investment in Informatica tools.

Transform Your Business with Amazon DataZone

Amazon recently released a new data tool called DataZone, which allows companies to share, search, and discover data at scale across organizational boundaries. It offers many features such as the ability to search for published data, request access, collaborate with teams through data assets, manage and monitor data assets across projects, access analytics with a personalized view for data assets through a web-based application or API, and manage and govern data access in accordance with your organization’s security regulations from a single place.

DataZone may be helpful for IT leaders because it enables them to empower their business users to make data-driven decisions and easily access data both within and outside their organization. With DataZone, users can search for and access data they need quickly and easily while also ensuring the necessary governance and access control. Additionally, DataZone makes it easier to discover, prepare, transform, analyze, and visualize data with its web-based application.

Implementation of DataZone can vary depending on the organization and its existing governance policies. If your data governance is already in place, implementation of DataZone may take only a few months. However, if governance needs to be established and implemented, it will take much longer and require significant organizational changes.

While it may seem obvious, DataZone is not a magic solution to all your data problems. Simply having a tool is not enough. Deciding to move forward with any data marketplace solution requires a shared responsibility model and governance across multiple channels and teams. We’ve seen many companies fail to adopt the full use of data marketplaces due to lack of adoption by the business.

Ultimately, DataZone can be an invaluable tool for IT leaders looking to empower their business to access data quickly and easily within and outside their organization while adhering to necessary governance and access control policies. With the help of the automated data harvesters, stewards, and AI, DataZone makes data not just accessible but also available, allowing businesses to make use of it when making decisions.

With our “VP of IT’s Guide to Transforming Your Business,” IT leaders can gain the insights they need to successfully implement the latest data-driven solutions, such as DataZone. Download it for free today to get the answers you need to unlock the full potential of your data investments and drive your business forward with data-driven decisions.

Blockchain: The Secret Sauce to Supply Chain Visibility

In “Behind the Golden Arches” by John Love, the author shares an anecdote of how in the early days of McDonald’s, the company was having cashflow problems despite its being a significant source of revenue for many farmers. In a meeting with a particularly large farmer who had made significant money from selling tomatoes, lettuce, and especially potatoes to McDonald’s, Ray Kroc asked for a cash infusion, and without hesitation, the farmer wrote him a $25,000 check on the spot (a very significant sum of money for the time). Ray Kroc realized the suppliers of McDonald’s were making more money than McDonald’s itself and immediately implemented an open-books process in which suppliers would be obligated to share their financial statements with McDonald’s. This gave McDonald’s visibility into how profitable each member of its supply chain was, and allowed negotiation of prices both paid and charged, ensuring the entire supply chain stayed in balance. Of course, while we may imagine not all suppliers were too thrilled with sharing their financial statements with McDonald’s, the profit and prestige of joining and maintaining the relationship with the country’s largest supply chain proved sufficient to overcome their sense of financial privacy. 

The visibility McDonald’s established more than a half-century ago was driven by cashflow challenges, but even without that impetus, executives of any company in a supply chain need visibility into it. While distributed ledger technologies, including blockchains, did not exist when Ray Kroc was meeting with his farmers/suppliers, they do exist now and allow all members of a supply chain to see current inventory and materials in transit. A distributed ledger’s ability to track different goods and/or products in transit, giving a clear view of the inventory and activity, enables members of the supply chain to: 

  1. Improve visibility across all supply chain activities with proactive status updates,
  2. Increase transparency and cost controls through management of inventory in motion,
  3. Use smart contracts that automatically trigger when pre-defined business conditions are met,
  4. All while limiting disruptions and mitigating risk.

Not BLT – DLT Examples 

Distributed ledger technologies, including blockchains, when paired with tracking devices, enable pizza chains and windshield replacement companies to let their retail customers know exactly where their pizza/windshield is. Companies advertise this service as a distinguishing feature of why customers should choose to use them. This use of technology improves customer service by means of transparency while also allowing companies to monitor the efficiency of their team members. Clearly a win-win – all via data (the location of the driver) transparency. 

While a customer receiving a pizza made cold by a lagging delivery driver may be an insignificant issue to the average person, imagine you’re a construction manager paying union wages for a full crew while you await the arrival of cement trucks.  You need the cement in a timely fashion because every minute is money. Now imagine you need the cement to set while the temperature remains above a certain level or before the approaching storm clouds burst. Think the transparency of data would ease that construction manager’s day? Knowing ahead of time that the cement trucks would not be coming because the first firm messed up the delivery date would allow a procurement manager to arrange for cement to be delivered from alternate providers. This would allow the building to avoid delays, use its union-cost labor efficiently and remain on track and on budget. A win-win – if only the construction manager that you are had transparency into the delivery schedule and truck location of your cement providers. 

Conclusion 

In this blog, we covered why visibility into the supply chain is so important and how improved visibility of data can help improve customer service, reduce inventory costs, provide proactive status updates, and even help mitigate risk. In the next blog, we will suggest a process to help you and your firm achieve data transparency in the supply chain. If you would like to have Perficient discuss your supply chain issues and how data transparency might help your firm, reach out to us here.

About the Authors: 

Carl Aridas has been a member of Perficient’s Digital Assets Team since 2021. Supporting mostly large financial services firms, he is certified in the Scaled Agile Framework (SAFe), is a Scrum Master, and is a Six Sigma Green Belt project manager. 

Lin Eshleman has been a member of Perficient’s Supply Chain team since 2021. Supporting large, cross-industry Supply Chain firms, he has a PMP certification and is a Six Sigma Green Belt project manager. 

5 Essential Steps to Ensuring Data Regulation Within Your Financial Services Institution

The amount of data being created, captured, copied, and consumed has increased exponentially with society’s tectonic shift to digital reliance. As a result, the pace of data privacy and data regulation has accelerated on a global scale. Ensuring the security of your proprietary and customers’ data is paramount to staying in line with ethical and regulatory standards and retaining customer trust.  

When strategizing how to best understand your data and determine if you have the appropriate data controls in place, you should consider the following steps:  

1. Classify your data. Each category has a different sensitivity and will require different security controls. 

  • Public Information: There is no specific restriction required for this type of data, and there is no negative repercussion if data is shared. i.e., Information shared on a company website. 
  • Private Information: Information that is only for internal use but there are no severe consequences if data is leaked. i.e., employee salaries.   
  • Sensitive Data: Regulated. Data leaks might result in high business impact and financial loss. i.e., Customer credit card information. 
  • Highly Sensitive Data: Subject to high regulation. Should only be available to authorized individuals. Data leaks could result in losing permission to continue operations.

2. Identify your sensitive and high-risk data. Sensitive and high-risk data include: 

  • Personally identifiable information (PII): Name, address, SSN# 
  • Protected health information (PHI): Patient records, health insurance details, and medical records. 
  • Sensitive personal information (SPI): Religion, sexual orientation, criminal convictions, racial or ethnic origin. 
  • Non-public or financial information: Company Strategic plans, contract information, tax records, employee salary. 
  • Intellectual Property: Patents, trademarks, trade secrets, licensing, copyrights. 

3. Determine your type of data. Each type of data presents a different level of difficulty to manage: 

  • Structured data: Easy to access, search, identify and protect. i.e., Data stored in a database. 
  • Unstructured data: Not organized and not in a predefined format. i.e., Microsoft Office or Adobe PDF documents stored in a shared drive or computer folder. 

4. Understand your data.  

  • How is this data being captured? 
  • Where is this data being stored? 
  • What is your true source of data? What are the critical data elements? 
  • How is this data being shared? i.e., via reports, messaging, etc. 
  • What is the quality of this data? 
  • How is this data archived, removed, and destroyed? 

5. Review risk and controls. 

  • What is the purpose of collecting and processing this data? 
  • Is this data subject to local or global regulations? i.e., GDPR, CCPA, Irish DPA, Schrems II, etc.  
  • Do I have consent to store and share this data?  
  • Are there entitlements and security controls in place for sensitive data? 
  • What are the actual threats and risks for this data? Is this data secured from external threats? 
  • What are the current processes for data monitoring and incidence response? 
  • Are there specific regulatory requirements for this data’s archive period, and how must it be removed and destroyed?

***

Perficient’s financial services and data solutions teams have extensive experience building and supporting complex data governance and data lineage programs that ensure regulatory compliance (e.g., BCBS 239, CCAR, MiFID II, GDPR, CCPA, FRTB) and enable data democratization. In addition to understanding how to navigate financial institutions with many complex systems, we have experience with various platforms and tools in the ecosystem, including ASG, Collibra, and Informatica Enterprise Data Catalog (EDC).

Whether you need help with business and IT requirements, data acquisition/sourcing, data scanning, data linking and stitching, UAT and sign-off, or data analysis – we can help.

Reach out today to learn more about our experience and how we support your efforts.

 

 

4 Ways Financial Institutions Can Help Their Customers Navigate Inflation

Inflation reached a forty-one-year high in June, and according to the Consumer Price Index, prices have remained elevated. Many are struggling to make their dollars stretch and are looking to their financial institutions for guidance on how to better manage spending and stay afloat financially.

Financial firms cannot singlehandedly control inflation, of course, but they can position themselves as knowledgeable partners that customers can depend on to be in their corner.

Here are four ways financial institutions can better support struggling customers during periods of high inflation:

1. Harness customers’ financial transaction data to tailor personalized solutions.

Companies should invest in data governance, data analysis, and data visualization. More accessible and digestible data gives organizations the information required to build the kind of intuitive, tailored products that customers want and need. It also supplies firms with deeper insight into their customer base, enabling them to suggest and sell the right products at the right time.

For example, a customer struggling to pay off a credit card bill because of inflated prices may benefit from transferring their balance to a new credit card that is zero interest for the first twelve months. Access to data gives companies the understanding they need to match such a product to an appropriate consumer. They can use data to make decisions on the types of products they should build, as well as to figure out whom to market certain products towards.

2. Invest in marketing.

Many financial institutions offer budgeting tools and apps that can help customers better manage their spending when money is tight, but adoption lags because clients aren’t made aware of them. To get the most out of these investments, companies must prioritize publicizing their tools and products in an appealing and clear way. Investing in marketing does not only mean investing in the creation and sharing of content and collateral, however. To fully reap the benefits of marketing, employee outreach and training must also be prioritized, so employees are prepared and encouraged to teach all types of customers how to use these tools and products in a way that suits their unique needs.

3. Eliminate overdraft fees.

In a study of 5,000 banking customers worldwide conducted by Censuswide, 29% of banking customers suggested that eliminating overdraft fees would help them make it through this high inflation period. Many large banks have done away with overdraft fees (thanks to the Overdraft Protection Act of 2021) or have at least begun giving longer grace periods for payments before charging fees. Getting rid of overdraft fees and implementing more lenient overdraft policies not only helps save customers money but contributes to higher customer loyalty and improved brand reputation.

4. Implement rewards and loyalty programs.

Banks can implement cash-back programs through their products, partner with businesses and venues to provide special discounts, and offer rewards upon new account creation and card openings. They can also offer incentives to customers when they practice healthy financial behavior, like paying bills on time and in full when they are able. Rewards-based programs are a fantastic area to start for banks striving for a more personalized, customer-centric approach.

***

Interested in discussing how you can better support your customers during this period of high inflation? Contact one of our financial services experts today to learn how to make the most out of your data and offer personalized solutions to your customers.

7 Considerations for the Data Acquisition Sourcing Stage of a Data Lineage Solution Implementation

Data lineage is the process of understanding, recording, and visualizing metadata as it flows from the originating data source to consumption. Using a data lineage tool can expedite the process; however, it is a common misconception that a data lineage tool can automatically generate data lineage between several applications with the push of a button.

Data lineage platforms offer significant benefits, but like any other artificial intelligence solution, they require human input to be successful. The technical capabilities of the data lineage tool are important, but the essentiality of human knowledge should not be dismissed.

There are four major steps in a typical data lineage effort: (1) data acquisition/sourcing, (2) data scanning, (3) data stitching, and (4) data lineage validation.

Here are seven questions to consider during the data acquisition/sourcing stage:

  1. Who are the right subject matter experts with the technical knowledge about the target application?
  2. What is the general purpose and background of the application?
  3. What data sources does the target application receive data from?
  4. What data sources does the target application send data to?
  5. Which data flow(s) should be prioritized? Prioritization can be decided based on which data flows contain the majority of the critical data elements or regulatory data, such as BASEL and CECL.
  6. What are the source code repositories that contain the programs and scripts involved in moving data from source to target applications?
  7. How is the data received or sent to applications from prioritized data flows? Methods include database, mainframe files, real-time messages, non-mainframe file transmission, and application programming interfaces (APIs).
  • Database
    • What is the hostname, port number, service name, and schema to request database access?
    • What are the schema and table names in which the data resides?
    • Are there any stored procedures involved?
  • Mainframe
    • What is the mainframe complex, and does the data lineage platform have the underlying technology to support this type of mainframe?
    • How is data getting loaded into the application?
    • What is the job name/procedure name, JCL, library names, COBOL programs, and copybooks for starting point, intermediary job, and final transmission?
  • Real-time Messages
    • What is the list of relevant data fields, corresponding tag numbers, and messaging queue (MQ) column names?
  • Non-mainframe File Transmission
    • What is the name of the file, and how is the file being created/loaded?
    • Is there a copy of the file layout or sample files you can provide?
    • Is there any source code or stored procedure that creates/loads the file (i.e., java program)?
  • API
    • What are the file name and source code?

Perficient’s financial services and data solutions teams have extensive experience building and supporting complex data governance and data lineage programs that ensure regulatory compliance (e.g., BCBS 239, CCAR, MiFID II, GDPR, CCPA, FRTB) and enable data democratization. In addition to understanding how to navigate financial institutions with many complex systems, we have experience with various platforms and tools in the ecosystem, including ASG, Collibra, and Informatica Enterprise Data Catalog (EDC).

Whether you need help with business and IT requirements, data acquisition/sourcing, data scanning, data linking and stitching, UAT and sign-off, or data analysis – we can help.

Reach out today to learn more about our experience and how we support your efforts.
