How to Access Oracle Fusion Cloud Apps Data from Databricks
https://blogs.perficient.com/2025/02/24/databricks-for-oracle-fusion-cloud-applications/ (Mon, 24 Feb 2025)

Connecting to Oracle Fusion Cloud Applications data from external non-Oracle systems, like Databricks, is not feasible for bulk data operations via a direct connection. However, there are several approaches to making Oracle apps data available for consumption from Databricks. What makes this task less straightforward is the fact that Oracle Fusion Cloud Applications and Databricks exist in separate clouds. Oracle Fusion apps (ERP, SCM, HCM, CX) are hosted on Oracle Cloud while Databricks leverages one of AWS, Azure or Google Cloud. Nevertheless, there are several approaches that I will present in this blog on how to access Oracle Apps data from Databricks.

While there are other means of performing this integration than what I present in this post, I will be focusing on:

  1. Methods that don’t require 3rd party tools: The focus here is on Oracle and Databricks technologies or Cloud services.
  2. Methods that scale to large number of objects and high data volumes: While there are additional means of Fusion data extraction such as using REST APIs, OTBI, or BI Publisher, these are not recommended methods for handling large bulk data extracts from Oracle Fusion and are therefore not part of this analysis. One or more of these techniques may still be applied though, when necessary, and may co-exist with the approaches discussed in this blog.

The following diagrams summarize four different approaches on how to replicate Oracle Fusion Apps data in Databricks. Each diagram highlights the data flow, and the technologies applied.

  • Approach A: Leverages Oracle Autonomous Data Warehouse and an Oracle GoldenGate Replication Deployment
  • Approach B: Leverages Oracle Autonomous Data Warehouse and the standard Delta Sharing protocol
  • Approach C: Leverages Oracle Autonomous Data Warehouse and a direct JDBC connection from Databricks (a code sketch of this approach follows the diagrams below).
  • Approach D: Leverages a Perficient accelerator solution using Databricks AutoLoader and DLT Pipelines. More information is available on this approach here.

[Diagrams] Oracle Fusion data flow to Databricks with: Oracle Autonomous DW and GoldenGate; Oracle Autonomous DW and Delta Sharing; Oracle Autonomous DW and JDBC; the Perficient Accelerator Solution.
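To make Approach C more concrete, here is a minimal sketch of pulling a table from Oracle Autonomous Data Warehouse into Databricks over JDBC. The host, schema, table, and secret names are hypothetical placeholders; a real ADW connection typically also involves a wallet or TLS connection string, and the Oracle JDBC driver must be installed on the cluster.

```python
# Minimal sketch of Approach C: querying Oracle Autonomous Data Warehouse (ADW)
# from a Databricks notebook over JDBC. All connection details below are
# hypothetical placeholders.
jdbc_url = "jdbc:oracle:thin:@//adw-host.example.com:1522/fusion_dw_high"

fusion_invoices = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "FUSION_REPL.AP_INVOICES_ALL")  # hypothetical replicated schema/table
    .option("user", dbutils.secrets.get("oracle", "adw_user"))
    .option("password", dbutils.secrets.get("oracle", "adw_password"))
    .option("driver", "oracle.jdbc.OracleDriver")
    .option("fetchsize", 10000)  # larger fetch size helps bulk reads
    .load()
)

# Land the data in the lakehouse as a Delta table for downstream curation.
fusion_invoices.write.mode("overwrite").saveAsTable("bronze.fusion_ap_invoices")
```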

Choosing the right approach for your use case is dependent on the objective of performing this integration and the ecosystem of cloud platforms that are applicable to your organization. For guidance on this, you may reach Perficient by leaving a comment in the form below. Our Oracle and Databricks specialists will connect with you and provide recommendations.

SAP and Databricks: Better Together
https://blogs.perficient.com/2025/02/13/sap-and-databricks-better-together-3-2/ (Thu, 13 Feb 2025)

SAP Databricks is important because convenient access to governed data to support business initiatives is important. Breaking down silos has been a drumbeat of data professionals since Hadoop, but this SAP <-> Databricks initiative may help to solve one of the more intractable data engineering problems out there. SAP has a large, critical data footprint in many large enterprises. However, SAP has an opaque data model. Moving that data has always meant a long, painful stretch of glue work that delivered no real value on its own. This caused a lot of projects to be delayed, fail, or never be pursued, resulting in a significant lost-opportunity cost for the client and a potential loss of trust or confidence in the system integrator. SAP recognized this and partnered with a small handful of companies to enhance and enlarge the scope of their offering. Databricks was selected to deliver bi-directional integration with their Databricks Lakehouse platform. When I heard there was going to be a big announcement, I thought we were going to hear about a new Lakehouse Federation Connector. That would have been great; I’m a fan.

This was bigger.

Technical details are still emerging, so I’m going to try to focus on what I heard and what I think I know. I’m also going to hit on some use cases that we’ve worked on that I think could be directly impacted by this today. I think the most important takeaway for data engineers is that you can now combine SAP with your Lakehouse without pipelines. In both directions. With governance. This is big.

SAP Business Data Cloud

I don’t know much about SAP, so you can definitely learn more here. I want to understand more about the architecture from a Databricks perspective and I was able to find out some information from the Introducing SAP Databricks post on the internal Databricks blog page.

[Diagram: Introducing SAP Databricks] This is when it really sank in that we were not dealing with a new Lakeflow Connector:

SAP Databricks is a native component in the SAP Business Data Cloud and will be sold by SAP as part of their SAP Business Data Cloud offering. It’s not in the diagram here, but you can actually integrate new or existing Databricks instances with SAP Databricks. I don’t want to get ahead of myself, but I would definitely consider putting that other instance of Databricks on another hyperscaler. 🙂

In my mind, the magic is the dotted line from the blue “Curated context-rich SAP data products” up through the Databricks stack.

 

Open Source Sharing

The promise of SAP Databricks is the ability to easily combine SAP data with the rest of the enterprise data. In my mind, easily means no pipelines that touch SAP. The diagram shows that the integration point between SAP and Databricks uses Delta Sharing as the underlying enablement technology.

Delta Sharing is an open-source protocol, developed by Databricks and the Linux Foundation, that provides strong governance and security for sharing data, analytics, and AI across internal business units, cloud providers, and applications. Data remains in its original location with Delta Sharing: you are sharing live data with no replication. Delta Sharing, in combination with Unity Catalog, allows a provider to grant access to one or more recipients and dictate what data can be seen by those recipients using row- and column-level security.
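For readers who have not worked with the protocol, here is a minimal consumer-side sketch, assuming the provider has already issued a credential (profile) file; the file path and the share, schema, and table names are hypothetical.

```python
# Consumer-side sketch of the open Delta Sharing protocol.
# The profile file is the credential issued by the data provider; the
# share/schema/table names below are hypothetical.
import delta_sharing

profile_local = "/dbfs/FileStore/config.share"   # local-file path for the pandas connector
profile_spark = "dbfs:/FileStore/config.share"   # Hadoop-style path for the Spark connector
table = "#sap_share.finance.billing_documents"

# Option 1: load the shared table into pandas with the open-source connector.
pdf = delta_sharing.load_as_pandas(profile_local + table)
print(pdf.head())

# Option 2: read the same share with Spark (Delta Sharing Spark connector required).
sdf = spark.read.format("deltaSharing").load(profile_spark + table)
display(sdf)
```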

Open Source Governance

Databricks leverages Unity Catalog for security and governance across the platform, including Delta Sharing. Unity Catalog offers strong authentication, asset-level access control, and secure credential vending to provide a single, unified, open solution for protecting structured, semi-structured, and unstructured data as well as AI assets. Unity Catalog offers a comprehensive solution for enhancing data governance, operational efficiency, and technological performance. By centralizing metadata management, access controls, and data lineage tracking, it simplifies compliance, reduces complexity, and improves query performance across diverse data environments. The seamless integration with Delta Lake unlocks advanced technical features like predictive optimization, leading to faster data access and cost savings. Unity Catalog plays a crucial role in machine learning and AI by providing centralized data governance and secure access to consistent, high-quality datasets, enabling data scientists to efficiently manage and access the data they need while ensuring compliance and data integrity throughout the model development lifecycle.
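As a small illustration of those row- and column-level controls, the sketch below grants a group read access and attaches a row filter using Unity Catalog SQL; the catalog, schema, table, column, and group names are all hypothetical.

```python
# Sketch of Unity Catalog access control; object and group names are hypothetical.
# Grant read access on a schema to an analyst group.
spark.sql("GRANT USE CATALOG ON CATALOG sap_bdc TO `emea-analysts`")
spark.sql("GRANT USE SCHEMA, SELECT ON SCHEMA sap_bdc.finance TO `emea-analysts`")

# Row-level security: a filter function plus a ROW FILTER limits EMEA analysts
# to their own region's rows.
spark.sql("""
CREATE OR REPLACE FUNCTION sap_bdc.finance.region_filter(region STRING)
RETURN IF(IS_ACCOUNT_GROUP_MEMBER('emea-analysts'), region = 'EMEA', TRUE)
""")
spark.sql("""
ALTER TABLE sap_bdc.finance.billing_documents
SET ROW FILTER sap_bdc.finance.region_filter ON (region)
""")
```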

Data Warehousing

Databricks is now a first-class Data Warehouse with its Databricks SQL offering. The serverless SQL warehouses have been kind of a game changer for me because they spin up immediately and size elastically. Pro tip: now is a great time to come up with a tagging strategy. You’ll be able to easily connect your BI tool (Tableau, Power BI, etc) to the warehouse for reporting. There are also a lot of really useful AI/BI opportunities available natively now. If you remember in the introduction, I said that I would have been happy had this only been a Lakehouse Federation offering. You still have the ability to take advantage of Federation to discover, query and govern data from Snowflake, Redshift, Salesforce, Teradata and many others all from within a Databricks instance. I’m still wrapping my head around being able to query Salesforce and SAP Data in a notebook inside Databricks inside SAP.
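To illustrate that last point, here is a hypothetical cross-source query joining shared SAP data with a federated Salesforce object in a single statement; the catalog, schema, table, and column names are placeholders and assume the foreign catalog and the share have already been configured.

```python
# Hypothetical cross-source query: SAP billing data surfaced through a shared
# catalog joined with a federated Salesforce object, all governed by Unity Catalog.
result = spark.sql("""
    SELECT a.account_name,
           SUM(b.net_amount) AS total_billed
    FROM   salesforce_federated.sales.account          AS a
    JOIN   sap_share_catalog.finance.billing_documents AS b
           ON a.account_id = b.customer_id
    GROUP  BY a.account_name
    ORDER  BY total_billed DESC
    LIMIT  20
""")
display(result)
```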

Mosaic AI + Joule

As a data engineer, I was the most excited about zero-copy, bi-directional SAP data flow into Databricks. This is selfish because it solves my problems, but it’s relatively short-sighted. The integration between SAP and Databricks will likely deliver the most value through Agentic AI. Let’s stipulate that I believe that chat is not the future of GenAI. This is not a bold statement; most people agree with me. Assistants like co-pilots represented a strong path forward. SAP thought so, hence Joule. It appears that SAP is leveraging the Databricks platform in general, and Mosaic AI in particular, to provide a next generation of Joule which will be an AI copilot infused with agents.

Conclusion

The integration of SAP and the Databricks Lakehouse represents a transformative approach to enterprise data management. By uniting the strengths of SAP’s end-to-end process management and semantically rich data with the advanced analytics and scalability of a lakehouse architecture, organizations can drive better decisions, foster innovation, and simplify their data landscapes. Whether it’s unifying SAP and non-SAP data, enabling real-time insights, or scaling AI initiatives, this partnership provides a roadmap for the future of data-driven enterprises.

Contact us to learn more about how SAP Databricks can help supercharge your enterprise.

 

Maximize Your Data Management with Unity Catalog
https://blogs.perficient.com/2024/08/23/unity-catalog-migration-tools-benefits/ (Fri, 23 Aug 2024)

Databricks Unity Catalog is a unified and open governance solution for data and AI, built into the Databricks Data Intelligence Platform.

Unity Catalog offers a comprehensive solution for enhancing data governance, operational efficiency, and technological performance. By centralizing metadata management, access controls, and data lineage tracking, it simplifies compliance, reduces complexity, and improves query performance across diverse data environments. The seamless integration with Delta Lake unlocks advanced technical features like predictive optimization, leading to faster data access and cost savings. Unity Catalog plays a crucial role in machine learning and AI by providing centralized data governance and secure access to consistent, high-quality datasets, enabling data scientists to efficiently manage and access the data they need while ensuring compliance and data integrity throughout the model development lifecycle.

Unity Catalog brings governance to data across your enterprise. Lakehouse Federation capabilities in Unity Catalog allow you to discover, query, and govern data across data platforms including MySQL, PostgreSQL, Amazon Redshift, Snowflake, Azure SQL Database, Azure Synapse, Google’s BigQuery, and more from within Databricks without moving or copying the data, all within a simplified and unified experience. Unity Catalog supports advanced data-sharing capabilities with Delta Sharing, enabling secure, real-time data sharing across organizations and platforms without the need for data duplication. Additionally, Unity Catalog facilitates the creation of secure data Clean Rooms, where multiple parties can collaborate on shared datasets without compromising data privacy. Its support for multi-cloud and multi-region deployments ensures operational flexibility and reduced latency, while robust security features, including fine-grained access controls, automated compliance auditing, and encryption, help future-proof your data infrastructure.
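As an example of how that sharing is expressed on the provider side, here is a minimal sketch of creating a share and granting it to a recipient with Unity Catalog SQL; the share, recipient, table names, and sharing identifier are hypothetical.

```python
# Provider-side sketch of Delta Sharing with Unity Catalog; all names are hypothetical.
spark.sql("CREATE SHARE IF NOT EXISTS partner_share COMMENT 'Curated tables for a partner'")
spark.sql("ALTER SHARE partner_share ADD TABLE analytics.curated.daily_sales")

# Databricks-to-Databricks recipients are identified by a metastore sharing
# identifier; open recipients receive an activation link / credential file instead.
spark.sql("""
CREATE RECIPIENT IF NOT EXISTS partner_recipient
USING ID 'aws:us-west-2:12a34b56-7890-1cde-2f34-5a6b7c8d9e0f'
""")
spark.sql("GRANT SELECT ON SHARE partner_share TO RECIPIENT partner_recipient")
```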

These capabilities position your organization for scalable, secure, and efficient data management, driving innovation and maintaining a competitive edge. However, this fundamental transition will need to be implemented with minimal disruption to ongoing operations. This is where the Unity Catalog Migration Tool comes into play.

Unity Catalog Migration Tool

UCX, or the Unity Catalog Migration Tool, is an open-source project from Databricks Labs designed to streamline and automate the Unity Catalog migration process. UCX automates much of the work involved in transitioning to Unity Catalog, including migrating metadata, access controls, and governance policies. Migrating metadata ensures the enterprise will have access to data and AI assets after the transition. In addition to data, the migration tool ensures that security policies and access controls are accurately transferred and enforced in Unity Catalog. This capability is critical for maintaining data security and compliance during and after migration.

Databricks is continually developing UCX to better ensure that all your data assets, governance policies, and security controls are seamlessly transferred to Unity Catalog with minimal disruption to ongoing operations. Tooling and automation help avoid costly downtime or interruptions in data access that could impact business performance, thereby maintaining continuity and productivity. While it is true that automating these processes significantly reduces the time, effort, and cost required for migration, the process is not automatic. There needs to be evaluation, planning, quality control, change management, and additional coding and development work performed along with, and outside of, the tool. This knowledge and expertise is where Unity Catalog Migration Partners come into play.
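As one small example of the evaluation work that happens alongside the tool, the sketch below takes a rough inventory of Hive metastore schemas and tables to gauge migration scope; it is illustrative only and is not part of UCX itself.

```python
# Illustrative pre-migration inventory (not part of UCX itself): count the
# Hive metastore schemas and tables in scope for a Unity Catalog migration.
rows = []
for db in spark.sql("SHOW SCHEMAS IN hive_metastore").collect():
    schema_name = db[0]  # first column of SHOW SCHEMAS holds the schema name
    for t in spark.sql(f"SHOW TABLES IN hive_metastore.{schema_name}").collect():
        rows.append((schema_name, t.tableName))

summary = spark.createDataFrame(rows, ["schema_name", "table_name"])
print(f"{summary.count()} tables found in the Hive metastore")
display(summary.groupBy("schema_name").count().orderBy("count", ascending=False))
```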

Unity Catalog Migration Partner

An experienced Unity Catalog migration partner leads the process of transitioning your data assets, governance policies, and security controls by planning, executing, and managing the migration process, ensuring that it is smooth, efficient, and aligned with your organization’s data governance and security requirements. Their duties typically include assessing the current data environment, designing a custom migration strategy, executing the migration while minimizing downtime and disruptions, and providing post-migration support to optimize Unity Catalog’s features. Additionally, they offer expertise in data governance best practices and technical guidance to enhance your organization’s data management capabilities.

Databricks provides its system integrators with tools, guidance and best practices to ensure a smooth transition to Unity Catalog. Perficient has built upon those valuable resources to enable a more effective pipeline with our Unity Catalog Migration Accelerator.

Unity Catalog Migration Accelerator

Our approach to Unity Catalog migration is differentiated by our proprietary Accelerator, which includes a suite of project management artifacts and comprehensive code and data quality checks. This Accelerator streamlines the migration process by providing a structured framework that ensures all aspects of the migration are meticulously planned, tracked, and executed, reducing the risk of errors and delays. The built-in code and data quality checks automatically identify and resolve potential issues before they become problems, ensuring a seamless transition with minimal impact on business operations. By leveraging our Accelerator, clients benefit from a more efficient migration process, higher data integrity, and enhanced overall data governance, setting us apart from other Unity Catalog migration partners who may not offer such tailored and robust solutions.

In summary, Unity Catalog provides a powerful solution for modernizing data governance, enhancing performance, and supporting advanced data operations like machine learning and AI. With our specialized Unity Catalog migration services and unique Accelerator, we offer a seamless transition that optimizes data management and security while ensuring data quality and operational efficiency. If you’re ready to unlock the full potential of Unity Catalog and take your data infrastructure to the next level, contact us today to learn how we can help you achieve a smooth and successful migration. Contact us for a complimentary Migration Analysis and let’s work together on your data and AI journey!

The Technical Power of Unity Catalog – Beyond Governance
https://blogs.perficient.com/2024/08/23/unity-catalog-technical-advantages/ (Fri, 23 Aug 2024)

If you use Databricks, you probably know that

Databricks Unity Catalog is the industry’s only unified and open governance solution for data and AI, built into the Databricks Data Intelligence Platform. With Unity Catalog, organizations can seamlessly govern both structured and unstructured data in any format, as well as machine learning models, notebooks, dashboards and files across any cloud or platform.

Unity Catalog delivers centralized data governance and fine-grained access control, streamlines data discovery and data lineage, and automates auditing and compliance. Lakehouse governance is a critical element of an enterprise data strategy, and Unity Catalog is a must-have for any enterprise that is using Databricks. This is such a clear and important message that sometimes other features enabled by Unity Catalog get overlooked. Let’s take some time and see how Unity Catalog can save you money.

Liquid Clustering

Liquid clustering provides significant performance advantages in large-scale data analytics environments by dynamically reorganizing data to optimize storage and query performance, continually analyzing query patterns and data access frequency. Data that is accessed together is stored together, reducing the amount of data scanned. Data skipping, which involves ignoring irrelevant data, has significant performance and cost advantages. IT costs are also reduced, since re-clustering had previously been a manual maintenance task. Manual partitioning and Z-ordering are also difficult, so they took up valuable time and often still didn’t do the job very well. Liquid clustering only requires Delta tables to be enabled. However, Unity Catalog reduces the size thresholds for clustering to be triggered. Unity Catalog also provides predictive optimization, which automatically triggers incremental clustering for UC managed tables rather than requiring the OPTIMIZE task to be run manually. The metadata gathered by Unity Catalog powers a more efficient layout in liquid clustering as well as improving overall query performance.
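For reference, here is a minimal sketch of what enabling liquid clustering and predictive optimization can look like on a Unity Catalog managed table; the catalog, schema, table, and column names are hypothetical, and the predictive optimization statement assumes the feature is available in your workspace.

```python
# Sketch: liquid clustering on a Unity Catalog managed table (names are hypothetical).
spark.sql("""
CREATE TABLE IF NOT EXISTS main.sales.events (
    event_date  DATE,
    customer_id BIGINT,
    event_type  STRING,
    payload     STRING
)
CLUSTER BY (event_date, customer_id)
""")

# Clustering keys can be changed later without manually rewriting the layout.
spark.sql("ALTER TABLE main.sales.events CLUSTER BY (customer_id)")

# With predictive optimization enabled on the schema, OPTIMIZE runs are scheduled
# automatically for managed tables; the manual command remains available.
spark.sql("ALTER SCHEMA main.sales ENABLE PREDICTIVE OPTIMIZATION")
spark.sql("OPTIMIZE main.sales.events")
```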

Enhanced Query Performance

Unity Catalog is well-known as a centralized metastore location in the context of governance. However, maintaining catalog-wide statistics can also improve query performance. Unity Catalog can provide more efficient execution plans since it maintains comprehensive metadata and statistics. Additionally, Unity Catalog’s metadata capabilities can assist in effectively managing partitions and indexes. A more efficient execution plan, better partitions and more effective indexes result in faster query times. Faster query times result in lower costs and an improved user experience.
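Where statistics are stale, refreshing them is straightforward; the sketch below runs standard ANALYZE commands against a hypothetical table (with predictive optimization, managed tables can also have statistics maintained automatically).

```python
# Sketch: refreshing the table and column statistics the query optimizer relies on.
# Table and column names are hypothetical.
spark.sql("ANALYZE TABLE main.sales.events COMPUTE STATISTICS")
spark.sql("ANALYZE TABLE main.sales.events COMPUTE STATISTICS FOR COLUMNS event_date, customer_id")

# Inspect the stored statistics and table metadata.
display(spark.sql("DESCRIBE EXTENDED main.sales.events"))
```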

Complex Workload Support

Data workloads scale in size and complexity in step with the growth and adoption of your lakehouse. Typically, this would result in increased maintenance and administrative workloads. However, Unity Catalog supports organizing and isolating workloads within different schemas and catalogs. This isolation optimizes resource allocation and minimizes contention between different teams. This isolation and optimization minimizes the need for scaling additional resources beyond what would be needed for the new workloads.
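A simple way to picture this isolation is one catalog per team with scoped grants, as in the hypothetical sketch below.

```python
# Sketch: isolating team workloads in separate catalogs with scoped grants.
# Catalog and group names are hypothetical.
spark.sql("CREATE CATALOG IF NOT EXISTS marketing_analytics")
spark.sql("CREATE CATALOG IF NOT EXISTS finance_analytics")

spark.sql("GRANT USE CATALOG, CREATE SCHEMA ON CATALOG marketing_analytics TO `marketing-engineers`")
spark.sql("GRANT USE CATALOG, CREATE SCHEMA ON CATALOG finance_analytics TO `finance-engineers`")

# Analysts get read-only access inside their own team's catalog, so access
# reviews and cost attribution can follow the same boundary.
spark.sql("GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG marketing_analytics TO `marketing-analysts`")
```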

Multi-Region Data Management

One of Unity Catalog’s strongest governance capabilities is the seamless enablement of multi-cloud. A common use case for multi-cloud is business continuity and disaster recovery. Along with multi-cloud capability, you also get multi-region data management. Optimizing data placement and access patterns based on regional considerations can reduce latency and improve overall performance.

Conclusion

Unity Catalog is an integral part of Databricks. If you have not migrated from the Hive Metastore yet, plan to do so. The security and governance features alone are mandatory for most companies and certainly for all regulated companies. However, the motivations don’t need to be all stick and no carrot. Every business that uses Databricks will likely see a cost benefit from liquid clustering and enhanced query performance. These savings come for free; no administration is necessary. Companies with larger footprints may also see cost benefits from workload isolation. Finally, some companies may see a benefit in multi-region data management where such a strategy wasn’t even considered before because of the perceived complexity.

Get in touch with us for a free assessment and find out how internal expertise and custom accelerators can help you migrate to Unity Catalog.
