I don’t like testing Databricks notebooks, and that’s a problem. I like Databricks. I like Test-Driven Development. Not in an evangelical, 100%-code-coverage-or-fail kind of way. I just find that a reasonable amount of code coverage gives me a reasonable amount of confidence. Databricks has documentation for unit testing. I tried […]
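As a rough illustration of the testing style the excerpt alludes to, here is a minimal sketch that assumes notebook logic has been extracted into a plain Python function so it can be tested without a cluster; the function name and behavior are hypothetical, not from the article:

```python
# Hypothetical example: logic pulled out of a Databricks notebook into a
# plain Python function, so a reasonable amount of coverage is cheap to get.

def clean_column_name(name: str) -> str:
    """Normalize a raw column name: trim, lowercase, spaces to underscores."""
    return name.strip().lower().replace(" ", "_")

def test_clean_column_name():
    # A few representative cases rather than exhaustive coverage.
    assert clean_column_name(" Order ID ") == "order_id"
    assert clean_column_name("total") == "total"

if __name__ == "__main__":
    test_clean_column_name()
```

A test runner such as pytest would discover and run `test_clean_column_name` automatically; the point is only that the logic lives outside the notebook.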
LinkedIn open sources a control plane for lakehouses
LinkedIn open sources a lot of code. Kafka, of course, but also Samza and Voldemort and a bunch of Hadoop tools like DataFu and Gobblin. Open-source projects tend to be created by developers to solve engineering problems while commercial products … Anyway, LinkedIn has a new open-source data offering called OpenHouse, which is billed as […]
Talend ESB – tRestRequest and tRestResponse
This article covers the configuration of tRestRequest and tRestResponse, and how to create and test HTTP listeners using Postman. Create a job in the Talend tool (it can be Talend ESB or Talend Data Fabric). Once you have created a job, place the tRestRequest, tJavaRow, and tRestResponse components in the designer. Once you have placed all the […]
Maintaining Your Adobe Launch Implementation
Adobe Launch is a valuable tool for managing the tags placed across your website, including Facebook, Pinterest, and Bing pixels as well as Adobe Analytics and Target, depending on your property. Many of these tags are deployed either via custom code or via one of the many extensions within Launch to help […]
Ready for Microsoft Copilot for Microsoft 365?
Organizations want to leverage the productivity enhancements Microsoft Copilot for Microsoft 365 may enable, but also want to avoid unintentional over-exposure of organizational information while users access these Copilot experiences. Our Microsoft team is fielding many questions from customers about how to secure and govern Microsoft Copilot for Microsoft 365. These organizations want to ensure […]
Databricks Lakehouse Federation Public Preview
Sometimes, it’s nice to be able to skip a step. Most data projects involve data movement before data access. Usually this is not an issue; everyone agrees that the data must be made available before it can be accessed. There are use cases where the data movement part is a blocker because of time, cost, […]
Data Lake Governance with Tagging in Databricks Unity Catalog
The goal of Databricks Unity Catalog is to provide centralized security and management for data and AI assets across the data lakehouse. Unity Catalog provides fine-grained access control for all the securable objects in the lakehouse: databases, tables, files, and even models. Gone are the limitations of the Hive metastore. The Unity Catalog metastore […]
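For a flavor of what tagging these securable objects looks like, Unity Catalog exposes tags through SQL `SET TAGS` clauses; the catalog, schema, table, column, and tag names below are hypothetical, and applying tags requires the appropriate privilege on each object:

```sql
-- Hypothetical names; tags are key-value pairs attached to securable objects.
ALTER CATALOG main SET TAGS ('environment' = 'production');
ALTER TABLE main.sales.customers SET TAGS ('pii' = 'true', 'owner' = 'data_governance');
ALTER TABLE main.sales.customers ALTER COLUMN email SET TAGS ('classification' = 'sensitive');
```

Tags applied this way can then be searched and used to drive governance policies across the metastore.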
Navigating AI in Business: Moving Beyond “Using AI”
In today’s rapidly evolving business landscape, the role of artificial intelligence (AI) has shifted from being a mere buzzword to a strategic imperative. However, if the goal is simply to be “using AI”, we risk missing opportunities and wasting resources. To truly harness the power of AI, organizations must shift their mindset from vague aspirations […]
Feature Engineering with Databricks and Unity Catalog
Feature Engineering is the preprocessing step used to make raw data usable as input to an ML model through transformation, aggregation, enrichment, joining, normalization and other processes. Sometimes feature engineering is used against the output of another model rather than the raw data (transfer learning). At a high level, feature engineering has a lot in […]
Spark DataFrame: Writing to Tables and Creating Views
In this blog post, we will look at methods for writing a Spark DataFrame into tables and creating views, which are essential tasks for data processing and analysis. Before diving into this post, have a look at my other blog posts on creating and manipulating DataFrames. Creating a DataFrame: https://blogs.perficient.com/2024/01/10/spark-scala-approaches-toward-creating-dataframe/ Manipulating a DataFrame: https://blogs.perficient.com/2024/02/15/spark-dataframe-basic-methods/ Dataset: The […]
Data Virtualization with Oracle Enterprise Semantic Models
A common symptom of organizations operating at suboptimal performance is the prevalent challenge of data fragmentation. The fact that enterprise data is siloed within disparate business and operational systems is not itself the problem to solve, since there will always be multiple systems. In fact, businesses must adapt to an ever-growing […]
Best Practices for Oracle Fusion HCM Analytics
Oracle Fusion HCM Analytics, part of the Oracle Fusion Data Intelligence Platform (DIP, earlier known as Fusion Analytics Warehouse), equips various management levels with deep insights to effectively manage the workforce across the organization. DIP is for the most part a ready-to-use data and analytics solution that is typically implemented in a matter of weeks. […]