Natural language AI has proliferated into many of today’s applications and platforms. One of the most in-demand use cases is the ability to find quick answers to questions hidden within organizational data, such as operational, financial, or other enterprise data. Leveraging the latest advancements in the GenAI space together with enterprise data warehouses therefore brings valuable benefits. The SelectAI feature of Oracle Autonomous Database (ADB) achieves this outcome: it eliminates the complexity of leveraging various large language models (LLMs) from within the database itself. From an end-user perspective, SelectAI is as easy as asking the question, without having to worry about GenAI prompt generation, data modeling, or LLM fine-tuning.
In this post, I will summarize my findings on implementing ADB SelectAI and share some tips on what worked best and what to look out for when planning your implementation.
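To make that concrete, here is a minimal sketch of what asking a question looks like from a SQL session, assuming an AI profile has already been configured by an administrator (the profile name and question below are hypothetical; setup tips follow later in this post):

-- Point the current session at a previously configured AI profile (hypothetical name)
EXEC DBMS_CLOUD_AI.SET_PROFILE('OPENAI_GPT35');

-- Ask the question in natural language; SelectAI builds and runs the SQL for you
SELECT AI what were the total sales by channel class last year;

-- Or ask to see the generated SQL without running it
SELECT AI showsql what were the total sales by channel class last year;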
What I like about SelectAI is that switching the underlying GenAI model is simple. This matters over time: it lets you stay up to date and take advantage of the best that LLMs have to offer, at the most suitable cost. We can also set up SelectAI with multiple LLMs simultaneously, for example to cater to different user groups at varying levels of service. There will always be a better LLM down the road, but at this time these findings are based on trials of the Oracle Cloud Infrastructure (OCI) shared Cohere Command model, the OpenAI GPT-3.5-Turbo model, and the OpenAI GPT-4 model. Here is a summary of how each worked out:
While this model worked well for simple, well-phrased questions whose nouns relate to the metadata, it didn’t do well when the questions got more complex. Rather than giving a wrong answer, it returned a message apologizing for its inability to generate one: “Sorry, unfortunately a valid SELECT statement could not be generated…”. At the time of this writing, the Command R+ model had just become generally available, but it wasn’t attempted as part of this exercise, so it remains to be seen how the newer R+ model compares to the others.
This LLM worked a lot better than Cohere Command in that it answered all the questions that Command couldn’t. However, it comes at a higher cost.
This one is my favorite so far, as it also answered all the questions that Command couldn’t and is roughly 50 times less expensive than GPT-4. It also responds much faster than the OCI shared Cohere Command. At times, though, there were differences in how the answers are presented. Below is an example of what I mean:
SELECT s.PROD_ID, s.AMOUNT_SOLD, s.QUANTITY_SOLD, s.CHANNEL_ID, p.PROD_PACK_SIZE, c.CHANNEL_CLASS
FROM ADW_USER.SALES_V s
JOIN ADW_USER.CHANNELS_V c ON s.CHANNEL_ID = c.CHANNEL_ID
JOIN ADW_USER.PRODUCTS_V p ON s.PROD_ID = p.PROD_ID
WHERE p.PROD_PACK_SIZE = 'P' AND c.CHANNEL_CLASS IN ('Direct', 'Indirect');
SELECT c.CHANNEL_CLASS AS Channel_Class, SUM(s.AMOUNT_SOLD) AS Total_Sales
FROM ADW_USER.SALES_V s
JOIN ADW_USER.PRODUCTS_V p ON s.PROD_ID = p.PROD_ID
JOIN ADW_USER.CHANNELS_V c ON s.CHANNEL_ID = c.CHANNEL_ID
WHERE p.PROD_PACK_SIZE = 'P' AND c.CHANNEL_CLASS IN ('Direct', 'Indirect')
GROUP BY c.CHANNEL_CLASS;
Despite this difference, most of the answers were similar between GPT-4 and GPT-3.5-Turbo, which is why I recommend starting with GPT-3.5-Turbo and experimenting with your schemas at minimal cost.
Another great aspect of the OpenAI GPT models is that they support conversational follow-up questions in a thread-like manner. So, after asking for total sales by region, I can follow up in the same conversation and say, for example, “keep only Americas”. The query gets updated to restrict the previous results to my new request.
No matter how intelligent an LLM you pick, the experience of using GenAI won’t be pleasant unless the database schemas are well prepared for natural language. Thanks to Autonomous Database SelectAI, we don’t have to worry about the metadata every time we ask a question: it is a one-time, upfront setup that applies to all questions. Here are some schema prep tips that make a big difference in the overall data Q&A experience.
Limit SelectAI to operate on the most relevant set of tables/views in your ADB. For example, exclude any intermediate, temporary, or irrelevant tables and enable SelectAI only on the reporting-ready set of objects. This is important because SelectAI automatically generates the prompt, including the schema information, that is sent to the LLM together with the question. Sending metadata that excludes unnecessary database objects narrows down the focus for the LLM as it generates an answer.
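As a rough sketch of that setup, the profile below registers only a handful of reporting-ready views with SelectAI via the object_list attribute. The credential, model, and view names are placeholders, and the exact attribute set may vary by provider and database version:

BEGIN
  -- Register only the reporting-ready views; intermediate and temporary tables are left out
  DBMS_CLOUD_AI.CREATE_PROFILE(
    profile_name => 'OPENAI_GPT35',
    attributes   => '{"provider"        : "openai",
                      "credential_name" : "OPENAI_CRED",
                      "model"           : "gpt-3.5-turbo",
                      "object_list"     : [{"owner": "ADW_USER", "name": "SALES_V"},
                                           {"owner": "ADW_USER", "name": "PRODUCTS_V"},
                                           {"owner": "ADW_USER", "name": "CHANNELS_V"},
                                           {"owner": "ADW_USER", "name": "CUSTOMERS_V"}]}'
  );
END;
/

A second profile pointing at a different provider or model can be created the same way, which is what makes switching LLMs, or serving different user groups at different levels of service, straightforward.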
To get correct joins between tables, give join columns the same name on both sides, for example SALES.CHANNEL_ID = CHANNELS.CHANNEL_ID. Primary key and foreign key constraints don’t affect how tables are joined, at least at the time of writing this post, so we need to rely on consistently naming join columns across database objects.
Creating database views is very useful for SelectAI in several ways.
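For instance, a reporting-ready view along these lines (a simplified, hypothetical sketch over a sample sales table) exposes only business-relevant columns, gives them meaningful names, and keeps the join columns consistently named across objects:

CREATE OR REPLACE VIEW adw_user.sales_v AS
SELECT s.prod_id,                      -- same name as PRODUCTS_V.PROD_ID
       s.cust_id,                      -- same name as CUSTOMERS_V.CUST_ID
       s.channel_id,                   -- same name as CHANNELS_V.CHANNEL_ID
       s.time_id        AS sale_date,  -- business-friendly name
       s.quantity_sold,
       s.amount_sold
FROM   sh.sales s;                     -- hypothetical base table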
Adding comments makes a huge difference in how much more effective SelectAI is. Here are some tips on what to do with comments:
Explain certain data values: Sometimes data values are coded and require translation. For example, a column comment like this can be helpful: comment on column Products.VALID_FLAG: “indicates if a product is active. the value is A for active”.
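In Oracle SQL that comment would be added roughly as follows (the owner and object names are illustrative):

-- Explain the coded value so the LLM can translate "active products" into a filter
COMMENT ON COLUMN adw_user.products_v.valid_flag IS
  'Indicates if a product is active. The value is A for active.';

Note that the AI profile may need its comments attribute enabled for table and column comments to be included in the prompt sent to the LLM.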
While the aforementioned guidance is tailored for the upfront administrative setup of SelectAI, here are some tips for the SelectAI end user.
“what is the average total sales by customer name in northern america grouped by customer. Only consider Direct sales and customers with over 3 years of residency and in farming. case insensitive.”
SELECT c.CUST_FIRST_NAME || ' ' || c.CUST_LAST_NAME AS CUSTOMER_NAME, AVG(s.AMOUNT_SOLD)
FROM ADW_USER.SALES_V s JOIN ADW_USER.CUSTOMERS_V c ON s.CUST_ID = c.CUST_ID
JOIN ADW_USER.COUNTRIES_V co ON c.COUNTRY_ID = co.COUNTRY_ID
JOIN ADW_USER.CHANNELS_V ch ON s.CHANNEL_ID = ch.CHANNEL_ID
JOIN ADW_USER.CUSTOMER_DEMOGRAPHICS_V cd ON c.CUST_ID = cd.CUST_ID
WHERE UPPER(co.COUNTRY_SUBREGION) = 'NORTHERN AMERICA'
AND UPPER(ch.CHANNEL_CLASS) = 'DIRECT'
AND cd.YRS_RESIDENCE > 3
AND UPPER(cd.OCCUPATION) = 'FARMING'
GROUP BY c.CUST_FIRST_NAME, c.CUST_LAST_NAME;
It’s impressive to see how GenAI can take the burden off the business in finding quick and timely answers to questions that may come up throughout the day, all without data security risks. Contact us if you’re looking to unlock the power of GenAI for your enterprise data.
ETL testing is a testing technique that requires human participation to verify the extraction, transformation, and loading of data as it is transferred from source to target according to the given business requirements.
Consider a typical setup in which an ETL tool is used to transfer data from a source to a target. ETL testing verifies both the accuracy and the completeness of that data.
Data is loaded from the source system into the data warehouse using the Extract-Transform-Load (ETL) process.
Extraction is the retrieval of data from the sources (the sources can be a legacy system, a database, or flat files).
Transformation is the step in which the extracted data is cleaned, aggregated, or otherwise altered into the required shape.
Loading writes the transformed data into the target systems, called destinations (again, a destination can be a legacy system, a database, or a flat file). A simplified SQL sketch of a transform-and-load step follows below.
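As a highly simplified illustration (hypothetical staging and dimension tables, with the transformation expressed directly in SQL rather than in an ETL tool), a single transform-and-load step could look like this:

-- Extract from the staging table, apply basic cleansing/derivation, load the target
INSERT INTO emp_dim (emp_id, emp_name, dept_code, annual_salary)
SELECT e.emp_id,
       TRIM(UPPER(e.emp_name)),        -- cleanse: normalize names
       NVL(e.dept_code, 'UNKNOWN'),    -- cleanse: default missing departments
       e.monthly_salary * 12           -- derive: annualize the salary
FROM   employee_info_stg e
WHERE  e.emp_id IS NOT NULL;           -- reject records without a business key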
Data is tested via ETL testing before being transferred to live data warehouse systems; the practice is also referred to as production reconciliation. ETL testing differs from database testing in terms of its scope and the procedures used to conduct the test. ETL testing is done to ensure that the data is accurate after it is transformed and loaded from a source to a destination, and the data is verified at several points between the source and the destination.
In order to avoid duplicate records and data loss, ETL testing verifies, validates, and qualifies data. Throughout the ETL process, there are several points where data must be verified.
While testing, the tester confirms that the data has been extracted completely, transferred properly, and loaded into the new system in the correct format.
ETL testing helps to identify and prevent issues with data quality during the ETL process, such as duplicate data or data loss.
Examine the mapping document for accuracy to make sure all the necessary data has been provided. The ETL mapping document, which comprises the source, target, and business rules information, is the most crucial document for the ETL tester when designing and constructing the ETL jobs.
Example: Consider the following real-world scenario: We receive a source file called “Employee_info” that contains employee information that needs to be put into the target’s EMP_DIM table.
The following table shows the information typically included in a mapping document and what such a document looks like.
Depending on your needs, you can add additional fields.
Validate the source and target table structures against the corresponding mapping document: the source and target data types should be identical, the data type lengths in source and target should be equal, the data field type and format should be as specified, and the column names in the tables should match the mapping document.
Example: check the tables below to see what this metadata check covers.
Source – company_dtls_1
Target – company_dtls_2
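A hedged sketch of such a metadata check is below; it compares column names, data types, and lengths for the two tables straight from the data dictionary (assuming both are visible from the same database or via a database link; add an OWNER filter as needed):

-- Columns whose name, data type, or length differ between source and target
SELECT NVL(s.column_name, t.column_name) AS column_name,
       s.data_type   AS src_type,   t.data_type   AS tgt_type,
       s.data_length AS src_length, t.data_length AS tgt_length
FROM  (SELECT column_name, data_type, data_length
       FROM   all_tab_columns
       WHERE  table_name = 'COMPANY_DTLS_1') s
FULL OUTER JOIN
      (SELECT column_name, data_type, data_length
       FROM   all_tab_columns
       WHERE  table_name = 'COMPANY_DTLS_2') t
  ON  s.column_name = t.column_name
WHERE s.column_name IS NULL            -- column missing from the source
   OR t.column_name IS NULL            -- column missing from the target
   OR s.data_type   <> t.data_type     -- data type mismatch
   OR s.data_length <> t.data_length;  -- length mismatch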
Data completeness testing ensures that all expected data is loaded into the target table. Check for rejected records, perform boundary value analysis, compare record counts between the source and target, verify that data is not truncated in the target table columns, and compare the unique values of key fields between the data loaded into the warehouse and the source data.
Example:
You have a source table with five columns and five rows that contain company-related details, and a target table with the same five columns. After successful completion of the ETL, all 5 records of the source table (SQ_company_dtls_1) are loaded into the target table (TGT_company_dtls_2), as shown in the image below. If any error is encountered during ETL execution, its error code is displayed in the statistics.
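Two hedged checks along these lines (table names taken from the example above) cover the record-count comparison and spot rows that went missing on the way to the target:

-- Record counts should match between source and target
SELECT (SELECT COUNT(*) FROM sq_company_dtls_1)  AS src_count,
       (SELECT COUNT(*) FROM tgt_company_dtls_2) AS tgt_count
FROM   dual;

-- Rows present in the source but absent from the target (should return no rows)
SELECT * FROM sq_company_dtls_1
MINUS
SELECT * FROM tgt_company_dtls_2;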
Constraint check: make sure the key constraints are defined on the relevant tables as expected.
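A hedged data dictionary query like the one below can confirm that the expected constraints exist on a target table (the table name is illustrative):

-- List primary key (P), unique (U), and foreign key (R) constraints on the target
SELECT constraint_name,
       constraint_type,   -- P = primary key, U = unique, R = referential (foreign key)
       status
FROM   user_constraints
WHERE  table_name = 'TGT_COMPANY_DTLS_2'
AND    constraint_type IN ('P', 'U', 'R');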
Inaccurate data resulting from flaws in the ETL process can lead to data issues in reporting and poor strategic decision-making. According to analyst firm Gartner, bad data costs companies $14 million annually on average, with some companies losing as much as $100 million.
Here is an example of the consequences of inaccurate data:
A large fast-food company depends on business intelligence reports to determine how much raw chicken to order every month, by sales region and time of year. If this data is inaccurate, the business may order too much or too little, which could result in millions of dollars in wasted inventory or lost sales.
Here are a few situations where it is essential to use ETL testing:
To protect the data quality of the company, an ETL tester plays a crucial role.
ETL testing makes sure that all validity checks are met and that all transformation rules are strictly followed while transferring data from diverse sources to the central data warehouse. The main role of an ETL tester includes evaluating the data sources, the data extraction, the application of the transformation logic, and the data loading into the destination tables. Note that ETL testing, which is used by data warehouse systems, is different from data reconciliation, which is used in database testing to acquire pertinent data for analytics and business intelligence.
Responsibilities of an ETL tester:
In general, an ETL tester is the organization’s data quality guardian and ought to participate in all significant debates concerning the data used for business intelligence and other use cases.
Here we learned what ETL is, what is ETL testing, why we perform ETL testing when we need ETL testing, what skills are required for an ETL tester, and the Role and responsibilities of an ETL tester.
Happy Reading!
ETL stands for Extract, Transform, Load. This process is used to integrate data from multiple sources into a single destination, such as a data warehouse. The process involves extracting data from the source systems, transforming it into a format that can be used by the destination system, and then loading it into the destination system. ETL is commonly used in business intelligence and data warehousing projects to consolidate data from various sources and make it available for analysis and reporting.
ELT stands for Extract, Load, Transform. It is a process similar to ETL but with a different order of operations. In ELT, data is first extracted from source systems and loaded into the destination system, and then transformed into a format that can be used for analysis and reporting. This approach is often used when the destination system has the capability to perform complex transformations and data manipulation. ELT is becoming more popular with the rise of cloud-based data warehouses and big data platforms that can handle large-scale data processing and transformation.
ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are two methods of data integration used in data warehousing.
ETL involves extracting data from various sources, transforming it into a format that can be used by the target system, and then loading it into the target system. The transformation process involves cleaning, validating, and enriching the data before it is loaded. ETL is a batch-oriented process that requires a significant amount of computing power and storage space.
Conversely, ELT involves extracting data from various sources and loading it directly into the target system without any transformation. The transformation process is performed after the data has been loaded into the target system. ELT is a more modern approach that takes advantage of the processing power of modern data warehouses and allows for real-time analysis of data.
The main difference between ETL and ELT is the order in which the transformation process is performed. In ETL, transformation is performed before loading, while in ELT, transformation is performed after loading. The choice between ETL and ELT depends on the specific needs of the organization and the characteristics of the data being integrated.
Advantages of ELT over ETL:
Disadvantages of ELT over ETL:
So which approach to choose, ETL or ELT?
ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are two approaches to data integration that are widely used in the industry. Both ETL and ELT are used to extract data from multiple sources, transform it into a format that can be used by the target system, and load it into the target system. However, there are some key differences between the two approaches.
ETL is a traditional approach to data integration that has been used for many years. In this approach, data is first extracted from various sources and then transformed into a format that can be used by the target system. The transformed data is then loaded into the target system. ETL is a batch process that is usually done on a scheduled basis.
The main advantage of ETL is that it allows for complex transformations to be performed on the data before it is loaded into the target system. This means that data can be cleaned, filtered, and enriched before it is used. ETL also allows for data to be consolidated from multiple sources, which can be useful when data is spread across different systems.
However, ETL can be slow and resource intensive. Because the transformations are performed before the data is loaded into the target system, large amounts of data can take a long time to process. ETL also requires a dedicated server or cluster to perform the transformations.
A company wants to integrate data from multiple sources, including sales data from its CRM system and financial data from its accounting software. They use an ETL tool to extract the data, transform it into a common format, and load it into a data warehouse. The ETL process includes cleaning and filtering the data and performing calculations to create new metrics. The transformed data is then used for reporting and analysis.
ELT is a newer approach to data integration that has become popular in recent years. In this approach, data is first extracted from various sources and then loaded into the target system. Once the data is in the target system it is transformed into a format that can be used by the system.
The main advantage of ELT is that it is faster and more scalable than ETL. Because the transformations are performed after the data is loaded into the target system, large amounts of data can be processed quickly. ELT also requires less hardware than ETL, as the transformations can be performed on the target system itself.
However, ELT is not suitable for complex transformations. Because the transformations are performed after the data is loaded into the target system, there are limitations on what can be done with the data. ELT is also not suitable for consolidating data from multiple sources, as the data must be loaded into the target system before it can be combined.
A company wants to migrate its on-premises database to the cloud. They use an ELT tool to extract the data from the on-premises database and load it into the cloud database. Once the data is in the cloud database, they use SQL queries and other tools to transform the data into the desired format. The ELT process is faster and more scalable than ETL, as it does not require a dedicated server or cluster for transformations.
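As a minimal sketch of the transform-after-load idea (hypothetical staging and reporting tables, with the transformation done in plain SQL inside the cloud warehouse):

-- Raw data has already been loaded as-is into a staging table;
-- the transformation runs inside the warehouse as set-based SQL
CREATE TABLE sales_reporting AS
SELECT s.order_id,
       UPPER(TRIM(s.customer_name))              AS customer_name,  -- cleanse
       TO_DATE(s.order_date, 'YYYY-MM-DD')       AS order_date,     -- convert a text date
       s.quantity * s.unit_price                 AS order_amount    -- derive a metric
FROM   stg_sales_raw s
WHERE  s.order_id IS NOT NULL;                                      -- filter out bad rows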
In conclusion, both ETL and ELT have their advantages and disadvantages. ETL is best suited for situations where complex transformations are required and where data needs to be consolidated from multiple sources. ELT is best suited for situations where speed and scalability are important and where simple transformations are sufficient. Ultimately, the choice between ETL and ELT will depend on the specific needs of the organization and the nature of the data being integrated.
Please share your thoughts and suggestions in the space below, and I’ll do my best to respond to all of them as time allows.
For more such blogs click here
Happy Reading!
Oracle Cloud ERP offers several built-in tools for reporting. While native reporting tools like Oracle Transactional Business Intelligence (OTBI) and Oracle BI Publisher are well-suited for specific types of operational reporting, they do have limitations when it comes to performing complex and enterprise-wide reporting. It is therefore crucial to complement the Oracle Cloud ERP application with an enterprise reporting solution.
A major consideration to keep in mind is that Oracle Cloud ERP is a SaaS application. Unlike Oracle E-Business Suite (EBS), direct access to the Oracle Cloud ERP database (OLTP) is typically restricted. Therefore, traditional approaches to ERP reporting that may have worked well with EBS do not fit very well with Oracle Fusion SaaS applications. For example, you may have done EBS reporting with Noetix for IBM Cognos, OBIEE, Discoverer, or other legacy BI tools. Or you may have several ETL processes that extracted, transformed, and loaded on-premises ERP data into a data warehouse. However, following a similar approach for Cloud ERP reporting is not ideal. The recommendation is to accompany the ERP Cloud implementation with a more innovative reporting methodology that fits the modernity of the Cloud ERP application, scales to perform adequately, and offers fast time to value when it comes to addressing continuously evolving business needs for analytical insights. In this blog, I will describe how Oracle Cloud ERP can be supplemented with Incorta, an innovative data and reporting platform that transcends common challenges of the classical data warehouse approach.
What Differentiates Incorta Analytics for Oracle Cloud ERP?
Several factors come into play when deciding which type of reporting solution works best with the applications at hand. Here I am presenting Incorta as a very viable option for its data handling capabilities and reporting features. Of the many reasons why, I am focusing on the three I believe are most relevant to Oracle Cloud ERP.
Deploying Incorta for Oracle Cloud ERP follows a much faster cycle than implementing traditional data warehouse type deployments. Even after the initial deployment, rolling out additional reporting enhancements on Incorta follows a faster time to value due to several reasons:
The whole data export process is, however, streamlined and managed from within Incorta. A built-in connector to Oracle Fusion applications allows Incorta to tap into any data object in Oracle Cloud ERP. The connector performs data discovery on Oracle Cloud ERP View Objects (VOs), reads the metadata and data available in both standard Oracle VOs and custom VOs, and loads the data into Incorta. The connector adheres to Oracle best practices for exporting Oracle Fusion data in bulk. The connectivity happens through the Oracle Fusion Business Intelligence Cloud Connector (BICC). There is no need to develop the BICC data exports from scratch, as the Incorta Blueprint for Oracle Cloud ERP already includes pre-defined BICC offerings for various ERP functional areas (such as AP, AR, GL, Fixed Assets, etc.). These offerings are available to import into BICC, with the option of updating them with custom View Objects. Managing the data load from Oracle Cloud ERP into Incorta takes place from the Incorta web UI and therefore requires minimal setup on the Oracle Fusion side.
We can schedule multiple pre-configured offerings from the Incorta blueprint, depending on which modules are of interest to enable in Incorta for Oracle Cloud ERP reporting. This matrix provides a list of BICC offerings that get scheduled to support different functional areas of interest.
In addition, pre-built dashboards include reporting on common business functions such as: Procure to Pay, Order to Cash, Bookings, Billings and Backlog.
If you are familiar with data warehouses and BI solutions, you are probably aware that the performance of a reporting solution is key to its success. And performance here includes both the data layer, whereby data refreshes happen in a timely manner, as well as front-end reporting response times. If the business is unable to get the information required to drive decisions in a timely manner, the reporting platform would have failed its purpose. Therefore, laying a solid foundation for an enterprise-wide reporting solution must have performance and scalability as a key criterion.
What I like about Incorta is that it is not only a data visualization or reporting platform; it is also a scalable data storage and optimized data querying engine. With Incorta we don’t need to set up a third-party database (data warehouse) to store the data. Incorta handles the storage and retrieval of data using data maps that offer very quick response times. Previously, with a data warehouse, when a table (like GL journals or sales invoices, for example) started growing above a few million rows, you would need to consider performance optimization through several techniques like archiving, partitioning, indexing, and even adding several layers of aggregation to enhance reporting performance. All these activities are time consuming and hinder productivity and innovation. These traditional concepts for performance optimization are no longer needed, as Incorta is able to easily handle hundreds of millions and even billions of rows without the need for additional levels of aggregate tables.
It is often the case that analytics encompasses information from multiple applications, not just Oracle Cloud ERP. A couple things to consider in this regard:
If you’re on your journey to Oracle Cloud ERP and wondering what to do with your legacy data warehouse and reporting platforms, I encourage you to reach out for a consultation on this. The Perficient BI team is highly experienced with ERP projects and has helped many customers with their upgrades and analytics initiatives leveraging a diverse set of technology vendors and platforms.
One of the easiest and lowest-maintenance approaches to feed Fusion Cloud Apps data into a data warehouse is with Oracle Analytics Cloud (OAC) Fusion Business Intelligence Cloud Connector (BICC) Data Replication. Data Replication for Fusion Apps is a native feature of OAC Enterprise edition. If you’re migrating your on-premises application, such as E-Business Suite, to Oracle SaaS, you probably already realize that the capability to directly connect to the Oracle SaaS transaction database for data extraction isn’t generally available, except for limited use with BI Publisher type reporting. However, Fusion BICC offers a robust approach to enable data extraction from Fusion Apps. BICC extracts data from Fusion App view objects into files stored on Oracle Cloud (OCI). OAC’s data replication from BICC facilitates configuring, scheduling, and monitoring the whole process of extracting data from BICC into Cloud storage and then importing that same data into a data warehouse. While this doesn’t really offer ETL-like functionality, it does streamline the end-to-end process of extracting data from Fusion apps into relational table structures in an Oracle database. These target relational tables can then either be reported on directly or transformed for more complex analytics.
Here are, in my opinion, the key advantages of using OAC Data Replication from Fusion Apps:
While OAC Data Replication for Fusion Apps does offer some great functionality, there are restrictions that may render it unsuitable, depending on how you envision the holistic view of your future state data warehouse. Here are some of the reasons why it may not be adequate:
Thrilling our clients with innovation and impact – it’s not just rhetoric. This belief is instrumental for our clients’ success. In 2018 we introduced our Chief Strategists, who provide vision and leadership to help our clients remain competitive. Get to know each of our strategists as they share their unique insights on their areas of expertise.
Big data has significantly impacted today’s leading enterprises “as it helps detect patterns, consumer trends, and enhance decision making.” In fact, the big data and analytics market is estimated to reach $49 billion this year with a CAGR of 11 percent.
However, big data is often too broad and complex for analysis by traditional processing application software. For businesses to get the most out of their data, they must deploy a strategy that transforms their data management and analytics practices.
We recently spoke with Bill Busch, Big Data Chief Strategist, to learn more about creating value with big data and developing a data strategy to achieve meaningful results.
Bill Busch: Since joining Perficient, I have acted as an evangelist for big data, machine analytics, resource management, and the business development of those functions. My new role as Chief Strategist is a continuation of those efforts. This role allows me to be a resource for our clients and help them gain value from strategically using their data.
BB: I want to help clients understand how to best use their own data with AI, machine learning, analytics, and cloud to inform their business decisions. To get there, business leaders must transform data collection platforms in a way that addresses the typical mistakes or challenges experienced.
Many of our implementations involve several areas of expertise, which provides an opportunity to collaborate with other Chief Strategists, whether they’re focused in a particular industry or a specific technology.
Collectively, we help clients change processes and train teams to approach their duties differently. By addressing technology, processes, and people, they can take full advantage of [data collection] platforms’ speed and still maintain the quality and visibility of data.
“The breadth and depth of the Chief Strategists’ expertise enables the creation of comprehensive solutions that resolve our clients’ most critical issues.”
A recent example of our collaboration involved developing a unified data view using APIs. Typically, application data and API management were separated when companies sought to integrate data. Now, these two ecosystems are merging into one. This presents a challenge to create a single solution that enables application integration and the consolidation of different data sources. To conquer this challenge, we partnered with Erich Roch, Chief Strategist for IT Modernization and Integration, to create comprehensive solutions for our clients.
BB: Big data relies on a series of systems or platforms to gain meaningful insights, such as cloud data warehousing or cloud data lakes. Developing a strategy [around people and processes working with data] is what ultimately helps businesses align decision making and gain value from their data.
BB: Data is the heartbeat of any modern organization. Having information readily available to make decisions doesn’t happen quickly or seamlessly. The most productive strategy to achieve speed-to-market value requires funneling your information into a database to generate high-level insights.
Data strategy also doesn’t have to end with the executive level. Businesses enable other departments for success when they make data accessible for broader analytics by following best practices, establishing predefined processes, and creating a strategy to integrate the data.
BB: If you don’t have a data strategy in place, then you’re operating reactively and setting up your [business] for displacement. It’s easy for businesses to rely on low-quality data that is readily available to influence their decisions. However, the issue with that approach is the lack of depth of that information. Your enterprise may have the best AI in the world, but those AI models aren’t meaningful if you lack robust data for the technology to analyze.
To shift from a reactive to proactive stance, you should invest in a strategy and a data management infrastructure to manage the information. Investing in both yields deeper insights than surface-level information, provides greater context with existing knowledge, and identifies emerging trends to reveal unknown information.
When helping clients to create a strategy, we identify a scalable, analytical use case that could benefit from a data warehouse. Then, we help our clients establish a process with dedicated people and resources to assess existing data and develop ideas about the data they want to collect. From there, we create a road map to bridge the gap, so our clients are well-positioned to make strategic decisions relatively quickly.
BB: The open-source approach to data processing is new. Companies previously used on-premise data lakes for their processing, and then the trend shifted to cloud-driven data lakes. Now, we’re migrating data away from mainframes and high-dollar, appliance-based platforms to open source, cloud-based platforms.
For example, we’re currently working with a major healthcare payer to migrate its costly claims processing from a data lake mainframe to an open source, cloud-based platform. This project will enable our client to do more with its data – at a faster pace – while reducing costs. I think many companies realize the possibilities and are considering this model and similar solutions. It’s a sign of the times and underscores the need for digital transformation.
However, if your organization isn’t quite ready for data processing in the cloud, data lakes are a great intermediate step, allowing you to consolidate your data for the analytical use case. Using a cost-effective enterprise data warehouse (EDW), you can take advantage of the data lake and use the data for operational processing.
While I’m not saying that we’re moving away from data lakes, I think we’re seeing a need to build out the reliance of data lakes and enable more than one use case. By doing so, we’re architecturally enabling multiple areas of a business to take advantage of data that suits its goals or objectives.
BB: At the start of any client engagement, we identify the business drivers for a data strategy. Whether the use case is analytical or scientific, the supporting data architecture and data strategy must have some perspective to the [business] need. We also ensure the business need isn’t too tactical so future iterations aren’t limited.
Organizational culture is another major consideration for developing a big data strategy. Enabling self-service within the finance industry is a prime example. Industry regulations impact the level of self-service capabilities that a firm can offer. These limitations can greatly influence a financial organization’s culture, and its reception of a data-focused strategy.
Contrast this scenario to the life sciences industry that’s comprised of people who are hard-wired to interact with data in a more analytical way. Regardless of the industry or the culture, incorporating an organizational change management component helps ease the transition.
Using these insights, I try to establish a vision with clients. Many issues that businesses generally encounter when creating a data strategy and data lake platform isn’t related to the technology. Instead, it’s overcoming the mindset of the people involved. If you start with a universal outlook, develop agile concepts, and deploy lean processes, a data strategy can inform much analysis upfront. Then, the strategy establishes a data pipeline to begin moving information from the source to its destination in a relatively short timeframe.
Learn more about each of our Chief Strategists by following this series.
It’s no secret that data is a massive asset when it comes to making better business decisions. But gaining the valuable insights required to make those decisions requires quality data that you can trust. And to accomplish this you need a data strategy.
Without understanding your business objectives, identifying use cases, knowing how your users access data, and much more, you put yourself in the position of making decisions based on incomplete or incorrect insights.
Next month, leaders in the data industry will meet in New York City for the Strata Data Conference September 23-26 to share insights on how to implement a strong data strategy (as well as current hot topics like AI and machine learning, which need a strong data strategy foundation to build on).
Here are four sessions to attend to learn more about the elements of a quality data strategy.
Foundations for successful data projects
1:30pm-5:00pm, Sep 24 / 1E 10
The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. Presenters, Ted Malaska and Jonathan Seidman, detail guidelines and best practices from planning to implementation based on years of experience working with companies to deliver successful data projects.
Running multidisciplinary big data workloads in the cloud
9:00am-12:30pm, Sep 24 / 1E 14
Moving to the cloud poses challenges ranging from re-architecting to data context consistency across workloads that span multiple clusters. Presenters Jason Wang, Tony Wu, and Vinithra Varadharajan explore cloud architecture and its challenges, as well as using Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX.
It’s not you; it’s your database: How to unlock the full potential of your operational data (sponsored by MemSQL)
10:20am-10:25am, Sep 25 / 3E
Data is now the world’s most valuable resource, with winners and losers decided every day by how well we collect, analyze, and act on data. However, most companies struggle to unlock the full value of their data, using outdated, outmoded data infrastructure. Presenter Nikita Shamgunov examines how businesses use data, the new demands on data infrastructure, and what you should expect from your tools.
The ugly truth about making analytics actionable (sponsored by SAS)
1:15pm-1:55pm, Sep 25 / 1A 01/02
Companies today are working to adopt data-driven mind-sets, strategies, and cultures. Yet the ugly truth is many still struggle to make analytics actionable. Presenter Diana Shaw outlines a simple, powerful, and automated solution to operationalize all types of analytics at scale. You’ll learn how to put analytics into action while providing model governance and data scalability to drive real results.
If you’re attending the Strata Data Conference don’t forget to come visit us! Perficient is proud to be a Premier Exhibitor of the event and we’ll be at booth #1338 in the expo hall. Our experts will be onsite to strategize and showcase our expertise in complex data environments, AI, machine learning, and data strategy.
You can also connect with our team to set up a meeting, even if you’re not attending the conference. We look forward to seeing you.
The blog “5 Ways Your Data Warehouse Is Better In the Cloud” outlined the advantages of running your data warehouse in the cloud. In this blog, we look at the high-level considerations for migrating your data warehouse to the cloud.
Many companies facing expensive on-premises product renewals are aggressively migrating their data to the cloud. In this case, the cost of migrating to the cloud can be offset by the cost avoidance of the upgrade and migration. It is important to recognize product end-of-life cycles and plan a migration to the cloud accordingly. Insufficient lead time for a migration due to end-of-life dates would likely result in a less than optimum lift-and-shift migration.
Building a business case requires a TCO calculation for each data warehouse option considered. Costs should be gathered for data center/compute expenses, networking, data transfer, storage, administration, software license/subscription fees, hardware and software maintenance, depreciation, and support. Fault tolerance and disaster recovery are inherent in the cloud and expensive to achieve with your own data centers, so you should include these costs in the TCO model as well.
Intangible benefits typically associated with the cloud like agility and elasticity are more difficult to quantify. However, you should include these as benefits within the business case.
A recent Perficient client TCO study found the cost of the enterprise data warehouse (EDW) in the cloud to be significantly less than on-premises due to the following factors:
There are many options to run your EDW in the cloud. Businesses require a strong knowledge of cloud architectures, data lakes, elasticity and storage options, cross-region redundancy, and cloud subscription models to make technology choices. The leading public cloud providers have many data-related services and continue to innovate rapidly. There are also many third party options to choose from.
Your data-related technology selections and the roadmap to implement them must also align to the business strategy and priorities, your overall cloud strategy, and the direction you want to take your BI and AI in the cloud. The ability to support or migrate your existing data movements, ETLs, and security model must also be considered.
Using a partner with cloud platform and data migration experience is often something worth considering. This is especially the case for technology selection and migration planning.
As previously stated, an EDW product end-of-life upgrade spend could force you to a lift-and-shift migration scenario. If this is the case, it should be considered a tactical stop in the overall data warehouse roadmap. A lift-and-shift migration will not produce an agile EDW or optimize the elastic cloud environment.
Another option is to create a new cloud EDW for a specific set of use cases. Businesses should select these use cases carefully so that they leverage the cloud with all the associated benefits. For example, businesses could build a new EDW in the cloud for a specific business unit or to support a specific data science use case. This more incremental approach would build internal cloud skills and prove the value of data in the cloud to the business and thus build momentum to continue the migration. This approach could provide analysts and data scientists new capabilities such as streaming data, AI/ML, and sandboxes for self-service BI.
The EDW is an attractive cloud use case. Businesses often already have their data in the cloud thanks to their software-as-a-service-based applications or already-migrated applications. For those that don’t have data in the cloud yet, their data will likely be there soon. As a first step, migrating the EDW and BI is lower risk than moving your transactional systems. On top of that, there are also many financial and technical benefits to moving the EDW to the cloud.
Perficient has experience with EDW-to-cloud migrations and partnerships with all leading vendors in the space should you need any help. Our cloud EDW, business intelligence, performance management, and predictive and risk analytics solutions incorporate industry expertise, years of experience, and insights from hundreds of successful implementations.
According to RightScale’s eighth annual survey, “Cloud Computing Trends: 2019 State of the Cloud Survey”, “A significant number of public cloud users are now leveraging services beyond just the basic compute, storage, and network services.” The survey says relational database services are the most popular extended cloud service and “data warehouse moved up significantly to the third position.” It’s no surprise that as data volume, velocity, and types have exploded, companies are looking for more agile and cost-effective solutions for their data management and analytics strategies in the cloud.
The following topics outline the advantages of data storage and the enterprise data warehouse (EDW) in the cloud.
The TCO of a cloud-based EDW is lower than that of on-premises when variables such as redundancy and disaster recovery are included. There are the added benefits of avoiding a large initial capital expense, the ability to try it before you buy it (reducing financial risk), and the ability to process large, one-off analytical workloads with elastic compute and storage and only pay for what is consumed.
With elastic compute and storage you never have to worry about a lack of hardware impacting performance. You do, however, need to pick the right data management tools for the job to ensure requirements are met. There is an expansive set of data services available in the cloud, and performance is straightforward to test there, paying only for the resources you use during the tests.
You can quickly deploy new databases and services without the concern of expensive capital investments or hardware lead times. You can also rapidly try new and innovative tools and approaches in the cloud. For example, you could experiment with streaming data, AI and ML. You could set up sandboxes for user groups and self-service BI. And, in the cloud you have endless capacity for applications like storing sensor data for IoT.
Elastic compute, on-demand provisioning of infrastructure, and global connectivity make tasks like deploying new databases, federating data, and setting up replication and redundancy for disaster recovery easier than on premises. Also, innovation in the cloud is happening more rapidly than in on-premises data centers, allowing you to experiment with the latest technologies like serverless functions and AI.
It is easier to experiment in the cloud and handle high volume and high velocity data. You can keep data longer and store more diverse data types and sources. Combining data availability and the modern data services available on the cloud is a great way to add new data capabilities.
There were early concerns about cloud security. However, recent surveys show that many believe the cloud to be more secure than legacy systems. The cloud has strong perimeter security, controlled access, and cyber security experts monitoring and auditing security. Moving your data to the cloud is a good time to look again at cloud security policies and architecture.
Data and BI is a great way to get started on the cloud. EDW and BI is lower risk than moving your transactional systems. And there are many financial and technical benefits as outlined above.
Perficient has experience with EDW-to-cloud migrations and partnerships with all the leading vendors in the space, should you need any help. Our Cloud EDW, business intelligence, performance management, and predictive and risk analytics solutions incorporate industry expertise, years of experience, and insights from hundreds of successful implementations.
In 2016, when I did my first in-depth comparison, the resulting TCOs were usually very close: the OpEx was typically slightly higher for the cloud option, while the on-prem option required substantial capital investment.
However, our most recent estimate was eye-opening to our client. We were assessing a green-field implementation for a Data Warehouse at a mid-sized company. Part of our assessment was to compare TCO between the different deployment options, on-prem and cloud. We fully loaded all expenses for both options, including data center expenses, networking, data transfer, storage, administrative, software subscription fees, hardware and software maintenance, depreciation, and support.
The results were staggering. The cloud deployment TCO was over 30% less than the comparable on-prem deployment. Further, the on-prem deployment required a significant capital investment which was not required for the cloud deployment. It should be noted that in the cloud TCO we greatly over-estimated data transfer, processing, and storage costs.
Inspecting the TCO, there were three cloud features that greatly swung the results:
In the past, the cloud vs on-prem decision came down to a conversation around the speed of deployment, flexibility, elasticity – that is the normal cloud advantages. Now with the movement toward PaaS and serverless options that charge based only on resources used, the cloud has become the lowest TCO option in most cases.
The question that is often asked is: Can we leverage the same security we already have in Oracle Fusion SaaS (which includes users, duties, roles, and security policies) to secure data in Oracle Analytics Cloud (OAC, an Oracle PaaS)? The answer is yes. To understand how this is possible, keep reading. This blog follows my previous two blog posts about Integrating Oracle SaaS Data into OAC and Creating OAC Data Replication from Oracle SaaS. While the prior posts describe how to load SaaS data into OAC, this blog focuses on how to make OAC inherit Oracle Fusion SaaS security, and therefore avoid the hassle of manually maintaining security setups in multiple places.
Before delving into the details, it is important to differentiate between securing Oracle SaaS data that is flowing over to OAC directly through a Data Set Connection vs the Oracle SaaS data that is replicated into an OAC Data Warehouse, through any of the data copying techniques (Data Sync, Data Replication, Data Flows, or other ETL means).
1. OAC Data Set Connection against Oracle SaaS: This approach leverages the OAC Oracle Application Connection Adapter. It allows authenticating with either a shared admin user or with an end-user login. Choosing to make end-users log in with their own Oracle Fusion App credentials automatically enforces their Fusion App security roles and data policies on any reporting that they do against the Fusion App. Therefore, with a Data Set Connection, no additional configuration is necessary to inherit Fusion App security into OAC, since it all kicks in once an end-user logs in with their Fusion credentials.
2. OAC Data Warehouse Connection: This approach queries a replica of the Fusion App data that has been copied over to a data warehouse. Accordingly, the replicated data requires that object-level and data-level security controls be defined in OAC. Luckily, while this requires a one-time manual configuration, it relies on whatever security role assignments and data policies are set up in the source Fusion App.
The rest of this blog post elaborates on the second type of connection, and how to make OAC inherit Fusion App security against a data warehouse.
I am going to start my explanation by describing how authentication works and then move on to discuss how to setup authorization for both object security as well as data security.
Authentication:
Authorization: There are 2 different levels of authorizations that need to be configured: Object Level and Data Level Security.
To conclude, integrating Oracle Fusion SaaS Security into OAC is an essential part of a successful Oracle Analytics implementation. Performing a comprehensive security integration with SaaS that covers the various layers including users, objects and data is crucial. The success of the implementation is determined by how secure corporate data is and how feasible it is to avoid the maintenance overhead that would have been necessary without a well-planned and integrated security solution for Oracle SaaS and PaaS.
Healthcare IT is ever-changing and Perficient is on the forefront of this change, guiding the industry and those we serve toward a brighter future. We partner with healthcare companies to help people live their lives to their fullest potential today, using best practices and cost saving technologies and processes.
As we look to the future of Healthcare Information Systems, the effectiveness of an organization is measured across four areas; at the heart of who we are and what we do is the integration, accuracy, consistency, and timeliness of health information.
Healthcare organizations are among the most complex forms of human organization ever attempted to be managed, making transformation a daunting task. Despite the challenges associated with change, organizations need to evolve into a data-driven outcomes improvement organization.
They aggregate tremendous amounts of data – they need to figure out how to use it to drive innovation, boost the quality of care outcomes, and cut costs.
Besides members and providers, as well as internal/external business partners and vendors, there are a multitude of state and federal regulatory/compliance agencies that insist on having our information in a near real-time manner in order to perform their own functions and services. These integration requirements are constantly changing.
As an EDI Integration Specialist, I have seen many organizations struggle to constantly keep up with the business needs of their trading partners, state and federal agencies. Often, as our trading partners analyze the information we have sent them, they discover missing data or inconsistencies.
This requires a tedious and painful iterative remediation process to get the missing data, and results in resending massive amounts of historical data or correcting/retro-adjudicating claims. Adjusting and recouping claim payments is always painful for all entities involved, especially providers, with possible penalties or sanctions.
In the last few years, I have worked with several clients on getting their claims information loaded into their state’s All Payer Claims Databases (APCDB) and CMS to get their health claims reimbursed. We struggled to get the complete data set loaded successfully, and to meet the rigorous quality assurance standards.
It required several attempts working with their legacy systems to get the necessary data into the correct format. It required a great deal of coordination, testing and validation. Each state has a different submission format and data requirements, not necessarily an 837 EDI format, including one state that had a 220+ field delimited record format (Rhode Island).
We spent a great amount of time in compliance validation, and each submission required a manual effort. We constantly had to monitor each submission’s file acceptance status, handling original and adjusted claims differently using the previously accepted claim ID. If files were not submitted accurately and in a timely manner, significant fines were imposed.
Several times we discovered that even though the files were successfully accepted, there was still missing information which needed to be resubmitted. To be honest, it was a logistical nightmare.
As we design and develop data integrations, APIs and extracts, we often ‘shortcut’ to deliver data due to competing priorities, quickened project delivery schedules or limited development/testing staff. This leads to not giving our full attention to the complete requirements of the client/trading partners.
Companion guides and documentation are vague and say ‘send if known’, and several years later these ‘shortcuts’ are found out, possibly leading to penalties and corrective action plans. Sometimes legacy system and technical limitations mean not having the complete record set that is required.
Limitations of electronic health record (EHR) systems, combined with variable levels of expertise in outcomes improvement, impede the health system’s ability to transform.
In many healthcare organizations, information technology (IT) teams—including data architects and data analysts—and quality and clinical teams work in silos. IT provides the technologies, designs and delivers reports, without a clear understanding of the needs of the quality and clinical teams.
This can sometimes turn into a finger pointing exercise. Quality and clinical teams claim IT is not delivering the data they need to succeed, while IT insists that others are not clearly articulating what they need. It takes clear-eyed analysis to see that the teams are failing to work together to prioritize their outcomes improvement initiatives and drive sustainable outcomes.
At Perficient, we can provide a comprehensive picture of your organization’s information needs and provide you with a path to implementing complex system redesigns and simplifying integrations. Putting health care redesign into action can be done in the following four general phases:
1. Getting started. The most important part of building a skyscraper is looking at the requirements, developing a blueprint, and building a robust foundation. The first phase involves devising a strategic plan and assembling a leadership team to focus on quality improvement efforts. The team should include senior leaders, clinical champions (clinicians who promote the redesign), and administrative leaders. We need to develop a long-term strategy that sunsets legacy systems, consolidates business functions, builds synergies between departments, and aggregates data into a central repository. High-level needs assessments are performed, scope is defined to limit effort, and a change management process is created to assist in project management. A business governance committee determines which business decisions are implemented and when. A technical/architectural review committee approves the overall design and data governance of systems, interfaces, and integrations of enterprise systems.
2. Review the complete electronic dataset. That includes building a corporate data dictionary (including pricing/benefits, membership, providers, claims, utilization, brokers, authorizations/referrals, reference data and code sets, etc.) and setting priorities for improvement. The second phase involves gathering data to help inform the priorities for improvement. Once data requirements are gathered, performance measures such as NCQA/HEDIS that represent the major clinical, business, satisfaction, and operations goals for the practice can be identified. Corporate reporting and process needs are critical at this phase to ensure compliance and to meet internal and external customers’ requirements. The creation of dashboards and user reports that are easy to manage and provide the right information at the right time can make all the difference in achieving cost savings and effective management throughout the organization. Using these dashboards allows users to keep an eye on the overall health and utilization of the services that they provide to their members.
One of the most helpful EDI integration practices I have found is to perform a source-to-target gap analysis between the core claims/membership systems, my inbound/outbound EDI staging database, and the EDIFEC/GENTRAN mapping logic which translates the data to and from the outbound and inbound x12 EDI 837 Claims and 834 Membership enrollment files. This document also identifies any transformations, conversions, or lookups that are needed from proprietary values to HIPAA standard values. By looking at every EDI Loop/Segment/Element and mapping it all the way through, I was able to identify data fields that were not being sent or were being sent incorrectly. I give this mapping document, which I customize for specific trading partners while reviewing the vendor’s companion guides, to my EDI developers as part of my technical specification documents.
3. Redesign care and business systems. The third phase involves organizing the care team around their roles, responsibilities, and workflows. The care team offers ideas for improvement and evaluates the effects of changes made. Determining how an enterprise integrates and uses often disparate systems is critical to achieving a timely, complete, and accurate data/process flow. The design, creation, and use of APIs and messaging technologies to get information extracted, transformed, and loaded (ETL) is critical, especially if information is to be used in real-time, web-based portals. Evaluation of easy-to-use yet robust batch ETL tools, such as Informatica, becomes the cornerstone of any data integration project. Healthcare organizations rely upon reporting tools to evaluate, investigate, and reconcile information, especially with their financial and clinical systems. Imaging, workflow management, and correspondence generation systems are used to create and manage the communications.
4. Continuously improve performance and maintain changes. The fourth phase includes ongoing review of clinical and financial integration outcomes and making adjustments for continued improvement. As we look to the future, we need to look at the IT architecture and its ability to expand with the ever-changing technology and needed capability models. Perficient is a preferred partner with IBM, Oracle, and Microsoft, with extensive experience in digital and cloud-based implementations. Using these technologies gives our clients the ability to expand their systems: application servers can be spun up on demand based on need and growth, failover and redundancy are supported, distributed and global databases can be employed, and software virtualization and upgrades can be performed transparently to the end users.
Perficient’s health information technology (IT) initiative for the integration of health IT and care management includes a variety of electronic methods that are used to manage information about people’s health and health care, for both individual patients and groups of patients. The use of health IT can improve the quality of care, even as it makes health care more cost-effective.
Implementing an enterprise data warehouse (EDW) or a data lake/analytic platform (DLAP) results in the standardization of terminology and measures across the organization and provides the ability to easily visualize performance. These critical steps allow for the collection and analysis of information organization-wide.
The EDW/DLAP aggregates data from a wide variety of sources, including clinical, financial, supply chain, patient satisfaction, and other operational data sources (ODS) and data marts.
It provides broad access to data across the organization, including to the CEO and other operational leaders, department heads, clinicians, and front-line leaders. When faced with a problem or question that requires information, clinicians and leaders don’t have to request a report and wait days or weeks for data analysts to build it.
The analytics platform provides clinicians and leaders the ability to visualize data in near-real time, and to explore the problem and population of interest. This direct access increases the speed and scale with which we achieve improvement. Obtaining data required to understand current performance no longer takes weeks or even months.
Application simplification takes away the confusion about the consistency and accuracy of data within an organization. Per Member/Per Month (PMPM) reporting is delivered in a standard format throughout, regardless of line of business.
The analytics platform delivers performance data used to inform organizational and clinician decision-making, evaluate the effectiveness of performance improvement initiatives, and increasingly, predict which patients are at greatest risk for an adverse outcome, enabling clinicians to mobilize resources around the patient to prevent this occurrence.
An analytics platform is incredibly powerful and provides employees and customers with the ability to easily visualize its performance, setting the stage for data-driven outcomes improvement. However, healthcare providers and payers know that tools and technology alone don’t lead to improvement.
To be effective, clinicians, IT, and Quality Assurance have to partner together to identify best practices and design systems to adopt them by building the practices into everyday workflows. Picking the right reporting and analytical tool and platform is critical to the success of the integration project.
Big data tools such as Hadoop/HIVE/HUE and cloud technologies are used to bring various data sources together into a unified platform for the end user.
Perficient provides a full-service IT roadmap to transform your healthcare organization and achieve increased personalization of care via the same path: digital transformation in healthcare. New health system technology, such as moving beyond basic EMR (Electronic Medical Record) infrastructure to full patient-focused CRM (Customer Relationship Management) solutions, has enabled healthcare organizations to integrate extended care teams, enhance patient satisfaction, and improve the efficiency of care.
We connect human insight with digital capabilities in order to transform the consumer experience and deliver significant business value.
For more information on how Perficient can help you with your Healthcare IT integration and analytical needs, please see https://www.perficient.com/industries/healthcare/strategy-and-advisory-service