Natural language AI has proliferated into many of today’s applications and platforms. One of the most in-demand use cases is the ability to find quick answers to questions hidden within organizational data, such as operational, financial, or other enterprise data. Leveraging the latest advancements in the GenAI space together with enterprise data warehouses therefore brings valuable benefits. The SelectAI feature of Oracle Autonomous Database (ADB) achieves this outcome: it eliminates the complexity of leveraging various large language models (LLMs) from within the database itself. From an end-user perspective, SelectAI is as easy as asking the question, without having to worry about GenAI prompt generation, data modeling, or LLM fine-tuning.
In this post, I will summarize my findings on implementing ADB SelectAI and share some tips on what worked best and what to look out for when planning your implementation.
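To make that concrete, here is a minimal sketch of what asking a question looks like from a SQL session, assuming an AI profile has already been configured by an administrator (the profile name and question below are hypothetical; setup tips follow later in this post):

-- Point the current session at a previously configured AI profile (hypothetical name)
EXEC DBMS_CLOUD_AI.SET_PROFILE('OPENAI_GPT35');

-- Ask the question in natural language; SelectAI builds and runs the SQL for you
SELECT AI what were the total sales by channel class last year;

-- Or ask to see the generated SQL without running it
SELECT AI showsql what were the total sales by channel class last year;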
What I like about SelectAI is that switching the underlying GenAI model is simple. This matters over time: it lets you stay up to date and take advantage of the best that LLMs have to offer, at the most suitable cost. We can also set up SelectAI with multiple LLMs simultaneously, for example to cater to different user groups at varying levels of service. There will always be a better LLM down the road, but at this time these findings are based on trials of the Oracle Cloud Infrastructure (OCI) shared Cohere Command model, the OpenAI GPT-3.5-Turbo model, and the OpenAI GPT-4 model. Here is a summary of how each worked out:
While this model worked well for simple, well-phrased questions whose nouns relate to the metadata, it didn’t do well when the questions got more complex. Rather than giving a wrong answer, it returned a message apologizing for its inability to generate one: “Sorry, unfortunately a valid SELECT statement could not be generated…”. At the time of this writing, the Command R+ model had just become generally available, but it wasn’t attempted as part of this exercise, so it remains to be seen how the newer R+ model compares to the others.
This LLM worked a lot better than Cohere Command in that it answered all the questions that Command couldn’t. However, it comes at a higher cost.
This one is my favorite so far, as it also answered all the questions that Command couldn’t and is roughly 50 times less expensive than GPT-4. It also responds much faster than the OCI shared Cohere Command. At times, though, there were differences in how the answers are presented. Below is an example of what I mean:
SELECT s.PROD_ID, s.AMOUNT_SOLD, s.QUANTITY_SOLD, s.CHANNEL_ID, p.PROD_PACK_SIZE, c.CHANNEL_CLASS
FROM ADW_USER.SALES_V s
JOIN ADW_USER.CHANNELS_V c ON s.CHANNEL_ID = c.CHANNEL_ID
JOIN ADW_USER.PRODUCTS_V p ON s.PROD_ID = p.PROD_ID
WHERE p.PROD_PACK_SIZE = 'P' AND c.CHANNEL_CLASS IN ('Direct', 'Indirect');
SELECT c.CHANNEL_CLASS AS Channel_Class, SUM(s.AMOUNT_SOLD) AS Total_Sales
FROM ADW_USER.SALES_V s
JOIN ADW_USER.PRODUCTS_V p ON s.PROD_ID = p.PROD_ID
JOIN ADW_USER.CHANNELS_V c ON s.CHANNEL_ID = c.CHANNEL_ID
WHERE p.PROD_PACK_SIZE = 'P' AND c.CHANNEL_CLASS IN ('Direct', 'Indirect')
GROUP BY c.CHANNEL_CLASS;
Despite this difference, most of the answers were similar between GPT-4 and GPT-3.5-Turbo, which is why I recommend starting with GPT-3.5-Turbo and experimenting with your schemas at minimal cost.
Another great aspect of the OpenAI GPT models is that they support conversational follow-up questions in a thread-like manner. So, after asking for total sales by region, I can follow up in the same conversation and say, for example, “keep only Americas”. The query gets updated to restrict the previous results to my new request.
No matter how intelligent an LLM you pick, the experience of using GenAI won’t be pleasant unless the database schemas are well prepared for natural language. Thanks to Autonomous Database SelectAI, we don’t have to worry about the metadata every time we ask a question: it is a one-time, upfront setup that applies to all questions. Here are some schema prep tips that make a big difference in the overall data Q&A experience.
Limit SelectAI to operate on the most relevant set of tables/views in your ADB. For example, exclude any intermediate, temporary, or irrelevant tables and enable SelectAI only on the reporting-ready set of objects. This is important because SelectAI automatically generates the prompt, including the schema information, that is sent to the LLM together with the question. Sending metadata that excludes unnecessary database objects narrows down the focus for the LLM as it generates an answer.
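As a rough sketch of that setup, the profile below registers only a handful of reporting-ready views with SelectAI via the object_list attribute. The credential, model, and view names are placeholders, and the exact attribute set may vary by provider and database version:

BEGIN
  -- Register only the reporting-ready views; intermediate and temporary tables are left out
  DBMS_CLOUD_AI.CREATE_PROFILE(
    profile_name => 'OPENAI_GPT35',
    attributes   => '{"provider"        : "openai",
                      "credential_name" : "OPENAI_CRED",
                      "model"           : "gpt-3.5-turbo",
                      "object_list"     : [{"owner": "ADW_USER", "name": "SALES_V"},
                                           {"owner": "ADW_USER", "name": "PRODUCTS_V"},
                                           {"owner": "ADW_USER", "name": "CHANNELS_V"},
                                           {"owner": "ADW_USER", "name": "CUSTOMERS_V"}]}'
  );
END;
/

A second profile pointing at a different provider or model can be created the same way, which is what makes switching LLMs, or serving different user groups at different levels of service, straightforward.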
To get correct joins between tables, give join columns the same name on both sides, for example SALES.CHANNEL_ID = CHANNELS.CHANNEL_ID. Primary key and foreign key constraints don’t affect how tables are joined, at least at the time of writing this post, so we need to rely on consistently naming join columns across database objects.
Creating database views is very useful for SelectAI in several ways.
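For instance, a reporting-ready view along these lines (a simplified, hypothetical sketch over a sample sales table) exposes only business-relevant columns, gives them meaningful names, and keeps the join columns consistently named across objects:

CREATE OR REPLACE VIEW adw_user.sales_v AS
SELECT s.prod_id,                      -- same name as PRODUCTS_V.PROD_ID
       s.cust_id,                      -- same name as CUSTOMERS_V.CUST_ID
       s.channel_id,                   -- same name as CHANNELS_V.CHANNEL_ID
       s.time_id        AS sale_date,  -- business-friendly name
       s.quantity_sold,
       s.amount_sold
FROM   sh.sales s;                     -- hypothetical base table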
Adding comments makes a huge difference in how much more effective SelectAI is. Here are some tips on what to do with comments:
Explain certain data values: Sometimes data values are coded and require translation. For example, a column comment like this can be helpful: comment on column Products.VALID_FLAG: “indicates if a product is active. the value is A for active”.
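In Oracle SQL that comment would be added roughly as follows (the owner and object names are illustrative):

-- Explain the coded value so the LLM can translate "active products" into a filter
COMMENT ON COLUMN adw_user.products_v.valid_flag IS
  'Indicates if a product is active. The value is A for active.';

Note that the AI profile may need its comments attribute enabled for table and column comments to be included in the prompt sent to the LLM.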
While the aforementioned guidance is tailored for the upfront administrative setup of SelectAI, here are some tips for the SelectAI end user.
“what is the average total sales by customer name in northern america grouped by customer. Only consider Direct sales and customers with over 3 years of residency and in farming. case insensitive.”
SELECT c.CUST_FIRST_NAME || ' ' || c.CUST_LAST_NAME AS CUSTOMER_NAME, AVG(s.AMOUNT_SOLD)
FROM ADW_USER.SALES_V s JOIN ADW_USER.CUSTOMERS_V c ON s.CUST_ID = c.CUST_ID
JOIN ADW_USER.COUNTRIES_V co ON c.COUNTRY_ID = co.COUNTRY_ID
JOIN ADW_USER.CHANNELS_V ch ON s.CHANNEL_ID = ch.CHANNEL_ID
JOIN ADW_USER.CUSTOMER_DEMOGRAPHICS_V cd ON c.CUST_ID = cd.CUST_ID
WHERE UPPER(co.COUNTRY_SUBREGION) = 'NORTHERN AMERICA'
AND UPPER(ch.CHANNEL_CLASS) = 'DIRECT'
AND cd.YRS_RESIDENCE > 3
AND UPPER(cd.OCCUPATION) = 'FARMING'
GROUP BY c.CUST_FIRST_NAME, c.CUST_LAST_NAME;
It’s impressive to see how GenAI can take the burden off the business in finding quick and timely answers to questions that may come up throughout the day, all without data security risks. Contact us if you’re looking to unlock the power of GenAI for your enterprise data.
ETL testing is a testing technique that requires human participation to verify the extraction, transformation, and loading of data as it is transferred from source to target according to the given business requirements.
Consider a typical setup in which an ETL tool is used to transfer data from a source to a target. ETL testing verifies both the accuracy and the completeness of that data.
Data is loaded from the source system into the data warehouse using the Extract-Transform-Load (ETL) process.
Extraction is the retrieval of data from the sources (the sources can be a legacy system, a database, or flat files).
Transformation is the step in which the extracted data is cleaned, aggregated, or otherwise altered into the required shape.
Loading writes the transformed data into the target systems, called destinations (again, a destination can be a legacy system, a database, or a flat file). A simplified SQL sketch of a transform-and-load step follows below.
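As a highly simplified illustration (hypothetical staging and dimension tables, with the transformation expressed directly in SQL rather than in an ETL tool), a single transform-and-load step could look like this:

-- Extract from the staging table, apply basic cleansing/derivation, load the target
INSERT INTO emp_dim (emp_id, emp_name, dept_code, annual_salary)
SELECT e.emp_id,
       TRIM(UPPER(e.emp_name)),        -- cleanse: normalize names
       NVL(e.dept_code, 'UNKNOWN'),    -- cleanse: default missing departments
       e.monthly_salary * 12           -- derive: annualize the salary
FROM   employee_info_stg e
WHERE  e.emp_id IS NOT NULL;           -- reject records without a business key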
Data is tested via ETL testing before being transferred to live data warehouse systems; the practice is also referred to as production reconciliation. ETL testing differs from database testing in terms of its scope and the procedures used to conduct the test. ETL testing is done to ensure that the data is accurate after it is transformed and loaded from a source to a destination, and the data is verified at several points between the source and the destination.
In order to avoid duplicate records and data loss, ETL testing verifies, validates, and qualifies data. Throughout the ETL process, there are several points where data must be verified.
While testing, the tester confirms that the data has been extracted completely, transferred properly, and loaded into the new system in the correct format.
ETL testing helps to identify and prevent issues with data quality during the ETL process, such as duplicate data or data loss.
Examine the mapping document for accuracy to make sure all the necessary data has been provided. The ETL mapping document, which comprises the source, target, and business rules information, is the most crucial document for the ETL tester when designing and constructing the ETL jobs.
Example: Consider the following real-world scenario: We receive a source file called “Employee_info” that contains employee information that needs to be put into the target’s EMP_DIM table.
The following table shows the information typically included in a mapping document and what such a document looks like.
Depending on your needs, you can add additional fields.
Validate the source and target table structures against the corresponding mapping document: the source and target data types should be identical, the data type lengths in source and target should be equal, the data field type and format should be as specified, and the column names in the tables should match the mapping document.
Example: check the tables below to see what this metadata check covers.
Source – company_dtls_1
Target – company_dtls_2
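A hedged sketch of such a metadata check is below; it compares column names, data types, and lengths for the two tables straight from the data dictionary (assuming both are visible from the same database or via a database link; add an OWNER filter as needed):

-- Columns whose name, data type, or length differ between source and target
SELECT NVL(s.column_name, t.column_name) AS column_name,
       s.data_type   AS src_type,   t.data_type   AS tgt_type,
       s.data_length AS src_length, t.data_length AS tgt_length
FROM  (SELECT column_name, data_type, data_length
       FROM   all_tab_columns
       WHERE  table_name = 'COMPANY_DTLS_1') s
FULL OUTER JOIN
      (SELECT column_name, data_type, data_length
       FROM   all_tab_columns
       WHERE  table_name = 'COMPANY_DTLS_2') t
  ON  s.column_name = t.column_name
WHERE s.column_name IS NULL            -- column missing from the source
   OR t.column_name IS NULL            -- column missing from the target
   OR s.data_type   <> t.data_type     -- data type mismatch
   OR s.data_length <> t.data_length;  -- length mismatch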
Data completeness testing ensures that all expected data is loaded into the target table. Check for rejected records, perform boundary value analysis, compare record counts between the source and target, verify that data is not truncated in the target table columns, and compare the unique values of key fields between the data loaded into the warehouse and the source data.
Example:
You have a source table with five columns and five rows that contain company-related details, and a target table with the same five columns. After successful completion of the ETL, all 5 records of the source table (SQ_company_dtls_1) are loaded into the target table (TGT_company_dtls_2), as shown in the image below. If any error is encountered during ETL execution, its error code is displayed in the statistics.
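Two hedged checks along these lines (table names taken from the example above) cover the record-count comparison and spot rows that went missing on the way to the target:

-- Record counts should match between source and target
SELECT (SELECT COUNT(*) FROM sq_company_dtls_1)  AS src_count,
       (SELECT COUNT(*) FROM tgt_company_dtls_2) AS tgt_count
FROM   dual;

-- Rows present in the source but absent from the target (should return no rows)
SELECT * FROM sq_company_dtls_1
MINUS
SELECT * FROM tgt_company_dtls_2;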
Constraint check: make sure the key constraints are defined on the relevant tables as expected.
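A hedged data dictionary query like the one below can confirm that the expected constraints exist on a target table (the table name is illustrative):

-- List primary key (P), unique (U), and foreign key (R) constraints on the target
SELECT constraint_name,
       constraint_type,   -- P = primary key, U = unique, R = referential (foreign key)
       status
FROM   user_constraints
WHERE  table_name = 'TGT_COMPANY_DTLS_2'
AND    constraint_type IN ('P', 'U', 'R');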
Inaccurate data resulting from flaws in the ETL process can lead to data issues in reporting and poor strategic decision-making. According to analyst firm Gartner, bad data costs companies $14 million annually on average, with some companies losing as much as $100 million.
Here is an example of the consequences of inaccurate data:
A large fast-food company depends on business intelligence reports to determine how much raw chicken to order every month, by sales region and time of year. If this data is inaccurate, the business may order too much or too little, which could result in millions of dollars in wasted inventory or lost sales.
Here are a few situations where it is essential to use ETL testing:
To protect the data quality of the company, an ETL tester plays a crucial role.
ETL testing makes sure that all validity checks are met and that all transformation rules are strictly followed while transferring data from diverse sources to the central data warehouse. The main role of an ETL tester includes evaluating the data sources, the data extraction, the application of the transformation logic, and the data loading into the destination tables. Note that ETL testing, which is used by data warehouse systems, is different from data reconciliation, which is used in database testing to acquire pertinent data for analytics and business intelligence.
Responsibilities of an ETL tester:
In general, an ETL tester is the organization’s data quality guardian and ought to participate in all significant debates concerning the data used for business intelligence and other use cases.
Here we learned what ETL is, what is ETL testing, why we perform ETL testing when we need ETL testing, what skills are required for an ETL tester, and the Role and responsibilities of an ETL tester.
Happy Reading!
ETL stands for Extract, Transform, Load. This process is used to integrate data from multiple sources into a single destination, such as a data warehouse. The process involves extracting data from the source systems, transforming it into a format that can be used by the destination system, and then loading it into the destination system. ETL is commonly used in business intelligence and data warehousing projects to consolidate data from various sources and make it available for analysis and reporting.
ELT stands for Extract, Load, Transform. It is a process similar to ETL but with a different order of operations. In ELT, data is first extracted from source systems and loaded into the destination system, and then transformed into a format that can be used for analysis and reporting. This approach is often used when the destination system has the capability to perform complex transformations and data manipulation. ELT is becoming more popular with the rise of cloud-based data warehouses and big data platforms that can handle large-scale data processing and transformation.
ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are two methods of data integration used in data warehousing.
ETL involves extracting data from various sources, transforming it into a format that can be used by the target system, and then loading it into the target system. The transformation process involves cleaning, validating, and enriching the data before it is loaded. ETL is a batch-oriented process that requires a significant amount of computing power and storage space.
Conversely, ELT involves extracting data from various sources and loading it directly into the target system without any transformation. The transformation process is performed after the data has been loaded into the target system. ELT is a more modern approach that takes advantage of the processing power of modern data warehouses and allows for real-time analysis of data.
The main difference between ETL and ELT is the order in which the transformation process is performed. In ETL, transformation is performed before loading, while in ELT, transformation is performed after loading. The choice between ETL and ELT depends on the specific needs of the organization and the characteristics of the data being integrated.
Advantages of ELT over ETL:
Disadvantages of ELT over ETL:
So which approach to choose, ETL or ELT?
ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are two approaches to data integration that are widely used in the industry. Both ETL and ELT are used to extract data from multiple sources, transform it into a format that can be used by the target system, and load it into the target system. However, there are some key differences between the two approaches.
ETL is a traditional approach to data integration that has been used for many years. In this approach, data is first extracted from various sources and then transformed into a format that can be used by the target system. The transformed data is then loaded into the target system. ETL is a batch process that is usually done on a scheduled basis.
The main advantage of ETL is that it allows for complex transformations to be performed on the data before it is loaded into the target system. This means that data can be cleaned, filtered, and enriched before it is used. ETL also allows for data to be consolidated from multiple sources, which can be useful when data is spread across different systems.
However, ETL can be slow and resource intensive. Because the transformations are performed before the data is loaded into the target system, large amounts of data can take a long time to process. ETL also requires a dedicated server or cluster to perform the transformations.
A company wants to integrate data from multiple sources, including sales data from its CRM system and financial data from its accounting software. They use an ETL tool to extract the data, transform it into a common format, and load it into a data warehouse. The ETL process includes cleaning and filtering the data and performing calculations to create new metrics. The transformed data is then used for reporting and analysis.
ELT is a newer approach to data integration that has become popular in recent years. In this approach, data is first extracted from various sources and then loaded into the target system. Once the data is in the target system it is transformed into a format that can be used by the system.
The main advantage of ELT is that it is faster and more scalable than ETL. Because the transformations are performed after the data is loaded into the target system, large amounts of data can be processed quickly. ELT also requires less hardware than ETL, as the transformations can be performed on the target system itself.
However, ELT is not suitable for complex transformations. Because the transformations are performed after the data is loaded into the target system, there are limitations on what can be done with the data. ELT is also not suitable for consolidating data from multiple sources, as the data must be loaded into the target system before it can be combined.
A company wants to migrate its on-premises database to the cloud. They use an ELT tool to extract the data from the on-premises database and load it into the cloud database. Once the data is in the cloud database, they use SQL queries and other tools to transform the data into the desired format. The ELT process is faster and more scalable than ETL, as it does not require a dedicated server or cluster for transformations.
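As a minimal sketch of the transform-after-load idea (hypothetical staging and reporting tables, with the transformation done in plain SQL inside the cloud warehouse):

-- Raw data has already been loaded as-is into a staging table;
-- the transformation runs inside the warehouse as set-based SQL
CREATE TABLE sales_reporting AS
SELECT s.order_id,
       UPPER(TRIM(s.customer_name))              AS customer_name,  -- cleanse
       TO_DATE(s.order_date, 'YYYY-MM-DD')       AS order_date,     -- convert a text date
       s.quantity * s.unit_price                 AS order_amount    -- derive a metric
FROM   stg_sales_raw s
WHERE  s.order_id IS NOT NULL;                                      -- filter out bad rows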
In conclusion, both ETL and ELT have their advantages and disadvantages. ETL is best suited for situations where complex transformations are required and where data needs to be consolidated from multiple sources. ELT is best suited for situations where speed and scalability are important and where simple transformations are sufficient. Ultimately, the choice between ETL and ELT will depend on the specific needs of the organization and the nature of the data being integrated.
Please share your thoughts and suggestions in the space below, and I’ll do my best to respond to all of them as time allows.
For more such blogs click here
Happy Reading!
Oracle Cloud ERP offers several built-in tools for reporting. While native reporting tools like Oracle Transactional Business Intelligence (OTBI) and Oracle BI Publisher are well-suited for specific types of operational reporting, they do have limitations when it comes to performing complex and enterprise-wide reporting. It is therefore crucial to complement the Oracle Cloud ERP application with an enterprise reporting solution.
A major consideration to keep in mind is that Oracle Cloud ERP is a SaaS application. Unlike Oracle E-Business Suite (EBS), direct access to the Oracle Cloud ERP database (OLTP) is typically restricted. Therefore, traditional approaches to ERP reporting that may have worked well with EBS do not fit very well with Oracle Fusion SaaS applications. For example, you may have done EBS reporting with Noetix for IBM Cognos, OBIEE, Discoverer, or other legacy BI tools. Or you may have several ETL processes that extracted, transformed, and loaded on-premises ERP data into a data warehouse. However, following a similar approach for Cloud ERP reporting is not ideal. The recommendation is to accompany the ERP Cloud implementation with a more innovative reporting methodology that fits the modernity of the Cloud ERP application, scales to perform adequately, and offers fast time to value when it comes to addressing continuously evolving business needs for analytical insights. In this blog, I will describe how Oracle Cloud ERP can be supplemented with Incorta, an innovative data and reporting platform that transcends common challenges of the classical data warehouse approach.
What Differentiates Incorta Analytics for Oracle Cloud ERP?
Several factors come into play when deciding which type of reporting solution works best with the applications at hand. Here I am presenting Incorta as a very viable option for its data handling capabilities and reporting features. Of the many reasons why, I am focusing on the three I believe are most relevant to Oracle Cloud ERP.
Deploying Incorta for Oracle Cloud ERP follows a much faster cycle than implementing traditional data warehouse type deployments. Even after the initial deployment, rolling out additional reporting enhancements on Incorta follows a faster time to value due to several reasons:
The whole data export process is, however, streamlined and managed from within Incorta. A built-in connector to Oracle Fusion applications allows Incorta to tap into any data object in Oracle Cloud ERP. The connector performs data discovery on Oracle Cloud ERP View Objects (VOs), reads the metadata and data available in both standard Oracle VOs and custom VOs, and loads the data into Incorta. The connector adheres to Oracle best practices for exporting Oracle Fusion data in bulk. The connectivity happens through the Oracle Fusion Business Intelligence Cloud Connector (BICC). There is no need to develop the BICC data exports from scratch, as the Incorta Blueprint for Oracle Cloud ERP already includes pre-defined BICC offerings for various ERP functional areas (such as AP, AR, GL, Fixed Assets, etc.). These offerings are available to import into BICC, with the option of updating them with custom View Objects. Managing the data load from Oracle Cloud ERP into Incorta takes place from the Incorta web UI and therefore requires minimal setup on the Oracle Fusion side.
We can schedule multiple pre-configured offerings from the Incorta blueprint, depending on which modules are of interest to enable in Incorta for Oracle Cloud ERP reporting. This matrix provides a list of BICC offerings that get scheduled to support different functional areas of interest.
In addition, pre-built dashboards include reporting on common business functions such as: Procure to Pay, Order to Cash, Bookings, Billings and Backlog.
If you are familiar with data warehouses and BI solutions, you are probably aware that the performance of a reporting solution is key to its success. And performance here includes both the data layer, whereby data refreshes happen in a timely manner, as well as front-end reporting response times. If the business is unable to get the information required to drive decisions in a timely manner, the reporting platform would have failed its purpose. Therefore, laying a solid foundation for an enterprise-wide reporting solution must have performance and scalability as a key criterion.
What I like about Incorta is that it is not only a data visualization or reporting platform; it is also a scalable data storage and optimized data querying engine. With Incorta we don’t need to set up a third-party database (data warehouse) to store the data. Incorta handles the storage and retrieval of data using data maps that offer very quick response times. Previously, with a data warehouse, when a table (like GL journals or sales invoices, for example) started growing above a few million rows, you would need to consider performance optimization through several techniques like archiving, partitioning, indexing, and even adding several layers of aggregation to enhance reporting performance. All these activities are time consuming and hinder productivity and innovation. These traditional concepts for performance optimization are no longer needed, as Incorta is able to easily handle hundreds of millions and even billions of rows without the need for additional levels of aggregate tables.
It is often the case that analytics encompasses information from multiple applications, not just Oracle Cloud ERP. A couple things to consider in this regard:
If you’re on your journey to Oracle Cloud ERP and wondering what to do with your legacy data warehouse and reporting platforms, I encourage you to reach out for a consultation on this. The Perficient BI team is highly experienced with ERP projects and has helped many customers with their upgrades and analytics initiatives leveraging a diverse set of technology vendors and platforms.
One of the easiest and lowest-maintenance approaches to feed Fusion Cloud Apps data into a data warehouse is with Oracle Analytics Cloud (OAC) Fusion Business Intelligence Cloud Connector (BICC) Data Replication. Data Replication for Fusion Apps is a native feature of OAC Enterprise edition. If you’re migrating your on-premises application, such as E-Business Suite, to Oracle SaaS, you probably already realize that the capability to directly connect to the Oracle SaaS transaction database for data extraction isn’t generally available, except for limited use with BI Publisher type reporting. However, Fusion BICC offers a robust approach to enable data extraction from Fusion Apps. BICC extracts data from Fusion App view objects into files stored on Oracle Cloud (OCI). OAC’s data replication from BICC facilitates configuring, scheduling, and monitoring the whole process of extracting data from BICC into Cloud storage and then importing that same data into a data warehouse. While this doesn’t really offer ETL-like functionality, it does streamline the end-to-end process of extracting data from Fusion apps into relational table structures in an Oracle database. These target relational tables can then either be reported on directly or transformed for more complex analytics.
Here are, in my opinion, the key advantages of using OAC Data Replication from Fusion Apps:
While OAC Data Replication for Fusion Apps does offer some great functionality, there are restrictions that may render it unsuitable, depending on how you envision the holistic view of your future state data warehouse. Here are some of the reasons why it may not be adequate:
Thrilling our clients with innovation and impact – it’s not just rhetoric. This belief is instrumental for our clients’ success. In 2018 we introduced our Chief Strategists, who provide vision and leadership to help our clients remain competitive. Get to know each of our strategists as they share their unique insights on their areas of expertise.
Big data has significantly impacted today’s leading enterprises “as it helps detect patterns, consumer trends, and enhance decision making.” In fact, the big data and analytics market is estimated to reach $49 billion this year with a CAGR of 11 percent.
However, big data is often too broad and complex for analysis by traditional processing application software. For businesses to get the most out of their data, they must deploy a strategy that transforms their data management and analytics practices.
We recently spoke with Bill Busch, Big Data Chief Strategist, to learn more about creating value with big data and developing a data strategy to achieve meaningful results.
Bill Busch: Since joining Perficient, I have acted as an evangelist for big data, machine analytics, resource management, and the business development of those functions. My new role as Chief Strategist is a continuation of those efforts. This role allows me to be a resource for our clients and help them gain value from strategically using their data.
BB: I want to help clients understand how to best use their own data with AI, machine learning, analytics, and cloud to inform their business decisions. To get there, business leaders must transform data collection platforms in a way that addresses the typical mistakes or challenges experienced.
Many of our implementations involve several areas of expertise, which provides an opportunity to collaborate with other Chief Strategists, whether they’re focused in a particular industry or a specific technology.
Collectively, we help clients change processes and train teams to approach their duties differently. By addressing technology, processes, and people, they can take full advantage of [data collection] platforms’ speed and still maintain the quality and visibility of data.
“The breadth and depth of the Chief Strategists’ expertise enables the creation of comprehensive solutions that resolve our clients’ most critical issues.”
A recent example of our collaboration involved developing a unified data view using APIs. Typically, application data and API management were separated when companies sought to integrate data. Now, these two ecosystems are merging into one. This presents a challenge to create a single solution that enables application integration and the consolidation of different data sources. To conquer this challenge, we partnered with Erich Roch, Chief Strategist for IT Modernization and Integration, to create comprehensive solutions for our clients.
BB: Big data relies on a series of systems or platforms to gain meaningful insights, such as cloud data warehousing or cloud data lakes. Developing a strategy [around people and processes working with data] is what ultimately helps businesses align decision making and gain value from their data.
BB: Data is the heartbeat of any modern organization. Having information readily available to make decisions doesn’t happen quickly or seamlessly. The most productive strategy to achieve speed-to-market value requires funneling your information into a database to generate high-level insights.
Data strategy also doesn’t have to end with the executive level. Businesses enable other departments for success when they make data accessible for broader analytics by following best practices, establishing predefined processes, and creating a strategy to integrate the data.
BB: If you don’t have a data strategy in place, then you’re operating reactively and setting up your [business] for displacement. It’s easy for businesses to rely on low-quality data that is readily available to influence their decisions. However, the issue with that approach is the lack of depth of that information. Your enterprise may have the best AI in the world, but those AI models aren’t meaningful if you lack robust data for the technology to analyze.
To shift from a reactive to proactive stance, you should invest in a strategy and a data management infrastructure to manage the information. Investing in both yields deeper insights than surface-level information, provides greater context with existing knowledge, and identifies emerging trends to reveal unknown information.
When helping clients to create a strategy, we identify a scalable, analytical use case that could benefit from a data warehouse. Then, we help our clients establish a process with dedicated people and resources to assess existing data and develop ideas about the data they want to collect. From there, we create a road map to bridge the gap, so our clients are well-positioned to make strategic decisions relatively quickly.
BB: The open-source approach to data processing is new. Companies previously used on-premise data lakes for their processing, and then the trend shifted to cloud-driven data lakes. Now, we’re migrating data away from mainframes and high-dollar, appliance-based platforms to open source, cloud-based platforms.
For example, we’re currently working with a major healthcare payer to migrate its costly claims processing from a data lake mainframe to an open source, cloud-based platform. This project will enable our client to do more with its data – at a faster pace – while reducing costs. I think many companies realize the possibilities and are considering this model and similar solutions. It’s a sign of the times and underscores the need for digital transformation.
However, if your organization isn’t quite ready for data processing in the cloud, data lakes are a great intermediate step, allowing you to consolidate your data for the analytical use case. Using a cost-effective enterprise data warehouse (EDW), you can take advantage of the data lake and use the data for operational processing.
While I’m not saying that we’re moving away from data lakes, I think we’re seeing a need to build out the reliance of data lakes and enable more than one use case. By doing so, we’re architecturally enabling multiple areas of a business to take advantage of data that suits its goals or objectives.
BB: At the start of any client engagement, we identify the business drivers for a data strategy. Whether the use case is analytical or scientific, the supporting data architecture and data strategy must have some perspective to the [business] need. We also ensure the business need isn’t too tactical so future iterations aren’t limited.
Organizational culture is another major consideration for developing a big data strategy. Enabling self-service within the finance industry is a prime example. Industry regulations impact the level of self-service capabilities that a firm can offer. These limitations can greatly influence a financial organization’s culture, and its reception of a data-focused strategy.
Contrast this scenario to the life sciences industry that’s comprised of people who are hard-wired to interact with data in a more analytical way. Regardless of the industry or the culture, incorporating an organizational change management component helps ease the transition.
Using these insights, I try to establish a vision with clients. Many issues that businesses generally encounter when creating a data strategy and data lake platform isn’t related to the technology. Instead, it’s overcoming the mindset of the people involved. If you start with a universal outlook, develop agile concepts, and deploy lean processes, a data strategy can inform much analysis upfront. Then, the strategy establishes a data pipeline to begin moving information from the source to its destination in a relatively short timeframe.
Learn more about each of our Chief Strategists by following this series.
It’s no secret that data is a massive asset when it comes to making better business decisions. But gaining the valuable insights required to make those decisions requires quality data that you can trust. And to accomplish this you need a data strategy.
Without understanding your business objectives, identifying use cases, knowing how your users access data, and much more, you put yourself in the position of making decisions based on incomplete or incorrect insights.
Next month, leaders in the data industry will meet in New York City for the Strata Data Conference September 23-26 to share insights on how to implement a strong data strategy (as well as current hot topics like AI and machine learning, which need a strong data strategy foundation to build on).
Here are four sessions to attend to learn more about the elements of a quality data strategy.
Foundations for successful data projects
1:30pm-5:00pm, Sep 24 / 1E 10
The enterprise data management space has changed dramatically in recent years, and this has led to new challenges for organizations in creating successful data practices. Presenters, Ted Malaska and Jonathan Seidman, detail guidelines and best practices from planning to implementation based on years of experience working with companies to deliver successful data projects.
Running multidisciplinary big data workloads in the cloud
9:00am-12:30pm, Sep 24 / 1E 14
Moving to the cloud poses challenges ranging from re-architecting to data context consistency across workloads that span multiple clusters. Presenters Jason Wang, Tony Wu, and Vinithra Varadharajan explore cloud architecture and its challenges, as well as using Cloudera Altus to build data warehousing and data engineering clusters and run workloads that share metadata between them using Cloudera SDX.
It’s not you; it’s your database: How to unlock the full potential of your operational data (sponsored by MemSQL)
10:20am-10:25am, Sep 25 / 3E
Data is now the world’s most valuable resource, with winners and losers decided every day by how well we collect, analyze, and act on data. However, most companies struggle to unlock the full value of their data, using outdated, outmoded data infrastructure. Presenter Nikita Shamgunov examines how businesses use data, the new demands on data infrastructure, and what you should expect from your tools.
The ugly truth about making analytics actionable (sponsored by SAS)
1:15pm-1:55pm, Sep 25 / 1A 01/02
Companies today are working to adopt data-driven mind-sets, strategies, and cultures. Yet the ugly truth is many still struggle to make analytics actionable. Presenter Diana Shaw outlines a simple, powerful, and automated solution to operationalize all types of analytics at scale. You’ll learn how to put analytics into action while providing model governance and data scalability to drive real results.
If you’re attending the Strata Data Conference don’t forget to come visit us! Perficient is proud to be a Premier Exhibitor of the event and we’ll be at booth #1338 in the expo hall. Our experts will be onsite to strategize and showcase our expertise in complex data environments, AI, machine learning, and data strategy.
You can also connect with our team to set up a meeting, even if you’re not attending the conference. We look forward to seeing you.
The blog “5 Ways Your Data Warehouse Is Better In the Cloud” outlined the advantages of running your data warehouse in the cloud. In this blog, we look at the high-level considerations for migrating your data warehouse to the cloud.
Many companies facing expensive on-premises product renewals are aggressively migrating their data to the cloud. In this case, the cost of migrating to the cloud can be offset by the cost avoidance of the upgrade and migration. It is important to recognize product end-of-life cycles and plan a migration to the cloud accordingly. Insufficient lead time for a migration due to end-of-life dates would likely result in a less than optimum lift-and-shift migration.
Building a business case requires a TCO calculation for each data warehouse option considered. Costs should be gathered for data center/compute expenses, networking, data transfer, storage, administration, software license/subscription fees, hardware and software maintenance, depreciation, and support. Fault tolerance and disaster recovery are inherent in the cloud and expensive to achieve with your own data centers, so you should include these costs in the TCO model as well.
Intangible benefits typically associated with the cloud like agility and elasticity are more difficult to quantify. However, you should include these as benefits within the business case.
A recent Perficient client TCO study found the cost of the enterprise data warehouse (EDW) in the cloud to be significantly less than on-premises due to the following factors:
There are many options to run your EDW in the cloud. Businesses require a strong knowledge of cloud architectures, data lakes, elasticity and storage options, cross-region redundancy, and cloud subscription models to make technology choices. The leading public cloud providers have many data-related services and continue to innovate rapidly. There are also many third party options to choose from.
Your data-related technology selections and the roadmap to implement them must also align to the business strategy and priorities, your overall cloud strategy, and the direction you want to take your BI and AI in the cloud. The ability to support or migrate your existing data movements, ETLs, and security model must also be considered.
Using a partner with cloud platform and data migration experience is often something worth considering. This is especially the case for technology selection and migration planning.
As previously stated, an EDW product end-of-life upgrade spend could force you to a lift-and-shift migration scenario. If this is the case, it should be considered a tactical stop in the overall data warehouse roadmap. A lift-and-shift migration will not produce an agile EDW or optimize the elastic cloud environment.
Another option is to create a new cloud EDW for a specific set of use cases. Businesses should select these use cases carefully so that they leverage the cloud with all the associated benefits. For example, businesses could build a new EDW in the cloud for a specific business unit or to support a specific data science use case. This more incremental approach would build internal cloud skills and prove the value of data in the cloud to the business and thus build momentum to continue the migration. This approach could provide analysts and data scientists new capabilities such as streaming data, AI/ML, and sandboxes for self-service BI.
The EDW is an attractive cloud use case. Businesses often already have their data in the cloud thanks to their software-as-a-service-based applications or already-migrated applications. For those that don’t have data in the cloud yet, their data will likely be there soon. As a first step, migrating the EDW and BI is lower risk than moving your transactional systems. On top of that, there are also many financial and technical benefits to moving the EDW to the cloud.
Perficient has experience with EDW-to-cloud migrations and partnerships with all leading vendors in the space should you need any help. Our cloud EDW, business intelligence, performance management, and predictive and risk analytics solutions incorporate industry expertise, years of experience, and insights from hundreds of successful implementations.
According to RightScale’s eighth annual survey, “Cloud Computing Trends: 2019 State of the Cloud Survey”, “A significant number of public cloud users are now leveraging services beyond just the basic compute, storage, and network services.” The survey says relational database services are the most popular extended cloud service and “data warehouse moved up significantly to the third position.” It’s no surprise that as data volume, velocity, and types have exploded, companies are looking for more agile and cost-effective solutions for their data management and analytics strategies in the cloud.
The following topics outline the advantages of data storage and the enterprise data warehouse (EDW) in the cloud.
The TCO of a cloud-based EDW is lower than that of on-premises when variables such as redundancy and disaster recovery are included. There are the added benefits of avoiding a large initial capital expense, the ability to try it before you buy it (reducing financial risk), and the ability to process large, one-off analytical workloads with elastic compute and storage and only pay for what is consumed.
With elastic compute and storage you never have to worry about a lack of hardware impacting performance. You do, however, need to pick the right data management tools for the job to ensure requirements are met. There is an expansive set of data services available in the cloud, and performance is straightforward to test there, paying only for the resources you use during the tests.
You can quickly deploy new databases and services without the concern of expensive capital investments or hardware lead times. You can also rapidly try new and innovative tools and approaches in the cloud. For example, you could experiment with streaming data, AI and ML. You could set up sandboxes for user groups and self-service BI. And, in the cloud you have endless capacity for applications like storing sensor data for IoT.
Elastic compute, on-demand provisioning of infrastructure, and global connectivity make tasks like deploying new databases, federating data, and setting up replication and redundancy for disaster recovery easier than on premises. Also, innovation in the cloud is happening more rapidly than in on-premises data centers, allowing you to experiment with the latest technologies like serverless functions and AI.
It is easier to experiment in the cloud and handle high volume and high velocity data. You can keep data longer and store more diverse data types and sources. Combining data availability and the modern data services available on the cloud is a great way to add new data capabilities.
There were early concerns about cloud security. However, recent surveys show that many believe the cloud to be more secure than legacy systems. The cloud has strong perimeter security, controlled access, and cyber security experts monitoring and auditing security. Moving your data to the cloud is a good time to look again at cloud security policies and architecture.
Data and BI is a great way to get started on the cloud. EDW and BI is lower risk than moving your transactional systems. And there are many financial and technical benefits as outlined above.
Perficient has experience with EDW-to-cloud migrations and partnerships with all the leading vendors in the space, should you need any help. Our Cloud EDW, business intelligence, performance management, and predictive and risk analytics solutions incorporate industry expertise, years of experience, and insights from hundreds of successful implementations.
In 2016, when I did my first in-depth comparison, the resulting TCOs were usually very close: the OpEx was typically slightly higher for the cloud option, while the on-prem option required substantial capital investment.
However, our most recent estimate was eye-opening to our client. We were assessing a green-field implementation for a Data Warehouse at a mid-sized company. Part of our assessment was to compare TCO between the different deployment options, on-prem and cloud. We fully loaded all expenses for both options, including data center expenses, networking, data transfer, storage, administrative, software subscription fees, hardware and software maintenance, depreciation, and support.
The results were staggering. The cloud deployment TCO was over 30% less than the comparable on-prem deployment. Further, the on-prem deployment required a significant capital investment which was not required for the cloud deployment. It should be noted that in the cloud TCO we greatly over-estimated data transfer, processing, and storage costs.
Inspecting the TCO, there were three cloud features that greatly swung the results:
In the past, the cloud vs on-prem decision came down to a conversation around the speed of deployment, flexibility, elasticity – that is the normal cloud advantages. Now with the movement toward PaaS and serverless options that charge based only on resources used, the cloud has become the lowest TCO option in most cases.
The question that is often asked is: Can we leverage the same security we already have in Oracle Fusion SaaS (which includes users, duties, roles, and security policies) to secure data in Oracle Analytics Cloud (OAC, an Oracle PaaS)? The answer is yes. To understand how this is possible, keep reading. This blog follows my previous two blog posts about Integrating Oracle SaaS Data into OAC and Creating OAC Data Replication from Oracle SaaS. While the prior posts describe how to load SaaS data into OAC, this blog focuses on how to make OAC inherit Oracle Fusion SaaS security, and therefore avoid the hassle of manually maintaining security setups in multiple places.
Before delving into the details, it is important to differentiate between securing Oracle SaaS data that is flowing over to OAC directly through a Data Set Connection vs the Oracle SaaS data that is replicated into an OAC Data Warehouse, through any of the data copying techniques (Data Sync, Data Replication, Data Flows, or other ETL means).
1. OAC Data Set Connection against Oracle SaaS: This approach leverages the OAC Oracle Application Connection Adapter. It allows authenticating with either a shared admin user or with an end-user login. Choosing to make end-users log in with their own Oracle Fusion App credentials automatically enforces their Fusion App security roles and data policies on any reporting that they do against the Fusion App. Therefore, with a Data Set Connection, no additional configuration is necessary to inherit Fusion App security into OAC, since it all kicks in once an end-user logs in with their Fusion credentials.
2. OAC Data Warehouse Connection: This approach queries a replica of the Fusion App data that has been copied over to a data warehouse. Accordingly, the replicated data requires that object-level and data-level security controls be defined in OAC. Luckily, while this requires a one-time manual configuration, it relies on whatever security role assignments and data policies are set up in the source Fusion App.
The rest of this blog post elaborates on the second type of connection, and how to make OAC inherit Fusion App security against a data warehouse.
I am going to start my explanation by describing how authentication works and then move on to discuss how to setup authorization for both object security as well as data security.
Authentication:
Authorization: There are 2 different levels of authorizations that need to be configured: Object Level and Data Level Security.
To conclude, integrating Oracle Fusion SaaS Security into OAC is an essential part of a successful Oracle Analytics implementation. Performing a comprehensive security integration with SaaS that covers the various layers including users, objects and data is crucial. The success of the implementation is determined by how secure corporate data is and how feasible it is to avoid the maintenance overhead that would have been necessary without a well-planned and integrated security solution for Oracle SaaS and PaaS.
Healthcare IT is ever-changing and Perficient is on the forefront of this change, guiding the industry and those we serve toward a brighter future. We partner with healthcare companies to help people live their lives to their fullest potential today, using best practices and cost saving technologies and processes.
As we look to the future of Healthcare Information Systems, the effectiveness of an organization is measured across four areas; at the heart of who we are and what we do is the integration, accuracy, consistency, and timeliness of health information.
Healthcare organizations are among the most complex forms of human organization ever attempted to be managed, making transformation a daunting task. Despite the challenges associated with change, organizations need to evolve into a data-driven outcomes improvement organization.
They aggregate tremendous amounts of data – they need to figure out how to use it to drive innovation, boost the quality of care outcomes, and cut costs.
Besides members and providers, as well as internal/external business partners and vendors, there are a multitude of state and federal regulatory/compliance agencies that insist on having our information in a near real-time manner in order to perform their own functions and services. These integration requirements are constantly changing.
As an EDI Integration Specialist, I have seen many organizations struggle to constantly keep up with the business needs of their trading partners, state and federal agencies. Often, as our trading partners analyze the information we have sent them, they discover missing data or inconsistencies.
This requires a tedious and painful iterative remediation process to get the missing data, and results in resending massive amounts of historical data or correcting/retro-adjudicating claims. Adjusting and recouping claim payments is always painful for all entities involved, especially providers, with possible penalties or sanctions.
In the last few years, I have worked with several clients on getting their claims information loaded into their state’s All Payer Claims Databases (APCDB) and CMS to get their health claims reimbursed. We struggled to get the complete data set loaded successfully, and to meet the rigorous quality assurance standards.
It required several attempts working with their legacy systems to get the necessary data into the correct format. It required a great deal of coordination, testing and validation. Each state has a different submission format and data requirements, not necessarily an 837 EDI format, including one state that had a 220+ field delimited record format (Rhode Island).
We spent a great amount of time in compliance validation, and each submission required a manual effort. We constantly had to monitor each submission’s file acceptance status, handling original and adjusted claims differently using the previously accepted claim ID. If files were not submitted accurately and in a timely manner, significant fines were imposed.
Several times we discovered that even though the files were successfully accepted, there was still missing information which needed to be resubmitted. To be honest, it was a logistical nightmare.
As we design and develop data integrations, APIs and extracts, we often ‘shortcut’ to deliver data due to competing priorities, quickened project delivery schedules or limited development/testing staff. This leads to not giving our full attention to the complete requirements of the client/trading partners.
Companion guides and documentation are vague and say ‘send if known’, and several years later these ‘shortcuts’ are found out, possibly leading to penalties and corrective action plans. Sometimes legacy system and technical limitations mean not having the complete record set that is required.
Limitations of electronic health record (EHR) systems, combined with variable levels of expertise in outcomes improvement, impede the health system’s ability to transform.
In many healthcare organizations, information technology (IT) teams—including data architects and data analysts—and quality and clinical teams work in silos. IT provides the technologies, designs and delivers reports, without a clear understanding of the needs of the quality and clinical teams.
This can sometimes turn into a finger pointing exercise. Quality and clinical teams claim IT is not delivering the data they need to succeed, while IT insists that others are not clearly articulating what they need. It takes clear-eyed analysis to see that the teams are failing to work together to prioritize their outcomes improvement initiatives and drive sustainable outcomes.
At Perficient, we can provide a comprehensive picture of your organization’s information needs and provide you with a path to implementing complex system redesigns and simplifying integrations. Putting health care redesign into action can be done in the following four general phases:
1. Getting started. The most important part of building a skyscraper is looking at the requirements, developing a blueprint, and building a robust foundation. The first phase involves devising a strategic plan and assembling a leadership team to focus on quality improvement efforts. The team should include senior leaders, clinical champions (clinicians who promote the redesign), and administrative leaders. We need to develop a long-term strategy that sunsets legacy systems, consolidates business functions, builds synergies between departments, and aggregates data into a central repository. High-level needs assessments are performed, scope is defined to limit effort, and a change management process is created to assist in project management. A business governance committee determines which business decisions are implemented and when. A technical/architectural review committee approves the overall design and data governance of systems, interfaces, and integrations of enterprise systems.
2. Review the complete electronic dataset. That includes building a corporate data dictionary (including pricing/benefits, membership, providers, claims, utilization, brokers, authorizations/referrals, reference data and code sets, etc.) and setting priorities for improvement. The second phase involves gathering data to help inform the priorities for improvement. Once data requirements are gathered, performance measures such as NCQA/HEDIS that represent the major clinical, business, satisfaction, and operations goals for the practice can be identified. Corporate reporting and process needs are critical at this phase to ensure compliance and to meet internal and external customers’ requirements. The creation of dashboards and user reports that are easy to manage and provide the right information at the right time can make all the difference in achieving cost savings and effective management throughout the organization. Using these dashboards allows users to keep an eye on the overall health and utilization of the services that they provide to their members.
One of the most helpful EDI integration practices I have found is to perform a source-to-target gap analysis between the core claims/membership systems, my inbound/outbound EDI staging database, and the EDIFEC/GENTRAN mapping logic which translates the data to and from the outbound and inbound x12 EDI 837 Claims and 834 Membership enrollment files. This document also identifies any transformations, conversions, or lookups that are needed from proprietary values to HIPAA standard values. By looking at every EDI Loop/Segment/Element and mapping it all the way through, I was able to identify data fields that were not being sent or were being sent incorrectly. I give this mapping document, which I customize for specific trading partners while reviewing the vendor’s companion guides, to my EDI developers as part of my technical specification documents.
3. Redesign care and business systems. The third phase involves organizing the care team around their roles, responsibilities, and workflows. The care team offers ideas for improvement and evaluates the effects of changes made. Determining how an enterprise integrates and uses often disparate systems is critical to achieving a timely, complete, and accurate data/process flow. The design, creation, and use of APIs and messaging technologies to get information extracted, transformed, and loaded (ETL) is critical, especially if information is to be used in real-time, web-based portals. Evaluation of easy-to-use yet robust batch ETL tools, such as Informatica, becomes the cornerstone of any data integration project. Healthcare organizations rely upon reporting tools to evaluate, investigate, and reconcile information, especially with their financial and clinical systems. Imaging, workflow management, and correspondence generation systems are used to create and manage the communications.
4. Continuously improve performance and maintain changes. The fourth phase includes ongoing review of clinical and financial integration outcomes and making adjustments for continued improvement. As we look to the future, we need to look at the IT architecture and its ability to expand with the ever-changing technology and needed capability models. Perficient is a preferred partner with IBM, Oracle, and Microsoft, with extensive experience in digital and cloud-based implementations. Using these technologies gives our clients the ability to expand their systems: application servers can be spun up on demand based on need and growth, failover and redundancy are supported, distributed and global databases can be employed, and software virtualization and upgrades can be performed transparently to the end users.
Perficient’s health information technology (IT) initiative for the integration of health IT and care management includes a variety of electronic methods that are used to manage information about people’s health and health care, for both individual patients and groups of patients. The use of health IT can improve the quality of care, even as it makes health care more cost-effective.
Implementing an enterprise data warehouse (EDW) or a data lake/analytic platform (DLAP) results in the standardization of terminology and measures across the organization and provides the ability to easily visualize performance. These critical steps allow for the collection and analysis of information organization-wide.
The EDW/DLAP aggregates data from a wide variety of sources, including clinical, financial, supply chain, patient satisfaction, and other operational data sources (ODS) and data marts.
It provides broad access to data across the organization, including to the CEO and other operational leaders, department heads, clinicians, and front-line leaders. When faced with a problem or question that requires information, clinicians and leaders don’t have to request a report and wait days or weeks for data analysts to build it.
The analytics platform provides clinicians and leaders the ability to visualize data in near-real time, and to explore the problem and population of interest. This direct access increases the speed and scale with which we achieve improvement. Obtaining data required to understand current performance no longer takes weeks or even months.
Application simplification takes away the confusion about the consistency and accuracy of data within an organization. Per Member/Per Month (PMPM) reporting is delivered in a standard format throughout, regardless of line of business.
The analytics platform delivers performance data used to inform organizational and clinician decision-making, evaluate the effectiveness of performance improvement initiatives, and increasingly, predict which patients are at greatest risk for an adverse outcome, enabling clinicians to mobilize resources around the patient to prevent this occurrence.
An analytics platform is incredibly powerful and provides employees and customers with the ability to easily visualize its performance, setting the stage for data-driven outcomes improvement. However, healthcare providers and payers know that tools and technology alone don’t lead to improvement.
To be effective, clinicians, IT, and Quality Assurance have to partner together to identify best practices and design systems to adopt them by building the practices into everyday workflows. Picking the right reporting and analytical tool and platform is critical to the success of the integration project.
Big data tools such as Hadoop/HIVE/HUE and cloud technologies are used to bring various data sources together into a unified platform for the end user.
Perficient provides a full-service IT roadmap to transform your healthcare organization and achieve increased personalization of care via the same path: digital transformation in healthcare. New health system technology, such as moving beyond basic EMR (Electronic Medical Record) infrastructure to full patient-focused CRM (Customer Relationship Management) solutions, has enabled healthcare organizations to integrate extended care teams, enhance patient satisfaction, and improve the efficiency of care.
We connect human insight with digital capabilities in order to transform the consumer experience and deliver significant business value.
For more information on how Perficient can help you with your Healthcare IT integration and analytical needs, please see https://www.perficient.com/industries/healthcare/strategy-and-advisory-service