In a few days Oracle will release Data Sync, a utility that facilitates loading data from various on-premise sources into BI Cloud Service (BICS). Data Sync comes in addition to several other already available means of loading data into BICS: the data load wizard for uploading files, the SQL Developer cart deployment feature, and the RESTful APIs. What is special about Data Sync is that it makes data loading more manageable in terms of capability and scalability (full loads vs. incremental loads of many tables), as well as scheduling, notification and automation. This brings data loading much closer to conventional ETL methods. If you have ever worked with DAC to run execution plans of Informatica mappings, you will find that Data Sync follows a similar methodology of defining connections, tasks and jobs. However, instead of referencing Informatica mappings, as DAC does, Data Sync itself handles the mapping of source columns to target columns. And since it supports SQL queries, your opportunities for data transformations are endless. In this blog I present the key features of Data Sync.
The whole Data Sync software sits in one place; there are no separate server and client components. You basically extract the zipped file you download from Oracle onto a machine on your network that can connect to the different data sources you wish to pull from. Once you unzip the file, you configure the location of your Java home and that's it! You are ready to launch Data Sync and start moving data.
There are 3 main steps to follow when designing data loads in Data Sync, with 3 corresponding tabs to configure below the menu: Connections, Projects and Jobs.
Connections: Under Connections you define connections to both on-premise databases and Oracle BI Cloud databases. By default there is a connection called TARGET, which you map to one of your BICS instances. (Note that when configuring the BICS connection, use the BI URL and not the cloud database URL.) Though the default name of the out-of-the-box connection is TARGET, when defining these connections we are not really specifying which connection serves as a source and which as a target. All we do here is set up the connections, test that they work, and have them ready to be used by projects and jobs.
Project: A project defines data mappings and transformations between sources and target tables. A source may be a data file, a database table or a SQL Query that joins multiple tables. If you need to do transformations on the data you can use the “Data From SQL” option and embed all the transformation logic in a query. The “Target Table” here doesn’t mean it’s the target in any one particular database. It’s just a placeholder structure of where the data will end up and in what columns. (As you will see on the Jobs tab, the target table can be assigned to any connection that you would like to use as a target database.)
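For instance, the "Data From SQL" option can hold a query like the sketch below. The table and column names (customers, orders, and so on) are hypothetical; the point is simply that joins and transformation logic can be embedded directly in the source definition:

```sql
-- Hypothetical "Data From SQL" source: joins two on-premise tables
-- and derives an uppercase full-name column before the load.
SELECT c.cust_id,
       UPPER(c.first_name || ' ' || c.last_name) AS cust_name,
       o.order_total,
       o.order_date
FROM   customers c
JOIN   orders o
ON     o.cust_id = c.cust_id
WHERE  o.status = 'CLOSED'
```

Data Sync then treats the query's result set like any other source table when mapping columns to the target.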
One important feature you configure in Projects is the Load Strategy. This is where you define whether the data load for a particular target table is always a full load or incremental. As you see in the dialog, you get several options for how the load happens.
If you choose "Update table", you get more things to set up when you click OK. First you select the column(s) that make up the user key of the target table. In my example, CUST_ID is the user key, which means the update strategy matches new data on CUST_ID to determine whether the new records need to be inserted or updated in the existing target table. On the Filters tab of the same dialog box, you specify which date column from the source table/SQL query you would like to use for detecting incremental records (records that have been newly added or updated in the source since the last time the data load ran from that source).
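Conceptually, this update strategy behaves like an Oracle MERGE keyed on the user key and filtered on the incremental date column. The sketch below only illustrates the semantics, not what Data Sync literally executes; apart from CUST_ID, the table and column names are hypothetical:

```sql
-- Illustration of the "Update table" strategy: match on the user key
-- (CUST_ID) and pick up only rows changed since the last refresh date.
MERGE INTO target_customers t
USING (
    SELECT cust_id, cust_name, last_upd
    FROM   source_customers
    WHERE  last_upd > :last_refresh_date   -- Filters tab: incremental date column
) s
ON (t.cust_id = s.cust_id)                 -- user key match
WHEN MATCHED THEN
    UPDATE SET t.cust_name = s.cust_name,
               t.last_upd  = s.last_upd
WHEN NOT MATCHED THEN
    INSERT (cust_id, cust_name, last_upd)
    VALUES (s.cust_id, s.cust_name, s.last_upd);
```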
Jobs: A job is where you relate projects to source and target connections. By default you get one job that loads data into the default connection called TARGET. However, you can create a new job and override the default source and target connections with other connections you have already defined under the Connections tab.
For example, in the New Job window below, I kept the data sources the same, but the TARGET connection (which I configured to be my BICS Test instance) is overridden with a connection to my BICS Production instance. So when the new job runs, data is still sourced from the same databases, but instead of being loaded into BICS Test it gets loaded into BICS Prod. You may also choose to move data from on-premise databases to other on-premise databases using the same approach. One thing I tried that doesn't currently seem to work in Data Sync is sourcing data from a BICS database to load into an on-premise database. (If you are looking to copy data from a BICS cloud schema database to an on-premise database, I explain how to do that in an earlier blog: Oracle BI Cloud Service: How to migrate from Test to Production?)
Besides defining jobs, we can also schedule jobs and monitor their execution under the Jobs tab. As shown in the screenshot below, the latest job that ran contained one task that successfully loaded 2 rows into the BICS Prod target database.
One last thing: suppose you've been running data loads incrementally but then decide you need to refresh all the data in the cloud and get a fresh start. This requires a full load run. To do that, we don't need to redefine the Load Strategy, which is already set to work incrementally in the project mappings. All we need to do is go to Tools and select "Re-extract and load all data". This clears all the Refresh Dates on the tables. In other words, we are telling Data Sync to forget when data was last extracted/loaded for all the tables and just run everything as if it were the very first run.
Alternatively, if you choose to do a full load on selected tables only, as opposed to all tables, you can navigate to these tables in your source and target connections (see the Refresh Dates tab below) and clear the corresponding Refresh Date column. The next time you run the job, the tables with a cleared Refresh Date will run in full load mode, while the other tables continue to run incrementally as usual.
Hello!
Congratulations on the article, it helped me a lot.
Thank you.
Regards,
Carina.
Hi, I have a problem with an MS SQL connection. I'm sure I have all the fields filled in correctly, but Data Sync sends me this error: "Failure connecting to "NAME"!
Login failed for user 'InteligenciaNegocio'. ClientConnectionId:f34527f9-d2be-4cbd-9023-2c8ac431e459",
Any idea why this happens? Thanks.
By the way, I tried to configure Data Sync with an x86-architecture JDK and it didn't run; you have to do it with an x64 JDK. Bye.
Daniel,
For connecting to SQL Server I suggest downloading the appropriate JDBC driver .jar file into the Data Sync lib folder. You should be able to download it from: https://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=11774
I'm OK now, it was the port setting in the SQL Server configuration. 🙂
What version of Data Sync did you use?
I looked at V1.1 and V1.2, but I don't see the "user keys" or the "filters" tab in the "incremental settings" popup window.
All I see is a list of columns on the left. I did select the "update table" load strategy.
The Filters tab won’t show up in Data Sync when selecting Update Table if the source table doesn’t contain a date column. A date column is required in the source table to allow selecting a filter for incremental selection.
Hi,
Do you know if there is any way to schedule a job on Application Express after loading data through Data Sync?
I tried DBMS_SCHEDULER but it's not recognized by the BICS database…
Thanks in advance.
Hi Cezar,
I doubt issuing a function call from Data Sync directly to the BICS DB is going to work. As a workaround, you can create a simple task in Data Sync that populates one record in a target table in BICS, and then create a trigger in the BICS database on that same table so that whenever a new record is inserted, you can do whatever logic you need, for example by calling a PL/SQL procedure within the trigger. To give you an idea, I have another post on how I used triggers to issue deletes on BICS:
https://blogs.perficient.com/2015/07/29/bi-cloud-service-how-to-automate-delete-jobs-with-datasync/
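As a rough sketch of that workaround (the table, trigger and procedure names below are all hypothetical), the control table and trigger could look like this:

```sql
-- Hypothetical control table that the Data Sync task inserts one row into.
CREATE TABLE etl_control (
    run_id   NUMBER,
    run_date DATE
);

-- Trigger that fires the post-load logic whenever Data Sync inserts a row.
CREATE OR REPLACE TRIGGER trg_after_load
AFTER INSERT ON etl_control
FOR EACH ROW
BEGIN
    -- post_load_refresh is a placeholder for your own PL/SQL procedure
    post_load_refresh(:NEW.run_id);
END;
/
```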
Regards,
Mazen
Hi,
thank you for answering me.
I found the solution at: https://blogs.oracle.com/fadevrel/entry/java_cloud_service_scheduling_cloud
It's the same as DBMS_SCHEDULER… In fact I'm using it to perform ETL in the BICS database, since I'm not allowed to perform ETL at the Data Sync source. It's working really well.
Hi Mazen,
I have an on-premise Oracle Database and BICS with Schema Service… When I try to connect I am having issues. Just wanted to know: does the Data Sync tool support a Schema Service, or do we need to have DBaaS?
regards,
SS
You should be able to connect from Data Sync to the BICS schema database as a target. You won't be able to read data from the cloud schema database, but you can definitely load data into it using Data Sync. Make sure the URL you are using is the BICS URL, not the database URL.
For example: https://[bics service name]-[domain name].analytics.[data center for example us2].oraclecloud.com
Other than the URL, make sure the Connection Type is Oracle (BICS), and the username you log in with needs to have at least the Data Loader access. An administrator login works too.
Regards,
Mazen
Hi Mazen,
did you have any issues with date columns when loading from Data Sync to BICS?
All dates are loading in a different timezone from source to BICS…
Any idea why is it happening?
I noticed the same behavior. Typically, I have seen client-server applications store dates in the timezone of the location where the database resides. So if a web application stores its data in the UK but has users in the US, all dates entered in the web application by US users get converted to the UK timezone as they are stored in the database. Upon retrieval, the application converts the dates from the database's UK timezone to the timezone of wherever the user is located (based on their profile setup). So, for example, if the same data entered by a US user is retrieved by a user located in Australia, the dates show up in the web application in the Australian timezone.
I haven’t tested all this with BICS but just to give you an idea of what may be happening.
Regards,
Mazen
How do I run multiple jobs in a single transaction using Oracle Data Sync?
Regards, Sadashiv
Can you please provide a little more detail on what you are trying to do? Maybe an example can help. I am not sure I understand what you mean by jobs in a single transaction.
Regards,
Mazen
I have two connections in DataSync v2.1 one to an on premise db called “on_premise” and one to database cloud service “Dbaas”. Then I click on “projects”, “relational data”, “data from table”. I see two “records/tasks”.
First task:
connection=on-premise
source table= table_A
target table= table_A
load strategy = Add new, update existing
Second task:
connection=Dbaas
source table=table_B
target table=table_B
load strategy = add new, update existing
Three questions when I run the job:
1. In the first task, data will move from an on-premise database to the cloud DB. True?
2. In the second task, data will move from the cloud DB to the on-premise database. True?
3. For both tasks I have different connections. I understand that based on that it knows where the source table resides, but how does it know where the target table resides?
Mohammad,
To answer your questions 1 and 2, you will need to look under Jobs. Select the Job in question and in the lower part of the screen click on Ordered Tasks. Each Task has its corresponding Primary Source and Primary Target specified. So this is what tells you where the target table resides. If you want to change the target, you can create a new Job and as you do that you can switch the connections to map them to what you need them to be. A typical example is when you are first loading data into a test database and then when you finalize your work you want to switch to a Production database. In that case all you have to do is create a new Job and map the Target database to the Production Database. Hope this helps.
-Mazen
Hi,
I have two source tables. I want to join both tables and load the data into a target. For example, I have tables A and B in the source and table C in the target. Table A has columns 1, 2 and 3, table B has columns 1, 2, 3, 4 and 5, and target table C has column 1 only. I want to move column 1's data from tables A and B to table C. Is that possible in Data Sync?
Yes, you can do that by using "Data from Query" in Data Sync under Project -> Relational Data. You can basically write any SQL query in there to extract data from multiple source tables, join them or union them, and select only the columns you need to load into the target table. It's best to test the SQL query in another client tool to make sure it gives the expected result, and then copy-paste it into Data Sync.
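For the A/B/C example above, the "Data from Query" SQL might look like this sketch (table_a, table_b and col1 are stand-ins for the actual table and column names):

```sql
-- Join A and B and keep only the single column that target table C needs.
SELECT a.col1
FROM   table_a a
JOIN   table_b b
ON     b.col1 = a.col1
```

If you instead want the rows from both tables stacked together, replace the join with a UNION of two single-column SELECTs.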