In Tableau Prep, usually, there are two approaches to refresh a data flow namely full refresh and incremental refresh. Refreshing the data set means either the original data would append new rows or alter the data.
Full refresh will replace the existing data with new data.
However, Incremental Refresh uses only the additional records to update the existing input dataset which eventually reduces the flow runtime, improves database performance, and uses fewer system resources because incremental refresh would only process the data that have not previously been processed by Tableau Prep.
This type of refresh is extremely useful whenever we have data that updates daily (such as transactional data), or whose data size keeps on increasing gradually.
Steps to set up Incremental refresh in Tableau Prep:
Step 1: There is an additional option in the input tool under the settings tab to enable incremental refresh.
Step 2: As we Enable the set up incremental refresh option, this will result in a further set of options that needs to be configured to avoid further errors. They are-
Input Field: The input field should be selected such that the data field will show there is new data in the Input data source.
(Note- If there are multiple inputs in the flow, the incremental refresh needs to be configured separately for every input)
Output: In the output field we need to specify in which output it can find a field to compare the selected input to identify whether a row is new or not. This field needs to be specified because a flow can result in multiple outputs.
(In this example, there is a single output and hence there is no need to rename it from default.)
Output Field: The output field is nothing but the name of the matching field which is used to compare the last processed values in the flow output with the values in the input to determine the new rows. This is due to the data fields may be renamed as we prepare the flow.
In this example, the emp_id field hasn`t been renamed throughout the flow and hence we have the same input field and output field.
Step 3: In this step, we need to set write options to specify how new data would be added to our tables.
a. From the flow pane, select the output that we want to configure.
b. Under the Incremental Refresh tab, we have three options
- Create table: If the table already exists, it will be deleted and would be overwritten with only new rows.
- Append to the table: This will add data to an existing table. If the table doesn`t exist, it`s created when the flow is first to run.
- Replace data: This will maintain the schema structure of the table as is, but the existing data would be completely replaced by new data.
Step 4: While running the output, we can choose whether we need an incremental refresh or full refresh
- From the Output pane, click the drop-down option on the Run Flow button.
- From the Flow pane, click the drop-down on the Run button next to the Output step.
Hi Snehal,
I have a scenario for the incremental refresh.
1st run: i did a full refresh and 100 records have been loaded to the target.
2nd run: Ran the incremental refresh and As there was no delta data, the process ran successfully with zero records.
3rd run: The process failed as the “last processed value was blank or null”. And its asking me to run the full refresh again.
can you help me how to overcome this issue. Thanks in advance.
Regards,
Mahesh
Hi Snehal, Is incremental refresh always manual ? Or we can set up extract schedule ?