Sometimes we need to calculate one column with cumulative value based on other columns value and value from previous row, here is one case
Column BUSINESS_DAY_OF_MONTH should have a cumulative value of business day count in one month.
You might have some sql script to get this value, but here is a simple way in ETL level to calculate this.
Below are detail steps for generating records of year 2012 for this Date_Dimension table
- Use Date_Generation component to generate year 2012 records, then Query_Date_Generation to get all these data for column CAL_DATE
- In Query_Cal
2.1. Calculate EDW_DATE_ID with Julian day or sequence id
2.2. Use function day_in_week to get weekday
2.3. In this node, you could join or look up some HOLIDAY_CALENDAR table to define some days as holiday based on your company policy, see below definition as example
3. In Query_Is_Holiday and Query_Is_Business_Day
3.1 Set IS_BUSINESS_DAY with ‘N’ if day_in_week(CAL_DATE) = 6 or 7, or CAL_DATE in HOLIDAY_CALENDAR
3.2 Use function previous_row_value to get value of CAL_DATE from previous row
4. In BUSINESS_DAY_OF_MONTH
4.1 Calculate BUSINESS_DAY_OF_MONTH with function of local variable
#####################################################################
# Input: $INPUT_CAL_DATE DateTime
# $INPUT_PREV_CAL_DATE Datetime CAL_DATE of previous row
# $INPUT_IS_BUSINESS_DAY Char(1) ‘Y’: business day
# Output: $V_BUSINESS_DAY_OF_MONTH int define this variable as local variable
#####################################################################
#if is first row in query
if ($INPUT_PREV_CAL_DATE is null)
$V_BUSINESS_DAY_OF_MONTH = 0;
else
# month changes
if (month($INPUT_PREV_CAL_DATE) <> month($INPUT_CAL_DATE))
$V_BUSINESS_DAY_OF_MONTH = 0;
#accumulative total
IF ($INPUT_IS_BUSINESS_DAY = ‘Y’)
$V_BUSINESS_DAY_OF_MONTH = $V_BUSINESS_DAY_OF_MONTH + 1;
return $V_BUSINESS_DAY_OF_MONTH;
5. Here are part of results
Key points:
- Each dataflow is a process with its own memory allocated, so the local variable could keep value during dataflow running.
- If you just call this function by script from ETL level, then the local variable will have null value if set value condition do not match, examples:
PRINT(EDW_CAL_BUSINESS_DAY_OF_MONTH(‘2012.01.01′,NULL,’N’)); à 0
PRINT(EDW_CAL_BUSINESS_DAY_OF_MONTH(‘2012.01.02′,’2012.01.01′,’N’)); à null
PRINT(EDW_CAL_BUSINESS_DAY_OF_MONTH(‘2012.02.01′,’2012.01.31′,’Y’)); à 1
PRINT(EDW_CAL_BUSINESS_DAY_OF_MONTH(‘2012.02.02′,’2012.02.01′,’Y’)); à null