Skip to main content

Development

Using IBM Netezza Stored Procedure to Synchronize Tables

In a data warehouse project, once the physical model has changed, attributes have to be added accordingly or tables need to be re-created. For ETL developers, it’s their job to make sure that these changes take place in the development environment. Thus, this leads to a problem: how do you maintain those databases (eg. QA environment) other than in the development environment?

IBM Netezza stored procedures combine the benefits of SQL to query and manipulate database information with the benefits of a procedural programming language to handle data processing, transaction logic, and application branching behaviors. In this case, we are going to use an  IBM Netezza store procedure to tackle the problem.

There are 3 steps to create a store procedure, especially for this function. Let’s suppose our purpose is to synchronize tables in the QA environment to be consistent with those in a development environment.
1. Retrieve all table information needed in both environments.
2. Make comparisons with the given information to see which attributes of what tables in QA environment need changing.
3. Generate DDL/DML and execute it to make the change happen.

We’ll now going through this procedure step by step.

  1. Retrieve the table information

To do this, we have to introduce a user view named _v_relation_column. This is a Netezza user view that describes the information of all the objects in Netezza, which includes object name, owner, create date, object type, attribute number, attribute name, attribute type, and not null indicator.

Once we specify “ TYPE = ‘TABLE’ ” and ” DATABASE = QA ” ( Moreover, you can specify a “SCHEMA” etc. depending on the information you need ),  all the tables and attributes as well as their format type (“FORMAT_TYPE”) and length (“ATTCOLLENTH”) in the database are displayed.

1

Now that we have all the information we need, we can create some temporary tables for storing the table information of development environment. ( You may want to create a stored procedure to delete an already existing temp table before you create one. )

    a. CI_TBLINFO_DEV – names of tables that exist in development environment.

INSERT INTO CI_TBLINFO_DEV(OBJID,TBLNM,DATABASE,SCHEMA)

SELECT DISTINCT CAST(OBJID AS VARCHAR(100)) AS OBJID

,CAST(NAME AS VARCHAR(100)) AS TBLNM

,CAST(DATABASE AS VARCHAR(100)) AS DATABASE

,CAST(SCHEMA AS VARCHAR(100)) AS SCHEMA

FROM _V_RELATION_COLUMN T WHERE TYPE=’TABLE’;

    b. CI_CLMNINFO_DEV– attributes of tables that exist in development environment.

INSERT INTO CI_CLMNINFO_DEV(OBJID,TBLNM,DATABASE,SCHEMA,ATTNUM,ATTNAME,FORMAT_TYPE,ATTCOLLENG,ATTNOTNULL)

SELECT CAST(T.OBJID AS VARCHAR(100)) AS OBJID

,CAST(T.NAME AS VARCHAR(100)) AS TBLNM

,CAST(T.DATABASE AS VARCHAR(100)) AS DATABASE

,CAST(T.SCHEMA AS VARCHAR(100)) AS SCHEMA

,CAST(T.ATTNUM AS INTEGER) AS ATTNUM

,CAST(T.ATTNAME AS VARCHAR(100)) AS ATTNAME

,CASE

WHEN T.ATTTYPID=23 THEN ‘INTEGER’

WHEN T.ATTTYPID=16 THEN ‘BOOLEAN’

WHEN T.ATTTYPID=17 THEN ‘BYTEA’

WHEN T.ATTTYPID=18 THEN ‘CHAR’

WHEN T.ATTTYPID=19 THEN ‘NAME’

WHEN T.ATTTYPID=20 THEN ‘BIGINT’

WHEN T.ATTTYPID=21 THEN ‘SNALLINT’

WHEN T.ATTTYPID=22 THEN ‘INT2VECTOR’

WHEN T.ATTTYPID=24 THEN ‘REGPROC’

WHEN T.ATTTYPID=25 THEN ‘TEXT’

WHEN T.ATTTYPID=26 THEN ‘OID’

WHEN T.ATTTYPID=27 THEN ‘TID’

WHEN T.ATTTYPID=28 THEN ‘XID’

WHEN T.ATTTYPID=29 THEN ‘CID’

WHEN T.ATTTYPID=30 THEN ‘OIDVECTOR’

WHEN T.ATTTYPID=210 THEN ‘SMGR’

WHEN T.ATTTYPID=700 THEN ‘REAL’

WHEN T.ATTTYPID=701 THEN ‘DOUBLE’

WHEN T.ATTTYPID=702 THEN ‘ABSTIME’

WHEN T.ATTTYPID=705 THEN ‘UNKNOWN’

WHEN T.ATTTYPID=1007 THEN ‘_INTEGER’

WHEN T.ATTTYPID=1033 THEN ‘ACLITEM’

WHEN T.ATTTYPID=1034 THEN ‘_ACLITEM’

WHEN T.ATTTYPID=1042 THEN ‘CHARACTER(‘ || ATTTYPMOD -12 || ‘)’

WHEN T.ATTTYPID=1043 THEN ‘VARCHAR(‘ || ATTTYPMOD -12 || ‘)’

WHEN T.ATTTYPID=1082 THEN ‘DATE’

WHEN T.ATTTYPID=1083 THEN ‘TIME’

WHEN T.ATTTYPID=1184 THEN ‘TIMESTAMP’

WHEN T.ATTTYPID=1266 THEN ‘TIMETZ’

WHEN T.ATTTYPID=1700 THEN ‘NUMERIC(‘ || (ATTTYPMOD -12)/65536 ||’,’||(ATTTYPMOD -12)%65536|| ‘)’

WHEN T.ATTTYPID=2500 THEN ‘BYTEINT’

WHEN T.ATTTYPID=2569 THEN ‘NUCL’

WHEN T.ATTTYPID=2570 THEN ‘PROT’

WHEN T.ATTTYPID=2522 THEN ‘NCHAR(‘ || ATTTYPMOD -12 || ‘)’

WHEN T.ATTTYPID=2530 THEN ‘NVARCHAR(‘ || ATTTYPMOD -12 || ‘)’

WHEN T.ATTTYPID=278 THEN ‘NTEXT’ END FORMAT_TYPE

,CAST(T.ATTCOLLENG AS VARCHAR(100)) AS ATTCOLLENG

,CAST(T.ATTNOTNULL AS BOOLEAN) AS ATTNOTNULL

FROM _V_RELATION_COLUMN T  WHERE TYPE=’TABLE’;

There are two things you should pay attention to:

1). The column “FORMAT_TYPE” in the view is generated by a function format_type(). You will get an error when using this column to insert records into a table. Therefore we use “ATTTYPID” to get the format of the attribute instead.

2). Use ATTTYPMOD to get the length of an attribute. But when it comes to a double precision format (like “NUMERIC” for example), you have to calculate the right length and precision as shown above.

c. CI_KEYDATA_DEV– Constraints of tables that exist in development environment, drawn from another public view_v_relation_keydata.

INSERT INTO CI_KEYDATA_DEV(TBLNM,DATABASE,SCHEMA,CONSTRAINTNAME,CONTYPE,CONSEQ,ATTNAME,PKDATABASE,PKSCHEMA,PKRELATION,PKCONSEQ,PKATTNAME ,UPDT_TYPE,DEL_TYPE,MATCH_TYPE)

SELECT

CAST(T.RELATION AS VARCHAR(100)) AS TBLNM

,CAST(T.DATABASE AS VARCHAR(100)) AS DATABASE

,CAST(T.SCHEMA AS VARCHAR(100)) AS SCHEMA

,CAST(T.CONSTRAINTNAME AS VARCHAR(100)) AS CONSTRAINTNAME

,CAST(T.CONTYPE AS CHAR(100)) AS CONTYPE

,CAST(T.CONSEQ AS INTEGER) AS CONSEQ

,CAST(T.ATTNAME AS VARCHAR(100)) AS ATTNAME

,CAST(T.PKDATABASE AS VARCHAR(100)) AS PKDATABASE

,CAST(T.PKSCHEMA AS VARCHAR(100)) AS PKSCHEMA

,CAST(T.PKRELATION AS VARCHAR(100)) AS PKRELATION

,CAST(T.PKCONSEQ AS INTEGER) AS PKCONSEQ

,CAST(T.PKATTNAME AS VARCHAR(100)) AS PKATTNAME

,CAST(T.UPDT_TYPE AS VARCHAR(100)) AS UPDT_TYPE

,CAST(T.DEL_TYPE AS VARCHAR(100)) AS DEL_TYPE

,CAST(T.MATCH_TYPE AS VARCHAR(100)) AS MATCH_TYPE

FROM _V_RELATION_KEYDATA T;

The attribute CONTYPE has 3 values: “f”, which stands for foreign key, “p”, which stands for primary key, and “u”, short for unique key.

We then have two temp tables of development environment. Now we do the same thing for the QA environment, drawing its table information and key data into another 2 temp tables, naming them CI_TBLINFO_QA and CI_CLMNINFO_QA.

  1. Make comparisons between 2 environments.

Now we create another stored procedure to compare the differences and log the result in 3 temp tables.

a. CI_TBLNEW– tables that exist in development environment but don’t exist in QA environment.

CREATE TABLE CI_TBLNEW (

OBJID VARCHAR(100) NULL,

TBLNM VARCHAR(100) NULL,

DATABASE VARCHAR(100) NULL,

SCHEMA VARCHAR(100) NULL,

ATTNUM INT4 NULL,

ATTNAME VARCHAR(100) NULL,

FORMAT_TYPE VARCHAR(100) NULL,

ATTCOLLENG VARCHAR(100) NULL,

ATTNOTNULL BOOLEAN NULL );

INSERT INTO CI_TBLNEW (

SELECT T.*

FROM CI_CLMNINFO_DEV T

INNER JOIN (

SELECT * FROM (SELECT DISTINCT TBLNM AS NAME_DEV ,SCHEMA AS SCHEMA_DEV FROM CI_CLMNINFO_DEV ) T

LEFT JOIN (SELECT DISTINCT TBLNM AS NAME_QA ,SCHEMA AS SCHEMA_QA FROM CI_CLMNINFO_QA ) T1

ON T.NAME_DEV=T1.NAME_QA

WHERE NAME_QA IS NULL

) T1 ON T.TBLNM=T1.NAME_DEV;

b. CI_COLNEW– attributes that exist in development environment but don’t exist in QA environment.

CREATE TABLE CI_COLNEW (

OBJID VARCHAR(100) NULL,

TBLNM VARCHAR(100) NULL,

DATABASE VARCHAR(100) NULL,

SCHEMA VARCHAR(100) NULL,

ATTNUM INT4 NULL,

ATTNAME VARCHAR(100) NULL,

FORMAT_TYPE VARCHAR(100) NULL,

ATTCOLLENG VARCHAR(100) NULL,

ATTNOTNULL BOOLEAN NULL );

INSERT INTO CI_COLNEW

SELECT T.*

FROM CI_CLMNINFO_DEV T

INNER JOIN (

SELECT T.TBLNM, T.SCHEMA

FROM CI_TBLINFO_DEV T

INNER JOIN CI_TBLINFO_QA T1 ON T.TBLNM=T1.TBLNM

) T1 ON T.TBLNM=T1.TBLNM

LEFT JOIN CI_CLMNINFO_QA T2 ON T.TBLNM=T2.TBLNM AND T.ATTNAME=T2.ATTNAME

WHERE T2.TBLNM IS NULL;

c. CI_COLUPDATED– attributes that exist in both environments but their format/length/not-null statement are not the same.

CREATE TABLE CI_COLUPDATED (

OBJID VARCHAR(100) NULL,

TBLNM VARCHAR(100) NULL,

DATABASE VARCHAR(100) NULL,

SCHEMA VARCHAR(100) NULL,

ATTNUM INT4 NULL,

ATTNAME VARCHAR(100) NULL,

FORMAT_TYPE VARCHAR(100) NULL,

ATTCOLLENG VARCHAR(100) NULL,

ATTNOTNULL BOOLEAN NULL );

INSERT INTO CI_COLUPDATED

SELECT T.*

FROM CI_CLMNINFO_DEV T

INNER JOIN (

SELECT T.TBLNM, T.SCHEMA

FROM CI_TBLINFO_DEV T

INNER JOIN CI_TBLINFO_QA T1 ON T.TBLNM=T1.TBLNM

) T1 ON T.TBLNM=T1.TBLNM AND T.SCHEMA=T1.SCHEMA

INNER JOIN CI_CLMNINFO_QA T2 ON T.TBLNM=T2.TBLNM AND T.ATTNAME=T2.ATTNAME

WHERE T.FORMAT_TYPE<>T2.FORMAT_TYPE OR T.ATTCOLLENG<>T2.ATTCOLLENG OR T.ATTNOTNULL<>T2.ATTNOTNULL;

You may also create temp tables for dropped tables, deleted attributes etc. as per your needs.

  1. Generate DDL/DML and make the change happen.

After getting the comparison result we need in step 2, we then create a procedure to generate DDL using the 3 temp tables.

First of all, create a table to store the generated SQL. Here we define a SQLTYPE to differentiate for various usages.

CREATE TABLE CI_STOSQL(

STOSQL_ID INTEGER NOT NULL,

SQLTYPE VARCHAR(100) NOT NULL,

CONTYPE VARCHAR(10) NULL,

SCHEMA VARCHAR(100) NULL,

TBLNM VARCHAR(100) NOT NULL,

SQLSTR VARCHAR(8000) NOT NULL );

Then, iterate the records in CI_TBLNEW and insert the generated SQL into CI_STOSQL :

–OUTTER LOOP: ITERATE THE TABLE

FOR REC_TBL IN EXECUTE

‘SELECT DISTINCT OBJID, TBLNM, SCHEMA FROM CI_TBLNEW ORDER BY OBJID’

LOOP

VAR_TBLNM:=TRIM(REC_TBL.TBLNM);

VAR_SCHEMA:=TRIM(REC_TBL.SCHEMA);

SQL_TBLNEW:=’CREATE TABLE ‘||VAR_SCHEMA||’.’||VAR_TBLNM||'( ‘;

–INNER LOOP: ITERATE THE COLUMN

FOR REC_COL IN EXECUTE

‘SELECT * FROM CI_TBLNEW WHERE TBLNM=’||””||VAR_TBLNM||”” ||’ AND SCHEMA=’||””||VAR_SCHEMA||””||’ ORDER BY ATTNUM’

LOOP

VAR_COLNM:=TRIM(REC_COL.ATTNAME);

VAR_FMTTYPE:=TRIM(REC_COL.FORMAT_TYPE);

VAR_ATTCOLLENG:=TRIM(REC_COL.ATTCOLLENG);

IF REC_COL.ATTNOTNULL=TRUE THEN

VAR_ATTNOTNULL:=’NOT NULL’;

ELSE

VAR_ATTNOTNULL:=’NULL’;

END IF;

SQL_TBLNEW:=SQL_TBLNEW||VAR_COLNM ||’ ‘||VAR_FMTTYPE||’ ‘||VAR_ATTNOTNULL||’,’;

END LOOP;

–TRIM THE LAST COMMA FROM THE LOOP

SQL_TBLNEW:=SUBSTR(SQL_TBLNEW,1,LENGTH(SQL_TBLNEW)-1)||’);’;

–INSERT DDL TO CT_STOSQL

IF TRIM(SQL_TBLNEW)<>” THEN

VAR_STOSQL_ID:=VAR_STOSQL_ID+1;

INSERT INTO CI_STOSQL

SELECT VAR_STOSQL_ID AS STOSQL_ID

,’CREATE TABLE’ AS SQLTYPE

,NULL AS CONTYPE

,VAR_SCHEMA AS SCHEMA

,VAR_TBLNM AS TBLNM

,SQL_TBLNEW AS SQLSTR;

END IF;

END LOOP;

Likewise, procedures for the new attributes and updated attributes are basically the same as below. A slight difference lies in the outer loop.

For new attributes, it should be executing “ALTER TABLE … ADD COLUMN …”.

For updated attributes, it should be executing “ALTER TABLE …MODIFY COLUMN …”.

Finally, we take care of the constrains, which is a lot like what we did with the tables. We use CONTYPE to differentiate the constraint type.

One thing you should know, is that when we cope with foreign key, the constrains have to be declared this way as shown below:

IF UPPER(CONTYPE)=’F’ THEN

SQL_KEYDATA:=SQL_KEYDATA||’ FOREIGN KEY (‘||ATTNAME||’) REFERENCES ‘||PKSCHEMA||’.’||

PKRELATION||'(‘||PKATTNAME||’) ON DELETE ‘||DEL_TYPE||’ ON UPDATE ‘||VAR_UPDT_TYPE||’;’;

END IF;

Since now we have the DDL/DML generated, we can either export the DDL/DML into a text file and see for yourself, using

‘CREATE EXTERNAL TABLE ”C:\TABLE_NEW.TXT”’||’ USING (REMOTESOURCE ”ODBC”’||’ DELIM ‘||””||’|’||””||’) AS SELECT SQLSTR FROM CI_STOSQL WHERE SQLTYPE LIKE ‘||””||’%CREATE TABLE%’||””||’ ORDER BY STOSQL_ID’

or simply use “EXECUTE IMMEDIATE” in the procedure to execute the DDL/DML right away.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories
Follow Us