Continuous Integration in the Analytics Project

Many people know that Continuous Integration (CI) and Continuous Delivery (CD) are an important part of agile practice. For Java projects, there are many open source tools such as Hudson, Continuum, and Jenkins that support this automation. However, if you look for tools that do similar work for databases, ETL, and BI reporting, most of them are commercial and very limited: a given CI tool usually supports only one type of BI tool and a few databases. The Perficient team adopted a CI process and put it into practice in our internal analytics project. We built several internal scripts and tools to support code migration and deployment automation. But this is not just about tools: we also defined a process closely tied to those tools to guide everyone and make the work more transparent.

CI Benefit

The value of CI has been described in many books and articles, but I would like to highlight the value of adopting CI in ETL/BI development and testing activities.

The team does not have to spend much time on code migration. This benefit is obvious: with the right tools and process, the team can focus on developing code, committing it to a repository such as SVN, and unit testing. Some developers complain that they cannot focus on writing code because they are pulled into other work such as migrating code, which is not their main responsibility. This impact can be mitigated or even eliminated with appropriate CI tools and process.

Dev and QA teams are connected more closely. After an ETL job or report is developed, it is usually stored in the Development metadata repository, while the QA team tests in another environment. Testers are often less familiar with importing and exporting that metadata, so they have to interrupt the dev team for help moving code across environments. With CI, much of this work can be done by machine with a quick button click. Also, if the team decides to increase the frequency, for example by running CI daily, newly developed code can be tested promptly as long as the QA team can keep up with the dev team's pace. The deliverables as a whole are ready for release earlier.

In fact, the biggest value CI brings to project stakeholders is increased team productivity: it not only helps meet timeline constraints but also makes the team more comfortable that they are creating real value in the limited time available.

Deployment Automation

Our recent practice mainly uses the IBM stack, which includes IBM IDA, Datastage, Cognos BI, and Netezza. IDA is a modeling tool similar to ERWin. Datastage is used to create and maintain ETL jobs that transform and load data into Netezza, a data warehouse (DW) appliance. Cognos BI reads from the star schema in the DW and produces dashboards and reports. In this post I only briefly mention each component we have covered; some of my colleagues may share more detail on specific components.

Synchronize Code on SVN. All types of code are stored in the SVN repository, and the Subclipse plugin lets each developer synchronize code with SVN.

DDL Auto-Deployment. The table structures and DDL are primarily maintained in IDA, separated into business, logical, and physical layers. In the real world we cannot get all of the data modeling right in one pass; the model changes over time. So applying DDL to Netezza is not a one-time job. We therefore developed several scripts and procedures to migrate DDL from Dev to QA without dropping or truncating any tables.
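To make the idea of a non-destructive DDL migration concrete, here is a minimal sketch in Python. It is not the script we actually used; the function name `additive_ddl`, the dictionary-based schema description, and the sample table names are all assumptions made for illustration. It compares a Dev schema with a QA schema and emits only CREATE TABLE and ALTER TABLE ... ADD COLUMN statements, so nothing in QA is dropped or truncated.

```python
from typing import Dict, List

# Hypothetical schema description: table name -> {column name: column type}
Schema = Dict[str, Dict[str, str]]


def additive_ddl(source: Schema, target: Schema) -> List[str]:
    """Return DDL that adds to `target` whatever exists only in `source`.

    Only CREATE TABLE and ADD COLUMN statements are produced; columns or
    tables that exist only in the target are deliberately left alone.
    """
    statements: List[str] = []
    for table, columns in source.items():
        if table not in target:
            # Whole table is missing in the target: create it.
            cols = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols});")
            continue
        for name, ctype in columns.items():
            if name not in target[table]:
                # Table exists but the column is new: add it in place.
                statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {ctype};")
    return statements


# Illustrative data only (not taken from a real project).
dev = {
    "dim_customer": {"customer_id": "INTEGER", "name": "VARCHAR(100)", "region": "VARCHAR(50)"},
    "fact_sales": {"sale_id": "BIGINT", "amount": "NUMERIC(12,2)"},
}
qa = {
    "dim_customer": {"customer_id": "INTEGER", "name": "VARCHAR(100)"},
}

for statement in additive_ddl(dev, qa):
    print(statement)
```

A real migration would also need to handle changed column types and would read the two schemas from the databases rather than from hand-written dictionaries; those parts are left out of this sketch.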

Datastage Job Auto-Deployment. We created a tool to push DS jobs from Dev to QA. Developers are given a list; after they finish unit testing a job, they add it to the list, and someone then only needs to click a button for all added or updated jobs to be copied to the target project.
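The list-driven selection described above could be sketched roughly as follows. This is an illustration under assumptions, not our actual tool: the function name `jobs_to_deploy`, the one-job-per-line list format, and the use of a checksum per job to detect updates are all hypothetical, and the step that actually exports and imports the jobs is not shown.

```python
from typing import Dict, Iterable, List


def jobs_to_deploy(manifest: Iterable[str], dev: Dict[str, str], qa: Dict[str, str]) -> List[str]:
    """Pick the listed jobs that are new or changed compared with QA.

    `manifest` is the developer-maintained list, one job name per line.
    `dev` and `qa` map job name -> checksum of the job definition
    (a hypothetical way of detecting that a job was updated).
    """
    selected: List[str] = []
    for line in manifest:
        job = line.strip()
        if not job or job.startswith("#"):
            continue  # skip blank lines and comments in the list
        if job not in dev:
            raise KeyError(f"job listed but not found in Dev: {job}")
        # Missing in QA (added) or different checksum (updated).
        if qa.get(job) != dev[job] and job not in selected:
            selected.append(job)
    return selected


# Illustrative data only.
manifest = ["load_dim_customer", "", "# ready after unit test", "load_fact_sales", "load_dim_date"]
dev_jobs = {"load_dim_customer": "a1", "load_fact_sales": "b2", "load_dim_date": "c3"}
qa_jobs = {"load_dim_customer": "a1", "load_fact_sales": "b1"}

print(jobs_to_deploy(manifest, dev_jobs, qa_jobs))
```

Keeping the selection separate from the copy step means the list can be reviewed before anything is pushed to QA.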

Cognos BI Auto-Deployment. Cognos includes Framework Manager (FM) and the report metadata; the former is essentially file based. We created scripts to move FM project files and report metadata between repositories.

In our experience, this may not be necessary if only one or two developers work on ETL or BI, because creating the tools and scripts also takes effort. But if the team grows to 10 or more developers with a similar number of testers, it brings a significant improvement to the whole team and better satisfies the customer.

Thoughts on “Continuous Integration in the Analytics Project”

  1. Kannan Sidharth

    Hello,
    I am Kannan Sidharth, I am currently looking for options of introducing Continuous Integration-Continuous Deployments into one our Analytics projects which uses Datastage, Oracle Exadata & Microstrategy technologies. We currently have a 30 member strong team. Could you pls share if you have any efficient solutions available?


Kent Jiang

I currently work at the Perficient China GDC in Hangzhou as a Lead Technical Consultant. I have 8 years of experience in the IT industry across Java, CRM, and BI technologies. My areas of interest include business analytics, project planning, MDM, and quality assurance.
