There are many ETL (extract, transform and load) tools out there but Oracle Data Integrator (ODI) has gained a lot of ground because of its ability to perform event-driven, high-volume batch uploads from disparate data sources and to the variety of target applications/data storages.
Having worked on a recent implementation, I have come to appreciate the ease of data integration and automation of metadata and data uploads with ODI. The knowledge modules are such powerful tools to allow capture changed data (mentioned as CDC) and reverse-engineer metadata from different systems without a need to code much. But to maximize benefits of ODI, it needs to be tuned (with emphasis) and designed well. Here are couple of tips (besides the best practices already covered by many) that I have learnt while designing the packages/interfaces:
- Set truncate mode ON for temporary tables: In development mode, switching off the deletion of temporary tables in the staging area is quite handy for debugging purposes. But these can cost heavily if carried over in the debug mode to production especially if ‘append’ mode is on for the tables.
- Tune JVM heap options: ODI uses java runtime environment for most of the knowledge modules and message-driven functionalities. The heap settings for ODI are in the odiparams.bat script file and the default values for the ODI_INIT_HEAP and ODI_MAX_HEAP properties are 32M and 256M. In most of the implementations where ODI is used, these settings are relatively low and result in OutofMemoryError exception when the packages/interfaces are run. Per documentation the recommended settings are 256M for ODI_INIT_HEAP and 1024M for ODI_MAX_HEAP. Per my experience, 256M of init setting is the optimal one but max heap setting should be based on how many packages and the volume of task performed by those packages.
- Optimize the batch load size: When uploading to an RDBMS data storage, batch upload size can be a very effective setting and improve the runtime of an interface/package significantly. Most of the relational databases have two properties that can be set in the Physical Schema definition of the database technology under Topology Manager. The two settings are Array Fetch Size and Batch Upload Size. These settings are typically quite low and should be increased to the optimal values to make the loads faster.
- Remote server URI and access: Many times the packages/interfaces have to access files from the remote servers. One obvious way to handle this is to share the path in the remote server, map the drive to the remote server path where required files are and then use the mapped drive in the package. But I found that ODI was not able to work with mapped drives. You need to give the fully qualified Universal Resource Identification (URI) path. Secondly, the access given to the path should be to a service account that is registered in the domain and can be accessed from within any server in the network. This service account should also be used as the logon account for ODI agent.
- Sunopsis Memory Engine vs. External Database: ODI provides Sunopsis Memory Engine as an in-memory database based on HSQLDB that serves quite well as a high-performing temporary data storage for intermediary data transformations. But, this engine is good for not too large set of data because it uses the memory from JVM that ODI runs on. For transformations of large data sets in interfaces, rather use an external RDBMS as the target or staging data-storage with truncate mode switched on (as mentioned in point 1). There is a good series of external blog on Sunopsis Memory Engine that also addresses this point in detail and more (Under the hood of the Sunopsis Memory Engine).