Ah, the never-ending task of transposing rows into columns and columns into rows in an Oracle table! There was always a fairly standard max-decode solution for this until Oracle came up with their elegant pivot and unpivot standard functions in 11g. Recently, I had a close encounter with these pivot functions and learned a few things.
The pivot function works beautifully, as long as you know which columns you want to transpose upfront and are dealing with a manageable size of raw data. Unfortunately, neither condition was true for me.
How Artificial Intelligence Can Enhance the Clinical Data Review and Cleaning Process
This guide analyzes how artificial intelligence – including machine learning – can be used by pharmaceutical and medical device companies to improve the clinical data review and cleansing process.
In this particular case, it involved retrieving raw audit data in XML format from a transactional system using a web service call, and then transposing that data to re-create the transactional system’s data tables locally. The list of tables and structures were not fixed and had to be read upfront using a separate metadata web service call.
Since my use case involved transposing a data structure that was not fixed upfront, I didn’t know which columns to transpose into a pivot function. However, by building the pivot query string dynamically, this was pretty easy to solve.
Scalability was a major challenge and a surprise, too! As the amount of source data grew, the pivot queries took longer and longer, sometimes running for hours. Indexes didn’t help and I had to resort to some creative problem-solving.
The trace was characterized by very long I/O wait times, since it was trying to process a large number of rows at the same time. Here’s how I went about reducing the processing time:
- Set a configurable parameter at the application level, driven by the overall database memory available to process queries. (DBAs have to set this, depending on their database.) Let’s say this was set to x bytes.
- Calculated the row byte length of the target table I was transposing to. Let’s say this was y bytes.
- Determined the number of rows I processed in one go as x/y.
- Ran the pivot query in a loop until all the rows were processed.
These few steps enabled me to run multiple small, manageable chunks of pivot queries rather than a single enormous one, and ultimately brought down the overall processing time from hours to just seconds.
To learn how our life sciences practice can help you tackle your database challenges, feel free to send us a note.