Analytics

Top-down approach for Migration of SQL code from Microsoft SQL Server to Presto

Istock 636932704 (1)

In typical Migration Projects, data housed in Database Server A (provided by Vendor A) is extracted, transformed, and loaded into Database Server B (produced by Vendor B). Each of these servers offers a wide array of functions to query data. Though the function names and syntaxes may vary, all the DB servers in the market provide maximum querying power and flexibility.

A block of code is written comprising of many functions to perform an operation. Assume that in Server A, the action (Z1) can be performed by a combination of functions (x1,y1…). When migrating the code to Server B, we would usually be tempted to first find similar replacement functions (x2,y2…), which can then be put together to perform operation Z2. But by doing so, in many cases, we might not be able to reproduce the desired solution and consequently run into a dead-end.

Z1(x1+y1… ) ≠ Z2 (x2+y2…)

Instead, our aim must be to perform a top-down approach wherein we have to analyze Operation Z1 and visualize operation Z2. Then find functions (i1,j1…) specific to Server B which can, when put together, perform Z2

Z1(x1+y1… ) = Z2 (a2+b2…)

Use case: Microsoft SQL server

Let’s assume a sample of the Products table in the Microsoft SQL Server. There are 2 Products, A and B, sold by 2 people each. Suppose that we want the list of sales-people for each product (separated by ‘;’). This can be achieved using the below SQL code.

Covid 19
COVID-19: Digital Insights For Enterprise Action

Access Perficient’s latest insights into how you can leverage digital technologies to not only respond to the pandemic, but drive your operations forward and deliver experiences your customers need.

Get Informed

 

The code could be logically broken down into blocks of code as below:

  • ROW_NUMBER() over (partition by): This function assigns a unique row number for each row within each partition. The partition here happens by Product_Name (logically equivalent to grouping)

     

  • PIVOT – Pivoting the result based on Salesman for num2, whereby rows become columns.

  • Concatenate : Use + operator and delimiter to concatenate: ([1]+’ ; ‘+ [2]) as [Sales_People]

Note that the above code considers only a case where num 2 in (1,2). For situations where products have more than 2 Salespersons, the above system must be expanded to include all possible scenarios.

Conversion Use case: Presto

The above code in the SQL server could be converted into a code supported by Presto, as given below. It produces the same results as the SQL server code.

The code could be logically broken down into blocks of code as below:

  • Group by: The rows are grouped by – based on Product_Name.
  • Array_agg: An array is created with Salesman for each group
  • Array_join: The array elements are joined using the ‘;’ operator.

Note that this can handle any length of Array in the X dimension.

To summarize, we could say that a top-down approach is more-effective in migrating queries. Whereby, the final functionality of a code Z1 is analyzed and then broken down into functions necessary to perform the last operation.

About the Author

Pandian is a data enthusiast with experience in various technologies implementing Business Intelligence solutions.

More from this Author

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Subscribe to the Weekly Blog Digest:

Sign Up
Categories