Skip to main content

Development

Performing a Functional Test for ETL from MySQL to MongoDB

MySQL has been a very popular DBMS and was chosen for many application systems since it is open source and easy to use. In NoSQL, MongoDB is becoming popular and is being applied in application development and the big data analytics field. In one of my colleague’s blog posts, he discusses that we built an internal dashboard utilizing both MySQL (as Source) and MongoDB (as the staging and aggregation layer). From the testing perspective, we did data comparison on both sides to ensure the total numbers and data were properly loaded to MongoDB. We also faced a number of challenges during the testing as this is a new field for the traditional DBMS testing.

First, let’s take a look at the differences between MySQL and MongoDB. Here is the source link on the MongoDB site, where it talks through the concepts and feature differences.

Concept difference

Performing a Functional Test for ETL from MySQL to MongoDB

Query difference Sample

Performing a Functional Test for ETL from MySQL to MongoDB

However, we can always find a straightforward mapping from MySQL to MongoDB that lays out the foundation of our ETL to load from MySQL to MongoDB. With the mapping lineage from source to target, we can test the ETL correctness field by field and compare the row numbers.

Testing Approach

During the test execution, the main objective is to ensure that all rows in MySQL are properly transformed, aggregated and loaded to MongoDB. There are several testing points where we should apply the specific comparison and validation on.

1.Field by field check. If there is a single table loaded to a single collection in MongoDB, it is easy to do so by comparing both parts for all rows. As everything is BSON/JSON in the MongoDB, we can use MySQL workbench and a MongoDB client tool such as Robo 3T to view the data and documents and check the difference.

2.Total number counting validation by some fields. For example, if there is a sales table in MySQL, we can count the total rows group by region to learn the sales total count for each region. In the meantime, in MongoDB we could apply the following query syntax to get the same effect.

3.Aggregation rule validation during ETL.  This is a complex case, but the idea is similar to what we did in the traditional DBMS aggregation. To test it, we could introduce a staging table in MongoDB to reflect the source data.

Challenge

We also faced some challenges going forward since most of the steps performed are semi-automated execution. We will need some tools for our framework to automate the whole process and produce the testing result with multiple cases.

The second challenge is that the JSON hierarchy could be very complex while the corresponding data entity resides in MySQL is a one level hierarchy. It requires us to apply the complex testing validation rule during the test.

If there are videos or images in the source system and they can be loaded to MongoDB as binary, this is another challenge to test it.

MongoDB DML Reference

In the last section, I’m attaching some frequently used query language:

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Categories
Follow Us