Are you looking for how to loop over XML files? Then you are in the right place to learn how to retrieve data from XML files with multiple tags and process or transform it using one of the advanced components in Talend Studio, the tXMLMap component.
Before getting into the main topic of this blog, we need a basic understanding of tXMLMap and its uses. Unlike with tMap, we cannot give the input source an ordinary column schema in tXMLMap; to describe such source types, we have a data type known as the Document type.
tXMLMap is mainly used to transform and route data from single or multiple sources to single or multiple destinations. It is also sometimes used for ESB request-response processing.
In this blog, we will see how to transform/process an XML file with multiple tags, how to configure tXMLMap without the actual schema from the file, and the different types of looping techniques.
There are mainly two ways of writing content in XML files, and depending on how the content is written there are two types of looping techniques:
1. Single Loop Element
2. Multiple Loop Elements
In this blog, we are going to learn how to implement multiple looping in the tXMLMap component.
Multiple Loop Elements:
The single looping method works fine for simple XML data. But what if we have complex XML data, such as information on multiple employees within the same company, as in the sample below? Then we need the multiple looping method.
Sample Input File:
Now I need to load this data into Talend Studio for further processing. For that I have added a tFileInputXML component whose schema has a single column named Company of type Document.
Input XML Schema:
The component settings include the Loop XPath query, the Get Nodes option, and the XPath query.
If you need only particular tags from the input XML, put the path to that tag, starting from the root element, in the Loop XPath query.
If you want to include the complete XML, write "/", which denotes the document root. You also need to enable the Get Nodes option in order to get the data from the child nodes. After configuring it based on your needs, it will look like below:
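To make the idea of a loop path concrete outside Talend, here is a minimal Python sketch. It is not what Talend runs internally; it only illustrates that the loop path picks the repeating node and that each column path is read relative to that node. The sample XML, the NAME, ID and ENAME tags, the employee values, and the extract_rows helper are all hypothetical; only the XMLINFO, COMPANYDETAILS, COMPANY and EMP tag names come from this blog.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file. Only XMLINFO, COMPANYDETAILS, COMPANY and EMP
# are tag names from the blog; NAME, ID, ENAME and all values are assumed.
SAMPLE_XML = """
<XMLINFO>
  <COMPANYDETAILS>
    <COMPANY>
      <NAME>Infosys</NAME>
      <EMP><ID>1</ID><ENAME>Asha</ENAME></EMP>
      <EMP><ID>2</ID><ENAME>Ravi</ENAME></EMP>
    </COMPANY>
    <COMPANY>
      <NAME>Perficient</NAME>
      <EMP><ID>3</ID><ENAME>Meera</ENAME></EMP>
      <EMP><ID>4</ID><ENAME>John</ENAME></EMP>
    </COMPANY>
  </COMPANYDETAILS>
</XMLINFO>
"""


def extract_rows(xml_text, loop_path, column_paths):
    """Return one dict per node matched by loop_path (hypothetical helper).

    ElementTree paths are relative to the root element, so the Talend-style
    path /XMLINFO/COMPANYDETAILS/COMPANY is written here without /XMLINFO.
    """
    root = ET.fromstring(xml_text.strip())
    rows = []
    for node in root.findall(loop_path):
        # Each column path is evaluated relative to the current loop node.
        rows.append({col: node.findtext(path) for col, path in column_paths.items()})
    return rows


rows = extract_rows(SAMPLE_XML, "COMPANYDETAILS/COMPANY", {"company": "NAME"})
print(rows)
```

With this assumed input, the loop path matches two COMPANY nodes, so the sketch produces two rows, one per company.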
Input Configuration of tFileInputXML:
Once tFileInputXML is configured, add a tXMLMap component and connect the two components with a Main row connection, as shown below.
Now comes the main step: configuring the tXMLMap component. Double-click the component to open a new editor, as follows.
Initial tXMLMap Editor:
By default, the loop element is set on a column named root, but our input file has no element called root. So I am renaming root to XMLINFO, which is the root element of the input file. To do that, right-click root and choose the rename option.
Once it is renamed, add the other tags by right-clicking an element, choosing the create sub-element option, and entering the name that matches the input file.
In the same way, create the complete XML structure in this editor using the create sub-element option. Once all tags are created in the tXMLMap editor, copy the required columns to the output table on the right-hand side. The editor will then look as follows:
Initial tXMLMap Processing:
Now if we run the job with the loop element on COMPANY, we won't get all the information that is in the input file. The output will be as follows:
We have two employees in each of Infosys and Perficient, but here we are getting only one per company. To get all of them, we need to change the loop elements and allow two loop elements. We make both the COMPANYDETAILS and EMP tags loop elements: EMP occurs multiple times, so it must be a loop element in order to return all records.
In this example it is not mandatory to keep COMPANYDETAILS as a loop element, but if that tag also occurred multiple times, it would be mandatory. After making those two tags loop elements, it looks like this:
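The effect of moving the loop down to EMP can be sketched in plain Python. This is only an illustration of the row counts, not Talend's implementation, and it does not model COMPANYDETAILS as a second loop. The sample XML, the NAME and ENAME tags, the employee values, and both function names are hypothetical. Which employee Talend returns when looping on COMPANY is also an assumption; the sketch simply takes the first one.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two companies with two employees each.
# NAME, ENAME and all values are assumed for illustration.
SAMPLE_XML = """
<XMLINFO>
  <COMPANYDETAILS>
    <COMPANY>
      <NAME>Infosys</NAME>
      <EMP><ENAME>Asha</ENAME></EMP>
      <EMP><ENAME>Ravi</ENAME></EMP>
    </COMPANY>
    <COMPANY>
      <NAME>Perficient</NAME>
      <EMP><ENAME>Meera</ENAME></EMP>
      <EMP><ENAME>John</ENAME></EMP>
    </COMPANY>
  </COMPANYDETAILS>
</XMLINFO>
"""


def rows_looping_on_company(root):
    """One row per COMPANY; only a single EMP is read (assumed: the first)."""
    rows = []
    for company in root.findall("COMPANYDETAILS/COMPANY"):
        first_emp = company.find("EMP")  # find() returns the first match only
        rows.append((company.findtext("NAME"), first_emp.findtext("ENAME")))
    return rows


def rows_looping_on_emp(root):
    """One row per EMP, with the parent company's name repeated on each row."""
    rows = []
    for company in root.findall("COMPANYDETAILS/COMPANY"):
        for emp in company.findall("EMP"):
            rows.append((company.findtext("NAME"), emp.findtext("ENAME")))
    return rows


root = ET.fromstring(SAMPLE_XML.strip())
single = rows_looping_on_company(root)
multi = rows_looping_on_emp(root)
print(len(single), "rows when looping on COMPANY")
print(len(multi), "rows when looping on EMP")
```

With this assumed input, looping on COMPANY gives two rows and looping on EMP gives four, which mirrors the missing employees described above.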
Modified Loop Tags on Parent and Child Elements:
The left-hand side is now configured, but if you look at the output schema on the right, you will find an error message like below.
Now click the message icon and then the plus icon. You will get as many sequences as there are loop elements in the job. Since we have two loop elements here, we get two sequences, and you need to specify which loop you want to execute first.
Here, the multiple records are present in only one tag, the employee tag (EMP). So we need to select that sequence in the output. If you have other inner tags set as loop elements and want to collect only the information related to one of them, include that sequence alone.
Configuring order of sequence in message box:
Now if you select the second sequence in the above image, the error symbol on the message icon disappears. You are now good to run the job again. The final output looks like this:
Expected Final Output:
This output contains all the employee information from the initial input file.
This is how to retrieve data from XML files with complex XML tags and multiple inner tags.
POINT TO REMEMBER: We can't make both parent and child elements loop elements, but we can make multiple child elements loop elements, as explained above.