Are you looking for how to loop over XML files? Then you are in the right place to learn how to retrieve data from XML files with multiple tags and process or transform it using one of the advanced components in Talend Studio, the tXMLMap component.
Before getting into the main topic of this blog, we need a basic understanding of tXMLMap and its uses. Unlike with tMap, we cannot give the input source an ordinary column schema in tXMLMap; to describe such sources we have a dedicated data type known as the Document type.
tXMLMap is mainly used to transform and route data from single or multiple sources to single or multiple destinations. It is sometimes also used for ESB request-response processing.
In this blog, we will see how to transform or process an XML file with multiple tags, how to configure tXMLMap without taking the schema from the file itself, and the different looping techniques available.
There are mainly two ways of writing content in XML files. Depending on the way of writing the content in the file we have two types of looping techniques. They are:
1. Single Loop Element
2. Multiple Loop Elements
We will start with the Single Loop Element, using a sample input file.
Single Loop Element:
To illustrate this concept, I created a sample file as shown below. It contains basic information about companies and their employees. The main tag is XMLINFO, which wraps the complete XML data. Its child tag is COMPANY, which acts as both a child and a parent tag.
The children of COMPANY are COMPANYDETAILS and EMPLOYEEDETAILS, each with its own children as shown in the image below. In this way the file holds the details of 3 companies, each with its own employee.
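For readers who want something to experiment with, here is a minimal Python sketch of a file with the same shape. The tags XMLINFO, COMPANY, COMPANYDETAILS and EMPLOYEEDETAILS come from the description above; the leaf tags COMPANYNAME and EMPLOYEENAME and all values are hypothetical placeholders, not the contents of the original file.

```python
import xml.etree.ElementTree as ET

# Tags down to COMPANYDETAILS / EMPLOYEEDETAILS follow the article.
# COMPANYNAME, EMPLOYEENAME and every value are hypothetical placeholders.
SAMPLE_XML = """
<XMLINFO>
  <COMPANY>
    <COMPANYDETAILS><COMPANYNAME>Alpha</COMPANYNAME></COMPANYDETAILS>
    <EMPLOYEEDETAILS><EMPLOYEENAME>Asha</EMPLOYEENAME></EMPLOYEEDETAILS>
  </COMPANY>
  <COMPANY>
    <COMPANYDETAILS><COMPANYNAME>Beta</COMPANYNAME></COMPANYDETAILS>
    <EMPLOYEEDETAILS><EMPLOYEENAME>Bala</EMPLOYEENAME></EMPLOYEEDETAILS>
  </COMPANY>
  <COMPANY>
    <COMPANYDETAILS><COMPANYNAME>Gamma</COMPANYNAME></COMPANYDETAILS>
    <EMPLOYEEDETAILS><EMPLOYEENAME>Gita</EMPLOYEENAME></EMPLOYEEDETAILS>
  </COMPANY>
</XMLINFO>
""".strip()


def describe(element, depth=0):
    """Return one indented line per element, showing how the tags nest."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE_XML)
structure = describe(root)
print("\n".join(structure))
```

The printed outline makes the parent and child relationships visible: one XMLINFO root, three COMPANY children, and two detail blocks under each COMPANY.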
Schema inside the component:
Now I need to load this data into Talend Studio for further processing. For that I inserted a tFileInputXML component whose schema has a single column named Company of type Document.
The component settings include the Loop Path Query, the Get Nodes option and the XPath Query. If you need only particular tags from the input XML, put the path to that tag, starting from the root element, in the Loop Path Query.
If you want to include the complete XML, write ‘/’, which refers to the root of the document. You also need to enable the Get Nodes option in order to get the data from the child nodes. After configuring it for your needs, it will look like below:
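Outside Talend, the effect of the loop path can be sketched in a few lines of Python. The helper `select_loop_nodes` is hypothetical and handles only simple slash-separated absolute paths; it illustrates the idea and is not how tFileInputXML is implemented. The leaf tag COMPANYNAME and the values are assumed placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only XMLINFO, COMPANY and COMPANYDETAILS come from the article.
SAMPLE_XML = (
    "<XMLINFO>"
    "<COMPANY><COMPANYDETAILS><COMPANYNAME>Alpha</COMPANYNAME></COMPANYDETAILS></COMPANY>"
    "<COMPANY><COMPANYDETAILS><COMPANYNAME>Beta</COMPANYNAME></COMPANYDETAILS></COMPANY>"
    "<COMPANY><COMPANYDETAILS><COMPANYNAME>Gamma</COMPANYNAME></COMPANYDETAILS></COMPANY>"
    "</XMLINFO>"
)


def select_loop_nodes(root, loop_path):
    """Return the elements selected by an absolute path such as '/' or '/XMLINFO/COMPANY'."""
    steps = [step for step in loop_path.split("/") if step]
    if not steps:
        return [root]                # '/' selects the whole document
    if steps[0] != root.tag:
        return []                    # the path does not start at this root
    if len(steps) == 1:
        return [root]
    return root.findall("/".join(steps[1:]))


root = ET.fromstring(SAMPLE_XML)

whole_document = select_loop_nodes(root, "/")
companies = select_loop_nodes(root, "/XMLINFO/COMPANY")

# Serialising a selected node with all its children mirrors the idea of
# passing a whole node downstream as a single document value.
first_company_xml = ET.tostring(companies[0], encoding="unicode")

print(len(whole_document), len(companies))
print(first_company_xml)
```

With ‘/’ the whole document arrives as one node; with a deeper path, each matching tag becomes its own node.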
Once tFileInputXML is configured, insert a tXMLMap component and connect the two components with a Main row connection as below:
Now comes the main step, which is configuring the tXMLMap component. Double-click the component and a new editor opens, as follows.
By default, the loop element is on a column named root, but our input file has no tag named root. So I am renaming root to XMLINFO, which is the root of the input file. To do that, right-click root and choose the rename option.
Once it is renamed, add the other tags by right-clicking, choosing Create Sub-Element, and entering the matching name from the input file.
Final Input XML Schema:
In the same way, create the complete XML structure in this editor using the Create Sub-Element option. After creating all the necessary fields, it will look as follows:
Here each sub-element acts as a child element. The left-hand side is now configured, but on the right-hand side you still need to define the output schema, that is, create an output table with the information you need. For illustration, I copied all the elements from the left-hand side to the right-hand table as below:
Final Job Design:
If you look at the schema on the right-hand side, it looks like the ordinary schema we configure in other components, whereas the left-hand side has only the Document type. I then inserted a tLogRow component to display the contents in the Studio console and copied the same schema into it. The final design looks like this:
Final Job Design with Output:
We are done connecting and configuring all the components needed, so we can run the job. After running it, I got the following output:
Here the output has 1 record, although the input file contains 3 records in total. This happens because we kept the loop element on XMLINFO and not on any other tag. With the loop element on XMLINFO, the component produces one row per XMLINFO tag and loops again only if it finds another one. Our input has only one such tag, which is why we got only one record.
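The reasoning can be checked with a rough Python analogy: count how many times the chosen loop tag occurs in the document. This only mirrors the one-row-per-loop-element idea and is not Talend's actual implementation; the leaf tag COMPANYNAME and the values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with one XMLINFO root and three COMPANY children.
SAMPLE_XML = (
    "<XMLINFO>"
    "<COMPANY><COMPANYDETAILS><COMPANYNAME>Alpha</COMPANYNAME></COMPANYDETAILS></COMPANY>"
    "<COMPANY><COMPANYDETAILS><COMPANYNAME>Beta</COMPANYNAME></COMPANYDETAILS></COMPANY>"
    "<COMPANY><COMPANYDETAILS><COMPANYNAME>Gamma</COMPANYNAME></COMPANYDETAILS></COMPANY>"
    "</XMLINFO>"
)

root = ET.fromstring(SAMPLE_XML)

# Element.iter(tag) walks the element itself and all descendants,
# so the root XMLINFO tag is included in the count.
loop_iterations = sum(1 for _ in root.iter("XMLINFO"))

print(loop_iterations)
```

Because the tag chosen as the loop element appears only once, there is only one iteration and therefore only one output row.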
Final tXMLMap Editor Design:
What we need to do now is right-click COMPANY and choose “As loop element”. A confirmation pop-up appears; click OK. The loop element now moves from XMLINFO to COMPANY. You cannot make multiple parent nodes loop elements at the same time. After the change, the editor will look like this:
If you run the job again after changing the loop element, you will get 3 records in the output, as follows:
This is how to apply a loop element to the schema we built. The output you get depends on which element you set as the loop element.
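The same behaviour can be approximated in Python by iterating over each COMPANY tag and reading its fields with paths relative to that tag. This is a sketch of the idea only; the output column names and the leaf tags COMPANYNAME and EMPLOYEENAME are hypothetical, since the original file's leaf tags are shown only in the screenshots.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; leaf tags and values are placeholders.
SAMPLE_XML = """
<XMLINFO>
  <COMPANY>
    <COMPANYDETAILS><COMPANYNAME>Alpha</COMPANYNAME></COMPANYDETAILS>
    <EMPLOYEEDETAILS><EMPLOYEENAME>Asha</EMPLOYEENAME></EMPLOYEEDETAILS>
  </COMPANY>
  <COMPANY>
    <COMPANYDETAILS><COMPANYNAME>Beta</COMPANYNAME></COMPANYDETAILS>
    <EMPLOYEEDETAILS><EMPLOYEENAME>Bala</EMPLOYEENAME></EMPLOYEEDETAILS>
  </COMPANY>
  <COMPANY>
    <COMPANYDETAILS><COMPANYNAME>Gamma</COMPANYNAME></COMPANYDETAILS>
    <EMPLOYEEDETAILS><EMPLOYEENAME>Gita</EMPLOYEENAME></EMPLOYEEDETAILS>
  </COMPANY>
</XMLINFO>
""".strip()

root = ET.fromstring(SAMPLE_XML)

# One output row per COMPANY tag; each field is read relative to that tag.
rows = [
    {
        "COMPANYNAME": company.findtext("COMPANYDETAILS/COMPANYNAME"),
        "EMPLOYEENAME": company.findtext("EMPLOYEEDETAILS/EMPLOYEENAME"),
    }
    for company in root.findall("COMPANY")
]

for row in rows:
    print(row)
```

Moving the loop from the root to the repeating tag is what turns one row into three: the number of rows follows the number of occurrences of the loop tag.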
Finally, this is how to retrieve data from XML files when the tag content is simple. If you want to learn about complex XML with multiple inner tags and how to loop over it, you can go through the second part of this blog.
POINT TO REMEMBER: We can’t make both a parent element and its child element loop elements at the same time, but we can make multiple child elements loop elements.