A few months back, I worked on a project that involved flat file handling. I thought it odd that people still use flat files in the 21st century. Ironically, I’m now on my third project that involves flat file processing. And it is not just flat files; I am actually dealing with COBOL Copybook and EBCDIC encoding. I guess it’s only fitting for someone like me, who started learning programming with punch cards, to deal with COBOL…
Anyway, I have learned quite a few things about Mule flat file processing that warrant some deeper discussion.
With Mule 3.8.x (currently 3.8.5), DataWeave (DW) comes with flat file support: https://docs.mulesoft.com/mule-user-guide/v/3.8/dataweave-flat-file-schemas. Parts of it are very powerful. There are also problems in more complex situations, especially with COBOL Copybook.
I categorize Mule’s flat file support into three types: 1) the true flat file, 2) the structured flat file, and 3) COBOL Copybook.
True Flat File
This is the first type of flat file. A “true flat file” is a file with a uniform structure for all rows. It is like a simple relational table with a uniform header that defines the size and type of each data column. “True flat file” processing is relatively simple, especially if you don’t need to deal with special zoned number encoding (I have only seen COBOL Copybooks use zoned numbers; see later).
A sample flat file schema (FFD) can look like:
form: FIXEDWIDTH
name: my-flat-file
values:
- { name: 'Row-id', type: String, length: 2 }
- { name: 'Total', type: Decimal, length: 11 }
- { name: 'Module', type: String, length: 8 }
- { name: 'Cost', type: Decimal, length: 8, format: { implicit: 2 } }
- { name: 'Program-id', type: String, length: 8 }
- { name: 'user-id', type: String, length: 8 }
- { name: 'return-sign', type: String, length: 1 }
It is quite straightforward for DW to handle a “true flat file.” I’ll skip the details here; please refer to the Mule online documentation. The only special thing worth mentioning is the decimal number field:
- In DW mapping, in order for the preview to show the mapping correctly, you must provide a default value of “0” for all “implicit” number fields. Otherwise, you will get an unhelpful exception: com.mulesoft.flatfile.lexical.WriteException: incompatible type for supplied value object: java.lang.Integer. However, this exception seems to matter only in the Studio preview. Even if you don’t fill in the number fields, it seems to work just fine at run time.
- For regular decimal numbers, the output will always contain a decimal point “.”.
- For implicit decimal numbers, the mapping result will fill in the decimal places (2 decimal places in this case), and there is no decimal point “.” in the output.
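To make the fixed-width layout and the “implicit” decimal idea concrete, here is a minimal Python sketch that slices one row according to the sample schema above. It is not how DW parses files internally; the padding rules (zero-padded numbers, space-padded strings) and the sample row values are my own assumptions for illustration.

```python
from decimal import Decimal

# Field layout copied from the sample schema: (name, type, length, implicit decimals).
FIELDS = [
    ("Row-id", "String", 2, None),
    ("Total", "Decimal", 11, None),
    ("Module", "String", 8, None),
    ("Cost", "Decimal", 8, 2),
    ("Program-id", "String", 8, None),
    ("user-id", "String", 8, None),
    ("return-sign", "String", 1, None),
]


def parse_row(line):
    """Slice one fixed-width row into a dict keyed by field name."""
    record, pos = {}, 0
    for name, ftype, length, implicit in FIELDS:
        raw = line[pos:pos + length]
        pos += length
        if ftype == "Decimal":
            value = Decimal(raw.strip() or "0")
            if implicit is not None:
                # Implicit decimals carry no '.', so shift the point left.
                value = value.scaleb(-implicit)
            record[name] = value
        else:
            record[name] = raw.rstrip()
    return record


def format_implicit(value, length, implicit):
    """Write a non-negative decimal as zero-padded digits with no '.'."""
    scaled = int(Decimal(value).scaleb(implicit))
    return str(scaled).rjust(length, "0")


# Hypothetical sample row, built field by field so the widths are easy to check.
sample = ("01" + "00000123.45" + "BILLING " + "00001999"
          + "PGM001  " + "jsmith  " + "+")
row = parse_row(sample)
print(row["Total"], row["Cost"])
print(format_implicit("19.99", 8, 2))
```

The regular decimal field (Total) keeps its decimal point in the raw text, while the implicit field (Cost) is stored as plain digits and only gets its decimal point back when it is parsed.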
Structured Flat File
This is the second type of flat file. Unlike the “true flat file,” the rows in a structured flat file contain different types of records. This type of file is not really flat after all. For example, it may have a sales order on one line followed by multiple sales items for the order. These may in turn be followed by shipping records.
The Mule online documentation does a great job of providing a structured file example. It also shows how to use reference IDs and so on.
The only thing I want to point out is the “tag” field. A tag field identifies what type of record a row contains. When flat file data comes in to DW, the file processor needs to differentiate one row type from another. The only thing it can rely on is a “tag” field at the beginning of a line.
For example, you may have tag “101” indicate that the line is the order header (how much the order is, the order number, etc.), “202” indicate that the line is a sale item record (merchandise name, quantity, etc.), and “303” identify a shipping record (with address info).
Please note that when DW generates flat file output, it automatically fills in the “tag” value depending on the record type.
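The tag-based dispatch described above can be sketched in a few lines of Python. This is only an illustration of the idea, not DW’s implementation: the tags “101”, “202” and “303” come from the example above, but the field names and widths are hypothetical.

```python
# Hypothetical layouts: a 3-character tag followed by fixed-width fields.
TAG_LENGTH = 3
LAYOUTS = {
    "101": ("order", [("order-number", 6), ("amount", 8)]),
    "202": ("item", [("name", 10), ("quantity", 3)]),
    "303": ("shipping", [("address", 20)]),
}


def parse_line(line):
    """Use the leading tag to pick the layout, then slice the fields."""
    tag = line[:TAG_LENGTH]
    if tag not in LAYOUTS:
        raise ValueError(f"unknown record tag: {tag!r}")
    record_type, fields = LAYOUTS[tag]
    record, pos = {"type": record_type}, TAG_LENGTH
    for name, length in fields:
        record[name] = line[pos:pos + length].strip()
        pos += length
    return record


def group_orders(lines):
    """Attach item and shipping rows to the most recent order row."""
    orders = []
    for line in lines:
        record = parse_line(line)
        if record["type"] == "order":
            record["items"], record["shipping"] = [], []
            orders.append(record)
        elif not orders:
            raise ValueError("detail row appears before any order row")
        elif record["type"] == "item":
            orders[-1]["items"].append(record)
        else:
            orders[-1]["shipping"].append(record)
    return orders


lines = [
    "101" + "A00001" + "00012345",
    "202" + "Widget    " + "002",
    "202" + "Gadget    " + "010",
    "303" + "1 Main Street       ",
]
for order in group_orders(lines):
    print(order["order-number"], len(order["items"]), "items")
```

Without the tag there would be no way to tell an item row from a shipping row, which is why the tag has to sit at a fixed position at the start of every line.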
COBOL Copybook
This is the third type of flat file I have identified. It is quite complex. I will further break down the Copybook processing into three parts.
I am no Copybook expert. However, I hope the few things I have learned can help anyone who is exposed to Copybook for the first time. Mule DataWeave (DW) Copybook support is somewhat limited at this moment. The current Mule documentation is inaccurate as well.
There are three parts of Copybook processing with DW:
Part I – Generating FFD file from the Copybook file
The Mule DW documentation assumes you already have an FFD file. However, it does not tell you how to use DW to load a Copybook and generate the FFD as step one.
First of all, from a developer’s point of view, a Copybook is a data structure definition for a 32K character space. That’s how and why Copybook is related to flat file. I’m sure there is more to it. But for DW Copybook processing that’s all I care about: all we need is the structure definition of this 32K long space.
DW is very finicky about picking up a Copybook file. The Copybook file I initially used did not include a “01” section (whatever that means), so DW could not process it. I ended up adding something like “01 GM220-REC” at the top of the file just to make DW recognize it.
After DW accepts the Copybook file, it will spit out an “FFD” file. You can look at this step as DW translating the Copybook format into the FFD format. I do not understand why the online Mule documentation does not mention this step.
Anyway, the maddening part is that DW will not recognize the “FFD” file that it generated itself! In my case, this was because the generated FFD contained “zoned” number types.
I had to manually tweak the “FFD” file so that DW could recognize the structures. That is where the story starts to get murky. Read on to the next section.
Part II – Tweaking the FFD file
If your generated FFD file contains zoned types, you need to read Part III below. But let me address a simpler FFD tweak first.
The FFD generated by DW contains quite complex structures. Each section of the Copybook is treated as a separate structure. However, if your original Copybook does not contain OCCURS, the Copybook is really just one single long flat record. In that case, you can simply flatten out the nested levels of structures in the generated FFD file: keep all the “values” rows with the name and type definition for each field, and remove everything else. That way the FFD file becomes a “true flat file,” and your life is a whole lot easier when it comes to parsing and mapping the records in DW.
Keep in mind that if your Copybook has more complex structures, you will not be able to flatten the FFD file.
Part III – Zoned numbers
Finally, if your Copybook contains zoned numbers, the situation becomes very complex. DW will create the FFD file with zoned types. But the current version of DW cannot read zoned types in an FFD! You have to manually change “zoned” to “decimal” in the FFD file in order for DW to recognize it in the Studio.
Then, after DW successfully loads the FFD file with the “decimal” type, you need to go back to the FFD file again and change “decimal” back to “zoned”. Yes, you heard me right. It appears to be a bug in DW at this moment. Until the bug is fixed, you have to flip-flop the zoned types in the FFD file.
If you really want to know, here is my limited insight into the bug: DW in the Studio does not support the zoned type at design time. However, at runtime, the zoned type is supported. If you get it, that’s cool. If you don’t, never mind; please just flip-flop between the zoned and decimal types and let’s move on.
We are far from done here. Zoned numbers encode the sign of the number in the last digit of the value (see http://simotime.com/datazd01.htm).
Here are a few example numbers and their encoded values with “EBCDIC” and “ASCII” encoding:
COBOL format | FFD | Original number | EBCDIC | ASCII |
S9(5) | Decimal | 123 | 0012C | 00123 |
S9(5) | Decimal | -123 | 0012L | 0012s |
S9(5) | Decimal | 12345 | 1234E | 12345 |
S9(5) | Decimal | -12345 | 1234N | 1234u |
S9(5)V9(2) | Zoned, implicit 2 | 123 | 001230{ | 0012300 |
S9(5)V9(2) | Zoned, implicit 2 | -123 | 001230} | 001230p |
S9(5)V9(2) | Zoned, implicit 2 | 12345 | 123450{ | 1234500 |
S9(5)V9(2) | Zoned, implicit 2 | -12345 | 123450} | 123450p |
S9(5)V9(2) | Zoned, implicit 2 | 14.18 | 000141H | 0001418 |
S9(5)V9(2) | Zoned, implicit 2 | -14.18 | 000141Q | 000141x |
S9(5)V9(3) | Zoned, implicit 3 | 14.18 | 0001418{ | 00014180 |
S9(5)V9(3) | Zoned, implicit 3 | -14.18 | 0001418} | 0001418p |
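The sign encoding in the table can be decoded with a small lookup on the last character. The Python sketch below is my own illustration of that rule, not DW’s code and not the custom solution mentioned later. It assumes the field has already been converted from bytes to text, and the function and table names are hypothetical.

```python
from decimal import Decimal

# EBCDIC-style sign characters: '{' and A-I mean positive 0-9,
# '}' and J-R mean negative 0-9. A plain digit is treated as positive.
EBCDIC_SIGNS = {c: (d, 1) for d, c in enumerate("{ABCDEFGHI")}
EBCDIC_SIGNS.update({c: (d, -1) for d, c in enumerate("}JKLMNOPQR")})
EBCDIC_SIGNS.update({c: (d, 1) for d, c in enumerate("0123456789")})

# ASCII-style sign characters: a plain digit is positive,
# 'p' through 'y' mean negative 0-9.
ASCII_SIGNS = {c: (d, 1) for d, c in enumerate("0123456789")}
ASCII_SIGNS.update({c: (d, -1) for d, c in enumerate("pqrstuvwxy")})


def decode_zoned(text, implicit=0, signs=EBCDIC_SIGNS):
    """Decode a zoned number whose last character carries digit and sign."""
    digit, sign = signs[text[-1]]
    magnitude = int(text[:-1] + str(digit))
    # Shift the decimal point left by the number of implicit decimals.
    return Decimal(sign * magnitude).scaleb(-implicit)


print(decode_zoned("0012L"))                       # S9(5), EBCDIC
print(decode_zoned("000141Q", implicit=2))         # S9(5)V9(2), EBCDIC
print(decode_zoned("000141x", 2, ASCII_SIGNS))     # S9(5)V9(2), ASCII
```

The same function handles both encodings because only the lookup table differs; the digits before the last character are always plain digits.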
Long story short, as of now, Mule DW only supports EBCDIC encoding. If your client uses ASCII encoding (also called MicroFocus), then you are out of luck.
That’s what happened to me. So after all the trouble figuring out how and what, I am unable to use DW Copybook. Mule DW may support ASCII encoding in the future. For now, I ended up using a customized Java solution.
Hi, this is a nice article and explains Copybook, flat files, and Mule DW well. I am working on a project where we place a message on IBM MQ, but MQ expects us to send the message in EBCDIC. We tried using encoding="cp500" and did not have any success with it. We are using Mule 4 and did not find any article on how to do this in Mule 4. Let us know if you have any suggestions or workarounds.