We’re constantly looking for ways to streamline the BI development process. Data mapping (source to target) can be a complex and cumbersome process due to:
- Ongoing model changes – the target moves!
- Poor (or “loose”) communication between those mapping and the development team.
- Lack of consistency in collecting and documenting mappings (Excel, I’m looking at you!). “What does a green background mean again?”
Bringing the mapping process into the data modeling tool alleviates all these (albeit with a few gotchas) by providing a controlled environment that includes the critical dependencies (the models!).
PowerDesigner includes direct support for data mapping in the base product (Tools->Mapping Editor). For physical models (where we do our mapping, although you could do it at the conceptual or logical level), mappings are defined at the table and column levels. Since we generally try to create one ETL package per target table, this creates a natural relationship between the ETL design and implementation.
To create a mapping in PowerDesigner you need a source model (generally we reverse engineer the source database into a dedicated model) to add to the mapping editor. Once it’s there, just drag and drop between source tables or columns and the targets. Attributes of the mappings can include anything (through the customization magic of extended model definitions) and out-of-the-box have a place for selection criteria (source filter conditions), column mapping expressions, and comments everywhere. From these limited items, PD will create source SELECT statements that could be used directly to source the data (although anything complex will quickly go beyond this basic capability).
The Future of Big Data
With some guidance, you can craft a data platform that is right for your organization’s needs and gets the most return from your data capital.
In addition, we add validation rules or other business rules, including the action on failure. We add historical tracking options. And we add information about job scheduling if needed.
The goal here is to communicate consistently and clearly to the ETL development team how data it to move between systems. We publish this information using PD’s reporting facility, including images of the models, metadata about the tables, and the mapping themselves. Viola: ETL specifications directly from the modeling tool. We then supplement this information in the wiki with things like flow diagrams where needed.
The other HUGE benefit of mapping this way is that we now have discrete design lineage information that can be published to a metadata wiki. Users can see where any data element ultimately came from and what things influence its value. Additionally, if your ETL tool provides implementation (technical) lineage, we can write automated tests to compare design lineage to implementation lineage to ensure the ETL packages are correct.
Speaking of tests, it becomes trivial to look for (“model check”) things like incompatible data types, missing transformation rules, missing validation rules, etc.
Finally, collocating model and mapping changes means that determining the impact of a change is much easier, and model change logs can include both structural and procedural changes.
There are a few gotchas however:
- PowerDesigner can be difficult to navigate for new users, and you’re expanding the number of people who need to directly interact with the model.
- Keeping models in sync with multiple editors can be a challenge. Fortunately the “model merge” facility works quite well and can even be incorporated into the source control process via COM scripting.
- PD is expensive, and you’re going to need more licenses (assuming you’ll have more people using it).
Altogether, I think that the benefits far outweigh the costs, especially as we push more towards transparency in metadata and excellent communication on our BI projects.
How do you report on mapping rules, such as instructions to parse the first name from before the first comma in a field, etc. I tried mapping notes but it’s not an option the report editor. Any help greatly appreciated.
We’re using the comments fields at the table mapping level for general table-level comments (things that affect grain, general flow description) and the comment field at the column mapping level for individual transform rules. So, in the Mapping Editor, when a target table is selected, navigate to the “Base Column Mapping” tab, make sure Comment is visible, and add your mapping rule there.
We’ve also extended the model with Extended Sub-Objects (using an XEM) for more detail on table and column level mappings. By default, there is one table mapping per target table. We’ve added an extended composition with one object per source table. This allows us to add things like join type and join criteria on a source table by table basis. Probably a future blog post once crunch time at my current client is done!
This is interesting, not in the least because I am in a project implementing Power Designer for a DWH which requires mappings to be recorded in the LDM’s (PDM’s won’t be used because we are working in a SAP BW environment).
In our case, by the way, a mapping can also have only one target table, but a target tabel can have multiple mappings pointing to it (where a target attribute is populated by integrating input from multiple source systems).
And we also use extended sub objects to store information about aggregation rules and conditions.
I think Power Designer offers great possibilities to model and document data models and mappings in an integrated and controlled manner.
Of course, in a DWH, data is moved over multiple layers from source system to datamart(s) and reports.
My challenge right now is how to report on data lineage, that is: how do I create a report that shows a KPI from a report and its links (backwards through all DWH layers) to one or more source attributes – and all this in one report.
First, each step needs to be mapped in the model. We map only the logical hops, that is, we don’t map the staging table, ETL work tables, etc., but rather just the source to the ODS to the EDW to the cube to the report.
There is some capability to do multi-generational lineage within PowerDesigner (have all the models open and run a data lineage report). However, more practically, we extract the mappings to a metadata database and report from there. A “closure” table (as described by Kimball) is most effective for lineage or hierarchy reporting in my opinion. A nice benefit with this approach is the ability (if your ETL tool supports it) to compare design lineage (from PD) to technical lineage (from your ETL tool) using your automated test platform.
Nice! We also use Powerdesigner in our DWH projects. In our case we do lots more in powerdesigner. After that we generate our complete ETL code for stage, dwh and datamarts for different ETL tools. Such as Oracle OWB, Informatica, IBM datatstage en MS SSIS.
So we save a lot of time and have neat code!
So, automatic code generation from the modeling tool. I was aware that some capability existed within the Sybase stack. Is the code generation via a custom connector or a commercial tool?
We developed our own generator based on the XML output of Powerdesigner. From these XML files we generate the different ETL mappings in the different ETL tools. So the customer has a rich, well documented and methodic design in Powerdesigner. And a high quality ETL code (without any errors). Also it fits super in a agile and changing environment, because the only thing you adjust is the powerdesigner models. This generator tool is free of charge for our customers. There is one request, we do the implementation and then the customer gets the benefits. This is our USP against outsourcing to India.
I am testing this feature and it looks very useful. However reporting seems to be challenging trying to get output that shows both source and target columns. Can you provide a report that you’ve written that accomplishes this?
Gary,
Just use List Report from Report menu, select e.g. Column mapping in case of PDM models, and you can easily generate html, xml, csv=xls reports.