We’re constantly looking for ways to streamline the BI development process. Data mapping (source to target) can be a complex and cumbersome process due to:
- Ongoing model changes – the target moves!
- Poor (or “loose”) communication between those doing the mapping and the development team.
- Lack of consistency in collecting and documenting mappings (Excel, I’m looking at you!). “What does a green background mean again?”
Bringing the mapping process into the data modeling tool alleviates all these (albeit with a few gotchas) by providing a controlled environment that includes the critical dependencies (the models!).
PowerDesigner includes direct support for data mapping in the base product (Tools->Mapping Editor). For physical models (where we do our mapping, although you could do it at the conceptual or logical level), mappings are defined at the table and column levels. Since we generally try to create one ETL package per target table, this creates a natural relationship between the ETL design and implementation.
To create a mapping in PowerDesigner you need a source model (generally we reverse engineer the source database into a dedicated model) to add to the mapping editor. Once it’s there, just drag and drop between source tables or columns and the targets. Attributes of the mappings can include anything (through the customization magic of extended model definitions) and out-of-the-box have a place for selection criteria (source filter conditions), column mapping expressions, and comments everywhere. From these limited items, PD will create source SELECT statements that could be used directly to source the data (although anything complex will quickly go beyond this basic capability).
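To make the generation step concrete, here is a minimal sketch of what assembling a source SELECT from that mapping metadata amounts to. The function and field names are hypothetical illustrations of the idea, not PD's internal model:

```python
# Hypothetical sketch: building a source SELECT from mapping metadata,
# similar in spirit to what PowerDesigner produces from the Mapping Editor.

def build_select(source_table, column_mappings, selection_criteria=None):
    """column_mappings: list of (source_expression, target_column) pairs.
    selection_criteria: optional source filter condition (WHERE clause)."""
    select_list = ",\n    ".join(
        f"{expr} AS {col}" for expr, col in column_mappings
    )
    sql = f"SELECT\n    {select_list}\nFROM {source_table}"
    if selection_criteria:
        sql += f"\nWHERE {selection_criteria}"
    return sql

sql = build_select(
    "src.cust",
    [("cust.cust_id", "customer_key"),
     ("UPPER(cust.name)", "customer_name")],
    selection_criteria="cust.active = 1",
)
print(sql)
```

As the post notes, anything beyond simple expressions and filters (joins, lookups, aggregations) quickly outgrows this basic capability.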
In addition, we add validation rules or other business rules, including the action on failure. We add historical tracking options. And we add information about job scheduling if needed.
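For a sense of what one mapping entry looks like once these extended attributes are captured, here is an illustrative record. The field names are our own conventions (defined via an extended model definition), not anything PD ships with:

```python
# Illustrative only: one enriched mapping entry with the extended
# attributes described above. Field names are assumptions, not PD schema.
mapping = {
    "target": "dw.dim_customer.customer_name",
    "source": "src.cust.name",
    "expression": "UPPER(cust.name)",
    "validation_rule": "name IS NOT NULL AND LEN(name) > 0",
    "on_failure": "reject to error table",  # action on validation failure
    "history": "SCD Type 2",                # historical tracking option
    "schedule": "daily 02:00",              # job scheduling info, if needed
}
```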
The goal here is to communicate consistently and clearly to the ETL development team how data is to move between systems. We publish this information using PD’s reporting facility, including images of the models, metadata about the tables, and the mappings themselves. Voilà: ETL specifications directly from the modeling tool. We then supplement this information in the wiki with things like flow diagrams where needed.
The other HUGE benefit of mapping this way is that we now have discrete design lineage information that can be published to a metadata wiki. Users can see where any data element ultimately came from and what things influence its value. Additionally, if your ETL tool provides implementation (technical) lineage, we can write automated tests to compare design lineage to implementation lineage to ensure the ETL packages are correct.
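Assuming both sides can be exported as simple (source column, target column) pairs (the export formats here are an assumption, not something PD or any particular ETL tool provides out of the box), the automated comparison is essentially a set difference:

```python
# Hypothetical sketch: compare design lineage (from the model) against
# implementation lineage (from the ETL tool), both reduced to
# (source_column, target_column) pairs.

def lineage_diff(design, implemented):
    design, implemented = set(design), set(implemented)
    return {
        "missing_in_etl": design - implemented,  # designed but never built
        "undocumented": implemented - design,    # built but not in the model
    }

design = {
    ("src.cust.cust_id", "dw.dim_customer.customer_key"),
    ("src.cust.name", "dw.dim_customer.customer_name"),
}
implemented = {
    ("src.cust.cust_id", "dw.dim_customer.customer_key"),
}

diff = lineage_diff(design, implemented)
# One designed mapping was never implemented; nothing is undocumented.
```

An empty diff in both directions is the pass condition for the automated test.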
Speaking of tests, it becomes trivial to look for (“model check”) things like incompatible data types, missing transformation rules, missing validation rules, etc.
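Checks of this kind reduce to simple rules over the exported mapping metadata. A minimal sketch, again using illustrative field names rather than PD's actual model-check framework:

```python
# Hypothetical sketch of "model check"-style validations over exported
# mapping metadata. Field names are illustrative assumptions.

def check_mappings(mappings):
    issues = []
    for m in mappings:
        target = m["target"]
        if m["source_type"] != m["target_type"] and not m.get("expression"):
            issues.append(f"{target}: incompatible types, no transformation")
        if m.get("nullable_source") and not m.get("validation_rule"):
            issues.append(f"{target}: nullable source, no validation rule")
    return issues

issues = check_mappings([
    {"target": "dim_customer.customer_key",
     "source_type": "varchar", "target_type": "int"},  # no expression!
    {"target": "dim_customer.customer_name",
     "source_type": "varchar", "target_type": "varchar",
     "nullable_source": True, "validation_rule": "name IS NOT NULL"},
])
```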
Finally, collocating model and mapping changes means that determining the impact of a change is much easier, and model change logs can include both structural and procedural changes.
There are a few gotchas, however:
- PowerDesigner can be difficult to navigate for new users, and you’re expanding the number of people who need to directly interact with the model.
- Keeping models in sync with multiple editors can be a challenge. Fortunately, the “model merge” facility works quite well and can even be incorporated into the source control process via COM scripting.
- PD is expensive, and you’re going to need more licenses (assuming you’ll have more people using it).
Altogether, I think that the benefits far outweigh the costs, especially as we push more towards transparency in metadata and excellent communication on our BI projects.