Ah – my favorite topic in BI tools – data modeling and the modeling tool. An effective data modeling tool is the primary design tool for the database tier of the BI stack and may be extended to capture the design of nearly any system component. It’s an essential tool in the belt of all but the smallest of BI groups.
There are lots of modeling and diagramming tools out there. Lots. So it’s important to have a solid list of expectations of your modeling tool. These expectations must be directly linked to the realities of what your team can maintain and enforce long term. Defining a set of modeling standards is a good first step in this process as it will define things like:
- how much and what kind of metadata will be captured,
- which roles on the team will be expected to use the modeling tool (only the architect, or the analysts and/or developers as well?), and
- how the modeling tool will be incorporated into the overall SDLC and release processes.
Here’s my checklist of features:
Physical Data Modeling – yep, gotta have it. Includes the ability to generate readable DDL for your DBMS, preferably including incremental DDL (ALTER statements). Also should include reverse engineering of a schema and model to DB comparison. If this is all you need, a database design tool will get you there quickly and cheaply.
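The core of that incremental DDL feature is a diff between the designed model and the deployed schema. A minimal sketch of the idea, with made-up table and column names (real tools handle types, drops, renames, and dependency ordering as well):

```python
# Sketch: deriving incremental DDL (ALTER statements) by diffing a
# designed model against a deployed schema. Names are illustrative.

def incremental_ddl(model_cols, db_cols, table):
    """Emit ALTERs for columns in the model but missing from the database."""
    stmts = []
    for col, coltype in model_cols.items():
        if col not in db_cols:
            stmts.append(f"ALTER TABLE {table} ADD {col} {coltype};")
    return stmts

model = {"customer_id": "INT", "load_dt": "DATETIME"}
deployed = {"customer_id": "INT"}
print(incremental_ddl(model, deployed, "dim_customer"))
# -> ['ALTER TABLE dim_customer ADD load_dt DATETIME;']
```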
Logical and Conceptual Modeling – moving toward the business requirements side of design: the ability to model information more abstractly, closer to the way the business thinks about it. In some tools this means a completely separate model; in others, it’s just a different view. The ability to handle logical and conceptual modeling is really the starting line for a full-featured data modeling tool. Note that you need a person slated to do this more business-oriented modeling who will be comfortable using a modeling tool. If you’re thinking “the DBA” right now, I’ll be tossing a red flag in the air.
Multidimensional or OLAP Modeling – If your environment does or will include OLAP components such as cubes, the capability to model them in the tool may be important.
Business Process Modeling – now we’re a little out in left field, but the biggest and the best allow you to model processes, then attach information to the processes (via a conceptual model) and then translate all that into a physical model. Unless you’re already doing BPM, this is probably a “nice to have, if it’s free.” However in more process driven organizations, BPM can bring value.
Information Flow Mapping – or “source to target mapping” if you prefer. When you’ve gone through the effort of modeling everything, wouldn’t it be nice not to have to copy everything to Excel and try to keep it all in sync? More capable modeling tools make flow mapping a drag, drop, and comment experience.
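The point of keeping mappings in the tool rather than a spreadsheet is that each one becomes structured data the tool can reuse. A rough sketch of what a source-to-target mapping record holds (field names here are illustrative, not any tool's actual schema):

```python
# Sketch: a source-to-target mapping as structured data instead of a
# hand-maintained Excel sheet. Field names are illustrative.
from dataclasses import dataclass

@dataclass
class Mapping:
    source: str           # source column, e.g. "crm.customers.cust_nm"
    target: str           # target column, e.g. "dw.dim_customer.name"
    rule: str = "direct"  # transformation rule or comment

mappings = [
    Mapping("crm.customers.cust_nm", "dw.dim_customer.name", "TRIM + INITCAP"),
    Mapping("crm.customers.cust_id", "dw.dim_customer.customer_id"),
]
```

Kept alongside the models, the same records can feed ETL specifications and lineage reports instead of being retyped.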
Lineage and Impact Analysis – Closely related to information flow mapping, this is the ability to model both information flow (source to target) and design lineage (conceptual to logical to physical/relational to OLAP/dimensional). Unless covered in the metadata platform, the modeling tool may be the only place where the impact of design changes can be evaluated to determine the scope and therefore the cost of a change.
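Under the hood, impact analysis is a walk over the lineage graph: start at the object you want to change and collect everything downstream. A minimal sketch, assuming hypothetical object names:

```python
# Sketch: impact analysis as a traversal of a lineage graph. Each edge
# points from a source object to the objects derived from it.
from collections import deque

lineage = {
    "stg.orders": ["dw.fact_sales"],
    "dw.fact_sales": ["olap.sales_cube", "rpt.monthly_revenue"],
}

def impacted(obj, graph):
    """Return everything downstream of obj - the scope of a change."""
    seen, queue = set(), deque([obj])
    while queue:
        for child in graph.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)

print(impacted("stg.orders", lineage))
# -> ['dw.fact_sales', 'olap.sales_cube', 'rpt.monthly_revenue']
```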
Metadata Integration – at the very least, the modeling tool should be able to export and/or publish the design metadata it captures, either directly or via tight integration with your metadata platform. A data dictionary is the minimum, and in many cases we’ve extended the output to include ETL design specifications (structure + mappings) and test specifications (structure + rules). The ability to compare design metadata from the model(s) to technical metadata from the DBMS and other tools can be enormously valuable, especially when paired with automated test and automated build tools, and can all but eliminate time wasted on inconsistent environment builds (e.g. missing tables, missing columns, missing static data, etc.).
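That model-to-DBMS comparison is conceptually simple set arithmetic. A hedged sketch, where the dictionaries stand in for a real model export and a catalog query:

```python
# Sketch: comparing design metadata (from the model) to technical metadata
# (from the DBMS catalog) to flag an inconsistent environment build.
# Table and column names are illustrative.

design = {"dim_customer": {"customer_id", "name", "load_dt"},
          "fact_sales":   {"sale_id", "customer_id", "amount"}}
database = {"dim_customer": {"customer_id", "name"}}

def build_gaps(design, database):
    """List tables/columns the design expects but the database lacks."""
    gaps = []
    for table, cols in design.items():
        if table not in database:
            gaps.append(f"missing table: {table}")
        else:
            for col in sorted(cols - database[table]):
                gaps.append(f"missing column: {table}.{col}")
    return gaps

print(build_gaps(design, database))
# -> ['missing column: dim_customer.load_dt', 'missing table: fact_sales']
```

Wire a check like this into the build pipeline and a missing column fails the build instead of failing a developer three days later.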
Scripting – Managing large models and keeping standards enforcement consistent can be daunting. The addition of scripting to a modeling tool allows for automated testing/standards enforcement as well as large scale, scripted modifications. Personally, this is one of my top “must have” features for a modeling tool because of its potential to increase my productivity as an architect and limit the small design flaws (like missing metadata columns) that eat up development cycles.
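The kind of standards enforcement a scripting engine enables can be as simple as verifying that every table carries the standard metadata columns. A sketch, with the required column names as an assumed house convention:

```python
# Sketch: a scripted standards check over model metadata - every table
# must carry the standard metadata columns. The required names here are
# an assumed convention, not a fixed rule.

REQUIRED = {"load_dt", "source_system"}

tables = {
    "dim_customer": {"customer_id", "name", "load_dt", "source_system"},
    "fact_sales": {"sale_id", "amount", "load_dt"},  # missing source_system
}

def standards_violations(tables, required=REQUIRED):
    """Map each non-compliant table to its missing metadata columns."""
    return {t: sorted(required - cols)
            for t, cols in tables.items() if required - cols}

print(standards_violations(tables))
# -> {'fact_sales': ['source_system']}
```

Run on every check-in of the model, a script like this catches the small design flaws before they eat up development cycles.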
Source Control – As the model (and the team) grows, keeping track of the design version of the model vs. the test version vs. the production version starts to get tricky, just as it does for all the other artifacts in the environment. The modeling tool needs to integrate with the source control tool in a meaningful way, including the ability to handle branching (and merging).
This tool class is one where the maturity of your team plays a big role in determining what tool is a good fit. There is little benefit to purchasing a $50K tool and using $2K of its features. The flip side is that converting tools mid-stream is a painful process that inevitably leads to missing metadata for earlier parts of the system.
Again, I’m not going to try to put together an extensive list of modeling tools as there are many. However, I will comment on the ones I’ve used directly.
I started out on ERWin during the largest project (125+ people) I’ve worked on. We fought with it constantly, but it got the job done. Although it’s come a long way since then (2005), ERWin is primarily a data modeling tool that handles physical modeling (with a logical “view”).
On the free/open side, FabForce’s DBDesigner is available and does an acceptable job of database design. I’ve used it during small projects and for demos, but I’d be nervous about a large scale deployment. Also, it prefers MySQL although support for other DBMS platforms is included.
Hands down, my top choice for modeling is PowerDesigner by Sybase/SAP. It’s enormously powerful and covers every one of the above criteria and more. PowerDesigner is based on an extensible object model that is exposed via a scripting engine and OLE automation to allow integration with any other part of a development environment. And its ability to generate technical output (DDL, metadata, technical reports, etc.) is top shelf. The primary drawbacks are the learning curve that comes with such a powerful tool and the cost. As I mentioned above, unless you make use of its power to justify the expense, this one may be overkill.
A modeling tool is the center of the design process for BI/DW teams and should be considered early in the life of your team. The formula for success here is to determine how you’ll use the tool, pick a tool that meets those requirements, and then launch the never-ending battle to ensure consistent use of the tool as the team faces inevitable challenges, deadlines, and growth.