Data model patterns identify common model structures and define how they should look and interact with other parts of the model. PowerDesigner can assist in this process by automating the setup of objects (tables, columns, mappings) based on the pattern to be applied and ensuring consistency through the use of custom model checks.
Anyone who’s used PD for any length of time has wondered about the ubiquitous “stereotype” attribute of almost every object in the model. PD’s stereotype facility allows the modeler to specify a general classification or pattern for the object.
Stereotypes are defined in extended model definition (XEM) files and can have nearly anything attached to them – additional attributes, collections, methods, templates, model checks, etc. Stereotypes are hierarchical in nature, so generalization is possible. For instance, we use column stereotypes to designate metadata columns with the "Metadata" stereotype. Other stereotypes then extend "Metadata".
Note that each object can have only one stereotype applied to it, so choose wisely. If what you're after isn't a design pattern (a decision made by the modeler) but rather is simply an observation based on a business or technical rule, use a "Criteria" instead. For instance, we don't use stereotypes to indicate materialized views (or MQTs in DB2) – we can add a "criteria" with most of the same functionality as a stereotype and leave the stereotype free for a pattern designation.
Once you've identified a common pattern, rules can be implemented to speed the modeling process. Here are a few things you can do:
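As one example, here is a minimal sketch of the kind of helper you can attach to a pattern: a method that builds out a table's standard metadata columns. The column names and data types below are illustrative placeholders, not our actual standard.

' Sketch: add standard metadata columns to a table (names and types are placeholders)
Sub AddMetadataColumns(tbl)
   Dim col
   Set col = tbl.Columns.CreateNew()
   col.Name = "Load Date"
   col.Code = "LOAD_DT"
   col.DataType = "datetime"
   col.Stereotype = "Metadata"

   Set col = tbl.Columns.CreateNew()
   col.Name = "Source System"
   col.Code = "SRC_SYS_CD"
   col.DataType = "varchar(10)"
   col.Stereotype = "Metadata"
End Sub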
Applying high-level model functionality is one of the great values of using a capable modeling tool and can streamline the modeling process greatly. So, don't settle for using your modeling tool as a glorified diagramming tool. Put it to work!
In its most basic form, deploying a data model means simply applying the model directly to a database via ODBC or a DDL script. But that neglects much of the value in your model beyond the physical structure.
Here are the steps in our model deployment process. We’ve scripted these in the build tool, but they could also be scripted directly in an extended model definition (XEM).
Here are a few utility scripts I've developed to aid your XEM scripting:
Many functions, especially members of ExtensibleObject, require that the "scope" (the code of the XEM) be included when referencing extended attributes, methods, collections, etc. I've added these lines to the global script section as a helper:
' Extension navigation. All references to extended attributes and methods
' must use the EXT() function when referencing the name
const EXTNAME = "PerficientStds"
Function EXT(iden)
   EXT = EXTNAME + "." + iden
End Function
Whenever I reference an extended attribute, method, or collection, I wrap the name in EXT().
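For example, reading and writing an extended attribute looks like this (assuming an extended attribute named extCustomAttribute, and using the GetExtendedAttribute/SetExtendedAttribute accessors on ExtensibleObject):

' obj is any extensible object; extCustomAttribute is a placeholder attribute name
Dim val
val = obj.GetExtendedAttribute(EXT("extCustomAttribute"))
obj.SetExtendedAttribute EXT("extCustomAttribute"), "new value"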
PowerDesigner has the concept of a “shortcut”, a reference to an object. As you navigate the model, you’ll often find shortcuts where you expected objects and even shortcuts to shortcuts. Thus:
' Return the base object, dereferencing any shortcuts
Function TgtObj(obj)
   If Not IsObject(obj) Then Set TgtObj = Nothing : Exit Function
   Set TgtObj = obj
   While TgtObj.IsShortcut()
      Set TgtObj = TgtObj.TargetObject
   Wend
End Function
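A typical use is wrapping whatever comes out of a collection before touching its properties. A quick example, using the ActiveModel and Output scripting globals, that just lists table codes:

' List table codes, dereferencing any shortcuts along the way
Dim item, tbl
For Each item In ActiveModel.Tables
   Set tbl = TgtObj(item)
   If Not tbl Is Nothing Then
      Output tbl.Code
   End If
Next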
Once you've invested the time and effort into creating really solid, well-documented data models, it would be really great to let some folks know about it, right?! If you're using PowerDesigner, you've got a few options:
We’re generally using option 3 in an attempt to integrate design metadata (from the model) with implementation metadata. This allows us to check if our implementation matches the design and present a more complete view of the system.
To make #3 work, we're using extended generation of files. This is a nice feature of PowerDesigner that allows you to add file definitions for any type of object to your XEM and then generate output based on those definitions. In our case, we export CSV files for tables, columns, entities, attributes, table and column mappings, business rules, etc.
Here’s the table template:
.foreach_item(Tables)
.ifnot (%IsShortcut%)
%Model.Code%,[%Owner%?%Owner.Code%],%Code%,%Name%,
.if %Comment%
"
.replace("\"","")
%Comment%
.endreplace
"
.endif
\n
.endif
.next
.foreach_item(Packages)
%extGenMDRTable%
.next
The template runs in the context of a package and recursively traverses child packages to export all table data. That’s all there is to it! Now, under the Tools->Extended Generation menu, I have the option to export all sorts of metadata in a format that is easily loaded into our metadata repository.
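Because the output is plain CSV, consuming it downstream takes only a few lines. Here's a hedged Groovy sketch that assumes the table extract above was written to a file called tables.csv; the file name and the print format are placeholders, not our actual load job.

// Each line is: model, owner, code, name, comment (the comment may contain commas, so limit the split)
new File('tables.csv').eachLine { line ->
    def fields = line.split(',', 5)
    if (fields.size() >= 4) {
        println "${fields[0]}.${fields[1]}.${fields[2]} (${fields[3]})"
    }
}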
We’re constantly looking for ways to streamline the BI development process. Data mapping (source to target) can be a complex and cumbersome process due to:
Bringing the mapping process into the data modeling tool alleviates all these (albeit with a few gotchas) by providing a controlled environment that includes the critical dependencies (the models!).
PowerDesigner includes direct support for data mapping in the base product (Tools->Mapping Editor). For physical models (where we do our mapping, although you could do it at the conceptual or logical level), mappings are defined at the table and column levels. Since we generally try to create one ETL package per target table, this creates a natural relationship between the ETL design and implementation.
To create a mapping in PowerDesigner you need a source model (generally we reverse engineer the source database into a dedicated model) to add to the mapping editor. Once it’s there, just drag and drop between source tables or columns and the targets. Attributes of the mappings can include anything (through the customization magic of extended model definitions) and out-of-the-box have a place for selection criteria (source filter conditions), column mapping expressions, and comments everywhere. From these limited items, PD will create source SELECT statements that could be used directly to source the data (although anything complex will quickly go beyond this basic capability).
In addition, we add validation rules or other business rules, including the action on failure. We add historical tracking options. And we add information about job scheduling if needed.
The goal here is to communicate consistently and clearly to the ETL development team how data is to move between systems. We publish this information using PD's reporting facility, including images of the models, metadata about the tables, and the mappings themselves. Voilà: ETL specifications directly from the modeling tool. We then supplement this information in the wiki with things like flow diagrams where needed.
The other HUGE benefit of mapping this way is that we now have discrete design lineage information that can be published to a metadata wiki. Users can see where any data element ultimately came from and what things influence its value. Additionally, if your ETL tool provides implementation (technical) lineage, automated tests can compare design lineage to implementation lineage to ensure the ETL packages are correct.
Speaking of tests, it becomes trivial to add model checks for things like incompatible data types, missing transformation rules, missing validation rules, etc.
Finally, collocating model and mapping changes means that determining the impact of a change is much easier, and model change logs can include both structural and procedural changes.
There are a few gotchas, however:
Altogether, I think that the benefits far outweigh the costs, especially as we push more towards transparency in metadata and excellent communication on our BI projects.
In our standards XEM file (extended model definition), I've added column ordering functionality by categorizing each column and then ordering the categories.
Column groups are defined in a BaseColumn template. We have:
PRIM,0,%Primary%
ROWMETA,3,%extIsRowDateMeta%
META,4,%extIsMetaColumn%
AKNONMETA,1,%AllKeys%
BASE,2,TRUE
Basically, the first value is the name of the group, the second is the rank of the group in the table, and the remainder is a template expression that can be evaluated in the column context as a boolean expression. Each line is evaluated until a true is found. So, if a column is %Primary%, the group is PRIM and will be placed first (rank 0) in the table.
Note that I've added extended attributes as needed for non-trivial cases, such as whether or not a column is a metadata column. In our case this is based on the stereotype of the column's domain.
There’s a catch-all at the end for all other columns that don’t meet any other criteria.
The function that evaluates all this is as follows:
Function %Get%(col)
   Dim ln
   For Each ln In Split(col.Model.FindMetaExtensionByName(col, cls_TemplateTargetItem, EXT("extColumnGroups")).Value, vbCrLf)
      If Len(Trim(ln)) > 0 Then
         Dim parts : parts = Split(ln,",")
         If UCase(col.EvaluateTextFor(parts(2),EXTNAME)) = "TRUE" Then
            %Get% = parts(0)
            Exit Function
         End If
      End If
   Next
   %Get% = "None"
End Function
Basically, split the template by line, then split each line by comma, and evaluate each line's expression in turn. extColumnGroups is the name of the template from above, and EXTNAME is the name of our XEM. All of this goes into the Get section of an extended attribute (here called extColGroupName).
Finding the rank is the same, just change the line to
%Get% = parts(1)
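For completeness, the full Get script for the rank (attached to a second extended attribute, say extColGroupRank; that name and the default rank of 99 are just illustrative choices) looks like this:

Function %Get%(col)
   Dim ln
   For Each ln In Split(col.Model.FindMetaExtensionByName(col, cls_TemplateTargetItem, EXT("extColumnGroups")).Value, vbCrLf)
      If Len(Trim(ln)) > 0 Then
         Dim parts : parts = Split(ln,",")
         If UCase(col.EvaluateTextFor(parts(2),EXTNAME)) = "TRUE" Then
            %Get% = parts(1)
            Exit Function
         End If
      End If
   Next
   ' Default rank for unmatched columns; pick a value that sorts last
   %Get% = 99
End Function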
Setting up a new data governance program means bringing deep change to the operating culture of an organization. As a result, getting traction for an enterprise-scale effort may not always be the pragmatic route. Sometimes it's better to be small and excellent and let the enterprise adopt governance organically.
The concept of limited or localized data governance has a couple of key tenets developed to penetrate a reluctant or even resistant organization:
This process is tightly coupled with the development team so that it can support that team directly, and it trades many of the lower-value or slower-to-develop benefits of larger governance groups for quick turnaround and non-blocking progress.
I love PowerDesigner. It's the Cadillac (Mercedes, BMW, etc.) of the modeling tools. I love that I can do conceptual, logical, and physical data modeling. I can do business process modeling with direct linkage to my data models. And I can model mappings between everything for both data flow relationships (ETL mappings) and design flow relationships (entities become tables become dimensions, etc.).
But my favorite feature is the ability to set standards and enforce them directly in the tool using extended model definitions (XEMs).
PowerDesigner is built upon a COM-based object model. Everything is an object, derived from the aptly named "BaseObject" class, and can be read and manipulated via any COM-aware tool or language. This allows us to script things in VBScript right in the tool, or use VB.NET or even Groovy anywhere in the environment.
Here are just a few of the things that we're doing in our standards XEM:
All of these rules are checked by custom model checks, and automatic fixes are available for many of them.
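For reference, a custom check in an XEM is just a small script attached to a metaclass that returns True when the object passes. Here's a minimal sketch; the rule itself (requiring a table comment) is only an example, not one of the checks above.

' Check Script for a custom check defined on the Table metaclass (example rule only)
Function %Check%(obj)
   %Check% = (Len(Trim(obj.Comment)) > 0)
End Function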
Think of the time saved by never deploying a model with one of the above (minor) issues. You never have to choose between living with a little non-standard column ordering or redeploying the entire model!
Stay tuned for some tips and tricks posts about how we made these work!
To build a custom Gradle plugin in Groovy, do this. I know this seems redundant, but it wasn't as clear as it should have been.
1. Create a project directory. Just a plain ol’ directory. Wherever you like.
2. Add the source file subdirectories:
src\main\groovy\… (with the package path you’d like to use. We have src\main\groovy\com\perficient\gradle)
3. Create a plugin class (in a file named <classname>.groovy, we have DatabasePlugin.groovy here).
package com.perficient.gradle

import org.gradle.api.Project
import org.gradle.api.Plugin

class DatabasePlugin implements Plugin<Project> {
    void apply(Project project) {
        // If you have a custom convention class. Omit if not.
        project.convention.plugins.database = new DatabaseConvention(project)

        // Configure your tasks here. Add as many as you need.
        def buildTask = project.task('build') {
            description = 'Configure the environment with the specific environment code/name.'
        }
        // '<<' appends an action (a doLast block) to the task created above.
        buildTask << {
            println "Add actions like this if needed."
        }

        // All the normal task configuration (dependsOn, type) can go in the map passed as the first parameter.
        project.task(dependsOn: 'build', 'test') {
            description = 'Run tests after building the database.'
        }
    }
}
4. Add a properties file in src\main\resources\META-INF\gradle-plugins with the name of your plugin. We have Database.properties and AgileBiEnv.properties in here. In these files add one line (modified as needed):
implementation-class=com.perficient.gradle.DatabasePlugin
5. Add a build.gradle file to the plugin root:
apply plugin: 'groovy'

dependencies {
    compile gradleApi()
    groovy localGroovy()
}
Once you start getting non-trivial, you may need to add external jars to the classpath (add this to build.gradle):
sourceSets {
    main {
        groovy {
            compileClasspath += fileTree(dir: '../../../tools/lib', includes: ['*.jar'])
        }
    }
}
And, if you want the jar to end up somewhere specific:
jar {
    destinationDir = new File('../../lib')
}
6. Finally, add a settings.gradle file to the root to set the name of the jar you’ll create:
rootProject.name = 'AgileBi'
You’re all set! Run “gradle build” from the command line in the plugin root directory and you should have a shiny new plugin jar to use in your build.
OK, I have a jar. Now what?
Elsewhere in your system, you’ll have a place where you want to use your plugin. In the build.gradle there, add this:
buildscript {
    dependencies {
        classpath fileTree(dir: '../system/build/lib', include: '*.jar')
    }
}

apply plugin: 'Database'
When you run "gradle tasks" from this directory, you should now see the tasks you created in your plugin's apply() method.
Quick tip of the day:
Use Gradle plugins to package up functionality for easy reuse. For example, we developed a simple “database” plugin to handle the common tasks associated with building and upgrading a database. Things like:
Now, all we do to enable a database in our environment is add a build.gradle file to the database directory (with ddl/, sp/, and data/ subdirectories) with:
apply plugin: 'Database'
Away we go!
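For illustration, here's a rough sketch of what one of those tasks might look like inside the plugin's apply(Project project) method. The task name, the ddl/ layout, and the "hand it to your database client" step are assumptions, not the actual plugin code.

// Hypothetical task: apply the DDL scripts in ddl/, in file-name order
def applyDdl = project.task('applyDdl') {
    description = 'Apply the DDL scripts in the ddl/ directory, in file-name order.'
}
applyDdl << {
    project.fileTree(dir: 'ddl', include: '*.sql').files.sort { it.name }.each { script ->
        println "Applying ${script.name}"
        // hand script.text to your database client of choice here
    }
}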
While the Gradle manual is extensive, many of its concepts are a little opaque when you're not building an executable or the like from source code. Here's a quick primer on Gradle for BI:
Gradle is “project” based, with each project containing a set of inter-dependent tasks which in turn contain actions:
The whole thing is configured in a build.gradle file. Wherever you put this file becomes the root of your project. In that file you can manually add tasks and/or apply plugin(s). Plugins then in turn add tasks relevant to their purpose.
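A minimal build.gradle, just to make that structure concrete (the task name is arbitrary, and 'base' stands in for whatever plugin you actually use):

// Apply a plugin (it contributes its own tasks) and add one hand-written task with an action
apply plugin: 'base'

task hello {
    description = 'A trivial example task.'
}
// '<<' appends an action (a doLast block) in the Gradle versions we're using
hello << {
    println 'Hello from Gradle'
}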
Things that we’ve found helpful in developing for BI:
We've chosen Gradle as the build system for our iterative BI environment. It's a powerful tool, and there's a bunch of awesomeness in there.
Gradle uses Groovy as its scripting language. Groovy is just plain great. You get the power of the Java platform in a scripting language and can do things like this:
// Requires the Groovy XML-RPC module (XMLRPCServerProxy) on the classpath
import groovy.net.xmlrpc.XMLRPCServerProxy

def client = new XMLRPCServerProxy("http://localhost:8090/rpc/xmlrpc").confluence2
def session = client.login(user, pass)
client.getPages(session, "MyWiki").each { page ->
println "${page.title} was authored by ${page.author}"
}
client.logout(session)
A few lines of code to access the XML-RPC interface of Confluence and print out all the pages and their authors. Easy like it should be. And we can do stuff like this in our builds. All of a sudden, checking the version of a database before applying a change script is no big deal. Checking if a wiki page has been modified before updating it is a cinch.
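As a hedged sketch of that version check (the schema_version table, the connection settings, and the Postgres driver are all placeholders):

import groovy.sql.Sql

// dbUrl, dbUser, dbPassword, and targetVersion would come from your build configuration
def sql = Sql.newInstance(dbUrl, dbUser, dbPassword, 'org.postgresql.Driver')
def row = sql.firstRow('select max(version_number) as current_version from schema_version')
if ((row?.current_version ?: 0) < targetVersion) {
    println "Database is at version ${row?.current_version}; applying upgrade to ${targetVersion}"
    // run the change script here
} else {
    println 'Database is already up to date.'
}
sql.close()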
Our build system does:
We’re excited about the possibilities this opens up in terms of accelerating the pace of development on our teams and allowing for more parallel development than ever. It’s a game-changer down in the trenches!