Data model patterns identify common model structures and define how they should look and interact with other parts of the model. PowerDesigner can assist in this process by automating the setup of objects (tables, columns, mappings) based on the pattern to be applied and ensuring consistency through the use of custom model checks.
Anyone who’s used PD for any length of time has wondered about the ubiquitous “stereotype” attribute of almost every object in the model. PD’s stereotype facility allows the modeler to specify a general classification or pattern for the object.
Stereotypes are defined in extended model definition (XEM) files and can have nearly anything attached to them – additional attributes, collections, methods, templates, model checks, etc. Stereotypes are hierarchical in nature, so generalization is possible. For instance, we use column stereotypes to designate metadata columns with the "Metadata" stereotype. Other stereotypes then extend "Metadata".
Note that each object can have only one stereotype applied to it, so choose wisely. If what you're after isn't a design pattern (a decision made by the modeler) but rather is simply an observation based on a business or technical rule, use a "Criteria" instead. For instance, we don't use stereotypes to indicate materialized views (or MQTs in DB2) – we can add a "criteria" with most of the same functionality as a stereotype and leave the stereotype free for a pattern designation.
Once you've identified a common pattern, rules can be implemented to speed the modeling process. Here are a few things you can do:
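As one example, here is a minimal sketch of the kind of helper you can attach to a pattern: a method that builds out a table's standard metadata columns. The column names and data types below are illustrative placeholders, not our actual standard.

' Sketch: add standard metadata columns to a table (names and types are placeholders)
Sub AddMetadataColumns(tbl)
   Dim col
   Set col = tbl.Columns.CreateNew()
   col.Name = "Load Date"
   col.Code = "LOAD_DT"
   col.DataType = "datetime"
   col.Stereotype = "Metadata"

   Set col = tbl.Columns.CreateNew()
   col.Name = "Source System"
   col.Code = "SRC_SYS_CD"
   col.DataType = "varchar(10)"
   col.Stereotype = "Metadata"
End Sub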
Applying high-level model functionality is one of the great values of using a capable modeling tool and can streamline the modeling process greatly. So, don't settle for using your modeling tool as a glorified diagramming tool. Put it to work!
In its most basic form, deploying a data model means simply applying the model directly to a database via ODBC or a DDL script. But that neglects much of the value in your model beyond the physical structure.
Here are the steps in our model deployment process. We’ve scripted these in the build tool, but they could also be scripted directly in an extended model definition (XEM).
Here are a few utility scripts I've developed to aid your XEM scripting:
Many functions, especially members of ExtensibleObject, require that the "scope" (the code of the XEM) be included when referencing extended attributes, methods, collections, etc. I've added these lines to the global script section as a helper:
' Extension navigation. All references to extended attributes and methods
' must use the EXT() function when referencing the name
const EXTNAME = "PerficientStds"
Function EXT(iden)
   EXT = EXTNAME + "." + iden
End Function
Whenever I reference an extended attribute, method, or collection, I wrap the name in EXT().
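For example, reading and writing an extended attribute looks like this (assuming an extended attribute named extCustomAttribute, and using the GetExtendedAttribute/SetExtendedAttribute accessors on ExtensibleObject):

' obj is any extensible object; extCustomAttribute is a placeholder attribute name
Dim val
val = obj.GetExtendedAttribute(EXT("extCustomAttribute"))
obj.SetExtendedAttribute EXT("extCustomAttribute"), "new value"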
PowerDesigner has the concept of a “shortcut”, a reference to an object. As you navigate the model, you’ll often find shortcuts where you expected objects and even shortcuts to shortcuts. Thus:
' Return the base object, dereferencing any shortcuts
Function TgtObj(obj)
   If Not IsObject(obj) Then Set TgtObj = Nothing : Exit Function
   Set TgtObj = obj
   While TgtObj.IsShortcut()
      Set TgtObj = TgtObj.TargetObject
   Wend
End Function
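A typical use is wrapping whatever comes out of a collection before touching its properties. A quick example, using the ActiveModel and Output scripting globals, that just lists table codes:

' List table codes, dereferencing any shortcuts along the way
Dim item, tbl
For Each item In ActiveModel.Tables
   Set tbl = TgtObj(item)
   If Not tbl Is Nothing Then
      Output tbl.Code
   End If
Next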
Once you've invested the time and effort into creating really solid, well-documented data models, it would be really great to let some folks know about it, right?! If you're using PowerDesigner, you've got a few options:
We’re generally using option 3 in an attempt to integrate design metadata (from the model) with implementation metadata. This allows us to check if our implementation matches the design and present a more complete view of the system.
To make #3 work, we're using extended generation of files. This is a nice feature of PowerDesigner that allows you to add file definitions for any type of object to your XEM and then generate output based on those definitions. In our case, we export CSV files for tables, columns, entities, attributes, table and column mappings, business rules, etc.
Here’s the table template:
.foreach_item(Tables)
.ifnot (%IsShortcut%)
%Model.Code%,[%Owner%?%Owner.Code%],%Code%,%Name%,
.if %Comment%
"
.replace("\"","")
%Comment%
.endreplace
"
.endif
\n
.endif
.next
.foreach_item(Packages)
%extGenMDRTable%
.next
The template runs in the context of a package and recursively traverses child packages to export all table data. That’s all there is to it! Now, under the Tools->Extended Generation menu, I have the option to export all sorts of metadata in a format that is easily loaded into our metadata repository.
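Because the output is plain CSV, consuming it downstream takes only a few lines. Here's a hedged Groovy sketch that assumes the table extract above was written to a file called tables.csv; the file name and the print format are placeholders, not our actual load job.

// Each line is: model, owner, code, name, comment (the comment may contain commas, so limit the split)
new File('tables.csv').eachLine { line ->
    def fields = line.split(',', 5)
    if (fields.size() >= 4) {
        println "${fields[0]}.${fields[1]}.${fields[2]} (${fields[3]})"
    }
}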
We’re constantly looking for ways to streamline the BI development process. Data mapping (source to target) can be a complex and cumbersome process due to:
Bringing the mapping process into the data modeling tool alleviates all these (albeit with a few gotchas) by providing a controlled environment that includes the critical dependencies (the models!).
PowerDesigner includes direct support for data mapping in the base product (Tools->Mapping Editor). For physical models (where we do our mapping, although you could do it at the conceptual or logical level), mappings are defined at the table and column levels. Since we generally try to create one ETL package per target table, this creates a natural relationship between the ETL design and implementation.
To create a mapping in PowerDesigner you need a source model (generally we reverse engineer the source database into a dedicated model) to add to the mapping editor. Once it’s there, just drag and drop between source tables or columns and the targets. Attributes of the mappings can include anything (through the customization magic of extended model definitions) and out-of-the-box have a place for selection criteria (source filter conditions), column mapping expressions, and comments everywhere. From these limited items, PD will create source SELECT statements that could be used directly to source the data (although anything complex will quickly go beyond this basic capability).
In addition, we add validation rules or other business rules, including the action on failure. We add historical tracking options. And we add information about job scheduling if needed.
The goal here is to communicate consistently and clearly to the ETL development team how data is to move between systems. We publish this information using PD's reporting facility, including images of the models, metadata about the tables, and the mappings themselves. Voilà: ETL specifications directly from the modeling tool. We then supplement this information in the wiki with things like flow diagrams where needed.
The other HUGE benefit of mapping this way is that we now have discrete design lineage information that can be published to a metadata wiki. Users can see where any data element ultimately came from and what things influence its value. Additionally, if your ETL tool provides implementation (technical) lineage, automated tests can compare design lineage to implementation lineage to ensure the ETL packages are correct.
Speaking of tests, it becomes trivial to add model checks for things like incompatible data types, missing transformation rules, missing validation rules, etc.
Finally, collocating model and mapping changes means that determining the impact of a change is much easier, and model change logs can include both structural and procedural changes.
There are a few gotchas, however:
Altogether, I think that the benefits far outweigh the costs, especially as we push more towards transparency in metadata and excellent communication on our BI projects.
In our standards XEM file (extended model definition), I've added column ordering functionality by categorizing each column and then ordering the categories.
Column groups are defined in a BaseColumn template. We have:
PRIM,0,%Primary%
ROWMETA,3,%extIsRowDateMeta%
META,4,%extIsMetaColumn%
AKNONMETA,1,%AllKeys%
BASE,2,TRUE
Basically, the first value is the name of the group, the second is the rank of the group in the table, and the remainder is a template expression that can be evaluated in the column context as a boolean expression. Each line is evaluated until a true is found. So, if a column is %Primary%, the group is PRIM and will be placed first (rank 0) in the table.
Note that I've added extended attributes as needed for non-trivial cases, such as whether or not a column is a metadata column. In our case this is based on the stereotype of the column's domain.
There’s a catch-all at the end for all other columns that don’t meet any other criteria.
The function that evaluates all this is as follows:
Function %Get%(col)
   Dim ln
   For Each ln In Split(col.Model.FindMetaExtensionByName(col, cls_TemplateTargetItem, EXT("extColumnGroups")).Value, vbCrLf)
      If Len(Trim(ln)) > 0 Then
         Dim parts : parts = Split(ln,",")
         If UCase(col.EvaluateTextFor(parts(2),EXTNAME)) = "TRUE" Then
            %Get% = parts(0)
            Exit Function
         End If
      End If
   Next
   %Get% = "None"
End Function
Basically, split the template by line, then split each line by comma, and evaluate each line's expression in turn. extColumnGroups is the name of the template from above, and EXTNAME is the name of our XEM. All of this goes into the Get section of an extended attribute (here called extColGroupName).
Finding the rank is the same, just change the line to
%Get% = parts(1)
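For completeness, the full Get script for the rank (attached to a second extended attribute, say extColGroupRank; that name and the default rank of 99 are just illustrative choices) looks like this:

Function %Get%(col)
   Dim ln
   For Each ln In Split(col.Model.FindMetaExtensionByName(col, cls_TemplateTargetItem, EXT("extColumnGroups")).Value, vbCrLf)
      If Len(Trim(ln)) > 0 Then
         Dim parts : parts = Split(ln,",")
         If UCase(col.EvaluateTextFor(parts(2),EXTNAME)) = "TRUE" Then
            %Get% = parts(1)
            Exit Function
         End If
      End If
   Next
   ' Default rank for unmatched columns; pick a value that sorts last
   %Get% = 99
End Function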
Setting up a new data governance program means bringing deep change to the operating culture of an organization. As a result, getting traction for an enterprise-scale effort may not always be the pragmatic route. Sometimes it's better to be small and excellent and let the enterprise adopt governance organically.
The concept of limited or localized data governance has a couple of key tenets developed to penetrate a reluctant or even resistant organization:
This process is tightly coupled with the development team so that it can support that team directly, and it trades many of the lower-value or slower-to-develop benefits of larger governance groups for quick turnaround and non-blocking progress.
I love PowerDesigner. It's the Cadillac (Mercedes, BMW, etc.) of the modeling tools. I love that I can do conceptual, logical, and physical data modeling. I can do business process modeling with direct linkage to my data models. And I can model mappings between everything for both data flow relationships (ETL mappings) and design flow relationships (entities become tables become dimensions, etc.).
But my favorite feature is the ability to set standards and enforce them directly in the tool using extended model definitions (XEMs).
PowerDesigner is built upon a COM-based object model. Everything is an object, derived from the aptly named "BaseObject" class, and can be read and manipulated via any COM-aware tool or language. This allows us to script things in VBScript right in the tool, or use VB.NET or even Groovy anywhere in the environment.
Here are just a few of the things that we're doing in our standards XEM:
All of these rules are checked by custom model checks, and automatic fixes are available for many of them.
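For reference, a custom check in an XEM is just a small script attached to a metaclass that returns True when the object passes. Here's a minimal sketch; the rule itself (requiring a table comment) is only an example, not one of the checks above.

' Check Script for a custom check defined on the Table metaclass (example rule only)
Function %Check%(obj)
   %Check% = (Len(Trim(obj.Comment)) > 0)
End Function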
Think of the time saved by never deploying a model with one of the above (minor) issues. You never have to choose between living with a little non-standard column ordering or redeploying the entire model!
Stay tuned for some tips and tricks posts about how we made these work!
To build a custom Gradle plugin in Groovy, do this. I know this seems redundant, but it wasn't as clear as it should have been.
1. Create a project directory. Just a plain ol’ directory. Wherever you like.
2. Add the source file subdirectories:
src\main\groovy\… (with the package path you’d like to use. We have src\main\groovy\com\perficient\gradle)
3. Create a plugin class (in a file named <classname>.groovy, we have DatabasePlugin.groovy here).
package com.perficient.gradle

import org.gradle.api.Project
import org.gradle.api.Plugin

class DatabasePlugin implements Plugin<Project> {
    void apply(Project project) {
        // If you have a custom convention class. Omit if not.
        project.convention.plugins.database = new DatabaseConvention(project)

        // Configure your tasks here. Add as many as you need.
        def buildTask = project.task('build') {
            description = 'Configure the environment with the specific environment code/name.'
        }
        // '<<' appends an action (a doLast block) to the task created above.
        buildTask << {
            println "Add actions like this if needed."
        }

        // All the normal task configuration (dependsOn, type) can go in the map passed as the first parameter.
        project.task(dependsOn: 'build', 'test') {
            description = 'Run tests after building the database.'
        }
    }
}
4. Add a properties file in src\main\resources\META-INF\gradle-plugins with the name of your plugin. We have Database.properties and AgileBiEnv.properties in here. In these files add one line (modified as needed):
implementation-class=com.perficient.gradle.DatabasePlugin
5. Add a build.gradle file to the plugin root:
apply plugin: 'groovy'

dependencies {
    compile gradleApi()
    groovy localGroovy()
}
Once you start getting non-trivial, you may need to add external jars to the classpath (add this to build.gradle):
sourceSets {
    main {
        groovy {
            compileClasspath += fileTree(dir: '../../../tools/lib', includes: ['*.jar'])
        }
    }
}
And, if you want the jar to end up somewhere specific:
jar {
    destinationDir = new File('../../lib')
}
6. Finally, add a settings.gradle file to the root to set the name of the jar you’ll create:
rootProject.name = 'AgileBi'
You’re all set! Run “gradle build” from the command line in the plugin root directory and you should have a shiny new plugin jar to use in your build.
OK, I have a jar. Now what?
Elsewhere in your system, you’ll have a place where you want to use your plugin. In the build.gradle there, add this:
buildscript {
    dependencies {
        classpath fileTree(dir: '../system/build/lib', include: '*.jar')
    }
}

apply plugin: 'Database'
When you run "gradle tasks" from this directory, you should now see the tasks you created in your plugin's apply() method.
Quick tip of the day:
Use Gradle plugins to package up functionality for easy reuse. For example, we developed a simple “database” plugin to handle the common tasks associated with building and upgrading a database. Things like:
Now, all we do to enable a database in our environment is add a build.gradle file to the database directory (with ddl/, sp/, and data/ subdirectories) with:
apply plugin: 'Database'
Away we go!
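For illustration, here's a rough sketch of what one of those tasks might look like inside the plugin's apply(Project project) method. The task name, the ddl/ layout, and the "hand it to your database client" step are assumptions, not the actual plugin code.

// Hypothetical task: apply the DDL scripts in ddl/, in file-name order
def applyDdl = project.task('applyDdl') {
    description = 'Apply the DDL scripts in the ddl/ directory, in file-name order.'
}
applyDdl << {
    project.fileTree(dir: 'ddl', include: '*.sql').files.sort { it.name }.each { script ->
        println "Applying ${script.name}"
        // hand script.text to your database client of choice here
    }
}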
While the Gradle manual is extensive, many of its concepts are a little opaque when you're not building an executable or the like from source code. Here's a quick primer on Gradle for BI:
Gradle is “project” based, with each project containing a set of inter-dependent tasks which in turn contain actions:
The whole thing is configured in a build.gradle file. Wherever you put this file becomes the root of your project. In that file you can manually add tasks and/or apply plugin(s). Plugins then in turn add tasks relevant to their purpose.
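A minimal build.gradle, just to make that structure concrete (the task name is arbitrary, and 'base' stands in for whatever plugin you actually use):

// Apply a plugin (it contributes its own tasks) and add one hand-written task with an action
apply plugin: 'base'

task hello {
    description = 'A trivial example task.'
}
// '<<' appends an action (a doLast block) in the Gradle versions we're using
hello << {
    println 'Hello from Gradle'
}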
Things that we’ve found helpful in developing for BI:
We've chosen Gradle as the build system for our iterative BI environment. It's a powerful tool, and there's a bunch of awesomeness in there.
Gradle uses Groovy as its scripting language. Groovy is just plain great. You get the power of the Java platform in a scripting language and can do things like this:
// Requires the Groovy XML-RPC module (XMLRPCServerProxy) on the classpath
import groovy.net.xmlrpc.XMLRPCServerProxy

def client = new XMLRPCServerProxy("http://localhost:8090/rpc/xmlrpc").confluence2
def session = client.login(user, pass)
client.getPages(session, "MyWiki").each { page ->
println "${page.title} was authored by ${page.author}"
}
client.logout(session)
A few lines of code to access the XML-RPC interface of Confluence and print out all the pages and their authors. Easy like it should be. And we can do stuff like this in our builds. All of a sudden, checking the version of a database before applying a change script is no big deal. Checking if a wiki page has been modified before updating it is a cinch.
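As a hedged sketch of that version check (the schema_version table, the connection settings, and the Postgres driver are all placeholders):

import groovy.sql.Sql

// dbUrl, dbUser, dbPassword, and targetVersion would come from your build configuration
def sql = Sql.newInstance(dbUrl, dbUser, dbPassword, 'org.postgresql.Driver')
def row = sql.firstRow('select max(version_number) as current_version from schema_version')
if ((row?.current_version ?: 0) < targetVersion) {
    println "Database is at version ${row?.current_version}; applying upgrade to ${targetVersion}"
    // run the change script here
} else {
    println 'Database is already up to date.'
}
sql.close()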
Our build system does:
We’re excited about the possibilities this opens up in terms of accelerating the pace of development on our teams and allowing for more parallel development than ever. It’s a game-changer down in the trenches!