To know the expected “after” of a test, you (usually!) need to know the “before.” Testing in BI requires test fixtures: a defined “before” state of the system. In BI a fixture is almost the same as a test data set, but it adds non-data system configuration, user sets, and other environment considerations.
Most test frameworks include the capability to set up and tear down (return to the original state) test fixtures. However, I generally prefer to rely on the build system, perhaps called from the test framework, for any non-trivial test fixture. The build system should already be capable of configuring the system environment, acquiring (from source control!) and configuring applications, and loading static data. A test fixture can simply be a specific instance (type) of a build. Therefore I recommend BI teams:
Use your build system for the setup of non-trivial test fixtures.
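As a minimal sketch of this recommendation, a test-framework fixture can simply shell out to the build system. Here the `build.py` entry point, its flags, and the “baseline” fixture name are hypothetical stand-ins for whatever your build tool actually provides:

```python
import subprocess

import pytest


@pytest.fixture(scope="session")
def tst12_environment():
    """Stand up the TST12 environment as a test fixture, once per test session."""
    # Ask the build system to configure the environment, deploy applications
    # from source control, and load the static test data. The command and
    # flags below are hypothetical; substitute your own build invocation.
    subprocess.run(
        ["python", "build.py", "--environment", "TST12", "--fixture", "baseline"],
        check=True,  # a failed build means a failed fixture: stop here
    )
    yield "TST12"
    # No teardown via the build system; see the LOAD_KEY approach below.
```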
Note that the build system is not well suited to fixture teardown. However, I’ve found that teardown in BI systems is inherently difficult in any case because of foreign keys and data-dependence issues. Instead, I usually rely on repeatable test data loads, as described above, paired with solid process metadata stored with the data itself: a “LOAD_KEY” value on each row allows teardown to remove any data loaded by a particular job execution.
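A sketch of what LOAD_KEY-based teardown might look like, using sqlite3 and hypothetical warehouse table names; deleting child tables before parent tables keeps foreign keys satisfied:

```python
import sqlite3


def teardown_load(conn: sqlite3.Connection, load_key: int) -> None:
    """Remove every row written by a single job execution.

    Assumes each table carries a LOAD_KEY column recording the job
    execution that loaded the row (the table names are illustrative).
    """
    # Delete in reverse dependency order: facts before the dimensions
    # they reference, so foreign keys are never violated.
    for table in ("fact_sales", "dim_customer", "dim_product"):
        conn.execute(f"DELETE FROM {table} WHERE load_key = ?", (load_key,))
    conn.commit()
```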
In practice, automated testing means the continual setup and teardown of test environments. If your overall system design limits the available environments to development, test, and production, or some other finite set, the productivity of the development team will suffer. To avoid this, I recommend teams design with a potentially unlimited number of environments in mind. This can be implemented through virtualization or by simply including the environment name in component names. For instance, the “TST12” environment may target the “TSTINST\TST12_DWH” database and store code at “C:\DWAPP\TST12\ETL\”.
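Because the environment name parameterizes every component name, a single helper can derive them all. This sketch simply reuses the TST12 naming conventions above; adjust the templates to your own standard:

```python
def environment_names(env: str) -> dict[str, str]:
    """Derive all environment-scoped component names from one environment name."""
    return {
        "database": rf"TSTINST\{env}_DWH",   # one database per environment
        "etl_path": rf"C:\DWAPP\{env}\ETL",  # one code location per environment
    }


# environment_names("TST12") ->
# {'database': 'TSTINST\\TST12_DWH', 'etl_path': 'C:\\DWAPP\\TST12\\ETL'}
```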
Note that every configuration parameter or setting must be scoped properly for this to work, with settings at the system, server (node), environment, and application levels, or some other mix appropriate for your situation. What won’t work are environment variables and other system- or server-wide configurations when a single system hosts multiple environments.
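One plausible way to implement such scoping is a narrow-to-wide lookup, where an application setting overrides an environment setting, which overrides a server setting, and so on. The scope mix and setting names here are illustrative:

```python
from typing import Any


def lookup(setting: str, scopes: list[dict[str, Any]]) -> Any:
    """Resolve a setting from the narrowest scope that defines it."""
    for scope in scopes:
        if setting in scope:
            return scope[setting]
    raise KeyError(f"setting {setting!r} not defined in any scope")


system = {"smtp_host": "mail.corp.local"}   # system-wide default
server = {"temp_dir": r"D:\TEMP"}           # per-node setting
environment = {"db": r"TSTINST\TST12_DWH"}  # per-environment override
application = {}                            # per-application overrides

# Narrow-to-wide: application, then environment, then server, then system.
print(lookup("db", [application, environment, server, system]))
```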
A few other suggestions for multiple environments include:
- Including the environment name in all status and logging messages from the system (see the logging sketch after this list).
- Setting up a server-scoped area for shared artifacts such as test data sets, security certificates, etc. These artifacts should be relatively static and kept under version control separate from the environments themselves.
- Keeping the underlying configuration of production and non-production environments as similar as possible. Avoid “#ifdef PROD”-style switches in build scripts whenever possible, though such a switch may be helpful to limit the size of non-production builds and to allow things like dynamic tablespaces in non-production where production uses fixed-size tablespaces.
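As a sketch of the first suggestion, Python’s standard logging can stamp the environment name onto every message through a LoggerAdapter; the logger name and format string are illustrative:

```python
import logging


def get_logger(env: str) -> logging.LoggerAdapter:
    """Return a logger that tags every message with its environment name."""
    logging.basicConfig(format="%(asctime)s %(env)s %(levelname)s %(message)s")
    return logging.LoggerAdapter(logging.getLogger("dw"), {"env": env})


log = get_logger("TST12")
log.warning("row count below threshold in fact_sales")
# e.g. "2024-01-01 12:00:00,000 TST12 WARNING row count below threshold in fact_sales"
```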
A side benefit of this tactic is a team extremely familiar with its build and deployment systems, which leads to smoother deployments down the road.