Recently I was in a conversation where a PM declared “Agile’s just waterfall really fast – we can do that no problem!” Uh oh.
Like (most) everything, delivery methodologies are subject to fashion and trend, and Agile, Scrum, Kanban, and the like are en vogue. Collectively, I’ll refer to these highly cyclic methodologies as “iterative” or (little a) agile development. Since my interest is BI, I’ll take a little time to discuss how these iterative delivery methods affect your BI delivery processes.
Generally, iterative development does a number of things to your teams. When operating effectively, it (among other things):
- Brings your users much closer to the development process.
- Multiplies the number of builds/deployments you do by a factor of LOTS (probably 10-20).
- Multiplies the number of tests (esp. regressions) required.
- Makes juggling project tasks more complex by putting many more “balls” in the air.
- Eliminates the formality (and safety) of predefined scope and quality gates.
A successful move to iterative development means planning for each of these, often in radically different ways than your current processes do. If your plan is just “do it more often/faster”, look out! For instance, the cost of regression testing for quarterly releases is manageable using manually scripted test cases: QA has weeks to execute the tests, report defects, and retest. On a two-week cycle, this is either impossible or, at best, prohibitively expensive. So you either a) stop or severely limit regression testing, or b) automate.
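To make option b) concrete, here is a minimal sketch of an automated BI regression check, assuming your reports can be expressed as SQL queries against the warehouse. The idea: capture the results of a fixed set of report queries from the last accepted release as a baseline, then rerun them after each sprint’s build and flag any that changed. The table `sales`, its columns, and the query names are hypothetical, and an in-memory SQLite database stands in for the real warehouse.

```python
import sqlite3

# Hypothetical report queries; in practice these would mirror your real reports.
REPORT_QUERIES = {
    "sales_by_region": "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region",
    "order_count": "SELECT COUNT(*) FROM sales",
}


def run_queries(conn):
    """Execute every report query and return {query_name: list_of_rows}."""
    return {name: conn.execute(sql).fetchall() for name, sql in REPORT_QUERIES.items()}


def regression_diff(baseline, current):
    """Return the names of queries whose results differ from the baseline, sorted."""
    return sorted(name for name in baseline if baseline[name] != current.get(name))


def build_demo_db(rows):
    """Build an in-memory stand-in for the warehouse (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    # Baseline captured from the last accepted release.
    baseline = run_queries(build_demo_db([("East", 100.0), ("West", 50.0)]))
    # Simulate a sprint whose ETL change accidentally double-loads one row.
    current = run_queries(build_demo_db([("East", 100.0), ("West", 50.0), ("West", 50.0)]))
    print(regression_diff(baseline, current))
```

Because the check is just code, it can run on every build in a two-week cycle at essentially no marginal cost, which is the whole argument for automating rather than cutting regression coverage.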
In BI, iterative development gets tricky if you try to bite off the whole BI sandwich at once. It helps to step back and recognize that an end-to-end BI process actually delivers at least three complete, testable, user-acceptable components:
- The Information Model – Some combination of business requirements, business process modeling (BPM), conceptual, logical, and physical data models, metadata, business rules, data quality rules, etc. On its own, the information model describes the information-driven processes of the organization. While the primary (or only) purpose of these artifacts may initially be BI, they have defensible value to a much wider audience.
- Data Integration – Getting data into the warehouse, ODS, or other data repository. Includes data analysis and profiling, source-to-target mapping, ETL development, and, more and more often, message-based real-time data integration development.
- BI Delivery – Getting information to its business consumers including report/scorecard/dashboard design, development, and delivery.
More complex environments may add Master Data Management (MDM), Metadata Management, operational integration, or others to this list. The point is that the path from requirements to report needn’t be viewed as a single “project” so much as a set of interdependent projects. This frees iterative teams to split the work into relatively independent “sprints” or cycles that can be scheduled and managed as such.
In practice, the overall delivery begins to look like a “cascade” of activities segmented by the above components and executed by relatively independent teams. This looks very different from a waterfall Gantt chart.
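One way to picture the cascade is as a small dependency graph: each component team works on increment *k* one sprint after its upstream dependency delivered increment *k*. The sketch below is only an illustration under that assumption (a uniform one-sprint lag per dependency); the component names come from the list above, but the lag rule and the function names are hypothetical. It uses Python’s standard `graphlib` module (3.9+) to order the components.

```python
from graphlib import TopologicalSorter

# Each component maps to the set of components whose output it consumes.
COMPONENTS = {
    "information_model": set(),
    "data_integration": {"information_model"},
    "bi_delivery": {"data_integration"},
}


def cascade_offsets(deps):
    """Lag in sprints for each component: one cycle after its slowest upstream dependency."""
    offsets = {}
    # static_order() yields every component after all of its dependencies.
    for comp in TopologicalSorter(deps).static_order():
        offsets[comp] = max((offsets[d] + 1 for d in deps[comp]), default=0)
    return offsets


def schedule(deps, increments):
    """Map each component to the sprint numbers in which it builds increments 1..increments."""
    offsets = cascade_offsets(deps)
    return {comp: [i + offsets[comp] for i in range(1, increments + 1)] for comp in deps}


if __name__ == "__main__":
    for comp, sprints in schedule(COMPONENTS, 3).items():
        print(f"{comp:18} sprints {sprints}")
```

Run as-is, this staggers the three teams by one sprint each, so from sprint 3 onward every team is busy at once on a different increment. Adding MDM or Metadata Management is a matter of adding nodes and edges to the map rather than redrawing a single end-to-end plan.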
Technically, relevant topics include:
- How to design for continual change in the data model, ETL environment, and BI environment?
- How to provision the many independent development and test environments needed to support these teams?
- How to automate testing across the wide variety of technologies deployed in a BI stack (DBMS, ETL, BI delivery, metadata, etc.) including generating appropriate test data to load into a continually changing model?
- How to package releases in such a way that loads and operations, issues, defects, and new requirements can be effectively traced to a particular release level of the environment?
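On the test-data question in particular, one approach worth considering is to drive the generator from the live schema instead of a hand-maintained list of columns, so a column added in one sprint is picked up automatically in the next. The sketch below assumes SQLite (its `PRAGMA table_info` exposes each column’s declared type); the type-to-generator mapping, the `sales` table, and the function names are hypothetical, and a real DBMS would use its own catalog views instead.

```python
import random
import sqlite3

# Hypothetical value generators keyed by declared column type.
# An unmapped type raises KeyError, which surfaces schema changes the generator can't handle yet.
GENERATORS = {
    "INTEGER": lambda rng: rng.randint(1, 1000),
    "REAL": lambda rng: round(rng.uniform(0, 500), 2),
    "TEXT": lambda rng: rng.choice(["East", "West", "North", "South"]),
}


def columns(conn, table):
    """Read (name, declared_type) pairs from the table's current schema."""
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
    return [(row[1], row[2].upper()) for row in conn.execute(f"PRAGMA table_info({table})")]


def load_test_data(conn, table, n, seed=0):
    """Insert n synthetic rows matching the table's current columns; return rows inserted."""
    rng = random.Random(seed)  # seeded so test runs are repeatable
    cols = columns(conn, table)
    rows = [tuple(GENERATORS[ctype](rng) for _, ctype in cols) for _ in range(n)]
    col_list = ", ".join(name for name, _ in cols)
    placeholders = ", ".join("?" for _ in cols)
    # Table and column names are interpolated directly: acceptable only for trusted, internal names.
    conn.executemany(f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    load_test_data(conn, "sales", 5)
    # A later sprint adds a column; the generator picks it up with no code change.
    conn.execute("ALTER TABLE sales ADD COLUMN units INTEGER")
    load_test_data(conn, "sales", 5)
    print(conn.execute("SELECT COUNT(*), COUNT(units) FROM sales").fetchone())
```

The seed matters as much as the schema lookup: with a fixed seed, the same build produces the same test data, so a changed regression result points at the code rather than at the data.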
I realize I’m only raising questions at this point. The first step is realizing you have a problem! Going forward I’ll attempt to answer some (hopefully most!) of these questions by including the unique challenges of iterative development in my discussions of BI tools and technologies as well as development processes.
Agile is the solution of choice when shared ownership is possible, i.e., when expectations and commitments can be managed under the same roof. Otherwise, milestones (aka waterfalls) should be introduced, but agile approaches should remain the solution in between.
http://caminao.wordpress.com/engineering/system-engineering-processes/milestones/
http://caminao.wordpress.com/engineering/system-engineering-processes/agile-falls/