The Release Coordination Challenge
Release Day. There are few more terrifying words in the development team lexicon. This is the moment of truth: will our efforts be for nothing, or will we eke out another production release? And so many things can go wrong. Have we missed critical requirements? Did we fail to test everything effectively? Are there other applications that will be negatively impacted or that may release at the same time? And what of our security compliance: have we adequately evaluated our application to ensure that not only the application but also the target environment is properly secured? With all of these questions and more to be answered on or just before release day, it should come as no surprise that releases are delayed or, worse yet, forced to roll back to a previous version.
Figure 1. Shuttle launch day – photo from NASA.gov
So, we can identify many of the problems faced by release coordinators, but how do we actually use this information to improve the quality and speed of our releases? The main blocking factor is often the late identification of “release readiness.” This represents the confidence that a given system candidate has been properly defined, built, tested, and prepared for release into the “wild.”
Much has been written about using DevSecOps to speed up the delivery of functionality into production via agile development, continuous integration, and continuous deployment. But how does DevSecOps automation improve the quality of our releases, and more importantly, how does it reduce the time taken to “approve” a given release? Automation alone is likely insufficient without a corresponding and complementary release coordination practice. After all, faster delivery of a broken product is not likely to make the end-user any happier. Moreover, if the process imposes a series of delaying “gates” that require multiple senior leader approvals, ostensibly to increase confidence in the quality of the product, then any speed gains from automating the CI/CD build and deployment will be eliminated.
Thus, we are faced with the situation where we would like to reduce the manual labor associated with deployments by DevSecOps automation, but we want to have confidence that the released product has been evaluated for production use.
The Presumptive Release
In “The Goal,” Eliyahu Goldratt and Jeff Cox introduced the idea that in a manufacturing process any steps that do not lead to the “goal” of the process should be eliminated. This approach emphasizes the importance of the end result over the process itself, and is central to the idea of Lean Manufacturing. It also represents the heart of agile development, where product delivery is prioritized over, say, the creation of documents. For the purpose of driving production release of a given software system, I will introduce the term “presumptive release,” where every candidate release is considered a target for production! Consider the consequence of this statement – if we are to consider every announced candidate from the delivery team as a potential production release, then we must shift our evaluation of that candidate as far left as possible in the development process. This means that we must discover and correct release-related issues (such as security policy violations) early in the evaluation. Moreover, this also implies that we will eliminate non-performing process steps by empowering key team members to be responsible for asserting completion of critical release preparation steps. Finally, we must unambiguously define what criteria are to be met prior to such an assertion.
Release Coordination – Key Readiness States
Regardless of the process followed to create a given development candidate release (e.g. agile, waterfall, scrumfall, wagile, etc.), there is a set of defined release readiness states that must be determined, verified, and asserted. Moreover, there are key roles associated with shepherding the candidate through the release process, and the people in these roles are the only ones empowered to halt the release. By definition, a presumptive release approach assumes that every announced candidate is not only ready for production but will be released unless one of the empowered release coordination roles elects to halt it.
Figure 2. Release readiness states and DevSecOps responsibilities
There are four key readiness states and four key release roles, as illustrated in Figure 2. The release readiness states are Product, Organization, Security, and Operation. Product Readiness is an indication that the candidate release system requirements have been thoroughly evaluated and tested. This is typically first done by the development team through a series of manual and automated unit tests. Once the system is considered sufficiently stable, it is promoted to a testing environment for evaluation by the QA team. This team then verifies compliance with functionality, performance, and system security requirements. Finally, once the development team announces the candidate release, it is promoted to the user acceptance environment for final evaluation of the product's functional capability. Asserting “product readiness” indicates that all of these steps have been completed and any remaining issues are noted but waived for the current candidate. Organization Readiness is an indication that all necessary communications have been prepared and distributed, such as release notes, support run-books, trouble-shooting guides, and support team training. Operation Readiness indicates that the release plan, run-books, and other artifacts needed for ongoing production support have been updated. Finally, Security Readiness indicates that the product has been evaluated for vulnerabilities and potential exploits and verified against all defined security policies.
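The four readiness states, and the need for unambiguous assertion criteria, can be sketched as a small checklist model. This is only an illustration under stated assumptions: the class names (ReadinessState, Criterion, ReadinessChecklist) and the sample criteria are hypothetical, chosen to mirror the description above rather than any particular tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ReadinessState(Enum):
    """The four readiness states named in the text."""
    PRODUCT = "Product"
    ORGANIZATION = "Organization"
    SECURITY = "Security"
    OPERATION = "Operation"


@dataclass
class Criterion:
    """One explicit condition that must hold before a state is asserted."""
    description: str
    met: bool = False
    waived: bool = False  # issue noted but waived for the current candidate

    def satisfied(self) -> bool:
        return self.met or self.waived


@dataclass
class ReadinessChecklist:
    """Hypothetical container pairing a state with its criteria."""
    state: ReadinessState
    criteria: List[Criterion] = field(default_factory=list)

    def can_assert(self) -> bool:
        # The state may be asserted only when every criterion is met or waived.
        return all(c.satisfied() for c in self.criteria)

    def open_items(self) -> List[str]:
        # Criteria still blocking the assertion.
        return [c.description for c in self.criteria if not c.satisfied()]


# Sample criteria are illustrative, loosely following the Product Readiness steps.
product = ReadinessChecklist(
    ReadinessState.PRODUCT,
    [
        Criterion("Development unit tests pass", met=True),
        Criterion("QA verification complete", met=True),
        Criterion("User acceptance evaluation complete"),
    ],
)
print(product.can_assert())  # False
print(product.open_items())  # ['User acceptance evaluation complete']
```

Keeping the criteria as explicit data means the question "can this state be asserted?" has a yes-or-no answer early in the process, which is the point of shifting the evaluation left.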
Release Coordination – Key Roles
The four key release roles are Product Owner, Security Architect, Operations Coordinator, and Release Coordinator. These roles, and only these roles, are empowered to assert that their specific release readiness state is complete. If any of these roles, and only these roles, is unable to confirm that a particular aspect of one or more readiness states is complete, they can halt the release. No other approvals are required. If an error is later found in the production system, a back-trace audit will indicate where in the release process the failure occurred and allow correction. As shown in Figure 2, the Product Owner has ultimate responsibility for the Product Readiness state, the Operations Coordinator owns the Operation Readiness state, and the Security Architect owns the Security Readiness state. Overseeing all of these roles is the Release Coordinator, who not only directly owns the Organization Readiness state but is also the overall release owner. The Release Coordinator ensures that all of the various readiness states are tracked and asserted. Moreover, this role owns and runs the Release Coordination Meeting, which schedules resources for upcoming releases, verifies that no product collisions (e.g. dependency violations) are created, and confirms that the target environments are prepared for the production release.
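The ownership rules and the presumptive decision described above can be expressed in a few lines. The following is a minimal sketch, not a definitive implementation: CandidateRelease, AuditEntry, the method names, and the version label "1.4.0" are all hypothetical. It assumes that only the owning role may assert a state, that any of the four roles may halt, and that every action is recorded to support a later back-trace audit.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Set

# Ownership as described in the text: one owning role per readiness state.
STATE_OWNERS = {
    "Product": "Product Owner",
    "Organization": "Release Coordinator",
    "Security": "Security Architect",
    "Operation": "Operations Coordinator",
}
RELEASE_ROLES = set(STATE_OWNERS.values())


@dataclass
class AuditEntry:
    """One record in the audit trail (hypothetical structure)."""
    at: datetime
    role: str
    action: str  # "assert" or "halt"
    detail: str


@dataclass
class CandidateRelease:
    version: str
    asserted: Set[str] = field(default_factory=set)
    halted_by: Optional[str] = None
    audit: List[AuditEntry] = field(default_factory=list)

    def _log(self, role: str, action: str, detail: str) -> None:
        self.audit.append(AuditEntry(datetime.now(timezone.utc), role, action, detail))

    def assert_ready(self, role: str, state: str) -> None:
        # Only the owning role may assert its readiness state.
        if STATE_OWNERS.get(state) != role:
            raise PermissionError(f"{role} does not own {state} Readiness")
        self.asserted.add(state)
        self._log(role, "assert", state)

    def halt(self, role: str, reason: str) -> None:
        # Any of the four release roles, and no one else, may halt.
        if role not in RELEASE_ROLES:
            raise PermissionError(f"{role} is not empowered to halt the release")
        self.halted_by = role
        self._log(role, "halt", reason)

    def pending_states(self) -> List[str]:
        # States the Release Coordinator still needs to track to assertion.
        return sorted(set(STATE_OWNERS) - self.asserted)

    def will_release(self) -> bool:
        # Presumptive: the candidate ships unless an empowered role halted it.
        return self.halted_by is None


candidate = CandidateRelease("1.4.0")  # hypothetical version label
candidate.assert_ready("Product Owner", "Product")
print(candidate.will_release())    # True
print(candidate.pending_states())  # ['Operation', 'Organization', 'Security']
candidate.halt("Security Architect", "unresolved policy violation")
print(candidate.will_release())    # False
```

The default answer in this sketch is "release"; stopping requires an explicit, attributed action. How to treat states still pending at the scheduled release time is left open here, so the sketch only reports them for the Release Coordinator to follow up.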
We will investigate more deeply each of these roles and release readiness states in the following blog posts. Taken together with DevSecOps automation of the build, package, version, deploy and test activities, the ‘presumptive release’ approach to release management encourages responsibility, assigns clear ownership, and reduces the management overhead associated with the typical production release process.