Skip to main content

Development

Story Points Estimation on a Data Warehouse Project

It has been decades since people started to develop data warehousing (DW) systems. In fact, most of the delivery strategy and approach on DW is to follow the traditional waterfall cycle – discovery, requirement, design, development, test, training, and transition, etc. With this pattern, only the previous step is completed; the team will start on the next step. In recent years, we’ve solidified the agile concept and proven practices. The good thing is that more and more people realizes its value and put into their daily practice. Our product team has completed several DW product releases, and all of these activities were implemented with Scrum framework. Here, I share the challenges we encountered and how we did the estimation on the DW projects.

Challenge

In the beginning of these types of projects, there are sometimes uncertainties: the team is under experienced, the scope is not finalized, or the team members are not fully onboard. Likewise, we encountered challenges in many aspects, which affected the accurate estimation:

  • Team growing with Tuckman model. In the first months, we did not get all the members needed, even we onboard the persons, they did not work together before and don’t know much about each other. By nature, the team has been following Tuckman model (Forming, Storming, Norming, Performing) to grow up.
  • Architecture is not clear. It is normal that the data model and ETL architecture is being discussed at the earlier project stage. However, in our case, the unclear architecture significantly impacts the development effort in each functional area.
  • Lack of the experience. Most of the team members except senior guys lacked sufficient experience in both technology and process management. Also, some members are new to the Scrum, they don’t have many ideas on how to do the estimation better.
  • Different opinions to define story points. As the requirements and architecture are vague at that moment, it is difficult to define the base user story and get everyone agreed on the same page.

Gain the Initial Story Points

So, how do we get started on the estimation? The team needs to be open-minded; it’s fine that we cannot make the accurate estimation, and even make a mistake at the beginning. We believe that everything will be better and better. We did some necessary training to the team on both Scrum and our product vision. The team knew what to build out overall and how we could break down the big building into each floor, and even into single bricks in some areas. In the DW and ETL projects, there will be a similarity in each mapping, job, model, so the better way was to find some typical sample, ask the team to sit down together to start with a sample estimation.

For example, we take Patient functional area in the health industry as the goal for the team to do the estimation. We can break down one-dimensional table in the DW into several types of artifacts: Modeling effort on each layer, source to stage and stage to target job, source to stage and stage to target mapping (design), scheduling. The senior persons should take a lead of the group estimation, and their estimated number will have more influence over junior members.

It is real that everyone has a different understanding on the story size. Try to find a small and simple table so the related ETL jobs/mappings will be simple, which can be identified as a base user story. With base user story, the team gets concept how to measure the relative numbers of the effort. We are not encouraging the story points to our conversion, but I think we can use this for the new team if the majority of the members are not familiar with the points. It will be easier for the newbies to understand.

Find the Pattern for Efficient Estimation

Eventually, we have the story points number that we are going to deliver. It is definitely fine that number does not reflect the real situation and user story complexity.The team just get to work and practice. We kept monitoring the velocity and find out the root cause why that slow down the productivity. The team had more and more sense on the how big is that user story. If you have built out many ETL artifacts, you can find each of them is quite similar and closely related. Hence we obtained the pattern that can accurately and efficiently do the estimation. Note that utilizing this approach is based on several factors in the broader practice (another type of projects):

  • Team’s velocity is stable
  • Architecture is stable
  • The tasks type can be categorized

For DW related efforts and activities, we can define the base user story points for simplest and smallest star schema. This table list out the basic definition for each artifact.

Artifact Type Table Type Points Artifact Type Table Type Points
Model Anchor X1 Mapping Dimension Y4
Model Detail X2 Mapping Fact Y5
Model Relationship X3 Mapping Array Y6
Model Dimension X4 Job Anchor Z1
Model Fact X5 Job Detail Z2
Model Array X6 Job Relationship Z3
Mapping Anchor Y1 Job Dimension Z4
Mapping Detail Y2 Job Fact Z5
Mapping Relationship Y3 Job Array Z6

We know that the effort on each user story depends on how many columns that table has. So practically there can be several buckets which represent the factors.  Here is one example.

Buckets Factor
Fields count 1~50 1
Fields count 51~100 1.2
Fields count 101~150 1.4
Fields count 151~200 1.6

In the calculation, the story points = Base Story points * factor

If we are not sure how well this formula works for the estimation, one great way is to test this for several sprints then we can adjust the factor with more inputs from the project team. The advantage of this approach is that we can quickly answer how many sprints we need to finish the specific scope with given team’s velocity.

As the agile person, bear in the mind that nothing will be the perfect in the beginning, but that is totally fine. We just give enough trust in the team and have them practice to gain the experience, therefore the estimation will be more and more accurate.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Kent Jiang

Currently I was working in Perficient China GDC located in Hangzhou as a Lead Technical Consultant. I have been with 8 years experience in IT industry across Java, CRM and BI technologies. My interested tech area includes business analytic s, project planning, MDM, quality assurance etc

More from this Author

Follow Us