We’re all aware that we pretty much suck at estimating how long it will take to build a given piece of software.
There are many reasons we fail at this, but the core of the issue is simple. In general terms, to know how much time a task will take, we need to be pretty sure about What to do and How to do it, and to Have done it multiple times before.
Since building software products is essentially exploratory work, we usually lack at least one of these conditions (a fourth one, the Why, often goes unanswered too, but that’s a whole ’nother topic).
Our current toolset
This age-old issue has seen many different approaches over time, ranging from the informal to the extremely detailed and formalized.
In the agile software development world, the most popular methods either measure effort in story points or t-shirt sizes, or strive for similarly sized work items. Recently, the #NoEstimates “movement” has generated a lot of interest and debate. Its proponents do raise valid points, as they question the need for most estimates[1].
Why do we estimate?
The goal of estimating is to help us plan. With some notion of the effort needed to build something, we can:
- Compare its cost (and usually also its value) to other things we’re considering doing (for prioritization);
- Have a rough idea of when it will be done (to give some general guidance to the release planning process)[2].
Most estimation techniques work with abstract scales (e.g. story points, t-shirt sizes, effort levels). This helps teams avoid wasting too much time chasing a precise estimate that will most likely be wrong anyway. For us Product Managers, these scales are usually good enough for point #1 (cost comparisons between different work items).
For point #2, however, we usually need to translate these scales into time units. This is best done by looking at the team’s historical data to derive a conversion rate between points/sizes/levels and time.
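As a rough illustration of that conversion, here is a minimal Python sketch. It assumes a hypothetical list of completed items, each recorded as (story points estimated, days actually spent); dividing total days by total points gives an average days-per-point rate that can then be applied to new estimates. The record type, function names and figures are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class CompletedItem:
    """A finished backlog item (hypothetical record format)."""
    points: float  # estimate given before the work started
    days: float    # time actually spent, e.g. taken from the issue tracker


def days_per_point(history: list[CompletedItem]) -> float:
    """Historical conversion rate: total days spent over total points delivered."""
    total_points = sum(item.points for item in history)
    if total_points == 0:
        raise ValueError("history has no estimated work to learn from")
    return sum(item.days for item in history) / total_points


def points_to_days(points: float, history: list[CompletedItem]) -> float:
    """Translate an abstract estimate into a rough time figure."""
    return points * days_per_point(history)


history = [
    CompletedItem(points=3, days=2.5),
    CompletedItem(points=5, days=4.0),
    CompletedItem(points=2, days=1.5),
]
print(round(days_per_point(history), 2))     # 8.0 days / 10 points
print(round(points_to_days(8, history), 1))  # an 8-point item at that rate
```

Dividing sums (rather than averaging each item’s own ratio) weights larger items more heavily; either choice is defensible, and neither turns the result into more than a rough guide.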
Going through many of these methods, our team felt there was still a lot of wasted time and lack of precision. This naturally led to frustration, which nobody likes.
A simple process
Knowing this is a common issue for many teams and Product Managers, I wanted to share our (mostly positive) experience of switching to a simpler process.
This is far from groundbreaking stuff, as it’s based on common principles from agile methodologies. However, I hope that walking through what works and what doesn’t will be helpful to others.
The estimation process we’ve used in our team is built around two basic goals:
- Aiming for “same-sized” backlog items[3];
- Estimating as late (and as little) as possible.
“Same-sized” backlog items
After trying out different estimation scales and systems over the years, I’ve come to really appreciate working with a simpler one: every backlog item should have roughly the same size. More specifically, every item should be doable in under two days by a single team member. That’s it.
The first question I usually get about this is “Why two days?” The answer is simple: it’s what felt right to the team, given our initial objectives:
- aiming for the original Kanban precept of same-sized items;
- it should be quick for anyone to tell whether an item can be done within the effort unit;
- the reference effort unit should be both:
  - large enough to get something done;
  - small enough to avoid overestimating what can be done[4].
Given these constraints, we decided to simply use a single team member’s time as the effort unit. For our team, one person for two days fit all of the above.
Lazy, “just-in-time” estimation
Nobody likes to estimate. Since it’s a pain for the team, we try to put it off for as long as possible. That means most of the backlog sits unestimated until, for some reason, we actually need an estimate. We don’t want to waste time estimating things that aren’t needed to make decisions right now.
There are two major ways for the team to estimate work, and they are complementary.
First, whenever we’re planning (and moving forward with a new initiative), there’s a “one-off” estimation session. The team goes through the specs for the epics/themes that will be worked on and starts breaking them down into two-day chunks. It’s normal for some of them to end up only partially broken down, or not broken down at all, but that’s alright: those can be pulled apart later. This kind of session is rare for us: about one or two afternoons every two to three months.
Second, on an ongoing basis, whenever a high-priority item comes up (either one that just popped up or one left over from the planning session), it is broken down before it’s worked on. This is a continuous process: the team member can either break it apart completely or just take a two-day chunk and leave the rest in the backlog.
Do note that for this approach to work, the backlog has to be carefully structured and maintained.
What works
- It’s simpler to estimate
The team has felt that it’s easier to look at something and answer whether it’s doable within two days than to estimate how long it will take or to assign it some abstract category. The cognitive load seems lower when the decision is a comparison against a fixed amount rather than picking a value on a scale. Also, by the time it gets hard to say whether something is under two days or not, it’s already in the ballpark.
- Builds shared understanding
Breaking things down creates a shared understanding of the work to be done. The analysis and debate that occur during this process yield much better insight into potential problems and help the team collectively arrive at better solutions.
- It can be pretty accurate
The time spent per item usually tracks pretty closely to the two-day target. There’s natural variance, especially during (and after) launches. However, more than accuracy (how close items land to two days), what matters is precision (how consistent items are with each other). The more consistent the historical data is, the more confident you can be that each item is about the same size.
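One way to put numbers on that distinction is sketched below, assuming the team can export how many working days each finished item took (the sample figures are invented). Accuracy is read here as how far the mean duration sits from the two-day target, and precision as the coefficient of variation (standard deviation divided by mean), where lower means more consistently sized items. The function name and the choice of metric are illustrative, not part of the process described in this article.

```python
import statistics

TARGET_DAYS = 2.0  # the "same-sized" target: one person, two days


def sizing_report(cycle_days: list[float]) -> dict[str, float]:
    """Compare finished items against the two-day target.

    accuracy_gap: distance between the average item duration and the target.
    cv: coefficient of variation (stdev / mean); lower means more consistent items.
    """
    if len(cycle_days) < 2:
        raise ValueError("need at least two finished items")
    mean = statistics.mean(cycle_days)
    if mean <= 0:
        raise ValueError("cycle times must be positive")
    stdev = statistics.stdev(cycle_days)  # sample standard deviation
    return {
        "mean": mean,
        "accuracy_gap": abs(mean - TARGET_DAYS),
        "cv": stdev / mean,
    }


report = sizing_report([1.5, 2.0, 2.5, 2.0])
print(report)
```

In this example the mean lands exactly on target and the cv comes out at roughly 0.2. Items swinging between half a day and five days could produce the same mean with a much larger cv, which is exactly the imprecision that makes historical data less trustworthy.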
- Simpler to analyze Kanban metrics
Because items are roughly the same size, cycle (or lead) time can be looked at globally, instead of being broken down by story points, item size or whatever effort-sizing model is in use.
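Since items are meant to be interchangeable, a single percentile over all of them can stand in for a per-size breakdown. A minimal sketch, assuming cycle times are available as plain numbers of days (the data is invented); the nearest-rank method and the 85th percentile are common choices in Kanban reporting, used here only as an example.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("no cycle times recorded")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank into the sorted data
    return ordered[rank - 1]


# One list for every finished item: no bucketing by story points or t-shirt size.
cycle_days = [1.0, 1.5, 2.0, 2.0, 2.5, 1.5, 3.0, 2.0, 1.0, 4.0]
p85 = percentile(cycle_days, 85)
print(f"85% of items finished within {p85} days")
```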
- Helps less-experienced team members
Less experienced (or new) team members can also take part in the estimation process with more confidence[5]. When looking at a feature, they can decide whether they could do it in under two days and break it down accordingly. The “worst” that can happen is that a more experienced team member finds the resulting items smaller than necessary, but that’s not really a problem, as they still fulfill the “under two days” rule.
- Single value for effort and time
By using time to break items apart, we get both effort and (rough) time estimates rolled into one. This is not a major benefit, but it’s still nice to have, as it makes many conversations around tradeoffs during prioritization and planning easier.
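Because every item is capped at roughly one person for two days, a back-of-the-envelope forecast only needs an item count and the team’s recent throughput. The sketch below assumes throughput is measured as items finished per week (the figures and function name are hypothetical). Multiplying items by two person-days would give an effort ceiling, but measured throughput already reflects interdependencies and other work; as footnote [2] stresses, the output is a rough idea of when, not a commitment.

```python
import math


def weeks_to_finish(remaining_items: int, weekly_throughput: list[int]) -> int:
    """Rough number of weeks to finish `remaining_items` same-sized items.

    weekly_throughput holds how many items the team finished in each recent week.
    """
    finished = sum(weekly_throughput)
    if finished == 0:
        raise ValueError("no finished items to base a forecast on")
    average_per_week = finished / len(weekly_throughput)
    return math.ceil(remaining_items / average_per_week)


# Hypothetical epic broken into 18 two-day items; recent weeks finished 4 to 7 items each.
print(weeks_to_finish(18, [5, 7, 4, 6, 5]))
```

An average keeps the sketch short; a range built from the slowest and fastest recent weeks would better convey the uncertainty.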
What doesn’t work well (or at all)
- Losing track of items that are part of bigger ones
When you start having smaller items that are part of a larger one, it’s very easy for some of them to get “lost” amidst others. This will of course depend on the project management tool and workflow you use, but it’s important to take this into account to avoid getting to the release date with incomplete functionality.
- Bloated backlog
This level of granularity can easily leave the backlog bloated and chaotic, so it’s important to structure things in a way that prevents this. Again, the details will depend on your issue tracker and workflow, but keep this in mind: the team’s actual development backlog should be kept short, and pending items are better organized somewhere else to avoid clutter[6].
- Not breaking things down after the two-day limit
Even with all of this, it’s quite common to discover, after starting work on an item, that it’s going to take more than two days. This is normal: the conditions under which it was estimated may have changed, or the estimate was simply off. What’s important, however, is to make a habit of breaking such items down as soon as this is uncovered.
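To help build that habit, the check can be automated. The sketch below assumes a hypothetical export of in-progress items with the date work started, and counts weekdays only (no holiday calendar); anything past the two-day limit is flagged as a candidate for breaking down. The `InProgressItem` type, its fields and the sample board are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

LIMIT_DAYS = 2  # one person, two days


@dataclass
class InProgressItem:
    title: str
    started: date  # day work began (hypothetical tracker field)


def working_days_elapsed(start: date, today: date) -> int:
    """Weekdays from start up to, but not including, today."""
    days = 0
    current = start
    while current < today:
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            days += 1
        current += timedelta(days=1)
    return days


def needs_breaking_down(items: list[InProgressItem], today: date) -> list[str]:
    """Titles of items that have run past the two-day limit."""
    return [
        item.title
        for item in items
        if working_days_elapsed(item.started, today) > LIMIT_DAYS
    ]


board = [
    InProgressItem("Export report as CSV", date(2024, 3, 4)),  # started on a Monday
    InProgressItem("Password reset email", date(2024, 3, 6)),  # started on a Wednesday
]
print(needs_breaking_down(board, today=date(2024, 3, 8)))
```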
- Individual work items might not be full user stories
One major downside of breaking things down to under a fixed amount of effort is that we often end up with items that aren’t full user stories. This makes it hard (or impossible) to define user-level acceptance/testing criteria. One workaround is to define those criteria on the higher-level item and test against that, but it’s messy.
- Inconsistent parallelization
You’d think that breaking things down would translate to better work parallelization, but you’d be wrong. The number of items doesn’t tell you how many of them can be done in parallel. There are often many interdependencies, which means it’s frequently best for just one person to work through them in sequence, as if they were a single larger item.
Another consequence is that when a set of items belonging to a larger whole is prioritized and ordered in the backlog, another team member may not be able to pick the next top item because of those interdependencies. This non-obvious selection criterion, and the extra synchronization among team members it requires, can be problematic.
- Doesn’t translate well to bugs and other non-feature tasks
This model doesn’t apply to work without a measurable scope, such as bugs, performance issues and similar exploratory tasks, a limitation shared by most estimation methods. Although it’s sometimes possible to have a clear notion of the effort these tasks need, I’ve come to believe it’s not worth shoehorning them into this kind of estimation. These tasks are either critical or not. If they are, they should be fixed or mitigated, no matter the effort.
Takeaways
Regardless of how you do it, estimation will never be a perfect process. The problem is that nobody likes doing something that feels meaningless or futile.
The estimation problem is a natural consequence of the type of work we do, and we have to accept it. However, we can try to make it work reasonably well. The most important thing is to arrive at a process that satisfies everyone (both team and stakeholders) by producing acceptable and useful results. It’s hard, but not impossibly so.
[1] Unsurprisingly, due to the (provocative) hashtag used to stand for these ideas, many people equate it with doing no estimates at all. In reality, the #NoEstimates proponents generally stress the importance of: (a) reconsidering the processes and other elements in the team’s working context that may render most estimates unnecessary, and (b) estimating as late as possible, and only when sure that the effort will be useful.
[2] Do note that the key phrase here is “rough idea of when”: there are a lot of important nuances to consider when working with release dates and estimates, but that’s beyond the scope of this article. It’s quite toxic (and meme-worthy) to treat estimates as commitments.
[3] There’s a reason for “same-sized” to be in quotes. As we know, there’s no such thing in software, but that’s the rough goal anyway.
[4] The idea here is that we tend to become more confident in our ability to do something when given more time to do it. This can lead to underestimating the amount of work to be done.
[5] I’ve seen many new team members feel they can’t participate in estimation meetings, as they find it hard to set an effort value that’s aligned with the rest of the team.
[6] Here’s how we do it in Trello.