
Charles Goodhart stated: "Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." In other words, "When a measure becomes a target, it ceases to be a good measure."
For many years, I have argued against the poor set of metrics we use in IT. Recently, I came across a reference to Goodhart's Law, which explains our problem. Proposed in 1975 by Charles Goodhart—a former advisor to the Bank of England and emeritus professor at the London School of Economics—the law states that once a social or economic indicator is made a target for the purpose of guiding policy, then it will lose the information content that originally made it useful. Goodhart wrote, "Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." [1] Professor Marilyn Strathern has restated Goodhart's Law more succinctly and more generally: "When a measure becomes a target, it ceases to be a good measure." [2]
We often see this in the world of IT metrics. Management chooses a metric in an attempt to understand the behavior of a system or process. Later, this same metric becomes a goal ("Attain this value or else"). When this occurs, the "or else" part motivates people to change their behavior to achieve the goal.
A classic metric of this type is "lines of code written per day." When used as a measure, this metric can be valuable for estimating. However, when LOC/day becomes a goal, developers may be enticed to write more lines of less efficient code. For example, a classic Java "for" loop has the generic form:
for (initialization; termination; increment) {
    statement(s)
}
An example of this loop is:
for (int i=1; i<11; i++) { System.out.println("Count is: " + i); }
If writing more lines of code is rewarded (and thus becomes my personal goal), I can write the equivalent Java as:
int i=1;
while (i<11) {
    System.out.println("Count is: " + i);
    i++;
}
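To make the point concrete, here is a minimal sketch that applies a naive line-counting metric to the two snippets above and confirms that both loops do the same work. The class name LocGaming, the countLines helper, and the rule "every non-blank line counts" are my own assumptions for illustration, not a standard LOC definition.

```java
import java.util.ArrayList;
import java.util.List;

public class LocGaming {
    // The two snippets from the text, held as source strings so they can be measured.
    static final String COMPACT =
        "for(int i=1; i<11; i++){System.out.println(\"Count is: \" + i);}";
    static final String PADDED = String.join("\n",
        "int i=1;",
        "while (i<11) {",
        "System.out.println(\"Count is: \" + i);",
        "i++;",
        "}");

    // Naive LOC metric (an assumption): every non-blank line counts as one line of code.
    static long countLines(String source) {
        return source.lines().filter(line -> !line.isBlank()).count();
    }

    // Behavior of the compact "for" version, collected instead of printed.
    static List<String> withForLoop() {
        List<String> out = new ArrayList<>();
        for (int i = 1; i < 11; i++) { out.add("Count is: " + i); }
        return out;
    }

    // Behavior of the longer "while" version, collected instead of printed.
    static List<String> withWhileLoop() {
        List<String> out = new ArrayList<>();
        int i = 1;
        while (i < 11) {
            out.add("Count is: " + i);
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("for loop LOC:   " + countLines(COMPACT));
        System.out.println("while loop LOC: " + countLines(PADDED));
        System.out.println("same behavior:  " + withForLoop().equals(withWhileLoop()));
    }
}
```

Under this counting rule the while version scores five times higher than the for version, yet the two produce identical output. The metric rewards the rewrite without any gain in functionality.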
In the early days of the object-oriented paradigm, a number of metrics were proposed as indicators of the quality of system design. One of these was the subclass:superclass ratio, sometimes called the specialization ratio. The specialization ratio measures the extent to which a superclass has captured an abstract idea. A large value indicates a high degree of reuse by the subclasses. For structures using single inheritance, the values range from 1 to ∞. Values close to 1 suggest a poor design, since they indicate deep, linear inheritance trees. Various standard class libraries have specialization ratios ranging from 1 to 4 [3].
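As a rough illustration of how such a ratio might be computed, the sketch below takes the number of classes that have a parent and divides it by the number of distinct classes that act as a parent. That reading of "subclass:superclass ratio" is my assumption, and the class name SpecializationRatio and the tiny hierarchies are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SpecializationRatio {
    // parentOf maps each subclass name to its direct superclass name
    // (single inheritance, so every subclass has exactly one entry).
    static double ratio(Map<String, String> parentOf) {
        if (parentOf.isEmpty()) {
            throw new IllegalArgumentException("no inheritance relationships to measure");
        }
        int subclasses = parentOf.size();
        Set<String> superclasses = new HashSet<>(parentOf.values());
        return (double) subclasses / superclasses.size();
    }

    public static void main(String[] args) {
        // Deep, linear tree: A <- B <- C <- D
        Map<String, String> chain = Map.of("B", "A", "C", "B", "D", "C");
        // Shallow, wide tree: B, C, and D all extend A
        Map<String, String> fan = Map.of("B", "A", "C", "A", "D", "A");

        System.out.println(ratio(chain)); // prints 1.0
        System.out.println(ratio(fan));   // prints 3.0
    }
}
```

The linear chain has three subclasses and three superclasses, giving the minimum value of 1. The wide tree has three subclasses sharing one superclass, giving 3.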
At one company I know, to facilitate the evaluation of object-oriented designs by people who knew nothing about what made a good design, not only was the specialization ratio measured, but a goal was set—the specialization ratio had to be 3 or above. Designs not meeting that goal were sent back for rework. Developers, not liking rework, simply added cleverly disguised empty classes until the ratio was met.
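The arithmetic behind this kind of gaming is simple. The sketch below assumes the ratio is the subclass count divided by the superclass count, and that each padding class is an empty leaf added under an existing superclass, so the superclass count does not change. The class name RatioPadding, the method name, and the sample counts are hypothetical.

```java
public class RatioPadding {
    // Smallest number k of empty leaf subclasses such that
    // (subclasses + k) / superclasses >= target.
    static int emptyClassesNeeded(int subclasses, int superclasses, double target) {
        if (superclasses <= 0) {
            throw new IllegalArgumentException("design must contain at least one superclass");
        }
        int needed = (int) Math.ceil(target * superclasses - subclasses);
        return Math.max(0, needed);
    }

    public static void main(String[] args) {
        // Hypothetical design: 10 subclasses under 6 superclasses, ratio about 1.67.
        int k = emptyClassesNeeded(10, 6, 3.0);
        System.out.println("empty classes to add: " + k);              // prints 8
        System.out.println("ratio after padding: " + (10 + k) / 6.0);  // prints 3.0
    }
}
```

Eight do-nothing classes move this design from failing to passing, and the design itself is no better than before.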
In other cases, rewarding testers for the number of test cases resulted in many poorly written test cases; rewarding testers for the number of bugs they found resulted in a high number of unimportant or duplicate bugs reported; and penalizing testers for bugs rejected by the development staff resulted in important bugs going unreported.
It’s not always about the design or code. Another example is a metric for the age of trouble tickets (the length of time a ticket was open before being resolved). The organization set a goal of x days or less for a ticket to be open. Since many tickets did not close within the established goal, the service desk manager met the goal by closing and then reopening the tickets, resetting the clock and thus faking the numbers. However, management got what they seemed to want: a report showing that everything was fine.
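The trouble-ticket trick works only if age is measured from the most recent open event. Here is a minimal sketch of the difference, assuming a hypothetical Ticket model (the class, its fields, and the dates are invented for illustration and do not describe any particular service desk tool).

```java
import java.time.Duration;
import java.time.Instant;

public class Ticket {
    private final Instant firstOpened; // when the problem was first reported
    private Instant lastOpened;        // reset every time the ticket is reopened

    Ticket(Instant openedAt) {
        this.firstOpened = openedAt;
        this.lastOpened = openedAt;
    }

    // Closing and immediately reopening resets the clock used by the report.
    void closeAndReopen(Instant at) {
        this.lastOpened = at;
    }

    // Age as the gamed report sees it: time since the most recent open.
    long reportedAgeDays(Instant now) {
        return Duration.between(lastOpened, now).toDays();
    }

    // Age as the customer experiences it: time since the first report.
    long trueAgeDays(Instant now) {
        return Duration.between(firstOpened, now).toDays();
    }

    public static void main(String[] args) {
        Ticket ticket = new Ticket(Instant.parse("2011-09-01T00:00:00Z"));
        ticket.closeAndReopen(Instant.parse("2011-09-20T00:00:00Z"));

        Instant now = Instant.parse("2011-09-21T00:00:00Z");
        System.out.println("reported age: " + ticket.reportedAgeDays(now) + " day(s)"); // 1
        System.out.println("true age:     " + ticket.trueAgeDays(now) + " day(s)");     // 20
    }
}
```

Measuring from the first report makes this particular reset ineffective, although any single number that becomes a target will invite some other workaround.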
Goodhart's Law reminds us that connecting rewards and punishments to the achievement of specific goals can create unintended consequences. Some will strive to reach those numbers without concern for anything else. If the person being measured is affected by the outcome, she is likely either to lie, thus subverting the usefulness of the measurement, or to focus on what is being measured without regard for the consequences.
References
[1] Goodhart, C.A.E. "Monetary Relationships: A View from Threadneedle Street" in Papers in Monetary Economics Volume I, Reserve Bank of Australia, 1975.
[2] Strathern, Marilyn. "'Improving Ratings': Audit in the British University System" in European Review [2], 1997.
[3] Henderson-Sellers, Brian. Object-Oriented Metrics: Measures of Complexity [3]. Prentice Hall, 1996.
Links:
[1] http://test.techwell.com/members/leecopeland
[2] http://journals.cambridge.org/abstract_S1062798700002660
[3] http://www.stickyminds.com/s.asp?F=S161_BOOK_4
[4] http://test.techwell.com/sites/default/files/articles/Goodharts_Law_Technically_Speaking_0.pdf
[5] http://test.techwell.com/category/issue/better-software-magazine
[6] http://test.techwell.com/category/issue/better-software-magazine/volume-13-2011
[7] http://test.techwell.com/category/issue/better-software-magazine/volume-13-2011/issue-5-septoct