# How to measure anything

*by Douglas W. Hubbard*

- 269 pages
- on Amazon

## Outline

Measuring is not counting: anything that reduces uncertainty counts as measurement.

- The first step of measurement is establishing what needs to be measured. What is most valuable to measure (as determined by the value of information) is usually not easy to measure. If something seems unmeasurable, it can often be decomposed into more measurable components.
- The second step is to determine what precision is required: sometimes a rough range already gives most of the information.
- The next step is to choose a measuring instrument and use it. The book presents plenty, including **calibrated estimates**, *Monte Carlo simulations*, **sampling**, *Bayesian statistics*, a few human-based measurements (*willingness to pay*, *risk tolerance*, *subjective trade-offs*, *Rasch models*, *Lens Model* and *linear models*), and *a few new-technology methods*.

## Brilliant ideas

- To have value, estimates must be **calibrated**: we must determine *how good we are* at estimating by *measuring how good we are at estimating*. (See calibration exercise.)
- Information has value. The **Expected Value of Information** equals the reduction in Expected Opportunity Loss: `EVI = EOL before info - EOL after info`, where `EOL = chance of being wrong x cost of being wrong`.
- The vast majority of variables have an information value of zero (the current level of uncertainty about them is acceptable), so no further measurement is justified. It is usually worth measuring more if the confidence in the new measurement would be higher than the existing threshold.
- Start with research: someone has most likely already thought about and worked on measuring it.
- When sampling, beware of *observer bias* (Heisenberg & Hawthorne bias): observing something changes its behavior. *Expectancy* and *selection* biases also lower the quality of results.
- When sampling randomly, even a very small sample is informative: with just five random samples, there is a 93.75% chance that the population median lies between the smallest and largest values sampled (the "Rule of Five"). Samples also help a lot with calibrated estimates (e.g. if asked to estimate the average weight of a jelly bean, knowing that the first two weigh 1 gram each is good information).
- If information does not exist to confirm or refute a hypothesis, create it through experimentation.
- Scores and weighted scorecards are usually very poor measurement tools. They are often used where actual quantitative measurements would be possible and far better suited. One way to improve them is to calculate the standard deviation of each criterion and adjust each score to make it better calibrated.
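To make the EVI formula concrete, here is a minimal Python sketch for a single approve/reject decision. The project, its 60% chance of success, and the $2M gain / $5M loss are hypothetical numbers chosen for illustration. It computes the EOL of each option, keeps the option with the lowest EOL, and treats perfect information (EOL after info = 0) as the upper bound on what any measurement could be worth.

```python
def eol(p_wrong: float, cost_if_wrong: float) -> float:
    """Expected Opportunity Loss = chance of being wrong x cost of being wrong."""
    return p_wrong * cost_if_wrong


def evi(eol_before: float, eol_after: float) -> float:
    """Expected Value of Information = reduction in Expected Opportunity Loss."""
    return eol_before - eol_after


# Hypothetical decision: a project gains $2M if it succeeds and loses $5M if it fails.
p_success = 0.6
gain, loss = 2_000_000, 5_000_000

# Approving is wrong if the project fails; rejecting is wrong if it would have succeeded.
eol_approve = eol(1 - p_success, loss)  # 0.4 x 5M = 2.0M
eol_reject = eol(p_success, gain)       # 0.6 x 2M = 1.2M

# Without more information, pick the option with the lowest EOL.
eol_before = min(eol_approve, eol_reject)

# Perfect information removes all chance of being wrong, so EOL after = 0.
# Any real (imperfect) measurement is worth at most this much.
max_value_of_measurement = evi(eol_before, 0.0)
print(f"Expected value of perfect information: ${max_value_of_measurement:,.0f}")
```

If a proposed measurement costs more than this ceiling, it cannot pay for itself.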
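The roughly 93% figure for a sample of five comes from simple arithmetic: the median falls outside the sample's range only if all five values land on the same side of it, which happens with probability 2 x 0.5^5 = 1/16, leaving 93.75% (the book's "Rule of Five"). The Python sketch below computes that and checks it by simulation; the skewed exponential population is an arbitrary choice meant to show that the distribution does not need to be symmetric or homogeneous, only that the sample is random.

```python
import random
import statistics


def rule_of_five_probability(n: int = 5) -> float:
    """Chance that the population median lies between the min and max of n random samples.

    The median is outside [min, max] only if all n samples fall on the same
    side of it, which happens with probability 2 x 0.5**n.
    """
    return 1 - 2 * 0.5**n


def simulate_rule_of_five(population, n=5, trials=20_000, seed=0):
    """Empirically estimate the same probability by repeated random sampling."""
    rng = random.Random(seed)
    median = statistics.median(population)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, n)  # random sample without replacement
        if min(sample) <= median <= max(sample):
            hits += 1
    return hits / trials


# Arbitrary skewed population (exponential), to show symmetry is not required.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(10_001)]

print(rule_of_five_probability())         # 0.9375
print(simulate_rule_of_five(population))  # close to 0.9375
```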

## Other Ideas

- Determine whether to measure by asking: *is there any measurement method that can reduce the uncertainty enough to justify the cost of measurement?*
- **Calibration exercise**: estimate a 90% confidence range for a set of questions and check how often the true answer falls inside; the gap between 90% and your actual hit rate shows how well calibrated you are. Look at each bound independently and ask: *am I 95% sure the value is over/under this bound?* The same works with binary tests: judge whether each statement is true or false and give your confidence that you are right. Comparing your average confidence with the fraction you actually got right tells you whether you are under- or overconfident.
- Risk paradox: *if an organization uses risk analysis, it is usually for routine operational decisions, rarely for its riskiest decisions.*
- When starting on a brand-new topic, begin with Wikipedia rather than a Google search.
- Observation: does the information we're searching for leave a trail of any kind? If not, can we observe it directly? If not, can we observe it through proxies? If not, can we "force it" through experiments?
- **Release-recatch** is how animal populations are estimated: catch 1,000 birds, tag them, release them, wait for them to mix back in, then catch 1,000 again. If 50 of the second catch are tagged, the 1,000 tagged birds are 5% of the population, so there are about 20,000 birds. To turn this into a 90% range, compute the standard error of the tagged proportion, `sqrt(p x (1 - p) / n)` (here `p = 0.05`, `n = 1000`), take `p ± 1.645 x standard error` (1.645 being the large-sample t-score), and divide the 1,000 tagged birds by each bound: roughly 16,300 to 25,900 birds.
- Correlation in data is expressed by a number between -1 and 1. A value of 1 means the variables move together in a perfect linear relationship, -1 means a perfect inverse linear relationship, and 0 means there is no linear relationship at all. When plotted on a graph, correlations can be easy to spot.
- Rasch - to be completed
- New methods include using prediction markets or large portions of the internet to gauge opinions.
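The calibration exercise can be scored with a few lines of Python. This is a minimal sketch: the answer tuples are hypothetical, and the two scores (share of 90% ranges containing the true value; average stated confidence minus share of binary answers that were right) simply write out the comparisons described in the exercise.

```python
def interval_hit_rate(answers):
    """Share of 90% ranges that contain the true value.

    answers: list of (lower_bound, upper_bound, true_value) tuples.
    A calibrated estimator should score close to 0.9.
    """
    hits = sum(1 for lower, upper, truth in answers if lower <= truth <= upper)
    return hits / len(answers)


def overconfidence(answers):
    """Average stated confidence minus the fraction of answers that were right.

    answers: list of (confidence, was_correct) tuples for true/false questions.
    Positive means overconfident, negative means underconfident.
    """
    avg_confidence = sum(conf for conf, _ in answers) / len(answers)
    accuracy = sum(1 for _, correct in answers if correct) / len(answers)
    return avg_confidence - accuracy


# Hypothetical results for one person
ranges = [(10, 20, 15), (100, 300, 450), (5, 15, 8), (1_000, 2_000, 2_500), (50, 70, 65)]
binary = [(0.9, True), (0.8, False), (0.7, True), (0.9, False)]

print(f"90% ranges hit: {interval_hit_rate(ranges):.0%}")  # 60%: ranges too narrow
print(f"overconfidence: {overconfidence(binary):+.2f}")
```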
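The release-recatch arithmetic can be sketched in Python. This assumes the normal approximation for the proportion of tagged birds in the second catch (z = 1.645 for a 90% CI) and reuses the bird numbers from the example; the function name `recatch_estimate` is only illustrative, and the sketch assumes the lower bound on the proportion stays above zero.

```python
import math


def recatch_estimate(tagged, second_catch, recaptured, z=1.645):
    """Estimate a population size and a 90% range from a tag-and-recatch study.

    tagged:       animals tagged and released in the first catch
    second_catch: animals caught the second time
    recaptured:   tagged animals found in the second catch
    """
    p = recaptured / second_catch                  # share of the population that is tagged
    estimate = tagged * second_catch / recaptured  # same as tagged / p
    se = math.sqrt(p * (1 - p) / second_catch)     # standard error of the proportion
    p_low, p_high = p - z * se, p + z * se         # 90% CI on the proportion
    # A higher tagged share means a smaller population, so the bounds swap.
    return estimate, tagged / p_high, tagged / p_low


estimate, low, high = recatch_estimate(tagged=1_000, second_catch=1_000, recaptured=50)
print(f"{estimate:,.0f} birds (90% CI: {low:,.0f} to {high:,.0f})")
```

Because the population is `tagged / p`, the range is asymmetric around 20,000 and wider on the high side.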
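A minimal Python sketch of the Pearson correlation coefficient (the usual correlation measure; the notes do not say which one is meant) shows the -1 to 1 scale. The last example is a perfect but non-linear relationship that still scores 0, which is why 0 means "no linear relationship" rather than "no relationship at all".

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [-2, -1, 0, 1, 2]
print(pearson(xs, [2 * x + 1 for x in xs]))  # perfect positive linear relationship: about 1
print(pearson(xs, [-3 * x for x in xs]))     # perfect negative linear relationship: about -1
print(pearson(xs, [x * x for x in xs]))      # y = x^2: 0.0 despite a perfect relationship
```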

### Errors in measurement

- Systematic error: a tendency to have a consistent error in one direction
- Random error: an error that is neither predictable nor consistent
- Accuracy: characteristic of a measurement with low systematic error
- Precision: characteristic of a measurement with low random error
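A short simulation separates the two kinds of error. This Python sketch assumes a hypothetical instrument with a constant bias of +2 (systematic error) and Gaussian noise with a standard deviation of 0.5 (random error); with many readings, the mean error recovers the bias (accuracy) and the spread recovers the noise (precision).

```python
import random
import statistics


def take_readings(true_value, bias, noise_sd, n, seed=0):
    """Simulate n readings from an instrument with a fixed bias and random noise."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]


true_value = 100.0
readings = take_readings(true_value, bias=2.0, noise_sd=0.5, n=10_000)

systematic_error = statistics.fmean(readings) - true_value  # about 2.0: poor accuracy
random_error = statistics.stdev(readings)                   # about 0.5: good precision
print(f"systematic error: {systematic_error:.2f}, random error: {random_error:.2f}")
```

Averaging more readings shrinks the effect of random error but never removes the bias, so accuracy has to be checked against a known reference.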

### Estimates

Estimates usually have value only if the estimator is calibrated (she actually knows what 90% confidence means and has verified that she is right about 90% of the time).

1) Estimates can be given as a range with a confidence indicator: 90% confidence means that you think the actual value has a 95% chance of being higher than the lower bound and a 95% chance of being lower than the upper bound.
2) Binary estimates can be given as a statement plus a confidence indicator (**CI**).

### Simple Monte-Carlo

- Decompose the value to measure into smaller variables
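- For each variable, generate normally distributed random values from its 90% range using Excel: `=NORMINV(RAND(), (upper + lower) / 2, (upper - lower) / 3.29)`, where 3.29 is the number of standard deviations spanned by a 90% confidence interval (2 x 1.645)
- Observe the results and look at the distribution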
- Conclude
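The same procedure can be written in Python instead of Excel. This sketch turns each 90% range into a normal distribution with its mean at the midpoint and a standard deviation equal to the width divided by 3.29 (a 90% interval spans 2 x 1.645 standard deviations), then multiplies the decomposed variables together. The model (per-unit savings times production level), its ranges, and the $400,000 cost threshold are hypothetical.

```python
import random
import statistics

Z_90 = 1.645  # a 90% CI spans 2 x 1.645 = 3.29 standard deviations


def draw_from_90ci(lower, upper, rng):
    """Draw one normally distributed value from a calibrated 90% range."""
    mean = (upper + lower) / 2
    sd = (upper - lower) / (2 * Z_90)
    return rng.gauss(mean, sd)


def simulate_annual_savings(trials=100_000, seed=1):
    """Hypothetical model: savings = (per-unit savings) x (units produced)."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        maintenance = draw_from_90ci(10, 20, rng)    # $ saved per unit
        labor = draw_from_90ci(-2, 8, rng)           # $ per unit, may be negative
        materials = draw_from_90ci(3, 9, rng)        # $ per unit
        units = draw_from_90ci(15_000, 35_000, rng)  # units produced per year
        results.append((maintenance + labor + materials) * units)
    return results


savings = simulate_annual_savings()
cost = 400_000  # hypothetical yearly cost the savings must cover
p_short = sum(s < cost for s in savings) / len(savings)
print(f"mean savings: ${statistics.fmean(savings):,.0f}")
print(f"chance savings fall short of cost: {p_short:.1%}")
```

Looking at the whole distribution of `savings` (e.g. as a histogram), not just its mean, is the "observe the results" step.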

### Assemble an instrument

If the measuring instrument is not obvious:

- Imagine what the consequences would be, pushing the reasoning to the absurd if needed
- How would others do it?
- Iterate
- Just do it.

### T-statistics and samples

Samples are great for reducing uncertainty when it is very large, and they usually help narrow down calibrated estimates.

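To calculate the error margin for a small sample, multiply the standard error (the sample standard deviation divided by the square root of the sample size) by the t-score for a 90% CI that corresponds to the sample size:

| Sample size | t-score |
|---|---|
| 2 | 6.31 |
| 3 | 2.92 |
| 4 | 2.35 |
| 5 | 2.13 |
| 6 | 2.02 |
| 8 | 1.89 |
| 12 | 1.80 |
| 16 | 1.75 |
| 28 | 1.70 |
| More | 1.645 |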
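A minimal Python sketch of that calculation: it stores the t-scores per sample size, using 6.31 for a sample of 2 (the standard value for one degree of freedom), falls back to the nearest smaller tabulated size when the exact size is missing (a slightly wider, conservative interval), and uses 1.645 above 28 like the "More" row. The five sample values are hypothetical.

```python
import math
import statistics

# t-scores for a 90% CI, keyed by sample size
T_SCORES_90 = {2: 6.31, 3: 2.92, 4: 2.35, 5: 2.13, 6: 2.02, 8: 1.89, 12: 1.80, 16: 1.75, 28: 1.70}
LARGE_SAMPLE_T = 1.645  # the "More" row; slightly optimistic just above 28


def t_score_90(n):
    """Look up the 90% t-score for a sample of size n."""
    if n < 2:
        raise ValueError("need at least two samples")
    if n > max(T_SCORES_90):
        return LARGE_SAMPLE_T
    # Largest tabulated size not exceeding n: a wider (conservative) interval.
    size = max(s for s in T_SCORES_90 if s <= n)
    return T_SCORES_90[size]


def ci_90(sample):
    """90% CI for the mean: mean +/- t-score x standard error."""
    n = len(sample)
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    margin = t_score_90(n) * standard_error
    return mean - margin, mean + margin


low, high = ci_90([10, 12, 9, 11, 13])  # hypothetical measurements
print(f"90% CI: {low:.2f} to {high:.2f}")
```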

### Bayesian stats

`P(A|B) = P(A) x P(B|A) / P(B)`

This can be used with calibrated estimates: give calibrated estimates for the terms involving A and B, then combine them into a refined estimate. It also enables "Bayesian inversion": turning *the chance of seeing X if the truth is Y*, which is often easier to estimate, into *the chance that the truth is Y given that I see X*.
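A minimal Python sketch of the update, expanding `P(B)` with the law of total probability so that only the prior `P(A)` and the two likelihoods `P(B|A)` and `P(B|not A)` need to be estimated. The scenario (a 30% prior that a project succeeds, and a pilot that comes out positive 80% of the time for projects that succeed versus 20% for those that fail) is hypothetical.

```python
def bayes_update(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) = P(A) x P(B|A) / P(B)."""
    # Law of total probability: P(B) = P(A)P(B|A) + P(not A)P(B|not A)
    p_b = prior_a * p_b_given_a + (1 - prior_a) * p_b_given_not_a
    return prior_a * p_b_given_a / p_b


# Hypothetical calibrated estimates
prior_success = 0.30      # P(A): the project succeeds
p_pass_if_success = 0.80  # P(B|A): the pilot is positive if the project will succeed
p_pass_if_failure = 0.20  # P(B|not A): the pilot is positive anyway

posterior = bayes_update(prior_success, p_pass_if_success, p_pass_if_failure)
print(f"P(success | positive pilot) = {posterior:.0%}")
```

The positive pilot roughly doubles the estimate, from 30% to about 63%; the inversion goes from `P(B|A)`, which is easier to estimate, to `P(A|B)`, which is what the decision needs.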

### Willingness to pay

Happiness can be measured by establishing correlations between income, life events, and happiness. (A. Oswald did this and calculated that a healthy marriage is equivalent in happiness to an additional $100k per year.)

Measurements just help you make informed decisions; they don't necessarily call for the most cost-effective decision, e.g. you may prefer to give your money to a local business that is more expensive than a big competitor. That is the *art buying* problem.

### Risk tolerance and trade-offs

It is possible to measure trade-offs and acceptable thresholds by measuring several variables using different predetermined setups. The goal is to find points on the "boundary", where the options are equally acceptable.

It is possible to take additional parameters into account by determining several boundaries (e.g. one for $100k investments, then one for $120k investments, …). Points on the same boundary are equally valuable. This can be **very useful when comparing options with different strong points**, such as comparing performance across people.
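Once some points on a boundary have been measured, new options can be checked against it. The Python sketch below assumes a hypothetical boundary for a single investment size, expressed as (chance of losing money, minimum acceptable expected return) pairs elicited from a decision-maker, and interpolates linearly between them; both the numbers and the straight-line interpolation are illustrative assumptions, not a method taken from the book.

```python
from bisect import bisect_left

# Hypothetical boundary points: (chance of losing money, minimum acceptable expected return).
# Every point on the boundary is considered equally acceptable.
BOUNDARY = [(0.00, 0.05), (0.10, 0.15), (0.20, 0.30), (0.30, 0.60)]


def required_return(risk):
    """Minimum acceptable return at a given risk, interpolating linearly between points."""
    risks = [r for r, _ in BOUNDARY]
    if not risks[0] <= risk <= risks[-1]:
        raise ValueError("risk is outside the measured boundary")
    i = bisect_left(risks, risk)
    if risks[i] == risk:
        return BOUNDARY[i][1]
    (r0, ret0), (r1, ret1) = BOUNDARY[i - 1], BOUNDARY[i]
    return ret0 + (ret1 - ret0) * (risk - r0) / (r1 - r0)


def is_acceptable(risk, expected_return):
    """An option on or above the boundary is at least as good as the boundary points."""
    return expected_return >= required_return(risk)


print(is_acceptable(0.15, 0.25))  # True: the boundary requires about 0.225 at this risk
print(is_acceptable(0.15, 0.20))  # False
```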

### Using human judges

A good way to get estimates is to gather people with *know-how* who can naturally estimate things from their experience. Beware of biases:

- anchoring (being influenced by other unrelated numbers)
- halo effect (if somebody favours a solution, she might interpret every new piece of information about it in a positive light); the horns effect is the negative counterpart
- bandwagon effect - we are influenced by other people’s opinions
- emerging preferences - post-rationalizing judgement criteria because we already like a solution