“A flaw in human judgement”

by Daniel Kahneman, Olivier Sibony, Cass Sunstein

I enjoyed learning about the concept. However, if you plan to read this book, I would advise starting with the last chapter, which gives 80% of the concept, then reading the rest as a deep dive into the details. It would warrant a 4/5, but the unnecessary length and the reversed structure reduce it by .25. AND YES, I’m conscious this is a noisy rating, and I’m fine with it. Take that!

Outline

Human judgement is very unpredictable and produces poor-quality results. This needs to be considered when designing decision systems, so that the necessary actions are taken to reduce that noise and produce more reliable, optimal, and fair decisions.

Summary

Note: the last chapter of the book is an excellent summary.

Noise in the context of decision making refers to the unpredictable variability of the outcome. It is different from bias, in that bias can be predicted based on past judgements. Noise is bad because it reduces accuracy, efficiency, and fairness.

Judgement refers to using the human mind as a tool to come up with a decision. Humans largely overestimate the quality of their decisions, even when those decisions are made by people considered experts. Predictive judgements in particular are exceptionally noisy and produce very low-quality results (.55 success on average, so barely better than random).

wherever there is judgment, there is noise, and more of it than you think.

Noisy decisions don’t cancel out: while a judgement with negative noise and one with positive noise may average out to the desirable value, they in fact produce two undesirable outcomes.

if half the managers who make hiring decisions are biased against women and half are biased in their favor, there will be no overall bias, but system noise will cause many hiring errors.

There are multiple ways of reducing noise:

  1. Replace judgements with rules (or algorithms) and standards (a looser version of rules)
  2. Aggregate judgements from different judges (wisdom of crowds)
  3. Structure decisions into independent tasks
  4. Use pre-defined, standardized matching scales (define what each step of the scale means), or ranking instead of rating

Reading notes

wherever there is judgment, there is noise, and more of it than you think.

Noise is unpredictability in a phenomenon. Judgment noise occurs when the same question, answered by different people or by the same person on different days, produces different, unpredictable outcomes. It is different from bias because the cause of the variability is not in the question itself but in the state of the person answering it, e.g. mood, proximity, weather, personality, etc. Different studies on judges have shown that, independent of bias, noise results in vastly different outcomes for the same judgment.

Noise is the unwanted variability of judgments

we can be sure that there is error if judgments vary for no good reason

Judgement in this context refers to using the human mind as the tool to come up with a decision.

  • Some are predictive and their accuracy can be measured (e.g. employee performance as a result of a selection process, or weather forecasting)
  • Some are evaluative and their accuracy is much harder to determine, but noise can still be measured (e.g. sentencing, promotion evaluation).

Noise is different from bias in that it is unpredictable. People intuitively understand and dislike bias but not noise, even though their consequences are similar and additive. Noise and bias are measured using the mean squared error (MSE), which is literally the mean of the squares of the errors.
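
To make the "similar and additive" point concrete, here is a minimal sketch. It assumes bias is taken as the mean of the errors and noise as their standard deviation, in which case MSE is exactly bias squared plus noise squared. The function name and the numbers are made up for illustration.

```python
from statistics import mean, pstdev

def error_decomposition(judgments, true_value):
    """Split the error of a set of judgments into MSE, bias and noise.

    Assumption for this sketch: bias is the mean error, and noise is the
    (population) standard deviation of the errors.
    """
    errors = [j - true_value for j in judgments]
    mse = mean(e * e for e in errors)   # mean of the squared errors
    bias = mean(errors)                 # shared, predictable part of the error
    noise = pstdev(errors)              # scatter of the errors around the bias
    return mse, bias, noise

# Hypothetical estimates from four judges, for a case whose true value is 10.
judgments = [8, 12, 14, 18]
mse, bias, noise = error_decomposition(judgments, true_value=10)

# MSE = bias^2 + noise^2, here 22 = 9 + 13
print(mse, bias ** 2 + noise ** 2)
```

Because both terms are squared, a unit of noise costs as much as a unit of bias, and errors in opposite directions add up rather than cancel.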

Crucially, the errors that come out of noise do not cancel each other out. While the average of all judgements can match the desired state, the individual noisy judgements are still undesirable. E.g. the average sentence across two theft cases is 10y in prison, but one individual gets 0 and the other 20: neither outcome is desirable.

The large role of noise in error contradicts a commonly held belief that random errors do not matter, because they “cancel out.” This belief is wrong.

if half the managers who make hiring decisions are biased against women and half are biased in their favor, there will be no overall bias, but system noise will cause many hiring errors.

Different types of noise:

  • Level noise: when judges tend to provide judgements that are calibrated differently, e.g. one tends to overestimate whereas another underestimates (note: in this case we can consider each judge to be biased, but the result is a noisy system)
  • System noise: when the system uses interchangeable professionals, and as a whole produces unpredictable judgements.
    • Stable pattern noise: reflects the uniqueness of judges: when individuals in the system produce different decisions for the same case.
    • Occasion noise: the same judge might not give the same judgement on two occasions. For example, when something in particular influenced the decision, e.g. you heard bad news, your team lost, you’re tired, …
  • One-time noise: not really its own type of noise: decisions that are made once, like going to war, are non-systemic, but the decision can be considered as a one-time system.
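
A rough sketch of how these components could be measured in a noise audit, where several judges rate the same cases. It assumes the decomposition, as I understood it from the book, in which system noise squared is level noise squared plus pattern noise squared. A single audit like this cannot separate stable pattern noise from occasion noise, so both end up in the pattern term. The function name and the ratings are hypothetical.

```python
from statistics import mean, pvariance

def noise_audit(ratings):
    """ratings[j][c] is the judgment of judge j on case c (every judge rates every case).

    Returns the squared system, level and pattern noise (variances).
    """
    judges = range(len(ratings))
    cases = range(len(ratings[0]))
    grand = mean(x for row in ratings for x in row)
    judge_means = [mean(row) for row in ratings]
    case_means = [mean(ratings[j][c] for j in judges) for c in cases]

    # System noise: variability across judges on the same case, averaged over cases.
    system_var = mean(pvariance([ratings[j][c] for j in judges]) for c in cases)
    # Level noise: variability of the judges' average level (severe vs lenient).
    level_var = pvariance(judge_means)
    # Pattern noise: what remains once the judge level and the case level are removed.
    pattern_var = mean(
        (ratings[j][c] - judge_means[j] - case_means[c] + grand) ** 2
        for j in judges
        for c in cases
    )
    return system_var, level_var, pattern_var

# Hypothetical audit: 3 judges, 3 cases.
ratings = [
    [2, 4, 6],
    [4, 6, 8],  # same pattern as the first judge, just harsher: pure level noise
    [3, 8, 4],  # reacts differently to individual cases: pattern noise
]
system_var, level_var, pattern_var = noise_audit(ratings)
print(system_var, level_var + pattern_var)  # the two values should match
```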

Noise occurs in both regular decisions (insurance adjusters) and one-time decisions (buying a house, going to war).

Predictive judgements (hiring, …) tend to be extremely noisy. Because of noise, the success of recruits after 2y is .55, which is barely better than a coin toss. Models and algorithms, even simple ones based on very few parameters, usually outperform human judgement because they are noise-free. These models are often distrusted because they are not 100% accurate, despite being better than humans.

People are much more confident in their predictive skills than reality warrants; they’re oblivious to noise. We are heavily invested in reducing biases, but we should also be invested in reducing noise. It’s much easier to perceive bias; noise requires a statistical view of the world.

There is a limit to the accuracy of our predictions, and this limit is often quite low. Nevertheless, we are generally comfortable with our judgments.

Most of the time, professionals have confidence in their own judgment. They expect that colleagues would agree with them, and they never find out whether they actually do

Reducing noise is important to improve accuracy and fairness

The goal of judgment is accuracy, not individual expression

Bias leads to errors and unfairness. Noise does too

Professional performance evaluations are exceptionally noisy. It is impossible that all employees are in the top 50%. It is possible though that everyone meets expectations.

Fixing noise

rules: replacing judgement and decision making with rules and algorithms is the most effective way of reducing noise. By definition, algorithms are noise-free.

Crowd wisdom (the larger the crowd the better) - ask for a second opinion and average them out (though social influence makes it worse). The average of individual judgements tends to be less noisy. Note: this does not contradict the idea that noise doesn’t cancel out, because here several noisy judgements are aggregated into a single one. The concept of not cancelling out applies to several separate outcomes.
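
A small simulation sketch of why averaging helps. It assumes judges are unbiased and fully independent, which is exactly the condition that social influence breaks. Under that assumption the noise of the average shrinks with the square root of the crowd size. All names and numbers are illustrative.

```python
import random
from statistics import mean, pstdev

def crowd_noise(crowd_size, trials=2000, true_value=100.0, judge_sd=20.0, seed=0):
    """Estimate the noise (standard deviation) of a crowd's averaged judgment.

    Each judge is simulated as the true value plus independent Gaussian noise.
    """
    rng = random.Random(seed)
    averages = [
        mean(rng.gauss(true_value, judge_sd) for _ in range(crowd_size))
        for _ in range(trials)
    ]
    return pstdev(averages)

solo = crowd_noise(1)    # one judge alone: noise close to judge_sd
crowd = crowd_noise(25)  # average of 25 judges: roughly judge_sd / 5
print(solo, crowd)
```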

Crowd within (ask yourself a week later) is equivalent to 1/3 of a second opinion

Dialectical estimate - provide a second judgment but force yourself to provide a different answer, then average. Equivalent to 1/2 of a second opinion

qualifying scales (adjectives) rather than number scales give less noise. Number scales work when the judges have anchors (data about past judgements)

better judges: use experts who have a proven record of good decisions and are respected. There’s a high correlation between GMA (general mental ability) and decision outcomes. Even among people with high GMA, the highest still produce less noise than those slightly lower. To reduce noise, select someone based on intellect. However, determining intellect is hard.

A proxy is cognitive style, measured by the CRT (cognitive reflection test). The CRT measures whether people are more prone to critical thinking and slow reflection than to intuition, which seems to be correlated with better decisions. Such people tend to seek more opinions and see changing their own as the sign of a strong mind.

Counterintuitively, fast thinkers who can make decisions on their feet inspire more confidence and tend to be selected, whereas slower-thinking personalities tend to provide better results in the long term.

Debias: it’s easier to recognize bias and noise in others, so have someone observe the decision process for bias.

second opinion: experts provide noisy opinions. Ask for several. Don’t give them the prior judgement, or it will bias the new one.

minimum context: giving too much information to experts might bias them.

rate by matching: don’t just ask for a number, because everyone’s scale is different. Give examples for each step of the scale for people to match against.

structure into independent tasks: don’t ask for a single rating/ranking, but for multiple, then aggregate.

Structured interviews work better than traditional ones.

separate criteria when judging: avoid judging other criteria when judging a specific one.

rankings instead of ratings: comparing 2 by 2 is less noisy than rating/matching.

standardized scales: explain what each level means

act as an outside observer: judge an event as a member of a class of judgements rather than as a singular case.

evaluate independently - some amount of individual work before team work reduces noise. Similarly, for noisy decisions, collect judgments individually before making them known to others, to avoid halo bias

estimate, talk, estimate: give an estimate independently, talk, then give another estimate and aggregate

Against reducing noise

In contexts where diversity of opinion and taste is desirable, noise is a feature (e.g. food critics, movie reviews, …)

it can be costly to reduce noise, both in terms of implementation cost and in results. Rules aren’t perfect, and arbitration might be necessary, but it adds noise. Usually rules are still better than human judgement alone because they are more predictable.

Noise can also be helpful, for example when associated with punishment (the uncertainty can be a deterrent)

Mercy is noise but still a quality

Too many rules can make employees feel like cogs in a machine

Rules vs standards

Standards are a loose implementation of rules. They allow human judgement within a framework. However, they’re also more noisy. E.g. “drive at a safe speed” rather than “the limit is 50 kph”.

Rules are problematic because they might create loopholes, or circumstances where the result is clearly wrong. Standards are looser, which allows for a portion of judgement, but they are also more noisy. Rules can allow a last say by a human: this is how mercy works. However, this adds a lot of noise. Another alternative is to review rules often to make sure they stay adequate, and to add processes to change the rules rather than produce one-off decisions.

Implement standards when there’s trust in non-noisy judges. Review standards often. Have stronger standards or rules where there is specific noise. Give examples (e.g. this is reasonable, this is not).

Adapt the rules/standards balance based on how risk-prone the person being judged is (e.g. when people take advantage of loopholes in rules, use standards to cover those holes)


About Reading Notes

These are my takes on this book. See other reading notes. Most of the time I stop taking notes on books I don't enjoy, and these end up not being in the list. This is why average ratings tend to be high.