Thursday, 17 June 2010

EWT22 - How to think about science/Something Fishy

At EWT22, Michael Bolton moderated a new type of session – listening to an audio recording “How To Think About Science” and finding parallels to software testing.

Background to the interview:
"On July 3, 1992, the Canadian Fisheries Minister John Crosbie announced a moratorium on the fishing of northern cod. It was the largest single day lay-off in Canadian history: 30,000 people unemployed at a stroke. The ban was expected to last for two years, after which, it was hoped, the fishery could resume. But the cod have never recovered, and more than 15 years later the moratorium remains in effect. How could a fishery that had been for years under apparently careful scientific management just collapse? David Cayley talks to environmental philosopher Dean Bavington about the role of science in the rise and fall of the cod fishery."

Listen to the recording. Take notes on what you’re hearing, with the goal of capturing lessons from the interview that might usefully influence how we test, and how we think about testing. Stories or patterns from your own testing experience that you can relate to something in the interview are especially valuable.

My experience tells me that it's good to take lots of notes, regardless of whether I'm reviewing documents or actively testing. I fully expect to discard 90% of them, but they can serve to confirm or discard ideas, theories and guesses.
So I listened to the interview and made notes about important events and their order: my raw data. From time to time I'd stop and listen to a passage again, both to get a clearer understanding and to keep up with my note writing.

Michael clarified some of the terms in Skype, always shortly before I got to the passage that used them. That was helpful on two fronts. One, I understood the term and its context quicker. And two, I could tell that I was falling behind in my listening, as the gap between Michael explaining a term and me reaching the passage grew longer and longer. That's a good way to measure progress, though a fallible one.

About two-thirds in, I changed my approach as I felt that I had enough data and because time was pressing. I started drawing parallels between the fishery events and software testing and noted those as well. Adding to the raw data suffered as a result, but that was fine as I was working towards the mission: drawing parallels and matching them to my experience. I concentrated on the similarities in process and process failings rather than thinking of 'fishing for bugs' or comparing the government quotas to test certification, as some other people in the group did. Nothing wrong with either, I just took this particular approach.

Listening to the audio recording, I noted several instances of Hubris as a reason for the monumental failure to preserve the cod population.
From Wiki: Hubris means extreme haughtiness or arrogance. Hubris often indicates being out of touch with reality and overestimating one's own competence or capabilities, especially for people in positions of power.

I reckon that is why the radio program was called "How to think about science": it addresses the idea that, while science does not have all the answers, the ones it does have can be taken as the truth.

That's only one part of it though. I don't want to make the same mistake and believe that one reason alone can explain failures on many fronts.

Since I'm currently reading Jerry Weinberg's book "The Psychology of Computer Programming" (thank you, Jerry, for giving me permission to use parts of this book), I saw some interesting similarities between his study of the psychology of computer programming and the events that led to the disappearance of the Canadian cod. Here's Jerry's list from page 40 of his book.

1. Using introspection without validation
2. Using observations on an overly narrow base
3. Observing the wrong variables, or failing to observe the right ones
4. Interfering with the observed phenomenon
5. Generating too much data and not enough information
6. Overly constraining experiments
7. Depending excessively on trainees as subjects
8. Failing to study group effects and group behaviour
9. Measuring what is simple to measure
10. Using unjustified precision
11. Transferring borrowed results to inapplicable situations

Matching my notes to this list, I got the following results. Some of the matches are debatable; however, the overall similarity should be clear:

1a. The key driver was to manage natural fluctuations in order to get back the money that was invested in the industry. The fluctuations turned out to be unmanageable, and there was no driver to understand the model in detail.
1b. Introducing averaging assumptions was one of the traps. The issue was not understood, and two wildly different results were simply averaged with no thought for the consequences. One example of Hubris.
2. The model for the amount of cod in the sea was far too simple and therefore wrong.
3a. Not all parameters of the model had been identified; for example, parameter changes over time had been missed.
3b. Some parameters had been misunderstood. In my book, these two points were the biggest mistakes made. Science was not seen as fallible; instead, a blind eye was turned to the possibility that the current understanding did not fully cover the situation in question. See the definition of Hubris again.
4. There was constant interference with the model, as fishing continued throughout the years up to the point where the cod population didn't recover.
5. Data came from two sources, the scientific trawlers and the commercial fleet. They only seemed to look at total population, not age groups, etc.
6. The local knowledge of inshore fishermen (the “domain matter experts”) was ignored or dismissed as anecdotal, because scientific methods were seen as more accurate and objective. Another example of Hubris.
7. Not applicable in this scenario?
8. There were several examples of failing to recognise group behaviour, for example variation in location of spawning grounds and season, migration patterns, etc
9. Parameters found were dismissed as complicating the model too much (and making it unmanageable)
10. Not applicable in this scenario?
11. There were several examples of borrowing results from one area and applying them to another, for example assuming that inshore groups of cod behave the same as offshore groups.
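
The averaging trap from point 1b translates directly into testing, for example when two measurements of the same thing disagree wildly and only their mean gets reported. Here is a minimal sketch of the idea in Python. The function name, the threshold and all the figures are my own made-up assumptions for illustration; none of them come from the interview.

```python
from statistics import mean

def combine_estimates(estimates, max_relative_spread=0.25):
    """Average estimates, but also report how far apart they are.

    Assumes all estimates are positive numbers. The 25% threshold is an
    arbitrary, hypothetical choice for this illustration.
    """
    avg = mean(estimates)
    # Spread relative to the average: 0 means perfect agreement.
    spread = (max(estimates) - min(estimates)) / avg
    return {
        "average": avg,
        "relative_spread": spread,
        "trustworthy": spread <= max_relative_spread,
    }

# Hypothetical stock estimates from two sources (NOT real data).
research_survey = 400_000
commercial_catch_rate = 1_200_000

result = combine_estimates([research_survey, commercial_catch_rate])
print(result)
```

The average on its own looks like a perfectly reasonable number. Only the spread shows that the two sources contradict each other, which is a prompt to investigate why rather than to average.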

Some failings that I can't put into any of the categories but which are very important as well are:

A. The inshore fishermen’s protests were not looked at in detail. They wanted to fish in the traditional way, but the reasons for that were not investigated. Scientists were supposed to speak truth. I'm wondering if, with a bit more questioning, some of the mistakes might have been prevented, for example damaging and destroying stock using the jigger (hook).
B. Investing in tools to make fishing cheaper became a driver to make more money, which backfired. The parallel to automated software testing tools should be easy to see.
C. The fishing industry put pressure on the government to get higher allowable quotas. Available information was ignored and numbers were changed against better knowledge to serve the industry's need for money.
D. As the model got very complicated, it no longer helped management because it didn't provide the control measures that were needed. In software terms, this is the point where a project is either stopped or a drawn-out death march begins.
E. The target of management changed from figuring out how to achieve the goal to targeting people, the fishermen. Sounds oddly familiar, doesn't it?

I find it quite amazing how two apparently unrelated things, a book about the psychology of computer programming and a discussion of the shortfalls of science in a fishery, are so similar at a higher level. I have a good number of other examples in mind, some of which I'll blog about shortly.

I'm in two minds about the message that science got it all wrong. I'd summarise it as: the scientists (bringing the people component in here) probably did their best but didn't look at their limitations. The results of their experiments were also disregarded, and decisions were taken based on politics and the industry's needs/greed.

I come from a science background myself, so I may have a blind spot there. My overall approach (scientific thinking) is to identify a problem, analyse it, and put up some theories along with experiments or tests that then either prove or disprove these theories. In other words, I'm building as close a picture or model of the issue in question as possible. Any step leading up to the results can be criticised; the result cannot. The thinking behind that is that if you agree with the reasoning behind all the steps, the result is independent. You don't get to argue with it if you don't like it. The result can be a pointer that something is wrong with your thinking, but it's dangerous to build your thinking on the results you want. That's a different story, though, and far too involved to be discussed here.
I'd be happy to discuss these results in more detail so please feel free to agree or, even better, disagree with me.

I should note that these findings are based on a single source, the radio interview. What actually happened might or might not be exactly as portrayed, but I think it's useful to keep in mind that in testing, as in life, it's dangerous to believe in only one source without questioning it. Or would you believe a developer who tells you that a bug has been fixed and doesn't need to be re-tested?

PS: I would also like to point you towards the excellent blog of Jeroen Rosink, who looks at this session from a slightly different angle.