Wednesday 25 November 2009

Mining The Source Code

In the last post we saw that accusers are willing to quote-mine the released CRU emails, selectively taking a choice phrase at face value while missing the preceding and following context in the longer email.

Now we will see them doing something similar with some of the released CRU source code. The released files included code for parts of CRU's surface temperature record and for some proxy work. No climate model source code was released as far as I know, although that hasn't stopped many of the accusers assuming it was, presumably because they either confuse, or simply don't know, the difference between temperature records and climate models.

This post concerns an accusation which has now spread far and wide across the internet.

Here is one example:

http://esr.ibiblio.org/?p=1447

Here's the code and comments in question:

;
; Apply a VERY ARTIFICAL correction for decline!!
;
yrloc=[1400,findgen(19)*5.+1904]
valadj=[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,$
2.6,2.6,2.6]*0.75 ; fudge factor
if n_elements(yrloc) ne n_elements(valadj) then message,'Oooops!'
;
yearlyadj=interpol(valadj,yrloc,timey)

The accusers point to the phrases "very artificial" and "fudge factor", and to the nature of what is being done.

yrloc is assigned a 20-element array: the first value is 1400, the second is 1904, and the rest increase in steps of 5 up to 1994, i.e. 1400, 1904, 1909, 1914, ... 1994. They are obviously years.

valadj is another 20-element array; you can see the values it is assigned above, on the line marked "fudge factor". The 'Oooops!' message is displayed if the numbers of elements in the yrloc and valadj arrays differ. According to the code they never should, so this line was probably added as a first-pass safety check and simply never removed.
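
For anyone who wants to poke at this themselves, here is a minimal Python sketch (not CRU's code, just my translation of those two IDL lines for illustration) that builds the same two arrays and performs the same length check:

import numpy as np

# yrloc: 1400 followed by 1904, 1909, ..., 1994 (the IDL findgen(19)*5.+1904)
yrloc = np.concatenate(([1400.0], np.arange(19) * 5.0 + 1904.0))

# valadj: the "fudge factor" values, scaled by 0.75 as in the IDL
valadj = np.array([0., 0., 0., 0., 0., -0.1, -0.25, -0.3, 0., -0.1,
                   0.3, 0.8, 1.2, 1.7, 2.5, 2.6, 2.6, 2.6, 2.6, 2.6]) * 0.75

# Equivalent of the IDL 'Oooops!' guard: both arrays must have 20 elements
assert len(yrloc) == len(valadj) == 20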

yearlyadj=interpol(valadj,yrloc,timey)

I have to guess a little here (I don't know IDL), but I think this produces an array, yearlyadj, holding an adjustment value for every year since 1400, derived by interpolating the valadj values (which are defined at the years in yrloc) across the years in timey.
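
If that guess is right, the IDL interpol call behaves much like NumPy's interp. Continuing the Python sketch above, and assuming purely for illustration that timey simply runs from 1400 to 1994 (its real contents come from elsewhere in the CRU code):

timey = np.arange(1400, 1995)                 # assumed span: one entry per year, 1400..1994
yearlyadj = np.interp(timey, yrloc, valadj)   # valadj, known at the years in yrloc, spread onto every year

print(yearlyadj[0], yearlyadj[-1])            # roughly 0.0 for 1400 and 1.95 (i.e. 2.6 * 0.75) for 1994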

Despite so many accusers citing this snippet of code, they amazingly fail to mention (or perhaps notice?) that directly following it is:

;filter_cru,5.,/nan,tsin=yyy+yearlyadj,tslow=tslow
;oplot,timey,tslow,thick=5,color=20
;
filter_cru,5.,/nan,tsin=yyy,tslow=tslow
oplot,timey,tslow,thick=5,color=21

The top line contains yyy+yearlyadj. This is the only place where the previously created adjustment array is used. I presume (again, I don't know IDL) that yyy contains each year's temperature data and that this line adds the adjustments to that data before plotting. But notice the semicolon at the start of that line: it marks a comment in IDL, so the line is commented out, deactivated. The lines that actually run make no use of yearlyadj and therefore do not apply the adjustment; they only smooth and plot yyy.
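
To make that concrete, here is the same control flow in the Python sketch. filter_cru is CRU's own smoothing routine and its code isn't examined here, so a crude running mean stands in for it, and yyy is a placeholder series rather than real temperature data:

def running_mean(series, window=5):
    # crude stand-in for filter_cru's smoothing; not the real routine
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="same")

yyy = np.zeros_like(timey, dtype=float)       # placeholder for the temperature series

# tslow = running_mean(yyy + yearlyadj)       # the commented-out line: adjustment applied
tslow = running_mean(yyy)                     # the active line: no adjustment applied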

Of course it would be trivial to switch the comments around and activate the adjustment, but since the accusers are relying on a face-value reading of the source code, they are bound by that reading too: at face value, the adjustment is switched off.

They haven't even shown that their quoted adjustment was used, let alone what its purpose is. A proper analysis would require knowing what the adjustment was based on (it clearly isn't arbitrary), why it was done (perhaps nothing more than an experiment), and, not to forget, whether it was used at all in published results.

It's not difficult for me to point out why the accusations of fraud are misplaced. All I have to do is point out that they have insufficient evidence. Come back with better, if you can. I am surprised they haven't picked up on the misspelt "artifical"; surely that beggars belief: true scientists wouldn't spell words wrong! Quick, to the blogs!

Isn't it telling that some of the same people who demand so much evidence when faced with the science behind manmade global warming are remarkably relaxed about making accusations of fraud on such a dearth of evidence?

11 comments:

  1. 'I am surprised they haven't picked up on the misspelt "artifical" ....'

    Eric Raymond did, the universe has been restored to its proper order :)

  2. People on both sides are getting confused by the details. It doesn't matter what the code looks like or says, it's what it does.

    Same with emails. Who cares about their tone of voice, it's their actions that matter.

  3. Uh huh, and what do their actions show? Why, they wrote stuff up and got it published, as usual, and oddly enough nobody has any evidence to suggest that the published stuff is wrong...

  4. Keep up the good work. I would love to see more analysis. I check this blog each day.

  5. > Why, they wrote stuff up and got it published, as usual, and oddly enough nobody has any evidence to suggest that the published stuff is wrong...

    Rather difficult when the original datasets the published data are derived from go missing, innit?

  6. I'm with OBH. If these emails and files were leaked AND the raw data and final processing algorithms were freely available to the public, this would be a non-story or certainly more easily defensible.

    But the fact that there was a clear concerted effort to keep the taxpayer-owned material hidden (or ultimately trashed) despite numerous FOI requests causes us all to wonder why. It puts the results into question, and that is fair and objective.

    So all that we're asking is to be shown what has not yet been shown -- we've seen the graph, and the data, and the resulting conclusions, but how were the graphs made and how was the data collected and processed?

    This is fair. In fact, this is science.

  7. This adjustment is used right here, and is completely legit, with a detailed explanation of its use, and dire warnings about its "artificality", in sections 4.3 and 4.4:

    http://74.125.93.132/search?q=cache:www.cru.uea.ac.uk/~timo/papepages/pwosborn_summertemppatt_submit2gpc.pdf

  8. ... and then they immediately go on to say: "Though this is a rather ad hoc approach..."

    So, we get it -- it's not necessarily bad science or bad data. It's not an out and out hoax of total data falsification. The problem is that it is data worth re-processing and checking and worth *questioning*. Is this the best or only data that makes the case that humans are causing runaway global warming?

    Good scientists would welcome and encourage external experts checking their work -- if it checks out, it is all the more solid. But the emails reveal a profusion of evasion, discussions of hiding, and flouting of the FOI requests.

    To his credit, Briffa comes off in the emails (most of the time) as one of the ones who cares about credibility, the overstatement of consensus, and public openness. The same cannot be said of Jones, Mann, and Santer.

  9. The idea that simply putting the 'raw data and final processing algorithms' out there would end the denialist noise is naive at best and disingenuous at worst.

    Consider for starters the non-trivial difficulty of reproducing such analysis exactly:

    http://moregrumbinescience.blogspot.com/2009/11/data-set-reproducibility.html

  10. It doesn't matter if it's 'non-trivially difficult', krabapple. If the data can't be reproduced by independent researchers that don't have a stake in whether or not it turns out the same way, it's not good science.

    I find it deeply ironic that the ONE field of science that might mean catastrophic disaster or the waste of trillions of dollars is the field that is most resistant to basic, independent checking of the data. If it can be picked apart by the critics and still remain standing, it's all the stronger. If it can't be, then it deserves to be thrown out. No truly objective scientist would object to that, uncomfortable though it might be.

    So is this -- advocating for fair and open science practices -- just a lot of denialist noise? You'd be happy if we just stopped trying to find better, fairer, more modern ways of analyzing the mountains of existing and new data?

  11. Tim Lambert did the same work, and also dug up the related paper:

    http://scienceblogs.com/deltoid/2009/12/quote_mining_code.php
