Who Admits to Committing Fraud?

I don't use the word "fraud" lightly. I've long criticized people who do. Not only do I think it is wrong in principle, I think it is wrong from a strategic perspective. If you cry, "Fraud!" over every little thing, nobody will listen when you point out real fraud. It's called a sense of proportion. One's rhetoric should ramp up with the severity of what is being criticized.

I bring this up because I want to follow up on my last post, which discussed a case of fraud involving $100,000. Or rather, it was a case of fraud where a person used the false promise of $100,000 to cheat people out of money. It's a long post, so I won't rehash the details here; you can read it for those. I'll just give a short summary.

Last year, a man named Douglas Keenan announced he would give $100,000 to anyone who could win a contest he had created. There were tons of red flags which should have made people suspect this was bogus, but despite that, several big names in the global warming "skeptic" movement promoted the contest. After people spent some time publicly discussing how they might try to win the contest, Keenan switched out the data set used in the contest for one on which the proposed methodologies would be less effective.

This contest involved a $10 entrance fee. That makes what he did fraud. I've been pointing that out for the last year. Just 24 hours ago, Keenan admitted it. Of course, Keenan did not say, "I committed fraud." What he admitted to is the underlying facts of the accusations I've been making. First, after Keenan changed his contest, he updated the web page for it to say:

The 1000 regenerated series were posted online four days after the Contest was announced—on 22 November 2015. (Each person who submitted an entry before then was invited to submit a new entry, with no fee.)

He didn't even offer to give a refund to people who paid to enter his contest before he changed it. Pretty messed up, huh? Who opens up a contest, takes entrance fees then changes the contest after the fact to make sure those who paid can't win it? Apparently Keenan does.

That's not what I want to focus on today, though. What I want to focus on is the fact that, for the last year, I've made the same point over and over: Keenan changed the nature of the data set used in his contest in a way which made his contest more difficult to win. I've received a not insignificant amount of criticism for this, largely in e-mail exchanges. The main example I've used has been the discussion I had with Anthony Watts, proprietor of the major "skeptic" blog Watts Up With That, when I contacted him about this. I've quoted excerpts before because I felt it was necessary, but now I'd like to quote an entire e-mail he sent:

After doing further investigation, it seems Keenan has notified everyone that there is a restart to the contest due to the weakness in the PRNG. There was indeed a change. to both the data and key.

The change was announced on the Contest web page yesterday
http://www.informath.org/Contest1000.htm#Notes

Additionally, Keenan tells me he telephoned Andrew Montford to discuss it, and there is now a Bishop Hill post about it:
http://www.bishop-hill.net/blog/2015/11/23/a-change-to-the-playing-field.html

From what I can see, Keenan left a comment on each blog that had previously posted about the contest. He left one at WUWT:
Spot the trend: $100,000 USD prize to show climate & temperature data is not random

It seems to be he has done due diligence, made proper notifications about the key and the data being changed. Initially I got the impression he’s changed the key only, and the updated PRNG was only in the key, but it seems it was in both because both had a weakness that would allow somebody to win by exploiting that weakness rather than winning the contest on its terms.

If he had changed the data and didn’t notify anyone of the change to the data, I’d agree that would be terribly wrong. That’s what I thought you were complaining about. But, he didn’t do that.

That language you used in your post might be considered libelous under the circumstances.

Finally, you do NOT have my permission to publish my email.

I won't publish the rest of the e-mails Watts sent in which he repeatedly defended Keenan's actions and argued I was in the wrong. I don't like posting people's e-mails without permission, and I think this one is enough to show what the "party line" is. In case it is not, here is what Andrew Montford of Bishop Hill wrote in the post Watts mentions:

Doug Keenan has posted a note at the bottom of the notice about his £100,000 challenge, indicating that he has reissued the 1000 data series. This was apparently because it was pointed out to him that the challenge could be "gamed" by hacking the (pseudo)random number generator he had used.

Brandon Shollenberger emails to say that this is a terrible thing, but I can't get terribly excited about it. Presumably it doesn't make any difference to those who think they can detect the difference between trending and non-trending series.

He wrote this after I contacted him providing clear evidence Keenan had changed his contest to make it more difficult after people discussed how they would try to beat it (see the last post for details). He didn't respond to my e-mail. Instead, he went and published a blog post trashing me a bit while intentionally disregarding what I had said. I believe "willfully obtuse" is an apt description.

Two big names in the "skeptic" blogosphere were saying the contest wasn't changed in a manner that made it more difficult to win while ignoring clear evidence that showed it was. They went as far as to paint my criticisms as completely unreasonable and even came close to calling for public ridicule. These "skeptics" simply accepted on blind faith what Keenan wrote:

The generation of the 1000 series relies on the generation of random numbers. That presents a difficulty, because current computers do not generate truly random numbers. There is a widely-used method of addressing the difficulty: use a computer routine that generates numbers that seem to be to random (i.e. fake it). Numbers generated by that method are called “pseudorandom”.

A computer routine that generates pseudorandom numbers is called a “pseudorandom number generator” (PRNG). PRNGs have been studied by computer scientists for decades. All PRNGs have weaknesses, but some have more serious weaknesses than others.

The Contest was announced on 18 November 2015. Shortly afterward, a few people pointed out to me that the PRNG I had used might not be good enough. In particular, it might be possible for researchers to win the Contest by exploiting weaknesses in the PRNG. I have been persuaded that the risk might be greater than I had previously realized.

The purpose of the Contest is to test researchers' claimed capability to statistically analyze climatic data. If someone were to win the Contest by exploiting a PRNG weakness, that would not conform with the purpose of the Contest. Ergo, I regenerated the 1000 series using a stronger PRNG, together with some related changes.
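For readers unfamiliar with the term, here is a minimal Python sketch of what "pseudorandom" means in practice. It is not Keenan's Maple code; it relies on the fact that Python's `random.Random` class uses the Mersenne Twister (MT19937), the same algorithm the Maple documentation quoted further down says Maple's generator uses. The seeds 2015 and 2016 are arbitrary. The point it illustrates is that a seeded generator is fully deterministic: the same seed, or a restored internal state, reproduces exactly the same stream of numbers.

```python
import random


def draw(seed, n=5):
    """Return n pseudorandom floats in [0, 1) from a freshly seeded generator."""
    rng = random.Random(seed)  # CPython's Random class is a Mersenne Twister (MT19937)
    return [rng.random() for _ in range(n)]


first = draw(2015)   # arbitrary seed
again = draw(2015)   # same seed: identical numbers
other = draw(2016)   # different seed: a different stream

# Saving and restoring the generator's internal state replays the stream exactly.
rng = random.Random(2015)
saved = rng.getstate()
before = [rng.random() for _ in range(3)]
rng.setstate(saved)
after = [rng.random() for _ in range(3)]

if __name__ == "__main__":
    print(first == again, first == other, before == after)
```

Python's own documentation warns that this module's generators should not be used for security purposes, which is the general reason people worry about "exploiting weaknesses" in a PRNG. Whether any such weakness affected the series in this contest is a separate question.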

Keenan says he changed the random number generator he used. That shouldn't be an issue, except notice that last phrase "together with some related changes." Those "related changes" were, "I changed the contest in a way which makes it more difficult to win, and I'm just not going to say so." That's fraud. You don't get to run a contest taking people's money then make secret changes to the contest to ensure nobody can beat it.

Watts, Montford and others refused to admit these changes were made. Since evidence wouldn't convince them, how about Keenan's own words? I don't mean the vague words I quote above. I mean the stark admission he changed the statistical properties of the data set he created. That's what we got yesterday.

Yesterday, Keenan posted the answer to his contest along with the code he used for it. I'm not going to discuss the code today as he used Maple Worksheets. I find those terribly obnoxious to work with. We don't need the code anyway. If you want to see it, you can find the code for the original data set here and the code for the new data set here. The latter has this commentary at the beginning:

[Image: the commentary at the beginning of Keenan's updated code]

There are two important points here. First, there is absolutely no evidence the randomness used in creating the 1,000 series used for Keenan's contest was vulnerable to any sort of attack. In Keenan's preening about how nobody won his contest, he says:

Acknowledgements. I thank Andrew Montford, for advice in developing the rules of the Contest. I am also grateful to Mike Haseler, who presented strong evidence for a flaw in the PRNG that was originally used to generate the series (for details, see his blog post “The Doug Keenan Challenge”); he also appears to be the only person who appreciated an aspect of the generated series, namely that the polynomially-decaying autocorrelations are accurate (unlike seemingly all such series in the peer-reviewed literature).

But the blog post in question doesn't say anything like what he claims. The blog post only argues the randomness used for the encryption Keenan applied to an answers file was vulnerable. It says nothing about the randomness used in creating the 1,000 series. It appears Keenan is just making things up, both in the commentary on his code and the commentary on his website, to give himself an excuse for changing his contest to ensure nobody could beat it.

I think that's pretty important. I don't think it's as important as this part though:

To address the problem, the 1000 series were regenerated, using a different PRNG, on 22 November 2015. Some other changes also had to be made to the generating program, because otherwise, if someone had been able to crack the original series, they could have determined the generating program, and thereby cracked the regenerated series. (The main change to the generating program was to revise the parameters in the ARMA submodel.)

Whether or not you accept this excuse (you shouldn't), what matters is Keenan openly states he changed the process by which he generated the data set used in his contest. Importantly, the "ARMA submodel" he refers to determines the autocorrelation (basically, self-similarity) of the series he created. That is the central element in determining how difficult his challenge would be to complete. Here, we have him admitting he changed it.
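To make concrete why the ARMA parameters matter, here is a minimal Python sketch. It is not Keenan's model: it assumes the simplest case, an ARMA(1,1) process x_t = phi*x_{t-1} + e_t + theta*e_{t-1} with standard normal noise, and the two parameter sets are made up for illustration. Changing phi and theta changes how strongly each value resembles the one before it (the lag-1 autocorrelation), which is the property anyone trying to tell trending series from trendless ones has to contend with.

```python
import random


def simulate_arma11(phi, theta, n, seed=0, burn_in=500):
    """Simulate x_t = phi*x_{t-1} + e_t + theta*e_{t-1} with N(0, 1) noise."""
    rng = random.Random(seed)
    x_prev, e_prev = 0.0, 0.0
    series = []
    for t in range(n + burn_in):
        e = rng.gauss(0.0, 1.0)
        x = phi * x_prev + e + theta * e_prev
        if t >= burn_in:  # discard the start-up transient
            series.append(x)
        x_prev, e_prev = x, e
    return series


def lag_autocorr(xs, lag=1):
    """Sample autocorrelation of xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    denom = sum((v - mean) ** 2 for v in xs)
    num = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return num / denom


def theoretical_rho1(phi, theta):
    """Lag-1 autocorrelation of a stationary ARMA(1,1) process."""
    return (phi + theta) * (1 + phi * theta) / (1 + 2 * phi * theta + theta ** 2)


if __name__ == "__main__":
    # Two hypothetical parameter sets; neither is taken from Keenan's code.
    for phi, theta in [(0.9, -0.3), (0.5, 0.2)]:
        xs = simulate_arma11(phi, theta, n=20000, seed=1)
        print(phi, theta, round(theoretical_rho1(phi, theta), 3), round(lag_autocorr(xs), 3))
```

With these made-up values, the first parameter set has a theoretical lag-1 autocorrelation of about 0.80 and the second about 0.62, and long simulated series should land close to those figures. The same noise, fed through different ARMA parameters, produces series with noticeably different structure.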

Will Anthony Watts or Andrew Montford say anything now? I don't know. Douglas Keenan committed fraud by altering his contest in a way which made it more difficult to win. He attempted to hide this fact from people, making excuses about RNG systems that had nothing to do with the changes he made. "Skeptics" accepted it, willfully disregarding any evidence showing Keenan made changes that made his contest more difficult. Now we have Keenan's own words saying he made these changes. What more do we need?

I don't know. What I do know is I checked the original data set Keenan used for his contest. I would have beaten it. I would have won $100,000 if Keenan hadn't intentionally changed his contest after the fact to ensure nobody could win it.

Of course, Keenan never would have paid me. Keenan's a lying cheat who committed fraud with the blessing of multiple big name "skeptics" in what was nothing more than a stupid, rhetorical stunt. People who promoted this fraudulent contest, like Andrew Montford, Anthony Watts and Ross McKitrick, should publicly acknowledge this.

As a final note, my last post mentioned Keenan took in $540 from 54 entries to this contest. After I wrote that post, the list of entries shrank to only 33. I regret not making a copy of the original list, but I would be interested in knowing why it changed.

12 comments

  1. You clearly have no idea. Doug has not committed any kind of fraud. It was clear from the very beginning that the competition was about finding a way to detect trends in random data.

    Instead, whilst Doug Keenan is a great statistician, he's not a great programmer and apparently was not aware of the inbuilt problems with random number generators.

    As such when his contest came out of the blue last year, when I had a quick look and noticed that the "random" sequence he had generated was not random (it had lots of pairs which shouldn't have happened).

    I (and possibly others) pointed this out to Doug - and highlighted the problem of getting a proper random number generator, and he clearly had not considered this aspect properly and withdrew the competition until he corrected this problem.

    Quite properly (as he had failed to use a properly random number generator as he had intended) he restarted the competition.

    Or let's turn that around - if he had mistakenly made the competition much more difficult than stated - would he have been right to continue it?

  2. Mike Haseler, while it might feel good to say things like, "You clearly have no idea," it doesn't accomplish anything. I would recommend sticking with actual arguments rather than what is basically just throwing insults.

    In that regard, I will point out you have seemingly ignored the central issue I repeatedly stressed in this post. Before I get to that, even if one believes the RNG Douglas Keenan used to encrypt his answer file was inappropriate (it was not, and your explanation of why it supposedly was is wrong), the encryption used on the answer file has nothing to do with the RNG used to create these series.

    With that said, even if Keenan did need to create a new data set, doing so would not require changing the statistical properties of the data set in a way which directly impacted the strategies people were publicly discussing by making it more difficult for those people to win the contest. If such were necessary for some reason, Keenan was obligated to refund the entrance fee of anyone who had signed up and clearly tell all future participants the statistical nature of the data set being used had been altered.

    Nothing in your comment addresses any of that. Even worse, you say things which are simply not true. For instance, you say:

    As such when his contest came out of the blue last year, when I had a quick look and noticed that the "random" sequence he had generated was not random (it had lots of pairs which shouldn't have happened).

    I (and possibly others) pointed this out to Doug - and highlighted the problem of getting a proper random number generator, and he clearly had not considered this aspect properly and withdrew the competition until he corrected this problem.

    Quite properly (as he had failed to use a properly random number generator as he had intended) he restarted the competition.

    Keenan never "withdrew the competition." If he had, what followed would have been a new competition. There would have been an announcement of the original contest being cancelled. Some time later there would have been an announcement of a new version of the contest being started. That never happened. Keenan simply switched out the data sets being used and said that's that. Heck, if he had actually cancelled it at some point, people would have been given their money back.

  3. I've spent some time with Keenan's code (it's not pleasant), and I've worked out the main aspects of it. It appears the code he released for the original data set is incomplete: I cannot find any function in it that would generate the series, though there is a call to a function not included in the code. That function does appear in the updated code.

    In the code, Keenan uses four models. Three models are used to create "trendless" series. They are labeled ARMA, fGn and Model 3. I still need to work on figuring out the exact details of these models, but Keenan uses each to generate 195 "trendless" series used in the final batch. The result is 585 "trendless" series. As a note, more than 195 series are created with each of these models. Keenan filters out the additional series by removing those with the largest amount of variance.

    The other 415 series are generated via what Keenan labels the "IPCC Model." This has no real connection to the IPCC save that Keenan uses an annual HadCRUT series to determine some parameters for this model. Keenan falsely claims this annual series is the main series the IPCC uses. Why anyone would think climate scientists limit themselves to annual data is beyond me, but... whatever.

    With that outlined, I should point out it does not appear the code for any of these four models has been changed. Instead, what has changed are the parameters Keenan used for one of the models. The function missing from Keenan's code would show what parameters were used, which would let us compare the two versions and see exactly what he changed. Unfortunately, that function is missing.

    I find that rather strange. Fortunately, the code does include routines to estimate these parameters. Because of that, it is possible to reconstruct what the function would be were it not missing. I might do that later, but here are two images comparing the parameters Keenan came up with. This is the old set; this is the new.

    While the numerical changes are obvious (I don't know why the value in the ARMASim function call was 50 in the old version but is 135 in the new one), the key change is in the ARMARSS function call. The parameters passed to this function determine the constraints placed on the autocorrelation models that can be tested against the series in question. The original call passed:

    p1 = -3:3
    p2 = -3:3
    p3 = -3:3
    q1 = -3:3

    The updated function call passed:

    p1 = -2:2
    p2 = -2:2
    q1 = -2:2
    q2 = -2:2

    These changes alter the ARMA model used, explaining the differences I highlighted in these data sets. Using the original ARMA model and the parameters estimated for it, the contest was winnable. I can write a post showing the methodology I'd use to beat it. The only reason I couldn't use that methodology (or possibly others) is that Keenan changed the ARMA model he used when generating ~20% of the series for his contest.

  4. For anyone who might want to look at just how those parameters are used, here is a screenshot of the ARMASim and ARMARSS functions they're involved in. I need to take some time to work out exactly what's going on before I write anything specific about this, but people familiar with ARMA modeling may be able to work it out for themselves in the meantime.

    For now, note that this change has nothing to do with any RNG. That's because the RNG issue was a red herring. All code involving any RNG released by Keenan is exactly the same. If there were any vulnerability in the encryption, such as the one Mike Haseler claims existed in the original version of this contest, the updated version of the contest would have it as well. If there is any difference in the RNG between these two versions, it must be in code that Keenan didn't publish.

    But even that can apparently be ruled out as Keenan wrote this over at Mike Haseler's site:

    The PRNG that was originally used was a Mersenne Twister. I was astounded that it gave such weak pseudorandomness. Indeed, at first I suspected that the apparent weakness was a fluke, due to an unfortunate choice of random seed; testing showed, however, that the same problem occurred with some other seeds.

    The Maple documentation for the SetState function Keenan used says it uses that very Mersenne Twister algorithm:

    The state argument specifies the data used to set the state of the generator. This argument is passed on to the SetState method of the underlying pseudo-random number generator. Currently the MersenneTwister[SetState] function is called. If the state argument is not given then the state is seeded using values taken from the system.

    If Keenan's code can be trusted, he used the exact same RNG algorithms in both versions of his code. The only other explanation is Keenan has posted false code. Given the updated code says he used the Mersenne Twister algorithm despite telling people he changed his contest because the Mersenne Twister algorithm was flawed, it's difficult to see how he could have possibly changed the RNG system used in his contest.

  5. Hoi Polloi, the contest is over and it's accomplished exactly what it was supposed to. It gave Douglas Keenan a talking point which he can use. That's all this ever was. It was a stupid, rhetorical stunt. There was never the slightest chance Keenan would pay out any money.

    So yeah, Keenan ought to refund people their money, but that won't happen. And it won't change that "skeptics" promoted then defended what was criminal fraud. The only way that will get fixed is if people like Anthony Watts and Andrew Montford come out and acknowledge their role in promoting Keenan's dishonesty. It doesn't seem that will happen.

  6. Yes, the purpose of the whole exercise was clear from the beginning - to get a talking point. Just as Cook's 97%-research. Or Mann's hockey stick. What a sad state of affairs this whole "debate" is in 🙁

  7. My interpretation of the situation is the same as what you guys say. I get why this sort of thing happens. Understanding it doesn't make me accept it though. Right and wrong do matter. Or at least, they matter to me.

    I don't really care about the global warming debate, but I do care about the absurdity of it. It's a wonderful window into human behavior.

  8. I'm going to go off topic.

    I don't entirely understand the math, but am tending to agree with you. I think you have the argument backwards. You should have started with I could have won the contest.

    >"You clearly have no idea," it doesn't accomplish anything. I would recommend sticking with actual arguments rather than what is basically just throwing insults.

    I think you have a tendency to do that as well.

    Now the off topic part. This post is basically a demonstration of how Trump won the election. He didn't go thru big details of servers and federal law on classified materials. He said, "Crooked Hillary", and let people see the case proven. Any story that comes up later, including unrelated hacks of DNC and John Podesta, because they involve e-mails, are now automatically proof of "Crooked Hillary".

  9. MikeN, you bring up an interesting topic which I've spent some time thinking about. You say:

    >"You clearly have no idea," it doesn't accomplish anything. I would recommend sticking with actual arguments rather than what is basically just throwing insults.

    I think you have a tendency to do that as well.

    But I think you miss an important aspect of what I was saying to Mike Haseler. I have no problem with using insults. My writing will attest to that. What my writing will also attest to is I rarely use insults alone. It is tied to the difference between saying, "You're an idiot" and saying, "Your argument is stupid because X."

    I have been criticized on many occasions for using the latter, but those criticisms often misrepresent it as being the former. It is not. The two are quite different. The latter is a valid form of mixing arguments and rhetoric. The former is just using rhetoric. Sometimes I will mix these two. For instance, I have no problem saying, "You idiot. This is the dumbest thing I've read all week" then proceeding to give a detailed explanation of why what the person said is wrong.

    Sometimes I will resort to the pure rhetoric of insults, but that almost inevitably comes after multiple uses of the other approaches. That's because I disagree with the approach you describe here:

    Now the off topic part. This post is basically a demonstration of how Trump won the election. He didn't go thru big details of servers and federal law on classified materials. He said, "Crooked Hillary", and let people see the case proven. Any story that comes up later, including unrelated hacks of DNC and John Podesta, because they involve e-mails, are now automatically proof of "Crooked Hillary".

    Not because I think you are wrong, but because I think what you describe is distasteful. What you describe can work quite well. In many cases leading with rhetoric can be more successful than ending with rhetoric. The difference is in what you hope to accomplish. I include rhetoric because I think it serves a useful purpose, but I want evidence and logic to be what convinces people. My use of rhetoric is intended to add context/interpretation to the evidence and logic I provide.

    Donald Trump is very different. To him, evidence and logic don't matter. He seeks to convince people purely with rhetoric. Any "evidence" or "logic" he might provide is a superfluous detail that doesn't matter, because all he offers that anyone could ever be convinced by is his rhetoric. That's why it doesn't matter that Trump says such absurd and obviously untrue things.

    Unless or until I provide the evidence and analysis showing I would have won Keenan's contest, it would be inappropriate to make it the focus of my narrative as anything I say on the issue serves no purpose beyond a rhetorical one. It might work, but it'd be wrong. Any improvement in how compelling the narrative is would be due to convincing people to look past facts, evidence and logic to believe something without cause.

    Of course, I could have started the discussion with a substantive analysis of how I would have won Keenan's contest. The problem with that approach is the amount of time and effort it'd take. Leading with the strongest point first is great, but it is often the case the strongest point takes the most effort to develop. Trump avoids that problem by simply making things up. If you're willing to do that, it's always easy to lead with the strongest point.
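The bookkeeping described in the third comment (three "trendless" models contributing 195 series each, with surplus series discarded by variance) can be sketched in Python. This is not Keenan's Maple code: the AR(1) stand-in generator, the number of candidates (250) and the series length (100) are hypothetical; only the counts 1,000 and 195 come from the comment. It also checks the arithmetic behind the "~20%" figure, since each trendless model supplies 195 of the 1,000 series.

```python
import random
import statistics

# Counts taken from the third comment; everything else here is hypothetical.
TOTAL_SERIES = 1000
PER_TRENDLESS_MODEL = 195
TRENDLESS_MODELS = ("ARMA", "fGn", "Model 3")


def ar1_series(rng, phi, length):
    """Hypothetical stand-in generator: a zero-mean AR(1) series."""
    x, series = 0.0, []
    for _ in range(length):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def keep_lowest_variance(candidates, keep):
    """Discard the highest-variance candidates so that `keep` series remain."""
    return sorted(candidates, key=statistics.pvariance)[:keep]


trendless = len(TRENDLESS_MODELS) * PER_TRENDLESS_MODEL  # 585
with_trend = TOTAL_SERIES - trendless                    # 415 ("IPCC Model")
share_per_model = PER_TRENDLESS_MODEL / TOTAL_SERIES     # 0.195, i.e. ~20%

rng = random.Random(1)
candidates = [ar1_series(rng, 0.9, 100) for _ in range(250)]  # made-up sizes
kept = keep_lowest_variance(candidates, PER_TRENDLESS_MODEL)

if __name__ == "__main__":
    print(trendless, with_trend, share_per_model, len(kept))
```

Sorting by population variance and truncating is just one straightforward way to "remove those with the largest amount of variance"; how Keenan's code actually ranks and drops series is not established here.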
