No, this is not a duplicate post, and there's no error. My last post was titled "A $100,000 Scam?" because it asked a question: is Douglas Keenan's $100,000 contest a scam? Today's post doesn't have the question mark because I'm answering that question with a resounding YES.
Douglas Keenan is a lying, law-breaking cheat who is scamming people out of money.
That might sound a bit harsh, but it's actually rather tame given the situation. You can read about the circumstantial issues I discussed in the last post if you want, but none of them really matter anymore. You see, Keenan just broke the law in order to ensure he wouldn't have to pay out the $100,000. Anything else just seems irrelevant compared to that.
To explain what I'm talking about: Douglas Keenan recently updated the contest page with a couple of notes. Among them was this text:
The Contest was announced on 18 November 2015. Shortly afterward, a few people pointed out to me that the PRNG I had used might not be good enough. In particular, it might be possible for researchers to win the Contest by exploiting weaknesses in the PRNG. I have been persuaded that the risk might be greater than I had previously realized.
The purpose of the Contest is to test researchers' claimed capability to statistically analyze climatic data. If someone were to win the Contest by exploiting a PRNG weakness, that would not conform with the purpose of the Contest. Ergo, I regenerated the 1000 series using a stronger PRNG, together with some related changes.
The 1000 regenerated series were posted online four days after the Contest was announced—on 22 November 2015. Each person who submitted an entry before then has been invited to submit a new entry with no fee. Everyone who plans to enter the Contest should ensure that they have the regenerated series.
The possibility that weaknesses in the random number generator he used might allow people to solve his challenge in an unintended way is a problem that might reasonably need to be addressed. Keenan claims a number of people brought this problem to his attention, such as this person, who said:
The Encryption Key
If Doug is going to lose money, this is where he is vulnerable. The key encoded answer:
...This is 500 characters long and consists of characters a-zA-Z0-9/+ Thus in total 64, so, this is a 64bit number. The first thing I noticed was the unusually high number of pairs:
There are 15 in 500 characters, but there should be only 500/64 = 7.8. Therefore there are twice as many repeating characters as should occur by chance. This means that the 64 bit number we see is not random! This reminds me of the way the German enigma machine was broken (where characters were repeated). Thus it appears that whatever way Doug Keenan has chosen to encode the text, it is not random and thus (unlike the first challenge), it may be broken. Also, unlike the first challenge, the “code” must make sense to a human, so we will be able to check if the decoding is valid because the answer must make sense. This is in sharp contrast to the first part of the test, where there is no way to validate whether the method we chose is valid except by paying this $10 and asking.
This is a discussion of Keenan's choice of encryption method. Keenan had released an encrypted answer file, and if someone could decrypt that file, they could "cheat" to win the challenge. That's a reasonable concern, and generating a new set of test series with a new answer file protected by better encryption might be justified.
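The pair-counting argument in the quoted comment is easy to check. As a minimal sketch, assuming the 500 characters were drawn independently and uniformly from the 64-symbol alphabet (64 symbols means 6 bits per character), a string of that length has 499 adjacent pairs, each matching with probability 1/64, so about 7.8 repeats are expected. Under that same assumption the repeat count follows a binomial distribution, which lets you ask how often 15 or more would turn up by chance. The function names are hypothetical, and this only tests the uniform-random assumption; it says nothing about how Keenan actually encrypted his file.

```python
import math
import random
import string

# The 64-symbol alphabet described in the quote: a-z, A-Z, 0-9, '/' and '+'.
# (64 symbols means each character carries 6 bits.)
ALPHABET = string.ascii_letters + string.digits + "/+"


def count_adjacent_repeats(text: str) -> int:
    """Count positions where a character is identical to the one before it."""
    return sum(1 for prev, cur in zip(text, text[1:]) if prev == cur)


def expected_repeats(length: int, alphabet_size: int = 64) -> float:
    """Mean repeat count for a uniform random string: (length - 1) / alphabet_size."""
    return (length - 1) / alphabet_size


def prob_at_least(k: int, length: int, alphabet_size: int = 64) -> float:
    """P(repeats >= k), assuming uniform, independent characters.

    For i.i.d. uniform characters the "next char equals this one" events are
    independent, so the repeat count is Binomial(length - 1, 1 / alphabet_size).
    """
    n, p = length - 1, 1 / alphabet_size
    below = sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1 - below


def simulate_mean(length: int = 500, trials: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean repeat count for random strings."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        sample = "".join(rng.choice(ALPHABET) for _ in range(length))
        total += count_adjacent_repeats(sample)
    return total / trials


if __name__ == "__main__":
    print(f"expected repeats: {expected_repeats(500):.2f}")
    print(f"simulated mean:   {simulate_mean():.2f}")
    print(f"P(>= 15 repeats): {prob_at_least(15, 500):.4f}")
```

The simulated mean should land close to the analytic 499/64; the tail probability is only meaningful if "uniform and independent" is the right null model for the encoded answer.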
But here's the thing. Changing how you encrypt your answer file should never change the nature of the test series created for the challenge. The exact details of the 1,000 series would be different, since a new set would have to be generated, but that set should have all the same statistical properties as the old one. Any analysis that worked on the old data set should also work on the new one.
That's not the case. You see, Keenan made "related changes" which he doesn't disclose. Those changes significantly alter the statistical properties of the test series, and this is clear from even the briefest examination. For instance, one thing many people thought to do for this challenge was to plot a histogram of the trends of the series in Keenan's data set. Here is one a user named Magma posted over at the blogger Anders's place:
I'm using his graph because I didn't think to save a copy of the original data file, so I can't create one of my own. I have some results from tests I ran, but unfortunately, I can't show the version I prefer. When I was working on this, I took the absolute value of all the trends, effectively "folding" the graph over. That made things simpler, and it shouldn't make a difference, since positive and negative trends are treated the same way in this challenge.
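For anyone who wants to reproduce this kind of plot, here is a minimal sketch of the folding step, assuming each series' "trend" is its ordinary least-squares slope against time (the post doesn't say exactly how Magma computed his). The demo data are entirely hypothetical: made-up series with a negative, zero, or positive slope plus Gaussian noise, standing in for the three peaks. They are not Keenan's series. Taking the absolute value merges the two symmetric side peaks into one.

```python
import random


def ols_slope(series):
    """Ordinary least-squares slope of a series against time steps 0..n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def folded_histogram(slopes, bin_width):
    """Bin |slope| so negative and positive trends land on the same axis."""
    counts = {}
    for s in slopes:
        b = int(abs(s) // bin_width)
        counts[b] = counts.get(b, 0) + 1
    return dict(sorted(counts.items()))


def make_series(slope, length, noise_sd, rng):
    """Hypothetical test series: linear trend plus independent Gaussian noise."""
    return [slope * t + rng.gauss(0.0, noise_sd) for t in range(length)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical mix: one third trending down, one third flat, one third up.
    true_slopes = [-0.02, 0.0, 0.02] * 300
    slopes = [ols_slope(make_series(s, 100, 1.0, rng)) for s in true_slopes]
    width = 0.002
    # Print a text histogram: one '#' per 10 series in each |slope| bin.
    for b, count in folded_histogram(slopes, width).items():
        print(f"{b * width:.3f}-{(b + 1) * width:.3f} | {'#' * (count // 10)}")
```

Folding halves the number of peaks you have to look for, which is why it made the analysis simpler without losing anything that matters here.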
That's a non-issue, though. I'm really only bringing it up because I want to emphasize the fact that Keenan removed the old data set and made it unavailable. He didn't just move it somewhere and post a link. He didn't leave an archived copy. He just got rid of it. That's not right. It was, however, necessary if he wanted to hide the fact that the new data set has a significantly different distribution from the old one. Here is Magma's graph for the new data set:
As you can see, the distribution has been significantly flattened. Before, there were three clear peaks, each representing a different subset of the data set. Now, two of the peaks aren't visible. If you didn't know to look for them, you might not even guess they were there.
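One way to put a number on "significantly flattened", assuming you still had the trend values from both the original and regenerated data sets, is the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the two empirical distribution functions. This is a standard technique swapped in for illustration, not something Keenan or Magma is known to have used, and the function names are hypothetical. The 5% threshold uses the usual large-sample approximation 1.358 * sqrt((n + m) / (n * m)).

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic D: max gap between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted, the gap can only shrink, so d is final.
    return d


def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha = 1.358 corresponds to alpha = 0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))


def distributions_differ(a, b):
    """True if D exceeds the approximate 5% critical value."""
    return ks_statistic(a, b) > ks_critical_value(len(a), len(b))
```

Here `a` and `b` would be the (folded) trend values from the old and new series; with 1,000 series in each set, the approximate 5% threshold works out to about 0.061.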
Of course, Keenan's challenge requires people to identify which series belong to the subsets corresponding to those peaks (with at least 90% accuracy). Before, that might have been possible. Now? It's probably not. By flattening out the distribution of the trends in his test series, Keenan has made his challenge significantly more difficult. And he only did this after people paid him money to try their hand at the easier version.
That's horribly wrong. It's completely dishonest. You don't pose a challenge, saying you'll pay $100,000 to anyone who can complete it, then wait around and watch how people try to solve it so you can use that information to change the challenge to ensure nobody can solve it. You don't take people's money then tell them, "Sorry, the easy challenge you signed up for is gone now; you have to do this much harder one if you want the prize money." And you certainly don't say, "No refunds."
But that's exactly what Keenan did. Keenan waited days, letting people work on his challenge and discuss it publicly. He let people submit answers to his challenge, paying him money for the opportunity. He then turned around and changed his challenge to make it much more difficult. And he didn't even offer to give people a refund if they signed up for the original challenge; he just said they could get a free entry to the new one.
That's not just wrong; that's illegal. When a person pays money to sign up for a contest, they form a legal contract with the entity running the contest. If the entity running the contest changes the contest in a way which substantially alters it, such as by making it far more difficult as Keenan has done, they have breached the contract. That means they have no legal right to the entry fees any longer. Saying people can get a free entry into the new contest without offering them a refund is committing theft.
And it gets worse. Because Keenan made undisclosed changes to his data set, nobody can know whether their earlier attempts at this challenge worked. That means, for all anyone knows, someone might have actually completed the challenge already. Somebody out there may well have already earned the $100,000. Keenan could easily have gotten an entry, seen it was a winner, freaked out, realized he was in big trouble, and decided to change the data set to avoid having to pay out the $100,000.
So yeah, Douglas Keenan is a lying, law-breaking cheat who is scamming people out of money. Anyone who has promoted this "challenge" should be embarrassed and should promptly and publicly speak out about his dishonesty.