International Skeptics Forum - How to manipulate data, get results as good as PEAR, and be $1,000,000 richer

Baron Samedi · 1st March 2007, 01:12 PM

Since the news of PEAR shutting down came out, I've been reading over some of the news reports and quotes given by the key players. One thing that struck me was their pure dogmatic belief that they have proof beyond a shadow of a doubt.

For example, in their test with a 50/50 outcome, they claim that they have statistically significant results to show that ESP exists. The overall hit rate is only 0.5003 vs. the 0.5000 expected by chance, but they have the statistics to show that this difference is not due to random chance alone. My first question was exactly how many trials would need to be run to even get "statistically significant results". Time to break out my second year stats course notes:

To test P_observed vs. 0.5, use the formula
(P_observed - 0.5)/sqrt(0.5*0.5/n) > Z
where Z is your critical value for alpha (Type I error) on a one sided test. Knowing this, we can solve for the minimum value of n such that the inequality holds true, assuming that P_observed = 0.5003 as they stated. Since we don't know what alpha level they used, I have to pick some common ones:

alpha = 0.1; z=1.28; n ≈ 4,562,000
alpha = 0.05; z=1.645; n ≈ 7,515,000
alpha = 0.01; z=2.326; n ≈ 15,033,000
alpha = 0.001; z=3.09; n ≈ 26,526,000

So if I ever want to replicate and test their results myself, I'm going to have to run roughly 4.5 million trials before I start to see any statistical evidence above chance. Imagine trying for the million dollar prize with numbers like that. It's impossible. This is why the people at PEAR will not touch Randi's prize. If their results are indeed true, and Randi runs a tight test, then the preliminary test will need about 26.5 million trials of guessing "Good" or "Bad". I don't think we could find anyone to volunteer years of their life, full time, to sit this exam.
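The minimum-n calculation above can be reproduced with a few lines of Python. This is only a sketch of the one-sided z-test formula from the post; the function name `min_trials` is made up for illustration. The required n scales with 1/(P_observed - 0.5)^2, so it is very sensitive to the size of the effect: an excess of 0.0003 needs millions of trials (about 4.6 million at alpha = 0.1, about 26.5 million at alpha = 0.001), while an excess of 0.003 would need 100 times fewer (about 45,600 up to about 265,000).

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_trials(p_observed, alpha, p_null=0.5):
    """Smallest n for which (p_observed - p_null) / sqrt(p_null*(1-p_null)/n)
    exceeds the one-sided critical value for the given alpha.
    (Hypothetical helper name; the formula is the one quoted in the post.)"""
    z = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    return ceil((z * sqrt(p_null * (1 - p_null)) / (p_observed - p_null)) ** 2)


if __name__ == "__main__":
    for alpha in (0.1, 0.05, 0.01, 0.001):
        print(alpha, min_trials(0.5003, alpha), min_trials(0.503, alpha))
```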

Someone raised an interesting point. What if the claimant says that their psychic ability lets them go from a 50% chance of being right to a 60% chance of being right? How many tests do you need to run to prove to Randi that the ability exists? 6 out of 10 right is laughable. 30 out of 50? Still not enough. 60 out of 100? We're getting there. 150 out of 250? You may start to make believers out of us. But as you can see, people complain that Randi keeps raising the bar: if you pass 50, he'll ask for 100, and if you pass 100, he'll ask for 250. They accuse Randi of cheating. This is wrong.

The answer to this is in the power calculation. This is a statistical calculation that asks: if the person really is right 60% of the time, how many trials must they do so that they pass our test 80% of the time? People have good days and bad days, so we want to make sure that even on a bad day, when the person is running at 58%, they still have a good chance of passing.

So for the power calculation, we have:
Random: 50%
Claimant: 60%
Randi's alpha level: 0.001
Now we solve for n. After struggling for a bit and running some simulations, I have a calculation of 385 trials needed. Because of this test, the scenario is set. If the claimant says their ability is 60%, then 385 trials will be done, no more, no less. This may sound like overkill to a person ("Ah, but I tried it 50 times at home! Why do I need to do it 385 times?"), but that's the way it needs to be done.
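For anyone who wants to check the ballpark without running simulations, here is a rough sketch using the standard normal-approximation formula for the sample size of a one-sided, one-sample proportion test. This is not necessarily how the 385 above was obtained (that came from simulation); the function name `power_sample_size` is made up for illustration. With 50% vs. 60%, alpha = 0.001 and 80% power it lands at roughly 384, which is in the same neighbourhood.

```python
from math import ceil, sqrt
from statistics import NormalDist


def power_sample_size(p0, p1, alpha, power):
    """Normal-approximation sample size for testing H0: p = p0 against
    a true rate p1 > p0, one-sided, at the given alpha and power.
    (Hypothetical helper; an approximation, not an exact binomial result.)"""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value under H0
    z_beta = NormalDist().inv_cdf(power)       # quantile for the desired power
    n = ((z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1)))
         / (p1 - p0)) ** 2
    return ceil(n)


if __name__ == "__main__":
    print(power_sample_size(0.5, 0.6, alpha=0.001, power=0.8))
```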

Here's the key thing to notice, and where jury-rigging the numbers comes in. I said that 385 tests, no more, no less, have to be done. In practice, how often is this really done? Let's say that in one test case, the person is successful 30 out of 40 times. By the z-test above, the p-value in this case is indeed less than 0.001 (z is about 3.16 against a critical value of 3.09). Should you stop or should you go on? According to my rules, you have to continue. Most people, though, will say that the sitter has proven themselves, so we should quit. Why go on and waste everybody's time if we've "proven the ability" after only 40 tests? This shortcut looks harmless, even sensible, but it is the fatal flaw in these studies.
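The 30-out-of-40 example can be checked directly. The sketch below (function names are made up for illustration) computes the one-sided p-value two ways: with the z-test formula from earlier in the post, and with the exact binomial tail for a fair coin. The z-test puts it just under 0.001, while the exact tail comes out slightly above 0.001, so a borderline "pass" like this one depends on which test you happen to pick. That is one more reason to fix the rules before the test starts.

```python
from math import comb, sqrt
from statistics import NormalDist


def z_test_p(hits, n, p0=0.5):
    """One-sided p-value from the normal-approximation z-test."""
    z = (hits / n - p0) / sqrt(p0 * (1 - p0) / n)
    return 1 - NormalDist().cdf(z)


def exact_p(hits, n):
    """Exact one-sided p-value for a fair coin: P(X >= hits), X ~ Bin(n, 0.5)."""
    return sum(comb(n, k) for k in range(hits, n + 1)) / 2 ** n


if __name__ == "__main__":
    print(z_test_p(30, 40), exact_p(30, 40))
```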

An alpha level is the chance of a Type I error: the probability of saying that some effect exists when truly there is nothing and the result was due to randomness. Therefore, if I run 10,000 independent tests using an alpha level of 0.001, I should see roughly 10 of them giving positive results. Running this on randomly simulated data, it matches. Sometimes I get 8 successes, sometimes 16, but usually it's around 10.

Next I ran simulations to show what usually happens with human intervention. For each person sitting to test for psychic abilities, I start off with 30 guesses. If the current p-value is < 0.001, I stop and call the person potentially gifted. Otherwise, I try one more coin flip. Is the person's hit rate high enough now? If not, try one more flip. If I reach 385 flips and the person still hasn't shown powers beyond a shadow of a doubt, I finally call it a day.
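The stopping rule just described can be sketched as a small simulation. This is my own minimal reconstruction, not the original code: it assumes the one-sided z-test from earlier in the post is the check applied after every guess, and the function name, seed and defaults are made up for illustration. For every simulated sitter (pure coin flips, no ability at all) it records both whether they would pass an honest fixed test at 385 guesses and whether they would have been "caught" at any peek from guess 30 onward.

```python
import random
from statistics import NormalDist


def count_false_positives(n_sitters=10_000, n_min=30, n_max=385,
                          alpha=0.001, seed=1):
    """Return (fixed, peeking): how many purely random sitters pass the
    test at exactly n_max guesses, and how many pass at ANY look between
    n_min and n_max. (Hypothetical helper; assumes a one-sided z-test.)"""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # z > z_crit is equivalent to hits > n/2 + z_crit * sqrt(n) / 2
    cutoff = [n / 2 + z_crit * n ** 0.5 / 2 for n in range(n_max + 1)]
    fixed = peeking = 0
    for _ in range(n_sitters):
        hits = 0
        would_have_stopped = False
        for n in range(1, n_max + 1):
            hits += rng.random() < 0.5  # one fair coin flip
            if n >= n_min and hits > cutoff[n]:
                would_have_stopped = True  # a peek here declares a "psychic"
        peeking += would_have_stopped
        fixed += hits > cutoff[n_max]
    return fixed, peeking


if __name__ == "__main__":
    print(count_false_positives())
```

The fixed-length count should hover near the nominal 10 in 10,000, while the peeking count should come out many times larger, even though both are computed from exactly the same random guesses.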

In each individual case, I have a p-value < 0.001 for all potentially psychic people. If this effect is due to random noise, again we should see only about 10 potential hits. In my simulated data, I am now getting, on average, 116 potential psychics. Remember, this data is indeed pure random garbage. Now it's time to start spinning the results and throwing in statistical jargon:

10,000 trials were run at an alpha level of 0.001. Assuming a typical binomial distribution for the trials, one would expect a mean of 10 trial successes. Of all trials, 116 showed a success. A typical hypothesis test for these levels shows a z-score of about 33.5, which corresponds to an infinitesimally small probability (p << 0.0001). This is clear proof that an overall effect is harboured in human potential. Overall, the hit rate on the entire population is 0.5014 which, albeit only slightly above chance, can be explained by performance fatigue, negative ESP skills (abnormally incorrect reading ability), and general skepticism in the total population.

Using this same kind of faulty reasoning, I'm fairly confident that I can produce just as significant results with only 200 sittings. With 200 people, and this lousy procedure, I should be able to get 3 "psychic" people and show that 3/200 is absolute proof of ESP.

So now that I've blown my chance for the $1,000,000, do any of you know of a good "Woo" journal in case I want to pull off another Sokal scam?
