
10th November 2012, 06:32 PM  #1 
Critical Thinker
Join Date: Aug 2011
Posts: 288

What's the required p-value to beat?
From an earlier thread:
What is the required p-value? I tried searching around, but no number came up. More importantly, were there any negotiations in which this was debated (even if they don't include the word "p-value")? Another poster mentioned that there were some negotiations over this with Ziborov, but I found little to that effect. I ask because I teach a subject that involves applied statistics. I'd love to use an attempted demonstration of the supernatural as an example, because the meaning of "happened by chance" really stands out in this context. 
10th November 2012, 08:49 PM  #2 
So far, so good...
Join Date: Apr 2012
Location: On the outskirts of Nowhere; the middle was too crowded
Posts: 2,450

Generally p < .01 for the preliminary challenge.

10th November 2012, 10:44 PM  #3 
Gentleman of leisure
Tagger
Join Date: May 2005
Location: Flying around in the sky
Posts: 22,676

I think the answer is 0.001 for the first test. Though sometimes this cannot be calculated. For example if I claimed I could defy gravity by rising from the ground then that would be impossible. So if I could demonstrate it then I win.

10th November 2012, 11:09 PM  #4 
Gentleman of leisure
Tagger
Join Date: May 2005
Location: Flying around in the sky
Posts: 22,676

Here is a challenge that did have a p-value. I leave it up to you to work out what the value was: CONNIE SONNE, Dowser
She failed. 
10th November 2012, 11:24 PM  #5 
Critical Thinker
Join Date: Aug 2011
Posts: 288

Thanks, xterra and rjh. RJ, the example you gave inspired me to come up with a hybrid example.
.001 is indeed high. In most research, 0.025 is the p-value used for a two-tailed test (can't be different from...). My guess is that this is to safeguard the one million in case someone without legitimate supernatural powers (er, everyone) kept going in for repeated challenges. 
11th November 2012, 12:40 AM  #6 
Schrödinger's cat
Join Date: May 2004
Location: Malmesbury, UK
Posts: 8,780

1:1000 seems to be a general rule of thumb for the preliminary test, but because claims (and therefore test protocols) vary so wildly JREF seem to be reluctant to state that officially.
Most people assume that the success criteria would be higher for the final test, though a simple repetition of the preliminary test would produce combined odds of 1:1,000,000 which seems adequate to me. But until and unless someone passes the preliminary test, that question is obviously moot. 
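The arithmetic behind "combined odds" can be sketched in a few lines. This is only an illustration, assuming the two tests are independent and each uses the 1-in-1000 rule of thumb mentioned above; the function name `combined_alpha` is made up for this example and is not anything JREF publishes.

```python
from fractions import Fraction

def combined_alpha(alpha, repeats):
    """Chance of passing `repeats` independent tests by luck alone,
    when each test is passed by luck with probability `alpha`."""
    return Fraction(alpha) ** repeats

# Assumed rule of thumb from the thread: 1 in 1000 for the preliminary test.
preliminary = Fraction(1, 1000)
both_tests = combined_alpha(preliminary, 2)
print(f"Preliminary test repeated once: 1 in {both_tests.denominator:,}")
```

Exact fractions are used so the result is not blurred by floating-point rounding.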
__________________
"If you trust in yourself ... and believe in your dreams ... and follow your star ... you'll still get beaten by people who spent their time working hard and learning things"  Terry Pratchett 

11th November 2012, 01:34 AM  #7 
Gentleman of leisure
Tagger
Join Date: May 2005
Location: Flying around in the sky
Posts: 22,676

The difference is that it does not really matter if a piece of research is wrong. The research would be repeated and found to be wrong. In fact a lot of it is incorrect. That is why you have metadata in research. In the MDC it does matter if the result is incorrect. JREF could lose $1m.

11th November 2012, 10:00 AM  #8 
Critical Thinker
Join Date: Aug 2011
Posts: 288

Yep, precisely what I said above. Given a high enough number of claimants, and enough repeated trials for individual claimants, sheer chance would allow someone to claim the prize if the p-value was high enough. But I think the existing initial obstacles (the need for a recommendation letter from a professor) vastly reduce the number of preliminary trials, and a limit on the number of attempts (if it doesn't exist already) would take care of the problem altogether.
In a field like medicine, it CAN matter if a piece of research is wrong. In practice, since it often takes years for meta-analyses to appear, treatment decisions are often made on the newest research. 
11th November 2012, 12:22 PM  #9 
Graduate Poster
Join Date: Apr 2012
Posts: 1,299

As I said in the last thread, this is not intended as a lottery. Ideally, what is being aimed for is certainty. In practice, that's not always possible, but discussing the p-value certainly seems like a red flag. If you believe you have a power, then you should be confident enough not to care how small the p-value is, because you should believe you'll win handily no matter how small it is.

__________________
"Those who learn from history are doomed to watch others repeat it."  Anonymous Slashdot poster "The problem with defending the purity of the English language is that English is about as pure as a cribhouse whore."  James Nicoll 

11th November 2012, 01:27 PM  #10 
Critical Thinker
Join Date: Aug 2011
Posts: 288

I think the question of what counts as a supernatural power of prophecy is a perfectly fair one. The answer could be anything from beating chance to 100% accuracy in any given trial. Given my lack of supernatural powers, I have little stake in the answer, but I don't see why bringing it up is a red flag. And as you've implied above, if the trial involves any element of chance (e.g. sensing what integer within a range is on a hidden sheet of paper), then it's always a lottery of sorts. 
11th November 2012, 04:23 PM  #11 
Graduate Poster
Join Date: Apr 2012
Posts: 1,299

I think it's a red flag because it starts off by asking what the odds are. In other words, it's treating the million dollars as a lottery instead of a prize for a successful demonstration.
The only reasonable answer is: small enough to be convincing. Otherwise you're inviting people to try to game the system. 
__________________
"Those who learn from history are doomed to watch others repeat it."  Anonymous Slashdot poster "The problem with defending the purity of the English language is that English is about as pure as a cribhouse whore."  James Nicoll 

11th November 2012, 05:12 PM  #12 
Sarcastic Conqueror of Notions
Join Date: Mar 2004
Posts: 28,449

The problem with that argument is it assumes there isn't a fatal flaw in the design.
I submit the real reason for a double-layer test is so that, should someone pass the preliminary by some manner, it will allow experts to double-check where fraud could have crept in undetected and tighten their observation for the second round. Stats is the error most scientists make in studying the paranormal. It's about magician sleight-of-hand. There are no real odds going on (and if there is something real, a million tests in a row will succeed.) 
__________________
"Great innovations should not be forced [by way of] slender majorities."  Thomas Jefferson The government should nationalize it! Socialized, singlepayer video game development and sales now! More, cheaper, better games, right? Right? 

11th November 2012, 05:18 PM  #13 
So far, so good...
Join Date: Apr 2012
Location: On the outskirts of Nowhere; the middle was too crowded
Posts: 2,450

FluffyPersian, Take a look at my post from the thread entitled "How are MDC protocols designed and carried out?"
http://www.internationalskeptics.com...8&postcount=58
Post #67 is the answer to what I asked in #58; post #77 is my response to #67. Does this help explain why people here are not concerned with p-values?
xtifr, here is the last sentence in the original post in this thread: "I ask because I teach a subject that involves applied statistics. I'd love to use an attempted demonstration of the supernatural as an example, because the meaning of "happened by chance" really stands out in this context." I take this to mean that FluffyPersian is not going to become a claimant, and thus he/she* does not think there is a red flag.
As usual, if I have misconstrued, misinterpreted, or misunderstood either of you, I ask for correction so we can continue the discussion.
*FluffyPersian, for clarity, please tell us which pronoun to use. 
11th November 2012, 08:55 PM  #14 
Critical Thinker
Join Date: Aug 2011
Posts: 288

xterra, I'm female. Thanks for the link.

12th November 2012, 02:17 PM  #15 
So far, so good...
Join Date: Apr 2012
Location: On the outskirts of Nowhere; the middle was too crowded
Posts: 2,450

FluffyPersian,
My error. I have no idea what went awry in the link I posted previously. My post in that thread was number 58, but the link showed it incorrectly. Try this: http://www.internationalskeptics.com...d.php?t=238290 Then go to page 2, and look for my username; the easiest way is to use the find feature on your browser. From there, follow down as indicated in my previous post. I think this will work.... 
14th November 2012, 06:29 PM  #16 
Goddess of Legaltainment™
Join Date: Aug 2006
Posts: 35,375

Xterra, your post is still number 58, but you might instead wish to link via the little "link" button at the bottom right of the posts you wish to cite.
58: http://www.internationalskeptics.com...76#post8389976 67: http://www.internationalskeptics.com...70#post8391270 77: http://www.internationalskeptics.com...67#post8392067 Hope that helps. 
14th November 2012, 06:57 PM  #17 
So far, so good...
Join Date: Apr 2012
Location: On the outskirts of Nowhere; the middle was too crowded
Posts: 2,450

Thanks. I'll keep that in mind.

15th November 2012, 10:23 AM  #18 
Scholar
Join Date: Apr 2012
Posts: 52

The probability of getting a p-value of 0.001 assuming the claim is true depends on the statistical power of the test. The power of the test depends on the sample size, alpha level, and the effect size of the claim. Unfortunately, small effect sizes are generally hard (not impossible) to detect at the 0.001 significance level when the sample size is small. Otherwise, there's a good chance they will detect the effect. Since I seriously doubt that claimants know how strong or weak their paranormal claims are, chances are they are being tested under inappropriate conditions.
I agree that alpha of 0.001 is a standard in the preliminary test. However, if what Pixel said is true that a single replication of the preliminary creates a p-value of 0.000001, then the claimant had better be good at whatever he claims. If they don't combine them, then I guess the claimant might as well beat odds of a billion to one in order to pass the formal test. 
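To make the dependence on sample size and effect size concrete, here is a minimal sketch of an exact power calculation. It assumes a simple hit-or-miss protocol with a 50% chance rate and a one-sided binomial test at alpha = 0.001; the function names and the example numbers are illustrative assumptions, not an actual JREF protocol.

```python
from math import comb

def tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def power(n, true_rate, chance=0.5, alpha=0.001):
    """Probability of passing when the claimant's real hit rate is `true_rate`."""
    # Smallest number of hits whose probability under pure chance is <= alpha.
    threshold = next(k for k in range(n + 2) if tail(n, k, chance) <= alpha)
    return tail(n, threshold, true_rate)

# Hypothetical hit rates, from a weak effect to a strong one.
for rate in (0.55, 0.6, 0.8):
    print(f"100 trials, true hit rate {rate:.0%}: power = {power(100, rate):.3f}")
```

Under these assumptions a strong effect is detected almost every time in 100 trials, while a weak one usually is not, which is the trade-off described above.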
15th November 2012, 10:33 AM  #19 
Scholar
Join Date: Apr 2012
Posts: 52

A million tests will succeed in a row? That is so unrealistic in practical terms, even via conventional research. So, if a study found a p-value of 0.001, what is the probability of getting five 0.001 p-values in a row? Simple! 0.001^5 = 1 x 10^-15
Not even conventional research has reached those kinds of odds. 
15th November 2012, 12:02 PM  #20 
Muse
Join Date: Sep 2008
Posts: 866

Obviously, the question is how many trials you expect to run in total. To safeguard the million, you'd want the chance that the million is paid out to be low, even after all of them are done. And by low I mean fraction of a percent.
If we expect a thousand trials then a million to one is the least that will do. Given the rather large population of professional psychics (IE potential claimants at whom the challenge is actually aimed), expecting thousands of applicants seems reasonable. 
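A quick way to check this reasoning is to compute the chance that at least one of many claimants passes by luck. This is a sketch under the assumption that every claimant is an independent attempt with the same per-claimant pass probability; `payout_probability` is a name invented for the example.

```python
from math import expm1, log1p

def payout_probability(alpha, n_claimants):
    """Chance that at least one of `n_claimants` independent claimants
    passes by luck, given a per-claimant pass probability `alpha`."""
    # 1 - (1 - alpha)**n, computed in a numerically stable way.
    return -expm1(n_claimants * log1p(-alpha))

# Per-claimant odds discussed in the thread: 1 in 1000 and 1 in a million.
for alpha in (1e-3, 1e-6):
    risk = payout_probability(alpha, 1000)
    print(f"alpha = {alpha:g}, 1000 claimants: payout risk = {risk:.4%}")
```

With a thousand claimants, per-claimant odds of one in a million keep the overall risk at roughly a tenth of a percent, while one in a thousand would make a lucky payout more likely than not.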
__________________
I don't think it's quite fair to condemn a whole program because of a single slipup. 

15th November 2012, 01:05 PM  #21 
Schrödinger's cat
Join Date: May 2004
Location: Malmesbury, UK
Posts: 8,780


__________________
"If you trust in yourself ... and believe in your dreams ... and follow your star ... you'll still get beaten by people who spent their time working hard and learning things"  Terry Pratchett 

15th November 2012, 02:24 PM  #22 
Scholar
Join Date: Apr 2012
Posts: 108


15th November 2012, 02:45 PM  #23 
Scholar
Join Date: Apr 2012
Posts: 52


15th November 2012, 02:50 PM  #24 
Schrödinger's cat
Join Date: May 2004
Location: Malmesbury, UK
Posts: 8,780


__________________
"If you trust in yourself ... and believe in your dreams ... and follow your star ... you'll still get beaten by people who spent their time working hard and learning things"  Terry Pratchett 

15th November 2012, 03:01 PM  #25 
Schrödinger's cat
Join Date: May 2004
Location: Malmesbury, UK
Posts: 8,780

That depends on what the claimant's claim is. Most claimants claim a considerably higher success rate than they need to achieve to reach the sort of success criteria JREF usually set. For example dowsers usually expect to be able to tell the difference between a buried barrel of water and a buried barrel of sand every time, so the 70% or 80% success rate that's actually needed should be a doddle.
What needs to be remembered is that the applicants never actually do any better than chance. It's not that they do a little bit better, but not well enough to meet the JREF success criteria  their results are always well within that which would be expected by chance alone. 
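For anyone wanting to see where a required success rate comes from, the sketch below finds the smallest number of hits that chance alone would produce with probability at most 0.001. It assumes a two-choice test (water or sand, 50% by guessing) and independent trials; the actual protocols are negotiated case by case, so treat the numbers as illustrative only.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def hits_needed(n, chance=0.5, alpha=0.001):
    """Smallest hit count whose chance probability is <= alpha,
    or None if even a perfect score is not rare enough."""
    for k in range(n + 1):
        if binom_tail(n, k, chance) <= alpha:
            return k
    return None

# Hypothetical test lengths.
for n in (10, 20, 50, 100):
    k = hits_needed(n)
    print(f"{n} trials: need {k} hits ({k / n:.0%})")
```

The required percentage falls as the number of trials grows, which is why a longer test can be passed with a success rate well short of 100%.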
__________________
"If you trust in yourself ... and believe in your dreams ... and follow your star ... you'll still get beaten by people who spent their time working hard and learning things"  Terry Pratchett 

15th November 2012, 08:57 PM  #26 
Penultimate Amazing
Join Date: Aug 2001
Posts: 10,325

Even if someone only claims a minimal success rate above chance, sufficient repetition could make achieving the required p-level not difficult at all...
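How much repetition is "sufficient" can be estimated with the standard normal-approximation sample-size formula for a one-sided test of a proportion. The sketch assumes a 50% chance rate, alpha = 0.001 and a 90% probability of passing if the claim is true; all of these, and the function name, are assumptions chosen for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

def trials_needed(true_rate, chance=0.5, alpha=0.001, power=0.9):
    """Approximate number of trials for a one-sided test of a proportion."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # cutoff under pure chance
    z_power = NormalDist().inv_cdf(power)      # margin needed to pass reliably
    spread = (z_alpha * sqrt(chance * (1 - chance))
              + z_power * sqrt(true_rate * (1 - true_rate)))
    return ceil((spread / (true_rate - chance)) ** 2)

# Hypothetical claimed hit rates.
for rate in (0.51, 0.55, 0.6, 0.8):
    print(f"true hit rate {rate:.0%}: about {trials_needed(rate)} trials")
```

The number of trials grows roughly with the inverse square of the edge over chance, so a tiny claimed effect is testable in principle but may need a very long protocol.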

__________________


16th November 2012, 08:50 AM  #27 
Master Poster
Join Date: Nov 2007
Posts: 2,120

I disapprove of p-values, particularly when applied to hypothesis testing for deeply implausible situations such as the JREF tests.
A p-value usually gives an estimate of the probability of the result occurring by chance. This isn't what we're interested in; we want to know the chance the person has paranormal abilities. A p-value of 0.001 is not useful if someone is claiming an ability that you a priori consider much less likely than that. I'd therefore naturally argue that you want to do a Bayesian model comparison. In practice I'd be prepared to admit that sufficiently strong tests are going to reach the same conclusion whichever approach you take. However, I think that there's also some educational value in the fact that this approach should encourage applicants to make strong claims about their ability. If a dowser thinks they can perform right 70-80% of the time they should be encouraged to go for that and be tested on that, and if they don't want to then they can broaden their claim at the expense of having to work harder to demonstrate it by needing a larger sample size. (It's also the sort of approach that is more likely to lead you to a correct conclusion when yet another homeopath claims p < 0.01 results or something, so I think it's considerably more useful when you're at risk of seeing publishing biases.) 
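One simple form of the model comparison described here pits "pure guessing" against a single stated hit rate and multiplies the prior odds by the resulting Bayes factor. The sketch below assumes a two-choice test, a point hypothesis for the claim (80% hits), and a prior of one in a million; every one of those numbers is an assumption for illustration, and a fuller analysis would put a distribution on the claimed rate.

```python
def bayes_factor(hits, trials, claimed_rate, chance=0.5):
    """Likelihood of the data under the claim divided by its likelihood under chance.
    The binomial coefficient is the same in both and cancels out."""
    misses = trials - hits
    like_claim = claimed_rate**hits * (1 - claimed_rate)**misses
    like_chance = chance**hits * (1 - chance)**misses
    return like_claim / like_chance

def posterior_probability(prior, factor):
    """Update a prior probability for the claim using a Bayes factor."""
    odds = prior / (1 - prior) * factor
    return odds / (1 + odds)

# Hypothetical result: 18 hits in 20 trials, claimed rate 80%, prior 1 in a million.
factor = bayes_factor(18, 20, 0.8)
print(f"Bayes factor: {factor:.0f}")
print(f"Posterior probability of the claim: {posterior_probability(1e-6, factor):.6f}")
```

A result that clears p < 0.001 can still leave the posterior probability tiny when the prior is very low, which is the point being made about implausible claims.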
__________________
When I look up at the night sky and think about the billions of stars out there, I think to myself: I'm amazing.  Peter Serafinowicz 

16th November 2012, 09:58 AM  #28 
Critical Thinker
Join Date: Nov 2004
Location: Santa Barbara, CA
Posts: 438

While I agree in general principle, in this case a Bayesian model comparison is problematic precisely because JREF and challengers disagree on the model priors.
More to the point probably, JREF is pretty clear that this is not a scientific investigation to uncover the truth. It's a) a chance for a challenger to prove JREF wrong (in which case a classical test is probably reasonable), and b) a publicity stunt, so the statistical stuff is just a safeguard against something going wrong accidentally. In my one experience trying to help an applicant negotiate a protocol with JREF there was indeed an issue of a small effect size requiring a somewhat lengthy test. Basically, JREF was unwilling/unable to deal with it. This makes me suspect that item (b) is what governs. (Which I don't have a problem with.) 
16th November 2012, 09:59 AM  #29 
Scholar
Join Date: Apr 2012
Posts: 52


16th November 2012, 10:08 AM  #30 
Scholar
Join Date: Apr 2012
Posts: 52

P-values are actually quite useful. The p-value basically tells you how likely it is to get an observation as extreme or more extreme if the null hypothesis is true. The p-value basically measures the evidence for the null hypothesis. If the p-value is greater than the standard 0.05, then it can't be argued that the null hypothesis should be rejected. If, on the other hand, it is less than 0.05, then it can be said that the null should be rejected. Keep in mind that the p-value tells you the probability of the result occurring by chance, not the alternative hypothesis. If P=0.05, then there is a 0.95 chance that the alternative is correct.

16th November 2012, 10:26 AM  #31 
Critical Thinker
Join Date: Nov 2004
Location: Santa Barbara, CA
Posts: 438


16th November 2012, 10:30 AM  #32 
Scholar
Join Date: Apr 2012
Posts: 52

Why not? Aren't p-values and confidence intervals connected? P=0.05, hence you can be 95% confident that the observed result is due to the alternative hypothesis whereas there's a 5% chance that the observed result is a Type I Error.
Also, I don't agree with his Bayesian approach. Bayesian Statistics is quite controversial and problematic in the statistical community. That's why I said stick with point estimates and confidence intervals. 
16th November 2012, 10:56 AM  #33 
Critical Thinker
Join Date: Nov 2004
Location: Santa Barbara, CA
Posts: 438

Sure, p-values and confidence intervals are connected. The right statement is that 95% of the time the confidence interval includes the true value of the parameter. The confidence interval is not a posterior distribution for the true value, although it may be approximately so...if you're a Bayesian.
Loosely speaking, the problem is that Bayes' law (nothing to do with being a Bayesian) requires paying attention to Type II error as well as Type I error. And my reading is that Bayesian statistics is much less controversial than it once was, although there remain skeptics on both sides. [Note to mods: I assume if this drifts too far you'll move it.] 
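The role of both error types can be seen by applying Bayes' law directly to "the test came out significant". This is a minimal sketch: the prior probabilities and the 80% power figure are made-up inputs chosen only to show how the answer moves, and `prob_real_given_significant` is a name invented for the example.

```python
def prob_real_given_significant(prior, power, alpha):
    """P(effect is real | significant result), by Bayes' law.
    power = 1 - Type II error rate; alpha = Type I error rate."""
    true_positives = prior * power
    false_positives = (1 - prior) * alpha
    return true_positives / (true_positives + false_positives)

# Same test (alpha = 0.05, power = 0.8), different hypothetical priors.
for prior in (0.5, 0.1, 0.01, 1e-6):
    post = prob_real_given_significant(prior, power=0.8, alpha=0.05)
    print(f"prior {prior:g}: P(real | significant) = {post:.4f}")
```

Only for a fairly generous prior does a significant result at 0.05 translate into something near 95% confidence in the effect; for an implausible claim the same result leaves the effect very unlikely.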
16th November 2012, 11:20 AM  #34 
Master Poster
Join Date: Nov 2007
Posts: 2,120


__________________
When I look up at the night sky and think about the billions of stars out there, I think to myself: I'm amazing.  Peter Serafinowicz 

16th November 2012, 02:24 PM  #35 
Scholar
Join Date: Apr 2012
Posts: 108

I don't understand what your comment has to do with mine. I was responding to Beerina's claim that a million tests in a row need to succeed. Why should psychic abilities require 100% accuracy? If they exist, they likely operate the same way other human abilities do, subject to constraints, good days/bad days, and external stressors. The very best batters only hit about 10% of the pitches thrown their way. Why does Beerina think psychics could successfully perform a million tests in a row when no other human endeavor can?

17th November 2012, 02:27 AM  #36 
Philosopher
Join Date: Aug 2005
Posts: 6,363

First, I think Beerina was speaking metaphorically.
Second, without sufficient data I would try to refrain from speculation about what psychic abilities (should they exist) can and cannot do, how they are influenced, etc.
Third, picking baseball hitters is a clever ploy because in baseball success for a hitter is (roughly) defined by a .300 batting average. One could as easily have chosen baseball pitchers, even better relievers, and see the success rate jump significantly. But that would have weakened one's argument, would it not?
Conclusion: What people like Beerina, Pixel42 and myself are trying to convey is that e.g. a spoon-bender sitting in a comfortable kitchen should have a blow-us-all-away success rate, easily clarifying that something "paranormal" or "supernatural" is going on. Under controlled conditions absolutely eliminating manipulation from both sides, this success rate would be one in a million. Furthermore, that would be a noodle-scratcher for both sides, would it not? 
17th November 2012, 02:28 AM  #37 
Schrödinger's cat
Join Date: May 2004
Location: Malmesbury, UK
Posts: 8,780

I was just pointing out that even if we concede your point that we shouldn't expect these abilities to be any more consistent than those of talented batsmen, chess players etc, we would still expect that they would (as with such abilities) produce results that are significantly better than random chance. And they don't.

__________________
"If you trust in yourself ... and believe in your dreams ... and follow your star ... you'll still get beaten by people who spent their time working hard and learning things"  Terry Pratchett 

17th November 2012, 08:44 AM  #38 
Scholar
Join Date: Apr 2012
Posts: 52

That's why in Statistics we calculate the Type I Error probability before doing a one/two-tailed t-test. Since the Type I error rate for the preliminary is 0.001, we would expect on average one in a thousand applicants to pass by dumb luck. If the significant results were significantly better than the thousand to one rate, we can regard these results as evidence for the paranormal. This can be determined by calculating the p-value of significant studies out of nonsignificant ones.
Unless the JREF decided to combine the p-value, the overall Type I Error probability of the claimant passing both tests is a billion to one. Expecting an exact 100% or near 100% replication is very ridiculous and extremely conservative. Telling a psychic to pass 100 tests in a row is like telling famous basketball player, Brian, to never miss a basket. 
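The check described here, whether more applicants pass than luck would predict, amounts to a binomial tail probability over applicants. The sketch assumes independent applicants who each pass by luck with probability 0.001; the applicant counts are hypothetical, since the thread gives no actual totals.

```python
from math import comb

def prob_at_least(passes, n_applicants, alpha=0.001):
    """P(at least `passes` of `n_applicants` pass by luck alone)."""
    below = sum(comb(n_applicants, i) * alpha**i * (1 - alpha)**(n_applicants - i)
                for i in range(passes))
    return 1.0 - below

# Hypothetical: 1000 applicants, so about one lucky pass is expected.
for passes in (1, 2, 5):
    print(f"at least {passes} passes among 1000: {prob_at_least(passes, 1000):.4f}")
```

One or two passes among a thousand applicants would be unremarkable under these assumptions, whereas five would itself be a rare event worth a closer look.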
17th November 2012, 11:47 AM  #39 
Philosopher
Join Date: Aug 2005
Posts: 6,363


17th November 2012, 12:36 PM  #40 
Scholar
Join Date: Apr 2012
Posts: 108

Why? Any human ability should fall within normal parameters compared to other human abilities. Anyone can play piano after a few lessons, but only some people will reach virtuoso level after many years of study and practice.
An exceptional baseball pitcher may be defined as one who pitches a no-hitter game. There have only been 236 no-hitters in the past 111 years, so the success rate does not exactly jump significantly. And no, despite your bizarre claim about my presumed motive, choosing pitchers or any other skilled human would not weaken my argument. Education and practice are the keys to acquiring skill in any field. If psychic skills exist, why should they be any different? Because you say so? 

