P-curve app (Morey & Davis-Stober)

You can enter any of five kinds of test statistics (F, z, t, r, chi2) in the format shown by the default value of the analysis. In general, these are the same statistics and formats supported by Simonsohn et al's app. Lines highlighted in pink cannot be read as test statistics and are ignored. You can use this feature to add comments to your input.

Also, any text after a test statistic that follows a hash (#) will be read as a line-specific comment and highlighted in blue. You can use this to label specific test statistics; for instance, to note what study they've been taken from. The comments will be shown in the plot and in the data table.

You can also click the "📋 Link" button at the top of the page to obtain a link that will point back to the current analysis. The link will be copied to your clipboard.

If you would like to copy a simplified version of the analysis text for pasting into Simonsohn et al's app, click the "📋 Text" button below the analysis text area. This removes all comments and copies the simplified analysis to your clipboard.
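The comment handling described above can be sketched roughly as follows. This is a minimal illustration in Python, not the app's own code: the names `split_comment`, `simplify` and `STAT_PATTERN` are hypothetical, and the regular expression assumes entries that look like `t(48)=2.1` or `z=2.5`, which may not match the app's format exactly.

```python
import re

# Hypothetical pattern for entries such as "F(1,38)=4.2", "t(48)=2.1",
# "r(50)=0.3", "chi2(1)=5.1" or "z=2.5". The app's real input format is
# shown by its default analysis text and may differ from this assumption.
STAT_PATTERN = re.compile(
    r"^\s*(F|t|r|chi2|z)\s*(\(\s*[\d.]+\s*(,\s*[\d.]+\s*)?\))?\s*=\s*-?[\d.]+\s*$",
    re.IGNORECASE,
)


def split_comment(line):
    """Split one input line into (statistic text, comment or None)."""
    stat, sep, comment = line.partition("#")
    return stat.strip(), (comment.strip() if sep else None)


def simplify(text):
    """Keep only lines that look like test statistics, without comments."""
    kept = []
    for line in text.splitlines():
        stat, _ = split_comment(line)
        if STAT_PATTERN.match(stat):
            kept.append(stat)
    return "\n".join(kept)


example = "Studies from paper A\nt(48)=2.1 # Study 1\nz=2.5"
print(simplify(example))
```

In this sketch the first line of `example` plays the role of a pink (ignored) line, and the text after the hash plays the role of a blue line-specific comment.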

The code base for this app is completely independent of Simonsohn et al's app, so it may produce slightly different results in some cases. For instance, as of version 4.10, Simonsohn et al's app truncates all p values to be greater than 2.2e-16 (corresponding to a Z statistic of about 8.21). Our app does not do this.
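To see what such a floor does, here is a small Python sketch. It assumes the p value is two-sided (that assumption is what reproduces the figure of about 8.21), and the names `two_sided_p_from_z` and `FLOOR` are made up for the example; it is not code from either app.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p value for a standard normal test statistic."""
    # erfc keeps precision in the far tail, where 1 - cdf would round to 0.
    return math.erfc(abs(z) / math.sqrt(2))


FLOOR = 2.2e-16  # the truncation point mentioned above

for z in (8.0, 8.21, 9.0, 10.0):
    p = two_sided_p_from_z(z)
    print(f"z = {z:5.2f}  p = {p:.3e}  truncated p = {max(p, FLOOR):.3e}")
```

With a floor in place, every Z statistic beyond roughly 8.21 is treated as if it had the same p value, so very large statistics stop being distinguished from one another.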

See the Examples section for demonstrations.

The plot shows the empirical cumulative distribution of the p values of the significant (at 0.05) test statistics entered into the app. Note that for clarity the x axis is logarithmic. The figure also shows what would be expected if the p values were independent draws from a uniform distribution.

Each point represents a study entered into the textbox. Move your mouse over a point to get information about that point. Each point is treated as an order statistic.
The light blue ribbon represents the 90% interval (5%-95%) in which the corresponding order statistic (e.g. the third smallest p value if the point is the third from the bottom) would be expected to be found if the distribution of p values were uniform. Note that this interval is point-wise, and not simultaneous (that is: there is a 90% probability that each point is within the ribbon, not that all points are within the ribbon). Note also that order statistics are not independent (so the probability that more than one is outside the ribbon, even given a uniform distribution, may be larger than one would expect).
The dark blue line within the ribbon represents the median for the corresponding order statistic if the distribution of p values were uniform.
The purple diamond represents the geometric mean of all the significant p values entered into the app.
The light blue band represents the 90% interval (5%-95%) within which the geometric mean p value would be expected to be found if the distribution of p values were uniform. This is a form of Fisher's meta-analytic test conditioned on significance (that is, Simonsohn et al's 2014 test for "evidential value"); if the diamond is to the left of this band, this test is significant at the 5% level.
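The point-wise ribbon and median line for the order statistics can be approximated with a short Python sketch. It relies on the standard fact that the k-th smallest of n independent uniform draws is at most x exactly when at least k of the draws are at most x (a binomial probability). It also assumes that, because only significant p values are plotted, the reference distribution is uniform on (0, 0.05). The function names and the choice `n = 5` are hypothetical, and the app may compute these quantiles differently.

```python
import math


def order_stat_cdf(x, k, n):
    """P(k-th smallest of n independent Uniform(0,1) draws <= x)."""
    # At least k of the n draws must fall at or below x.
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def order_stat_quantile(q, k, n, tol=1e-12):
    """Invert the CDF above by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if order_stat_cdf(mid, k, n) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


ALPHA = 0.05  # significance threshold; p values are assumed uniform on (0, ALPHA)
n = 5         # hypothetical number of significant studies

for k in range(1, n + 1):
    lower, median, upper = (ALPHA * order_stat_quantile(q, k, n) for q in (0.05, 0.5, 0.95))
    print(f"order {k}: 5% = {lower:.5f}, median = {median:.5f}, 95% = {upper:.5f}")
```

Each printed row corresponds to one point position in the plot: the 5% and 95% values bound the ribbon, and the median gives the dark line.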

The tests table contains the results of the various P-curve tests that Simonsohn et al have developed. Each column is described below. We do not show the half P-curve tests by default, for reasons that are explained in our paper. Click the "Include half P-curves?" check box to show them. We also do not show the "left skew" tests by default, given that the authors have not focused much on these tests. Click the "Include LS tests?" check box to show them.

Test: "EV" refers to Simonsohn et al's "evidential value" or "right skew" test; "LEV" refers to their "lack of evidential value" or "flatter than 33% power" test; "LS" refers to their "left skew" test. Note that the authors never developed the LS test for the Stouffer transform in 2015, but its p value is simply one minus the p value for the Stouffer "EV" test, so it is trivial.
α: α (alpha) represents the critical boundary chosen for the test. 0.05 is their "full" P-curve; 0.025 is their "half" P-curve.
Fisher χ²: The χ² test statistic for Simonsohn et al's (2014) P-curve test using a log transformation on the scaled p values (Fisher's method). The test statistic has 2×k degrees of freedom, where k is the number of significant studies.
Fisher p: The p value corresponding to the Fisher χ² test statistic.
Stouffer Z: The Z test statistic for Simonsohn et al's (2015) P-curve test using a probit transformation on the scaled p values (Stouffer's method).
Stouffer p: The p value corresponding to the Stouffer Z test statistic.
# studies: The total number of test statistics detected in the input.
# sig.: The number of studies significant at the α-level in the α column.
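As a rough illustration of the Fisher and Stouffer columns for the "EV" test, the following Python sketch scales each significant p value by α, applies the log or probit transformation, and sums the results. It is a sketch under stated assumptions rather than the app's implementation: the function names and the example p values are invented, and the closed-form chi-square tail used here only works because the degrees of freedom, 2×k, are even.

```python
import math
from statistics import NormalDist


def chi2_sf_even_df(x, k):
    """Upper tail of a chi-square distribution with 2*k degrees of freedom."""
    # Closed form valid for even degrees of freedom only.
    half = x / 2
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(k))


def ev_tests(p_values, alpha=0.05):
    """Fisher and Stouffer 'EV' (right skew) tests on the significant p values."""
    scaled = [p / alpha for p in p_values if p < alpha]  # uniform on (0,1) under the null
    k = len(scaled)
    if k == 0:
        raise ValueError("no significant p values")
    norm = NormalDist()
    fisher_rows = [-2 * math.log(pp) for pp in scaled]
    stouffer_rows = [norm.inv_cdf(pp) / math.sqrt(k) for pp in scaled]
    fisher_chi2 = sum(fisher_rows)
    stouffer_z = sum(stouffer_rows)
    return {
        "n_sig": k,
        "fisher_chi2": fisher_chi2,
        "fisher_p": chi2_sf_even_df(fisher_chi2, k),
        "stouffer_z": stouffer_z,
        "stouffer_p": norm.cdf(stouffer_z),
    }


result = ev_tests([0.001, 0.012, 0.03, 0.2])  # made-up p values
print(result)
```

In this sketch, one minus `stouffer_p` would give the Stouffer "LS" p value, matching the relationship noted in the Test column description.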

The data table shows all the test statistics detected in the input.

Line: The line number of the input on which that statistic was found.
Input: The test statistic entered on this line of the input.
Comment: The comment given for that test statistic (entered after a hash (#) on that input row).
p: The recomputed p value for the entered test statistic.
Sig.?: Is the p value in this row significant at 0.05? ✅ means p<0.05, and hence the value will be included in the full P-curve; ❌ means p≥0.05, and hence the value will not be included.
Fisher: The contribution of that row's test statistic to the overall Fisher's χ² statistic. The sum of this column is the total χ² statistic in the tests table (within rounding error).
Stouffer: The contribution of that row's test statistic to the overall Stouffer's Z statistic. The sum of this column is the total Z statistic in the tests table (within rounding error).
LEV NCP: The noncentrality parameter used for the full P-curve LEV test ("lack of evidential value", or "flatter than 33% power") for the test statistic in that row; i.e. the distribution of the test statistic that would yield a 1/3 probability of significance at 0.05.
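For intuition about the LEV NCP column, the Python sketch below finds the shift that gives a two-sided z test a 1/3 probability of significance at 0.05. It covers only the z case, because the t, F and chi2 cases need noncentral distributions that the Python standard library does not provide. The function names are hypothetical, and the app may parameterize the noncentrality differently (for example, on a squared scale).

```python
from statistics import NormalDist

NORM = NormalDist()


def power_two_sided_z(delta, alpha=0.05):
    """Probability that a two-sided z test is significant when the mean is delta."""
    crit = NORM.inv_cdf(1 - alpha / 2)
    return NORM.cdf(-crit - delta) + 1 - NORM.cdf(crit - delta)


def ncp_for_power(target=1 / 3, alpha=0.05, tol=1e-10):
    """Find delta >= 0 with the requested power, by bisection."""
    lo, hi = 0.0, 10.0  # power rises from alpha to about 1 over this range
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if power_two_sided_z(mid, alpha) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


delta = ncp_for_power()
print(f"delta = {delta:.4f}, power = {power_two_sided_z(delta):.4f}")
```

Bisection is enough here because the power of the test increases steadily with the size of the shift.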

P-curve analysis app

Morey & Davis-Stober

Scroll down, or choose from the menu above, to see the results of P-curve analysis. You can modify the analysis by editing the textbox to the left. Also, try some of the examples. The code for this app can be found on GitHub.

We do not recommend the use of the P-curve due to its poor statistical properties. See the examples and our paper for details.

Visualization

No significant results entered.

Tests

Include half P-curves?

Include test LS?

No significant results entered.

Data table

No significant results entered.

Examples

All links open in new windows; you won't lose your current analysis by opening an example.

Replications of published analyses

These replications demonstrate how this app improves the transparency of P-curve analyses through comments and links.

Replication of the analysis in the supplemental information of Lee & Schwartz (2021)
Replication of the main analysis in Simmons & Simonsohn (2017)
Replication of the main analysis in Cuddy, Schultz, & Fosse (2018) - the reply to Simmons & Simonsohn (2017), above

Demonstrations of problems with the P-curve

For most of these we do not use Simonsohn et al's (2015) full/half rule for determining "evidential value"; to see why, see the nonmonotonicity demonstration.

Sensitivity

Rounding sensitivity in full P-curve, test EV (Stouffer/probit) and test LEV (Fisher/log)
Rounding sensitivity in half P-curve, test EV (Stouffer/probit) and test LEV (Fisher/log)
Cancellation of arbitrarily large test statistics by values near the boundary, tests EV and LEV (Stouffer/probit)
Cancellation of many arbitrarily large test statistics by values near the boundary, test LEV and LS (Fisher/log), producing a "lack of evidential value" and "left skew". Also notice that the EV tests are significant.

Evidential value

Studies both have "evidential value" and lack it, tests EV and LEV (Stouffer/probit)

Nonmonotonicity

These six sets of studies have test statistics that dominate each earlier set. Set 1 is not significant by Simonsohn et al's (2015) half/full evidential value rule and set 2 is significant. But then, set 3 is not significant and set 4 is significant. Then set 5 is not significant, and set 6 is significant. The procedure is not monotone in the evidence.

Set 1, "not significant": Full P-curve yields p>0.1
Set 2, "significant": Increased two test statistics from set 1, and both full and half P-curves yield p<0.1
Set 3, "not significant": Increased two test statistics from set 2, but half P-curve yields p>0.1
Set 4, "significant": Increased two test statistics from set 3, and both full and half P-curves yield p<0.1
Set 5, "not significant": Increased two test statistics from set 4, but half P-curve yields p>0.1 (so not significant by their rule)
Set 6, "significant": Increased two test statistics from set 5, and both full and half P-curves yield p<0.1 (so significant by their rule)
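The decision rule behind these labels can be written as a short Python function. This is a sketch of the full/half rule as it is usually stated for Simonsohn et al. (2015): "evidential value" is declared if the half P-curve test has p<0.05, or if both the full and half P-curve tests have p<0.1. The first clause is not spelled out in the list above and is included here as an assumption; the function name is hypothetical.

```python
def has_evidential_value(full_p, half_p):
    """Full/half rule as commonly stated for Simonsohn et al. (2015)."""
    # Either the half P-curve alone is convincing, or both curves are suggestive.
    return half_p < 0.05 or (full_p < 0.1 and half_p < 0.1)


# Made-up p values in the spirit of the sets above.
print(has_evidential_value(full_p=0.08, half_p=0.09))  # both below 0.1
print(has_evidential_value(full_p=0.03, half_p=0.15))  # half P-curve above 0.1
```

Because the rule combines two tests with different cutoffs, strengthening some test statistics can move one p value below its threshold while pushing the other above it, which is how the alternating pattern in sets 1 to 6 can arise.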

Citation

Our citations

Morey, R. D., & Davis-Stober, C. P. (2025). On the poor statistical properties of the P-curve meta-analytic procedure. Journal of the American Statistical Association, 1–19. https://doi.org/10.1080/01621459.2025.2544397
Morey, R.D. & Davis-Stober, C.P. (2024). P-curve demonstration app (website) https://richarddmorey.github.io/pcurveAppTest

General P-curve citations

Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143, 534–547. doi:10.1037/a0033242
Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2015). Better P-curves: Making p-curve analysis more robust to errors, fraud, and ambitious p-hacking, a reply to Ulrich and Miller (2015). Journal of Experimental Psychology: General, 144(6), 1146–1152. doi:10.1037/xge0000104
