Since you’re in a bind, you decide to claim that GDP and employment rate have a strong relationship to revenue, and that you’re going to explore this further. You had conflicting evidence about the statistical significance of two correlations, but since one side of the evidence supports a claim you would personally like to make, you decide to accept it. An honest analyst would not make this claim, and would instead report that no relationship was found.

But wait! This could have gone another way. Imagine that instead of choosing only four macroeconomic indicators, you chose 100. You download a database from a government website and run a series of automated regressions. Of the 100, two show statistical significance: interest rates on boat loans in south Louisiana and household debt in north Ohio. Your company has no business in Ohio or Louisiana, and there is no reasonable explanation for why these correlations exist, even though the p-values are less than 5%. With 100 tests at a 5% threshold, about five such "significant" results would be expected by chance alone if none of the indicators were truly related to revenue. You would like to show progress, so you decide to include them in your report to management. You have just fished for data.

What is data dredging? How does it affect the p-value? What is its impact on the world around us? The following discussion will attempt to define data dredging and answer these questions.

Impact of data dredging on epidemiology

Data dredging is defined as “cherry-picking of promising findings leading to a spurious excess of statistically significant results in published or unpublished literature”. It can sharply increase the risk that large numbers of false positive results are included, thereby corrupting the data that was originally meant to be reported. Data dredging goes by several names, such as ‘fishing trip’, ‘data snooping’, ‘p-hacking’ and so on. We may use these terms interchangeably in the discussion below.
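To make the 100-regression scenario concrete, here is a minimal Python sketch of why a handful of "significant" results should be expected even when nothing is really going on. Everything in it is an illustrative assumption, not part of the original analysis: the function names are hypothetical, revenue and the indicators are simulated as independent random noise, 40 observations per series is an arbitrary choice, and the p-value uses the Fisher z approximation rather than an exact test.

```python
import math
import random
from statistics import NormalDist


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r via the Fisher z approximation (assumed here)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def count_spurious_hits(n_indicators=100, n_obs=40, alpha=0.05, seed=0):
    """Test pure-noise 'indicators' against pure-noise 'revenue'."""
    rng = random.Random(seed)
    revenue = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_indicators):
        # Each indicator is unrelated to revenue by construction.
        indicator = [rng.gauss(0, 1) for _ in range(n_obs)]
        if approx_p_value(pearson_r(revenue, indicator), n_obs) < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    print("P(at least one false positive):", familywise_error_rate(0.05, 100))
    print("Spurious 'significant' indicators:", count_spurious_hits())
```

Under these assumptions, `familywise_error_rate(0.05, 100)` comes out at roughly 0.994, so finding at least one "significant" indicator among 100 unrelated ones is close to certain. The count from `count_spurious_hits` varies with the seed, but it will typically be a small number above zero, which is the same situation as the boat loans and household debt results.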