SAC # 112

Data Mining for Dummies: Moving away from the P-value and into predictive analysis

 

TIME | 30 June 2011, 4pm onwards

LOCATION | CSAFE Seminar Room, Dunedin

SPEAKER | Grant Humphries

Abstract

The P-value was introduced in the early 20th century as a method to denote the statistical significance of data. It is most commonly used in experimental studies to determine whether two sets of data are “significantly” different. Despite advances in statistics and mathematics over the last 100 years, the P-value is still commonly used to delineate “significance”, even though it has limited ability to capture long-run outcomes. Data mining, a type of statistical analysis which does not require a priori assumptions about the distribution of data, was introduced in the late 1980s and has become increasingly popular among statisticians in a variety of fields. The most common uses of data mining include Google advertisements, insurance predictions and marketing campaigns. With the success of this type of analysis in the business field, ecologists have begun to use these techniques to analyze their own data. Ecological data is often “messy” and complex, and involves many interacting variables. It has been common practice for ecologists to remain faithful to the P-value and linear regression techniques, which have been outdated for many years. But what is the real value in limiting one’s study to finding “significance” in static datasets, without predicting how a system changes when the underlying data change? This presentation will outline a brief history of the P-value and introduce data mining in simple terms to illustrate the differences between static and predictive analysis. Examples of predictive analyses will be given using generated and real data. Though the P-value is a tried, tested and true way of performing analysis, data mining provides an alternative that is up-to-date, accurate, and powerful.

About the Speaker

Grant Humphries is originally from Newfoundland, Canada, where he completed his undergraduate Honours degree in Marine Biology at Memorial University. During his time there, Grant studied geographic variation in Leach’s Storm-Petrel (Oceanodroma leucorhoa) vocalizations between Alaska and Newfoundland. His studies continued at the University of Alaska Fairbanks, where he began working in data mining to create global ocean models of dimethylsulfide (a climatically active biological compound) and models of storm-petrel distribution in the North Pacific. While there he learned GIS analysis techniques and programming skills, which led him to write several spatial analysis packages for the Alaska GAP project and another for Salford Systems Ltd. in San Diego. Grant has given 11 talks since beginning his Master’s degree, has collaborated with individuals from many countries, and is very active in the seabird community. He started his PhD work at the University of Otago in April and is now working with sooty shearwaters (Puffinus griseus) as a predictor of large-scale climate oscillations.

 

