Environmental Science Division (EVS), a Division of Argonne National Laboratory

High-Dimensional Time Series Forecasting with Back Propagation Neural Networks: A Method of Feature Selection Utilizing Conditional Inference Forests

Monday, August 15, 2016

Paul Tarpey
Environmental Science Division
Argonne National Laboratory
12:00 P.M. to 1:00 P.M.
Argonne National Laboratory
TCS Building 240
Room 1407

As technological advances continue to lower the cost of data acquisition and storage, the data sets generated through academic and industrial research continue to grow in both the number of observations and the number of variables recorded. Because much of this data has a natural temporal ordering, the resulting proliferation of high-dimensional time series has created a need for forecasting methods well suited to the challenges of this new information landscape.

The application of artificial neural networks to high-dimensional time series forecasting continues to gain popularity throughout the machine learning community. However, their potential for highly accurate results can be offset by their complex initialization procedures and their lack of explanatory power. Classification and regression trees have often been used for feature selection in other forecasting methods, in no small part because their results are readily interpretable, but these trees have been shown to have a selection bias towards covariates with many possible splits. Conditional inference trees separate covariate selection from the splitting procedure: the conditional distribution of statistics measuring the association between the response and each covariate allows for unbiased selection among covariates.
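The two-step idea behind a conditional inference tree can be illustrated with a minimal sketch. It is not the speaker's implementation: it assumes a Monte Carlo permutation test on the absolute sample covariance as the association statistic, a Bonferroni-adjusted stopping rule, and synthetic data. All function names are hypothetical.

```python
import random

def association(x, y):
    """Absolute sample covariance between covariate x and response y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return abs(sum((a - mx) * (b - my) for a, b in zip(x, y)) / n)

def permutation_p_value(x, y, n_perm, rng):
    """Share of random permutations of y at least as associated with x as observed."""
    observed = association(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if association(x, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def select_covariate(columns, y, alpha=0.05, n_perm=500, seed=0):
    """Step 1: choose the covariate with the smallest p-value.

    Returns (None, p_values) when no covariate is significant after a
    Bonferroni adjustment (assumed stopping rule for this sketch).
    """
    rng = random.Random(seed)
    p_values = [permutation_p_value(col, y, n_perm, rng) for col in columns]
    best = min(range(len(columns)), key=lambda j: p_values[j])
    if p_values[best] * len(columns) > alpha:
        return None, p_values
    return best, p_values

def best_split(x, y):
    """Step 2: threshold on the chosen covariate minimising within-group squared error."""
    def sse(values):
        if not values:
            return 0.0
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)
    candidates = sorted(set(x))[:-1]  # the largest value cannot define a split
    return min(
        candidates,
        key=lambda t: sse([b for a, b in zip(x, y) if a <= t])
        + sse([b for a, b in zip(x, y) if a > t]),
    )

# Synthetic data: a binary covariate (one possible split) drives the response,
# while a continuous covariate (many possible splits) is pure noise.
rng = random.Random(42)
x_binary = [rng.randint(0, 1) for _ in range(60)]
x_continuous = [rng.gauss(0, 1) for _ in range(60)]
y = [2.0 * b + rng.gauss(0, 0.3) for b in x_binary]

chosen, p_values = select_covariate([x_binary, x_continuous], y)
split = best_split([x_binary, x_continuous][chosen], y)
print(chosen, p_values, split)
```

Because the covariate is chosen by p-value rather than by the best achievable split, a covariate does not gain an advantage simply by offering more candidate split points.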

Using the forecasting of gross primary productivity (GPP) of the terrestrial ecosystem from the high-temporal-frequency hyperspectral data sets generated by the EcoSpec project as a case study, we propose a feature selection method that uses conditional inference forests to generate the initial input for a back propagation neural network, with a focus on optimizing a combination of prediction accuracy and model interpretability.
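The abstract does not give implementation details, so the following is only a rough sketch of the second stage under stated assumptions: the list `selected` stands in for the covariate indices a feature selection step might return (hypothetical here), the data are synthetic rather than EcoSpec measurements, and the network is a single hidden layer of tanh units trained by per-sample gradient descent.

```python
import math
import random

def train_mlp(X, y, hidden=4, epochs=300, lr=0.05, seed=0):
    """Train a one-hidden-layer tanh network by backpropagation.

    Returns a predict function and the mean squared error per epoch.
    """
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for xi, yi in zip(X, y):
            # Forward pass.
            h = [
                math.tanh(sum(w * v for w, v in zip(W1[j], xi)) + b1[j])
                for j in range(hidden)
            ]
            pred = sum(W2[j] * h[j] for j in range(hidden)) + b2
            err = pred - yi
            total += err * err
            # Backward pass: propagate the error to each layer's weights.
            for j in range(hidden):
                grad_h = err * W2[j] * (1.0 - h[j] ** 2)  # uses W2 before its update
                W2[j] -= lr * err * h[j]
                for k in range(n_in):
                    W1[j][k] -= lr * grad_h * xi[k]
                b1[j] -= lr * grad_h
            b2 -= lr * err
        losses.append(total / len(X))

    def predict(xi):
        h = [
            math.tanh(sum(w * v for w, v in zip(W1[j], xi)) + b1[j])
            for j in range(hidden)
        ]
        return sum(W2[j] * h[j] for j in range(hidden)) + b2

    return predict, losses

# Synthetic data: five candidate covariates per time step, of which only
# columns 0 and 2 influence the value to be forecast.
rng = random.Random(1)
X = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(100)]
y = [math.sin(row[0]) + 0.5 * row[2] + rng.gauss(0, 0.05) for row in X]

selected = [0, 2]  # hypothetical output of the feature selection stage
X_sel = [[row[j] for j in selected] for row in X]

predict, losses = train_mlp(X_sel, y)
print(losses[0], losses[-1])
```

Restricting the input layer to the selected covariates keeps the network small and makes it easier to relate its forecasts back to specific inputs.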
