Methods for big spatial data – from statistics

Methods for big spatial data – from ML/AI

  • Apply the proposed method to regression tasks on several datasets: Combined Cycle Power Plant, Ailerons from LIAD (Laboratory of Artificial Intelligence and Decision), Elevators from LIAD, California Housing from LIAD, Airline Delay from DOT, Household Power Consumption from UCI, and the synthetic sinc function.

  • Compare against other methods, such as the Sparse Gaussian Process and the Stochastic Variational Gaussian Process.

  • The results showed that proper statistical models (BCL, GpGp, GpGp0, SPDE, and NNGP) consistently outperformed the deep learning and algorithmic approaches (DeepGP, DeepKriging, Gap-fill, FRK) on all five simulated competition datasets, as well as on the total precipitable water (TPW) dataset.
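
NNGP and GpGp, two of the statistical models listed above, avoid the cubic cost of an exact Gaussian process likelihood by conditioning each observation only on a few nearby, previously ordered observations (a Vecchia-type approximation). The following is a minimal sketch of that idea, not the NNGP or GpGp code. It assumes a zero-mean field, an exponential covariance with illustrative parameters `sigma2` and `phi`, the order in which the points are supplied, and a brute-force neighbor search. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm


def exp_cov(a, b, sigma2=1.0, phi=0.3):
    """Exponential covariance sigma2 * exp(-d / phi) (assumed kernel and parameters)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)


def exact_loglik(y, locs, nugget=1e-6):
    """Exact zero-mean GP log-likelihood; O(n^3) in the number of observations."""
    K = exp_cov(locs, locs) + nugget * np.eye(len(y))
    return multivariate_normal(mean=np.zeros(len(y)), cov=K).logpdf(y)


def vecchia_loglik(y, locs, m, nugget=1e-6):
    """Nearest-neighbour (Vecchia-type) approximation: y_i is conditioned only on
    its m nearest neighbours among y_0, ..., y_{i-1} in the supplied order."""
    total = 0.0
    for i in range(len(y)):
        prev = np.arange(i)
        dist = np.linalg.norm(locs[prev] - locs[i], axis=1)
        nbrs = prev[np.argsort(dist)[:m]]  # brute-force neighbour search
        k_ii = exp_cov(locs[i:i + 1], locs[i:i + 1])[0, 0] + nugget
        if len(nbrs) == 0:  # the first point has an empty conditioning set
            mu, var = 0.0, k_ii
        else:
            K_nn = exp_cov(locs[nbrs], locs[nbrs]) + nugget * np.eye(len(nbrs))
            k_in = exp_cov(locs[nbrs], locs[i:i + 1])[:, 0]
            wts = np.linalg.solve(K_nn, k_in)  # kriging weights on the neighbours
            mu, var = wts @ y[nbrs], k_ii - wts @ k_in
        total += norm(loc=mu, scale=np.sqrt(var)).logpdf(y[i])
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    locs = rng.uniform(size=(300, 2))
    K = exp_cov(locs, locs) + 1e-6 * np.eye(len(locs))
    y = np.linalg.cholesky(K) @ rng.standard_normal(len(locs))  # simulated GP field
    print("exact  :", exact_loglik(y, locs))
    print("m = 10 :", vecchia_loglik(y, locs, m=10))
```

With `m = n - 1` every conditioning set contains all earlier points, so the product of conditionals reproduces the exact likelihood. Smaller `m` trades accuracy for conditional solves that cost O(n m^3) in total, although the brute-force neighbor search used here is still quadratic in n.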

Methods for spatial data – from statistics

Methods for spatial data – from ML/AI

  • Provide a systematic review of the principles and methods of spatial prediction.

  • Provide a taxonomy of existing spatial prediction methods, as shown in Fig. 1.
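
The central principle of spatial prediction can be made concrete with simple kriging, which coincides with the posterior mean of a zero-mean GP. The sketch below is illustrative only. The squared-exponential covariance, its parameters `sigma2` and `ell`, and the nugget `tau2` are assumptions, not choices taken from the review, and the function names are hypothetical.

```python
import numpy as np


def sq_exp_cov(a, b, sigma2=1.0, ell=0.2):
    """Squared-exponential covariance sigma2 * exp(-|a - b|^2 / (2 ell^2)) (assumed kernel)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return sigma2 * np.exp(-0.5 * d2 / ell**2)


def simple_kriging(locs, y, new_locs, sigma2=1.0, ell=0.2, tau2=1e-2):
    """Simple kriging = posterior of a zero-mean GP observed with nugget tau2.
    Returns the predictive mean and variance of the latent field at new_locs."""
    K = sq_exp_cov(locs, locs, sigma2, ell) + tau2 * np.eye(len(y))
    k_star = sq_exp_cov(locs, new_locs, sigma2, ell)     # shape (n, n_new)
    L = np.linalg.cholesky(K)                             # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    mean = k_star.T @ alpha                               # kriging predictor
    v = np.linalg.solve(L, k_star)
    var = sigma2 - np.sum(v**2, axis=0)                   # k(s*, s*) - k*^T K^{-1} k*
    return mean, var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    locs = rng.uniform(size=(50, 2))
    y = np.sin(6 * locs[:, 0]) + 0.1 * rng.standard_normal(50)  # toy data
    grid = np.array([[0.25, 0.5], [0.75, 0.5], [5.0, 5.0]])      # last point is far from the data
    mean, var = simple_kriging(locs, y, grid)
    print(mean, var)
```

The returned variance is for the latent field; adding `tau2` gives the variance of a new noisy observation. Far from all observations, the predictor reverts to the prior mean (zero) and the prior variance `sigma2`.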

GP (Gaussian process) models

Variants of GPs

NNGPs (nearest-neighbor Gaussian processes)

NNMPs (nearest-neighbor mixture processes)

Deep learning for spatial data

Deep GPs

Deep kriging

Convolutional Gaussian Processes

  • Establish a formal connection between Gaussian Markov random fields (GMRFs) and convolutional neural networks (CNNs).

  • See Lindgren et al. (2011) for the link between GPs and GMRFs.

  • DGMRF (deep GMRF) outperforms all the methods from the competition on every criterion (MAE, RMSE, CRPS, INT) except CVG (prediction-interval coverage), where it is slightly worse than NNGP.
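
The GMRF/CNN connection behind DGMRF can be illustrated in a few lines. A linear convolution on a lattice is a sparse matrix G, and if G x = z with z standard normal, then x is a GMRF with precision Q = GᵀG, which is again sparse. The sketch below assumes a single layer with a hypothetical plus-shaped 3×3 filter (centre weight `a`, neighbour weight `b`) and zero padding at the borders. The actual DGMRF stacks several such layers with learned weights and a bias, which are omitted here, and the function names are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.signal import convolve2d
from scipy.sparse.linalg import spsolve


def plus_kernel(a=5.0, b=-1.0):
    """Hypothetical plus-shaped 3x3 filter: centre weight a, four neighbour weights b."""
    return np.array([[0.0, b, 0.0], [b, a, b], [0.0, b, 0.0]])


def conv_matrix(h, w, a=5.0, b=-1.0):
    """Sparse matrix G such that G @ x.ravel() equals the zero-padded convolution
    of the h x w image x with plus_kernel(a, b), in row-major order."""
    idx = np.arange(h * w).reshape(h, w)
    rows, cols, vals = [], [], []
    for r in range(h):
        for c in range(w):
            i = idx[r, c]
            rows.append(i)
            cols.append(i)
            vals.append(a)  # centre weight
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:  # zero padding: neighbours outside are dropped
                    rows.append(i)
                    cols.append(idx[rr, cc])
                    vals.append(b)
    return sp.csr_matrix((vals, (rows, cols)), shape=(h * w, h * w))


if __name__ == "__main__":
    h, w = 20, 20
    G = conv_matrix(h, w)  # |a| > 4|b|, so G is strictly diagonally dominant and invertible
    Q = (G.T @ G).tocsr()  # precision matrix of x when G x = z with z ~ N(0, I)
    rng = np.random.default_rng(0)
    x = spsolve(G.tocsc(), rng.standard_normal(h * w)).reshape(h, w)  # one GMRF sample
    # the matrix view and the CNN-style convolution agree
    print(np.allclose(G @ x.ravel(), convolve2d(x, plus_kernel(), mode="same").ravel()))
    print("nonzeros in Q:", Q.nnz, "out of", Q.shape[0] ** 2)
```

In this sketch each row of Q has at most 13 nonzeros (the plus-shaped stencil convolved with itself), so the precision stays sparse. Stacking several convolution layers multiplies their matrices and widens this stencil, which is how a deeper, CNN-like model can still yield a sparse precision matrix.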

Deep learning

  • Discuss, from a statistical point of view: What is deep learning? What new characteristics does it have compared with classical methods? What are its theoretical foundations?

Methods for spatiotemporal data

Generative mixture models

Mixture-based models

Copula-based models


Copula-based models for non-Gaussian time series

Copula-based models for non-Gaussian spatial data

Methods for time series (temporal data)

  • The results of this study showed that none of the three models is a panacea, but the deep learning LSTM (long short-term memory) model and the EM-algorithm-based MTD (mixture transition distribution) model slightly outperformed the classical SARIMA (seasonal autoregressive integrated moving average) model in predicting ARI (acute respiratory infection) values.

Big data

Big spatial data

Big time series