Obradovic, Z., Vucetic, S., Peng, K., and Han, B., (2004). Data Mining for Efficient and Accurate Large Scale Retrieval of Geophysical Parameters . Eos Trans. AGU, 85(47), Fall Meet. Suppl. 2004, Abstract # NG34A-01 INVITED
Our effort is devoted to developing data mining technology for improving efficiency and accuracy of the geophysical parameter retrievals by learning a mapping from observation attributes to the corresponding parameters within the framework of classification and regression. We will describe a method for efficient learning of neural network-based classification and regression models from high-volume data streams. The proposed procedure automatically learns a series of neural networks of different complexities on smaller data stream chunks and then properly combines them into an ensemble predictor through averaging. Based on the idea of progressive sampling the proposed approach starts with a very simple network trained on a very small chunk and then gradually increases the model complexity and the chunk size until the learning performance no longer improves. Our empirical study on aerosol retrievals from data obtained with the MISR instrument mounted at Terra satellite suggests that the proposed method is successful in learning complex concepts from large data streams with near-optimal computational effort. We will also report on a method that complements deterministic retrievals by constructing accurate predictive algorithms and applying them on appropriately selected subsets of observed data. The method is based on developing more accurate predictors aimed to catch global and local properties synthesized in a region. The procedure starts by learning the global properties of data sampled over the entire space, and continues by constructing specialized models on selected localized regions. The global and local models are integrated through an automated procedure that determines the optimal trade-off between the two components with the objective of minimizing the overall mean square errors over a specific region. Our experimental results on MISR data showed that the combined model can increase the retrieval accuracy significantly. The preliminary results on various large heterogeneous spatial-temporal datasets provide evidence that the benefits of the proposed methodology for efficient and accurate learning exist beyond the area of retrieval of geophysical parameters.
[Full text not yet available]