Abstract: |
Managing Geospatial Big Data is a challenging and increasingly frequent task in ecosystem monitoring, due to the accelerated growth and accessibility of geographic technologies and archives. For this reason, this research focuses on the design and development of two reproducible processing chains in open-source software, using a High Performance Computing approach to manage the volume, variety, and velocity dimensions of two cases of Geospatial Big Data. The first case is a large collection of Landsat satellite images, which must be sequentially processed to prepare a time-series analysis of the regeneration process of disturbed tropical forests in Ecuador. The second case is a unique, complex database of Geospatial data of different sources and types, which must be organized and harmonized to enable an exploratory statistical analysis and the extraction of patterns in the drivers that influence the restoration process of disturbed tropical forests in Ecuador. For this purpose, the design of the processing chains is based on parallel computing, which divides the data into small pieces and distributes them among the available processing units. The implemented design therefore makes it possible to scale up the computing resources when they are available. Our first results, obtained on a multi-core computer, show that the processing chain designed for the large collection of Landsat satellite images is the only feasible way to manage the volume and velocity dimensions of Geospatial Big Data.