Elephant Meeting on Big Data Services
While we try to get to grips with the ever-increasing “Big Data” deluge, we recognize that adequate Web services are a key prerequisite for ubiquitous, flexible, and fast data access. In a massive concerted effort, several large European initiatives have now teamed up to address the service challenge. On 12 and 13 November 2015, an inaugural EUDAT Workshop on services for Big Data was held successfully at the Supercomputing Center in Barcelona, Spain. Representatives of three key Big Data projects - EUDAT, EarthServer, and EPOS - came together to discuss innovative alternatives for value-adding services.
To consolidate activities around these specific themes, the workshop was divided into several tracks focusing on Big Data semantics, federated Data Mining, and multi-dimensional Array Databases for large time series. Discussions started by capturing best practices and reviewing the current state of development and activities in the respective areas. Questions such as how data processing can be orchestrated optimally, or how scientific workflows can make use of EUDAT services, were discussed intensively in different working groups.
Peter Wittenburg, scientific coordinator of the EUDAT Data Infrastructure, convened a broad range of expertise from Europe and the US. The experts focused in particular on multidimensional arrays because these play a major role in scientific and engineering data. In a summary, Mark van de Sanden, EUDAT Workpackage Leader, and Peter Baumann, workshop facilitator of the EUDAT Array Database track, pointed out possible future roles for EUDAT:
- IaaS service provider: providing a cloud infrastructure to run Array Databases
- SaaS service provider: providing an Array Database as a domain-independent, horizontal service
- Providing tools for easy data movement between the EUDAT DCI domain and the user domain
- Providing domain services (e.g., geo, astro, life sciences) based on a common horizontal platform of array services, thereby leveraging cross-community effects
Peter Baumann summarized his experience of running large-scale infrastructures in his presentation:
“Of course multidimensional arrays do not stand alone, they are intertwined with other data types, but typically they constitute the ‘Big Data’ part. Therefore, it makes sense to integrate arrays into common data management platforms.” Flexible querying, data independence, scalability, and standards conformance are critical advantages of Array Database technologies. Among the challenges identified was the integration of heterogeneous data types, including arrays, into a single common information space for users. Array-intensive domains such as the Earth, space, and life sciences were considered possible candidates for future EUDAT services.
The following presenters contributed their expertise to the Array Database track:
- Peter Baumann (Workshop Facilitator, Array Database expert) - Jacobs University Bremen, Germany
- Kwo-Sen Kuo (Array Database expert) - NASA collaborator, US
- Stefan Pröll (Data Citation expert) - SBA Research, Austria
- Simone Mantovani (Atmospheric Analysis expert) - MEEO s.r.l., Italy
- Alessandro Spinuso (Seismology expert) - KNMI, Netherlands
- Luca Trani (Seismology expert) - KNMI, Netherlands
- Thomas Zastrow (expert for Data Analysis in the Humanities) - Max Planck Gesellschaft, Rechenzentrum Garching, Germany
- Mark van de Sanden (EUDAT Workpackage Leader) - SURFsara, Netherlands
The European Data Infrastructure EUDAT aims to contribute to the production of a Collaborative Data Infrastructure (CDI). The project's target is to provide a pan-European solution to the challenge of data proliferation in Europe's scientific and research communities. The increasing complexity and massive growth of data have outpaced the development of tools to deal with them.
In response to this challenge, the intercontinental initiative EarthServer aims to unleash the potential of Big Data through a disruptive paradigm shift in service technology. EarthServer has established open ad-hoc analytics on massive Earth Science data, based on and extending the leading-edge Array Database technology rasdaman. The participating data centers are now extending this to a Petabyte of 3-D and 4-D datacubes. Technological advances will allow real-time scaling of such Petabyte cubes, as well as intercontinental fusion.
The European Plate Observing System EPOS contributes by planning a research infrastructure for European solid Earth science that integrates existing research infrastructures to enable innovative multidisciplinary research. EPOS was recently prioritized for implementation by the European Strategy Forum on Research Infrastructures (ESFRI).