WANDS Background
The term workflow has grown to encompass a diverse range of technologies, from scheduling systems, via graphical representations of functional programs, to high-level business process abstractions. In recent years, however, different facets of this technology have started to converge and combine, providing novel and often surprising solutions to problems in numerous scientific and industrial domains.
Indeed, scientific workflow systems are playing an increasingly central role in the computational infrastructure for eScience, through a rich and diverse offering of workflow design tools and execution environments, both in the public domain and in the commercial sector. In recent years, we have witnessed interest in adopting workflow technology in domains as diverse as sequence analysis, toxicity profiling, business intelligence and reporting, pharmaco-epidemiology, and earth sciences. Applications in these domains range from compute-intensive, repetitive simulations to data integration and data analysis.
The recent shift in the focus of experimental science, from a mainly computational to a predominantly data-centric discipline, is presenting new challenges to the workflow community. Since more scientific data will be collected in the next five years than in the entire history of science to date, some of the key challenges for science include very large-scale data analysis, aggregation, filtering, visualization, and quality management.
A number of recent public funding initiatives on the "DataNet" theme, including the recently started NSF-funded DataOne project\footnote{http://www.dataone.org}, show the relevance and timeliness of a technical discussion on technology for data-centric eScience.
Recognising the convergence between data and workflow management, this workshop will bring together experts from both areas to discuss the future role, potential impact, and technical challenges of workflow technology for large-scale, data-intensive science.
