Osterweil, L. J., L. A. Clarke, A. M. Ellison, R. Podorozhny, A. Wise, and E. Boose. 2008. Experience in using a process language to define scientific workflow and generate dataset provenance. Pages 319-329 in Proceedings of the 16th ACM SIGSOFT International Symposium on the Foundations of Software Engineering (ACM SIGSOFT 2008 / FSE 16).


This paper describes our experiences in exploring the applicability of software engineering approaches to scientific data management problems. Specifically, this paper describes how process definition languages can be used to expedite production of scientific datasets as well as to generate documentation of their provenance. Our approach uses a process definition language that incorporates powerful semantics to encode scientific processes in the form of a Process Definition Graph (PDG). The paper describes how execution of the PDG-defined process can generate Dataset Derivation Graphs (DDGs), metadata that document how the scientific process developed each of its product datasets. The paper uses an example to show that scientific processes may be complex and to illustrate why some of the more powerful semantic features of the process definition language are useful in supporting clarity and conciseness in representing such processes. This work is similar in goals to work generally referred to as Scientific Workflow. The paper demonstrates the contribution that software engineering can make to this domain.

