The Analytic Web: Process Metadata for Ecological Analysis and Synthesis

HF091
  • Investigators: Emery Boose, Lori Clarke, Aaron Ellison, David Foster, Julian Hadley, David Jensen, Paul Kuzeja, Leon Osterweil, Alexander Wise
  • Contact: Aaron Ellison
  • Start date: 2002-07-01
  • End date: ongoing
  • Keywords: analytic web, computer science, eddy covariance, eddy flux, Little-JIL, metadata, modeling, ODIN
  • Abstract:

    A wide variety of datasets produced by individual investigators are now synthesized to address ecological questions that span a range of spatial and temporal scales. Such syntheses should be facilitated so that "consumers" of datasets can be confident that both the input datasets and the synthetic products are reliable. The documentation needed to establish the reliability and validity of datasets includes both familiar descriptive metadata and formal documentation of the scientific processes used to produce usable datasets from collections of raw data (i.e., process metadata). Such documentation is complex and difficult to construct, so it is equally important to help "producers" create reliable datasets and the metadata they require. In this project, researchers from the Laboratory for Advanced Software Engineering Research at the University of Massachusetts at Amherst, together with researchers at the Harvard Forest, developed and demonstrated the SciWalker software for creating "analytic webs": systems that aid both producers and consumers of datasets by providing complete and precise definitions of the scientific processes used to transform raw and derived datasets. The formalisms used to define analytic webs are adapted from those used in software engineering, and they provide a novel and effective support system for both the synthesis and the validation of ecological datasets. We used a prototype analytic web to produce synthetic datasets through a worked example: the synthesis of long-term measurements of whole-ecosystem carbon exchange. Analytic webs are also useful validation aids for consumers because they support the concurrent construction of a complete, internet-accessible audit trail of the analytic processes used to synthesize the datasets. Ongoing work focuses on determining whether the process metadata created by SciWalker can be adapted for inclusion in Ecological Metadata Language (EML) files.

  • Methods:

    Statistical Models of Carbon Exchange

    We used statistical models to estimate whole-ecosystem carbon exchange and one of its major components, soil respiration. Data collected continuously during the summer and fall of 2002, in and above both a relatively young deciduous forest and a hemlock forest, were modeled by multiple regression on four commonly measured environmental factors: photosynthetically active radiation above the canopy, above-canopy air temperature, soil temperature, and soil moisture.

    Two example models, for whole-forest carbon exchange during light and dark periods in June 2002, are:

    Daytime: FCO2 = -3.87 + 1.8217*Tair - 1.2667*Tsoil - 0.02478*PAR + 3.880*ln(PAR) + 0.000768*Tair*PAR - 0.3395*Tair*ln(PAR)

    Nighttime: FCO2 = 2.742*10^(0.015*Tsoil)

    where FCO2 is whole-ecosystem carbon exchange, Tair is air temperature above the canopy, Tsoil is soil temperature at 10 cm depth, PAR is photosynthetically active radiation above the canopy, and ln is the natural logarithm. Each model was developed from more than 150 half-hourly measurements of CO2 flux above the canopy, made by the eddy covariance method, together with an equal number of half-hourly averages of Tair, Tsoil and PAR. Similar models were created for each month from July through December. Only environmental variables or cross-products (interaction terms) with a p-value of 0.01 or less were retained in the models; in the nighttime example above, only Tsoil met this condition.
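    The two June 2002 equations can be evaluated directly. The Python sketch below is illustrative only: the function names are hypothetical, and because the source does not state units or a sign convention for FCO2, inputs are assumed to be in whatever units were used to fit the models.

```python
import math


def fco2_day(t_air: float, t_soil: float, par: float) -> float:
    """Daytime whole-forest carbon exchange, June 2002 model.

    Coefficients are copied from the published equation; par must be > 0
    because the model contains ln(PAR) terms.
    """
    if par <= 0:
        raise ValueError("PAR must be positive: the model uses ln(PAR)")
    ln_par = math.log(par)  # natural logarithm
    return (
        -3.87
        + 1.8217 * t_air
        - 1.2667 * t_soil
        - 0.02478 * par
        + 3.880 * ln_par
        + 0.000768 * t_air * par
        - 0.3395 * t_air * ln_par
    )


def fco2_night(t_soil: float) -> float:
    """Nighttime June 2002 model: FCO2 = 2.742 * 10^(0.015 * Tsoil)."""
    return 2.742 * 10 ** (0.015 * t_soil)
```

    For example, fco2_night(0.0) returns the intercept 2.742, and each 1-degree rise in Tsoil multiplies the nighttime estimate by 10^0.015 (about 1.035).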

    A challenge inherent in these data is that only some measurements are reliable for estimating carbon flux; others are unreliable because insufficient turbulence or other conditions prevent adequate mixing of air from within and above the forest. A predictive model built from the reliable data and the accompanying environmental measurements is then used to impute carbon-flux estimates for periods when the measurements are unreliable.
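    The gap-filling idea can be sketched for the nighttime form of the model. This is a minimal sketch under stated assumptions: the HalfHour record and its reliable flag are hypothetical, since the source does not say how reliability is judged beyond turbulence and mixing. The exponential model is fitted by ordinary least squares on log10(FCO2), which is one simple choice and not necessarily how the published models were fitted.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HalfHour:
    """One half-hour record (hypothetical structure)."""
    t_soil: float            # soil temperature at 10 cm depth
    fco2: Optional[float]    # measured flux, or None if missing
    reliable: bool           # hypothetical flag, e.g. from a turbulence screen


def fit_exponential(records: List[HalfHour]) -> Tuple[float, float]:
    """Fit FCO2 = a * 10**(b * Tsoil) by least squares on log10(FCO2)."""
    xs = [r.t_soil for r in records]
    ys = [math.log10(r.fco2) for r in records]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two usable reliable records")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("soil temperature must vary to fit the slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = 10 ** (mean_y - b * mean_x)
    return a, b


def gap_fill(records: List[HalfHour]) -> List[float]:
    """Keep reliable measurements; replace the rest with model predictions."""
    # Only positive reliable fluxes can enter the log-space fit.
    usable = [r for r in records
              if r.reliable and r.fco2 is not None and r.fco2 > 0]
    a, b = fit_exponential(usable)
    filled = []
    for r in records:
        if r.reliable and r.fco2 is not None:
            filled.append(r.fco2)                    # trusted measurement
        else:
            filled.append(a * 10 ** (b * r.t_soil))  # imputed estimate
    return filled
```

    Fitting in log space turns the exponential model into a straight line, so no numerical optimizer is needed. The trade-off is that errors are weighted in relative rather than absolute terms.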

    Process Modeling

    We used a prototype analytic web produced by SciWalker first to identify and separate reliable measurements of carbon exchange from unreliable ones. The analytic web then applies an integrated model of the carbon exchange process, based on the two equations described above, to the reliable data and imputes values to replace the unreliable data. Finally, the analytic web computes net (monthly or annual) carbon flux by applying statistical models to the combination of reliable and imputed data. The same analytic web can be used to model data from other eddy covariance towers.
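    The three steps can be sketched as a single pass that ends in a time-weighted sum. The sketch makes several assumptions beyond what the source states: fluxes are in umol CO2 m^-2 s^-1, each value represents one full half-hour, and net flux is a simple sum over the period. The predict argument is a hypothetical stand-in for whichever fitted monthly model supplies the imputed values.

```python
from typing import Callable, Optional, Sequence

SECONDS_PER_HALF_HOUR = 1800
G_C_PER_UMOL_CO2 = 12.011e-6  # grams of carbon per micromole of CO2 (assumed units)


def net_carbon_flux(
    measured: Sequence[Optional[float]],
    reliable: Sequence[bool],
    predict: Callable[[int], float],
) -> float:
    """Net carbon flux (g C m^-2) over a run of half-hourly values.

    1. keep reliable measurements,
    2. impute the rest with a model prediction for that time step,
    3. sum flux * duration over the whole period.
    """
    if len(measured) != len(reliable):
        raise ValueError("measured and reliable must have the same length")
    total_umol = 0.0
    for i, (value, ok) in enumerate(zip(measured, reliable)):
        flux = value if ok and value is not None else predict(i)
        total_umol += flux * SECONDS_PER_HALF_HOUR
    return total_umol * G_C_PER_UMOL_CO2
```

    Passing the model as a callable keeps the aggregation step independent of which monthly model produced the imputed values, which loosely mirrors reusing the same analytic web for data from other towers.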

    In these models, we distinguish between activity-centered process representations, such as our process language Little-JIL, and more familiar data-centered representations. The data-centered representation includes both a dataflow graph (type model) that describes the legal and expected relationships between the types of published artifacts in the web, and a data derivation graph that describes the actual web of interconnected artifact instances.
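    The distinction between the two data-centered graphs can be illustrated with a small data structure. In this hypothetical sketch, which is not SciWalker's actual design, a TypeModel lists which artifact types may be derived from which. A DerivationGraph records actual artifact instances, rejects derivations the type model does not permit, and can report an artifact's lineage, which is the kind of audit trail the abstract describes. Artifact type names such as RawFlux are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TypeModel:
    """Dataflow graph over artifact *types*: which derivations are legal."""
    allowed: set[tuple[str, str]] = field(default_factory=set)

    def permits(self, src_type: str, dst_type: str) -> bool:
        return (src_type, dst_type) in self.allowed


@dataclass
class DerivationGraph:
    """Derivation graph over artifact *instances* actually produced."""
    types: dict[str, str] = field(default_factory=dict)         # id -> type
    edges: list[tuple[str, str]] = field(default_factory=list)  # (source id, derived id)

    def add_artifact(self, artifact_id: str, artifact_type: str) -> None:
        self.types[artifact_id] = artifact_type

    def add_derivation(self, src: str, dst: str, model: TypeModel) -> None:
        # Reject instance links that the type model does not allow.
        src_type, dst_type = self.types[src], self.types[dst]
        if not model.permits(src_type, dst_type):
            raise ValueError(f"{src_type} -> {dst_type} not permitted by type model")
        self.edges.append((src, dst))

    def lineage(self, artifact_id: str) -> set[str]:
        """All artifacts the given artifact was derived from, directly or not."""
        sources: set[str] = set()
        stack = [artifact_id]
        while stack:
            current = stack.pop()
            for src, dst in self.edges:
                if dst == current and src not in sources:
                    sources.add(src)
                    stack.append(src)
        return sources
```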

    We have also developed a more generic user interface for SciWalker that allows analytic webs to be created for other scientific processes. The interface and its associated analytic webs are based on data-centric process representations constrained by activity-centric processes. The two representations are used together to select a relevant subset of the possible user activities and artifact instances.
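    The sketch below suggests one way an activity-centric process could constrain the data-centric view. It is hypothetical and does not reproduce SciWalker's interface. The process is reduced to an ordered list of steps, standing in for a sequential Little-JIL step. The data-centric view offers every activity whose input types already exist, and the process then narrows that set to the single next step, along with the artifact instances that step would consume.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """A process step (hypothetical): consumes input types, produces one output type."""
    name: str
    inputs: frozenset[str]
    output: str


def select_relevant(
    process: list[Activity], artifacts: dict[str, str]
) -> tuple[list[Activity], list[str]]:
    """Return the activities to offer and the artifact ids they would use.

    process   -- ordered steps, standing in for a sequential activity-centric process
    artifacts -- artifact id -> artifact type, i.e. the current data-centric state
    """
    available = set(artifacts.values())
    # Data-centric view: any activity whose input types already exist could run.
    data_enabled = [a for a in process if a.inputs <= available]
    # Activity-centric constraint: only the first step whose output is missing runs next.
    next_step = next((a for a in process if a.output not in available), None)
    offered = [a for a in data_enabled if a == next_step]
    # Relevant artifacts are the instances the offered step would consume.
    relevant = sorted(
        aid for aid, t in artifacts.items() if any(t in a.inputs for a in offered)
    )
    return offered, relevant
```

    Reducing the process to a list is a deliberate simplification. Little-JIL processes are hierarchical and can express choice and parallel steps, which this sketch does not model.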

  • Related datasets: HF072 HF103 HF092