Methods Board Meeting, Middleton, WI, June 18-21, 2007
Notes from Physical Habitat Data Elements Committee
This document combines the minutes of the 4 special work meetings held in Middleton, WI between PHAB workgroup members, MDCB officers, and USGS staff.
A. Monday June 18, 2007 (AM) and Tuesday June 19, 2007 (AM)
Participants: Faith Fitzpatrick and Revital Katznelson
Summary by: Revital
Purpose: Prepare progress report and handouts for meetings with MDCB officers and USGS staff
PHAB workgroup members Faith and Revital, who took the lead on development of the PHAB data element list and Protocol matrix, met and discussed their progress to date and their plans for the meeting. Revital received feedback on the handouts she intended to use to facilitate the discussion.
Action: Revital to edit and update the handouts for Tuesday afternoon’s deliberations with Dan and Eric.
B. Tuesday June 19, 2007 (PM)
Participants: Faith, Revital, Dan Sullivan, and Eric Vowinkel (by phone)
Dan’s notes from afternoon of 6/19, with Revital’s edits/additions
Purpose: Clarify everyone’s ideas about what products the PHAB workgroup is hoping to deliver and what are the options for generating them; discuss how to present our strategy in Wednesday’s meeting.
Background: The PHAB workgroup’s initial task is to update the PHAB data element list so it can be added to the existing WQDE lists. However, there are additional requests regarding integration of PHAB data elements with WQX that need to be addressed, and there are several options for responding. Other desirable products include outreach and communication materials to encourage the use of PHAB WQDE by all monitoring entities.
Eric clarified the outreach and communication products to be developed:
- Nat’l. Monitoring Conference in May 2008: web page, workshop, presentation, poster, etc.
- Are there other conferences we should also consider?
Revital reviewed the three types of element lists we may need:
- the “Long List” - Field (Project) level – comprehensive list, lots of details at the project level, for communication of items that are of interest to Project manager, QA/QC officers, field operators, and other Project personnel; information is needed at the finest spatial resolution (e.g., Sub-station level).
- the “Short List” – core WQ Data elements – to be shared with people outside the Project; just the essential metadata
- WQX list – may be a subset of the Long List (#1) or the same as the Short List (#2), but needs to be organized as Data Fields and pick lists (a.k.a. domain lists, or lookup lists)
Starting point – where should the Board begin? Revital and Faith laid out the different strategies:
One – create the Long List first, then hone it to select the essential – or Core – data elements.
Two – start with existing WQDE; refine the draft “short list” of PHAB elements and add it to the existing modules. Expand the Short List to create the Long List if needed.
The group agreed that there may be other strategies, or a different combination of steps, that we need to be open to.
The meeting participants recommended looking deeper into creation of the Long List from scratch (option one), i.e., building the comprehensive list based on existing protocols. We also discussed how to organize the list, and Revital suggested organizing it by subject matter and content (similar to the “Field Road Map” she previously shared with MDCB), with a clear distinction between data field name and database cell content, so it is immediately usable by WQX. This organization can also become a tool for building cross-walks between agencies and protocols.
Action: Revital will finalize and prepare the handouts for the Wednesday meeting with USGS staff.
Action: ASAP after the meeting, Dan, Faith and Revital will prepare a meeting summary for MDCB and a few power-point slides for Eric to present at the NWQMC council meeting in July.
C. Wednesday, June 20 (all day) and Thursday, June 21, 2007 (AM)
Participants: Faith Fitzpatrick, Revital Katznelson, Dan Sullivan, Jana Stewart, Morgan Schneider, Nate Booth, Mitch Harris, Pete Ruhl, Eric Vowinkel (on the phone), Randy Hill (on the phone)
Dan’s notes from all sessions, with Revital’s edits/additions
Objective: Initiate a dialog between MDCB workgroup, USGS Biological database, and WQX to discuss the relationship between PHAB data elements, the existing WQDE, and WQX.
C.1 The meeting began with a round of introductions by everyone present:
- Faith Fitzpatrick – Fluvial Geomorphologist, USGS Middleton Water Science Center. WQDE workgroup member since 2006. Lead Author of the NAWQA habitat protocol.
- Revital Katznelson – WQDE workgroup member since 2004. Has special interest in the data sharing aspects of environmental monitoring. Recently retired from CA State Water Resources Control Board. Teaches a water quality monitoring design course at UC-Berkeley Extension and provides technical writing services as an independent contractor.
- Jana Stewart - USGS Middleton Water Science Center, GIS (NAWQA, Aquatic GAP) and database management specialist for NAWQA National Team. Currently works on development of a database for biological data.
- Morgan Schneider – database developer, USGS Middleton Data Center. Likely to be lead developer of the USGS biological database.
- Dan Sullivan – USGS Middleton Water Science Center. Exec Sec of the Board since 2004. Chair of the NEMI workgroup since inception; built, maintains, and updates NEMI database and website.
- Nate Booth – Systems Analyst, USGS Middleton Data Center. Involved in WQX effort and USGS biological database.
- Mitch Harris – USGS Illinois Water Science Center. Formerly lead biologist on Upper and Lower Illinois River NAWQA studies; now writing the Spatial Framework requirements for the USGS biological database with Jana Stewart. Also works on the BioTDB, a transactional database for NAWQA biological data.
- Pete Ruhl – USGS HQ. Data manager for NAWQA BioTDB, based at HQ in Reston, VA. Member of Ecological National Synthesis team. Pete is leading the development of new db for stream ecology for Water Resources Division (WRD) of USGS. Interested in discussions on:
- Spatial ideas for various protocols
- Conceptual framework that covers all national protocols
- Delivery of data to WQX
- Eric Vowinkel (on the phone) – USGS New Jersey Water Science Center. Co-chair of the Methods Board from USGS. Member of the National Water Quality Monitoring Council, Co-chair (with Chuck Spooner) of the conference coming up in May, 2008 in Atlantic City, NJ.
- Randy Hill (on the phone) – EPA STORET project manager; hired by EPA 5 months ago. Has background in Oracle, project management, and environmental consulting. The Program Goal is to have the WQX schema up by May, 2008.
C.2 Setting the stage - Introduction, definitions and terminology
Dan reviewed the history of WQDE. Pete: What is the difference between a data element and a data standard? Dan: The Methods Board is a Federal Advisory Committee (FACA); therefore we can only recommend and cannot form policy (requirements). It is up to someone else to take our data elements and make them data standards (e.g., EPA has the ESAR standards).
(def) Data elements – address the who, what, where, when, why and how data are collected and reported.
(def) Data standards – lists of mandatory elements and the way these elements must be reported.
Revital walked the participants through a set of handouts she had prepared, with examples from her Data Quality Management (DQM) system, to facilitate use of common language in reference to basic database terms. The handouts are appended to this document below the Minutes.
- A Database table is made of Data Fields (columns) and Records (rows); Each Result has many descriptors. Handout #1 shows an example of a basic Result table.
- Tables in a relational database may be linked via tracking entities with unique IDs. Handout #2 shows linkages between groups of Result descriptors.
- It helps to organize all the Data Fields – all the information about the “who, what, where, when, why and how” – by subject matter. Revital had used a hierarchical structure that leads from the general to the more specific, like a “Road Map”. Handout #3 shows an example of such a structure that lists items by Category, Group, and Subject [the version included in these Minutes has been shortened]. Note: this ‘skeleton’ of the Road Map is a suggestion for the “Long List” of information bits, and is different from the WQDE list published by the MDCB.
- The fourth level in this hierarchical structure is the Data Field. Handout # 4 shows how each Subject is broken into separate bits of information; each bit is essentially a Data Field in a database table, such as those shown in Handout # 1.
- The same nested structure can easily be expanded to provide a pick-list of allowed values one can put under each of the Data Fields. Handout # 5 shows examples of verbal categories to choose from for selected Data Fields; providing a definition in the same structure is easy.
- In some cases, the contents of a database cell under Field A will determine the contents of the cell in Field B of the same row (examples not shown)
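As a rough illustration of the levels described in the list above, the nested structure can be sketched as a Python dictionary. All Category, Group, Subject, Data Field, and pick-list names below are hypothetical placeholders; they are not taken from the handouts or from the published WQDE lists.

```python
# Hypothetical "Road Map" fragment: Category > Group > Subject > Data Field > pick list.
# None of these names come from the handouts; they are placeholders for illustration.
ROAD_MAP = {
    "Where": {                          # level 1: Category
        "Station": {                    # level 2: Group
            "Channel": {                # level 3: Subject
                # level 4: Data Field, with its level 5 pick list of allowed values
                "Channel Type": ["Riffle", "Run", "Pool"],
                # a Data Field with no pick list (free numeric entry)
                "Channel Width": None,
            }
        }
    }
}


def find_field(road_map, field_name):
    """Return (path, pick_list) for a Data Field, or raise KeyError if absent."""
    for category, groups in road_map.items():
        for group, subjects in groups.items():
            for subject, fields in subjects.items():
                if field_name in fields:
                    return (category, group, subject), fields[field_name]
    raise KeyError(field_name)


def is_allowed(road_map, field_name, value):
    """True if value is in the field's pick list, or if the field has no pick list."""
    _, pick_list = find_field(road_map, field_name)
    return pick_list is None or value in pick_list
```

With this layout, `is_allowed(ROAD_MAP, "Channel Type", "Pool")` checks a cell value against the domain list for its Data Field, which is the role the pick lists play in the discussion.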
The group then discussed how these ‘levels’ correspond to WQDEs, metadata, and Data Standards
- Data Fields (Revital’s level 4) correspond to data elements and include the descriptors that are often called metadata.
- pick lists (Revital’s level 5) are a type of data standards. They define the domains for the data elements.
- Pete’s suggestion – at some time in the future it would be good for the Methods Board to provide a cross-walk between data elements and metadata.
Important *The Focus of the Methods Board is Level 4.
Dan distributed a hardcopy of the WQDE User’s guide (NWQMC Technical Report #3, April 2006) to all participants and walked them through the Appendices that provide the 3 lists of WQDE (Chemistry & bacterial counts, Toxicity, Population & Community). The group noted that the list of elements is not a list of Data Fields, and the definitions may or may not include pick-lists (domain lists). In other words, they are not ‘database ready’ as published (and were not meant to be).
C.3 Discussion of the challenges
Revital and Faith (with input from others) developed Handout # 8 that shows a list of challenges with examples.
The first challenge brought up to the group was the issue of different table formats used by different agencies. Handout # 7 shows an example. Revital: Most folks would enter monitoring data into the tabular, or horizontal, format (shown at the bottom of Handout #7); it is the ‘intuitive’ way. This is a very good format for reporting and viewing the data, but it does not provide adequate space for descriptors and does not preserve the relationship between the Results and their descriptors. Database tables organized in the vertical format (e.g., Handout # 1 and top of Handout # 7) can preserve these relationships, are much more flexible, and have many other advantages.
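The difference between the two formats can be sketched in a few lines of Python. The station names, dates, values, and field names below are invented for illustration and do not reproduce Handout # 7; the point is only that each Result becomes its own record and can carry its own descriptors (here, a unit).

```python
# Hypothetical horizontal (wide) rows: one row per visit, one column per characteristic.
wide_rows = [
    {"station": "S1", "date": "2007-06-18", "Temperature": 18.5, "pH": 7.9},
    {"station": "S2", "date": "2007-06-18", "Temperature": 20.1, "pH": 8.2},
]

# Hypothetical descriptors that the wide format has no column for.
units = {"Temperature": "deg C", "pH": "pH units"}


def to_vertical(rows, key_fields=("station", "date")):
    """Turn each (row, characteristic) pair into its own Result record."""
    records = []
    for row in rows:
        keys = {k: row[k] for k in key_fields}
        for name, value in row.items():
            if name in key_fields:
                continue
            # Each Result keeps its identifying keys plus its own descriptors.
            records.append({**keys, "characteristic": name,
                            "result": value, "unit": units.get(name)})
    return records


vertical = to_vertical(wide_rows)
```

Two wide rows with two characteristics each become four vertical records, and further descriptor fields can be added per Result without changing the table layout.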
The group discussed a few other Handout # 8 challenges in depth.
(def) Attribute vs. Characteristic (challenge 10)
Revital: Attribute can contain more than 1 characteristic; for example:
- Attribute: Particle-size distribution
- Characteristics: d50, % fines, etc.;
Pete: attribute means something else in a database. The group concurs. Maybe we should avoid use of the word attribute…
Revital: The STORET catch-all term “characteristic” is good because it includes properties, analytes, conditions…. Randy: if it has a unit associated, it is probably a characteristic.
(def) descriptors vs. results (challenge 9)
For some folks Descriptor = data qualifier (Sometimes we have results that are descriptors of other results). Is it important to separate them?
Activity – is it an action or object? Various definitions, legacy meaning from water chemistry databases
(def) Activity – legacy from STORET. Revital: Activity is something done in the field to generate data or start data-generation process. In STORET, two choices:
a. measurement/observation, or
b. Sample (collection of something to be analyzed later).
STORET uses Activity as a noun. Randy: working to get a better definition. Is Activity the same thing as “Method”? no. Pete: Bottom line – Activity is a thing.
Pete: Sample/measurement/observation (S/M/O) is preferred to “Activity” but everyone understood. Revital: we need to hone the list of activity types to include estimates, counts, scores, etc.
Estimated vs. Measurement: some thought an “E” in the remark field next to the result is enough. Revital – different agencies use Estimated in different ways; as a data qualifier it means that the Result quality is not up to par, while as a field estimate it means the operator was eyeballing the situation and came up with a number in the data sheet. It is really important to differentiate between these estimates and the numbers that were actually measured.
ACTION: Randy Hill will send ERD of WQX schema to the group when it is available.
ACTION: Randy will also send WQX data dictionary to the group.
Randy: with the start of the WQX Bio/habitat pilot, EPA will devise some examples.
ACTION: send to the group when assembled. (mid July?)
Pete asked Randy: can NAWQA data be loaded into WQX? Help us understand how much granularity has to be given up to get into WQX.
The STORET conference in November, 2007, in Austin, TX will include WQX training, and Randy is also working with the volunteer monitoring community.
After lunch discussion continued unabated; USGS folks perused the WQDE lists at their leisure.
Important * Revital’s skeleton (Handout # 3) is different from the existing WQDEs
Frame of reference (challenge 6)
Discussed graphical representation of streams from different scales.
- How to compare to protocol content
- How are data grouped
- One way of grouping is to break out by scale
(def) Reference point: (ref. location) – permanent geographic reference point (usually lat-long). Not necessarily associated with any particular activity. (WQX = “x-site”)
(RK) needs to be recoverable by someone who hasn’t been there
(FF) need to take scales and dimensions and relate them back to the reference location
(PR) let’s stick to reference location as the preferred terminology
(FF) aside – the reach can fall on either side of the reference point. Downstream means it is outside of the watershed. This is not a big deal at sites downstream of large watersheds but may be significant in small (<10 sq mi) streams; there may be watershed characteristics that affect WQ at a site that are not accounted for in GIS data.
Scale terms (NAWQA):
- Reach – sub-sample of segment
- Segment – piece of stream with similar characteristics defined by bounds (tribs, land use, etc.)
- Watershed – entire basin upstream of sampling area
(JS) non-NAWQA samplers may use other definitions for reach, segment, etc.
(RK) “fragment” term introduced as a unifying concept: any stretch of stream that can be described by a point of origin, its length, and its direction (upstream, downstream, or mid).
(FF) scale can be addressed – rather than reach or segment look at channel type, valley type, etc.
(PR) “slope” can be measured at multiple scales. (JS) so do you have one slope field with many descriptors?
(PR) list of location types and the possible results that can go with each. Suggested that Faith put scale basis on the rows in her spreadsheet.
(JS) also need to pay attention to map scale (i.e., 1:25K, 1:100K, etc.)
Scales and dimensions – we recognized that dimensions (e.g., point, line, area, volume) are not equivalent to scale (reach, segment, basin), therefore you can have a dimension at more than one scale.
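The “fragment” idea raised earlier in this discussion (a stretch of stream described by a point of origin, a length, and a direction) could be represented along the following lines. This is only a sketch: the field names, the use of metres, and the reading of “mid” as centred on the reference location are all assumptions, not something the group agreed on.

```python
from dataclasses import dataclass

DIRECTIONS = ("upstream", "downstream", "mid")


@dataclass(frozen=True)
class Fragment:
    """A stretch of stream described relative to a reference location."""
    reference_location: str   # hypothetical ID of the permanent reference point
    length_m: float           # length of the stretch, in metres (assumed unit)
    direction: str            # where the stretch lies relative to the reference point

    def __post_init__(self):
        if self.direction not in DIRECTIONS:
            raise ValueError(f"direction must be one of {DIRECTIONS}")
        if self.length_m <= 0:
            raise ValueError("length_m must be positive")

    def extent(self):
        """Return (start, end) offsets in metres along the stream; upstream is positive."""
        if self.direction == "upstream":
            return (0.0, self.length_m)
        if self.direction == "downstream":
            return (-self.length_m, 0.0)
        # "mid" is assumed here to mean centred on the reference location
        return (-self.length_m / 2, self.length_m / 2)
```

A reach, a segment, or a thalweg profile could each be stored as a `Fragment` tied to the same reference location, which keeps the scale term (reach, segment) separate from the geometric description.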
The group discussed handling datasets comprised of Results that are related to each other, i.e., sequential Result sets (such as data logger records of ‘continuous monitoring’, or a sequence of points along the Thalweg).
Important: *Series is a sequential set of Results – clustered based on relationships.
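One possible way to hold such a Series is to give each Result a series identifier and a sequence number, then group and order on those two fields. The field names `series_id` and `sequence` and all values below are hypothetical; the sketch only shows the clustering idea.

```python
from collections import defaultdict

# Hypothetical Result records; "series_id" and "sequence" are illustrative field names.
results = [
    {"series_id": "LOGGER-1", "sequence": 2, "characteristic": "Temperature", "result": 18.9},
    {"series_id": "LOGGER-1", "sequence": 1, "characteristic": "Temperature", "result": 18.5},
    {"series_id": "THALWEG-1", "sequence": 1, "characteristic": "Depth", "result": 0.42},
    {"series_id": "THALWEG-1", "sequence": 2, "characteristic": "Depth", "result": 0.55},
]


def build_series(records):
    """Group Results by series_id and order each group by its sequence number."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["series_id"]].append(rec)
    return {sid: sorted(recs, key=lambda r: r["sequence"])
            for sid, recs in grouped.items()}


series = build_series(results)
```

The same two fields cover both examples from the discussion: a data logger record ordered in time and a set of thalweg points ordered along the channel.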
Group Discussions continued on Thursday, June 21, 2007, with most participants present.
The major topic was spatial relations between habitat components
Q: Do we want the data elements to hold the relationships?
Q: What level do we want to modify the data elements?
(JS) there is one location, everything else is related to it
(PR) Look at it in terms of taxonomy. NAWQA and other programs such as WQX have a “spatial taxonomy”. Therefore create a taxonomic system
- NAWQA assigns features on a per-visit system
- Need to accommodate “permanent” sampling features
USGS BUG database will accommodate NAWQA, WSA/EMAP, and Great Rivers but not state protocols
* Module 6.0 of existing WQDE - “Sample Collection” in current data elements to be replaced by Spatial Framework?
(PR) only real “new” idea is spatial framework, rest is similar to water chemistry