Bulletin 17B (B17B) of the Interagency Advisory Committee on Water Data (IACWD, 1982) codifies the standard methodology for conducting flood-frequency studies in the United States. B17B specifies that annual peak-flow data are to be fit to a log-Pearson Type III distribution. Specific methods are also prescribed for improving skew estimates using regional skew information, tests for high and low outliers, adjustments for low outliers and zero flows, and procedures for incorporating historical flood information.
The authors of B17B identified various needs for methodological improvement and recommended additional study. In response to these needs, the Advisory Committee on Water Information (ACWI, successor to IACWD), Subcommittee on Hydrology (SOH), Hydrologic Frequency Analysis Work Group (HFAWG) has recommended modest changes to B17B. These changes include adoption of a generalized method-of-moments estimator denoted the Expected Moments Algorithm (EMA) (Cohn and others, 1997) and a generalized version of the Grubbs-Beck test for low outliers (Cohn and others, 2013). The USGS has implemented these changes in the PeakFQ program.
A set of frequently asked questions and answers (FAQ) has been prepared to provide additional information on the implementation of the Expected Moments Algorithm (EMA) within the PeakFQ program. These supplement the guidelines given in Bulletin 17B and the Bulletin 17B FAQ [B17B FAQ link]. It is anticipated that this FAQ will be modified or extended in the future as better information becomes available.
Any comments on these frequently asked questions and answers or any new questions and/or answers should be provided by email to Will Thomas, a member of the Hydrologic Frequency Analysis Work Group, at email@example.com for review by the Work Group.
DETERMINING FLOOD FREQUENCY
USING EMA in PEAKFQ
FREQUENTLY ASKED QUESTIONS - April 2014
Representing peak flow data
Low Outliers (LO) or Potentially Influential Low Flows (PILFs)
Gage base discharge
PeakFQ program and files
Where can I find more information?
Representing peak flow data
Question: How do B17B and EMA differ in their representation
of peak flow data?
B17B recognizes two categories of data: systematic
peaks (annual peaks observed in the course of the systematic
streamgaging at the station) and historic peaks (records of
floods that occurred outside the period of regular streamgaging).
EMA employs a more general description of the historical period
(the time period which includes both systematic and historic
peaks). For every year Y during the historical period, it is
assumed that there was a peak discharge QY,
regardless of whether this discharge was recorded. In the
framework of EMA, the hydrologist's knowledge of the peak flow
QY is described by the flow interval
(QY,lower, QY,upper). When running EMA, a
flow interval must be specified for each year in the historical
record, including any gaps for which no discharge is recorded,
as well as for censored and interval peaks. Censored peaks
are those peaks which only have a record that the flow was
either greater than some value or less than some value.
Interval peaks are those peaks which only have a record that
the flow was greater than some value and less than another value.
Question: When using the EMA analysis option, how are
appropriate flow intervals set for a year when I know
the recorded peak? What about for a year when I do not
have any information about the peak?
For most recorded peaks, the flow interval
for year Y can be described as
(QY,lower = QY,
QY,upper = QY), indicating
that the peak is known with confidence to be equal to
QY. For gaps in the record, if no information
is available, an interval of (QY,lower = 0,
QY,upper = inf) describes the lack of knowledge
and indicates that the peak in year Y is between 0 and ∞.
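The two cases above can be illustrated with a minimal sketch. It is not PeakFQ's input format; the function name `flow_intervals` and the dictionary layout are hypothetical and chosen only for illustration.

```python
import math

def flow_intervals(recorded, first_year, last_year):
    """Return {year: (q_lower, q_upper)} for every year in the historical period.

    `recorded` maps water year -> peak discharge for years with a known peak.
    Years with no information receive the uninformative interval (0, inf).
    """
    intervals = {}
    for year in range(first_year, last_year + 1):
        if year in recorded:
            q = recorded[year]
            intervals[year] = (q, q)           # peak known: lower = upper = Q
        else:
            intervals[year] = (0.0, math.inf)  # gap: nothing known about the peak
    return intervals

# Made-up discharges (cfs) with a one-year gap in 1951.
example = flow_intervals({1950: 4200.0, 1952: 3100.0}, 1950, 1952)
```

Here `example[1951]` is `(0.0, inf)`, while the two recorded years collapse to a single value.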
Question: How do the B17B and EMA procedures differ in their
representation of the sampling properties of each peak flow?
For both B17B and EMA procedures, the user needs
to understand and convey to PeakFQ the sampling properties
of each recorded peak flow. B17B differentiates between
sampling properties of observations taken when the entire
range of flows would have been recorded (i.e.,
systematic data) and peak flows that have been recorded
because their magnitudes dictated the creation of a
permanent record (Stedinger & Cohn, 1986). The authors
of B17B recognized that these two types of data need to be
treated differently and thus employed a historic adjustment.
EMA distinguishes among sampling properties by employing
"perception thresholds" denoted (TY,lower,
TY,upper), which reflect the range of flows
that would have been measured/recorded had they occurred.
Perception thresholds describe the range of measurable
potential discharges and are independent of the actual
peak discharges that have occurred. The lower bound,
TY,lower, represents the smallest peak flow
that would result in a recorded flow, while the upper
bound, TY,upper, represents the largest peak
flow that would result in a measured flow.
Question: What are some basics of setting perception
thresholds?
For periods of continuous, full-range peak flow
record, the perception threshold is represented by
(TY,lower = 0, TY,upper = inf),
where TY,lower = 0 is the gage-base discharge.
However, if the gage-base discharge is not zero, such as
in the case of a crest-stage gage, TY,lower would be set
equal to the smallest flow that would result in a recorded
flow. As a perception threshold is required for every
year in the historical period, TY,lower can
be adjusted to accommodate a changing gage-base. For
most peaks at most gages, TY,upper is assumed
to be infinite, as bigger floods that might exceed the
measurement capability of the streamgage are determined
through study of highwater marks and other physical
evidence of the flood. However, if there is a maximum
value above which flows cannot be determined,
TY,upper would be set equal to the largest
flow that would result in a recorded flow.
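As a rough illustration of a changing gage base, the sketch below expands period-level thresholds into one threshold per year. The function name, the tuple layout, and the 150 cfs gage base are hypothetical; none of them comes from PeakFQ.

```python
import math

def perception_thresholds(periods, first_year, last_year):
    """Expand (start, end, t_lower, t_upper) periods into {year: (t_lower, t_upper)}.

    Raises ValueError if a threshold is inverted or if any year in the
    historical period is left without a threshold.
    """
    thresholds = {}
    for start, end, t_lower, t_upper in periods:
        if t_lower > t_upper:
            raise ValueError("lower threshold exceeds upper threshold")
        for year in range(start, end + 1):
            thresholds[year] = (t_lower, t_upper)
    missing = [y for y in range(first_year, last_year + 1) if y not in thresholds]
    if missing:
        raise ValueError(f"no perception threshold for years: {missing}")
    return thresholds

# Hypothetical station: crest-stage gage (gage base 150 cfs) for 1960-1979,
# then a continuous, full-range record from 1980 onward.
periods = [(1960, 1979, 150.0, math.inf), (1980, 2010, 0.0, math.inf)]
thresholds = perception_thresholds(periods, 1960, 2010)
```

Raising an error for uncovered years mirrors the requirement that every year in the historical period has a threshold.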
Question: My data set has a historic peak in 1930 of
25,000 cfs, but the period of regular systematic data
does not begin until 1950. What do the PeakFQ default
flow intervals and perception thresholds from 1930 to
1949 mean?
As stated in Bulletin 17B, a historic peak is
a peak which occurs either before, after, or during a
break in the period of systematic data collection and
is a maximum in an extended period of time. Thus,
since the peak from 1930 is classified as a historic
peak, we know that it is the largest peak during the
time period before the systematic record begins in 1950.
We also know that if a peak equal to or larger than
25,000 cfs had occurred from 1931-1949, it also would
have been recorded as a historic peak. Thus, we can
say that we know for every year from 1931-1949 the flow
interval is (0,25000) which indicates that the annual
peaks in those years are between 0 cfs and 25,000 cfs.
The default perception thresholds are based on the
knowledge that we would have been able to measure or
have a record of any flood event that exceeded
25,000 cfs during the period from 1930-1949. Thus, the
default perception threshold for the period from
1930-1949 is (25000, inf).
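The defaults described in this answer can be written out year by year. This is a minimal sketch of the reasoning only; the variable names are hypothetical and it does not reproduce PeakFQ's internal logic.

```python
import math

HISTORIC_YEAR = 1930
HISTORIC_PEAK = 25000.0   # cfs, from the question
SYSTEMATIC_START = 1950   # first year of regular streamgaging

flow_interval = {}
perception = {}
for year in range(HISTORIC_YEAR, SYSTEMATIC_START):
    if year == HISTORIC_YEAR:
        # The historic peak itself is known.
        flow_interval[year] = (HISTORIC_PEAK, HISTORIC_PEAK)
    else:
        # No record exists, so the peak must have stayed below the historic peak.
        flow_interval[year] = (0.0, HISTORIC_PEAK)
    # Only floods exceeding the historic peak would have been recorded.
    perception[year] = (HISTORIC_PEAK, math.inf)
```

Every year from 1930 through 1949 shares the perception threshold (25000, inf); only the flow interval distinguishes 1930 from the unrecorded years.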
Low Outliers (LO) or Potentially
Influential Low Flows (PILFs)
Question: The Multiple Grubbs-Beck test identifies
Potentially Influential Low Flows (PILFs) while the
older Grubbs-Beck test uses the term Low Outliers.
Why change terminology?
"Low outliers" and "potentially influential
low flows" (PILFs) can be synonymous in some cases,
but, for clarity of understanding, it is important
to distinguish between the two. "Low outliers"
typically refers to one or possibly two values in a
believed-to-be-homogeneous dataset that do not conform
to the trend of the other observations. PILFs, in
contrast, may constitute half or more of the observations
and are believed to arise from physical processes that
are not relevant to the processes associated with large
floods. Consequently, the actual magnitudes of PILFs,
because they reflect physical processes that are not
relevant to large floods, reveal little about the upper
right-hand tail of the frequency distribution and thus
should not have an influential role when estimating the
risk of large floods. The term "low outlier" has been
replaced with the term "PILF" in order to more
accurately describe the situation.
Question: How is the Multiple Grubbs-Beck (MGB) test for
identifying Potentially Influential Low Flows (PILFs)
different from the Grubbs-Beck (GB) test in B17B?
The B17B GB test provides an objective and
defensible recommendation as to which values should
be treated as outliers. However, the B17B GB test is
based only on the distribution of the single smallest
observation in a sample. As a result, even though
multiple outliers in flood data are common, the GB test
rarely identifies more than a single low outlier.
MGB is a generalization of GB that provides objective
criteria for the identification of multiple potentially
influential low flows (or PILFs). Identifying PILFs
and recording them as censored peaks can greatly improve
estimator robustness with little or no loss of efficiency.
The MGB procedure is an iterative scheme that works as
follows. (Note that in the discussion below, p(i;n)
is the probability that the ith largest
observation in a normal sample of size n might
have appeared to be smaller than the value observed.)
1. The LP3 distribution is fit to all the data.
2. The test starts at the median and sweeps outward
towards the smallest observation. An observation
is tested and identified as a PILF if
p(i;n) is ≤ 0.5%. If a PILF is detected,
all smaller observations are declared to be PILFs.
3. Then, like the B17B GB procedure, the test sweeps
sequentially inward starting at the smallest
observation and moving towards the median. An
observation is tested and identified as a PILF if
p(i;n) is ≤ 10%.
4. The number of PILFs is equal to the maximum
detected by Steps 2 or 3.
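The two sweeps can be sketched as follows, under several stated assumptions. The p-values are taken as given, since computing p(i;n) is outside the scope of this sketch. They are assumed to be ordered from the smallest observation upward, and only observations below the median are treated as candidates. The inward sweep is assumed to stop at the first observation that is not flagged. The function name `count_pilfs` is hypothetical and this is not PeakFQ's implementation.

```python
def count_pilfs(p_values, alpha_out=0.005, alpha_in=0.10):
    """Count PILFs from p-values ordered smallest observation first.

    p_values[k] is the p-value of the (k+1)th smallest observation.
    """
    half = len(p_values) // 2  # candidates: observations below the median

    # Outward sweep: from just below the median toward the smallest value.
    # The first observation flagged takes all smaller observations with it.
    outward = 0
    for k in range(half - 1, -1, -1):
        if p_values[k] <= alpha_out:
            outward = k + 1
            break

    # Inward sweep: from the smallest value toward the median,
    # stopping at the first observation that is not flagged (assumption).
    inward = 0
    for k in range(half):
        if p_values[k] <= alpha_in:
            inward = k + 1
        else:
            break

    return max(outward, inward)

# Made-up p-values for two samples of eight peaks.
n_outward_case = count_pilfs([0.30, 0.20, 0.001, 0.40, 0.50, 0.60, 0.70, 0.80])
n_inward_case = count_pilfs([0.05, 0.08, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80])
```

In the first sample the outward sweep flags the third-smallest peak, so the two smaller peaks are also counted even though their own p-values are unremarkable. In the second sample only the inward sweep detects anything.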
Question: Once the MGB test identifies PILFs, how are
they treated in an EMA analysis?
The value of the smallest observation in the
data set determined to NOT be a PILF (Qs) is used
as the censoring threshold in the EMA analysis. All
peaks smaller than this value will be treated as
censored observations with flow intervals equal to
(0, Qs) and perception thresholds equal to (Qs, inf).
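A minimal sketch of the recoding of flow intervals described above, assuming the number of PILFs has already been determined. The function name `censor_pilfs` and the sample discharges are hypothetical, and only the flow intervals are shown.

```python
def censor_pilfs(peaks, n_pilfs):
    """Recode PILFs as censored observations.

    peaks: {water_year: discharge}; n_pilfs: number of PILFs identified.
    Returns (intervals, q_s), where q_s is the smallest peak that is NOT a
    PILF, or None when no PILFs were identified.
    """
    if n_pilfs == 0:
        return {year: (q, q) for year, q in peaks.items()}, None
    ordered = sorted(peaks.values())
    if n_pilfs >= len(ordered):
        raise ValueError("at least one peak must remain uncensored")
    q_s = ordered[n_pilfs]  # censoring threshold Qs
    intervals = {}
    for year, q in peaks.items():
        # Peaks below Qs are only known to lie somewhere in (0, Qs).
        intervals[year] = (0.0, q_s) if q < q_s else (q, q)
    return intervals, q_s

# Made-up peaks (cfs) with the two smallest identified as PILFs.
peaks = {2001: 50.0, 2002: 900.0, 2003: 30.0, 2004: 1200.0, 2005: 700.0}
intervals, q_s = censor_pilfs(peaks, 2)
```

The magnitudes 30 and 50 cfs no longer enter the fit directly; both years carry only the information that the peak was below 700 cfs.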
Gage Base Discharge
Question: When I select an EMA analysis, why can I no
longer set a gage base discharge?
When using PeakFQ to perform a B17B analysis,
a single gage base discharge can be set for the entire
length of a record. When performing an EMA analysis the
user has the option of setting multiple gage base
discharges. This is not done using the gage base discharge
field. Instead the use of EMA perception thresholds takes
the place of setting a gage-base discharge. The lower
bound of a perception threshold serves the same function
as a gage-base discharge. Since a user can apply multiple
perception thresholds to a record, perception thresholds
allow for the flexibility of a changing gage-base during
the historical record.
Question: How does EMA deal with high outliers?
EMA uses all available data in the historical
time period; this includes all historic and systematic
data. Thus, because EMA uses all of the data available
in the frequency analysis there is no reason to recognize
"high outliers" or treat unusually high values any
differently than one would treat other values. It is
also important to note that, like the Bulletin 17B
methodology, under no circumstances would or should a
high outlier be removed from a dataset.
PeakFQ Program and Files
Question: What is the function of the new Input/View
tab in PeakFQ v7?
The Input/View tab provides a representation
of the data in both tabular and graphic form. The
Input/View tab is populated with data read from the
input file, as well as the Station Specifications tab.
- Data Table: The Data
Table (lower left) provides the annual peak
flow data in tabular form. The table displays
the water year of the peak, the peak discharge,
remark codes qualifying the peak, lower and upper
bounds for the flow interval, and comments relating
to the peak. When running an EMA analysis, the
user must specify a flow interval for each peak
in the historical record. For almost all peaks,
the default flow interval is set as QY =
QY,lower = QY,upper. The user can
specify an alternate flow interval for a peak
QY by changing QY,lower and/or
QY,upper and entering a comment justifying
the change.
- Perception Thresholds Table:
The Perception Thresholds table (upper left) lists the perception thresholds.
A threshold must be specified for every year in the
historical period (from the Beginning Year to
the Ending Year specified in the Station
Specifications tab). A default threshold of
(0, inf) is applied to the entire period of record.
Default thresholds are also provided for historic
peaks/periods in the record, as well as censored
peaks in the records. The user can modify the
default thresholds as necessary and additional
thresholds can be added as needed. For each modified
or additional threshold, a comment is required for
the threshold to be applied. If a year is specified
in more than one threshold, only the last threshold
entered for any given year will be applied. For
gaps in the historical record, a red bar will appear
on the Input Peaks Graph on the
Input/View tab. The use of a threshold range
of 0 to infinity is unacceptable for periods of
missing record as it implies that the peaks were not
recorded because they were outside the range of 0 to
infinity, which is not possible. Thus, PeakFQ will
not run until an appropriate threshold is entered in
the Perception Thresholds table. If there
is no knowledge pertaining to the missing years of
record, a threshold range of infinity to infinity can
be used, effectively ignoring the missing period.
However, if information is available for the missing
years of record, then the user can enter a more
appropriate lower threshold value (greater than 0
and less than infinity) or another more appropriate
threshold range.
- Input Peaks Graph: The
Input Peaks graph displays the peaks (from
the Data Table) and perception thresholds
(from the Perception Thresholds table) in a
discharge versus water year graph. The peaks are
represented by points, while the perception thresholds
are represented by colored bars. Censored flow values
and interval data are also depicted as a combination
of points and lines on the graph. Gaps in the
historical record are represented by a red bar until
an appropriate perception threshold is entered.
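Two of the rules described for the Perception Thresholds table can be sketched in code: the last threshold entered for a year is the one applied, and a gap in the record cannot be left at the default (0, inf). The function names and the tuple layout are hypothetical; this illustrates the rules and is not PeakFQ code.

```python
import math

DEFAULT = (0.0, math.inf)

def resolve_thresholds(entries, first_year, last_year):
    """Apply (start, end, lower, upper) entries in the order entered.

    Every year starts at the default (0, inf); a later entry overrides an
    earlier one for any year they share.
    """
    resolved = {year: DEFAULT for year in range(first_year, last_year + 1)}
    for start, end, lower, upper in entries:
        for year in range(max(start, first_year), min(end, last_year) + 1):
            resolved[year] = (lower, upper)
    return resolved

def gaps_needing_threshold(resolved, peak_years):
    """Years with no peak whose threshold is still the default (0, inf)."""
    return [year for year, t in sorted(resolved.items())
            if year not in peak_years and t == DEFAULT]

# Made-up record: peaks in 1990, 1991, 1994, 1995; nothing known for 1992-1993.
peak_years = {1990, 1991, 1994, 1995}
before = gaps_needing_threshold(resolve_thresholds([], 1990, 1995), peak_years)
entries = [(1992, 1993, math.inf, math.inf)]  # ignore the missing period
after = gaps_needing_threshold(resolve_thresholds(entries, 1990, 1995), peak_years)
```

Here `before` lists 1992 and 1993 as years that would block the run, and `after` is empty once the (inf, inf) threshold is entered.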
Question: Can you provide a brief explanation of the output
tables in the .PRT file corresponding to an EMA analysis?
The print file (.prt) contains a number of output tables.
- Input Data Summary: This
first table in the file summarizes the input data
including the number of peaks in record, peaks not
used in analysis, beginning year, ending year,
type of analysis, low-outlier test method, and
perception thresholds and interval discharge data.
- Notice: This section includes
any error or warning messages, as well as a summary of
any low outliers including the low outlier threshold,
the number of low outliers, and the peak discharges
classified as low outliers. It is very important
to carefully review the error, warning, and notice
messages in this section to confirm that the
analysis was run successfully.
- Kendall's Tau Parameters:
This table contains the results from the Kendall's
Tau test, which checks for trends in the systematic
data.
- Annual Frequency Curve Parameters - Log-Pearson
Type III: This table contains
the estimated moments for the analysis. The first
line of the table, EMA W/O REG. INFO, contains
the moments for an EMA analysis using the station skew.
The second line of the table, EMA W/REG. INFO,
contains the moments for an EMA analysis using the
skew option specified by the user. Note that the
moments from EMA W/O REG. INFO and
EMA W/REG. INFO will be equal if the user
chooses the STATION skew option. The third
line of the table, EMA ESTIMATE OF MSE OF SKEW
W/O REG. INFO (AT-SITE), contains the MSE of the
station (at-site) skew, regardless of the Skew
Option selected by the user.
- Annual Frequency Curve-Discharges at
Selected Exceedance Probabilities:
This table contains the estimated flows for selected
exceedance probabilities, as well as the variance
and confidence intervals of those estimates. Again,
note that the estimated discharges from EMA W/O REG.
INFO and EMA W/REG. INFO will be equal if the user
chooses the STATION skew option.
- Input Data Listings: This
table simply provides a summary of the recorded
peak flow input data including the water year, peak
value, discharge qualification codes, interval flow
values and remarks.
- Empirical Frequency Curves:
This table contains the plotting positions and
estimated discharge for each recorded peak discharge
in the record.
- EMA Representation of Data:
This table depicts the observed input data and EMA's
interpretation of the data (in interval ranges), as
well as the perception threshold for each year in
the historical record. This table allows the user
to easily understand EMA's representation of the peak
flow data for the entire historical period, including
gaps in the record, censored observations, interval
flows and low outliers.
Where can I find more information?
Question: Where can I find more information regarding
the new EMA procedures and MGB test, as well as data
formats and peak flow records for PeakFQ?
For help with PeakFQ data formats and peak flow
records see:
- User's Manual for Program PeakFQ, Annual
Flood-Frequency Analysis Using Bulletin 17B
- USGS National Water Information System: Web
- For more information on the Expected Moments
Algorithm (EMA) see:
- Cohn, T.A., W.L. Lane, and W.G. Baier, 1997, An algorithm
for computing moments-based flood quantile estimates
when historical flood information is available, Water
Resources Research, 33(9), 2089-2096.
- Cohn, T.A., W.L. Lane, and J.R. Stedinger, 2001,
Confidence intervals for EMA flood quantile
estimates, Water Resources Research, 37(6), 1695-1706.
- Stedinger, J.R., and T.A. Cohn, 1986, Flood frequency
analysis with historical and paleoflood information,
Water Resources Research, 22(5), 785-793.
- For more information on the Multiple Grubbs-Beck
test see:
- Cohn, T. A., J. F. England, C. E. Berenbrock,
R. R. Mason, J. R. Stedinger, and J. R. Lamontagne,
2013, A generalized Grubbs-Beck test statistic
for detecting multiple potentially influential
low outliers in flood series, Water Resources
Research, 49, 5047-5058, doi:10.1002/wrcr.20392.
- Bulletin 17B FAQ http://acwi.gov/hydrology/Frequency/B17bFAQ.html