IHACRES Calibration Practical

MATH3134 Environmental Mathematics Honours, 2003

Felix Andrews

Aims

To:

Sensitivity Investigation

This was done on the full 10-year data set of the larger catchment (Molonglo River at Burbong).

Nonlinear Module

The parameter "threshold CMD" (a) was observed to be insensitive: large changes had little effect on streamflow. When it was changed, CMD followed the same pattern but around a different threshold value. To look at this parameter in a different way, I reversed the dependence between the two parameters (a relation introduced to avoid correlation between them). When "threshold CMD" (a) was reduced, "stress threshold" (d) was increased proportionally so that the conceptual stress threshold (d*a) remained constant while the conceptual wilting point was reduced. This turned out to be the same as just increasing the proportional stress threshold, further suggesting that a is insensitive. There was a slight effect on flow, but again it was probably due to the change in the gap between the stress and wilting points -- the amount of precipitation or evapotranspiration needed to move between them.
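
The coupling between a and d described above is simple arithmetic, and a short sketch may make it concrete. The function name and the reduced value of a are hypothetical, and treating the wilting point as a and the stress point as d*a is an interpretation of the text rather than a statement about the software.

```python
def rescale_stress_fraction(a_old, d_old, a_new):
    """Return the d that keeps the stress threshold d*a fixed when a changes."""
    return d_old * a_old / a_new


# Calibrated values of a and d from Table 1; a_new is a hypothetical perturbation.
a_old, d_old = 200.0, 0.6
a_new = 150.0
d_new = rescale_stress_fraction(a_old, d_old, a_new)

# Gap between the stress point (d*a) and the wilting point (assumed to be a).
gap_old = a_old - d_old * a_old
gap_new = a_new - d_new * a_new

print(f"d: {d_old:.2f} -> {d_new:.2f}")
print(f"stress-to-wilting gap: {gap_old:.1f} -> {gap_new:.1f}")
```

With these numbers the stress threshold d*a is unchanged, but the gap between the stress and wilting points shrinks, which is the side effect suspected of causing the slight change in flow.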

The parameter "stress threshold" (d) had a sharp effect on streamflow. Roughly doubling it to 1.0 cut streamflow by about three quarters, and flow then appeared only during otherwise extreme events: only very briefly did CMD drop below the wilting point. Halving it gave almost 4x more streamflow, including large peaks where originally there were none. CMD showed little change over each year.

The parameter "ET coefficient" (c) had less effect (~1:1 flow sensitivity) but still a considerable one. Reducing it increased flow peaks, but the seasonal dependence on CMD remained. In general terms, d has an effect similar to a threshold shift, whereas c acts more like a scaling factor; of course, they act together to define the evaporation schedule, which is part of the moisture budget.

Linear Module

Increasing increased the mid-range flows at the expense of peaks, or vice-versa if it was reduced.

Etc... this altered the relative contribution and response of the two exponential components. While the unit hydrograph changed, total flow volume was conserved.
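
The conservation of volume noted above can be checked with a minimal sketch of two exponential stores in parallel. The names tau_q, tau_s and v_s (quick and slow time constants, in time steps, and the slow flow volume fraction) are labels chosen here, and the discretisation is an assumed one, not necessarily the form used by the software. The parameter values are the calibrated and model 2 entries of Table 1.

```python
import numpy as np


def unit_hydrograph(tau_q, tau_s, v_s, n_steps=2000):
    """Response to a unit pulse of effective rainfall from two parallel stores.

    Assumed form: each store decays as exp(-t / tau) and is scaled so that
    its ordinates sum to its volume fraction (v_s for slow, 1 - v_s for quick).
    """
    t = np.arange(n_steps)
    alpha_q = np.exp(-1.0 / tau_q)  # fraction of quick storage carried to next step
    alpha_s = np.exp(-1.0 / tau_s)  # fraction of slow storage carried to next step
    quick = (1.0 - v_s) * (1.0 - alpha_q) * alpha_q ** t
    slow = v_s * (1.0 - alpha_s) * alpha_s ** t
    return quick + slow


calibrated = unit_hydrograph(tau_q=0.69, tau_s=36.0, v_s=0.12)
model_2 = unit_hydrograph(tau_q=0.75, tau_s=18.0, v_s=0.4)

for name, uh in [("calibrated", calibrated), ("model 2", model_2)]:
    print(f"{name}: peak ordinate {uh[0]:.3f}, total volume {uh.sum():.3f}")
```

Under these assumptions the shape of the response changes between the two parameter sets (a lower peak and a heavier tail when more volume goes to the slow store), while the ordinates still sum to one.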

Input Data

Scaling temperature is probably not a physically useful perturbation, especially on the Celsius scale, where it is a contraction or expansion around the freezing point (the operation should probably have been a shift by x degrees). However, done this way it is directly equivalent to changing the ET coefficient parameter.

50% more rainfall was very much like 'halving' temperature, causing increased streamflow at all scales. In contrast, halving rainfall caused a much more severe drop (~5:1) in streamflow than '50% more' temperature (~1:1).

Evaluating Model Performance

Playing with Molonglo River at Burbong (410705)

Examining the hydrograph suggests that there are serious problems with the behaviour of the supplied model parameterisation. It is generally getting the timing of peaks right but their magnitude quite dramatically wrong, either under or over-estimated. The Flow Duration Curve (FDC) shows that flows between the 2nd and 20th percentiles are seriously overestimated, explaining the relative bias of -0.4.

The calibration period has a reasonable r2 of 0.67 (as does the validation period) but a smaller bias of -0.1. Looking more closely, the initial conditions may be important: the model starts with CMD=a/2, but there is an observed dry period for the first year of calibration. Since the model is not dry enough to begin with, it overestimates the first year; then, to get a reasonable bias, it must underestimate later events.

The first change was to get rid of some water by increasing the ET coefficient, so that the bias was negligible. I found I could reproduce the same FDC profile with various combinations of c and d, so I took the simplest approach of scaling only c; this gave model 1 (see Table 1).

Looking in detail at the hydrograph showed that the general behaviour was being poorly characterised, with many spurious peaks and missing event tails. I selected several isolated events and tried to get better behaviour, focusing on the response functions more than each actual event. The result was model 2: a highly biased model, but one that hopefully treats the medium to low flows more realistically.

Finally I tried to get the water balance looking OK (according to the hydrograph and bias), then adjusted the contribution of quick and slow flow to approximately match the observed median flow. This is model 3.



Table 1. Candidate model parameter sets for node 410705.


Parameter                  Calibrated   Model 1      Model 2           Model 3
                                        (unbiased)   (unit response)   (balanced)
Threshold CMD (a)          200          200          200               150
Stress Threshold (d)       0.6          0.6          0.75              0.6
ET Coefficient (c)         0.2          0.25         0.2               0.28
Quick time constant        0.69         0.69         0.75              0.69
Slow time constant         36           36           18                36
Slow flow volume ( vs )    0.12         0.12         0.4               0.25



An example of output from these models is shown in Figure 1, and the associated error in Figure 2.

Figure 1. Example year of modelled streamflow from node 410705.




Figure 2. Error in example year of modelled streamflow from node 410705.




Model Evaluation for Molonglo River at Burbong (410705)

The first step in evaluating streamflow reproduction is to compare simple flow statistics. Some of these are given in Table 2 for the observed data and each candidate model. The runoff coefficient is the proportion of rainfall volume that is discharged; since rainfall is constant across these cases, it is also an indicator of mean flow. We can see that models 1 and 3 match well, while the original calibration overestimates and model 2 underestimates by a similar factor. The Coefficient of Variation (CV) is the standard deviation relative to the mean. In this catchment it is very high; again models 1 and 3 reproduce it best. The original calibration and model 3 do well at predicting the median.
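
As an illustration of the three statistics just described, here is a minimal sketch. It assumes flow and rainfall are already expressed in the same depth units over the same period, uses the population standard deviation, and runs on made-up numbers, not the Burbong data.

```python
import numpy as np


def flow_statistics(flow, rainfall):
    """Runoff coefficient, coefficient of variation and median of a flow series."""
    flow = np.asarray(flow, dtype=float)
    rainfall = np.asarray(rainfall, dtype=float)
    return {
        # Proportion of rainfall volume that is discharged (same units assumed).
        "runoff_coefficient": flow.sum() / rainfall.sum(),
        # Standard deviation relative to the mean (population form, an assumption).
        "cv": flow.std() / flow.mean(),
        "median": float(np.median(flow)),
    }


# Made-up daily series for illustration only.
rainfall = np.array([0.0, 12.0, 30.0, 0.0, 0.0, 5.0, 0.0, 0.0])
flow = np.array([0.1, 0.4, 6.0, 1.5, 0.6, 0.5, 0.3, 0.2])

stats = flow_statistics(flow, rainfall)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Because one large event dominates the made-up series, its mean sits well above its median and the CV exceeds one, the same kind of skew that makes CV so high in this catchment.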



Table 2. Observed and modelled flow statistics for node 410705.


Statistic                  Observed   Calibrated   Model 1      Model 2           Model 3
                                                   (unbiased)   (unit response)   (balanced)
Runoff Coefficient         0.13       0.17         0.12         0.08              0.12
Coefficient of Variation   5.33       3.77         4.27         4.05              4.3
Median (cumecs)            0.29       0.32         0.2          0.09              0.28


It is probably not very useful to compare just the median. It is much more informative to look at the reproduction of all percentiles: the Flow Duration Curve (see Figure 3). As noted above, the original calibration performs very poorly for the medium-high flows; models 1 and 3 do better but still have an excess around the 5th to 10th percentiles. Model 2 underestimates everything except that range.
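
A Flow Duration Curve comparison of this kind can be sketched as follows. The exceedance convention (the flow exceeded p% of the time is the (100 - p)th percentile of the series) is assumed here, and the series are synthetic, with a hypothetical model that overestimates every flow by 30%.

```python
import numpy as np


def flow_duration_curve(flow, exceedance=None):
    """Flow that is exceeded for each given percentage of the time."""
    if exceedance is None:
        exceedance = np.arange(1, 100)
    flow = np.asarray(flow, dtype=float)
    # Flow exceeded p% of the time is the (100 - p)th percentile of the series.
    return np.percentile(flow, 100.0 - np.asarray(exceedance, dtype=float))


# Synthetic skewed flows standing in for ten years of daily data.
rng = np.random.default_rng(0)
observed = rng.lognormal(mean=-1.0, sigma=1.5, size=3650)
modelled = 1.3 * observed  # hypothetical model, 30% too high throughout

exceedance = np.array([2, 5, 10, 20, 50, 90])
fdc_observed = flow_duration_curve(observed, exceedance)
fdc_modelled = flow_duration_curve(modelled, exceedance)
ratio = fdc_modelled / fdc_observed

for p, r in zip(exceedance, ratio):
    print(f"{p:>2d}% exceedance: modelled/observed = {r:.2f}")
```

Looking at the ratio of modelled to observed flow at each exceedance level shows which part of the flow range is over- or underestimated, something a single median cannot do.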

Figure 3. Observed and modelled Flow Duration Curves for node 410705.




The next step is to look at explicit performance (fit) statistics. Several of these are given in Table 3; I will describe them briefly. r2 (a.k.a. Nash-Sutcliffe Efficiency) measures fit but is inherently more sensitive to large values (peaks). Log-transforming each measurement before calculating r2 gives more uniform weight across flow percentiles (though it can only include non-zero flows). The Residual Mass Curve Coefficient (Letcher et al., 1999) measures the fit of the Residual Mass Curve, i.e. the fit of broad temporal variation. These curves are plotted in Figure 4.
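
The efficiency measures can be sketched as below, using the standard Nash-Sutcliffe formula. Dropping time steps where either series is zero before the log transform is one assumed way of handling zero flows. The residual mass curve is taken as the cumulative departure of a series from its own mean. Applying the efficiency formula to the observed and modelled curves is only one plausible reading of the Residual Mass Curve Coefficient, not a confirmed definition. The data are made up.

```python
import numpy as np


def nash_sutcliffe(observed, modelled):
    """Nash-Sutcliffe Efficiency: 1 minus error variance over observed variance."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    residual = np.sum((observed - modelled) ** 2)
    spread = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / spread


def log_nash_sutcliffe(observed, modelled):
    """Efficiency of log flows, using only steps where both series are non-zero."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    keep = (observed > 0) & (modelled > 0)  # assumed handling of zero flows
    return nash_sutcliffe(np.log(observed[keep]), np.log(modelled[keep]))


def residual_mass_curve(flow):
    """Cumulative departure of a flow series from its own mean."""
    flow = np.asarray(flow, dtype=float)
    return np.cumsum(flow - flow.mean())


# Made-up series for illustration only.
observed = np.array([0.2, 0.5, 4.0, 1.0, 0.4, 0.0, 0.3])
modelled = np.array([0.3, 0.4, 3.0, 1.4, 0.5, 0.1, 0.2])

r2 = nash_sutcliffe(observed, modelled)
log_r2 = log_nash_sutcliffe(observed, modelled)
# One plausible reading of the Residual Mass Curve Coefficient (an assumption).
rmc_fit = nash_sutcliffe(residual_mass_curve(observed), residual_mass_curve(modelled))
print(r2, log_r2, rmc_fit)
```

Note that each residual mass curve is built around its own series mean, so a model with the wrong mean flow can still score well on the curve fit.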

Relative Bias is the proportion (and sign) of error in total modelled flow. The remaining statistics depend on cumulative error -- these can be thought of as 'running bias' or 'error mass' curves. They are plotted in Figure 5. Relative Average Running Bias is the relative mean cumulative error: the mean of cumulative error divided by total observed flow. It is proportional to the integral of the error mass curve. This measure is intended to differentiate between models that have reasonable bias throughout and those that drift into bias but later compensate to give a low overall bias. The Relative Maximum Running Bias serves a similar purpose, though it is aimed at identifying intolerably mismatched model states (cumulative error). It is defined as the maximum absolute value of cumulative error -- i.e. the greatest magnitude of the error mass curve -- divided by total observed flow.
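
The three bias statistics follow directly from the definitions above and can be sketched in a few lines. The sign convention (error = observed minus modelled, so overestimation gives negative bias) is an assumption inferred from the signs in Table 3. The example series is made up so that early overestimation is cancelled by later underestimation.

```python
import numpy as np


def running_bias_statistics(observed, modelled):
    """Relative bias, relative average and relative maximum running bias."""
    observed = np.asarray(observed, dtype=float)
    modelled = np.asarray(modelled, dtype=float)
    # Error mass curve: cumulative error (observed minus modelled, assumed sign).
    error_mass = np.cumsum(observed - modelled)
    total_observed = observed.sum()
    return {
        "relative_bias": error_mass[-1] / total_observed,
        "relative_average_running_bias": error_mass.mean() / total_observed,
        "relative_maximum_running_bias": np.abs(error_mass).max() / total_observed,
    }


# Made-up example: too much flow at the start, too little at the end.
observed = np.array([1.0, 1.0, 1.0, 1.0])
modelled = np.array([2.0, 1.0, 1.0, 0.0])

bias = running_bias_statistics(observed, modelled)
for name, value in bias.items():
    print(f"{name}: {value:+.4f}")
```

Here the overall bias is zero because the errors cancel, yet both running bias measures are non-zero, which is exactly the compensating behaviour these statistics are meant to expose.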

Table 3. Performance statistics for candidate models of node 410705.


Statistic                         Calibrated   Model 1      Model 2           Model 3
                                               (unbiased)   (unit response)   (balanced)
r2                                0.67         0.68         0.49              0.68
log-transformed r2                0.39         0.33         -2.28             0.5
Residual Mass Curve Coefficient   0.75         0.59         0.57              0.61
Relative Bias                     -0.27        0.06         0.42              0.09
Relative Average Running Bias     -0.101       0.078        0.256             0.091
Relative Maximum Running Bias     0.27         0.13         0.43              0.14


Figure 4. Observed and modelled Residual Mass Curves for node 410705.



Figure 5. Error Mass Curves for node 410705.



Several conclusions can be drawn from the statistics in Table 3. Model 2 fits poorly on the efficiency measures; differences between the others only appear under a log transform, where model 3 is the best. The original calibration does best at reproducing the residual mass curve, although it should be remembered that this is around an erroneous mean. On this measure model 2 is fairly similar to models 1 and 3. Looking at Relative Average Running Bias, the original calibration turns out to be not as bad as the final (total) bias suggests; however, its running bias does blow out to twice the magnitude of models 1 and 3 (see Figure 5). The difference between models 1 and 3 is also smaller than the final bias alone suggests. Model 2 has extreme cumulative error.

The chosen model -- and acceptance criteria used to evaluate it -- would have to depend on the specific application. For example, predicting total annual flow might be important, or matching a particular range of flow percentiles. On most measures, model 3 described here does considerably better than the original calibration.





References

R. A. Letcher et al. (1999). Review of Techniques to Estimate Catchment Exports. Technical Report, NSW EPA.