What’s wrong with the Australian River Assessment System (AUSRIVAS)?

The Australian River Assessment System (AUSRIVAS or AusRivAS) is a national biomonitoring scheme that supposedly assesses the ‘biological health’ of rivers. AUSRIVAS outputs observed-over-expected (O/E) indices derived from macroinvertebrate survey data obtained both at a site to be assessed and at designated reference sites. However, AUSRIVAS reference sites lack any consistent or quantified status, and, therefore, AUSRIVAS O/E indices have no particular meaning. Moreover, many studies have found AUSRIVAS O/E to be a weak or inconsistent indicator of exposure to anthropogenic or human-influenced stressors. Poor performance by AUSRIVAS may relate to numerous factors including the following: (1) variable reference-site status, (2) inappropriate model predictors, (3) limitations of O/E indices, (4) inconstant sampling methods, and (5) neglect of non-seasonal temporal variability. The indices Ephemeroptera–Plecoptera–Trichoptera (EPT) and stream invertebrate grade number – average level (SIGNAL) provide alternatives that have often outperformed AUSRIVAS O/E in comparative tests. In addition, bioassessment of Australian rivers might be advanced by the development of diagnostic methods to identify the stressors causing ecological impact rather than merely to infer impact intensity and assign quality ratings to assessment sites.


Introduction
The Australian River Assessment System (AUSRIVAS or AusRivAS) is a national biomonitoring scheme adapted from the British River Prediction and Classification System (RIVPACS: Clarke et al. 2003). AUSRIVAS was developed and tested as part of Australia's former National River Health Program during the 1990s (Davies 2000), and has changed little since that time. It is currently promoted by way of a website and biannual training courses. The AUSRIVAS software can produce various outputs, but the one that is principally used is an observed-over-expected (O/E) index of macroinvertebrate taxonomic richness. In the calculation of this index, E is the sum of probabilities of occurrence in a macroinvertebrate sample of those taxa with a predicted probability above a specified value (commonly 0.5), and O is the number of those taxa that were actually recorded in the sample (Nichols and Dyer 2013). The probabilities of occurrence are derived by a predictive statistical model from macroinvertebrate survey data that were collected at designated reference sites, weighted according to their physical, biophysical and chemical similarities to the assessment site for which an O/E value is to be generated. AUSRIVAS O/E values are used to assign sites to quality bands, variously labelled 'more biologically diverse than reference', 'similar to reference', 'significantly impaired', 'severely impaired' and 'extremely impaired' (Nichols and Dyer 2013).
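The O/E calculation described above can be sketched in a few lines of Python. This is an illustration only and not the AUSRIVAS software: the function name, taxon names and probabilities are hypothetical, the probabilities are assumed to have been produced already by a predictive model, and treating the threshold as inclusive (at or above 0.5) is an assumption.

```python
def observed_over_expected(probabilities, observed_taxa, threshold=0.5):
    """Return (O, E, O/E) for one sample.

    probabilities: dict mapping taxon name -> predicted probability of occurrence
    observed_taxa: set of taxon names actually recorded in the sample
    threshold: only taxa with a probability at or above this value are used
               (inclusive comparison is an assumption of this sketch)
    """
    included = {t: p for t, p in probabilities.items() if p >= threshold}
    expected = sum(included.values())          # E: sum of probabilities
    if expected == 0:
        raise ValueError("no taxa meet the probability threshold")
    observed = sum(1 for t in included if t in observed_taxa)  # O: count recorded
    return observed, expected, observed / expected


# Hypothetical model output and sample contents.
probs = {"taxon_a": 0.9, "taxon_b": 0.8, "taxon_c": 0.7,
         "taxon_d": 0.6, "taxon_e": 0.3}
sample = {"taxon_a", "taxon_c", "taxon_e"}

o, e, ratio = observed_over_expected(probs, sample)
```

In this made-up case, taxon_e is recorded but ignored because its predicted probability is below the threshold, so O is 2, E is about 3.0 and O/E is about 0.67.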
According to its website (ausrivas.ewater.org.au; accessed 14 December 2020), AUSRIVAS is 'a prediction system used to assess the biological health of Australian rivers', based on computer models that 'predict the aquatic macroinvertebrate fauna expected to occur at a site in the absence of environmental stress, such as pollution or habitat degradation, to which the fauna collected at a site can be compared'. These claims are problematic from a scientific perspective because they cannot be tested objectively with empirical evidence. Ability to assess river health cannot be tested because ecosystem health is a metaphor or value judgment and not a measurable property (Suter 1993; Lancaster 2000). Ability to predict the aquatic macroinvertebrate fauna occurring in the absence of environmental stress cannot be tested because nowhere on Earth can any longer be regarded as untouched by anthropogenic stressors. Even in wilderness areas, the freshwater biota is exposed to pervasive anthropogenic climate change, atmospheric deposition of nutrients and toxicants, and invasion of alien species (e.g. Hageman et al. 2006; Havel et al. 2015; Knouft and Ficklin 2017).
Moreover, AUSRIVAS reference sites lack any consistent or quantified status, being chosen on rather vague and geographically variable criteria such as being 'selected primarily on the basis of riparian zone integrity and absence of major point sources of pollution upstream' (Turak et al. 1999, p. 286) or being 'usually located in conservation reserves, little grazed pastoral land or forested areas not recently logged' but sometimes 'located in rivers running through farmland or other disturbed areas' (Halse et al. 2007, p. 164). Consequently, degrees of inferred deviation from reference status, as expressed by O/E values and band assignments, have no particular meaning.
Nevertheless, numerous studies provide evidence about the behaviour of AUSRIVAS outputs, and especially its O/E index, in relation to measurable properties such as repeatability, capacity to discriminate among sites with different levels of exposure to human influences, and strength of association with anthropogenic or human-influenced physical and chemical variables. An appraisal of the performance of AUSRIVAS based on these studies may help progress bioassessment of Australian rivers, which has fallen into decline in recent years. Accordingly, the present contribution critically evaluates the performance of AUSRIVAS by reviewing the findings of previous studies. Possible reasons for performance inadequacies are then explored, and, finally, some suggestions are made for more effective bioassessment of Australian rivers. Although the evaluation is limited to AUSRIVAS, issues raised may be relevant to similar systems used elsewhere in the world.

Materials and methods
Studies providing empirical evidence of the behaviour and performance of AUSRIVAS in Australian states and territories, mostly based on its O/E index, were summarised in terms of the geographic location of the study, the criterion or criteria by which the performance of AUSRIVAS could be evaluated, and the overall findings (Table 1). Occasional applications of AUSRIVAS outside of Australia were excluded. The performance of AUSRIVAS was then rated as 'good', 'fair' or 'poor' on the basis of the evidence presented and the authors' evaluations. Thus, if AUSRIVAS performed well or better than did alternatives tested, a 'good' rating was assigned, whereas if AUSRIVAS performed weakly or worse than did alternatives, a 'poor' rating was allocated. If AUSRIVAS performed adequately in some circumstances or respects but not in others, a 'fair' rating was applied. Publications on AUSRIVAS, and articles describing the application of RIVPACS-type methods to biota other than macroinvertebrates and outside of Australia, were also consulted for information on factors that might limit the performance of AUSRIVAS.

Results
Evaluation of the behaviour and performance of AUSRIVAS encompassed 25 studies including all Australian states and territories (Table 1). On the basis of information in these studies, 'good' ratings were assigned in seven cases (28%), 'fair' ratings in six cases (24%), and 'poor' ratings in 12 cases (48%). Many of the studies demonstrated a failure to discriminate between sites with lower and those with higher levels of exposure to anthropogenic stress, or a lack of statistically significant association with anthropogenic or human-influenced physical and chemical stressors that are well known to have an impact on aquatic macroinvertebrates. When AUSRIVAS showed statistically significant discrimination or association, it often did so more weakly than did alternatives (Table 1).

Discussion
Instances when the performance of AUSRIVAS was rated as 'good' mostly related to exposure to severe stress, such as gross pollution from acid mine drainage (Sloane and Norris 2003; Linke et al. 2005) or the presence of a major dam immediately upstream (Nichols et al. 2006a). This observation concurs with the conclusion of some authors that impact detection by AUSRIVAS O/E is reliable only for severe stress (Smith et al. 1999; Edward et al. 2000). Even studies for which the performance of AUSRIVAS was rated as 'good' revealed some weaknesses, such as failure of some models to meet set criteria (Linke et al. 2005) or infrequent detection of a mild impact (Bailey et al. 2014; Nichols et al. 2014).
The frequent insensitivity of AUSRIVAS is of concern from a management perspective. For example, a stream in Western Australia with a nitrate concentration of 5.8 mg L⁻¹, due to a discharge of treated sewage from a small town, was evaluated as AUSRIVAS Band A, equivalent to reference condition (Halse et al. 2007). Edward et al. (2000) expressed disquiet about the inability of AUSRIVAS to detect impacts on macroinvertebrate assemblages related to anthropogenic salinisation caused by land clearing and rising water tables, which is a major environmental problem in south-western Western Australia.
At least five factors may contribute to weak performance by AUSRIVAS: (1) variable reference-site status, (2) inappropriate model predictors, (3) limitations of O/E indices, (4) inconstant sampling methods, and (5) neglect of non-seasonal temporal variability. Below, each is discussed in turn, before alternatives to AUSRIVAS are briefly explored and possible future directions for more effective bioassessment of Australian rivers are considered.

Variable reference-site status
Faunal predictions made by AUSRIVAS are derived from data collected at reference sites that are supposedly 'minimally disturbed' (Nichols and Dyer 2013). Although quantification of disturbance at these sites does not seem to be available, it is clear from descriptive accounts that they are exposed to spatially variable and often substantial human influence. For example, some reference sites have been located on regulated rivers and within farmland (Turak et al. 1999; Halse et al. 2007). Consequently, anthropogenic faunal alteration at AUSRIVAS reference sites has been suggested as a possible reason for weak performance by some authors (Chessman and Royal 2004; Chessman et al. 2006). A fundamental conundrum of the AUSRIVAS approach is that if there were an effective way to determine the degree of anthropogenic faunal alteration at reference sites, the same method could presumably be applied to assessment sites as well, in which case, comparison with reference sites would not be needed. In reality, the degree of anthropogenic faunal alteration at reference sites is unknowable, considering the plethora of anthropogenic and human-influenced stressors, cryptic biotic legacies of past human disturbance that may linger for decades or even centuries (e.g. Ogden 2000; Maloney et al. 2008; Wohl 2019), and transmission of stressors and biota between potential reference sites and other parts of the landscape (e.g. Pringle 1997; Lake et al. 2010; Spear et al. 2018).

Table 1
Study | Location | Performance criterion | Findings | Rating
Bailey et al. (2014) and Nichols et al. (2014) | Murrumbidgee River system, ACT and NSW | Ability to detect simulated impairment of a macroinvertebrate assemblage | AUSRIVAS had fewer false positives than alternatives, but more false negatives than some alternatives | Good
Bunn and Davies (2000) | Various streams, south-western WA | Association with gross primary production, community respiration, total nitrogen concentration and turbidity | AUSRIVAS had no statistically significant associations, unlike alternatives | Poor
Chessman (1999) | Murrumbidgee River system, ACT and NSW | Discrimination between relatively undisturbed and disturbed sites and associations with water-quality variables | AUSRIVAS discriminated less and had similar or weaker associations than alternatives | Poor
Chessman and Royal (2004) | [...]
[...] | [...] Park, NSW | Discrimination between reference sites and sites exposed to ski-resort activities | AUSRIVAS and alternatives all discriminated | Good
Sloane and Norris (2003) | Molonglo and Queanbeyan River systems, NSW | Association with a trace-metal gradient | AUSRIVAS had a stronger association than alternatives | Good
Smith et al. (1999) | Various streams, WA | Inter-annual consistency at reference sites and assessment of sites judged to be ecologically disturbed | [...]

Inappropriate model predictors
The AUSRIVAS models collectively use a great variety of environmental variables for matching an assessment site to particular groups of reference sites, so as to generate occurrence probabilities of macroinvertebrate taxa at the assessment site (Simpson and Norris 2000). Many of these environmental predictors are subject to anthropogenic alteration, for example, alkalinity, discharge, stream depth and width, substratum composition and vegetation. The values of such predictors input to the AUSRIVAS models are the measured values, not the values that would occur in the absence of human influence, and this practice is likely to cause model predictions to deviate from natural expectations (Clarke et al. 1996; Hargett et al. 2007; Chessman 2014). For example, faunal predictions for an assessment site with unnaturally high alkalinity as a result of anthropogenic salinisation may be derived from reference sites with naturally high alkalinity, whereas reference sites with naturally low alkalinity would have been the appropriate comparison (Metzeling et al. 2006; Schäfer et al. 2011). The estimation of natural values of anthropogenically altered predictors (e.g. Hawkins 2012, 2013) might help alleviate this problem. By contrast, if environmental variables that are subject to anthropogenic alteration are simply excluded, prediction may be less accurate (Clarke et al. 1996; Theroux et al. 2020). For example, Chessman et al. (2010) noted that in western New South Wales, the applicable AUSRIVAS model used only slope and geographic position (latitude, longitude and elevation) to match assessment sites with reference sites, and thus did not use hydrological variables. It is, therefore, uncertain how well this model matches assessment and reference sites in terms of the natural variation in hydrological regimes that can have a major bearing on the composition of macroinvertebrate assemblages in Australian dryland rivers (Sheldon and Thoms 2006).
Similarly, in south-western Western Australia, a major determinant of macroinvertebrate assemblage composition, i.e. salinity, could not be used for faunal prediction because both naturally saline and anthropogenically salinised sites had similar salinities (Halse et al. 2007).
Moreover, predictor variables that are not subject to human alteration may actually be surrogates for variables that are anthropogenically modified. For example, latitude and longitude obviously do not have a direct causal influence on macroinvertebrate assemblages, and if they have predictive value it must be because they correlate with other, unknown variables that do have a causal influence, that is, variables that might be subject to anthropogenic alteration.

Limitations of O/E indices
The O/E index used by AUSRIVAS and similar bioassessment methods such as RIVPACS combines two variables, namely, the predicted probability that a taxon will occur in a sample under reference conditions (a continuous variable ranging from 0 to 1) and the detection or non-detection of a taxon in a sample (a binary variable with values of 0 or 1). Neither variable takes account of taxon abundance, except to the extent that abundance affects likelihood of occurrence or detection. Aguiar et al. (2011) suggested that the non-incorporation of abundance information might explain the poor performance of a RIVPACS-type application for Portuguese stream macrophytes, relative to alternatives that were tested. However, Kanninen et al. (2013) found that an alternative to O/E that incorporated abundance did not have superior performance for lacustrine macrophytes in Finland.
The O/E index also has a structural weakness in that the detection of taxa with a low modelled probability of occurrence can counter the non-detection of taxa with a high probability of occurrence, leading to an under-representation of the difference between the predicted and observed assemblages (Van Sickle 2008). Furthermore, the choice of a threshold of predicted probability for including taxa in the calculation of the O/E index affects index values, even for reference sites (Yuan 2006). For assessment sites, the choice of a high threshold excludes taxa that are infrequent at reference sites but may, nevertheless, be adversely affected by anthropogenic stress (Clarke and Murphy 2006; Mazor et al. 2016), or alternatively may benefit from certain types of anthropogenic stress (Edward et al. 2000). However, the presence or absence of such taxa may be quite informative. For example, in applying RIVPACS-type methods to diatoms, Chessman et al. (1999) found that sites with a greater exposure to anthropogenic influence were characterised more by the presence of taxa with predicted probabilities of <0.5 than by the absence of taxa with probabilities of >0.5. This problem may not be alleviated by the use of a low threshold, which has sometimes been reported to improve index performance (Clarke and Murphy 2006; Vander Laan and Hawkins 2014), but much more often found to reduce performance (e.g. Van Sickle et al. 2007; Aroviita et al. 2009; Meador and Carlisle 2009). Instead, the solution may be to use a different index (Van Sickle 2008; Kanninen et al. 2013).
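The compensation effect and the influence of the probability threshold can both be seen in a small worked example. The probabilities and detections below are hypothetical, and the function is a minimal sketch of an O/E calculation, not AUSRIVAS code.

```python
def o_over_e(probabilities, detected, threshold=0.5):
    """O/E from parallel lists of predicted probabilities and 0/1 detections.

    Only taxa with a probability at or above the threshold are included
    (inclusive comparison is an assumption of this sketch).
    """
    pairs = [(p, d) for p, d in zip(probabilities, detected) if p >= threshold]
    expected = sum(p for p, _ in pairs)   # E: sum of included probabilities
    observed = sum(d for _, d in pairs)   # O: number of included taxa detected
    return observed / expected


# Hypothetical predicted probabilities for six taxa at an assessment site.
probs = [0.9, 0.9, 0.9, 0.5, 0.5, 0.5]

# Two of the three near-certain taxa are missing from the sample,
# but all three marginal taxa are detected.
detected = [1, 0, 0, 1, 1, 1]

ratio_low = o_over_e(probs, detected, threshold=0.5)    # O = 4, E = 4.2
ratio_high = o_over_e(probs, detected, threshold=0.75)  # O = 1, E = 2.7
```

With the 0.5 threshold the index is roughly 0.95 even though two highly expected taxa are absent, because the marginal taxa each add 1 to O but only 0.5 to E. With the 0.75 threshold the same sample gives roughly 0.37, which shows how strongly the index value depends on the threshold chosen.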
Finally, because the value of the AUSRIVAS O/E index depends on the number of expected taxa that are recorded in a sample, the index value is highly sensitive to the chance detection or non-detection of individual taxa that are present at an assessment site (Smith et al. 1999). This issue is particularly acute for naturally harsh environments with low taxon richness, such as dryland or nutrient-deficient streams or the profundal zone of lakes (Halse et al. 2007; Jyväsjärvi et al. 2011), because the intrinsic variability of O/E is higher when the number of expected taxa is low (Hämäläinen et al. 2018).
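The dependence of O/E variability on the number of expected taxa can be illustrated with a simplified calculation. The sketch assumes, purely for illustration, that each expected taxon is detected independently with a probability equal to its predicted probability, so that O is a sum of independent Bernoulli variables. Real assemblages need not behave this way, and the function name and inputs are hypothetical.

```python
import math


def oe_standard_deviation(probabilities):
    """Standard deviation of O/E under independent Bernoulli detection.

    Var(O) = sum of p * (1 - p), E = sum of p, so SD(O/E) = sqrt(Var(O)) / E.
    """
    expected = sum(probabilities)
    variance_of_o = sum(p * (1 - p) for p in probabilities)
    return math.sqrt(variance_of_o) / expected


# Hypothetical sites whose expected taxa all have a probability of 0.7.
sd_poor = oe_standard_deviation([0.7] * 5)    # taxon-poor site, E = 3.5
sd_rich = oe_standard_deviation([0.7] * 20)   # taxon-rich site, E = 14
```

Under these assumptions the standard deviation of O/E is about 0.29 for the taxon-poor site and about 0.15 for the taxon-rich site, that is, it halves when the number of expected taxa is quadrupled.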

Inconstant sampling methods
Inherent variability in taxon detection is likely to be compounded by weakly standardised sampling or subsampling methods. Protocols for AUSRIVAS invertebrate sampling and subsampling vary substantially among the separate manuals for each Australian state and territory (available from ausrivas.ewater.org.au/index.php/manuals-a-datasheets). However, all protocols provide quite limited standardisation. For example, all manuals specify that samples should be collected over a 10-m transect, but in most cases this distance is permitted to be either continuous or broken up into multiple, physically separated segments at the operator's discretion. Moreover, none of the manuals specifies any time limit for sample collection or describes a procedure to measure the distance over which sampling actually occurs.
Procedures for subsampling macroinvertebrates from the bulk sample of macroinvertebrates and associated plants, algae, sediment and debris are quite varied among jurisdictions, and variously impose requirements for the number of animals to be retrieved, the time to be spent, or both. These requirements can be complex; for example, in the New South Wales manual, operators are instructed to use a sequence of different strategies to pick out specimens for successive periods of 5, 20, 5 and 10 min, variously focussing on collecting common taxa, seeking new taxa, or accumulating more individuals. After 40 min, picking may or may not continue for up to 20 additional minutes, depending on whether the operator believes that additional taxa are still being found. It is questionable whether such a complex procedure, applied by various operators for diverse samples, will retrieve a consistent proportion of the taxa present in the sampling area.

Neglect of non-seasonal temporal variability
AUSRIVAS attempts to deal with natural temporal variability by creating separate predictive models for its two sampling seasons, spring and autumn. However, non-seasonal (e.g. inter-annual) variation in macroinvertebrate faunas can also be high, even at reference sites (Bailey et al. 1998; Feio et al. 2006; Mazor et al. 2009). Over much of Australia, especially the arid and semi-arid zones, much of the variation in hydrological regimes and, consequently, biota, is naturally aseasonal or supraseasonal (Bunn and Davies 2000; Sheldon 2005). In such regions, it may be advisable to partition reference data according to antecedent rainfall or the phase of the flood-drought cycle, and not just calendar season (Davis et al. 2006; Chessman et al. 2010).

Alternatives to AUSRIVAS
Alternatives to the AUSRIVAS O/E index for macroinvertebrate-based bioassessment of Australian rivers include the Ephemeroptera–Plecoptera–Trichoptera (EPT) index (Lenat and Penrose 1996) and family-level and genus-level versions of the stream invertebrate grade number – average level (SIGNAL) index (Chessman 2003; Chessman et al. 2007). Both of these indices have been widely tested, and found to out-perform AUSRIVAS O/E in several investigations (e.g. Chessman et al. 2006; Walsh 2006; Cox et al. 2019). Other options, such as the environmental filters method of Chessman and Royal (2004), the salinity index of Horrigan et al. (2005), and the invertebrate species index of Haase and Nolte (2008), have not been greatly tested, and so their general utility is uncertain. Surprisingly, multimetric indices of biotic integrity, widely used around the world for bioassessment based on macroinvertebrates, fish and other organism groups (Ruaro et al. 2020), have not been developed for Australian freshwater invertebrates, perhaps because of early criticism by proponents of AUSRIVAS (Norris and Hawkins 2000) and ecological risk assessment (Suter 1993, 2001).
Ephemeroptera–Plecoptera–Trichoptera indices are based on the number or proportion of taxa or individuals belonging to the generally pollution-sensitive insect orders Ephemeroptera, Plecoptera and Trichoptera (Kitchin 2005). SIGNAL indices are an abundance-weighted or unweighted average of numerical grades assigned to individual taxa to represent their tolerance of general environmental stress (Chessman 2003). SIGNAL is conceptually different from AUSRIVAS O/E in that SIGNAL is proposed as an indicator of measurable environmental stressors such as chemical enrichment or contamination (Chessman 2003), and not of unmeasurable 'river health'. Because SIGNAL is an average, it is little affected by variation in sampling and subsampling methods (Growns et al. 1997; Metzeling et al. 2003), in contrast to AUSRIVAS O/E (Nichols and Norris 2006; Nichols et al. 2006b).
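The general form of the EPT and SIGNAL calculations described above can be sketched as follows. The taxa, grades and counts are hypothetical and are not published SIGNAL grades. Using raw counts as weights is also an assumption made here for simplicity, as published versions of SIGNAL may define abundance weights differently.

```python
EPT_ORDERS = {"Ephemeroptera", "Plecoptera", "Trichoptera"}


def ept_metrics(sample):
    """Return (number of EPT taxa, proportion of all taxa that are EPT)."""
    ept_taxa = [rec for rec in sample if rec["order"] in EPT_ORDERS]
    return len(ept_taxa), len(ept_taxa) / len(sample)


def signal_score(sample, weighted=False):
    """Average of taxon grades, optionally weighted by raw abundance."""
    if weighted:
        total = sum(rec["count"] for rec in sample)
        return sum(rec["grade"] * rec["count"] for rec in sample) / total
    return sum(rec["grade"] for rec in sample) / len(sample)


# Hypothetical sample: one record per taxon, with an illustrative grade.
sample = [
    {"taxon": "mayfly family A", "order": "Ephemeroptera", "grade": 8, "count": 10},
    {"taxon": "stonefly family B", "order": "Plecoptera", "grade": 9, "count": 5},
    {"taxon": "midge family C", "order": "Diptera", "grade": 3, "count": 80},
    {"taxon": "beetle family D", "order": "Coleoptera", "grade": 4, "count": 5},
]

ept_count, ept_proportion = ept_metrics(sample)
signal_unweighted = signal_score(sample)
signal_weighted = signal_score(sample, weighted=True)
```

For this made-up sample there are two EPT taxa (half of all taxa), the unweighted score is 6.0, and the abundance-weighted score is 3.85 because the numerically dominant taxon has a low grade.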
Unlike AUSRIVAS O/E, EPT and SIGNAL do not contain built-in reference data. Users of EPT and SIGNAL are, therefore, at liberty to generate reference values in a transparent way that is appropriate to their objectives. For example, in New Zealand, Collier and Hamer (2013) and Clapcott et al. (2017) used regression models to generate reference values of the EPT index and a macroinvertebrate community index similar to SIGNAL by setting values of predictors that represented anthropogenic stressors to zero.
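The idea of deriving a reference value by setting a stressor predictor to zero can be sketched with a one-predictor least-squares fit. This is a deliberately simplified illustration and not the models used in the cited studies: the data, the use of a single stressor, and the function name are all hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: a stressor measure (e.g. % of catchment urbanised)
# and a biotic index value at each of five sites.
stressor = [0, 10, 20, 30, 40]
index_values = [6.1, 5.4, 5.1, 4.4, 4.0]

intercept, slope = fit_line(stressor, index_values)

# Reference value: the model prediction with the stressor set to zero.
reference_value = intercept + slope * 0
```

With these made-up numbers the fitted slope is about -0.052 per unit of the stressor and the reference value is about 6.04, against which an index value observed at an assessment site could be compared.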

Future directions
A notable trend in freshwater bioassessment globally is the development of diagnostic methods to identify the stressors causing ecological responses, rather than to merely estimate the intensity of anthropogenic impact and, thereby, assign quality ratings to study sites (e.g. Lemm et al. 2019; Feld et al. 2020). A diagnostic approach is needed to support effective management (Negus et al. 2020) in a world where a natural reference state is ever more hypothetical and unattainable, and the distinction between natural and human influences on aquatic biota is increasingly blurred (Bishop et al. 2009; Dufour and Piégay 2009; Bouleau and Pont 2015). An early Australian test of the diagnostic approach with stream macroinvertebrates had mixed success (Chessman and McEvoy 1998), and more recent Australian efforts based on the species at risk (SPEAR) method (Schäfer et al. 2011; Kath et al. 2018; Bray et al. 2021) are yet to demonstrate stressor specificity. Nevertheless, advances in other parts of the world suggest that a diagnostic approach, whether based on macroinvertebrates or on other biota, may be a vehicle to progress bioassessment of Australian rivers, particularly at a time when technical advances such as identification by DNA analysis promise greatly reduced costs (Dafforn et al. 2016; Carew et al. 2017).

Conflicts of interest
The author is the originator of the SIGNAL index and had minor involvement in the early development of AUSRIVAS.

Declaration of funding
The present research did not receive any funding.