Details of the study area, sampling procedures, and stream measurements have been presented elsewhere [16] and are summarized briefly here.
Study area
The study area included streams in western South Carolina, USA (33.25°—35.20°N, 80.68°—83.30°W). Rivers in South Carolina drain southeastward from the Appalachian Mountains to the Atlantic Ocean. The state grades into four ecoregions [17]: Blue Ridge Mountains, Piedmont, Sandhills, and Coastal Plain. To define more meaningful regional scales with regard to stream habitat, boundaries between ecoregions of upper South Carolina delineated by Myers [17] were modified as follows [16] : 1) Mountains: rugged to hilly terrain 240–1100 m above sea level (asl), with fine-loam, sandy-loam, and clay soils, 2) Piedmont: gentle rolling topography 70–400 m asl, with clay-textured soils, and 3) Sandhills: rolling to hilly landscape 70–200 m asl, with dry, sandy soils.
Sampling regime
Study sites varied from rocky-bottomed mountain streams to sandy-bottomed blackwater streams. No standardized collecting procedure is relevant to all conditions, and timed collections have little meaning among sites that range from small torrents to meandering rivers. Accordingly, each site (i.e., each local assemblage) was sampled by walking transects from bank to bank while hand collecting larvae and pupae from all available substrates, with the intent of collecting a minimum of 30 specimens per site. We assumed that species in the sample from each site represent local occurrences [18, 19]. Faunal lists of black flies that result from hand collections are the same as those produced using more quantitative and repeatable sampling units, such as artificial and natural substrates [20, 21]. Only sites in which all specimens could be identified to species were used. This constraint produced data from 57 sites sampled from May to July 1992 (‘summer’ collections) and from 53 sites sampled from February to April 1993 (‘spring’ collections).
For each collection, we measured stream conductivity, depth, dissolved oxygen, pH, temperature, velocity, and width. We visually ranked riparian vegetation (open, brush, forest), dominant streambed-particle size (mud, sand, small stones, rubble, boulders, bedrock), and canopy cover (none, partial, complete). Discharge was calculated from stream depth, width, and velocity. Variables selected for measurement have been documented as useful predictors of insect distributions and diversity among stream reaches [18, 22–24].
Species identification
Larvae and pupae were fixed in three changes of acetic ethanol (1:3) and identified morphologically. Mid- to late instar larvae of taxa with isomorphic species were Feulgen-stained to examine their silk-gland polytene chromosomes [11]. To make cytospecific identifications, chromosome-banding patterns were compared with standard maps in the literature [25] or on file in the laboratory of PHA.
Data analysis
All tests were considered significant at P < 0.05. Presence or absence of each species in each collection was expressed as binary data. Previous studies [26–28] have shown that binary data from single-point collections are robust for measuring faunal differences among streams. The intent of a multivariate analysis of community data is to separate structural patterns from noise [29]. Taxa most likely to be missed in a binary enumeration of a stream habitat are rare species. However, rare species often contribute to noisy data, and their inclusion can obscure the detection of informative correlative patterns [29]. We found no significant correlations between number of larvae collected and species richness for the spring (r = 0.094, p = 0.488) or summer (r = 0.211, p = 0.082). Species accumulation curves for both seasonal collections (Figure 1) suggested that our sampling procedure produced a reasonable representation of the regional species pool. Taken in total, we concluded that no excessive bias existed in our binary data.
Analyses were run separately for spring and summer collections. For each of the spring and summer collections, analyses between the species assemblages across sites and stream variables across sites were conducted using the BV-STEP routine [30], creating two matrices. The species (community) matrix was constructed using similarity among stream sites based on species composition. The ecological (abiotic) matrix was formed using similarity among sites based on stream variables. Thus, similarity among sites was measured using species data and abiotic data, each in a separate matrix. The species matrix was correlated with the ecological matrix, with the best subset of explanatory stream variables selected from the ecological matrix. Selection of the best explanatory variables was based on the strength of the correlation coefficient between the species matrix and every possible combination of stream variables from the ecological matrix [30]. Similarities for the species matrix were constructed using Sorensen distance, which performs well with binary data [31]. For the ecological matrix, each stream variable was log-transformed, if necessary, to stabilize the variance, and all stream variables were standardized (normalized) to a mean of zero and a variance of 1. Similarities in stream conditions between sites were calculated using simple Euclidean distances [29]. Once the best explanatory variables were selected, we determined the significance of the resulting correlation coefficient between the species matrix and subset of the best explanatory variables from the ecological matrix, using 999 random permutations. Routines were run using Primer v. 6© [30]. Analyses were conducted across ecoregions (i.e., all sites sampled) and within ecoregions.
To account for geographic distances between sites, which are typically interpreted as dispersal limitations [32], latitude and longitude coordinates were expressed as decimal degrees east (westernmost point = 0) and decimal degrees north (southernmost point = 0). Female black flies of some species can disperse up to hundreds of kilometers [13]; therefore, only simple first-order terms of Euclidean distance were considered to construct distance matrices. That is, we assumed that dispersal between sites was aerial and best approximated by linear distances, a reasonable assumption, given the life cycle of most black flies [11, 13]. Distance matrices were then used in partial Mantel tests [33]. This test correlates a particular species matrix with the complementary distance matrix, while accounting for the effect of the stream conditions. In our case, stream conditions were the best subset of explaintory variables selected from the ecological matrix by the best BV-STEP routines. The partial Mantel test is essentially the multivariate equivalent of univariate partial correlations. The complementary analyses, that is the effect of the best subset of explanatory stream conditions, while accounting for the effect of distance, also were conducted. Partial Mantel tests were run using the PAST software package [34], with significance of each partial correlation based on 5000 (default) random permutations between the matrices.
Canonical correspondence analysis and redundancy analysis have been used to parcel out the variation due to effects of position, habitat, and position-habitat interactions [35]. However, the assumptions of these methods are rarely, if ever, satisfied by community (species) data [29, 30]; we, therefore, elected to use the BV-STEP Routine and partial Mantel tests, which make few, if any, assumptions.
Null models were used to determine if patterns of species co-occurrences among stream reaches were nonrandom for the spring or summer collections. For any data set (matrix), observed co-occurrences less than expected by random—negative covariation of co-occurrences (i.e., species segregation)—are often interpreted as consistent with a community structured by competition [15, 36, 37]. Predation through a variety of mechanisms, such as direct food-chain predation, intraguild predation, and apparent competition, also could be expected to produce negative patterns of co-occurrence [38]. Negative covariation of co-occurrences also can emerge from differing responses of species to abiotic conditions [39, 40]. Observed co-occurrences greater than expected by random—positive covariation (i.e., species aggregation)—could emerge as a result of species with similar abiotic requirements [15] or facilitation [41]. Habitats, such as streams, subject to a series of stochastic disturbances (e.g., spates) and recolonization events, might have co-occurrences no greater or less than expected by chance [14, 15, 42].
Each analysis entailed the construction of a presence/absence (1/0) data matrix with rows representing species and columns representing stream sites. Row margins represented frequency of occurrence for each species and column margins species richness at each site. These data sets are referred to as FULL to distinguish them from our data subsets described below. The algorithm used for randomizing matrices to create a test distribution of co-occurrence against which the observed test statistic (C-score) is compared fixes both the row margins (simulated species frequencies = observed species frequencies) and the column margins (simulated species richness at a site = observed species richness). This simulation has desirable Type I and Type II error properties [43]. Each randomization used the sequential swap to randomize data, and each analysis was based on 5000 randomizations, using the Ecosim© statistical package [37].
The C-score, measuring the number of checkerboard subunits, was used as our test statistic [44]. A checkerboard subunit occurs when species A is present at site 1 and absent at site 2, while the reverse is true for species B. In the co-occurrence matrix, a checkerboard subunit takes the form
or
The C-score is the average number of subunits between all possible species pairs. An observed C-score greater than expected by chance indicates an assemblage in which co-occurrences are less than expected by chance (negative co-variation or species segregation). An observed C-score less than expected by chance indicates an assemblage in which co-occurrences are greater than expected by chance (positive co-variation or species aggregation) [37].
To determine if patterns of co-occurrence were consistent with the expectations of a community structured, at least in part, by abiotic conditions, the complete data set (FULL) was subdivided into two groups: 1) streams with low variation in abiotic stream conditions (LOW data subset) and 2) streams with high variation in stream conditions (HIGH data subset). Grouping was determined by subjecting stream variables to a Principal Components Analysis (PCA). Although not appropriate for species data, PCA is appropriate for abiotic data [45]. Here, it is used as a simple, objective, and repeatable a priori method to define high and low variability in streams [28].
On an a priori basis, streams in which the first three Principal Components (PCs) were within 1.0 standard deviation of their means were considered to have low variation in abiotic conditions (LOW). The remaining streams were grouped as high-variation streams (HIGH). Null-model analyses were repeated for each of these two groups. If stream conditions were a major influence in structuring co-occurrence patterns, results of the analysis between the LOW data subset (‘control’) should differ from the results of the HIGH data subset (‘treatment’). The above PCA also was used to produce ordinations of stream conditions by ecoregion. Interpretation of the PCs was based on correlations between each PC and the original stream variables [16]. Differences in PCs across ecoregions for each season were determined using the nonparametric Multiple Analysis of Variance (MANOVA) with a Bonferroni correction of the p-value for the three pairwise comparisons among ecoregions [34].
In addition to the above null-model analyses, data from each ecoregion were analyzed for spring and summer collections, although sample sizes were too small to subdivide each of these data sets into LOW and HIGH subsets. For each analysis, in addition to the p-value, effect size was calculated as (Cobs – Cm)/SD, where Cobs = observed C-score, Cm = mean C-score of the test distribution, and SD = standard deviation of the test distribution [40].