A Statistical Speculation about the Future of Devils Lake
The water level of Devils Lake has fluctuated markedly over thousands of years and over the past century, reaching a recent minimum during the Dust Bowl era and a maximum in recent years. Since the lake is landlocked, the level must reflect changing balances between inflow from precipitation and loss by evaporation. Both of these two rates presumably reflect largescale and longterm variations and might be predictable by statistical analysis of longterm oceanic and atmospheric oscillations or by dynamical means with computer simulations of the evolving climate.
Searches for links between the Devils Lake precipitation and largescale atmospheric and oceanic patterns as reflected in an average Northern Hemisphere surface temperature (NHT) and in multidecadal oscillations (MDOs) such as the Atlantic Multidecadal Oscillation (AMO) and the Pacific Decadal Oscillation (PDO) have been inconclusive when focused on various running averages of precipitation. But it turns out that the extremes of precipitation are stronger predictors for Devils Lake than precipitation itself.
Thus the outlook for Devils Lake developed here depends on three key ideas. First, the level of the lake correlates strongly with decadal averages of precipitation extremes—we look at the tails of the probability distribution rather than precipitation itself. The variation of these extremes, as shown by McCabe et al. (2004), can be represented as a combination of timedependent coefficients and spatial patterns. And finally, we find that the temporal coefficients can be represented as functions of numerical indexes representing the three phenomena listed above and thus their trends and harmonic oscillations on decadal time scales provide an extrapolation for the next few decades to come.
While the statistical measures of confidence seem to be reassuring along a fairly complex chain of reasoning, the result must be considered a statistical speculation. Nevertheless, if similar results were obtained with computer climate models, then confidence in both would be enhanced. And the statistical prediction might be a catalyst for a physical model that explained the processes involved.
This document outlines the chain of reasoning and computation, presents key results, and then shows the evolution of Devils Lake elevation to 2064 as projected by this statistical approach. It also shows a similar prediction to 2040 made with precipitation extremes computed from decadal runs of the U.S. National Weather Service Climate Forecast System (CFSv2) described further in the article “Climate Model Simulations” on this website.
The basic assumption here is that the evolution of the Devils Lake precipitation regime and the resulting lake level are linked in quantifiable ways to the evolution of climate on hemispheric and decadal scales. While the statistical approach is both promising and intriguing, it must be refined and eventually confirmed by physical reasoning to predict the future of Devils Lake with confidence.
Lake Level and Precipitation Extremes
The elevation of the surface of Devils Lake correlates strongly with the trailing running 20year average frequency of wet and dry precipitation extremes represented by the 25 percent tails of the distribution of annual precipitation as presented in the National Climatic Data Center (NCDC) climate division data set. The precipitation quartiles were determined for each climate division from the 19002010 record; then each annual precipitation amount is assigned a quartile and the fractions in each quartile are averaged over trailing periods of 10, 20, and 30 years. The correlation between lake elevation and the frequencies of wet and dry extremes is shown in Table 1 and demonstrated in Figure 1, with Devils Lake data determined as the average of the fractions for the two climate divisions in which it lies.
Table 1. Correlation between the level of Devils Lake and the average frequencies of wet and dry extremes over 10year, 20year, and 30year trailing periods computed over the period 19002010.
Wet 
Dry 

Wet 10 
0.76 
Dry 10 
0.574 
Wet 20 
0.878 
Dry 20 
0.604 
Wet 30 
0.852 
Dry 30 
0.664 
Figure 1. Time histories comparing the level of Devils Lake with the 20year average frequencies of wet and dry extremes in precipitation near Devils Lake.
This strong correlation motivates a regression model of lake level as a function of the running averages of wet and dry frequencies. A first model (M1) uses only the fraction of wet extremes, a second (M2) uses both wet and dry, as shown in Table 2 and Figure 2.
Table 2. Coefficients for the regression equations modeling lake level as a function of trailing 20yr averages of the frequency of precipitation extremes.

Intercept 
DL Wet 
DL Dry 
Model 1 
1387.9 
152.2 

Model 2 
1389.6 
138.6 
31.2 
Figure 2. The Devils Lake elevation as represented by linear regression on the 20year averages of the frequencies of wet and dry precipitation.
Lake Level and MultiDecadal Oscillations
The strategy now is to search for global climate indexes that have strong 20^{th} century temporal trends or harmonic content—suggesting predictability—that might be used to foresee future precipitation extremes. Then these indexes could be used to predict the level of Devils Lake.
Extreme precipitation frequencies at Devils Lake correlate with various trailing running averages of the NHT, the AMO, and the PDO indexes^{1}. Future values of these indexes can be extrapolated on the basis of their strong harmonic content on decadal timescales observed in the historical record and, in the case of NHT, a continued warming trend, as shown in Figure 3. The AMO model has a periodicity of 67 years, the PDO model has periodicities of 59 and 27 years, and the NHT harmonics have periods of 63 and 104 years. The models were created by selecting amplitudes, periods, and phases to minimize the mean square error of the representation.

^{1}The data for the AMO index were obtained from the Earth System Research Laboratory (http://www.esrl.noaa.gov/psd/data/timeseries/AMO/), the PDO index from the Joint Institute for the Study of the Atmosphere and Ocean operated by NOAA and the University of Washington (http://jisao.washington.edu/pdo/PDO.latest) , and the Northern Hemisphere temperature data from the Goddard Institute of Space Science (http://data.giss.nasa.gov/gistemp/tabledata_v3/NH.Ts+dSST.txt with sources updated in June 2012).
Figure 3. Harmonic representation and extrapolation of the AMO, PDO, and NHT indexes. The AMO is modeled with one harmonic function that has minimum meansquare error, the PDO with two harmonics, and the NHT with a linear trend and two harmonic functions.
Empirical Orthogonal Functions
The national regimes of precipitation extremes in the NCDC Climate Divisions can be modeled compactly with empirical orthogonal functions (EOFs) in which the spatial patterns are represented with the eigenfunctions and the temporal variation of the 20year averages with the timedependent Fourier coefficients. With the EOFs, we can embed the Devils Lake situation in the much broader patterns of precipitation evolution on a continental scale. Indeed, McCabe et al. (2004) showed that centered averages of national patterns of extreme precipitation seem to be represented by EOFs that are strongly related to the indexes. We use trailing averages because of our interest in prediction.
The EOFs and the coefficients can be rotated with standard routines in the statistical system R to optimize the representation of the variance. The correlation between the coefficients for the 20yr trailing averages and between the coefficients and the three indexes is shown in Table 3.
Table 3. Correlations between the first three coefficients in the eigensystem for 20year averages of frequencies of precipitation extremes of U.S. precipitation and the NHT, the AMO, and the PDO indexes. Note that the signs of the coefficients are arbitrary and match with the eigenfunctions to be correct for the physical quantities they represent. Correlations are in percent and only those significant at 2 percent are shown.
Correlation Between Coefficients 
Correlation Between Coefficients and Indexes 
Correlation Between Coefficients and Harmonic Representation of Indexes 
The first three wet and dry coefficients for the rotated eigenfunctions are shown in Figure 4, grouped according to the correlations in Table 3. The relations here seem to be somewhat more complex than those reported by McCabe who found close correspondences between Dry 1 and the PDO, Dry 2 and the AMO, and Dry 3 and –NHT. We note that our Wet 1 is similar to Dry 2 and that both resemble the curve for NHT in Figure 3.
Figure 4. The first three wet and dry EOF coefficients, paired according to the correlations in Table 3.
The fraction of variance in the averages of extremes represented by expansions of the EOF series in models of increasing numbers of terms is shown in Figure 5.
Figure 5. The cumulative variance explained by rotated eigensystem expansions with one to nine terms for the 20year averages of wet and dry frequencies.
Models of the dependence of the rotated Fourier coefficients on the indexes are obtained with allsubsets regression in R (Kabacoff, 2011; Miller, 2002), a process that selects the best model with one, two, three, … , members of the set of predictors. As shown in Figure 6, a satisfactory fraction of the variance is explained with representations involving three to five of the nine average variables NHTnn, AMOnn, PDOnn, in which the nn’s of 10,15, and 20 represent the length in years of the runningaverage intervals. Using loworder representations helps to avoid overfitting.
Figure 6. Variance explained (adjusted Rsquared) by models of various order produced by the allsubsets selection method. The box outlines the region including three to fivemember models in which a satisfactory fraction of the variance is already represented.
These regression models combine with the harmonic extrapolations of the indexes to produce a statistical extrapolation of the level of Devils Lake for the next halfcentury or so, as shown in Figure 7 for a period extending to 2064.
The variance explained for nine of the models over the observed history is shown in Table 4 and motivates showing the three best models as darker lines, the next three as lighter lines. Here Mnm identifies the models n=(1,2) corresponding to the models M1 and M2 in Table 4 for the regression of lake level on wet and dry frequencies and m=(1,2,…,5) identifies the number of MDO indexes in the model from allsubsets selection. The variance explained in Table 4 is computed directly as
_{}
Note that the variance explained in Table 4 for the final prediction models is a different quantity than the variance explained in Figure 7 for the models of the W and D coefficients using the three indexes.
Figure 7. The evolution of the elevation of Devil’s Lake as projected by the models that most satisfactorily represent the observed conditions from 1920 to 2011.
Table 4. Fraction of variance explained (in percent) in the observed history of the surface elevation of Devils Lake by nine of the statistical models based on EOFs and multiple linear regression on the indexes. The best three are emphasized in Figure 7.
Model 
Variance Explained 
Model 
Variance Explained 
M11 
0.61 
M21 
0.38 
M12 
0.84 
M22 
0.5 
M13 
0.92 
M23 
0.67 
M14 
0.90 
M24 
0.63 
M15 
0.89 
M25 
0.62 
The three models, M13, M14, and M15, that most successfully represent the observed history predict that the level of Devils Lake will remain within a few feet of the present level over the next 30 years as the extrapolated NHT enters a cooling oscillation; they then predict that it will start increasing again as the projected NHT begins to rise again. The actual level will be limited by the overflow to the Sheyenne River at 1459 ft.
Comparison of Decadal CFS and Statistical Outlooks
The statistical prediction takes advantage of the strong correlation between the 20year trailing averages of precipitation extremes and the elevation of the surface of Devils Lake. The dry and wet extremes are defined to be the precipitation events occurring in the 25 percent dry and wet tails of the precipitation distributions.
As another approach to attempting to foresee the future of Devils Lake, we turn to 30year climate simulations (Center for OceanLandAtmosphere Studies (COLA), 2012) computed with the dynamical CFSv2 model used by the National Weather Service to compute intraseasonal and seasonal forecasts.
The CFSv2 decadal precipitation simulations and forecasts were calibrated using observations from temporal domains in which they overlapped and the calibrated values were converted into the evolving 20year trailing averages of the precipitation extremes going out to 2040. Then the two models M1 and M2 described above for relating elevations to precipitation extremes were used to obtain estimates of the Devils Lake elevation based on the CFSv2 precipitation forecasts. Thus the difference between the statistical and CFSv2 forecasts lies in the source of the estimates for future precipitation extrema.
The results are shown in Figure 8, in which the three leading and similar models of the statistical method and the two nearly identical models for CFSv2 were averaged to obtain one representative estimate for each method.
Figure 8. Comparison of the outlooks for the elevation of Devils Lake obtained from a statistical method based on the harmonic content of multidecadal indexes and from precipitation extrema derived from CFSv2 decadal simulations and projections. As shown at the bottom of the figure, the curve for the CFS decadal projections includes observations on the left, the overlap regions in the middle, and lake levels derived from CFSv2 forecasts on the right.
The CFSv2 representation of the history portrays the longterm trends in the Devils Lake elevation but oscillates unrealistically about the trend with somewhat remarkable extremes.
Next Steps
Further research might include two obvious steps:
§ Reduce the number of MDO indexes to a minimum but optimal set. Running averages of harmonic functions tend to differ only by small changes in phase. Thus the set of nine indexes might be reduced considerably, perhaps to the best three.
§ Verify that the method reliably represents U.S. precipitation variability. The modeled EOF coefficients carrying the temporal information can be used with the spatial information contained in the eigenfunctions to generate patterns of extreme precipitation for the entire U.S. that could be compared with observations to ascertain whether the major decadal drought and moist events are portrayed with useful accuracy. If so, then we have a statistical outlook for U.S. precipitation.
Assessing Confidence in the Statistical Speculation about Devils Lake
We have proceeded along a fairly complex chain of logic to reach an outlook about the level of Devils Lake between now and 2064. Although we have used the word predict, we have been careful to denote the entire process and the result as a statistical speculation. Let us review the main links in the chain:
§ The level of the lake is closely related to the 20year average frequencies of extreme precipitation and these average fractions can be used to create a regression model.
§ The records of extreme precipitation for the United States can be represented with empirical orthogonal functions in which the functions provide the spatial patterns and the coefficients provide the time variation. This is a good assumption. The eigensystem can be rotated to improve the representation of the variance.
§ The temporal variation of the EOF coefficients can be represented as linear functions of largescale indexes in models created with a statistical process that selects best subsets of the 10, 15, and 20year trailing averages of the three indexes.
§ The three indexes all have strong linear or harmonic components that can be resolved with the available data and then used to project their values for the decades to come. This is of course no assurance that the observed harmonic behavior will continue even though we might expect that it will. There is also considerable uncertainty surrounding the magnitude of global temperature trends that will prevail in the next century.
§ Finally, we use the harmonic content of the indexes to represent the EOF coefficients which we then use to represent the extreme precipitation which we then use to represent the lake level from 1919 to 2064.
Now we look again at Figure 7 and observe that we have created a remarkably good representation of the level of Devils Lake from 1919 to 2011. That suggests that the representation may be valid for at least a few years into the future. If so, then we expect the lake level to remain at about the present level for the next eight years or so. Figure 9 provides a summary of the evolution of the 20year averages of the three indexes over the past ninety years and uses their harmonic content to project their behavior to 2064. The threedimensional trajectory of the two MDOs is transported by the behavior of the average Northern Hemisphere temperature into a region of the phase space not previously visited. That in itself suggests that Devils Lake may produce some surprises if it does indeed respond to the forces represented by these indexes.
Figure 9. The threedimensional trajectory of the standardized AMO, PDO, and NHT indexes.
Having reviewed the process by which we arrived at the speculative prediction, let us consider what might go wrong:
§ The relations between the EOF coefficients and the indexes are valid for the present but turn out not to be valid in the future
§ The harmonic nature of one or more of the indexes undergoes a dramatic change in the coming years or decades with a significant change in amplitude or phase.
§ A major event such as a multiyear drought or a series of very snowy winters with causes not linked to the indexes examined here appears and disrupts the predictive relations.
§ The variations in the EOF coefficients are actually controlled by forces or events not yet identified and not considered here.
Let us also consider the comparison of the statistical outlook and the predictions from the 30year outlook from the seasonal forecast model. As we see in Figure 8, the extreme precipitation amounts in the model were markedly greater than the actual experience in the last 90 years, and we are inclined to discount its outlook for the future. But again, we will see what happens in the next eight years as a first indication about the efficacy of this outlook.
Finally, there are two serious criticisms that could be aimed at this work. We have already mentioned the first: we do not have physical cause and effect relationships that support the notion that the level of Devils Lake or the extremes of precipitation might depend strongly on the three indexes considered here. Perhaps, as said earlier, these statistical results might stimulate a search for such a model.
Second, the chain of reasoning produced a deterministic rather than a probabilistic forecast. We would much prefer to have a prediction that contained information about uncertainty, as do the contemporary probability forecasts used in weather and seasonal prediction. If we had such a model, then we would expect Figure 7 to have a fan of possible lake levels emanating from the present value. The center cluster of the fan would indicate the most likely trajectories, the edges of the fan would contain those of smaller probability. The fan would generally become wider with increasing time, indicating that the uncertainty increases further into the future.
With such a probabilistic outlook, we could say that at some time T in the future, the probability of the lake level increasing more than _{} feet above the present level is _{}, the probability of decreasing more than _{}feet below the present level is _{}, and the probability of staying within _{} feet of the present level is _{}. We could then treat these three cases as scenarios, consider what actions might be taken in response to them, and perhaps arrive at a perception of what the most likely and appropriate courses of action might be.
And thus we might be better prepared to understand the implications for Devils Lake of the prediction paradox identified by Silver (2012): “The more humility we have about our ability to make predictions, the more successful we can be in planning for the future.”
References
Center for OceanLandAtmosphere Studies, 2012, http://monsoondata.org:9090/dods/slimline/nemo/flxf
Kabacoff, Robert I., 2011. R In Action: Data Analysis and Graphics with R, Manning Publications, Shelter Island, NY, 450 pp.
McCabe, Gregory J., Michael A Palecki, Julio L. Betancourt, 2004. Pacific and Atlantic Ocean influences on multidecadal drought frequency in the United States, Proc. Nat. Acad., Sci., 101, 41364141.
Miller, Alan, 2002. Subset Selection in Regression, (2^{nd} ed.), Chapman & Hall/CRC, Boca Raton, FL, 238 pp.
Silver, Nate, 2012. The Signal and Noise: Why Most Predictions Fail but Some Don’t, Penguin Press, New York, 544pp. The quotation is a summary from marketing material issued by the publisher.