Article

State-of-the-Art Statistical Approaches for Estimating Flood Events

1
Zhengzhou Key Laboratory of Big Data Analysis and Application, Henan Academy of Big Data, Zhengzhou University, Zhengzhou 450052, China
2
School of Mathematics and Statistics, Zhengzhou University, Zhengzhou 450001, China
3
Department of Civil, Environmental, and Infrastructure Engineering, George Mason University, 4400 University Dr, Fairfax, VA 22030, USA
4
College of Hydropower & Information Engineering, Huazhong University of Science & Technology, Wuhan 430074, China
5
Department of Statistics, School of Mathematics and Statistics, Central China Normal University, Wuhan 430079, China
*
Author to whom correspondence should be addressed.
Submission received: 4 June 2022 / Revised: 26 June 2022 / Accepted: 27 June 2022 / Published: 29 June 2022

Abstract

Reliable quantile estimates of annual peak flow discharges (APFDs) are needed for the design and operation of major hydraulic infrastructures and for more general flood risk management and planning. In the present study, linear higher order-moments (LH-moments) and nonparametric kernel functions were applied to APFDs at 18 stream gauge stations in Punjab, Pakistan. The main purpose of this study was to evaluate the impacts of different quantile estimation methods towards water resources management and engineering applications by means of comparing the state-of-the-art approaches and their quantile estimates calculated from LH-moments and nonparametric kernel functions. The LH-moments (η = 0, 1, 2) were calculated for the three best-fitted distributions, namely, generalized logistic (GLO), generalized extreme value (GEV), and generalized Pareto (GPA), and the performances of these distributions for each level of LH-moments (η = 0, 1, 2) were compared in terms of Anderson–Darling, Kolmogorov–Smirnov, and Cramér–Von Mises tests and LH-moment ratio diagrams. The findings indicated that GPA and GEV distributions were best fitted for most stations, followed by GLO distribution. The quantile estimates derived from LH-moments (η = 0, 1, 2) had a lower relative absolute error, particularly for higher return periods. However, the Gaussian kernel function provided a close estimate among nonparametric kernel functions for small return periods when compared to LH-moments (η = 0, 1, 2), thus highlighting the importance of using LH-moments (η = 0, 1, 2) and nonparametric kernel functions in water resources management and engineering projects.

1. Introduction

Extreme environmental events have always been a vital part of human history. Floods, rainstorms, droughts, and windstorms are some of the manifestations of these events, which cause enormous destruction. Flooding is often referred to as one of the most devastating natural disasters in terms of damage to property, infrastructure, and the environment, and it can even threaten human lives.
Quantile estimations of floods, commonly extracted from annual peak flow discharges (APFDs), are of great importance for the description of such events. These estimates allow for the assessment of flood characteristics by associating their magnitude to a corresponding frequency, from which mitigating hydraulic structures and management practices may be designed. The objective of flood frequency analysis (FFA) is to obtain a flood quantile magnitude estimator for one or more stations on the river system. Depending on the magnitude of the flood, an estimate of its return period may be required. The common interest in estimating quantiles of extreme floods for different return periods, i.e., 50-year, 100-year, or 500-year flood, is the design of hydraulic structures such as dams, culverts, and bridges [1,2,3].
The parametric estimation method (LH-moments) and nonparametric method (kernel functions) were used in the present study to estimate the flood magnitude at different return periods. In parametric methods, the APFDs are assumed to be independent and identically distributed, as well as being drawn from a population with a known probability distribution function (PDF). In addition, an adequate PDF is selected from a set of candidate PDFs using a robust goodness-of-fit test. As described by [4], extensive literature for FFA is available, not only for the detailed description of PDFs, but also for parameter estimation. The commonly used PDFs for modeling of the APFDs include: generalized extreme value (GEV), generalized logistic (GLO), generalized Pareto (GPA), Pearson type 3 (P3), log Pearson type 3 (LP3), Weibull (WEI), extreme value type 1, extreme value type 2, normal, log normal, gamma, and exponential [4,5,6,7,8,9].
The widely used parameter estimation techniques for PDFs include maximum likelihood, L-moments, LH-moments, and the method of moments. The main drawback of maximum likelihood and the method of moments is that the product moments of the APFDs are strongly affected by small sample sizes. Furthermore, the higher moments (e.g., coefficient of variation and skewness) are greatly influenced by extreme observations in the data series [6]. On the other hand, L-moments are less affected by extreme observations in the data series [10]. Wang [11] introduced LH-moments, which generalize L-moments by using higher-order (hence "H") order statistics. Wang found that LH-moments produce consistent quantile estimates for large return periods, since LH-moments give greater weight to the larger values in the APFD series and hence fit the upper tail of PDFs better. This characteristic is even more relevant when sample sizes are small, which is a common reality of streamflow monitoring in developing countries, including Pakistan [8,9,12,13,14]. Various studies have been conducted on LH-moments in different regions of the world [11,15,16,17,18,19,20,21,22,23]. Most of these researchers have focused on comparing LH-moments with L-moments by using goodness-of-fit tests for different PDFs.
The aforementioned studies were based on parametric methods, which require a priori PDF selection. Alternative nonparametric methods do not assume the APFD series in a distributional form. Many studies on FFA based on the nonparametric approach have been conducted [24,25,26,27,28,29,30,31,32,33], among which kernel function estimators have stood out for producing the most reliable nonparametric methods. Adamowski [24,25] proposed a nonparametric kernel estimation for FFA and conducted a Monte-Carlo simulation experiment in order to compare the nonparametric approach with parametric PDFs, namely, LP3 and WEI distributions. His results show that the nonparametric method produces more accurate estimates than parametric methods, but the probability of extrapolation is lower than the highest data observed in the sample. Lall et al. [34] stressed the choice of a kernel function that reflects the shape and bandwidth of FFA. Kernel function and bandwidth selection techniques are implemented in three situations: Gaussian data, skewed data, and mixed data.
By definition, an extreme event is rare and often occurs over a short time period; therefore, estimating floods for large return periods is a challenging task, often leading to gross errors in the estimation of quantiles. Another problem is the identification of a suitable statistical model. The standard methods of estimation, including maximum likelihood, the method of moments, and least squares, may not give consistent quantile estimates for large return periods when the sample size is small. Therefore, we need an appropriate method of estimation that gives consistent quantile estimates [35]. The current research attempts to highlight the significance of LH-moments (η = 0, 1, 2) and nonparametric kernel functions in FFA. On the one hand, this allows for comparison of the two estimation approaches within this framework, while on the other hand, it shows the effect of using the relative absolute error (RAE) to evaluate quantiles for various return periods. This study does not imply that nonparametric kernel functions are always the best fit and should be used in place of adequately constructed parametric methods. However, the possibility of obtaining the best fit with nonparametric kernel functions may be more significant in FFA, as these approaches give precise estimations for observed flood variables [36]. Therefore, the current study emphasizes the significance of using LH-moments (η = 0, 1, 2) and nonparametric kernel functions to develop a comprehensive framework for at-site FFA. The steps in the framework are as follows: (i) selection of the optimized PDFs with a comprehensive level of LH-moments (η = 0, 1, 2) for the annual peak flow of 18 stations; (ii) estimation of quantiles for various flood return periods through LH-moments (η = 0, 1, 2) and nonparametric kernel functions; and (iii) comparison of various flood return periods in terms of RAE. This paper is arranged into five sections. Section 2 contains a detailed step-by-step explanation of the methodology. The details of the study area are presented in Section 3. The results and discussions of the study are provided in Section 4. Section 5 summarizes the conclusions of the study.

2. Methods

2.1. Linear Higher Order-Moments (LH-Moments)

LH-moments were proposed by [11] as expectations of linear combinations of higher-order statistics. Let $X_{j:m}$ denote the $j$th smallest value in a sample of size $m$ drawn from the distribution $F(x) = \Pr(X \le x)$; then the first four LH-moments are defined as follows:
$$\lambda_1^{\eta} = E\left[X_{(\eta+1):(\eta+1)}\right] \tag{1}$$
$$\lambda_2^{\eta} = \frac{1}{2} E\left[X_{(\eta+2):(\eta+2)} - X_{(\eta+1):(\eta+2)}\right] \tag{2}$$
$$\lambda_3^{\eta} = \frac{1}{3} E\left[X_{(\eta+3):(\eta+3)} - 2X_{(\eta+2):(\eta+3)} + X_{(\eta+1):(\eta+3)}\right] \tag{3}$$
$$\lambda_4^{\eta} = \frac{1}{4} E\left[X_{(\eta+4):(\eta+4)} - 3X_{(\eta+3):(\eta+4)} + 3X_{(\eta+2):(\eta+4)} - X_{(\eta+1):(\eta+4)}\right] \tag{4}$$
where
$$E\left[X_{j:m}\right] = \frac{m!}{(j-1)!\,(m-j)!} \int_0^1 x(F)\, F^{\,j-1} (1-F)^{m-j}\, dF \tag{5}$$
When η = 0, LH-moments reduce to L-moments [10]. As η increases, the LH-moments increasingly reflect the characteristics of the upper part of the PDF and of the extreme events in the data. Here, $\lambda_1^{\eta}$ measures the location of the distribution, $\lambda_2^{\eta}$ measures its spread, $\lambda_3^{\eta}$ measures the asymmetry of the upper part of the distribution, and $\lambda_4^{\eta}$ measures the peakedness of the upper part of the distribution. For η = 0, 1, 2, the LH-moments are referred to as L-moments, L1-moments, and L2-moments, respectively. The LH-moment ratios are described below.
$$\tau_2^{\eta} = \frac{\lambda_2^{\eta}}{\lambda_1^{\eta}} \tag{6}$$
$$\tau_3^{\eta} = \frac{\lambda_3^{\eta}}{\lambda_2^{\eta}} \tag{7}$$
$$\tau_4^{\eta} = \frac{\lambda_4^{\eta}}{\lambda_2^{\eta}} \tag{8}$$
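Let $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ be the ordered sample of size $n$; then the unbiased estimators of the LH-moments are given below.
$$\hat{\lambda}_1^{\eta} = \frac{1}{\binom{n}{\eta+1}} \sum_{i=1}^{n} \binom{i-1}{\eta} x_{(i)} \tag{9}$$
$$\hat{\lambda}_2^{\eta} = \frac{1}{2\binom{n}{\eta+2}} \sum_{i=1}^{n} \left[\binom{i-1}{\eta+1} - \binom{i-1}{\eta}\binom{n-i}{1}\right] x_{(i)} \tag{10}$$
$$\hat{\lambda}_3^{\eta} = \frac{1}{3\binom{n}{\eta+3}} \sum_{i=1}^{n} \left[\binom{i-1}{\eta+2} - 2\binom{i-1}{\eta+1}\binom{n-i}{1} + \binom{i-1}{\eta}\binom{n-i}{2}\right] x_{(i)} \tag{11}$$
$$\hat{\lambda}_4^{\eta} = \frac{1}{4\binom{n}{\eta+4}} \sum_{i=1}^{n} \left[\binom{i-1}{\eta+3} - 3\binom{i-1}{\eta+2}\binom{n-i}{1} + 3\binom{i-1}{\eta+1}\binom{n-i}{2} - \binom{i-1}{\eta}\binom{n-i}{3}\right] x_{(i)} \tag{12}$$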
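The sample LH-moment ratios are as follows:
$$\hat{\tau}_2^{\eta} = \frac{\hat{\lambda}_2^{\eta}}{\hat{\lambda}_1^{\eta}} \tag{13}$$
$$\hat{\tau}_3^{\eta} = \frac{\hat{\lambda}_3^{\eta}}{\hat{\lambda}_2^{\eta}} \tag{14}$$
$$\hat{\tau}_4^{\eta} = \frac{\hat{\lambda}_4^{\eta}}{\hat{\lambda}_2^{\eta}} \tag{15}$$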
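The sample estimators above translate almost directly into code. The following is a minimal sketch, assuming the standard form of Wang's unbiased estimators in which the normalizing binomial coefficient is $\binom{n}{\eta+r}$ for the $r$th LH-moment; the function names `sample_lh_moments` and `lh_ratios` are illustrative and do not come from the paper.

```python
from math import comb


def sample_lh_moments(data, eta=0):
    """Return the first four sample LH-moments (l1, l2, l3, l4) of order eta."""
    x = sorted(data)
    n = len(x)
    if n < eta + 4:
        raise ValueError("need at least eta + 4 observations")
    s1 = s2 = s3 = s4 = 0.0
    for i, xi in enumerate(x, start=1):
        below, above = i - 1, n - i  # counts of observations below / above x_(i)
        s1 += comb(below, eta) * xi
        s2 += (comb(below, eta + 1) - comb(below, eta) * comb(above, 1)) * xi
        s3 += (comb(below, eta + 2)
               - 2 * comb(below, eta + 1) * comb(above, 1)
               + comb(below, eta) * comb(above, 2)) * xi
        s4 += (comb(below, eta + 3)
               - 3 * comb(below, eta + 2) * comb(above, 1)
               + 3 * comb(below, eta + 1) * comb(above, 2)
               - comb(below, eta) * comb(above, 3)) * xi
    l1 = s1 / comb(n, eta + 1)
    l2 = s2 / (2 * comb(n, eta + 2))
    l3 = s3 / (3 * comb(n, eta + 3))
    l4 = s4 / (4 * comb(n, eta + 4))
    return l1, l2, l3, l4


def lh_ratios(data, eta=0):
    """Return the sample LH-moment ratios (tau2, tau3, tau4) of order eta."""
    l1, l2, l3, l4 = sample_lh_moments(data, eta)
    return l2 / l1, l3 / l2, l4 / l2
```

For η = 0 the function reproduces ordinary sample L-moments, which gives a convenient sanity check: for a symmetric sample, the third value is zero.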

2.2. Estimation of the Parameters of the Selected PDFs Based on LH-Moments

Several PDFs for fitting flood series exist in FFA; among them, the GLO, GEV, and GPA distributions have been recommended in Pakistan by various researchers to model extreme flood events [8,9,13,14,37,38,39]. Based on the comprehensive literature, three PDFs were selected in the current study, i.e., GLO, GEV, and GPA. The parameters of each respective PDF were estimated using the LH-moments and given below.

2.2.1. Generalized Logistic (GLO) Distribution

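The GLO distribution is a generalized variant of the logistic distribution [40,41] that has been applied in recent years to assess extreme value events. Since GLO was recognized as the acceptable approach for FFA in the UK [42], its use in hydrology has gained popularity [43]. The PDF, distribution function (DF), and quantile function (QF) of the GLO distribution are expressed, respectively, by:
$$f(x) = \frac{1}{\alpha}\left[1 - k\left(\frac{x-\xi}{\alpha}\right)\right]^{\frac{1}{k}-1}\left[1 + \left\{1 - k\left(\frac{x-\xi}{\alpha}\right)\right\}^{\frac{1}{k}}\right]^{-2} \tag{16}$$
$$F(x) = \left[1 + \left\{1 - k\left(\frac{x-\xi}{\alpha}\right)\right\}^{\frac{1}{k}}\right]^{-1} \tag{17}$$
$$x(F) = \xi + \frac{\alpha}{k}\left[1 - \left(\frac{1-F}{F}\right)^{k}\right] \tag{18}$$
where $x$ denotes the APFDs, $k$ is the shape parameter, $\xi$ is the location parameter, and $\alpha$ is the scale parameter. The LH-moment estimators of these parameters, derived in [17], are given below, where $\hat{\beta}_r$ denotes the sample probability-weighted moment of order $r$:
$$\hat{\alpha} = \frac{\Gamma(\eta+2)\left[(\eta+2)\hat{\beta}_{\eta+1} - (\eta+1)\hat{\beta}_{\eta}\right]}{\Gamma(\eta+1-\hat{k})\,\Gamma(1+\hat{k})} \tag{19}$$
$$\hat{\xi} = (\eta+1)\hat{\beta}_{\eta} - \frac{\hat{\alpha}}{\hat{k}}\left[1 - \frac{\Gamma(\eta+1-\hat{k})\,\Gamma(1+\hat{k})}{\Gamma(\eta+1)}\right] \tag{20}$$
$$\hat{k} = -\frac{(\eta+3)(\eta+2)\hat{\beta}_{\eta+2} - \left[(\eta+2)^2 + (\eta+2)(\eta+1)\right]\hat{\beta}_{\eta+1} + (\eta+1)^2\hat{\beta}_{\eta}}{(\eta+2)\hat{\beta}_{\eta+1} - (\eta+1)\hat{\beta}_{\eta}} \tag{21}$$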
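As a hedged illustration of the GLO estimators, the sketch below implements them as functions of three probability-weighted moments, $\beta_{\eta}$, $\beta_{\eta+1}$, and $\beta_{\eta+2}$. Several parts are assumptions rather than the paper's own procedure: the sign convention of the shape estimator (chosen so that it reduces to $k = -\tau_3$ for η = 0), the standard unbiased sample estimator in `sample_pwm`, and the helper `glo_pwm`, which evaluates the population moments implied by the quantile function and is used only for a round-trip check. The parameter values are hypothetical.

```python
from math import comb, gamma


def sample_pwm(data, r):
    """Unbiased sample probability-weighted moment b_r (assumed estimator of beta_r)."""
    x = sorted(data)
    n = len(x)
    total = sum(comb(i - 1, r) * xi for i, xi in enumerate(x, start=1))
    return total / (n * comb(n - 1, r))


def glo_pwm(r, xi, alpha, k):
    """Population beta_r of the GLO distribution (valid for -1 < k < 1, k != 0)."""
    g = gamma(r + 1 - k) * gamma(1 + k) / gamma(r + 1)
    return (xi + alpha / k * (1 - g)) / (r + 1)


def glo_fit(b_eta, b_eta1, b_eta2, eta=0):
    """Return (xi, alpha, k) from beta_eta, beta_{eta+1}, beta_{eta+2}."""
    d = (eta + 2) * b_eta1 - (eta + 1) * b_eta
    num = ((eta + 3) * (eta + 2) * b_eta2
           - ((eta + 2) ** 2 + (eta + 2) * (eta + 1)) * b_eta1
           + (eta + 1) ** 2 * b_eta)
    k = -num / d  # breaks down when k is exactly 0 (plain logistic case)
    g = gamma(eta + 1 - k) * gamma(1 + k)
    alpha = gamma(eta + 2) * d / g
    xi = (eta + 1) * b_eta - alpha / k * (1 - g / gamma(eta + 1))
    return xi, alpha, k


# Round-trip check with hypothetical parameters: the fit should return them.
true_params = (100.0, 20.0, -0.2)
for eta in (0, 1, 2):
    betas = [glo_pwm(eta + j, *true_params) for j in range(3)]
    print(eta, glo_fit(*betas, eta=eta))
```

With observed data, the three inputs would be `sample_pwm(data, eta)`, `sample_pwm(data, eta + 1)`, and `sample_pwm(data, eta + 2)`.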

2.2.2. Generalized Extreme Value (GEV) Distribution

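The GEV was developed by [44] as a feasible tool for extreme value analysis, and it has gained widespread favor in FFA. The GEV distribution PDF, DF, and QF are written, respectively, as:
$$f(x) = \frac{1}{\alpha}\left[1 - k\left(\frac{x-\xi}{\alpha}\right)\right]^{\frac{1}{k}-1}\exp\left\{-\left[1 - k\left(\frac{x-\xi}{\alpha}\right)\right]^{\frac{1}{k}}\right\} \tag{22}$$
$$F(x) = \exp\left\{-\left[1 - k\left(\frac{x-\xi}{\alpha}\right)\right]^{\frac{1}{k}}\right\} \tag{23}$$
$$x(F) = \xi + \frac{\alpha}{k}\left[1 - (-\ln F)^{k}\right] \tag{24}$$
where $k$, $\xi$, and $\alpha$ are the shape, location, and scale parameters, respectively. Their LH-moment estimators, given in [11], are described below:
$$\hat{\alpha} = \frac{\hat{k}\left[(\eta+2)\hat{\beta}_{\eta+1} - (\eta+1)\hat{\beta}_{\eta}\right]}{\Gamma(1+\hat{k})\left[(\eta+1)^{-\hat{k}} - (\eta+2)^{-\hat{k}}\right]} \tag{25}$$
$$\hat{\xi} = (\eta+1)\hat{\beta}_{\eta} - \frac{\hat{\alpha}}{\hat{k}}\left[1 - (\eta+1)^{-\hat{k}}\,\Gamma(1+\hat{k})\right] \tag{26}$$
$$\hat{k} = a_0 + a_1\left[\hat{\tau}_3^{\eta}\right] + a_2\left[\hat{\tau}_3^{\eta}\right]^2 + a_3\left[\hat{\tau}_3^{\eta}\right]^3 \tag{27}$$
where the coefficients $a_0, \dots, a_3$ depend on $\eta$ [11].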

2.2.3. Generalized Pareto (GPA) Distribution

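Pickands [45] proposed the GPA distribution, and several scholars have widely acknowledged it as the logical alternative for evaluating severe events [46,47]. The PDF, DF, and QF for the GPA distribution are given, respectively, by:
$$f(x) = \frac{1}{\alpha}\left[1 - \frac{k}{\alpha}(x-\xi)\right]^{\frac{1}{k}-1} \tag{28}$$
$$F(x) = 1 - \left[1 - \frac{k}{\alpha}(x-\xi)\right]^{\frac{1}{k}} \tag{29}$$
$$x(F) = \xi + \frac{\alpha}{k}\left[1 - (1-F)^{k}\right] \tag{30}$$
The scale ($\alpha$), location ($\xi$), and shape ($k$) parameters were estimated by [17] and are stated below:
$$\hat{\alpha} = \frac{\hat{k}\,\Gamma(\eta+3+\hat{k})\,\Gamma(\eta+2+\hat{k})\left[(\eta+2)\hat{\beta}_{\eta+1} - (\eta+1)\hat{\beta}_{\eta}\right]}{(\eta+1)!\,\Gamma(1+\hat{k})\left[\Gamma(\eta+3+\hat{k}) - (\eta+2)\,\Gamma(\eta+2+\hat{k})\right]} \tag{31}$$
$$\hat{\xi} = (\eta+1)\hat{\beta}_{\eta} - \frac{\hat{\alpha}}{\hat{k}}\left[1 - \frac{(\eta+1)\,\Gamma(\eta+1)\,\Gamma(1+\hat{k})}{\Gamma(\eta+2+\hat{k})}\right] \tag{32}$$
$$\hat{k} = \frac{-5 - 2\eta + (\eta+3)\,\dfrac{(\eta+3)\hat{\beta}_{\eta+2} - (\eta+1)\hat{\beta}_{\eta}}{(\eta+2)\hat{\beta}_{\eta+1} - (\eta+1)\hat{\beta}_{\eta}}}{1 - \dfrac{(\eta+3)\hat{\beta}_{\eta+2} - (\eta+1)\hat{\beta}_{\eta}}{(\eta+2)\hat{\beta}_{\eta+1} - (\eta+1)\hat{\beta}_{\eta}}} \tag{33}$$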
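The GPA estimators can be sketched in the same way. This is an illustrative implementation under stated assumptions, not the authors' code: the signs of the shape estimator are chosen so that it reduces to the familiar L-moment result $k = (1 - 3\tau_3)/(1 + \tau_3)$ when η = 0, and the scale is written in the algebraically simplified form $\alpha = D\,\Gamma(\eta+3+k) / [(\eta+1)!\,\Gamma(1+k)]$, with $D = (\eta+2)\beta_{\eta+1} - (\eta+1)\beta_{\eta}$, which follows from $\Gamma(\eta+3+k) - (\eta+2)\Gamma(\eta+2+k) = k\,\Gamma(\eta+2+k)$. The helper `gpa_pwm` and the parameter values are hypothetical and serve only as a round-trip check.

```python
from math import factorial, gamma


def gpa_pwm(r, xi, alpha, k):
    """Population beta_r of the GPA distribution (k > -1, k != 0)."""
    g = factorial(r + 1) * gamma(1 + k) / gamma(r + 2 + k)
    return (xi + alpha / k * (1 - g)) / (r + 1)


def gpa_fit(b_eta, b_eta1, b_eta2, eta=0):
    """Return (xi, alpha, k) from beta_eta, beta_{eta+1}, beta_{eta+2}."""
    d = (eta + 2) * b_eta1 - (eta + 1) * b_eta
    ratio = ((eta + 3) * b_eta2 - (eta + 1) * b_eta) / d
    k = (-5 - 2 * eta + (eta + 3) * ratio) / (1 - ratio)
    # Simplified scale expression (see lead-in); equivalent to the longer form.
    alpha = d * gamma(eta + 3 + k) / (factorial(eta + 1) * gamma(1 + k))
    g = factorial(eta + 1) * gamma(1 + k) / gamma(eta + 2 + k)
    xi = (eta + 1) * b_eta - alpha / k * (1 - g)  # undefined for k exactly 0
    return xi, alpha, k


# Round-trip check with hypothetical parameters.
true_params = (50.0, 30.0, 0.15)
for eta in (0, 1, 2):
    betas = [gpa_pwm(eta + j, *true_params) for j in range(3)]
    print(eta, gpa_fit(*betas, eta=eta))
```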

2.3. Goodness-of-Fit (GOF) Tests

Four statistical measures were used in this study to select the PDFs at each level of LH-moments (η = 0, 1, 2): the Anderson–Darling (AD) test, the Kolmogorov–Smirnov (KS) test, the Cramér–Von Mises (CVM) test, and the LH-moment ratio (LH-ratio) diagram. The PDF that produced the smallest values of the AD, KS, and CVM statistics for the APFDs was taken as the best fit and was used for the subsequent estimation of quantiles. These GOF tests were previously applied to peak flow data and are frequently used to choose the best-fitting PDFs in FFA [8,9,35,48].
The AD test is used to evaluate the fit of an observed distribution function (DF) to its theoretical DF. The AD test gives a higher weight to the PDF's tail, which is a necessary feature in modeling extreme events [35,48]. Heo et al. [49] describe the AD test statistic as follows:
$$A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\log F(x_i) - \frac{1}{n}\sum_{i=1}^{n}(2n-2i+1)\log\left[1 - F(x_i)\right] \tag{34}$$
Here, $A^2$ denotes the test statistic, $n$ represents the sample size, $x_i$ is the $i$th order statistic of the variable being studied, and $F(x_i)$ denotes the theoretical DF.
The KS test is based on the empirical DF and is used to assess whether a sample is drawn from a hypothesized continuous distribution. Assuming we have a random sample $(x_1, x_2, x_3, \dots, x_n)$ from some distribution, the empirical DF is as follows:
$$F_n(x) = \frac{1}{n}\left[\text{number of observations} \le x\right] \tag{35}$$
The maximum vertical distance between the theoretical DF and the empirical DF determines the KS statistic ($D$):
$$D = \max_{1 \le i \le n}\left[F(x_i) - \frac{i-1}{n},\; \frac{i}{n} - F(x_i)\right] \tag{36}$$
where $x_i$ is the $i$th order statistic, $n$ signifies the size of the random sample, and $F(x_i)$ indicates the theoretical DF.
The CVM test, an alternative to the KS test, is used to compare a theoretical DF with a given empirical DF. Let $x_{1:n} \le \dots \le x_{i:n} \le \dots \le x_{n:n}$ be the order statistics of a sample of size $n$; then the CVM test statistic suggested in [50] is:
$$\omega^2 = \frac{1}{12n} + \sum_{i=1}^{n}\left[\frac{2i-1}{2n} - F(x_{i:n})\right]^2 \tag{37}$$
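The three GOF statistics can be computed together from the fitted DF evaluated at the ordered sample. The sketch below uses the standard textbook forms of the AD, KS, and CVM statistics; the function name `gof_statistics` and the `cdf` callable are illustrative assumptions, and the example data are hypothetical.

```python
from math import log


def gof_statistics(data, cdf):
    """Return (A2, D, W2): Anderson-Darling, Kolmogorov-Smirnov, Cramer-von Mises.

    `cdf` is the fitted theoretical DF; it must return values strictly
    between 0 and 1 at every observation, otherwise the logarithms fail.
    """
    x = sorted(data)
    n = len(x)
    F = [cdf(v) for v in x]
    ad = -n - sum(
        (2 * i - 1) * (log(F[i - 1]) + log(1 - F[n - i]))
        for i in range(1, n + 1)
    ) / n
    ks = max(
        max(F[i - 1] - (i - 1) / n, i / n - F[i - 1])
        for i in range(1, n + 1)
    )
    cvm = 1 / (12 * n) + sum(
        ((2 * i - 1) / (2 * n) - F[i - 1]) ** 2 for i in range(1, n + 1)
    )
    return ad, ks, cvm


# Hypothetical example: four points tested against the uniform DF on (0, 1).
a2, d, w2 = gof_statistics([0.125, 0.375, 0.625, 0.875], lambda v: v)
print(a2, d, w2)
```

In a model-selection setting, this function would be called once per candidate PDF, and the candidate with the smallest statistics would be retained.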
Hosking [10] initially proposed the L-moment ratio diagram as the simplest way to determine the best-fitted distribution for the actual data. The L-moment ratio diagram is extended to each level of LH-moments (η = 0, 1, 2) [11]. The LH-ratio diagram is based on the relationships between LH-skewness and LH-kurtosis. Therefore, this allows better discrimination between the PDFs, and hence the identification of parent distribution can also be achieved.

2.4. Quantile Estimates for Different Return Periods of Floods Based on LH-Moments

Various scientific fields are interested in estimating quantiles corresponding to different return periods. The return period, also known as the recurrence interval, is defined as the average time between flood events of a given magnitude [4]. Sometimes, the hydrologist wants to know the chance of a flood reaching or exceeding a specific magnitude over a set time period. This is known as the probability of occurrence or the probability of exceedance. The probability that a flood of magnitude $q$ with return period $T$ is equaled or exceeded in any given year is computed as follows:
$$P(Q_T \ge q) = \frac{1}{T} \tag{38}$$
The cumulative probability of non-exceedance is then:
$$F(Q_T) = P(Q_T \le q) = 1 - P(Q_T > q) = 1 - \frac{1}{T} \tag{39}$$
Equation (39) is used to calculate the magnitude of a flood for given return periods. We can obtain quantile estimates for different return periods by substituting $F(Q_T) = 1 - 1/T$ in the quantile function of the GLO, GEV, and GPA distributions, as described below.
$$\text{GLO:}\quad \hat{x}_T = \hat{\xi} + \frac{\hat{\alpha}}{\hat{k}}\left[1 - (T-1)^{-\hat{k}}\right] \tag{40}$$
$$\text{GEV:}\quad \hat{x}_T = \hat{\xi} + \frac{\hat{\alpha}}{\hat{k}}\left[1 - \left\{-\log\left(1 - \frac{1}{T}\right)\right\}^{\hat{k}}\right] \tag{41}$$
$$\text{GPA:}\quad \hat{x}_T = \hat{\xi} + \frac{\hat{\alpha}}{\hat{k}}\left[1 - T^{-\hat{k}}\right] \tag{42}$$
Equations (40)–(42) above are used to calculate the quantile associated with the required return periods for the GLO, GEV, and GPA distributions at different levels of LH-moments (η = 0, 1, 2).
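The return-period quantile formulas for the three distributions are simple enough to sketch directly. In the example below, the function names and the parameter values are hypothetical and are not estimates for any station in the study; the formulas assume $k \ne 0$ and $T > 1$.

```python
from math import log


def glo_quantile(T, xi, alpha, k):
    """T-year quantile of the GLO distribution."""
    return xi + alpha / k * (1 - (T - 1) ** (-k))


def gev_quantile(T, xi, alpha, k):
    """T-year quantile of the GEV distribution."""
    return xi + alpha / k * (1 - (-log(1 - 1 / T)) ** k)


def gpa_quantile(T, xi, alpha, k):
    """T-year quantile of the GPA distribution."""
    return xi + alpha / k * (1 - T ** (-k))


# Hypothetical parameters, for illustration only.
xi, alpha, k = 100.0, 20.0, -0.1
for T in (2, 5, 10, 20, 50, 100, 500):
    print(T,
          round(glo_quantile(T, xi, alpha, k), 2),
          round(gev_quantile(T, xi, alpha, k), 2),
          round(gpa_quantile(T, xi, alpha, k), 2))
```

One useful check is that substituting the returned quantile into the corresponding DF recovers the non-exceedance probability $1 - 1/T$.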

2.5. Quantile Estimates for Different Return Periods of Floods Based on Nonparametric Kernel Function

The nonparametric kernel function is based on kernel smoothing of the empirical QF of the variable under study [32]. Let $(x_1, x_2, x_3, \dots, x_n)$ be the series of observed APFDs arranged in ascending order; then the kernel density estimator is expressed as [51]:
$$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \tag{43}$$
where $K(\cdot)$ is the chosen kernel function (Epanechnikov, Gaussian, Biweight, or Triweight), $n$ denotes the sample size, and $h$ represents the bandwidth or smoothing parameter that controls the variance of the nonparametric kernel function [25,34,52]. The relation between the DF and the density of the nonparametric kernel function is as follows:
$$\hat{F}_h(x) = \int_{-\infty}^{x} \hat{f}_h(t)\, dt = \frac{1}{n}\sum_{i=1}^{n} H\left(\frac{x - x_i}{h}\right) \tag{44}$$
where
$$H(x) = \int_{-\infty}^{x} K(t)\, dt \tag{45}$$
Equation (44) is widely used in hydrology to calculate the quantiles associated with various return periods. The quantiles obtained by using a nonparametric kernel distribution estimator are as follows (for details, see [53,54]):
$$\hat{x}_T = \hat{F}_h^{-1}\left(1 - \frac{1}{T}\right) \tag{46}$$
Estimating the nonparametric kernel density requires the selection of a kernel function, $K(\cdot)$, and the computation of a smoothing parameter, or bandwidth, $h$ (as shown in Equation (44)). The choice of $K(\cdot)$ is less critical, and different types of kernel functions that provide good results can be used. This study applied the Epanechnikov, Gaussian, Biweight, and Triweight kernel functions, which are commonly used in the literature [24,25,53,54,55,56,57]. The expressions of the standard kernel functions are given below:
$$\text{Epanechnikov:}\quad K(x) = \begin{cases} \dfrac{3}{4}\left(1 - x^2\right), & \text{if } |x| \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{47}$$
$$\text{Gaussian:}\quad K(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad -\infty < x < \infty \tag{48}$$
$$\text{Biweight:}\quad K(x) = \begin{cases} \dfrac{15}{16}\left(1 - x^2\right)^2, & \text{if } |x| \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{49}$$
$$\text{Triweight:}\quad K(x) = \begin{cases} \dfrac{35}{32}\left(1 - x^2\right)^3, & \text{if } |x| \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{50}$$
In Equation (44), the smoothing parameter h plays a crucial role in the kernel estimator. In practice, selecting an effective technique for computing h for an observed data sample is a more complicated task due to the influence of the bandwidth on the shape of the associated estimate. If the value of h is small, we will obtain an undersmoothed estimator with a large variation. On the other hand, if h is large, the resultant estimator will be extremely smooth and will be farther away from the function that we are attempting to estimate [32,53]. In the context of the nonparametric kernel function, least-squares cross-validation, plug-in, and cross-validation procedures were considered for bandwidth selection. Overall, all the methods performed well both theoretically and practically; however, the least-squares cross-validation method needs a very large sample size to achieve satisfactory findings [58,59]. The two different plug-in bandwidth approaches were investigated by [58,60]. The cross-validation method was developed by [61], which showed promising results. Further, it was discovered that both cross-validation and plug-in bandwidths produced similar results in terms of DF estimation, but cross-validation had a clear disadvantage in terms of computation time [53,54,59]. The plug-in method, which has been employed previously in similar studies, yielded excellent results [54,60,62]. We used the plug-in approach suggested by [60] to determine the bandwidth for nonparametric kernel estimation of the DF of APFDs in this framework. The interested reader can obtain more theoretical information and a comprehensive explanation of the Polansky and Baker plug-in approach for bandwidth selection of the nonparametric DF; for details, see [59,60].
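To make the kernel quantile estimator concrete, the following sketch evaluates the kernel DF for a Gaussian or Epanechnikov kernel and inverts it numerically by bisection. It is a simplified illustration under stated assumptions: the bandwidth `h` is supplied by the user rather than computed with the plug-in method of [60], the function names are hypothetical, and the sample values are invented for demonstration.

```python
from math import erf, sqrt


def gaussian_H(u):
    """Integrated Gaussian kernel (standard normal DF)."""
    return 0.5 * (1 + erf(u / sqrt(2)))


def epanechnikov_H(u):
    """Integrated Epanechnikov kernel: integral of 3/4 (1 - t^2) from -1 to u."""
    if u <= -1:
        return 0.0
    if u >= 1:
        return 1.0
    return 0.25 * (2 + 3 * u - u ** 3)


def kernel_cdf(x, sample, h, H=gaussian_H):
    """Kernel estimate of the DF at x with bandwidth h."""
    return sum(H((x - xi) / h) for xi in sample) / len(sample)


def kernel_quantile(T, sample, h, H=gaussian_H, tol=1e-9):
    """Invert the kernel DF at p = 1 - 1/T by bisection."""
    p = 1 - 1 / T
    lo, hi = min(sample) - 10 * h, max(sample) + 10 * h  # bracket the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kernel_cdf(mid, sample, h, H) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Invented annual peaks and an arbitrary bandwidth, for illustration only.
peaks = [10.0, 12.0, 15.0, 20.0, 30.0]
print(kernel_quantile(10, peaks, h=2.0))
print(kernel_quantile(10, peaks, h=2.0, H=epanechnikov_H))
```

With a compact kernel such as the Epanechnikov kernel, the estimated DF reaches 1 at the largest observation plus `h`, so quantiles for very large return periods cannot extend far beyond the observed maximum; the Gaussian kernel has unbounded support and does not share this hard limit.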

3. Study Area and Data

Climate change has significantly impacted the entire world. Its impacts may vary from increases in the magnitude and frequency of natural disasters, such as floods and droughts, to the extinction of species and the spread of vector-borne diseases. However, the effects of climate change are not equally observed across the globe. As a matter of fact, developing countries are much more vulnerable to climate change-related hazards, mainly due to their lack of proper infrastructure. For instance, Pakistan has suffered significant economic losses in the past 6 to 7 years as a result of the recent increases in the melting rate of South Asia's glaciers, which lead to more frequent and severe floods.
Pakistan has a population of around 208 million people and covers an area of approximately 796,000 km². The country is bounded by the Himalayan Mountains to the north, India to the east, Iran to the west, and the Indian Ocean to the south (Figure 1). Its altitudes vary from 8500 m in the northern regions to 0 m in the coastal regions, thus having a strong orographic influence over monsoon winds. Pakistan's climate is usually considered hot and dry, being classified as semi-arid in the south and dry cold in the north by means of Köppen climate classes [63].
The Indus River and its tributaries, i.e., Sutlej, Beas, Ravi, Chenab, Jhelum, Swat, and Kabul, are vital to the economy of Pakistan, as they are the main source of water for irrigation, industry, and urban water supply. However, this river network is also responsible for economic losses through large flood events, most frequent in the Punjab and Sindh regions, affecting not only fertile agricultural lands, but also large urban centers near the river network. This situation is aggravated when considering future climate change scenarios, as higher variabilities in precipitation and glacier melting are projected, as well as rises in sea level and storm surges, leading to stress of current drainage network systems, especially during the monsoon season.
In this study, the annual maximum peak flow data of 18 sites in Pakistan, located on five rivers, namely, Indus, Jhelum, Chenab, Ravi, and Sutlej, were used. The geographical location of these river sites is given in Figure 1. The data for these sites were collected from the hydrology department of the Water and Power Development Authority (WAPDA) and the Federal Flood Commission. Most of the annual maximum peak flows at the sites were recorded at the peak of the monsoon season (from July to September). The annual peak flow discharge of the 18 stations was measured in cusec. Summary statistics for the 18 stations are given in Table 1. The highest mean peak flow discharge was recorded at the Guddu station (609,909.423), and the lowest mean peak flow discharge was observed at the Islam station (49,089.45). The highest standard deviations were observed at the Kotri, Sukkur, Guddu, Qadirabad, Khanki, and Trimmu stations. The skewness ranged from 0.552 to 4.240, with Guddu having the smallest skewness and Mangla the largest. Similarly, the kurtosis varied from −0.645 to 22.770, with Mangla, Kotri, Rasul, and Tarbela having the highest kurtosis, while the lowest kurtosis was observed at the Sukkur, Guddu, Qadirabad, and Marala stations.

4. Results and Discussion

The AD test, KS test, and CVM test were applied to each station in order to choose the best-fit PDF among the GEV, GLO, and GPA. The selection of the best fitted PDF for each station was based on the smallest GOF tests among the three PDFs. The best fitted PDF results for each station according to GOF tests at a 5% significance level are reported in Table 2.
In the case of η = 0, it is clear from Table 2 that the GPA and GEV distributions were each best fitted for seven stations, while GLO was best fitted for four stations according to the AD test. Similarly, the KS test results matched the results of the AD test well, except for the Taunsa, Guddu, Khanki, and Panjnad stations. According to the KS test results for η = 0, the most appropriate PDFs were GPA and GEV. According to the CVM test results for η = 0, the GEV distribution was suitable for most stations, followed by the GPA distribution, which was selected for six stations. Overall, in the case of η = 0, the results of the AD test, KS test, and CVM test indicated that the GEV and GPA distributions were the most adequate for most stations.
Moving forward to η = 1 in Table 2, the AD test selected the GEV distribution for the highest number of stations (10 stations out of 18) followed by GPA distribution, which was selected for five stations. However, the KS test and CVM test selected the GEV and GPA distributions for the same number of stations, 8 and 7, respectively. Considering AD test, KS test, and CVM test results for η = 1, the GEV distribution was selected for eight stations, the GPA distribution for seven stations, and the GLO distribution for only three stations. Finally, for η = 2, it is observed from Table 2 that the AD test and CVM test selected GPA distribution for eight stations, GEV distribution for seven stations, and GLO distribution for the remaining four stations. On the other hand, GPA and GEV distributions each yielded the best-fit for seven stations based on the KS test.
It is also worthwhile to mention that the GOF tests produced different results for the Mangla, Rasul, and Panjnad stations as η was increased to 1 and 2. However, in general, increasing η to 1 or 2 had no effect on the results for most stations.
The LH-ratio diagram is a useful tool that simplifies analysis by showing where the various PDFs plot. Furthermore, it shows that PDFs can take several different skewness and kurtosis values, which makes the diagram valuable for analyzing the shape of the distribution. The LH-ratio diagrams for η = 0, η = 1, and η = 2 of the 18 stations are plotted in Figure 2. It is observed from Figure 2, for η = 0, that most of the scatter points lay between the GEV and GPA distribution curves, whereas a few scatter points were close to the GLO distribution curve. Therefore, according to the L-ratio diagram (η = 0), GEV and GPA were the most suitable PDFs for the annual peak flow series of the 18 stations. In Figure 2, for η = 1 and η = 2, most of the scatter points were close to the GPA distribution curve, followed by GEV. We also note from Figure 2 that as η increased to 1 and 2, the peak flow series tended to follow the GPA and GEV distributions. Overall, the findings obtained from the GOF tests were generally in good agreement with the LH-ratio diagram for most stations.
Furthermore, the relation between the return period and the APFD was established for the GEV, GLO, and GPA distributions. Figure 3 shows the curves for the Balloki, Taunsa, and Islam stations, highlighting how well the APFD series at lower and upper return periods were estimated by LH-moments (η = 0, 1, 2). It is seen from Figure 3 that the GEV, GPA, and GLO distributions fitted the observed APFD series well at lower and higher return periods. Figure 3 indicates that as the level of LH-moments increased (η = 1, 2), the GEV, GLO, and GPA distributions performed well in reflecting the extreme tail at higher return periods. In Figure 3, it is noted that most of the observations fell within 2–50 years (0.02 ≤ p ≤ 0.5), implying that hazardous flood events with low probability or large return periods (100 and 500 years) have rarely occurred at these stations.
In the planning and design of hydrological systems, it is critical to determine the return period of a given flood event. We therefore calculated quantiles for different return periods using LH-moments (η = 0, 1, 2) and nonparametric kernel functions; results for the Tarbela, Kalabagh, Qadirabad, and Trimmu stations are reported in Figure 4. The quantile estimates in Figure 4 can be interpreted as follows: for the Tarbela station at a return period of 500 years, the quantile produced by the GEV distribution (η = 0), 847,553.3, is the threshold value of flow that may be exceeded once every 500 years on average. In other words, there is only a 0.2% chance (1/500) that the peak flow in any given year will exceed the threshold value (847,553.3) and consequently that a flood of this magnitude will occur. Conversely, there is a 99.8% chance that the peak flow in any given year will be less than the threshold value (847,553.3).
We also investigated the impact of LH-moment choice (η = 0, 1, 2) and nonparametric kernel functions on estimating quantiles associated with predefined return periods via the relative absolute error (RAE). The RAE is an assessment of the difference between the actual flood estimate and the flood estimate given by the best-fit PDF. The RAE was calculated using the following equation, described in [35,48,64]:
$$RAE = \left|\frac{X - Y}{Y}\right| \tag{51}$$
where $X$ is the actual peak flow, and $Y$ denotes the design quantile estimate obtained through the LH-moments and the nonparametric kernel function.
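As a small worked illustration of the RAE, the sketch below computes it for a pair of values and for a set of return periods. The function name and all numbers are hypothetical; they are not taken from Table 3 or Table 4.

```python
def relative_absolute_error(observed, estimated):
    """RAE = |X - Y| / |Y|, with X the observed flow and Y the quantile estimate."""
    return abs((observed - estimated) / estimated)


# Hypothetical observed flows and design quantiles keyed by return period (years).
observed = {2: 110.0, 10: 190.0, 100: 330.0}
estimated = {2: 100.0, 10: 200.0, 100: 300.0}
rae = {T: relative_absolute_error(observed[T], estimated[T]) for T in observed}
print(rae)
```

Because the estimate appears in the denominator, the RAE is undefined when the design quantile is zero, which is not a practical concern for peak flow series.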
Table 3 and Table 4 summarize the RAE associated with each station using LH-moments and a nonparametric kernel function, emphasizing the significance of evaluating techniques for diverse return periods. It is crucial to investigate the discrepancy between actual flood estimates and quantiles produced via PDF and the nonparametric kernel function. Although all these PDFs passed GOF testing, there were still considerable discrepancies in quantile estimations. These discrepancies are significant to policymakers, planners, and decision-makers.
As shown in Table 3, the GLO, GEV, and GPA distributions produced very small errors for all stations. For Tarbela, Guddu, Kotri, Khanki, Balloki, and Sidhani stations, the GEV distribution based on the L2-moment (η = 2) produced a small error for all return periods. However, for the return period of 5 years, the GEV distribution gave the same error for the L1-moment (η = 1) and L2-moment (η = 2) at both Kotri and Guddu stations.
According to Table 3, the RAE values for the GLO distribution at the return periods of 2, 5, and 10 years were fairly similar across the L-moment (η = 0), L1-moment (η = 1), and L2-moment (η = 2) for Kalabagh, Chashma, and Taunsa stations, whereas for the remaining return periods, the L2-moment (η = 2) had a clear edge over the L-moment (η = 0) and L1-moment (η = 1). Table 3 also shows that the GPA distribution using the L2-moment (η = 2) produced a minimal error for all return periods for Sukkur, Marala, Qadirabad, Sulemanki, and Islam stations.
Table 3 indicates that for Mangla station, the RAE was estimated using the GLO distribution for the L-moment (η = 0) and L2-moment (η = 2) and the GEV distribution for the L1-moment (η = 1). The two distributions produced the same error for the return periods of 2, 5, and 10 years when using the L1-moment (η = 1) and L2-moment (η = 2), but the L2-moment (η = 2) had the edge for the return periods of 20, 50, 100, and 500 years. Similarly, the findings for Rasul and Panjnad stations indicated that the GPA distribution using the L2-moment (η = 2) had lower values of RAE at all return periods than did the GEV distribution when using the L-moment (η = 0) and L1-moment (η = 1).
Moreover, as can be seen in Table 3, the RAE values for all stations using the L-moment (η = 0), L1-moment (η = 1), and L2-moment (η = 2) at the return periods of 2, 5, 10, and 20 years were close, but the L1-moment (η = 1) and L2-moment (η = 2) had a slight advantage over the L-moment (η = 0). For high return periods (50, 100, and 500 years), however, the L2-moment (η = 2) performed better than the L1-moment (η = 1) and L-moment (η = 0). Table 3 also shows that the error became smaller as the level of the LH-moments (η = 0, 1, 2) increased, especially for high return periods. In other words, the L2-moment (η = 2) yielded the lowest error compared to the L1-moment (η = 1) and L-moment (η = 0). Additionally, there were a few overlaps among these PDFs for certain return periods. For example, for the Mangla station at return periods of 2, 5, and 10 years, the GEV and GLO distributions yielded the same amount of error (0.016, 0.018, and 0.026). This implies that the PDFs performed equally well at those return periods.
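The trend described above can be illustrated with the Tarbela (GEV) rows of Table 3. The sketch below only re-reads the tabulated RAE values; the helper names are illustrative and are not part of the study.

```python
# RAE values for Tarbela station (GEV distribution), as listed in Table 3.
RETURN_PERIODS = [2, 5, 10, 20, 50, 100, 500]
RAE_TARBELA = {
    0: [0.008, 0.009, 0.015, 0.028, 0.053, 0.077, 0.156],
    1: [0.004, 0.005, 0.008, 0.011, 0.017, 0.022, 0.035],
    2: [0.003, 0.004, 0.006, 0.008, 0.011, 0.014, 0.020],
}


def best_eta(rae_by_eta, index):
    """Return the eta whose RAE is smallest at the given return-period index."""
    return min(rae_by_eta, key=lambda eta: rae_by_eta[eta][index])


def relative_reduction(rae_by_eta, index, low_eta=0, high_eta=2):
    """Fractional drop in RAE when moving from low_eta to high_eta."""
    base = rae_by_eta[low_eta][index]
    return (base - rae_by_eta[high_eta][index]) / base


for i, period in enumerate(RETURN_PERIODS):
    print(
        f"T = {period:>3} yr: best eta = {best_eta(RAE_TARBELA, i)}, "
        f"RAE reduction from eta=0 to eta=2: {relative_reduction(RAE_TARBELA, i):.0%}"
    )
```

For this station, η = 2 gives the smallest RAE at every return period, and the relative reduction is largest at 500 years, which matches the pattern reported for the high return periods.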
Table 4 compares quantile estimates in terms of RAE for Epanechnikov, Gaussian, Biweight, and Triweight kernel functions for all stations. It is evident from Table 4 that the Gaussian kernel function had the lowest RAE throughout all return periods among the Epanechnikov, Biweight, and Triweight kernel functions for Tarbela, Kalabagh, Taunsa, Guddu, Sukkur, Kotri, Mangla, Rasul, Marala, Khanki, Qadirabad, Trimmu, Sidhani, Sulemanki, and Islam stations.
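For reference, the four kernels compared in Table 4 can be written down in their standard textbook forms. The sketch below is not taken from the study; it defines each kernel on the standardized variable u and checks numerically that each integrates to one. The `integrate` helper is an illustrative midpoint rule.

```python
import math


def epanechnikov(u: float) -> float:
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def gaussian(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def biweight(u: float) -> float:
    return (15 / 16) * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0


def triweight(u: float) -> float:
    return (35 / 32) * (1 - u * u) ** 3 if abs(u) <= 1 else 0.0


KERNELS = {
    "Epanechnikov": epanechnikov,
    "Gaussian": gaussian,
    "Biweight": biweight,
    "Triweight": triweight,
}


def integrate(kernel, lo: float = -8.0, hi: float = 8.0, steps: int = 16000) -> float:
    """Midpoint-rule integral of the kernel over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(kernel(lo + (i + 0.5) * width) for i in range(steps)) * width


for name, kernel in KERNELS.items():
    print(f"{name:<12} integral = {integrate(kernel):.4f}")
```

The Gaussian kernel is the only one of the four with unbounded support, so every observation contributes some weight at every evaluation point. The other three kernels give zero weight beyond one bandwidth.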
The results in Table 4 for Chashma station indicate that the Triweight kernel function had the lowest RAE for the return period of 2 years, followed by the Biweight and Epanechnikov kernel functions; however, the Gaussian kernel function performed better for the return periods of 5, 10, 20, 50, 100, and 500 years. Similarly, the findings for the Balloki station revealed that the Biweight kernel function had an edge at the return periods of 2 and 5 years. We also observed that the Epanechnikov kernel function had a lower RAE than the Gaussian, Biweight, and Triweight kernel functions for the Panjnad station at a return period of 2 years, whereas the Epanechnikov, Biweight, and Triweight kernel functions performed equally well for the return period of 5 years. In accordance with the findings of the previous study [55], the results of our investigation demonstrated that among nonparametric kernel functions, the Gaussian kernel function performed best for the observed flood.
Finally, to compare the performance of the LH-moments (η = 0, 1, 2) with that of the nonparametric kernel functions, the RAE of the quantile estimates was calculated for both approaches (Table 3 and Table 4). The findings reveal that the LH-moments (η = 0, 1, 2) led to more accurate estimates than the nonparametric kernel functions for most of the stations. However, the nonparametric kernel functions performed better than the LH-moments (η = 0, 1, 2) for Kalabagh, Chashma, and Guddu stations at return periods of 2, 5, and 10 years. In addition, among the nonparametric kernel functions, the Gaussian kernel function provided estimates very close to those of the L-moments (η = 0) for smaller return periods. Similar findings were reported by Adamowski et al. [52], who evaluated L-moments and nonparametric methods for annual maxima and partial duration flood series. For the L2-moments (η = 2), however, we found significant differences from the nonparametric kernel functions, specifically in the higher quantile estimates. This indicates that, for the current dataset, the L2-moments (η = 2) estimate the extreme quantiles more accurately than any other approach considered in this work.
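To make the idea of a kernel-based quantile concrete, the sketch below estimates the T-year quantile from a Gaussian kernel distribution function by bisection. It is an illustration under stated assumptions, not the procedure used in the study: the peak-flow values are hypothetical, and the bandwidth uses the normal-reference rule of thumb h = 1.06 s n^(−1/5), which is an assumption made here for simplicity.

```python
import math
import statistics


def gaussian_kernel_cdf(x: float, sample: list, bandwidth: float) -> float:
    """Kernel distribution estimate: mean of Gaussian CDFs centred on each observation."""
    scale = bandwidth * math.sqrt(2)
    return sum(0.5 * (1 + math.erf((x - xi) / scale)) for xi in sample) / len(sample)


def kernel_quantile(sample: list, return_period: float, bandwidth: float,
                    tol: float = 1e-3) -> float:
    """Solve F_hat(x) = 1 - 1/T for x by bisection."""
    target = 1 - 1 / return_period
    lo = min(sample) - 10 * bandwidth   # F_hat is essentially 0 here
    hi = max(sample) + 10 * bandwidth   # F_hat is essentially 1 here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gaussian_kernel_cdf(mid, sample, bandwidth) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical annual peak flows, for illustration only.
peaks = [210_000, 255_000, 270_000, 300_000, 320_000,
         345_000, 360_000, 410_000, 480_000, 650_000]

# Normal-reference rule of thumb (an assumption, not the study's bandwidth choice).
h = 1.06 * statistics.stdev(peaks) * len(peaks) ** (-1 / 5)

for T in (2, 5, 10):
    print(f"T = {T:>2} yr: quantile = {kernel_quantile(peaks, T, h):,.0f}")
```

Because the estimate is built from the observations themselves, its upper tail extends only a few bandwidths beyond the largest recorded peak. This is one reason kernel estimates can behave differently from fitted distributions at large return periods.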

5. Conclusions

Estimating quantiles is a widespread practice in hydrology, and it is often used in the planning, design, and operation of hydraulic systems. In this study, we employed LH-moments (η = 0, 1, 2) and nonparametric kernel functions to estimate quantiles of the annual peak flow series at 18 stations in Punjab, Pakistan. Based on the findings of this study, the following conclusions may be drawn:
Main findings of the paper: The AD, KS, and CVM tests and the LH-moment ratio diagram indicate that the best-fit PDFs for the peak-flow data are GPA, followed by the GEV and GLO distributions. Raising η to 1 or 2 in the LH-moments changed the GOF results for Mangla, Rasul, and Panjnad stations but did not affect the results for the remaining stations. The quantile estimates obtained using the nonparametric kernel functions are greater in magnitude than those obtained through LH-moments (η = 0, 1, 2). Overall, the LH-moments (η = 0, 1, 2) yield more accurate quantile estimates in terms of RAE for most of the stations; however, for Kalabagh, Chashma, and Guddu stations at return periods of 2, 5, and 10 years, the nonparametric kernel functions provide a smaller RAE than the LH-moments (η = 0, 1, 2). The L-moment (η = 0), L1-moment (η = 1), and L2-moment (η = 2) give relatively close quantile errors for all stations at the return periods of 2, 5, 10, and 20 years, whereas the L2-moment (η = 2) yields the lowest error for the higher return periods of 50, 100, and 500 years among the L-moment (η = 0), L1-moment (η = 1), and nonparametric kernel functions. We also found that, among the nonparametric kernel functions, the Gaussian kernel function provides estimates very close to those of the LH-moments (η = 0, 1, 2) for small return periods.
Limitations of this work and future research: Further research is needed on nonparametric kernel functions, specifically for large return periods, to improve the results in terms of RAE. This is the first application of nonparametric kernel functions in flood frequency analysis of 18 stations in Punjab, Pakistan. This research may be expanded by integrating all of Pakistan’s river gauging stations in order to determine the best estimation methods for the whole country.
Broader impacts: The findings of this research will inform recommendations for future development aimed at preserving current infrastructure and minimizing economic damage due to floods. The findings will also aid in designing and implementing flood mitigation measures, such as more effective stormwater management.

Author Contributions

Conceptualization, M.F. and J.R.; Data curation, M.F.; Formal analysis, M.F.; Funding acquisition, T.Y.; Investigation, M.F., J.R. and L.C.; Methodology, M.F. and T.Y.; Software, M.F. and F.C.; Supervision, T.Y.; Validation, L.C. and T.Y.; Visualization, M.F. and F.C.; Writing – original draft, M.F. and F.C.; Writing – review & editing, F.C., J.R., L.C. and T.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors are grateful to the Water and Power Development Authority, Pakistan, and the Federal Flood Commission, Pakistan, for providing the required data.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Stedinger, J.R.; Griffis, V.W. Flood frequency analysis in the United States: Time to update. J. Hydrol. Eng. 2008, 13, 199–204.
  2. De Michele, C.; Salvadori, G.; Canossi, M.; Petaccia, A.; Rosso, R. Bivariate statistical approach to check adequacy of dam spillway. J. Hydrol. Eng. 2005, 10, 50–57.
  3. Haddad, K.; Rahman, A. Regional flood frequency analysis in eastern Australia: Bayesian GLS regression-based methods within fixed region and ROI framework – Quantile regression vs. parameter regression technique. J. Hydrol. 2012, 430, 142–161.
  4. Cunnane, C. Statistical distributions for flood frequency analysis. In Operational Hydrology Report; WMO: Geneva, Switzerland, 1989.
  5. Bobée, B.; Cavadias, G.; Ashkar, F.; Bernier, J.; Rasmussen, P. Towards a systematic approach to comparing distributions used in flood frequency analysis. J. Hydrol. 1993, 142, 121–136.
  6. Haddad, K.; Rahman, A. Selection of the best fit flood frequency distribution and parameter estimation procedure: A case study for Tasmania in Australia. Stoch. Hydrol. Hydraul. 2010, 25, 415–428.
  7. Hamed, K.; Rao, A.R. Flood Frequency Analysis; CRC Press: Boca Raton, FL, USA, 2019.
  8. Ahmad, I.; Fawad, M.; Mahmood, I. At-site flood frequency analysis of annual maximum stream flows in Pakistan using robust estimation methods. Pol. J. Environ. Stud. 2015, 24, 2345–2353.
  9. Ahmad, I.; Fawad, M.; Akbar, M.; Abbass, A.; Zafar, H. Regional frequency analysis of annual peak flows in Pakistan using linear combination of order statistics. Pol. J. Environ. Stud. 2016, 25, 2255–2264.
  10. Hosking, J.R.M. L-moments: Analysis and estimation of distributions using linear combinations of order statistics. J. R. Stat. Soc. Ser. B Stat. Methodol. 1990, 52, 105–124.
  11. Wang, Q.J. LH moments for statistical analysis of extreme events. Water Resour. Res. 1997, 33, 2841–2848.
  12. Hussain, Z.; Pasha, G.R. Regional flood frequency analysis of the seven sites of Punjab, Pakistan, using L-moments. Water Resour. Manag. 2008, 23, 1917–1933.
  13. Hussain, Z. Application of the regional flood frequency analysis to the upper and lower basins of the Indus River, Pakistan. Water Resour. Manag. 2011, 25, 2797–2822.
  14. Afreen, S.; Muhammad, F. Flood frequency analysis of various dams and barrages in Pakistan. Irrig. Drain. 2011, 61, 116–128.
  15. Lee, S.H.; Maeng, S.J. Comparison and analysis of design floods by the change in the order of LH-moment methods. Irrig. Drain. J. Int. Comm. Irrig. Drain. 2003, 52, 231–245.
  16. Hewa, G.A.; Wang, Q.J.; McMahon, T.; Nathan, R.J.; Peel, M.C. Generalized extreme value distribution fitted by LH moments for low-flow frequency analysis. Water Resour. Res. 2007, 43, W06301.
  17. Meshgi, A.; Khalili, D. Comprehensive evaluation of regional flood frequency analysis by L- and LH-moments. I. A revisit to regional homogeneity. Stoch. Environ. Res. Risk Assess. 2009, 23, 119–135.
  18. Meshgi, A.; Khalili, D. Comprehensive evaluation of regional flood frequency analysis by L- and LH-moments. II. Development of LH-moments parameters for the generalized Pareto and generalized logistic distributions. Stoch. Hydrol. Hydraul. 2008, 23, 137–152.
  19. Bhuyan, A.; Borah, M.; Kumar, R. Regional flood frequency analysis of North-Bank of the River Brahmaputra by using LH-moments. Water Resour. Manag. 2009, 24, 1779–1790.
  20. Gheidari, M.H.N. Comparisons of the L- and LH-moments in the selection of the best distribution for regional flood frequency analysis in Lake Urmia Basin. Civ. Eng. Environ. Syst. 2013, 30, 72–84.
  21. Ahmad, I.; Abbass, A.; Saghir, A.; Fawad, M. Finding probability distributions for annual daily maximum rainfall in Pakistan using linear moments and variants. Pol. J. Environ. Stud. 2016, 25, 925–937.
  22. Shabri, A. Comparisons of the LH moments and the L moments. Matematika 2002, 18, 33–43.
  23. Deka, S.C.; Borah, M.; Kakaty, S.C. Statistical analysis of annual maximum rainfall in North-East India: An application of LH-moments. Theor. Appl. Climatol. 2010, 104, 111–122.
  24. Adamowski, K. Nonparametric kernel estimation of flood frequencies. Water Resour. Res. 1985, 21, 1585–1590.
  25. Adamowski, K. A Monte Carlo comparison of parametric and nonparametric estimation of flood frequencies. J. Hydrol. 1989, 108, 295–308.
  26. Adamowski, K.; Feluch, W. Nonparametric flood-frequency analysis with historical information. J. Hydraul. Eng. 1990, 116, 1035–1047.
  27. Schuster, E.; Yakowitz, S. Parametric/nonparametric mixture density estimation with application to flood-frequency analysis. J. Am. Water Resour. Assoc. 1985, 21, 797–804.
  28. Adamowski, K.; Labatiuk, C. Estimation of flood frequencies by a nonparametric density procedure. In Hydrologic Frequency Modeling; Springer: Berlin/Heidelberg, Germany, 1987; pp. 97–106.
  29. Bardsley, W. Using historical data in nonparametric flood estimation. J. Hydrol. 1989, 108, 249–255.
  30. Guo, S.L. Nonparametric variable kernel estimation with historical floods and paleoflood information. Water Resour. Res. 1991, 27, 91–98.
  31. Moon, Y.-I.; Lall, U.; Bosworth, K. A comparison of tail probability estimators for flood frequency analysis. J. Hydrol. 1993, 151, 343–363.
  32. Moon, Y.-I.; Lall, U. Kernel quantile function estimator for flood frequency analysis. Water Resour. Res. 1994, 30, 3095–3103.
  33. Silverman, B.W. Density Estimation for Statistics and Data Analysis; CRC Press: Boca Raton, FL, USA, 1986.
  34. Lall, U.; Moon, Y.-I.; Bosworth, K. Kernel flood frequency estimators: Bandwidth selection and kernel choice. Water Resour. Res. 1993, 29, 1003–1015.
  35. Cassalho, F.; Beskow, S.; de Mello, C.; de Moura, M.M.; Kerstner, L.; Ávila, L.F. At-site flood frequency analysis coupled with multiparameter probability distributions. Water Resour. Manag. 2017, 32, 285–300.
  36. Lall, U. Recent advances in nonparametric function estimation: Hydrologic applications. Rev. Geophys. 1995, 33, 1093–1102.
  37. Ahmad, I.; Almanjahie, I.M.; Hameedullah; Chikr-Elmezouar, Z.; Laksaci, A. Artificial neural network modeling for annual peak flows: A case study. Appl. Ecol. Environ. Res. 2019, 17, 6917–6935.
  38. Khan, M.; Hussain, Z.; Ahmad, I. A comparison of quadratic regression and artificial neural networks for the estimation of quantiles at ungauged sites in regional frequency analysis. Appl. Ecol. Environ. Res. 2019, 17, 6937–6959.
  39. Khan, M.; Hussain, Z.; Ahmad, I. Regional flood frequency analysis, using L-moments, artificial neural networks and OLS regression, of various sites of Khyber-Pakhtunkhwa, Pakistan. Appl. Ecol. Environ. Res. 2021, 19, 471–489.
  40. Zelterman, D. Parameter estimation in the generalized logistic distribution. Comput. Stat. Data Anal. 1987, 5, 177–184.
  41. Zelterman, D. Order statistics of the generalized logistic distribution. Comput. Stat. Data Anal. 1988, 7, 69–77.
  42. Reed, D. Flood Estimation Handbook: Overview; Institute of Hydrology Wallingford: Wallingford, UK, 1999.
  43. Ashkar, F.; Mahdi, S. Fitting the log-logistic distribution by generalized moments. J. Hydrol. 2006, 328, 694–703.
  44. Jenkinson, A.F. The frequency distribution of the annual maximum (or minimum) values of meteorological elements. Q. J. R. Meteorol. Soc. 1955, 81, 158–171.
  45. Pickands, J., III. Statistical inference using extreme order statistics. Ann. Stat. 1975, 3, 119–131.
  46. Hosking, J.; Wallis, J. An index flood procedure for regional rainfall frequency analysis. EOS Trans. Am. Geophys. Union 1987, 68, 312.
  47. Singh, V.P.; Guo, H. Parameter estimation for 3-parameter generalized pareto distribution by the principle of maximum entropy (POME). Hydrol. Sci. J. 1995, 40, 165–181.
  48. Beskow, S.; Caldeira, T.L.; de Mello, C.R.; Faria, L.C.; Guedes, H.A.S. Multiparameter probability distributions for heavy rainfall modeling in extreme southern Brazil. J. Hydrol. Reg. Stud. 2015, 4, 123–133.
  49. Heo, J.-H.; Shin, H.; Nam, W.; Om, J.; Jeong, C. Approximation of modified Anderson–Darling test statistics for extreme value distributions with unknown shape parameter. J. Hydrol. 2013, 499, 41–49.
  50. Csörgő, S.; Faraway, J.J. The exact and asymptotic distributions of Cramér-von Mises statistics. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 221–234.
  51. Parzen, E. On estimation of a probability density function and mode. Ann. Math. Stat. 1962, 33, 1065–1076.
  52. Adamowski, K.; Liang, G.-C.; Patry, G.G. Annual maxima and partial duration flood series analysis by parametric and non-parametric methods. Hydrol. Process. 1998, 12, 1685–1699.
  53. Quintela-Del-Rio, A. On bandwidth selection for nonparametric estimation in flood frequency analysis. Hydrol. Process. 2011, 25, 671–678.
  54. Francisco-Fernández, M.; Quintela-Del-Río, A. Comparing simultaneous and pointwise confidence intervals for hydrological processes. PLoS ONE 2016, 11, e0147505.
  55. Vittal, H.; Singh, J.; Kumar, P.; Karmakar, S. A framework for multivariate data-based at-site flood frequency analysis: Essentiality of the conjugal application of parametric and nonparametric approaches. J. Hydrol. 2015, 525, 658–675.
  56. Rajagopalan, B.; Lall, U.; Tarboton, D.G. Evaluation of kernel density estimation methods for daily precipitation resampling. Stoch. Hydrol. Hydraul. 1997, 11, 523–547.
  57. Adamowski, K. Regional analysis of annual maximum and partial duration flood data by nonparametric and L-moment methods. J. Hydrol. 2000, 229, 219–231.
  58. Altman, N.; Léger, C. Bandwidth selection for kernel distribution function estimation. J. Stat. Plan. Inference 1995, 46, 195–214.
  59. Quintela-del-Río, A.; Estévez-Pérez, G. Nonparametric kernel distribution function estimation with kerdiest: An R package for bandwidth choice and applications. J. Stat. Softw. 2012, 50, 1–21.
  60. Polansky, A.M.; Baker, E.R. Multistage plug-in bandwidth selection for kernel distribution function estimates. J. Stat. Comput. Simul. 2000, 65, 63–80.
  61. Bowman, A.; Hall, P.; Prvan, T. Bandwidth selection for the smoothing of distribution functions. Biometrika 1998, 85, 799–808.
  62. Quintela-del-Río, A.; Francisco-Fernández, M. Nonparametric functional data estimation applied to ozone data: Prediction and extreme value analysis. Chemosphere 2011, 82, 800–808.
  63. Sarfaraz, S.; Arsalan, M.H.; Fatima, H. Regionalizing the climate of Pakistan using Köppen classification system. Pak. Geogr. Rev. 2014, 69, 111–132.
  64. Fawad, M.; Yan, T.; Chen, L.; Huang, K.; Singh, V.P. Multiparameter probability distributions for at-site frequency analysis of annual maximum wind speed with L-moments for parameter estimation. Energy 2019, 181, 724–737.
Figure 1. Geographical locations of the eighteen sites of Punjab, Pakistan, used in this study.
Figure 2. LH-ratio diagram for 18 stations.
Figure 3. Relation between return period and annual peak flow based on the LH-moments.
Figure 4. Quantiles for best-fit PDFs and nonparametric kernel functions for four randomly selected stations.
Table 1. Summary statistics of 18 stations.
| Name of Station | River | Latitude (North) | Longitude (East) | Period (Years) | Mean | Standard Deviation | Skewness | Kurtosis | Minimum Peak Flow | Maximum Peak Flow |
|---|---|---|---|---|---|---|---|---|---|---|
| Tarbela | Indus | 33.99 | 72.61 | 1960–2013 | 386,962.963 | 87,785.537 | 2.626 | 11.806 | 273,000 | 835,000 |
| Kalabagh | Indus | 32.95 | 71.50 | 1968–2013 | 464,719.956 | 151,843.363 | 1.186 | 2.102 | 237,297 | 936,453 |
| Chashma | Indus | 32.43 | 71.38 | 1971–2013 | 475,333.046 | 149,635.274 | 1.22 | 3.727 | 214,045 | 1,038,873 |
| Taunsa | Indus | 30.50 | 70.80 | 1958–2013 | 452,791.554 | 140,793.102 | 0.804 | 2.144 | 182,372 | 959,991 |
| Guddu | Indus | 28.30 | 69.50 | 1962–2013 | 609,909.423 | 284,534.413 | 0.552 | −0.557 | 170,831 | 1,176,150 |
| Sukkur | Indus | 27.72 | 68.79 | 1982–2013 | 546,609.594 | 309,470.519 | 0.629 | −0.645 | 126,130 | 1,172,000 |
| Kotri | Indus | 25.22 | 68.22 | 1970–2013 | 395,262.068 | 379,599.333 | 3.705 | 18.290 | 47,100 | 2,409,000 |
| Mangla | Jhelum | 33.15 | 73.65 | 1960–2013 | 132,481.778 | 136,385.297 | 4.240 | 22.770 | 20,460 | 932,700 |
| Rasul | Jhelum | 32.68 | 73.50 | 1970–2013 | 134,418.386 | 161,219.596 | 3.582 | 15.787 | 19,702 | 952,170 |
| Marala | Chenab | 32.68 | 74.43 | 1960–2013 | 308,572.407 | 196,419.272 | 1.097 | 0.227 | 93,150 | 792,765 |
| Khanki | Chenab | 32.40 | 73.92 | 1925–2013 | 351,963.191 | 242,710.633 | 1.494 | 1.391 | 97,058 | 1,086,460 |
| Qadirabad | Chenab | 32.33 | 73.73 | 1970–2013 | 356,547.704 | 247,771.998 | 1.030 | 0.106 | 76,336 | 948,520 |
| Trimmu | Chenab | 31.14 | 72.15 | 1968–2013 | 261,376.217 | 194,828.961 | 1.099 | 0.169 | 342,756 | 706,433 |
| Panjnad | Chenab | 29.33 | 71.00 | 1960–2013 | 260,134.722 | 193,661.339 | 0.980 | 0.554 | 17,833 | 802,516 |
| Balloki | Ravi | 31.22 | 73.86 | 1922–2013 | 87,914.728 | 64,039.572 | 2.183 | 6.180 | 14,000 | 399,356 |
| Sidhani | Ravi | 30.58 | 72.07 | 1925–2013 | 64,143.427 | 56,691.878 | 2.159 | 4.916 | 8488 | 296,086 |
| Sulemanki | Sutlej | 30.38 | 73.86 | 1975–2013 | 70,254.923 | 84,914.177 | 2.267 | 5.865 | 1506 | 399,453 |
| Islam | Sutlej | 29.82 | 72.55 | 1974–2013 | 49,089.45 | 63,209.754 | 2.362 | 6.497 | 1231 | 306,425 |
Table 2. GOF results for GEV, GLO, and GPA distributions using LH-moments (η = 0, 1, 2).
| Station | AD Test (η = 0) | KS Test (η = 0) | CVM Test (η = 0) | AD Test (η = 1) | KS Test (η = 1) | CVM Test (η = 1) | AD Test (η = 2) | KS Test (η = 2) | CVM Test (η = 2) |
|---|---|---|---|---|---|---|---|---|---|
| Tarbela | GEV (0.984) | GEV (0.975) | GEV (0.973) | GEV (0.395) | GEV (0.605) | GEV (0.670) | GEV (0.233) | GEV (0.332) | GLO (0.360) |
| Kalabagh | GLO (0.954) | GLO (0.938) | GLO (0.972) | GLO (0.855) | GLO (0.785) | GLO (0.876) | GLO (0.642) | GLO (0.682) | GLO (0.696) |
| Chashma | GLO (0.983) | GLO (0.965) | GLO (0.970) | GLO (0.943) | GLO (0.895) | GLO (0.951) | GLO (0.801) | GLO (0.836) | GLO (0.853) |
| Taunsa | GLO (0.811) | GEV (0.537) | GLO (0.705) | GLO (0.832) | GEV (0.606) | GLO (0.761) | GLO (0.762) | GLO (0.627) | GLO (0.753) |
| Guddu | GEV (0.740) | GLO (0.878) | GEV (0.769) | GEV (0.765) | GLO (0.889) | GEV (0.781) | GEV (0.742) | GEV (0.923) | GEV (0.790) |
| Sukkur | GPA (0.990) | GPA (0.978) | GPA (0.992) | GPA (0.944) | GPA (0.962) | GPA (0.960) | GPA (0.963) | GPA (0.988) | GPA (0.971) |
| Kotri | GEV (0.974) | GEV (0.837) | GEV (0.924) | GEV (0.947) | GEV (0.821) | GEV (0.916) | GEV (0.859) | GEV (0.831) | GLO (0.887) |
| Mangla | GLO (0.956) | GLO (0.900) | GLO (0.932) | GEV (0.803) | GEV (0.864) | GEV (0.916) | GEV (0.537) | GLO (0.851) | GLO (0.877) |
| Rasul | GEV (0.946) | GEV (0.984) | GLO (0.939) | GEV (0.962) | GEV (0.988) | GEV (0.950) | GPA (0.928) | GEV (0.991) | GPA (0.943) |
| Marala | GPA (0.969) | GPA (0.973) | GPA (0.974) | GPA (0.758) | GPA (0.823) | GPA (0.880) | GPA (0.735) | GPA (0.875) | GPA (0.787) |
| Khanki | GEV (0.693) | GPA (0.868) | GEV (0.744) | GEV (0.612) | GEV (0.713) | GEV (0.712) | GEV (0.465) | GEV (0.741) | GEV (0.655) |
| Qadirabad | GPA (0.995) | GPA (0.996) | GPA (0.999) | GEV (0.930) | GPA (0.983) | GPA (0.988) | GPA (0.943) | GPA (0.985) | GPA (0.968) |
| Trimmu | GPA (0.779) | GPA (0.778) | GPA (0.726) | GEV (0.683) | GPA (0.679) | GPA (0.699) | GEV (0.698) | GPA (0.622) | GPA (0.648) |
| Panjnad | GPA (0.908) | GEV (0.879) | GEV (0.878) | GPA (0.914) | GPA (0.885) | GPA (0.894) | GPA (0.933) | GPA (0.897) | GPA (0.906) |
| Balloki | GEV (0.582) | GEV (0.624) | GEV (0.551) | GEV (0.486) | GEV (0.621) | GEV (0.517) | GEV (0.325) | GEV (0.621) | GEV (0.480) |
| Sidhani | GEV (0.978) | GEV (0.990) | GEV (0.974) | GEV (0.971) | GEV (0.992) | GEV (0.969) | GEV (0.933) | GEV (0.982) | GEV (0.957) |
| Sulemanki | GPA (0.996) | GPA (0.994) | GPA (0.998) | GPA (0.998) | GPA (0.991) | GPA (0.998) | GPA (0.999) | GPA (0.990) | GPA (0.998) |
| Islam | GPA (0.900) | GPA (0.753) | GPA (0.877) | GPA (0.931) | GPA (0.693) | GPA (0.885) | GPA (0.936) | GPA (0.715) | GPA (0.886) |
Table 3. RAE of quantile estimates for GEV, GLO, and GPA distributions using LH-moments (η = 0, 1, 2).
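| Station Name | Best Fitted Distribution | η | 2-year | 5-year | 10-year | 20-year | 50-year | 100-year | 500-year |
|---|---|---|---|---|---|---|---|---|---|
| Tarbela | GEV | 0 | 0.008 | 0.009 | 0.015 | 0.028 | 0.053 | 0.077 | 0.156 |
| Tarbela | GEV | 1 | 0.004 | 0.005 | 0.008 | 0.011 | 0.017 | 0.022 | 0.035 |
| Tarbela | GEV | 2 | 0.003 | 0.004 | 0.006 | 0.008 | 0.011 | 0.014 | 0.02 |
| Kalabagh | GLO | 0 | 0.011 | 0.012 | 0.023 | 0.041 | 0.072 | 0.103 | 0.202 |
| Kalabagh | GLO | 1 | 0.009 | 0.012 | 0.016 | 0.021 | 0.03 | 0.039 | 0.063 |
| Kalabagh | GLO | 2 | 0.009 | 0.013 | 0.015 | 0.018 | 0.024 | 0.029 | 0.045 |
| Chashma | GLO | 0 | 0.01 | 0.012 | 0.022 | 0.036 | 0.061 | 0.085 | 0.156 |
| Chashma | GLO | 1 | 0.01 | 0.014 | 0.017 | 0.022 | 0.03 | 0.037 | 0.059 |
| Chashma | GLO | 2 | 0.011 | 0.014 | 0.016 | 0.019 | 0.025 | 0.031 | 0.048 |
| Taunsa | GLO | 0 | 0.009 | 0.012 | 0.019 | 0.029 | 0.046 | 0.061 | 0.106 |
| Taunsa | GLO | 1 | 0.01 | 0.014 | 0.016 | 0.02 | 0.027 | 0.033 | 0.05 |
| Taunsa | GLO | 2 | 0.01 | 0.013 | 0.015 | 0.019 | 0.025 | 0.03 | 0.046 |
| Guddu | GEV | 0 | 0.016 | 0.016 | 0.025 | 0.041 | 0.065 | 0.087 | 0.144 |
| Guddu | GEV | 1 | 0.013 | 0.015 | 0.022 | 0.032 | 0.047 | 0.06 | 0.09 |
| Guddu | GEV | 2 | 0.012 | 0.015 | 0.02 | 0.028 | 0.039 | 0.047 | 0.067 |
| Sukkur | GPA | 0 | 0.023 | 0.026 | 0.032 | 0.053 | 0.086 | 0.112 | 0.172 |
| Sukkur | GPA | 1 | 0.021 | 0.024 | 0.03 | 0.048 | 0.076 | 0.096 | 0.14 |
| Sukkur | GPA | 2 | 0.018 | 0.021 | 0.027 | 0.043 | 0.066 | 0.082 | 0.114 |
| Kotri | GEV | 0 | 0.03 | 0.041 | 0.043 | 0.081 | 0.159 | 0.235 | 0.483 |
| Kotri | GEV | 1 | 0.019 | 0.021 | 0.03 | 0.047 | 0.075 | 0.099 | 0.163 |
| Kotri | GEV | 2 | 0.017 | 0.021 | 0.028 | 0.039 | 0.054 | 0.067 | 0.096 |
| Mangla | GLO | 0 | 0.042 | 0.046 | 0.073 | 0.079 | 0.159 | 0.235 | 0.47 |
| Mangla | GEV | 1 | 0.016 | 0.018 | 0.026 | 0.042 | 0.068 | 0.09 | 0.148 |
| Mangla | GLO | 2 | 0.016 | 0.018 | 0.026 | 0.037 | 0.056 | 0.072 | 0.119 |
| Rasul | GEV | 0 | 0.056 | 0.067 | 0.089 | 0.096 | 0.174 | 0.256 | 0.505 |
| Rasul | GEV | 1 | 0.022 | 0.034 | 0.036 | 0.065 | 0.113 | 0.157 | 0.285 |
| Rasul | GPA | 2 | 0.018 | 0.022 | 0.027 | 0.044 | 0.069 | 0.087 | 0.124 |
| Marala | GPA | 0 | 0.022 | 0.024 | 0.031 | 0.052 | 0.09 | 0.124 | 0.214 |
| Marala | GPA | 1 | 0.017 | 0.018 | 0.025 | 0.041 | 0.069 | 0.092 | 0.151 |
| Marala | GPA | 2 | 0.013 | 0.014 | 0.019 | 0.031 | 0.049 | 0.063 | 0.094 |
| Khanki | GEV | 0 | 0.021 | 0.026 | 0.036 | 0.056 | 0.112 | 0.166 | 0.337 |
| Khanki | GEV | 1 | 0.012 | 0.017 | 0.021 | 0.04 | 0.073 | 0.103 | 0.189 |
| Khanki | GEV | 2 | 0.009 | 0.01 | 0.016 | 0.025 | 0.04 | 0.053 | 0.087 |
| Qadirabad | GPA | 0 | 0.025 | 0.029 | 0.034 | 0.057 | 0.098 | 0.133 | 0.226 |
| Qadirabad | GPA | 1 | 0.022 | 0.023 | 0.029 | 0.048 | 0.079 | 0.105 | 0.166 |
| Qadirabad | GPA | 2 | 0.018 | 0.019 | 0.025 | 0.041 | 0.065 | 0.082 | 0.119 |
| Trimmu | GPA | 0 | 0.027 | 0.032 | 0.035 | 0.06 | 0.105 | 0.145 | 0.253 |
| Trimmu | GPA | 1 | 0.022 | 0.024 | 0.03 | 0.05 | 0.083 | 0.109 | 0.175 |
| Trimmu | GPA | 2 | 0.017 | 0.02 | 0.025 | 0.042 | 0.064 | 0.079 | 0.109 |
| Panjnad | GEV | 0 | 0.02 | 0.031 | 0.035 | 0.058 | 0.099 | 0.135 | 0.235 |
| Panjnad | GPA | 1 | 0.018 | 0.026 | 0.03 | 0.048 | 0.071 | 0.086 | 0.112 |
| Panjnad | GPA | 2 | 0.015 | 0.022 | 0.026 | 0.044 | 0.068 | 0.079 | 0.096 |
| Balloki | GEV | 0 | 0.023 | 0.026 | 0.038 | 0.056 | 0.114 | 0.17 | 0.343 |
| Balloki | GEV | 1 | 0.011 | 0.014 | 0.02 | 0.035 | 0.061 | 0.084 | 0.149 |
| Balloki | GEV | 2 | 0.008 | 0.01 | 0.014 | 0.021 | 0.032 | 0.04 | 0.06 |
| Sidhani | GEV | 0 | 0.028 | 0.028 | 0.047 | 0.06 | 0.123 | 0.182 | 0.362 |
| Sidhani | GEV | 1 | 0.013 | 0.02 | 0.023 | 0.042 | 0.073 | 0.099 | 0.173 |
| Sidhani | GEV | 2 | 0.012 | 0.012 | 0.019 | 0.03 | 0.046 | 0.059 | 0.093 |
| Sulemanki | GPA | 0 | 0.053 | 0.059 | 0.085 | 0.09 | 0.173 | 0.253 | 0.512 |
| Sulemanki | GPA | 1 | 0.037 | 0.042 | 0.054 | 0.076 | 0.138 | 0.193 | 0.355 |
| Sulemanki | GPA | 2 | 0.029 | 0.039 | 0.044 | 0.069 | 0.118 | 0.158 | 0.263 |
| Islam | GPA | 0 | 0.058 | 0.068 | 0.091 | 0.096 | 0.175 | 0.256 | 0.518 |
| Islam | GPA | 1 | 0.042 | 0.044 | 0.063 | 0.08 | 0.149 | 0.211 | 0.397 |
| Islam | GPA | 2 | 0.031 | 0.039 | 0.046 | 0.07 | 0.121 | 0.164 | 0.278 |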
Table 4. RAE of quantile estimates for Epanechnikov, Gaussian, Biweight, and Triweight kernel functions.
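| Station Name | Kernel Function Type | 2-year | 5-year | 10-year | 20-year | 50-year | 100-year | 500-year |
|---|---|---|---|---|---|---|---|---|
| Tarbela | Epanechnikov | 0.027 | 0.033 | 0.046 | 0.064 | 0.182 | 0.346 | 0.728 |
| Tarbela | Gaussian | 0.014 | 0.02 | 0.027 | 0.049 | 0.079 | 0.12 | 0.24 |
| Tarbela | Biweight | 0.029 | 0.047 | 0.064 | 0.087 | 0.211 | 0.39 | 0.74 |
| Tarbela | Triweight | 0.03 | 0.06 | 0.081 | 0.107 | 0.23 | 0.31 | 0.5 |
| Kalabagh | Epanechnikov | 0.01 | 0.024 | 0.101 | 0.124 | 0.2 | 0.23 | 0.33 |
| Kalabagh | Gaussian | 0.004 | 0.01 | 0.018 | 0.023 | 0.032 | 0.07 | 0.125 |
| Kalabagh | Biweight | 0.006 | 0.013 | 0.125 | 0.127 | 0.14 | 0.19 | 0.21 |
| Kalabagh | Triweight | 0.014 | 0.016 | 0.145 | 0.149 | 0.17 | 0.198 | 0.24 |
| Chashma | Epanechnikov | 0.004 | 0.078 | 0.081 | 0.12 | 0.183 | 0.263 | 0.58 |
| Chashma | Gaussian | 0.005 | 0.01 | 0.024 | 0.068 | 0.088 | 0.2 | 0.534 |
| Chashma | Biweight | 0.003 | 0.055 | 0.127 | 0.152 | 0.214 | 0.434 | 0.63 |
| Chashma | Triweight | 0.002 | 0.029 | 0.128 | 0.178 | 0.239 | 0.488 | 0.678 |
| Taunsa | Epanechnikov | 0.019 | 0.089 | 0.097 | 0.101 | 0.103 | 0.121 | 0.2 |
| Taunsa | Gaussian | 0.014 | 0.017 | 0.02 | 0.025 | 0.046 | 0.067 | 0.167 |
| Taunsa | Biweight | 0.019 | 0.115 | 0.116 | 0.122 | 0.129 | 0.222 | 0.29 |
| Taunsa | Triweight | 0.019 | 0.134 | 0.139 | 0.14 | 0.153 | 0.267 | 0.32 |
| Guddu | Epanechnikov | 0.013 | 0.014 | 0.023 | 0.091 | 0.182 | 0.311 | 0.671 |
| Guddu | Gaussian | 0.003 | 0.005 | 0.017 | 0.042 | 0.11 | 0.224 | 0.422 |
| Guddu | Biweight | 0.02 | 0.023 | 0.033 | 0.11 | 0.196 | 0.375 | 0.76 |
| Guddu | Triweight | 0.025 | 0.023 | 0.054 | 0.127 | 0.215 | 0.46 | 0.845 |
| Sukkur | Epanechnikov | 0.032 | 0.035 | 0.069 | 0.139 | 0.216 | 0.297 | 0.532 |
| Sukkur | Gaussian | 0.027 | 0.03 | 0.035 | 0.06 | 0.1 | 0.19 | 0.383 |
| Sukkur | Biweight | 0.035 | 0.041 | 0.089 | 0.171 | 0.342 | 0.441 | 0.72 |
| Sukkur | Triweight | 0.035 | 0.046 | 0.089 | 0.199 | 0.438 | 0.564 | 0.783 |
| Kotri | Epanechnikov | 0.095 | 0.131 | 0.162 | 0.191 | 0.257 | 0.501 | 0.732 |
| Kotri | Gaussian | 0.04 | 0.055 | 0.083 | 0.15 | 0.2 | 0.295 | 0.527 |
| Kotri | Biweight | 0.05 | 0.11 | 0.13 | 0.158 | 0.222 | 0.45 | 0.69 |
| Kotri | Triweight | 0.073 | 0.124 | 0.15 | 0.181 | 0.245 | 0.489 | 0.705 |
| Mangla | Epanechnikov | 0.107 | 0.127 | 0.132 | 0.154 | 0.231 | 0.476 | 0.845 |
| Mangla | Gaussian | 0.051 | 0.068 | 0.099 | 0.134 | 0.198 | 0.345 | 0.695 |
| Mangla | Biweight | 0.12 | 0.153 | 0.183 | 0.203 | 0.282 | 0.523 | 0.912 |
| Mangla | Triweight | 0.135 | 0.17 | 0.185 | 0.212 | 0.292 | 0.545 | 0.989 |
| Rasul | Epanechnikov | 0.069 | 0.092 | 0.105 | 0.15 | 0.315 | 0.605 | 1.21 |
| Rasul | Gaussian | 0.06 | 0.09 | 0.099 | 0.13 | 0.265 | 0.55 | 0.999 |
| Rasul | Biweight | 0.083 | 0.101 | 0.163 | 0.193 | 0.386 | 0.71 | 1.421 |
| Rasul | Triweight | 0.085 | 0.105 | 0.183 | 0.213 | 0.412 | 0.8 | 1.89 |
| Marala | Epanechnikov | 0.03 | 0.045 | 0.055 | 0.08 | 0.12 | 0.223 | 0.525 |
| Marala | Gaussian | 0.028 | 0.04 | 0.053 | 0.069 | 0.1 | 0.193 | 0.412 |
| Marala | Biweight | 0.055 | 0.075 | 0.097 | 0.13 | 0.274 | 0.498 | 0.875 |
| Marala | Triweight | 0.067 | 0.091 | 0.104 | 0.198 | 0.32 | 0.53 | 0.995 |
| Khanki | Epanechnikov | 0.081 | 0.09 | 0.124 | 0.175 | 0.243 | 0.475 | 0.822 |
| Khanki | Gaussian | 0.043 | 0.075 | 0.106 | 0.135 | 0.203 | 0.422 | 0.79 |
| Khanki | Biweight | 0.099 | 0.109 | 0.141 | 0.19 | 0.275 | 0.49 | 0.918 |
| Khanki | Triweight | 0.103 | 0.116 | 0.142 | 0.203 | 0.303 | 0.503 | 1.116 |
| Qadirabad | Epanechnikov | 0.046 | 0.061 | 0.076 | 0.122 | 0.17 | 0.328 | 0.631 |
| Qadirabad | Gaussian | 0.029 | 0.052 | 0.076 | 0.105 | 0.152 | 0.298 | 0.608 |
| Qadirabad | Biweight | 0.058 | 0.073 | 0.079 | 0.139 | 0.185 | 0.347 | 0.675 |
| Qadirabad | Triweight | 0.063 | 0.079 | 0.096 | 0.151 | 0.196 | 0.365 | 0.692 |
| Trimmu | Epanechnikov | 0.038 | 0.083 | 0.108 | 0.244 | 0.331 | 0.644 | 1.976 |
| Trimmu | Gaussian | 0.025 | 0.058 | 0.101 | 0.175 | 0.305 | 0.563 | 1.107 |
| Trimmu | Biweight | 0.033 | 0.063 | 0.106 | 0.261 | 0.36 | 0.682 | 2.19 |
| Trimmu | Triweight | 0.033 | 0.073 | 0.125 | 0.273 | 0.36 | 0.705 | 2.806 |
| Panjnad | Epanechnikov | 0.034 | 0.073 | 0.095 | 0.109 | 0.136 | 0.275 | 0.595 |
| Panjnad | Gaussian | 0.047 | 0.057 | 0.083 | 0.105 | 0.126 | 0.234 | 0.498 |
| Panjnad | Biweight | 0.038 | 0.073 | 0.117 | 0.128 | 0.143 | 0.283 | 0.607 |
| Panjnad | Triweight | 0.064 | 0.073 | 0.126 | 0.156 | 0.17 | 0.303 | 0.67 |
| Balloki | Epanechnikov | 0.031 | 0.047 | 0.077 | 0.119 | 0.177 | 0.219 | 0.445 |
| Balloki | Gaussian | 0.04 | 0.049 | 0.064 | 0.108 | 0.133 | 0.212 | 0.414 |
| Balloki | Biweight | 0.028 | 0.043 | 0.092 | 0.124 | 0.192 | 0.324 | 0.59 |
| Balloki | Triweight | 0.029 | 0.046 | 0.105 | 0.135 | 0.205 | 0.335 | 0.67 |
| Sidhani | Epanechnikov | 0.068 | 0.081 | 0.095 | 0.155 | 0.218 | 0.402 | 0.851 |
| Sidhani | Gaussian | 0.041 | 0.051 | 0.075 | 0.131 | 0.197 | 0.359 | 0.738 |
| Sidhani | Biweight | 0.085 | 0.096 | 0.105 | 0.174 | 0.29 | 0.507 | 0.907 |
| Sidhani | Triweight | 0.098 | 0.104 | 0.118 | 0.192 | 0.317 | 0.541 | 0.942 |
| Sulemanki | Epanechnikov | 0.068 | 0.112 | 0.146 | 0.182 | 0.268 | 0.573 | 1.165 |
| Sulemanki | Gaussian | 0.063 | 0.078 | 0.112 | 0.148 | 0.233 | 0.438 | 0.91 |
| Sulemanki | Biweight | 0.071 | 0.087 | 0.137 | 0.185 | 0.271 | 0.518 | 1.154 |
| Sulemanki | Triweight | 0.071 | 0.082 | 0.132 | 0.155 | 0.251 | 0.502 | 1.123 |
| Islam | Epanechnikov | 0.099 | 0.145 | 0.191 | 0.245 | 0.399 | 0.745 | 1.168 |
| Islam | Gaussian | 0.067 | 0.11 | 0.154 | 0.21 | 0.367 | 0.61 | 0.929 |
| Islam | Biweight | 0.109 | 0.168 | 0.217 | 0.268 | 0.409 | 0.778 | 1.481 |
| Islam | Triweight | 0.118 | 0.19 | 0.26 | 0.329 | 0.418 | 0.819 | 1.921 |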
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Fawad, M.; Cassalho, F.; Ren, J.; Chen, L.; Yan, T. State-of-the-Art Statistical Approaches for Estimating Flood Events. Entropy 2022, 24, 898. https://0-doi-org.brum.beds.ac.uk/10.3390/e24070898
