Next Article in Journal
Phosphorus and Nitrogen Yield Response Models for Dynamic Bio-Economic Optimization: An Empirical Approach
Previous Article in Journal
Barnyardgrass Root Recognition Behaviour for Rice Allelopathy
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Generating Improved Experimental Designs with Spatially and Genetically Correlated Observations Using Mixed Models

1
Department of Medicine, University of Florida, Gainesville, FL 32610, USA
2
School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611, USA
3
Gulf Coast Research and Education Center, University of Florida, Wimauma, FL 33598, USA
*
Author to whom correspondence should be addressed.
Submission received: 17 February 2018 / Revised: 23 March 2018 / Accepted: 29 March 2018 / Published: 30 March 2018

Abstract

:
The aim of this study was to generate and evaluate the efficiency of improved field experiments while simultaneously accounting for spatial correlations and different levels of genetic relatedness using a mixed models framework for orthogonal and non-orthogonal designs. Optimality criteria and a search algorithm were implemented to generate randomized complete block (RCB), incomplete block (IB), augmented block (AB) and unequally replicated (UR) designs. Several conditions were evaluated including size of the experiment, levels of heritability, and optimality criteria. For RCB designs with half-sib or full-sib families, the optimization procedure yielded important improvements under the presence of mild to strong spatial correlation levels and relatively low heritability values. Also, for these designs, improvements in terms of overall design efficiency (ODE%) reached values of up to 8.7%, but these gains varied depending on the evaluated conditions. In general, for all evaluated designs, higher ODE% values were achieved from genetically unrelated individuals compared to experiments with half-sib and full-sib families. As expected, accuracy of prediction of genetic values improved as levels of heritability and spatial correlations increased. This study has demonstrated that important improvements in design efficiency and prediction accuracies can be achieved by optimizing how the levels of a treatment are assigned to the experimental units.

1. Introduction

Designing an experiment is an essential stage in any research settings. Important planning decisions are taken in order to choose the most appropriate layout out of an array of design alternatives. Generating experimental designs relies on three basic principles: randomization, replication and blocking [1,2]. Replication enables estimation of experimental error variance and, adequate number of replicates provides with precise inferences. Randomization ensures that all experimental units are equally likely to receive any treatment, thus it minimizes systematic errors or bias induced by the experimenter. Finally, blocking controls for different sources of natural variation among experimental units, and when applied appropriately, controls for field variations and helps to reduce background noise. The generation of an optimal or near-optimal experimental design requires making best use of available information and resources with a goal of estimating statistical parameters of interest with the best accuracy and precision possible.
Plant breeders often conduct and analyze large field trials with the aim of selecting the best genotypes for future breeding [3,4], which is done by testing a large number of treatments (i.e., genotypes) in single or multiple locations. The basis for analysis of breeding trials relies on considering the genetic relationships between the genotypes. Often varied levels of relatedness, such as parent-offspring, half-sibs (where genotypes share a single parent) and full-sibs (where they share both parents) exist. Some of the common field experimental designs used in breeding include randomized complete block (RCB), incomplete block (IB) and row-column (RC) designs. RCB designs are, at present, the most common experimental layout used, due to its simplicity in terms of design and analysis. However, IB and RC designs are more adequate, particularly for large number of genotypes, as they control better for environmental spatial heterogeneity [5]. Nevertheless, these experimental layouts have restrictions and are often more complex to analyze [2].
Experimental treatments may not necessarily occur always with equal replications and, in addition, may not be equally represented in each block, as some treatments are available in larger quantities than others [6]. Thus, scarce treatments will be missing in some blocks while others will be represented in all blocks with one or more replications per block, resulting in uneqaully replicated (UR) designs. This occurs, often by chance but sometimes by choice, due to differential availability of seeds or plants, rates in fecundity, greenhouse survival, loss of experimental units, etc.
Several UR designs have been proposed [7,8,9,10,11]. An extreme case of these is the augmented block (AB) designs that are used mostly in early stages of breeding programs for evaluation of a large number of test treatments that are replicated only once, planted with control treatments that are replicated several times [7,8,9,10,11]. Another case, is the partially replicated (p-rep) designs where a portion ( p % ) of the test treatments are replicated at least twice, while the remaining treatments are replicated only once [10]. All analyses of UR designs use the replicated treatments to estimate background variability and make spatial adjustments for field heterogeneity, thus often requiring fitting an spatial model.
Field experiments are often characterized by varied levels of environmental heterogeneity. The standard traditional designs and their analyses often assume that residuals are uncorrelated. However, when experimental units are located in close proximities, and thus sharing microsite variability, their responses are likely to be more similar than those farther apart [12,13] inducing spatial correlation. This requires the implementation of spatial analysis by specifying and fitting an appropriate error correlation structure. Several error structures are available [14], a popular choice for breeding trials is the 2-dimensional separable autoregressive of first order [4,5,10,13,15].
Another type of dependency between experimental units, that is critical for breeding programs, is the genetic relationships that arises due to relatives sharing some common alleles. This correlation is specified by defining a matrix of correlations between genotypes to be incorporated into the statistical model. Genetic relationships can be calculated using genetic theory [16] based on pedigree information, or molecular data such as SNPs [17] to form what is known as the numerator relationship matrix [18].
Linear mixed models (LMM) allow for the incorporation of both spatial and genetic correlation structures, yielding to a more efficient use of the available information. LMM provide with estimates of variance components, which are used to calculate heritability, and to estimate best linear unbiased predictions of genetic effects (e.g., breeding values), which are used later to select outstanding genotypes. Hence, improvements in the design or analysis of breeding trials will translate into greater genetic gains as the best individuals are identified easier. For example, some studies, using spatial mixed models, have reported a positive impact on selection decisions and increased accuracy of genetic value predictions [19].
The generation of improved experimental layouts may require optimization routines, a process that often is computationally intensive. Several computer procedures [4,10,15] and software (such as CycDesigN, John and Williams [20]) have been developed for the generation of experimental designs. However, most of these methods use approximations of optimality criteria [10,15], or they only incorporate spatial correlations [20] or genetic correlations [21] into the optimization routine. In addition, none of the available routines to generate complex designs are flexible enough to incorporate both spatial and genetic correlations.

1.1. Optimality Criteria

Statistically, a design is considered to be optimal if it maximizes the amount of information extracted from a fixed number of experimental units. Here, an optimal criterion is required in the process of generating an optimal or near optimal experimental design, the choice of which to use depends on the objective of the experiment [22]. Most of these optimize a function of the variance-covariance matrix of treatment effects. The information matrix is used because standard errors of the mean (SEM) are calculated using the variance of treatment effects while standard error of differences (SED) are obtained from functions of variances and covariances. Information based A- and D-optimality criteria are perhaps the most frequently used [10,15,22,23,24,25,26].
A-optimality was first introduced by Chernoff [27] using the Fisher’s information variance-covariance matrix obtained under the framework of a fixed effects model. The objective function for A-optimality is:
A o p t = argmin { t r a c e [ M ( Ω ) ] }
where M ( Ω ) is the inverse of an information matrix (or variance-covariance matrix) of the treatment effects of a design Ω calculated from a given statistical model. More details are presented in Section 2.
A-optimality criterion seeks to minimize the sum of the diagonal elements (i.e., trace) of the variance-covariance matrix of treatment effects. For a treatment factor that is assumed fixed, minimizing the trace implies minimizing the average variance of the best linear unbiased estimators (BLUE) of treatment effects. For random treatment factor, this implies minimizing the average variance of the best linear unbiased predictors (BLUP) of treatment effects. A-optimality criterion is not scale invariant on the response variable; however, this does not affect blocked designs since treatments are on the same scale [22].
D-optimality was first introduced by Wald [28] with other researchers doing extensive work [29,30,31,32]. A design is D-optimal if it minimizes the determinant of M ( Ω ) , expressed as:
D o p t = argmin { | M ( Ω ) | } for | M ( Ω ) | 0 .
Minimizing the determinant of an inverse of an information matrix is equivalent to minimizing the generalized variance of the treatment effects [22]. Hence, this criterion chooses an optimal design for which the volume of the multivariate joint confidence ellipsoid is minimized [23]. A desirable property of this criterion is that it uses both the diagonal and off-diagonal values in the information matrix, it is scale invariant on the response variable and often computationally efficient [22].

1.2. Study Objectives

Designing an experiment is an essential stage in any research settings, which determines the precision of the final results of the study. Although, there is a need to address generation of optimal experimental designs, focus has always been on the statistical analyses side with limited progress done on developing algorithms to generate optimal or near-optimal designs.
The main objective of this study is to develop statistical routines and evaluate computational and algorithmic procedures to generate optimal or near optimal field experimental designs. This is done by using a linear mixed model framework that considers simultaneously genetic and spatial dependencies between experimental units, as those found in typical plant breeding trials. A secondary objective is to evaluate and assess prediction accuracies of genetic effects and the estimation of heritabilities for improved and unimproved experimental designs. Statistical models were formulated for RCB, IB, AB and UR designs to evaluate a proposed pairwise swap algorithm on the generation of improved experimental designs for both A- and D-optimality criteria. These were evaluated under field conditions with varying levels of heritability, genetic relatedness, and spatial correlations. Genetic relationships were incorporated with a numerator relationship matrix calculated from pedigree information, and spatial correlations were modeled using a 2-dimensional separable autoregressive first order variance structure (AR1).

2. Materials and Methods

2.1. Statistical Model for RCB Designs

For RCB designs, the LMM considered in this study assumes blocks as fixed effects and treatments (genotypes) as random effects. The latter are assumed random as they are a sample from a much larger population. This model is expressed as
y = X β + Zg + e
where y is a vector of observations; β is a vector of fixed effects (blocks), g is a vector of genetic random effects, and e is a vector of residual errors. Note here that the overall mean is abserved by the block effects. X and Z are full column rank incidence matrices for the block and treatment effects, respectively. The assumptions for the random effects are:
g e M V N 0 0 , G 0 0 R
with V = v a r ( y ) = ZGZ + R , where G and R are variance-covariance matrices for the genetic effects and residual errors, respectively. If treatment effects are assumed to be genetically unrelated then G = σ g 2 I , where σ g 2 is the additive genetic variance and I is an identity matrix. For related individuals, G = σ g 2 A , where A corresponds to the additive genetic numerator relationship matrix among individuals, derived from pedigree information [18]. If residual errors are assumed to be independent and identically distributed ( i i d ) then R = σ e 2 I , where σ e 2 is the residual variance. For the case when residual errors are assumed to be correlated, a 2-dimensional separable autoregressive of first order spatial error structure to model spatial variability along the rows and columns of the experimental layouts was used [33]. Here, R = σ e 2 Σ x ( ρ x ) Σ y ( ρ y ) with Σ x ( ρ x ) and Σ y ( ρ y ) the matrices with autocorrelation parameters ρ x and ρ y for rows and columns, respectively, and ⊗ is the Kronecker product. These are formed with V a r ( e i j ) = σ e 2 and C o v ( e i j , e i j ) = σ e 2 ρ x | d x | ρ y | d y | , where | d x | = | x i - x i | and | d y | = | y j - y j | are the row and column absolute distances, respectively. Note that this model does not consider a nugget effect, but this can be easily incorporated. For the above model, narrow-sense heritability, h 2 , is calculated as h 2 = σ g 2 / ( σ g 2 + σ e 2 ) . For simplicity, but without loss of generality, in later simulations σ e 2 = 1 - σ g 2 with h 2 = σ g 2 .
Variance components were estimated using Restricted Maximum Likelihood (REML) assuming that both g and e have multivariate normal distributions [34]. Estimation of BLUE and BLUP are done using mixed model equations [35] as:
X R ^ - 1 X X R ^ - 1 Z Z R ^ - 1 X Z R ^ - 1 Z + G ^ - 1 β g = X R ^ - 1 y Z R ^ - 1 y
β ^ g ^ = X R ^ - 1 X X R ^ - 1 Z Z R ^ - 1 X Z R ^ - 1 Z + G ^ - 1 - X R ^ - 1 y Z R ^ - 1 y = C 11 C 12 C 21 C 22 - X R ^ - 1 y Z R ^ - 1 y
= C 11 C 12 C 21 C 22 X R ^ - 1 y Z R ^ - 1 y = ( X V ^ - 1 X ) - 1 X V ^ - 1 y G ^ Z V - 1 ( y - X β ^ )
The computation of the variance-covariance matrix of treatment effects, M ( Ω ) , can be simplified by the following expression from [24], which is later used to apply the A- and D-optimality criteria.
M ( Ω ) = C 22 = V a r ( g ^ - g ) = ( Z R ^ - 1 Z + G ^ - 1 - Z R ^ - 1 X ( X R ^ - 1 X ) - 1 X R ^ - 1 Z ) - 1 = ( Z R ^ - 1 Z + G ^ - 1 - Z K x Z ) - 1
where K x = R ^ - 1 X ( X R ^ - 1 X ) - 1 X R ^ - 1 .

2.2. Statistical Model for Complex Designs

The complex field layouts considered in this study included IB, AB and UR designs, where both blocks and treatments are considered random effects, and an overall mean was included as the only fixed effect. This LMM is expressed as:
y = 1 μ + Z b b + Z g g + e = 1 μ + Z b Z g b g + e = 1 μ + Z γ + e
where Z = Z b Z g ,   γ = b g ,   and   G = D b 0 0 G g .
Here, y is a vector of phenotypic observations; μ is the overall mean; b is a vector of block random effects, such that b M V N ( 0 , D b ) ; g is a vector of genetic random effects, such that g M V N ( 0 , G g ) ; and e is a vector of residual errors, such that e M V N ( 0 , R ) . Also, 1 is a vector ones. Z b and Z g are incidence matrices for block and treatment effects, respectively. D b , G g and R are variance-covariance matrices for blocks, treatments and residual errors, respectively. Block effects are assumed iid, hence D b = σ b 2 I b , and genetic effects are assumed G g = σ g 2 I g or G g = σ g 2 A , for genetically unrelated or related individuals, respectively. Similarly to the RCB model, the residuals errors have R = σ e 2 I or R = σ e 2 Σ x ( ρ x ) Σ y ( ρ y ) , for independent or spatially correlated assumptions, respectively. All other variance components and matrices were defined previously.
Under the above model, the variance-covariance matrix of treatment effects, M ( Ω ) , is obtained by using Equation (7), and expanding Z and G based on Equation (8) to obtain:
M ( Ω ) = [ Z b Z g ] R ^ - 1 [ Z b Z g ] + D ^ b - 1 0 0 G ^ g - 1 - [ Z b Z g ] K x [ Z b Z g ] - 1
where K x = R ^ - 1 1 ( 1 R ^ - 1 1 ) - 1 1 R ^ - 1 . This matrix M ( Ω ) can be expressed as:
M ( Ω ) = Σ ^ b ( Ω ) Σ ^ b g ( Ω ) Σ ^ b g ( Ω ) Σ ^ g ( Ω )
where Σ g ( Ω ) is the relevant portion that represents the variance-covariance matrix of treatment effects, to be used to perform the A- and D-optimality criteria in later stages.

2.3. Pairwise Swap Algorithm

The procedure to improve an experimental design layout was based on a pairwise swap (exchange) search algorithm. In brief, from an initial available layout, random pairs of treatments are swapped and evaluated using either an A- or D-optimality criterion, if a better layout is found then this is kept and a new swap is evaluated until a stopping criteria is reached.
The steps for this swap procedure in more detail follows: (i) randomly generate m experimental layouts Ω i , where i = 1 , 2 , 3 , . . . , m ; (ii) for each layout, calculate M ( Ω i ) and obtain a criterion value τ i (that is, trace in case of A-optimality or determinant in case of D-optimality); (iii) select s experimental layouts with the smallest τ i values; (iv) for each layout randomly interchange a pair of treatments within a block, between blocks and/or between a list of treatments to produce a new layout Ω j ; (v) recalculate the new criterion value τ j ; (vi) if τ i > τ j , then accept τ j and use Ω j as the new layout, otherwise reject Ω j ; (vii) repeat steps (iv) to (vi) for a total of p iterations; (viii) select the best of the s imporved layouts.
For the improvement of the RCB designs the pair of treatments to be selected for swapping are selected from the same block in order to maintain the balanced nature of the experimental design. In contrast, for IB designs, given that in an incomplete block not all treatments are represented, then swapping pairs of treatments from within a block or between a block is allowed. In the case of AB designs, control treatments are allowed to be swaped only with a block but test treatments have no restrictions. Finally, general UR designs are generated from a list that identifies each treatment with its replication. These designs have no restrictions on the swapping of treatments due to its unbalanced nature. A complex case of UR can also be considered, where a list of treatments with constraints of allowed replications (i.e., minimum and maximum) is provided. Here, first a treatment is selected to determine if should be replaced by another random treatment according to its replication constraints, then, a second treatment was selected and checked for constraitns, and then this pair was allowed to swap freely.

2.4. Evaluation of the Swap Algorithm to Generate Designs

Performance evaluation of the above proposed swap algorithm was first conducted for RCB designs with specific number of treatments and blocks (see below), and a combination of heritability values ( h 2 = 0.1, 0.3, and 0.6), spatial correlation levels ( ρ = ρ x = ρ y = 0, 0.1, 0.3, 0.6 and 0.9), and genetic relationship structures (unrelated individuals, half-sib and full-sib families).
The scenarios Ω A ( 30 ) and Ω D ( 30 ) identify an RCB design with t = 30 genotypes generated using A- and D-optimality criteria, respectively. In these scenarios, there are b = 6 blocks, r b = 5 rows per block, c b = 6 columns per block. The independent (or unrelated) genetic structure considers the t genotypes to be unrelated. For half-sib families, pedigree consisted of a structure based on five parents with six offsprings each. Full-sib family pedigree consisted in a half-diallel with five parents for a total of 10 families (or crosses) with three offsprings each. Similarly, Ω A ( 196 ) scenario identifies an RCB design with t = 196, b = 4, r b = 14 and c b = 14 generated based only on A-optimality criterion. Here, pedigree for half-sib families consisted of 32 parents, with approximately six offsprings each. For full-sib families pedigree consisted of a total of 30 parents in 68 families with approximately three offsprings each. Parents were arranged in groups of five to form half-diallel with 10 crosses each. Eight additional crosses between diallels were also included to connect diallels.
Each of the evaluated combinations of conditions was used to generate designs based on the model presented in Section 2.1 that were replicated λ = 10 times. Each replicate had m = 100 initial RCB designs iterated and the best design was selected ( s = 1 ) which was then optimized for p = 5000 iterations to produce an improved experimental layout.
In a second stage, the complex experimental designs IB, AB and UR were evaluated. In contrast with RCB designs, these designs consider both blocks and treatments as random effects in their design and analysis (Equation (8)) The IB designs were generated with t = 30, b = 6 blocks ( r b = 5 and c b = 4); hence, each block had k = 20 experimental units, and each treatment was equally replicated r = 4 times. For the AB designs, t t = 492 unreplicated test treatments and t c = 3 replicated control treatments that were replicated r c = 12 times, arranged into b = 3 blocks ( r b = 10 and c b = 20); hence, each block had k = 200 experimental units where 164 belong to unreplicated test treatments. A complex UR design was generated with t = 30 treatments arranged in b = 6 blocks ( r b = 5 and c b = 6), where each block had k = 30 experimental units, but treatments were unequally replicated, where the number of replications ranged between 4 and 8 based on a provided random list of treatment contraints.
As with the RCB design, evaluation of the swap algorithm for these complex designs was done for varying levels of genetic and environmental conditions, considering a combination of heritability values ( h 2 = 0.1, 0.3, and 0.6) and spatial correlation levels ( ρ = ρ x = ρ y = 0, 0.3, 0.6 and 0.9). The genetic relationship structures evaluated consisted of unrelated individuals, half-sib and full-sib families. For the IB and UR designs, pedigree for half-sib and full-sib was identical to the one presented earlier for the Ω ( 30 ) scenario. Also, in all designs, the variance of the blocks, σ b 2 , was set to be 20% of the total phenotypic variance. For the AB designs, pedigree for half-sib families from the unreplicated test treatments comprised of 41 parents each with 12 offsprings; full-sib families consisted in a half-diallel with 12 parents for a total of 35 families with approximately 14 offsprings each. In both cases, the control treatments were genetically unrelated between them and to the test treatments.
Each of the evaluated combinations of conditions was used to generate designs based on the model presented in Section 2.1 that were replicated λ = 10 times. Each replicate had m = 100 initial random designs iterated and the best design was selected ( s = 1 ), which was then optimized for p = 5000 iterations to produce an improved experimental layout.
For all the designs considered, in order to evaluate the improved experimental layouts, two measures of efficiency were defined. The initial design efficiency (IDE%) is the percent improvement, in terms of A o p t or D o p t , of the best (i.e., minimum trace or determinant) design without optimization (i.e., p = 0) from the m initial designs, in relation to the average value of these designs. In contrast, the overall design efficiency (ODE%) is the percent improvement after p iterations of the final selected design, in relation to the average value of the m initial designs.
All computations were implemented in R [36] using a high performance computer from the University of Florida, and code is available from the authors upon request.

2.5. Data Simulation for RCB Designs

Simulations were implemented to evaluate the accuracy and precision of the estimation of random genetic effects and estimate narrow-sense heritabilities from fitting a LMM for the RCB design. A response variable was simulated following the model
y i j = μ + g k ( i j ) + E s ( i j )
where y i j represents the observation on the ith row and jth column, μ is an overall mean that was arbitrarily fixed to 10 units, g k ( i j ) represents a k-th random genotype effect and E s is the structured residual error. A subset of the experimental conditions for RCB designs described in Section 2.4 were considered here with spatial correlation levels of ρ = ρ x = ρ y = 0 . 3 and 0 . 6 and narrow-sense heritabilities of 0 . 1 , 0 . 3 and 0 . 6 with half-sib and full-sib families pedigree structures. Correlated genetic and residual effects were obtained based on the Cholesky decomposition of multivariate normal distributions with a zero expected value given by G = σ g 2 A and R = σ e 2 Σ x ( ρ x ) Σ y ( ρ y ) , respectively. More details of the simulation process are described in [5].
Three scenarios were generated, with a small experiment Ω A ( 30 ) and Ω D ( 30 ) both with 6 blocks exactly as described on Section 2.4. A large experiment Ω A ( 196 ) was also generated as described in Section 2.4, but with 16 blocks. A total of 12 conditions for each scenario were evaluated, each with λ = 50 replicates, m = s = 1 and p = 5000 iterations.
Analysis of data for the Ω A ( 30 ) and Ω D ( 30 ) scenarios was done by fitting two linear mixed models. Both of these models considered blocks as fixed effects, genotypes as random effects, but for the first case (Model 1) residual errors were modeled assuming independence, whereas for the second case (Model 2) residual errors were modeled by fitting an A R 1 A R 1 spatial correlation structure. For simplicity, under the Ω A ( 196 ) scenario, only Model 2 was fitted. These analyses were performed for the initial designs (p = 0), and final improved designs.
Pearson’s product-moment correlation r g between predicted and true simulated breeding values were computed, and estimation of heritabilities. The statistical package R [36] was used to simulate these conditions and the software ASReml-R v. 3.0 [13] was used to fit all models and to estimate heritabilities.

3. Results

3.1. Design Efficiencies for RCB

The evaluated conditions related to varying levels of heritability, genetic structure and spatial correlations for the RCB designs are shown in Table 1 and Figure 1. The results presented here show the percentage improvement in terms of reduction in average variance of the treatment effects when A-optimality criterion is used and in terms of reduction in volume of the hypersphere when the D-optimality criterion based on the IDE% and ODE% design efficiency measures.
As, expected, the average IDE% values for RCB designs were all smaller than their respective ODE% values. This would confirm the fact that randomly generating hundreds of experimental designs and simply choosing the best with respect to A- or D-optimality criteria results in designs with lower efficiencies compared to optimized designs. After applying the optimization procedure to improve the experimental layout, the average highest ODE% of 8.739 (S.E. = 0.065) from the optimal designs under Ω A ( 30 ) was obtained from the set of genetically unrelated individuals when h 2 = 0 . 3 and ρ = 0 . 6 . The only exceptions are the conditions with ρ = 0 (i.e., no spatial correlation) that resulted in consistently null improvements. Relatively lower gains were observed within half-sib and full-sib families compared to structures with independent individuals. Specifically, the highest ODE% of 7.262 (0.031) among half-sib families occurred when h 2 = 0 . 1 and ρ = 0 . 6 which was also the case among full-sib families that recorded the highest ODE% of 5.004 (0.034) when h 2 = 0 . 1 and ρ = 0 . 6 . Also experiments with either half-sib or full-sib families appear to achieve higher reduction of average variance of treatment effects when the heritabilities are very low (i.e., h 2 = 0 . 1 ). In addition, for any given heritability level, often highest design improvement was achieved when the spatial correlation level was 0.6.
The results from the larger experimental Ω A ( 196 ) show that, on average, the overall highest ODE% was obtained when the experiments consisted of genetically unrelated individuals with h 2 = 0 . 1 and ρ = 0 . 9 (ODE% = 5.664, S.E. = 0.032). Also, among the genetically unrelated individuals, when h 2 = 0 . 3 , large improvements occurred when ρ = 0 . 9 yielding a ODE% of 4.559 (0.032). However, for h 2 = 0 . 6 , large design improvements (ODE% = 3.213, S.E. = 0.036) occurred when ρ = 0 . 6 . Considering the half-sib families, overall highest reduction in average variance of treatment effects of 4.834 (0.055) was observed when h 2 = 0 . 1 and ρ = 0 . 9 . Similarly, among the full-sib families, highest ODE% of 3.040 (0.033) was obtained when h 2 = 0 . 1 and ρ = 0 . 9 . These results indicate that an experiment with strong spatial correlations and with very low heritabilities may have considerable design improvements over experiments with high heritabilities and low spatial correlations.
The small experiments Ω D ( 30 ) had the highest reduction in volume of the hypersphere (ODE% = 6.910, S.E. = 0.039) obtained when h 2 = 0 . 1 and ρ = 0 . 9 among the genetically unrelated individuals. Similarly, the highest ODE% among half-sib family was 3.943 (0.024) obtained when h 2 = 0 . 1 and ρ = 0 . 9 . Experiments with full-sib families recorded highest design efficiencies of 3.114 (0.023) when h 2 = 0 . 3 and ρ = 0 . 6 . In general, for the same conditions ODE% values from D-optimality resulted in lower gains compared to A-optimality reflecting the nature of these optimization criteria. However, a Pearson’s product-moment correlation of 0 . 98 between these criteria for Ω A ( 30 ) and Ω D ( 30 ) was obtained.
The number of successful swaps from the p = 5000 iterations for each condition and scenario were also monitored. The mean number of successful swaps for independent, half-sib and full-sib families in the Ω A ( 30 ) scenario across all conditions were: 139 (range 96–208), 150 (93–244) and 178 (97–283). respectively. The average successful swaps under Ω D ( 30 ) scenario were 185 (144–234), 190 (147–257) and 199 (131–260) for independent, half-sib and full-sib families, respectively. In contrast, under Ω A ( 196 ) the successful swaps for the same genetic structures were larger, with values of 894 (830–959), 950 (828–1144) and 1024 (844–1281), respectively. In general, it was noted that higher number of successful swaps were obtained when the treatments had lower heritabilities and spatial correlation values.

3.2. Design Efficiencies for IB, AB and UR

A summary of average ODE% for IB, AB and UR designs from each of the evaluated experimental conditions are given in Table 2 and Figure 2. IB designs achieved a highest average ODE% of 10.250 (S.E. = 0.274) when h 2 = 0 . 1 and ρ = 0 . 9 . For designs with h 2 of 0.3 and 0.6, mean highest reduction in average variance of treatment effects, with ODE% of 8.543 (0.139) and 6.854 (0.122), respectively, were obtained at ρ = 0 . 6 . For a given spatial correlation level, individual ODE% among full-sib families decrease with increasing heritability as shown in Figure 2 with average highest ODE% obtained when the spatial correlation was 0.6 (Table 2). Among half-sib families, a highest ODE% of 6.824 (0.163) was obtained when h 2 = 0 . 1 at ρ = 0 . 6 . For heritabilities of 0.3 and 0.6, mean highest ODE% of 6.413 (0.180) and 4.337 (0.087) were obtained when ρ = 0 . 6 , respectively. Improvements for IB designs with full-sib families were ODE% of 5.082 (0.089), 3.511 (0.098) and 1.833 (0.047) for h 2 = 0 . 1 , 0 . 3 and 0 . 6 , respectively, all of them obtained when ρ = 0 . 6 .
Results from the AB designs achieved the highest reduction of average variance of treatment effects among full-sib families when h 2 = 0 . 1 and ρ = 0 . 6 , yielding an ODE% of 3.752 (0.093). Among the full-sib families, designs with heritabilities of 0.3 and 0.6 obtained their highest ODE% of 2.939 (0.092) and 2.387 (0.083) at spatial correlations of 0.6. As with the IB designs, the ODE% increased with increasing spatial correlation from ρ = 0 . 0 to 0 . 6 , and then decreased at a spatial correlation of 0.9 (see Table 2). For generated AB designs with genetically unrelated individuals, a trend was observed of highest design improvements achieved when ρ = 0 . 9 at any levels of heritabilities with the highest value ODE% of 1.881 (0.120) obtained when h 2 = 0 . 6 and ρ = 0 . 9 . Similarly, an ODE% of 2.002 (0.111) was the highest improvement achieved among half-sib families when h 2 = 0 . 6 and ρ = 0 . 9 . Figure 2 shows individual ODE% values with a tendency to increase with larger heritability and spatial correlation among genetically unrelated individuals, and it decreased with increasing heritability among half-sib and full-sib families, except when h 2 = 0 . 6 and ρ = 0 . 9 among half-sib families.
Unequally replicated designs yielded, a highest average improvement level of 9.348 (0.190) achieved when h 2 = 0 . 3 and ρ = 0 . 6 observed among genetically unrelated individuals. At h 2 = 0 . 1 , the highest average design improvement of 9.283 (0.184) was observed at a spatial correlation of 0.6. In addition, when h 2 = 0 . 6 , the highest average ODE% of 6.207 (0.110) was obtained at ρ = 0 . 6 among the genetically unrelated individuals. For a given heritability, among all genetic structures, ODE% increase with increasing spatial correlations up to ρ = 0 . 6 and then dropped as spatial correlation increases to 0 . 9 . In general, for a specified heritability and spatial correlation level, ODE% values appear to decrease as genetic relationships increases (see Table 2). Also, for any given heritability, lower ODE% values were observed when spatial correlations were null ( ρ = 0 . 0 ) and in some conditions, when ρ = 0 . 9 with h 2 = 0 . 6 .

3.3. Analysis of Simulated Data for RCB Designs

A summary of the results obtained from the analysis of simulated data are presented in Table 3 and Table 4 based on initial (i.e., un-improved) and final (i.e., improved) designs. These contain prediction accuracies for the estimation of genetic effects and heritabilities by fitting a LMM for the RCB design with and without a 2-dimensional separable spatial correlation structure denoted by Model 2 and 1, respectively.
Considering results from the improved designs and for Ω A ( 30 ) scenario under Model 2 (Equation (3)), prediction accuracies of genetic effects among half-sib and full-sib families were found to be very high with r g = 0 . 984 and 0.981, respectively, obtained when h 2 = ρ = 0 . 6 . Similarly, Ω D ( 30 ) scenario under Model 2 resulted in strong prediction accuracies of genetic effects among half-sib and full-sib families with r g = 0 . 983 and 0.982, respectively, also obtained when h 2 = ρ = 0 . 6 . In all cases, Pearson’s correlation coefficients from the no-spatial model (Model 1) were lower than those from Model 2 but still very strong. For instance, prediction accuracies for Ω A ( 30 ) based on Models 1 and 2 among half-sib families were r g = 0 . 943 and r g = 0 . 984 , respectively, obtained when ρ = h 2 = 0 . 6 . As expected, the lowest predictive ability under each scenario was found when layouts had the lowest spatial and heritability values.
Results from Ω A ( 196 ) scenario (Table 4) had much stronger predictive ability of r g = 0 . 990 for both half-sib and full-sib families when h 2 = ρ = 0 . 6 . The lowest prediction accuracy occurred when h 2 = 0 . 1 and ρ = 0 . 3 resulting in r g values of 0.860 and 0.837 for half-sib and full-sib families, respectively. As expected, for the three scenarios ( Ω A ( 30 ) , Ω A ( 196 ) and Ω D ( 30 ) ), for a given level of spatial correlation, r g increased with increasing h 2 and similarly, for a given h 2 value r g increased with ρ .
Precision of the estimated heritabilities, measured using coefficient of variation (C.V. %), was found to be largest for smallest h 2 values, decreasing with increasing h 2 in both half-sib and full-sib families. In addition, C.V. % of heritability decreased considerably with increasing ρ values. Conditions with smaller heritability values were relatively more variable, presenting higher C.V. % than for those with larger heritability. For a given heritability value, prediction accuracies increased with increase in spatial correlations only for the spatial correlation model (i.e., Model 2). The C.V. % of heritability for Ω A ( 196 ) were notably smaller than for Ω A ( 30 ) and Ω D ( 30 ) scenarios, a result likely due to the larger number of experimental units used.
Kernel densities of estimated heritabilities for Model 2 based on Ω A ( 196 ) scenario are shown in Figure 3 for selected conditions. Here, comparisons of estimated heritabilities between the initial and improved designs for full-sib and half-sib families are presented, which clearly indicates that final designs provide greater precisions (i.e., narrower distributions) than initial designs in estimating heritabilities.

4. Discussion

The primary aim for this study was to focus on algorithms and statistical procedures that can be useful on the generation of optimal field designs with correlated observations and to assess their efficiency considering a wide spectrum of typical field conditions. Although we demonstrate how to simulate data from a RCB design in Section 2.5, this was only secondary to the generation of optimal designs and was necessary to show how we simulated the data for analysis and evaluate the accuracy and precision of genetic effects and heritabilties that were analyzed using ASReml-R v. 3.0 [13]. Further details on simulating data for other complex designs have been discussed by other studies [5].
Different experimental designs, and varying environmental and genetic conditions have a strong influence on the efficiency of field experiments. In this study, it was shown that, by the use of a linear mixed model framework to account for genetic relatedness and spatial correlation, it is possible to achieve important efficiency gains for the generation of experimental designs. Unlike other studies that have discussed optimality procedures by fitting mainly fixed effects models (e.g., [20,23]), the implemented procedure reported here provides with results that are practical for an array of field trial scenarios commonly used in plant studies.
For RCB designs, under the absence of spatial correlations (i.e., ρ = 0 . 0 ), regardless of the level of heritability and genetic structures, there are no gains in optimizing designs using the implemented swap algorithm. In contrast, for IB, UR and AB designs, there were some moderate gains achieved for the evaluated conditions. This is expected due to the balanced and orthogonal nature of the RCB designs, whereas for the other designs rearrangement of the experimental units (for example, between different incomplete blocks), yields to better treatment pairwise comparisons. However, zero spatial correlation might not be a reasonable assumption, as in practice for any field trial there is always some level of spatial correlations. Nevertheless, under non-zero spatial correlations, all scenarios showed considerable improvements, particularly for the independent genetic structure. However, the positive effect of strong spatial correlations tends to diminish as the level of genetic relatedness between the treatments increases.
The input required for the generation of an improved experimental design under the proposed pairwise swap algorithm, includes a choice of number of iterations and optimality criterion. Figure 4 presents an illustration for a RCB design based on h 2 = 0 . 3 and ρ = 0 . 6 for genetically unrelated individuals, half-sib and full-sib families for Ω A ( 30 ) , Ω D ( 30 ) and Ω A ( 196 ) scenarios based on p = 50,000 iterations. For small RCB designs ( Ω A ( 30 ) and Ω D ( 30 ) ), the rate of improvement was high until 5000 to 10,000 iterations and flattens out thereafter. The large RCB design ( Ω A ( 196 ) ) showed that there was room for improvement even after 50,000 iterations, a result that is expected from experiments with large number of treatments [20]. Interestingly, there was a small number of successful swaps. For example, with Ω A ( 30 ) scenario, for independent, half-sib and full-sib families, these reached only 178, 172 and 198 cases, with ODE% of 9.67, 7.00, and 3.52, respectively. In contrast for Ω A ( 196 ) , the successful swaps were 2589, 2685 and 2616, for the same genetic structures. Nevertheless, using 5000 iterations provides with a useful stopping criteria that limits the time required to run the optimizing procedure.
The decision between A- and D-optimality criteria depends on the aim, where the A-optimality criterion minimizes the average treatment effect variance and D-optimality criterion minimizes the generalized variance of the treatment effects, hence it considers covariances between pairs of treatment effects. Nevertheless, in this study, a strong Pearson’s product-moment correlation of 0 . 98 between A- and D-optimality criteria was detected. This is not completely unexpected as both criteria are a convex function of the eigenvalues of the information matrix [22,23].
Several measures of design efficiency can be evaluated. In this study, overall design efficiency (ODE%) was used. Other measures have been proposed, such as the average efficiency factor (a.e.f.) [20] that compares the average pairwise variance of a given design layout against an RCB design with the same number of treatments and replicates. Nevertheless, a.e.f is only defined for designs where treatment is considered a fixed effect.
For all conditions evaluated the current study has shown that, for RCB designs based on A-optimality criterion, gains of up to 8.7% can be achieved. Both small and large experiments with half-sib and full-sib families can achieve greater improvements under low heritability levels and moderate spatial correlation. Filho and Gilmour [21] also reported larger improvements on those layouts with genetically unrelated individuals against those that accounted for genetic relatedness; however, their residuals were assumed independent.
For the data simulated for an RCB design, high prediction accuracies (≥0.90) of genetic additive effects were obtained from both initial and improved experimental layouts from models with and without spatial correlations. As expected, better prediction accuracies were found for the model that incorporated spatial correlations, particularly for the improved designs. This result is similar to the findings reported by [5]. The current study has found that estimation of heritabilities was more accurate under improved designs for both models. Also, as expected, genotype effects estimated under layouts with small heritability values will exhibit larger C.V. % compared to layouts with large heritability values, and when the spatial correlations are low, the C.V. % for estimated heritability is larger, and vice-versa. The prediction accuracies of genetic values from larger experiments is higher compared to that from small experiments, but similar trends are observed as the accuracy increases with increasing levels of heritabilities.
In the case of more complex designs, such as IB, AB and UR, the highest average variance reduction of treatment effects of about 8.54%, 1.03% and 9.35%, respectively, were achieved when conducted with genetically unrelated individuals with heritabilities of 0 . 1 and spatial correlations of 0 . 6 . As indicated earlier, unlike RCB designs, it is still possible for these designs to improve their efficiencies even when spatial correlation is zero. It is expected that the improvement of the generation of these designs will require large number of iterations as a results of their unbalanced nature.
The proposed swap algorithm and procedures used here can be extended to include other experimental designs, genetic structures, and spatial variance-covariance structures (such as the ones described by [12,13,14,37,38]). In addition, instead of using pedigree information to calculate a numerator relationship matrix, molecular markers can be used to calculate a genomic relationship matrix [17]. The use of the relationship matrix within the mixed model framework is limited to only describing the additive genetic effects. However, the linear models can be extended with other matrices to consider all genetic effects, such as dominance and epistatis.
A similar algorithm to the one proposed here for the generation of experiments was developed by [4], which also deals with correlated effects and residuals using a different optimization approach. However, it is expected that these two algorithms may result in nearly identical optimal solutions, given the similarity of these methods. Nevertheless, in the present study, findings about generation of experiments under an array of varying conditions apply to any algorithm that generates an optimal design, and this should assist on the use of these tools.
In order to generate field experiments, there is a need for known values of the different variance components (e.g., spatial correlation). This could be a limitation of the proposed algorithm; however, for most mature breeding programs there is always some level of information about what to expect on a field experiments. For example, some fields are dominated by patches (i.e., high spatial correlations) others are dominated by gradient. In addition, usually the same farms are used repeatedly to establish breeding trials, and pedigree information often is available to be used to calculate relationship matrices that can be incorporated in a design to be generated. The present study has evaluated an array of varying typical field conditions that can be used; hence, it is possible to asses and select the most appropriate condition to generate a desired design. Finally, even if a generated design is not fully optimum for its field conditions, it is likely to be always better than a design that ignores any type of spatial and/or genetic correlations. More complex designs than RCB might be required when conducting experiments with large number of treatment levels as is common on breeding experiments. Thus, it is advisable in these cases to use more complex layouts such as IB designs.
This study has demonstrated how to generate experiments with random initial arrangements and optimize them considering both spatial and/or genetic relatedness. When designing complex field experiments, it is important to start with an optimal design. Generating an optimal experiment assuming all parameters are fixed with no sort of correlations has been implemented by many other studies such as CycDesigN [20]. A two step approach can be used where after the initial design is generated assuming no spatial or genetic correlations, then the second stage superimposes these realistic conditions. The strength of our method is that our algorithm can take any design and optimize it. The R-codes developed allow the user to input any initial design, or generate the design within our software. If the initial design is generated within this software, our current approach starts with a random design and models these correlations simultaneously in one step. The two-step approach was not implemented and can be viewed as a limitation of the study, but can be implemented in the future. For large experiments, long computation time were required to generate improved designs using a desktop computer. For example, the time taken for Ω A ( 30 ) scenario for 5000 iterations, was ∼2 min and for Ω A ( 196 ) scenario it took ∼30 min. Improved computation can be implemented with faster code and software to enable evaluation of experiments with large numbers of treatments. Additionally, other stochastic search algorithms, including simulated annealing, could yield potentially faster optimizations when compared to the pairwise swap algorithm proposed here.

5. Conclusions

This study has demonstrated that simultaneous considerations for genetic structure and spatial correlations can be incorporated to generate better experimental designs with important improvements in relative design efficiency and prediction accuracies of random treatment effects. Also, for RCB, IB, AB and UR designs higher relative design efficiencies are achievable from genetically unrelated individuals compared to experiments with half-sib and full-sib families. For RCB designs with half-sib or full-sib families, the optimization procedure may yield to important improvements under the presence of mild to strong spatial correlation levels and relatively low heritability values. As expected, accuracy of prediction of genetic values improves as levels of heritability and spatial correlations increases.
References

Acknowledgments

The authors would like to thank The University of Florida, Institute of Food and Agricultural Sciences (UF/IFAS) for funding the study as part of a Ph.D. Thesis for Lazarus K. Mramba.

Author Contributions

Lazarus Mramba and Salvador Gezan conceived and designed the experiments and wrote the paper; Lazarus Mramba performed the experiments and analyzed the data; Gary Peter and Vance Whitaker critically reviewed the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yates, F. The recovery of inter-block information in varietal trials arranged in three dimensional lattices. Ann. Eugen. 1939, 9, 136–156. [Google Scholar] [CrossRef]
  2. Welham, S.J.; Gezan, S.A.; Clark, S.J.; Mead, A. Statistical Methods in Biology; Chapman & Hall: London, UK, 2015. [Google Scholar]
  3. Möhring, J. Mixed Modelling for Phenotypic Data From Plant Breeding. Ph.D. Thesis, Institut für Kulturpflanzenwissenschaften, Universität Hohenheim, Stuttgart, Germany, 2010. [Google Scholar]
  4. Butler, D.G.; Smith, A.B.; Cullis, B.R. On the Design of Field Experiments with Correlated Treatment Effects. J. Agric. Biol. Environ. Stat. 2014, 19, 539–555. [Google Scholar] [CrossRef]
  5. Gezan, S.A.; White, T.L.; Huber, D.A. Accounting for spatial variability in breeding trials: A simulation study. Agronomy 2010, 102, 1562–1571. [Google Scholar] [CrossRef]
  6. Kuehl, R.O. Design of Experiments: Statistical Principles of Research Design and Analysis, 2nd ed.; Brooks/Cole, Cengage Learning: Boston, MA, USA, 2000. [Google Scholar]
  7. Federer, W.T. Augmented (or hoonuiaku) designs. Hawaii. Plant. Rec. 1956, 55, 191–208. [Google Scholar]
  8. Federer, W.T.; Raghavarao, D. On augmented designs. Biometrics 1975, 31, 29–35. [Google Scholar] [CrossRef]
  9. Federer, W.T. Recovery of interblock, intergradient, and intervarietal information in incomplete block and lattice rectangle designed experiments. Biometrics 1998, 54, 471–481. [Google Scholar] [CrossRef]
  10. Cullis, B.R.; Smith, A.B.; Coombes, N.E. On the design of early generation variety trials with correlated data. J. Agric. Biol. Environ. Stat. 2006, 11, 381–393. [Google Scholar] [CrossRef]
  11. William, E.; Piepho, H.P.; Whitaker, D. Augmented p-rep designs. Biom. J. 2011, 53, 19–27. [Google Scholar] [CrossRef] [PubMed]
  12. Stroup, W.W. Generalized Linear Mixed Models, Modern Concepts, Methods and Applications; Chapman & Hall: London, UK, 2013. [Google Scholar]
  13. Gilmour, A.R.; Gogel, B.J.; Cullis, B.R.; Thompson, R. ASReml User Guide Release 3.0; VSN International Ltd.: Hemel Hempstead, UK, 2009. [Google Scholar]
  14. Littell, R.C.; Milliken, G.A.; Stroup, W.W.; Wolfinger, R.D.; Schabenberger, O. SAS for Mixed Models, 2nd ed.; SAS Institute Inc.: Hong Kong, China, 2006. [Google Scholar]
  15. Butler, D.G.; Eccleston, J.A.; Cullis, B.R. On an approximate optimality criterion for the design of field experiments under spatial dependence. Aust. N. Z. J. Stat. 2008, 50, 295–307. [Google Scholar] [CrossRef]
  16. Falconer, D.S.; Mackay, T.F.C. Introduction to Quantitative Genetics, 4th ed.; Longman Group Ltd.: Harlow, UK, 1996. [Google Scholar]
  17. VanRaden, P.M. Efficient methods to compute genomic predictions. J. Dairy Sci. 2008, 91, 4414–4423. [Google Scholar] [CrossRef] [PubMed]
  18. Henderson, C.R. Use of all relatives in intraherd prediction of breeding values and producing abilities. Dairy Sci. 1975, 58, 1910–1916. [Google Scholar] [CrossRef]
  19. Gonçalves, E.; St. Aubyn, A.; Martins, A. Mixed spatial models for data analysis of yield on large grapevine selection field trials. Theor. Appl. Genet. 2007, 115, 653–663. [Google Scholar]
  20. John, J.A.; Williams, E.R. Cyclic and Computer Generated Designs, 2nd ed.; Chapman and Hall: London, UK, 1995. [Google Scholar]
  21. Filho, J.S.B.; Gilmour, S.G. Planning incomplete block experiments when treatments are genetically related. Biometrics 2003, 59, 375–381. [Google Scholar] [CrossRef]
  22. Kuhfeld, W.F. Experimental Design: Effiency, Coding, and Choice Designs; Technical Report; SAS: Solna Municipality, Sweden, 2010. [Google Scholar]
  23. Das, A. An introduction to optimality criteria and some results on optimal block design. In Design Workshop Lecture Notes; Indian Statistical Institute: Kolkata, India; Theoretical Statistics and Mathematics Unit: New Delhi, India, 2002; pp. 1–21. [Google Scholar]
  24. Hooks, T.; Marx, D.; Kachman, S.; Pedersen, J. Optimality criteria for models with random effects. Rev. Colomb. Estad. 2009, 32, 17–31. [Google Scholar]
  25. Piepho, H.P. Generating efficient designs for comparative experiments using the SAS procedure OPTEX. Commun. Biom. Crop Sci. 2015, 10, 96–114. [Google Scholar]
  26. Williams, E.; Piepho, H.P. Optimality and contrasts in block designs with unequal treatment replications. Aust. N. Z. J. Stat. 2015, 57, 203–209. [Google Scholar] [CrossRef]
  27. Chernoff, H. Locally optimal designs for estimating parameters. Ann. Math. Stat. 1953, 24, 586–602. [Google Scholar] [CrossRef]
  28. Wald, A. On the efficient design of statistical investigations. Ann. Math. Stat. 1943, 14, 134–140. [Google Scholar] [CrossRef]
  29. Kiefer, J. Optimum experimental design. J. R. Stat. Soc. 1959, 21, 272–319. [Google Scholar]
  30. Kiefer, J.; Wolfowitz, J. Optimum designs in regression problems. Ann. Math. Stat. 1959, 30, 271–294. [Google Scholar] [CrossRef]
  31. Mandal, S. Construction of Optimizing Distributions with Applications in Estimation and Optimal Design. Ph.D. Dissertation, University of Glasgow, Glasgow, UK, 2000. [Google Scholar]
  32. Yang, M. A-optimal designs for generalized linear models with two parameters. J. Stat. Plan. Inference 2008, 138, 624–641. [Google Scholar] [CrossRef]
  33. Stringer, J.K.; Cullis, B.R. Application of spatial analysis techniques to adjust for fertility trends and identify interplot competition in early sugarcane selection trials. Aust. J. Agric. Res. 2002, 53, 911–918. [Google Scholar] [CrossRef]
  34. Patterson, H.D.; Thompson, R. Recovery of inter-block information when block sizes are unequal. Biometrika 1971, 58, 545–554. [Google Scholar] [CrossRef]
  35. Henderson, C.R. The estimation of genetic parameters. Ann. Math. Stat. 1950, 21, 309–310. [Google Scholar]
  36. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2016. [Google Scholar]
  37. Cressie, N.A.C. Statistics for Spatial Data, Revised ed.; John Wiley & Sons Inc.: Hoboken, NJ, USA, 1993. [Google Scholar]
  38. Zuur, A.F.; Ieno, E.N.; Walker, N.J.; Saveliev, A.A.; Smith, G.M. Mixed Effects Models and Extensions in Ecology with R; Springer: New York, NY, USA, 2009. [Google Scholar]
Figure 1. Initial Design Efficiency (IDE) and Overall Design Efficiency (ODE) for RCB designs with 30 genotypes generated using A-optimality criterion Ω A ( 30 ) and D-optimality criterion Ω D ( 30 ) and for 196 genotypes generated using A-optimality criterion Ω A ( 196 ) . All designs were evaluated with λ = 10 replicates per condition and iterated p = 5000 times to improve experimental layouts.Initial and overall design efficiency summary statistics based on RCB designs
Figure 1. Initial Design Efficiency (IDE) and Overall Design Efficiency (ODE) for RCB designs with 30 genotypes generated using A-optimality criterion Ω A ( 30 ) and D-optimality criterion Ω D ( 30 ) and for 196 genotypes generated using A-optimality criterion Ω A ( 196 ) . All designs were evaluated with λ = 10 replicates per condition and iterated p = 5000 times to improve experimental layouts.Initial and overall design efficiency summary statistics based on RCB designs
Agronomy 08 00040 g001
Figure 2. Individual ODE% for incomplete block (IB), augmented block (AB) and unequally replicated (UR) designs. Effective ODE% for IB was obtained after an initial design was subjected to random swapping of genotypes, first across blocks, then either within or across blocks. In contrast, effective ODE% for UR design was obtained after an initial design was subjected to first, random replacement of genotypes using a list of constraints and second, swapping of genotypes within blocks.Individual effective ODE% for IB, AB and UR designs
Figure 2. Individual ODE% for incomplete block (IB), augmented block (AB) and unequally replicated (UR) designs. Effective ODE% for IB was obtained after an initial design was subjected to random swapping of genotypes, first across blocks, then either within or across blocks. In contrast, effective ODE% for UR design was obtained after an initial design was subjected to first, random replacement of genotypes using a list of constraints and second, swapping of genotypes within blocks.Individual effective ODE% for IB, AB and UR designs
Agronomy 08 00040 g002
Figure 3. Kernel densities for estimated heritabilities. The vertical line represents the true heritability. Both initial and improved designs for different heritabilities and spatial correlations are presented based on Ω A ( 196 ) . Each condition had λ = 50 replicates each iterated p = 5000 times.Kernel densities for estimated heritabilities
Figure 3. Kernel densities for estimated heritabilities. The vertical line represents the true heritability. Both initial and improved designs for different heritabilities and spatial correlations are presented based on Ω A ( 196 ) . Each condition had λ = 50 replicates each iterated p = 5000 times.Kernel densities for estimated heritabilities
Agronomy 08 00040 g003
Figure 4. Design improvement trends for scenarios (a) Ω A ( 30 ) , (b) Ω D ( 30 ) , and (c) Ω A ( 196 ) based on h 2 = 0 . 3 and ρ = 0 . 6 for genetically unrelated individuals, half-sib and full-sib families, displaying successful traces and determinants during the optimization process based on p = 50 , 000 iterations. Scenarios Ω A ( 30 ) and Ω D ( 30 ) were generated with 6 blocks of dimensions 5 rows by 6 columns whereas Ω A ( 196 ) had 16 blocks of dimensions 14 rows by 14 columns.Motivating example showing rate of design improvement with time
Figure 4. Design improvement trends for scenarios (a) Ω A ( 30 ) , (b) Ω D ( 30 ) , and (c) Ω A ( 196 ) based on h 2 = 0 . 3 and ρ = 0 . 6 for genetically unrelated individuals, half-sib and full-sib families, displaying successful traces and determinants during the optimization process based on p = 50 , 000 iterations. Scenarios Ω A ( 30 ) and Ω D ( 30 ) were generated with 6 blocks of dimensions 5 rows by 6 columns whereas Ω A ( 196 ) had 16 blocks of dimensions 14 rows by 14 columns.Motivating example showing rate of design improvement with time
Agronomy 08 00040 g004
Table 1. Summary statistics for Initial Design Efficiency (IDE%) and Overall Design Efficiency (ODE%) for randomized complete block (RCB) designs with 30 genotypes generated using A-optimality criterion Ω A ( 30 ) and D-optimality criterion Ω D ( 30 ) and for 196 genotypes generated using A-optimality criterion Ω A ( 196 ) . All designs were evaluated with λ = 10 replicates per condition and iterated p = 5000 times to improve the experimental layouts. ODE mean values that are starred ( ) are the overall largest improvements per combination of pedigree and heritability conditions.
Table 1. Summary statistics for Initial Design Efficiency (IDE%) and Overall Design Efficiency (ODE%) for randomized complete block (RCB) designs with 30 genotypes generated using A-optimality criterion Ω A ( 30 ) and D-optimality criterion Ω D ( 30 ) and for 196 genotypes generated using A-optimality criterion Ω A ( 196 ) . All designs were evaluated with λ = 10 replicates per condition and iterated p = 5000 times to improve the experimental layouts. ODE mean values that are starred ( ) are the overall largest improvements per combination of pedigree and heritability conditions.
ConditionDesign Ω A ( 30 ) Design Ω A ( 196 ) Design Ω D ( 30 )
Pedigree h 2 ρ IDE%S.E.ODE%S.E.IDE%S.E.ODE%S.E.IDE%S.E.ODE%S.E.
Indep0.10.10.1200.0020.3050.0020.0040.0010.0170.0000.0460.0010.1240.001
0.30.4900.0152.1590.0200.0170.0020.2030.0050.1710.0060.8450.007
0.61.6360.0877.9750.0750.1020.0101.6020.0190.4170.0132.6270.022
0.91.5660.0786.9480.0620.3820.0625.664 0.0321.6950.0926.910 0.039
0.30.10.2110.0050.4810.0030.0080.0010.0420.0010.0930.0030.2360.002
0.30.7920.0383.0380.0160.0460.0080.5070.0070.3200.0071.4180.012
0.62.0430.0848.739 0.0650.1400.0162.7670.0310.5410.0193.2720.017
0.90.6380.0262.6700.0180.2700.0354.5590.0320.5860.0222.6720.026
0.60.10.2270.0060.4830.0050.0130.0020.0630.0030.0980.0010.2490.002
0.30.7950.0362.9780.0290.0720.0080.6830.0090.3220.0141.4340.011
0.61.4350.0516.0250.0710.2530.0273.2130.0360.5240.0213.0410.024
0.90.2030.0090.8640.0080.1700.0262.5150.0140.2060.0090.8680.008
Half-sib0.10.10.2430.0110.9080.0030.0240.0030.1740.0030.0510.0020.2020.002
0.30.6890.0163.3100.0270.0360.0060.8050.0060.1660.0060.9690.007
0.61.3350.0377.262 0.0310.1480.0262.0150.0140.4330.0242.6010.021
0.90.9280.0403.8480.0400.3410.0354.834 0.0550.9340.0243.943 0.024
0.30.10.2180.0060.6860.0030.0160.0020.1440.0020.0920.0030.2830.001
0.30.6430.0212.8980.0280.0510.0070.7040.0070.2830.0071.4300.010
0.61.4270.0566.4090.0480.1380.0162.6460.0410.5490.0193.2770.016
0.90.3040.0211.2870.0100.1580.0293.1610.0290.3190.0151.3070.012
0.60.10.1680.0050.4160.0040.0100.0010.0940.0020.1010.0030.2650.003
0.30.5550.0252.0540.0100.0640.0060.7140.0100.3310.0141.4460.010
0.60.8410.0503.5300.0310.1590.0132.8460.0330.5730.0183.0810.017
0.90.1000.0050.4030.0040.0960.0111.4050.0120.0920.0050.3900.003
Full-sib0.10.10.2620.0171.9550.0150.0390.0050.6230.0070.0560.0020.3580.003
0.30.6750.0194.4850.0390.1150.0181.9440.0150.1900.0111.1730.007
0.61.0640.0475.004 0.0340.1160.0182.8290.0320.3620.0102.4190.016
0.90.3450.0111.5660.0140.2040.0313.040 0.0330.3710.0131.5290.010
0.30.10.1660.0070.8990.0040.0230.0030.4670.0030.0910.0040.3980.001
0.30.4570.0192.1530.0230.0870.0121.2590.0070.2570.0101.4580.013
0.60.6710.0313.1280.0310.1260.0142.2450.0100.5400.0263.114 0.023
0.90.1080.0050.4650.0050.1030.0131.4830.0120.1130.0050.4580.005
0.60.10.0870.0030.2710.0020.0140.0020.2040.0010.0940.0020.2840.004
0.30.2560.0061.0160.0110.0570.0100.6980.0050.3070.0121.4180.010
0.60.3460.0171.4210.0130.1450.0231.8960.0140.5530.0263.0290.018
0.90.0340.0020.1380.0010.0360.0050.5120.0040.0320.0020.1380.002
Table 2. Summary of average ODE% and their standard errors (S.E) for incomplete block (IB), unequally replicated (UR) and augmented block (AB) designs with varying genetic relatedness, spatial correlations ( ρ ) and narrow-sense heritability ( h 2 ) based on m = 100 initial designs, λ = 10 replicates per condition and p = 5000 iterations.
Table 2. Summary of average ODE% and their standard errors (S.E) for incomplete block (IB), unequally replicated (UR) and augmented block (AB) designs with varying genetic relatedness, spatial correlations ( ρ ) and narrow-sense heritability ( h 2 ) based on m = 100 initial designs, λ = 10 replicates per condition and p = 5000 iterations.
ConditionIndepHalf-SibFull-Sib
h 2 ρ ODE%S.E.ODE%S.E.ODE%S.E.
IB designs
0.10.00.3170.0090.4680.0110.4210.012
0.31.8700.0522.6570.0603.4810.106
0.66.6630.1936.8240.1635.0820.089
0.910.2500.2745.9190.1382.6410.053
0.30.00.7110.0380.7330.0320.4760.014
0.32.9020.0502.9500.0632.2200.041
0.68.5430.1396.4130.1803.5110.098
0.94.6720.0902.3130.0380.8580.016
0.60.00.8720.0390.7090.0170.3710.020
0.33.3320.1052.4700.0841.2430.026
0.66.8540.1224.3370.0871.8330.047
0.91.6040.0360.7430.0100.2570.005
AB designs
0.10.00.6390.0000.8200.0001.5890.020
0.30.7330.0011.2350.0093.2740.049
0.60.9220.0061.7650.0183.7520.093
0.91.0900.0281.4770.0372.0020.061
0.30.01.0700.0000.9220.0001.7400.101
0.31.0740.0021.0950.0092.5460.088
0.61.0250.0051.4330.0152.9390.092
0.91.2750.0621.4460.0341.4740.056
0.60.01.1130.0000.5060.0021.3330.090
0.31.0860.0020.6460.0061.7180.101
0.61.0920.0061.1910.0182.3870.083
0.91.8810.1202.0020.1111.1060.039
UR designs
0.10.00.8890.0250.7870.0280.5840.010
0.32.9680.0573.5330.0873.8460.056
0.69.2830.1847.6500.0854.9140.075
0.97.3620.1523.9850.0531.5690.028
0.30.01.9180.0391.4880.0370.7770.028
0.34.5660.0923.8700.0562.4380.056
0.69.3480.1906.6070.0853.2460.063
0.92.7470.0431.2900.0140.4720.006
0.60.02.1310.0421.4740.0300.6990.016
0.34.5620.1152.9200.0701.3670.038
0.66.2070.1103.4170.0701.3890.022
0.90.8790.0170.3900.0070.1380.001
Table 3. Summary statistics presented with Pearson’s correlation coefficient ( r g ) and estimated heritabilities ( h ^ 2 ) together with their coefficient of variation (C.V. %) for Ω A ( 30 ) and Ω D ( 30 ) scenarios. Each condition had λ = 50 replicates each iterated p = 5000 times. In Model 1, residuals were assumed to be uncorrelated and in Model 2, an A R 1 A R 1 spatial correlation structure was fitted.
Table 3. Summary statistics presented with Pearson’s correlation coefficient ( r g ) and estimated heritabilities ( h ^ 2 ) together with their coefficient of variation (C.V. %) for Ω A ( 30 ) and Ω D ( 30 ) scenarios. Each condition had λ = 50 replicates each iterated p = 5000 times. In Model 1, residuals were assumed to be uncorrelated and in Model 2, an A R 1 A R 1 spatial correlation structure was fitted.
Estimates under Ω A ( 30 ) Estimates under Ω D ( 30 )
ConditionsHalf-Sib FamilyFull-Sib FamilyHalf-Sib FamilyFull-Sib Family
ρ h 2 Design h ^ 2 C.V. % r g h ^ 2 C.V. % r g h ^ 2 C.V. % r g h ^ 2 C.V. % r g
Model 1
0.30.1Initial0.09156.9320.6260.10557.5510.6470.10759.4290.6380.12256.8070.675
Improved0.10757.9580.6250.10767.3720.6600.13049.9840.6660.12961.6480.682
0.3Initial0.30325.5730.8450.28829.8280.8300.30931.0970.8510.30731.8740.833
Improved0.30230.7830.8540.31129.9850.8270.31526.2230.850.30132.0120.839
0.6Initial0.60910.6610.9460.59214.290.9390.58113.0290.9420.57814.2260.932
Improved0.59411.5050.9430.57414.4250.9320.58215.9340.9410.57418.9430.923
0.60.1Initial0.11948.3210.6440.11761.0600.6180.12646.3900.6590.11461.0200.682
Improved0.15749.6960.6430.14249.6820.6480.12654.1870.6040.14252.2760.615
0.3Initial0.29130.6020.8520.31726.0240.8440.31526.4740.8590.30832.3480.832
Improved0.34125.8370.8440.31730.520.8090.36125.5280.8470.32628.8040.826
0.6Initial0.60212.0460.9500.61311.3830.9400.60113.7280.9450.60815.5180.939
Improved0.61412.7160.9430.61513.1010.9350.61810.3610.9400.63212.1770.937
Model 2
0.30.1Initial0.08855.3350.6850.10058.7240.6650.08861.3060.6860.10656.8990.694
Improved0.09260.2710.6700.09461.8360.6940.11149.2590.7160.09956.7630.707
0.3Initial0.28425.7880.8740.28229.2950.8570.30727.1170.8760.30229.8650.857
Improved0.29332.0260.8860.29527.450.8590.29923.4270.8830.29629.4350.873
0.6Initial0.59412.0370.9540.58214.9840.9490.57711.4510.9550.56014.1800.944
Improved0.58511.6280.9570.56214.3910.9490.56915.5350.9570.55719.7560.943
0.60.1Initial0.09138.8930.8300.08751.0230.7970.09535.5280.8240.08845.9760.802
Improved0.09935.6750.8580.08837.5220.8280.09036.5860.8440.08436.7360.832
0.3Initial0.26822.6970.9430.28423.1240.9380.27822.1750.9440.26725.2160.933
Improved0.29221.3670.9510.27520.0510.9370.29520.8270.9530.28223.9330.940
0.6Initial0.56513.2050.9820.57110.8410.9790.56513.7320.9810.56915.9870.977
Improved0.57113.5790.9840.57112.6370.9810.56017.9160.9830.57613.5230.982
Table 4. Summary statistics based on Model 2 with Pearson’s correlation coefficient ( r g ) and estimated heritabilties ( h ^ 2 ) together with their coefficient of variation (C.V. %) for Ω A ( 196 ) scenario. Each condition had λ = 50 replicates each iterated p = 5000 times. Residuals errors were modeled using A R 1 A R 1 spatial structure.
Table 4. Summary statistics based on Model 2 with Pearson’s correlation coefficient ( r g ) and estimated heritabilties ( h ^ 2 ) together with their coefficient of variation (C.V. %) for Ω A ( 196 ) scenario. Each condition had λ = 50 replicates each iterated p = 5000 times. Residuals errors were modeled using A R 1 A R 1 spatial structure.
ConditionsHalf-Sib FamilyFull-Sib Family
ρ h 2 Design h ^ 2 C.V. % r g h ^ 2 C.V. % r g
0.30.1Initial0.10014.2790.8550.10215.1520.839
Improved0.10214.3900.8600.10113.4280.837
0.3Initial0.3018.1250.9480.2959.2370.946
Improved0.3038.0110.9500.3067.0700.949
0.6Initial0.5934.1720.9830.6014.7270.984
Improved0.5954.8510.9830.5933.9560.983
0.60.1Initial0.10914.1690.9000.10711.1420.893
Improved0.10811.8130.9020.10613.0720.895
0.3Initial0.3216.9990.9670.3197.6690.969
Improved0.3147.6490.9690.3117.4580.969
0.6Initial0.6164.7840.9900.6143.4160.990
Improved0.6133.7460.9900.6094.2060.990

Share and Cite

MDPI and ACS Style

Mramba, L.K.; Peter, G.F.; Whitaker, V.M.; Gezan, S.A. Generating Improved Experimental Designs with Spatially and Genetically Correlated Observations Using Mixed Models. Agronomy 2018, 8, 40. https://0-doi-org.brum.beds.ac.uk/10.3390/agronomy8040040

AMA Style

Mramba LK, Peter GF, Whitaker VM, Gezan SA. Generating Improved Experimental Designs with Spatially and Genetically Correlated Observations Using Mixed Models. Agronomy. 2018; 8(4):40. https://0-doi-org.brum.beds.ac.uk/10.3390/agronomy8040040

Chicago/Turabian Style

Mramba, Lazarus K., Gary F. Peter, Vance M. Whitaker, and Salvador A. Gezan. 2018. "Generating Improved Experimental Designs with Spatially and Genetically Correlated Observations Using Mixed Models" Agronomy 8, no. 4: 40. https://0-doi-org.brum.beds.ac.uk/10.3390/agronomy8040040

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop