
Module 9: Best Linear Unbiased Prediction

– Purelines

– Single-crosses

*Best Linear Unbiased Prediction (BLUP)*

• Allows comparison of material from different populations evaluated in different environments

• Makes use of all performance data available for each genotype, and accounts for the fact that some genotypes have been more extensively tested than others

• Makes use of information about relatives in pedigree breeding systems

• Provides estimates of genetic variances from existing data in a breeding program without the use of mating designs

Bernardo, Chapt. 11

*BLUP History*

• Initially developed by C.R. Henderson in the 1940’s

• Most extensively used in animal breeding

• Used in crop improvement since the 1990’s, particularly in forestry

• BLUP is a general term that refers to two procedures

– true BLUP – the ‘P’ refers to prediction in random effects models (where there is a covariance structure)

– BLUE – the ‘E’ refers to estimation in fixed effect models (no covariance structure)

• “Best” means having minimum variance

• “Linear” means that the predictions or estimates are linear functions of the observations

• “Unbiased”

– expected value of estimates = their true value

– predictions have an expected value of zero (because genetic effects have a mean of zero)

*Regression in matrix notation*

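Linear model: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$

Parameter estimates: $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$

| Source | df | SS | MS |
|---|---|---|---|
| Regression | $p$ | $\mathbf{b}'\mathbf{X}'\mathbf{Y}$ | $MS_R$ |
| Residual | $n-p$ | $\mathbf{Y}'\mathbf{Y} - \mathbf{b}'\mathbf{X}'\mathbf{Y}$ | $MS_E$ |
| Total | $n$ | $\mathbf{Y}'\mathbf{Y}$ | |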
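The regression formulas above can be sketched in NumPy. This is only an illustration: the toy data (an intercept plus one predictor) are hypothetical and are not taken from the slides.

```python
import numpy as np

# Hypothetical toy data (not from the slides): intercept column plus one predictor
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
Y = np.array([2.1, 3.9, 6.2, 7.8])

n, p = X.shape

# b = (X'X)^-1 X'Y, solved as a linear system rather than by explicit inversion
b = np.linalg.solve(X.T @ X, X.T @ Y)

# Sums of squares as laid out in the ANOVA table
ss_regression = b @ X.T @ Y               # b'X'Y        (df = p)
ss_total = Y @ Y                          # Y'Y          (df = n)
ss_residual = ss_total - ss_regression    # Y'Y - b'X'Y  (df = n - p)

ms_regression = ss_regression / p
ms_residual = ss_residual / (n - p)
```

Note that these sums of squares are uncorrected for the mean, which is why the total has $n$ degrees of freedom.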

*BLUP Mixed Model in Matrix Notation*

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e}$$

where $\mathbf{Y}$ = observations, $\mathbf{X}$ and $\mathbf{Z}$ = design matrices, $\boldsymbol{\beta}$ = fixed effects, $\mathbf{u}$ = random effects, $\mathbf{e}$ = residual errors

Classification for the purposes of BLUP

• Fixed effects are constants

– overall mean

– environmental effects (mean across trials)

• Random effects have a covariance structure

– breeding values

– dominance deviations

– testcross effects

– general and specific combining ability effects

*BLUP for purelines – barley example*

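| Set of environments | Number of environments | Cultivar | Grain yield (t/ha) |
|---|---|---|---|
| Set 1 | 18 | Morex (1) | 4.45 |
| Set 1 | 18 | Robust (2) | 4.61 |
| Set 1 | 18 | Stander (4) | 5.27 |
| Set 2 | 9 | Robust (2) | 5.00 |
| Set 2 | 9 | Excel (3) | 5.82 |
| Set 2 | 9 | Stander (4) | 5.79 |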

Parameters to be estimated

• means for two sets of environments – fixed effects

– we are interested in knowing effects of these particular sets of environments

• breeding values of four cultivars – random effects

– from the same breeding population

– there is a covariance structure (cultivars are related)

Bernardo, pg 269

*Linear model for barley example*

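$$Y_{ij} = \mu + t_i + u_j + e_{ij}$$

$t_i$ = effect of $i$th set of environments

$u_j$ = effect of $j$th cultivar

**In matrix notation:** $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e}$

$$
\begin{pmatrix} 4.45 \\ 4.61 \\ 5.27 \\ 5.00 \\ 5.82 \\ 5.79 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}
+
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{pmatrix}
+
\begin{pmatrix} e_{11} \\ e_{12} \\ e_{14} \\ e_{22} \\ e_{23} \\ e_{24} \end{pmatrix}
$$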
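As a small sketch of how the design matrices above could be built from the data table, the following indexes an identity matrix by the set and cultivar codes. The variable names are illustrative choices, not part of the original example.

```python
import numpy as np

# Barley data from the table: yield, set of environments, cultivar
y = np.array([4.45, 4.61, 5.27, 5.00, 5.82, 5.79])
env_set = np.array([0, 0, 0, 1, 1, 1])    # 0 = Set 1, 1 = Set 2
cultivar = np.array([0, 1, 3, 1, 2, 3])   # 0 = Morex, 1 = Robust, 2 = Excel, 3 = Stander

# Each row of an incidence matrix has a single 1 marking the level observed
X = np.eye(2)[env_set]     # 6 x 2, fixed effects (sets of environments)
Z = np.eye(4)[cultivar]    # 6 x 4, random effects (cultivars)
```

Column sums of `Z` give the number of records per cultivar (Robust and Stander appear in both sets), which is the overlap that lets the two sets be compared.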

*Weighted regression*

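$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$

Where $\varepsilon_{ij} \sim N(0, \sigma^2)$

When $\varepsilon_{ij} \sim N(0, \mathbf{R}\sigma^2)$

Then $\mathbf{b} = (\mathbf{X}'\mathbf{R}^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{R}^{-1}\mathbf{Y}$

For the barley example

$$
\mathbf{R}^{-1} =
\begin{pmatrix}
18 & 0 & 0 & 0 & 0 & 0 \\
0 & 18 & 0 & 0 & 0 & 0 \\
0 & 0 & 18 & 0 & 0 & 0 \\
0 & 0 & 0 & 9 & 0 & 0 \\
0 & 0 & 0 & 0 & 9 & 0 \\
0 & 0 & 0 & 0 & 0 & 9
\end{pmatrix}
$$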
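A minimal sketch of the weighted estimator for the barley data, using only the fixed effects for the two sets of environments (no cultivar effects yet). The variable names are illustrative.

```python
import numpy as np

y = np.array([4.45, 4.61, 5.27, 5.00, 5.82, 5.79])
X = np.array([[1, 0], [1, 0], [1, 0],
              [0, 1], [0, 1], [0, 1]], dtype=float)

# R^-1 holds the number of environments behind each mean (18 or 9)
R_inv = np.diag([18.0, 18.0, 18.0, 9.0, 9.0, 9.0])

# Weighted estimate: b = (X'R^-1 X)^-1 X'R^-1 Y
b_weighted = np.linalg.solve(X.T @ R_inv @ X, X.T @ R_inv @ y)

# Unweighted estimate for comparison: b = (X'X)^-1 X'Y
b_unweighted = np.linalg.solve(X.T @ X, X.T @ y)
```

In this reduced model the two estimates coincide, because every observation within a set carries the same weight. The weights start to matter once the cultivar effects are added and the two sets share information through common cultivars.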

*Covariance structure of random effects*

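Coefficients of coancestry, $\theta_{XY}$

| | Morex | Robust | Excel | Stander |
|---|---|---|---|---|
| Morex | 1 | | | |
| Robust | 1/2 | 1 | | |
| Excel | 7/16 | 27/32 | 1 | |
| Stander | 11/32 | 43/64 | 91/128 | 1 |

Remember: the covariance between relatives X and Y is a function of $\sigma^2_A$ and $\sigma^2_D$, and the coefficient of $\sigma^2_A$ is $r = 2\theta_{XY}$

$$
\mathrm{Var}(\mathbf{u}) = \mathbf{A}\sigma^2_A, \qquad
\mathbf{A} =
\begin{pmatrix}
2 & 1 & 7/8 & 11/16 \\
1 & 2 & 27/16 & 43/32 \\
7/8 & 27/16 & 2 & 91/64 \\
11/16 & 43/32 & 91/64 & 2
\end{pmatrix}
$$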
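The relationship matrix can be obtained from the coancestry table by doubling every element, as in this short sketch (the name `theta` for the coancestry matrix is an assumption made for the code).

```python
import numpy as np

# Coefficients of coancestry; order: Morex, Robust, Excel, Stander
theta = np.array([
    [1.0,     1 / 2,   7 / 16,   11 / 32],
    [1 / 2,   1.0,     27 / 32,  43 / 64],
    [7 / 16,  27 / 32, 1.0,      91 / 128],
    [11 / 32, 43 / 64, 91 / 128, 1.0],
])

# Additive relationship matrix: each element is r = 2 * theta_XY
A = 2 * theta

# A must be invertible because the mixed model equations use A^-1
A_inv = np.linalg.inv(A)
```

The diagonal elements equal 2 because the cultivars are fully inbred (coancestry of a pureline with itself is 1).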

*Mixed Model Equations*

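$$
\begin{pmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{u}} \end{pmatrix}
=
\begin{pmatrix}
\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{A}^{-1}(\sigma^2_\varepsilon / \sigma^2_A)
\end{pmatrix}^{-1}
\begin{pmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{Y} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Y} \end{pmatrix}
$$

where the residuals have covariance matrix $\mathbf{R}\sigma^2_\varepsilon$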

• each matrix is composed of submatrices

• the algebra is the same

Calculations in Excel
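As an alternative to a spreadsheet, the mixed model equations for the barley example can be assembled and solved in NumPy. This is a sketch under one loud assumption: the variance ratio $\sigma^2_\varepsilon / \sigma^2_A$ is not stated in this section, so the value `var_ratio = 5.0` is a hypothetical placeholder and the output is not expected to reproduce the published estimates.

```python
import numpy as np

y = np.array([4.45, 4.61, 5.27, 5.00, 5.82, 5.79])
X = np.eye(2)[[0, 0, 0, 1, 1, 1]]          # sets of environments
Z = np.eye(4)[[0, 1, 3, 1, 2, 3]]          # Morex, Robust, Excel, Stander
R_inv = np.diag([18.0, 18.0, 18.0, 9.0, 9.0, 9.0])

theta = np.array([
    [1.0,     1 / 2,   7 / 16,   11 / 32],
    [1 / 2,   1.0,     27 / 32,  43 / 64],
    [7 / 16,  27 / 32, 1.0,      91 / 128],
    [11 / 32, 43 / 64, 91 / 128, 1.0],
])
A_inv = np.linalg.inv(2 * theta)

var_ratio = 5.0   # HYPOTHETICAL sigma2_e / sigma2_A; not given in the slides

# Coefficient matrix built from its four submatrices
C = np.block([
    [X.T @ R_inv @ X, X.T @ R_inv @ Z],
    [Z.T @ R_inv @ X, Z.T @ R_inv @ Z + A_inv * var_ratio],
])
rhs = np.concatenate([X.T @ R_inv @ y, Z.T @ R_inv @ y])

solution = np.linalg.solve(C, rhs)
b_hat = solution[:2]    # BLUEs of the two set means
u_hat = solution[2:]    # BLUPs of the four breeding values
```

Changing `var_ratio` shows how strongly the predictions are pulled toward zero: a larger ratio (lower heritability) gives more shrinkage.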

*Results from BLUP*

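Original data

| Set of environments | Number of environments | Cultivar | Grain yield (t/ha) |
|---|---|---|---|
| Set 1 | 18 | Morex | 4.45 |
| Set 1 | 18 | Robust | 4.61 |
| Set 1 | 18 | Stander | 5.27 |
| Set 2 | 9 | Robust | 5.00 |
| Set 2 | 9 | Excel | 5.82 |
| Set 2 | 9 | Stander | 5.79 |

BLUP estimates

For fixed effects, $b_1 = \mu + t_1$ and $b_2 = \mu + t_2$

| Parameter | Effect | Estimate |
|---|---|---|
| $b_1$ | Set 1 | 4.82 |
| $b_2$ | Set 2 | 5.41 |
| $u_1$ | Morex | -0.33 |
| $u_2$ | Robust | -0.17 |
| $u_3$ | Excel | 0.18 |
| $u_4$ | Stander | 0.36 |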

*Interpretation from BLUP*

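BLUP estimates

| Parameter | Effect | Estimate |
|---|---|---|
| $b_1$ | Set 1 | 4.82 |
| $b_2$ | Set 2 | 5.41 |
| $u_1$ | Morex | -0.33 |
| $u_2$ | Robust | -0.17 |
| $u_3$ | Excel | 0.18 |
| $u_4$ | Stander | 0.36 |

For a set of recombinant inbred lines from an F$_2$ cross of Excel x Stander:

Predicted mean breeding value = ½(0.18 + 0.36) = 0.27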

*Shrinkage estimators*

• In the simplest case (all data balanced, the only fixed effect is the overall mean, inbreds unrelated)

$$\mathrm{BLUP}(u_i) = h^2\,(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})$$

• If $h^2$ is high, BLUP values are close to the phenotypic values

• If $h^2$ is low, BLUP values shrink towards the overall mean

• For unrelated inbreds or families, ranking of genotypes is the same whether one uses BLUP or phenotypic values
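The shrinkage behavior in the simplest case can be illustrated with a few lines of Python. The genotype means and heritability values below are hypothetical, and `simple_blup` is a helper name introduced only for this sketch.

```python
import numpy as np

def simple_blup(genotype_means, h2):
    """BLUP in the simplest balanced case: h2 * (genotype mean - overall mean)."""
    genotype_means = np.asarray(genotype_means, dtype=float)
    return h2 * (genotype_means - genotype_means.mean())

# Hypothetical genotype means (t/ha); not from the slides
means = np.array([4.0, 5.0, 6.5])

blup_high_h2 = simple_blup(means, h2=0.9)   # stays close to the phenotypic deviations
blup_low_h2 = simple_blup(means, h2=0.2)    # shrinks strongly toward zero (the overall mean)

# Ranking is unchanged because every deviation is multiplied by the same h2
same_ranking = np.array_equal(np.argsort(blup_high_h2), np.argsort(means))
```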

*Sampling error of BLUP*

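$$
\begin{pmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{u}} \end{pmatrix}
=
\begin{pmatrix}
\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{A}^{-1}(\sigma^2_\varepsilon / \sigma^2_A)
\end{pmatrix}^{-1}
\begin{pmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{Y} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Y} \end{pmatrix}
$$

Invert the coefficient matrix (residual covariance matrix $\mathbf{R}\sigma^2_\varepsilon$):

$$
\begin{pmatrix}
\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{A}^{-1}(\sigma^2_\varepsilon / \sigma^2_A)
\end{pmatrix}^{-1}
=
\begin{pmatrix} \mathbf{C}_{11} & \mathbf{C}_{12} \\ \mathbf{C}_{21} & \mathbf{C}_{22} \end{pmatrix}
$$

(each element of the matrix is a matrix)

• Diagonal elements of the inverse of the coefficient matrix can be used to estimate sampling error of fixed and random effects

$$
\begin{pmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{u}} \end{pmatrix}
=
\begin{pmatrix} \mathbf{C}_{11} & \mathbf{C}_{12} \\ \mathbf{C}_{21} & \mathbf{C}_{22} \end{pmatrix}
\begin{pmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{Y} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Y} \end{pmatrix}
$$

$$\sigma^2_{\hat{\beta}} = \mathbf{C}_{11}\,\sigma^2_\varepsilon \quad \text{(fixed effects)}$$

$$\sigma^2_{(\hat{u} - u)} = \mathbf{C}_{22}\,\sigma^2_\varepsilon \quad \text{(random effects)}$$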
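A sketch of the sampling-error calculation for the barley example follows. Both `var_ratio` and `sigma2_e` are hypothetical placeholders, since neither value is given in this section; only the mechanics are meant to carry over.

```python
import numpy as np

X = np.eye(2)[[0, 0, 0, 1, 1, 1]]
Z = np.eye(4)[[0, 1, 3, 1, 2, 3]]
R_inv = np.diag([18.0, 18.0, 18.0, 9.0, 9.0, 9.0])
theta = np.array([
    [1.0,     1 / 2,   7 / 16,   11 / 32],
    [1 / 2,   1.0,     27 / 32,  43 / 64],
    [7 / 16,  27 / 32, 1.0,      91 / 128],
    [11 / 32, 43 / 64, 91 / 128, 1.0],
])
A_inv = np.linalg.inv(2 * theta)

var_ratio = 5.0   # HYPOTHETICAL sigma2_e / sigma2_A
sigma2_e = 0.5    # HYPOTHETICAL residual variance

C = np.block([
    [X.T @ R_inv @ X, X.T @ R_inv @ Z],
    [Z.T @ R_inv @ X, Z.T @ R_inv @ Z + A_inv * var_ratio],
])
C_inv = np.linalg.inv(C)

p = X.shape[1]
C11 = C_inv[:p, :p]    # block for the fixed effects
C22 = C_inv[p:, p:]    # block for the random effects

# Standard errors from the diagonal elements, scaled by the residual variance
se_fixed = np.sqrt(np.diag(C11) * sigma2_e)
se_random = np.sqrt(np.diag(C22) * sigma2_e)
```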

*Estimation of Variance Components*

(would really need a larger data set)

1. Use your best guess for an initial value of $\sigma^2_\varepsilon / \sigma^2_A$

2. Solve for $\hat{\boldsymbol{\beta}}$ and $\hat{\mathbf{u}}$

3. Use current solutions to solve for $\sigma^2_\varepsilon$ and then for $\sigma^2_A$

4. Calculate a new $\sigma^2_\varepsilon / \sigma^2_A$

5. Repeat the process until estimates converge
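The five steps above can be sketched as a loop. The slides do not state which formulas are used in step 3, so this sketch assumes one common choice, an EM-type REML update; the starting values, iteration limit, and tolerance are also assumptions. With only six observations the resulting estimates are not meaningful and serve only to show the mechanics.

```python
import numpy as np

y = np.array([4.45, 4.61, 5.27, 5.00, 5.82, 5.79])
X = np.eye(2)[[0, 0, 0, 1, 1, 1]]
Z = np.eye(4)[[0, 1, 3, 1, 2, 3]]
R_inv = np.diag([18.0, 18.0, 18.0, 9.0, 9.0, 9.0])
theta = np.array([
    [1.0,     1 / 2,   7 / 16,   11 / 32],
    [1 / 2,   1.0,     27 / 32,  43 / 64],
    [7 / 16,  27 / 32, 1.0,      91 / 128],
    [11 / 32, 43 / 64, 91 / 128, 1.0],
])
A_inv = np.linalg.inv(2 * theta)
n, p, q = len(y), X.shape[1], Z.shape[1]

# Step 1: initial guesses (HYPOTHETICAL starting values)
sigma2_e, sigma2_a = 1.0, 1.0

rhs = np.concatenate([X.T @ R_inv @ y, Z.T @ R_inv @ y])
for iteration in range(200):
    ratio = sigma2_e / sigma2_a
    C = np.block([
        [X.T @ R_inv @ X, X.T @ R_inv @ Z],
        [Z.T @ R_inv @ X, Z.T @ R_inv @ Z + A_inv * ratio],
    ])
    C_inv = np.linalg.inv(C)

    # Step 2: solve for the fixed and random effects
    solution = C_inv @ rhs
    u_hat = solution[p:]
    C22 = C_inv[p:, p:]

    # Step 3 (assumed EM-type REML updates): residual variance, then additive variance
    new_e = (y @ R_inv @ y - solution @ rhs) / (n - p)
    new_a = (u_hat @ A_inv @ u_hat + np.trace(A_inv @ C22) * new_e) / q

    # Step 4: new ratio; Step 5: stop once the ratio no longer changes
    converged = abs(new_e / new_a - ratio) < 1e-8
    sigma2_e, sigma2_a = new_e, new_a
    if converged:
        break
```

EM-type updates keep both variance estimates positive but can converge slowly, which is one reason production software usually relies on other REML algorithms.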

*BLUP for single-crosses*

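Performance of a single cross:

$$G_{B73,Mo17} = GCA_{B73} + GCA_{Mo17} + SCA_{B73,Mo17}$$

BLUP Model

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{U}\mathbf{g}_1 + \mathbf{W}\mathbf{g}_2 + \mathbf{Z}\mathbf{s} + \mathbf{e}$$

where $\mathbf{g}_1$ and $\mathbf{g}_2$ are the GCA effects of the parents from the two heterotic groups, $\mathbf{s}$ holds the SCA effects, and $\mathbf{U}$, $\mathbf{W}$, and $\mathbf{Z}$ are the corresponding design matrices ($\mathbf{S}$ is reserved for the covariance matrix of the SCA effects)

• Sets of environments are fixed effects

• GCA and SCA are considered to be random effects

Example in Bernardo, pg 277 from Hallauer et al., 1996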

*Performance of maize single crosses*

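Iowa Stiff Stalk x Lancaster Sure Crop

| Set | Entry | Pedigree | Grain yield (t ha$^{-1}$) |
|---|---|---|---|
| 1 | SC-1 | B73 x Mo17 | 7.85 |
| 1 | SC-2 | H123 x Mo17 | 7.36 |
| 1 | SC-3 | B84 x N197 | 5.61 |
| 2 | SC-2 | H123 x Mo17 | 7.47 |
| 2 | SC-3 | B84 x N197 | 5.96 |

$$
\begin{pmatrix} 7.85 \\ 7.36 \\ 5.61 \\ 7.47 \\ 5.96 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}
+
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} g_{B73} \\ g_{B84} \\ g_{H123} \end{pmatrix}
+
\begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} g_{Mo17} \\ g_{N197} \end{pmatrix}
+
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} s_1 \\ s_2 \\ s_3 \end{pmatrix}
+
\begin{pmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{22} \\ e_{23} \end{pmatrix}
$$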
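The incidence matrices for the single-cross example can be generated from the pedigree table. The helper `incidence` and the variable names are introduced here for illustration only.

```python
import numpy as np

# Single-cross data from the table
y = np.array([7.85, 7.36, 5.61, 7.47, 5.96])
env_set = [1, 1, 1, 2, 2]
parent_1 = ["B73", "H123", "B84", "H123", "B84"]      # Iowa Stiff Stalk parent
parent_2 = ["Mo17", "Mo17", "N197", "Mo17", "N197"]   # Lancaster Sure Crop parent
entry = ["SC-1", "SC-2", "SC-3", "SC-2", "SC-3"]

def incidence(labels, levels):
    """Return a 0/1 matrix with one row per record and one column per level."""
    return np.array([[1.0 if label == level else 0.0 for level in levels]
                     for label in labels])

X = incidence(env_set, [1, 2])                    # fixed: sets of environments
U = incidence(parent_1, ["B73", "B84", "H123"])   # GCA, group 1
W = incidence(parent_2, ["Mo17", "N197"])         # GCA, group 2
Z = incidence(entry, ["SC-1", "SC-2", "SC-3"])    # SCA
```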

*Covariance of single crosses*

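SC-X is $j \times k$; SC-Y is $j' \times k'$

$$
\mathrm{Cov}_{SC} = \theta_{jj'}\,\sigma^2_{GCA(1)} + \theta_{kk'}\,\sigma^2_{GCA(2)} + \theta_{jj'}\theta_{kk'}\,\sigma^2_{SCA}
$$

(assuming no epistasis)

B73, B84, H123:

$$
\mathbf{G}_1 =
\begin{pmatrix}
1 & \theta_{B73,B84} & \theta_{B73,H123} \\
\theta_{B73,B84} & 1 & \theta_{B84,H123} \\
\theta_{B73,H123} & \theta_{B84,H123} & 1
\end{pmatrix},
\qquad
\mathrm{Var}(\mathbf{g}_1) = \mathbf{G}_1\,\sigma^2_{GCA(1)}
$$

Mo17, N197:

$$
\mathbf{G}_2 =
\begin{pmatrix}
1 & \theta_{Mo17,N197} \\
\theta_{Mo17,N197} & 1
\end{pmatrix},
\qquad
\mathrm{Var}(\mathbf{g}_2) = \mathbf{G}_2\,\sigma^2_{GCA(2)}
$$

SC-1 = B73 x Mo17, SC-2 = H123 x Mo17, SC-3 = B84 x N197:

$$
\mathbf{S} =
\begin{pmatrix}
1 & \theta_{B73,H123}\,\theta_{Mo17,Mo17} & \theta_{B73,B84}\,\theta_{Mo17,N197} \\
\theta_{B73,H123}\,\theta_{Mo17,Mo17} & 1 & \theta_{B84,H123}\,\theta_{Mo17,N197} \\
\theta_{B73,B84}\,\theta_{Mo17,N197} & \theta_{B84,H123}\,\theta_{Mo17,N197} & 1
\end{pmatrix},
\qquad
\mathrm{Var}(\mathbf{s}) = \mathbf{S}\,\sigma^2_{SCA}
$$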
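To make the structure of the covariance matrices concrete, here is a sketch that builds them from coancestry values. The numerical coancestries are hypothetical, because the slides give them only symbolically, and `cov_single_crosses` is a helper name introduced for this example.

```python
import numpy as np

# HYPOTHETICAL coancestries (the slides give these only symbolically)
G1 = np.array([[1.0, 0.4, 0.3],    # order: B73, B84, H123
               [0.4, 1.0, 0.5],
               [0.3, 0.5, 1.0]])
G2 = np.array([[1.0, 0.2],         # order: Mo17, N197
               [0.2, 1.0]])

# Parents of SC-1, SC-2, SC-3 as indices into G1 and G2
group1_parent = np.array([0, 2, 1])   # B73, H123, B84
group2_parent = np.array([0, 0, 1])   # Mo17, Mo17, N197

# theta_jj' and theta_kk' for every pair of single crosses
theta_jj = G1[np.ix_(group1_parent, group1_parent)]
theta_kk = G2[np.ix_(group2_parent, group2_parent)]

# SCA relationship matrix: element-wise product theta_jj' * theta_kk'
S = theta_jj * theta_kk

def cov_single_crosses(x, y, var_gca1, var_gca2, var_sca):
    """Covariance between single crosses x and y (0-based), assuming no epistasis."""
    return (theta_jj[x, y] * var_gca1
            + theta_kk[x, y] * var_gca2
            + S[x, y] * var_sca)
```

For SC-1 and SC-2, which share Mo17 as a parent, the second coancestry is 1, so the full GCA variance of group 2 contributes to their covariance.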

*Solutions*

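$$
\begin{pmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{g}}_1 \\ \hat{\mathbf{g}}_2 \\ \hat{\mathbf{s}} \end{pmatrix}
=
\begin{pmatrix}
\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{U} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{W} & \mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} \\
\mathbf{U}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{U}'\mathbf{R}^{-1}\mathbf{U} + \mathbf{Q}_1 & \mathbf{U}'\mathbf{R}^{-1}\mathbf{W} & \mathbf{U}'\mathbf{R}^{-1}\mathbf{Z} \\
\mathbf{W}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{W}'\mathbf{R}^{-1}\mathbf{U} & \mathbf{W}'\mathbf{R}^{-1}\mathbf{W} + \mathbf{Q}_2 & \mathbf{W}'\mathbf{R}^{-1}\mathbf{Z} \\
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{U} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{W} & \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \mathbf{Q}_S
\end{pmatrix}^{-1}
\begin{pmatrix} \mathbf{X}'\mathbf{R}^{-1}\mathbf{Y} \\ \mathbf{U}'\mathbf{R}^{-1}\mathbf{Y} \\ \mathbf{W}'\mathbf{R}^{-1}\mathbf{Y} \\ \mathbf{Z}'\mathbf{R}^{-1}\mathbf{Y} \end{pmatrix}
$$

where $\mathbf{Z}$ is the design matrix for the SCA effects $\mathbf{s}$ and

$$
\mathbf{Q}_1 = \mathbf{G}_1^{-1}\,(\sigma^2_\varepsilon / \sigma^2_{GCA(1)}), \qquad
\mathbf{Q}_2 = \mathbf{G}_2^{-1}\,(\sigma^2_\varepsilon / \sigma^2_{GCA(2)}), \qquad
\mathbf{Q}_S = \mathbf{S}^{-1}\,(\sigma^2_\varepsilon / \sigma^2_{SCA})
$$

The solution vector is

$$
\begin{pmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{g}}_1 \\ \hat{\mathbf{g}}_2 \\ \hat{\mathbf{s}} \end{pmatrix}
=
\begin{pmatrix} b_1 & b_2 & g_{B73} & g_{B84} & g_{H123} & g_{Mo17} & g_{N197} & s_1 & s_2 & s_3 \end{pmatrix}'
$$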
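The single-cross mixed model equations can be assembled in the same way as for the barley example. Several inputs are not available in this section and are filled with loudly hypothetical values: the coancestries, the three variance ratios, and $\mathbf{R}^{-1}$ (set to the identity because the number of environments per set is not given). The output therefore illustrates the mechanics only.

```python
import numpy as np

y = np.array([7.85, 7.36, 5.61, 7.47, 5.96])
X = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
U = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1], [0, 1, 0]], dtype=float)
W = np.array([[1, 0], [1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
Z = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]], dtype=float)

# HYPOTHETICAL inputs (not provided in the slides)
R_inv = np.eye(5)
G1 = np.array([[1.0, 0.4, 0.3], [0.4, 1.0, 0.5], [0.3, 0.5, 1.0]])  # B73, B84, H123
G2 = np.array([[1.0, 0.2], [0.2, 1.0]])                             # Mo17, N197
ratio_gca1, ratio_gca2, ratio_sca = 2.0, 2.0, 4.0   # sigma2_e over each variance

# Rows 0-2 are SC-1, SC-2, SC-3, so they give each cross's parents once
S = (U[:3] @ G1 @ U[:3].T) * (W[:3] @ G2 @ W[:3].T)

Q1 = np.linalg.inv(G1) * ratio_gca1
Q2 = np.linalg.inv(G2) * ratio_gca2
QS = np.linalg.inv(S) * ratio_sca

# T'R^-1 T yields all sixteen submatrices at once; then add Q to the diagonal blocks
T = np.hstack([X, U, W, Z])
C = T.T @ R_inv @ T
C[2:5, 2:5] += Q1
C[5:7, 5:7] += Q2
C[7:10, 7:10] += QS
rhs = T.T @ R_inv @ y

solution = np.linalg.solve(C, rhs)
b_hat, g1_hat, g2_hat, s_hat = np.split(solution, [2, 5, 7])

# Predicted performance of B73 x Mo17 (SC-1) in Set 1
pred_sc1 = b_hat[0] + g1_hat[0] + g2_hat[0] + s_hat[0]
```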