# Decomposition of the Random Error Vector of a General Linear Model
Abstract-This paper deals with the decomposition of an error vector in order to identify how the error vector is related to the expected value of an observation vector under a general linear sample model, since the error vector is defined as the deviation of the observation vector from its expected value. The main idea is that a random error vector can be decomposed into two orthogonal component vectors: one lies in the vector space generated by the coefficient matrix of the unknown parameter vector, and the other lies in the orthogonal complement of that space. Two related topics are discussed: partitioning an observation vector and constructing its covariance structure. The paper also shows why a projection method would be preferred to a least squares method.

When some nonrandom quantities affect a response variable, the data can be analyzed with a general linear model. Notable references on linear models include [1][2][3][4]. In a general linear model the response variable is composed of two parts: a deterministic part, which is a linear function of the unknown parameters of the independent (predictor) variables, and a random part. When data are collected, the sample general linear model in matrix form is applied to the data. However, the matrix equation alone, with only the assumptions $E(\varepsilon) = 0$ and $\mathrm{var}(\varepsilon) = \sigma^2 I$ on the error vector, gives little insight into the relationship between the matrix of predictors and the error vector. These assumptions require the error vector to be the line segment from the origin, which is the point $0$, to the point $\varepsilon$. The general linear sample model in matrix form is

$$ y = X\beta + \varepsilon \tag{1} $$

where $y$ denotes the $n \times 1$ vector of observations, $X$ the $n \times p$ matrix of known values, $\beta$ the $p \times 1$ vector of unknown parameters, and $\varepsilon$ the $n \times 1$ vector of unobserved random errors. A detailed discussion of random errors can be found in Searle [1]. Equation (1) shows that the vector of observations is composed of a mean vector and an error vector. When $E(y)$ is $0$, or a mean vector acting like an origin, the error vector satisfies the conditions of mean $0$ and $\mathrm{var}(\varepsilon) = \sigma^2 I$. However, if $E(y) = X\beta$, we may be interested in how to minimize the deviation vector $y - X\beta$. The least squares method is one of the available methods for minimizing the error vector to estimate the parameter vector $\beta$.

However, decomposing the random error vector can be somewhat more convenient and effective than minimizing the error sum of squares by the method of least squares when the structure of the random error vector is the primary concern. This idea is applied to breaking up the error vector in model (1). Since $X\beta$ lies in the specific vector subspace generated by the columns of $X$ and $\varepsilon = y - X\beta$ is nonzero, $\varepsilon$ is not, in general, in the column space of $X$. Related topics can be found in Graybill [5], Johnson [6], and elsewhere. Thus, when $X\beta$ is used as the base for obtaining the error vector of $y$ that has the shortest distance from the origin among all error vectors, we need to break the error vector up into component error vectors. This is the main idea of this paper. First, we discuss how to decompose the error vector. Second, we study the structure of the error vector adjusted for the mean vector. Third, we find the structure of $\mathrm{var}(y)$ that is related to the error structure. Finally, we discuss a method for calculating the sum of squares associated with the error components.

## II. Decomposition of Random Error Vector

In equation (1), let the mean vector $X\beta$ be a nonzero vector. Then the equation can be written as

$$ y = X\beta + \varepsilon = X\beta + \varepsilon_m + \varepsilon_r \tag{2} $$

where $\varepsilon_m$ and $\varepsilon_r$ denote two component vectors of the error vector. Since $E(y)$ is $X\beta$, $\varepsilon$ can be broken down into two types of error vector: one lies in the column space of $X$, and the other is orthogonal to every vector in the space generated by the columns of $X$, that is, it lies in the orthogonal complement of the column space of $X$. In equation (2), $y - X\beta$ defines the error vector relative to the mean vector $X\beta$ and is assumed to have mean $0$ and covariance matrix $\sigma^2 I$ in $n$-dimensional space. Since the error vector is defined as the vector of deviations from the mean vector $E(y)$, we can decompose it into two component vectors according to the source from which the error arises, i.e., $\varepsilon_m$ or $\varepsilon_r$. Rewriting equation (2) in terms of the error vector from the mean vector gives

$$ y - X\beta = \varepsilon_m + \varepsilon_r = XX^{-}\varepsilon + (I - XX^{-})\varepsilon \tag{3} $$

where $\varepsilon_m = XX^{-}\varepsilon$, $\varepsilon_r = (I - XX^{-})\varepsilon$, and $X^{-} = (X'X)^{-1}X'$ denotes the Moore-Penrose generalized inverse; $(X'X)^{-1}$ exists because $X$ has full column rank. Equation (3) shows that there are two types of error vector when the vector space is decomposed into two orthogonal subspaces based on the mean vector $X\beta$ of $y$.

In equation (3), $E(y - X\beta) = 0$ and $\mathrm{var}(y - X\beta) = \sigma^2 I$ still hold. Here the error vector $y - X\beta$ is viewed as the sum of two different types of error vector, each coming from one of two orthogonal subspaces. This is a completely different concept of an error vector from the traditional one. Much can be developed from this new idea of viewing the error vector as the sum of a mean component error vector and a residual component error vector. Two specific terms are used to distinguish the types of error component: mean component error for $\varepsilon_m$ and residual component error for $\varepsilon_r$.

## III. Structure of Random Error Vector

Since the structure of a random error vector in matrix model (1) changes with the structure of the mean vector, we examine it with somewhat simpler general linear models. Consider a situation where measurements are recorded as deviations from a fixed size for products from a routine process. Let $y$ be a random variable representing an observation on a randomly selected product from a population of products. Data collected from a sample of size $n$ from the population can be arrayed in matrix form. The $i$th observation of $y$ is expressed as $y_i = 0 + \varepsilon_i$ for $i = 1, 2, \ldots, n$, where the $\varepsilon_i$ are assumed to be independent with $E(\varepsilon_i) = 0$ and $\mathrm{var}(\varepsilon_i) = \sigma^2$. Applying the sample general linear model (1) to the data, the model turns out to be
$$ y = 0 + \varepsilon \tag{4} $$
where $y$ is in $R^n$, the Euclidean $n$-space, and $E(y) = 0$. Let $V_0$ be the vector space consisting only of $0$, and let $V_1$ be the orthogonal complement of $V_0$. Since $y$ is the sum of two vectors, one in $V_0$ and the other in $V_1$, we have $0'\varepsilon = 0$, which shows the relationship between the mean vector and the error vector of $y$, namely orthogonality. To express equation (4) in the form of the second expression in (2), $\varepsilon$ can be divided into two terms, $\varepsilon_0$ and $\varepsilon_1$, where $\varepsilon_0$ denotes the error component in $V_0$, which contains only $0$, and $\varepsilon_1$ denotes the error component generated by a basis of $V_1$. The matrix equation can now be expressed as
$$ y = 0 + \varepsilon_0 + \varepsilon_1 \tag{5} $$

where $\varepsilon_0$ is in $V_0$ of dimension 0, while $\varepsilon_1$ is in $V_1$ of dimension $n$.

Adding the information about the dimensions of the error component vectors, equation (5) can be transformed into

$$ y - 0 = O\varepsilon + (I - O)\varepsilon \tag{6} $$
where $O$ is the $n \times n$ zero matrix and $I$ is the $n \times n$ identity matrix. The equation shows that $\varepsilon$, measured from an origin spanning a space of rank 0, can be decomposed into two orthogonal vectors, one of which lies in the orthogonal complement of that space. In other words, $\varepsilon$ is the sum of two component vectors, $\varepsilon_0$ and $\varepsilon_1$. Grasping the structure of a random error vector in this way is valuable for finding the covariance structure of an observation vector. We can now study the error structure further with a slightly more general, but still simple, model having one quantitative variable as a predictor. Consider the simple linear model with one nonrandom independent variable in addition to an intercept term, i.e.,
$$ y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i, \quad i = 1, 2, \ldots, n. $$

We rewrite this in vector and matrix form as

$$ y = j\beta_0 + X_1\beta_1 + \varepsilon = X\beta + \varepsilon \tag{7} $$
where $X = (j, X_1)$ is the $n \times 2$ coefficient matrix of $\beta$, $\beta = (\beta_0, \beta_1)'$ is the $2 \times 1$ parameter vector, $j$ is an $n \times 1$ vector of ones, and $X_1$ is an $n \times 1$ vector of quantitative values. The error vector $\varepsilon$ is assumed to have $E(\varepsilon) = 0$ and $\mathrm{var}(\varepsilon) = \sigma^2 I$. Equation (7) differs from (6) in that $E(y) \neq 0$. This is not surprising in a general linear model; the point to note is that the mean vector $X\beta$ belongs to the column space of $X$, denoted $V_m$, which has dimension 2 and is a vector subspace of the Euclidean $n$-space $R^n$. Since the mean vector of $y$, $X\beta$, is in $V_m$, $\varepsilon$ should be divided into two components: one in $V_m$ and the other in $V_m^{\perp}$, the orthogonal complement of $V_m$ in $R^n$, so that $V_m \oplus V_m^{\perp} = R^n$.
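The direct-sum statement $V_m \oplus V_m^{\perp} = R^n$ can be illustrated numerically. The following is a minimal sketch, assuming NumPy and a hypothetical predictor vector `x1` whose values are illustrative and not taken from the paper. It uses a complete QR factorization, which is only one of several ways to obtain orthonormal bases for the two subspaces.

```python
import numpy as np

n = 6
x1 = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 9.0])  # hypothetical predictor values
X = np.column_stack([np.ones(n), x1])           # X = (j, X1), rank 2

# Complete QR: Q is an n x n orthogonal matrix. Its first 2 columns span
# V_m (the column space of X); the remaining n - 2 columns span V_m-perp.
Q, _ = np.linalg.qr(X, mode="complete")
B_m, B_perp = Q[:, :2], Q[:, 2:]

# The dimensions of the two subspaces add up to n.
dims_add_up = B_m.shape[1] + B_perp.shape[1] == n
# Every basis vector of V_m is orthogonal to every basis vector of V_m-perp.
bases_orthogonal = np.allclose(B_m.T @ B_perp, 0.0)
# The projectors onto the two subspaces sum to the identity.
sum_is_identity = np.allclose(B_m @ B_m.T + B_perp @ B_perp.T, np.eye(n))
# The columns of X have no component in V_m-perp.
X_in_Vm = np.allclose(B_perp.T @ X, 0.0)

print(dims_add_up, bases_orthogonal, sum_is_identity, X_in_Vm)
```

Any vector in $R^n$, including the error vector, can therefore be written uniquely as a sum of one part in $V_m$ and one part in $V_m^{\perp}$.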
The two column vectors of $X$ can be regarded as a basis of $V_m$, which implies that $X\beta$ is in $V_m$. Hence the error vector can be divided into two component vectors, one in $V_m$ and the other in $V_m^{\perp}$; that is, $\varepsilon = \varepsilon_m + \varepsilon_r$. Adding this information to the equation, the model becomes

$$ y - X\beta = \varepsilon_m + (\varepsilon - \varepsilon_m) \tag{8} $$

where $y - X\beta \in R^n$, $\varepsilon_m \in V_m$, and $\varepsilon - \varepsilon_m = \varepsilon_r \in V_m^{\perp}$.
The columns of the rank-2 matrix $X$, which form a basis for $V_m$, and the columns of the matrix $XX^{-}$ generate the same space $V_m$. Hence equation (8) can be changed into

$$ y - X\beta - XX^{-}\varepsilon = (I - XX^{-})\varepsilon \tag{9} $$

where $XX^{-}\varepsilon$ replaces $\varepsilon_m$ and denotes the projection of $\varepsilon$ onto the vector space $V_m$ generated by the two vectors $j$ and $X_1$. Equation (9) turns out to be

$$ (I - XX^{-})\varepsilon = y - (X\beta + XX^{-}\varepsilon) = (I - XX^{-})y \tag{10} $$

where $X\beta + XX^{-}\varepsilon = XX^{-}y$, because $XX^{-}y = XX^{-}(X\beta + \varepsilon)$ and $XX^{-}X = X$.
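The decomposition in equations (3), (9) and (10) can be checked on simulated data. The sketch below is illustrative only: the predictor values, the "true" parameter vector `beta` and the simulated error vector `eps` are assumptions made for the demonstration, since in practice neither $\beta$ nor $\varepsilon$ is observed.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10
x1 = np.linspace(1.0, 10.0, n)          # hypothetical predictor values
X = np.column_stack([np.ones(n), x1])   # X = (j, X1), full column rank
beta = np.array([2.0, 0.5])             # hypothetical "true" parameters
eps = rng.normal(0.0, 1.0, size=n)      # simulated error vector
y = X @ beta + eps

X_minus = np.linalg.inv(X.T @ X) @ X.T  # X^- = (X'X)^{-1} X'
P = X @ X_minus                         # X X^-, projector onto V_m
I = np.eye(n)

eps_m = P @ eps         # mean component error, lies in V_m
eps_r = (I - P) @ eps   # residual component error, lies in V_m-perp

# Equation (3): the two components add back to the error vector.
decomposition_ok = np.allclose(eps_m + eps_r, eps)
# The two components are orthogonal.
orthogonal_ok = abs(eps_m @ eps_r) < 1e-10
# Equation (10): (I - X X^-) eps equals (I - X X^-) y.
residual_ok = np.allclose(eps_r, (I - P) @ y)
# X beta + X X^- eps equals X X^- y.
mean_part_ok = np.allclose(X @ beta + eps_m, P @ y)
# For full column rank X, (X'X)^{-1} X' agrees with the Moore-Penrose inverse.
pinv_ok = np.allclose(X_minus, np.linalg.pinv(X))

print(decomposition_ok, orthogonal_ok, residual_ok, mean_part_ok, pinv_ok)
```

The third check is the practically useful one: the residual component can be computed from the observed $y$ alone, without knowing $\beta$ or $\varepsilon$.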
From the decomposition of the random error vector of a general linear sample model in matrix form, we can see that the matrix model (1) can be transformed into

$$ y = X\beta + \varepsilon = XX^{-}y + (I - XX^{-})y \tag{11} $$

where $y$ is composed of two orthogonal vectors, i.e., $(XX^{-}y)'(I - XX^{-})y = 0$. Equation (11) implies that every general linear model can be represented as a sum of two orthogonal vectors, one belonging to the vector subspace generated by the coefficient matrix of $\beta$ and the other to its orthogonal complement; i.e., $XX^{-}y \in V_m$ and $(I - XX^{-})y \in V_m^{\perp}$.
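The partition of $y$ in equation (11) rests on $XX^{-}$ being a symmetric, idempotent matrix. The following sketch checks these properties; the helper name `projector` and the observation values in `y` are made up for illustration and are not from the paper.

```python
import numpy as np

def projector(X):
    """Return X X^- with X^- = (X'X)^{-1} X' (X assumed to have full column rank)."""
    return X @ np.linalg.solve(X.T @ X, X.T)

n = 8
x1 = np.arange(1.0, n + 1.0)                              # hypothetical predictor
X = np.column_stack([np.ones(n), x1])                     # X = (j, X1)
y = np.array([3.1, 3.9, 5.2, 5.8, 7.1, 8.0, 8.7, 10.2])   # made-up observations

P = projector(X)
M = np.eye(n) - P
fit, resid = P @ y, M @ y   # the two components of y in equation (11)

symmetric = np.allclose(P, P.T)
idempotent = np.allclose(P @ P, P)
rank_m = int(round(np.trace(P)))   # trace of a projector equals its rank
rank_r = int(round(np.trace(M)))
parts_sum_to_y = np.allclose(fit + resid, y)
parts_orthogonal = abs(fit @ resid) < 1e-8
# Orthogonality gives a Pythagorean split of the total sum of squares.
pythagoras = np.isclose(y @ y, fit @ fit + resid @ resid)

print(symmetric, idempotent, rank_m, rank_r, parts_sum_to_y, parts_orthogonal, pythagoras)
```

The last check is the sum-of-squares identity $y'y = y'XX^{-}y + y'(I - XX^{-})y$, which follows directly from the orthogonality of the two components.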
Here the primary concern is the structural aspects of an assumed linear model, whereas the least squares method focuses only on obtaining the best approximate solution of the inconsistent system of equations $X\beta = y$ by minimizing the error sum of squares $\varepsilon'\varepsilon$, where $\varepsilon = y - X\beta$. Hence they are different approaches developed from different points of view. Now consider the calculation of $\mathrm{var}(y)$. The covariance matrix of $y$ is
$$ \mathrm{var}(y) = \mathrm{var}\bigl(XX^{-}y + (I - XX^{-})y\bigr) = \sigma^2 XX^{-} + \sigma^2 (I - XX^{-}) = \sigma^2 I \tag{12} $$
Equation (12) shows that $\mathrm{var}(y)$ can be obtained by identifying the linear transformations of $y$; that is, the covariance matrix of $y$ can be partitioned into the sum of component covariance matrices once the transformation matrices for the component vectors of $y$ are ascertained. The cross-covariance between the two components vanishes because $XX^{-}(I - XX^{-}) = 0$. Literature related to covariance matrices includes Milliken and Johnson [7], Hill [8], and Searle [9]. Hence it is essential to determine the coefficient matrices of the component error vectors in order to find the projections of $y$ onto the vector subspaces generated by the orthogonal coefficient matrices. Coefficient matrices are discussed in Choi [10][11][12] in connection with obtaining nonnegative variance estimates.
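The partition of the covariance matrix in equation (12) can be verified with the rule $\mathrm{var}(Ay) = A\,\mathrm{var}(y)\,A'$. This is a small sketch under assumed values: the sample size, the variance `sigma2` and the predictor values are hypothetical.

```python
import numpy as np

n, sigma2 = 5, 4.0                        # hypothetical sample size and variance
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])  # hypothetical predictor values
X = np.column_stack([np.ones(n), x1])     # X = (j, X1)

P = X @ np.linalg.pinv(X)                 # X X^-
M = np.eye(n) - P                         # I - X X^-
V = sigma2 * np.eye(n)                    # var(y) = sigma^2 I

var_mean = P @ V @ P.T    # var(X X^- y)
var_resid = M @ V @ M.T   # var((I - X X^-) y)
cross = P @ V @ M.T       # covariance between the two components

mean_ok = np.allclose(var_mean, sigma2 * P)     # sigma^2 X X^-
resid_ok = np.allclose(var_resid, sigma2 * M)   # sigma^2 (I - X X^-)
cross_zero = np.allclose(cross, 0.0)            # components are uncorrelated
total_ok = np.allclose(var_mean + var_resid, V) # the two parts sum to sigma^2 I

print(mean_ok, resid_ok, cross_zero, total_ok)
```

The zero cross-covariance is what allows the two component covariance matrices to be added without a correction term.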
As a result of the decomposition of $\varepsilon$, $y$ can be represented as the sum of two orthogonal component vectors as in (11), where one lies in a vector space containing $E(y)$ and the other lies in its orthogonal complement. This means that $XX^{-}y$ defines the projection of $y$ onto the vector space spanned by the columns of $XX^{-}$, where $X$ is the coefficient matrix of $\beta$, given as $X = (j, X_1)$. To estimate the parameter vector $\beta$ we can use the mean part of model (11). From the concept of a projection in a vector space, the projection of $y$ onto the column space of $X$ is as follows:

$$ X\beta = XX^{-}y \tag{13} $$
where $E(y) = X\beta$. When $XX^{-}y$ is viewed as the orthogonal projection of $y$ onto the column space of $X$, we can take $\hat{\beta}_p = X^{-}y$ as the value of $\beta$, where the notation $\hat{\beta}_p$ distinguishes it from $\hat{\beta}$ obtained from the normal equations. When (13) is viewed as a system of equations, the solution $\hat{\beta} = X^{-}y$ coincides with the best approximate (least squares) solution of the inconsistent system $X\beta = y$, since $X$ is an $n \times 2$ matrix of rank 2. Although $\beta$ can be solved for by different approaches, the results are the same. Just as $XX^{-}y$ can be used to estimate $\beta$, a quadratic form in $y$ can be used to estimate $\sigma^2$. The required quadratic form is

$$ Q_r = y'(I - XX^{-})y \tag{14} $$
where $(I - XX^{-})$ is a symmetric and idempotent matrix of rank $n - 2$. Since $(I - XX^{-})y$ is a linear transformation of $y$, it carries all the information about the residual random error component $\varepsilon_r$. Hence the quadratic form $Q_r$ in $y$ can be used to estimate the variance $\sigma^2$. The expectation of $Q_r$ is

$$ E(Q_r) = E\bigl(y'(I - XX^{-})y\bigr) = \sigma^2\,\mathrm{tr}(I - XX^{-}) + (X\beta)'(I - XX^{-})X\beta = \sigma^2 (n - 2) \tag{15} $$
where $\mathrm{tr}(\cdot)$ denotes the trace of a square matrix, defined as the sum of its diagonal elements. Theorems and properties of the trace can be found in Graybill [2]. In (15) the second term vanishes because $(I - XX^{-})X = 0$. As an estimate of $\sigma^2$ from equation (15), $\hat{\sigma}^2_p$ can be taken as $Q_r/(n - 2)$, which can also be obtained by the least squares method when there is no normality assumption for $\varepsilon$. Even though the two procedures give the same result, they approach the problem from different points of view: one from the decomposition of an error vector, the other from the minimization of the error sum of squares.
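The agreement between the projection-based estimates and the least squares estimates can be illustrated as follows. This is a sketch on synthetic data: the predictor values, `beta_true` and the noise level are assumptions chosen for the demonstration and are not the data of [13].

```python
import numpy as np

rng = np.random.default_rng(1)

n = 10
x1 = np.linspace(1.0, 10.0, n)          # hypothetical predictor values
X = np.column_stack([np.ones(n), x1])   # X = (j, X1), rank 2
beta_true = np.array([2.0, 0.5])        # hypothetical parameters
y = X @ beta_true + rng.normal(0.0, 1.0, size=n)

# Projection approach: beta_p = X^- y, Q_r = y'(I - X X^-) y.
X_minus = np.linalg.pinv(X)
beta_p = X_minus @ y
M = np.eye(n) - X @ X_minus
Q_r = y @ M @ y
sigma2_p = Q_r / (n - 2)

# Least squares approach: solve the normal equations X'X beta = X'y.
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_ls
rss = resid @ resid                     # minimized error sum of squares

same_beta = np.allclose(beta_p, beta_ls)
same_ss = np.isclose(Q_r, rss)
trace_ok = np.isclose(np.trace(M), n - 2)   # tr(I - X X^-) = n - 2
mean_term_zero = np.allclose(M @ X, 0.0)    # so (X beta)'(I - X X^-) X beta = 0

print(beta_p, sigma2_p)
```

The checks `trace_ok` and `mean_term_zero` correspond to the two terms of equation (15), which together give $E(Q_r) = \sigma^2(n - 2)$.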
## VI. Example

As an example of a simple linear model, we consider the following data from Krumbein and Graybill [13]. The data are assumed to satisfy the model $y_i = \beta_0 + \beta X_{1i} + \varepsilon_i$, for $i = 1, 2, \ldots, 10$, where the $\varepsilon_i$ are independent and identically distributed $N(0, \sigma^2)$.

Table. Krumbein and Graybill's data [13] (Table 21.1, p. 231).

For the estimation of the two unknown parameters $\beta_0$ and $\beta$, we obtain $\hat{\beta}_p$ by multiplying both sides of equation (13) by $(X'X)^{-1}X'$:

$$ (X'X)^{-1}X'X\beta = (X'X)^{-1}X'XX^{-}y \tag{16} $$

$$ \hat{\beta}_p = (X'X)^{-1}X'y = \begin{pmatrix} 0.982060878 & -0.002312086 \\ -0.002312086 & 6.060514 \times 10^{-6} \end{pmatrix} \begin{pmatrix} 1250 \\ 550500 \end{pmatrix} = \begin{pmatrix} -45.2273450 \\ 0.4462054 \end{pmatrix} $$

Denoting by $\hat{\beta}_{0p}$ and $\hat{\beta}_p$ the estimates of $\beta_0$ and $\beta$, respectively, we have $\hat{\beta}_{0p} = -45.2273450$ and $\hat{\beta}_p = 0.4462054$. The least squares estimates are $\hat{\beta}_0 = -45.227$ and $\hat{\beta} = 0.446$. For the estimation of $\sigma^2$, we obtain

$$ \hat{\sigma}^2_p = y'(I - XX^{-})y/8 = 2398.13/8 = 299.7663 \tag{17} $$

where $(I - XX^{-})$ has rank $n - 2 = 8$.

## VII. Discussion

Although the least squares method is very useful and widely accepted for estimating the unknown parameters of a linear model, it seems unsuited to finding out whether any orthogonal property exists among errors. Since the least squares method concentrates only on minimizing the sum of squares of deviations of the observations from their expected values, it is not an appropriate tool for obtaining information on an orthogonal property between groups of errors. The orthogonal property is extremely important in statistics, especially in the analysis of variance for obtaining nonnegative estimates of variance components. Many papers [14][15][16][17][18][19][20] deal with negative estimates of variance components, which seem to be caused by overlooking orthogonality. It is therefore emphasized that the orthogonal property can be found through the decomposition of the random error vector. Hence the procedure discussed in this paper is distinct from other methods for estimating the unknown parameters in a general linear model.

## VIII. Conclusions

The primary concern of this study is the decomposition of an error vector in the matrix form of a general linear model. When $\varepsilon$ is an $n \times 1$ vector, the usual assumptions for the error vector are $E(\varepsilon) = 0$ and $\mathrm{var}(\varepsilon) = \sigma^2 I$. Under these assumptions, the idea of breaking up the error vector rests on how the mean vector is related to the error vector, because the error vector is defined as the deviation vector from the mean of the model. When the error vector is decomposed into two orthogonal components, a projection can be defined from the decomposition. Hence a partition of the vector of observations can be seen as a sum of orthogonal projections that are orthogonal to each other. The covariance matrix of $y$ is partitioned into two covariance matrices: one for $XX^{-}y$ and the other for $(I - XX^{-})y$. This implies that the covariance matrix of a vector of observations can always be partitioned into component matrices, each corresponding to an orthogonal projection of $y$. From the decomposition of the error vector of a general linear model, we derived two types of estimators: a linear transformation of $y$, based on $X\beta$, to estimate $\beta$, and a quadratic form in $y$ to estimate $\sigma^2$. Partitioning the covariance matrix can be useful for ascertaining the covariance matrix of each component projection. It is worth noting that the decomposition of an error vector defines a projection of $y$ onto the column space of $X$, which is quite a different approach from the least squares method in how the error vector is viewed.

Author Contributions: Not applicable.

Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: The data analyzed in this study are openly available in reference number [13], Table 21.1, p. 231.

Acknowledgments: Not applicable.

Conflicts of Interest: The authors declare no conflict of interest.

Global Journal of Science Frontier Research (F), Volume XXIII, Issue VII, Version I. © 2023 Global Journals
* [1] Searle, S.R. Linear Models; John Wiley & Sons: New York, USA, 1971.
* [2] Graybill, F.A. Theory and Application of the Linear Model; Wadsworth: Belmont, CA, USA, 1976.
* [3] Stapleton, J.H. Linear Statistical Models; John Wiley & Sons: New York, USA, 1995.
* [4] Draper, N.R.; Smith, H. Applied Regression Analysis; John Wiley & Sons: New York, USA, 1981.
* [5] Graybill, F.A. Matrices with Applications in Statistics; Wadsworth: Belmont, CA, USA, 1983.
* [6] Johnson, R.A.; Wichern, D.W. Applied Multivariate Statistical Analysis; Prentice Hall: Upper Saddle River, NJ, USA, 2014.
* [7] Milliken, G.A.; Johnson, D.E. Analysis of Messy Data, Volume 1: Designed Experiments; Van Nostrand Reinhold: New York, USA, 1984.
* [8] Hill, B.M. Inference about variance components in the one-way model. J. Am. Stat. Assoc. 1965, 60.
* [9] Searle, S.R.; Casella, G.; McCulloch, C.E. Variance Components.
* [10] Choi, J. Nonnegative estimates of variance components in a two-way random model. Communications for Statistical Applications and Methods 2019, 26.
* [11] Choi, J. Nonnegative variance component estimation for mixed-effects models. Communications for Statistical Applications and Methods 2020, 27.
* [12] Choi, J. Nonnegative estimation of variance components for a nested three-way random model. Symmetry 2022, 14, 1210. doi:10.3390/sym14061210.
* [13] Krumbein, W.C.; Graybill, F.A. An Introduction to Statistical Models in Geology; McGraw-Hill: New York, USA, 1965.
* [14] Thompson, W.A. Negative estimates of variance components: an introduction. Bulletin, International Institute of Statistics 1961, 34.
* [15] Thompson, W.A. The problem of negative estimates of variance components. Ann. Math. Stat. 1962, 33.
* [16] Thompson, W.A.; Moore, J.R. Non-negative estimates of variance components. Technometrics 1963, 5.
* [17] Nelder, J.A. The interpretation of negative components of variance. Biometrika 1954, 41.
* [18] Henderson, C.R. Estimation of variance and covariance components. Biometrics 1953, 9.
* [19] Hartley, H.O. Expectations, variances and covariances of ANOVA mean squares by "synthesis". Biometrics 1967, 23.
* [20] Satterthwaite, F.E. An approximate distribution of estimates of variance components. Biometrics Bulletin 1946, 2.