Article

Multidimensional Arrays, Indices and Kronecker Products

Department of Economics, University of Leicester, Leicester LE1 7RH, UK
Submission received: 18 November 2020 / Revised: 30 March 2021 / Accepted: 27 April 2021 / Published: 28 April 2021

Abstract

Much of the algebra that is associated with the Kronecker product of matrices has been rendered in the conventional notation of matrix algebra, which conceals the essential structures of the objects of the analysis. This makes it difficult to establish even the most salient of the results. The problems can be greatly alleviated by adopting an orderly index notation that reveals these structures. This claim is demonstrated by considering a problem that several authors have already addressed without producing a widely accepted solution.

1. Introduction

The algebra of the Kronecker product is essential to multivariate statistical analysis. It is involved in the formulation of multivariate statistical models and in the derivation of their estimating equations. Such derivations commonly require the differentiation of functions of matrices in respect of their matrix arguments. This generates expressions that demand the use of the Kronecker product (see Pollock (1979, 1985) and Magnus (2010) for examples).
The necessary algebraic results have become available in a variety of texts that have arisen over a long period. Early texts on the subject were provided by Balestra (1976), Rogers (1980) and Graham (1981). Other accounts were given by Pollock (1979) and by Magnus and Neudecker (1988). More recently, there have been the texts of Steeb (1997) and of Turkington (2002).
Given that the subject is so well served, one might wonder what scope exists for further contributions. The repository of specialised results is extensive and most of what is needed is available, if looked for diligently. Nevertheless, the subject remains problematic. In many respects, it lacks sufficient transparency to allow easy access to its users, who are liable to call upon it only to satisfy their occasional needs.
Much of the problem lies in the notational difficulty of the subject. This is not primarily a matter of notational complexity. It is more a result of the use of a notation that conceals the underlying structure of the objects of the analysis. A litany of results has been created that are not readily proven and that are mostly unmemorable.
The contention that underlies this paper is that many of the difficulties can be relieved by adopting an appropriate notation involving an explicit use of indices. In the context of a treatise on differential geometry, Cartan (1952) famously decried the confusion that can arise from what he described as une débauche d’indices. He suggested that a clearer understanding of the geometry could be achieved by appealing to abstract concepts. This paper offers a different opinion. An orderly index notation is an indispensable aid to efficient computation. It assists rather than impedes ready access to vector space interpretations.
The continued adherence of econometricians and statisticians to the conventional notation of matrix algebra, when dealing with Kronecker products, is explained by one of their essential purposes, which is to cast complex multivariate models into the formats of simpler models for which there exist well developed theories. An early example of the use of the Kronecker product in econometrics was provided by Zellner (1962) in his treatment of the so-called seemingly unrelated regression equations. His treatment enabled the system of equations to be cast in the format of a single multiple regression equation, which enabled him to exploit the existing results pertaining to such equations.
This paper is concerned primarily with matters of computation. Therefore, the next section considers the alternative ways in which computers handle multidimensional arrays. Section 3 presents an index notation, which is designed to assist in computations, and it provides an analysis of the Kronecker product, which is the matrix of a tensor product with respect to a natural basis.
Section 4, which may be omitted without interrupting the flow of the exposition, describes the formal algebra of the tensor product, and it mentions some of its applications. Section 5 and Section 6 deal with a particular problem concerning multiple Kronecker products, which, it is believed, will serve to illustrate the power of the index notation.

2. The Multidimensional Framework and the Computer

The natural framework for a multi-dimensional array is a set of points in the positive orthant of a multi-dimensional space that have integer-valued co-ordinates. These co-ordinates will serve as the indices of the elements of the array.
The simplest array is a linear array of one dimension, which is described as a vector. Next is a rectangular array of two dimensions, which is described as a matrix. In three dimensions, there is a cubic array, and the progression continues with hypercubes of increasing dimensions. However, there is an obvious difficulty that limits the dimensions of a practical representation to no more than two. The elements in the interior of a cube are hidden from view, and an object of more than three dimensions is incapable of visualisation.
The only practical recourse for making higher-dimensional arrays accessible and amenable to manipulations is to map them into a linear array or into a matrix. Such a mapping establishes an ordering within the lower-dimensional array of the elements of the higher-dimensional array.
The simplest examples of such mappings are the ways in which a matrix $A = [a_{ij};\, i = 1, 2, \ldots, M,\; j = 1, 2, \ldots, N]$ of $M$ rows and $N$ columns can be mapped into a vector. The row-major mapping joins successive rows of the matrix in a wide row vector. Here, the index $j$ passes through $N$ values for each increment of the index $i$. The index that locates the element $a_{ij}$ within the vector is given by $(i - 1)N + j$. The alternative column-major mapping joins successive columns of the matrix in a long column vector, and then the index that locates the element $a_{ij}$ is given by $(j - 1)M + i$.
These two mappings are commonly employed in assigning the elements of a matrix to contiguous cells of the memory of a computer. Whereas the Pascal and C compilers adopt the row-major principle, the FORTRAN compiler and the MATLAB interpreter adopt the column-major principle.
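As a check on the two index formulas, the following sketch computes the position of $a_{ij}$ under each mapping and compares it with the row-major ('C') and column-major ('F') flattening orders of NumPy. Python is not the language of the paper, whose code fragments are written in a Pascal-style pseudocode; the sketch is purely illustrative and the dimensions chosen are arbitrary.

import numpy as np

M, N = 3, 4
A = np.arange(1, M * N + 1).reshape(M, N)        # arbitrary marker values

for i in range(1, M + 1):
    for j in range(1, N + 1):
        row_major_pos = (i - 1) * N + j          # position of a_ij in the wide row vector
        col_major_pos = (j - 1) * M + i          # position of a_ij in the long column vector
        # NumPy's 'C' (row-major) and 'F' (column-major) flattenings agree with the formulas
        assert A.flatten(order='C')[row_major_pos - 1] == A[i - 1, j - 1]
        assert A.flatten(order='F')[col_major_pos - 1] == A[i - 1, j - 1]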
The space to store a matrix $A$ of order $M \times N$ in the memory of a computer is commonly provided by a statement such as
A : array[1, …, M][1, …, N].
Regardless of which of the two mappings has been used to store the matrix, its elements can be accessed and displayed readily in either order, and the matrix can also be reconstituted and displayed in its original form. Thus, with the help of nested do-loops, the matrix can be displayed via a code such as
for i := 1 to M do
begin {i}
  for j := 1 to N do
  begin {j}
    Write(a[i][j]); SkipSpace;
  end; {j}
  SkipLine;
end; {i}
the row-major form would be displayed via
for i := 1 to M do
  for j := 1 to N do
  begin
    Write(a[i][j]); SkipSpace;
  end;
and the column-major form would be displayed by reversing the order of the do-loops:
for j := 1 to N do
  for i := 1 to M do
  begin
    Write(a[i][j]); SkipLine;
  end;
The extension to arrays of higher dimensions is straightforward. Consider a cubic array
A : array[1, …, M][1, …, N][1, …, P],
in which the elements $a_{ijk}$ are indexed by $i = 1, \ldots, M$, $j = 1, \ldots, N$, and $k = 1, \ldots, P$. On the assumption that the index $k$ is now the most rapidly changing, the row-major mapping would give the element a position corresponding to $(i - 1)NP + (j - 1)P + k$, whereas, with $i$ as the most rapid index, the column-major mapping would give it the position corresponding to $(k - 1)MN + (j - 1)M + i$.
It should be noted that, if the initial values of $i$, $j$ and $k$ are 1, and if $M = N = P = 10$, then, in the row-major mapping, it might be declared that $k$ stands for units, $j$ stands for tens and $i$ stands for hundreds. Then, the formula for the position corresponds to how we count in base 10. More generally, if $M$, $N$ and $P$ were to take different values, then the index would correspond to a mixed-radix representation of a number. As it is, with the initial values at one, we are liable to describe $ijk$ within the row-major mapping as the index of a lexicographic or dictionary ordering.
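A similar check can be made for the three-dimensional case, using NumPy's ravel_multi_index function, which converts a multi-index into a position under either storage scheme. Again, this is an illustrative Python sketch rather than the paper's own code, and the dimensions are arbitrary.

import numpy as np

M, N, P = 3, 4, 5
for i in range(1, M + 1):
    for j in range(1, N + 1):
        for k in range(1, P + 1):
            # row-major: k varies fastest (the 'units'), j next, i slowest
            row_major = (i - 1) * N * P + (j - 1) * P + k
            # column-major: i varies fastest, j next, k slowest
            col_major = (k - 1) * M * N + (j - 1) * M + i
            assert row_major - 1 == np.ravel_multi_index((i - 1, j - 1, k - 1), (M, N, P), order='C')
            assert col_major - 1 == np.ravel_multi_index((i - 1, j - 1, k - 1), (M, N, P), order='F')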
A three-dimensional array has three indices. Two of these indices will serve to define the framework of a matrix and the third index will serve to define the order of the resulting matrices. The indices can be assigned to these roles in six different ways. To keep track of the elements of a multidimensional array, it is clear that an index notation is called for. This will be provided in the next section together with the definitions of a few of the objects that arise most frequently in the processes of statistical computing. These objects are liable to be compounded from objects of lower dimensions. This gives rise to structures that one may be able to exploit in pursuit of computational efficiency.
To achieve computational efficiency, it is necessary to avoid displacing the values that are stored in the memory of the computer. They must be left in place, regardless of the changing nature of the structures to which they may give rise, whether these are vectors or matrices or the products of matrices. Such objects can be created by the manipulation of pointers, which are variables that contain the addresses of the memory locations in which the values are stored.
It is notable that do-loops are implemented by pointers, which accounts for the ease with which their order can be altered within a nested sequence. Thus, whichever system of storage is used, one can rely on do-loops (or pointers) to pluck the elements from the memory in the orders that are required.

3. The Index Notation

The object that will be analysed in detail in subsequent sections of the paper is a threefold Kronecker product postmultiplied by a long vector. It is appropriate to display this object at the outset; it may be denoted by
$$Y^v = (A \otimes B \otimes C)X^v.$$
Its factors are as follows:
$$A = [a_{jp}] = (a_{jp} e_j^p), \quad B = [b_{kq}] = (b_{kq} e_k^q), \quad C = [c_{\ell r}] = (c_{\ell r} e_\ell^r), \quad X^v = (x_{pqr} e_{pqr}).$$
In these definitions, both the brackets $[\,\cdot\,]$ and the parentheses $(\,\cdot\,)$ are meant to signify that each index is varying over its designated range. The lower limits of the indices are unity, and their upper limits may be denoted by the corresponding capital letters, so that, for example, $j = 1, \ldots, J$. The alternative notation for the matrices and for the long vector $X^v$ has been taken from a paper of Pollock (1985). It expresses the matrices as weighted combinations of the corresponding basic matrices. Thus, for example,
$$A = (a_{jp} e_j^p) = \sum_j \sum_p a_{jp} e_j^p,$$
where $e_j^p$ is a basic matrix with a unit in the $j$th row and the $p$th column and with zeros elsewhere (the parentheses indicate that summations are to be taken in respect of the indices $j$ and $p$ that are associated with the basic matrices and which are repeated within the accompanying scalar elements). The $p$th column of the matrix $A$ is $(a_{jp} e_j) = \sum_j a_{jp} e_j$, whereas its $j$th row is $(a_{jp} e^p) = \sum_p a_{jp} e^p$. Here, as in Equation (8), the summations are over the basis elements, which are vectors in these cases. Summations occur only when there are repeated instances of an index.
There are alternative ways of arranging the elements of the matrix to define different objects. The transpose A of the matrix together with the long column vector A c and the wide row vector A r , which are described, respectively, as its column-major and row-major forms, may be denoted as follows:
$$A' = (a_{jp} e_j^p)' = (a_{jp} e_p^j), \quad A^c = (a_{jp} e_j^p)^c = (a_{jp} e_{pj}), \quad A^r = (a_{jp} e_j^p)^r = (a_{jp} e^{jp}).$$
Observe that, in the mappings $A \rightarrow A^c$ and $A \rightarrow A^r$, the migrating index moves ahead of the index or the string of indices that it is joining. In the mappings $A' \rightarrow A^c$ and $A' \rightarrow A^r$, the migrating index joins from behind.
Whereas the matrix elements $a_{jp}$ will remain fixed in the computer memory in the order indicated by their indices, the nature of a derived multidimensional object is indicated by the positioning of the indices on the basic matrices or vectors. Thus, $e_j^p$ denotes a basic matrix with a unit in the $j$th row and the $p$th column, whereas $e_{pj}$ is a long column vector with a unit in the $\{(p-1)J + j\}$th position and $e^{jp}$ is a wide row vector with a unit in the $\{(j-1)P + p\}$th position. Elsewhere, there are only zero-valued elements. For the basis vectors, the composite indices follow a lexicographic ordering, which accords with the conventional definition of a Kronecker product, as will be confirmed at the end of this section.
The translation from the elements that are held in the computer’s memory to the various multidimensional objects should be by virtue of the pointer arithmetic that is associated with the basic arrays $e_j^p$, $e_{pj}$ and $e^{jp}$. These matrices (or vectors) determine the format of the object or its “shape”, which is the MATLAB terminology.
Observe that, in the case of the long column vector $A^c$, the elements are arrayed in the reverse of their lexicographic ordering, with the trailing index $p$ as the leading classifier within the vector. In the case of the long column vector $X^v = (x_{pqr} e_{pqr})$ of (6), which is the transpose of the wide vector $X^r = (x_{pqr} e^{pqr})$, the elements follow the lexicographic ordering of the indices $p, q, r$ that are associated with the basic vectors. It will also be necessary to consider the long vector $X^c = (x_{pqr} e_{rqp})$ in which the elements $x_{pqr}$ are arrayed in reverse lexicographic order, which is the order adopted by the MATLAB program.
The composition of the matrix $A$ with a matrix $D = [d_{pr}] = (d_{pr} e_p^r)$ may be denoted by
$$AD = (a_{jp} e_j^p)(d_{pr} e_p^r) = (\{a_{jp} d_{pr}\}_p\, e_j^r).$$
The final expression is derived by cancelling the superscripted column index p of the leading matrix with the subscripted row index p in the trailing matrix. The index p that is affixed to the right brace within the product is to emphasise that the elements within the braces are to be summed over the range of the repeated index. It may be omitted if the eyes can be relied on to recognise repeated indices. Thus
$$\{a_{jp} d_{pr}\}_p = \sum_p a_{jp} d_{pr},$$
which is described as a contraction with respect to p.
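In NumPy, such a contraction over a repeated index is an ordinary matrix multiplication, which the following illustrative fragment (not the paper's own code) confirms with einsum:

import numpy as np

A = np.arange(6.0).reshape(2, 3)       # a_{jp}
D = np.arange(12.0).reshape(3, 4)      # d_{pr}
# the contraction {a_{jp} d_{pr}}_p over the repeated index p is the matrix product AD
assert np.array_equal(np.einsum('jp,pr->jr', A, D), A @ D)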
The transformation $A'^c = Ⓣ A^c$ from $A^c = (a_{jp} e_{pj})$ to $A'^c = (a_{jp} e_{jp})$ is effected by the commutation matrix $Ⓣ = (e_{jp}^{pj})$. Thus
$$A'^c = (a_{jp} e_{jp}) = (e_{jp}^{pj})(a_{jp} e_{pj}) = Ⓣ A^c,$$
and it will be recognised that $A'^c = A^{r\prime} = A^v$, albeit that these identities do not extend to objects of higher dimensions.
It is easy to see that $Ⓣ$ is an orthonormal matrix such that $Ⓣ' = Ⓣ^{-1}$. It has been given various names. Pollock (1979) has called it the tensor commutator, denoted by a capital T inscribed in a circle. Magnus and Neudecker (1979) described it as the commutation matrix, which they denoted by $K$, and Van Loan (2000) has described it as the shuffle matrix, denoted by $S$.
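The following sketch constructs the commutation matrix for a $J \times P$ matrix as a permutation matrix and verifies the two properties just stated: it sends the column-major vector of $A$ to that of $A'$, and its transpose is its inverse. It is an illustrative Python construction, not code from the paper, and the dimensions are arbitrary.

import numpy as np

J, P = 3, 4
A = np.random.default_rng(0).standard_normal((J, P))

# the permutation that maps the column-major vector of A to that of its transpose
K = np.zeros((J * P, J * P))
for j in range(J):
    for p in range(P):
        K[j * P + p, p * J + j] = 1.0

vec_A  = A.flatten(order='F')       # A^c : columns stacked
vec_At = A.T.flatten(order='F')     # (A')^c, which equals the row-major vector of A

assert np.allclose(K @ vec_A, vec_At)
assert np.allclose(K.T, np.linalg.inv(K))    # orthonormal: the transpose is the inverse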
The notation for the Kronecker product of matrices is illustrated by the following example:
$$A \otimes B = (a_{jp} e_j^p) \otimes (b_{kq} e_k^q) = (\{a_{jp} b_{kq}\}\, e_{jk}^{pq}).$$
Here, the row indices j , k and the column indices p , q associated with the basic matrices both follow lexicographic orderings. The order of the matrices within a Kronecker product can be reversed by applying the tensor commutator to both the rows and the columns. Thus,
$$Ⓣ_{jk}(A \otimes B)Ⓣ_{pq} = B \otimes A \quad \text{or} \quad (e_{kj}^{jk})(\{a_{jp} b_{kq}\}\, e_{jk}^{pq})(e_{pq}^{qp}) = (\{a_{jp} b_{kq}\}\, e_{kj}^{qp}).$$
The subscripts on the $Ⓣ$ matrices indicate the indices that are to be commuted.
It is now possible to evaluate the product of (6). This can be expressed in the conventional matrix notation and in the index notation as follows:
$$Y^v = (A \otimes B \otimes C)X^v \quad \text{or} \quad (y_{jk\ell}\, e_{jk\ell}) = \{(a_{jp} e_j^p) \otimes (b_{kq} e_k^q) \otimes (c_{\ell r} e_\ell^r)\}(x_{pqr} e_{pqr}) = (\{a_{jp} b_{kq} c_{\ell r}\}\, e_{jk\ell}^{pqr})(x_{pqr} e_{pqr}).$$
On replacing the row-major vector X v by the column-major vector X c and on reversing the order of the matrices, an alternative version of the product is created in the form of
$$Y^c = (C \otimes B \otimes A)X^c \quad \text{or} \quad (y_{jk\ell}\, e_{\ell kj}) = \{(c_{\ell r} e_\ell^r) \otimes (b_{kq} e_k^q) \otimes (a_{jp} e_j^p)\}(x_{pqr} e_{rqp}) = (\{a_{jp} b_{kq} c_{\ell r}\}\, e_{\ell kj}^{rqp})(x_{pqr} e_{rqp}).$$
A series of commutation operations link the two forms. These forms might be regarded as generalisations of the equations
$$Y^v = (AXB')^v = (A \otimes B)X^v, \quad Y^r = (AXB')^r = X^r(A' \otimes B') \quad \text{and} \quad Y^c = (AXB')^c = (B \otimes A)X^c,$$
where $X^v = (x_{pq} e_{pq})$ and $X^c = (x_{pq} e_{qp})$, and where the other matrices are as defined previously. The equations of (17) tend to convey the idea that a multiple Kronecker product can be computed via pairwise matrix multiplications. To pursue such a strategy, it is necessary to cast the vectors $X^v$, $X^r$ or $X^c$ and their transforms into the shapes of matrices.
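These identities are easily verified numerically. In the sketch below (illustrative Python with arbitrary dimensions, not part of the paper), the row-major and column-major vectors are obtained with NumPy's 'C' and 'F' flattening orders, and np.kron supplies the Kronecker products.

import numpy as np

rng = np.random.default_rng(1)
J, P, K, Q = 2, 3, 4, 5
A = rng.standard_normal((J, P))
B = rng.standard_normal((K, Q))
X = rng.standard_normal((P, Q))
Y = A @ X @ B.T                          # Y = A X B'

xv, yv = X.flatten('C'), Y.flatten('C')  # row-major (lexicographic) long vectors
xc, yc = X.flatten('F'), Y.flatten('F')  # column-major long vectors

assert np.allclose(yv, np.kron(A, B) @ xv)        # Y^v = (A kron B) X^v
assert np.allclose(yc, np.kron(B, A) @ xc)        # Y^c = (B kron A) X^c
assert np.allclose(yv, xv @ np.kron(A.T, B.T))    # Y^r = X^r (A' kron B')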
The various operations of reshaping that might be applied to the column-major vector X c = ( x p q r e r q p ) would produce matrices that are denoted as follows:
$$X_p^c = (x_{pqr} e_p^{rq}), \quad X_q^c = (x_{pqr} e_q^{rp}), \quad X_r^c = (x_{pqr} e_r^{qp}).$$
Similar operations can be applied to the row-major vector X v = ( x p q r e p q r ) . It is important that the reshaping operations should be effected by manipulating pointers, instead of by moving the elements themselves.
The conversions $X^c \rightarrow X_p^c$ and $X^c \rightarrow X_r^c$ involve separating the trailing and leading elements, respectively, from the rest of the composite index $rqp$, whereby $e_{rqp} \rightarrow e_p^{rq}$ and $e_{rqp} \rightarrow e_r^{qp}$. These conversions will prove to be amenable to the reshape operator of MATLAB. To create $X_q^c$ via MATLAB would require a commutation operation to bring the index $q$ to the leading or trailing position. However, the creation of all of these matrices is readily achieved with do-loops that are nested appropriately. The MATLAB reshape operator is considered in detail in Section 6, and examples are provided in an appendix. In a recent paper, Fackler (2019) has shown that some efficiency in computing the product of (6) can be gained by exploiting the reshape procedure of that program.
It may be helpful to demonstrate that the definition that has been provided in (13) for the Kronecker product A B does correspond to the usual definition, which depends on displaying the subscripted elements within the context of their matrices. It is sufficient to consider the product of two square matrices of order 2 as follows:
$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \otimes \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}B & a_{12}B \\ a_{21}B & a_{22}B \end{bmatrix} = \begin{bmatrix} a_{11}b_{11} & a_{11}b_{12} & a_{12}b_{11} & a_{12}b_{12} \\ a_{11}b_{21} & a_{11}b_{22} & a_{12}b_{21} & a_{12}b_{22} \\ a_{21}b_{11} & a_{21}b_{12} & a_{22}b_{11} & a_{22}b_{12} \\ a_{21}b_{21} & a_{21}b_{22} & a_{22}b_{21} & a_{22}b_{22} \end{bmatrix}.$$
It can be seen that the composite row indices $jk$ associated with the elements $a_{jp}b_{kq}$ follow the lexicographic sequence $\{11, 12, 21, 22\}$, as do the composite column indices $pq$, which is to say that the conventional definition of a Kronecker product is in accordance with the row-major scheme.

4. Tensor Products and Associated Notations

The majority of econometric texts that consider the Kronecker product make no mention of tensor products, notwithstanding the fact that a Kronecker product may be classified as a tensor product. An exception is the account of Pollock (1985), which defines tensors in the context of multilinear algebra.
There are two ways in which the algebra of tensors may be approached. The first way is via mathematical physics, where vectors are liable to subsist in three-dimensional or four-dimensional spaces. Tensors are used in expounding the theory of electrodynamics via Maxwell’s equations—see Brau (2004)—the theory of special relativity—see Lawden (1967) —and the theory of the elasticity of solids—see Bower (2010). In such contexts, vectors and tensors are liable to acquire substantive connotations.
A second way of approaching tensors and tensor products is via the abstract theory of multilinear algebra that was propounded originally by Bourbaki (1948) and which has been expounded in the more recent texts of Greub (1967) and Marcus (1973). Such expositions take a coordinate-free approach that avoids specifying bases for the vector spaces and which does not preempt the choice of a metric.
The coordinate-free approach is well suited to a physical analysis that regards vectors as geometric objects that exist in space independently of any coordinate system or of its basis vectors. This flexibility is exploited in the theory of special relativity to enable geometric vectors to be viewed within alternative frames of reference. The natural basis is appropriate to an observer at rest within their own frame of reference. However, objects that are in motion relative to the observer require other coordinate systems, which are derived by applying the Lorentz transform to the natural basis (see Marder (1968) and Susskind and Friedman (2018) for examples).
In the main, data analysts and computer scientists are liable to regard vectors as algebraic objects comprising an ordered set of elements, which are the coordinates of the vector relative to a natural basis consisting of orthogonal vectors of unit length, which together compose an identity matrix. Our approach, in this section, is to express the basic definitions in terms of arbitrary vector spaces, denoted by $U$ and $V$ etc., and to interpret them immediately in terms of real finite-dimensional coordinate spaces. An $n$-dimensional real vector space will be denoted by $\mathbb{R}^n$.
If $V$ is a vector space defined over a scalar field $F$, then the dual space is the space of all linear functionals $V^* = L(V, F)$ mapping from $V$ to $F$. Given that it is of the same dimension, the dual space $V^*$ is isomorphic to the primal vector space $V$. The relationship between $V^*$ and $V$ is reflexive or reciprocal, which is to say that the dual of the dual space is the primal space.
The dual of the vector space $\mathbb{R}^n$ may be denoted by $\mathbb{R}^{n*}$. The elements of a coordinate vector space $\mathbb{R}^n$ are column vectors, which may be denoted by subscripted lowercase symbols, such as $x_1, y_2$, whereas the elements of the corresponding dual space $\mathbb{R}^{n*}$ are row vectors denoted by symbols with superscripts, such as $x'$, $x^*$, $x^1$, $y^2$. Lowercase letters without subscripts, such as $x, y$, are liable to be regarded as column vectors. The inner product $x'y$ is a linear functional.
If U and V are vector spaces defined over a scalar field F , then a mapping ϕ : U × V F from the Cartesian product of the spaces to the scalar field is a bilinear functional if, when u 1 , u 2 U and v 1 , v 2 V and when α , β F , there are
$$\phi(\alpha u_1 + \beta u_2, v_1) = \alpha\phi(u_1, v_1) + \beta\phi(u_2, v_1), \qquad \phi(u_1, \alpha v_1 + \beta v_2) = \alpha\phi(u_1, v_1) + \beta\phi(u_1, v_2).$$
The set of all such bilinear functionals is a vector space denoted by $L(U, V; F)$. Familiar examples of the mapping $\mathbb{R}^{n*} \times \mathbb{R}^n \rightarrow \mathbb{R}$ are as follows:
$$x'E_{ij}y = (x_i e^i)(e_i^j)(y_j e_j) = (x_i e^i)(y_j e_i) = x_i y_j,$$
$$x'I_n y = (x_i e^i)(\delta_{ij} e_i^j)(y_j e_j) = (x_i e^i)(\delta_{ij} y_j e_i) = \{x_i y_i\}_i,$$
$$x'Ay = (x_i e^i)(a_{ij} e_i^j)(y_j e_j) = \{x_i a_{ij} y_j\}_{ij},$$
where $E_{ij} = (e_i^j)$ denotes a matrix with a unit in the $ij$th position and with zeros elsewhere. The second of these products may be written, more simply, as
$$x'y = (x_i e^i)(y_i e_i) = \{x_i y_i\}_i.$$
This is a bilinear functional, which can also be considered to be a linear functional, if one of the vectors is regarded as the means by which the other vector is mapped to the field R .
Less familiar are the matrix forms of the mappings $\mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}$ and $\mathbb{R}^{n*} \times \mathbb{R}^{n*} \rightarrow \mathbb{R}$. Examples of these, in turn, are as follows:
$$A^r(x \otimes y) = (a_{ij} e^{ij})\{(x_i e_i) \otimes (y_j e_j)\} = (a_{ij} e^{ij})(x_i y_j e_{ij}) = \{a_{ij} x_i y_j\}_{ij}$$
and
$$(y' \otimes x')A^c = \{(y_j e^j) \otimes (x_i e^i)\}(a_{ij} e_{ji}) = (y_j x_i e^{ji})(a_{ij} e_{ji}) = \{y_j x_i a_{ij}\}_{ij}.$$
Reference to (17) will show that $A^r(x \otimes y) = (x'Ay)^r$ and that $(y' \otimes x')A^c = (x'Ay)^c$.
It is important to recognise that the vector space $L(U, V; F)$ represents the means by which vectors $u \in U$ and $v \in V$ are mapped into the scalar field $F$. It should be distinguished from the individual mappings of $U \times V \rightarrow F$, each of which entails specific values of $u$ and $v$. The distinction can be made clear by considering the functional $f = u'Av$, where $u = [u_1, \ldots, u_n]'$ and $v = [v_1, \ldots, v_m]'$ are column vectors and $A = [a_{ij}]$ is a matrix. The pair $(u, v)$ is an element of the Cartesian product $U \times V$. The matrix $A$ is an element of $L(U, V; F)$, and $f$ is the value delivered by a specific mapping of $U \times V \rightarrow F$.
A tensor product of vector spaces may be defined in a confusing variety of ways. However, it is appropriate to regard the tensor product $U \otimes V$ as a unique linear mapping from the set $U \times V$. Thus, for example, Shephard (1966) defines it to be the dual of the vector space $L(U, V; F)$ of bilinear functionals, so that each bilinear functional corresponds to a unique linear functional on $U \otimes V$. From the point of view of the real coordinate spaces, this amounts to nothing more than a change of notation. Nevertheless, what is revealed can be insightful.
A distinction must be made between the forms $U^* \otimes V$, $U^* \otimes V^*$ and $U \otimes V$. The first of these is liable to be described as a contravariant product, and it may be noted that $U^* \otimes V = V \otimes U^*$. The others may be described as covariant products, and it should be noted that the order of the two spaces is significant: $U \otimes V \neq V \otimes U$ and $U^* \otimes V^* \neq V^* \otimes U^*$. In the present context, covariance and contravariance denote relationships. However, in mathematical physics and in other contexts, they are regarded as attributes of the vector spaces.
If $x' \in \mathbb{R}^{n*}$ and $y \in \mathbb{R}^n$, then their contravariant product is an element in $\mathbb{R}^{n*} \otimes \mathbb{R}^n$ that has the form
$$x' \otimes y = y \otimes x' = (x_i e^i) \otimes (y_j e_j) = (x_i y_j e_j^i).$$
The corresponding covariant products in $\mathbb{R}^n \otimes \mathbb{R}^n$ and $\mathbb{R}^{n*} \otimes \mathbb{R}^{n*}$, respectively, are
$$x \otimes y = (x_i e_i) \otimes (y_j e_j) = (x_i y_j e_{ij}) \quad \text{and} \quad x' \otimes y' = (x_i e^i) \otimes (y_j e^j) = (x_i y_j e^{ij}).$$
These are elementary or decomposable products. Examples of non-decomposable tensor products of the three varieties are
$$A = \sum_i \sum_j a_{ij}(e_i \otimes e^j) = (a_{ij} e_i^j), \quad A^c = \sum_i \sum_j a_{ij}(e_j \otimes e_i) = (a_{ij} e_{ji}), \quad A^r = \sum_i \sum_j a_{ij}(e^i \otimes e^j) = (a_{ij} e^{ij}).$$
These would become decomposable products if a i j = α i β j for all i , j , albeit that a decomposable product may be the product of two or more non-decomposable products.
It is when coordinates are attributed to physical vectors in 3-dimensional space or in 4-dimensional spacetime that the terms covariant and contravariant come to denote attributes as opposed to reciprocal relationships. The usage can be demonstrated by considering a change of bases. Let $W = [w_1, w_2, \ldots, w_n]$ and $M = [m_1, m_2, \ldots, m_n]$ be alternative bases of a vector space $V$. Then, there exist transformations $M = WA$ and $W = MA^{-1}$ that transform one basis into the other.
Consider a vector expressed as $v = v_1 e_1 + v_2 e_2 + \cdots + v_n e_n$ in terms of the natural basis $I_n = [e_1, e_2, \ldots, e_n]$ and, alternatively, as $v = Wx = My$, where $x = [x_1, x_2, \ldots, x_n]'$ and $y = [y_1, y_2, \ldots, y_n]'$ are the coordinate vectors relative to the bases $W$ and $M$, respectively. Then, since $M = WA$, it follows that $Wx = WAy$, whence $x = Ay$ and $y = A^{-1}x$. Thus, the bases and the corresponding coordinates transform as follows:
$$W \longrightarrow M = WA, \qquad x \longrightarrow y = A^{-1}x.$$
Let $A = (\lambda_j e_j^j)$ be a diagonal matrix. Then, the $j$th column of the transformed basis $M = WA$ is
$$m_j = (m_{ij} e_i) = (w_{ij} e_i^j)(\lambda_j e_j) = (\{w_{ij}\lambda_j\}\, e_i) = \lambda_j w_j,$$
and the vector of coordinates relative to the basis vectors of $M$ is
$$y = (y_i e_i) = (\lambda_i^{-1} e_i^i)(x_i e_i) = (\lambda_i^{-1} x_i e_i).$$
They have a contravariant relationship with the basis vectors. The effect of halving the length of a basis vector will be the doubling of the value of an associated coordinate.
In general, vectors that change their scale inversely to changes in the scale of the reference axes (i.e., the basis vectors) are commonly described as contravariant. Dual vectors that change in the same way as the changes to the scale of the reference axes are described as covariant. If $v = v_1 e_1 + v_2 e_2 + \cdots + v_n e_n = (v_i e_i)$ denotes a velocity vector, then its coefficients $[v_1, v_2, \ldots, v_n]$, which have the dimension of distance/time, can be said to constitute a contravariant vector. By contrast, a gradient vector of dimension quantity/distance will be a covariant vector.
In the Einstein notation, which suppresses the summation sign, contravariant components carry upper indices, and the velocity vector will be denoted, typically, by
$$v = v^1 e_1 + v^2 e_2 + \cdots + v^n e_n = v^i e_i,$$
where e i denotes a basis vector. The understanding is that, whenever an index is repeated within an expression, there must be a corresponding summation. This easement of the notation, which dispenses with the summation signs, relies on an understanding of the context of the expression.
The notation of the present paper adopts the same summation convention for expressions that are found within parentheses. Thus, $I_n = (\delta_{ij} e_i^j)$ denotes an identity matrix. However, in the absence of the parentheses, $a_{ij} e_i^j$ denotes a matrix with $a_{ij}$ in the $i$th row and the $j$th column and with zeros elsewhere. The expression $(e_i^j)$ from (21) denotes a matrix with a unit in that position and with zeros elsewhere. Since there are no repeated indices, the parentheses do not imply a summation.
It should be noted that, in this notation, scalar elements never carry superscripts. Furthermore, the notation refers invariably to algebraic or coordinate vectors. It does not encompass coordinate-free geometric vectors, which are common in physical contexts.
The diversity of the existing tensor notations is too great to be summarised here. However, numerous references to alternative notations have been given by Harrison and Joseph (2016). Mention should also be made of De Lathauwer et al. (2000) and Bader and Kolda (2006), who have provided accounts that have been widely referenced by computer scientists.

5. The Problem at Hand

In terms of the index notation, the product of (6) may be expressed as
$$(A \otimes B \otimes C)X^v = (\{a_{jp} b_{kq} c_{\ell r}\}\, e_{jk\ell}^{pqr})(x_{pqr} e_{pqr}) = (\{a_{jp} b_{kq} c_{\ell r} x_{pqr}\}_{pqr}\, e_{jk\ell}) = Y^v.$$
The final expression on the RHS is the product of the postmultiplication of $A \otimes B \otimes C$ by $X^v$. It is derived by cancelling the (superscripted) column indices $pqr$, which are associated with the basic matrices within the leading factor, which is the Kronecker product, with the (subscripted) row indices in the trailing factor, which is the vector $X^v$. The indices $pqr$ affixed to the right brace in the penultimate expression emphasise that the elements within the braces are to be summed over the repeated indices. The indices of summation can be inferred from the contents of the braces.
The final expression of (32) suggests various ways of computing the product Y v . One way would be by a single contraction or summation in respect of the composite index p q r . In so far as the vector X v is concerned, the composite index is liable to correspond to a set of adjacent addresses in the memory of the computer.
The number of multiplications involved in forming the Kronecker product is $JP \times KQ \times LR$, which is the product of the numbers of elements in the constituent matrices. The operation of forming $Y^v$ via a single contraction in respect of the composite index $pqr$ entails $PQR \times JKL$ multiplications and $(PQR - 1) \times JKL$ additions.
A more efficient way of computing the product is by successive contractions in respect of the indices r, q and p, separately and successively. This could be described as the divide-and-conquer procedure. Then, a sequence of products, in long-vector form, that would lead from X v to Y v are as follows:
$$X^v = (x_{pqr} e_{pqr}),$$
$$W^v = (w_{\ell pq}\, e_{\ell pq}) = (\{c_{\ell r} x_{pqr}\}_r\, e_{\ell pq}),$$
$$V^v = (v_{k\ell p}\, e_{k\ell p}) = (\{b_{kq} w_{\ell pq}\}_q\, e_{k\ell p}) = (\{b_{kq} c_{\ell r} x_{pqr}\}_{qr}\, e_{k\ell p}),$$
$$Y^v = (y_{jk\ell}\, e_{jk\ell}) = (\{a_{jp} v_{k\ell p}\}_p\, e_{jk\ell}) = (\{a_{jp} b_{kq} c_{\ell r} x_{pqr}\}_{pqr}\, e_{jk\ell}).$$
Here, the elements of the matrices $C = [c_{\ell r}]$, $B = [b_{kq}]$ and $A = [a_{jp}]$ are adduced to the product in the course of three successive contractions. Passing from $X^v$ to $W^v$ via the contraction over $r$ requires $R \times LPQ$ multiplications. Passing from $W^v$ to $V^v$ via the contraction over $q$ requires $Q \times KLP$ multiplications. Passing from $V^v$ to $Y^v$ via the contraction over $p$ requires $P \times JKL$ multiplications.
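Assuming the matrix orders introduced above ($J \times P$, $K \times Q$ and $L \times R$), the divide-and-conquer sequence can be written compactly with NumPy's einsum, one contraction per step, and checked against the brute-force product that forms the full Kronecker matrix first. This is an illustrative Python sketch rather than the paper's own code.

import numpy as np

rng = np.random.default_rng(2)
J, K, L, P, Q, R = 2, 2, 2, 3, 4, 2
A = rng.standard_normal((J, P))
B = rng.standard_normal((K, Q))
C = rng.standard_normal((L, R))
X = rng.standard_normal((P, Q, R))

# successive contractions, following the index orderings of the intermediate vectors above
W = np.einsum('lr,pqr->lpq', C, X)    # contract over r: elements w_{lpq}
V = np.einsum('kq,lpq->klp', B, W)    # contract over q: elements v_{klp}
Y = np.einsum('jp,klp->jkl', A, V)    # contract over p: elements y_{jkl}

# brute-force comparison: form the full Kronecker product and postmultiply by X^v
Yv = np.kron(np.kron(A, B), C) @ X.flatten(order='C')
assert np.allclose(Y.flatten(order='C'), Yv)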
However, there are six different orders in which the contractions can be performed, which correspond to the permutations of the indices p , q , r , which are the column indices of the matrices A, B, and C, respectively. In comparing the efficiency of the alternative procedures, one need only count the number of multiplications, since these are far more expensive than are the additions (the orders of computational complexity are n 2 and n, respectively).
When $R = 2$, $Q = 4$ and $P = 3$, with $J = K = L = 2$ as before, there are 384 multiplications in the slow procedure. The number of multiplications in the divide-and-conquer procedure depends on the sequence of the contractions. The numbers of multiplications in the various divide-and-conquer procedures are listed below, with the sequence of contractions denoted by a sequence of matrices, where the order in which they are deployed should be read from right to left (as in a sequence of matrix multiplications):
A, B, C: 120    B, C, A: 112    C, A, B: 88
C, B, A: 96     A, C, B: 96     B, A, C: 128
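The entries of this table can be reproduced by the following sketch, which charges old × new × (product of the remaining dimensions) multiplications to each contraction and prints the matrix labels from right to left, as in the table. It is an illustrative Python fragment; the dictionaries simply encode the dimensions used above.

from itertools import permutations
from math import prod

sizes = {'p': 3, 'q': 4, 'r': 2}     # P, Q, R: the column orders of A, B, C
rows  = {'p': 2, 'q': 2, 'r': 2}     # J, K, L: the row orders of A, B, C
label = {'p': 'A', 'q': 'B', 'r': 'C'}

for order in permutations('pqr'):            # the order in which the contractions are performed
    dims, cost = dict(sizes), 0
    for x in order:
        old = dims.pop(x)                               # size of the index being contracted
        cost += old * rows[x] * prod(dims.values())     # multiplications for this contraction
        dims[x] = rows[x]                               # the new (row) index replaces it
    names = ', '.join(label[x] for x in reversed(order))   # read right to left, as in the table
    print(names, ':', cost)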
To minimise the computational burden by minimising the number of multiplications within the divide-and-conquer procedure, one should ensure that the contractions in respect of $p$, $q$ and $r$ are ordered such that the numbers of the associated multiplications are declining. When the row orders of the matrices are equal, the most efficient sequence of contractions will be self-evident, and there will be no need to count the numbers of multiplications.
The calculation of the product of (32) can be accomplished using nested do-loops. Six loops are required, and they fall into two groups, which comprise the outer indices $j, k, \ell$ and the inner indices $p, q, r$. The former group are the indices of the product vector and the latter group are the indices of the successive contractions. Within each group, the order of the indices may be freely permuted.
However, it might pay to deploy the outer indices in the order listed above, since this will enable the elements of the product to be created on the fly in the order in which they would be printed in a column. Otherwise, the printing of the column would need to be delayed until all of its elements have been created.
To illustrate a code appropriate to such calculations, it is enough to consider the equation $Y^v = (A \otimes B)X^v$, which can be rendered in the index notation as
$$(y_{jk}\, e_{jk}) = (\{a_{jp} b_{kq}\}\, e_{jk}^{pq})(x_{pq} e_{pq}) = (\{a_{jp} b_{kq} x_{pq}\}_{pq}\, e_{jk}).$$
The code is as follows:   
for j := 1 to J do
  for k := 1 to K do
  begin {j k}
    y[j][k] := 0;
    for q := 1 to Q do
      for p := 1 to P do
        y[j][k] := y[j][k] + a[j][p]*b[k][q]*x[p][q];
  end; {j k}
The code becomes increasingly prolix with a growing number of indices, each of which gives rise to a nested do-loop. However, such code can be encapsulated within a procedure or a subroutine with a concise interface.

6. Matrix Multiplications and the Reshaping Operations

Many authors have recognised the inefficiency of forming a multiple Kronecker product before postmultiplying it by a vector. They have proposed a variety of methods that rely on pairwise matrix multiplications that are accompanied by the shuffling of the elements of the products in order to facilitate the succeeding matrix multiplications. These operations have been described as the reshaping of the matrices.
A method for solving the equation $Y^v = (A \otimes B \otimes C)X^v$ of (6) for $X$ was proposed by Pereyra and Scherer (1973). A much simplified solution was provided by de Boor (1979), who showed that this could be achieved by a sequence of solutions involving the matrices $A$, $B$ and $C$ successively.
An indication of the way in which the product of (6) might be calculated via a sequence of matrix multiplications was given in a paper by Dyksen (1987), which contained the identity $(A \otimes B)X^v = \{B(AX)'\}^c$, where $X = (x_{pq} e_p^q)$ and $X^v = (x_{pq} e_{pq})$, albeit that the necessary vectorisations of the matrices were not indicated.
A paper by Davio (1981) dealt at length with the matter of shuffling the elements of a Kronecker product; but it did not relate such operations to pointers. Others, including Benoit et al. (2003), Langville and Stewart (2004) and Dayar and Orhan (2015), have also dealt with such issues. Programs that are aimed at computing the product of (6) have been provided by Constantine and Gleich (2009) and by Kredler (2015).
In a recent paper, Fackler (2019) has proposed that computations involving multiple Kronecker products can be achieved by ordinary matrix multiplications without displacing elements that are stored in the computer memory, as shuffling is liable to do. To illustrate how this method can be implemented in the MATLAB program, we may consider an input vector $X^c$, which follows a column-major scheme. Then, the method, which applies the matrices $A = [a_{jp}]$, $B = [b_{kq}]$ and $C = [c_{\ell r}]$ in succession to $X^c$, entails the following objects,
$$X^c = (x_{pqr} e_{rqp}),$$
$$W^c = (w_{qrj}\, e_{jrq}) = (\{a_{jp} x_{pqr}\}_p\, e_{jrq}),$$
$$V^c = (v_{rjk}\, e_{kjr}) = (\{b_{kq} w_{qrj}\}_q\, e_{kjr}),$$
$$Y^c = (y_{jk\ell}\, e_{\ell kj}) = (\{c_{\ell r} v_{rjk}\}_r\, e_{\ell kj}).$$
In the process, the basis vectors undergo the following transformation:
$$e_{rqp} \longrightarrow e_{jrq} \longrightarrow e_{kjr} \longrightarrow e_{\ell kj}.$$
It can be seen that, as the indices $p$, $q$, $r$ of the contractions drop off the end of the sequence, the new indices $j$, $k$, $\ell$ join at the head.
When the equations of (37) are given their appropriate shapes, they become
$$X_p^c = (x_{pqr} e_p^{rq}),$$
$$W_j^c = (w_{qrj}\, e_j^{rq}) = (\{a_{jp} x_{pqr}\}_p\, e_j^{rq}) = (a_{jp} e_j^p)(x_{pqr} e_p^{rq}),$$
$$V_k^c = (v_{rjk}\, e_k^{jr}) = (\{b_{kq} w_{qrj}\}_q\, e_k^{jr}) = (b_{kq} e_k^q)(w_{qrj} e_q^{jr}),$$
$$Y_\ell^c = (y_{jk\ell}\, e_\ell^{kj}) = (\{c_{\ell r} v_{rjk}\}_r\, e_\ell^{kj}) = (c_{\ell r} e_\ell^r)(v_{rjk} e_r^{kj}),$$
and the sequence of matrix multiplications can be denoted by
$$A X_p^c = W_j^c, \qquad B W_q^c = V_k^c, \qquad C V_r^c = Y_\ell^c.$$
This entails two operations that are amenable to the reshape procedure:
$$W_j^c = (w_{qrj}\, e_j^{rq}) \longrightarrow W^c = (w_{qrj}\, e_{jrq}) \longrightarrow W_q^c = (w_{qrj}\, e_q^{jr}),$$
$$V_k^c = (v_{rjk}\, e_k^{jr}) \longrightarrow V^c = (v_{rjk}\, e_{kjr}) \longrightarrow V_r^c = (v_{rjk}\, e_r^{kj}).$$
Here, the variant matrices $W_j^c$, $W_q^c$ and $V_k^c$, $V_r^c$ have shapes that are derived directly from $W^c$ and $V^c$, respectively, which represent what is stored in the computer’s memory. The reshape procedure is explained in detail in Appendix A.
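A minimal NumPy sketch of this pairwise strategy is given below. Fortran-order ('F') reshapes stand in for MATLAB's column-major reshape, and the transpose taken after each multiplication makes the newly created index the slowest-varying one in storage, so that it joins the composite subscript at the head, as in the chain of basis vectors shown above. The result is checked against the brute-force column-major product. It is an illustration of the general idea under these assumptions, not a transcription of Fackler's (2019) program.

import numpy as np

rng = np.random.default_rng(3)
J, K, L, P, Q, R = 2, 2, 2, 3, 4, 2
A = rng.standard_normal((J, P))
B = rng.standard_normal((K, Q))
C = rng.standard_normal((L, R))
X = rng.standard_normal((P, Q, R))

xc = X.flatten(order='F')                  # the stored column-major vector X^c

# apply A, B, C in turn: reshape so that the index to be contracted labels the rows,
# multiply, transpose, and store the result column-major again
z = xc
for mat, n in ((A, P), (B, Q), (C, R)):
    Zn = z.reshape((n, -1), order='F')     # rows carry the index about to be contracted
    z = (mat @ Zn).T.flatten(order='F')    # the new index becomes the slowest in storage
yc = z

# brute-force check against the column-major product Y^c = (C kron B kron A) X^c
assert np.allclose(yc, np.kron(np.kron(C, B), A) @ xc)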
If it is required to reverse the sequence of matrix multiplications that are applied to the column-major vector X c , then these must become post-multiplications, and the corresponding equations will be
$$(X_r^c)' = (x_{pqr}\, e_{qp}^r),$$
$$(W_\ell^c)' = (w_{\ell pq}\, e_{qp}^\ell) = (\{c_{\ell r} x_{pqr}\}_r\, e_{qp}^\ell) = (x_{pqr} e_{qp}^r)(c_{\ell r} e_r^\ell),$$
$$(V_k^c)' = (v_{k\ell p}\, e_{\ell p}^k) = (\{b_{kq} w_{\ell pq}\}_q\, e_{\ell p}^k) = (w_{\ell pq} e_{\ell p}^q)(b_{kq} e_q^k),$$
$$(Y_j^c)' = (y_{jk\ell}\, e_{\ell k}^j) = (\{a_{jp} v_{k\ell p}\}_p\, e_{\ell k}^j) = (v_{k\ell p} e_{\ell k}^p)(a_{jp} e_p^j).$$
The sequence of matrix multiplications can be denoted by
$$(X_r^c)'C' = (W_\ell^c)', \qquad (W_q^c)'B' = (V_k^c)', \qquad (V_p^c)'A' = (Y_j^c)'.$$
However, reference to (15) indicates that, if the row-major vector $X^v = (x_{pqr} e_{pqr})$ is available, then the following sequence of matrix pre-multiplications can be pursued:
$$C X_r^v = W_\ell^v, \qquad B W_q^v = V_k^v, \qquad A V_p^v = Y_j^v.$$
According to the table of (34), which is based on the values $P = 3$, $Q = 4$, $R = 2$ and $J = K = L = 2$, neither of the sequences of (40) and (44) leads to a fully efficient computation. A recourse might be to re-order the matrices within the multiple Kronecker product to create the most efficient sequence, which would also require the elements within the vector $Y^c$ to be re-ordered. Whether this would be allowable or convenient is a matter that depends on the substantive nature of the problem that is represented by the equation. However, it is reasonable to suppose that, in most cases, this will be easy to achieve.
Next is the apparent obstacle that the reshape procedure is specific to MATLAB. However, as Fackler (2019) has indicated, it should be straightforward to replicate the procedure in many of the available programming languages, using the arithmetic of pointers. Whereas the method of pairwise multiplication has been described for the case of a column-major scheme, there is no difficulty in adapting to a row-major scheme.
Our preference would be to calculate the product of (6) according to the scheme of (33), or some permutation thereof, that does not require matrix multiplications. However, the intention of this paper has not been to insist on one way or another of conducting the computations, provided that gross inefficiency is avoided. Instead, the aim has been to clarify the nature of the alternative procedures.

Funding

This research has received no external funding.

Conflicts of Interest

The author declares that there are no conflicts of interest.

Appendix A. Examples of the Reshape Procedure

In the MATLAB program, a matrix array is stored, according to a column-major scheme, as a long vector formed by adjoining successive columns of the matrix. The reshape procedure divides the long column into successive segments of equal length, which are arrayed in a new matrix.
An example of the reshaping operation is the following transformation:
$$A = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \longrightarrow B = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}.$$
This would be accomplished by the command B = reshape(A, 2, 3).
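The NumPy equivalent needs order='F' to reproduce MATLAB's column-major behaviour; the following fragment (illustrative Python, not part of the paper) performs the same transformation:

import numpy as np

A = np.array([[1, 4],
              [2, 5],
              [3, 6]])
# MATLAB's reshape works on the column-major storage; order='F' reproduces it
B = np.reshape(A, (2, 3), order='F')
print(B)        # [[1 3 5]
                #  [2 4 6]]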
To illustrate some of the structures into which a multidimensional array may be cast, consider a three-dimensional array of which the elements are denoted by x p q r . The indices range from 1 up to P, Q and R, respectively.
We shall assume that P = Q = 2 and that R = 3 . Then, the row-major array is a wide row vector
$$X^r = (x_{pqr} e^{pqr}) = [\,x_{111}\; x_{112}\; x_{113}\; x_{121}\; x_{122}\; x_{123}\; x_{211}\; x_{212}\; x_{213}\; x_{221}\; x_{222}\; x_{223}\,].$$
Reversing the order of the indices of the basic vectors and transposing gives
$$X^c = (x_{pqr} e_{rqp}) = [\,x_{111}\; x_{211}\; x_{121}\; x_{221}\; x_{112}\; x_{212}\; x_{122}\; x_{222}\; x_{113}\; x_{213}\; x_{123}\; x_{223}\,]',$$
which is a long column vector. Recall that the sequence of indices on the basis elements follows a lexicographic order. Furthermore, note that the mapping $X^r \rightarrow X^c$ involves a commutation of the indices of the basis vector followed by a transposition. It is not amenable to a simple reshape operation.
A single reshape operation effects the following transformation of X c :
$$X^c = (x_{pqr} e_{rqp}) \longrightarrow X_p^c = (x_{pqr} e_p^{rq}) = \begin{bmatrix} x_{111} & x_{121} & x_{112} & x_{122} & x_{113} & x_{123} \\ x_{211} & x_{221} & x_{212} & x_{222} & x_{213} & x_{223} \end{bmatrix}.$$
Here, the leading index $p = 1, 2$ of the element $x_{pqr}$ has become the row index of $X_p^c$. The composite column index $qr$ follows two iterations (one for each of the two rows) of the sequence 11, 21, 12, 22, 13, 23, in which $r = 1, 2, 3$ is evolving more slowly than $q = 1, 2$. This is an anti-lexicographic sequence.
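The reshape can be replicated in NumPy, again with order='F' to mimic MATLAB's storage. In the sketch below, which is purely illustrative and not part of the paper, each element is labelled by its own indices so that the layout of $X_p^c$ is visible in the printed output:

import numpy as np

P, Q, R = 2, 2, 3
# label each element by its indices, so that x[p, q, r] prints as the number pqr
X = np.array([[[100*p + 10*q + r for r in range(1, R + 1)]
               for q in range(1, Q + 1)]
              for p in range(1, P + 1)])

Xc = X.flatten(order='F')                 # the stored column-major vector X^c
Xp = Xc.reshape((P, Q * R), order='F')    # a single reshape yields X_p^c
print(Xp)
# [[111 121 112 122 113 123]
#  [211 221 212 222 213 223]]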
An alternative reshape operation yields
$$X^c = (x_{pqr} e_{rqp}) \longrightarrow X_{qp}^c = (x_{pqr} e_{qp}^r) = \begin{bmatrix} x_{111} & x_{112} & x_{113} \\ x_{211} & x_{212} & x_{213} \\ x_{121} & x_{122} & x_{123} \\ x_{221} & x_{222} & x_{223} \end{bmatrix}.$$
Here, $r = 1, 2, 3$, which is the trailing index of the element $x_{pqr}$, has become the column index of $X_{qp}^c$. The composite row index $pq$ follows three iterations (one for each of the three columns) of the sequence 11, 21, 12, 22, which is also in an anti-lexicographic order.
Transposing this matrix gives
$$X_{qp}^c = (x_{pqr} e_{qp}^r) \longrightarrow X_r^c = (x_{pqr} e_r^{qp}) = \begin{bmatrix} x_{111} & x_{211} & x_{121} & x_{221} \\ x_{112} & x_{212} & x_{122} & x_{222} \\ x_{113} & x_{213} & x_{123} & x_{223} \end{bmatrix}.$$
The object that remains unchanged in the memory of the computer throughout these operations is X c = ( x p q r e r q p ) . It follows that the transformation
$$X_r^c = (x_{pqr} e_r^{qp}) \longrightarrow X_p^c = (x_{pqr} e_p^{rq})$$
can be accomplished by a single reshape operation. This transformation is the prototype of the transformations of (41):
$$W_j^c = (w_{qrj}\, e_j^{rq}) \longrightarrow W_q^c = (w_{qrj}\, e_q^{jr}),$$
$$V_k^c = (v_{rjk}\, e_k^{jr}) \longrightarrow V_r^c = (v_{rjk}\, e_r^{kj}).$$
The matrices in question have either the leading or the trailing elements of the composite indices $qrj$ and $rjk$ as their row indices. The middle indices cannot become row indices without first resorting to a commutation operation, which would bring them to the head or the tail of a triple index.

References

1. Bader, Bret W., and Tamara G. Kolda. 2006. Algorithm 862: MATLAB Tensor Classes for Fast Algorithm Prototyping. ACM Transactions on Mathematical Software (TOMS) 32: 635–53.
2. Balestra, Pietro. 1976. La Dérivation Matricielle. Paris: Éditions Sirey.
3. Benoit, Anne, Brigitte Plateau, and William J. Stewart. 2003. Memory-Efficient Kronecker Algorithms with Applications to the Modelling of Parallel Systems. Paper presented at the International Parallel and Distributed Processing Symposium (IPDPS’03), Nice, France, April 22–26.
4. Bourbaki, Nicolas. 1948. Algèbre Multilinéaire. In Éléments de Mathématique. Chapter 3 of Book II (Algèbre). Paris: Hermann.
5. Bower, Allan F. 2010. Applied Mechanics of Solids. Boca Raton: Taylor and Francis.
6. Brau, Charles A. 2004. Modern Problems in Classical Electrodynamics. Oxford: Oxford University Press.
7. Cartan, Élie. 1952. Géométrie des Espaces de Riemann. Paris: Gauthier–Villars.
8. Constantine, Paul G., and David F. Gleich. 2009. kronmult: Fast and Efficient Kronecker Multiplication. Available online: https://www.mathworks.com/matlabcentral/fileexchange/23606 (accessed on 27 April 2021).
9. Davio, Marc. 1981. Kronecker Products and Shuffle Algebra. IEEE Transactions on Computers C-30: 116–25.
10. Dayar, Tuğrul, and M. Can Orhan. 2015. On Vector-Kronecker Product Multiplication with Rectangular Factors. SIAM Journal on Scientific Computing 37: 526–43.
11. de Boor, Carl. 1979. Efficient Computer Manipulation of Tensor Products. ACM Transactions on Mathematical Software (TOMS) 5: 173–82.
12. De Lathauwer, Lieven, Bart L. R. De Moor, and Joos Vandewalle. 2000. A Multilinear Singular Value Decomposition. SIAM Journal on Matrix Analysis and Applications 21: 1253–78.
13. Dyksen, Wayne R. 1987. Tensor Product Generalized ADI Methods for Separable Elliptic Problems. SIAM Journal on Numerical Analysis 24: 59–76.
14. Fackler, Paul L. 2019. Algorithm 993: Efficient Computation with Kronecker Products. ACM Transactions on Mathematical Software 45: 1–9.
15. Graham, Alexander. 1981. Kronecker Products and Matrix Calculus with Applications. Chichester: Ellis Horwood.
16. Greub, Werner H. 1967. Multilinear Algebra. Berlin: Springer.
17. Harrison, Adam P., and Dileepan Joseph. 2016. Numeric Tensor Framework: Exploiting and Extending Einstein Notation. Journal of Computational Science 16: 128–39.
18. Kredler, Matthias. 2015. kronm: Fast Kronecker Matrix Multiplication. Available online: https://www.mathworks.com/matlabcentral/fileexchange/53382 (accessed on 27 April 2021).
19. Langville, Amy N., and William J. Stewart. 2004. The Kronecker Product and Stochastic Automata Networks. Journal of Computational and Applied Mathematics 167: 429–47.
20. Lawden, Derek F. 1967. An Introduction to Tensor Calculus and Relativity. London: Methuen.
21. Magnus, Jan R. 2010. On the Concept of Matrix Derivative. Journal of Multivariate Analysis 101: 2200–6.
22. Magnus, Jan R., and Heinz Neudecker. 1979. The Commutation Matrix: Some Properties and Applications. Annals of Statistics 7: 381–94.
23. Magnus, Jan R., and Heinz Neudecker. 1988. Matrix Differential Calculus with Applications in Statistics and Econometrics. Chichester: John Wiley and Sons.
24. Marcus, Marvin. 1973. Finite Dimensional Multilinear Algebra—Part I. New York: Marcel Dekker.
25. Marder, Leslie. 1968. An Introduction to Relativity. London: Longmans, Green and Co.
26. Pereyra, Victor, and Godela Scherer. 1973. Efficient Computer Manipulation of Tensor Products with Applications to Multidimensional Approximation. Mathematics of Computation 27: 595–605.
27. Pollock, D. Stephen G. 1979. The Algebra of Econometrics. Chichester: John Wiley and Sons.
28. Pollock, D. Stephen G. 1985. Tensor Products and Matrix Differential Calculus. Linear Algebra and its Applications 67: 169–93.
29. Rogers, Gerald S. 1980. Matrix Derivatives. New York: Marcel Dekker.
30. Shephard, Geoffrey C. 1966. Vector Spaces of Finite Dimension. Edinburgh: Oliver and Boyd.
31. Steeb, Willi-Hans. 1997. Matrix Calculus and Kronecker Product with Applications and C++ Programs. Singapore: World Scientific Publishing Co.
32. Susskind, Leonard, and Art Friedman. 2018. Special Relativity and Classical Field Theory. London: Penguin Books.
33. Turkington, Darrell A. 2002. Matrix Calculus and Zero-One Matrices. Cambridge: Cambridge University Press.
34. Van Loan, Charles F. 2000. The Ubiquitous Kronecker Product. Journal of Computational and Applied Mathematics 123: 85–100.
35. Zellner, Arnold. 1962. An Efficient Method of Estimating Seemingly Unrelated Regression Equations and Tests for Aggregation Bias. Journal of the American Statistical Association 57: 348–68.