STATIONARY SPLITTING ITERATIVE METHODS FOR THE MATRIX EQUATION AXB = C

Stationary splitting iterative methods for solving AXB = C are considered in this paper. The main tool to derive our new method is the splitting A = M − N of a given nonsingular matrix A induced by a matrix H such that (I − H)^{-1} exists. Convergence properties of the proposed method are discussed, and numerical experiments are presented to illustrate its computational efficiency and the effectiveness of some preconditioned variants. In particular, for certain surface-fitting applications our method is much more efficient than the progressive iterative approximation (PIA), a conventional iterative method often used in computer aided geometric design (CAGD).


1. Introduction. Consider the matrix equation

    AXB = C,                                                                  (1.1)

where A ∈ R^{n×n}, B ∈ R^{m×m} and X, C ∈ R^{n×m}. This matrix equation plays an important role in many practical applications such as surface fitting in computer aided geometric design (CAGD), signal and image processing, photogrammetry, etc.; see for example [14,19,20,30,31] and the references therein. Given the existence of many important applications, the theory and algorithms for matrix equations have intrigued researchers for decades, see [7,9,15,21,28,29,30,31,34,37]. A recent survey of the major numerical algorithms for the solution of the more general linear matrix equation AXE + DXB = C, for given square matrices A, B, D, E and a conformable matrix C, and related problems is presented in [33].
The matrix equation (1.1) can be treated as a standard linear system with multiple right-hand sides, as pointed out in [33, p. 380]. More precisely, it is possible, for instance, to split the initial problem into two matrix equations, AY = C and B^T X^T = Y^T, to be solved in succession. In practice, instead of solving each of these linear systems independently, by the LU decomposition method or by some iterative method, it is more efficient to use a block solver, especially when the matrices involved are dense. See [13,32,38].
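As an illustration, here is a minimal Matlab sketch of this two-step approach (the test data and names are ours), using the backslash and slash operators as dense direct solvers:

    % Solve AXB = C in two steps: AY = C, then XB = Y (i.e. B'X' = Y').
    n = 4; m = 3;
    A = rand(n) + n*eye(n);          % small, comfortably nonsingular test data
    B = rand(m) + m*eye(m);
    C = rand(n, m);
    Y = A \ C;                       % step 1: AY = C (multiple right-hand sides)
    X = Y / B;                       % step 2: XB = Y
    norm(A*X*B - C, 'fro')           % residual check: ~ machine precision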
Equation (1.1) is also mathematically equivalent to the larger linear system

    (B^T ⊗ A) x = c,                                                          (1.2)

where ⊗ denotes the Kronecker product, and x = vec(X) and c = vec(C) are the column-stacking vectors of the matrices X and C, respectively. The order of the coefficient matrix in (1.2) is mn, which becomes very large when m and n are large.
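A small numerical check of the equivalence (1.2), as a sketch with our own test data: solving the mn × mn Kronecker system reproduces the solution of (1.1).

    n = 4; m = 3;
    A = rand(n) + n*eye(n);  B = rand(m) + m*eye(m);  C = rand(n, m);
    x = kron(B', A) \ C(:);          % (B^T kron A) vec(X) = vec(C); vec(C) is C(:)
    X = reshape(x, n, m);
    norm(A*X*B - C, 'fro')           % ~ machine precision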
The general linear system (1.2) is useful in theoretical analyses, but it is impractical to solve it to obtain a numerical solution of (1.1), since this is computationally expensive and the system can be ill-conditioned. It is therefore attractive to compute the solution of (1.1) directly from the matrix equation itself. Moreover, in the matrix setting, Level 3 BLAS operations can be exploited on high performance computers [8].
In this paper we focus our attention on methods developed directly from the original matrix equation (1.1). There are two main types of methods for solving this problem: direct methods and iterative methods. When the matrices A and B are small and dense, direct methods such as QR-factorization-based algorithms [9,36] are attractive. However, these direct algorithms are quite costly and suffer from numerical instability when A and B are large and ill-conditioned. Therefore, considerable attention has recently been paid to iterative methods, see for instance [15,28,29,34,35,37] and references therein.
In the context of this problem, Tian et al. [34] developed some Jacobi and Gauss-Seidel-type iterative methods for solving (1.2). Let us summarize their schemes in terms of the original problem (1.1).
We say that the representation A = M − N is a splitting of A if M is nonsingular. The splitting A = M − N is called convergent if ρ(H) < 1, where H = M^{-1}N is the iteration matrix and ρ(H) denotes the spectral radius of H.
Let A = F − G and B = F̂ − Ĝ be splittings of A and B, respectively. Then the four basic iteration sequences defined in [34] can be equivalently written as follows:

    X^{(k+1)} = F^{-1} G X^{(k)} + F^{-1} C B^{-1},                           (1.3)

    X^{(k+1)} = X^{(k)} Ĝ F̂^{-1} + A^{-1} C F̂^{-1},                          (1.4)

and

    X^{(k+1)} = F^{-1} G X^{(k)} + X^{(k)} Ĝ F̂^{-1} − F^{-1} G X^{(k)} Ĝ F̂^{-1} + F^{-1} C F̂^{-1},   (1.5)

    X^{(k+1)} = X^{(k)} + F^{-1} (C − A X^{(k)} B) F̂^{-1}.                   (1.6)

The iterations (1.3) and (1.4) are referred to as modified iterations, and some of their convergence theory, as well as related algorithms, was discussed in [34].
In fact, if A^{-1} or B^{-1} is available, or is easy to obtain, then the matrix equation (1.1) can be reduced to a standard linear system with multiple right-hand sides. In this situation, iteration (1.3) or (1.4) can be employed to obtain an approximate solution of (1.1); otherwise these iterations are useless in practice. We therefore focus on iterations (1.5) and (1.6). We refer to iteration (1.5) as the direct splitting iteration and to iteration (1.6) as the residual-updating iteration (cf. [2]).
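For concreteness, a minimal Matlab sketch of the residual-updating iteration (1.6) with Gauss-Seidel splittings follows (test matrices and names are ours; as discussed next, ρ(F^{-1}G) < 1 and ρ(ĜF̂^{-1}) < 1 alone do not guarantee convergence of this two-sided iteration, so the diagonally dominant test matrices below are chosen to be safe):

    n = 30; m = 20;
    A = diag(5*ones(n,1)) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);
    B = diag(5*ones(m,1)) - diag(ones(m-1,1),1) - diag(ones(m-1,1),-1);
    C = rand(n, m);
    F  = tril(A);                    % Gauss-Seidel: A = F - G, G = F - A
    Fh = tril(B);                    % Gauss-Seidel: B = Fh - Gh, Gh = Fh - B
    X = zeros(n, m);
    for k = 1:200
        R = F \ (C - A*X*B) / Fh;    % preconditioned residual, iteration (1.6)
        X = X + R;
        if norm(R, 'fro') < 1e-12, break, end
    end
    norm(A*X*B - C, 'fro')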
It is known that the iteration sequences (1.3) and (1.4) converge to the exact solution of (1.1) if and only if ρ(F^{-1}G) < 1 and ρ(ĜF̂^{-1}) < 1, respectively. For the iteration sequence (1.5), these conditions are not sufficient for convergence. It was shown in [21], through a numerical example, that even if ρ(F^{-1}G) < 1 and ρ(ĜF̂^{-1}) < 1 hold simultaneously, the method may not converge. In this paper we recall the sufficient conditions for the convergence of (1.5) that were established in [21] (see Theorem 2.10) and develop a new iterative scheme that also handles the cases in which convergence is not immediately ensured.
The splittings considered in the iterative methods described in [34] are the particular Jacobi and Gauss-Seidel splittings of the matrix B^T ⊗ A. The method we propose works with the original matrices A and B, whose sizes are much smaller when m and n are large, and it can be applied to any convergent splittings of these matrices. Thus, our method is broader with respect to the type of splitting that may be considered and enables faster performance when m and n are large.
In our numerical experiments we first consider Gauss-Seidel splittings of A and B and anticipate possible divergence, even when the conditions ρ(F^{-1}G) < 1 and ρ(ĜF̂^{-1}) < 1 both hold. These cases were not considered in [34] and, thus, with these numerical results we intend to complement the study in [34] rather than to provide a comparison of the different methods.
We present the details of the implementation of our method in Matlab, and in our numerical studies we also consider successive overrelaxation (SOR) splittings and preconditioned versions of our method. This paper is organized as follows. In the next section we recall some preliminary results used in the sequel. Then, in Section 3, we introduce induced splittings and discuss the convergence of the resulting induced splitting iteration. Some preconditioned variants of this iteration are also presented in Section 3. The first numerical experiments are presented in Section 4 to exhibit the effectiveness of our method. An application to surface fitting is described in Section 5, where more numerical results are shown. A brief conclusion is drawn in Section 6 and the acknowledgements are written in the last section.

2. Preliminaries.
In this section, we review some definitions, notation and known results needed in the remaining parts of this paper.
For A = (a_{ij}) ∈ R^{n×n}, let the matrix ⟨A⟩ = (α_{ij}) be defined by the following conditions: α_{ii} = |a_{ii}| and α_{ij} = −|a_{ij}| for i ≠ j. The matrix ⟨A⟩ is called the comparison matrix of A. The results about matrix splittings that we use in the sequel (in particular, the convergence of P-regular splittings of Hermitian positive definite matrices and of H-compatible splittings of H-matrices) can be found in [3,10,14,24,27]. We remark here that an M-matrix is certainly an H-matrix but the converse is not true, and that all strictly or irreducibly diagonally dominant matrices are H-matrices.
The following classical conclusion is often used in the analysis of iterative methods: given H ∈ R^{n×n}, the iteration x^{(k+1)} = Hx^{(k)} + c converges for every initial guess x^{(0)} if and only if ρ(H) < 1; in particular, if ρ(H) < 1 then I − H is nonsingular.

3. The induced splitting iteration. In this section we develop a new iterative method for solving the matrix equation (1.1) and analyze its convergence properties. A preconditioned variant of this method will also be described.
3.1. Iterative scheme and convergence results. We first introduce induced splittings of the matrices A and B.
Let the splittings A = F − G and B = F̂ − Ĝ be convergent. Defining

    H = F^{-1} G   and   Ĥ = Ĝ F̂^{-1},

we have ρ(H) < 1 and ρ(Ĥ) = ρ(F̂^{-1}Ĝ) < 1, since, for any two square matrices S and T, ST and TS have the same eigenvalues.
Using the above notation, iteration (1.5) may be written as

    X^{(k+1)} = H X^{(k)} + X^{(k)} Ĥ − H X^{(k)} Ĥ + F^{-1} C F̂^{-1}.       (3.1)

If this iteration sequence does not converge (that is, if conditions (2.1) and (2.2) in Theorem 2.10 are not satisfied simultaneously), then we proceed as follows.
We first estimate ρ = ρ(H) and ρ̂ = ρ(Ĥ) using the power iteration. If ρ < 1, then the sequence ρ^p converges to 0 as p → ∞ and, thus, for any arbitrarily small ε_1 > 0, there exists a positive integer N_1 such that ρ^p ≤ ε_1 when p ≥ N_1. Let p ≥ N_1 be fixed and let H̄ = H^p. Then ρ(H̄) = ρ^p < 1 and I − H̄ is invertible. By Lemma 2.9, this H̄ induces a unique splitting of A, that is,

    A = M − N,  with  M = A(I − H^p)^{-1}  and  N = M − A = A(I − H^p)^{-1} H^p.   (3.2)

Observe that, since A = F(I − H), this splitting of A yields

    M^{-1} = (I − H^p) A^{-1} = (I + H + ⋯ + H^{p−1}) F^{-1}.                 (3.3)

This formula for M^{-1} can also be directly derived from a two-stage iteration for AX = O, where O denotes the zero matrix, see [6,10,17,24]. Analogously, if ρ̂ < 1 and q is fixed with ρ̂^q < 1, then Ĥ^q induces a unique splitting B = M̂ − N̂ with

    M̂^{-1} = B^{-1} (I − Ĥ^q) = F̂^{-1} (I + Ĥ + ⋯ + Ĥ^{q−1}).                (3.4)
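A quick numerical check of formulas (3.2)-(3.4), with a Gauss-Seidel splitting of a small test matrix (a sketch; the data are ours):

    n = 6;  p = 4;
    A = diag(3*ones(n,1)) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);
    F = tril(A);  G = F - A;  H = F \ G;       % convergent Gauss-Seidel splitting
    S = eye(n);  T = eye(n);
    for i = 1:p-1, T = T*H; S = S + T; end     % S = I + H + ... + H^(p-1)
    Minv = S / F;                              % formula (3.3)
    norm(Minv - (eye(n) - H^p) / A, 'fro')     % agrees with (I - H^p) A^{-1}
    M = A / (eye(n) - H^p);  N = M - A;        % the induced splitting (3.2)
    norm(A - (M - N), 'fro')                   % A = M - N, ~ machine precision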
We refer to (3.2) as a p-degree induced splitting of A by H and, analogously, to the splitting of B associated with (3.4) as a q-degree induced splitting of B by Ĥ.
The resulting induced splitting iteration can be written in the direct form

    X^{(k+1)} = H^p X^{(k)} + X^{(k)} Ĥ^q − H^p X^{(k)} Ĥ^q + M^{-1} C M̂^{-1},   (3.5)

or in the equivalent residual-updating form

    X^{(k+1)} = X^{(k)} + M^{-1} (C − A X^{(k)} B) M̂^{-1}.                   (3.6)

Turning to a numerical implementation, we chose to develop the algorithm for the induced splitting iteration (3.6), since it is computationally more efficient. If we let

    A_1 = M^{-1} A,   B_1 = B M̂^{-1},   C_1 = M^{-1} C M̂^{-1},              (3.7)

iteration (3.6) becomes

    X^{(k+1)} = X^{(k)} + C_1 − A_1 X^{(k)} B_1,                              (3.8)

so that only two matrix-matrix multiplications are needed per iteration.
We are now in a position to state the arguments discussed above as a convergence theorem for iterations (3.6) (and (3.5)).
Theorem 3.1. Let A = F − G and B = F̂ − Ĝ be convergent splittings of A and B, respectively. Then there exist two positive integers N_1 and N_2 such that ρ^p and ρ̂^q satisfy (2.1) and (2.2) for all p ≥ N_1 and q ≥ N_2. In this case, the iteration (3.6) converges to the exact solution of (1.1), for any given initial guess X^{(0)}.
As a consequence of Theorem 3.1, we have the following result.

Corollary 3.2. Let A and B be Hermitian positive definite matrices or H-matrices, and let A = F − G and B = F̂ − Ĝ be the corresponding Gauss-Seidel splittings. Then there exist two positive integers N_1 and N_2 such that ρ^p and ρ̂^q satisfy (2.1) and (2.2) for all p ≥ N_1 and q ≥ N_2. In this case, the iteration (3.6) converges to the exact solution of (1.1), for any given initial guess X^{(0)}.
Proof. It suffices to show that ρ < 1 and ρ̂ < 1. If A and B are Hermitian positive definite and the splittings A = F − G and B = F̂ − Ĝ are the Gauss-Seidel splittings, these are P-regular splittings. Then, by Theorem 2.5 or Theorem 2.6, we have ρ < 1 and ρ̂ < 1. Similarly, if A and B are H-matrices and the splittings A = F − G and B = F̂ − Ĝ are the Gauss-Seidel splittings, these are H-compatible splittings. Then, by Lemma 2.8, we also have ρ < 1 and ρ̂ < 1.
We may now present our algorithm, which is based on iteration (3.8). Observe that in this scheme an estimate for the error is available at no extra cost, since the correction C_1 − A_1 X^{(k)} B_1 added at each step is computed explicitly.

Algorithm 1. Preparatory stage:
- determine the splittings A = F − G and B = F̂ − Ĝ (SOR splittings, for example) and then compute H = F^{-1}G and Ĥ = ĜF̂^{-1};
- obtain the estimates ρ = ρ(H) and ρ̂ = ρ(Ĥ) and search for two positive integers p and q such that ρ^p and ρ̂^q satisfy (2.1) and (2.2);
- once p and q are determined, compute M^{-1} and M̂^{-1} given in (3.3) and (3.4), respectively;
- compute A_1, B_1 and C_1 according to (3.7).
Iterative stage: starting from a given X^{(0)}, update X^{(k+1)} = X^{(k)} + C_1 − A_1 X^{(k)} B_1 until the stopping criterion is met.
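The following Matlab sketch assembles the whole of Algorithm 1 with Gauss-Seidel splittings. Function and variable names are ours; cond22 is a deliberate placeholder for the test of condition (2.2) of Theorem 2.10, which we do not reproduce here, so the stub below simply accepts the values obtained from the (2.1)-based bound.

    function X = inducedSplittingGS(A, B, C, tol, maxit)
    % Sketch of Algorithm 1 (induced splitting iteration, Gauss-Seidel case).
        F  = tril(A);  G  = F  - A;  H  = F \ G;    % A = F - G,  H  = F^{-1}G
        Fh = tril(B);  Gh = Fh - B;  Hh = Gh / Fh;  % B = Fh - Gh, Hh = Gh*Fh^{-1}
        rho = abs(eigs(H, 1));  rhoh = abs(eigs(Hh, 1));
        p = max(1, ceil(log(sqrt(3)-1)/log(rho)));  % smallest p: rho^p  <= sqrt(3)-1
        q = max(1, ceil(log(sqrt(3)-1)/log(rhoh))); % smallest q: rhoh^q <= sqrt(3)-1
        while ~cond22(rho^p, rhoh^q)                % increase p, q until (2.2) holds
            p = p + 1;  q = q + 1;
        end
        S = eye(size(A));  T = S;                   % S  = I + H + ... + H^(p-1)
        for i = 1:p-1, T = T*H;  S = S + T;  end
        Sh = eye(size(B));  Th = Sh;                % Sh = I + Hh + ... + Hh^(q-1)
        for j = 1:q-1, Th = Th*Hh;  Sh = Sh + Th;  end
        Minv = S / F;  Mhinv = Fh \ Sh;             % formulas (3.3) and (3.4)
        A1 = Minv*A;  B1 = B*Mhinv;  C1 = Minv*C*Mhinv;   % (3.7)
        X = zeros(size(C));
        for k = 1:maxit                             % iterative stage, (3.8)
            R = C1 - A1*X*B1;                       % two matrix multiplications
            X = X + R;
            if norm(R, 'fro') <= tol, break, end
        end
    end

    function ok = cond22(rp, rq) %#ok<INUSD>
    % Placeholder for condition (2.2) of Theorem 2.10 (not reproduced here).
        ok = true;
    end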
In Section 4 we describe some details of our Matlab implementation of Algorithm 1, in which we included the SOR splittings.
We remark here that if A and B are Hermitian positive definite or H-matrices, then a block version of Algorithm 1 with block Gauss-Seidel splittings can be easily developed, which is appealing for solving large matrix equations on parallel architectures, see [1].
3.1.1. Finding p and q. Once p and q satisfying (2.1) and (2.2) have been found, larger values can also be considered: the larger p and q are, the smaller the spectral radii ρ(H^p) and ρ(Ĥ^q) are and, thus, the smaller the number of iterations required for Algorithm 1 to converge for a given tolerance. However, increasing p and q also increases the computational work (the number of matrix-matrix multiplications) needed to obtain M^{-1} and M̂^{-1}, which is compensated by the reduction in the number of iterations only up to a certain point, namely when the orders n and m of the matrices A and B are not small.
Thus, it is important to consider the question: for given matrices A and B and convergent splittings of these matrices, are there optimal values for p and q? Certainly, the optimal values, if they exist, will depend on the specific structures and orders of the matrices A and B as well as on the properties of the splittings considered. A theoretical study of this question is beyond the scope of this paper. We tested different choices for p and q, and there is computational evidence that the procedure we describe next gives very good results.
Assuming ρ = ρ(H) < 1 and ρ̂ = ρ(Ĥ) < 1, we solve for p and q the inequalities

    ρ^p ≤ √3 − 1   and   ρ̂^q ≤ √3 − 1,                                       (3.9)

and take the smallest positive integers p_1 and q_1 that satisfy them. If condition (2.2) for ρ^{p_1} and ρ̂^{q_1}, which we refer to as condition (3.10), is not satisfied, we set p_1 and q_1 to p_1 + 1 and q_1 + 1, respectively, and test condition (3.10) again. We repeat this step until condition (3.10) is satisfied.
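For instance (an illustrative computation), if ρ = 0.9 and ρ̂ = 0.8, then, since √3 − 1 ≈ 0.732, the inequalities (3.9) give p_1 = ⌈ln(√3 − 1)/ln 0.9⌉ = 3 and q_1 = ⌈ln(√3 − 1)/ln 0.8⌉ = 2.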
Alternatively to this procedure, in the case ρ ≈ ρ̂ or ρ̂ < ρ, we could consider solving for p a single inequality of the type (3.9) and take the smallest positive integer solution p_2. Then we would let p = max{p_1, p_2} and q = max{q_1, p_2}. Condition (3.10) should then be verified (approximately).
Analogously, if ρ < ρ̂, we would find the smallest positive integer solution q_2 and let p = max{p_1, q_2} and q = max{q_1, q_2}.
The important observation here is that the difference between p and q comes from conditions (3.9) and not from condition (3.10).
In Section 4 we report some numerical experiments exhibiting the CPU time needed for convergence for different values of p and q in Example 4.2.
3.1.2. Successive overrelaxation splittings. In the successive overrelaxation (SOR) iterative method, we introduce a relaxation parameter ω, which often improves the rate of convergence. The value of this parameter should be chosen so that the rate of convergence is maximized, but the optimal value of ω is known only for some special classes of matrices. For positive definite matrices, it is known that 0 < ω < 2 is a necessary and sufficient condition for convergence.
In the preparatory stage of Algorithm 1, for a given value 0 < ω < 2, we consider the SOR splittings

    A = F_ω − G_ω,  with  F_ω = ω^{-1} D_A − L_A  and  G_ω = (ω^{-1} − 1) D_A + U_A,

where A = D_A − L_A − U_A is the usual decomposition of A into its diagonal, strictly lower triangular and strictly upper triangular parts (and analogously for B). For ω = 1, the method obviously reduces to the Gauss-Seidel method.
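In Matlab, the preparatory stage can set up these splittings as in the following sketch (the matrix A here is an arbitrary small test matrix of ours):

    w = 1.7;                              % relaxation parameter, 0 < w < 2
    A = diag(4*ones(8,1)) - diag(ones(7,1),1) - diag(ones(7,1),-1);
    D = diag(diag(A));
    L = -tril(A, -1);                     % A = D - L - U
    U = -triu(A,  1);
    Fw = D/w - L;                         % SOR splitting: A = Fw - Gw
    Gw = (1/w - 1)*D + U;
    norm((Fw - Gw) - A, 'fro')            % sanity check: ~ 0
    H = Fw \ Gw;                          % iteration matrix of the splitting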

3.2. Preconditioned variants.
Here we present a preconditioned variant of the iterative method described above.
We assume that A or B in (1.1) is an M-matrix and consider preconditioned versions of the iteration (3.6), obtained by using some of the recently developed preconditioners devised to improve the convergence rate of classic iterative methods for linear systems whose coefficient matrices are M-matrices, see for example [16,18,25,26,34] and references therein. We denote the preconditioners by P_{1,µ} and P_{2,µ}, defined in (3.11) and (3.12), and the corresponding preconditioned variants of the iteration by (3.13)-(3.15).
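As an illustration only (the precise definitions of P_{1,µ} and P_{2,µ} are those of (3.11) and (3.12) and the cited references), the following sketch builds a first-column preconditioner of Milaszewicz type, which is consistent with the remark in Section 4 that P_{1,µ} is close to the identity when the first column of the matrix is almost null:

    % Hypothetical illustration: a first-column (Milaszewicz-type)
    % preconditioner P = I + S, with S(i,1) = -mu(i)*A(i,1)/A(1,1), i >= 2;
    % the actual P_{1,mu} and P_{2,mu} are defined in (3.11)-(3.12).
    A = diag(4*ones(6,1)) - diag(ones(5,1),1) - diag(ones(5,1),-1);
    n  = size(A, 1);
    mu = 0.5*ones(n, 1);                  % parameters mu_i in [0, 1]
    S  = zeros(n);
    S(2:n, 1) = -mu(2:n) .* A(2:n, 1) / A(1, 1);
    P  = eye(n) + S;
    PA = P * A;                           % precondition, then split PA as before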
4. Numerical experiments. We now describe some details of our Matlab implementation of Algorithm 1.
• The estimates for ρ = ρ(H) and ρ̂ = ρ(Ĥ) are computed using the built-in function eigs called in the form ρ = abs(eigs(H,1)) and ρ̂ = abs(eigs(Ĥ,1)), which returns the eigenvalue of largest magnitude of H and Ĥ, respectively (we then take its absolute value).
• Assuming that ρ < 1 and ρ̂ < 1, in order for condition (2.1) to be satisfied, we set the values of p and q to be the smallest positive integers such that p > max{1, ln(√3 − 1)/ln ρ} and q > max{1, ln(√3 − 1)/ln ρ̂}. If condition (2.2) for ρ^p and ρ̂^q is not also satisfied for these first values of p and q, we increase p and q by one, in turn, until it is satisfied (using a while loop).
• Once p is determined, to compute M^{-1} we evaluate (3.3) in a nested (Horner-like) form, which is computationally more efficient. The inverse F^{-1} is also obtained using the Matlab left matrix divide command F\I. We proceed similarly for the computation of M̂^{-1}.
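For instance, the sum I + H + ⋯ + H^{p−1} in (3.3) can be accumulated in a Horner-like fashion, as in the sketch below (test data ours), using one multiplication per step and a final right matrix divide instead of an explicit inverse:

    A = diag(4*ones(8,1)) - diag(ones(7,1),1) - diag(ones(7,1),-1);
    F = tril(A);  H = F \ (F - A);  p = 3;
    I = eye(size(H));  S = I;
    for i = 1:p-1
        S = I + H*S;                      % invariant: S = I + H + ... + H^i
    end
    Minv = S / F;                         % M^{-1} of (3.3), no explicit inverse
    norm(Minv - (I + H + H^2)/F, 'fro')   % check against the naive formula (p = 3)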
Example 4.1. We consider the matrix equation (1.1) where, for a given value c, the matrices A and B arise from finite difference discretizations and C is a random n² × m² matrix. Matrix A is associated with the centered difference scheme for a two-dimensional Poisson equation, and matrix B is associated with a two-dimensional convection-diffusion equation in which the centered and backward difference schemes are used for the second and first order derivatives, respectively, see [8,32].
In Table 4.1, for the cases c = 0.5, c = 0 and c = −0.5 and for various values of m = n, we show the performance of Algorithm 1 with the Gauss-Seidel splittings: the values of p and q required for convergence, the spectral radii ρ(H^p) and ρ(Ĥ^q), the number of iterations, the total CPU time and the CPU time used in the preparatory stage (in seconds). The required accuracy was 10^{-8}. The results in Table 4.1 show that a significant part of the computation time is spent in the preparatory stage: in some cases more than 50% and in most cases between 25% and 50%. As expected, for all values of the parameter c, the larger m = n is, the closer the spectral radii ρ(H) and ρ(Ĥ) are to 1 and, thus, the larger p and q must be for the convergence conditions to be satisfied.
We also analysed the performance of Algorithm 1 using SOR splittings with ω = 1.7 for the problem in Example 4.1. We report our results in Table 4.2 for c = 0.5 and c = 0 and for the values of m = n considered in Table 4.1. The required accuracy was 10^{-8}.
We can observe that the SOR splittings with ω = 1.7 produced a significant reduction in the values of p and q, which means that the initial spectral radii ρ(H) and ρ(Ĥ) were much smaller than the ones given by the Gauss-Seidel splittings. The total number of iterations increased by a factor of about 2.5 in the case c = 0.5, but the total CPU time, when m and n are large, can be considered comparable. For the case c = 0, the number of iterations increased by a factor of about 3, but the total CPU time is even shorter when m = n = 50 and comparable for the other orders.
For this example we can conclude that when we use the Gauss-Seidel splittings, p and q increase and thus more time is needed in the preparatory stage, but the number of iterations is reduced when compared to the SOR splittings. This is expected, since the larger p and q are, the more advanced the convergence process is when the iterative stage starts. Overall, we can say that both splittings are equally good, since the CPU times are comparable. The following example shows that taking values of p and q greater than the minimum values needed for the convergence conditions (2.1) and (2.2) to be satisfied reduces the number of iterations but does not necessarily reduce the total CPU time.
Example 4.2. With this example we wanted to observe how the computational efficiency of Algorithm 1, with Gauss-Seidel splitting, varies with increasing values of p and q for the problem described in Example 4.1.
In Table 4.3 we report the CPU time needed for convergence for different values of p and q. From these results it is not obvious that optimal values for p and q exist. Another observation is that the differences in CPU time are not very significant and the results for the minimum values of p and q are already very good, so these seem to be a good choice right away.
Next we compare the performance of Algorithm 1 with that of its preconditioned variants described in Section 3.2 for the problem in Example 4.1. Preconditioner P_{1,µ} in (3.11) produced no improvements and for this reason we do not show the corresponding results. This was expected, since the matrices A and B in Example 4.1 have a first column with almost all entries equal to zero, so P_{1,µ} is close to the identity matrix when µ_i ∈ [0, 1]. We therefore only consider the preconditioner P_{2,µ} in (3.12). See Table 4.4 for the results concerning variant (3.15). The required accuracy was 10^{-8} and the Gauss-Seidel splittings were considered. For all cases in Table 4.4 we show the results for the choice µ_i = 0.5, both in P_{2,µ}A and P_{2,µ}B. We observe that this choice always produces a significant reduction of the values of p and q and, although the number of iterations does not change much, the CPU time is similar and sometimes better when compared to the results without preconditioning in Table 4.1. For instance, for c = 0 and m = n = 40, where p and q were reduced from 150 to 50, the CPU time decreased by 25%; for c = 0 and m = n = 50, where p and q were reduced from 233 and 232 to 78 and 77, respectively, the CPU time was shortened by approximately 55%; and for c = 0.5 and m = n = 50, p and q diminished from 119 and 43 to 37 and 11, respectively, and the CPU time dropped by 30%. These results, once more, reveal that a significant part of the computation in Algorithm 1 is related to the preparatory stage.
In Table 4.4 we also show other choices for µ_i: the choices that lead to the smallest number of iterations and to the shortest time, which coincide in all cases (except for c = 0.5 with n = m = 20 and n = m = 50, where the differences are very small). Choices of µ_i that produce results similar to the best ones are also presented. For example, for the case c = 0 and m = n = 40, the values µ_{i,A} = µ_{i,B} = 0.8 and µ_{i,A} = 0.7, µ_{i,B} = 0.8 both lead to a reduction of more than 50% in the CPU time.
The most significant difference in efficiency occurred for c = 0 and m = n = 50, with the CPU time being reduced to a quarter of the original time. In fact, the larger n = m is, the more gain we see from preconditioning.
We decided not to show results for the other preconditioned variants (3.13) and (3.14). We performed several experiments and, although in most cases there was an improvement in efficiency, it was never as good as with variant (3.15).
5. Application to surface-fitting. In this section we consider applications of Algorithm 1 to surface fitting. Let us first recall the details of the conventional iterative method for tensor product surface fitting. Let (φ_0, ..., φ_n) and (ψ_0, ..., ψ_m) be two normalized totally positive bases and let {P_{ij}}, i = 0, 1, ..., n, j = 0, 1, ..., m, be a given sequence of control points. A parametric tensor product surface can be defined as

    S(t, s) = Σ_{i=0}^{n} Σ_{j=0}^{m} P_{ij} φ_i(t) ψ_j(s).

Let the points P_{ij} be parameterized by the real increasing sequences t_0 < t_1 < ... < t_n and s_0 < s_1 < ... < s_m, where the parameter pair (t_i, s_j) is assigned to the point P_{ij}, for i = 0, 1, ..., n and j = 0, 1, ..., m. Then an initial tensor product surface can be constructed as

    S^{(0)}(t, s) = Σ_{i=0}^{n} Σ_{j=0}^{m} P^{(0)}_{ij} φ_i(t) ψ_j(s),
where P^{(0)}_{ij} = P_{ij}, for i = 0, 1, ..., n and j = 0, 1, ..., m. The (k + 1)-st tensor product surface S^{(k+1)}(t, s), for k ≥ 0, is then obtained as follows:

    Δ^{(k)}_{ij} = P_{ij} − S^{(k)}(t_i, s_j),
    P^{(k+1)}_{ij} = P^{(k)}_{ij} + Δ^{(k)}_{ij},                             (5.1)
    S^{(k+1)}(t, s) = Σ_{i=0}^{n} Σ_{j=0}^{m} P^{(k+1)}_{ij} φ_i(t) ψ_j(s).

The iteration (5.1) is the conventional iterative method for tensor product surface fitting and converges for all normalized totally positive bases. Therefore, the iteration (5.1) is referred to as the progressive iterative approximation, briefly denoted by PIA, in CAGD, see [19,20] and references therein. The iteration (5.1), written for each coordinate of the control points in vectorized form (the iteration sequence (5.3)), is mathematically equivalent to the Richardson iteration (see [4,5]) for solving the linear systems

    (B ⊗ A)x = P_x,   (B ⊗ A)y = P_y   and   (B ⊗ A)z = P_z.                  (5.4)

From the viewpoint of numerical linear algebra, it is preferable to solve

    AXB^T = C_1,   AY B^T = C_2   and   AZB^T = C_3,                          (5.5)

where vec(X) = x, vec(Y) = y, vec(Z) = z, vec(C_1) = P_x, vec(C_2) = P_y and vec(C_3) = P_z.
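In matrix terms, one PIA step adjusts the whole control net at once. A minimal Matlab sketch for one coordinate of the data follows; the collocation matrices below use the uniform interior cubic B-spline stencil (1, 4, 1)/6 purely as an illustration, so the boundary rows and all names are our own simplification:

    n = 20; m = 15;
    e = ones(n+1,1);  A = full(spdiags([e 4*e e]/6, -1:1, n+1, n+1));
    f = ones(m+1,1);  B = full(spdiags([f 4*f f]/6, -1:1, m+1, m+1));
    D = rand(n+1, m+1);                  % one coordinate of the data points
    P = D;                               % initial control net, P^(0) = data
    for k = 1:1000
        Delta = D - A*P*B';              % fitting errors at the (t_i, s_j)
        P = P + Delta;                   % PIA adjustment, cf. (5.1)
        if norm(Delta, 'fro') < 1e-3, break, end
    end
    norm(A*P*B' - D, 'fro')              % PIA solves A P B^T = D, cf. (5.5)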
Example 5.1. In this example we compare Algorithm 1, using Gauss-Seidel splittings, with PIA for the tensor product surface-fitting problem described in this section. Here we consider the cubic B-spline basis, whose collocation matrices are H-matrices and depend on a parameter λ, with −2 ≤ λ ≤ 1. We solve the three matrix equations in (5.5) for λ = −1 and for several values of n. In all cases, we use the minimum values for p and q, p = 2 and q = 1, and require an accuracy of 10^{-3}. The results are shown in Table 5.1. They clearly show that Algorithm 1 converges significantly faster than PIA. In Figure 5.1 we show the interpolating surface for the case λ = −1 and n = 20.

6. Conclusions. In this paper we introduced an algorithm to solve the matrix equation AXB = C, based on a p-degree induced splitting of A and a q-degree induced splitting of B. We discussed possible choices for the positive integers p and q so that not only are the conditions for convergence satisfied but the convergence is also fast. We implemented our method in Matlab, paying particular attention to computational costs, and preliminary numerical results show the robustness of our method. In our numerical experiments we also considered preconditioned variants for the case when A and/or B are M-matrices. The value in the interval [0, 1] of the parameter defining the preconditioner, even when simply set to 0.5, produced very satisfactory improvements. Furthermore, we used Algorithm 1 to solve tensor product surface-fitting problems with the cubic B-spline basis. For our examples, the numerical tests show that our method is significantly faster than PIA.