Investigation of the Hardware Architectures for Matrix Inversion
Abstract
Matrix inversion is a widely used operation in science and engineering fields. However, the hardware implementation of matrix inversion is a challenging problem. Matrix inversion is a computationally complex operation, especially with growing matrix size. The number of the required operations is proportional to N3, where N × N is the matrix size. Regarding real-time matrix inversion applications, efficient hardware solutions need to be scaled with the matrix size. Configurable hardware, such as Field Programmable Gate Arrays (FPGAs), are promising in high-performance computing. In this thesis, five hardware architectures for computing matrix inversion using the Gauss-Jordan elimination algorithm have been investigated.
With the increasing size of matrices, the matrix inversion complexity in hardware becomes prohibitive for real-time applications. Therefore, the adaptability of the hardware design for providing parallelism with the growing size of the matrix in real-time is highly demanded, particularly in applications where a high speed implementation is required. In this project, a run-time customizable architecture using the Gauss-Jordan elimination method is proposed, which allows the same hardware to be used for different matrix sizes with high parallel capability. In addition, this customizable hardware architecture reduces the hardware resources compared with the existing implementations. Furthermore, a run-time customizable architecture using sequential processing for matrix inversion is presented, which can be an alternative to the customizable hardware architecture using parallel processing for applications, where an extremely low hardware resource usage is required.
This thesis also proposes two other hardware architectures for matrix inversion with single-precision and double-precision floating-point data. These architectures use a modified version of the Gauss-Jordan elimination algorithm that accelerates the processing of matrix inversion. Unlike regular execution that starts from the first matrix column and ends at the last matrix column in each iteration, with the modified algorithm, the normalization and elimination steps of the column that follows the pivot column are performed first. This speeds up the process of the matrix inversion and hence achieves high performance, especially for large matrix sizes. One of the two architectures is purely designed with registers, whereas memory blocks are required in the other hardware architecture. The comparison results show that the modified algorithm has considerably improved the matrix inversion performance, in terms of the operating frequency, latency and hardware resources.
The last proposed hardware architecture for matrix inversion in this thesis is highly customizable and can be parameterized to accommodate different floating-point precisions and latency requirements. The desirable matrix size is customizable during real-time, while the desirable data precision, the desirable latency of the computation units and the desirable matrix range are customizable during the design time. In addition, the proposed hardware architecture achieves low computation time and effective resource utilization. The usage of hardware resources is optimized by re-using the available parallel computation unit for different calculations. Multiple multiplication units are used in a parallel manner to compute data from the pivot column and the same units are re-used to perform the normalization and elimination processes. This allows minimizing the sequential execution time imposed by the division unit and hence achieves high performance compared to existing architectures.
Description
Keywords
Hardware Architectures, Matrix Inversion