Article

FPGA-Based Acceleration of Polar-Format Algorithm for Video Synthetic-Aperture Radar Imaging

Dongmin Jeong, Myeongjin Lee, Wookyung Lee and Yunho Jung

1 Department of Smart Air Mobility, Korea Aerospace University, Goyang-si 10540, Republic of Korea
2 School of Electronics and Information Engineering, Korea Aerospace University, Goyang-si 10540, Republic of Korea
* Author to whom correspondence should be addressed.
Submission received: 28 May 2024 / Revised: 14 June 2024 / Accepted: 18 June 2024 / Published: 19 June 2024
(This article belongs to the Special Issue System-on-Chip (SoC) and Field-Programmable Gate Array (FPGA) Design)

Abstract

This paper presents a polar-format algorithm (PFA)-based synthetic-aperture radar (SAR) processor that can be mounted on a small drone to support video SAR (ViSAR) imaging. For drone mounting, the processor must be miniaturized, consume little power, and operate at high speed. To meet these requirements, the processor was designed for and implemented on a field-programmable gate array (FPGA). The proposed PFA-based SAR processor consists of an interpolation unit and a fast Fourier transform (FFT) unit. The interpolation unit uses linear interpolation, which is fast and occupies little area, and its memory transfers are minimized through operations optimized around the SAR system parameters. The FFT unit uses a base-4 systolic array architecture, chosen from among various fast parallel structures to maximize processing speed. Each unit is designed as a reusable block (IP core) to support reconfigurability and is interconnected using the advanced extensible interface (AXI) bus. The proposed PFA-based SAR processor was designed in Verilog-HDL and implemented on a Xilinx UltraScale+ MPSoC FPGA platform. It generates a 2048 × 2048 pixel image in 0.766 s, which is 44.862 times faster than an ARM Cortex-A53 microprocessor. The speed-to-area ratio, normalized by resource usage, shows that it achieves higher speed at lower power consumption than previous studies.

1. Introduction

Synthetic-aperture radar (SAR) is used in military and civilian applications to observe land and ocean surfaces by transmitting and receiving radio waves. The radar is mounted on an aircraft or satellite, and the change in antenna position as the platform moves is used to mathematically synthesize a small physical aperture into a much larger one, yielding high azimuth resolution. This allows high-resolution images to be generated, and, because radio waves are used, images can be acquired day or night and in foggy, cloudy, or misty conditions. SAR's role has therefore been growing in recent years, and it has become an important and active field of radar research [1,2,3].
Used for the important task of Earth observation, SAR is unaffected by environmental conditions and can be used for emergency response to disasters, including earthquakes, volcanoes, floods, landslides, and coastal inundation, as well as for military surveillance and reconnaissance [4,5,6,7]. Crucially, to analyze and identify targets in these images for rapid response, SAR images must be acquired in real time. More recently, researchers have been working on creating videos from multiple images generated in real time. Video SAR (ViSAR) observes dynamic targets by surveilling the target area and generating images at a high frame rate [8,9,10,11,12].
In practice, however, a challenge in ViSAR imaging is the enormous amount of signal processing required to produce the final images from large amounts of raw echo data. SAR images are generated from multiple pulses in the azimuth direction, and the more pulses used, the better the image resolution. This limits the ability to generate video, because the platform must collect raw data for a long time while the frame rate is fixed by the pulse repetition frequency (PRF) and the number of pulses. To solve this problem, pulses are overlapped between consecutive images, and the frame rate can be increased by widening the overlapping sections. However, a high overlap rate requires sufficiently fast image generation.
Due to these issues, ViSAR systems have typically been deployed on satellites or large aircraft because they require high power and large processing units. However, recent improvements in hardware accelerator performance, coupled with the rapid development of the small drone industry, have encouraged research into SAR-mounted small drones [13]. SAR systems mounted on drones have the advantage of being able to navigate terrain that is inaccessible to people, making them very useful for surveillance and reconnaissance applications. However, drone-mounted ViSAR systems are highly constrained by weight, space, and power, making it critical to choose the right imaging algorithm and acceleration platform.
Since the room for improving efficiency from a signal-processing perspective is limited, approaches to accelerating SAR imaging rely on a graphics processing unit (GPU) or a field-programmable gate array (FPGA) [14,15,16,17,18,19,20,21,22]. GPUs are built for high-performance computation, offering high parallelism, multi-threading, and large memory bandwidth across many cores. Existing research has shown that GPU-based methods can process large amounts of raw data and improve the efficiency of SAR imaging by an order of magnitude [14,18,19]. However, for use on drones, they are disadvantaged by their very high power consumption, size, and weight. Embedded GPUs must therefore be used, which still suffer from high power consumption.
Compared to general-purpose CPUs and GPUs, FPGAs offer significant advantages in throughput, latency, and energy efficiency for compute-intensive applications. Over the years, the rapid evolution of FPGAs with high processing power has led to their widespread adoption in SoCs with on-chip CPUs, domain-specific programmable accelerators, and multiple connectivity options [23,24]. Compared to GPUs, FPGAs consume much less power, provide high throughput with abundant on-chip memory and computational resources, and, in a multichannel configuration, enable GPU-like parallelism, allowing SAR signal-processing algorithms to run at very high speeds [25,26]. Unlike general-purpose GPUs, FPGAs can be easily optimized for specific applications and can thus outperform GPUs where specific performance characteristics are required [16]. Their low power consumption, small size, and high computing power make FPGAs well suited to accelerating ViSAR systems on drones.
Algorithm selection is also an important decision for ViSAR systems, and the choice depends on the operating mode of the SAR system. SAR systems operate in different modes, including spotlight mode, strip-map mode, scan mode, and inverse SAR (ISAR). In spotlight mode, the antenna is continuously pointed at the target area while collecting data, allowing the detection of individual objects or the collection of data on a specific area. This mode is ideal for applications that require high-resolution images generated from long data-collection intervals or from small chunks of data over multiple scenes. Therefore, for applications involving drone-based surveillance of a specific area, it is often preferable to operate in spotlight mode.
There are several representative algorithms for high-resolution SAR image generation, such as the polar-format algorithm (PFA), the back-projection algorithm (BPA), and the range-Doppler algorithm (RDA) [27,28,29,30,31]. Imaging algorithms for spotlight mode mainly use the BPA and the PFA [32,33,34,35,36]. The BPA, ideally, produces high-resolution images in the time domain with no degradation due to motion. It has the best image quality, but it requires $O(N^3)$ operations to form an $N \times N$ image, which extends the image generation time. Conversely, the PFA is computationally efficient, requiring only $O(N^2 \log_2 N)$ operations to process an $N \times N$ pixel SAR image [37]. In addition, the PFA has the advantage of achieving fine resolution in spotlight mode because it properly compensates for the constantly changing reflectivity of the target at much larger scene sizes.
The PFA converts the phase data, collected in a polar coordinate system centered on the observed scene, to a Cartesian coordinate system by performing range and azimuth interpolation, and it then employs the fast Fourier transform (FFT) to generate an image. Therefore, to accelerate the PFA, both the interpolation operation and the FFT operation must be accelerated. FFT hardware architectures are categorized as single butterfly structures [38], pipeline structures [39], or systolic array structures [40]. Although the single butterfly and pipeline structures can be implemented in a small area, ViSAR systems require high computational speed, and the fastest and most suitable option is the parallel structure based on systolic arrays [41,42]. Among the various systolic array structures, the base-4 systolic array is easy to implement, scalable, and best satisfies the trade-off between area and execution time [43,44,45,46,47,48]; it was therefore adopted. In addition, among the various interpolation algorithms, such as linear, sinc, and spline interpolation, linear interpolation requires the smallest hardware area and runs fastest when implemented, so it was adopted as the most advantageous for drone ViSAR applications. Each accelerator integrated on an FPGA can be fine-tuned for the PFA and can yield a significant acceleration benefit.
This study proposes a PFA-based SAR processor for small-drone ViSAR imaging. To accelerate PFA-based ViSAR imaging, an FFT unit based on the systolic array structure was implemented, and an interpolation unit optimized for the PFA was designed. Each unit is designed as a reusable block (IP core) to support reconfigurability, and the units are interconnected using the advanced extensible interface (AXI) bus. In VLSI implementations, SRAM cells use multiple transistors, making SRAM costlier, less dense, and more power-hungry than capacitor-based DRAM. To mitigate these drawbacks, the proposed design minimizes the use of SRAM by transferring data to and from DRAM over an AXI4 bus-based interface. This achieves fast acceleration while reducing cost, footprint, and power usage, which benefits ViSAR applications on drones.
The remainder of this paper is organized as follows. Section 2 describes the ViSAR image frame analysis and the PFA image-generation algorithm; Section 3 describes the hardware architecture of the proposed PFA-based SAR processor; Section 4 presents the implementation results of the proposed design and the acceleration experimental results for different sizes of data and compares them with previous works. Section 5 concludes this paper.

2. Background

2.1. Geometry and Frame Rate of Video SAR

Figure 1 illustrates the geometry used for image generation in a spotlight-mode SAR system: $\theta$ represents the azimuth angle between pulses, $\phi$ denotes the elevation angle, and $R_a$ is the slant range from the scene center $O$ to the antenna phase center (APC). In spotlight mode, the aircraft flies in a circle around the center $O$ of the ground area to be observed. Radar sensors attached to the side of the aircraft continuously point at the scene center, so the data are acquired on a polar coordinate grid. In this case, $\theta$ can be expressed as
$$\theta = \frac{V}{R_a \cos\phi \cdot f_{\mathrm{PRF}}} \tag{1}$$
where $f_{\mathrm{PRF}}$ denotes the pulse repetition frequency and $V$ represents the velocity of the platform. To create a SAR image, the radar sensor moves in the azimuth direction, receiving multiple azimuth pulses that are used to generate the image. The synthetic aperture angle can be expressed as
$$\theta_{az} = N_p \cdot \theta \tag{2}$$
where $N_p$ represents the number of pulses obtained as the aircraft moves.
Resolution is a critical parameter in SAR imaging. The range resolution $\rho_r$ and the azimuth resolution $\rho_a$ are defined as shown in Equation (3):
$$\rho_r = \frac{c}{2B\cos\phi}, \qquad \rho_a = \frac{\lambda}{2\theta_{az}\cos\phi} \tag{3}$$
where $c$ represents the speed of light, $\lambda$ denotes the wavelength, and $B$ signifies the bandwidth. Therefore, increasing the bandwidth improves the range resolution, while enhancing the azimuth resolution requires many pulses in the azimuth direction to increase $\theta_{az}$. Consequently, the aircraft must collect a large number of pulses, and the frame rate $r$ is defined as [49,50,51]
$$r = \frac{f_{\mathrm{PRF}}}{N_p} \tag{4}$$
High frame rates must be achieved to generate video SAR images, but according to Equation (4) the achievable rate is limited. An overlapping method is utilized to solve this issue. This approach is illustrated in Figure 2, and the frame rate becomes
$$r = \frac{f_{\mathrm{PRF}}}{N_p - N_a} \tag{5}$$
where $N_a$ is the number of overlapping pulses. A higher $N_a$ enables higher frame rates, but to increase the number of overlapping pulses, the data obtained after the overlap interval must be imaged immediately, which requires sufficiently fast image generation.
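As a quick numerical check, the short Python sketch below evaluates Equations (4) and (5) using the operating point reported later in Section 4 (a PRF of 250 Hz, 352 pulses, and an achievable image rate of 21.74 Hz); the script itself is purely illustrative.

```python
# Frame-rate check for Equations (4) and (5), using the operating point
# quoted in Section 4 (f_PRF = 250 Hz, Np = 352, r = 21.74 Hz).
f_prf = 250.0                        # pulse repetition frequency (Hz)
Np = 352                             # pulses per synthetic aperture

r_no_overlap = f_prf / Np            # Eq. (4): ~0.71 Hz without overlap
r_target = 21.74                     # frame rate achievable by the processor
Na = Np - f_prf / r_target           # Eq. (5) solved for the overlap count
print(f"non-overlapped rate: {r_no_overlap:.2f} Hz")
print(f"required overlap: {Na:.1f} pulses ({100 * Na / Np:.1f}%)")  # ~96.7%
```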

2.2. Polar-Format Algorithm

Utilized in spotlight mode, the PFA is valued for its computational efficiency, requiring only $O(N^2 \log_2 N)$ operations. This contrasts with the $O(N^3)$ complexity of the BPA and the $O(N^4)$ complexity of matched-filtering algorithms for spotlight image formation. The PFA assumes a two-dimensional scattering model with a flat scene, even in three-dimensional SAR systems, and image generation involves an IFFT of the received phase history. However, since the acquired data are structured on a polar grid, a two-dimensional interpolation is needed to convert them onto a rectangular grid. Ideally, this would be a true two-dimensional interpolation; in practice, it is realized as a one-dimensional interpolation in the range direction followed by another in the azimuth direction, avoiding the computational burden of two-dimensional interpolation. Afterward, the data interpolated onto the rectangular grid can be used to generate an image by performing FFTs in the range and azimuth directions.
The transmitted linear frequency modulation (LFM) signal in the SAR system shown in Figure 3 is defined as
$$s_t(n,\tau) = e^{\,j 2\pi f_c \tau + j\pi\gamma\tau^2} \tag{6}$$
where $n$ is the pulse number, $\tau$ represents the fast time, $f_c$ denotes the center frequency, and $\gamma$ signifies the chirp rate. When this signal is reflected by the target, the received signal is defined as
$$s_r(n,\tau) = \sigma(x_t, y_t, z_t) \cdot e^{\,j 2\pi f_c \left( \tau - \frac{2R_t}{c} \right)} \cdot e^{\,j\pi\gamma \left( \tau - \frac{2R_t}{c} \right)^2} \tag{7}$$
where $\sigma(x_t, y_t, z_t)$ represents the target reflectivity at position $(x_t, y_t, z_t)$ and $R_t$ is the distance between the SAR sensor and the target. The objective of SAR image generation is to obtain the reflectivity $\sigma$ by mathematically converting or approximating the received signal. The received signal must first be motion-compensated with respect to the scene center, and the required reference signal is expressed as
$$s_{\mathrm{ref}}(n,\tau) = e^{\,j 2\pi f_c \left( \tau - \frac{2R_a}{c} \right)} \cdot e^{\,j\pi\gamma \left( \tau - \frac{2R_a}{c} \right)^2} \tag{8}$$
where $R_a$ represents the distance between the SAR sensor and the scene center, which becomes the reference signal needed for compensation. Then, the complex conjugate of Equation (8) is multiplied by Equation (7) to perform motion compensation:
$$s(n,\tau) = s_r \cdot s_{\mathrm{ref}}^{*} \tag{9}$$
Equation (9) can be expressed as
$$s(n,\tau) = \sigma(x_t, y_t, z_t)\, e^{-j \frac{4\pi}{c}\left( f_c + \gamma\left( \tau - \frac{2R_a}{c} \right) \right)(R_t - R_a) \,+\, j \frac{4\pi\gamma}{c^2}(R_t - R_a)^2} \tag{10}$$
where the last term in the exponent of Equation (10) represents the residual video phase (RVP) error, which should generally be removed. This is the range-deskew process: the signal is transformed to the fast-time frequency domain, where the instantaneous frequency maps the residual phase onto $f_\tau$, and multiplied by the following compensation filter:
$$S_{\mathrm{com}}(f_\tau) = e^{-j\pi \frac{f_\tau^2}{\gamma}} \tag{11}$$
Multiplying Equation (11) by the Fourier transform of Equation (10) and then applying the inverse Fourier transform yields the following signal with the RVP removed:
$$s(n,\tau) = \sigma(x_t, y_t, z_t) \cdot e^{-j \frac{4\pi}{c}\left( f_c + \gamma\left( \tau - \frac{2R_a}{c} \right) \right)(R_t - R_a)} \tag{12}$$
The Taylor approximation is applied to $R_t$ in Equation (12) around $(x, y, z) = (0, 0, 0)$. In this case, $R_t - R_a$ is approximated as
$$R_t - R_a \approx \sin\phi_a \left( x_t \cos\theta_a + y_t \sin\theta_a + z_t \cot\phi_a \right) \tag{13}$$
where $\phi_a$ is the incidence angle and $\theta_a$ is the squint angle. Since a flat two-dimensional scene is assumed, the term along the $z$-axis can be ignored. Defining $K_R$, $K_u$, and $K_v$ as the spatial, range, and azimuth wavenumbers, respectively, they can be expressed as follows:
$$K_R = \frac{4\pi}{c}\left( f_c + \gamma\left( \tau - \frac{2R_a}{c} \right) \right) \tag{14}$$
$$K_u = K_R \sin\phi_a \cos\theta_a \tag{15}$$
$$K_v = K_R \sin\phi_a \sin\theta_a \tag{16}$$
Substituting Equations (14)-(16) into Equations (12) and (13), the signal can be rewritten as
$$s(n, K_R) = \sigma(x_t, y_t)\, e^{-j\left( x_t K_u + y_t K_v \right)} \tag{17}$$
where $s(n, K_R)$ is uniformly distributed in the $(n, K_R)$ domain but not in $(K_u, K_v)$. To utilize an efficient two-dimensional fast Fourier transform (2D-FFT), 2D resampling must be performed on the wavenumber-domain signal to distribute it uniformly in a rectangular format. Ideally, this is achieved through two-dimensional wavenumber-domain interpolation, but it is much more computationally efficient to realize it as a one-dimensional interpolation in the range direction followed by another in the azimuth direction. After the complete 2D resampling of the signal, the PFA image can be obtained by applying the 2D-FFT once to the wavenumber-domain signal distributed in a rectangular format, expressed by the following equation:
$$I_p(x,y) = \iint S(K_{ui}, K_{vi}) \cdot e^{\,j\left( x K_{ui} + y K_{vi} \right)}\, dK_{ui}\, dK_{vi} \tag{18}$$
where $K_{ui}$ and $K_{vi}$ are the values obtained by linear interpolation of $K_u$ and $K_v$ onto a uniform rectangular grid. Figure 4 shows a block diagram of the image generation process after motion compensation and de-chirping.
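For reference, the chain in Figure 4 can be modeled behaviorally in a few lines of NumPy, as sketched below. This is only a software sketch under simplifying assumptions (the wavenumber grids Ku, Kui, Kv, and Kvi are assumed to be precomputed from the collection geometry, and the azimuth coordinate is treated as a single value per pulse); it is not the hardware datapath described in Section 3.

```python
import numpy as np

def lerp_complex(xq, x, y):
    # np.interp handles only real data, so interpolate Re and Im separately
    return np.interp(xq, x, y.real) + 1j * np.interp(xq, x, y.imag)

def pfa_image(ph, Ku, Kui, Kv, Kvi):
    """ph:  (Np x Ns) complex phase history on the polar grid
    Ku:  (Np x Ns) range wavenumber of each sample; Kui: uniform range grid
    Kv:  (Np,) azimuth wavenumber per pulse;        Kvi: uniform azimuth grid"""
    # one-dimensional linear interpolation in the range direction, per pulse
    rng = np.stack([lerp_complex(Kui, Ku[a], ph[a])
                    for a in range(ph.shape[0])])
    # one-dimensional linear interpolation in the azimuth direction, per bin
    az = np.stack([lerp_complex(Kvi, Kv, rng[:, r])
                   for r in range(rng.shape[1])], axis=1)
    # data now lie on a uniform rectangular grid: a 2D FFT forms the image
    return np.fft.fftshift(np.fft.fft2(az))
```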

3. Proposed Hardware Architecture

3.1. Base-4 Systolic Array FFT Unit Hardware Architecture

The length-$N$ discrete Fourier transform (DFT) is defined in Equation (19):
$$Z(k) = \sum_{n=0}^{N-1} W_N^{nk}\, X(n) \tag{19}$$
where $W_N^{nk} = e^{-j 2\pi nk / N}$ is the twiddle factor, $n$ is the time-domain index, and $k$ is the frequency-domain index. If $N$ can be decomposed as $N = N_1 N_2$, then $n$ and $k$ can be expressed as
$$n = n_1 + N_1 n_2, \quad (0 \le n_1 \le N_1 - 1,\; 0 \le n_2 \le N_2 - 1)$$
$$k = k_1 + N_1 k_2, \quad (0 \le k_1 \le N_1 - 1,\; 0 \le k_2 \le N_2 - 1) \tag{20}$$
Substituting the above equation into the DFT definition, Equation (19) can be rewritten as
$$Z(k_1 + N_1 k_2) = \sum_{n_1=0}^{N_1-1} W_N^{n_1 k_1} \left[ \sum_{n_2=0}^{N_2-1} W_{N_2}^{n_2 k_1}\, W_{N_2}^{n_2 k_2 N_1}\, X(n_1 + N_1 n_2) \right] W_{N_2}^{n_1 k_2} \tag{21}$$
where, assuming that $N_1 / N_2$ is an integer, we have $W_{N_2}^{n_2 k_2 N_1} = e^{-j 2\pi n_2 k_2 N_1 / N_2} = 1$, so Equation (21) simplifies to Equation (22):
$$Z(k_1 + N_1 k_2) = \sum_{n_1=0}^{N_1-1} W_N^{n_1 k_1} \left[ \sum_{n_2=0}^{N_2-1} W_{N_2}^{n_2 k_1}\, X(n_1 + N_1 n_2) \right] W_{N_2}^{n_1 k_2} \tag{22}$$
In Equation (22), the bracketed inner sum is denoted $Y$, and it can be expressed as a matrix operation, as shown in Equation (23):
$$Y(k_1, n_1) = W_N^{n_1 k_1} \begin{bmatrix} W_{N_2}^{0} & W_{N_2}^{k_1} & W_{N_2}^{2k_1} & \cdots & W_{N_2}^{(N_2-1)k_1} \end{bmatrix} \times \begin{bmatrix} X(n_1) \\ X(n_1 + N_1) \\ X(n_1 + 2N_1) \\ \vdots \\ X(n_1 + (N_2-1)N_1) \end{bmatrix} \tag{23}$$
Then, the overall summation in Equation (22) can be represented as a matrix operation using $Y(k_1, n_1)$ obtained from Equation (23), as shown in Equation (24):
$$Z(k_1 + N_1 k_2) = \begin{bmatrix} W_{N_2}^{0} & W_{N_2}^{k_2} & W_{N_2}^{2k_2} & \cdots & W_{N_2}^{(N_1-1)k_2} \end{bmatrix} \times \begin{bmatrix} Y(k_1, 0) \\ Y(k_1, 1) \\ Y(k_1, 2) \\ \vdots \\ Y(k_1, N_1 - 1) \end{bmatrix} \tag{24}$$
Finally, Equations (23) and (24) can be written compactly in matrix form as Equation (25):
$$Y = W_M \circ \left( C_{M1} X \right), \qquad Z = C_{M2} Y^{T} \tag{25}$$
where $W_M = W_N^{n_1 k_1}$ is an $N_1 \times N_1$ matrix, $\circ$ denotes elementwise multiplication, and $C_{M1} = W_{N_2}^{n_2 k_1}$ is an $N_1 \times N_2$ matrix. $X = X(n_1 + N_1 n_2)$ is an $N_2 \times N_1$ matrix; thus, $Y$ is an $N_1 \times N_1$ matrix. $C_{M2} = W_{N_2}^{n_1 k_2}$, which equals $C_{M1}^{T}$, is an $N_2 \times N_1$ matrix, and $Z$ is an $N_2 \times N_1$ matrix. In the base-4 FFT algorithm, $N_2 = 4$ is fixed, although this can be changed depending on the application.
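The factorization in Equation (25) can be checked numerically. The sketch below builds the matrices in NumPy for N = 64 (N1 = 16, N2 = 4, so the divisibility assumption holds) and verifies the result against a library FFT; it models only the arithmetic, not the systolic dataflow.

```python
import numpy as np

N1, N2 = 16, 4              # base-4: N2 = 4; N2 divides N1, as assumed above
N = N1 * N2
x = np.random.randn(N) + 1j * np.random.randn(N)

n1 = k1 = np.arange(N1)
n2 = k2 = np.arange(N2)

X   = x.reshape(N2, N1)                             # X[n2, n1] = x[n1 + N1*n2]
CM1 = np.exp(-2j * np.pi * np.outer(k1, n2) / N2)   # N1 x N2
WM  = np.exp(-2j * np.pi * np.outer(k1, n1) / N)    # N1 x N1 twiddle factors
CM2 = np.exp(-2j * np.pi * np.outer(k2, n1) / N2)   # N2 x N1 (equals CM1^T)

Y = WM * (CM1 @ X)          # Eq. (25): elementwise twiddle after C_M1 X
Z = CM2 @ Y.T               # Eq. (25): Z[k2, k1] holds Z(k1 + N1*k2)

assert np.allclose(Z.reshape(N), np.fft.fft(x))     # k = k1 + N1*k2 ordering
```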
The base-$b$ FFT algorithm operates on one-dimensional data of length $N$ using a two-step factorization. First, the data are separated into rows and columns, so that $N = N_r N_c$, where $N_r$ is the row length and $N_c$ is the column length. Second, each part is further split as $N_r = N_{1r} N_2$ and $N_c = N_{1c} N_2$, and the FFT of Equation (25) is applied. This involves three main steps: first, the FFT defined in Equation (25) is applied $N_r$ times in the row direction using column data of length $N_c$; next, each element is multiplied by $W_N$; finally, the FFT is performed $N_c$ times in the column direction using row data of length $N_r$. In essence, the one-dimensional data are transformed into a two-dimensional $N_r \times N_c$ matrix, and the result is obtained in three steps: column FFT, $W_N$ multiplication, and row FFT, as sketched below.
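In the compact sketch below, NumPy library FFTs stand in for the Equation (25) hardware when computing the column and row transforms; the 64 x 64 split of a 4096-point FFT mirrors the unit described next in Figure 7.

```python
import numpy as np

N, Nr, Nc = 4096, 64, 64    # N = Nr * Nc, as in the 4096-point FFT unit
x = np.random.randn(N) + 1j * np.random.randn(N)

M = x.reshape(Nc, Nr)                    # M[n2, n1] = x[n1 + Nr*n2]
B = np.fft.fft(M, axis=0)                # step 1: Nr column FFTs of length Nc
tw = np.exp(-2j * np.pi *
            np.outer(np.arange(Nc), np.arange(Nr)) / N)
C = B * tw                               # step 2: elementwise W_N twiddles
D = np.fft.fft(C, axis=1)                # step 3: Nc row FFTs of length Nr
y = D.T.reshape(N)                       # reorder: output index k = k2 + Nc*k1

assert np.allclose(y, np.fft.fft(x))
```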
A systolic array comprises multiple processing element (PE) cells that are locally connected, with each PE cell performing operations and passing data to its neighbors, as illustrated in Figure 5. Due to their regular, local data flow and the ability of multiple PE cells to operate simultaneously, systolic arrays are well suited to computation-heavy algorithms such as matrix products. Therefore, employing the systolic array structure for the FFT of Equation (25), which is organized as a product of matrices, enables high-speed FFT processing because the matrix products can be executed swiftly.
The systolic array FFT unit, designed to perform FFTs of length $N = N_r N_c$, has a four-channel structure based on $b = 4$. It consists of five main components: an $(N_r/4) \times 4$ array of PE cells on the left, referred to as the left-hand side (LHS); an $(N_r/4) \times 1$ array of complex multipliers for multiplying by $W_M$; the right-hand side (RHS), comprising an $(N_r/4) \times 4$ array of PE cells; four complex multipliers for multiplying by $W_N$; and four memories, each of size $N/4$. Figure 6 illustrates the operations conducted within the PE cells of the LHS and RHS, respectively.
The hardware structure of a base-4 systolic array FFT unit capable of performing a 4096-point FFT is depicted in Figure 7. It comprises a $16 \times 4$ array of PE cells on the LHS and a $16 \times 4$ array of PE cells on the RHS, with a $16 \times 1$ array of $W_M$ multipliers positioned between them. Additionally, four $W_N$ multipliers are located to the right of the RHS, along with four 32-bit memories.
The FFT process sequentially performs the column DFT and then the row DFT. Initially, during the column DFT, input $X$ is received by the LHS. The values of $C_{M1}$ and $X$ inside the PE cells undergo a matrix product operation, as depicted in Figure 8. The resulting product, $C_{M1} X$, is passed to the $W_M$ complex multiplier for multiplication by $W_M$. Subsequently, the product $W_M \circ (C_{M1} X)$ is transmitted to the RHS, yielding the result $Y$ as per Equation (23). Simultaneously, the $C_{M2}$ value is received from the bottom, and $C_{M2} Y^{T}$ is computed, resulting in $Z$ as per Equation (24). This value represents the output of the column DFT and is further processed by the $W_N$ complex multiplier before being stored in memory. Once the column DFT operation is completed, the results stored in memory are fed back to the LHS in the row direction. The $W_M$, RHS, and $W_N$ modules (with $W_N$ set to 1 at this stage, essentially skipping the $W_N$ multiplication) operate as in the column DFT. The row DFT process is then finalized and the result is output.

3.2. Interpolation Unit Hardware Architecture

The computational process of the PFA involves resampling the wavenumber-domain signal into a uniform rectangular format. Since a direct two-dimensional wavenumber-domain interpolation is computationally expensive, a cost-saving approach performs the interpolation in two separate passes: one in the range direction and another in the azimuth direction. Hardware for interpolation must therefore be designed to expedite this computation. Linear interpolation is typically favored for such resampling tasks due to its low hardware complexity and high speed.
The general equation for linearly interpolating the value $(x, y)$ between two points $(x_0, y_0)$ and $(x_1, y_1)$ is as follows:
$$y = y_0 + (x - x_0) \cdot \frac{y_1 - y_0}{x_1 - x_0} \tag{26}$$
where $x_0 < x < x_1$. Applying Equation (26) to the interpolation operation in the PFA process yields
$$\hat{P}(a,r) = P(a,r) + \left( K_{ui} - K_u(a,r) \right) \cdot \frac{P(a,r+1) - P(a,r)}{K_u(a,r+1) - K_u(a,r)} \tag{27}$$
where $a$ represents the pulse number in the azimuth direction, $r$ signifies the sample number in the range direction, $\hat{P}$ denotes the linearly interpolated phase history values, $P$ stands for the uninterpolated phase history data, $K_{ui}$ refers to the uniform target locations in the $K_u$ domain at which interpolation is performed, and $K_u$ represents the location values of the phase history samples, which are uniform in the original $K_R$ domain.
In the PFA, interpolation begins in the range direction. The process converts data uniformly distributed in the $K_R$ domain to uniform data in the $K_u$ domain. To achieve this, we first need $K_{ui}$, the positions of the uniformly distributed data in the $K_u$ domain. These values can be calculated from the SAR system parameters: the number of samples, the number of pulses, the platform distance, and the resolution. With the obtained $K_{ui}$ values, linear interpolation using Equation (27) transfers the data uniform in the $K_R$ domain to uniform data in the $K_u$ domain, as shown in Figure 9. Similarly, after the range interpolation, the azimuth interpolation is performed in the same manner, converting data uniform in the $n$-domain to uniform data in the $K_v$-domain. The end result is a rectangular grid. Since range and azimuth interpolation are performed similarly, a single interpolation unit can perform both.
To maximize the acceleration effect of an FPGA-based hardware accelerator, it is critical to minimize the time taken for data transfer between the DRAM and the accelerator during computation. As Equation (27) shows, determining $\hat{P}$ by linear interpolation requires two neighboring phase history data points, their corresponding positions $K_u$ in the $K_R$ domain, and the value $K_{ui}$ at the location to be interpolated. The linear interpolation in the range direction must be repeated as many times as there are azimuth pulses for every sample in the range direction, and likewise for the azimuth direction, resulting in a large amount of data being transferred.
For example, suppose we want to perform a linear interpolation with a length of 512 in the range direction and 512 in the azimuth direction for phase history data with 352 pulses and 424 samples. First, to interpolate the first range line, we require the 424 phase history data points $P(0,0)$ through $P(0,423)$ per Equation (27), and their corresponding positions $K_u(0,0)$ through $K_u(0,423)$ in the $K_R$ domain. This process must be repeated 352 times, the length of the azimuth direction, meaning that 424 $K_u$ values and 424 phase history samples must be transferred on each iteration, which is extremely time-consuming. Furthermore, the typical procedure for the interpolation operation, depicted in Figure 10a, executes a 'for' statement within a double loop; with a computational complexity of $O(MN)$, this process can be time-consuming.
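For concreteness, the conventional double-loop search of Figure 10a looks like the sketch below (a hypothetical software model, real-valued for brevity); for every uniform target position, the entire nonuniform grid is rescanned, giving the O(MN) complexity discussed above.

```python
import numpy as np

def interp_naive(P, Ku, Kui):
    # Figure 10a: for every target position, rescan the whole source grid
    out = np.empty(len(Kui), dtype=P.dtype)
    for i, ku in enumerate(Kui):                     # M target positions
        for r in range(len(Ku) - 1):                 # N source samples
            if Ku[r] <= ku <= Ku[r + 1]:             # bracketing condition
                t = (ku - Ku[r]) / (Ku[r + 1] - Ku[r])
                out[i] = P[r] + t * (P[r + 1] - P[r])   # Eq. (27)
                break
    return out
```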
To address these issues, the proposed interpolation unit is organized as follows. First, to minimize data transmission, the unit computes the $K_u$ values internally, exploiting the fact that $K_u$ can be calculated from the SAR system parameters, as shown in Equation (28):
$$K_u(a,r) = \hat{r} \cdot \hat{u} \cdot k_{\mathrm{freq}} \tag{28}$$
where $\hat{r}$ and $\hat{u}$ are the unit vectors for the direction of the SAR system and $k_{\mathrm{freq}}$ is the frequency value of the LFM signal of the SAR system. Therefore, if $K_u$ is calculated inside the interpolation unit from the SAR system parameters, the interpolation can be performed directly without that data transfer. Furthermore, the number of operations can be reduced by exploiting the structure of the computed $K_u$ values. Since $k_{\mathrm{freq}}$ in Equation (28) corresponds to the LFM signal of the SAR system, its value changes linearly, so the computed $K_u$ values increase monotonically. The resulting calculation flow is shown in Figure 10b: because $K_u(a,r)$ is always less than or equal to $K_u(a,r+1)$, once a target position falls below $K_u(a,r)$, the bracketing condition can no longer be satisfied by increasing $r$, so the search index never needs to rewind. The amount of computation therefore becomes $O(N)$, which is significantly faster than $O(MN)$.
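A software model of this single-pass scan is given below (a hypothetical helper, real-valued for brevity). Because Ku only increases, the source index r never moves backward, which is exactly what collapses the double loop of Figure 10a into the O(N) flow of Figure 10b.

```python
import numpy as np

def interp_scan(P, Ku, Kui):
    """Single-pass linear interpolation: P holds samples at monotonically
    increasing positions Ku; Kui holds the sorted uniform target positions."""
    out = np.empty(len(Kui), dtype=P.dtype)
    r = 0
    for i, ku in enumerate(Kui):
        # advance until Ku[r] <= ku <= Ku[r+1]; r never rewinds -> O(N)
        while r < len(Ku) - 2 and Ku[r + 1] <= ku:
            r += 1
        t = (ku - Ku[r]) / (Ku[r + 1] - Ku[r])   # Eq. (27) weight
        out[i] = P[r] + t * (P[r + 1] - P[r])
    return out

Ku  = np.sort(np.random.rand(424)) * 10      # nonuniform source positions
P   = np.sin(Ku)                             # sample values
Kui = np.linspace(Ku[0], Ku[-1], 512)        # uniform target grid
assert np.allclose(interp_scan(P, Ku, Kui), np.interp(Kui, Ku, P))
```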
The interpolation unit comprises a $K_u$ generator to calculate the $K_u$ values, a $K_{ui}$ generator to produce the $K_{ui}$ values, a comparator to assess whether the conditions for interpolation are met, and an interpolation core to perform the interpolation operation, as shown in Figure 11. The $K_u$ and $K_{ui}$ generators take the SAR system parameter values and generate $K_u$ and $K_{ui}$; the comparator compares them, and if the appropriate conditions are satisfied, the interpolation core performs the interpolation. The proposed unit uses a floating-point number system, which offers a higher signal-to-quantization-noise ratio (SQNR) than a fixed-point number system because of its superior precision, but introduces more hardware complexity. To balance precision and complexity, the FP32 format is used for the $K_u$ and $K_{ui}$ values, while FP16 is used for the phase history data. In Equation (27), the ratio $(K_{ui} - K_u(a,r)) / (K_u(a,r+1) - K_u(a,r))$ is first calculated in FP32 and then converted to FP16 before the interpolation multiply, reducing complexity.
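The mixed-precision split can be mimicked in NumPy, as in the minimal sketch below: the interpolation weight is formed from FP32 wavenumber values and only then cast to FP16 for the multiply against FP16 phase-history samples (all numeric values here are made up for illustration).

```python
import numpy as np

Ku_r, Ku_r1 = np.float32(1.27), np.float32(1.55)   # FP32 grid positions
Kui = np.float32(1.40)                             # FP32 uniform target
P_r, P_r1 = np.float16(0.8), np.float16(-0.3)      # FP16 phase-history samples

t = np.float16((Kui - Ku_r) / (Ku_r1 - Ku_r))      # ratio in FP32, then cast
P_hat = P_r + t * (P_r1 - P_r)                     # Eq. (27) multiply in FP16
print(P_hat.dtype, P_hat)                          # float16 result
```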

4. Implementation and Acceleration Results

The proposed PFA-based SAR processor consists of an interpolation unit for interpolation and an SA-FFT unit for compression in the range or azimuth direction, as shown in Figure 12. Each unit is designed as a reusable block (IP core) to support reconfigurability and is interconnected using the AXI4 bus. Each IP consists of a master interface and a slave interface for communicating with the DDR memory controller and the microprocessor, registers to change the mode of each device, and RAM to temporarily store input and output data. The master interface is connected to the 128-bit AXI4 bus, which allows four 32-bit words to be sent and received per cycle, enabling efficient parallel processing. In addition, the AXI4 bus-based design minimizes the use of SRAM inside the IP by exchanging data with the DDR memory, so the VLSI implementation can have a smaller footprint and lower cost while reducing power usage, all of which is advantageous for drone-mounted operation.
The proposed PFA-based SAR processor was implemented using Verilog HDL on the Xilinx Zynq UltraScale+ FPGA platform. As shown in Table 1, the SA-FFT unit using the FP16 number system was implemented with 99,610 CLB LUTs, 21,921 CLB registers, 78 DSPs, and 12 Block RAMs, and the interpolation unit was implemented with 5722 CLB LUTs, 2306 CLB registers, and 17 DSPs. The design has a maximum operating frequency of 150 MHz and measured power consumption of 3.677 W. Figure 13 shows the verification environment of the FPGA platform.
To evaluate the proposed PFA-based SAR processor's performance, we compared the execution times of various suboperations with a software implementation on an ARM Cortex-A53, using the Gotcha and Sandia datasets. The FFT process was accelerated by the SA-FFT unit and the interpolation process by the interpolation unit, and the resulting times were measured. For the 512 × 512 image from the Gotcha dataset, the image generation time decreased from 1.732 s to 0.046 s, a 37.393-fold speedup. Similarly, for the 2048 × 2048 image from the Sandia dataset, the time decreased from 34.364 s to 0.766 s, a 44.862-fold speedup. The summarized results are presented in Table 2.
We compared the quality of the images generated by the ARM Cortex-A53-based software with those generated by the proposed PFA-based SAR processor, using three metrics: the SQNR, the peak signal-to-noise ratio (PSNR), and the structural similarity index map (SSIM). The SQNR is defined as follows:
$$\mathrm{SQNR} = \frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}$$
where $P_{\mathrm{signal}}$ represents the mean of the squares of the image before quantization and $P_{\mathrm{noise}}$ represents the mean of the squares of the difference between the images before and after quantization. Then, the PSNR is defined as follows:
$$\mathrm{PSNR} = 10 \log_{10} \frac{MAX_I^2}{MSE}$$
where $MAX_I$ represents the maximum possible pixel value of the image and $MSE$ represents the mean square error between the original image and the image generated by the SAR processor. Lastly, the SSIM is defined as follows:
$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
where $\mu_x$ and $\mu_y$ represent the mean values of images $x$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ their variances, $\sigma_{xy}$ the covariance between them, and $C_1$ and $C_2$ small stabilization constants.
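For reference, the three metrics can be coded directly from the definitions above; the sketch below uses the global-statistics form of the SSIM exactly as written here (windowed variants are common in practice), with hypothetical default constants for C1 and C2.

```python
import numpy as np

def sqnr_db(ref, test):
    # signal power over quantization-noise power, reported in dB
    return 10 * np.log10(np.mean(np.abs(ref) ** 2) /
                         np.mean(np.abs(ref - test) ** 2))

def psnr_db(ref, test, max_i=255.0):
    mse = np.mean((ref - test) ** 2)                # mean square error
    return 10 * np.log10(max_i ** 2 / mse)

def ssim_global(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                       # variances
    cov = ((x - mx) * (y - my)).mean()              # covariance
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```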
We measured the SQNR, PSNR, and SSIM. For the Gotcha image, the SQNR was 50.31 dB, the PSNR was 61.75 dB, and the SSIM was 0.9941; for the Sandia image, the SQNR was 51.71 dB, the PSNR was 65.96 dB, and the SSIM was 0.9986. Since the proposed processor uses a floating-point number system to generate the images, they exhibit high similarity to the software-generated images, as seen in Figure 14.
Table 3 compares the execution time of the proposed PFA-based SAR processor with those of the previous studies in [52,53,54,55]. Because each study reports a different image size, we compare the execution time normalized by the image size:
$$T_{\mathrm{norm}} = \frac{\text{Execution Time}}{\text{Image Size}}$$
Although [52] achieved a higher speed than the proposed design, its power consumption of 247 W was very high and it used a large Tesla C2075 GPU, which is not suitable for drone SAR applications. Compared to [53], the proposed design occupies a smaller area and achieves a higher speed. Refs. [54,55] exhibited higher speeds than the proposed design. However, in [54], 1.33 times the number of CLB LUTs, 8.1 times the number of CLB registers, 8.54 times the number of DSPs, and 31.25 times the block RAM were used, and in [55] about 2.74 times the number of CLB LUTs, 14.88 times the number of CLB registers, 12.58 times the number of DSPs, and 48.25 times the block RAM were used. To measure the execution time normalized for resource usage compared to the proposed design, we define the following metrics:
$$T_X = T_{\mathrm{norm}} \cdot \frac{X}{Y}$$
where $X$ represents the number of CLB LUTs, CLB registers, DSPs, or block RAMs in the design being compared and $Y$ represents the corresponding count in the proposed design. If $X$ is greater than $Y$, $T_X$ grows, penalizing designs that consume more resources than the proposed one; a smaller $T_X$ therefore indicates a faster design relative to its resource usage.
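As a worked instance of these definitions, the sketch below reproduces two Table 3 entries, comparing [54] against the proposed design by CLB LUT count; the figures are taken directly from the table.

```python
# Reproducing Table 3 entries from the definitions of T_norm and T_X.
t_norm_prop = 0.046 / (512 * 512) * 1e9        # proposed: ~175.48 ns/pixel
t_norm_54   = 1.0 / (4096 * 4096) * 1e9        # [54]: 1 s for a 4096x4096 image

luts_54, luts_prop = 139_586, 105_332          # CLB LUT counts from Table 3
t_luts_54 = t_norm_54 * luts_54 / luts_prop    # ~78.99 ns, matching Table 3
print(t_norm_prop, t_norm_54, t_luts_54)
```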
Using this metric, the designs can be compared. In terms of CLB LUT usage, the proposed design is slightly slower, but for all other resources and for SRAM, it achieves higher speed per unit of resource. As a result, compared to previous studies, the proposed PFA-based SAR processor achieves the fastest image generation time relative to power, area, and resource usage, making it advantageous for small-drone ViSAR applications with tight weight, space, and speed constraints. The proposed design generates one image frame in 0.046 s from 512 × 512 single-precision floating-point complex data, meaning it can generate images at up to 21.74 Hz. Assuming a PRF of 250 Hz and 352 pulses, and substituting into Equation (5), the pulses can be overlapped by 96.7%, which means that the proposed architecture can achieve real-time imaging at high frame rates.

5. Conclusions

In this study, we proposed an architecture for a PFA-based SAR processor consisting of an SA-FFT unit and an interpolation unit. The SA-FFT unit uses a systolic array structure to achieve high computation speed, and the interpolation unit uses a linear interpolation algorithm with a small area and high speed, optimized for the PFA. Each unit is designed as a reusable block (IP core) to support reconfigurability and is interconnected using the AXI4 bus. In addition, since SRAM requires more power and area and is more expensive than DRAM, using less SRAM is desirable for VLSI implementations; transferring data to and from the DDR memory via the AXI4 bus therefore minimizes the use of SRAM within the IP. The proposed design was implemented on the Xilinx Zynq UltraScale+ FPGA platform. The SA-FFT unit was implemented with 99,610 CLB LUTs, 21,921 CLB registers, 78 DSPs, and 12 block RAMs, and the interpolation unit with 5722 CLB LUTs, 2306 CLB registers, and 17 DSPs. For comparison, the execution time of the ARM Cortex-A53-based software was measured for a 2048 × 2048 pixel image, against which the proposed design achieved a 44.862-fold acceleration. Normalizing the execution time by the number of pixels and comparing the speed-to-area ratio normalized by resource usage, the proposed design achieves higher speed with lower power consumption than previous studies, making it well suited to small-drone ViSAR applications.
In future research, we will implement a processor that can simultaneously support various algorithms, such as the RDA, the chirp-scaling algorithm (CSA), and the BPA, employing the AXI4 bus-based design. This also includes the implementation of an ASIC for small SAR platforms based on the FPGA-validated design, which is expected to be more power-efficient and usable across multiple applications running various algorithms.

Author Contributions

D.J. designed the PFA-based SAR processor, performed the experiment and evaluation, and wrote the paper. M.L. and W.L. implemented the processor and performed the revision of this manuscript. Y.J. conceived of and led the research, analyzed the experimental results, and wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

The authors gratefully acknowledge the support of the Next-Generation SAR Research Laboratory at Korea Aerospace University, originally funded by the Defense Acquisition Program Administration (DAPA) and the Agency for Defense Development (ADD).

Data Availability Statement

All data underlying the results are available as part of the article and no additional source data are required.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Brown, W.M.; Porcello, L.J. An introduction to synthetic-aperture radar. IEEE Spectr. 1969, 6, 52–62. [Google Scholar] [CrossRef]
  2. Munson, D.C.; Visentin, R.L. A signal processing view of strip-mapping synthetic aperture radar. IEEE Trans. Acoust. Speech Signal Process. 1989, 37, 2131–2147. [Google Scholar] [CrossRef]
  3. Moreira, A.; Prats-Iraola, P.; Younis, M.; Krieger, G.; Hajnsek, I.; Papathanassiou, K.P. A tutorial on synthetic aperture radar. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–43. [Google Scholar] [CrossRef]
  4. Lou, Y.; Clark, D.; Marks, P.; Muellerschoen, R.J.; Wang, C.C. Onboard radar processor development for rapid response to natural hazards. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 2770–2776. [Google Scholar] [CrossRef]
  5. Percivall, G.S.; Alameh, N.S.; Caumont, H.; Moe, K.L.; Evans, J.D. Improving disaster management using earth observations—GEOSS and CEOS activities. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 1368–1375. [Google Scholar] [CrossRef]
  6. Tralli, D.M.; Blom, R.G.; Zlotnicki, V.; Donnellan, A.; Evans, D.L. Satellite remote sensing of earthquake, volcano, flood, landslide and coastal inundation hazards. ISPRS J. Photogramm. Remote Sens. 2005, 59, 185–198. [Google Scholar] [CrossRef]
  7. Sharma, R.K.; Kumar, B.S.; Desai, N.M.; Gujraty, V. SAR for disaster management. IEEE Aerosp. Electron. Syst. Mag. 2008, 23, 4–9. [Google Scholar] [CrossRef]
  8. Yang, X.; Shi, J.; Zhou, Y.; Wang, C.; Hu, Y.; Zhang, X.; Wei, S. Ground moving target tracking and refocusing using shadow in video-SAR. Remote Sens. 2020, 12, 3083. [Google Scholar] [CrossRef]
  9. Guo, P.; Wu, F.; Tang, S.; Jiang, C.; Liu, C. Implementation Method of Automotive Video SAR (ViSAR) Based on Sub-Aperture Spectrum Fusion. Remote Sens. 2023, 15, 476. [Google Scholar] [CrossRef]
  10. Kim, C.K.; Azim, M.T.; Singh, A.K.; Park, S.O. Doppler shifting technique for generating multi-frames of video SAR via sub-aperture signal processing. IEEE Trans. Signal Process. 2020, 68, 3990–4001. [Google Scholar] [CrossRef]
  11. Yang, C.; Chen, Z.; Deng, Y.; Wang, W.; Wang, P.; Zhao, F. Generation of Multiple Frames for High Resolution Video SAR Based on Time Frequency Sub-Aperture Technique. Remote Sens. 2023, 15, 264. [Google Scholar] [CrossRef]
  12. Cheng, Y.; Ding, J.; Sun, Z.; Zhong, C. Processing of airborne video SAR data using the modified back projection algorithm. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13. [Google Scholar] [CrossRef]
  13. Bimber, O.; Kurmi, I.; Schedl, D.C. Synthetic aperture imaging with drones. IEEE Comput. Graph. Appl. 2019, 39, 8–15. [Google Scholar] [CrossRef]
  14. Liu, B.; Wang, K.; Liu, X.; Yu, W. An efficient SAR processor based on GPU via CUDA. In Proceedings of the 2009 2nd International Congress on Image and Signal Processing, Tianjin, China, 17–19 October 2009; pp. 1–5. [Google Scholar]
  15. Tang, H.; Li, G.; Zhang, F.; Hu, W.; Li, W. A spaceborne SAR on-board processing simulator using mobile GPU. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 1198–1201. [Google Scholar]
  16. Wielage, M.; Cholewa, F.; Fahnemann, C.; Pirsch, P.; Blume, H. High performance and low power architectures: GPU vs. FPGA for fast factorized backprojection. In Proceedings of the 2017 Fifth International Symposium on Computing and Networking (CANDAR), Aomori, Japan, 19–22 November 2017; pp. 351–357. [Google Scholar]
  17. Wang, S.; Zhang, S.; Huang, X.; An, J.; Chang, L. A highly efficient heterogeneous processor for SAR imaging. Sensors 2019, 19, 3409. [Google Scholar] [CrossRef]
  18. Hartley, T.D.; Fasih, A.R.; Berdanier, C.A.; Ozguner, F.; Catalyurek, U.V. Investigating the use of GPU-accelerated nodes for SAR image formation. In Proceedings of the 2009 IEEE International Conference on Cluster Computing and Workshops, New Orleans, LA, USA, 31 August–4 September 2009; pp. 1–8. [Google Scholar]
  19. Ning, X.; Yeh, C.; Zhou, B.; Gao, W.; Yang, J. Multiple-GPU accelerated range-Doppler algorithm for synthetic aperture radar imaging. In Proceedings of the 2011 IEEE RadarCon (RADAR), Kansas City, MO, USA, 23–27 May 2011; pp. 698–701. [Google Scholar]
  20. Le, C.; Chan, S.; Cheng, F.; Fang, W.; Fischman, M.; Hensley, S.; Johnson, R.; Jourdan, M.; Marina, M.; Parham, B.; et al. Onboard FPGA-based SAR processing for future spaceborne systems. In Proceedings of the 2004 IEEE Radar Conference (IEEE Cat. No. 04CH37509), Philadelphia, PA, USA, 29 April 2004; pp. 15–20. [Google Scholar]
  21. Wiehle, S.; Mandapati, S.; Günzel, D.; Breit, H.; Balss, U. Synthetic aperture radar image formation and processing on an MPSoC. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14. [Google Scholar] [CrossRef]
  22. Xie, Y.; Zhong, Z.; Li, B.; Xie, Y.; Chen, L.; Chen, H. An ARM-FPGA Hybrid Acceleration and Fault Tolerant Technique for Phase Factor Calculation in Spaceborne Synthetic Aperture Radar Imaging. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 5059–5072. [Google Scholar] [CrossRef]
  23. Wang, J.; Feng, D.; Xu, Z.; Wu, Q.; Hu, W. Time-domain digital-coding active frequency selective surface absorber/reflector and its imaging characteristics. IEEE Trans. Antennas Propag. 2020, 69, 3322–3331. [Google Scholar] [CrossRef]
  24. Zhou, X.; Yu, Z.J.; Cao, Y.; Jiang, S. SAR imaging realization with FPGA based on VIVADO HLS. In Proceedings of the 2019 IEEE International Conference on Signal, Information and Data Processing (ICSIDP), Chongqing, China, 11–13 December 2019; pp. 1–4. [Google Scholar]
  25. Milton, M.; Benigni, A.; Monti, A. Real-time multi-FPGA simulation of energy conversion systems. IEEE Trans. Energy Convers. 2019, 34, 2198–2208. [Google Scholar] [CrossRef]
  26. Waidyasooriya, H.M.; Hariyama, M. Multi-FPGA accelerator architecture for stencil computation exploiting spacial and temporal scalability. IEEE Access 2019, 7, 53188–53201. [Google Scholar] [CrossRef]
  27. Brown, W.M.; Fredricks, R.J. Range-Doppler imaging with motion through resolution cells. IEEE Trans. Aerosp. Electron. Syst. 1969, AES-5, 98–102. [Google Scholar] [CrossRef]
  28. Sun, J.; Mao, S.; Wang, G.; Hong, W. Polar format algorithm for spotlight bistatic SAR with arbitrary geometry configuration. Prog. Electromagn. Res. 2010, 103, 323–338. [Google Scholar] [CrossRef]
  29. Yegulalp, A.F. Fast backprojection algorithm for synthetic aperture radar. In Proceedings of the 1999 IEEE Radar Conference. Radar into the Next Millennium (Cat. No. 99CH36249), Waltham, MA, USA, 22 April 1999; pp. 60–65. [Google Scholar]
  30. Raney, R.K.; Runge, H.; Bamler, R.; Cumming, I.G.; Wong, F.H. Precision SAR processing using chirp scaling. IEEE Trans. Geosci. Remote Sens. 1994, 32, 786–799. [Google Scholar] [CrossRef]
  31. Shin, H.S.; Lim, J.T. Omega-K algorithm for spaceborne spotlight SAR imaging. IEEE Geosci. Remote Sens. Lett. 2011, 9, 343–347. [Google Scholar] [CrossRef]
  32. Desai, M.D.; Jenkins, W.K. Convolution backprojection image reconstruction for spotlight mode synthetic aperture radar. IEEE Trans. Image Process. 1992, 1, 505–517. [Google Scholar] [CrossRef]
  33. Zhang, B.; Xu, G.; Zhou, R.; Zhang, H.; Hong, W. Multi-channel back-projection algorithm for mmwave automotive MIMO SAR imaging with Doppler-division multiplexing. IEEE J. Sel. Top. Signal Process. 2022, 17, 445–457. [Google Scholar] [CrossRef]
  34. Walker, J.L. Range-Doppler imaging of rotating objects. IEEE Trans. Aerosp. Electron. Syst. 1980, AES-16, 23–52. [Google Scholar] [CrossRef]
  35. Yuan, Y.; Sun, J.; Mao, S. PFA algorithm for airborne spotlight SAR imaging with nonideal motions. IEE Proc.-Radar Sonar Navig. 2002, 149, 174–182. [Google Scholar] [CrossRef]
  36. Jiang, J.; Li, Y.; Zheng, Q. A THz Video SAR Imaging Algorithm Based on Chirp Scaling. In Proceedings of the 2021 CIE International Conference on Radar (Radar), Haikou, China, 15–19 December 2021; pp. 656–660. [Google Scholar]
  37. Rigling, B.D.; Moses, R.L. Polar format algorithm for bistatic SAR. IEEE Trans. Aerosp. Electron. Syst. 2004, 40, 1147–1159. [Google Scholar] [CrossRef]
  38. Baas, B.M. A 9.5 mW 330 μsec 1024-point FFT Processor. In Proceedings of the Custom Integrated Circuits Conference, Santa Clara, CA, USA, 11–14 May 1998; pp. 11–14. [Google Scholar]
  39. He, S.; Torkelson, M. Design and implementation of a 1024-point pipeline FFT processor. In Proceedings of the IEEE 1998 Custom Integrated Circuits Conference (Cat. No. 98CH36143), Santa Clara, CA, USA, 14 May 1998; pp. 131–134. [Google Scholar]
  40. Lee, M.K.; Shin, K.W.; Lee, J.K. A VLSI array processor for 16-point FFT. IEEE J. Solid-State Circuits 1991, 26, 1286–1292. [Google Scholar] [CrossRef]
  41. Kung, H.T. Why systolic architectures? Computer 1982, 15, 37–46. [Google Scholar] [CrossRef]
  42. Kung, S.Y. VLSI array processors. IEEE ASSP Mag. 1985, 2, 4–22. [Google Scholar] [CrossRef]
  43. Chan, L.W.; Chen, M.Y. A new systolic array for discrete Fourier transform. IEEE Trans. Acoust. Speech Signal Process. 1988, 36, 1665–1666. [Google Scholar] [CrossRef]
  44. Wang, C.L.; Chang, Y.T. Efficient 2-D systolic array implementation of a prime factor DFT algorithm. In Proceedings of the TENCON’92-Technology Enabling Tomorrow, Notre Dame, IN, USA, 4–5 March 1992; pp. 56–60. [Google Scholar]
  45. Lee, M.H. High speed multidimensional systolic arrays for discrete Fourier transform. IEEE Trans. Circuits Syst. II Analog. Digit. Signal Process. 1992, 39, 876–879. [Google Scholar] [CrossRef]
  46. Lim, H.; Swartzlander, E.E. Multidimensional systolic arrays for the implementation of discrete Fourier transforms. IEEE Trans. Signal Process. 1999, 47, 1359–1370. [Google Scholar]
  47. Meher, P.K. Efficient systolic implementation of DFT using a low-complexity convolution-like formulation. IEEE Trans. Circuits Syst. II Express Briefs 2006, 53, 702–706. [Google Scholar] [CrossRef]
  48. Nash, J.G. Computationally efficient systolic architecture for computing the discrete Fourier transform. IEEE Trans. Signal Process. 2005, 53, 4640–4651. [Google Scholar] [CrossRef]
  49. Zhao, S.; Chen, J.; Yang, W.; Sun, B.; Wang, Y. Image formation method for spaceborne video SAR. In Proceedings of the 2015 IEEE 5th Asia-Pacific Conference on Synthetic Aperture Radar (APSAR), Singapore, 1–4 September 2015; pp. 148–151. [Google Scholar]
  50. Liu, B.; Zhang, X.; Tang, K.; Liu, M.; Liu, L. Spaceborne video-SAR moving target surveillance system. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 2348–2351. [Google Scholar]
  51. Khosravi, M.R.; Samadi, S. Frame rate computing and aggregation measurement toward QoS/QoE in Video-SAR systems for UAV-borne real-time remote sensing. J. Supercomput. 2021, 77, 14565–14582. [Google Scholar] [CrossRef]
  52. Xu, Z.; Zhu, D. High-resolution miniature UAV SAR imaging based on GPU architecture. J. Phys. Conf. Ser. 2018, 1074, 012122. [Google Scholar] [CrossRef]
  53. Liu, R.; Zhu, D.; Wang, D.; Du, W. FPGA implementation of SAR imaging processing system. In Proceedings of the 2019 6th Asia-Pacific Conference on Synthetic Aperture Radar (APSAR), Xiamen, China, 26–29 November 2019; pp. 1–5. [Google Scholar]
  54. Linchen, Z.; Jindong, Z.; Daiyin, Z. FPGA implementation of polar format algorithm for airborne spotlight SAR processing. In Proceedings of the 2013 IEEE 11th International Conference on Dependable, Autonomic and Secure Computing, Chengdu, China, 21–22 December 2013; pp. 143–147. [Google Scholar]
  55. Wang, D.; Zhu, D.; Liu, R. Video SAR high-speed processing technology based on FPGA. In Proceedings of the 2019 IEEE MTT-S International Microwave Biomedical Conference (IMBioC), Nanjing, China, 6–8 May 2019; Volume 1, pp. 1–4. [Google Scholar]
Figure 1. Geometry for SAR image generation.
Figure 2. Aperture mode for video SAR: (a) non-overlapped mode; (b) overlapped mode.
Figure 3. Geometry for SAR operation.
Figure 4. Procedure of the PFA.
Figure 5. Architecture of the systolic array.
Figure 6. Operations inside PE cells: (a) LHS PE; (b) RHS PE.
Figure 7. Hardware architecture of the proposed FP16 base-4 systolic array FFT unit.
Figure 8. The base-4 systolic array FFT unit’s computational procedure.
Figure 9. 2D resampling process.
Figure 10. Flow charts of interpolation operations: (a) typical; (b) proposed.
Figure 11. Hardware architecture of the proposed FP16 interpolation unit.
Figure 12. FPGA platform for verifying the proposed PFA-based SAR processor.
Figure 13. Verification environment for the proposed FPGA implementation.
Figure 14. Generated SAR images: (a) Cortex-A53-based Gotcha image; (b) proposed processor-based Gotcha image; (c) Cortex-A53-based Sandia image; (d) proposed processor-based Sandia image.
Table 1. Implementation results based on the Xilinx Zynq UltraScale+ FPGA device.

| Unit | # CLB LUTs | # CLB Registers | # DSPs | # Block RAMs | Max. Op. Freq. |
|---|---|---|---|---|---|
| SA-FFT | 99,610 | 21,921 | 78 | 12 | 150 MHz |
| Interpolation | 5722 | 2306 | 17 | - | 150 MHz |
Table 2. PFA execution time.

| Image Size | Full SW (s) | Interp. Accel. (s) | FFT Accel. (s) | Full HW (s) | Speedup Ratio (SW vs. HW) |
|---|---|---|---|---|---|
| 512 × 512 | 1.732 | 1.654 | 0.124 | 0.046 | 37.393 |
| 2048 × 2048 | 34.364 | 32.654 | 2.476 | 0.766 | 44.862 |
Table 3. Comparison with previous implementations.

| | [52] | [53] | [54] | [55] | Proposed |
|---|---|---|---|---|---|
| Platform | GPU | FPGA | FPGA | FPGA | FPGA |
| Operating Freq. | 1.15 GHz | 200 MHz | 200 MHz | 200 MHz | 150 MHz |
| Power (W) | 247 | - | - | - | 3.677 |
| Image Size | 4096 × 8192 | 4096 × 2048 | 4096 × 4096 | 2048 × 2048 | 512 × 512 / 2048 × 2048 |
| Exec. Time (s) | 2.16 | 2.1 | 1 | 0.18 | 0.046 / 0.766 |
| T_norm (ns) | 64.373 | 250.339 | 59.604 | 42.915 | 175.476 / 182.629 |
| Throughput (MB/s) | 62.14 | 15.98 | 67.11 | 93.21 | 22.79 / 21.91 |
| # CLB LUTs | - | 247,906 | 139,586 | 289,075 | 105,332 |
| # CLB Registers | - | 433,200 | 196,258 | 360,486 | 24,227 |
| # DSPs | - | 1093 | 811 | 1195 | 95 |
| # Block RAMs | - | 1056 | 375 | 579 | 12 |
| T_CLB LUTs (ns) | - | 589.189 | 78.987 | 117.776 | 175.476 |
| T_CLB Registers (ns) | - | 4476.281 | 482.839 | 638.554 | 175.476 |
| T_DSPs (ns) | - | 2880.216 | 508.829 | 539.826 | 175.476 |
| T_Block RAMs (ns) | - | 22,029.832 | 1862.625 | 2070.649 | 175.476 |

