# An efficient HDL IP-core Generator for OFDM modulators

Roberta Avanzato<sup>*a*</sup>, Gabriele Nicotra<sup>*b*</sup>

<sup>a</sup>Department of Electrical, Electronic and Computer Engineering, University of Catania, 95125 Catania, Italy <sup>b</sup>Department of Mathematics and Computer Science, University of Catania, 95125 Catania, Italy

#### Abstract

In this paper, we propose an HDL IP generator for Orthogonal Frequency-Division Multiplexing (OFDM) modulators. OFDM is used in many telecommunication standards. However, each standard requires a specific OFDM modulator characterized by a different number of carriers and a different cyclic prefix. These differences in OFDM parameters have a negative impact on RTL hardware design, because they make it difficult to reuse a modulator already designed for a project targeting a different communication standard. For this reason, the authors propose an automatic HDL IP generator capable of producing RTL code, in VHDL or Verilog, for OFDM modulators whose number of carriers and cyclic prefix are set by the user. The generated IPs have been characterized in terms of maximum frequency, hardware resources, and power consumption. The hardware implementations were performed on a XILINX xc7z030 FPGA.

Keywords OFDM, FPGA

## 1. Introduction

In the last several years, digital electronics has been increasingly used in many fields. This is essentially due to the capability of modern integrated digital circuits to provide high computational power, allowing the realization of complex Digital Signal Processing (DSP) circuits [1, 2]. Digital systems can be developed using two main technologies: Application Specific Integrated Circuits (ASICs) and Field Programmable Gate Arrays (FPGAs). Nowadays FPGAs and digital ASICs are used in several fields, such as machine learning [3], [4], [5], health [6], [7], [8], communication systems [9], [10], [11], audio [12], [13], and others [14], [15]. Modern digital communication systems require high computation capabilities and, for this reason, FPGAs currently represent an optimal solution for their implementation. For example, FPGA implementations of digital transmitters are presented in [16], [17], while in [18] the authors use an FPGA to implement a spacecraft tracking system. Similar approaches can be used for modems in current and future wired Digital Subscriber Line technologies [19] or in satellite links [20].

The hardware implementation of digital communication systems, on both ASICs and FPGAs, requires a very complex design flow. Such a flow is extremely slow compared with the one used for software implementation. It can be divided into two steps, called front-end and back-end. The front-end phase consists of RTL design using HDLs such as VHDL, Verilog, or SystemVerilog. The back-end phase involves the physical design (the circuit layout).

➡ roberta.avanzato@phd.unict.it (G. Nicotra)

A hardware description language (HDL) is a language used to describe the architecture and behavior of electronic circuits, usually digital logic circuits. HDLs were born with the intent of helping engineers describe circuits. Later, with the advent of hardware synthesizers, HDLs started to be used for both simulation and synthesis. Hardware synthesizers are software tools able to transform HDL files into a netlist, that is, a specification of physical electronic components and of how they are connected together.

An HDL looks much like a programming language such as C, but differs from it in several aspects. An important difference is that HDLs explicitly include the notion of time. A second important difference is that HDLs describe parallel processes.

Due to the exploding complexity of digital electronic circuits since the 1970s (see Moore's law), synthesis through HDLs became a necessity. The two major hardware description languages are VHDL and Verilog.

The front-end phase is very slow because HDL code is complex to develop and verify. In order to speed up the RTL design phase, HDL Intellectual Property (IP) cores are increasingly proposed in the literature and used by RTL engineers. IPs are reusable blocks of HDL code that can either be taken from internal design libraries or be purchased from third-party vendors. Thanks to their reusability and reconfigurability, IP cores speed up the RTL design phase, also for specific device designs such as those for the Internet of Things [21], [22].

SYSTEM 2020: Symposium for Young Scientists in Technology, Engineering and Mathematics, Online, May 20 2020

© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

In this paper, the authors propose an IP generation tool for OFDM modulators. The IP generator has been developed in MATLAB/Simulink. It allows users to select the number of carriers and the length of the cyclic prefix. In addition, users can provide fixed-point information such as the number of bits of the inputs and of the outputs. The IP generator is capable of generating both VHDL and Verilog. The paper is organized as follows. Sect. 2 describes the OFDM modulation. Sect. 3 describes the IP generator and the implemented OFDM hardware architectures. Sect. 4 provides the experimental results in terms of area, speed, and power consumption. Finally, Sect. 5 draws the conclusions.

## 2. The OFDM modulation

Nowadays several communication applications require high data-rate transmission over mobile or wireless channels [23], [24]. In single-carrier modulation, as in the time-division multiple access (TDMA) scheme of the Global System for Mobile Communications (GSM), the symbol duration decreases as the data rate increases, so the delay spread of the wireless channel causes more severe intersymbol interference (ISI). In order to reduce the effect of ISI, the symbol duration must be much larger than the delay spread of the wireless channel. The Orthogonal Frequency-Division Multiplexing (OFDM) modulation divides the entire channel into many narrowband subchannels [25], [26]. These subchannels are transmitted in parallel in order to maintain high data-rate transmission and, at the same time, to increase the symbol duration. In this way, the ISI effects are drastically reduced [27]. Figure 1 reports an example of the subchannel division of OFDM modulation and the corresponding signal transmitted on a single subcarrier. In Figure 1 the overall bandwidth is divided into N subcarriers, where each subcarrier has a reduced bit rate $R_b = R_W/T$, where T is the OFDM signal duration in the subcarrier and $R_W$ is the bit rate of the total bandwidth [28].

**Figure 1:** OFDM signals: subcarrier signals (a); subcarriers (b).

**Figure 2:** OFDM transmitter block diagram.

The principle of operation is based on the orthogonality of the subcarriers, whose concept is reported in (1):

$$\int_{t}^{t+T} \phi_n(t) \cdot \phi_m^*(t)\, dt = A \quad \text{if} \quad n = m \tag{1}$$

and 0 otherwise (i.e. for $n \neq m$). Waveforms satisfying (1) are those reported in Figure 1(a). Thus, a generic OFDM signal can be written as:

$$s(t) = \sum_{N} a_n(t)\, e^{j2\pi f_n t} \cdot e^{j2\pi f_0 t} \tag{2}$$

where $f_0$ is the Radio Frequency (RF) translation. Sampling the OFDM signal in (2) at $t = kT/N$, with $f_n = n/T$, we obtain:

$$s(kT/N) = \sum_{N} a_n(t)\, e^{j2\pi \frac{nk}{N}} \tag{3}$$

which is the inverse Fourier transform ($FFT^{-1}$) of the transmitted symbols before the RF translation. Figure 2 reports the classical scheme of an OFDM transmitter.

**Figure 3:** Example of cyclic prefix in an OFDM signal.

In the demodulator at the receiver, considering for example the subcarrier frequency $f_m = m/T$, the received signal after removing the RF component is:

$$\begin{aligned}
\int_{(k-1)T}^{kT} s(t)\, e^{-j2\pi f_m t}\, dt &= \sum_N \int_0^T a_n(t)\, e^{j2\pi f_n t}\, e^{-j2\pi f_m t}\, dt \\
&= \sum_N \int_0^T a_n(t)\, e^{j2\pi \frac{n-m}{T}t}\, dt = a_m\left(\frac{k}{NT}\right)
\end{aligned} \tag{4}$$

Due to the subcarrier orthogonality, the receiver is able to extract the corresponding complex symbol transmitted on the *m*-th subcarrier.
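The relation between the sampled transmitter sum in (3) and the receiver correlation in (4) can be checked numerically. The following is a minimal sketch in plain Python, assuming N = 8 subcarriers and arbitrary 4QAM symbols. The helper names `idft` and `dft` are illustrative and are not part of the proposed IP.

```python
import cmath

def idft(symbols):
    """Sample the sum of subcarriers: s[k] = sum_n a[n] * exp(j*2*pi*n*k/N)."""
    n_carriers = len(symbols)
    return [
        sum(a * cmath.exp(2j * cmath.pi * n * k / n_carriers)
            for n, a in enumerate(symbols))
        for k in range(n_carriers)
    ]

def dft(samples):
    """Correlate with each subcarrier and normalise by N (discrete form of (4))."""
    n_carriers = len(samples)
    return [
        sum(s * cmath.exp(-2j * cmath.pi * m * k / n_carriers)
            for k, s in enumerate(samples)) / n_carriers
        for m in range(n_carriers)
    ]

# Hypothetical 4QAM symbols, one per subcarrier (N = 8).
tx_symbols = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]
time_samples = idft(tx_symbols)      # transmitter side
rx_symbols = dft(time_samples)       # receiver side

max_error = max(abs(r - t) for r, t in zip(rx_symbols, tx_symbols))
print(f"max recovery error: {max_error:.2e}")
```

Because the subcarriers are orthogonal over one symbol, the correlation with subcarrier m cancels every other term and returns the transmitted symbol up to floating-point rounding.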

In a wireless channel, multipath can affect the orthogonality, since delays and reflections provide the receiver with delayed replicas of the signal. A guard time interval $T_g$ can be introduced in order to properly start the reception, so that the integration takes place between $T_g$ and $T+T_g$. Unfortunately, this is not enough, since different terminals can transmit simultaneously. In order to avoid inter-carrier interference, a cyclic prefix is added to the transmitted signal, as described in Figure 3.

The cyclic prefix guarantees an integer number of oscillations of the basic waveform $e^{j2\pi f_n t}$ within the integration interval even when a replica arrives late. Even if the replica partially leaves the integration interval, thanks to the cyclic prefix its missing part re-enters the interval of interest without affecting the receiver. The only difference is that the duration of the OFDM symbols is now $T + T_g$, while the integration lasts T seconds, thus reducing the collected energy by a factor of $\sqrt{\frac{T}{T+T_g}}$. Of course, the guard time should be selected properly, depending on the channel characteristics. $T_g$ should be greater than the maximum delay introduced by the wireless channel (at least greater than the channel delay spread) but as small as possible, because of the loss in collected energy.
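The effect of the cyclic prefix can be illustrated with a small discrete-time experiment. This is a sketch under stated assumptions: N = 8 carriers, a prefix of 2 samples, and a hypothetical three-tap channel whose maximum delay (2 samples) does not exceed the prefix. All function names are illustrative.

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def idft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]

def add_cyclic_prefix(symbol, cp_len):
    """Prepend the last cp_len samples of the symbol to itself."""
    return symbol[-cp_len:] + symbol

def channel(samples, taps):
    """Linear convolution with the channel taps, truncated to the input length."""
    return [sum(taps[d] * samples[i - d] for d in range(len(taps)) if i - d >= 0)
            for i in range(len(samples))]

N, CP = 8, 2
taps = [1.0, 0.5, 0.25]              # hypothetical multipath channel
tx = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]

time_symbol = idft(tx)
received = channel(add_cyclic_prefix(time_symbol, CP), taps)
rx = dft(received[CP:])              # drop the prefix, keep N samples

# With the prefix in place, each subcarrier is only scaled by the channel response H[m].
H = dft(taps + [0.0] * (N - len(taps)))
equalised = [r / h for r, h in zip(rx, H)]
cp_error = max(abs(e - t) for e, t in zip(equalised, tx))
print(f"max error after per-carrier equalisation: {cp_error:.2e}")
```

Since the delayed replicas wrap around inside the retained window, the subcarriers stay orthogonal and a single complex division per carrier recovers the symbols. Setting `CP = 0` in the sketch breaks this property.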

Figure 4 reports the generic scheme of the OFDM receiver. After the quadrature mixer, which extracts the real and imaginary parts, the parallel-to-serial stage provides inputs to the FFT. Then, symbols are extracted and passed to the decoder.

**Figure 4:** Principle scheme of the OFDM receiver.

## 3. OFDM modulator architecture

The proposed IP generator has been developed in MATLAB. Using a graphical interface, users can customize and generate the VHDL or Verilog code of the OFDM modulators. The IP is provided with nine I/O ports, divided into five control ports and four data ports. In the following, a detailed description of these ports is provided, giving information about the data size and the direction:

- **clock**: a one-bit input port used to provide the clock to the circuit.
- **reset**: a one-bit input port used for the global reset. The reset is asynchronous and active high.
- **enable**: a one-bit input port used for the global enable.
- **ready**: a one-bit input port. If ready is low, the IP ignores the input. This port must be 1 when input data are available.
- **done**: a one-bit output port. This port passes from zero to one when data are available at the output of the circuit.
- **real-data**: an N-bit input port (N is selected by the user) used to provide the real input samples.
- **imag-data**: an N-bit input port (N is selected by the user) used to provide the imaginary input samples.
- **real-out**: an N-bit output port (N is selected by the user) used to provide the real output samples.
- **imag-out**: an N-bit output port (N is selected by the user) used to provide the imaginary output samples.

**Figure 5:** 4QAM constellation implemented in the proposed IP core.

Fig. 6 shows the block diagram of the IP core. It is composed of four main blocks:

- A QAM Mapper
- An IFFT core
- A Cyclic Prefix engine
- A dual-port RAM

The QAM Mapper maps the I/Q inputs onto a QAM constellation. In this first version of the core generator, only the 4QAM modulation is available; future releases will also include other QAM modulation schemes. Fig. 5 shows the 4QAM constellation implemented in the proposed IP core.
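As an illustration of what a 4QAM mapper does, the following sketch maps pairs of bits onto four constellation points. The Gray-coded bit assignment and the unit amplitudes are assumptions made for this example; the actual assignment used by the IP is the one drawn in Fig. 5, which is not reproduced here.

```python
# Hypothetical Gray-coded 4QAM map: (i_bit, q_bit) -> constellation point.
QAM4 = {
    (0, 0): complex(+1, +1),
    (0, 1): complex(+1, -1),
    (1, 0): complex(-1, +1),
    (1, 1): complex(-1, -1),
}

def map_4qam(bits):
    """Map an even-length bit sequence to 4QAM symbols, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("4QAM needs an even number of bits")
    return [QAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

symbols = map_4qam([0, 0, 0, 1, 1, 0, 1, 1])
print(symbols)
```

In this mapping the first bit selects the sign of the in-phase component and the second bit the sign of the quadrature component, so neighbouring points differ in a single bit.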

The IFFT core is the most complex element of the IP core in terms of hardware. It is composed of $\log_2 N$ Processing Elements (PEs), where N is the number of OFDM carriers and consequently the number of IFFT bins. Each processing element consists of a dual-port RAM used for the ordering of the samples, a ROM containing the IFFT twiddle factors, a complex multiplier, and an address generator. The address generator and the dual-port RAM order the input using the double buffering technique.

A detailed description of this block is provided in Fig. 7.
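To show why the amount of IFFT hardware grows with the logarithm of the number of carriers, the sketch below implements a generic textbook radix-2 decimation-in-time IFFT and counts its butterfly stages. It is not the pipelined PE architecture of Fig. 7, whose internals are not detailed in the text. It only assumes that N is a power of two, and all names are illustrative.

```python
import cmath

def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def ifft_radix2(symbols):
    """Unnormalised radix-2 IFFT; returns (samples, number of butterfly stages)."""
    n = len(symbols)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two, at least 2")
    stages = n.bit_length() - 1                      # log2(n)
    data = [symbols[bit_reverse(i, stages)] for i in range(n)]
    for stage in range(1, stages + 1):
        span = 1 << stage                            # butterfly group size
        half = span // 2
        for start in range(0, n, span):
            for k in range(half):
                twiddle = cmath.exp(2j * cmath.pi * k / span)  # positive sign: inverse
                top = data[start + k]
                bottom = data[start + k + half] * twiddle
                data[start + k] = top + bottom
                data[start + k + half] = top - bottom
    return data, stages

def idft_naive(symbols):
    """Direct evaluation of the subcarrier sum, used only as a reference."""
    n = len(symbols)
    return [sum(a * cmath.exp(2j * cmath.pi * m * k / n) for m, a in enumerate(symbols))
            for k in range(n)]

tx = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]
fast, n_stages = ifft_radix2(tx)
fft_error = max(abs(f - r) for f, r in zip(fast, idft_naive(tx)))
print(f"{n_stages} stages for N = {len(tx)}, max deviation {fft_error:.2e}")
```

Each stage applies one twiddle multiplication per butterfly, which is the operation that the ROM and the complex multiplier of a processing element provide in hardware.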

**Figure 6:** IP core block diagram. The real and imaginary parts of the input and output signals have been fused to simplify the schematic.

In order to reduce the number of multipliers, the complex multiplication has been implemented using three multiplications instead of four, as shown in [29]. In addition to these main blocks, the system is provided with a Finite State Machine (FSM) for the generation of the control signals "ready" and "done".
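One well-known way to compute a complex product with three real multiplications is sketched below. The text does not state which variant from [29] the IP uses, so this particular factorisation is an assumption chosen for illustration.

```python
def complex_mul_3(a, b, c, d):
    """Return (real, imag) of (a + jb) * (c + jd) using three real multiplications."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2

real, imag = complex_mul_3(3, 4, 5, -2)
reference = complex(3, 4) * complex(5, -2)
print((real, imag), reference)
```

The saving of one multiplier is paid for with three extra additions. When `c + jd` is a constant twiddle factor read from a ROM, the terms `d - c` and `c + d` can be precomputed, so only one additional adder sits in the data path.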

Fig. 8 shows the timing of the core. It supports a streaming mode, providing output without any interruption. When the ready signal passes from zero to one, the IP core can receive data, which must be provided at every clock cycle. In order to simplify the diagram, the real and imaginary parts are fused into a single signal. After a latency that depends on the number of carriers, the IP provides results at the output port; when this occurs, the valid signal passes from zero to one. The real and imaginary parts of the output signals are also fused in the diagram. When the ready signal passes from one to zero, the input data stream must be interrupted. The time interval in which this signal remains at zero is required for the cyclic prefix computation.
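The streaming behaviour described above suggests a simple back-of-the-envelope timing model. The sketch below assumes one sample per clock cycle and an input pause exactly as long as the cyclic prefix. Neither figure is stated explicitly in the text, so both are assumptions, and the function name is illustrative.

```python
def symbol_timing(carriers, cyclic_prefix, clock_hz):
    """Return (cycles per OFDM symbol, symbol period in seconds, input duty cycle)."""
    cycles = carriers + cyclic_prefix        # assumed: one sample per clock cycle
    return cycles, cycles / clock_hz, carriers / cycles

# 2048 carriers with a 256-sample prefix at a 200 MHz clock.
cycles, period, duty = symbol_timing(2048, 256, 200e6)
print(f"{cycles} cycles, {period * 1e6:.2f} us per symbol, input active {duty:.1%}")
```

Under these assumptions the input stream is active for N out of every N + CP cycles, so a longer prefix directly lowers the sustained input rate.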

## 4. Experimental Results

In this section, experimental results are provided. We used the proposed IP generator to generate the VHDL code of nine OFDM modulators, which differ from each other in the number of carriers and in the cyclic prefix. In order to verify the correct behavior of the circuits, we ran several test benches on the RTL simulation models. Simulations were performed by providing sinusoidal waves and chirps at the input of the IP. Simulation results were compared with the theoretical results obtained from a MATLAB model developed for this purpose.

The generated VHDL files have been synthesized using the XILINX Vivado toolchain. Synthesis and Place and Route have been performed with a clock constraint of 200 MHz. Implementation results are shown in Tab. 1. We varied the number of carriers from 8 to 2048 (considering only powers of two). The second column of the table shows the cyclic prefix adopted for each test case. Results are given in terms of LUTs, LUTRAM, FFs, BRAM, and DSPs.

**Figure 7:** Processing Element (PE) block diagram. The real and imaginary parts of the input and output signals have been fused to simplify the schematic. In order to reduce the number of multipliers, the complex multiplication has been implemented using three multiplications instead of four, as shown in [29].

**Figure 8:** IP core timing diagram. The real and imaginary parts of the input and output signals have been fused to simplify the schematic. The system works on positive clock edges.

Fig. 10 shows the dynamic power consumption required for the computation of the test cases. Power consumption nowadays represents a crucial aspect of digital circuit design, especially for embedded systems. Such systems are usually powered by batteries and, for this reason, power consumption must be reduced as much as possible in order to extend the battery life. Circuits must therefore be designed to minimize the area, since power consumption is correlated with the circuit area [30], [31]. There are three power dissipation components in CMOS digital circuits:

1. Switching power
2. Short-circuit power
3. Static power

Among these contributions, the switching power is the most important one. It is defined in Eq. 5, where a is the switching activity, C is the switching capacitance, f is the clock frequency, and $V_{dd}$ is the supply voltage.

$$P = a \cdot C \cdot f \cdot V_{dd}^2 \tag{5}$$
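A short numerical sketch makes the scaling in Eq. 5 concrete. The activity, capacitance, and voltage values below are hypothetical and were chosen only to show the trend; they are not measurements of the xc7z030 device.

```python
def switching_power(activity, capacitance_f, clock_hz, vdd_v):
    """Eq. (5): P = a * C * f * Vdd^2, in watts."""
    return activity * capacitance_f * clock_hz * vdd_v ** 2

# Hypothetical values: a = 0.125, C = 1 nF, f = 200 MHz.
p_nominal = switching_power(0.125, 1e-9, 200e6, 1.0)
p_scaled = switching_power(0.125, 1e-9, 200e6, 0.5)
print(p_nominal, p_scaled, p_nominal / p_scaled)
```

Halving the supply voltage divides the switching power by four, whereas halving the clock frequency or the switched capacitance only divides it by two. The switched capacitance grows with the amount of logic, which is why area and dynamic power are correlated.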

The second contribution is related to the short-circuit currents flowing through the MOS transistors. It is strongly dependent on switching activity, clock frequency, and supply voltage, but it also depends on the design (for example, the transistor ratios and the node waveforms). The third component, the static power, depends on the leakage currents and is related to the circuit design, the technology, and the supply voltage. The first two power contributions are usually considered together under the name of dynamic power. Because our experiments are performed on FPGAs, we did not consider the static power dissipation but only the dynamic one. Static power consumption on an FPGA is negligible if the FPGA is almost full in terms of hardware resource usage. This condition generally holds, because the size of the FPGA is selected according to the target project.

**Figure 9:** IP core mapped on the target FPGA. The figure refers to a 2048-carrier OFDM modulator compatible with the 5G standard.

**Table 1:** Implementation results on a XILINX xc7z030 device with a 200 MHz clock constraint

| Carriers (CN) | Cyclic prefix (CP) | LUT  | LUTRAM | FF   | BRAM | DSP |
|---------------|--------------------|------|--------|------|------|-----|
| 8             | 2                  | 730  | 152    | 1494 | 64   | 6   |
| 16            | 4                  | 1017 | 203    | 1941 | 64   | 9   |
| 32            | 8                  | 1462 | 293    | 2626 | 64   | 12  |
| 64            | 16                 | 1998 | 328    | 3244 | 64   | 15  |
| 128           | 16                 | 2437 | 495    | 3950 | 65   | 18  |
| 256           | 32                 | 2904 | 551    | 4600 | 66   | 21  |
| 512           | 32                 | 3504 | 736    | 5512 | 68   | 24  |
| 1024          | 128                | 4301 | 1048   | 6563 | 70   | 27  |
| 2048 (5G)     | 256                | 5459 | 1792   | 8039 | 73   | 30  |

Finally, Fig. 9 shows the implemented circuit layout for the 2048-carrier case. Results show that the hardware resources required for the IP implementation are limited, and that the power consumption increases with the area, in agreement with the theory. The choice to implement the complex product using only three multipliers reduces the number of DSPs involved in the implementation.
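The DSP column of Table 1 appears to follow a simple pattern, which the sketch below checks against the reported values. The formula is an empirical observation on the table and is not given by the authors. Reading it as three DSP slices per multiplier stage, with one stage needing no multiplier, is likewise an assumption.

```python
# (carriers, DSP slices) copied from Table 1.
TABLE1_DSP = [(8, 6), (16, 9), (32, 12), (64, 15), (128, 18),
              (256, 21), (512, 24), (1024, 27), (2048, 30)]

def dsp_estimate(carriers):
    """Empirical fit: 3 * (log2(N) - 1), valid for power-of-two carrier counts."""
    log2_n = carriers.bit_length() - 1
    return 3 * (log2_n - 1)

mismatches = [(n, dsp) for n, dsp in TABLE1_DSP if dsp_estimate(n) != dsp]
print("rows not matching 3 * (log2(N) - 1):", mismatches)
```

The DSP usage therefore grows by three slices each time the number of carriers doubles, which is consistent with a three-multiplier complex product being added for every extra IFFT stage.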

## 5. Conclusions

In this paper, we presented an OFDM modulator IP generator suitable for all communication standards requiring an OFDM based on a power-of-two FFT.

The proposed tool allows RTL designers to design flexible OFDM modulators, offering the possibility to customize the number of carriers, the cyclic prefix, and the fixed-point format. The IP has been characterized in terms of area, speed, and power consumption on a XILINX xc7z030 FPGA. Results show a very efficient implementation requiring a reduced number of hardware resources. In the future, additional characterizations will be performed; in particular, we will synthesize the VHDL code generated by the proposed IP generator on ASIC. The synthesis will be performed using Synopsys tools. In addition, we will introduce other modulation schemes for the OFDM carriers. In order to further improve the performance of future releases of the IP generator, we are considering implementing the IFFT architecture presented in [32]. This solution will allow reducing the hardware resources, in particular the number of multipliers. This hardware simplification will also reduce the power consumption.

**Figure 10:** Power consumption characterization of the proposed IP core in terms of dynamic power. The dynamic power consumption increases with the number of carriers because of the increase in the required hardware resources.

#### References

- [1] G. Capizzi, S. Coco, G. L. Sciuto, C. Napoli, A new iterative fir filter design approach using a gaussian approximation, IEEE Signal Processing Letters 25 (2018) 1615–1619.
- [2] M. Wózniak, D. Połap, R. K. Nowicki, C. Napoli, G. Pappalardo, E. Tramontana, Novel approach toward medical signals classifier, in: 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, 2015, pp. 1–7.
- [3] G. Capizzi, C. Napoli, L. Paternò, An innovative hybrid neuro-wavelet method for reconstruction of missing data in astronomical photometric surveys, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7267 LNAI (2012) 21–29.
- [4] S. Han, J. Kang, H. Mao, Y. Hu, X. Li, Y. Li, D. Xie, H. Luo, S. Yao, Y. Wang, et al., Ese: Efficient speech recognition engine with sparse lstm on fpga, in: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017, pp. 75–84.
- [5] D. Anguita, A. Boni, S. Ridella, A digital architecture for support vector machines: theory, algorithm, and fpga implementation, IEEE Transactions on neural networks 14 (2003) 993–1009.

- [6] L. Quitadamo, M. Abbafati, G. Cardarilli, D. Mattia, F. Cincotti, F. Babiloni, M. Marciani, L. Bianchi, Evaluation of the performances of different p300 based brain-computer interfaces by means of the efficiency metric, Journal of neuroscience methods 203 (2012) 361–368.
- [7] L. R. Quitadamo, M. G. Marciani, G. C. Cardarilli, L. Bianchi, Describing different brain computer interface systems through a unique model: a uml implementation, Neuroinformatics 6 (2008) 81–96.
- [8] G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, M. Re, F. Silvestri, Improvement of the cardiac oscillator based model for the simulation of bundle branch blocks, Human Health Engineering (2020) 165.
- [9] C. Napoli, G. Pappalardo, E. Tramontana, An agent-driven semantical identifier using radial basis neural networks and reinforcement learning, in: Proceedings of the XV Workshop "Dagli Oggetti agli Agenti", volume 1260, CEUR-WS, 2014. URL: http://ceur-ws.org/Vol-1260/.
- [10] D. Połap, M. Woźniak, C. Napoli, E. Tramontana, Real-time cloud-based game management system via cuckoo search algorithm, International Journal of Electronics and Telecommunications 61 (2015) 333–338.
- [11] F. Beritelli, A. Gallotta, C. Rametta, A dual streaming approach for speech quality enhancement of voip service over 3g networks, in: 2013 18th International Conference on Digital Signal Processing (DSP), IEEE, 2013, pp. 1–5.
- [12] F. Beritelli, A. Spadaccini, The role of voice activity detection in forensic speaker verification, in: 2011 17th International Conference on Digital Signal Processing (DSP), IEEE, 2011, pp. 1–6.

- [13] F. Beritelli, A. Spadaccini, A statistical approach to biometric identity verification based on heart sounds, in: 2010 Fourth International Conference on Emerging Security Information, Systems and Technologies, IEEE, 2010, pp. 93–96.
- [14] G. Iazeolla, A. Pieroni, A. D'Ambrogio, D. Gianni, A distributed approach to wireless system simulation, in: 2010 Sixth Advanced International Conference on Telecommunications, IEEE, 2010, pp. 252–262.
- [15] G. Iazeolla, A. Pieroni, Power management of server farms, in: Applied Mechanics and Materials, volume 492, Trans Tech Publ, 2014, pp. 453–459.
- [16] P. N. Whatmough, M. R. Perrett, S. Isam, I. Darwazeh, Vlsi architecture for a reconfigurable spectrally efficient fdm baseband transmitter, IEEE Transactions on Circuits and Systems I: Regular Papers 59 (2012) 1107–1118.
- [17] K. Elango, K. Muniandi, Vlsi implementation of an area and energy efficient fft/ifft core for mimo-ofdm applications, Annals of Telecommunications (2019) 1–13.
- [18] G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, D. Giardino, M. Matta, M. Re, L. Iess, F. Cialfi, G. De Angelis, D. Gelfusa, et al., Hardware prototyping and validation of a w- $\delta$ dor digital signal processor, Applied Sciences 9 (2019) 2909.
- [19] F. Mazzenga, R. Giuliano, F. Vatalaro, Effective strategies for gradual copper-to-fiber transition in access networks, Computer Networks (2020) 107225.
- [20] S. Mukherjee, M. De Sanctis, T. Rossi, E. Cianca, M. Ruggieri, R. Prasad, Mode switching algorithms for dvb-s2 links in w band, in: 2010 IEEE Global Telecommunications Conference GLOBECOM 2010, IEEE, 2010, pp. 1–5.
- [21] S. Singh, N. Singh, Internet of things (iot): Security challenges, business opportunities & reference architecture for e-commerce, in: 2015 International Conference on Green Computing and Internet of Things (ICGCIoT), IEEE, 2015, pp. 1577–1581.
- [22] R. Giuliano, F. Mazzenga, A. Neri, A. M. Vegni, Security access protocols in iot networks with heterogenous non-ip terminals, in: 2014 IEEE International Conference on Distributed Computing in Sensor Systems, IEEE, 2014, pp. 257–262.
- [23] M. Jamal, B. Horia, K. Maria, I. Alexandru, Study of multiple access schemes in 3gpp lte ofdma vs. sc-fdma, in: 2011 International Conference on Applied Electronics, IEEE, 2011, pp. 1–4.

- [24] F. Mazzenga, R. Giuliano, F. Vatalaro, Fttc-based fronthaul for 5g dense/ultra-dense access network: Performance and costs in realistic scenarios, Future Internet 9 (2017) 71.
- [25] I. Pasya, T. Kobayashi, A. Khalid, N. A. Wahab, A. Rashid, Z. Awang, et al., Target localization in mimo ofdm radars adopting adaptive power allocation among selected sub-carriers, International Journal on Advanced Science, Engineering and Information Technology 7 (????) 291–298.
- [26] B. Prasetya, A. Kurniawan, A. Fahmi, Joint power loading and phase shifting on signal constellation for transmit power saving on ofdm/ofdma systems, Int. J. Adv. Sci. Eng. Inf. Technol. 8 (2018) 2039–2045.
- [27] T. Hwang, C. Yang, G. Wu, S. Li, G. Y. Li, Ofdm and its wireless applications: A survey, IEEE transactions on Vehicular Technology 58 (2008) 1673–1694.
- [28] T. Hwang, C. Yang, G. Wu, S. Li, G. Y. Li, Ofdm and its wireless applications: A survey, IEEE transactions on Vehicular Technology 58 (2008) 1673–1694.
- [29] D. E. Knuth, Art of computer programming, volume 2: Seminumerical algorithms, Addison-Wesley Professional, 2014.
- [30] S. Spanò, G. C. Cardarilli, L. Di Nunzio, R. Fazzolari, D. Giardino, M. Matta, A. Nannarelli, M. Re, An efficient hardware implementation of reinforcement learning: The q-learning algorithm, Ieee Access 7 (2019) 186340–186351.
- [31] F. Silvestri, S. Acciarito, G. C. Cardarilli, G. M. Khanal, L. Di Nunzio, R. Fazzolari, M. Re, Fpga implementation of a low-power qrs extractor, in: International Conference on Applications in Electronics Pervading Industry, Environment and Society, Springer, 2017, pp. 9–15.
- [32] Y. S. Algnabi, F. A. Aldaamee, R. Teymourzadeh, M. Othman, M. S. Islam, Novel architecture of pipeline radix $2^2$ sdf fft based on digit-slicing technique, in: 2012 10th IEEE International Conference on Semiconductor Electronics (ICSE), IEEE, 2012, pp. 470–474.