Advanced FPGA Design: Architecture, Implementation, and Optimization
This book takes the advanced issues of FPGA design as its underlying theme. In practice, an engineer typically needs several years of mentoring before these principles are applied appropriately. The topics discussed in this book are essential to designing FPGAs of more than moderate complexity. The goal of the book is to present practical design techniques that are otherwise available only through mentorship and real-world experience.
descriptions and convert them directly to FPGA implementations. This is a truly abstract approach that simplifies the design process and allows the designer to focus on the top level of abstraction. One tool that does a particularly good job of this is Synplify DSP from Synplicity. The Synplify DSP tool runs as an application within MATLAB, closely coupling MATLAB constructs and modeling capabilities with Synplicity’s DSP-to-RTL synthesis. The system represented in Figure 5.2 is
the amount of space required in an FPGA implementation is relatively large. In many DSP applications there are multiple clocks per sample (i.e., the system clock frequency is greater than the sampling frequency), which means a more compact architecture can reuse the same DSP hardware for the required MAC operations. With an abstract design tool such as Synplify DSP, this architectural modification can be made as an implementation option during MATLAB-to-RTL synthesis.
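As a rough illustration of the clocks-per-sample argument, the sketch below estimates how many time-shared MAC units a direct-form FIR needs. It assumes, hypothetically, that each MAC unit completes one multiply-accumulate per clock and that every tap needs exactly one MAC operation per output sample; the function name `macs_required` and the example rates are illustrative and are not taken from Synplify DSP.

```python
import math


def macs_required(num_taps: int, f_clk_hz: float, f_sample_hz: float) -> int:
    """Estimate the minimum number of time-shared MAC units for a direct FIR.

    Assumes one multiply-accumulate per MAC unit per clock and one MAC
    operation per tap per output sample (a simplification for illustration).
    """
    clocks_per_sample = math.floor(f_clk_hz / f_sample_hz)
    if clocks_per_sample < 1:
        raise ValueError("system clock must be at least the sample rate")
    # Each MAC unit can be reused clocks_per_sample times per sample period.
    return math.ceil(num_taps / clocks_per_sample)
```

With 64 taps, a 100 MHz system clock, and a 10 MHz sample rate there are 10 clocks per sample, so `macs_required(64, 100e6, 10e6)` returns 7 instead of the 64 multipliers a fully parallel filter would use.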
throughput. The penalty for unrolling loops such as this is an increase in area. The iterative implementation required a single register and multiplier (along with some control logic not shown in the diagram), whereas the pipelined implementation required separate X and XPower registers and a separate multiplier for every pipeline stage. Optimizations for area are discussed in Chapter 2. The penalty for unrolling an iterative loop is a proportional increase in area.
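To make the throughput and area trade-off concrete, here is a minimal behavioral model in Python (not HDL) of computing X cubed two ways: iteratively with one shared multiplier, and unrolled into a pipeline with a multiplier per stage. The function names and the clock-counting convention are assumptions for illustration; only the register names X and XPower follow the text.

```python
from collections import deque
from typing import Iterable, List, Optional, Tuple

Regs = Optional[Tuple[int, int]]  # (X, XPower) register pair, or None when empty


def iterative_cube(samples: Iterable[int]) -> Tuple[List[int], int]:
    """One XPower register and one shared multiplier: 3 clocks per result."""
    results: List[int] = []
    clocks = 0
    for x in samples:
        xpower = x          # clock 1: load X into the XPower register
        clocks += 1
        for _ in range(2):  # clocks 2-3: XPower <= XPower * X on the same multiplier
            xpower *= x
            clocks += 1
        results.append(xpower)
    return results, clocks


def pipelined_cube(samples: Iterable[int]) -> Tuple[List[int], int]:
    """Unrolled loop: separate X/XPower registers and a multiplier per stage."""
    inputs = deque(samples)
    stage1: Regs = None  # (X1, XPower1), loaded directly from the input
    stage2: Regs = None  # (X2, XPower2), fed by multiplier 1
    results: List[int] = []
    clocks = 0
    while inputs or stage1 is not None or stage2 is not None:
        # All registers update on the same clock edge, so every next-state
        # value is computed from the current register contents.
        if stage2 is not None:
            x2, p2 = stage2
            results.append(p2 * x2)      # multiplier 2 feeds the output register
        next_stage2: Regs = None
        if stage1 is not None:
            x1, p1 = stage1
            next_stage2 = (x1, p1 * x1)  # multiplier 1
        next_stage1: Regs = None
        if inputs:
            x = inputs.popleft()
            next_stage1 = (x, x)
        stage1, stage2 = next_stage1, next_stage2
        clocks += 1
    return results, clocks
```

For four samples, `iterative_cube([1, 2, 3, 4])` takes 12 clocks with one multiplier, while `pipelined_cube([1, 2, 3, 4])` takes 6 clocks (one per sample plus 2 clocks of pipeline fill) at the cost of a second multiplier and extra X and XPower registers. Once the pipeline is full it delivers one result per clock, so the throughput gap widens with longer input streams.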
A low-latency architecture is one that minimizes the delay from the input of a module to its output. Latency can be reduced by removing pipeline registers. The penalty for removing pipeline registers is an increase in combinatorial delay between registers. Timing refers to the clock speed of a design. A design meets timing when the maximum delay between any two sequential elements is smaller than the clock period.
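The latency and timing definitions above can be checked with a simplified calculation. This sketch treats each pipeline stage as a single register-to-register delay in nanoseconds and ignores setup time, clock-to-out delay, and clock skew, so it approximates rather than reproduces what a static timing analyzer reports; the delay values and helper names are hypothetical.

```python
from typing import List


def meets_timing(stage_delays_ns: List[float], clock_period_ns: float) -> bool:
    """True when the longest register-to-register delay is shorter than the period."""
    return max(stage_delays_ns) < clock_period_ns


def fmax_mhz(stage_delays_ns: List[float]) -> float:
    """Highest clock frequency the slowest path allows (1000 / ns gives MHz)."""
    return 1000.0 / max(stage_delays_ns)


def latency_clocks(stage_delays_ns: List[float]) -> int:
    """Each registered stage adds one clock of latency."""
    return len(stage_delays_ns)


def remove_register(stage_delays_ns: List[float], index: int) -> List[float]:
    """Remove the pipeline register between stage `index` and stage `index + 1`.

    The two combinatorial paths merge into one longer path, and latency
    drops by one clock.
    """
    merged = stage_delays_ns[index] + stage_delays_ns[index + 1]
    return stage_delays_ns[:index] + [merged] + stage_delays_ns[index + 2:]
```

With stage delays of 3.0, 4.0, and 2.5 ns and a 5 ns clock (200 MHz), the design meets timing with 3 clocks of latency. Removing the register between the last two stages lowers latency to 2 clocks, but the merged 6.5 ns path no longer fits within the 5 ns period.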
multiplier. The state machine operates on every combination of coefficients and samples: coeffA*X, coeffB*X, and coeffC*X. This implementation required a state machine because there was no natural flow to the recursive data as there was with the shift-and-add multiplier example. In this case, we had arbitrary registers that represented the inputs required to create a set of products. The most efficient way to sequence
Figure 2.2 FIR with one MAC.
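The single-MAC sequencing described above can be modeled behaviorally as a three-state machine that reuses one multiplier and accumulator and produces one output every three clocks. The sketch assumes a three-tap filter whose products use the current sample and two delayed samples (the text writes each product against X); the state names, the delay-line layout, and the function name `fir_one_mac` are hypothetical.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class State(Enum):
    MAC_A = 0  # coeffA * X[n]
    MAC_B = 1  # coeffB * X[n-1]
    MAC_C = 2  # coeffC * X[n-2]


NEXT_STATE = {State.MAC_A: State.MAC_B, State.MAC_B: State.MAC_C, State.MAC_C: State.MAC_A}


def fir_one_mac(samples: Iterable[int], coeffs: Tuple[int, int, int]) -> Tuple[List[int], int]:
    """Three-tap FIR sharing one MAC: one multiply-accumulate per clock."""
    coeff_a, coeff_b, coeff_c = coeffs
    delay = [0, 0, 0]  # X[n], X[n-1], X[n-2]
    outputs: List[int] = []
    state = State.MAC_A
    acc = 0
    clocks = 0
    for x in samples:
        delay = [x, delay[0], delay[1]]  # shift the new sample into the delay line
        for _ in range(3):
            if state is State.MAC_A:
                acc = coeff_a * delay[0]   # first product reloads the accumulator
            elif state is State.MAC_B:
                acc += coeff_b * delay[1]
            else:
                acc += coeff_c * delay[2]
                outputs.append(acc)        # result is complete after the third product
            state = NEXT_STATE[state]
            clocks += 1
    return outputs, clocks
```

Compared with a fully parallel three-tap FIR, this model spends one multiplier instead of three and delivers one output per three clocks, which is the resource-sharing trade-off the text describes.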