...processing, biometrics, or, more specifically, particle tracking, analysis of fluid dynamics, 3D scene reconstruction, or object recognition.

![](/uploads/16bfc6659c6cf6be1aeaea989c9119d8/whole_system.png)

All modules are implemented following the same general rule: a modular design built on a fine-grain pipelined, superscalar datapath, which reaches high performance at low working clock frequencies. The final goal is a throughput of one datum per clock cycle.

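Why "one datum per clock cycle" allows a low clock can be shown with a little arithmetic. The sketch below is illustrative only: the function name, the 640x480 at 60 fps figures, and the `lanes` parameter (a hypothetical stand-in for a superscalar datapath with parallel units) are assumptions, not values from this project.

```python
# Back-of-envelope throughput model for a streaming datapath (illustrative only).
# Assumption: once the pipeline is full, the datapath accepts `lanes` pixels per
# clock cycle. lanes = 1 is the "one datum per clock cycle" goal; lanes > 1 is a
# hypothetical superscalar variant with parallel processing units.


def required_clock_hz(width: int, height: int, fps: float, lanes: int = 1) -> float:
    """Minimum clock frequency (Hz) that sustains `fps` frames per second.

    Ignores blanking intervals and memory stalls.
    """
    if lanes < 1:
        raise ValueError("lanes must be >= 1")
    pixels_per_second = width * height * fps
    return pixels_per_second / lanes


if __name__ == "__main__":
    # Hypothetical sensor: 640x480 pixels at 60 frames per second.
    for lanes in (1, 2, 4):
        mhz = required_clock_hz(640, 480, 60, lanes) / 1e6
        print(f"{lanes} pixel(s)/cycle -> {mhz:.3f} MHz")
```

With these hypothetical figures, one pixel per cycle needs only 18.432 MHz, and each doubling of `lanes` halves the required clock.
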
This design strategy requires the careful definition of very deep pipelined datapaths with concurrent accesses to external memory banks. To simplify the management of these concurrent external memory accesses, we use a customized memory control unit.

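The behavior of a deep pipeline (fill latency first, then one result per cycle) can be modeled at the cycle level. This is a minimal behavioral sketch in Python, not HDL; `simulate_pipeline` and the add-one stages are hypothetical, and each stage is assumed to be combinational logic followed by a register.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional, Tuple


def simulate_pipeline(stages: List[Callable[[int], int]],
                      stream: Iterable[int]) -> List[Tuple[int, int]]:
    """Clock a linear pipeline and return (cycle, output) pairs.

    Each stage is modeled as combinational logic followed by a register:
    on every clock edge all registers update at once, so every stage works
    on a different datum concurrently.
    """
    regs: List[Optional[int]] = [None] * len(stages)  # pipeline registers
    inputs = deque(stream)
    outputs: List[Tuple[int, int]] = []
    cycle = 0
    while inputs or any(r is not None for r in regs):
        cycle += 1
        # Next register values are computed from the current ones
        # (simultaneous update, like flip-flops sharing one clock edge).
        nxt: List[Optional[int]] = [None] * len(stages)
        x = inputs.popleft() if inputs else None
        nxt[0] = stages[0](x) if x is not None else None
        for i in range(1, len(stages)):
            prev = regs[i - 1]
            nxt[i] = stages[i](prev) if prev is not None else None
        regs = nxt
        if regs[-1] is not None:
            outputs.append((cycle, regs[-1]))
    return outputs


def add_one(v: int) -> int:
    """Hypothetical stage logic used only for the demo."""
    return v + 1


if __name__ == "__main__":
    # Hypothetical 5-stage datapath fed with 8 samples.
    for cycle, value in simulate_pipeline([add_one] * 5, range(8)):
        print(cycle, value)
```

With 5 stages the first result appears at cycle 5 (the pipeline depth) and the remaining seven follow on consecutive cycles; in general, N inputs through D stages finish at cycle N + D - 1, so the average throughput approaches one datum per cycle as N grows.
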
This fine-grain pipelining strategy, adopted for the core, increases the maximum clock frequency by reducing the longest path delay between flip-flops. It differs radically from the multi-core processing used in common parallel architectures such as DSPs, GPUs, and multi-core general-purpose processors, where different tasks are assigned to each core to exploit parallelism. Our customized FPGA architecture has proven efficient for data streams and image processing. The only drawback of such deep pipelines is a higher latency (a constant delay before the first pixel appears at the output). In our case, however, the latency is small: 5 image lines (on the order of microseconds), which is negligible compared with the time needed for the entire image (on the order of milliseconds). This latency is introduced only once, at the beginning of processing, and stays constant throughout continuous data streaming, so it does not affect real-time performance. In other words, once the pipeline is filled, the circuit processes one pixel per clock cycle.

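The two timing claims in this paragraph (shorter stages raise the maximum clock; a 5-line latency is small next to a frame) can be checked with a rough model. Only the 5-line latency comes from this section; the 16 ns logic path, 2 ns register overhead, 640x480 resolution, 50 MHz clock, and the function names are hypothetical, and the model assumes an idealized even split of logic across stages.

```python
# Illustrative timing model; all numbers are hypothetical except the
# 5-line latency, which is the figure quoted in this section.


def f_max_hz(logic_delay_ns: float, stages: int, reg_overhead_ns: float) -> float:
    """Approximate maximum clock when a combinational path of
    `logic_delay_ns` is split evenly into `stages` pipeline stages.

    Each stage adds a register overhead (clock-to-Q plus setup time),
    so the clock period is the per-stage logic delay plus that overhead.
    """
    if stages < 1:
        raise ValueError("stages must be >= 1")
    period_ns = logic_delay_ns / stages + reg_overhead_ns
    return 1e9 / period_ns


def latency_and_frame_time(width: int, height: int, latency_lines: int,
                           clock_hz: float, pixels_per_cycle: int = 1):
    """Return (pipeline latency, frame time) in seconds, ignoring blanking."""
    cycles_per_line = width / pixels_per_cycle
    latency_s = latency_lines * cycles_per_line / clock_hz
    frame_s = height * cycles_per_line / clock_hz
    return latency_s, frame_s


if __name__ == "__main__":
    # Hypothetical path: 16 ns of logic, 2 ns register overhead per stage.
    for n in (1, 2, 4, 8):
        print(f"{n} stage(s): {f_max_hz(16.0, n, 2.0) / 1e6:.1f} MHz")

    # Hypothetical 640x480 image at 50 MHz with the 5-line latency.
    lat, frame = latency_and_frame_time(640, 480, 5, 50e6)
    print(f"latency {lat * 1e6:.1f} us, frame {frame * 1e3:.3f} ms")
```

Under these assumptions the clock rises from about 55.6 MHz with one stage to 250 MHz with eight, with diminishing returns because the register overhead does not shrink; the 5-line latency is about 64 µs against a frame time of about 6.1 ms.
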
### Files