... | ... | @@ -33,20 +33,20 @@ accesses, we have used a customized memory control unit. |

This fine-grained pipeline strategy adopted for the core increases the
maximum clock frequency by reducing the largest path delay between the
flip-flops. This design strategy is radically different from the more
common multi-core processing adopted in parallel architectures such as
DSPs, GPUs, and multi-core general-purpose processors, where different
tasks are assigned to each core in order to exploit the parallelism. Our
approach for the customized FPGA architecture has been demonstrated to be
efficient for data streams and image processing. The only drawback of
such deep pipelines is a higher latency (a constant delay before the
first pixel appears on the output). However, in our case the latency is
small: 5 image lines (order of microseconds), which is negligible
compared to the time for the entire image (order of milliseconds). This
latency is introduced only once, at the beginning of the processing, and
the delay stays constant throughout the continuous data stream, so it
does not affect the real-time performance. This means that once the
pipeline is filled, the circuit is able to process one pixel per clock
cycle.
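
To give a feeling for the orders of magnitude quoted above, the following minimal sketch computes the pipeline latency and the frame time for a one-pixel-per-clock stream. The image size (640x480) and the clock frequency (50 MHz) are illustrative assumptions, not values taken from this project, and `pipeline_timing` is a hypothetical helper; blanking intervals are ignored.

```python
def pipeline_timing(width, height, clock_hz, latency_lines=5):
    """Return (latency_s, frame_s) for a pipeline that accepts one pixel per clock."""
    latency_cycles = latency_lines * width  # cycles before the first output pixel
    frame_cycles = width * height           # cycles needed to stream one full frame
    return latency_cycles / clock_hz, frame_cycles / clock_hz


# Assumed values for illustration only: 640x480 image, 50 MHz pixel clock.
latency_s, frame_s = pipeline_timing(width=640, height=480, clock_hz=50e6)

print(f"latency: {latency_s * 1e6:.1f} us")        # 5 lines of 640 pixels
print(f"frame:   {frame_s * 1e3:.3f} ms")          # 640 * 480 pixels
print(f"latency / frame: {latency_s / frame_s:.4f}")
```

Under these assumptions the latency is 64 microseconds against a frame time of about 6.1 milliseconds, i.e. roughly 1% of a frame, and it is paid only once when the stream starts.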

More details of the different modules and the strategy can be found in
the respective subprojects, the papers in the documentation, and the
... | ... | |