In addition to the high reliability design work I mentioned last week, I also spend a significant amount of time designing image processing applications for systems in domains such as automotive, astronomy, aerospace, and defense.
Heterogeneous SoCs like the Zynq and Zynq MPSoC are ideal for image processing, as they allow the image processing pipeline to be implemented in the PL, while the PS implements the higher levels of the algorithm, decision making, and communication.
Of course, compression allows for more efficient transfer and storage, though it makes the implementation of the image processing system more complex and costly.
Implementing a codec within programmable logic incurs additional costs for IP cores or development time, depending upon the make/buy decision, as well as consuming considerable logic resources in the target device.
Thankfully, the EV devices in the Zynq MPSoC range include a hardened Video Codec Unit (VCU): just what we need for our embedded vision applications!
This hard IP core enables us to get up and running much faster with image streams which require the use of a codec.
As an added bonus, all of the devices in the EV range also come with UltraRAM, giving us between 13.5 and 27 Mb of RAM in addition to block and distributed RAM. On-chip RAM is always very important in image processing systems, as it allows video lines and image segments to be stored within the processing algorithm.
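To get a feel for what that UltraRAM capacity means for line buffering, here is a quick back-of-the-envelope calculation. This is my own illustration, not a figure from the product guide; it assumes 8-bit grayscale pixels and binary megabits (1 Mb = 2**20 bits), which is how on-chip RAM is usually counted.

```python
# Rough check of how many video lines fit in the EV devices' UltraRAM.
# Assumptions (mine, not from the post): 8-bit grayscale pixels and
# 1 Mb = 2**20 bits.
def lines_in_uram(uram_mbits, pixels_per_line, bits_per_pixel=8):
    total_bits = uram_mbits * 2**20
    line_bits = pixels_per_line * bits_per_pixel
    return int(total_bits // line_bits)

print(lines_in_uram(13.5, 1920))  # ~921 HD lines in the smallest EV device
print(lines_in_uram(27, 3840))    # ~921 4K lines in the largest
```

Even at the low end, roughly nine hundred full video lines is comfortably more than the handful of line buffers a typical convolution or filtering stage needs.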
The VCU is both very powerful and flexible, as it lets us simultaneously encode and decode streams at up to 4K Ultra High Definition 60 Hz resolution. Alternatively, we can split this bandwidth up and work with between one and eight smaller streams, for example, eight 1080p 30 Hz resolution streams.
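We can sanity-check that equivalence with a little arithmetic. This is my own illustration, treating the VCU's budget purely as pixel throughput:

```python
# Sanity check: eight 1080p30 streams consume the same pixel
# throughput as a single 4K (3840 x 2160) 60 Hz stream.
def pixel_rate(width, height, fps):
    return width * height * fps  # pixels per second

uhd_4k60 = pixel_rate(3840, 2160, 60)
hd_1080p30 = pixel_rate(1920, 1080, 30)

print(uhd_4k60)                # 497664000 pixels/s
print(uhd_4k60 // hd_1080p30)  # 8 -> eight 1080p30 streams fit the budget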
In operation, the VCU is controlled via software running on the APU. This enables the decoder and encoder to be configured as desired on the fly.
Examining the VCU block diagram you will notice both the encoder and decoder contain MCUs. While we cannot update the firmware of the MCUs, they do need to communicate with the software application running on the APU. This communication is achieved using a dedicated AXI interface, which is also supported in the software stack.
As the VCU is configured by software, any settings made in the Vivado VCU configuration dialog are used only for power and bandwidth estimation, along with encoder buffer sizing (if enabled).
The VCU is designed to work with images stored within DDR memory, which can be the PS DDR, PL DDR, or a combination of both. To enable this, the IP module provides the following interfaces:
- M_AXI_ENC0 & M_AXI_ENC1 — These are AXI4 Memory Mapped interfaces used for stream encoding.
- M_AXI_DEC0 & M_AXI_DEC1 — These are AXI4 Memory Mapped interfaces used for stream decoding.
- S_AXI_LITE — This interface is used by the APU to configure the VCU.
- M_AXI_MCU — This interface is used to communicate between the VCU MCUs and the APU.
As we start working with the VCU and larger stream resolutions, we need to be careful of the bandwidth required on the DDR and AXI interfaces. Many implementations of the VCU will use both the PS DDR and DDR connected to the PL.
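To see why this bandwidth deserves attention, consider a rough estimate of just the raw-frame traffic for a single 4K 60 Hz stream. This is my own back-of-the-envelope figure, assuming NV12 4:2:0 8-bit frames at 1.5 bytes per pixel; it ignores reference-frame reads and the compressed bitstream, so the real demand on the memory controllers is higher still.

```python
# Ballpark of raw-frame DDR traffic for one video stream (illustrative
# only; assumes NV12 4:2:0 8-bit frames, i.e. 1.5 bytes per pixel, and
# ignores reference-frame and compressed-stream traffic).
def raw_frame_bw_mb_s(width, height, fps, bytes_per_pixel=1.5):
    return width * height * fps * bytes_per_pixel / 1e6  # MB/s, decimal

print(raw_frame_bw_mb_s(3840, 2160, 60))  # ~746.5 MB/s just to move raw 4K60 frames
```

A sustained three quarters of a gigabyte per second, before codec overheads, is exactly the kind of load that motivates spreading buffers across PS and PL DDR.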
Including the VCU in our design is very simple: we can add it via the Add IP button on the block diagram.
Once we have added the VCU and the Zynq MPSoC into the block diagram, we can create a Vivado solution pretty quickly by running block automation.
This will implement a solution that uses the PS DDR Memory map. If we want to create a more flexible solution, then we can add in a PL DDR using a Memory Interface Generator and connect this into the memory architecture using a smart interconnect in the PL.
The block automation may take a few minutes. Once it completes, the block diagram shows a complete VCU solution which can be implemented.
When we implement the above block diagram, we obtain, as would be expected, a small logic footprint in the PL, since the VCU is a hard IP. This leaves the maximum amount of PL resources available for implementing our image processing pipeline.
If we are concerned about the AXI and DDR bandwidth due to the stream resolution, we can even implement a PL DDR interface (if we have DDR connected to the PL) and map this into the VCU address space.
Such a PL and PS DDR implementation follows the use case example architecture recommended in the VCU product guide.
When I implemented this architecture, the block diagram looked as below with both the PL DDR and the VCU connected to smart interconnect. This smart interconnect was then connected to the PS Slave interfaces on the Zynq MPSoC.
As would be expected, when the design above is implemented, the resource utilization is higher. This increase in resources is due to the implementation of the MIG DDR4 controller and smart interconnect in the PL.
The software stack to control the VCU is contained within PetaLinux. We will look at how we can configure and use this in another blog soon.