Adapt functionality of DMA to transfer 64 bits #981
Replies: 1 comment
That's correct. But please keep in mind that "IMEM" and "DMEM" are just logical constructs; the underlying address space is 4 GB (minus a few kB of fixed IO space).
That's right. The entire processor is 32-bit.
So you want to transfer data from the processor's internal DMEM to some processor-external module in chunks of 64 bits, right? Unfortunately, this is not possible without changing the code base, as all processor-internal busses are 32 bits wide. But you could use a processor-external DMEM that is 64 bits wide together with an AXI DMA and your IP block. A central AXI interconnect could take care of converting the core accesses (32-bit <=> 64-bit) and also provide parallel access (by the DMA) to the external DMEM and your accelerator.
The AXI DMA could also be implemented as a CFS (using the provided conduits to implement the two 64-bit AXI host ports).
I have an accelerator IP block that hooks up via AXI (so through the XBUS) and has ports for 64-bit DMA transfers of input data. Originally it was designed to get its data from DDR3 (bottom right in the picture) through an AXI CDMA block (on the left in the picture), with the data sent by an AXI BRAM Controller block (top right in the picture).
To simplify the design and run it on boards that do not have DRAM, I am considering replacing the DDR3 with the internal DMEM address range of NEO. The design assumes 1 GB of DDR3, and if I am not mistaken the DMEM of NEO can go up to 1.5 GB. The other problem to solve would be the datapath, which needs to be 64 bits wide.
I have been looking through the DMA, CFS, and XBUS code, and my conclusion is that it does not seem possible to have the DMA fetch 64 bits in one go, since all the busses of NEO are limited to 32 bits. I see two possible alternatives: 1) Duplicate the DMA in the CFS and buffer two 32-bit words in a 64-bit register, then transmit it through the IO conduits, which I would set to 64 bits. 2) Keep the DMA as it is and add the buffer logic to the accelerator instead.
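To make the buffering idea behind both alternatives concrete, here is a minimal behavioral sketch in C of a register that collects two 32-bit words and releases one 64-bit beat. It is only a software model, not NEO code: the names `pack64_t`, `pack64_init` and `pack64_push` are made up for illustration, and the word order (first word in the lower half) is an assumption that would have to match what the accelerator expects.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the 32-to-64-bit packing buffer (not part of NEO). */
typedef struct {
    uint32_t low_word;  /* first 32-bit word, held until its partner arrives */
    bool     have_low;  /* true while low_word is waiting for the second word */
} pack64_t;

/* Reset the buffer to the empty state. */
static void pack64_init(pack64_t *p)
{
    p->low_word = 0;
    p->have_low = false;
}

/* Feed one 32-bit word from the 32-bit bus.
 * Returns true when a complete 64-bit beat has been written to *out.
 * Assumption: the first word of each pair is the lower half of the beat. */
static bool pack64_push(pack64_t *p, uint32_t word, uint64_t *out)
{
    if (!p->have_low) {
        p->low_word = word;   /* buffer the first half, nothing to emit yet */
        p->have_low = true;
        return false;
    }
    *out = ((uint64_t)word << 32) | p->low_word;  /* second half completes the beat */
    p->have_low = false;
    return true;
}
```

Every second call to `pack64_push` yields one 64-bit beat, which is where the factor of two in transfer time comes from. Whether this register sits in the CFS or in the accelerator does not change the logic.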
In both cases, though, the data transfer would take twice as long as in the original design, which could move 64 bits at a time. Is there another, more logical option I am overlooking?
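As a rough check of the "twice as long" estimate, the number of bus beats can be counted for both widths. This is an idealized sketch that assumes one beat per transfer slot and ignores bus latency, arbitration and burst behavior; `bus_beats` is a hypothetical helper, not an existing function.

```c
#include <stdint.h>

/* Number of bus beats needed to move `bytes` over a bus `width_bits` wide.
 * Idealized: ignores latency, arbitration and burst effects.
 * width_bits is assumed to be a non-zero multiple of 8. */
static uint64_t bus_beats(uint64_t bytes, unsigned width_bits)
{
    uint64_t bytes_per_beat = width_bits / 8u;
    return (bytes + bytes_per_beat - 1u) / bytes_per_beat;  /* round up */
}
```

For a 4096-byte block this gives 1024 beats on a 32-bit bus and 512 beats on a 64-bit bus.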