RTIO collision with wide interface and DMA #2088

Closed
SquidDev opened this issue May 18, 2023 · 2 comments · Fixed by #2090

@SquidDev
Contributor

SquidDev commented May 18, 2023

Bug Report

One-Line Summary

When using the wide interface with DMA, RTIO collisions occur.

Issue Details

Steps to Reproduce

  1. Set up a Kasli with a Fastino, configured with log2_width=5.
  2. Record a DMA sequence that calls set_group_mu on that Fastino, then play it back. I've included an example experiment below.
experiment.py
from __future__ import annotations

import artiq.experiment as aq
from artiq.coredevice.core import Core
from artiq.coredevice.dma import CoreDMA
from artiq.coredevice.fastino import Fastino
from artiq.language import delay_mu, kernel


class SimpleWideDMA(aq.EnvExperiment):
    def build(self):
        self.trace_name = "test_rtio"
        self.core: Core = self.get_device("core")
        self.core_dma: CoreDMA = self.get_device("core_dma")
        self.fastino: Fastino = self.get_device("fastino0")

    @kernel
    def record(self):
        with self.core_dma.record(self.trace_name):
            for _ in range(16):
                self.fastino.set_group_mu(0, [0] * 16)
                delay_mu(1000)

    @kernel
    def playback(self):
        handle = self.core_dma.get_handle(self.trace_name)
        self.core.break_realtime()
        self.core_dma.playback_handle(handle)

    @kernel
    def run(self):
        self.core.reset()
        self.record()
        self.playback()

Expected Behavior

The experiment runs without issues, and produces the correct RTIO events.

Actual (undesired) Behavior

The ARTIQ console reports "artiq.coredevice.comm_kernel:collision(s) reported during kernel execution".

Kasli logs:

[   444.429287s]  INFO(runtime::kern_hwreq): resetting RTIO
[   444.441013s] ERROR(runtime::rtio_mgt): RTIO sequence error involving channel 0x0000:unknown
[   444.448255s] ERROR(runtime::rtio_mgt): RTIO collision involving channel 0x0000:unknown
[   446.027836s]  INFO(runtime::session): no connection, starting idle kernel
[   446.033522s]  INFO(runtime::session): no idle kernel found
[   447.964546s]  INFO(runtime::analyzer): connection from 10.14.2.5:33518

Analyser dump:

OutputMessage(channel=0, timestamp=446230985696, rtio_counter=446230866224, address=0, data=0)
OutputMessage(channel=0, timestamp=446230986696, rtio_counter=446230866704, address=0, data=0)
OutputMessage(channel=0, timestamp=446230986696, rtio_counter=446230867088, address=0, data=0)
OutputMessage(channel=0, timestamp=446230986696, rtio_counter=446230867472, address=0, data=0)
OutputMessage(channel=0, timestamp=446230986696, rtio_counter=446230867856, address=0, data=0)
OutputMessage(channel=0, timestamp=446230986696, rtio_counter=446230868208, address=0, data=0)
OutputMessage(channel=0, timestamp=446230991696, rtio_counter=446230868592, address=0, data=0)
OutputMessage(channel=0, timestamp=446230992720, rtio_counter=446230868976, address=0, data=0)
OutputMessage(channel=0, timestamp=446230992720, rtio_counter=446230869360, address=0, data=0)
OutputMessage(channel=0, timestamp=446230992720, rtio_counter=446230869744, address=0, data=0)
OutputMessage(channel=0, timestamp=446230992720, rtio_counter=446230870064, address=0, data=0)
OutputMessage(channel=0, timestamp=446230996696, rtio_counter=446230870816, address=0, data=0)
OutputMessage(channel=0, timestamp=446230997696, rtio_counter=446230871200, address=0, data=0)
OutputMessage(channel=0, timestamp=446230997696, rtio_counter=446230871584, address=0, data=0)
OutputMessage(channel=0, timestamp=446230997696, rtio_counter=446230871968, address=0, data=0)
OutputMessage(channel=0, timestamp=446230997696, rtio_counter=446230872288, address=0, data=0)

Note that these events are not spaced 1 µs apart. In fact, the 2nd through 6th events all have the same timestamp!
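The spacing problem is easy to see by differencing consecutive timestamps. This minimal sketch copies the 16 timestamps from the dump above; the names EXPECTED_DELAY_MU and find_irregular_gaps are hypothetical, and it assumes the usual 1 ns machine unit, so delay_mu(1000) should give a 1000 mu gap.

```python
# Timestamps (machine units) copied from the first analyzer dump, in order.
timestamps = [
    446230985696, 446230986696, 446230986696, 446230986696,
    446230986696, 446230986696, 446230991696, 446230992720,
    446230992720, 446230992720, 446230992720, 446230996696,
    446230997696, 446230997696, 446230997696, 446230997696,
]

# Hypothetical name: the delay_mu(1000) between set_group_mu calls.
EXPECTED_DELAY_MU = 1000


def find_irregular_gaps(ts, expected):
    """Return (index, gap) for each consecutive pair not spaced by `expected`."""
    return [
        (i, b - a)
        for i, (a, b) in enumerate(zip(ts, ts[1:]))
        if b - a != expected
    ]


deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
irregular = find_irregular_gaps(timestamps, EXPECTED_DELAY_MU)
print(deltas)
print(f"{len(irregular)} of {len(deltas)} gaps differ from {EXPECTED_DELAY_MU} mu")
```

Every gap should be 1000 mu, giving a total span of 15000 mu; instead 10 of the 15 gaps are 0 and the span is only 12000 mu.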

Your System (omit irrelevant parts)

  • Operating System: Linux
  • ARTIQ version: 271df5979eb1ea470db9dc1d05d0a28501285f5e
  • Hardware involved: Standalone Kasli 2.0 built from the following core device JSON. I have also reproduced this on Kasli-SoC.
{
    "target": "kasli",
    "variant": "standalone",
    "hw_rev": "v2.0",
    "base": "standalone",
    "sed_lanes": 4,
    "peripherals": [
        {
            "type": "fastino",
            "ports": [10],
            "log2_width": 5
        }
    ]
}
@pathfinder49
Contributor

This seems like it may be related to #1521

@SquidDev
Contributor Author

Some additional comments here:

  • The for _ in range(16): loop is not actually needed to reproduce the bug. If we remove it, the core analyzer still shows 18 additional RTIO events beyond the single expected one, which are presumably the cause of the collision.

    $ artiq_coreanalyzer -p
    Log channel: 3
    DDS one-hot: True
    OutputMessage(channel=0, timestamp=313901001699096, rtio_counter=313901001581040, address=0, data=0)
    OutputMessage(channel=0, timestamp=313901001699096, rtio_counter=313901001581456, address=0, data=0)
    OutputMessage(channel=0, timestamp=313901001699096, rtio_counter=313901001581840, address=0, data=0)
    OutputMessage(channel=0, timestamp=313901001699096, rtio_counter=313901001582224, address=0, data=0)
    OutputMessage(channel=0, timestamp=313901001699096, rtio_counter=313901001582608, address=0, data=0)
    OutputMessage(channel=0, timestamp=313901001699096, rtio_counter=313901001582984, address=0, data=3009196916736000)
    OutputMessage(channel=1579776, timestamp=14341242013095143969, rtio_counter=313901001583368, address=226, data=11770016741475524663)
    OutputMessage(channel=1579776, timestamp=9489295723261001249, rtio_counter=313901001583752, address=24, data=14036967028102490777)
    OutputMessage(channel=1579776, timestamp=7134015712030031393, rtio_counter=313901001584136, address=201, data=18303204151877074773)
    OutputMessage(channel=1579776, timestamp=13258911205253905953, rtio_counter=313901001584856, address=67, data=70474208395288)
    OutputMessage(channel=1579776, timestamp=13258911205253905953, rtio_counter=313901001585208, address=67, data=1767557575913400344)
    OutputMessage(channel=0, timestamp=737208381365272, rtio_counter=313901001585592, address=0, data=4151041602956593440)
    OutputMessage(channel=0, timestamp=314212985120096, rtio_counter=313901001586008, address=76, data=4035226339315578837)
    OutputMessage(channel=0, timestamp=11723548663943570784, rtio_counter=313901001586392, address=175, data=5990941903206029443)
    OutputMessage(channel=0, timestamp=397358921459040, rtio_counter=313901001586776, address=0, data=18042744063934852234)
    OutputMessage(channel=0, timestamp=397358921459040, rtio_counter=313901001587096, address=0, data=4618597721393611704)
    OutputMessage(channel=4200517, timestamp=313903093458054, rtio_counter=313901001587480, address=0, data=639139369029992448)
    OutputMessage(channel=11589922, timestamp=3351978113896771362, rtio_counter=313901001587800, address=37, data=0)
    OutputMessage(channel=3048320, timestamp=12743559349443742269, rtio_counter=313901001587864, address=10, data=0)
    StoppedMessage(rtio_counter=313902814282528)
    
  • This only occurs when creating a "full" DMA frame. Sending fewer than 512 bytes (for instance, with the admittedly absurd self.fastino.set_group_mu(0, [0] * 15)) produces only one RTIO event and no collisions.
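The second dump can be quantified by parsing the artiq_coreanalyzer -p text and counting events. This minimal sketch only parses the printed text with a regular expression and does not use any ARTIQ API. KNOWN_CHANNELS = {0} is an assumption (the Fastino appears as channel 0 in the first dump), and EXPECTED_EVENTS = 1 reflects the single recorded set_group_mu call.

```python
import re

# `artiq_coreanalyzer -p` output from the comment above (loop removed).
DUMP = """
OutputMessage(channel=0, timestamp=313901001699096, rtio_counter=313901001581040, address=0, data=0)
OutputMessage(channel=0, timestamp=313901001699096, rtio_counter=313901001581456, address=0, data=0)
OutputMessage(channel=0, timestamp=313901001699096, rtio_counter=313901001581840, address=0, data=0)
OutputMessage(channel=0, timestamp=313901001699096, rtio_counter=313901001582224, address=0, data=0)
OutputMessage(channel=0, timestamp=313901001699096, rtio_counter=313901001582608, address=0, data=0)
OutputMessage(channel=0, timestamp=313901001699096, rtio_counter=313901001582984, address=0, data=3009196916736000)
OutputMessage(channel=1579776, timestamp=14341242013095143969, rtio_counter=313901001583368, address=226, data=11770016741475524663)
OutputMessage(channel=1579776, timestamp=9489295723261001249, rtio_counter=313901001583752, address=24, data=14036967028102490777)
OutputMessage(channel=1579776, timestamp=7134015712030031393, rtio_counter=313901001584136, address=201, data=18303204151877074773)
OutputMessage(channel=1579776, timestamp=13258911205253905953, rtio_counter=313901001584856, address=67, data=70474208395288)
OutputMessage(channel=1579776, timestamp=13258911205253905953, rtio_counter=313901001585208, address=67, data=1767557575913400344)
OutputMessage(channel=0, timestamp=737208381365272, rtio_counter=313901001585592, address=0, data=4151041602956593440)
OutputMessage(channel=0, timestamp=314212985120096, rtio_counter=313901001586008, address=76, data=4035226339315578837)
OutputMessage(channel=0, timestamp=11723548663943570784, rtio_counter=313901001586392, address=175, data=5990941903206029443)
OutputMessage(channel=0, timestamp=397358921459040, rtio_counter=313901001586776, address=0, data=18042744063934852234)
OutputMessage(channel=0, timestamp=397358921459040, rtio_counter=313901001587096, address=0, data=4618597721393611704)
OutputMessage(channel=4200517, timestamp=313903093458054, rtio_counter=313901001587480, address=0, data=639139369029992448)
OutputMessage(channel=11589922, timestamp=3351978113896771362, rtio_counter=313901001587800, address=37, data=0)
OutputMessage(channel=3048320, timestamp=12743559349443742269, rtio_counter=313901001587864, address=10, data=0)
"""

# Matches one OutputMessage line as printed above.
OUTPUT_RE = re.compile(
    r"OutputMessage\(channel=(\d+), timestamp=(\d+), "
    r"rtio_counter=(\d+), address=(\d+), data=(\d+)\)"
)

KNOWN_CHANNELS = {0}  # assumption: the Fastino is RTIO channel 0
EXPECTED_EVENTS = 1   # one set_group_mu call in the recorded sequence

# Each event is a (channel, timestamp, rtio_counter, address, data) tuple.
events = [tuple(int(g) for g in m.groups()) for m in OUTPUT_RE.finditer(DUMP)]
extra = len(events) - EXPECTED_EVENTS
unknown = [e for e in events if e[0] not in KNOWN_CHANNELS]

print(f"{len(events)} events, {extra} extra, {len(unknown)} on unknown channels")
```

For this dump it reports 19 events, 18 of them extra, with 8 on unknown channels. The other ten extra events are on channel 0, so filtering by channel alone would undercount them.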

It looks like there's an off-by-one error in the RawSlicer; I will submit a PR.
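To illustrate how an off-by-one can produce spurious events only for "full" frames, here is a toy software model. It is not ARTIQ's RawSlicer gateware, its record format, or the fix in #2090. The hypothetical slice_records function splits a length-prefixed byte stream through a buffer of a given capacity, and MAX_RECORD is an arbitrary illustrative size. Records that fit are unaffected, but with a buffer one byte too small, a record of exactly the maximum size is truncated and its leftover byte is then read as the length of a new, spurious record.

```python
MAX_RECORD = 8  # hypothetical maximum record size in bytes (toy model)


def slice_records(stream: bytes, capacity: int) -> list:
    """Split a length-prefixed byte stream into records (toy model).

    Each record starts with a 1-byte length that counts the length byte
    itself; a length of 0 ends the stream. The slicer holds at most
    `capacity` bytes, so a longer record is cut short and parsing resumes
    straight after the bytes that were taken.
    """
    records = []
    pos = 0
    while pos < len(stream):
        length = stream[pos]
        if length == 0:  # end-of-stream marker
            break
        take = min(length, capacity)
        records.append(stream[pos:pos + take])
        pos += take
    return records


full = bytes([8, 10, 20, 30, 40, 50, 60, 70, 0])  # one record of exactly MAX_RECORD bytes
short = bytes([5, 1, 2, 3, 4, 0])                 # one record shorter than MAX_RECORD

correct = slice_records(full, capacity=MAX_RECORD)
buggy = slice_records(full, capacity=MAX_RECORD - 1)  # buffer sized one byte too small
short_buggy = slice_records(short, capacity=MAX_RECORD - 1)
print(len(correct), len(buggy), len(short_buggy))
```

In this model the full-size record yields one record with the correct capacity but two with the off-by-one capacity (the second built from the stray byte 70 and the terminator), while the shorter record parses correctly either way. This mirrors the reported symptom that only "full" frames misbehave.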
