Skip to content

Conversation

tannewt
Copy link
Member

@tannewt tannewt commented Sep 15, 2025

Hopefully fix #10562

@samblenny
Copy link

This seems to work well. Tested on Fruit Jam rev D.

@RetiredWizard
Copy link

I was testing this on my modified tinyUSB build and ran into a memory error when trying to run Fruitris so I tried on the stock Fruit Jam and got the same error:

/apps/Fruitris/code.py output:
Traceback (most recent call last):
  File "/apps/Fruitris/code.py", line 37, in <module>
  File "adafruit_fruitjam/peripherals.py", line 113, in request_display_config
MemoryError: memory allocation failed, allocating 307200 bytes

@samblenny
Copy link

samblenny commented Sep 16, 2025

...ran into a memory error when trying to run Fruitris...

Thinking about it, that makes sense. My sample code doesn't use any large buffers, and it calls release_displays(). For RAM intensive code, if enough buffers get allocated to fill up internal SRAM, or to significantly fragment it, then all the various DMA-fix dynamic allocations could end up fighting for scarce or non-existent SRAM heap space.

Perhaps, instead of dynamically allocating SRAM buffers, shared-module/usb/core/Device.c could have a small shared static global SRAM buffer in the style of _xfer_result and _actual_len. For anything I've tried to do, 64 bytes would be enough. Not sure if there are situations where more would be needed.

@samblenny
Copy link

cc @relic-se
This one might be of interest to you.

@samblenny
Copy link

@tannewt Now that I've seen how you approached this PR, I think I could probably get a static buffer version of this working to test whether it would have fewer problems with SRAM heap allocation. No promises on results, but I'm gonna take a look at it.

@samblenny
Copy link

samblenny commented Sep 17, 2025

@RetiredWizard I wasn't able to reproduce your error with Fruitris. Were you doing anything special?

My setup:

  • Fruit Jam rev D board
  • CircuitPython UF2 file from this PR's CI build
  • Fruit Jam Fruitris 1.0.0-alpha.5 project bundle zip file
  • Empty CIRCUITPY/settings.toml (no custom display size stuff, etc)
  • EVGA XR1 lite HDMI capture card
  • Adafruit generic SNES gamepad or 8BitDo SN30 Pro gamepad (both work fine for me)

I did have to make a couple attempts to get all the project bundle files copied to CIRCUITPY. Initially, I tried cp -r ... in the macOS terminal. That wasn't happy, so I switched to rsync -vrptc ... which worked much better. After I got all the files copied, it just worked:

Fruitris-game-over

@RetiredWizard
Copy link

RetiredWizard commented Sep 17, 2025

Apparently having a launcher.conf.json file with:

{
    "use_mouse": "True",
}

is the difference, however it didn't look like the keyboard plugged into the usb port of the fruit jam was working properly even when the program loaded with no launcher.conf.json file. The arrow keys seemed to work but the P to pause, Q to quit and Ctrl-C were all ignored. the keyboard seems to happen with the absolute newest firmware also so that's not related to this PR

@RetiredWizard
Copy link

Oh, one other possible difference is that I'm using the packaged version of Fruitris from https://github.com/RetiredWizard/Fruit-Jam-OS_MyApps. Other than some minor audio changes from a pending PR I don't think there are any differences from the version from @relic-se respository

@samblenny
Copy link

oh.. so to trigger this, it's not just a matter of using Fruitris, but I would need to install Fruit Jam OS?

@samblenny
Copy link

Just made a PR with a static buffer version of the code from this PR if anybody wants to compare:

@relic-se
Copy link

Apparently having a launcher.conf.json file with:

{
    "use_mouse": "True",
}

Fruitris doesn't utilize that property at the moment, but my guess is that the initialization of that device within Fruit Jam OS before loading the application might have been the cause.

is the difference, however it didn't look like the keyboard plugged into the usb port of the fruit jam was working properly even when the program loaded with no launcher.conf.json file. The arrow keys seemed to work but the P to pause, Q to quit and Ctrl-C were all ignored. the keyboard seems to happen with the absolute newest firmware also so that's not related to this PR

Thanks for your PR to fix this one! (relic-se/Fruit_Jam_Fruitris#9) I don't believe this should be a problem any longer.

@tannewt
Copy link
Member Author

tannewt commented Sep 17, 2025

I'm not sure I want to go so far as a static allocation. Silently failing if a request is longer than the buffer isn't good and allocating more than you need seems subpar too. One thing we could do is share the temporary buffer amongst all devices. Right now it is per-device.

Allocating the large framebuffers late after startup will also be risky. It's much better to do the largest allocation first and keep it.

@samblenny
Copy link

samblenny commented Sep 17, 2025

...Silently failing if a request is longer than the buffer isn't good and allocating more than you need seems subpar too...

Some things to weigh here as you consider how to move forward with this:

  1. What counts as "failure"? The TinyUSB transfer calls are designed to return their actual length of transferred data which may be less than the requested length. This is a pretty normal way for C IO functions to behave. The PyUSB API is also designed to be consistent with this by returning the actual number of transferred bytes. As long as the buffer is big enough, getting a shorter than requested actual length should be a rare thing.
  2. The code which does dynamic allocation/reallocation will get called in a relatively hot loop. Many USB device objects get created and destroyed while usb.core.find() is in use (technically, I mean find() calls common_hal_usb_core_device_construct() up to 256 times during each invocation). I would expect this could easily lead to fragmentation of the SRAM heap.
  3. As you mentioned, potential heap fragmentation could be mitigated by sharing the buffer between devices. If the initial allocation happened at import time (of usb.core), that might work pretty well, so long as the initial buffer size was over-provisioned a bit (I'd recommend at least 64 bytes).

@tannewt
Copy link
Member Author

tannewt commented Sep 17, 2025

  1. As long as the buffer is big enough, getting a shorter than requested actual length should be a rare thing.

This won't be under the program's control if we silently allocate a shorter buffer internally. I agree that the argument should be checked but it should be a sign of the device's response length, not an intermediate limitation.

2. I would expect this could easily lead to fragmentation of the SRAM heap.

My code doesn't allocate until a transfer with a PSRAM buffer is done. find() wouldn't call these.

I'm not convinced that customizing this code is the right way to handle the fact that allocating a large framebuffer can fail sometimes. Do we know what is causing the buffer allocation in Fruitris and why it occurs before the new framebuffer is allocated?

@samblenny
Copy link

samblenny commented Sep 17, 2025

Do we know what is causing the buffer allocation in Fruitris and why it occurs before the new framebuffer is allocated?

Taking a quick skim through the adafruit/Fruit-Jam-OS repo, I didn't see any sign of calls to gc.collect(). It's obviously pretty graphics intensive, and the launcher uses some USB host stuff. My guess is that that stuff leads to a fragmented SRAM heap by the time the launched program gets loaded? Maybe?

@samblenny
Copy link

Part of what I was thinking with the static buffer approach is that it would buy folks some extra breathing room by making heap fragmentation less likely. That way they wouldn't have to think as much about managing memory allocation when writing Python code. OTOH, you could definitely make an argument that people should consider memory management more in their Python code by calling gc.collect() and allocating large buffers early.

@tannewt
Copy link
Member Author

tannewt commented Sep 18, 2025

gc.collect() isn't involved here. It manages the VM Python heap.

The framebuffer error is caused by the "outer" heap fragmentation which separately manages the internal SRAM and the external PSRAM. The internal SRAM is only used for buffers that are used by DMA (like the framebuffer) and buffers that need to be accessed when PSRAM access is disabled (like the USB buffers.)

@samblenny
Copy link

The framebuffer error is caused by the "outer" heap fragmentation which separately manages the internal SRAM and the external PSRAM.

Oh... interesting. So what can be done to manage outer heap fragmentation? Is that something people writing Python code have any control over?

@tannewt
Copy link
Member Author

tannewt commented Sep 18, 2025

So what can be done to manage outer heap fragmentation? Is that something people writing Python code have any control over?

Don't try to allocate almost all of it into a framebuffer. :-P Not really, they don't have much control. There should be very few allocations there so the hope is that fragmentation doesn't become an issue.

@samblenny
Copy link

mmm... I hear you. Well, let me know if you come up with something else you'd like tested or if you want a modified version of my static allocation PR.

@tannewt
Copy link
Member Author

tannewt commented Sep 19, 2025

While talking to Dan I realized I could simply free the temp buffer after using it instead of saving it for next time. This should prevent the fragmentation and reduce the number of concurrent allocations. Will do that now and then push it here.

@samblenny
Copy link

Will do that now and then push it here.

If I understand correctly, you're proposing to do a heap allocation and deallocation of the full size buffer for every Device.read(). That might be okay. But, could you hold off on merging that until we have the chance to test how much additional latency it adds?

I'm concerned that MIDI polling in particular is already apparently too slow such that it drops notes.

This reduces memory use and prevents long term fragmentation.
@tannewt
Copy link
Member Author

tannewt commented Sep 19, 2025

If I understand correctly, you're proposing to do a heap allocation and deallocation of the full size buffer for every Device.read(). That might be okay. But, could you hold off on merging that until we have the chance to test how much additional latency it adds?

Yup, just pushed. Please give it a test. I only made sure it compiles for Fruit Jam.

I'm concerned that MIDI polling in particular is already apparently too slow such that it drops notes.

I suspect MIDI needs a separate higher level API that is implemented using TinyUSB's MIDI support for buffering. What is the way to do it in CPython on Linux? (This is similar to having a pyserial-like API for CDC.)

Copy link

@samblenny samblenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me nervous. Feels a bit sketchy. Maybe it's totally fine. IDK.

In particular, I worry about whether static void _abort_transfer(tuh_xfer_t *xfer) will fail in unpleasant ways under edge case conditions.

@samblenny
Copy link

I tested the build for the latest changes and it works fine with my MIDI synth and gamepad stuff. The amount of stuck notes on the MIDI synth didn't seem any better or worse than before these changes. If RetiredWizard tries this and it works okay with Fruit Jam OS + Fruitris, then as far as I'm concerned, this is good to go. I have some reservations about _abort_transfer(), but it's probably fine. Whatever. Fixing the crash is more important.

@samblenny
Copy link

samblenny commented Sep 20, 2025

I suspect MIDI needs a separate higher level API that is implemented using TinyUSB's MIDI support for buffering. What is the way to do it in CPython on Linux?

Not sure. Aside from PyGame's pygame.midi I haven't run across much in the way of MIDI coding using Python on full size computers. It's more common to see stuff using C/C++, Javascript, or one of the special purpose music programming languages (Pd, Max, Faust, Supercollider, ChucK, Sonic PI, etc). Generally people doing that kind of work are aiming for very low latency, so using a garbage collected language isn't likely to be a first choice (notable exception being Javascript for web stuff, but even then people might go for WASM).

@RetiredWizard
Copy link

I'm actually on a mini-vacation this weekend, but what kind of vacation would it be without the Fruit Jam along? 😁 I managed to get this artifact loaded up and Fruitris does run without giving the memory error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

USB Host: Fast polling + write to CIRCUITPY crashes board
4 participants