STM32 USB to I2S multi channel - log - ask for help

This is unfortunately systematic, and I still have buffer overruns and underruns. Host and device don't stay synchronized; something is wrong there.

I tried to use Wireshark, but it is not easy for me. I captured something that looks like my USB packets: some ISOC IN transfers from the device to the host, on the correct endpoint (0x81), but the frame is 18xx bytes long, which looks big for a 4-byte feedback. There may be encapsulation overhead, but...

I have included a screenshot in case someone can help analyse it, or guide me on what to look at.

Looking at the capture, I see for each feedback packet:
  • a data length of 248 bytes instead of 4,
  • 128 packets,
  • alternating empty packets and 4-byte feedback data packets (ISO Data: 00000600)!
  • repeated many times (possibly 64 times?)
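The 4-byte value can be decoded by hand to check it is sane. A minimal sketch, assuming a High-Speed device and that Wireshark shows the payload bytes in wire order (little-endian), so "00 00 06 00" is 0x00060000. USB 2.0 specifies 16.16 fixed point in samples per microframe for HS feedback, so this is 6.0 samples per microframe, i.e. 48 kHz. Zero-length payloads are the empty packets. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a captured HS feedback payload (assumed: 4 bytes, little-endian,
 * 16.16 samples per microframe). Returns false for empty/short packets. */
static bool fb_decode_hs(const uint8_t *buf, size_t len, double *samples_per_uframe)
{
    assert(samples_per_uframe != NULL);
    if (buf == NULL || len < 4)
        return false;                       /* empty (or truncated) packet */
    uint32_t raw = (uint32_t)buf[0]
                 | ((uint32_t)buf[1] << 8)
                 | ((uint32_t)buf[2] << 16)
                 | ((uint32_t)buf[3] << 24);
    *samples_per_uframe = raw / 65536.0;    /* 16.16 -> samples per microframe */
    return true;
}

/* High Speed has 8000 microframes per second. */
static double fb_rate_hz(double samples_per_uframe)
{
    return samples_per_uframe * 8000.0;
}
```

With the bytes from the capture, `fb_decode_hs` yields 6.0 and `fb_rate_hz` yields 48000.0, so the non-empty packets carry a plausible nominal value.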

So the capture contains the relevant feedback data for the correct endpoint (not a pointer to something completely different), but it is not packed correctly (many duplicates)?

Am I reading the capture correctly? Are those duplicates normal, as in "since I have nothing else to do, I duplicate the data so that you can cross-check that all is OK, or catch up if you missed the first one"?

Sorry for the burden :-(

JMF
 

Attachment: Wireshark Feedback.JPG
I managed to get bInterval > 1 working with Win10 with a small modification to my original logic. This is the new logic:
  • Feedback is sent in the SOF interrupt as if bInterval=1 until the first IN event. After the first IN event, feedback is sent at the actual bInterval, but only when the parity matches the current frame number. Note that this means that with bInterval=1 the effective bInterval is 2.
  • In the first IN event, the parity is set according to the current frame number - 1 (i.e. the previous frame number).
  • If IsoInIncomplete occurs, the parity is flipped.
With this procedure, regardless of bInterval, the number of IN events is close to the number of feedback messages sent, and the number of IsoInIncomplete events is close to zero. Note also that in Win10/11 the maximum supported bInterval for HS isochronous transfers is 4.
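The three rules can be sketched as plain state handling. This is not bohrok's code; it is a minimal sketch under these assumptions: the handler names are hypothetical and would be called from the SOF, DataIn and IsoInIncomplete callbacks; `period` is the polling period in (micro)frames derived from bInterval; the frame counter is 14 bits wide (as DSTS.FNSOF); and `fb_write()` is a placeholder for arming the endpoint with the 4-byte feedback value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FB_FRAME_MASK 0x3FFFu   /* assumption: 14-bit frame counter */

/* Hypothetical state; the post keeps the parity in USBD_AUDIO_HandleTypeDef. */
typedef struct {
    bool     in_seen;     /* first IN event on the feedback EP has occurred */
    uint32_t parity;      /* frame parity (0/1) on which feedback is written */
    uint32_t period;      /* polling period in (micro)frames, from bInterval */
    uint32_t last_write;  /* frame number of the last feedback write */
    uint32_t writes;      /* statistics: feedback packets written */
} fb_sync_t;

static void fb_sync_init(fb_sync_t *s, uint32_t period)
{
    assert(period >= 1);
    s->in_seen = false;
    s->parity = 0;
    s->period = period;
    s->last_write = 0;
    s->writes = 0;
}

/* Placeholder for arming the EP with the feedback value. */
static void fb_write(fb_sync_t *s, uint32_t frame)
{
    s->last_write = frame;
    s->writes++;
}

/* SOF interrupt: write every frame until the first IN, then only on the
 * matching parity and once at least `period` frames have elapsed. */
static void fb_on_sof(fb_sync_t *s, uint32_t frame)
{
    if (!s->in_seen) {
        fb_write(s, frame);
        return;
    }
    if ((frame & 1u) != s->parity)
        return;
    if (((frame - s->last_write) & FB_FRAME_MASK) >= s->period)
        fb_write(s, frame);
}

/* First IN event: take the parity of the previous frame number. */
static void fb_on_in(fb_sync_t *s, uint32_t frame)
{
    if (!s->in_seen) {
        s->in_seen = true;
        s->parity = ((frame - 1u) & FB_FRAME_MASK) & 1u;
    }
}

/* IsoInIncomplete: the armed packet missed its frame, so flip the parity. */
static void fb_on_iso_incomplete(fb_sync_t *s)
{
    s->parity ^= 1u;
}
```

With `period = 1`, writes after the first IN happen only every second frame because of the parity test, which is the "actual bInterval = 2" effect noted in the post.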
 
Thanks bohrok for working out this logic. Would you mind explaining a bit how to check and manage those frame numbers and parity bits? Which registers or functions do you use?

I understand that it is all about starting off on the right foot, and readjusting when some packets are missed.

I wonder if the USBX stack is stacking up the unclaimed feedback requests, which then makes the output invalid for Windows, and then things get worse and worse...

JMF
 
For the current frame number I use this (this is for STM32H7, so I'm not sure it is the same on STM32F4):
Code:
#define USB_OTG_BASE  USB1_OTG_HS
#define USB_SOF_NUMBER()    ((((USB_OTG_DeviceTypeDef *)((uint32_t )USB_OTG_BASE + USB_OTG_DEVICE_BASE))->DSTS&USB_OTG_DSTS_FNSOF)>>USB_OTG_DSTS_FNSOF_Pos)

I store the parity bit in the USB audio device handle (USBD_AUDIO_HandleTypeDef).
 
Thanks bohrok,

I'm not at the end of my journey, but your information definitely helps.

From https://ask.wireshark.org/question/...k-usb-bus-capture-of-an-isochronous-endpoint/ I understand that what I see in Wireshark means my feedback packets are not "OK". Wireshark groups the packets over 10 ms, and each packet is an independent transmission. So I have a mix of good feedback packets and some empty ones (which produce a message in the Windows UAC2 logs).

The USBX placeholder for the feedback implementation is too crude for the STM32 specifics of odd/even frames and IsoInIncomplete. It requests sending a feedback packet, suspends until the packet has been read, and then immediately queues another one. That will not work here...
 
@JMF11: Thanks a lot for linking to that discussion; it has a very nice description of the Wireshark output. IMO it does look like some of your feedback packets are empty.

@bohrok2610: The very informative comments on https://ask.wireshark.org/question/...chronous-endpoint/?answer=31679#post-id-31679 mention that USB timeouts for IN requests are "insanely short" (below 1 us). IIUC the device must respond to an IN request immediately, i.e. it must already have the data prepared in HW.

I checked the Linux gadget driver. When capture is opened, the driver directly queues one feedback response with the nominal value https://github.com/torvalds/linux/b...ivers/usb/gadget/function/u_audio.c#L652-L664 . When this request completes (i.e. the fback data have been sent to the host), a new fback value is calculated and queued to the fback EP IN https://github.com/torvalds/linux/b...ivers/usb/gadget/function/u_audio.c#L280-L312 , so that the Synopsys hardware itself can send the packet as soon as the host issues the IN, fitting within that "insanely short" timeout.

I don't understand exactly the method you describe (since I know nothing about STM programming), but it feels like in the end it provides this kind of queuing. Is the issue perhaps somewhere in this direction?
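The completion-driven scheme can be illustrated with a toy model. This is not the u_audio code: it assumes a single feedback request, a host that polls every `period` microframes, and a hypothetical `latency` (microframes from the completion callback until the refilled request is queued again). It counts host polls that find nothing queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one feedback request, requeued from its completion callback.
 * Returns the number of host polls that found no request queued. */
static unsigned fb_requeue_misses(unsigned period, unsigned latency, unsigned frames)
{
    assert(period >= 1);
    bool queued = true;       /* request with the nominal value queued at start */
    bool pending = false;     /* a refill is in progress */
    unsigned ready_at = 0;    /* frame at which the refill gets queued */
    unsigned misses = 0;

    for (unsigned f = 1; f <= frames; f++) {
        if (pending && f >= ready_at) {   /* refill done: request back in queue */
            queued = true;
            pending = false;
        }
        if (f % period != 0)
            continue;                     /* host does not poll the EP this frame */
        if (queued) {
            queued = false;               /* HW answers the IN from the queue */
            pending = true;               /* completion callback starts a refill */
            ready_at = f + latency;
        } else {
            misses++;                     /* nothing queued when the IN arrived */
        }
    }
    return misses;
}
```

With `period = 8` (bInterval=4 at HS), a refill latency of 3 microframes gives no misses, while a latency of 12 makes every second poll miss: the scheme works as long as the requeue fits within the polling interval.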
 
Synopsys OTG is responsible for the low-level USB communication. It has a register-based interface, but there is no documentation and no way to see how it actually works internally, so it is like a black box. Feedback messages (and data messages) end up in FIFOs which Synopsys uses, so these FIFOs provide the queuing. IN and IsoInIncomplete events are "indicators" of how the feedback messaging and queuing work. All in all, IME it works really well. The only issue (or "bug") with Synopsys is the frame-number parity handling, which is not required by the spec and only complicates the feedback processing.
 
Synopsys OTG is responsible for the low-level USB communication. It has a register-based interface, but there is no documentation and no way to see how it actually works internally, so it is like a black box.
Synopsys has dedicated teams for their DWC2 and DWC3 IPs, who maintain the Linux drivers for their blocks and monitor the kernel mailing lists for support. It actually works quite well; the Synopsys DWC2 maintainer has helped me numerous times. Maybe STM has a similar forum with similar support?

The only issue (or "bug") with Synopsys is the frame number parity handling which is not required by the spec and only complicates the feedback processing.
I really know nothing about your implementation; I'm just wondering whether what looks like a parity-handling requirement is perhaps just an empty FIFO at the moment the DWC block must immediately respond to an IN request from the host. But as I say, it's just a question, and it may be completely wrong.
 
IIUC, Linux for STM32F4xx supports the DWC2 OTG https://lore.kernel.org/linux-arm-kernel/d3711b3b-401d-87c7-20bd-a88364ef15a1@st.com/T/ (i.e. the same IP core as in the RPi I use), so maybe that source code would give you some hints. Maybe you could contact Minas Harutyunyan <Minas.Harutyunyan@synopsys.com> directly; he is the very helpful maintainer of the DWC2 kernel driver (which STM parts running Linux use). You can find his posts and email on the linux-usb mailing list. He may be able to answer your questions about the Synopsys DWC2 IP core in your STM.
 
As the USBX feedback implementation does not fit the STM32, I want to look at the different UAC and feedback implementations I have the code for. This morning I started with a short look at the code of https://www.diyaudio.com/community/threads/open-sourced-uac2-bridge-based-on-stm32.404656/

It seems that the feedback is "sent" in the DataIn callback, so in my understanding AFTER the prepared feedback packet was pulled by the host. The IsoInIncomplete callback reposts the feedback packet.

From what I understand, this should not work nicely:
  • the feedback packet send request issued in DataIn will not be pulled by the host in the next uFrame, since the host does not request new data before "bInterval=4" uFrames (8 or 16?),
  • this should trigger the IsoInIncomplete callback, which reposts the feedback packet for the next uFrame, as the host is not requesting the data,
  • and so on for every frame...
  • up to the next feedback polling frame.

So there would be IsoInIncomplete events and USB send requests nearly every uFrame... unless the feedback send requested in the DataIn event is executed "for that uFrame", in which case the thing would work "naturally".
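The reasoning above can be checked on paper. For HS isochronous endpoints USB 2.0 defines the polling period as 2^(bInterval-1) microframes, so bInterval=4 is 8 microframes (1 ms). Below is a simplified model of "write at DataIn, repost at IsoInIncomplete", assuming an armed packet is only valid for the next microframe, so every microframe ends with exactly one of the two callbacks and each callback writes a new packet. The function and type names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* HS isochronous endpoints: polling period = 2^(bInterval-1) microframes. */
static uint32_t hs_period_uframes(uint8_t bInterval)
{
    assert(bInterval >= 1 && bInterval <= 16);
    return 1u << (bInterval - 1);
}

typedef struct {
    uint32_t in_events;          /* DataIn: host pulled the packet */
    uint32_t incomplete_events;  /* IsoInIncomplete: packet not pulled */
    uint32_t writes;             /* feedback packets written to the EP */
} fb_counts_t;

/* Simplified model: every microframe ends in DataIn or IsoInIncomplete,
 * and both callbacks re-arm the feedback packet for the next microframe. */
static fb_counts_t fb_datain_repost(uint8_t bInterval, uint32_t uframes)
{
    fb_counts_t c = {0, 0, 1};   /* one packet armed before streaming starts */
    uint32_t period = hs_period_uframes(bInterval);
    for (uint32_t f = 1; f <= uframes; f++) {
        if (f % period == 0)
            c.in_events++;        /* host polls in this microframe */
        else
            c.incomplete_events++;
        c.writes++;               /* either callback writes a new packet */
    }
    return c;
}
```

Over one second (8000 microframes), bInterval=4 gives 1000 IN events and 7000 IsoInIncomplete events, yet the same 8001 writes as bInterval=1, which is the "nearly every uFrame" effect suspected above.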

I'm too lazy to port the code to one of my boards to check :-(
 
Ideally, sending the feedback packet only at DataIn would be the best solution, but it does not work well in practice. Sending feedback packets both at DataIn and at IsoInIncomplete works for bInterval=1 but not for bInterval>1. If bInterval>1, the number of feedback packets sent is actually about the same as with bInterval=1, because IsoInIncomplete occurs almost every frame.
 
whether what looks like a parity-handling requirement is perhaps just an empty FIFO at the moment the DWC block must immediately respond to an IN request from the host.
I think the issue has more to do with synchronization. The feedback packet must be waiting in the FIFO when the IN request comes from the host. When bInterval=1, writing a new feedback packet to the FIFO at the end of an IN request more or less guarantees that a packet is ready when the next IN request comes. However, for bInterval>1 the scheme becomes more complicated, as a new packet written at the end of an IN request leads to IsoInIncomplete events for the frames before the next IN request. These events could just be ignored, but unfortunately the HAL layer does some processing in these events which messes up the logic. This HAL processing varies between MCUs; it is different on STM32F7 and H7. My logic with parity handling actually just synchronizes the feedback packet writes so that they occur in the frame before the IN request. Another option would be to rewrite part of the HAL layer. This could simplify the logic but would complicate HAL FW updates.
 
That's interesting. In the Linux gadget driver, when the IP core processes the request and removes it from the FIFO, a completion callback is called with that request as a parameter. That way a new fback value gets prepared using the pre-allocated request, which after being sent is no longer referenced in the queue and can be reused: refilled with new values and requeued. If the system is fast enough to requeue the fback request within bInterval=4, it works fine.

For audio data there is an array of pre-allocated requests which are used sequentially, so that the system has more time to prepare the requests (filling them with data and queuing them). The default is two requests, which is too few for low-power CPUs, but it is configurable. I always raise it to 8, which makes it work fine for bInterval=1 and full load on weak CPUs. But that is not related to the fback handling (although the queuing -> completion callback principle is the same for all requests).
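Why more pre-allocated requests help a weak CPU can be shown with another toy model, again not the u_audio code. Assumptions: each request carries one microframe of audio, the hardware consumes one queued request per microframe, and a hypothetical `latency` is the number of microframes the CPU needs between a completion and the requeue of the refilled request.

```c
#include <assert.h>

#define MAX_REQS 64

/* Toy model of a pool of pre-allocated audio requests used in turn.
 * Returns the number of microframes in which no request was queued. */
static unsigned ring_underruns(unsigned nreqs, unsigned latency, unsigned frames)
{
    assert(nreqs >= 1 && nreqs <= MAX_REQS);
    unsigned queued = nreqs;            /* all requests queued at stream start */
    unsigned ready[MAX_REQS];           /* requeue times of requests being refilled */
    unsigned npending = 0;              /* invariant: queued + npending == nreqs */
    unsigned underruns = 0;

    for (unsigned f = 1; f <= frames; f++) {
        /* requeue every request whose refill has finished by this frame */
        unsigned kept = 0;
        for (unsigned i = 0; i < npending; i++) {
            if (ready[i] <= f)
                queued++;
            else
                ready[kept++] = ready[i];
        }
        npending = kept;

        if (queued > 0) {               /* HW sends one packet this microframe */
            queued--;
            ready[npending++] = f + latency;   /* completion -> refill starts */
        } else {
            underruns++;
        }
    }
    return underruns;
}
```

In this model the tolerated refill latency grows with the number of requests: 2 requests cope with a latency of 2 microframes but underrun 6 out of every 8 microframes at a latency of 8, while 8 requests absorb a latency of 8.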
 
The feedback is working now. It looks operational, but it is not as clean as I would like.

1) The USBX implementation of the feedback is not a perfect match for the STM32 USB peripheral behaviour. The feedback task pushes the feedback to the host as soon as the previous one has been read. With the STM32 ISOC IN behaviour, this means a lot of IsoInIncomplete callbacks and re-submissions of the packet. But Windows still accepts those.

2) To manage the feedback value, I assess the level in my buffer. Depending on the level, I ask for a bit less or a bit more, and I increase the correction when we get close to an underrun or overrun. This is working. My buffer corresponds to 2 ms, and the level zigzags between the thresholds faster than I would like. A smaller deviation from the reference could be OK for me but too stringent for another setup. It would need to combine two complementary mechanisms:
1- assess the average value over the long run (a moving average over many uFrames). Real-time execution shows that even with a feedback value at equilibrium, the +/- one uFrame (6 samples) variation weighs much more than the 0.01% correction needed on average,
2- a small additional term to bring the balance point toward the middle of the buffer and keep it there.
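The two mechanisms could be combined roughly as below. This is a minimal sketch, not the author's code, under these assumptions: HS at 48 kHz with the 16.16 feedback format (nominal 6.0 samples per microframe); the I2S side can report how many samples it consumed in each microframe (e.g. from the DMA position); the buffer level is counted in samples per channel; and the EMA weight (1/1024), the gain and the +/- 1 sample clamp are hypothetical tuning choices.

```c
#include <assert.h>
#include <stdint.h>

#define FB_ONE        65536   /* 1.0 sample per microframe in 16.16 */
#define FB_AVG_SHIFT  10      /* EMA weight 1/1024, roughly 1024 uframes memory */

typedef struct {
    int64_t avg;       /* long-run average consumption, 16.16 */
    int32_t target;    /* desired buffer level in samples (middle of buffer) */
    int32_t gain;      /* 16.16 units of feedback per sample of level error */
    int32_t nominal;   /* nominal rate in 16.16 (6.0 for 48 kHz at HS) */
} fb_ctrl_t;

static void fb_ctrl_init(fb_ctrl_t *c, int32_t nominal, int32_t target, int32_t gain)
{
    c->avg = nominal;
    c->target = target;
    c->gain = gain;
    c->nominal = nominal;
}

/* Call once per microframe; returns the feedback value in 16.16. */
static uint32_t fb_ctrl_update(fb_ctrl_t *c, int32_t consumed, int32_t level)
{
    /* mechanism 1: exponential moving average of the consumed samples */
    int64_t sample = (int64_t)consumed * FB_ONE;
    c->avg += (sample - c->avg) / (1 << FB_AVG_SHIFT);

    /* mechanism 2: small proportional term pulling the level to the target */
    int64_t corr = (int64_t)(c->target - level) * c->gain;

    int64_t fb = c->avg + corr;
    int64_t lo = (int64_t)c->nominal - FB_ONE;   /* clamp to nominal +/- 1 sample */
    int64_t hi = (int64_t)c->nominal + FB_ONE;
    if (fb < lo) fb = lo;
    if (fb > hi) fb = hi;
    return (uint32_t)fb;
}
```

For a 2 ms buffer at 48 kHz (96 samples per channel) the target would be 48. The average absorbs the 5/6/7-sample jitter of individual microframes, while the proportional term only nudges the value when the level drifts from the middle.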

So... progressing slowly ;-)