Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers: video: video_stm32_dcmi: Use video buffers for DCMI buffer. #84446

Merged
merged 1 commit into from
Jan 28, 2025

Conversation

iabdalkader
Copy link
Contributor

@iabdalkader iabdalkader commented Jan 23, 2025

Instead of reserving a static (possibly unaligned) buffer for DCMI, this patch reserves and holds one of the video buffers to use as the main DCMI buffer. This buffer will be aligned (using the alignment specified in the config) and will either be allocated from video_common pool or a shared multi-heap (if enabled).

Copy link
Collaborator

@josuah josuah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(context for other readers =>) It seems like the "continuous mode" of DCMI uses a different model where processing happens in the leap between two frames, rather than use a different buffer every time. That could be improved in the future maybe, but until then, allocating a buffer locally and memcpy()-ing it on every vbuf->data coming from the API seems like the way to go.

See inline comment for a proposal for how to integrate it... Thank you!

drivers/video/video_stm32_dcmi.c Outdated Show resolved Hide resolved
@iabdalkader iabdalkader force-pushed the stm32_video_smh branch 4 times, most recently from 8dec094 to 2452a5f Compare January 23, 2025 16:19
josuah
josuah previously approved these changes Jan 23, 2025
Copy link
Collaborator

@josuah josuah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible that other collaborators will have a better outlook of how to handle this.

At my level, it looks like the best way to implement it without completely reorganizing the video APIs (i.e. switch to net_buf or rtio for buffer management).

Thank you very much!

josuah
josuah previously approved these changes Jan 23, 2025
Copy link
Collaborator

@josuah josuah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I missed it.

@ngphibang
Copy link
Contributor

ngphibang commented Jan 24, 2025

Just a first look, perhaps I missed some thing but I don't understand why you need to expose the VIDEO_HEAP_ALLOC macro (old name VIDEO_COMMON_HEAP_ALLOC) to use it in a driver ? Why don't just use the video_buffer_alloc() API ?

@iabdalkader
Copy link
Contributor Author

iabdalkader commented Jan 24, 2025

Why don't just use the video_buffer_alloc() API ?

Because as it stands, VIDEO_COMMON_HEAP_ALLOC uses the pool defined in video_common.c, which is the main video buffers FIFO. video_stm32 allocates another separate buffer, and copies it to this FIFO. If you have only one buffer in that FIFO CONFIG_VIDEO_BUFFER_POOL_SZ_MAX=1 (very common case), and let video_stm32 reserve it, it will never be able to dequeue another buffer to copy to.

One way to fix this is to require a minimum of two buffers for this driver, but I don't know how to do that, or if it's even possible, is it?

@ngphibang
Copy link
Contributor

ngphibang commented Jan 24, 2025

Ok, I understand the context now.

    1. The DCMI driver needs an additional buffer for its personal usage (a part from the main video buffer pools used for the whole camera pipeline) => for this, I think some aspects need to be investiagted too, e.g does memcpy() used here is optimal, etc. But well, this is not in the scope of this PR.
    1. This PR adds an option to allocate this additional buffer (required soly by the DCMI) either on the shared multi-heap (if CONFIG_VIDEO_BUFFER_USE_SHARED_MULTI_HEAP enabled) or on the dcmi driver heap.

So, I think you just need to do it in the DCMI driver. Refactoring the video_common to expose the macro VIDEO_HEAP_ALLOC is not the right way to do here because

  • The VIDEO_HEAP_ALLOC and video_buffer_alloc() is to allocate buffers for the main pool, so its heap should be always video_buffer_pool heap. The additional buffer needed by the DCMI is something related only to the DCMI not the main video buffer pool.
  • Moreover, the refactoring does not help to save number of code lines.

Even if we want a "common" macro to allocate buffers on any heap, this should be defined system-wide, not inside the video subsystem (but again, it does not save much code lines)

@iabdalkader
Copy link
Contributor Author

iabdalkader commented Jan 24, 2025

So, I think you just need to do it in the DCMI driver. Refactoring the video_common to expose the macro VIDEO_HEAP_ALLOC is not the right way to do here because

You mean duplicate the macro? The initial PR did that exactly, but then I was asked to move it to common header. Are we all in agreement this time that it needs to be duplicated?

Moreover, the refactoring does not help to save number of code lines.

It does, because without it, we'll have to duplicate the macro more or less.

Alternatively, requiring a minimum number of buffers (in the case of stm32 >= 2) for stm32 is another valid option, and then we can use video_buffer_aligned_alloc. The only drawback I see is that using CONFIG_VIDEO_BUFFER_POOL_NUM_MAX from the application might be confusing as it will be -1 for the DCMI.

@ngphibang
Copy link
Contributor

ngphibang commented Jan 24, 2025

You mean duplicate the macro? The initial PR did that exactly, but then I was asked to move it to common header. Are we all in agreement this time that it needs to be duplicated?

Sorry for that, cc @josuah (?). It is used only in one place so you maybe don't need to define the macro, just

#if defined(CONFIG_VIDEO_BUFFER_USE_SHARED_MULTI_HEAP)
shared_multi_heap_aligned_alloc();
#else
k_heap_alloc();

? but if it needs to be duplicated, we do it. There are code duplication existing in Zephyr, especially when you look at the .conf, overlay.

As explained above, VIDEO_HEAP_ALLOC or video_buffer_alloc() is to allocate buffer on the video buffer pool heap. If we want to generalize it for allocation on any particular heap, it should be an utility macro defined system wide so that other non-video drivers can use it as well.

There are examples such as a display driver needs a specific display heap for buffer allocation as here so it does not mean that display driver should use the refactored VIDEO_HEAP_ALLOC macro (?).

Alternatively, requiring a minimum number of buffers (in the case of stm32 >= 2) for stm32 is another valid option, and then we can use video_buffer_aligned_alloc. The only drawback I see is that using CONFIG_VIDEO_BUFFER_POOL_NUM_MAX from the application might be confusing as it will be -1 for the DCMI.

In general, it could be an option. The DCMI can use the common video buffer pool for its personal purpose as well. The CONFIG_VIDEO_BUFFER_POOL_NUM_MAX default value is already 2. But the problem is this config can be changed accidentally / easily by the application and the application does not know that the DCMI driver needs an additional buffer.

A similar option which was discussed here is to use the system heap instead of the driver heap and extend it in the driver with CONFIG_HEAP_MEM_POOL_ADD_SIZE_XXX but if I remember well it does not add the additiobnal required size but specifies the whole size ?? (the discussion was a long time ago, I need to recall it).

@iabdalkader
Copy link
Contributor Author

so it does not mean that display driver should use the refactored VIDEO_HEAP_ALLOC macro (?).

Actually it should, because it reserves a chunk of memory that it may not use at all.

But the problem is this config can be changed accidentally / easily by the application and the application does not know that the DCMI driver needs an additional buffer.

I thought about this and have a better idea. What if video_stm32_dcmi.c dequeued a video buffer instead of allocating raw memory? The application allocates video buffers with video_buffer_aligned_alloc and enqueues them with video_enqueue. Then when video_stm32_dcmi_stream_start is called it will dequeue one video buffer and hold it until streaming is stopped. Note that the driver will fail to build if the MAX buffers is less than 2 for stm32 video.

Instead of reserving a static (possibly unaligned) buffer for DCMI,
this patch reserves and holds one of the video buffers to use as the
main DCMI buffer. This buffer will be aligned (using the alignment
specified in the config) and will either be allocated from `video_common`
pool or a shared multi-heap (if enabled).

Signed-off-by: Ibrahim Abdalkader <[email protected]>
@ngphibang
Copy link
Contributor

ngphibang commented Jan 24, 2025

so it does not mean that display driver should use the refactored VIDEO_HEAP_ALLOC macro (?).

Actually it should, because it reserves a chunk of memory that it may not use at all.

I think you misunderstood my point here. I mean the display driver could use a common macro like HEAP_ALLOC() defined system wide (to save some line of codes) but it should not include video_common.h and use VIDEO_HEAP_ALLOC() because VIDEO_HEAP_ALLOC() is for video subsystem.

To be clearer, take a look at the signature of the newly refactored VIDEO_HEAP_ALLOC:

#define VIDEO_HEAP_ALLOC(buffer_pool, align, size, timeout)

there is nothing specific to video anymore, so that's why it should be HEAP_ALLOC(buffer_pool, align, size, timeout) defined system wide so that any video / display / others drivers could use it.

@iabdalkader
Copy link
Contributor Author

iabdalkader commented Jan 24, 2025

there is nothing specific to video anymore, so that's why it should be HEAP_ALLOC(buffer_pool, align, size, timeout) defined system wide so that any video / display / others drivers could use it.

I see. Well I just need to solve my immediate problem right now: Use SMH if enabled. I imagine adding a general purpose macro/public API will require reviews docs, tests etc..

Please see the updated commit, if it's not good for any reason I'll duplicate the macro in video_stm32_dcmi.c.

@ngphibang
Copy link
Contributor

Thanks. I need some time to understand the context / purpose of the additional buffer in the DCMI driver (I missed the review of this driver). Otherwise, @CharlesDias may have a better look.

@iabdalkader
Copy link
Contributor Author

iabdalkader commented Jan 24, 2025

I need some time to understand the context / purpose of the additional buffer in the DCMI driver

No worries. FWIW, there's no reason not to use the FIFO to pass buffers between driver application without memcpy (i.e., put back the buffer after capture, and get another one). However, if the user/application does not dequeue, process and enqueue fast enough the FIFO on DCMI side will underrun and will have to stop streaming, and possibly restart later. This, I believe, is the only reason for the additional buffer. Is it fixable? Probably yes, but again just trying to fix my immediate problem :)

@josuah
Copy link
Collaborator

josuah commented Jan 24, 2025

Thanks all for this in-depth study!

One way to fix this is to require a minimum of two buffers for this driver, but I don't know how to do that, or if it's even possible, is it?

It is possible to declare it, although, not taken advantage of through the samples or API:

uint8_t min_vbuf_count;

maybe don't need to define the macro, just #if [...] shared_multi_heap_aligned_alloc(); #else k_heap_alloc();

That sounds good, it would be a more lightweight intermediate solution for a more general fix coming later.

What if video_stm32_dcmi.c dequeued a video buffer instead of allocating raw memory? The application allocates video buffers with video_buffer_aligned_alloc and enqueues them with video_enqueue.

That is how most video drivers work it seems, and if it is possible to make DCMI work that way, that avoids the problem of local video memory allocation in the driver.

context / purpose of the additional buffer in the DCMI driver

The way the buffers are currently loaded/unloaded:

HAL_DCMI_Suspend(hdcmi);
vbuf = k_fifo_get(&dev_data->fifo_in, K_NO_WAIT);
if (vbuf == NULL) {
LOG_DBG("Failed to get buffer from fifo");
goto resume;
}
vbuf->timestamp = k_uptime_get_32();
memcpy(vbuf->buffer, dev_data->buffer, vbuf->bytesused);
k_fifo_put(&dev_data->fifo_out, vbuf);
resume:
HAL_DCMI_Resume(hdcmi);

I do not see any API call that allows to update that pData to the buffer used for I/O:

https://github.com/zephyrproject-rtos/hal_stm32/blob/1d1f81866ccbaa6e84e9960ed763e005d1e45560/stm32cube/stm32f7xx/drivers/include/stm32f7xx_hal_dcmi.h#L559

HAL_StatusTypeDef HAL_DCMI_Start_DMA(DCMI_HandleTypeDef *hdcmi, uint32_t DCMI_Mode, uint32_t pData, uint32_t Length);

Maybe the hardware does not allow to update the buffer without stopping/starting the engine?

@iabdalkader
Copy link
Contributor Author

iabdalkader commented Jan 24, 2025

That is how most video drivers work it seems, and if it is possible to make DCMI work that way, that avoids the problem of local video memory allocation in the driver.

I've already implemented this and updated the PR, no local video buffer allocation but still uses memcpy as before, this has not changed.

Maybe the hardware does not allow to update the buffer without stopping/starting the engine?

The DMA does allow updating the target address when not in use, in general or just in double buffer mode can't remember, but either way this is a different issue, which I'm not trying to fix. However, this PR is a step in the right direction: you need to use video buffers before you can remove the memcpy anyway.

Copy link
Collaborator

@josuah josuah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed the update. Very good first step indeed!
I prefer it over what I originally suggested (previous LGTM) as I did not think of this idea.
+1 for my part!

@CharlesDias
Copy link
Contributor

Hi, @iabdalkader. Thank you for your contribution! This is a great improvement! :)

@ngphibang and @josuah, is there another way to release the video buffers besides using the video_buffer_release? I'm wondering if there's a possibility for the video buffer to be released while the DCMI driver is running. I believe this isn't possible under normal usage conditions.

Additionally, I tested the capture_to_lvgl sample on MiniSTM32H743 and still worked.
It has my +1!

@iabdalkader iabdalkader changed the title drivers: video: video_stm32_dcmi: Use shared multi-heap if enabled. drivers: video: video_stm32_dcmi: Use video buffers for DCMI buffer. Jan 25, 2025
@josuah
Copy link
Collaborator

josuah commented Jan 25, 2025

Thank you for testing it!

is there another way to release the video buffers besides using the video_buffer_release?

If the application decides to use a vbuf after it was enqueued(), and before it was dequeued(), AFAIU, it is an undefined behavior (as the buffer is expected to be processed by the driver). It is expected to forget a vbuf pointer as soon as it is passed to enqueue().

AFAICT, the current APIs give the driver the freedom of how many buffers it holds and in which order they are released.

void video_buffer_release(struct video_buffer *vbuf)

The video buffer variables are not exported, there is no way to access it from the outside other than the provided functions.

Copy link
Contributor

@ngphibang ngphibang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CharlesDias @iabdalkader I believe the memcpy() is not necessary and it does reduce the performance of the camera pipeline. To avoid this software copy, the DCMI driver (zephyr driver and / or the HAL driver) could implement a double buffering mechanism (similar to the NXP CSI drivers) so that when the DCMI dma-ing the image data to one buffer, the application can use another buffer for displaying.

But well, this is the problem of the original code. The current PR does not affect this issue and LGTM.

@kartben kartben merged commit b58671f into zephyrproject-rtos:main Jan 28, 2025
26 checks passed
@iabdalkader iabdalkader deleted the stm32_video_smh branch January 28, 2025 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Video Video subsystem platform: STM32 ST Micro STM32
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants