Bluetooth: Controller: Fix HCI command buffer allocation failure #83774

Open: cvinayak wants to merge 1 commit into main from github_hci_cmd_buf_alloc_check

Conversation

cvinayak
Contributor

@cvinayak cvinayak commented Jan 10, 2025

Fix an HCI command buffer allocation failure that can cause the loss of a Host Number of Completed Packets command.

Fail by rejecting the HCI Host Buffer Size command if the required number of HCI command buffers is not allocated in the Controller implementation.

Relates to commit 8161430 ("Bluetooth: Add workaround
for no command buffer available").

Relates to commit 297f4f4 ("Bluetooth: Split HCI
command & event buffers to two pools").

Relates to #81866

One of many CI failures exposing the incorrect allocation:

tests/bsim/bluetooth/host/l2cap/einprogress/test_scripts/run.sh FAILED (0.838 s)
d_01: @00:00:00.000000  *** Booting Zephyr OS build v4.0.0-3133-ge9f6c00b05bc ***
d_01: @00:00:00.000000 INFO: Test start: tester
d_00: @00:00:00.000000  *** Booting Zephyr OS build v4.0.0-3133-ge9f6c00b05bc ***
d_00: @00:00:00.000000 INFO: Test start: dut
d_01: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_01: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_01: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_01: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_01: @00:00:00.005768  [00:00:00.005,767] <err> [sysworkq] bt_ctlr_hci: FC: Require Host ACL packets (2) < CONFIG_BT_BUF_CMD_TX_COUNT (2)
d_01: @00:00:00.005768  [00:00:00.005,767] <wrn> [main] bt_hci_core: opcode 0x0c33 status 0x12 
d_01: @00:00:00.005768  ASSERTION FAIL [!err] @ WEST_TOPDIR/zephyr/tests/bsim/bluetooth/host/l2cap/einprogress/src/tester.c:51
d_01: @00:00:00.005768  @ WEST_TOPDIR/zephyr/lib/os/assert.c:43
d_01: @00:00:00.005768  [00:00:00.005,767] <err> [main] os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
d_01: @00:00:00.005768  [00:00:00.005,767] <err> [main] os: Current thread: 0x826bc40 (main)
d_01: @00:00:00.005768  [00:00:00.005,767] <err> [main] os: Halting system
d_00: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_00: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_00: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_00: @00:00:00.005768  [00:00:00.005,767] <wrn> [sysworkq] net_buf: Timeout discarded. No blocking in syswq
d_00: @00:00:00.005768  [00:00:00.005,767] <err> [sysworkq] bt_ctlr_hci: FC: Require Host ACL packets (2) < CONFIG_BT_BUF_CMD_TX_COUNT (2)
d_00: @00:00:00.005768  [00:00:00.005,767] <wrn> [main] bt_hci_core: opcode 0x0c33 status 0x12 
d_00: @00:00:00.005768  ASSERTION FAIL [!err] @ WEST_TOPDIR/zephyr/tests/bsim/bluetooth/host/l2cap/einprogress/src/dut.c:108
d_00: @00:00:00.005768  @ WEST_TOPDIR/zephyr/lib/os/assert.c:43
d_00: @00:00:00.005768  [00:00:00.005,767] <err> [main] os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
d_00: @00:00:00.005768  [00:00:00.005,767] <err> [main] os: Current thread: 0x826bc40 (main)
d_00: @00:00:00.005768  [00:00:00.005,767] <err> [main] os: Halting system
p_2G4: WARNING: (src/bs_pc_base.c:437): Device 0 left the party unsuspectingly.. I treat it as if it disconnected
timeout: the monitored command dumped core
/__w/zephyr/zephyr/tests/bsim/sh_common.source: line 22: 413863 Trace/breakpoint trap   $@
p_2G4: WARNING: (src/bs_pc_base.c:437): Device 1 left the party unsuspectingly.. I treat it as if it disconnected
timeout: the monitored command dumped core
/__w/zephyr/zephyr/tests/bsim/sh_common.source: line 22: 413864 Trace/breakpoint trap   $@
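
For readers decoding the `opcode 0x0c33 status 0x12` warning in the log above: an HCI opcode packs a 6-bit OGF above a 10-bit OCF, so 0x0c33 is OGF 0x03 (Controller & Baseband), OCF 0x0033, which is HCI Host Buffer Size, and status 0x12 is Invalid HCI Command Parameters. A minimal sketch of the split follows; the helper names are hypothetical and are not taken from the Zephyr tree.

```c
#include <stdint.h>

/* HCI opcode layout: upper 6 bits are the OGF, lower 10 bits are the OCF.
 * Helper names are illustrative only.
 */
uint8_t demo_hci_opcode_ogf(uint16_t opcode)
{
	return (uint8_t)(opcode >> 10);
}

uint16_t demo_hci_opcode_ocf(uint16_t opcode)
{
	return (uint16_t)(opcode & 0x03ffU);
}
```

In other words, the assertion in the test fires because the Controller rejected the Host Buffer Size command during Host initialization.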

subsys/bluetooth/common/Kconfig (resolved)
subsys/bluetooth/common/Kconfig (resolved)
subsys/bluetooth/controller/hci/hci.c (outdated, resolved)
Comment on lines +193 to +200
HCI Controllers may not support Num_HCI_Command_Packets > 1, hence
default to 1 when not enabling Controller to Host data flow control,
BT_HCI_ACL_FLOW_CONTROL.
Member

This paragraph isn't completely clear to me. Even if a controller supports ncmd > 1, the Zephyr host has a command queue and will always round down the value to 1, or rather, treat it as a boolean:

/* Allow next command to be sent */
if (ncmd) {
	k_sem_give(&bt_dev.ncmd_sem);
	bt_tx_irq_raise();
}

The other help text paragraphs sort of make sense, but I just wanted to make sure the above host implementation detail is clear, so that this PR isn't making any false assumptions (since I haven't quite wrapped my head around all the details of it).

Contributor Author

Yes, Zephyr Host only implements ncmd = 1. See,

/* Give cmd_sem allowing to send first HCI_Reset cmd, the only
 * exception is if the controller requests to wait for an
 * initial Command Complete for NOP.
 */
if (!IS_ENABLED(CONFIG_BT_WAIT_NOP)) {
	k_sem_init(&bt_dev.ncmd_sem, 1, 1);
} else {
	k_sem_init(&bt_dev.ncmd_sem, 0, 1);
}

I guess it would not be difficult to use the same semaphore to support ncmd > 1, but the native Controller does not support that today.

Contributor Author

> the Zephyr host has a command queue and will always round down the value to 1, or rather, treat it as a boolean:

@jhedberg I believe the implementation "can" support ncmd > 1. Today, ncmd in the event is only used as a bool to pause the queuing, but ncmd_sem, if initialized correctly, would support ncmd > 1. We are not going towards that in this PR.
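
To make the distinction concrete, here is a small standalone model of the two credit policies discussed in this thread. It is only a sketch: `struct demo_sem` and every `demo_*` function are hypothetical stand-ins for `bt_dev.ncmd_sem` and the Host code, not Zephyr APIs, and unlike a real kernel semaphore the toy init simply clamps the initial count to the limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy counting semaphore standing in for bt_dev.ncmd_sem (hypothetical). */
struct demo_sem {
	unsigned int count;
	unsigned int limit;
};

void demo_sem_init(struct demo_sem *sem, unsigned int initial,
		   unsigned int limit)
{
	sem->limit = limit;
	/* Clamp instead of failing; a simplification of this sketch. */
	sem->count = (initial > limit) ? limit : initial;
}

void demo_sem_give(struct demo_sem *sem)
{
	if (sem->count < sem->limit) {
		sem->count++;
	}
}

/* Non-blocking take: returns false when no command credit is available. */
bool demo_sem_take(struct demo_sem *sem)
{
	if (sem->count == 0U) {
		return false;
	}
	sem->count--;
	return true;
}

/* Policy described above: ncmd is treated as a boolean, so at most one
 * credit is returned per event.
 */
void demo_credit_bool(struct demo_sem *sem, uint8_t ncmd)
{
	if (ncmd) {
		demo_sem_give(sem);
	}
}

/* Hypothetical ncmd > 1 policy: the event's ncmd becomes the number of
 * commands that may be outstanding, capped by the semaphore limit.
 */
void demo_credit_count(struct demo_sem *sem, uint8_t ncmd)
{
	sem->count = (ncmd > sem->limit) ? sem->limit : ncmd;
}
```

With a limit of 1, both policies behave the same, which matches the current `k_sem_init(&bt_dev.ncmd_sem, 1, 1)` initialization; only a larger limit would let the second policy queue more than one command.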

@cvinayak cvinayak requested a review from KyraLengfeld January 10, 2025 10:01
@cvinayak cvinayak force-pushed the github_hci_cmd_buf_alloc_check branch from 7c899ff to 8369ca2 Compare January 10, 2025 14:24
@zephyrbot zephyrbot added the area: Samples, platform: nRF, and area: Bluetooth HCI labels Jan 10, 2025
@cvinayak cvinayak force-pushed the github_hci_cmd_buf_alloc_check branch from 8369ca2 to 99f1d09 Compare January 10, 2025 14:52
subsys/bluetooth/common/Kconfig (resolved)
Comment on lines 129 to 136
config BT_CTLR_HCI_CMD_TX_COUNT
	# Hidden Controller implementation supported Num_HCI_Command_Packets value
	int
	depends on BT_CTLR_HCI
	default 1
Collaborator

Why can't we use BT_BUF_CMD_TX_COUNT instead?

Contributor Author

I should rename this to BT_CTLR_HCI_NUM_CMD_PKT_MAX, right? This is the Num_HCI_Command_Packets count, 1, that the Controller supports for normal HCI command flow control.

BT_BUF_CMD_TX_COUNT will be the buffer count calculated with or without flow control support.

Collaborator

A promptless Kconfig with a static value doesn't really add anything (i.e. it is not configurable). It could just as well be a #define or something.

Contributor Author

The derived value BT_BUF_CMD_TX_COUNT in hci_common_internal.h is needed by hci_raw.c (and by a Controller implementation's HCI driver glue), which will be used with other vendor Controllers (source or binary) in Controller-only builds (hci_ipc, hci_uart, etc.). I do not want to hard-code it with a #define, because that would require hci_raw.c to include numerous vendor Controllers' header files. Kconfig is a cleaner way to fetch a Controller's implementation specifics (just like the numerous _SUPPORT Kconfigs we already have).
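
As a rough illustration of that argument, a shared header can derive the command buffer count purely from `CONFIG_`-style symbols, without including any vendor header. Everything below is an assumption made for the sketch: the `CONFIG_DEMO_*` names and the fallback values are invented, and the actual derivation in hci_common_internal.h may differ.

```c
/* Stand-ins for symbols the build system would generate from Kconfig.
 * The fallbacks exist only so this sketch compiles on its own.
 */
#ifndef CONFIG_DEMO_CTLR_HCI_NUM_CMD_PKT_MAX
#define CONFIG_DEMO_CTLR_HCI_NUM_CMD_PKT_MAX 1
#endif

#ifndef CONFIG_DEMO_BUF_ACL_RX_COUNT
#define CONFIG_DEMO_BUF_ACL_RX_COUNT 2
#endif

/* Derived count: with Controller to Host flow control, reserve one command
 * buffer per Host ACL buffer plus the normal command credits; otherwise
 * only the normal command credits are needed.
 */
#if defined(CONFIG_DEMO_HCI_ACL_FLOW_CONTROL)
#define DEMO_BUF_CMD_TX_COUNT \
	(CONFIG_DEMO_BUF_ACL_RX_COUNT + CONFIG_DEMO_CTLR_HCI_NUM_CMD_PKT_MAX)
#else
#define DEMO_BUF_CMD_TX_COUNT CONFIG_DEMO_CTLR_HCI_NUM_CMD_PKT_MAX
#endif

_Static_assert(DEMO_BUF_CMD_TX_COUNT >= 1,
	       "at least one HCI command buffer is required");

/* Accessor so the derived value can be inspected at run time. */
int demo_cmd_tx_count(void)
{
	return DEMO_BUF_CMD_TX_COUNT;
}
```

A vendor Controller would then only need to set its own default for the hidden symbol; files such as hci_raw.c keep including a single common header.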

Fix HCI command buffer allocation failure, that can cause
loss of Host Number of Completed Packets command.

Fail by rejecting the HCI Host Buffer Size command if the
required number of HCI command buffers are not allocated in
the Controller implementation.

When Controller to Host data flow control is supported in
the Controller only build, ensure that BT_BUF_CMD_TX_COUNT
is greater than or equal to (BT_BUF_ACL_RX_COUNT + Ncmd),
where Ncmd is supported maximum Num_HCI_Command_Packets in
the Controller implementation.

Relates to commit 8161430 ("Bluetooth: Add workaround
for no command buffer available").

Relates to commit 297f4f4 ("Bluetooth: Split HCI
command & event buffers to two pools").

Signed-off-by: Vinayak Kariappa Chettimada <[email protected]>
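
The condition stated in the commit message can be sketched as a small check. This is not the code from hci.c: the function and macro names are hypothetical, and the rejection status 0x12 (Invalid HCI Command Parameters) is taken from the CI log in the description rather than from the patch itself.

```c
#include <stdint.h>

/* Status codes as seen in the CI log; macro names are illustrative only. */
#define DEMO_HCI_STATUS_SUCCESS       0x00
#define DEMO_HCI_STATUS_INVALID_PARAM 0x12

/* Accept Host Buffer Size only if the command buffer pool can hold one
 * Host Number of Completed Packets command per ACL RX buffer in addition
 * to the normal command credits:
 *   cmd_tx_count >= acl_rx_count + ncmd_max
 */
uint8_t demo_host_buffer_size_status(uint16_t cmd_tx_count,
				     uint16_t acl_rx_count,
				     uint8_t ncmd_max)
{
	uint32_t required = (uint32_t)acl_rx_count + ncmd_max;

	return (cmd_tx_count >= required) ? DEMO_HCI_STATUS_SUCCESS
					  : DEMO_HCI_STATUS_INVALID_PARAM;
}
```

For the failing CI configuration (2 command buffers, 2 ACL buffers, Ncmd of 1) the requirement is 3, so the command is rejected; one more command buffer would satisfy it.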
@cvinayak cvinayak force-pushed the github_hci_cmd_buf_alloc_check branch from 99f1d09 to 24ce6d9 Compare January 10, 2025 18:54