-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DSP Ops tester - Take 3 #4874
DSP Ops tester - Take 3 #4874
Conversation
61b5f1f
to
e15cf2c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not on-board with the directions, the simple "echo 1 > boot" or echo 3 > dsp_state is too simple for scripting. These transitions take time, so there's no way for a script to know how long it needs to wait and if the transition was successful.
I would recommend writing a 2-page summary of what we want to achieve, requirements included, otherwise we'll have lots of code review without first agreeing on what we want to achieve.
sound/soc/sof/core.c
Outdated
if (sof_debug_check_flag(SOF_DBG_DSPLESS_MODE)) { | ||
/* select DSPless mode only when DSP OPS testing is not in use */ | ||
if (sof_debug_check_flag(SOF_DBG_DSPLESS_MODE) && | ||
!IS_ENABLED(CONFIG_SND_SOC_SOF_DEBUG_DSP_OPS_TEST)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put this DSP_OPS_TEST as a new bitfield in sof_debug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document the DSPless VS ops_test mode and print if the DSPless mode is ignored due to the ops_test taking precedence.
Or should DSPless mode be the one which should have the higher priority?
sound/soc/sof/debug-dsp-ops.c
Outdated
snd_sof_fw_unload(sdev); | ||
|
||
/* free trace */ | ||
sof_fw_trace_free(sdev); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't seem related to the boot but more a shutdown...
sound/soc/sof/debug-dsp-ops.c
Outdated
sof_fw_trace_free(sdev); | ||
|
||
/* power up the DSP */ | ||
ret = snd_sof_dsp_runtime_resume(sdev); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes. power up the DSP and dsp_runtime_resume doesn't seem like the same thing at all to me.
sound/soc/sof/debug-dsp-ops.c
Outdated
if (ret < 0) | ||
return ret; | ||
|
||
return sof_dsp_dsp_ops_create_dfse(sdev, "boot_fw", dsp_ops_debugfs, 0222); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No control on whether we boot from IMR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always a fresh boot with the boot option. I will need to add another option to do IMR boot later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IMR boot is the target, I don't think anyone is interested in a fresh boot, historically this created all kinds of issues in stress tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, fixed now to do IMR boot as default
sound/soc/sof/debug-dsp-ops.c
Outdated
if (ret < 0) | ||
return ret; | ||
|
||
return sof_dsp_dsp_ops_create_dfse(sdev, "dsp_power_state", dsp_ops_debugfs, 0666); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem I have with this API is that it's not quite clear how a script would know that the dsp_power_state has been reached.
How long should the script wait and does it get any error codes?
This seems to simple to be useful IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plbossart i have exposed both the current fw boot state and the dsp power state that the script can poll to see if the desired state has been reached. How long to poll something that we can set based on the current kernel poll wait times. As for error codes, Im afraid I have no way to pass that on to userspace but we have everything we need in the dmesg isnt it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's rather problematic if you don't have an error code for an API, and rely on something else to find if the routine worked or not... If you start fuzzing or doing all kinds of stress tests, you may have difficulties finding out when exactly the problem occured. it's just very hard to parse such messages. Exhibit A is the SOF CI, where we need to have a start marker and we stop on dev_err but that leaves lots of issues undetected...
It's also problematic if we have to duplicate in scripts the timeouts we already use for the kernel.
I do think we ought to work first on the design/architecture before going into the code review space. It's a very large effort we are starting with scripts that will have to be maintained. We need to do this right from the beginning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple comments not addressed, not sure if they were missed or github messes with the comments.
The ordering of the patches is weird, we should first have code moves then the addition of new functionality.
sound/soc/sof/Kconfig
Outdated
operations will be disabled when this option is selected. | ||
Say Y if you want to enable DSP ops testing. | ||
If unsure, select "N". | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make it a SND_SOC_SOF_DEBUG_DSP_OPS_TEST_ENABLE, to let distros opt-out, and then set the actual boolean with a kernel parameter.
it's too much work for everyone to have yet another build.
sound/soc/sof/debug-dsp-ops.c
Outdated
if (ret < 0) | ||
return ret; | ||
|
||
return sof_dsp_dsp_ops_create_dfse(sdev, "dsp_power_state", dsp_ops_debugfs, 0666); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's rather problematic if you don't have an error code for an API, and rely on something else to find if the routine worked or not... If you start fuzzing or doing all kinds of stress tests, you may have difficulties finding out when exactly the problem occured. it's just very hard to parse such messages. Exhibit A is the SOF CI, where we need to have a start marker and we stop on dev_err but that leaves lots of issues undetected...
It's also problematic if we have to duplicate in scripts the timeouts we already use for the kernel.
I do think we ought to work first on the design/architecture before going into the code review space. It's a very large effort we are starting with scripts that will have to be maintained. We need to do this right from the beginning.
to note:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ranj063, I'm still not sure about the use of this.
We will introduce a new flow which will be tested and it is different than what we normally use, and we draw conclusion based on the test results?
I also think that if this mode is selected then we should not load topology, not create sound card. I see this as a really dangerous feature which can introduce unexpected side-effects on the normal use of the stack.
sound/soc/sof/core.c
Outdated
if (sof_debug_check_flag(SOF_DBG_DSPLESS_MODE)) { | ||
/* select DSPless mode only when DSP OPS testing is not in use */ | ||
if (sof_debug_check_flag(SOF_DBG_DSPLESS_MODE) && | ||
!IS_ENABLED(CONFIG_SND_SOC_SOF_DEBUG_DSP_OPS_TEST)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document the DSPless VS ops_test mode and print if the DSPless mode is ignored due to the ops_test taking precedence.
Or should DSPless mode be the one which should have the higher priority?
sound/soc/sof/amd/acp-loader.c
Outdated
if (!fw_filename) { | ||
fw_filename = kasprintf(GFP_KERNEL, "%s/%s", | ||
plat_data->fw_filename_prefix, | ||
adata->fw_code_bin); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the fw_data_bin
? acp needs two files to be loaded...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh I missed this. I don't know how to handle this for ACP with 2 files. Let me fix the ACP loader to ignore the pass argument for now
if (ret < 0) | ||
return ret; | ||
|
||
return sof_dsp_dsp_ops_create_dfse(sdev, "free_trace", dsp_ops_debugfs, 0222); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do you start the trace after powering on the DSP? Usually we need to send IPC messages. Does the sof_resume()
will be called to take care of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our regular sequence for starting trace after boot is: run_firmware() -> trace_init().
So, in the test script, we'd do the same. boot the firmware and then call init_trace. And then do whatever else we want to do say for example load libaries, topologies etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share the test script? It is hard to visualize how this is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooohh, so that's why you haven't been reviewing the test script @ujfalusi ? Mere lack of marketing? The PR is here:
Yes, agreed. Thats what the mode does. There's no audio functionality when this mode is selected |
sound/soc/sof/intel/hda-dsp.c
Outdated
|
||
ret = chip->power_down_dsp(sdev); | ||
if (ret < 0) { | ||
dev_err(sdev->dev, "%s: failed to power down DSP during suspend\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the 'during suspend' is redundant, no?
Is it even correct in case we remove the drivers and the power off fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry not following the "is it correct?" comment? Are you asking if this is correct to call this during driver remove or are you asking is it OK to fail?
sound/soc/sof/debug-dsp-ops.c
Outdated
int ret; | ||
|
||
/* debugfs root directory for DSP ops debug */ | ||
dsp_ops_debugfs = debugfs_create_dir("dsp_ops", sdev->debugfs_root); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot put the 'dsp_ops' to anywhere to be honest, strange name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what would you suggest I call it instead of "dsp_ops"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"dsp_test" | "dsp_test_mode" | "test_mode" ?
Probably the "test_mode" is better as it is not really testing the DSP as such.
But I'm bad at naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will hopefully be used to find more bugs in the firmware/software than bugs in the DSP hardware! So how about fw_debug_ops
? Then you can all the now redundant fw_
prefixes in fw_debug_ops/fw_*
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok renamed to fw_debug_ops. But I will leave the fw_ prefix for now because there will also be tplg_filename/ path etc.
if (IS_ENABLED(CONFIG_SND_SOC_SOF_DEBUG_DSP_OPS_TEST)) | ||
fw_lib = kzalloc(sizeof(*fw_lib), GFP_KERNEL); | ||
else | ||
fw_lib = devm_kzalloc(sdev->dev, sizeof(*fw_lib), GFP_KERNEL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On module remove you are going to leak these, we rely on devm to handle these (see sof_ipc4_exit()
)
if (IS_ENABLED(CONFIG_SND_SOC_SOF_DEBUG_DSP_OPS_TEST)) | ||
fw_lib = kzalloc(sizeof(*fw_lib), GFP_KERNEL); | ||
else | ||
fw_lib = devm_kzalloc(sdev->dev, sizeof(*fw_lib), GFP_KERNEL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for IPC3, I was merely asking if you have checked how this works with ipc_type=0.
sound/soc/sof/debug-dsp-ops.c
Outdated
|
||
/* | ||
* Fix set_power_state to power up the DSP. | ||
* ATM this is taken care of during firmware boot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't get the meaning of this comment....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ranj063, it looks much cleaner now for sure.
I still have reservation on the usefulness of such interface. We test isolated and narrowed down cases which will never going to be possible to use in real use.
Probably it is beneficial, and I just cannot see that. The added code and complexity is minimal, which is a good sign.
sound/soc/sof/debug-dsp-ops.c
Outdated
int ret; | ||
|
||
/* debugfs root directory for DSP ops debug */ | ||
dsp_ops_debugfs = debugfs_create_dir("dsp_ops", sdev->debugfs_root); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"dsp_test" | "dsp_test_mode" | "test_mode" ?
Probably the "test_mode" is better as it is not really testing the DSP as such.
But I'm bad at naming.
sdev->test_profile.fw_name); | ||
|
||
/* load firmware */ | ||
ret = snd_sof_load_firmware(sdev, fw_filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we tell the user that what is tried is going to be a NOP and the previously loaded kernel is going to remain?
sound/soc/sof/sof-priv.h
Outdated
@@ -52,6 +52,9 @@ struct snd_sof_pcm_stream; | |||
* dump the message payload also | |||
*/ | |||
#define SOF_DBG_DSPLESS_MODE BIT(15) /* Do not initialize and use the DSP */ | |||
#define SOF_DBG_DSP_OPS_TEST_MODE BIT(16) /* Used for DSP ops testing. | |||
* No audio functionality |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and overrides the DSPless mode.
sound/soc/sof/debug-dsp-ops.c
Outdated
string = "PREPARE\n"; | ||
break; | ||
case SOF_FW_BOOT_IN_PROGRESS: | ||
string = "IN PROGRESS\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is "In progress"?
sound/soc/sof/debug-dsp-ops.c
Outdated
string = "READY OK\n"; | ||
break; | ||
case SOF_FW_BOOT_COMPLETE: | ||
string = "COMPLETE\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of like the numeric values as well (got used to them in the kernel log)...
Not sure if it makes sense to include it in the print (6: BOOT_COMPLETE
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
sound/soc/sof/debug-dsp-ops.c
Outdated
if (ret < 0) | ||
return ret; | ||
|
||
return sof_dsp_dsp_ops_create_dfse(sdev, "fw_state", dsp_ops_debugfs, 0444); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, what I mean is that we have this information already, but it is just returning the raw u32.
I don't know if it is used by anyone.
if (ret < 0) | ||
return ret; | ||
|
||
return sof_dsp_dsp_ops_create_dfse(sdev, "free_trace", dsp_ops_debugfs, 0222); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you share the test script? It is hard to visualize how this is used.
sound/soc/sof/core.c
Outdated
if (sof_debug_check_flag(SOF_DBG_DSPLESS_MODE)) { | ||
/* select DSPless mode only when DSP OPS testing is not in use */ | ||
if (sof_debug_check_flag(SOF_DBG_DSPLESS_MODE) && | ||
!sof_debug_check_flag(SOF_DBG_DSP_OPS_TEST_MODE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would really flip the logic and only take the DSP test mode when the user didn't explicitly bypass the DSP.
sound/soc/sof/sof-priv.h
Outdated
@@ -565,6 +568,12 @@ struct snd_sof_dev { | |||
*/ | |||
bool dspless_mode_selected; | |||
|
|||
/* | |||
* Flag set when DSP ops testing is enabled. When this is set normal audio functionality | |||
* including DSPless mode is disabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really believe this is only relevant IF the DSPless mode is NOT enabled. It's more logical to me...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I've flipped the logic to have DSPLESS mode take priority over ops testing
sdev->test_profile.fw_name); | ||
|
||
/* load firmware */ | ||
ret = snd_sof_load_firmware(sdev, fw_filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to be clear on what we mean by 'new firmware'. if someone overrides an existing firmware in /lib/firmware, they will face all kinds of caching issues. We cannot really add a new firmware with SSH and hope that it's used immediately, that's really asking for trouble.
The assumption should be that you test the firmware that was available at boot.
sound/soc/sof/debug-dsp-ops.c
Outdated
if (ret < 0) | ||
return ret; | ||
|
||
return sof_dsp_dsp_ops_create_dfse(sdev, "boot_fw", dsp_ops_debugfs, 0222); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IMR boot is the target, I don't think anyone is interested in a fresh boot, historically this created all kinds of issues in stress tests.
sound/soc/sof/debug-dsp-ops.c
Outdated
|
||
/* | ||
* Fix set_power_state to power up the DSP. | ||
* ATM this is taken care of during firmware boot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also unable to understand the comment.
Modify the signature of the load_firmware op to take the firmware filename as an argument. The firmware filename should be include the full path of the firmware file to be loaded. This is in preparation to add a debug op that will be used to boot a firmware with the filename passed from userspace. Signed-off-by: Ranjani Sridharan <[email protected]>
Before the first firmware boot, set the initial state of the DSP to D3. The DSP D3 state is not the same as the SOF PCI device D3. Instead, it represents that the DSP is in the powered off state. After firmware boot is completed successfully, set the state to D0 to indicate that the DSP has been powered on. Signed-off-by: Ranjani Sridharan <[email protected]>
This will allow for the set_power_state op to be used for switching the DSP to the D3 state. The logic for powering up the DSP when setting the state to D0 also needs to be moved and will be done in a follow up. But for now, this will allow stress testing firmware boot. Signed-off-by: Ranjani Sridharan <[email protected]>
(can't reply to the original anymore because Github does not like force pushes)
I don't understand this. AFAIK, the purpose of this PR is to debug boot issues. We know booting from IMR versus not have had different sets of issues in the past. So shouldn't this PR allow testing both? I don't known which default would be best but I feel like both should be possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok to me, I am just not sure what the next steps are?
@plbossart the first tiny next step is to have the fw boot script run with our CI to do the stress test. I also want to keep building the test script library to add more things like PM flows, DMA flows, topology loading, IPC etc |
Introduce a new debug option to enable testing of DSP operations. With this ioption enabled, normal audio operations will be disabled. The API for the testing interface will be introduced in the follow up patches. Signed-off-by: Ranjani Sridharan <[email protected]>
Add debugfs entries for testing DSP ops. This patch only adds the firmware filename and path entries to pass them from userspace. These will be saved in the newly added field for saving the test profile in struct snd_sof_dev. These will be used for testing DSP ops such as booting firmware and setting the DSP power state in the following patches. Signed-off-by: Ranjani Sridharan <[email protected]>
Add a new op to test firmware boot. Make sure to the DSP is powered off before attempting to boot a new firmware. Signed-off-by: Ranjani Sridharan <[email protected]>
Add a new op to test DSP power state changes. For now only D3 is supported. More states will be added in follow up patches. Signed-off-by: Ranjani Sridharan <[email protected]>
Add a read-only debugfs entry to read the current FW boot state. Signed-off-by: Ranjani Sridharan <[email protected]>
Unload any existing firmware and set the fw_state to PREPARE in preparation for a subsequent firmware boot. Signed-off-by: Ranjani Sridharan <[email protected]>
These will be used to debugging purposes when testing the different DSP ops. Signed-off-by: Ranjani Sridharan <[email protected]>
@plbossart just a minor change to add the fw filename and fw_path string lengths |
@ujfalusi any objection to merging this? This will help me build upon it and add more test cases. |
This PR adds the API for enabling testing the DSP ops in the driver for testing the FW boot flow, PM flow etc. It introduces a new kernel config to enable the testing debugfs interface. When enabled, normal audio operations will be disabled.
For now the following ops are supported for testing: