Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Filestore] failing mount-kikimr-test filestore #2831

Open
debnatkh opened this issue Jan 10, 2025 · 1 comment
Open

[Filestore] failing mount-kikimr-test filestore #2831

debnatkh opened this issue Jan 10, 2025 · 1 comment
Assignees
Labels
filestore Add this label to run only cloud/filestore build and tests on PR

Comments

@debnatkh
Copy link
Collaborator

Failures are seen here: https://github-actions-s3.storage.eu-north1.nebius.cloud/ydb-platform/nbs/Nightly-build/12701217490/1/nebius-x86-64/summary/ya-test.html#FAIL and here: https://github-actions-s3.storage.eu-north1.nebius.cloud/ydb-platform/nbs/PR-check/12707781055/1/nebius-x86-64/summary/ya-test.html#FAIL

Profile log looks like this:

1970-01-01T00:48:01.004000Z		Ping	0.000721s	S_OK	{no_info}
....
2025-01-10T11:22:53.728589Z	nfs_share	DestroyHandle	0.004081s	S_OK	{node_id=13, handle=12695884864272595}
2025-01-10T11:22:53.728589Z	nfs_share	GetNodeAttr	0.004081s	E_FS_NOENT	{parent_node_id=12, node_name=randrw_4K_16_fdatasync_8_unlink_end-fsync.0.0, flags=0, mode=0, node_id=0, handle=0, size=0}
1970-01-01T00:51:05.944000Z	nfs_share	PingSession	0.000236s	S_OK	{no_info}
1970-01-01T00:51:06.945000Z	nfs_share	PingSession	0.000348s	S_OK	{no_info}
...
1970-01-01T01:48:33.574000Z	nfs_share	PingSession	0.000314s	S_OK	{no_info}
1970-01-01T01:48:33.954000Z	nfs_share	ResetSession	0.002126s	S_OK	{no_info}
1970-01-01T01:48:33.961000Z	nfs_share	DestroySession	0.001517s	S_OK	{no_info}

Filestore-server logs look like this:

2025-01-10T11:22:53.733783Z :NFS_TABLET DEBUG: [f:nfs_share][t:72075186224037888] GetNodeAttr: #391610 completed (E_FS_NOENT invalid target node #12 "randrw_4K_16_fdatasync_8_unlink_end-fsync.0.0")
2025-01-10T11:22:53.733809Z :NFS_SERVICE DEBUG: #391608 completed DestroyHandle (S_OK)
2025-01-10T11:22:53.733843Z :NFS_SERVICE DEBUG: #391610 completed GetNodeAttr (E_FS_NOENT invalid target node #12 "randrw_4K_16_fdatasync_8_unlink_end-fsync.0.0", request-id: 391610)
2025-01-10T11:22:53.902981Z :NFS_SERVICE DEBUG: CpuWait stats: lack = 0; interval = 1001526; wait = 0
2025-01-10T11:22:54.904195Z :NFS_SERVICE DEBUG: CpuWait stats: lack = 0; interval = 1001470; wait = 0
2025-01-10T11:22:55.905636Z :NFS_SERVICE DEBUG: CpuWait stats: lack = 0; interval = 1001404; wait = 0
...
2025-01-10T12:20:20.969308Z :NFS_SERVICE DEBUG: CpuWait stats: lack = 0; interval = 1001425; wait = 0
@debnatkh
Copy link
Collaborator Author

https://github-actions-s3.storage.eu-north1.nebius.cloud/ydb-platform/nbs/PR-check/12866769509/1/nebius-x86-64/test_data/actions-runner/_work/nbs/nbs/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/test.py.test_fio.randrw_4K_16_fsync_16.log

Dumping fio ps aux outputs produces following:

==== Start of fio ====
2025-01-20 11:58:14,243 - INFO - root - monitor_fio_progress: Fio process is still running with PIDs: [713159]
PID    PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
github    713159  713068  0 37644  5452  21 11:58 ?        00:00:00 /home/github/.ya/build/build_root/y3gp/0010da/cloud/storage/core/tools/testing/fio/bin/fio --name randrw_4K_16_fsync_16 --rw randrw --size 104857600 --bs 4096 --buffer_compress_percentage 90 --runtime 30 --time_based --output-format json --verify md5 --verify_backlog 8192 --serialize_overlap 1 --verify_dump 1 --direct 1 --ioengine libaio --iodepth 16 --directory /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/nfs_mount/randrw_4K_16_fsync_16 --numjobs 1 --write_iolog /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.iolog --log_issue_time 1 --write_lat_log /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.lat.log --write_bw_log /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.bw.log --log_offset 1 --fsync 16
==== Main fio process (713159) spawned a child ====
2025-01-20 11:58:14,387 - INFO - root - monitor_fio_progress: Fio process is still running with PIDs: [713159, 713212]
UID          PID    PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
github    713159  713068  0 37644  5452  18 11:58 ?        00:00:00 /home/github/.ya/build/build_root/y3gp/0010da/cloud/storage/core/tools/testing/fio/bin/fio --name randrw_4K_16_fsync_16 --rw randrw --size 104857600 --bs 4096 --buffer_compress_percentage 90 --runtime 30 --time_based --output-format json --verify md5 --verify_backlog 8192 --serialize_overlap 1 --verify_dump 1 --direct 1 --ioengine libaio --iodepth 16 --directory /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/nfs_mount/randrw_4K_16_fsync_16 --numjobs 1 --write_iolog /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.iolog --log_issue_time 1 --write_lat_log /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.lat.log --write_bw_log /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.bw.log --log_offset 1 --fsync 16
github    713212  713159  0 37687  2772  65 11:58 ?        00:00:00 /home/github/.ya/build/build_root/y3gp/0010da/cloud/storage/core/tools/testing/fio/bin/fio --name randrw_4K_16_fsync_16 --rw randrw --size 104857600 --bs 4096 --buffer_compress_percentage 90 --runtime 30 --time_based --output-format json --verify md5 --verify_backlog 8192 --serialize_overlap 1 --verify_dump 1 --direct 1 --ioengine libaio --iodepth 16 --directory /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/nfs_mount/randrw_4K_16_fsync_16 --numjobs 1 --write_iolog /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.iolog --log_issue_time 1 --write_lat_log /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.lat.log --write_bw_log /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.bw.log --log_offset 1 --fsync 16
...
==== 30 seconds later ====
2025-01-20 11:58:44,448 - INFO - root - monitor_fio_progress: Fio process is still running with PIDs: [713159, 713212]
UID          PID    PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
github    713159  713068  0 54028  5456  48 11:58 ?        00:00:00 /home/github/.ya/build/build_root/y3gp/0010da/cloud/storage/core/tools/testing/fio/bin/fio --name randrw_4K_16_fsync_16 --rw randrw --size 104857600 --bs 4096 --buffer_compress_percentage 90 --runtime 30 --time_based --output-format json --verify md5 --verify_backlog 8192 --serialize_overlap 1 --verify_dump 1 --direct 1 --ioengine libaio --iodepth 16 --directory /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/nfs_mount/randrw_4K_16_fsync_16 --numjobs 1 --write_iolog /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.iolog --log_issue_time 1 --write_lat_log /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.lat.log --write_bw_log /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.bw.log --log_offset 1 --fsync 16
github    713212  713159  2 38891  5776  13 11:58 ?        00:00:00 /home/github/.ya/build/build_root/y3gp/0010da/cloud/storage/core/tools/testing/fio/bin/fio --name randrw_4K_16_fsync_16 --rw randrw --size 104857600 --bs 4096 --buffer_compress_percentage 90 --runtime 30 --time_based --output-format json --verify md5 --verify_backlog 8192 --serialize_overlap 1 --verify_dump 1 --direct 1 --ioengine libaio --iodepth 16 --directory /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/nfs_mount/randrw_4K_16_fsync_16 --numjobs 1 --write_iolog /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.iolog --log_issue_time 1 --write_lat_log /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.lat.log --write_bw_log /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.bw.log --log_offset 1 --fsync 16
==== Child process died, main keeps working ====
2025-01-20 11:58:44,763 - INFO - root - monitor_fio_progress: Fio process is still running with PIDs: [713159]
UID          PID    PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
github    713159  713068  0 54028  5456  54 11:58 ?        00:00:00 /home/github/.ya/build/build_root/y3gp/0010da/cloud/storage/core/tools/testing/fio/bin/fio --name randrw_4K_16_fsync_16 --rw randrw --size 104857600 --bs 4096 --buffer_compress_percentage 90 --runtime 30 --time_based --output-format json --verify md5 --verify_backlog 8192 --serialize_overlap 1 --verify_dump 1 --direct 1 --ioengine libaio --iodepth 16 --directory /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/nfs_mount/randrw_4K_16_fsync_16 --numjobs 1 --write_iolog /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.iolog --log_issue_time 1 --write_lat_log /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.lat.log --write_bw_log /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.bw.log --log_offset 1 --fsync 16
...
==== This process hangs for 10 minutes ====
2025-01-20 12:08:43,322 - INFO - root - monitor_fio_progress: Fio process is still running with PIDs: [713159]
UID          PID    PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
github    713159  713068  0 54073  5648  54 11:58 ?        00:00:00 /home/github/.ya/build/build_root/y3gp/0010da/cloud/storage/core/tools/testing/fio/bin/fio --name randrw_4K_16_fsync_16 --rw randrw --size 104857600 --bs 4096 --buffer_compress_percentage 90 --runtime 30 --time_based --output-format json --verify md5 --verify_backlog 8192 --serialize_overlap 1 --verify_dump 1 --direct 1 --ioengine libaio --iodepth 16 --directory /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/nfs_mount/randrw_4K_16_fsync_16 --numjobs 1 --write_iolog /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.iolog --log_issue_time 1 --write_lat_log /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.lat.log --write_bw_log /home/github/.ya/build/build_root/y3gp/0010da/cloud/filestore/tests/fio_index/mount-kikimr-test/test-results/py3test/chunk1/testing_out_stuff/randrw_4K_16_fsync_16.bw.log --log_offset 1 --fsync 16

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
filestore Add this label to run only cloud/filestore build and tests on PR
Projects
None yet
Development

No branches or pull requests

1 participant