
Run archival/retrieval stress tests for LTS #83

Closed
phwissmann opened this issue Aug 12, 2024 · 1 comment · Fixed by #105 · May be fixed by #91
Labels
enhancement New feature or request

Comments

@phwissmann
Collaborator

phwissmann commented Aug 12, 2024

Description
Devise and implement (stress) test scenarios to understand the following:

  • How many failures occur among N retrievals of the same dataset over time?
  • How many failures occur among M archivals and retrievals of the same datasets?
  • How does the system behave with parallel writes and reads?
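All three questions can be answered with the same kind of harness: wrap one archival or retrieval in a callable, run it N (or M) times, and tally the outcomes, optionally with several runs in parallel. This is a minimal sketch, assuming a hypothetical `operation(i)` that raises an exception on failure; `run_trials` and `flaky_retrieval` are illustrative names, not part of the project.

```python
import concurrent.futures
from collections import Counter
from typing import Callable


def run_trials(operation: Callable[[int], None], n: int, max_parallel: int = 1) -> Counter:
    """Run operation(i) for i in range(n) and tally outcomes by type.

    max_parallel=1 gives sequential runs; larger values exercise parallel
    reads/writes. A failure is counted under its exception class name.
    """
    outcomes: Counter = Counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(operation, i) for i in range(n)]
        for fut in concurrent.futures.as_completed(futures):
            try:
                fut.result()  # re-raises any exception from the worker thread
                outcomes["ok"] += 1
            except Exception as exc:
                outcomes[type(exc).__name__] += 1
    return outcomes


def flaky_retrieval(i: int) -> None:
    """Hypothetical stand-in for a real retrieval: every 10th attempt fails."""
    if i % 10 == 0:
        raise RuntimeError(f"attempt {i} failed")


print(run_trials(flaky_retrieval, n=50, max_parallel=4))  # Counter({'ok': 45, 'RuntimeError': 5})
```

Grouping failures by exception type, rather than keeping a single counter, makes it easier to tell transient errors apart from systematic ones across a long run.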

Solution proposals

Repeated Retrieval of a single dataset

Procedure:

  1. Create a dataset

     | Setup             | Value  |
     |-------------------|--------|
     | Number of files   | 800    |
     | Size per file     | 200 MB |
     | Dataset size      | 160 GB |
     | Target block size | 50 GB  |

     Done: PID 11223344

  2. Archive once on the LTS test share
  3. Run scheduled retrievals against the LTS share
  • Scheduled every 4 h
  • Will be served from the landing zone at first; expected to be served from tape after 8 h or more
  • Open question: what is the retention time on the landing zone? The schedule needs to be adapted accordingly.
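As a sanity check on the setup table, 800 files × 200 MB = 160 GB, which with a 50 GB target block size means about 4 blocks. The sketch below assumes decimal units (1 GB = 1000 MB) and greedy packing of whole files into blocks; the real archiver may split or pack files differently, and `plan_blocks` is a hypothetical helper.

```python
def plan_blocks(n_files: int, file_size_mb: int, target_block_gb: int) -> list[int]:
    """Return the number of files in each block under greedy packing.

    Assumptions: decimal units (1 GB = 1000 MB), all files the same size,
    and files are never split across blocks.
    """
    target_mb = target_block_gb * 1000
    files_per_block = target_mb // file_size_mb
    if files_per_block == 0:
        raise ValueError("a single file exceeds the target block size")
    full, rest = divmod(n_files, files_per_block)
    return [files_per_block] * full + ([rest] if rest else [])


blocks = plan_blocks(n_files=800, file_size_mb=200, target_block_gb=50)
print(len(blocks), blocks)  # 4 [250, 250, 250, 50]
```

Under the same assumptions, each 10 GB dataset in the repeated-archival scenario (100 files × 100 MB) fits into a single block.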

Results

  • Archival succeeded in 2 h 48 min
  • A single retrieval (served from the landing zone, not from tape) succeeded in 52 min
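The two timings translate into average end-to-end throughput as follows. This is a sketch assuming decimal units (1 GB = 1000 MB); the figures include any queueing and scheduling overhead, so they are not raw tape or disk bandwidth. `throughput_mb_s` is a hypothetical helper.

```python
from datetime import timedelta


def throughput_mb_s(size_gb: float, duration: timedelta) -> float:
    """Average throughput in MB/s, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / duration.total_seconds()


archival = throughput_mb_s(160, timedelta(hours=2, minutes=48))  # 10080 s
retrieval = throughput_mb_s(160, timedelta(minutes=52))  # 3120 s
print(f"archival ~{archival:.1f} MB/s, retrieval ~{retrieval:.1f} MB/s")
# archival ~15.9 MB/s, retrieval ~51.3 MB/s
```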

Repeated Archival

Procedure:

  1. Create 30 datasets

     | Setup             | Value  |
     |-------------------|--------|
     | Number of files   | 100    |
     | Size per file     | 100 MB |
     | Dataset size      | 10 GB  |
     | Target block size | 50 GB  |
  2. Archive the datasets concurrently
  • Concurrency limit at the workpool level (4)
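A cap of 4 concurrent archivals can be illustrated with a semaphore, recording the peak number of jobs in flight so that a test run can confirm the cap was actually honored. This is a generic asyncio sketch, not the workpool's own configuration mechanism; `archive_dataset` is a hypothetical stand-in for one archival job.

```python
import asyncio

CONCURRENCY_LIMIT = 4  # mirrors the intended workpool limit


async def archive_dataset(idx: int) -> int:
    """Hypothetical stand-in for one archival job."""
    await asyncio.sleep(0.01)
    return idx


async def archive_all(n_datasets: int, limit: int = CONCURRENCY_LIMIT) -> tuple[list[int], int]:
    """Archive all datasets with at most `limit` in flight; return results and peak."""
    sem = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def guarded(idx: int) -> int:
        nonlocal running, peak
        async with sem:
            running += 1
            peak = max(peak, running)
            try:
                return await archive_dataset(idx)
            finally:
                running -= 1

    results = await asyncio.gather(*(guarded(i) for i in range(n_datasets)))
    return list(results), peak


done, peak = asyncio.run(archive_all(30))
print(len(done), peak)  # 30 4
```

Asserting on the observed peak, rather than only on the configured value, catches the case where a limit is intended but never applied.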

Result

  • No failures
  • The concurrency limit was not set in the workpool
  • The landing zone bucket was not cleaned up

Large dataset

Recommendation by Daniele: test with a single 1-2 TB dataset to uncover any issues

@phwissmann phwissmann added the enhancement New feature or request label Aug 12, 2024
@phwissmann phwissmann self-assigned this Aug 12, 2024
@phwissmann phwissmann changed the title from "Add stress test for LTS" to "Run archival/retrieval stress tests for LTS" Aug 14, 2024
@phwissmann phwissmann linked a pull request Aug 14, 2024 that will close this issue
@phwissmann
Collaborator Author

Additional info regarding landing zone handling, etc.:

The files of your test share are written to tape once 24 hours have passed since their last access_time; this copy job runs every day at 04:20.
At this point the "T-flag" is not set yet, as the file still has a copy on the landing_zone (on disk).
The delete script, which removes the copy from the landing_zone, runs every day at 13:45.
After that, the "T-flag" should be set.
The schedules for the copy and delete scripts differ between shares and can also change,
so they should not be hard-coded in your code.
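The timing rules above can be turned into a small calculation of when a file should reach tape and when its T-flag should appear, with the job times passed in as per-share configuration instead of being hard-coded. This is a sketch under stated assumptions: the file is not accessed again, a file is eligible once its age is at least 24 h at the moment the copy job runs, and each job handles every eligible file in a single run. `ShareSchedule` and `expected_t_flag` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class ShareSchedule:
    """Per-share job times; load these from configuration, not constants."""
    copy_at: time  # daily tape-copy job (04:20 for the test share)
    delete_at: time  # daily landing-zone cleanup job (13:45 for the test share)
    min_age: timedelta = timedelta(hours=24)  # required age since last access


def next_daily(after: datetime, at: time) -> datetime:
    """First occurrence of the daily time `at` that is >= `after`."""
    candidate = datetime.combine(after.date(), at)
    return candidate if candidate >= after else candidate + timedelta(days=1)


def expected_t_flag(last_access: datetime, sched: ShareSchedule) -> tuple[datetime, datetime]:
    """Return (copied_to_tape, t_flag_set) under the assumptions above."""
    copied = next_daily(last_access + sched.min_age, sched.copy_at)
    t_flag = next_daily(copied, sched.delete_at)
    return copied, t_flag


test_share = ShareSchedule(copy_at=time(4, 20), delete_at=time(13, 45))
copied, t_flag = expected_t_flag(datetime(2024, 8, 12, 10, 0), test_share)
print(copied, t_flag)  # 2024-08-14 04:20:00 2024-08-14 13:45:00
```

A file last accessed at 10:00 is not yet 24 h old at the next morning's copy run, so in this example it only reaches tape on the second morning.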

Please note that the "T-flag" disappears again after a file is copied back from tape,
as the file then has two copies: one on tape and one on disk (landing_zone).
The disk copy will be removed by the next Delete-Job (Cleanup Job), and the "T-flag" will then show up again.
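The T-flag behaviour described in this comment amounts to a three-state lifecycle, which a retrieval test could track to predict whether the next retrieval will be served from disk or from tape. This is a minimal sketch; the state and event names (`copy_job`, `delete_job`, `recall`) are hypothetical labels for the jobs described above, not identifiers from the LTS system.

```python
from enum import Enum


class Residency(Enum):
    DISK_ONLY = "disk only"  # written, not yet copied to tape
    DISK_AND_TAPE = "disk and tape"  # after the copy job, or after a recall from tape
    TAPE_ONLY = "tape only"  # after the cleanup job removed the disk copy


def t_flag(state: Residency) -> bool:
    """The T-flag is shown only when the sole copy is on tape."""
    return state is Residency.TAPE_ONLY


# Hypothetical event names for the jobs described in the comment.
TRANSITIONS = {
    (Residency.DISK_ONLY, "copy_job"): Residency.DISK_AND_TAPE,
    (Residency.DISK_AND_TAPE, "delete_job"): Residency.TAPE_ONLY,
    (Residency.TAPE_ONLY, "recall"): Residency.DISK_AND_TAPE,
}


def apply(state: Residency, event: str) -> Residency:
    """Events that do not apply in the current state leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


state = Residency.DISK_ONLY
for event in ["copy_job", "delete_job", "recall", "delete_job"]:
    state = apply(state, event)
    print(f"{event:>10}: {state.value:<13} T-flag={t_flag(state)}")
```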

@phwissmann phwissmann linked a pull request Sep 6, 2024 that will close this issue