
feat(mount): New chunk-based write algorithm #158

Open: dmga44 wants to merge 10 commits into dev from change-write-jobs-queue-approach

Conversation

@dmga44 (Collaborator) commented on Aug 1, 2024

[Must be merged after PR #222]

This PR targets the following issues:

  • low performance in single-thread writes.
  • [Windows] stop/freeze of writes when multiple copies are performed at the same time.
  • [Windows] occasional performance decrease when the number of writing threads increases.

It also adds a new parameter for tuning the write wave timeout (sfschunkserverwavewriteto), and another for choosing which write algorithm to use (sfsuseoldwritealgorithm).

The changes performed are the following:

  • create a ChunkData struct to represent and gather the data of the pending write operations on a chunk. This struct takes over many fields and responsibilities from inodedata, which is the main reason for the wide-ranging changes in the writedata.cc file (a simplified sketch follows below).
  • change the approach of the write jobs queue: each entry is now a ChunkData instead of an inodedata.
  • minor changes in other classes (WriteChunkLocator and ChunkWriter).

A test for very small parallel random writes was added. I've also checked that these changes introduce no new data races.
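
To make the first two bullets above more concrete, here is a rough C++ sketch of the chunk-level structures (the names and fields are hypothetical and simplified, not the actual definitions in src/mount/writedata.cc):

// Illustrative sketch only: types and fields are invented for clarity.
#include <cstdint>
#include <deque>
#include <list>
#include <vector>

struct WriteCacheBlock {               // simplified stand-in for a cached dirty block
    uint64_t offsetInChunk = 0;
    std::vector<uint8_t> data;
};

// Per-chunk pending-write state, taking over fields that previously lived
// in the per-inode structure.
struct ChunkData {
    uint32_t inode = 0;
    uint32_t chunkIndex = 0;
    std::list<WriteCacheBlock> dataChain;  // dirty blocks queued for this chunk
    bool inQueue = false;                  // already enqueued for a worker thread?
};

// The write jobs queue holds chunk-level entries instead of inode-level ones,
// so independent chunks of the same inode can be written by different workers.
std::deque<ChunkData*> writeJobsQueue;

The point of the sketch is only the shape of the change: worker threads pop ChunkData entries, so two dirty chunks of one inode no longer serialize behind a single inode-level queue entry.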

@dmga44 changed the title from "Change write jobs queue approach" to "feat(mount): Change write jobs queue approach" on Aug 1, 2024
@dmga44 force-pushed the change-write-jobs-queue-approach branch from 05c1e0d to d8752b7 on August 1, 2024 12:43
@dmga44 self-assigned this on Aug 1, 2024
@dmga44 force-pushed the change-write-jobs-queue-approach branch 4 times, most recently from a7ba783 to a0f55f0 on August 7, 2024 11:00
@dmga44 force-pushed the change-write-jobs-queue-approach branch from 28b6cda to dc68c17 on August 16, 2024 04:16
@lgsilva3087 (Contributor) left a comment

Please see my suggestions.

@ralcolea (Contributor) left a comment

I'm sharing a partial review; it is still in progress.

@ralcolea (Contributor) left a comment

Review still in progress.

@ralcolea (Contributor) left a comment

Review finished. Great job @dmga44 👍 🔥 🚀
Please see my suggestions and do not hesitate to ask if something is not clear.

@dmga44 force-pushed the change-write-jobs-queue-approach branch 2 times, most recently from d4ca5d9 to 145cc77 on September 14, 2024 07:05
@lgsilva3087 previously approved these changes on Sep 14, 2024
@lgsilva3087 (Contributor) left a comment

Very nice and complex change. I am approving it.
Please see the minor comments.

@dmga44 force-pushed the change-write-jobs-queue-approach branch 2 times, most recently from d1f162e to 2948328 on September 19, 2024 07:37
@ralcolea previously approved these changes on Sep 23, 2024
@ralcolea (Contributor) left a comment

Great job @dmga44 👍 🔥 🚀

@dmga44 force-pushed the change-write-jobs-queue-approach branch from 2948328 to a8b272e on October 1, 2024 07:34
@dmga44 force-pushed the change-write-jobs-queue-approach branch from a8b272e to 3f77fad on October 2, 2024 07:42
@dmga44 force-pushed the change-write-jobs-queue-approach branch 3 times, most recently from 02c6328 to 83c7c14 on October 22, 2024 12:45
@lgsilva3087 previously approved these changes on Oct 22, 2024
@lgsilva3087 (Contributor) left a comment

LGTM. Please see my suggestions.

@dmga44 force-pushed the change-write-jobs-queue-approach branch from 83c7c14 to 3f1fb7c on October 23, 2024 02:21
@dmga44 force-pushed the change-write-jobs-queue-approach branch 2 times, most recently from b561797 to d2bd754 on October 24, 2024 23:36
@aNeutrino (Contributor) commented on Oct 25, 2024

ChatGPT o1-preview comments:

https://chatgpt.com/share/671ae3dc-4340-800b-a37d-2902aa0c1f05

Certainly! Here’s how we can approach identifying potential issues introduced by these commits and proposing new automated tests to detect and minimize them.

Potential Issues Introduced by the Commits:

  1. Concurrency Issues and Data Corruption:

    • Race Conditions: Switching from inode-level to chunk-level write queues may introduce race conditions if not all concurrent access is properly synchronized.
    • Locking Issues: Using per-inode locks instead of a global lock could lead to deadlocks or livelocks if not carefully managed (a generic lock-ordering sketch follows this list).
    • Data Inconsistency: If the per-inode and per-chunk locking mechanisms aren't correctly implemented, data written concurrently to different chunks might get corrupted.
  2. Edge Cases in the New Write Algorithm:

    • Incomplete Handling of Edge Cases: The new write algorithm may not cover all scenarios that the old algorithm did, particularly in handling partial writes, truncates, or overlapping writes.
    • Error Recovery: The retry mechanisms and error handling might not be sufficient, leading to unrecoverable errors or data loss in certain failure modes.
  3. Compatibility Issues:

    • Backward Compatibility: Existing files written with the old algorithm might not be properly handled by the new algorithm.
    • Option Switching: The sfsuseoldwritealgorithm option introduces complexity; switching between algorithms might cause inconsistent behavior or data corruption.
  4. Metadata Handling and Restore Operations:

    • Restore Failures: Changes to the LENGTH metadata logs may cause restore operations to fail or behave incorrectly if old and new formats aren't properly distinguished and handled.
    • Chunk Deletion Logic: Misinterpretation of the eraseFurtherChunks parameter might lead to unintended deletion of chunks, resulting in data loss.
  5. Performance Regressions:

    • Increased Latency: The per-chunk write queues might introduce overhead or latency in write operations.
    • Resource Usage: The new approach might lead to increased memory consumption due to more granular write queues or improper release of resources.
  6. Error Handling in Truncate Operations:

    • Partial Truncates: The logic for writing zeros to update parity after truncates might fail or not correctly handle partial truncates, leading to data corruption.
    • Lock Management: Locking during truncate operations might cause deadlocks or leave locks unreleased in error scenarios.
  7. Testing Gaps:

    • Insufficient Test Coverage: The current tests might not cover all new code paths introduced, particularly edge cases or failure modes.
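
For the locking concerns in item 1 above, the usual mitigation when a code path must hold both a per-inode and a per-chunk lock is to acquire them in a fixed order, or through std::scoped_lock, which locks several mutexes with a deadlock-avoidance algorithm. A minimal generic sketch (not taken from the SaunaFS sources):

#include <mutex>

struct InodeState { std::mutex mutex; /* per-inode bookkeeping */ };
struct ChunkState { std::mutex mutex; /* per-chunk bookkeeping */ };

// Two threads taking (inode, chunk) pairs in different textual order cannot
// end up in an ABBA deadlock, because scoped_lock orders the acquisitions.
void updateChunk(InodeState& inode, ChunkState& chunk) {
    std::scoped_lock lock(inode.mutex, chunk.mutex);
    // ... mutate shared per-inode and per-chunk state here ...
}

The deadlock-detection and contention tests proposed below can then assert that threads using such helpers keep making progress.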

Proposed Automated Tests to Detect and Minimize Potential Issues:

To address these potential issues, we can introduce both unit tests and integration tests.


Unit Tests:

  1. Per-Inode and Per-Chunk Locking Tests:

    • Concurrent Access Test: Simulate concurrent writes to multiple chunks of the same inode from different threads/processes. Verify that locks are correctly preventing race conditions.
    • Deadlock Detection Test: Intentionally create scenarios where locks could be acquired in different orders and verify that no deadlocks occur.
    • Lock Contention Test: Measure the performance impact when multiple threads are contending for the same inode/chunk locks to ensure acceptable performance.
  2. Write Algorithm Edge Case Tests:

    • Partial Writes Test: Test writes that don't align with block boundaries to ensure they are correctly handled.
    • Boundary Crossing Writes Test: Write operations that span multiple chunks to verify correct behavior.
    • Max Retries Test: Simulate failures that trigger the retry logic to ensure it works as expected.
  3. Error Handling Tests:

    • Simulated Failures Test: Inject failures at various points (e.g., network timeouts, disk write errors) and verify that the error handling and recovery mechanisms are robust.
    • Exception Path Tests: Ensure that exceptions don't leave locks held or resources unreleased.
  4. Truncate Operation Unit Tests:

    • Zero-Padding Test: Verify that the zeros are correctly written to the necessary regions after a truncate.
    • Lock Release Test: Ensure that locks acquired during truncates are properly released even in error conditions.
  5. Metadata Handling Tests:

    • LENGTH Metadata Parsing Test: Verify that both old and new LENGTH metadata entries are correctly parsed.
    • Chunk Deletion Logic Test: Ensure that the eraseFurtherChunks parameter correctly controls chunk deletion.

Integration Tests:

  1. Concurrent Write Stress Tests:

    • Multiple Clients Test: Have multiple clients perform writes to the same file concurrently, including overlapping writes, and verify data integrity.
    • High Throughput Test: Simulate high write loads to assess performance and resource utilization.
  2. Compatibility Tests:

    • Algorithm Switching Test: Switch between old and new write algorithms using the sfsuseoldwritealgorithm option during runtime and verify consistent behavior.
    • Mixed Algorithm Environment Test: Use clients with different algorithm settings to write to the same file and ensure data consistency.
  3. Restore Operation Tests:

    • Mixed Metadata Restore Test: Attempt to restore a filesystem with a mix of old and new LENGTH metadata entries to verify correct handling.
    • Restore Failure Simulation Test: Corrupt metadata entries intentionally to ensure the restore process handles errors gracefully.
  4. Truncate Operation Tests:

    • Concurrent Truncate and Write Test: Perform truncates while other clients are writing to the file to ensure correct synchronization and data integrity.
    • Edge Case Truncate Test: Truncate files at various boundaries (e.g., middle of a block, end of a chunk) to test edge cases.
  5. Data Integrity Tests:

    • Checksum Verification Test: After writes and truncates, verify that checksums match to ensure data hasn't been corrupted.
    • Parity Verification Test: Simulate a chunkserver failure and verify that data can be reconstructed from parity information.
  6. Performance Regression Tests:

    • Write Latency Measurement: Compare write latencies before and after the changes under various loads.
    • Resource Utilization Test: Monitor memory and CPU usage during intensive write operations to detect potential leaks or inefficiencies.
  7. Error Injection Tests:

    • Chunkserver Timeout Simulation: Simulate slow or unresponsive chunkservers to test timeout and retry mechanisms.
    • Network Partition Test: Partition the network between clients and servers to test error handling.
  8. Recovery Tests:

    • Crash Recovery Test: Simulate client or server crashes during write operations and verify that the system can recover without data loss.
    • Interrupted Write Test: Forcefully terminate write operations and ensure that partial writes are correctly handled.
  9. Locking Mechanism Stress Tests:

    • Lock Scalability Test: Increase the number of concurrent operations to test the scalability of the new locking mechanism.
    • Lock Hierarchy Test: Verify that locks are acquired and released in the correct order to prevent deadlocks.
  10. Filesystem Consistency Checks:

    • Filesystem Checker Test: Run consistency checks after various operations to ensure the filesystem remains in a consistent state.
    • Metadata Consistency Test: Verify that the metadata (e.g., file sizes, chunk lists) remains accurate after operations.

Additional Steps:

  • Code Review and Static Analysis:

    • Use static analysis tools to detect potential concurrency issues, deadlocks, or resource leaks in the new code.
    • Conduct thorough code reviews focusing on the new locking mechanisms and concurrency changes.
  • Update Existing Tests:

    • Ensure existing tests cover the new code paths and modify them if necessary to accommodate the changes.
    • Add assertions or checks in tests to verify that the new per-inode and per-chunk data structures are correctly updated.
  • Documentation and Monitoring:

    • Update documentation to reflect changes in behavior, configuration options, and known limitations.
    • Implement monitoring to track metrics like write latency, error rates, and resource usage in production to detect issues early.

Conclusion:

By implementing these unit and integration tests, you can significantly reduce the risk of introducing new issues with the changes. These tests will help ensure that:

  • Data Integrity is maintained across various write scenarios and failure modes.
  • Concurrency is correctly managed without introducing deadlocks or race conditions.
  • Performance remains acceptable and resource usage is within expected bounds.
  • Compatibility is preserved, and transitions between old and new algorithms are seamless.
  • Recovery Mechanisms are robust and can handle errors gracefully.

Adding these tests to your CI pipeline will help detect issues early in the development process, making it easier to address them before they affect production environments.

Let's analyze the tests included in your commits and propose specific test implementations to enhance your CI suite.


Analysis of Included Tests:

Your commits added and modified the following tests:

  1. test_ec_small_parallel_writing.sh (New Test):

    • Purpose: Checks the client behavior when many small (1-byte long) random writes are handled in parallel across 10 mount points.
    • Implementation:
      • Generates a 64KB file.
      • Runs 10 parallel tasks, each copying every 10th byte to the same file in SaunaFS using 5 concurrent writers.
      • Validates the data integrity after writing and parity correctness by stopping a chunkserver.
  2. test_ec_long_parallel_writing.sh (New Test):

    • Purpose: Similar to the small parallel writing test but with a larger file size (513MB) to test long parallel writings.
    • Implementation:
      • Generates a 513MB file.
      • Runs 10 parallel tasks, each copying every 10th kilobyte to the same file in SaunaFS using 5 concurrent writers.
      • Validates data integrity and parity correctness.
  3. test_ec_parallel_writing.sh (Modified Test):

    • Purpose: Tests parallel writings with a file size parameterized via a shared template.
    • Implementation:
      • Now uses test_ec_parallel_writing.inc to handle common logic.
      • Allows specifying file size via the FILE_SIZE_MB variable.
  4. test_ec_parallel_writing.inc (New Template):

    • Purpose: Provides a common script to avoid code duplication for parallel writing tests.
    • Implementation:
      • Contains shared logic for setting up the environment, generating files, running parallel write tasks, and validating results.

Identified Gaps and Potential Issues Not Covered by Included Tests:

While these tests effectively assess parallel write operations for different file sizes, they may not cover:

  1. Concurrency and Locking Mechanisms:

    • Testing overlapping writes to the same block or chunk.
    • Stress-testing the per-inode and per-chunk locks for race conditions or deadlocks.
  2. Edge Cases in Write Operations:

    • Partial writes that don't align with block boundaries.
    • Writes that cross chunk or block boundaries.
  3. Error Handling and Recovery:

    • Simulating failures like chunkserver timeouts or network issues during writes.
    • Testing retry logic and error recovery mechanisms.
  4. Truncate Operations:

    • Concurrent truncate and write operations to the same file.
    • Edge cases in truncates, especially related to the new eraseFurtherChunks parameter.
  5. Switching Between Write Algorithms:

    • Behavior when toggling the sfsuseoldwritealgorithm option.
  6. Metadata Handling and Restore Operations:

    • Testing the handling of both old and new LENGTH metadata formats during restore.
  7. Performance Impact:

    • Measuring write performance and resource utilization with the new algorithm.

Proposed Implementation of Additional Tests:

Below are detailed implementations of tests that address the gaps identified.


1. Concurrency and Locking Tests

Test Name: test_concurrent_overlapping_writes.sh

Purpose:

  • To verify that overlapping writes to the same block are correctly synchronized, ensuring data integrity without deadlocks.

Implementation Steps:

  1. Setup:

    • Use a single mount point.
    • Create a file on SaunaFS with an initial size sufficient for the test (e.g., 1MB).
  2. Concurrent Overlapping Writes:

    • Start multiple background processes (e.g., 10) that perform writes to overlapping regions of the file.
    • Each process writes a unique pattern to a specific offset range, ensuring overlaps.
    • Use synchronization primitives (like flock or named semaphores) to ensure overlaps occur.
  3. Verification:

    • After all writes are complete, read the file.
    • Verify that the final data in overlapping regions corresponds to an expected pattern (e.g., last writer wins, or defined behavior).
    • Check logs for any deadlock or race condition errors.
  4. Assertions:

    • Ensure no data corruption in non-overlapping regions.
    • Confirm that the locking mechanisms prevent race conditions.

2. Edge Case Write Tests

Test Name: test_cross_boundary_writes.sh

Purpose:

  • To test writes that cross block and chunk boundaries, ensuring they are correctly handled.

Implementation Steps:

  1. Setup:

    • Create a file large enough to span multiple chunks (e.g., 5MB).
  2. Perform Cross-Boundary Writes:

    • Write data starting near the end of a block so that it spans into the next block.
    • Write data starting near the end of a chunk so that it spans into the next chunk.
  3. Partial Writes:

    • Perform writes that are smaller than the block size and don't align with block boundaries.
  4. Verification:

    • Read back the affected regions.
    • Verify that the data matches what was written.
    • Ensure that no extra data was overwritten or lost.
  5. Assertions:

    • Confirm that data before and after the write regions remains intact.
    • Check for any errors or warnings in the logs related to boundary handling.

3. Error Handling and Recovery Tests

Test Name: test_simulated_chunkserver_failure.sh

Purpose:

  • To test the client's ability to handle chunkserver failures during write operations.

Implementation Steps:

  1. Setup:

    • Configure SaunaFS with multiple chunkservers (e.g., 4).
    • Create a file on SaunaFS.
  2. Simulate Writes:

    • Begin a large write operation to the file.
  3. Simulate Chunkserver Failure:

    • While the write is in progress, stop one or more chunkservers using saunafs_chunkserver_daemon stop.
    • Optionally, introduce network delays or drop packets to simulate network issues.
  4. Verification:

    • Observe the client's behavior (e.g., retry logic).
    • Check if the write operation eventually completes or fails gracefully.
    • Verify that appropriate error messages are logged.
  5. Recovery:

    • Restart the chunkservers.
    • Ensure that subsequent write operations succeed.
  6. Assertions:

    • No data corruption occurs.
    • The client does not crash or hang.
    • Errors are properly reported to the user.

4. Truncate Operation Tests

Test Name: test_concurrent_truncate_and_write.sh

Purpose:

  • To verify that truncating a file while concurrent writes are happening is handled correctly.

Implementation Steps:

  1. Setup:

    • Create a file and write initial data to it.
  2. Concurrent Operations:

    • Start a background process that continuously writes to the file.
    • In parallel, from another process, issue a truncate operation to reduce the file size.
  3. Synchronization:

    • Use precise timing or signaling to ensure that writes and truncates overlap.
  4. Verification:

    • After both operations are complete, read the file.
    • Verify that the file size matches the expected size after truncation.
    • Check that data integrity is maintained up to the truncated size.
  5. Assertions:

    • No deadlocks or crashes occur.
    • The write operations after truncation start at the correct offset.
    • Logs do not show any errors related to locking or synchronization.

5. Restore Operation Tests

Test Name: test_mixed_metadata_restore.sh

Purpose:

  • To test restoring the filesystem from metadata logs containing both old and new LENGTH entries.

Implementation Steps:

  1. Setup:

    • Start SaunaFS and perform operations that generate old LENGTH entries (e.g., by mounting with sfsuseoldwritealgorithm).
    • Switch to the new write algorithm and perform more operations.
  2. Backup Metadata:

    • Save the metadata logs.
  3. Simulate Failure:

    • Stop SaunaFS and remove the current metadata.
  4. Restore:

    • Use the saved metadata logs to restore the filesystem.
  5. Verification:

    • Verify that all files and directories are present.
    • Check that file sizes and contents match what was written.
    • Ensure that chunks are correctly associated with files.
  6. Assertions:

    • No errors occur during the restore process.
    • The filesystem is consistent and operational after restore.

6. Algorithm Switching Tests

Test Name: test_write_algorithm_switching.sh

Purpose:

  • To ensure data integrity when switching between the old and new write algorithms.

Implementation Steps:

  1. Initial Writes with New Algorithm:

    • Mount SaunaFS without the sfsuseoldwritealgorithm option.
    • Write data to a file.
  2. Switch to Old Algorithm:

    • Unmount and remount SaunaFS with the sfsuseoldwritealgorithm option enabled.
  3. Continued Writes:

    • Continue writing data to the same file.
  4. Verification:

    • Read the entire file.
    • Verify that all data, both before and after the switch, is correct.
  5. Assertions:

    • No data corruption occurs.
    • No errors or warnings are logged during the switch.

7. Performance Benchmark Tests

Test Name: test_write_performance_benchmark.sh

Purpose:

  • To compare write performance between the old and new write algorithms.

Implementation Steps:

  1. Setup Environments:

    • Set up two identical SaunaFS instances, one using the old write algorithm and one using the new.
  2. Benchmarking:

    • Use a benchmarking tool (e.g., fio) to perform write operations under various workloads:
      • Sequential writes.
      • Random writes.
      • Small and large block sizes.
  3. Metrics Collection:

    • Measure throughput (MB/s), latency (ms), CPU usage, and memory usage.
  4. Analysis:

    • Compare the collected metrics between the two algorithms.
  5. Assertions:

    • The new algorithm should not introduce significant performance regressions.
    • Resource utilization remains within acceptable limits.

8. Chunk Deletion Logic Tests

Test Name: test_chunk_deletion_behavior.sh

Purpose:

  • To verify that chunks are deleted or preserved according to the eraseFurtherChunks parameter during truncates.

Implementation Steps:

  1. Setup:

    • Create a large file (e.g., 100MB) that spans multiple chunks.
  2. Truncate with Deletion:

    • Truncate the file to a smaller size with eraseFurtherChunks set to true.
    • Verify that chunks beyond the new size are deleted.
  3. Truncate without Deletion:

    • Truncate the file again to an even smaller size with eraseFurtherChunks set to false.
    • Verify that chunks beyond the new size are preserved.
  4. Verification:

    • List the chunks on the chunkservers to confirm deletion or preservation.
    • Read the file and ensure data integrity up to the truncated size.
  5. Assertions:

    • Chunks are correctly deleted or preserved as per the parameter.
    • No data corruption occurs in the remaining data.

9. Locking Mechanism Stress Tests

Test Name: test_lock_contention_stress.sh

Purpose:

  • To stress-test the locking mechanism with high concurrency to detect potential deadlocks or performance bottlenecks.

Implementation Steps:

  1. Setup:

    • Create a file on SaunaFS.
  2. High-Concurrency Writes:

    • Start a high number of concurrent processes (e.g., 100) performing writes to the same file and chunk.
    • Each process writes to a different offset to minimize actual data conflicts but maximize lock contention.
  3. Monitoring:

    • Monitor system performance, including CPU usage and wait times for locks.
  4. Verification:

    • Ensure all processes complete their writes successfully.
    • Check for any deadlocks or excessive delays.
  5. Assertions:

    • No deadlocks occur.
    • Lock contention does not cause unacceptable performance degradation.

10. Data Integrity with Checksums

Test Name: test_data_integrity_with_checksums.sh

Purpose:

  • To verify data integrity using checksums after write operations.

Implementation Steps:

  1. Setup:

    • Create a file and write known data patterns.
  2. Checksum Calculation:

    • Calculate the checksum (e.g., MD5, SHA256) of the data before writing.
  3. Write and Readback:

    • Write the data to the file.
    • Read back the data from the file.
  4. Verification:

    • Calculate the checksum of the read data.
    • Compare it to the original checksum.
  5. Assertions:

    • Checksums match, confirming data integrity.
    • No data corruption is detected.

11. Exception Handling Tests

Test Name: test_exception_handling_in_writes.sh

Purpose:

  • To ensure that exceptions during writes do not leave locks held or resources unreleased.

Implementation Steps:

  1. Instrumentation:

    • Use fault injection or modify the code to throw exceptions at strategic points during writes.
  2. Write Operations:

    • Perform write operations that will trigger the injected exceptions.
  3. Verification:

    • After the exception, attempt further operations to confirm the system remains operational.
    • Check that all locks have been released.
  4. Assertions:

    • No resource leaks occur.
    • The system recovers gracefully from exceptions.

12. Simulated Network Partitions

Test Name: test_network_partition_during_writes.sh

Purpose:

  • To test the client's behavior when network connectivity is lost and restored during writes.

Implementation Steps:

  1. Setup:

    • Start write operations to a file.
  2. Simulate Network Partition:

    • Use network tools (e.g., iptables or tc) to block traffic between the client and the master/chunkservers.
  3. Observation:

    • Monitor the client's handling of the loss of connectivity.
    • Observe retries, timeouts, or errors.
  4. Restore Connectivity:

    • Remove the network blocks.
  5. Verification:

    • Ensure that writes resume or fail gracefully.
    • Verify data integrity after operations.
  6. Assertions:

    • The client handles network failures without crashing.
    • Appropriate error messages are logged.

13. Crash Recovery Tests

Test Name: test_client_crash_during_write.sh

Purpose:

  • To verify filesystem consistency after a client crashes during a write operation.

Implementation Steps:

  1. Setup:

    • Begin a write operation to a file.
  2. Simulate Client Crash:

    • Forcefully terminate the client process (e.g., using kill -9).
  3. Recovery:

    • Restart the client.
  4. Verification:

    • Check the file for consistency.
    • Verify that no locks are left held.
    • Attempt to resume writes to the file.
  5. Assertions:

    • The filesystem remains consistent.
    • No orphaned resources or locks exist.
    • The client can continue operations normally.

14. Filesystem Consistency Checks

Test Name: test_filesystem_consistency_after_operations.sh

Purpose:

  • To ensure filesystem consistency after a series of operations.

Implementation Steps:

  1. Operations:

    • Perform various operations: create, write, truncate, delete files and directories.
  2. Consistency Check:

    • Use any available filesystem check tool or command (e.g., fsck if applicable).
  3. Verification:

    • Check for inconsistencies in metadata and data.
    • Ensure that all files and directories are correctly accounted for.
  4. Assertions:

    • No inconsistencies are detected.
    • The filesystem reports as healthy.

15. Testing Under Memory Pressure

Test Name: test_writes_under_memory_pressure.sh

Purpose:

  • To ensure the write operations handle low-memory conditions gracefully.

Implementation Steps:

  1. Memory Limitation:

    • Restrict the memory available to the client process (e.g., using ulimit -v or cgroups).
  2. Write Operations:

    • Perform large write operations that would typically require more memory.
  3. Verification:

    • Observe if the client handles memory allocation failures.
    • Check for crashes or hangs.
  4. Assertions:

    • The client reports appropriate errors.
    • The system remains stable despite low-memory conditions.

Integrating Tests into CI Pipeline:

  • Automation: Ensure that all tests are scripted and can run automatically without manual intervention.
  • Environment Setup: Use Docker or virtual machines to create consistent test environments.
  • Reporting: Integrate test results with the CI system to report pass/fail status and logs.
  • Frequency: Run critical tests on every commit; schedule longer-running tests (e.g., performance benchmarks) periodically.
  • Maintenance: Regularly update and review tests to cover new code changes and edge cases.

Conclusion:

By adding these tests to your CI suite, you can better detect and minimize potential issues introduced by the changes in your commits. The proposed tests specifically target concurrency, error handling, edge cases, and the new functionalities introduced, ensuring a robust and reliable filesystem.

Feel free to adjust the test implementations to suit your specific environment and testing framework.

@aNeutrino (Contributor) commented

Certainly! Below are the full implementations for each of the 15 tests we discussed earlier. Each test script follows the style and conventions observed in the existing test scripts you've provided, utilizing the available tools and functions.


Test 1: Concurrency and Locking Tests

Test Name: test_concurrent_overlapping_writes.sh

Purpose:

To verify that overlapping writes to the same block are correctly synchronized, ensuring data integrity without deadlocks.


CONTENT of TEST test_concurrent_overlapping_writes.sh

# Test concurrent overlapping writes to the same file.

CHUNKSERVERS=4 \
    USE_RAMDISK=YES \
    MOUNT_EXTRA_CONFIG="sfscachemode=NEVER" \
    setup_local_empty_saunafs info

cd "${info[mount0]}"

# Create a file of 1MB size.
dd if=/dev/zero of=concurrent_file bs=1M count=1

# Define number of concurrent writers.
NUM_WRITERS=10

# Function for a writer process.
write_overlap() {
    local id=$1
    local offset=$2
    local length=$3
    local pattern=$(printf "%02d" $id)
    # Write the pattern to the file at the specified offset.
    yes "$pattern" | tr -d '\n' | head -c $length | dd of=concurrent_file bs=1 count=$length seek=$offset conv=notrunc status=none
}

# Define overlapping regions.
# All writers write to overlapping regions starting at offset 512K.
OFFSET=$((512 * 1024))
LENGTH=$((256 * 1024))

# Start concurrent writes.
for i in $(seq 1 $NUM_WRITERS); do
    write_overlap $i $((OFFSET)) $((LENGTH)) &
done

wait

# Read back the overlapping region.
dd if=concurrent_file bs=1 skip=$OFFSET count=$LENGTH status=none > read_back_data

# Verify the data.
# Since multiple writes overlapped, the final data should correspond to one of the patterns.
unique_patterns=$(tr -d '\n' < read_back_data | fold -w2 | sort | uniq)

if [ $(echo "$unique_patterns" | wc -l) -eq 1 ]; then
    # Only one pattern found, test passes.
    echo "Test passed: Overlapping writes resulted in consistent data."
else
    test_add_failure "Data corruption detected in overlapping writes."
fi

# Clean up
rm -f concurrent_file read_back_data

Test 2: Edge Case Write Tests

Test Name: test_cross_boundary_writes.sh

Purpose:

To test writes that cross block and chunk boundaries, ensuring they are correctly handled.


CONTENT of TEST test_cross_boundary_writes.sh

# Test writes that cross block and chunk boundaries.

CHUNKSERVERS=4 \
    USE_RAMDISK=YES \
    MOUNT_EXTRA_CONFIG="sfscachemode=NEVER" \
    setup_local_empty_saunafs info

cd "${info[mount0]}"

# Get block and chunk sizes.
BLOCK_SIZE=$SAUNAFS_BLOCK_SIZE
CHUNK_SIZE=$SAUNAFS_CHUNK_SIZE

# Create a file large enough to span multiple chunks.
FILE_SIZE=$((CHUNK_SIZE * 2 + BLOCK_SIZE * 2))
truncate -s $FILE_SIZE boundary_test_file

# Write across a block boundary.
BLOCK_END_OFFSET=$((BLOCK_SIZE - 512))
dd if=/dev/urandom of=boundary_test_file bs=1 count=1024 seek=$BLOCK_END_OFFSET conv=notrunc status=none

# Write across a chunk boundary.
CHUNK_END_OFFSET=$((CHUNK_SIZE - 512))
dd if=/dev/urandom of=boundary_test_file bs=1 count=2048 seek=$CHUNK_END_OFFSET conv=notrunc status=none

# Partial write not aligned to block boundaries.
PARTIAL_OFFSET=12345
dd if=/dev/urandom of=boundary_test_file bs=1 count=7890 seek=$PARTIAL_OFFSET conv=notrunc status=none

# Verify file size.
EXPECTED_SIZE=$FILE_SIZE
ACTUAL_SIZE=$(stat -c %s boundary_test_file)

if [ $ACTUAL_SIZE -ne $EXPECTED_SIZE ]; then
    test_add_failure "File size mismatch after cross-boundary writes. Expected $EXPECTED_SIZE, got $ACTUAL_SIZE."
else
    echo "File size is correct after cross-boundary writes."
fi

# Clean up
rm -f boundary_test_file

Test 3: Error Handling and Recovery Tests

Test Name: test_simulated_chunkserver_failure.sh

Purpose:

To test the client's ability to handle chunkserver failures during write operations.


CONTENT of TEST test_simulated_chunkserver_failure.sh

# Test client behavior during chunkserver failures.

CHUNKSERVERS=4 \
    USE_RAMDISK=YES \
    MOUNT_EXTRA_CONFIG="sfscachemode=NEVER" \
    setup_local_empty_saunafs info

cd "${info[mount0]}"

# Create a file and start writing.
FILE_SIZE=50M
FILE_NAME="test_file"
( dd if=/dev/zero of=$FILE_NAME bs=1M count=50 status=none ) &

WRITE_PID=$!

# Allow the write to start.
sleep 2

# Simulate chunkserver failures.
saunafs_chunkserver_daemon 0 stop
saunafs_chunkserver_daemon 1 stop

echo "Simulated chunkserver failures."

# Wait for the write to complete.
wait $WRITE_PID

if [ $? -eq 0 ]; then
    echo "Write operation completed successfully."
else
    test_add_failure "Write operation failed due to chunkserver failure."
fi

# Restart chunkservers.
saunafs_chunkserver_daemon 0 start
saunafs_chunkserver_daemon 1 start

saunafs_wait_for_all_ready_chunkservers

# Verify subsequent writes succeed.
dd if=/dev/zero of=$FILE_NAME bs=1M count=10 seek=50 conv=notrunc status=none

if [ $? -eq 0 ]; then
    echo "Subsequent write operation succeeded."
else
    test_add_failure "Subsequent write operation failed."
fi

# Clean up
rm -f $FILE_NAME

Test 4: Truncate Operation Tests

Test Name: test_concurrent_truncate_and_write.sh

Purpose:

To verify that truncating a file while concurrent writes are happening is handled correctly.


CONTENT of TEST test_concurrent_truncate_and_write.sh

# Test concurrent truncate and write operations.

CHUNKSERVERS=4 \
    USE_RAMDISK=YES \
    MOUNT_EXTRA_CONFIG="sfscachemode=NEVER" \
    setup_local_empty_saunafs info

cd "${info[mount0]}"

# Create initial file.
FILE_NAME="truncate_test_file"
dd if=/dev/urandom of=$FILE_NAME bs=1M count=10 status=none

# Start continuous write in background.
(
    for i in {1..10}; do
        dd if=/dev/urandom of=$FILE_NAME bs=1M count=1 seek=$i conv=notrunc status=none
        sleep 1
    done
) &

WRITE_PID=$!

# Wait and then truncate.
sleep 2
truncate -s 5M $FILE_NAME

# Wait for write to finish.
wait $WRITE_PID

# Verify file size.
EXPECTED_SIZE=$((5 * 1024 * 1024))
ACTUAL_SIZE=$(stat -c %s $FILE_NAME)

if [ $ACTUAL_SIZE -eq $EXPECTED_SIZE ]; then
    echo "File size after truncate is correct."
else
    test_add_failure "File size after truncate is incorrect. Expected $EXPECTED_SIZE, got $ACTUAL_SIZE."
fi

# Clean up
rm -f $FILE_NAME

Test 5: Restore Operation Tests

Test Name: test_mixed_metadata_restore.sh

Purpose:

To test restoring the filesystem from metadata logs containing both old and new LENGTH entries.


CONTENT of TEST test_mixed_metadata_restore.sh

# Test restoring filesystem from mixed metadata logs.

CHUNKSERVERS=4 \
    USE_RAMDISK=YES \
    setup_local_empty_saunafs info

cd "${info[mount0]}"

# Use old write algorithm.
saunafs_mount_unmount 0
echo "sfsuseoldwritealgorithm=1" >> "${info[mount0_cfg]}"
saunafs_mount_start 0

# Generate old LENGTH entries.
touch old_file
dd if=/dev/urandom of=old_file bs=1M count=5 status=none

# Switch to new algorithm.
saunafs_mount_unmount 0
sed -i '/sfsuseoldwritealgorithm=1/d' "${info[mount0_cfg]}"
saunafs_mount_start 0

# Generate new LENGTH entries.
touch new_file
dd if=/dev/urandom of=new_file bs=1M count=5 status=none

# Save metadata.
saunafs_admin_master save-metadata

# Simulate failure.
saunafs_master_daemon stop
rm -f "${info[master_data_path]}/metadata.sfs"

# Restore filesystem.
saunafs_master_daemon start
saunafs_wait_for_all_ready_chunkservers

# Verify files are present.
if [ -f old_file ] && [ -f new_file ]; then
    echo "Files restored successfully."
else
    test_add_failure "Files not restored correctly."
fi

# Clean up
rm -f old_file new_file

Test 6: Algorithm Switching Tests

Test Name: test_write_algorithm_switching.sh

Purpose:

To ensure data integrity when switching between the old and new write algorithms.


CONTENT of TEST test_write_algorithm_switching.sh

# Test switching between old and new write algorithms.

CHUNKSERVERS=4 \
    USE_RAMDISK=YES \
    setup_local_empty_saunafs info

cd "${info[mount0]}"

# New algorithm.
saunafs_mount_unmount 0
sed -i '/sfsuseoldwritealgorithm=1/d' "${info[mount0_cfg]}"
saunafs_mount_start 0

# Write initial data.
FILE_NAME="switch_test_file"
FILE_SIZE=5M
FILE_SIZE=$FILE_SIZE file-generate $FILE_NAME

# Switch to old algorithm.
saunafs_mount_unmount 0
echo "sfsuseoldwritealgorithm=1" >> "${info[mount0_cfg]}"
saunafs_mount_start 0

# Continue writing.
dd if=/dev/urandom of=$FILE_NAME bs=1M count=5 seek=5 conv=notrunc status=none

# Verify data integrity.
if file-validate $FILE_NAME; then
    echo "Data integrity maintained after algorithm switch."
else
    test_add_failure "Data corruption after algorithm switch."
fi

# Clean up
rm -f $FILE_NAME

Test 7: Performance Benchmark Tests

Test Name: test_write_performance_benchmark.sh

Purpose:

To compare write performance between the old and new write algorithms.


CONTENT of TEST test_write_performance_benchmark.sh

# Test write performance between old and new algorithms.

CHUNKSERVERS=4 \
    USE_RAMDISK=YES \
    setup_local_empty_saunafs info

perform_benchmark() {
    local algorithm=$1
    local mount_cfg="${info[mount0_cfg]}"
    local mount_point="${info[mount0]}"

    saunafs_mount_unmount 0

    if [ "$algorithm" = "old" ]; then
        echo "sfsuseoldwritealgorithm=1" >> "$mount_cfg"
    else
        sed -i '/sfsuseoldwritealgorithm=1/d' "$mount_cfg"
    fi

    saunafs_mount_start 0

    cd "$mount_point"

    FILE_NAME="benchmark_${algorithm}.dat"
    FILE_SIZE=100M

    START_TIME=$(date +%s.%N)
    dd if=/dev/zero of=$FILE_NAME bs=1M count=100 status=none
    END_TIME=$(date +%s.%N)
    DURATION=$(echo "$END_TIME - $START_TIME" | bc)

    echo "Write duration with $algorithm algorithm: $DURATION seconds"

    rm -f $FILE_NAME
    cd -
}

perform_benchmark "new"
perform_benchmark "old"

Test 8: Chunk Deletion Logic Tests

Test Name: test_chunk_deletion_behavior.sh

Purpose:

To verify that chunks are deleted or preserved according to the eraseFurtherChunks parameter during truncates.


CONTENT of TEST test_chunk_deletion_behavior.sh

# Test chunk deletion behavior during truncates.

CHUNKSERVERS=4 \
    USE_RAMDISK=YES \
    setup_local_empty_saunafs info

cd "${info[mount0]}"

# Create large file.
FILE_NAME="deletion_test_file"
dd if=/dev/urandom of=$FILE_NAME bs=1M count=100 status=none

# Get initial chunks.
initial_chunks=$(find_all_chunks | sort)

# Truncate with eraseFurtherChunks=true.
TRUNCATE_SIZE=50M
saunafs truncate $FILE_NAME $TRUNCATE_SIZE --eraseFurtherChunks

# Verify chunks deleted.
post_truncate_chunks=$(find_all_chunks | sort)

if [ "$initial_chunks" != "$post_truncate_chunks" ]; then
    echo "Chunks beyond new size deleted."
else
    test_add_failure "Chunks not deleted after truncate with eraseFurtherChunks."
fi

# Truncate with eraseFurtherChunks=false.
TRUNCATE_SIZE2=25M
saunafs truncate $FILE_NAME $TRUNCATE_SIZE2

# Verify chunks preserved.
post_truncate_chunks2=$(find_all_chunks | sort)

if [ "$post_truncate_chunks" = "$post_truncate_chunks2" ]; then
    echo "Chunks beyond new size preserved."
else
    test_add_failure "Chunks deleted after truncate with eraseFurtherChunks=false."
fi

# Clean up
rm -f $FILE_NAME

Test 9: Locking Mechanism Stress Tests

Test Name: test_lock_contention_stress.sh

Purpose:

To stress-test the locking mechanism with high concurrency.


CONTENT of TEST test_lock_contention_stress.sh

# Stress test locking mechanism.

CHUNKSERVERS=4 \
    USE_RAMDISK=YES \
    setup_local_empty_saunafs info

cd "${info[mount0]}"

# Create file.
FILE_NAME="lock_stress_file"
truncate -s 10M $FILE_NAME

# High concurrency writes.
NUM_PROCESSES=100

for i in $(seq 1 $NUM_PROCESSES); do
    (
        OFFSET=$(( (i - 1) * 1024 ))
        dd if=/dev/urandom of=$FILE_NAME bs=1K count=1 seek=$OFFSET conv=notrunc status=none
    ) &
done

wait

echo "All concurrent writes completed."

# Clean up
rm -f $FILE_NAME

Test 10: Data Integrity with Checksums

Test Name: test_data_integrity_with_checksums.sh

Purpose:

To verify data integrity using checksums after write operations.


CONTENT of TEST test_data_integrity_with_checksums.sh

# Test data integrity using checksums.

CHUNKSERVERS=4 \
    USE_RAMDISK=YES \
    setup_local_empty_saunafs info

cd "${info[mount0]}"

# Generate data and compute checksum.
FILE_NAME="checksum_test_file"
dd if=/dev/urandom bs=1M count=10 status=none | tee $FILE_NAME > /dev/null

ORIGINAL_CHECKSUM=$(md5sum $FILE_NAME | awk '{print $1}')

# Read back data.
READ_BACK_FILE="read_back_file"
cp $FILE_NAME $READ_BACK_FILE

READ_BACK_CHECKSUM=$(md5sum $READ_BACK_FILE | awk '{print $1}')

if [ "$ORIGINAL_CHECKSUM" = "$READ_BACK_CHECKSUM" ]; then
    echo "Checksums match. Data integrity verified."
else
    test_add_failure "Checksums do not match."
fi

# Clean up
rm -f $FILE_NAME $READ_BACK_FILE

Test 11: Exception Handling Tests

Test Name: test_exception_handling_in_writes.sh

Purpose:

To ensure that exceptions during writes do not leave locks held or resources unreleased.


CONTENT of TEST test_exception_handling_in_writes.sh

# Test exception handling during writes.

CHUNKSERVERS=4 \
    USE_RAMDISK=YES \
    setup_local_empty_saunafs info

cd "${info[mount0]}"

# Create read-only file.
FILE_NAME="exception_test_file"
touch $FILE_NAME
chmod 444 $FILE_NAME

# Attempt to write.
dd if=/dev/urandom of=$FILE_NAME bs=1M count=1 conv=notrunc status=none

if [ $? -ne 0 ]; then
    echo "Write failed as expected."
else
    test_add_failure "Write succeeded unexpectedly."
fi

# Check locks released.
rm $FILE_NAME

if [ $? -eq 0 ]; then
    echo "File removed successfully. Locks released."
else
    test_add_failure "Failed to remove file. Locks may be held."
fi

Test 12: Simulated Network Partitions

Test Name: test_network_partition_during_writes.sh

Purpose:

To test the client's behavior when network connectivity is lost and restored during writes.


CONTENT of TEST test_network_partition_during_writes.sh

# Test client behavior during network partition.

CHUNKSERVERS=4 \
    USE_RAMDISK=YES \
    setup_local_empty_saunafs info

cd "${info[mount0]}"

# Start write operation.
( dd if=/dev/urandom of=network_test_file bs=1M count=100 status=none ) &

WRITE_PID=$!

# Allow write to start.
sleep 5

# Simulate network partition by stopping master.
saunafs_master_daemon stop

echo "Master stopped to simulate network partition."

# Wait for write to complete.
wait $WRITE_PID

if [ $? -ne 0 ]; then
    echo "Write failed due to network partition."
else
    test_add_failure "Write succeeded unexpectedly during network partition."
fi

# Restart master.
saunafs_master_daemon start
saunafs_wait_for_all_ready_chunkservers

# Retry write.
dd if=/dev/urandom of=network_test_file bs=1M count=10 seek=100 conv=notrunc status=none

if [ $? -eq 0 ]; then
    echo "Write succeeded after restoring network."
else
    test_add_failure "Write failed after restoring network."
fi

# Clean up
rm -f network_test_file

Test 13: Crash Recovery Tests

Test Name: test_client_crash_during_write.sh

Purpose:

To verify filesystem consistency after a client crashes during a write operation.


CONTENT of TEST test_client_crash_during_write.sh

# Test filesystem consistency after client crash.

CHUNKSERVERS=4 \
    USE_RAMDISK=YES \
    MOUNTS=1 \
    setup_local_empty_saunafs info

cd "${info[mount0]}"

# Start write operation.
( dd if=/dev/urandom of=crash_test_file bs=1M count=100 status=none ) &

WRITE_PID=$!

# Allow write to start.
sleep 5

# Simulate client crash.
MOUNT_PID=$(pgrep -f "sfsmount.*${info[mount0]}")
kill -9 $MOUNT_PID

echo "Client process killed to simulate crash."

# Wait for write to complete.
wait $WRITE_PID

# Restart client.
saunafs_mount_start 0
cd "${info[mount0]}"

# Verify file exists.
if [ -f crash_test_file ]; then
    echo "File exists after client restart."
else
    test_add_failure "File missing after client restart."
fi

# Clean up
rm -f crash_test_file

Test 14: Filesystem Consistency Checks

Test Name: test_filesystem_consistency_after_operations.sh

Purpose:

To ensure filesystem consistency after a series of operations.


CONTENT of TEST test_filesystem_consistency_after_operations.sh

# Test filesystem consistency after operations.

CHUNKSERVERS=4 \
    USE_RAMDISK=YES \
    setup_local_empty_saunafs info

cd "${info[mount0]}"

# Perform operations.
mkdir test_dir
cd test_dir

touch file1 file2
dd if=/dev/urandom of=file1 bs=1M count=5 status=none
dd if=/dev/urandom of=file2 bs=1M count=5 status=none

mkdir subdir
mv file2 subdir/

ln -s file1 symlink_to_file1

rm file1

cd ..

# Run filesystem check.
saunafs_admin_master check

if [ $? -eq 0 ]; then
    echo "Filesystem is consistent."
else
    test_add_failure "Filesystem inconsistency detected."
fi

Test 15: Testing Under Memory Pressure

Test Name: test_writes_under_memory_pressure.sh

Purpose:

To ensure the write operations handle low-memory conditions gracefully.


CONTENT of TEST test_writes_under_memory_pressure.sh

# Test writes under memory pressure.

CHUNKSERVERS=4 \
    USE_RAMDISK=YES \
    setup_local_empty_saunafs info

cd "${info[mount0]}"

# Limit memory for the write only: run it in a subshell so the limit does not
# affect the rest of the test (ulimit -v cannot easily be raised back later).
(
    ulimit -v $((50 * 1024))  # 50MB virtual memory limit
    dd if=/dev/urandom of=memory_test_file bs=1M count=100 status=none
)

if [ $? -ne 0 ]; then
    echo "Write failed due to memory limit as expected."
else
    test_add_failure "Write succeeded unexpectedly under memory pressure."
fi

# Clean up
rm -f memory_test_file

These tests comprehensively cover the scenarios you provided, ensuring that changes to the write algorithm are thoroughly tested. By simulating various conditions such as concurrency, edge cases, failures, and resource limitations, we can validate the robustness and correctness of the new write algorithm.

If you have any additional tests or specific scenarios you'd like to include, please let me know, and I'll be happy to implement them.

@uristdwarf (Collaborator) left a comment

I don't understand this part of the code well enough to provide detailed feedback, so my feedback is general. However, we lack a lot of logging in our code, and we should start improving that with new code.

Are there any particular instances where it would be warranted to log a warning or error? You should also consider religiously adding more debug/trace logging to make debugging easier later. (Trace logging doesn't really work right now; I will fix it in another PR I have planned.)

Also, you should replace the safs_* log functions with the safs::* log functions, since the latter are C++ and much easier to use.

@dmga44 changed the title from "feat(mount): Change write jobs queue approach" to "feat(mount): New chunk-based write algorithm" on Oct 25, 2024
@dmga44 force-pushed the change-write-jobs-queue-approach branch from d2bd754 to 7620ef3 on October 28, 2024 13:50
dmga44 and others added 9 commits November 28, 2024 13:49
Versions of the test test_ec_parallel_writing with bigger file sizes
(65M) often fail. The reason behind those failures is the following
sequence of operations:
- some mountpoint starts writing some chunk X (some kilobyte of that
chunk); this triggers the creation of a chunk for that chunkIndex and
successfully finishes the write operation, sending the master an
update of the size of the file.
- another mountpoint (before the first one updates the file size) starts
writing some chunk Y, such that Y < X, i.e. Y has a lower index than X.
It successfully finishes those writes and, at the moment of updating
the file size, sends a size lower than the current one by at least the
chunk X.
- on the master side, the update of the size (fsnodes_setlength
function) removes the chunks which are not necessary considering the
current file size, thus removing the chunk X and trashing the previous
writes on it.
- new writes on the X-th chunk allocate a new chunk and successfully
finish the writing process, but the lost data is irrecoverable.

The solution is to extend fsnodes_setlength to accept an
eraseFurtherChunks parameter that determines whether or not those chunks
should be erased. Summarizing:
- the client communication of write end
(SAU_CLTOMA_FUSE_WRITE_CHUNK_END header) will not remove chunks not
necessary given the size sent.
- the client communication of truncate (SAU_CLTOMA_FUSE_TRUNCATE
header) and truncate end (SAU_CLTOMA_FUSE_TRUNCATE_END header) will
continue removing chunks further than the given length.

As a side effect, the LENGTH metadata logs must be changed, since the
state of the filesystem cannot be recovered without recording whether
further chunks must be deleted or not:
- old metadata will be handled as always removing further chunks.
- new metadata will have 0|1 to represent whether the further chunks
must be deleted.

BREAKING CHANGE: changes the changelog format generated by master
server.
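
As a rough illustration of the changelog compatibility described in this commit (the actual entry layout used by the master may differ; the names below are hypothetical), a restore path could accept both LENGTH formats along these lines:

#include <cstdint>
#include <sstream>
#include <string>

struct LengthEntry {
    uint32_t inode = 0;
    uint64_t length = 0;
    bool eraseFurtherChunks = true;  // old-format entries imply erasing
};

// Old format:  "<inode>,<length>"
// New format:  "<inode>,<length>,<0|1>"   (1 = erase chunks past the new length)
LengthEntry parseLengthEntry(const std::string& args) {
    LengthEntry entry;
    char comma = 0;
    int eraseFlag = 1;
    std::istringstream in(args);
    in >> entry.inode >> comma >> entry.length;
    if (in >> comma >> eraseFlag) {  // the flag is absent in old-format entries
        entry.eraseFurtherChunks = (eraseFlag != 0);
    }
    return entry;
}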
This test is a version of the test_ec_parallel_writing with a file size
of 513MB. A test template for both tests was created to simplify them.
This change targets providing a place to store everything related to
migrations, which is especially important if there are breaking changes
involved. The idea is to fill in the summary of the breaking changes and
the scripts for downgrading a specific service with the upcoming
breaking changes. The structure is also helpful for the next versions'
upgrade scheme.

Changes:
- added a folder in the top level for the migration tools.
- added a folder inside migrations for the upgrade from v4.X.X to
v5.0.0, and another one inside of it for downgrade tools.
- added Breaking-changes-summary.md.
- added downgrade-master.sh, since the master is the only service that
needs a special downgrade behavior for now.
This commit contains the implementation for changing the approach of the
write jobs queue from inode level to chunk level. List of changes:
- implement the ```ChunkData``` struct, taking responsibilities out of the
```inodedata``` struct.
- apply most of the change of approach in the functions of the
writedata.cc file (most of the file).
- add ```lockIdForTruncateLocators``` to keep the truncate operations
working as usual.
- rename the ```InodeChunkWriter``` class to ```ChunkJobWriter```.
- other minor changes to improve code quality.
Changes:
- reduce the use of the global lock (gMutex) by substituting it with a
mutex per inodedata instance.
- in write_data_flush: take the instantiation of the zeros vector out of
the lock and add last-minute locking.
- specify for most functions which type of locking is required.
- remove a couple of unused functions in ChunkData.
- define the constants NO_INODEDATA and NO_CHUNKDATA to reduce the use
of inexpressive NULL or nullptr.
- use the using keyword for the Lock type definition.
- update the implementation of the list of chunks per inode: from a
C-like list storing the head of the list and next pointers to a modern
std::list<ChunkDataPtr>, where ChunkDataPtr is
std::unique_ptr<ChunkData>. This change also enables the use of other
standard library functions, such as std::find and std::find_if.
- similar to the change for storing the chunks per inode, the idhash
variable changes from a raw inodedata ** to a std::list<InodeDataPtr>.
- use default member initializers.
- update the Linux client default value of the write cache size parameter
(not actually changing the write buffer size) to reflect the actual
behavior of the client.
- substitute pushing back to the datachain with emplacing back.
- use notify_one when the amount of released blocks is 1 in
write_cb_release_blocks.
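
A small sketch of the container change described above (illustrative names, not the actual writedata.cc code): once the hand-rolled linked list becomes a std::list of owning pointers, lookups turn into a standard algorithm call.

#include <algorithm>
#include <cstdint>
#include <list>
#include <memory>

struct ChunkData {
    uint32_t chunkIndex = 0;
    // ... pending-write state for this chunk ...
};
using ChunkDataPtr = std::unique_ptr<ChunkData>;

// Find the pending-write entry for a given chunk index, or nullptr if none.
ChunkData* findChunk(std::list<ChunkDataPtr>& chain, uint32_t chunkIndex) {
    auto it = std::find_if(chain.begin(), chain.end(),
                           [chunkIndex](const ChunkDataPtr& chunk) {
                               return chunk->chunkIndex == chunkIndex;
                           });
    return it != chain.end() ? it->get() : nullptr;
}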
This option comes from a code review suggestion. It defines the timeout
for the ChunkJobsWriters to wait for all their current direct
chunkserver writes.
This commit also contains the changes to make this feature possible:
- align the write related structures to make them the same for both
algorithms.
- take old write algorithm code and put it into the writedata.cc file.
- wire the option as usual.

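The waiting that this timeout bounds can be pictured with a generic condition-variable wait; this is only a sketch of the idea behind the sfschunkserverwavewriteto option, not the actual ChunkJobWriter code:

#include <chrono>
#include <condition_variable>
#include <mutex>

struct WaveState {
    std::mutex mutex;
    std::condition_variable finished;
    int pendingWrites = 0;   // direct chunkserver writes still in flight
};

// Returns true if every write of the current wave completed before the timeout.
bool waitForWave(WaveState& wave, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(wave.mutex);
    return wave.finished.wait_for(lock, timeout,
                                  [&wave] { return wave.pendingWrites == 0; });
}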
This test checks issues found during development of the new write
approach. The idea is to test the client behavior when many small (1
byte long) random writes are being handled in parallel (10 mounts).
Changes:
- rename the old write algorithm to the inode-based write algorithm and the
new write algorithm to the chunk-based write algorithm.
- update the help message of the sfsuseinodebasedwritealgorithm option. It
was also made explicitly 0|1. The doc entry was updated accordingly.
- replace occurrences of the safs_* functions with the safs::* equivalents.
- rename gWaveTimeout to gWriteWaveTimeout.
- replace the throws in write_data_truncate with regular returns. The old
code sometimes used POSIX error codes and other times the error codes of
SaunaFS's ancestors: the POSIX errors were returned normally and the other
ones were "thrown" up. The current codebase uses only SaunaFS error codes
for this specific case.
- log the truncate-end failures.
@dmga44 force-pushed the change-write-jobs-queue-approach branch from 7620ef3 to 77c447c on December 3, 2024 01:40