feat(mount): New chunk-based write algorithm #158
base: dev
Conversation
Please see my suggestions.
I'm sharing a partial review because it is still in progress.
Review still in progress.
Review finished. Great job @dmga44 👍 🔥 🚀
Please see my suggestions and do not hesitate to ask if something is not clear.
Very nice and complex change. I am approving it.
Please see the minor comments.
Great job @dmga44 👍 🔥 🚀
LGTM. Please see my suggestions.
ChatGPT o1-preview comments: https://chatgpt.com/share/671ae3dc-4340-800b-a37d-2902aa0c1f05

Certainly! Here’s how we can approach identifying potential issues introduced by these commits and proposing new automated tests to detect and minimize them.

Potential Issues Introduced by the Commits:

Proposed Automated Tests to Detect and Minimize Potential Issues: To address these potential issues, we can introduce both unit tests and integration tests.

Unit Tests:

Integration Tests:

Additional Steps:

Conclusion: By implementing these unit and integration tests, you can significantly reduce the risk of introducing new issues with the changes. These tests will help ensure that:

Adding these tests to your CI pipeline will help detect issues early in the development process, making it easier to address them before they affect production environments.

Let's analyze the tests included in your commits and propose specific test implementations to enhance your CI suite.

Analysis of Included Tests: Your commits added and modified the following tests:

Identified Gaps and Potential Issues Not Covered by Included Tests: While these tests effectively assess parallel write operations for different file sizes, they may not cover:

Proposed Implementation of Additional Tests: Below are detailed implementations of tests that address the gaps identified.

1. Concurrency and Locking Tests
Test Name:
Certainly! Below are the full implementations for each of the 15 tests we discussed earlier. Each test script follows the style and conventions observed in the existing test scripts you've provided, utilizing the available tools and functions.

Test 1: Concurrency and Locking Tests
Test Name:
Purpose: To verify that overlapping writes to the same block are correctly synchronized, ensuring data integrity without deadlocks.
CONTENT of TEST
# Test concurrent overlapping writes to the same file.
CHUNKSERVERS=4 \
USE_RAMDISK=YES \
MOUNT_EXTRA_CONFIG="sfscachemode=NEVER" \
setup_local_empty_saunafs info
cd "${info[mount0]}"
# Create a file of 1MB size.
dd if=/dev/zero of=concurrent_file bs=1M count=1
# Define number of concurrent writers.
NUM_WRITERS=10
# Function for a writer process.
write_overlap() {
local id=$1
local offset=$2
local length=$3
local pattern=$(printf "%02d" $id)
# Write the pattern to the file at the specified offset.
yes "$pattern" | tr -d '\n' | head -c $length | dd of=concurrent_file bs=1 count=$length seek=$offset conv=notrunc status=none
}
# Define overlapping regions.
# All writers write to overlapping regions starting at offset 512K.
OFFSET=$((512 * 1024))
LENGTH=$((256 * 1024))
# Start concurrent writes.
for i in $(seq 1 $NUM_WRITERS); do
write_overlap $i $((OFFSET)) $((LENGTH)) &
done
wait
# Read back the overlapping region.
dd if=concurrent_file bs=1 skip=$OFFSET count=$LENGTH status=none > read_back_data
# Verify the data.
# Since multiple writes overlapped, the final data should correspond to one of the patterns.
unique_patterns=$(tr -d '\n' < read_back_data | fold -w2 | sort | uniq)
if [ $(echo "$unique_patterns" | wc -l) -eq 1 ]; then
# Only one pattern found, test passes.
echo "Test passed: Overlapping writes resulted in consistent data."
else
test_add_failure "Data corruption detected in overlapping writes."
fi
# Clean up
rm -f concurrent_file read_back_data

Test 2: Edge Case Write Tests
Test Name:
Purpose: To test writes that cross block and chunk boundaries, ensuring they are correctly handled.
CONTENT of TEST
# Test writes that cross block and chunk boundaries.
CHUNKSERVERS=4 \
USE_RAMDISK=YES \
MOUNT_EXTRA_CONFIG="sfscachemode=NEVER" \
setup_local_empty_saunafs info
cd "${info[mount0]}"
# Get block and chunk sizes.
BLOCK_SIZE=$SAUNAFS_BLOCK_SIZE
CHUNK_SIZE=$SAUNAFS_CHUNK_SIZE
# Create a file large enough to span multiple chunks.
FILE_SIZE=$((CHUNK_SIZE * 2 + BLOCK_SIZE * 2))
truncate -s $FILE_SIZE boundary_test_file
# Write across a block boundary.
BLOCK_END_OFFSET=$((BLOCK_SIZE - 512))
dd if=/dev/urandom of=boundary_test_file bs=1 count=1024 seek=$BLOCK_END_OFFSET conv=notrunc status=none
# Write across a chunk boundary.
CHUNK_END_OFFSET=$((CHUNK_SIZE - 512))
dd if=/dev/urandom of=boundary_test_file bs=1 count=2048 seek=$CHUNK_END_OFFSET conv=notrunc status=none
# Partial write not aligned to block boundaries.
PARTIAL_OFFSET=12345
dd if=/dev/urandom of=boundary_test_file bs=1 count=7890 seek=$PARTIAL_OFFSET conv=notrunc status=none
# Verify file size.
EXPECTED_SIZE=$FILE_SIZE
ACTUAL_SIZE=$(stat -c %s boundary_test_file)
if [ $ACTUAL_SIZE -ne $EXPECTED_SIZE ]; then
test_add_failure "File size mismatch after cross-boundary writes. Expected $EXPECTED_SIZE, got $ACTUAL_SIZE."
else
echo "File size is correct after cross-boundary writes."
fi
# Clean up
rm -f boundary_test_file

Test 3: Error Handling and Recovery Tests
Test Name:
Purpose: To test the client's ability to handle chunkserver failures during write operations.
CONTENT of TEST
# Test client behavior during chunkserver failures.
CHUNKSERVERS=4 \
USE_RAMDISK=YES \
MOUNT_EXTRA_CONFIG="sfscachemode=NEVER" \
setup_local_empty_saunafs info
cd "${info[mount0]}"
# Create a file and start writing.
FILE_SIZE=50M
FILE_NAME="test_file"
( dd if=/dev/zero of=$FILE_NAME bs=1M count=50 status=none ) &
WRITE_PID=$!
# Allow the write to start.
sleep 2
# Simulate chunkserver failures.
saunafs_chunkserver_daemon 0 stop
saunafs_chunkserver_daemon 1 stop
echo "Simulated chunkserver failures."
# Wait for the write to complete.
wait $WRITE_PID
if [ $? -eq 0 ]; then
echo "Write operation completed successfully."
else
test_add_failure "Write operation failed due to chunkserver failure."
fi
# Restart chunkservers.
saunafs_chunkserver_daemon 0 start
saunafs_chunkserver_daemon 1 start
saunafs_wait_for_all_ready_chunkservers
# Verify subsequent writes succeed.
dd if=/dev/zero of=$FILE_NAME bs=1M count=10 seek=50 conv=notrunc status=none
if [ $? -eq 0 ]; then
echo "Subsequent write operation succeeded."
else
test_add_failure "Subsequent write operation failed."
fi
# Clean up
rm -f $FILE_NAME

Test 4: Truncate Operation Tests
Test Name:
Purpose: To verify that truncating a file while concurrent writes are happening is handled correctly.
CONTENT of TEST
# Test concurrent truncate and write operations.
CHUNKSERVERS=4 \
USE_RAMDISK=YES \
MOUNT_EXTRA_CONFIG="sfscachemode=NEVER" \
setup_local_empty_saunafs info
cd "${info[mount0]}"
# Create initial file.
FILE_NAME="truncate_test_file"
dd if=/dev/urandom of=$FILE_NAME bs=1M count=10 status=none
# Start continuous write in background.
(
for i in {1..10}; do
dd if=/dev/urandom of=$FILE_NAME bs=1M count=1 seek=$i conv=notrunc status=none
sleep 1
done
) &
WRITE_PID=$!
# Wait and then truncate.
sleep 2
truncate -s 5M $FILE_NAME
# Wait for write to finish.
wait $WRITE_PID
# Verify file size.
EXPECTED_SIZE=$((5 * 1024 * 1024))
ACTUAL_SIZE=$(stat -c %s $FILE_NAME)
if [ $ACTUAL_SIZE -eq $EXPECTED_SIZE ]; then
echo "File size after truncate is correct."
else
test_add_failure "File size after truncate is incorrect. Expected $EXPECTED_SIZE, got $ACTUAL_SIZE."
fi
# Clean up
rm -f $FILE_NAME

Test 5: Restore Operation Tests
Test Name:
Purpose: To test restoring the filesystem from metadata logs containing both old and new LENGTH entries.
CONTENT of TEST
# Test restoring filesystem from mixed metadata logs.
CHUNKSERVERS=4 \
USE_RAMDISK=YES \
setup_local_empty_saunafs info
cd "${info[mount0]}"
# Use old write algorithm.
saunafs_mount_unmount 0
echo "sfsuseoldwritealgorithm=1" >> "${info[mount0_cfg]}"
saunafs_mount_start 0
# Generate old LENGTH entries.
touch old_file
dd if=/dev/urandom of=old_file bs=1M count=5 status=none
# Switch to new algorithm.
saunafs_mount_unmount 0
sed -i '/sfsuseoldwritealgorithm=1/d' "${info[mount0_cfg]}"
saunafs_mount_start 0
# Generate new LENGTH entries.
touch new_file
dd if=/dev/urandom of=new_file bs=1M count=5 status=none
# Save metadata.
saunafs_admin_master save-metadata
# Simulate failure.
saunafs_master_daemon stop
rm -f "${info[master_data_path]}/metadata.sfs"
# Restore filesystem.
saunafs_master_daemon start
saunafs_wait_for_all_ready_chunkservers
# Verify files are present.
if [ -f old_file ] && [ -f new_file ]; then
echo "Files restored successfully."
else
test_add_failure "Files not restored correctly."
fi
# Clean up
rm -f old_file new_file

Test 6: Algorithm Switching Tests
Test Name:
Purpose: To ensure data integrity when switching between the old and new write algorithms.
CONTENT of TEST
# Test switching between old and new write algorithms.
CHUNKSERVERS=4 \
USE_RAMDISK=YES \
setup_local_empty_saunafs info
cd "${info[mount0]}"
# New algorithm.
saunafs_mount_unmount 0
sed -i '/sfsuseoldwritealgorithm=1/d' "${info[mount0_cfg]}"
saunafs_mount_start 0
# Write initial data.
FILE_NAME="switch_test_file"
FILE_SIZE=5M
FILE_SIZE=$FILE_SIZE file-generate $FILE_NAME
# Switch to old algorithm.
saunafs_mount_unmount 0
echo "sfsuseoldwritealgorithm=1" >> "${info[mount0_cfg]}"
saunafs_mount_start 0
# Continue writing.
dd if=/dev/urandom of=$FILE_NAME bs=1M count=5 seek=5 conv=notrunc status=none
# Verify data integrity.
if file-validate $FILE_NAME; then
echo "Data integrity maintained after algorithm switch."
else
test_add_failure "Data corruption after algorithm switch."
fi
# Clean up
rm -f $FILE_NAME

Test 7: Performance Benchmark Tests
Test Name:
Purpose: To compare write performance between the old and new write algorithms.
CONTENT of TEST
# Test write performance between old and new algorithms.
CHUNKSERVERS=4 \
USE_RAMDISK=YES \
setup_local_empty_saunafs info
perform_benchmark() {
local algorithm=$1
local mount_cfg="${info[mount0_cfg]}"
local mount_point="${info[mount0]}"
saunafs_mount_unmount 0
if [ "$algorithm" = "old" ]; then
echo "sfsuseoldwritealgorithm=1" >> "$mount_cfg"
else
sed -i '/sfsuseoldwritealgorithm=1/d' "$mount_cfg"
fi
saunafs_mount_start 0
cd "$mount_point"
FILE_NAME="benchmark_${algorithm}.dat"
FILE_SIZE=100M
START_TIME=$(date +%s.%N)
dd if=/dev/zero of=$FILE_NAME bs=1M count=100 status=none
END_TIME=$(date +%s.%N)
DURATION=$(echo "$END_TIME - $START_TIME" | bc)
echo "Write duration with $algorithm algorithm: $DURATION seconds"
rm -f $FILE_NAME
cd -
}
perform_benchmark "new"
perform_benchmark "old" Test 8: Chunk Deletion Logic TestsTest Name: Purpose: To verify that chunks are deleted or preserved according to the CONTENT of TEST # Test chunk deletion behavior during truncates.
CHUNKSERVERS=4 \
USE_RAMDISK=YES \
setup_local_empty_saunafs info
cd "${info[mount0]}"
# Create large file.
FILE_NAME="deletion_test_file"
dd if=/dev/urandom of=$FILE_NAME bs=1M count=100 status=none
# Get initial chunks.
initial_chunks=$(find_all_chunks | sort)
# Truncate with eraseFurtherChunks=true.
TRUNCATE_SIZE=50M
saunafs truncate $FILE_NAME $TRUNCATE_SIZE --eraseFurtherChunks
# Verify chunks deleted.
post_truncate_chunks=$(find_all_chunks | sort)
if [ "$initial_chunks" != "$post_truncate_chunks" ]; then
echo "Chunks beyond new size deleted."
else
test_add_failure "Chunks not deleted after truncate with eraseFurtherChunks."
fi
# Truncate with eraseFurtherChunks=false.
TRUNCATE_SIZE2=25M
saunafs truncate $FILE_NAME $TRUNCATE_SIZE2
# Verify chunks preserved.
post_truncate_chunks2=$(find_all_chunks | sort)
if [ "$post_truncate_chunks" = "$post_truncate_chunks2" ]; then
echo "Chunks beyond new size preserved."
else
test_add_failure "Chunks deleted after truncate with eraseFurtherChunks=false."
fi
# Clean up
rm -f $FILE_NAME

Test 9: Locking Mechanism Stress Tests
Test Name:
Purpose: To stress-test the locking mechanism with high concurrency.
CONTENT of TEST
# Stress test locking mechanism.
CHUNKSERVERS=4 \
USE_RAMDISK=YES \
setup_local_empty_saunafs info
cd "${info[mount0]}"
# Create file.
FILE_NAME="lock_stress_file"
truncate -s 10M $FILE_NAME
# High concurrency writes.
NUM_PROCESSES=100
for i in $(seq 1 $NUM_PROCESSES); do
(
OFFSET=$(( (i - 1) * 1024 ))
dd if=/dev/urandom of=$FILE_NAME bs=1K count=1 seek=$OFFSET conv=notrunc status=none
) &
done
wait
echo "All concurrent writes completed."
# Clean up
rm -f $FILE_NAME

Test 10: Data Integrity with Checksums
Test Name:
Purpose: To verify data integrity using checksums after write operations.
CONTENT of TEST
# Test data integrity using checksums.
CHUNKSERVERS=4 \
USE_RAMDISK=YES \
setup_local_empty_saunafs info
cd "${info[mount0]}"
# Generate data and compute checksum.
FILE_NAME="checksum_test_file"
dd if=/dev/urandom bs=1M count=10 status=none | tee $FILE_NAME > /dev/null
ORIGINAL_CHECKSUM=$(md5sum $FILE_NAME | awk '{print $1}')
# Read back data.
READ_BACK_FILE="read_back_file"
cp $FILE_NAME $READ_BACK_FILE
READ_BACK_CHECKSUM=$(md5sum $READ_BACK_FILE | awk '{print $1}')
if [ "$ORIGINAL_CHECKSUM" = "$READ_BACK_CHECKSUM" ]; then
echo "Checksums match. Data integrity verified."
else
test_add_failure "Checksums do not match."
fi
# Clean up
rm -f $FILE_NAME $READ_BACK_FILE

Test 11: Exception Handling Tests
Test Name:
Purpose: To ensure that exceptions during writes do not leave locks held or resources unreleased.
CONTENT of TEST
# Test exception handling during writes.
CHUNKSERVERS=4 \
USE_RAMDISK=YES \
setup_local_empty_saunafs info
cd "${info[mount0]}"
# Create read-only file.
FILE_NAME="exception_test_file"
touch $FILE_NAME
chmod 444 $FILE_NAME
# Attempt to write.
dd if=/dev/urandom of=$FILE_NAME bs=1M count=1 conv=notrunc status=none
if [ $? -ne 0 ]; then
echo "Write failed as expected."
else
test_add_failure "Write succeeded unexpectedly."
fi
# Check locks released.
rm $FILE_NAME
if [ $? -eq 0 ]; then
echo "File removed successfully. Locks released."
else
test_add_failure "Failed to remove file. Locks may be held."
fi

Test 12: Simulated Network Partitions
Test Name:
Purpose: To test the client's behavior when network connectivity is lost and restored during writes.
CONTENT of TEST
# Test client behavior during network partition.
CHUNKSERVERS=4 \
USE_RAMDISK=YES \
setup_local_empty_saunafs info
cd "${info[mount0]}"
# Start write operation.
( dd if=/dev/urandom of=network_test_file bs=1M count=100 status=none ) &
WRITE_PID=$!
# Allow write to start.
sleep 5
# Simulate network partition by stopping master.
saunafs_master_daemon stop
echo "Master stopped to simulate network partition."
# Wait for write to complete.
wait $WRITE_PID
if [ $? -ne 0 ]; then
echo "Write failed due to network partition."
else
test_add_failure "Write succeeded unexpectedly during network partition."
fi
# Restart master.
saunafs_master_daemon start
saunafs_wait_for_all_ready_chunkservers
# Retry write.
dd if=/dev/urandom of=network_test_file bs=1M count=10 seek=100 conv=notrunc status=none
if [ $? -eq 0 ]; then
echo "Write succeeded after restoring network."
else
test_add_failure "Write failed after restoring network."
fi
# Clean up
rm -f network_test_file

Test 13: Crash Recovery Tests
Test Name:
Purpose: To verify filesystem consistency after a client crashes during a write operation.
CONTENT of TEST
# Test filesystem consistency after client crash.
CHUNKSERVERS=4 \
USE_RAMDISK=YES \
MOUNTS=1 \
setup_local_empty_saunafs info
cd "${info[mount0]}"
# Start write operation.
( dd if=/dev/urandom of=crash_test_file bs=1M count=100 status=none ) &
WRITE_PID=$!
# Allow write to start.
sleep 5
# Simulate client crash.
MOUNT_PID=$(pgrep -f "sfsmount.*${info[mount0]}")
kill -9 $MOUNT_PID
echo "Client process killed to simulate crash."
# Wait for write to complete.
wait $WRITE_PID
# Restart client.
saunafs_mount_start 0
cd "${info[mount0]}"
# Verify file exists.
if [ -f crash_test_file ]; then
echo "File exists after client restart."
else
test_add_failure "File missing after client restart."
fi
# Clean up
rm -f crash_test_file

Test 14: Filesystem Consistency Checks
Test Name:
Purpose: To ensure filesystem consistency after a series of operations.
CONTENT of TEST
# Test filesystem consistency after operations.
CHUNKSERVERS=4 \
USE_RAMDISK=YES \
setup_local_empty_saunafs info
cd "${info[mount0]}"
# Perform operations.
mkdir test_dir
cd test_dir
touch file1 file2
dd if=/dev/urandom of=file1 bs=1M count=5 status=none
dd if=/dev/urandom of=file2 bs=1M count=5 status=none
mkdir subdir
mv file2 subdir/
ln -s file1 symlink_to_file1
rm file1
cd ..
# Run filesystem check.
saunafs_admin_master check
if [ $? -eq 0 ]; then
echo "Filesystem is consistent."
else
test_add_failure "Filesystem inconsistency detected."
fi

Test 15: Testing Under Memory Pressure
Test Name:
Purpose: To ensure the write operations handle low-memory conditions gracefully.
CONTENT of TEST
# Test writes under memory pressure.
CHUNKSERVERS=4 \
USE_RAMDISK=YES \
setup_local_empty_saunafs info
cd "${info[mount0]}"
# Limit memory for the process.
ulimit -v $((50 * 1024)) # 50MB
# Attempt large write.
dd if=/dev/urandom of=memory_test_file bs=1M count=100 status=none
if [ $? -ne 0 ]; then
echo "Write failed due to memory limit as expected."
else
test_add_failure "Write succeeded unexpectedly under memory pressure."
fi
# Reset memory limit.
ulimit -v unlimited
# Clean up
rm -f memory_test_file

These tests comprehensively cover the scenarios you provided, ensuring that changes to the write algorithm are thoroughly tested. By simulating various conditions such as concurrency, edge cases, failures, and resource limitations, we can validate the robustness and correctness of the new write algorithm. If you have any additional tests or specific scenarios you'd like to include, please let me know, and I'll be happy to implement them.
I don't understand this part of the code well enough to provide detailed feedback, so my feedback is general. However, we lack a lot of logging in our code, and we should start improving that with new code.
Are there any particular instances where it would be warranted to log a warning or error? You should also consider religiously adding more debug/trace logging to make debugging easier later. (Trace logging doesn't really work right now; I will fix it later in another PR I have planned for this.)
Also, you should replace the safs_* log functions with the safs::* log functions, since the latter are C++ and much easier to use.
Versions of the test test_ec_parallel_writing with bigger file sizes (65M) often fail. The reason behind those failures is the following sequence of operations:
- Some mountpoint starts writing some chunk X (some kilobyte of that chunk). This triggers the creation of a chunk for that chunkIndex; the write operation finishes successfully and sends the master an update of the file size.
- Another mountpoint (before the first one updates the file size) starts writing some chunk Y such that Y < X, i.e. Y has a lower index than X. It finishes those writes successfully and, at the moment of updating the file size, sends a size lower than the current one by at least the chunk X.
- On the master side, the size update (the fsnodes_setlength function) removes the chunks that are no longer necessary given the current file size, thus removing chunk X and trashing the previous writes on it.
- New writes on the X-th chunk allocate a new chunk and finish the writing process successfully, but the lost data is irrecoverable.

The solution is to extend fsnodes_setlength to accept an eraseFurtherChunks parameter that determines whether or not those chunks should be erased. Summarizing:
- the client communication of write end (SAU_CLTOMA_FUSE_WRITE_CHUNK_END header) will not remove chunks that are unnecessary given the size sent.
- the client communication of truncate (SAU_CLTOMA_FUSE_TRUNCATE header) and truncate end (SAU_CLTOMA_FUSE_TRUNCATE_END header) will continue removing chunks beyond the given length.

As a side effect, the LENGTH metadata logs must be changed, since the state of the filesystem cannot be recovered without saying whether further chunks must be deleted or not:
- old metadata will be handled as always removing further chunks.
- new metadata will carry 0|1 to represent whether the further chunks must be deleted.

BREAKING CHANGE: changes the changelog format generated by the master server.
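To make the intent concrete, here is a rough C++ sketch of the fix; FileNode, kChunkSize, and the function body are simplified placeholders, not the actual master-server code:

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-in for a file node; the real master code works on fsnode
// structures and emits changelog entries.
struct FileNode {
    uint64_t length = 0;
    std::vector<uint64_t> chunkIds;  // one entry per chunk index
};

constexpr uint64_t kChunkSize = 64ULL * 1024 * 1024;  // assumption: 64 MiB chunks

// fsnodes_setlength extended with eraseFurtherChunks:
// - truncate / truncate-end keep passing true (old behavior),
// - write-chunk-end passes false, so a lagging size update from one mount can
//   no longer drop a chunk that another mount has just written.
void fsnodes_setlength(FileNode &node, uint64_t newLength, bool eraseFurtherChunks) {
    if (eraseFurtherChunks) {
        uint64_t neededChunks = (newLength + kChunkSize - 1) / kChunkSize;
        if (node.chunkIds.size() > neededChunks) {
            node.chunkIds.resize(neededChunks);  // the real code also schedules chunk deletion
        }
    }
    node.length = newLength;
    // The LENGTH changelog entry now records the flag (0|1) as well, so that a
    // metadata restore can replay the same decision; old entries are replayed
    // as "always erase".
}
```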
This test is a version of test_ec_parallel_writing with a file size of 513MB. A test template for both tests was created to simplify them.
This change provides a place to store everything related to migrations, which is especially important when breaking changes are involved. The idea is to fill in the summary of the breaking changes and the scripts for downgrading a specific service affected by the upcoming breaking changes. The structure is also helpful for the upgrade scheme of the next versions. Changes:
- added a top-level folder for the migration tools.
- added a folder inside migrations for the upgrade from v4.X.X to v5.0.0, and another one inside of it for downgrade tools.
- added Breaking-changes-summary.md.
- added downgrade-master.sh; the master is the only service that needs special downgrade behavior for now.
This commit contains the implementation for changing the approach of the write jobs queue from inode level to chunk level. List of changes:
- implement the ```ChunkData``` struct, taking responsibilities out of the ```inodedata``` struct.
- apply most of the change of approach in the functions of the writedata.cc file (most of the file).
- add ```lockIdForTruncateLocators``` to keep the truncate operations going as usual.
- rename the ```InodeChunkWriter``` class to ```ChunkJobWriter```.
- other minor changes to improve code quality.
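A hedged sketch of the shape of this refactor; the field names below are illustrative approximations of what a per-chunk job could carry, not the exact members in writedata.cc:

```cpp
#include <cstdint>
#include <list>
#include <vector>

// Illustrative cached write block (stand-in for the real datachain entry).
struct WriteCacheBlock {
    uint32_t blockIndex = 0;
    uint32_t from = 0;
    uint32_t to = 0;
    std::vector<uint8_t> data;
};

// Per-chunk write job: the jobs queue now holds one of these per dirty chunk
// instead of one entry per inode, so independent chunks of the same file can
// be written in parallel.
struct ChunkData {
    uint32_t inode = 0;
    uint32_t chunkIndex = 0;               // which chunk of the file this job covers
    uint32_t lockId = 0;                   // e.g. lockIdForTruncateLocators for truncates
    std::list<WriteCacheBlock> dataChain;  // pending blocks for this chunk only
    bool inQueue = false;                  // already queued for a ChunkJobWriter
    int status = 0;                        // last error reported by the writer
};
```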
Changes:
- reduce the use of the global lock (gMutex) by substituting a mutex per inodeData instance for it.
- in write_data_flush: take the instantiation of the zeros vector out of the lock and use last-minute locking.
- specify for most functions which type of locking is required.
- remove a couple of unused functions in ChunkData.
- define the constants NO_INODEDATA and NO_CHUNKDATA to reduce the use of inexpressive NULL or nullptr.
- use the using keyword for the Lock type definition.
- update the implementation of the list of chunks per inode: from a C-like list storing the head of the list and next pointers to a modern std::list<ChunkDataPtr>, where ChunkDataPtr is std::unique_ptr<ChunkData>. This change also enables the use of other standard library functions, such as std::find and std::find_if.
- similarly to the change for storing the chunks per inode, the idhash variable changes from a raw inodedata ** to a std::list<InodeDataPtr>.
- use default member initializers.
- update the Linux client default value of the write cache size parameter (not actually changing the write buffer size) to show the actual behavior of the client.
- substitute emplacing back for pushing back to the datachain.
- use notify_one when the number of released blocks is 1 in write_cb_release_blocks.
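A minimal sketch of the resulting ownership and locking shape under those assumptions (ChunkData reduced to a stand-in, helper names approximate):

```cpp
#include <algorithm>
#include <cstdint>
#include <list>
#include <memory>
#include <mutex>

struct ChunkData {            // minimal stand-in for the struct sketched above
    uint32_t chunkIndex = 0;
};

using ChunkDataPtr = std::unique_ptr<ChunkData>;
using Lock = std::unique_lock<std::mutex>;

constexpr ChunkData *NO_CHUNKDATA = nullptr;

struct inodedata {
    uint32_t inode = 0;
    std::mutex mutex;                // per-inode lock, replacing most uses of the global gMutex
    std::list<ChunkDataPtr> chunks;  // modern list instead of C-style head/next pointers

    // Locking: requires this->mutex to be held by the caller (the commit
    // documents the required locking per function in a similar way).
    ChunkData *findChunk(uint32_t chunkIndex) {
        auto it = std::find_if(chunks.begin(), chunks.end(),
                               [chunkIndex](const ChunkDataPtr &c) {
                                   return c->chunkIndex == chunkIndex;
                               });
        return it == chunks.end() ? NO_CHUNKDATA : it->get();
    }
};
```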
This option comes from a code review suggestion. It defines the timeout for the ChunkJobWriters to wait for all their current direct chunkserver writes.
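Such a bounded wait boils down to something like the following; the function and parameter names are illustrative, only the option's purpose comes from the PR:

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Wait until every direct chunkserver write of the current wave has been
// acknowledged, but never longer than the configured wave write timeout.
bool waitForWaveCompletion(std::mutex &mutex, std::condition_variable &cv,
                           const std::size_t &pendingWrites,
                           std::chrono::milliseconds waveWriteTimeout) {
    std::unique_lock<std::mutex> lock(mutex);
    return cv.wait_for(lock, waveWriteTimeout,
                       [&pendingWrites] { return pendingWrites == 0; });
}
```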
This commit also contains the changes to make this feature possible:
- align the write-related structures to make them the same for both algorithms.
- take the old write algorithm code and put it into the writedata.cc file.
- wire the option as usual.
This test checks for issues found during development of the new write approach. The idea is to test the client behavior when many small (1-byte-long) random writes are handled in parallel (10 mounts).
Changes:
- rename the old write algorithm to the inode-based write algorithm and the new write algorithm to the chunk-based write algorithm.
- update the help message of the sfsuseinodebasedwritealgorithm option. It was also made explicitly 0|1. Update the doc entry accordingly.
- change occurrences of the safs_* functions to their safs::* equivalents.
- rename gWaveTimeout to gWriteWaveTimeout.
- change the throws in write_data_truncate to regular returns. The old code sometimes used POSIX error codes and other times SaunaFS ancestors' error codes: the POSIX errors were normally returned and the other ones were "thrown" up. The current codebase uses only SaunaFS error codes for this specific case.
- log the truncate-end failures.
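A schematic illustration of the error-handling part of this change, with placeholder status names (the actual SaunaFS status constants and the real write_data_truncate signature differ):

```cpp
#include <cstdint>

using SaunaStatus = uint8_t;          // placeholder for the SaunaFS status type
constexpr SaunaStatus kStatusOk = 0;  // hypothetical names, not the real constants
constexpr SaunaStatus kErrorIO  = 1;

// Before (schematic): POSIX codes were returned while internal codes were thrown,
// mixing two error conventions in write_data_truncate.
// After (schematic): one convention, SaunaFS status codes returned everywhere,
// and truncate-end failures are logged instead of only propagated.
SaunaStatus write_data_truncate_sketch(bool truncateEndFailed) {
    if (truncateEndFailed) {
        // a safs::*-style log call would report the truncate-end failure here
        return kErrorIO;
    }
    return kStatusOk;
}
```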
[Must be merged after PR #222]
This PR targets the following issues:
It also adds a new parameter for tuning the write wave timeout (sfschunkserverwavewriteto). Another one, for choosing which write algorithm to use, was also added (sfsuseoldwritealgorithm). The changes performed are the following:
- a ChunkData struct to represent and gather the data of the pending write operation on a chunk. This struct contains many fields from inodedata, since it is taking over such responsibilities. The use of this new structure is the main reason for the wide changes in the writedata.cc file.
- the entry handled by the write jobs queue is no longer an inodedata one, it is a ChunkData one.
- related changes in WriteChunkLocator and ChunkWriter.

A test on very small parallel random writes was added. I've also checked that there are no new data race conditions added by these changes.