Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add fadvise:AUTO_RANDOM mode #1243

Open
wants to merge 5 commits into
base: branch-2.2.x
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions gcs/CHANGES.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
# Release Notes

## Next
1. Add AUTO_RANDOM as new fadvise mode.

## 2.2.25 - 2024-08-01
1. PR #1227 - Avoid registering subscriber class multiple times
Expand Down
16 changes: 16 additions & 0 deletions gcs/CONFIGURATION.md
Original file line number Diff line number Diff line change
Expand Up @@ -503,6 +503,22 @@ permissions (not authorized) to execute these requests.
streaming requests as soon as first backward read or forward read for
more than `fs.gs.inputstream.inplace.seek.limit` bytes was detected.

* `AUTO_RANDOM` - It is complementing `AUTO` mode which uses sequential
mode to start with and adapts to bounded range requests. `AUTO_RANDOM`
mode uses bounded channel initially and adapts to sequential requests if
consecutive requests are within `fs.gs.inputstream.min.range.request.size`.
gzip-encode object will bypass this adoption, it will always be a
streaming(unbounded) channel. This helps in cases where egress limits is
getting breached for customer because `AUTO` mode will always lead to
one unbounded channel for a file. `AUTO_RANDOM` will avoid such unwanted
unbounded channels.

* `fs.gs.fadvise.request.track.count` (default: `3`)
singhravidutt marked this conversation as resolved.
Show resolved Hide resolved
singhravidutt marked this conversation as resolved.
Show resolved Hide resolved

Self adaptive fadvise mode uses distance between the served requests to
decide the access pattern. This property controls how many such requests
need to be tracked.

* `fs.gs.inputstream.inplace.seek.limit` (default: `8388608`)

If forward seeks are within this many bytes of the current position, seeks
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -213,6 +213,13 @@ public class GoogleHadoopFileSystemConfiguration {
public static final HadoopConfigurationProperty<Integer> GCS_BATCH_THREADS =
new HadoopConfigurationProperty<>("fs.gs.batch.threads", 15);

/**
* Configuration key for number of request to track for adapting the access pattern i.e. fadvise:
* AUTO & AUTO_RANDOM.
*/
public static final HadoopConfigurationProperty<Integer> GCS_FADVISE_REQUEST_TRACK_COUNT =
new HadoopConfigurationProperty<>("fs.gs.fadvise.request.track.count", 3);

/**
* Configuration key for enabling the use of Rewrite requests for copy operations. Rewrite request
* has the same effect as Copy request, but it can handle moving large objects that may
Expand Down Expand Up @@ -673,6 +680,8 @@ private static GoogleCloudStorageReadOptions getReadChannelOptions(Configuration
.setTraceLogTimeThreshold(GCS_TRACE_LOG_TIME_THRESHOLD_MS.get(config, config::getLong))
.setTraceLogExcludeProperties(
ImmutableSet.copyOf(GCS_TRACE_LOG_EXCLUDE_PROPERTIES.getStringCollection(config)))
.setBlockSize(BLOCK_SIZE.get(config, config::getLong))
.setFadviseRequestTrackCount(GCS_FADVISE_REQUEST_TRACK_COUNT.get(config, config::getInt))
.build();
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -154,6 +154,7 @@ public class GoogleHadoopFileSystemConfigurationTest {
"fs.gs.write.parallel.composite.upload.part.file.cleanup.type",
PartFileCleanupType.ALWAYS);
put("fs.gs.write.parallel.composite.upload.part.file.name.prefix", "");
put("fs.gs.fadvise.request.track.count", 3);
}
};

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,181 @@
/*
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.google.cloud.hadoop.gcsio;

import com.google.cloud.hadoop.gcsio.GoogleCloudStorageReadOptions.Fadvise;
import com.google.common.flogger.GoogleLogger;
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedList;
import java.util.ListIterator;

class AdaptiveFileAccessPattern implements Closeable {
singhravidutt marked this conversation as resolved.
Show resolved Hide resolved

private static final GoogleLogger logger = GoogleLogger.forEnclosingClass();
private final StorageResourceId resourceId;
private final GoogleCloudStorageReadOptions readOptions;
private boolean isPatternOverriden = false;
private boolean randomAccess;
private long lastServedIndex = -1;
// Keeps track of distance between consecutive requests
private BoundedList<Long> consecutiveRequestsDistances;

@Override
public void close() throws IOException {
singhravidutt marked this conversation as resolved.
Show resolved Hide resolved
if (consecutiveRequestsDistances != null) {
singhravidutt marked this conversation as resolved.
Show resolved Hide resolved
consecutiveRequestsDistances = null;

Check warning on line 40 in gcsio/src/main/java/com/google/cloud/hadoop/gcsio/AdaptiveFileAccessPattern.java

View check run for this annotation

Codecov / codecov/patch

gcsio/src/main/java/com/google/cloud/hadoop/gcsio/AdaptiveFileAccessPattern.java#L40

Added line #L40 was not covered by tests
}
}

class BoundedList<E> extends LinkedList<E> {
singhravidutt marked this conversation as resolved.
Show resolved Hide resolved
private int limit;

public BoundedList(int limit) {
this.limit = limit;
}

@Override
public boolean add(E o) {
super.add(o);
while (size() > limit) {
super.removeFirst();

Check warning on line 55 in gcsio/src/main/java/com/google/cloud/hadoop/gcsio/AdaptiveFileAccessPattern.java

View check run for this annotation

Codecov / codecov/patch

gcsio/src/main/java/com/google/cloud/hadoop/gcsio/AdaptiveFileAccessPattern.java#L55

Added line #L55 was not covered by tests
}
return true;
}
}

public AdaptiveFileAccessPattern(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this class is not a "Pattern", but rather a "PatternTracker". Why not name this accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't track the pattern of access but decide on pattern which is derived by tracking the requests. I agrees that it's also not a "Pattern". Will change it to AccessPatternManager or something on those lines.

singhravidutt marked this conversation as resolved.
Show resolved Hide resolved
StorageResourceId resourceId, GoogleCloudStorageReadOptions readOptions) {
this.resourceId = resourceId;
this.readOptions = readOptions;
this.randomAccess =
readOptions.getFadvise() == Fadvise.AUTO_RANDOM
|| readOptions.getFadvise() == Fadvise.RANDOM;
if (readOptions.getFadvise() == Fadvise.AUTO_RANDOM) {
consecutiveRequestsDistances = new BoundedList<>(readOptions.getFadviseRequestTrackCount());
}
}

public void updateLastServedIndex(long position) {
this.lastServedIndex = position;
}

public boolean isRandomAccessPattern() {
return randomAccess;
}

public void updateAccessPattern(long currentPosition) {
if (isPatternOverriden) {
logger.atFiner().log(
"Will bypass computing access pattern as it's overriden for resource :%s", resourceId);
return;
}
if (readOptions.getFadvise() == Fadvise.AUTO_RANDOM) {
if (isSequentialAccessPattern(currentPosition)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will it only switch from sequential -> random once and not go back to sequential? Is this intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is as per the request as mentioned in the document AUTO_RANDOM starts assuming "RANDOM" file access and adapts to "SEQUENTIAL" if read pattern is so. Once it's adapted it will not flip again.

unsetRandomAccess();
}
} else if (readOptions.getFadvise() == Fadvise.AUTO) {
if (isRandomAccessPattern(currentPosition)) {
setRandomAccess();
}
}
}

/**
* This provides a way to override the access pattern, once overridden it will not be recomputed
* for adaptive fadvise types.
*
* @param pattern, true, to override with random access else false
*/
public void overrideAccessPattern(boolean pattern) {
singhravidutt marked this conversation as resolved.
Show resolved Hide resolved
this.isPatternOverriden = true;
this.randomAccess = pattern;
logger.atInfo().log(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this flood the logs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we are using it only for the cases where it's not feasible to use configured fadvise for example gzip encoded files, no matter what fadvise is configured it needs to be read in a sequential manner.

"Overriding the random access pattern to %s for fadvise:%s for resource: %s ",
pattern, readOptions.getFadvise(), resourceId);
}

private boolean isSequentialAccessPattern(long currentPosition) {
if (lastServedIndex != -1 && consecutiveRequestsDistances != null) {
consecutiveRequestsDistances.add(currentPosition - lastServedIndex);
}

if (!shouldDetectSequentialAccess()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have this as the first check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will help populate the distance accurately. I do agree that it will not be used if shouldDetectSequentialAccess returns true. I find it to be helpful in maintaining the correct state.

return false;

Check warning on line 118 in gcsio/src/main/java/com/google/cloud/hadoop/gcsio/AdaptiveFileAccessPattern.java

View check run for this annotation

Codecov / codecov/patch

gcsio/src/main/java/com/google/cloud/hadoop/gcsio/AdaptiveFileAccessPattern.java#L118

Added line #L118 was not covered by tests
}

if (consecutiveRequestsDistances.size() < readOptions.getFadviseRequestTrackCount()) {
return false;
}

ListIterator<Long> iterator = consecutiveRequestsDistances.listIterator();
while (iterator.hasNext()) {
Long distance = iterator.next();
if (distance < 0 || distance > readOptions.DEFAULT_INPLACE_SEEK_LIMIT) {
return false;

Check warning on line 129 in gcsio/src/main/java/com/google/cloud/hadoop/gcsio/AdaptiveFileAccessPattern.java

View check run for this annotation

Codecov / codecov/patch

gcsio/src/main/java/com/google/cloud/hadoop/gcsio/AdaptiveFileAccessPattern.java#L129

Added line #L129 was not covered by tests
}
}
logger.atInfo().log(
singhravidutt marked this conversation as resolved.
Show resolved Hide resolved
"Detected %d consecutive read request within distance threshold %d with fadvise: %s switching to sequential IO for '%s'",
consecutiveRequestsDistances.size(),
readOptions.getInplaceSeekLimit(),
readOptions.getFadvise(),
resourceId);
return true;
}

private boolean isRandomAccessPattern(long currentPosition) {
if (!shouldDetectRandomAccess()) {
return false;
}
if (lastServedIndex == -1) {
return false;
}

if (currentPosition < lastServedIndex) {
logger.atFine().log(
"Detected backward read from %s to %s position, switching to random IO for '%s'",
lastServedIndex, currentPosition, resourceId);
return true;
}
if (lastServedIndex >= 0
&& lastServedIndex + readOptions.getInplaceSeekLimit() < currentPosition) {
logger.atFine().log(
"Detected forward read from %s to %s position over %s threshold,"
+ " switching to random IO for '%s'",
lastServedIndex, currentPosition, readOptions.getInplaceSeekLimit(), resourceId);
return true;
}
return false;

Check warning on line 163 in gcsio/src/main/java/com/google/cloud/hadoop/gcsio/AdaptiveFileAccessPattern.java

View check run for this annotation

Codecov / codecov/patch

gcsio/src/main/java/com/google/cloud/hadoop/gcsio/AdaptiveFileAccessPattern.java#L163

Added line #L163 was not covered by tests
}

private boolean shouldDetectSequentialAccess() {
return randomAccess && readOptions.getFadvise() == Fadvise.AUTO_RANDOM;
}

private boolean shouldDetectRandomAccess() {
return !randomAccess && readOptions.getFadvise() == Fadvise.AUTO;
}

private void setRandomAccess() {
randomAccess = true;
}

private void unsetRandomAccess() {
randomAccess = false;
}
}
Loading