Overhaul FASTER page and segment size parameters #230

Merged: 3 commits merged into dev from pr/faster-page-and-segment-sizes on Mar 9, 2023

@sebastianburckhardt (Member) commented Mar 7, 2023

The following FASTER tuning parameters are immutable, i.e., they must not be changed after a task hub is created (otherwise the storage provider can no longer start):

  • hybrid log page size
  • hybrid log segment size
  • event log page size
  • event log segment size

However, sometimes we do need to change these parameters. Most recently (#229), we found that object logs are not collected aggressively enough after compaction, because the hybrid log segment size is much too large. Therefore, we want to change this default.

In this PR, we

  • record the above sizes in the taskhubparameters.json file of the task hub, so that we can restore them when loading the task hub. This guarantees that the parameters always match what is in storage, even if the defaults change or the user explicitly changes them.
  • change the default hybrid log segment size to 512 kB to allow more aggressive collection of the object logs after compaction.

For compatibility, when loading a task hub that has no sizes recorded, we assume it uses the default sizes in effect before this PR (i.e., the sizes such a task hub was created with).
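The mechanism described above (record the sizes when the task hub is created, restore them on every load, and fall back to legacy defaults when nothing is recorded) could look roughly like the following C# sketch. It is not the Netherite implementation: the JSON field names, the LogSizes and RecordedLogSizes types, and the LogSizeRecording helper are all hypothetical. Because the pre-PR default sizes are not stated here, the sketch takes them as a parameter instead of hard-coding them.

```csharp
using System.Text.Json;

namespace Sketch
{
    // Sizes are expressed as bit counts: a value of b means 2^b bytes.
    public sealed record LogSizes(int StorePageBits, int StoreSegmentBits, int EventPageBits, int EventSegmentBits);

    // Hypothetical JSON shape of the size fields in taskhubparameters.json.
    // The properties are nullable so that task hubs created before sizes were
    // recorded deserialize with the fields missing (null).
    public sealed class RecordedLogSizes
    {
        public int? StoreLogPageSizeBits { get; set; }
        public int? StoreLogSegmentSizeBits { get; set; }
        public int? EventLogPageSizeBits { get; set; }
        public int? EventLogSegmentSizeBits { get; set; }
    }

    public static class LogSizeRecording
    {
        // Called once, when the task hub is created: persist the sizes actually in use.
        public static string Record(LogSizes sizes) =>
            JsonSerializer.Serialize(new RecordedLogSizes
            {
                StoreLogPageSizeBits = sizes.StorePageBits,
                StoreLogSegmentSizeBits = sizes.StoreSegmentBits,
                EventLogPageSizeBits = sizes.EventPageBits,
                EventLogSegmentSizeBits = sizes.EventSegmentBits,
            });

        // Called on every load: recorded sizes win over current defaults and user settings.
        // A missing field means the task hub predates recording, so the caller-supplied
        // legacy defaults (the sizes such task hubs were created with) are used instead.
        public static LogSizes Load(string json, LogSizes legacyDefaults)
        {
            RecordedLogSizes recorded = JsonSerializer.Deserialize<RecordedLogSizes>(json) ?? new RecordedLogSizes();
            return new LogSizes(
                recorded.StoreLogPageSizeBits ?? legacyDefaults.StorePageBits,
                recorded.StoreLogSegmentSizeBits ?? legacyDefaults.StoreSegmentBits,
                recorded.EventLogPageSizeBits ?? legacyDefaults.EventPageBits,
                recorded.EventLogSegmentSizeBits ?? legacyDefaults.EventSegmentBits);
        }
    }
}
```

Since every field is nullable, a taskhubparameters.json written before this change yields only nulls, and Load returns the legacy defaults unchanged; a newer file round-trips exactly the sizes that were recorded, regardless of what the current defaults are.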

@davidmrdavid (Member) left a comment

Left some comments!

Comment on lines +103 to +105
LogCommitManager = this.UseLocalFiles
? null // TODO: fix this: new LocalLogCommitManager($"{this.LocalDirectoryPath}\\{this.PartitionFolderName}\\{CommitBlobName}")
: (ILogCommitManager)this,
@davidmrdavid (Member)

Can you please elaborate on this TODO comment? What exactly needs fixing?

@sebastianburckhardt (Member Author)

Right, this was just copied over (it has nothing to do with this PR). The TODO refers to the fact that Netherite at one point supported storing task hubs in the local file system instead of Azure Storage. That is not an official feature and is commented out at the moment. There is a scenario in which we might bring it back on K8s, so I did not want to delete it outright.

@davidmrdavid (Member)

Ah, gotcha. That makes sense.
It might be helpful to add a note to that comment saying that this local-files support is intended for K8s but is not advertised today.

Comment on lines 159 to 164
public static (int pageSizeBits, int segmentSizeBits) GetImmutableStoreLogParameters(bool useSeparatePageBlobStorage, FasterTuningParameters tuningParameters)
{
int pageSizeBits = tuningParameters?.StoreLogPageSizeBits ?? 10; // 1kB
int segmentSizeBits = tuningParameters?.StoreLogSegmentSizeBits ?? 19; // 512 kB

return (pageSizeBits, segmentSizeBits);
@davidmrdavid (Member) Mar 7, 2023

It appears the first parameter (useSeparatePageBlobStorage) is not used here, or is it? Can we remove it?

@sebastianburckhardt (Member Author)

I will remove it. It was used in a previous version.
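With the unused parameter dropped, the method might reduce to the sketch below. FasterTuningParameters here is a hypothetical stand-in containing only the two properties the excerpt reads, assumed to be int? because of the ?? fallbacks; BitsToBytes is a hypothetical helper added only to make the bit-to-byte arithmetic in the comments checkable.

```csharp
namespace Sketch
{
    // Hypothetical stand-in for the two FasterTuningParameters properties used below.
    public sealed class FasterTuningParameters
    {
        public int? StoreLogPageSizeBits { get; set; }
        public int? StoreLogSegmentSizeBits { get; set; }
    }

    public static class ImmutableLogParameters
    {
        // Same defaults as the excerpt, without the unused useSeparatePageBlobStorage parameter.
        public static (int pageSizeBits, int segmentSizeBits) GetImmutableStoreLogParameters(FasterTuningParameters tuningParameters)
        {
            int pageSizeBits = tuningParameters?.StoreLogPageSizeBits ?? 10;       // 2^10 B = 1 kB
            int segmentSizeBits = tuningParameters?.StoreLogSegmentSizeBits ?? 19; // 2^19 B = 512 kB
            return (pageSizeBits, segmentSizeBits);
        }

        // Converts a size given in bits to a size in bytes (2^bits).
        public static long BitsToBytes(int bits) => 1L << bits;
    }
}
```

At these defaults, one 512 kB segment holds 2^(19-10) = 512 pages of 1 kB each; explicitly set tuning parameters still take precedence over the defaults.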

@@ -157,10 +174,21 @@ public class FasterTuningParameters

public static string GetStorageFormat(NetheriteOrchestrationServiceSettings settings)
@davidmrdavid (Member)

I noticed this doesn't seem to be called in this PR. Is it called elsewhere?

@sebastianburckhardt (Member Author)

Yes, it is called when a new taskhub is created in IStorageLayer.CreateTaskhubIfNotExistsAsync. At that point, all the important parameters for this task hub storage format are recorded inside taskhubparameters.json.

@@ -201,6 +232,28 @@ public static void CheckStorageFormat(string format, NetheriteOrchestrationServi
{
throw new NetheriteConfigurationException($"The current storage format version (={StorageFormatVersion.Last()}) is incompatible with the existing taskhub (={taskhubFormat.FormatVersion}).");
}

// read the immutable log parameters from storage, and keep them as a tuning parameter
// (which may override any tuning parameters set by the user; that is what we want since they cannot be honored)
@davidmrdavid (Member)

Can we emit a warning if we detect this conflict, i.e., that the newly configured parameters cannot be honored?

@sebastianburckhardt (Member Author)

That is a good idea. I think we used to have a warning like that for the partition count (which is also an immutable parameter) but I can't find it anymore in the code. It should be there also.
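One way to act on this suggestion, sketched in C# under assumptions: the value recorded in storage always wins, and a warning is emitted only when the user explicitly configured a different value. StoredParameterReconciler and Reconcile are hypothetical names, and the warn callback stands in for whatever logger the storage provider actually uses. The same helper would apply unchanged to the partition count.

```csharp
using System;

namespace Sketch
{
    public static class StoredParameterReconciler
    {
        // Returns the stored value, which always wins because the parameter is immutable
        // once the task hub exists. If the user explicitly requested a different value,
        // report that the request cannot be honored instead of failing.
        public static int Reconcile(string parameterName, int? requested, int stored, Action<string> warn)
        {
            if (requested.HasValue && requested.Value != stored)
            {
                warn($"Ignoring configured {parameterName}={requested.Value}: the existing task hub " +
                     $"was created with {parameterName}={stored}, and this parameter cannot be changed.");
            }
            return stored;
        }
    }
}
```

Passing the logger as a delegate keeps the sketch independent of any particular logging framework; an unset (null) or matching request produces no warning.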

Comment on lines 190 to 221
- public static void CheckStorageFormat(string format, NetheriteOrchestrationServiceSettings settings)
+ public static void CheckAndLoadStorageFormat(string format, NetheriteOrchestrationServiceSettings settings)
@davidmrdavid (Member)

Super nitpick: at first glance, I find it a bit unintuitive to first check and then load the config here. The opposite order makes more sense to me, as it allows us to validate the config we just loaded. I realize the current ordering works because we don't throw any validation errors for the config we load.

I'll make a soft suggestion to rename this to LoadAndCheckStorageFormat and change the order of operations to match the new name, which I think is more future-proof. But feel free to disregard this comment, as it's mostly a stylistic nit at present.

@sebastianburckhardt (Member Author)

I have reworked this a bit to make it clearer.
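The load-then-check order suggested in this thread can be sketched as follows. Everything here is hypothetical: the StorageFormatSettings shape, the set of supported format versions, and the use of InvalidOperationException in place of NetheriteConfigurationException (which would require the Netherite package). The page-versus-segment comparison is included only as an example of validating values after they have been loaded.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

namespace Sketch
{
    // Hypothetical shape of the relevant part of taskhubparameters.json.
    public sealed class StorageFormatSettings
    {
        public int FormatVersion { get; set; }
        public int? StoreLogPageSizeBits { get; set; }
        public int? StoreLogSegmentSizeBits { get; set; }
    }

    public static class StorageFormatLoader
    {
        // Hypothetical: the storage format versions this build is able to read.
        private static readonly HashSet<int> SupportedFormatVersions = new HashSet<int> { 1, 2 };

        public static StorageFormatSettings LoadAndCheckStorageFormat(string json)
        {
            // Step 1: load. All checks below operate on the object produced here.
            StorageFormatSettings loaded = JsonSerializer.Deserialize<StorageFormatSettings>(json)
                ?? throw new InvalidOperationException("The stored task hub format is empty.");

            // Step 2: check that this build can read the existing task hub's format version.
            if (!SupportedFormatVersions.Contains(loaded.FormatVersion))
            {
                throw new InvalidOperationException(
                    $"The existing task hub uses storage format version {loaded.FormatVersion}, which this build cannot read.");
            }

            // Step 3: validate the loaded sizes (illustrative check: a page must fit in a segment).
            if (loaded.StoreLogPageSizeBits is int page && loaded.StoreLogSegmentSizeBits is int segment && page > segment)
            {
                throw new InvalidOperationException(
                    $"Stored page size (2^{page} bytes) exceeds stored segment size (2^{segment} bytes).");
            }

            return loaded;
        }
    }
}
```

Loading first means every later check sees exactly what was read from storage, so adding a new validation rule later does not require reordering the method.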

@davidmrdavid (Member) left a comment

LGTM! Thanks for these changes

@sebastianburckhardt merged commit 1bc736b into dev Mar 9, 2023
@davidmrdavid deleted the pr/faster-page-and-segment-sizes branch March 9, 2023 22:32
@davidmrdavid restored the pr/faster-page-and-segment-sizes branch March 9, 2023 22:32
@sebastianburckhardt deleted the pr/faster-page-and-segment-sizes branch March 10, 2023 15:11