Blob Storage Continually Growing using Netherite #229

Open
UMCPGrad opened this issue Mar 1, 2023 · 2 comments
Labels
documentation Improvements or additions to documentation

Comments

UMCPGrad commented Mar 1, 2023

Our team is currently planning on switching to Netherite from the Azure Storage backend provider. In testing, we see the improved latency; however, we noticed that Blob Storage appears to grow continually over time. We had a similar issue with the Azure Storage backend provider and were hoping we would not have to deal with clearing out blob storage. Should we be running the Purge API daily to overcome this? Here is a screenshot of about 24 hours' worth of testing in our test environment.

(screenshot: blob storage capacity growing over ~24 hours of testing with Netherite)

sebastianburckhardt (Member) commented Mar 2, 2023

Indeed. By default, an orchestration instance remains in storage indefinitely, until explicitly purged using the purge API. This behavior is the same across all storage providers.

Supporting a more convenient auto-purge functionality is high on our priority list, but I cannot give you an ETA yet.

In the meantime, I would suggest using a periodic timer function that purges completed orchestrations. Below are three examples (using a 10-minute, 1-hour, or 1-day interval) of how to run such a function. Which of the three snippets you pick depends on how long you want to keep completed orchestrations in storage.

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using DurableTask.Core;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Azure.WebJobs.Extensions.DurableTask;

    public static class PurgeFunctions
    {
        [FunctionName("PurgeEveryTenMinutes")]
        public static Task PurgeEveryTenMinutes(
            [DurableClient] IDurableOrchestrationClient client,
            [TimerTrigger("0 */10 * * * *")] TimerInfo myTimer)
        {
            // purge all orchestration instances that started at least 10 minutes ago
            // and are now in "Completed" state
            return client.PurgeInstanceHistoryAsync(
                DateTime.MinValue,
                DateTime.UtcNow.AddMinutes(-10),
                new List<OrchestrationStatus>
                {
                    OrchestrationStatus.Completed
                });
        }

        [FunctionName("PurgeEveryHour")]
        public static Task PurgeEveryHour(
            [DurableClient] IDurableOrchestrationClient client,
            [TimerTrigger("0 0 * * * *")] TimerInfo myTimer)
        {
            // purge all orchestration instances that started at least 1 hour ago
            // and are now in "Completed" state
            return client.PurgeInstanceHistoryAsync(
                DateTime.MinValue,
                DateTime.UtcNow.AddHours(-1),
                new List<OrchestrationStatus>
                {
                    OrchestrationStatus.Completed
                });
        }

        [FunctionName("PurgeEveryDay")]
        public static Task PurgeEveryDay(
            [DurableClient] IDurableOrchestrationClient client,
            [TimerTrigger("0 0 0 * * *")] TimerInfo myTimer)
        {
            // purge all orchestration instances that started at least 1 day ago
            // and are now in "Completed" state
            return client.PurgeInstanceHistoryAsync(
                DateTime.MinValue,
                DateTime.UtcNow.AddDays(-1),
                new List<OrchestrationStatus>
                {
                    OrchestrationStatus.Completed
                });
        }
    }

(By the way, in these examples I use the same time period for how often the purge runs and for the minimum age of an orchestration before it is purged, but you can of course use different values; see the sketch below.)
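
For example, a sketch along these lines (the function name here is just illustrative, and it assumes the same usings and class as the snippets above) runs every hour but only purges completed orchestrations that started more than 7 days ago:

        [FunctionName("PurgeWeekOldInstancesHourly")]
        public static Task PurgeWeekOldInstancesHourly(
            [DurableClient] IDurableOrchestrationClient client,
            [TimerTrigger("0 0 * * * *")] TimerInfo myTimer)
        {
            // runs every hour, but only purges completed orchestrations
            // that started more than 7 days ago
            return client.PurgeInstanceHistoryAsync(
                DateTime.MinValue,
                DateTime.UtcNow.AddDays(-7),
                new List<OrchestrationStatus>
                {
                    OrchestrationStatus.Completed
                });
        }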

sebastianburckhardt (Member) commented

The purge timer functions are now working as intended, but reported storage capacity is still much higher than expected.

I have had a detailed look at the collected telemetry (thank you @UMCPGrad for giving me the details) and was able to identify two more issues that cause larger-than-expected storage capacity billing:

  1. Soft delete needs to be disabled for the storage account. Apparently, new storage accounts have this feature turned on by default, and it is very important to turn it off when using Netherite. With soft delete enabled, the capacity of each deleted blob continues to be billed for 7 more days. Because Netherite frequently writes and deletes blobs at high volume (a characteristic of the blob-backed log devices used by FASTER), the implicit retention of blobs that comes with the soft delete feature can inflate the billed capacity tremendously (we saw more than 10x inflated capacity billing with this application). A sketch of how to disable it follows after this list.

  2. Netherite does not collect object logs as aggressively as it should. I am working on a PR to tune the corresponding FASTER parameters to improve this.
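
Regarding item 1, here is a minimal sketch of one way to disable blob soft delete programmatically (assuming the Azure.Storage.Blobs SDK and that the Netherite storage account's connection string is in the AzureWebJobsStorage environment variable; the same setting can also be changed in the Azure portal under the storage account's data protection settings):

    using System;
    using System.Threading.Tasks;
    using Azure.Storage.Blobs;
    using Azure.Storage.Blobs.Models;

    public static class DisableSoftDelete
    {
        public static async Task Main()
        {
            // assumption: the Netherite storage account's connection string
            // is available in the AzureWebJobsStorage environment variable
            string connectionString = Environment.GetEnvironmentVariable("AzureWebJobsStorage");
            var serviceClient = new BlobServiceClient(connectionString);

            // read the current blob service properties and turn off the
            // delete retention (soft delete) policy if it is enabled
            BlobServiceProperties properties = await serviceClient.GetPropertiesAsync();
            if (properties.DeleteRetentionPolicy?.Enabled == true)
            {
                properties.DeleteRetentionPolicy = new BlobRetentionPolicy { Enabled = false };
                await serviceClient.SetPropertiesAsync(properties);
                Console.WriteLine("Blob soft delete disabled.");
            }
            else
            {
                Console.WriteLine("Blob soft delete was already disabled.");
            }
        }
    }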
