Wide range of throughput numbers using my large test case #240

ericleigh007 · 2023-03-21T13:28:18Z

The new infamous test case I'm using for Netherite that launches now 200 orchestrators and 200 activities each, seems to report widely varying throughput numbers, even after scale up.

I'm wondering if something is working differently than expected.

So over the course of my looping sample, if I start from idle, I'd expect to see slower throughput, then faster and faster, then some sort of stable or close-to-stable number of items processed/second.

What I'm seeing varies quite widely, from 14000/second then sometimes down to 2000/second and less.

See below... This is a test of 40000 items (200/200) in a loop, so the invocations are right after each other.
My next version will be so good as to log the wall time, sorry about that.

RunTime                           : 00:00:03.3574739
ItemsPerSecond                    : 11913.718822951982
MinProcessingClockTime            : 00:00:00.0000397
MaxProcessingClockTime            : 00:00:00.0864887
MinOrchestratorDequeueDelay       : 00:00:00.0709527
MaxOrchestratorDequeueDelay       : 00:00:00.4736806
MinActivityDequeueDelay           : 00:00:00.0456265
MaxActivityDequeueDelay           : 00:00:03.0261596
MinActivityOutputDequeueDelay     : 00:00:00.0017783
MaxActivityOutputDequeueDelay     : 00:00:01.0126249
MinOrchestratorOutputDequeueDelay : 00:00:00.0385646
MaxOrchestratorOutputDequeueDelay : 00:00:02.9698479

RunTime                           : 00:00:03.3689919
ItemsPerSecond                    : 11872.987881033492
MinProcessingClockTime            : 00:00:00.0000361
MaxProcessingClockTime            : 00:00:00.0865658
MinOrchestratorDequeueDelay       : 00:00:00.0658183
MaxOrchestratorDequeueDelay       : 00:00:00.6006850
MinActivityDequeueDelay           : 00:00:00.0458639
MaxActivityDequeueDelay           : 00:00:03.1040092
MinActivityOutputDequeueDelay     : 00:00:00.0019841
MaxActivityOutputDequeueDelay     : 00:00:01.6303926
MinOrchestratorOutputDequeueDelay : 00:00:00.1586693
MaxOrchestratorOutputDequeueDelay : 00:00:02.9354276

RunTime                           : 00:00:17.0899671
ItemsPerSecond                    : 2340.554535064026
MinProcessingClockTime            : 00:00:00.0000397
MaxProcessingClockTime            : 00:00:00.1002413
MinOrchestratorDequeueDelay       : 00:00:00.0771264
MaxOrchestratorDequeueDelay       : 00:00:00.3821118
MinActivityDequeueDelay           : 00:00:00.0632519
MaxActivityDequeueDelay           : 00:00:16.8132223
MinActivityOutputDequeueDelay     : 00:00:00.0019337
MaxActivityOutputDequeueDelay     : 00:00:15.6254716
MinOrchestratorOutputDequeueDelay : 00:00:00.0438357
MaxOrchestratorOutputDequeueDelay : 00:00:16.5373772

RunTime                           : 00:00:02.8043951
ItemsPerSecond                    : 14263.325449399053
MinProcessingClockTime            : 00:00:00.0000369
MaxProcessingClockTime            : 00:00:00.0762990
MinOrchestratorDequeueDelay       : 00:00:00.0407836
MaxOrchestratorDequeueDelay       : 00:00:00.3179334
MinActivityDequeueDelay           : 00:00:00.0474485
MaxActivityDequeueDelay           : 00:00:02.5890787
MinActivityOutputDequeueDelay     : 00:00:00.0025170
MaxActivityOutputDequeueDelay     : 00:00:01.4027891
MinOrchestratorOutputDequeueDelay : 00:00:00.0647167
MaxOrchestratorOutputDequeueDelay : 00:00:02.2597560

RunTime                           : 00:00:16.5352867
ItemsPerSecond                    : 2419.0690325315013
MinProcessingClockTime            : 00:00:00.0000369
MaxProcessingClockTime            : 00:00:00.1032389
MinOrchestratorDequeueDelay       : 00:00:00.0530615
MaxOrchestratorDequeueDelay       : 00:00:00.3793625
MinActivityDequeueDelay           : 00:00:00.0454832
MaxActivityDequeueDelay           : 00:00:16.1006502
MinActivityOutputDequeueDelay     : 00:00:00.0014239
MaxActivityOutputDequeueDelay     : 00:00:14.7625118
MinOrchestratorOutputDequeueDelay : 00:00:00.0796387
MaxOrchestratorOutputDequeueDelay : 00:00:15.9613534

RunTime                           : 00:00:06.2084545
ItemsPerSecond                    : 6442.827276901199
MinProcessingClockTime            : 00:00:00.0000411
MaxProcessingClockTime            : 00:00:00.0620684
MinOrchestratorDequeueDelay       : 00:00:00.1095485
MaxOrchestratorDequeueDelay       : 00:00:02.1953854
MinActivityDequeueDelay           : 00:00:00.0517679
MaxActivityDequeueDelay           : 00:00:04.0140725
MinActivityOutputDequeueDelay     : 00:00:00.0020846
MaxActivityOutputDequeueDelay     : 00:00:01.2265399
MinOrchestratorOutputDequeueDelay : 00:00:00.0383115
MaxOrchestratorOutputDequeueDelay : 00:00:05.5973410

RunTime                           : 00:00:16.7253336
ItemsPerSecond                    : 2391.5815945219774
MinProcessingClockTime            : 00:00:00.0000406
MaxProcessingClockTime            : 00:00:00.1983516
MinOrchestratorDequeueDelay       : 00:00:00.1106292
MaxOrchestratorDequeueDelay       : 00:00:01.8555362
MinActivityDequeueDelay           : 00:00:00.0435017
MaxActivityDequeueDelay           : 00:00:16.4401935
MinActivityOutputDequeueDelay     : 00:00:00.0026886
MaxActivityOutputDequeueDelay     : 00:00:15.3348152
MinOrchestratorOutputDequeueDelay : 00:00:00.0729207
MaxOrchestratorOutputDequeueDelay : 00:00:16.0948323

RunTime                           : 00:00:16.1567415
ItemsPerSecond                    : 2475.7467339562254
MinProcessingClockTime            : 00:00:00.0000372
MaxProcessingClockTime            : 00:00:00.1526612
MinOrchestratorDequeueDelay       : 00:00:00.0856060
MaxOrchestratorDequeueDelay       : 00:00:00.4543126
MinActivityDequeueDelay           : 00:00:00.0510178
MaxActivityDequeueDelay           : 00:00:15.7874673
MinActivityOutputDequeueDelay     : 00:00:00.0025035
MaxActivityOutputDequeueDelay     : 00:00:14.1651892
MinOrchestratorOutputDequeueDelay : 00:00:00.0530131
MaxOrchestratorOutputDequeueDelay : 00:00:15.6284004

RunTime                           : 00:00:03.1172668
ItemsPerSecond                    : 12831.753765831016
MinProcessingClockTime            : 00:00:00.0000374
MaxProcessingClockTime            : 00:00:00.0777534
MinOrchestratorDequeueDelay       : 00:00:00.0433117
MaxOrchestratorDequeueDelay       : 00:00:00.4105704
MinActivityDequeueDelay           : 00:00:00.0407375
MaxActivityDequeueDelay           : 00:00:02.9726634
MinActivityOutputDequeueDelay     : 00:00:00.0021738
MaxActivityOutputDequeueDelay     : 00:00:00.9526307
MinOrchestratorOutputDequeueDelay : 00:00:00.0383611
MaxOrchestratorOutputDequeueDelay : 00:00:02.6392095

RunTime                           : 00:00:16.5780695
ItemsPerSecond                    : 2412.8261737592543
MinProcessingClockTime            : 00:00:00.0000391
MaxProcessingClockTime            : 00:00:00.1716346
MinOrchestratorDequeueDelay       : 00:00:00.0350603
MaxOrchestratorDequeueDelay       : 00:00:00.1687830
MinActivityDequeueDelay           : 00:00:00.0574938
MaxActivityDequeueDelay           : 00:00:16.4114290
MinActivityOutputDequeueDelay     : 00:00:00.0013982
MaxActivityOutputDequeueDelay     : 00:00:15.2459880
MinOrchestratorOutputDequeueDelay : 00:00:00.0794737
MaxOrchestratorOutputDequeueDelay : 00:00:16.2135437

RunTime                           : 00:00:02.9817131
ItemsPerSecond                    : 13415.106906160758
MinProcessingClockTime            : 00:00:00.0000398
MaxProcessingClockTime            : 00:00:00.1167567
MinOrchestratorDequeueDelay       : 00:00:00.0755145
MaxOrchestratorDequeueDelay       : 00:00:00.4151801
MinActivityDequeueDelay           : 00:00:00.0555147
MaxActivityDequeueDelay           : 00:00:02.6986282
MinActivityOutputDequeueDelay     : 00:00:00.0027920
MaxActivityOutputDequeueDelay     : 00:00:01.1156877
MinOrchestratorOutputDequeueDelay : 00:00:00.0432506
MaxOrchestratorOutputDequeueDelay : 00:00:02.3856099

RunTime                           : 00:00:15.2868120
ItemsPerSecond                    : 2616.6345213115724
MinProcessingClockTime            : 00:00:00.0000372
MaxProcessingClockTime            : 00:00:00.1819660
MinOrchestratorDequeueDelay       : 00:00:00.0447356
MaxOrchestratorDequeueDelay       : 00:00:13.5874393
MinActivityDequeueDelay           : 00:00:00.0484493
MaxActivityDequeueDelay           : 00:00:03.6183609
MinActivityOutputDequeueDelay     : 00:00:00.0023733
MaxActivityOutputDequeueDelay     : 00:00:01.4724385
MinOrchestratorOutputDequeueDelay : 00:00:00.0394185
MaxOrchestratorOutputDequeueDelay : 00:00:14.8275195

RunTime                           : 00:00:16.4125338
ItemsPerSecond                    : 2437.161774497001
MinProcessingClockTime            : 00:00:00.0000410
MaxProcessingClockTime            : 00:00:00.1739761
MinOrchestratorDequeueDelay       : 00:00:00.0348563
MaxOrchestratorDequeueDelay       : 00:00:00.1636345
MinActivityDequeueDelay           : 00:00:00.0511314
MaxActivityDequeueDelay           : 00:00:16.2285142
MinActivityOutputDequeueDelay     : 00:00:00.0009707
MaxActivityOutputDequeueDelay     : 00:00:14.0458835
MinOrchestratorOutputDequeueDelay : 00:00:00.0573969
MaxOrchestratorOutputDequeueDelay : 00:00:16.0556575

RunTime                           : 00:00:03.0061488
ItemsPerSecond                    : 13306.061230235842
MinProcessingClockTime            : 00:00:00.0000367
MaxProcessingClockTime            : 00:00:00.2683780
MinOrchestratorDequeueDelay       : 00:00:00.1208113
MaxOrchestratorDequeueDelay       : 00:00:00.4739112
MinActivityDequeueDelay           : 00:00:00.0617926
MaxActivityDequeueDelay           : 00:00:02.6201734
MinActivityOutputDequeueDelay     : 00:00:00.0018922
MaxActivityOutputDequeueDelay     : 00:00:01.0050227
MinOrchestratorOutputDequeueDelay : 00:00:00.0451203
MaxOrchestratorOutputDequeueDelay : 00:00:02.5680282

RunTime                           : 00:00:16.2601642
ItemsPerSecond                    : 2459.999758182024
MinProcessingClockTime            : 00:00:00.0000371
MaxProcessingClockTime            : 00:00:00.1683287
MinOrchestratorDequeueDelay       : 00:00:00.0304614
MaxOrchestratorDequeueDelay       : 00:00:00.2098718
MinActivityDequeueDelay           : 00:00:00.0556229
MaxActivityDequeueDelay           : 00:00:16.0974459
MinActivityOutputDequeueDelay     : 00:00:00.0010998
MaxActivityOutputDequeueDelay     : 00:00:14.5173542
MinOrchestratorOutputDequeueDelay : 00:00:00.0376069
MaxOrchestratorOutputDequeueDelay : 00:00:15.7674936

RunTime                           : 00:00:16.7808984
ItemsPerSecond                    : 2383.6626053346463
MinProcessingClockTime            : 00:00:00.0000377
MaxProcessingClockTime            : 00:00:00.0437672
MinOrchestratorDequeueDelay       : 00:00:00.0426995
MaxOrchestratorDequeueDelay       : 00:00:00.2639204
MinActivityDequeueDelay           : 00:00:00.0430592
MaxActivityDequeueDelay           : 00:00:16.5999489
MinActivityOutputDequeueDelay     : 00:00:00.0009935
MaxActivityOutputDequeueDelay     : 00:00:14.9335350
MinOrchestratorOutputDequeueDelay : 00:00:00.0522790
MaxOrchestratorOutputDequeueDelay : 00:00:16.2959642

RunTime                           : 00:00:02.7988467
ItemsPerSecond                    : 14291.600894039677
MinProcessingClockTime            : 00:00:00.0000378
MaxProcessingClockTime            : 00:00:00.1528032
MinOrchestratorDequeueDelay       : 00:00:00.0348766
MaxOrchestratorDequeueDelay       : 00:00:00.3306209
MinActivityDequeueDelay           : 00:00:00.0486242
MaxActivityDequeueDelay           : 00:00:02.5968465
MinActivityOutputDequeueDelay     : 00:00:00.0015949
MaxActivityOutputDequeueDelay     : 00:00:01.1873683
MinOrchestratorOutputDequeueDelay : 00:00:00.0447643
MaxOrchestratorOutputDequeueDelay : 00:00:02.4075779

RunTime                           : 00:00:17.5208937
ItemsPerSecond                    : 2282.9885669587734
MinProcessingClockTime            : 00:00:00.0000363
MaxProcessingClockTime            : 00:00:00.1919865
MinOrchestratorDequeueDelay       : 00:00:00.0322732
MaxOrchestratorDequeueDelay       : 00:00:00.1874352
MinActivityDequeueDelay           : 00:00:00.0713355
MaxActivityDequeueDelay           : 00:00:17.3781610
MinActivityOutputDequeueDelay     : 00:00:00.0017288
MaxActivityOutputDequeueDelay     : 00:00:15.6113828
MinOrchestratorOutputDequeueDelay : 00:00:00.0294105
MaxOrchestratorOutputDequeueDelay : 00:00:17.0746972
CosmosThroughput                  : 0

RunTime                           : 00:00:17.2647517
ItemsPerSecond                    : 2316.859268818792
MinProcessingClockTime            : 00:00:00.0000379
MaxProcessingClockTime            : 00:00:00.1241042
MinOrchestratorDequeueDelay       : 00:00:00.0296077
MaxOrchestratorDequeueDelay       : 00:00:00.5128788
MinActivityDequeueDelay           : 00:00:00.0503961
MaxActivityDequeueDelay           : 00:00:14.3834253
MinActivityOutputDequeueDelay     : 00:00:00.0028123
MaxActivityOutputDequeueDelay     : 00:00:12.5396788
MinOrchestratorOutputDequeueDelay : 00:00:00.0604172
MaxOrchestratorOutputDequeueDelay : 00:00:16.7830052

RunTime                           : 00:00:03.4726508
ItemsPerSecond                    : 11518.578257278274
MinProcessingClockTime            : 00:00:00.0000386
MaxProcessingClockTime            : 00:00:00.1454547
MinOrchestratorDequeueDelay       : 00:00:00.1020626
MaxOrchestratorDequeueDelay       : 00:00:00.4832070
MinActivityDequeueDelay           : 00:00:00.0341847
MaxActivityDequeueDelay           : 00:00:03.1227420
MinActivityOutputDequeueDelay     : 00:00:00.0012181
MaxActivityOutputDequeueDelay     : 00:00:01.3468593
MinOrchestratorOutputDequeueDelay : 00:00:00.0436711
MaxOrchestratorOutputDequeueDelay : 00:00:02.8118433

Included are a few other numbers to perhaps help the analysis.

Min and max orchestrator Dequeue delay are the times that it takes between the orchestrator being initiated from StartSubOrchestrator and the time the orchestrator code starts to run.

Similarly with Min and Max activity dequeue delay.

Out dequeue delay is the time from the return statement in the orchestrator or activity to the time the parent code gets control.

Is it expected that the times should vary so widely? I'm using a copy of the benchmark host.json in my tests.

{
  "version": "2.0",
  "logging": {
    "logLevel": {
      // ------ NOTE: 
      // The log levels in this section 
      // - determine what is displayed in the console when running func.exe locally
      // - determine what is stored and billed by Application Insights
      // But they do not fully control the logging:
      // - have no effect on what is collected by internal analytics
      // - have no effect on what is traced to the console when running in a container
      // - have no effect on what is shown in the Live Metrics side panel in the portal

      // ---- Per-invocation framework-generated logging
      //"Host.Triggers.DurableTask": "Information", // use this setting if you need analytics in the portal
      "Host.Triggers.DurableTask": "Warning", // use this setting otherwise

      // ---- Per-invocation application-generated logging
      //"Function": "Information", // use this setting for small-scale debugging only
      "Function": "Warning", // use this setting when running perf tests

      // --- the levels below are used to control the Netherite tracing.
      "DurableTask.Netherite": "Information",
      "DurableTask.Netherite.FasterStorage": "Warning",
      "DurableTask.Netherite.FasterStorage.Performance": "Error",
      "DurableTask.Netherite.EventHubsTransport": "Warning",
      "DurableTask.Netherite.Events": "Warning",
      "DurableTask.Netherite.WorkItems": "Warning",
      "DurableTask.Netherite.Client": "Warning",
      "DurableTask.Netherite.LoadMonitor": "Warning"
    },
    "applicationInsights": {
      "sampling": {
        "isEnabled": false
      }
    }
  },
  "extensions": {
    "durableTask": {

     // "hubName": "perftests",

      "UseGracefulShutdown": true, // helps to avoid lengthy waits for lease expiration

      "maxConcurrentActivityFunctions": "100",
      "maxConcurrentOrchestratorFunctions": "100",

      "storageProvider": {

        "type": "Netherite",
        // "type": "AzureStorage",

        "StorageConnectionName": "AzureWebJobsStorage",
        "EventHubsConnectionName": "EventHubsConnection",

        "TakeStateCheckpointWhenStoppingPartition": "true",
        "MaxNumberBytesBetweenCheckpoints": "20000000",
        "MaxNumberEventsBetweenCheckpoints": "10000",
        "IdleCheckpointFrequencyMs": "60000",

        "PackPartitionTaskMessages": 100,

        "PartitionCount": 12,

        // set this to false to force replay on each orchestration step
        "CacheOrchestrationCursors": true,

        // set this to control the max size of the orchestration instance cache
        // "InstanceCacheSizeMB": "50",

        // set this to true to use the PSF support in Faster. Will soon be obsolete.
        "UsePSFQueries": "false",

        // set this to true to bypass Faster and instead use Azure blobs for the object store. Not for production use.
        "UseAlternateObjectStore": "false",

        // set this to "true" to disable all pipelining
        "PersistStepsFirst": false,

        // set this to x to inject faults with probability 1/x
        //"FaultInjectionProbability": 2000,
        // set this to true to attach replay checker
        //"AttachReplayChecker": true,
        // set this to true to attach replay checker
        //"AttachCacheDebugger": true,

        // can change this to use a different table, or blobs
        //"LoadInformationAzureTableName": "",

        // set this to "Scripted" to control the scenario with a partition script
        // or to "ClientOnly" to run only the client
        "PartitionManagement": "EventProcessorHost",

        // set this to "Local" to disable the global activity distribution algorithm
        // options: "Local", "Static", "Locavore"
        "ActivityScheduler": "Locavore",

        // controls whether the instance ids are kept in memory. If false, paged queries are not supported.
        "KeepInstanceIdsInMemory": true,

        // The log level limits below control the production of log events by the various components.
        // it limits production, not just consumption, of the events, so it can be used to prevent overheads.
        // "Debug" is a reasonable setting, as it allows troubleshooting without impacting perf too much.
        "LogLevelLimit": "Debug",
        "StorageLogLevelLimit": "Debug",
        "TransportLogLevelLimit": "Debug",
        "EventLogLevelLimit": "Debug",
        "WorkItemLogLevelLimit": "Debug",
        "ClientLogLevelLimit": "Debug",
        "LoadMonitorLogLevelLimit": "Debug",

        // the following can be used to collectd and direct trace output to additional specific sinks
        // which is useful in a testing and debugging context
        "TraceToConsole": false,
        "TraceToBlob": false
      }
    }
  }
}

The text was updated successfully, but these errors were encountered:

sebastianburckhardt · 2023-03-21T22:09:59Z

There can be many reasons why throughput fluctuates. If you give me the name and region of your function app, and a start/end time for the experiment(s), I can a look at our internal telemetry to see if I can determine a cause.

If I were to just guess, my guess would be that it has to do with the load balancing of activities between nodes. This may or may not be the actual reason. One thing you can try is to turn off the load balancing altogether, using the setting "ActivityScheduler": "Local". Under this setting, all activities are always executed on the same node as the orchestration that issues them. This can be beneficial if orchestrations run very short activities only, and the orchestrations themselves are well balanced across the nodes (which is probably the case in your benchmark situation).

ericleigh007 · 2023-03-21T23:16:39Z

Thanks I'll try the "Local" setting later and report any differences.
On the other hand, my case doesn't seem that different from the HelloCities benchmark and i'm using his host.json settings.
Once that's done, I'll email you the information above to maybe get a better understanding of what's going on.

ericleigh007 · 2023-07-28T22:14:54Z

Sorry, I decided I should keep this one open until after I've had a chance to test with the 1.4.0 version.
Apologies.....

microsoft-github-policy-service bot added the Needs: Triage 🔍 label Mar 21, 2023

sebastianburckhardt added needs author response and removed Needs: Triage 🔍 labels Mar 21, 2023

sebastianburckhardt removed the needs author response label Apr 20, 2023

sebastianburckhardt mentioned this issue Jun 16, 2023

implement blob batching to improve EH sender throughput #275

Merged

ericleigh007 closed this as completed Jul 28, 2023

ericleigh007 reopened this Jul 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wide range of throughput numbers using my large test case #240

Wide range of throughput numbers using my large test case #240

ericleigh007 commented Mar 21, 2023

sebastianburckhardt commented Mar 21, 2023 •

edited

Loading

ericleigh007 commented Mar 21, 2023

ericleigh007 commented Jul 28, 2023

Wide range of throughput numbers using my large test case #240

Wide range of throughput numbers using my large test case #240

Comments

ericleigh007 commented Mar 21, 2023

sebastianburckhardt commented Mar 21, 2023 • edited Loading

ericleigh007 commented Mar 21, 2023

ericleigh007 commented Jul 28, 2023

sebastianburckhardt commented Mar 21, 2023 •

edited

Loading