BaseCommitService consumes 100% CPU when idle #12086

davseitsev · 2025-01-24T09:56:00Z

Apache Iceberg version

1.7.0

Query engine

Spark

Please describe the bug 🐞

We have a Spark job that performs all maintenance actions on our data lake. We noticed high CPU usage on the driver, caused by Committer-Service threads.

Here is the output of top:

51128 yarn      20   0  120.4g  29.7g  23732 R  99.9  24.1  66:34.41 Committer-Servi
53929 yarn      20   0  120.4g  29.7g  23732 R  99.9  24.1  16:08.09 Committer-Servi
11001 yarn      20   0  120.4g  29.7g  23732 R  99.7  24.1  90:47.03 dispatcher-Coar
49957 yarn      20   0  120.4g  29.7g  23732 R  99.7  24.1  65:28.45 Committer-Servi
50738 yarn      20   0  120.4g  29.7g  23732 R  99.7  24.1  66:45.21 Committer-Servi
11052 yarn      20   0  120.4g  29.7g  23732 S   1.0  24.1   1:53.65 spark-listener-
83359 yarn      20   0  120.4g  29.7g  23732 S   0.6  24.1   0:00.10 Committer-Servi

The stack trace of the CPU-consuming threads looks like this:

"Committer-Service" #28702 prio=5 os_prio=0 cpu=3958800.03ms elapsed=4094.93s tid=0x0000ffe544acc4e0 nid=0xc325 runnable  [0x0000ffe492d55000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.iceberg.actions.BaseCommitService.lambda$start$0(BaseCommitService.java:133)
        at org.apache.iceberg.actions.BaseCommitService$$Lambda$4975/0x00000030026fec58.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
        at java.lang.Thread.run([email protected]/Thread.java:840)

Here is the flame graph:

[image: flame graph]

And the call tree:

[image: call tree]

And in the code:

[image: code screenshot]

It looks like the thread wastes CPU time when there is actually nothing to do.
This doesn't affect us significantly because we use a big instance for the driver, but it is worth optimizing for smaller instances.

Willingness to contribute

I can contribute a fix for this bug independently
I would be willing to contribute a fix for this bug with guidance from the Iceberg community
I cannot contribute a fix for this bug at this time


silentxingtian · 2025-01-24T11:11:41Z

Hey, may I ask what tool you used to generate that flame graph? I'd like to learn.

RussellSpitzer · 2025-01-24T22:17:32Z

I wonder if it should be sleeping even if inProgressCommits() has elements in it. I don't think we want to loop unless work is actually finished
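To make the suggestion above concrete, here is a minimal sketch of a committer loop that backs off whenever no batch can be committed yet, instead of only when everything is empty. It is a stand-in written for illustration, not the code in BaseCommitService: the class name, `BATCH_SIZE`, `IDLE_SLEEP_MS`, and the `String` work items are all assumptions, and the in-progress commit tracking mentioned in the comment is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative stand-in only; this is not Iceberg's BaseCommitService.
class SleepWhenIdleSketch {
  static final int BATCH_SIZE = 3;       // hypothetical commit group size
  static final long IDLE_SLEEP_MS = 10;  // hypothetical back-off interval

  final Queue<String> completed = new ConcurrentLinkedQueue<>();
  final AtomicBoolean running = new AtomicBoolean(true);
  final AtomicLong iterations = new AtomicLong();
  // Written only by the committer thread; read by others after join().
  final List<List<String>> committed = new ArrayList<>();

  void committerLoop() throws InterruptedException {
    while (running.get() || !completed.isEmpty()) {
      iterations.incrementAndGet();
      boolean fullBatch = completed.size() >= BATCH_SIZE;
      boolean finalFlush = !running.get() && !completed.isEmpty();
      if (!fullBatch && !finalFlush) {
        // Nothing can be committed yet, so give up the CPU instead of spinning.
        Thread.sleep(IDLE_SLEEP_MS);
        continue;
      }
      List<String> batch = new ArrayList<>(BATCH_SIZE);
      for (int i = 0; i < BATCH_SIZE && !completed.isEmpty(); i++) {
        batch.add(completed.poll());
      }
      committed.add(batch);
    }
  }

  // Leaves a partial batch pending for 200 ms, then finishes the work.
  // Returns {loop iterations while waiting, total items committed}.
  static long[] demo() {
    SleepWhenIdleSketch service = new SleepWhenIdleSketch();
    Thread committer = new Thread(() -> {
      try {
        service.committerLoop();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "Committer-Sketch");
    committer.start();
    try {
      service.completed.add("group-1"); // fewer items than BATCH_SIZE
      Thread.sleep(200);
      long whileWaiting = service.iterations.get();
      service.completed.add("group-2");
      service.completed.add("group-3");
      service.completed.add("group-4");
      service.running.set(false);
      committer.join();
      long items = service.committed.stream().mapToLong(List::size).sum();
      return new long[] {whileWaiting, items};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    long[] result = demo();
    System.out.println("iterations while waiting: " + result[0] + ", items committed: " + result[1]);
  }
}
```

With a partial batch pending, the loop wakes roughly once per `IDLE_SLEEP_MS` rather than running continuously. The trade-off is that a batch that becomes ready during a sleep is picked up one interval late.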

lliangyu-lin · 2025-01-26T21:00:48Z

I feel one potential approach to address this could be using a BlockingQueue, which allows the commit service thread to sleep until new work is offered in the completedRewrites queue, although this might slightly increase latency.
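A rough sketch of the BlockingQueue idea, under stated assumptions: the class below is hypothetical and only models a single queue of completed work, with made-up `BATCH_SIZE` and `POLL_TIMEOUT_MS` values. It uses `BlockingQueue.poll(timeout, unit)` from `java.util.concurrent`, so the thread is parked until an item arrives or the timeout expires, and the timeout gives it a chance to notice shutdown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative stand-in only; this is not Iceberg's BaseCommitService.
class BlockingCommitQueueSketch {
  static final int BATCH_SIZE = 3;          // hypothetical commit group size
  static final long POLL_TIMEOUT_MS = 100;  // hypothetical wait per poll

  final BlockingQueue<String> completed = new LinkedBlockingQueue<>();
  final AtomicBoolean running = new AtomicBoolean(true);
  final AtomicLong wakeups = new AtomicLong();
  // Written only by the committer thread; read by others after join().
  final List<List<String>> committed = new ArrayList<>();

  void committerLoop() throws InterruptedException {
    List<String> batch = new ArrayList<>(BATCH_SIZE);
    while (running.get() || !completed.isEmpty() || !batch.isEmpty()) {
      wakeups.incrementAndGet();
      // While running, park until an item arrives or the timeout expires.
      // After shutdown, drain without waiting.
      String next = running.get()
          ? completed.poll(POLL_TIMEOUT_MS, TimeUnit.MILLISECONDS)
          : completed.poll();
      if (next != null) {
        batch.add(next);
      }
      boolean fullBatch = batch.size() >= BATCH_SIZE;
      boolean finalFlush = !running.get() && completed.isEmpty() && !batch.isEmpty();
      if (fullBatch || finalFlush) {
        committed.add(new ArrayList<>(batch));
        batch.clear();
      }
    }
  }

  // Keeps the queue empty for 300 ms, then offers four items and shuts down.
  // Returns {wakeups while idle, total items committed}.
  static long[] demo() {
    BlockingCommitQueueSketch service = new BlockingCommitQueueSketch();
    Thread committer = new Thread(() -> {
      try {
        service.committerLoop();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "Committer-Sketch");
    committer.start();
    try {
      Thread.sleep(300);
      long whileIdle = service.wakeups.get();
      for (int i = 1; i <= 4; i++) {
        service.completed.add("group-" + i);
      }
      service.running.set(false);
      committer.join();
      long items = service.committed.stream().mapToLong(List::size).sum();
      return new long[] {whileIdle, items};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    long[] result = demo();
    System.out.println("wakeups while idle: " + result[0] + ", items committed: " + result[1]);
  }
}
```

In this sketch an idle thread wakes only once per timeout, and new work wakes it immediately. The added latency mentioned above would show up mainly at shutdown, where a thread already parked in `poll` may wait up to `POLL_TIMEOUT_MS` before it sees that `running` has changed.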

davseitsev added the "bug" (Something isn't working) label on Jan 24, 2025
