BaseCommitService consumes 100% CPU when idle #12086

Open
1 of 3 tasks
davseitsev opened this issue Jan 24, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@davseitsev

Apache Iceberg version

1.7.0

Query engine

Spark

Please describe the bug 🐞

We have a Spark job that performs all the maintenance actions on our data lake. We noticed high CPU usage on the driver, caused by Committer-Service threads.

Here is the output of top:

51128 yarn      20   0  120.4g  29.7g  23732 R  99.9  24.1  66:34.41 Committer-Servi
53929 yarn      20   0  120.4g  29.7g  23732 R  99.9  24.1  16:08.09 Committer-Servi
11001 yarn      20   0  120.4g  29.7g  23732 R  99.7  24.1  90:47.03 dispatcher-Coar
49957 yarn      20   0  120.4g  29.7g  23732 R  99.7  24.1  65:28.45 Committer-Servi
50738 yarn      20   0  120.4g  29.7g  23732 R  99.7  24.1  66:45.21 Committer-Servi
11052 yarn      20   0  120.4g  29.7g  23732 S   1.0  24.1   1:53.65 spark-listener-
83359 yarn      20   0  120.4g  29.7g  23732 S   0.6  24.1   0:00.10 Committer-Servi

The stack traces of the CPU-consuming threads look like this:

"Committer-Service" #28702 prio=5 os_prio=0 cpu=3958800.03ms elapsed=4094.93s tid=0x0000ffe544acc4e0 nid=0xc325 runnable  [0x0000ffe492d55000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.iceberg.actions.BaseCommitService.lambda$start$0(BaseCommitService.java:133)
        at org.apache.iceberg.actions.BaseCommitService$$Lambda$4975/0x00000030026fec58.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@…/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@…/ThreadPoolExecutor.java:635)
        at java.lang.Thread.run(java.base@…/Thread.java:840)

Here is the flame graph:

[Image: flame graph]

And the call tree:

[Image: call tree]

And the relevant code:

[Image: code screenshot]

It looks like the thread is wasting CPU time when there is actually nothing to do.
It doesn't affect us significantly because we use a large instance for the driver, but it is worth optimizing for smaller instances.
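To illustrate the suspected pattern, here is a minimal hypothetical Java sketch of a committer loop that sleeps only when both of its queues are empty. It is not the BaseCommitService source: the class `SpinningCommitLoop`, its `runFor` method, and the fixed `rewritesPerCommit` value are invented for illustration. The assumption that the loop skips sleeping while `inProgressCommits` has entries comes from the maintainer's comment in this thread.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical reproduction of the busy-wait pattern discussed in this issue.
// Field names echo the discussion (completedRewrites, inProgressCommits,
// rewritesPerCommit); this is NOT the actual BaseCommitService code.
class SpinningCommitLoop {
  final ConcurrentLinkedQueue<String> completedRewrites = new ConcurrentLinkedQueue<>();
  final ConcurrentLinkedQueue<String> inProgressCommits = new ConcurrentLinkedQueue<>();
  final int rewritesPerCommit = 10;

  /** Runs the loop for roughly durationMs and returns the number of iterations. */
  long runFor(long durationMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMs);
    long iterations = 0;
    while (System.nanoTime() < deadline) {
      iterations++;
      // Sleeps only when BOTH queues are empty. With a commit in progress and
      // too few completed rewrites to commit, the loop never sleeps: it spins.
      if (completedRewrites.isEmpty() && inProgressCommits.isEmpty()) {
        try {
          Thread.sleep(100);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
      if (completedRewrites.size() >= rewritesPerCommit) {
        completedRewrites.clear(); // stand-in for committing a group
      }
    }
    return iterations;
  }
}
```

Calling `runFor(300)` on an instance with one entry in `inProgressCommits` iterates orders of magnitude more often than on an idle instance, which only wakes a few times because it sleeps. The spinning case keeps one core busy for the whole interval, matching the near-100% figures in the `top` output.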

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
@davseitsev davseitsev added the bug Something isn't working label Jan 24, 2025
@silentxingtian

Hey, could you tell me what tool you used to produce that flame graph? I'd like to learn how.

@RussellSpitzer
Member

I wonder if it should be sleeping even when inProgressCommits() has elements in it. I don't think we want to loop unless work has actually finished.
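A minimal sketch of this suggestion in Java follows. The class `SleepingCommitLoop` and its methods are hypothetical and are not Iceberg's API. The decision to sleep depends only on whether a full commit group is ready, or, after shutdown, whether anything is left to flush. Entries in `inProgressCommits` therefore keep the loop alive but no longer keep it spinning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not Iceberg code: the committer sleeps whenever no commit
// group is ready, even if inProgressCommits still has entries.
class SleepingCommitLoop {
  final ConcurrentLinkedQueue<String> completedRewrites = new ConcurrentLinkedQueue<>();
  final ConcurrentLinkedQueue<String> inProgressCommits = new ConcurrentLinkedQueue<>();
  final AtomicBoolean running = new AtomicBoolean(true);
  final List<List<String>> committedGroups = new ArrayList<>(); // touched only by the loop thread
  final int rewritesPerCommit;

  SleepingCommitLoop(int rewritesPerCommit) {
    this.rewritesPerCommit = rewritesPerCommit;
  }

  /** While running, wait for a full group; after shutdown, flush whatever is left. */
  boolean readyToCommit() {
    int ready = completedRewrites.size();
    return running.get() ? ready >= rewritesPerCommit : ready > 0;
  }

  void loop() {
    // Keep looping until shutdown and until queued and in-flight work has drained.
    while (running.get() || !completedRewrites.isEmpty() || !inProgressCommits.isEmpty()) {
      if (readyToCommit()) {
        commitReadyGroup();
      } else {
        try {
          Thread.sleep(100); // nothing committable yet: back off instead of spinning
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    }
  }

  /** Stand-in for a real commit: moves up to rewritesPerCommit rewrites into one group. */
  private void commitReadyGroup() {
    List<String> group = new ArrayList<>();
    String rewrite;
    while (group.size() < rewritesPerCommit && (rewrite = completedRewrites.poll()) != null) {
      group.add(rewrite);
    }
    committedGroups.add(group);
  }
}
```

With this shape, an idle service that still has work in flight wakes about ten times per second instead of spinning. The 100 ms interval is an arbitrary choice for the sketch.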

@lliangyu-lin
Copy link
Contributor

One potential approach to address this could be to use a BlockingQueue, which allows the commit service thread to sleep until new work is offered to the completedRewrites queue, although this might slightly increase latency.
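A minimal sketch of the BlockingQueue idea in Java, assuming a single committer thread. The names `BlockingCommitLoop`, `offer`, and `close` are hypothetical and are not Iceberg's API. `LinkedBlockingQueue.poll(timeout, unit)` parks the calling thread until an element arrives or the timeout elapses, so an idle committer uses essentially no CPU.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the BlockingQueue approach; names are illustrative,
// not Iceberg's API. One committer thread batches rewrites into commit groups.
class BlockingCommitLoop {
  private final LinkedBlockingQueue<String> completedRewrites = new LinkedBlockingQueue<>();
  private final AtomicBoolean running = new AtomicBoolean(true);
  private final List<List<String>> committedGroups = new ArrayList<>();
  private final int rewritesPerCommit;
  private final Thread committer = new Thread(this::loop, "Committer-Service");

  BlockingCommitLoop(int rewritesPerCommit) {
    this.rewritesPerCommit = rewritesPerCommit;
  }

  void start() {
    committer.start();
  }

  /** Called when a rewrite finishes; wakes the committer if it is blocked in poll(). */
  void offer(String rewrite) {
    completedRewrites.offer(rewrite);
  }

  /** Signals shutdown and waits for the committer to flush the remaining rewrites. */
  List<List<String>> close() {
    running.set(false);
    try {
      committer.join(); // join() makes the committer's writes visible to this thread
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return committedGroups;
  }

  private void loop() {
    List<String> pending = new ArrayList<>();
    try {
      while (running.get() || !completedRewrites.isEmpty()) {
        // Parks the thread until a rewrite arrives or 100 ms pass;
        // the timeout only exists so the loop can notice running == false.
        String rewrite = completedRewrites.poll(100, TimeUnit.MILLISECONDS);
        if (rewrite != null) {
          pending.add(rewrite);
        }
        if (pending.size() >= rewritesPerCommit) {
          committedGroups.add(pending); // stand-in for a real commit
          pending = new ArrayList<>();
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    if (!pending.isEmpty()) {
      committedGroups.add(pending); // flush the partial group left at shutdown
    }
  }
}
```

In this sketch, a newly offered rewrite wakes the committer as soon as it is queued. The 100 ms timeout only bounds how long the thread takes to notice that `close()` was called.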

Projects
None yet
Development

No branches or pull requests

4 participants