How to run code and download result properly? #108

Open
makrushin-evgenii opened this issue Nov 15, 2023 · 1 comment

Comments

@makrushin-evgenii

makrushin-evgenii commented Nov 15, 2023

I create a session and download the resulting dataframe. The code being run is not important:

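from livy import LivySession, SessionKind

# Excerpt from a method: self.LIVY_URL, self.requests_session, conf, code and download_dataframe_name are defined by the caller.
with LivySession.create(self.LIVY_URL, kind=SessionKind.PYSPARK, requests_session=self.requests_session, spark_conf=conf) as session:
    session.run(code)
    return session.download(download_dataframe_name)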

It works fine in the staging environment on small amounts of data, but fails with this error on large amounts in the production environment:

requests.exceptions.HTTPError: 500 Server Error: Server Error for url: https://***:443/gateway/production/livy/sessions/655/statements/1
{"msg":"Session '655' not found."}
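One way to check from the client side whether the session still exists before calling download() is to query Livy's REST endpoint GET /sessions/{id}/state. The sketch below assumes `http` is a requests-style session (such as the requests_session already passed to LivySession.create) and `livy_url` is the gateway base URL ending in /livy; both, and the function names, are hypothetical placeholders.

```python
# Livy session states in which the session can still accept statements.
LIVE_STATES = {"not_started", "starting", "idle", "busy"}


def session_state(http, livy_url, session_id):
    """Return the Livy session state, or None if the session no longer exists.

    `http` is any object with a requests-style .get(url) method, for example
    the requests.Session passed to LivySession.create (an assumption).
    """
    response = http.get(f"{livy_url}/sessions/{session_id}/state")
    if response.status_code >= 400 and "not found" in response.text:
        # A missing session is reported with {"msg": "Session '<id>' not found."};
        # in this issue it arrived through the gateway as a 500, so the body is checked.
        return None
    response.raise_for_status()
    return response.json()["state"]


def session_is_usable(http, livy_url, session_id):
    """True if the session exists and is in a state that accepts statements."""
    return session_state(http, livy_url, session_id) in LIVE_STATES
```

Calling session_is_usable right before download() would show whether the session was already gone at that moment or disappeared during the download statement.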

Meanwhile, the YARN application itself finished successfully:
[Screenshot: YARN application status]

The Livy logs look like this:

23/11/15 14:06:14 INFO InteractiveSession: Interactive session 656 created [appid: application_1698181251761_0043, owner: knox, proxyUser: Some(e.makrushin), state: idle, kind: pyspark, info: {driverLogUrl=http://***:8042/node/containerlogs/container_e58_1698181251761_0043_01_000001/e.makrushin, sparkUiUrl=http://***/proxy/application_1698181251761_0043/}]
23/11/15 14:09:54 INFO InteractiveSessionManager: Deleting session 656
23/11/15 14:09:54 INFO InteractiveSession: Stopping InteractiveSession 656...
23/11/15 14:09:54 WARN Rpc: [Rpc] Closing RPC channel with 2 outstanding RPCs.
23/11/15 14:09:54 ERROR SessionServlet$: internal error
java.util.concurrent.CancellationException
        at io.netty.util.concurrent.DefaultPromise.cancel(...)(Unknown Source)
23/11/15 14:09:54 INFO InteractiveSession: Stopped InteractiveSession 656.
23/11/15 14:09:54 INFO InteractiveSessionManager: Deleted session 656
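The "created" and "Deleting session" timestamps bound how long session 656 lived. A small sketch that extracts that interval from log text like the excerpt above; the regular expression only covers the two line formats shown here (yy/MM/dd timestamps) and is an assumption about the log layout, not a general Livy log parser.

```python
import re
from datetime import datetime

# Matches the "session created" and "Deleting session" lines that bracket a session's life.
LOG_LINE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO "
    r"(?:InteractiveSession: Interactive session (\d+) created"
    r"|InteractiveSessionManager: Deleting session (\d+))"
)


def session_lifetimes(log_text):
    """Return {session_id: seconds between 'created' and 'Deleting session'}."""
    created, lifetimes = {}, {}
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if not match:
            continue  # other log lines (warnings, stack traces) are ignored
        stamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
        if match.group(2):
            created[int(match.group(2))] = stamp
        else:
            session_id = int(match.group(3))
            if session_id in created:
                lifetimes[session_id] = (stamp - created[session_id]).total_seconds()
    return lifetimes
```

For the excerpt above this returns {656: 220.0}: 3 min 40 s between creation and deletion, shorter than the 5 to 15 minutes the calculation is reported to take.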

It seems the session is deleted before I can download the result. Why might this happen, and how can I fix it?

I also tried processing the downloaded dataframe inside the with block, and not using with at all. Neither changes anything: I get the same error at the moment of the download call.
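Without the with block, the session lifecycle has to be driven explicitly. A minimal sketch, assuming the installed pylivy version provides LivySession.wait() (block until the session is ready) and close() (delete the session); it also logs client-side elapsed time per step so the client timeline can be lined up with Livy's "Deleting session" entry. The function name is hypothetical.

```python
import logging
import time

log = logging.getLogger(__name__)


def run_and_download(session, code, dataframe_name):
    """Run code in a Livy session, download one dataframe, then close the session.

    `session` is assumed to behave like pylivy's LivySession: wait() blocks until
    the session is ready, run() executes code, download() returns the named
    Spark dataframe, and close() deletes the session.
    """
    started = time.monotonic()
    session_id = getattr(session, "session_id", "?")

    def mark(step):
        # Elapsed client-side time, to compare with the timestamps in the Livy log.
        log.info("session %s: %s at +%.1f s", session_id, step, time.monotonic() - started)

    session.wait()
    mark("ready")
    try:
        session.run(code)
        mark("run finished")
        result = session.download(dataframe_name)
        mark("download finished")
        return result
    finally:
        # Runs whether download() returned or raised, so the session is only
        # deleted by this client after the download attempt is over.
        mark("closing")
        session.close()
```

Note that a "Deleting session" line can also be a consequence of a failed download, since close() still runs in the finally block; the logged timings help tell the two orders apart.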

@makrushin-evgenii
Author

A few more details: the calculation takes 5 to 15 minutes, and the result in CSV format is about a gigabyte, often less. I don't need LivySession's ability to transfer a session between threads/instances; I use it because of its convenient interface: it is easy to run code without uploading its source to HDFS, and easy to get a result without downloading it from HDFS.
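If the failure is tied to moving the whole result (up to about a gigabyte) through a single statement, which this thread does not confirm, one workaround to try is downloading it in several smaller statements. A sketch using pylivy's download_sql(), assuming the submitted code registers the result as a temporary view (for example with createOrReplaceTempView) under a hypothetical name such as "result", with a hypothetical column to split on:

```python
def download_in_chunks(session, view_name, key_column, num_chunks, combine=None):
    """Download a Spark temporary view as `num_chunks` separate SQL statements.

    `session` is assumed to behave like pylivy's LivySession, whose
    download_sql(query) returns the query result as a pandas DataFrame.
    `view_name` and `key_column` are hypothetical and must exist in the session.
    """
    if combine is None:
        import pandas as pd  # only needed for the default combine step

        def combine(parts):
            return pd.concat(parts, ignore_index=True)

    parts = []
    for chunk in range(num_chunks):
        # hash() and pmod() are Spark SQL built-ins; pmod keeps the bucket
        # non-negative, so each row falls into exactly one chunk.
        query = (
            f"SELECT * FROM {view_name} "
            f"WHERE pmod(hash({key_column}), {num_chunks}) = {chunk}"
        )
        parts.append(session.download_sql(query))
    return combine(parts)
```

Each chunk query re-evaluates the view, so the underlying dataframe should be cached (for example with cache() followed by an action) in the submitted code; otherwise the 5 to 15 minute calculation would run once per chunk.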
