Improve documentation for pyspark setup ('kedro run' can't resolve path on starter project with tool pyspark enabled) #4366
Comments
I have now tested all tools separately, and tool 6 (pyspark) is causing this issue. I didn't have Spark installed, so I checked requirements.txt and pyspark wasn't present. Unfortunately, adding it and installing it with pip didn't solve the issue. I also set the env var PYSPARK_HADOOP_VERSION=3 as a quick check, but that didn't resolve the issue either.
I suppose it's less of a bug than poor documentation, related to this issue/comment: kedro-starters/issues/237# My Hadoop is pretty old and links might be broken. I won't be able to test a fresh installation though; if you aren't able to reproduce this, it's most likely my machine. Still, the documentation and the error message (which path can't be found?) could be improved. Is Hadoop installed when initializing Kedro? Why isn't it mentioned that it's necessary for the pyspark tool?
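Since the error message does not say which path is missing, one way to narrow it down is to check whether the environment variables a local Spark launch commonly relies on point to directories that actually exist. The following is only a diagnostic sketch: the variable list (`JAVA_HOME`, `SPARK_HOME`, `HADOOP_HOME`) and the helper name `find_broken_path_vars` are assumptions for illustration, and nothing in this thread confirms that one of these variables is the culprit.

```python
import os
from pathlib import Path

# Assumption: variables that a local Spark launch commonly reads.
# The thread does not confirm which one (if any) is at fault.
CANDIDATE_VARS = ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")


def find_broken_path_vars(environ=None, names=CANDIDATE_VARS):
    """Return {name: value} for variables that are set but are not existing directories."""
    env = os.environ if environ is None else environ
    broken = {}
    for name in names:
        value = env.get(name)
        # Unset or empty variables are skipped; only stale values are reported.
        if value and not Path(value).is_dir():
            broken[name] = value
    return broken


if __name__ == "__main__":
    for name in CANDIDATE_VARS:
        print(f"{name} = {os.environ.get(name, '<not set>')}")
    for name, value in find_broken_path_vars().items():
        print(f"WARNING: {name} points to a missing directory: {value}")
```

Running the script inside the same conda environment as `kedro run` shows the values that the Spark launcher would see. A variable reported as missing is a candidate for the unresolved path, not proof of it.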
Hi @bf-malefiz, thanks for reporting this. I wasn't able to recreate your issue. Ensure you're using Java 8 or Java 11. I noticed issues with Java 21, which is not officially supported by Apache Spark or PySpark. You can check your current Java version with: You mentioned a concern about documentation clarity. This is a valid point, and we'll look into making the setup steps for tools like PySpark clearer in the documentation.
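The Java check suggested above can also be scripted. This is a minimal sketch that assumes `java` is on `PATH` and that its `-version` output contains a quoted version string such as `"1.8.0_281"` or `"11.0.2"`. The helper names `parse_java_major` and `detect_java_major` are hypothetical, and the set of accepted versions (8 and 11) is taken from the comment above rather than from the Spark documentation.

```python
import re
import shutil
import subprocess

# Taken from the maintainer's comment in this thread, not from Spark's docs.
RECOMMENDED_MAJORS = (8, 11)


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy numbering: Java 8 reports itself as "1.8.0_...".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def detect_java_major():
    """Run `java -version` if java is on PATH and return its major version."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=False
    )
    # The version banner usually goes to stderr; check both streams.
    return parse_java_major(result.stderr + result.stdout)


if __name__ == "__main__":
    major = detect_java_major()
    if major is None:
        print("No usable Java found on PATH.")
    elif major in RECOMMENDED_MAJORS:
        print(f"Java {major} detected.")
    else:
        print(f"Java {major} detected; the comment above recommends 8 or 11.")
```

Parsing is kept separate from the subprocess call so the version logic can be checked without a Java installation.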
Description
I'm initializing a new project with all tools enabled and an example pipeline. After installing the requirements, kedro new fails with:
The system cannot find the path specified.
Context
Just trying to get the example running. The spaceflights starter is working but can't be initialized with --tools=all
kedro new --name=basic --starter=spaceflights-pandas
Steps to Reproduce
Expected Result
INFO Pipeline execution completed successfully.
Actual Result
[12/03/24 01:27:05] INFO Using 'conf\logging.yml' as logging configuration. You can change this by setting the KEDRO_LOGGING_CONFIG environment variable accordingly. __init__.py:270
[12/03/24 01:27:06] INFO Kedro project space session.py:329
The system cannot find the path specified.
Your Environment
Win11, conda-env
pip show kedro or kedro -V: Version: 0.19.10
python -V: Python 3.11.10