Running in Zeppelin results in non-progressing execution #32

Open
PowerToThePeople111 opened this issue Jul 10, 2019 · 0 comments

Hey guys,

just a minor thing. Since I run most of my analyses through a Zeppelin frontend, I also wanted to use it when training models with SparkFlow. Sadly, though, the training in the well-known MNIST example does not run to completion: it starts fine (just as it would in a shell) but at some point simply hangs, producing no further output and no error.


W0710 16:11:49.977241 139741029619520 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/tensorflow_async.py:20: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

W0710 16:11:49.977596 139741029619520 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/tensorflow_async.py:21: The name tf.train.RMSPropOptimizer is deprecated. Please use tf.compat.v1.train.RMSPropOptimizer instead.

W0710 16:11:49.977807 139741029619520 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/tensorflow_async.py:22: The name tf.train.MomentumOptimizer is deprecated. Please use tf.compat.v1.train.MomentumOptimizer instead.

W0710 16:11:49.978009 139741029619520 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/tensorflow_async.py:23: The name tf.train.AdadeltaOptimizer is deprecated. Please use tf.compat.v1.train.AdadeltaOptimizer instead.

W0710 16:11:50.300260 139741029619520 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:137: The name tf.MetaGraphDef is deprecated. Please use tf.compat.v1.MetaGraphDef instead.

WARNING: Logging before flag parsing goes to stderr.
W0710 16:11:52.099525 139883655739200 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:188: The name tf.train.Server is deprecated. Please use tf.distribute.Server instead.

2019-07-10 16:11:52.100034: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2019-07-10 16:11:52.119754: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300100000 Hz
2019-07-10 16:11:52.120280: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x561152a547a0 executing computations on platform Host. Devices:
2019-07-10 16:11:52.120302: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
E0710 16:11:52.121669655   16576 socket_utils_common_posix.cc:198] check for SO_REUSEPORT: {"created":"@1562775112.121656125","description":"SO_REUSEPORT unavailable on compiling system","file":"external/grpc/src/core/lib/iomgr/socket_utils_common_posix.cc","file_line":166}
2019-07-10 16:11:52.121849: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:250] Initialize GrpcChannelCache for job local -> {0 -> localhost:42003}
2019-07-10 16:11:52.123072: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:365] Started server with target: grpc://localhost:42003
W0710 16:11:52.125344 139883655739200 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:191: The name tf.train.import_meta_graph is deprecated. Please use tf.compat.v1.train.import_meta_graph instead.

W0710 16:11:52.203922 139883655739200 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:192: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.

W0710 16:11:52.204156 139883655739200 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:192: The name tf.GraphKeys is deprecated. Please use tf.compat.v1.GraphKeys instead.

W0710 16:11:52.204305 139883655739200 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:193: The name tf.trainable_variables is deprecated. Please use tf.compat.v1.trainable_variables instead.

W0710 16:11:52.350754 139883655739200 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:197: The name tf.global_variables_initializer is deprecated. Please use tf.compat.v1.global_variables_initializer instead.

W0710 16:11:52.351631 139883655739200 deprecation_wrapper.py:119] From /home/hadoop/anaconda3/envs/pySpark/lib/python3.6/site-packages/sparkflow/HogwildSparkModel.py:199: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

When I execute the same code in the PySpark shell, I additionally get the following lines, as well as the output of the training process itself.

2019-07-10 16:11:52.390889: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
 * Serving Flask app "sparkflow.HogwildSparkModel" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off

Do you have any idea why this happens? I explicitly allowed multiple contexts by setting spark.driver.allowMultipleContexts to true in the Spark/PySpark interpreter settings.
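For reference, this is roughly what my setup looks like (a sketch, not the exact config: the property can be set either in Zeppelin's spark interpreter properties or in spark-defaults.conf):

```properties
# Zeppelin: Interpreter -> spark -> Properties
# (equivalently, a line in conf/spark-defaults.conf)
spark.driver.allowMultipleContexts  true
```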
