
is it possible to convert tf-agents to tf-lite and run on android device #280

Open
ssujr opened this issue Jan 2, 2020 · 26 comments

@ssujr

ssujr commented Jan 2, 2020

We want to implement RL on an Android device. Just wondering if it is possible to run TF-Agents on Android or to convert a TF-Agents policy to TFLite. It would be great if someone could share some experience. Thank you!

@ebrevdo
Contributor

ebrevdo commented Jan 22, 2020

Yes; you should be able to do this. I'm guessing you care about inference (running a policy) more than training (since tflite doesn't support that anyway).

See the PolicySaver class. You can use it to export a SavedModel. You can then use the TFLite converter to convert that SavedModel to a TFLite model.

Please report back and let us know if this works for you!
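A minimal sketch of that two-step path (export a SavedModel, then convert it). The `TinyPolicy` module below is only a hypothetical stand-in for a PolicySaver export, which similarly exposes an `action` signature; with a real agent you would call `PolicySaver(agent.policy).save(...)` instead:

```python
import tensorflow as tf

class TinyPolicy(tf.Module):
    """Stand-in for a PolicySaver export; exposes an 'action' signature."""

    @tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
    def action(self, observation):
        # Pick the index of the largest observation entry as the action.
        return tf.argmax(observation, axis=-1)

# Step 1: export a SavedModel with an 'action' signature.
policy = TinyPolicy()
tf.saved_model.save(policy, 'exported_policy',
                    signatures={'action': policy.action})

# Step 2: convert that signature to a TFLite flatbuffer.
converter = tf.lite.TFLiteConverter.from_saved_model(
    'exported_policy', signature_keys=['action'])
tflite_model = converter.convert()
with open('policy.tflite', 'wb') as f:
    f.write(tflite_model)
```

The resulting `policy.tflite` can then be bundled into an Android app and run with the TFLite interpreter.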

@ebrevdo ebrevdo self-assigned this Jan 22, 2020
@ssujr
Author

ssujr commented Feb 3, 2020

Actually, we plan to do both training and inference on device. Do you have plans to support on-device training in the near future? Thank you for the response.

@dvdhfnr

dvdhfnr commented May 4, 2020

Hi!

Yes; you should be able to do this. I'm guessing you care about inference (running a policy) more than training (since tflite doesn't support that anyway).

See the PolicySaver class. You can use it to export a SavedModel. You can then use the TFLite converter to convert that SavedModel to a TFLite model.

Please report back and let us know if this works for you!

We tried to do this (using the DqnAgent). However, we receive the following error when trying to convert the saved model (policy):
"ValueError: This converter can only convert a single ConcreteFunction. Converting multiple functions is under development."

@ebrevdo Any suggestions?
(If required, further details can be provided.)

Thanks!

@ebrevdo
Contributor

ebrevdo commented May 4, 2020

For "only convert a single ConcreteFunction" this is cause it's trying to use the new MLIR converter. I suggest filing a repro separately with the TensorFlow Issues so they can see this feature is required. @aselle @jdduke fyi.

Separately: for now you should be able to use the "old-style" converter (it should work fine). Try passing --enable_v1_converter when you call tflite_convert and report back :)

@ebrevdo
Contributor

ebrevdo commented May 4, 2020

You cannot do on-device training with TFLite. You must either use the standard TF runtime, or try the less well supported path of the new saved_model_cli aot_compile_cpu approach, which does not support dynamic shapes and is a lot more manual, but would allow you to train on device. Unfortunately there's no tutorial on how to do this (yet). If you're interested in that, we can ask the TF team to write something about this approach.
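For reference, the CLI shape of that path looks roughly like this; the paths, signature key, and C++ class name below are illustrative, and it assumes a PolicySaver export on disk and TF >= 2.2:

```shell
# Inspect the export first to confirm its signatures
saved_model_cli show --dir exported_policy --all

# AOT-compile the 'action' signature to a CPU object file plus a C++ header
saved_model_cli aot_compile_cpu \
  --dir exported_policy \
  --tag_set serve \
  --signature_def_key action \
  --output_prefix policy \
  --cpp_class PolicyAction
```

The generated object file and header are then linked into the app directly, with no TF runtime dependency.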

@ebrevdo
Contributor

ebrevdo commented May 4, 2020

(For aot_compile_cpu you will need the most recent TF 2.2 RC; it's not in TF 2.1.)

@dvdhfnr

dvdhfnr commented May 4, 2020


Thanks for the fast response!

--enable_v1_converter works "better", but leads to a different error:
ValueError: No 'serving_default' in the SavedModel's SignatureDefs. Possible values are 'get_initial_state,__saved_model_init_op,get_train_step,action'.

(We do not require training on the device.)
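One way to list the signature keys an export actually contains (assuming the SavedModel lives in saveDir, as in the error above) is saved_model_cli:

```shell
# List the available signatures; 'action' should appear where
# 'serving_default' does not
saved_model_cli show --dir saveDir --tag_set serve

# Inspect the inputs/outputs of the 'action' signature directly
saved_model_cli show --dir saveDir --tag_set serve --signature_def action
```

The listed key can then be passed to the converter instead of the default 'serving_default'.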

@ebrevdo
Contributor

ebrevdo commented May 4, 2020 via email

@ebrevdo
Contributor

ebrevdo commented May 4, 2020 via email

@dvdhfnr

dvdhfnr commented May 4, 2020

Great. Thanks.

tflite_convert --saved_model_dir saveDir --enable_v1_converter --saved_model_signature_key action --output_file out.tflite --allow_custom_ops
seems to work for the conversion.

(Still need to investigate if this tflite model runs as expected on the Android device. I will try to report back.)

Thanks.

@maslovay

maslovay commented May 10, 2020

@dvdhfnr how are things going with running your TF-Agents-trained network on Android? I'm getting this error:

"RuntimeError: Encountered unresolved custom op: BroadcastArgs.Node number 0 (BroadcastArgs) failed to prepare."

The case is described here: https://stackoverflow.com/questions/61715154/tflite-model-load-error-runtimeerror-encountered-unresolved-custom-op-broadca

@ebrevdo

@ebrevdo
Contributor

ebrevdo commented May 10, 2020

@jdduke @raziel any suggestions?

@dvdhfnr

dvdhfnr commented May 11, 2020

When converting with the "--allow_custom_ops" flag, you need to implement the ops that are not supported by TFLite yourself: see e.g. https://www.tensorflow.org/lite/guide/ops_custom

Try converting without "--allow_custom_ops". You will then see a list of the unsupported ops. Unfortunately, it seems we will have to implement those ourselves.

@maslovay

@dvdhfnr you are right; the problem is these ops:

Exception: <unknown>:0: error: loc(fused["Deterministic_1/sample/BroadcastArgs@__inference_action_11129549", "StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/Deterministic_1/sample/BroadcastArgs"]): 'tf.BroadcastArgs' op is neither a custom op nor a flex op
<unknown>:0: error: loc(fused["ActorDistributionNetwork/TanhNormalProjectionNetwork/MultivariateNormalDiag/shapes_from_loc_and_scale/prefer_static_broadcast_shape/BroadcastArgs@__inference_action_11129549", "StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/ActorDistributionNetwork/TanhNormalProjectionNetwork/MultivariateNormalDiag/shapes_from_loc_and_scale/prefer_static_broadcast_shape/BroadcastArgs"]): 'tf.BroadcastArgs' op is neither a custom op nor a flex op
<unknown>:0: error: loc(fused["Deterministic_1/sample/BroadcastArgs_1@__inference_action_11129549", "StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/Deterministic_1/sample/BroadcastArgs_1"]): 'tf.BroadcastArgs' op is neither a custom op nor a flex op
<unknown>:0: error: loc(fused["Deterministic_1/sample/BroadcastTo@__inference_action_11129549", "StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/StatefulPartitionedCall/Deterministic_1/sample/BroadcastTo"]): 'tf.BroadcastTo' op is neither a custom op nor a flex op
<unknown>:0: error: failed while converting: 'main': Ops that can be supported by the flex runtime (enabled via setting the -emit-select-tf-ops flag): BroadcastArgs,BroadcastArgs,BroadcastArgs,BroadcastTo.
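As the error message hints, the converter can be told to fall back to the TF kernels for ops it lacks (the "flex" path, i.e. -emit-select-tf-ops), at the cost of bundling the TF Select delegate into the Android app. A sketch, with a hypothetical tf.Module standing in for a real PolicySaver export:

```python
import tensorflow as tf

# Stand-in SavedModel; a real PolicySaver export would be used instead.
class Stub(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
    def action(self, obs):
        return tf.argmax(obs, axis=-1)

stub = Stub()
tf.saved_model.save(stub, 'flex_policy', signatures={'action': stub.action})

converter = tf.lite.TFLiteConverter.from_saved_model(
    'flex_policy', signature_keys=['action'])
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # use built-in TFLite kernels where possible
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to the TF runtime (flex ops)
]
tflite_model = converter.convert()
```

On the app side, this model additionally needs the `tensorflow-lite-select-tf-ops` dependency alongside the standard TFLite runtime.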

@dvdhfnr

dvdhfnr commented May 12, 2020

Currently, I am using the following pipeline:

import tensorflow as tf
from tf_agents.policies.policy_saver import PolicySaver

policy_saver = PolicySaver(policy)
policy_saver.save('tmp')
converter = tf.lite.TFLiteConverter.from_saved_model('tmp', signature_keys=["action"])
tflite_policy = converter.convert()

Since I am actually not interested in saving the policy to a file, I tried to replace the save and from_saved_model calls with

converter = tf.lite.TFLiteConverter.from_concrete_functions([policy_saver._signatures['action'].get_concrete_function()])

I noticed that this changes the order of the input tensors. Do I need to take care of other side effects, or is this method safe to use? Moreover, do I need to use PolicySaver at all, or can I just directly create a concrete function ('action') and convert from that?
(The PolicySaver code looks quite sophisticated, so I cannot fully get an overview of what is done and why.)

Thanks for your comments!

@ebrevdo
Contributor

ebrevdo commented Apr 17, 2021

There is now a unit test showing how to use PolicySaver with the TFLite converter in policy_saver_test.py. Does it help?

@soldierofhell

Hi @ebrevdo,
There's a short note in the code:

# TODO(b/111309333): Remove this when `has_input_fn_and_spec`
# is `False` once TFLite has native support for RNG ops, atan, etc.

I guess this "native support for RNG ops, atan, etc." relates to unsupported BroadcastArgs and BroadcastTo ops.
Could you please provide more details what is the root cause of the problem (e.g. where are those broadcast coming from)? Maybe it's possible to change something in tf_agents code? Or maybe we can somehow contribute to improve something on TFLite side?
Thanks in advance,
Regards,

@ebrevdo
Contributor

ebrevdo commented Jun 17, 2021

This has nothing to do with TF-Agents; it depends on the TFLite team. @jdduke FYI. Is there a relevant issue open on TF's side?

@ebrevdo
Contributor

ebrevdo commented Jun 17, 2021

I'm not sure where the BroadcastArgs are coming from; possibly from TF Probability? Here's where we use broadcast_to, but I don't think these are the real places it's coming from. Probably it comes from a library we're using, as I mentioned.

@jdduke
Member

jdduke commented Jun 17, 2021

@thaink is actively working to support this. I'm not sure if there's a corresponding TF issue, but we do have an internal issue tracking this.

@thaink
Member

thaink commented Jun 18, 2021

@ebrevdo I think the BroadcastArgs may come from using broadcast_to on a dynamic tensor.
I am working on supporting BroadcastArgs now.
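A small illustration of where such ops arise: broadcasting between shapes that are only known at run time goes through BroadcastArgs/BroadcastTo nodes in the graph. The function below is purely illustrative, not taken from TF-Agents:

```python
import tensorflow as tf

@tf.function(input_signature=[tf.TensorSpec([None, 1], tf.float32),
                              tf.TensorSpec([None, 4], tf.float32)])
def broadcast_add(x, y):
    # With dynamic (None) dimensions, the broadcast shape must be computed at
    # run time: tf.broadcast_dynamic_shape lowers to the BroadcastArgs op, and
    # tf.broadcast_to to BroadcastTo -- the two ops the converter rejected.
    shape = tf.broadcast_dynamic_shape(tf.shape(x), tf.shape(y))
    return tf.broadcast_to(x, shape) + y
```

With fully static shapes the same arithmetic would be folded at conversion time and no BroadcastArgs node would appear.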

@soldierofhell

Thanks guys, please leave a comment here when BroadcastArgs becomes available.

@soldierofhell

@thaink any ETA for this BroadcastArgs issue? :)

@thaink
Member

thaink commented Jun 28, 2021

Unfortunately, it is still under review.

@thaink
Member

thaink commented Jul 8, 2021

@soldierofhell BroadcastArgs has been added to the master branch.
You can try it using the nightly build now.

@windmaple

I can convert the model now. Thanks to @thaink for the work.

@jdduke jdduke removed their assignment Aug 30, 2021
8 participants