Add bfloat to client utils #118

Closed
wants to merge 5 commits into from

Conversation

jbkyang-nvi
Contributor

@jbkyang-nvi jbkyang-nvi commented Jun 17, 2022

Add bfloat16 library to client utils.
Server PR to put bfloat16 pip library into SDK container: triton-inference-server/server#4521
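For context, the change extends the dtype-to-string mapping in tritonclient's `np_to_triton_dtype`. A minimal sketch of the idea (abbreviated to a few of the real branches, with the bfloat16 import guarded so the package stays optional; not the PR's exact code):

```python
import numpy as np

def np_to_triton_dtype(np_dtype):
    # A few of the existing builtin-numpy mappings, abbreviated.
    if np_dtype == bool:
        return "BOOL"
    elif np_dtype == np.float32:
        return "FP32"
    elif np_dtype == np.float64:
        return "FP64"
    # The new BF16 mapping relies on the third-party bfloat16 pip package.
    try:
        from bfloat16 import bfloat16
        if np_dtype == bfloat16:
            return "BF16"
    except ImportError:
        pass  # package not installed; BF16 simply won't be recognized
    return None
```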

@@ -125,6 +125,7 @@ def debug_details(self):


 def np_to_triton_dtype(np_dtype):
+    from bfloat16 import bfloat16
Contributor

@rmccorm4 rmccorm4 Jun 21, 2022


Not asking for a change here, just a note for future reference: if we ever encounter issues importing this package, we can defer the import to when it is actually needed by placing it in the else block.

That way, no one hits the bf16 import unless they're using a type not already covered by the builtin numpy types we support.

I think this is OK how it is for now though.

i.e.:

if np_dtype == bool:
    ...
# If not covered by builtin numpy types, defer custom imports here to only be when needed
else:
    from bfloat16 import bfloat16
    if np_dtype == bfloat16:
        return "BF16"

return None

If there is any issue with adding bfloat16 to our existing requirements.txt, this could also be where we output a warning/error telling the user to install the package themselves.

# If not covered by builtin numpy types, defer custom imports here to only be when needed
else:
    try:
        from bfloat16 import bfloat16
    except ImportError:
        print("ERROR: bfloat16 package is required, please install it: <pip install bfloat16>")
        return None

    if np_dtype == bfloat16:
        return "BF16"

return None

Member

@Tabrizian Tabrizian left a comment


Do we need to add the dependency to the requirements.txt file? If so, does this mean that the user will not be able to install tensorflow even if they don't use bfloat16?

@@ -125,6 +125,7 @@ def debug_details(self):


 def np_to_triton_dtype(np_dtype):
+    from bfloat16 import bfloat16
Contributor


Undoing my approval - agree with Iman, we probably need to add this to the requirements.txt - currently anyone calling this function will require the bf16 package.

@jbkyang-nvi
Contributor Author

@rmccorm4 @Tabrizian I made a workaround so that the user does not need to import bfloat16 and can still import tensorflow if they don't use bf16. Let me know what you think.

@@ -151,6 +151,9 @@ def np_to_triton_dtype(np_dtype):
         return "FP64"
     elif np_dtype == np.object_ or np_dtype.type == np.bytes_:
         return "BYTES"
+    elif (str(np_dtype) == "<class 'bfloat16'>"):
+        if np_dtype == bfloat16:
Contributor


Need the import here too, right?

Contributor

Also, are we leaving it as an optional dependency the user has to install? If so, we should document that somewhere noticeable.

Contributor Author

If the dtype is already there, it means the user has already imported bfloat16, right? The user does not have to install bfloat16; it is already in the SDK container. I can add a comment here for the case where they are doing their own thing in their container.

They will need to import bfloat16, which I assume they will do, since they are converting a dtype to a Triton type and are therefore already using it somewhere else.

Contributor

The user does not have to install bfloat16, it is already in the sdk container.

If they're using the client outside of the SDK container, they will, such as on Jetson where we install the wheel directly, right?

If the dtype is already there, it means the user has already imported bfloat16 right?

Your logic sounds reasonable, that if they passed the type it must already be imported, but I suspect a couple of potential issues:

  • If they are using a different bf16 library that does the same thing (patching a numpy type), there may be a potential issue?
  • Also if for some reason they did import bfloat16 as bf16 in their code or used a differently named package, I believe this if np_dtype == bfloat16 would fail to resolve the name/package.

Contributor Author

@jbkyang-nvi jbkyang-nvi Jun 22, 2022

  1. Added comments to address your concern about the user needing to install bfloat16.
  2. Regarding your concern about importing after the type->string check:
    a. If the user is using another bfloat16 library, importing our own bfloat16 will not save us, since there will definitely be symbol conflicts. A better resolution would be to recommend which package to install. I don't like putting it in requirements.txt because it prevents the user from using tensorflow in the same script...
    b. Good point, will do the import inline as well.
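Putting the pieces of this thread together, the string-comparison check plus an inline, guarded import could look roughly like this (an abbreviated sketch with only a few of the real dtype branches, not the PR's exact code):

```python
import numpy as np

def np_to_triton_dtype(np_dtype):
    # Builtin numpy types are handled without any extra imports.
    if np_dtype == np.float64:
        return "FP64"
    elif np_dtype == np.object_:
        return "BYTES"
    # Match on the type's repr first so bfloat16 is only imported
    # when the caller actually passed a bfloat16 dtype.
    elif str(np_dtype) == "<class 'bfloat16'>":
        try:
            from bfloat16 import bfloat16
        except ImportError:
            print("ERROR: bfloat16 package is required: pip install bfloat16")
            return None
        if np_dtype == bfloat16:
            return "BF16"
    return None
```

A caller who never passes a bf16 type never triggers the import, so tensorflow remains usable in the same script.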

Contributor

@rmccorm4 rmccorm4 left a comment

see comments

@jbkyang-nvi jbkyang-nvi requested a review from rmccorm4 June 22, 2022 01:02
Contributor

@rmccorm4 rmccorm4 left a comment

Seems OK to me to save the import if and only if user is using bf16 type.

Not sure what the right move is on the requirements.txt to not include bf16 for users installing pip package / wheel directly.

I don't like putting it in requirements.txt because it prevents the user from using tensorflow in the same script

Wouldn't users trying to use Tensorflow in our SDK container run into the same problem if we're pre-installing it there for them?


It seems like we're being a little hacky to make this work with this particular package. Current options off the top of my head:

  • I don't think writing our own bf16 type is worth the effort/maintenance
  • Don't install this bfloat16 package by default in container/requirements.txt and make it clear to users somehow that they'll have to install it themselves if needed?
  • Forking/patching the bfloat16 package to simply fix the TF segfault issue ourselves? Maybe it's worth spending some cycles to find the root cause and re-evaluate?
  • Adding tensorflow pip package to the container/requirements to use tf.bfloat16 numpy type instead may make this less hacky, but looks like it would introduce ~500+MB which is quite large, so not ideal.
  • Not supporting BF16 natively in python client library for now (product decision)

What does everyone think? @jbkyang-nvi @Tabrizian @tanmayv25 @GuanLuo

@jbkyang-nvi
Contributor Author

jbkyang-nvi commented Jun 23, 2022

Thought about it more and I think requirements.txt is reasonable. Also modified the import to include try/except logic.

re:

  • Forking/patching the bfloat16 package to simply fix the TF segfault issue ourselves? Maybe it's worth spending some cycles to find the root cause and re-evaluate?

I think this is a reasonable suggestion. I do think that we can do it as an additional ticket.

re:

  • Adding tensorflow pip package to the container/requirements to use tf.bfloat16 numpy type instead may make this less hacky, but looks like it would introduce ~500+MB which is quite large, so not ideal.

I think we don't want to do this in general to have a lightweight minimal container.

@jbkyang-nvi jbkyang-nvi requested a review from rmccorm4 June 23, 2022 22:43
@Tabrizian
Member

Tabrizian commented Jun 24, 2022

IMO, this library looks very fragile. Is there any other alternative bfloat16 library that we could be using? What was the conclusion on using DLPack?

Forking/patching the bfloat16 package to simply fix the TF segfault issue (GreenWaves-Technologies/bfloat16#2) ourselves? Maybe it's worth spending some cycles to find the root cause and re-evaluate?

I think this is the problem with poorly maintained open-source libraries. The issue has been open for 5 months and nobody has responded to it. It might be better to search for alternative solutions.

@tanmayv25
Contributor

I think for bfloat16, or any other future new data type, we can add a capability to the python client library to provide tensor contents as raw bytes.
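To illustrate the raw-bytes idea: bf16 is the upper 16 bits of an fp32, so a client could pre-pack bytes itself without any bfloat16 package at all. A sketch assuming a little-endian host and simple truncation rather than round-to-nearest-even; the function names are hypothetical:

```python
import numpy as np

def float32_to_bf16_bytes(arr):
    """Pack fp32 values as raw bf16 bytes by truncating each value
    to its upper 16 bits (assumes a little-endian host)."""
    a = np.ascontiguousarray(arr, dtype=np.float32)
    halves = a.view(np.uint16).reshape(a.shape + (2,))
    return halves[..., 1].tobytes()  # high half of each fp32 is the bf16

def bf16_bytes_to_float32(buf):
    """Widen raw bf16 bytes back to fp32 by zero-filling the low 16 bits."""
    hi = np.frombuffer(buf, dtype=np.uint16)
    halves = np.zeros(hi.shape + (2,), dtype=np.uint16)
    halves[..., 1] = hi
    return halves.view(np.float32).reshape(hi.shape)
```

Values exactly representable in bf16 (e.g. 1.0, 1.5, -0.5) round-trip unchanged; others lose their low mantissa bits.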

@jbkyang-nvi jbkyang-nvi deleted the kyang-python-bf16 branch July 19, 2022 22:04