Not able to run LLaVA-Next pretraining with NeMo 2.0 using container version nemo:24.12 #11741
Comments
At step 7 when we got the error
Could this be a similar issue to this Stack Overflow question, where some dependency imports are messed up?
We will release a new container with the fixes soon.
@yashaswikarnati Sorry for the late response. I tried both options, "building a NeMo container from source with the Dockerfile" and "reinstalling the source code from within the container", but each fails with a different error.
Building a NeMo container from source with the Dockerfile
# On the A3M node.
git clone git@github.com:NVIDIA/NeMo.git
cd NeMo
DOCKER_BUILDKIT=1 docker build -f Dockerfile -t nemo:latest .
Got error
Stack Overflow questions such as this have not been helpful.
Reinstalling the source code from within the container
Attempted to reinstall NeMo from the main branch within the prebuilt nemo:24.12 container.
docker run --gpus all -it --rm -v /home/bernardhan_google_com/nemo-multimodal:/NeMo --shm-size=32g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/nemo:24.12
cd /opt
rm -rf NeMo
# Follow https://github.com/NVIDIA/NeMo?tab=readme-ov-file#build-from-source
git clone https://github.com/NVIDIA/NeMo
apt-get update && apt-get install -y libsndfile1 ffmpeg
./reinstall.sh
# This prints "All Done" which indicates success
However, running the following code
from nemo.collections import vlm
finetune = vlm.llava_next_7b.pretrain_recipe(
    name="llava_next_7b_pretrain",
    dir="/NeMo/new-ckpts",
    num_nodes=1,
    num_gpus_per_node=8,
    language_model_from_pretrained='/NeMo/neva/checkpoints/llama-3-8b-instruct.nemo',  # This is the path where I stored the Llama3-8b-Instruct checkpoint converted to .nemo format
    # Can be None or change based on local checkpoint path
)
import nemo_run as run
run.run(finetune, executor=run.LocalExecutor())
yields a different error
Could you advise further?
Hi @bernardhan33, the fix for the original issue you raised was pushed into this container: nvcr.io/nvidia/nemo:24.12.rc3
Re: reinstalling from source: I think ToT needs a different version of transformers than the one that comes with the container. You could try pip install transformers==4.48
Re: building the Docker image from source: are you trying to build from https://github.com/NVIDIA/NeMo/blob/main/Dockerfile.ci?
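Before reinstalling anything, it may help to confirm which transformers version the container actually ships. The following is a minimal sketch using only the standard library; the helper names (`parse_release`, `matches_minor`, `installed_version`) are hypothetical, and the 4.48 target is taken from the suggestion above, not from a verified compatibility matrix.

```python
from importlib import metadata


def parse_release(version):
    """Turn a version string into a tuple of its leading integer components."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first component with no leading digits (e.g. "dev0").
            break
        parts.append(int(digits))
    return tuple(parts)


def matches_minor(installed, wanted):
    """Return True if `installed` has the same major.minor as `wanted`."""
    return parse_release(installed)[:2] == parse_release(wanted)[:2]


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("transformers")
    if found is None:
        print("transformers is not installed")
    elif matches_minor(found, "4.48"):
        print(f"transformers {found} matches 4.48")
    else:
        print(f"transformers {found} differs from 4.48; consider reinstalling")
```

Running this inside the container before and after `pip install transformers==4.48` shows whether the pin took effect in the interpreter that NeMo uses.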
Describe the bug
I am trying to run LLaVA-Next pretraining with NeMo 2.0 following the documentation, but it fails with various errors on nemo:24.12, nemo:24.09, and nemo:dev.
Steps/Code to reproduce bug
Create pretrain.py and fill it with the sample code from the documentation.
Checked /opt/NeMo/nemo/collections/vlm/recipes/llava_next_7b.py and found that the code does not support language_model_from_pretrained.
Removed language_model_from_pretrained and tried again. Got error
Tried nemo:dev and nemo:24.09. Failed with errors. Confirmed from the code path that the recipes do not exist yet in those versions.
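Whether a recipe in a given container accepts an argument such as language_model_from_pretrained can also be checked programmatically rather than by reading the source. The sketch below is illustrative only: `accepts_kwarg` is a hypothetical helper, not part of NeMo, `example_recipe` is a stand-in whose signature is invented for the demo, and the approach assumes the recipe is an ordinary Python callable whose signature `inspect` can read.

```python
import inspect


def accepts_kwarg(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        # Positional-only parameters cannot be passed by keyword.
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs catch-all accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-in for a recipe function; the real signature may differ.
def example_recipe(name="default", dir=None, num_nodes=1, num_gpus_per_node=8):
    return {"name": name, "dir": dir}


if __name__ == "__main__":
    print(accepts_kwarg(example_recipe, "num_nodes"))
    print(accepts_kwarg(example_recipe, "language_model_from_pretrained"))
```

Inside the container, the same check could be pointed at the real recipe, for example `accepts_kwarg(vlm.llava_next_7b.pretrain_recipe, "language_model_from_pretrained")` after `from nemo.collections import vlm`. If the recipe is wrapped by a decorator that hides its signature, the result may not reflect the underlying function.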
Expected behavior
I should be able to follow the public documentation to get the LLaVA-NEXT pretraining run just fine.
Environment overview (please complete the following information)
docker pull & docker run commands used: see above.
Environment details
N/A.
Additional context
N/A.