
A workaround for deepspeed dependency install errors and other conflicts in a virtual environment #369

Open
ayme0707 opened this issue Mar 1, 2025 · 0 comments

ayme0707 commented Mar 1, 2025

Following the tutorial in the main README on how to use the Llama model, I ran the commands below and hit an error where pip could not correctly collect and install deepspeed; force-upgrading it then led to other dependency-conflict errors.

```bash
git clone https://github.com/LlamaFamily/Llama-Chinese.git
cd Llama-Chinese
pip install -r requirements.txt
```

I first hit the deepspeed error and tried several different versions without success. Upgrading to the latest version with pip finally installed cleanly, but then the newer deepspeed conflicted with the pinned pytorch 2.1.2, and upgrading pytorch in turn triggered yet more conflict errors...
After upgrading one of the dependencies, the script complained that my PyTorch build was incompatible with my CUDA version and that the flash-attn dependency was missing... plus a libiomp5md.dll error.
So I started working through the errors one by one. For the `OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.` error, see: https://zhuanlan.zhihu.com/p/371649016
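
A common workaround for that error is to allow duplicate OpenMP runtimes before torch/numpy are imported; note that the error message itself warns this can cause crashes or silently wrong results, so removing the duplicate libiomp5md.dll from the environment is the cleaner fix. A minimal sketch:

```python
# Workaround for "OMP: Error #15 ... libiomp5md.dll already initialized" on Windows.
# Must be set before anything loads the Intel OpenMP runtime (torch, numpy, etc.).
# Caveat from the error message itself: this can cause crashes or incorrect results.
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

import torch  # import only after the variable is set
```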

After that I spent a whole afternoon on this and was thoroughly worn out by the end; here is the process I went through.
1. I deleted the virtual environment and created a fresh Python 3.10 one, then installed the latest PyTorch 2.6. My GPU supports CUDA 12.6, so I installed the CUDA 12.6 build of PyTorch; last time I had installed the CUDA 11.8 build and PyTorch complained it was incompatible with my machine's CUDA. My original conda environment was Python 3.9 and I had force-upgraded it to 3.10; I don't know whether that affected PyTorch, but later I deleted the whole virtual environment and reinstalled the CUDA 11.8 PyTorch anyway, and it didn't help.
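
A quick sanity check (just a sketch, assuming torch is already installed in the new environment) to confirm the build actually matches the GPU:

```python
# Verify that the installed PyTorch build sees the GPU and was built for the
# expected CUDA version (compare against what `nvidia-smi` reports).
import torch

print("torch version:", torch.__version__)     # expect 2.6.x
print("built for CUDA:", torch.version.cuda)   # expect "12.6"
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```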

2. I then installed every dependency manually, one at a time, to pin down which package was actually causing the conflicts. In the end I installed them all without version pins, i.e. the latest of everything (there is a trap here, see step 4).
*The transformers version is tied to the Python version, see: https://pypi.org/project/transformers/4.49.0/#history
*The bitsandbytes version is tied to the transformers version.
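
To see at a glance which versions actually ended up installed, something like this helps (a sketch; the package list is a guess at the relevant ones, extend it with whatever requirements.txt contains):

```python
# Print the versions of the packages that turned out to be coupled to each other.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("torch", "transformers", "bitsandbytes", "deepspeed"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```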

3. For installing flash-attn, see: https://blog.csdn.net/MurphyStar/article/details/138523803
*flash-attn is tied to the Python version, the CUDA version and the PyTorch version all at once, so be careful to pick the matching prebuilt wheel, e.g.:
flash_attn-2.7.4.post1+cu124torch2.6.0cxx11abiFALSE-cp310-cp310-win_amd64.whl
For example, I picked flash_attn 2.7.4, built for CUDA 12.4 (cu124; there was no 12.6 build, so take the next one down), torch 2.6.0, Python 3.10 (cp310), for Windows.
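
After installing the wheel, a quick import is usually enough to tell whether it matches the environment (a sketch; an ABI or version mismatch typically fails right at import time):

```python
# Confirm the prebuilt flash-attn wheel loads against this torch/CUDA combination.
import torch
import flash_attn

print("flash-attn:", flash_attn.__version__)
print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
```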

4. With the dependencies sorted out, running the script produced:

```text
The load_in_4bit and load_in_8bit arguments are deprecated and will be removed in the future versions. Please, pass a BitsAndBytesConfig object in quantization_config argument instead.
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 3/3 [02:14<00:00, 44.91s/it]
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
The seen_tokens attribute is deprecated and will be removed in v4.41. Use the cache_position model input instead.
Traceback (most recent call last):
  File "P:\Docker\text-generation-webui\models\quick_startAtom.py", line 23, in <module>
    generate_ids = model.generate(**generate_input)
  File "C:\Users\User\.conda\envs\py310\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\User\.conda\envs\py310\lib\site-packages\transformers\generation\utils.py", line 2223, in generate
    result = self._sample(
  File "C:\Users\User\.conda\envs\py310\lib\site-packages\transformers\generation\utils.py", line 3204, in _sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "C:\Users\User\.cache\huggingface\modules\transformers_modules\Atom-7B-Chat\model_atom.py", line 1380, in prepare_inputs_for_generation
    max_cache_length = past_key_values.get_max_length()
  File "C:\Users\User\.conda\envs\py310\lib\site-packages\torch\nn\modules\module.py", line 1928, in __getattr__
    raise AttributeError(
AttributeError: 'DynamicCache' object has no attribute 'get_max_length'. Did you mean: 'get_seq_length'?
```

This was caused by the latest transformers (4.49) installed in step 2. At first I went back to the 4.23 pinned in the requirements, then hit a bitsandbytes error, so I also went back to the pinned bitsandbytes 0.42. After that, transformers and bitsandbytes no longer conflicted with each other, but bitsandbytes now conflicted with my PyTorch and CUDA versions: it could not recognize CUDA 12.6.
Since I could not make the downgraded versions work, I reinstalled the latest versions again.
The fix that finally worked was to follow the error's hint and edit FlagAlpha\Atom-7B-Chat\model_atom.py: on line 1380, in max_cache_length = past_key_values.get_max_length(),
replace get_max_length with get_seq_length. After saving, the script ran successfully.
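
For reference, the exact change (a pragmatic fix rather than a strict equivalent, see the comment):

```python
# FlagAlpha\Atom-7B-Chat\model_atom.py, prepare_inputs_for_generation(), line 1380
# old (the method no longer exists on DynamicCache in transformers 4.49):
#   max_cache_length = past_key_values.get_max_length()
# new: enough to get generation running. Note that get_max_length() used to
# return None for an unbounded DynamicCache, whereas get_seq_length() returns
# the current cache length, so downstream truncation logic may behave slightly
# differently.
max_cache_length = past_key_values.get_seq_length()
```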

5. There are still a few warnings at the end and I'm not sure whether they need fixing; if you have any good suggestions, please reply (one possible cleanup is sketched after the log below).

```text
The load_in_4bit and load_in_8bit arguments are deprecated and will be removed in the future versions. Please, pass a BitsAndBytesConfig object in quantization_config argument instead.
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 3/3 [02:01<00:00, 40.43s/it]
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
The seen_tokens attribute is deprecated and will be removed in v4.41. Use the cache_position model input instead.
Human: 介绍一下中国  [Tell me about China]
Assistant: [translated from Chinese] The People's Republic of China is located in East Asia, on the western shore of the Pacific. It is one of the most populous developing countries in the world and the second-largest economy. China has a long history and a rich, colorful culture, and is one of the cradles of the world's oldest civilizations; it is also a multi-ethnic country with many different languages and cultural traditions.
```
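
The first two warnings can be addressed by changing how the model is loaded in the quick-start script. A minimal sketch of that change, assuming the model is FlagAlpha/Atom-7B-Chat loaded in 4-bit (the exact quantization settings and prompt handling below are assumptions, not the repo's official code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "FlagAlpha/Atom-7B-Chat"  # assumed; use your local checkpoint path if loading from disk

# Replaces the deprecated load_in_4bit / load_in_8bit keyword arguments.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,         # instead of load_in_4bit=True
    attn_implementation="flash_attention_2",  # instead of use_flash_attention_2=True
    trust_remote_code=True,
)

# Passing the tokenizer's attention_mask into generate() also silences the
# "attention mask is not set" warning when the pad token equals the eos token.
# (The quick-start script's Human/Assistant prompt template is omitted here.)
inputs = tokenizer("介绍一下中国", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```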
