[CANN] Support cpu offload optimizer for Ascend NPU #4568

hipudding · 2023-10-26T02:35:48Z

Support the cpu_adam, cpu_adagrad and cpu_lion optimizers for Ascend NPU. All of these optimizers run on the host; the only difference between backends is how parameters are copied back to the device. This commit adds a new compile-time symbol, `__ENABLE_CANN__`, which enables the code paths adapted to NPU.
The NPU builder adds the header files and libraries required for compilation, following CANN's compilation manual.
Note that there is no FusedLion implementation for NPU yet, so the test_cpu_lion test case should be disabled until a FusedLion optimizer is implemented.
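To illustrate the split described above (the optimizer math runs on the host, and only the copy back to the device is backend-specific), here is a minimal pure-Python sketch. It is not DeepSpeed's implementation: `HostAdam`, `copy_back_plain` and the list-based "device" buffer are hypothetical names invented for this example, and a real backend would replace the copy function with its own host-to-device transfer.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type: a backend-specific "copy host params to device" hook.
CopyBack = Callable[[List[float], List[float]], None]


def copy_back_plain(host_params: List[float], device_params: List[float]) -> None:
    """Stand-in for a backend-specific host-to-device copy (assumption)."""
    device_params[:] = host_params


@dataclass
class HostAdam:
    """Adam update computed entirely on host-side Python lists."""
    lr: float = 1e-3
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-8
    step_count: int = 0
    m: List[float] = field(default_factory=list)  # first-moment estimates
    v: List[float] = field(default_factory=list)  # second-moment estimates

    def step(self, params: List[float], grads: List[float],
             device_params: List[float], copy_back: CopyBack) -> None:
        if not self.m:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.step_count += 1
        bias1 = 1.0 - self.beta1 ** self.step_count
        bias2 = 1.0 - self.beta2 ** self.step_count
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            m_hat = self.m[i] / bias1
            v_hat = self.v[i] / bias2
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
        # The only backend-dependent part: push updated params to the device.
        copy_back(params, device_params)


if __name__ == "__main__":
    opt = HostAdam()
    host = [1.0, -2.0]
    device = [0.0, 0.0]
    opt.step(host, [0.5, -0.5], device, copy_back_plain)
    print(device)
```

Passing the copy function as an argument keeps the update loop identical for every backend, which mirrors the idea that only the copy-back step changes per device.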

In addition, when NPU is selected as the accelerator, ds_report shows torch_npu and CANN information.

With this PR, the DeepSpeed test cases in huggingface/accelerate all pass.
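For readers who want to try the offload path, optimizer offload is requested through the `zero_optimization.offload_optimizer` section of a DeepSpeed config. The sketch below builds such a config as a Python dict. The batch size, ZeRO stage, optimizer type and learning rate are illustrative assumptions, not values taken from this PR, and `uses_cpu_offload` is a hypothetical helper written for this example.

```python
import json

# Illustrative values only; the part relevant to this PR is offload_optimizer.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
    },
}


def uses_cpu_offload(config: dict) -> bool:
    """Hypothetical helper: True if the config asks for CPU optimizer offload."""
    offload = config.get("zero_optimization", {}).get("offload_optimizer", {})
    return offload.get("device") == "cpu"


if __name__ == "__main__":
    # Serialize to the JSON form that a config file would contain.
    print(json.dumps(ds_config, indent=2))
    print(uses_cpu_offload(ds_config))
```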

This is part of the feature list for Ascend NPU support; see #4567.

ji-huazhong · 2023-11-11T03:26:59Z

Hi @tjruwase, please take a look at this PR. 🤗 The DeepSpeed test cases in huggingface/transformers also pass. See huggingface/transformers#27342 (comment).

hipudding · 2023-11-13T03:44:21Z

@tjruwase The formatting and spelling issues have been fixed. Please re-trigger the checks. Thanks.

hipudding · 2023-11-14T01:08:49Z

All checks have passed. Is this ready to merge now?

Support the cpu_adam, cpu_adagrad and cpu_lion optimizers for Ascend NPU. All of these optimizers run on the host; the only difference between backends is how parameters are copied back to the device. This commit adds a new compile-time symbol, `__ENABLE_CANN__`, which enables the code paths adapted to NPU. The NPU builder adds the header files and libraries required for compilation, following CANN's compilation manual. Note that there is no FusedLion implementation for NPU yet, so the test_cpu_lion test case should be disabled until a FusedLion optimizer is implemented. In addition, when NPU is selected as the accelerator, ds_report shows torch_npu and CANN information.

With this PR, the DeepSpeed test cases in [huggingface/accelerate](https://github.com/huggingface/accelerate/tree/main/tests/deepspeed) all pass.

This is part of the feature list for Ascend NPU support; see microsoft#4567.

Co-authored-by: Olatunji Ruwase <[email protected]>

hipudding requested review from jeffra, tjruwase, mrwyattii, RezaYazdaniAminabadi, cmikeh2, awan-10 and arashb as code owners October 26, 2023 02:35

hipudding marked this pull request as draft October 26, 2023 02:35

hipudding mentioned this pull request Oct 26, 2023

[Feature package] Full feature support with Ascend NPU #4567

Closed

hipudding force-pushed the cpu_adam branch 2 times, most recently from f572373 to 07916e0 Compare October 27, 2023 07:02

hipudding changed the title ~~[WIP][CANN] Support cpu_adam optimizer for NPU~~ [CANN] Support cpu offload optimizer for NPU Oct 27, 2023

hipudding changed the title ~~[CANN] Support cpu offload optimizer for NPU~~ [CANN] Support cpu offload optimizer for Ascend NPU Oct 27, 2023

hipudding force-pushed the cpu_adam branch 2 times, most recently from 4deb6f6 to 11d61cb Compare November 9, 2023 02:22

hipudding force-pushed the cpu_adam branch from 11d61cb to 91b0976 Compare November 9, 2023 02:32

hipudding marked this pull request as ready for review November 9, 2023 02:33

Merge branch 'master' into cpu_adam

6d33acc

tjruwase and others added 2 commits November 11, 2023 19:17

Merge branch 'master' into cpu_adam

d2fe01e

Fix code format and spell issue

b77e892

tjruwase approved these changes Nov 14, 2023

tjruwase added this pull request to the merge queue Nov 14, 2023

Merged via the queue into microsoft:master with commit c1ba6a1 Nov 14, 2023
15 checks passed
