Fix halting N300 tests. #170

Open: wants to merge 2 commits into main

@LPanosTT LPanosTT commented Jan 7, 2025

Start the subprocesses that compile and run ops in the op-by-op flow by spawning them rather than forking the main process.

If an error is thrown during compilation, catch it, terminate the subprocess, then re-raise it.
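
For reviewers, here is a minimal sketch of the pattern described above, not the code in this PR. It assumes the standard library `multiprocessing` module; `run_in_spawned_process`, `supervise`, and the `ctx` parameter are hypothetical names used only for illustration.

```python
import multiprocessing as mp


def run_in_spawned_process(target, args=(), supervise=None, ctx=None):
    """Run target(*args) in a child process and return its exit code.

    `supervise` is a hypothetical hook for parent-side work done while the
    child runs (for example, waiting on results). If it raises, the child
    is terminated before the error is re-raised.
    """
    # "spawn" starts a fresh interpreter instead of forking the parent,
    # so the child does not inherit the parent's in-memory state.
    if ctx is None:
        ctx = mp.get_context("spawn")

    proc = ctx.Process(target=target, args=args)
    proc.start()
    try:
        if supervise is not None:
            supervise(proc)
        proc.join()
    except BaseException:
        # Do not leave the child running: stop it, reap it, then re-raise.
        proc.terminate()
        proc.join()
        raise
    return proc.exitcode
```

With the spawn start method the target has to be importable by the child (it is pickled by reference), and a script that starts processes should do so under an `if __name__ == "__main__":` guard. The `ctx` argument exists only so the helper can be exercised with a stand-in context.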

@LPanosTT LPanosTT requested a review from AleksKnezevic January 7, 2025 16:39
@LPanosTT LPanosTT marked this pull request as draft January 7, 2025 17:05
@LPanosTT LPanosTT force-pushed the lpanos/n300_op_by_op_fix branch 2 times, most recently from ec6f06c to dfdca7a Compare January 8, 2025 18:07
@LPanosTT LPanosTT marked this pull request as ready for review January 8, 2025 18:07
@@ -14,8 +14,8 @@ jobs:
fail-fast: false
matrix:
build: [
{runs-on: n150, name: "run1", test_names: "stable_diffusion, Qwen, MobileNetV2, clip, flan_t5, mlpmixer, resnet, vilt, albert, codegen, glpn_kitti, mnist, resnet50, RMBG, unet_carvana, mgp-str-base, musicgen_small, segformer, torchvision, yolos"},
{runs-on: n150, name: "run2", test_names: "t5, whisper, autoencoder_conv, deit, gpt2, mobilenet_ssd, roberta, timm, xglm, autoencoder_linear, detr, beit, distilbert, hand_landmark, openpose, segment_anything, unet, yolov3, bert, dpr, hardnet, opt, speecht5_tts, unet_brain, yolov5, bloom, falcon, llama, perceiver_io, squeeze_bert, gpt_neo"},
{runs-on: n150, n300, name: "run1", test_names: "stable_diffusion, Qwen, MobileNetV2, clip, flan_t5, mlpmixer, resnet, vilt, albert, codegen, glpn_kitti, mnist, resnet50, RMBG, unet_carvana, mgp-str-base, musicgen_small, segformer, torchvision, yolos"},

Should we skip the nightlies until we figure out what's causing the hangs?

@LPanosTT LPanosTT force-pushed the lpanos/n300_op_by_op_fix branch 2 times, most recently from 63c7d79 to 1f2459c Compare January 10, 2025 20:05
@LPanosTT LPanosTT force-pushed the lpanos/n300_op_by_op_fix branch from 1a3acde to 2a34e08 Compare January 20, 2025 16:06
add _extract_outputs function to ModelTester that subclasses may need to
implement. This function should return a tuple of torch.Tensors

Trim unused code in tests/utils.py

move verify_against_golden to verify.py

replace self.model with self.framework_model and self.compiled_model

Use fp32 for all models

Enable n300 model tests. Use  to create new processes with python multiprocessing rather than the default

make sure to terminate process if an error is raised before we get to  it

n300 run
@LPanosTT LPanosTT force-pushed the lpanos/n300_op_by_op_fix branch from 2a34e08 to 004151d Compare January 21, 2025 19:45