QE Blow Assertion #9

Open
BradyTaylor1996 opened this issue Jun 22, 2022 · 4 comments

Comments

BradyTaylor1996 commented Jun 22, 2022

I'm attempting to look at the effects of certain hardware parameters (cellBit, ADCPrecision, etc.) on accuracy and energy. I set "--inference 1" on a relatively unchanged clone of the repository and my GPU ran out of memory. After reducing the size of the layers but otherwise leaving everything generally unchanged (aside from fixing a few errors), I keep getting a "QE Blow" assertion error. I've used print statements to find that the assertion error occurs during the second run of "backward" for WAGERounding. Changing grad_scale hasn't helped, nor has adjusting the network architecture. Adding a small value to "x" (since it is zero) doesn't help either. Is there a possible explanation for why this error is occurring?
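For reference, the assertion that fires appears to be in QE() in wage_quantizer.py; as far as I can tell it looks roughly like the sketch below (paraphrased rather than copied verbatim; shift, C, and Q are helpers defined in the same file):

def QE(x, bits):
    max_entry = x.abs().max()
    assert max_entry != 0, "QE blow"   # fires when the incoming gradient tensor is all zeros
    x /= shift(max_entry)              # normalize by a power-of-two scale
    return Q(C(x, bits), bits)         # clip and quantize to the target bit width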

BradyTaylor1996 (Author)

I have an update, but the problem remains. The output of the network is all NaNs. I can change the activation functions to tanh() to get better network outputs, but during the backward pass the max_entry when QE is called is still always 0. I suspect this may be a quantization error, but I can't pinpoint where it is occurring.
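In case it helps with debugging, here is a generic PyTorch sketch (not from this repo) that registers backward hooks on every module and reports where gradients first become NaN or collapse to zero; it needs PyTorch 1.8+ for register_full_backward_hook:

import torch

def attach_grad_probes(model):
    # Report any module whose outgoing gradient is NaN or all zeros during backward.
    def make_hook(name):
        def hook(module, grad_input, grad_output):
            for g in grad_output:
                if g is None:
                    continue
                if torch.isnan(g).any():
                    print(name, 'has NaN in grad_output')
                elif g.abs().max() == 0:
                    print(name, 'has an all-zero grad_output')
        return hook
    for name, module in model.named_modules():
        module.register_full_backward_hook(make_hook(name))

Call attach_grad_probes(model) once before loss.backward(); the first module reported is usually where the NaNs or zeros originate.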

neurosim (Owner)

You can try changing the "beta" of the scale_limit function in wage_initializer.py. It may work.
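For context (this is my reading of the WAGE scheme, not the exact code in wage_initializer.py): beta sets a floor on the weight clipping limit, so a larger beta keeps small weights from being quantized to zero. Roughly:

import math

# Illustrative WAGE-style scale limit; the names and exact formula are an approximation.
def scale_limit_sketch(fan_in, bits_W, beta=1.5):
    sigma = 2.0 ** (1 - bits_W)                       # quantization step for bits_W-bit weights
    limit_min = beta * sigma                          # minimum clipping limit, set by beta
    return max(math.sqrt(6.0 / fan_in), limit_min)    # Xavier-style limit, floored at limit_min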

rafaelfmoura

Change the method QE inside wage_quantizer.py to:
def QE(x, bits):
    max_entry = x.abs().max()
    if max_entry == 0:
        # Nudge the scale away from zero so shift() does not divide by zero.
        max_entry = max_entry + 1e-9
    x /= shift(max_entry)
    return Q(C(x, bits), bits)

This prevents the division by zero in the quantization step. (When max_entry is 0 the tensor is all zeros anyway, so the small epsilon only stabilizes the normalization and x stays at zero.)

SenFFF commented Jul 21, 2022

BTW, is there any way to solve the CUDA out-of-memory issue without changing the network topology? I tried train.py with --inference=1, but the memory-insufficiency message keeps being reported even when I set batch_size=1.
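One generic PyTorch thing to check (independent of this repo's code): if the memory runs out during the evaluation passes, make sure they run under torch.no_grad(), which avoids keeping activations for a backward pass. A minimal sketch:

import torch

def evaluate(model, loader, device):
    model.eval()
    correct = 0
    with torch.no_grad():                  # no autograd graph, so much less GPU memory
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            pred = model(data).argmax(dim=1)
            correct += (pred == target).sum().item()
    return correct / len(loader.dataset)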
