QE Blow Assertion #9
I have an update, but the problem remains. The output of the network is all NaNs. I can change the activation functions to tanh() to get better network outputs, but during the backward pass, max_entry is still always 0 when QE is called. I'm wondering if this is maybe a quantization error, but I can't pinpoint where the error is occurring.
You can try changing the "beta" of the scale_limit function in wage_initializer.py. It may work.
Change the method QE inside wage_quantizer.py to:
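The code block that followed this comment was not captured in the thread. Below is a hedged, self-contained sketch of the kind of guard being suggested: skip quantization when the error tensor is all zeros so that `assert max_entry != 0` ("QE blow") never fires. The bodies of `shift` and `Q` here are simplified pure-Python stand-ins following the WAGE paper's conventions, not the repository's actual code.

```python
import math

def shift(x):
    # Round the scale factor to the nearest power of two, as WAGE does.
    return 2.0 ** round(math.log2(x))

def Q(x, bits):
    # Uniform quantization to `bits` bits, clipped to [-1, 1 - 1/n].
    n = float(2 ** (bits - 1))
    return [max(-1.0, min(1.0 - 1.0 / n, round(v * n) / n)) for v in x]

def QE(x, bits):
    max_entry = max(abs(v) for v in x)
    # Guard: an all-zero error tensor would otherwise trip
    # `assert max_entry != 0, "QE blow"`. Returning x unchanged
    # avoids the crash without altering the zero gradient.
    if max_entry == 0:
        return list(x)
    scaled = [v / shift(max_entry) for v in x]
    return Q(scaled, bits)
```

With this guard, `QE([0.0, 0.0], 8)` returns the zeros untouched instead of asserting, while nonzero inputs are scaled and quantized as before.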
BTW, is there any way to solve the CUDA out-of-memory issue without changing the network topology? I tried train.py with --inference=1, but the memory-insufficiency message keeps being reported even if I set batch_size = 1.
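One common cause of out-of-memory during inference is that autograd keeps activation buffers for a backward pass that never happens. A minimal sketch of the standard mitigation in PyTorch, assuming a typical setup (the tiny model below is a hypothetical stand-in, not the repository's network):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in network; replace with the actual model.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()  # inference mode for dropout / batch norm

x = torch.randn(1, 32)  # batch_size = 1

# Disabling autograd means no computation graph is built, so
# intermediate activations are freed immediately instead of being
# retained for backward.
with torch.no_grad():
    out = model(x)

print(out.requires_grad)  # False: nothing retained for backward
```

If the training script always calls backward() regardless of the --inference flag, wrapping only the forward pass this way will not help; the no-grad context has to cover the inference path end to end.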
I'm attempting to look at the effects of certain hardware parameters (cellBit, ADCPrecision, etc.) on accuracy and energy. I set "--inference 1" on a relatively unchanged clone of the repository, and my GPU ran out of memory. After reducing the size of the layers but leaving everything else generally unchanged (apart from a few fixes), I keep getting a "QE Blow" assertion error. Using print statements, I found that the assertion error occurs during the second run of "backward" for WAGERounding. Changing grad_scale hasn't helped, nor has adjusting the network architecture. Adding a small value to "x" (since it is zero) also doesn't help. Is there a possible explanation for why this error is occurring?
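The failure mode described above can be reproduced in isolation: when the gradient reaching WAGERounding's backward has collapsed to all zeros, its maximum absolute entry is 0 and an assertion of the form `assert max_entry != 0, "QE blow"` fires. A minimal sketch (`grad` and the assert message are illustrative stand-ins for the repository's code):

```python
# Gradient tensor that has collapsed to all zeros during backward.
grad = [0.0] * 8

max_entry = max(abs(g) for g in grad)

try:
    # This mirrors the style of check that raises "QE Blow".
    assert max_entry != 0, "QE blow"
except AssertionError as e:
    print(e)  # the assertion fires because max_entry == 0
```

This is why tweaking grad_scale or the architecture alone may not help: if the upstream gradient is exactly zero (e.g. from saturated activations or aggressive quantization), any scaling of it is still zero when it reaches the check.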