We had boards/cabinets on which BMP commands failed.
Jobs get allocated to these boards, but the BMP operation then fails with a BMPSendTimedOutException (see log).
The job then hangs in QUEUED.
Found in /home/spalloc/spalloc.log on https://spinnaker.cs.man.ac.uk/
2024-05-05 07:11:33.787 INFO 1176 --- [ThreadPoolTaskScheduler16] u.a.m.s.a.a.AllocatorTask : Job 452535 changes resulted in errors.
2024-05-05 07:11:36.799 ERROR 1176 --- [ThreadPoolTaskScheduler-8] u.a.m.s.a.b.BMPController : Requests failed on BMP 357
uk.ac.manchester.spinnaker.transceiver.ProcessException: when sending to 0:0:13, received exception: uk.ac.manchester.spinnaker.transceiver.BMPSendTimedOutException
with message: Operation CMD_VER (GetBMPVersion(command=CMD_VER, sequence=51774, argument1=0, argument2=0, argument3=0)) timed out after 0.750000 seconds
at uk.ac.manchester.spinnaker.transceiver.ProcessException.makeInstance(ProcessException.java:116) ~[SpiNNaker-comms-7.1.0-SNAPSHOT.jar!/:?]
at uk.ac.manchester.spinnaker.transceiver.BMPCommandProcess$RequestPipeline.finish(BMPCommandProcess.java:464) ~[SpiNNaker-comms-7.1.0-SNAPSHOT.jar!/:?]
at uk.ac.manchester.spinnaker.transceiver.BMPCommandProcess.call(BMPCommandProcess.java:164) ~[SpiNNaker-comms-7.1.0-SNAPSHOT.jar!/:?]
at uk.ac.manchester.spinnaker.transceiver.Transceiver.get(Transceiver.java:1725) ~[SpiNNaker-comms-7.1.0-SNAPSHOT.jar!/:?]
at uk.ac.manchester.spinnaker.transceiver.Transceiver.readBMPVersion(Transceiver.java:1839) ~[SpiNNaker-comms-7.1.0-SNAPSHOT.jar!/:?]
at uk.ac.manchester.spinnaker.transceiver.BMPTransceiverInterface.readBMPVersion(BMPTransceiverInterface.java:859) ~[SpiNNaker-comms-7.1.0-SNAPSHOT.jar!/:?]
at uk.ac.manchester.spinnaker.alloc.bmp.SpiNNaker1.canBoardManageFPGAs(SpiNNaker1.java:212) ~[classes!/:?]
at uk.ac.manchester.spinnaker.alloc.bmp.SpiNNaker1.setLinkOff(SpiNNaker1.java:228) ~[classes!/:?]
at uk.ac.manchester.spinnaker.alloc.bmp.BMPController$PowerRequest.changeBoardPowerState(BMPController.java:502) ~[classes!/:?]
at uk.ac.manchester.spinnaker.alloc.bmp.BMPController$PowerRequest.lambda$tryProcessRequest$10(BMPController.java:621) ~[classes!/:?]
at uk.ac.manchester.spinnaker.alloc.bmp.BMPController$Request.bmpAction(BMPController.java:279) ~[classes!/:?]
at uk.ac.manchester.spinnaker.alloc.bmp.BMPController$PowerRequest.tryProcessRequest(BMPController.java:620) ~[classes!/:?]
at uk.ac.manchester.spinnaker.alloc.bmp.BMPController$Request.processRequest(BMPController.java:384) ~[classes!/:?]
at uk.ac.manchester.spinnaker.alloc.bmp.BMPController$Worker.run(BMPController.java:1079) ~[classes!/:?]
at uk.ac.manchester.spinnaker.alloc.bmp.BMPController.lambda$triggerSearch$4(BMPController.java:226) ~[classes!/:?]
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) [spring-context-5.3.30.jar!/:5.3.30]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: uk.ac.manchester.spinnaker.transceiver.BMPSendTimedOutException: Operation CMD_VER (GetBMPVersion(command=CMD_VER, sequence=51774, argument1=0, argument2=0, argument3=0)) timed out after 0.750000 seconds
at uk.ac.manchester.spinnaker.transceiver.BMPCommandProcess$RequestPipeline.resend(BMPCommandProcess.java:530) ~[SpiNNaker-comms-7.1.0-SNAPSHOT.jar!/:?]
at uk.ac.manchester.spinnaker.transceiver.BMPCommandProcess$RequestPipeline.handleReceiveTimeout(BMPCommandProcess.java:515) ~[SpiNNaker-comms-7.1.0-SNAPSHOT.jar!/:?]
at uk.ac.manchester.spinnaker.transceiver.BMPCommandProcess$RequestPipeline.finish(BMPCommandProcess.java:456) ~[SpiNNaker-comms-7.1.0-SNAPSHOT.jar!/:?]
Note that the job appears to be "stuck in QUEUED" because, after the failure, the same board is tried again, and this repeats indefinitely. Ideally, a board whose allocation is attempted and fails would be marked as allocated, so that it is not retried. Better still, the board would be disabled after a number of failures and an admin emailed to evaluate it.
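The escalation idea in the comment above could look roughly like the following minimal Java sketch. It is not spalloc code: the class `BoardFailureTracker`, the integer board IDs, the failure threshold, and the `notifyAdmin` hook (standing in for an email sender) are all hypothetical. It counts consecutive BMP failures per board, disables a board once the count reaches the threshold, and notifies an admin exactly once when that happens.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntConsumer;

/**
 * Hypothetical sketch (not part of spalloc): counts consecutive BMP failures
 * per board, disables a board once it reaches a threshold, and calls an
 * admin-notification hook once at the moment the board is disabled.
 */
class BoardFailureTracker {
    private final int maxFailures;
    private final IntConsumer notifyAdmin; // hypothetical hook, e.g. sends an email
    private final Map<Integer, Integer> failureCounts = new ConcurrentHashMap<>();
    private final Set<Integer> disabled = ConcurrentHashMap.newKeySet();

    BoardFailureTracker(int maxFailures, IntConsumer notifyAdmin) {
        if (maxFailures < 1) {
            throw new IllegalArgumentException("maxFailures must be >= 1");
        }
        this.maxFailures = maxFailures;
        this.notifyAdmin = notifyAdmin;
    }

    /** Records a failed BMP operation; returns true if the board is now disabled. */
    boolean recordFailure(int boardId) {
        int count = failureCounts.merge(boardId, 1, Integer::sum);
        // Set.add returns true only the first time, so the admin is notified once.
        if (count >= maxFailures && disabled.add(boardId)) {
            notifyAdmin.accept(boardId);
        }
        return disabled.contains(boardId);
    }

    /** A successful BMP operation resets the consecutive-failure count. */
    void recordSuccess(int boardId) {
        failureCounts.remove(boardId);
    }

    /** Admin action: re-enable a board after it has been inspected. */
    void enable(int boardId) {
        failureCounts.remove(boardId);
        disabled.remove(boardId);
    }

    /** Whether the allocator may hand this board to a job. */
    boolean isUsable(int boardId) {
        return !disabled.contains(boardId);
    }

    int failures(int boardId) {
        return failureCounts.getOrDefault(boardId, 0);
    }
}
```

In the allocator, a caller would check `isUsable(board)` before assigning a board to a job, call `recordFailure(board)` when a BMP operation (such as the `CMD_VER` request in the log) times out, and call `recordSuccess(board)` after a successful one. A success deliberately does not re-enable a disabled board; that is left to the admin via `enable`, so a flaky board is not silently returned to service.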