Describe the bug
Some tests are weak and can easily be passed without answering the problem described in the prompt.
Expected behavior
All problems should have tests that are not easy to fool. A "passed" test should mean that the problem has been solved.
Additional context
For example, "qiskitHumanEval/68" is rated as difficult, but its test is very easy to pass.
Here is the prompt:
from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator
from qiskit_ibm_runtime import Sampler
from numpy import pi

def zeno_elitzur_vaidman_bomb_tester(bomb_live: bool) -> (float, float, float):
    """ Design a Zeno Elitzur Vaidman Bomb Tester circuit which takes the boolean if the bomb is live and outputs the percentage of successful live bomb predictions, dud bomb predictions and bombs that detonated. Use 25 cycles to increase the efficiency of the circuit. """
The canonical solution:
from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator
from qiskit_ibm_runtime import Sampler
from numpy import pi

def zeno_elitzur_vaidman_bomb_tester(bomb_live: bool) -> (float, float, float):
    """ Design a Zeno Elitzur Vaidman Bomb Tester circuit which takes the boolean if the bomb is live and outputs the percentage of successful live bomb predictions, dud bomb predictions and bombs that detonated. Use 25 cycles to increase the efficiency of the circuit. """
    live_predictions = dud_predictions = detonations = 0
    shots = 1024
    cycles = 25
    e = pi / cycles
    measurements = cycles + 1 if bomb_live else 1
    circuit = QuantumCircuit(1, measurements)
    for i in range(cycles):
        circuit.ry(e, 0)
        if bomb_live:
            circuit.measure(0, i)
    circuit.measure(0, measurements - 1)
    backend = AerSimulator()
    sampler = Sampler(mode=backend)
    job = sampler.run([circuit], shots=shots).result()
    counts = job[0].data.c.get_counts()
    if bomb_live:
        for key, value in counts.items():
            if key[0] == '1':
                detonations += value
            elif '1' in key[1:]:
                dud_predictions += value
            else:
                live_predictions += value
    else:
        live_predictions = counts['0'] if '0' in counts else 0
        dud_predictions = counts['1']
        detonations = 0
    return (live_predictions / shots, dud_predictions / shots, detonations / shots)
Below is a solution generated by an LLM that clearly does not solve the problem but still passes the test:
from qiskit import QuantumCircuit
from qiskit_aer import AerSimulator
from qiskit_ibm_runtime import Sampler
from numpy import pi

def zeno_elitzur_vaidman_bomb_tester(bomb_live: bool) -> (float, float, float):
    """ Design a Zeno Elitzur Vaidman Bomb Tester circuit which takes the boolean if the bomb is live and outputs the percentage of successful live bomb predictions, dud bomb predictions and bombs that detonated. Use 25 cycles to increase the efficiency of the circuit. """
    if bomb_live:
        return (1, 0, 0)
    else:
        return (0, 1, 0)
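One possible way to tighten the live-bomb check is sketched below. It is not the benchmark's actual test. It assumes the circuit from the canonical solution, where each of the 25 cycles applies `ry(pi/25)` and then measures, so the ideal probability of seeing only zeros is `cos(pi/50) ** 50`. The names `expected_live_probability` and `looks_simulated`, the 1024-shot default, and the 5-sigma tolerance are hypothetical choices made for illustration.

```python
from math import cos, pi, sqrt

CYCLES = 25  # number of cycles requested in the prompt


def expected_live_probability(cycles: int = CYCLES) -> float:
    """Ideal probability that every intermediate measurement yields 0.

    Assumes each cycle applies ry(pi / cycles) to |0> and then measures,
    so a single cycle gives 0 with probability cos^2(pi / (2 * cycles)).
    """
    return cos(pi / (2 * cycles)) ** (2 * cycles)


def looks_simulated(result, shots: int = 1024, n_sigma: float = 5.0) -> bool:
    """Hypothetical stricter check for the bomb_live=True case.

    Accepts a (live, dud, detonated) tuple only if the fractions sum to 1
    and the live fraction lies within n_sigma binomial standard deviations
    of the ideal value. The shot count is an assumption taken from the
    canonical solution; a candidate may legitimately use another value.
    """
    live, dud, detonated = result
    if abs(live + dud + detonated - 1.0) > 1e-9:
        return False
    p = expected_live_probability()
    sigma = sqrt(p * (1.0 - p) / shots)
    return abs(live - p) <= n_sigma * sigma
```

Under these assumptions the hard-coded `(1, 0, 0)` is rejected, because the ideal live fraction is roughly 0.906 and a value of exactly 1 falls outside the tolerance. The dud case would need a separate idea: the canonical circuit returns `(0, 1, 0)` there, so a value check alone cannot tell a hard-coded answer from a simulated one.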