Hello, I tried the synthetic code on the FTN env, and the test results are close to those in the paper. However, on the DST env I get F1 scores of 0.77 and 0.66 over two runs, while the F1 reported in the paper is much higher. Do the settings below (the settings from the readme file) apply to the DST env too?
(I only changed --env-name to dst and --episode-num to 2000.)
Also, there is a part of the code that I cannot relate to the paper. The paper states that the scalarized version is similar to CN+DER. The CN+DER in the reference paper outputs a Q vector for each action, but in the code (naive) the Q network outputs a scalar for each action. What am I missing here?
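To make the question concrete, here is a minimal sketch of the two output shapes as I understand them. Everything in it is hypothetical: the function name `scalarize`, the number of actions and objectives, and the values are made up for illustration and are not taken from the repository or from either paper. I am not claiming this is how either implementation works.

```python
# Hypothetical illustration only: names, shapes and numbers are made up,
# not taken from the repository or from either paper.

def scalarize(q_vectors, w):
    """Collapse one Q-vector per action into one scalar per action (dot product with w)."""
    return [sum(wi * qi for wi, qi in zip(w, q)) for q in q_vectors]

# Vector-valued output: one Q-vector (one entry per objective) for each action.
q_vectors = [[1.0, -3.0], [2.0, -5.0], [0.5, -1.0]]  # 3 actions, 2 objectives
w = [0.5, 0.5]  # preference weights over the two objectives

# Scalar output: a single number per action, which is the shape I see in the naive code.
q_scalars = scalarize(q_vectors, w)

# Greedy action under the scalarized values.
best_action = max(range(len(q_scalars)), key=q_scalars.__getitem__)
print(q_scalars, best_action)
```

In the first case the network would have to emit the whole `q_vectors` table, while in the second it only emits `q_scalars`, which is the difference I am asking about.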
python train.py --env-name dst --method crl-envelope --model linear --gamma 0.99 --mem-size 4000 --batch-size 256 --lr 1e-3 --epsilon 0.5 --epsilon-decay --weight-num 32 --episode-num 2000 --optimizer Adam --save crl/envelope/saved/ --log crl/envelope/logs/ --update-freq 100 --beta 0.01 --name 0