merge upstream changes (#186)
* freeze pytorch version to fix mypy crash (Learning-and-Intelligent-Systems#1563)

* Implement infinite-horizon for exploration (Learning-and-Intelligent-Systems#1565)

* simple initial implementation

* fix checks

* okay - really fix checks now

* MyPy Bump and changes (Learning-and-Intelligent-Systems#1568)

* minor changes to fix bugs (Learning-and-Intelligent-Systems#1569)

* fix get_objects in hierarchical typing case (Learning-and-Intelligent-Systems#1572)

Co-authored-by: Tom Silver <[email protected]>

* fix hierarchical typing edge case (Learning-and-Intelligent-Systems#1574)

* Fix + raise awareness of subtle bugs with active sampler exploration (Learning-and-Intelligent-Systems#1575)

* fix subtle bugs

* yapf

* Ball and Cup Sticky Table Env (Learning-and-Intelligent-Systems#1576)

* initial commit that seems to run without error...

* fix bug in placing logic

* delete outdated comment

* fix replanning bug

* more data = better results???

* starting tests

* try oracle feature selection?

* fix buggy test

* increase training time?

* yapf + fix tom comment

* fix reachability issue in placing

* minor

* more unit tests

* fix and more tests

* this should be interesting

* see if this yields a difference

* let's see what happens now

* woops

* try removing placing cup with the ball on the table

* hail mary

* minor changes + logging

* run task repeat first

* sticky table with moving radius

* yay! try other approaches...

* polar coordinates ftw!

* try a simpler thing

* let's see how this does.

* try more probability of success

* all baselines

* try running grid row env

* most things passing

* try this

* progress towards PR

* should be ready!

* revert unnecessary change

* fix linting

* tom comments

---------

Co-authored-by: Tom Silver <[email protected]>

* allow third party users to define their own oracle NSRTs (Learning-and-Intelligent-Systems#1578)

* allow third party users to define their own oracle NSRTs

* test fixes

* mypy

* Clustering via reverse engineering (Learning-and-Intelligent-Systems#1556)

* Initial commit.

* Fix a minor bug.

* Small changes to satisfy mypy.

* Fix linting.

* Add tests.

* fixes

* fix minor grammatical issue

* Change check for non-zero types.

---------

Co-authored-by: Nishanth Kumar <[email protected]>
Co-authored-by: Nishanth Kumar <[email protected]>

* pin openai dependency (Learning-and-Intelligent-Systems#1580)

* changes to produce prettier grid row graphs (Learning-and-Intelligent-Systems#1577)

* add functionality for rendering videos within cogman, rather than within the environment (Learning-and-Intelligent-Systems#1581)

* add info to FD crashes (Learning-and-Intelligent-Systems#1582)

* disable flakey tests (Learning-and-Intelligent-Systems#1586)

* Remove dead email and add NJK email in README (Learning-and-Intelligent-Systems#1583)

with Rohan's blessing

* handle planning failures within task planning in active sampler explorer (Learning-and-Intelligent-Systems#1584)

* add separate flag for approach wrapper (Learning-and-Intelligent-Systems#1585)

* fix expected atoms monitoring (Learning-and-Intelligent-Systems#1587)

* Split fail focus into UCB and non-UCB baselines (Learning-and-Intelligent-Systems#1579)

* try non-ucb exploration baseline

* update plotting script

* ready!

* yapf

* Sample a random point inside a `_Geom2D` (Learning-and-Intelligent-Systems#1591)

* should be gtg!

* should be gtg

* pursue task goal during exploration only every n cycles (Learning-and-Intelligent-Systems#1589)

* fix loading during online learning (Learning-and-Intelligent-Systems#1592)

* a few fixes to saving and loading in active sampler learning (Learning-and-Intelligent-Systems#1593)

* use a highly optimistic initial competence until the second cycle (Learning-and-Intelligent-Systems#1595)

* try a simpler fix

* try again

* merge in upstream

---------

Co-authored-by: Tom Silver <[email protected]>
Co-authored-by: Nishanth Kumar <[email protected]>
Co-authored-by: Bartłomiej Cieślar <[email protected]>
Co-authored-by: Tom Silver <[email protected]>
Co-authored-by: Ashay Athalye <[email protected]>
Co-authored-by: Nishanth Kumar <[email protected]>
7 people authored Dec 5, 2023
1 parent 208cbd9 commit 2d008b9
Showing 7 changed files with 66 additions and 28 deletions.
6 changes: 3 additions & 3 deletions predicators/approaches/active_sampler_learning_approach.py
@@ -136,7 +136,8 @@ def load(self, online_learning_cycle: Optional[int]) -> None:
self._nsrt_to_explorer_sampler = save_dict["nsrt_to_explorer_sampler"]
self._seen_train_task_idxs = save_dict["seen_train_task_idxs"]
self._train_tasks = save_dict["train_tasks"]
self._online_learning_cycle = save_dict["online_learning_cycle"]
self._online_learning_cycle = 0 if online_learning_cycle is None \
else online_learning_cycle + 1

def _learn_nsrts(self, trajectories: List[LowLevelTrajectory],
online_learning_cycle: Optional[int],
@@ -179,20 +180,19 @@ def _learn_nsrts(self, trajectories: List[LowLevelTrajectory],
with open(f"{save_path}_{online_learning_cycle}.DATA", "wb") as f:
pkl.dump(
{
"dataset": self._dataset,
"sampler_data": self._sampler_data,
"ground_op_hist": self._ground_op_hist,
"competence_models": self._competence_models,
"last_seen_segment_traj_idx":
self._last_seen_segment_traj_idx,
"nsrt_to_explorer_sampler": self._nsrt_to_explorer_sampler,
"seen_train_task_idxs": self._seen_train_task_idxs,
"dataset": self._dataset,
# We need to save train tasks because they get modified
# in the explorer. The original sin is that tasks are
# generated before reset with default init states, which
# are subsequently overwritten after reset is called.
"train_tasks": self._train_tasks,
"online_learning_cycle": self._online_learning_cycle,
},
f)

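
The load() hunk above stops trusting the pickled "online_learning_cycle" value and instead derives the counter from the checkpoint being loaded. A minimal sketch of that rule, using a helper name of our own choosing rather than the repository's:

from typing import Optional


def next_online_learning_cycle(loaded_cycle: Optional[int]) -> int:
    # Loading the pre-learning checkpoint (None) resumes at cycle 0;
    # loading the checkpoint saved after cycle k resumes at cycle k + 1.
    return 0 if loaded_cycle is None else loaded_cycle + 1


assert next_online_learning_cycle(None) == 0
assert next_online_learning_cycle(3) == 4

In other words, resuming a run whose last saved checkpoint was cycle 3 continues learning at cycle 4.
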
13 changes: 10 additions & 3 deletions predicators/competence_models.py
@@ -66,7 +66,10 @@ def predict_competence(self, num_additional_data: int) -> float:
# Highly naive: predict a constant improvement in competence.
del num_additional_data # unused
current_competence = self.get_current_competence()
return min(1.0, current_competence + 1e-2)
# Use a highly optimistic initial competence until the second cycle.
return min(
1.0,
current_competence + CFG.skill_competence_initial_prediction_bonus)


class OptimisticSkillCompetenceModel(SkillCompetenceModel):
@@ -100,7 +103,9 @@ def predict_competence(self, num_additional_data: int) -> float:
nonempty_cycle_obs = self._get_nonempty_cycle_observations()
current_competence = self.get_current_competence()
if len(nonempty_cycle_obs) < 2:
return min(1.0, current_competence + 1e-2) # default
return min(
1.0, current_competence +
CFG.skill_competence_initial_prediction_bonus) # default
# Look at changes between individual cycles.
inference_window = 1
recency_size = CFG.skill_competence_model_optimistic_recency_size
@@ -143,7 +148,9 @@ def predict_competence(self, num_additional_data: int) -> float:
# the LegacySkillCompetenceModel.
if self._competence_regressor is None:
current_competence = self.get_current_competence()
return min(1.0, current_competence + 1e-2)
return min(
1.0, current_competence +
CFG.skill_competence_initial_prediction_bonus)
# Use the regressor to predict future competence.
current_num_data = self._get_current_num_data()
current_rv = self._competence_regressor.predict_beta(current_num_data)
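
The three hunks above replace the hard-coded 1e-2 bump with the new CFG.skill_competence_initial_prediction_bonus, so every competence model uses the same highly optimistic prediction until it has enough cycle data. A standalone sketch of that fallback, not the library's class, with the new 0.5 default from settings.py inlined:

def predict_initial_competence(current_competence: float,
                               bonus: float = 0.5) -> float:
    # Optimistic prediction used until at least two cycles of data exist;
    # bonus mirrors CFG.skill_competence_initial_prediction_bonus.
    return min(1.0, current_competence + bonus)


# For example, a skill whose current competence estimate is 0.5 is
# predicted to reach 1.0, which encourages the explorer to keep
# practicing it until real outcome data arrives.
assert predict_initial_competence(0.5) == 1.0
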
12 changes: 7 additions & 5 deletions predicators/explorers/active_sampler_explorer.py
@@ -408,11 +408,13 @@ def _score_ground_op(
num_tries = len(history)
success_rate = sum(history) / num_tries
total_trials = sum(len(h) for h in self._ground_op_hist.values())
# Try less successful operators more often.
# UCB-like bonus.
c = CFG.active_sampler_explore_bonus
bonus = c * np.sqrt(np.log(total_trials) / num_tries)
score = (1.0 - success_rate) + bonus
score = (1.0 - success_rate)
if CFG.active_sampler_explore_use_ucb_bonus:
# Try less successful operators more often.
# UCB-like bonus.
c = CFG.active_sampler_explore_bonus
bonus = c * np.sqrt(np.log(total_trials) / num_tries)
score += bonus
elif CFG.active_sampler_explore_task_strategy == "random":
# Random scores baseline.
score = self._rng.uniform()
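
The hunk above gates the exploration bonus behind the new active_sampler_explore_use_ucb_bonus flag. A self-contained sketch of the resulting scoring rule for the "success_rate" strategy, where the function name and the example numbers are ours, not the repository's:

import numpy as np


def score_ground_op(history, total_trials, use_ucb, explore_bonus=1e-1):
    # history is a list of 1/0 outcomes for this operator; total_trials
    # counts attempts across all operators.
    num_tries = len(history)
    success_rate = sum(history) / num_tries
    score = 1.0 - success_rate  # prefer operators that fail more often
    if use_ucb:
        # UCB-like bonus: rarely tried operators get an extra boost.
        score += explore_bonus * np.sqrt(np.log(total_trials) / num_tries)
    return score


# An operator that failed 3 of its 4 tries, out of 20 trials overall,
# scores 0.75 without the bonus and roughly 0.84 with it.
print(score_ground_op([1, 0, 0, 0], total_trials=20, use_ucb=False))
print(score_ground_op([1, 0, 0, 0], total_trials=20, use_ucb=True))
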
2 changes: 2 additions & 0 deletions predicators/settings.py
@@ -576,6 +576,7 @@ class GlobalSettings:
skill_competence_model_optimistic_window_size = 5
skill_competence_model_optimistic_recency_size = 5
skill_competence_default_alpha_beta = (10.0, 1.0)
skill_competence_initial_prediction_bonus = 0.5

# refinement cost estimation parameters
refinement_estimator = "oracle" # default refinement cost estimator
@@ -608,6 +609,7 @@ class GlobalSettings:
greedy_lookahead_max_num_resamples = 10

# active sampler explorer parameters
active_sampler_explore_use_ucb_bonus = True
active_sampler_explore_bonus = 1e-1
active_sampler_explore_task_strategy = "planning_progress"
active_sampler_explorer_replan_frequency = 100
11 changes: 9 additions & 2 deletions scripts/configs/active_sampler_learning.yaml
@@ -11,11 +11,18 @@ APPROACHES:
FLAGS:
explorer: "active_sampler"
active_sampler_explore_task_strategy: "task_repeat"
success_rate_explore:
success_rate_explore_ucb:
NAME: "active_sampler_learning"
FLAGS:
explorer: "active_sampler"
active_sampler_explore_task_strategy: "success_rate"
active_sampler_explore_use_ucb_bonus: True
success_rate_explore_no_ucb:
NAME: "active_sampler_learning"
FLAGS:
explorer: "active_sampler"
active_sampler_explore_task_strategy: "success_rate"
active_sampler_explore_use_ucb_bonus: False
random_score_explore:
NAME: "active_sampler_learning"
FLAGS:
Expand Down Expand Up @@ -112,4 +119,4 @@ FLAGS:
sesame_grounder: "fd_translator"
active_sampler_learning_n_iter_no_change: 5000
START_SEED: 456
NUM_SEEDS: 10
NUM_SEEDS: 10
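
For context on this file's structure: each key under APPROACHES names one experiment, and its FLAGS entries override the GlobalSettings defaults shown above (the UCB variant could also simply omit the flag, since the default is True). A hedged reading sketch, not the repository's own launcher code:

import yaml  # requires PyYAML

with open("scripts/configs/active_sampler_learning.yaml") as f:
    config = yaml.safe_load(f)

for experiment_id, spec in config["APPROACHES"].items():
    flags = spec.get("FLAGS", {})
    use_ucb = flags.get("active_sampler_explore_use_ucb_bonus", True)
    print(experiment_id, spec.get("NAME"),
          flags.get("active_sampler_explore_task_strategy"), use_ucb)
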
25 changes: 17 additions & 8 deletions scripts/plotting/create_active_sampler_learning_plots.py
@@ -86,8 +86,10 @@ def _derive_per_task_average(metric: str,
lambda v: "kitchen-planning_progress_explore" in v)),
("Task Repeat", "orange", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "kitchen-task_repeat_explore" in v)),
("Fail Focus", "red", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "kitchen-success_rate_explore" in v)),
("Fail Focus Non-UCB", "brown", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "kitchen-success_rate_explore_no_ucb" in v)),
("Fail Focus UCB", "red", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "kitchen-success_rate_explore_ucb" in v)),
("Task-Relevant", "purple", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "kitchen-random_score_explore" in v)),
("Random Skills", "blue", lambda df: df["EXPERIMENT_ID"].apply(
@@ -98,8 +100,11 @@ def _derive_per_task_average(metric: str,
lambda v: "regional_bumpy_cover-planning_progress_explore" in v)),
("Task Repeat", "orange", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "regional_bumpy_cover-task_repeat_explore" in v)),
("Fail Focus", "red", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "regional_bumpy_cover-success_rate_explore" in v)),
("Fail Focus Non-UCB", "brown", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "regional_bumpy_cover-success_rate_explore_no_ucb" in v)
),
("Fail Focus UCB", "red", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "regional_bumpy_cover-success_rate_explore_ucb" in v)),
("Task-Relevant", "purple", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "regional_bumpy_cover-random_score_explore" in v)),
("Random Skills", "blue", lambda df: df["EXPERIMENT_ID"].apply(
@@ -110,8 +115,10 @@ def _derive_per_task_average(metric: str,
lambda v: "grid_row-planning_progress_explore" in v)),
("Task Repeat", "orange", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "grid_row-task_repeat_explore" in v)),
("Fail Focus", "red", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "grid_row-success_rate_explore" in v)),
("Fail Focus Non-UCB", "brown", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "grid_row-success_rate_explore_no_ucb" in v)),
("Fail Focus UCB", "red", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "grid_row-success_rate_explore_ucb" in v)),
("Task-Relevant", "purple", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "grid_row-random_score_explore" in v)),
("Random Skills", "blue", lambda df: df["EXPERIMENT_ID"].apply(
@@ -122,8 +129,10 @@ def _derive_per_task_average(metric: str,
lambda v: "sticky_table-planning_progress_explore" in v)),
("Task Repeat", "orange", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "sticky_table-task_repeat_explore" in v)),
("Fail Focus", "red", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "sticky_table-success_rate_explore" in v)),
("Fail Focus Non-UCB", "brown", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "sticky_table-success_rate_explore_no_ucb" in v)),
("Fail Focus UCB", "red", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "sticky_table-success_rate_explore_ucb" in v)),
("Task-Relevant", "purple", lambda df: df["EXPERIMENT_ID"].apply(
lambda v: "sticky_table-random_score_explore" in v)),
("Random Skills", "blue", lambda df: df["EXPERIMENT_ID"].apply(
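
The entries updated above are (legend label, color, row selector) triples; each selector is a lambda that returns a boolean mask over the results DataFrame's EXPERIMENT_ID column. A toy illustration with a fabricated DataFrame, which also shows that the "_ucb" pattern does not accidentally match "_no_ucb" experiment IDs:

import pandas as pd

df = pd.DataFrame({
    "EXPERIMENT_ID": [
        "grid_row-success_rate_explore_ucb-456",
        "grid_row-success_rate_explore_no_ucb-456",
    ],
    "num_solved": [7, 5],  # fabricated metric for illustration
})

lines = [
    ("Fail Focus UCB", "red", lambda d: d["EXPERIMENT_ID"].apply(
        lambda v: "grid_row-success_rate_explore_ucb" in v)),
    ("Fail Focus Non-UCB", "brown", lambda d: d["EXPERIMENT_ID"].apply(
        lambda v: "grid_row-success_rate_explore_no_ucb" in v)),
]
for label, color, selector in lines:
    print(label, color, df[selector(df)]["num_solved"].tolist())
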
25 changes: 18 additions & 7 deletions tests/test_competence_models.py
@@ -29,20 +29,25 @@ def test_legacy_skill_competence_model():
"""Tests for LegacySkillCompetenceModel()."""
utils.reset_config({
"skill_competence_default_alpha_beta": (1.0, 1.0),
"skill_competence_initial_prediction_bonus": 1e-2,
})
model = create_competence_model("legacy", "test")
assert isinstance(model, LegacySkillCompetenceModel)
assert np.isclose(model.get_current_competence(), 0.5)
assert np.isclose(model.predict_competence(1), 0.5 + 1e-2)
assert np.isclose(model.predict_competence(1),
0.5 + CFG.skill_competence_initial_prediction_bonus)
model.observe(True)
assert model.get_current_competence() > 0.5
assert model.predict_competence(1) > 0.5 + 1e-2
assert model.predict_competence(
1) > 0.5 + CFG.skill_competence_initial_prediction_bonus
model.observe(False)
assert np.isclose(model.get_current_competence(), 0.5)
assert np.isclose(model.predict_competence(1), 0.5 + 1e-2)
assert np.isclose(model.predict_competence(1),
0.5 + CFG.skill_competence_initial_prediction_bonus)
model.advance_cycle()
assert np.isclose(model.get_current_competence(), 0.5)
assert np.isclose(model.predict_competence(1), 0.5 + 1e-2)
assert np.isclose(model.predict_competence(1),
0.5 + CFG.skill_competence_initial_prediction_bonus)
model.observe(True)
assert model.get_current_competence() > 0.5

@@ -53,10 +58,12 @@ def test_latent_variable_skill_competence_model_short():
"skill_competence_model_num_em_iters": 1,
"skill_competence_model_max_train_iters": 10,
"skill_competence_default_alpha_beta": (1.0, 1.0),
"skill_competence_initial_prediction_bonus": 1e-2,
})
model = create_competence_model("latent_variable", "test")
assert np.isclose(model.get_current_competence(), 0.5)
assert np.isclose(model.predict_competence(1), 0.5 + 1e-2)
assert np.isclose(model.predict_competence(1),
0.5 + CFG.skill_competence_initial_prediction_bonus)
model.observe(True)
assert model.get_current_competence() > 0.5
assert model.predict_competence(1) > model.get_current_competence()
@@ -72,12 +79,14 @@ def test_optimistic_skill_competence_model():
"""Tests for OptimisticSkillCompetenceModel()."""
utils.reset_config({
"skill_competence_default_alpha_beta": (1.0, 1.0),
"skill_competence_initial_prediction_bonus": 1e-2,
})
h = CFG.skill_competence_model_lookahead

model = create_competence_model("optimistic", "test")
assert np.isclose(model.get_current_competence(), 0.5)
assert np.isclose(model.predict_competence(h), 0.5 + 1e-2)
assert np.isclose(model.predict_competence(h),
0.5 + CFG.skill_competence_initial_prediction_bonus)

# Test impossible skill.
model = create_competence_model("optimistic", "impossible-skill")
@@ -154,12 +163,14 @@ def test_latent_variable_skill_competence_model_long():
"""Long tests for LatentVariableSkillCompetenceModel()."""
utils.reset_config({
"skill_competence_default_alpha_beta": (1.0, 1.0),
"skill_competence_initial_prediction_bonus": 1e-2,
})
h = CFG.skill_competence_model_lookahead

model = create_competence_model("latent_variable", "test")
assert np.isclose(model.get_current_competence(), 0.5)
assert np.isclose(model.predict_competence(h), 0.5 + 1e-2)
assert np.isclose(model.predict_competence(h),
0.5 + CFG.skill_competence_initial_prediction_bonus)

# Test impossible skill.
model = create_competence_model("latent_variable", "impossible-skill")
