Seer uses only the DROID dataset for pre-training. This section describes the data pre-processing and transformation steps for both the DROID and OXE datasets. These transformations convert the RLDS format into a standard dataset format made up of .png, .npz, and .h5 files. The transformed dataset is organized as follows: /subset_name/episodes/000000/steps/0000/xxx.jpg (h5). The pre-processing step also unifies action labels across subsets. For example, it standardizes all control methods to delta end-effector pose control and keeps the origin and axes of the robot base and end-effector consistent. This alignment reduces the ambiguity introduced by different robots and control methods. To support this process, we create a new environment, seer_pre, used only for pre-processing the DROID and OXE datasets into the desired format.
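The directory layout described above can be sketched with a small path helper. This is only an illustration and not part of the Seer code base: the function names are hypothetical, and the zero-padding widths (six digits for episodes, four for steps) are inferred from the example path rather than confirmed by the pre-processing scripts.

```python
from pathlib import Path


def step_dir(root, subset_name, episode_idx, step_idx):
    """Build the directory of one step, e.g. root/subset/episodes/000000/steps/0000.

    Padding widths (6 and 4) are an assumption taken from the example path.
    """
    if episode_idx < 0 or step_idx < 0:
        raise ValueError("episode_idx and step_idx must be non-negative")
    return (
        Path(root)
        / subset_name
        / "episodes"
        / f"{episode_idx:06d}"
        / "steps"
        / f"{step_idx:04d}"
    )


def list_episode_steps(root, subset_name, episode_idx):
    """Return the step directories of one episode, sorted by name.

    Because step names are zero-padded, name order equals step order.
    Returns an empty list if the episode has not been written yet.
    """
    steps_root = Path(root) / subset_name / "episodes" / f"{episode_idx:06d}" / "steps"
    if not steps_root.is_dir():
        return []
    return sorted(p for p in steps_root.iterdir() if p.is_dir())
```

A helper like this can be handy for spot-checking a converted subset, for example by confirming that the first step directory of the first episode exists.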
(1) Env
conda create -n seer_pre python=3.10
conda activate seer_pre
(2) Move to real_preprocess
cd ${YOUR_PATH_TO_SEER}/real_preprocess
(3) Third Party Packages
pip install -r requirements.txt
(4) octo_oxe_data_utils (Optional for DROID, Required for OXE)
cd octo_oxe_data_utils
python install
cd ..
(5) Mujoco
pip install mujoco
(6) Torch
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url
(7) Dlimp (Important): We use multiprocessing to process the data. However, a file in Dlimp introduces randomness. Replace that file in /your_anaconda/envs/seer_pre/lib/python3.10/site-packages/dlimp/ with the one in dlimp/
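If you are unsure where Dlimp was installed, a short check like the one below can print the package directory for the active environment. It is a minimal sketch that relies only on the Python standard library; the helper name find_package_dir is hypothetical, and the script only locates the directory, it does not replace any file.

```python
import importlib.util
from pathlib import Path


def find_package_dir(package_name):
    """Return the directory of an installed top-level package, or None.

    None is returned when the package is not importable in the current
    environment or when the name refers to a single-file module.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    location = find_package_dir("dlimp")
    if location is None:
        print("dlimp is not installed in this environment")
    else:
        print(f"dlimp is installed at: {location}")
```

Run it with seer_pre activated so that the reported path belongs to that environment.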
You can download the full DROID dataset (1.7TB) in RLDS format using the following command:
gsutil -m cp -r gs://gresearch/robotics/droid <path_to_your_target_dir>
If you also need the OXE dataset, follow the download instructions provided on the OXE GitHub page.
cd ${YOUR_PATH_TO_SEER}/real_preprocess
conda activate seer_pre
To process the DROID dataset, set the src_dir and tgt_dir paths. You can adjust the num_worker argument to specify the number of processes to use:
For processing the Franka subsets in the OXE dataset, update the src_root_dir and tgt_dataset_dir paths. Similarly, adjust the num_worker argument for parallel processing:
To process other subsets of the OXE dataset (excluding Franka), update the src_root_dir and tgt_dataset_dir paths and set the number of worker processes:
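As background on what the num_worker argument controls, the sketch below shows one common way to split episodes across worker processes. It is not the repository's processing script: shard_indices, process_shard, and run are hypothetical names, and the worker body is a placeholder where a real conversion would happen. Contiguous shards are used here as an assumption, because they keep the episode order deterministic regardless of how the workers are scheduled.

```python
from multiprocessing import Pool


def shard_indices(num_items, num_worker):
    """Split range(num_items) into num_worker contiguous, near-equal shards."""
    if num_worker < 1:
        raise ValueError("num_worker must be at least 1")
    base, extra = divmod(num_items, num_worker)
    shards = []
    start = 0
    for worker_id in range(num_worker):
        # The first `extra` workers take one additional item each.
        size = base + (1 if worker_id < extra else 0)
        shards.append(range(start, start + size))
        start += size
    return shards


def process_shard(shard):
    """Placeholder worker: a real worker would convert each episode here."""
    return [episode_idx for episode_idx in shard]


def run(num_episodes, num_worker):
    """Process all episodes in parallel and return results in episode order."""
    shards = shard_indices(num_episodes, num_worker)
    with Pool(processes=num_worker) as pool:
        # pool.map preserves the order of the input shards.
        results = pool.map(process_shard, shards)
    return [idx for part in results for idx in part]


if __name__ == "__main__":
    print(run(num_episodes=10, num_worker=3))
```

A reasonable starting point is to set the worker count no higher than the number of available CPU cores and to lower it if disk throughput becomes the bottleneck.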