Benchmarking Performance Different Than Reported #3

@jiadingfang

Hi, thanks for such an all-around repo for working with 3DSG planning!

I would like to reproduce the benchmarking results under the benchmark folder of your repo to make sure everything runs properly before testing my own planners. However, in my tests the planners behave quite differently from what is reported.

As of 07/20/2023, I ran every planner available in pddlgym_planners/__init__.py on the PDDL domain taskographyv2tiny1 using the command python scripts/benchmark/plan.py --domain-name $DOMAIN_NAME --planner $PLANNER. The results are as follows:

  1. FF: error while running

gcc -o ff main.o memory.o output.o parse.o inst_pre.o inst_easy.o inst_hard.o inst_final.o orderings.o relax.o search.o scan-fct_pddl.tab.o scan-ops_pddl.tab.o -Wall -g -std=gnu99 -O6 -lm
/usr/bin/ld: search.o:/home/fjd/miniconda3/envs/taskographypy37/lib/python3.7/site-packages/pddlgym_planners/FF-v2.3/search.c:110: multiple definition of `lcurrent_goals'; relax.o:/home/fjd/miniconda3/envs/taskographypy37/lib/python3.7/site-packages/pddlgym_planners/FF-v2.3/relax.c:111: first defined here
/usr/bin/ld: scan-fct_pddl.tab.o:/home/fjd/miniconda3/envs/taskographypy37/lib/python3.7/site-packages/pddlgym_planners/FF-v2.3/lex-fct_pddl.l:9: multiple definition of `gbracket_count'; main.o:/home/fjd/miniconda3/envs/taskographypy37/lib/python3.7/site-packages/pddlgym_planners/FF-v2.3/main.c:147: first defined here
collect2: error: ld returned 1 exit status
make: *** [makefile:74: ff] Error 1

  2. FF-X: the same error as FF
  3. FD-lama-first: plan failure

{'failure_rate': 1.0,
'num_node_expansions': nan,
'num_node_expansions_std': nan,
'plan_length': nan,
'plan_length_std': nan,
'search_time': nan,
'search_time_std': nan,
'success_rate': 0.0,
'timeout_rate': 0.0,
'total_time': nan,
'total_time_std': nan}

  4. Cerberus-seq-sat: plan failure

{'failure_rate': 1.0,
'num_node_expansions': nan,
'num_node_expansions_std': nan,
'plan_length': nan,
'plan_length_std': nan,
'search_time': nan,
'search_time_std': nan,
'success_rate': 0.0,
'timeout_rate': 0.0,
'total_time': nan,
'total_time_std': nan}

  5. Cerberus-seq-agl: plan failure

{'failure_rate': 1.0,
'num_node_expansions': nan,
'num_node_expansions_std': nan,
'plan_length': nan,
'plan_length_std': nan,
'search_time': nan,
'search_time_std': nan,
'success_rate': 0.0,
'timeout_rate': 0.0,
'total_time': nan,
'total_time_std': nan}

  6. DecStar-agl-decoupled: plan failure

{'failure_rate': 1.0,
'num_node_expansions': nan,
'num_node_expansions_std': nan,
'plan_length': nan,
'plan_length_std': nan,
'search_time': nan,
'search_time_std': nan,
'success_rate': 0.0,
'timeout_rate': 0.0,
'total_time': nan,
'total_time_std': nan}

  7. lapkt-bfws: results differ somewhat from those in benchmark/taskographyv2tiny1_bfws. My results:

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [03:21<00:00, 5.04s/it]
{'failure_rate': 0.0,
'num_node_expansions': 468.48387096774195,
'num_node_expansions_std': 192.6469059835003,
'plan_length': 14.709677419354838,
'plan_length_std': 3.828530825661262,
'search_time': 0.4536315483870968,
'search_time_std': 0.3696494008728636,
'success_rate': 0.775,
'timeout_rate': 0.225,
'total_time': 0.4536315483870968,
'total_time_std': 0.3696494008728636}
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 55/55 [05:57<00:00, 6.51s/it]
{'failure_rate': 0.0,
'num_node_expansions': 573.3225806451613,
'num_node_expansions_std': 338.3147405651472,
'plan_length': 15.32258064516129,
'plan_length_std': 4.394917128465223,
'search_time': 0.5754497419354839,
'search_time_std': 0.8765903350261305,
'success_rate': 0.5636363636363636,
'timeout_rate': 0.43636363636363634,
'total_time': 0.5754497419354839,
'total_time_std': 0.8765903350261305}

Reported in benchmark/taskographyv2tiny1_bfws/taskographyv2tiny1_bfws_test.json:

{
"failure_rate": 0.0,
"num_node_expansions": 609.6279069767442,
"num_node_expansions_std": 339.64208406455214,
"plan_length": 15.55813953488372,
"plan_length_std": 4.15570398469826,
"search_time": 0.8969197023255813,
"search_time_std": 1.3382104019851668,
"success_rate": 0.7818181818181819,
"timeout_rate": 0.21818181818181817,
"total_time": 0.8969197023255813,
"total_time_std": 1.3382104019851668
}
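To make the gap concrete, here is a small sketch that diffs the two sets of metrics. It assumes that the 55-problem run above is the one that corresponds to the reported test file (the problem counts are consistent with that, since the reported success rate of 0.7818... equals 43/55), and the helper names `compare` and `solved` are made up for illustration; they are not part of the repo.

```python
# Metrics copied from this issue. Pairing the 55-problem run with the reported
# *_test.json file is an assumption based on the matching problem count.
mine = {
    "num_node_expansions": 573.3225806451613,
    "plan_length": 15.32258064516129,
    "search_time": 0.5754497419354839,
    "success_rate": 0.5636363636363636,
    "timeout_rate": 0.43636363636363634,
}
reported = {
    "num_node_expansions": 609.6279069767442,
    "plan_length": 15.55813953488372,
    "search_time": 0.8969197023255813,
    "success_rate": 0.7818181818181819,
    "timeout_rate": 0.21818181818181817,
}
NUM_PROBLEMS = 55  # taken from the "55/55" progress bar


def compare(a, b):
    """Return a - b for every metric present in both dictionaries."""
    return {key: a[key] - b[key] for key in sorted(a.keys() & b.keys())}


def solved(success_rate, num_problems):
    """Recover the number of solved problems from a success rate."""
    return round(success_rate * num_problems)


for key, delta in compare(mine, reported).items():
    print(f"{key:>20}: {delta:+.4f}")
print("solved (mine):    ", solved(mine["success_rate"], NUM_PROBLEMS))
print("solved (reported):", solved(reported["success_rate"], NUM_PROBLEMS))
```

Under that assumption, the per-problem averages (plan length, node expansions) are close, while the main difference is in how many problems finish before the timeout.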

  8. FD-seq-opt-lmcut: plan failure

{'failure_rate': 1.0,
'num_node_expansions': nan,
'num_node_expansions_std': nan,
'plan_length': nan,
'plan_length_std': nan,
'search_time': nan,
'search_time_std': nan,
'success_rate': 0.0,
'timeout_rate': 0.0,
'total_time': nan,
'total_time_std': nan}

  9. Delfi: plan failure

{'failure_rate': 1.0,
'num_node_expansions': nan,
'num_node_expansions_std': nan,
'plan_length': nan,
'plan_length_std': nan,
'search_time': nan,
'search_time_std': nan,
'success_rate': 0.0,
'timeout_rate': 0.0,
'total_time': nan,
'total_time_std': nan}

  10. DecStar-opt-decoupled: plan failure

{'failure_rate': 1.0,
'num_node_expansions': nan,
'num_node_expansions_std': nan,
'plan_length': nan,
'plan_length_std': nan,
'search_time': nan,
'search_time_std': nan,
'success_rate': 0.0,
'timeout_rate': 0.0,
'total_time': nan,
'total_time_std': nan}
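A note on the FF and FF-X linker errors above: one plausible cause, offered as an assumption rather than a confirmed diagnosis, is the compiler version. Since GCC 10 the default is -fno-common, so a global variable defined without extern in several source files produces exactly this kind of "multiple definition" error at link time, and Ubuntu 22.04 ships GCC 11 by default. Passing -fcommon restores the older behavior. The sketch below appends that flag to a makefile's CFLAGS line; the function name `add_fcommon` and the sample makefile text are hypothetical, and the real FF-v2.3 makefile may be laid out differently (for example with line continuations, which this does not handle).

```python
import re


def add_fcommon(makefile_text: str) -> str:
    """Append -fcommon to the first CFLAGS assignment unless already present."""
    pattern = re.compile(r"^(CFLAGS\s*[:+?]?=.*)$", re.MULTILINE)

    def patch(match):
        line = match.group(1)
        if "-fcommon" in line:
            return line  # already patched, leave untouched
        return line.rstrip() + " -fcommon"

    return pattern.sub(patch, makefile_text, count=1)


# Hypothetical makefile fragment; the flags mirror the gcc command in the log.
sample = "CC = gcc\nCFLAGS = -O6 -Wall -g -std=gnu99\nLIBS = -lm\n"
print(add_fcommon(sample))
```

After patching, the object files would need to be rebuilt from scratch so that every translation unit is compiled with the new flag.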

I followed the installation instructions at https://github.com/taskography/taskography-api#installation, with only a few changes to fix some errors:
0. Ubuntu 22.04.

  1. Conda create an empty env with python=3.7.
  2. Add the missing comma at the end of the line containing "tqdm" so that it is separated from the next line.
  3. Run pip install -e . and pip install -r requirements.txt.
  4. Downgrade importlib-metadata from 6.7.0 to 4.12.0 to avoid the error "'EntryPoints' object has no attribute 'get'". Source: https://stackoverflow.com/questions/73929564/entrypoints-object-has-no-attribute-get-digital-ocean
  5. Move the line "from __future__ import annotations" to the first line of the file to avoid the error "from __future__ imports must occur at the beginning of the file". Source: https://stackoverflow.com/questions/38688504/from-future-imports-must-occur-at-the-beginning-of-the-file-what-defines
  6. Run scripts/validate/loader.py and scripts/validate/taskography_env.py; both pass.
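For context on step 5, the rule can be checked in isolation: Python only allows a module docstring, comments, and blank lines before a `from __future__` import, and any other statement ahead of it is a SyntaxError at compile time. A minimal sketch, with the helper name `compiles` and both source strings invented for illustration (this assumes Python 3.7 or newer, where the annotations feature exists):

```python
def compiles(source: str) -> bool:
    """Return True if the source compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# A regular import before the __future__ import is rejected.
bad = "import os\nfrom __future__ import annotations\n"

# A docstring and comments may come first; everything else must follow.
good = (
    '"""Module docstring."""\n'
    "# a comment is fine too\n"
    "from __future__ import annotations\n"
    "import os\n"
)

print("bad compiles: ", compiles(bad))
print("good compiles:", compiles(good))
```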

I'm happy to provide more details if needed. I would highly appreciate any help, since a solid benchmark is a prerequisite for any future research. Thanks in advance!