sparse ten_J #1119

Open
thowell wants to merge 8 commits into google-deepmind:main from thowell:ten_J
thowell commented Feb 5, 2026

(always) sparse tendon Jacobian

tl;dr
performance is about the same for scenes with few tendons (e.g., 2) and significantly better, in both steps per second and memory, for a scene with many tendons (e.g., 67)
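
For readers unfamiliar with the idea, here is a minimal sketch of storing a Jacobian in compressed sparse row form and using it for a matrix-vector product (for example, tendon velocity as the Jacobian times joint velocity). This is not the code in this PR: the function names, the array names `rowadr`, `rownnz`, `colind`, and the toy matrix are all chosen for illustration only.

```python
import numpy as np


def dense_to_csr(J):
    """Convert a dense (nrow, ncol) matrix to CSR-style arrays.

    Returns (rowadr, rownnz, colind, values), where row i occupies
    values[rowadr[i] : rowadr[i] + rownnz[i]]. Names are illustrative.
    """
    J = np.asarray(J, dtype=np.float64)
    mask = J != 0.0
    rownnz = mask.sum(axis=1).astype(np.int64)
    # Start address of each row is the running sum of the previous row sizes.
    rowadr = np.concatenate(([0], np.cumsum(rownnz)[:-1])).astype(np.int64)
    rows, cols = np.nonzero(mask)  # indices come back in row-major order
    return rowadr, rownnz, cols.astype(np.int64), J[rows, cols]


def csr_matvec(rowadr, rownnz, colind, values, x):
    """Compute J @ x while touching only the stored nonzeros."""
    out = np.zeros(len(rownnz))
    for i in range(len(rownnz)):
        start, end = rowadr[i], rowadr[i] + rownnz[i]
        out[i] = values[start:end] @ x[colind[start:end]]
    return out


# Toy 3 x 4 "Jacobian": each row depends on only a few columns.
J = np.array([[0.0, 1.5, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0],
              [2.0, 0.0, 0.0, -1.0]])
x = np.array([1.0, 2.0, 3.0, 4.0])

rowadr, rownnz, colind, values = dense_to_csr(J)
y_sparse = csr_matvec(rowadr, rownnz, colind, values, x)
y_dense = J @ x
```

The sparse product stores and reads 3 values instead of 12 here; the benefit grows when there are many rows that each depend on only a handful of columns.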


myoarm (#937)

sps: 789,390 -> 952,616
memory (MiB): 1852.00 -> 1820.00
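
The relative change implied by the two summary lines above can be checked with a few lines of Python. The helper name `relative_change` is made up for this sketch; the numbers are copied from the summary.

```python
def relative_change(before, after):
    """Fractional change from before to after (positive means an increase)."""
    return (after - before) / before


# Values copied from the myoarm summary lines.
sps_before, sps_after = 789_390, 952_616
mem_before, mem_after = 1852.00, 1820.00

sps_gain = relative_change(sps_before, sps_after)
mem_change = relative_change(mem_before, mem_after)

print(f"sps: {sps_gain:+.1%}")        # sps: +20.7%
print(f"memory: {mem_change:+.1%}")   # memory: -1.7%
```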

ntendon: 67 nwrap: 464
mjwarp-testspeed benchmark/myo_sim/arm/myoarm.xml --nworld=8192 --nconmax=16 --njmax=48 -o "opt.ccd_iterations=50" --event_trace --memory

this pr

Loading model from: benchmark/myo_sim/arm/myoarm.xml...

Model
  nq: 38 nv: 38 nu: 63 nbody: 40 ngeom: 161
Option
  integrator: EULER
  cone: PYRAMIDAL
  solver: NEWTON iterations: 100 ls_iterations: 50
  is_sparse: True
  ls_parallel: False
  broadphase: NXN broadphase_filter: PLANE|SPHERE|OBB
Data
  nworld: 8192 naconmax: 131072 njmax: 48
Rolling out 1000 steps at dt = 0.002...

Summary for 8192 parallel rollouts

Total JIT time: 0.51 s
Total simulation time: 8.60 s
Total steps per second: 952,616
Total realtime factor: 1,905.23 x
Total time per step: 1049.74 ns
Total converged worlds: 8192 / 8192

Event trace:

step: 1047.62
  forward: 922.01
    fwd_position: 489.25
      kinematics: 57.90
      com_pos: 15.00
      camlight: 1.93
      flex: 0.17
      tendon: 290.97
      crb: 25.08
      tendon_armature: 11.07
      collision: 12.40
        nxn_broadphase: 4.32
        convex_narrowphase: 5.90
        primitive_narrowphase: 1.28
      make_constraint: 31.15
      transmission: 41.66
    sensor_pos: 0.17
    fwd_velocity: 106.62
      com_vel: 7.16
      passive: 15.46
      rne: 13.90
      tendon_bias: 30.54
    sensor_vel: 0.17
    fwd_actuation: 20.11
    fwd_acceleration: 128.90
      xfrc_accumulate: 4.76
    solve: 174.80
      mul_m: 5.75
    sensor_acc: 0.17
  euler: 125.09

Model memory 17.28 MiB (0.95% of used memory):
 (no field >= 1% of used memory)
Data memory 796.09 MiB (43.74% of used memory):
 geom_xmat: 45.28 MiB (2.49%)
 site_xpos: 47.53 MiB (2.61%)
 site_xmat: 142.59 MiB (7.83%)
 ten_J: 37.56 MiB (2.06%)
 wrap_obj: 29.00 MiB (1.59%)
 wrap_xpos: 87.00 MiB (4.78%)
 actuator_moment: 74.81 MiB (4.11%)
 efc.J: 72.00 MiB (3.96%)
Other memory: 1006.64 MiB (55.31% of used memory)
Total memory: 1820.00 MiB (3.74% of total device memory)

baseline (0df83a0 + a put_data fix + tendon event scope)

Loading model from: benchmark/myo_sim/arm/myoarm.xml...

Model
  nq: 38 nv: 38 nu: 63 nbody: 40 ngeom: 161
Option
  integrator: EULER
  cone: PYRAMIDAL
  solver: NEWTON iterations: 100 ls_iterations: 50
  is_sparse: True
  ls_parallel: False
  broadphase: NXN broadphase_filter: PLANE|SPHERE|OBB
Data
  nworld: 8192 naconmax: 131072 njmax: 48
Rolling out 1000 steps at dt = 0.002...

Summary for 8192 parallel rollouts

Total JIT time: 0.51 s
Total simulation time: 10.38 s
Total steps per second: 789,390
Total realtime factor: 1,578.78 x
Total time per step: 1266.80 ns
Total converged worlds: 8192 / 8192

Event trace:

step: 1264.61
  forward: 1138.18
    fwd_position: 681.06
      kinematics: 58.03
      com_pos: 14.95
      camlight: 1.91
      flex: 0.17
      tendon: 448.51
      crb: 25.24
      tendon_armature: 11.46
      collision: 12.39
        nxn_broadphase: 4.28
        convex_narrowphase: 5.92
        primitive_narrowphase: 1.29
      make_constraint: 31.60
      transmission: 74.88
    sensor_pos: 0.17
    fwd_velocity: 132.26
      com_vel: 7.35
      passive: 15.30
      rne: 13.28
      tendon_bias: 29.89
    sensor_vel: 0.18
    fwd_actuation: 20.58
    fwd_acceleration: 126.29
      xfrc_accumulate: 4.75
    solve: 175.65
      mul_m: 5.76
    sensor_acc: 0.17
  euler: 125.91

Model memory 17.27 MiB (0.93% of used memory):
 (no field >= 1% of used memory)
Data memory 838.09 MiB (45.25% of used memory):
 geom_xmat: 45.28 MiB (2.44%)
 site_xpos: 47.53 MiB (2.57%)
 site_xmat: 142.59 MiB (7.70%)
 ten_J: 79.56 MiB (4.30%)
 wrap_obj: 29.00 MiB (1.57%)
 wrap_xpos: 87.00 MiB (4.70%)
 actuator_moment: 74.81 MiB (4.04%)
 efc.J: 72.00 MiB (3.89%)
Other memory: 996.64 MiB (53.81% of used memory)
Total memory: 1852.00 MiB (3.81% of total device memory)
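
As a sanity check on the `ten_J` entries in the two myoarm memory reports above, the arithmetic below estimates the size of a dense per-world Jacobian. It assumes `ten_J` holds one float32 per entry and that a dense layout is `ntendon x nv` per world; neither assumption is stated in the logs, and the helper names are hypothetical.

```python
MIB = 1024 * 1024
BYTES_PER_FLOAT32 = 4  # assumption: ten_J values are stored as float32


def dense_ten_j_mib(nworld, ntendon, nv):
    """Size in MiB of a dense (nworld, ntendon, nv) float32 array."""
    return nworld * ntendon * nv * BYTES_PER_FLOAT32 / MIB


def implied_values_per_world(reported_mib, nworld):
    """Number of float32 values per world implied by a reported size."""
    return reported_mib * MIB / (nworld * BYTES_PER_FLOAT32)


# myoarm: nworld=8192, ntendon=67, nv=38 (from the logs above).
dense_mib = dense_ten_j_mib(8192, 67, 38)
print(f"dense ten_J estimate: {dense_mib:.2f} MiB")  # dense ten_J estimate: 79.56 MiB

# The report for this PR lists ten_J at 37.56 MiB (rounded to two decimals).
stored = implied_values_per_world(37.56, 8192)
print(f"implied stored values per world: about {round(stored)} of {67 * 38}")
```

Under these assumptions the dense estimate matches the 79.56 MiB reported for the baseline, and the 37.56 MiB reported for this PR would correspond to roughly 1202 stored values per world out of 2546 dense entries.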

mujoco/model/humanoid/humanoid.xml
https://github.com/google-deepmind/mujoco/blob/main/model/humanoid/humanoid.xml

sps: 3,786,138 -> 3,786,812
memory (MiB): 402.00 -> 402.00

ntendon: 2 nwrap: 4
mjwarp-testspeed mujoco/model/humanoid/humanoid.xml --nworld=8192 --nconmax=24 --njmax=64 --event_trace --memory

this pr

Loading model from: mujoco/model/humanoid/humanoid.xml...

Model
  nq: 28 nv: 27 nu: 21 nbody: 17 ngeom: 20
Option
  integrator: EULER
  cone: PYRAMIDAL
  solver: NEWTON iterations: 100 ls_iterations: 50
  is_sparse: False
  ls_parallel: False
  broadphase: NXN broadphase_filter: PLANE|SPHERE|OBB
Data
  nworld: 8192 naconmax: 196608 njmax: 64
Rolling out 1000 steps at dt = 0.005...

Summary for 8192 parallel rollouts

Total JIT time: 0.32 s
Total simulation time: 2.16 s
Total steps per second: 3,786,812
Total realtime factor: 18,934.06 x
Total time per step: 264.07 ns
Total converged worlds: 8192 / 8192

Event trace:

step: 262.33
  forward: 252.52
    fwd_position: 84.97
      kinematics: 9.94
      com_pos: 5.67
      camlight: 1.77
      flex: 0.17
      tendon: 0.97
      crb: 13.30
      tendon_armature: 0.73
      collision: 9.84
        nxn_broadphase: 3.60
        convex_narrowphase: 0.17
        primitive_narrowphase: 5.11
      make_constraint: 38.31
      transmission: 2.34
    sensor_pos: 0.17
    fwd_velocity: 28.13
      com_vel: 5.93
      passive: 1.67
      rne: 7.08
      tendon_bias: 1.57
    sensor_vel: 0.18
    fwd_actuation: 2.28
    fwd_acceleration: 11.21
      xfrc_accumulate: 1.59
    solve: 123.55
      mul_m: 6.08
    sensor_acc: 0.17
  euler: 9.29

Model memory 0.03 MiB (0.01% of used memory):
 (no field >= 1% of used memory)
Data memory 258.50 MiB (64.30% of used memory):
 xmat: 4.78 MiB (1.19%)
 ximat: 4.78 MiB (1.19%)
 geom_xmat: 5.62 MiB (1.40%)
 cdof: 5.06 MiB (1.26%)
 cinert: 5.31 MiB (1.32%)
 actuator_moment: 17.72 MiB (4.41%)
 crb: 5.31 MiB (1.32%)
 qM: 24.50 MiB (6.09%)
 qLD: 22.78 MiB (5.67%)
 cdof_dot: 5.06 MiB (1.26%)
 contact.frame: 6.75 MiB (1.68%)
 efc.J: 56.00 MiB (13.93%)
Other memory: 143.47 MiB (35.69% of used memory)
Total memory: 402.00 MiB (0.83% of total device memory)

baseline (0df83a0 + a put_data fix + tendon event scope)

Loading model from: mujoco/model/humanoid/humanoid.xml...

Model
  nq: 28 nv: 27 nu: 21 nbody: 17 ngeom: 20
Option
  integrator: EULER
  cone: PYRAMIDAL
  solver: NEWTON iterations: 100 ls_iterations: 50
  is_sparse: False
  ls_parallel: False
  broadphase: NXN broadphase_filter: PLANE|SPHERE|OBB
Data
  nworld: 8192 naconmax: 196608 njmax: 64
Rolling out 1000 steps at dt = 0.005...

Summary for 8192 parallel rollouts

Total JIT time: 0.77 s
Total simulation time: 2.16 s
Total steps per second: 3,786,138
Total realtime factor: 18,930.69 x
Total time per step: 264.12 ns
Total converged worlds: 8192 / 8192

Event trace:

step: 262.44
  forward: 252.63
    fwd_position: 84.31
      kinematics: 9.86
      com_pos: 5.67
      camlight: 1.76
      flex: 0.17
      tendon: 0.89
      crb: 13.33
      tendon_armature: 0.70
      collision: 9.71
        nxn_broadphase: 3.56
        convex_narrowphase: 0.17
        primitive_narrowphase: 5.02
      make_constraint: 38.01
      transmission: 2.30
    sensor_pos: 0.17
    fwd_velocity: 28.96
      com_vel: 5.93
      passive: 1.67
      rne: 7.07
      tendon_bias: 1.56
    sensor_vel: 0.17
    fwd_actuation: 2.28
    fwd_acceleration: 11.21
      xfrc_accumulate: 1.58
    solve: 123.49
      mul_m: 6.09
    sensor_acc: 0.18
  euler: 9.29

Model memory 0.03 MiB (0.01% of used memory):
 (no field >= 1% of used memory)
Data memory 260.06 MiB (64.69% of used memory):
 xmat: 4.78 MiB (1.19%)
 ximat: 4.78 MiB (1.19%)
 geom_xmat: 5.62 MiB (1.40%)
 cdof: 5.06 MiB (1.26%)
 cinert: 5.31 MiB (1.32%)
 actuator_moment: 17.72 MiB (4.41%)
 crb: 5.31 MiB (1.32%)
 qM: 24.50 MiB (6.09%)
 qLD: 22.78 MiB (5.67%)
 cdof_dot: 5.06 MiB (1.26%)
 contact.frame: 6.75 MiB (1.68%)
 efc.J: 56.00 MiB (13.93%)
Other memory: 141.91 MiB (35.30% of used memory)
Total memory: 402.00 MiB (0.83% of total device memory)

thowell linked an issue Feb 5, 2026 that may be closed by this pull request
thowell mentioned this pull request Feb 5, 2026
thowell added this to the Sparsity milestone Feb 6, 2026
thowell moved this to In progress in MJWarp Roadmap Feb 10, 2026
thowell force-pushed the ten_J branch 2 times, most recently from a66ea0b to db1514c, February 27, 2026 18:07
thowell force-pushed the ten_J branch 5 times, most recently from beb251f to 53005f3, March 3, 2026 13:00
thowell requested a review from quagla March 3, 2026 19:24
quagla left a comment

Nice improvement!

Labels: None yet
Project status: In progress
Development: Successfully merging this pull request may close these issues: JacobianType.SPARSE
2 participants