sparse constraints: count non-zeros for rownnz and rowadr #1202
thowell wants to merge 2 commits into google-deepmind:main
Very interesting! @thowell any impressions so far of the performance tradeoff of this approach?
tl;dr
[benchmark tables; the measured values were not preserved]
humanoid: this pr | 5e8c0e2 (main) | 5e8c0e2 (main) | #936 9354443 with nefcdof=16
three humanoids: this pr | 5e8c0e2 (main) | 5e8c0e2 (main) | #936 9354443 with nefcdof=32
wow, these numbers look massive! Amazing.
adenzler-nvidia left a comment
Looks good, numbers are very convincing. Maybe it's possible to pre-calculate a lot of these numbers even? Might not move the needle much in terms of performance, but we should check
    da1 = dof_parentid[da1]
  if da2 == da:
    da2 = dof_parentid[da2]
  rownnz += 1
these numbers could even be pre-calculated, and don't need to be a runtime exploration?
yes, the number of non-zeros could be pre-computed. added todos to _equality_connect and _equality_weld.
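A minimal sketch of what such a pre-computation could look like, written in plain Python rather than Warp. It assumes `dof_parentid` maps each DOF to its parent (-1 at the root), as in the diff above, and that a body without DOFs is represented by -1. `chain_nnz` and `precompute_pair_nnz` are hypothetical helper names, not functions from this PR.

```python
def chain_nnz(dof_parentid, da1, da2):
  """Count distinct DOFs on the ancestor chains starting at da1 and da2."""
  seen = set()
  for da in (da1, da2):
    while da >= 0:
      seen.add(da)
      da = dof_parentid[da]
  return len(seen)


def precompute_pair_nnz(dof_parentid, pairs):
  """Build a lookup table once so that no tree walk is needed at runtime.

  pairs: iterable of (da1, da2), the last DOF of each constrained body
  (an assumed input format for this sketch).
  """
  return {pair: chain_nnz(dof_parentid, *pair) for pair in pairs}


# toy tree: DOF 0 is the root, with chains 0-1-2 and 0-3-4
dof_parentid = [-1, 0, 1, 0, 3]
table = precompute_pair_nnz(dof_parentid, [(2, 4), (2, 1), (4, -1)])
```

Since the kinematic tree is fixed, a table like this would only depend on model data, which is what makes it a candidate for pre-computation.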
erikfrey left a comment
Fantastic improvement! Just two nits
da2 = int(body_dofadr[body2] + body_dofnum[body2] - 1)

# count non-zeros
da1_save = da1
nit: just to reduce cognitive overhead a bit, why don't we flip this to avoid mental bookkeeping:
pda1, pda2 = da1, da2
then iterate over pda1 and pda2 for counting
(nit applies here and elsewhere)
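The suggested pattern could look roughly like this in plain Python. The loop condition and the `max` step are assumptions filled in around the lines visible in the diff, and `count_nnz` is a hypothetical name for illustration.

```python
def count_nnz(dof_parentid, da1, da2):
  """Count non-zeros by walking both DOF chains toward the root."""
  # walk copies so that da1 and da2 stay valid for the later fill pass
  pda1, pda2 = da1, da2
  rownnz = 0
  while pda1 >= 0 or pda2 >= 0:
    da = max(pda1, pda2)  # visit the deeper DOF first
    if pda1 == da:
      pda1 = dof_parentid[pda1]
    if pda2 == da:
      pda2 = dof_parentid[pda2]
    rownnz += 1  # one non-zero column per visited DOF
  return rownnz


# toy tree: DOF 0 is the root, with chains 0-1-2 and 0-3-4
dof_parentid = [-1, 0, 1, 0, 3]
nnz_pair = count_nnz(dof_parentid, 2, 4)
nnz_single = count_nnz(dof_parentid, 2, -1)
```

Because the originals are never overwritten, no `da1_save`-style restore is needed before the Jacobian entries are written.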
  rownnz += 1

# get rowadr
rowadr_base = wp.atomic_add(efc_nnz_out, worldid, 3 * rownnz)
nit: for brevity, maybe just do:
rowadr = wp.atomic_add(efc_nnz_out, worldid, 3 * rownnz)
efc_J_rowadr_out[worldid, efcid + 0] = rowadr
efc_J_rowadr_out[worldid, efcid + 1] = rowadr + rownnz
efc_J_rowadr_out[worldid, efcid + 2] = rowadr + 2 * rownnz
(nit applies here and elsewhere)
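To make the address arithmetic concrete, here is a serial sketch in plain Python. The `atomic_add` function below is a stand-in that mimics the return-the-previous-value behavior of `wp.atomic_add`, and `reserve_rows` is a hypothetical helper, not code from this PR.

```python
def atomic_add(arr, idx, value):
  """Serial stand-in for wp.atomic_add: add value, return the previous value."""
  old = arr[idx]
  arr[idx] += value
  return old


def reserve_rows(efc_nnz, efc_J_rowadr, worldid, efcid, rownnz, nrow=3):
  """Reserve nrow * rownnz slots and record the start address of each row."""
  rowadr = atomic_add(efc_nnz, worldid, nrow * rownnz)
  for i in range(nrow):
    efc_J_rowadr[worldid][efcid + i] = rowadr + i * rownnz
  return rowadr


efc_nnz = [0]             # running non-zero count, one entry per world
efc_J_rowadr = [[0] * 6]  # row start addresses for 6 constraint rows
reserve_rows(efc_nnz, efc_J_rowadr, worldid=0, efcid=0, rownnz=5)
reserve_rows(efc_nnz, efc_J_rowadr, worldid=0, efcid=3, rownnz=2)
```

A single reservation covers all three rows of the constraint, so they stay contiguous even when other threads reserve storage concurrently.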
for sparse constraints, count the number of non-zeros for rownnz and rowadr.

adds efc_nnz (size nworld) in the make_constraint scope. for each world, this is the running count of non-zeros for efc_J/efc_J_colind.

each constraint counts rownnz, then calls atomic_add with efc_nnz and rownnz to get rowadr (note: multi-row constraints like equality connect can call with nrow * rownnz where nrow=3).

note: one potential side effect of this pattern is that the constraint memory in efc_J/efc_J_colind may not be sequential by constraint efc id, since there are 2 separate atomic_add operations that may not be synced. all of the sparse matrix operations will be correct, but if one inspects the memory the order might not be sequential.

in a follow-up pr, we can add a parameter (something like nefcJmax/nefcJnnz) that determines the memory allocation for efc_J/efc_J_colind. overflow will be reported if rowadr + rownnz >= {allocated number of non-zeros}.

this is an alternative to #936. note: the tradeoff with the changes proposed in this pr is that the counting of non-zeros + atomic operations + efc_nnz is expected to be computationally more expensive, but enables potentially more memory savings compared to the approach proposed in #936.
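The ordering note and the proposed overflow check can be illustrated with a small serial sketch in plain Python. The `build` and `matvec` helpers are hypothetical, the arrival order of threads is modeled by an explicit `order` list, and dropping a row on overflow is only one possible handling, chosen here for illustration.

```python
def build(order, rows, nnz_max):
  """Fill sparse storage, reserving space for rows in the given arrival order.

  rows[efcid] is a list of (column, value) pairs; nnz_max is the allocated
  number of non-zeros (the role a parameter like nefcJnnz could play).
  """
  rownnz = [len(r) for r in rows]
  rowadr = [0] * len(rows)
  J = [0.0] * nnz_max
  colind = [0] * nnz_max
  nnz = 0  # running count, the role played by efc_nnz
  overflow = False
  for efcid in order:
    adr = nnz  # the value an atomic add would return
    nnz += rownnz[efcid]
    if adr + rownnz[efcid] >= nnz_max:  # condition as stated in the description
      overflow = True
      rownnz[efcid] = 0  # assumption: drop the row that does not fit
      continue
    rowadr[efcid] = adr
    for k, (col, val) in enumerate(rows[efcid]):
      J[adr + k] = val
      colind[adr + k] = col
  return J, colind, rowadr, rownnz, overflow


def matvec(J, colind, rowadr, rownnz, x):
  """Sparse matrix-vector product that only reads rowadr/rownnz, not memory order."""
  return [
    sum(J[adr + k] * x[colind[adr + k]] for k in range(n))
    for adr, n in zip(rowadr, rownnz)
  ]


rows = [[(0, 1.0), (2, 2.0)], [(1, 3.0)], [(0, 4.0), (1, 5.0), (2, 6.0)]]
x = [1.0, 10.0, 100.0]

in_order = build([0, 1, 2], rows, nnz_max=8)
shuffled = build([2, 0, 1], rows, nnz_max=8)
y_in_order = matvec(*in_order[:4], x)
y_shuffled = matvec(*shuffled[:4], x)
too_small = build([0, 1, 2], rows, nnz_max=4)
```

The two builds place rows at different addresses, yet the products agree because every access goes through rowadr and rownnz.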