Commit d6aae1e
WebGPU GPU compiler backend improvements (and small CUDA) (#565)
* handle `nnkSym` in `getInnerPointerType`
* fix `constructPtrSignature` after change from nil -> gtVoid for iTyp
The type of the identifier is now always `gtVoid`, so the previous
check `not idTyp.isNil` does not work any longer.
* handle `gpuCast` in determineSymKind/Mutability/Ident
* allow determination of GPU type to fail in `nimToGpuType`
This is only for the optional helper used on the WebGPU backend, where
we try to determine the type in an infix expression. However, for some
arguments to the infix this is not uniquely possible. E.g. we might
encounter `SomeInteger`, which is not a unique type. In this case we
just fall back to not assigning a known type.
* handle `UncheckedArray` in `gpuTypeToString` on CUDA backend
* rename constant to set workgroup size
* minor cleanup
* correctly handle `gtUA` (UncheckedArray) on CUDA backend
The `gtUA` type enum element is new and was not correctly handled yet
on the CUDA backend.
* fix access of type for left ident in assignment
* rewrite compound assignment operators in all functions
I.e. `x += 5` becomes `x = x + 5`; in general, for any operator `foo=` in
`x foo= y` we generate `x = x foo y`.
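A minimal sketch of such a rewrite on untyped Nim AST, purely for illustration (the actual pass presumably works on the compiler's own representation, and a real pass would also recurse into child nodes):
```nim
import std/[macros, strutils]

# Illustrative only: rewrite `x foo= y` into `x = x foo y`.
# Comparison operators also end in `=` and must be excluded.
proc rewriteCompoundAssign(n: NimNode): NimNode =
  if n.kind == nnkInfix and n[0].kind == nnkIdent:
    let opStr = n[0].strVal
    if opStr.len > 1 and opStr.endsWith("=") and
       opStr notin ["==", "<=", ">=", "!="]:
      let op = opStr[0 ..< opStr.len - 1]   # e.g. "+=" -> "+"
      return newAssignment(n[1], infix(n[1], op, n[2]))
  result = n
```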
* extend `GpuFieldInit` by a type field
We need that type to determine what we actually assign in a
`gpuObjConstr`.
* make sure to update symbols in global functions too
Otherwise we don't have up-to-date type / symbol kind information in globals.
* remove local `determineIdent` from `genWebGpu`
* fix tree representation of GpuAst
* add `mpairs`, `pairs` iterator for `GpuAst`
* support default initialization in obj constr fields
As structs only support 'constructible types' in their fields anyway,
we can just default initialize all fields the user leaves out in an
object constructor.
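For example (hypothetical user code), `y` below is simply default initialized:
```nim
type
  Params = object
    x: uint32
    y: float32

# `y` is left out, so the generated struct constructor fills it with its
# default value (0.0 here).
let p = Params(x: 5'u32)
```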
* correctly handle `var foo {.constant.}` variables by mapping them to globals
Those are intended for runtime constants, i.e. further storage buffers
in the context of WGSL. In CUDA we'd use `copyToSymbol` to copy the
data to them before execution.
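A minimal usage sketch inside GPU code (identifier and type are illustrative):
```nim
# A runtime constant: becomes an additional storage buffer in WGSL and a
# `__constant__` symbol on CUDA, filled via `copyToSymbol` before execution.
var modulus {.constant.}: BigInt
```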
* move rewriting of compound assignment out of `storagePass`
* implement lifting of struct pointer fields
This gives us the convenience of passing around a struct with a
pointer field, while preserving the restrictions of WebGPU. Because
storage buffers are global anyway, we can "pretend" the
pointers are part of an object and simply replace them by the global
when the fields are used.
We throw a CT error if one tries to assign a non-storage-buffer
pointer to a field.
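A hypothetical example of the convenience this enables (all names illustrative):
```nim
type
  Scratch = object
    # must point at a storage buffer; assigning any other pointer is a CT error
    buf: ptr UncheckedArray[uint32]

proc bump(s: Scratch, i: int) =
  # accesses of `s.buf` are rewritten behind the scenes to accesses of the
  # underlying global storage buffer
  s.buf[i] = s.buf[i] + 1'u32
```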
* add helper to check code for validity
Currently only checks if we assign a pointer to a `var`
variable (which is not allowed in WGSL).
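E.g. (illustrative) the following is now rejected with a CT error:
```nim
proc bad(buf: ptr UncheckedArray[uint32]) =
  # storing a pointer in a `var` variable is not allowed in WGSL
  var alias = buf
  alias[0] = 1'u32
```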
* add support for full Nim generics in the context of GPU code
This means we can finally write something like:
```nim
proc foo[N: static int](ar: array[N, BigInt], ...)
```
We generate generic instantiations for every instantiation the Nim
compiler produces (plus potentially additional ones for every pointer
type argument the user passes into such a function).
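For instance, a sketch with a concrete proc (names illustrative; the naming scheme of the generated instantiations is backend specific):
```nim
# `sum` is instantiated for N = 4 and N = 8, yielding two separate
# GPU functions in the generated code.
proc sum[N: static int](ar: array[N, uint32]): uint32 =
  for x in ar: result += x

var a: array[4, uint32]
var b: array[8, uint32]
discard sum(a)
discard sum(b)
```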
* implement support for type aliases
So far only the code generation for WGSL is done, but the CUDA code
gen would be simple to add.
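E.g. (illustrative; the exact emitted WGSL is an assumption):
```nim
type Digest = array[8, uint32]  # emitted as a WGSL alias, roughly `alias Digest = array<u32, 8>;`
```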
* allow "pulling in" procs from outside `cuda` scope
The generic logic is essentially what we need to pull in a proc
defined outside the scope of the `cuda` macro into the
code: when we encounter a function that is not known to
us yet, we simply look up its implementation from the symbol.
This is the first step towards untying the current `cuda` macro code
from relying on having _everything_ be defined under that macro.
In the future the idea is that essentially only the `{.global.}` procs
strictly need to be defined in the macro. These then pull in
everything they need (i.e. everything that is actually used and has
been checked by the compiler).
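A sketch of the intended usage (invocation style and all names are illustrative):
```nim
proc double(x: uint32): uint32 = x * 2'u32   # defined outside the macro

cuda:
  proc kernel(res: ptr UncheckedArray[uint32]) {.global.} =
    # `double` is unknown inside the macro at this point, so its
    # implementation is looked up from the symbol and pulled in
    res[0] = double(21'u32)
```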
* add pretty printer for GpuType
This is lossy so it is not the default representation.
* store all types in `ctx.types` table, support 'pulling' types...
from outside the `cuda` scope. This is the second step towards making
it so that the `cuda` macro only needs to contain the calling
code (the `{.global.}` procs in general).
* support generic instantiations, producing unique WGSL types
* add `{.builtin.}` pragma intended to specify builtin procs in CUDA/WGSL
This is fundamentally doing the same as `{.nimonly.}`, but for this
purpose the different name makes the intent clearer.
* fix context version for `cuModuleGetGlobal`
* allow compiling in debug mode via `-d:debugCuda`
* ignore `varargs` pragma in procs
* make sure statically sized arrays are copied
As they are passed by pointer in C/C++/CUDA, e.g. a `BigInt state[2]`
decays to `BigInt *state` in a function argument behind the scenes.
* make sure to catch `CUdeviceptr` arguments
After our recent Nvidia API wrapper changes, it is now an alias and
not a distinct type anymore.
* change CUDA codegen to similar style as WGSL
This allows us to make use of Nim's generic instantiations and the
regular procs we 'pick up', as well as types.
In principle it also allows us to do dead code elimination by only
generating what's actually called in the global functions (or single
kernel, if one uses `toGpuAst` first and then manually calls `codegen`).
* improve error message for `Dot` node if not ident/deref in WGSL
* always also inject `num_workgroups` argument into WGSL globals
When workgroups are dispatched in e.g. a 2D grid via
`passEncoder.dispatchWorkgroups(N, M)`, one needs the number of
workgroups to compute the unique thread ID.
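Roughly, the arithmetic this enables; all parameter names below are placeholders for the builtins the backend injects (cf. WGSL `workgroup_id`, `num_workgroups`), not the actual identifiers:
```nim
# Illustrative only: flatten a 2D dispatch of N x M workgroups
# into a unique thread ID.
proc uniqueThreadId(wgIdX, wgIdY, numWgX, workgroupSize, localIdx: uint32): uint32 =
  let wgLinear = wgIdY * numWgX + wgIdX
  wgLinear * workgroupSize + localIdx
```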
* rename WGSL `storagePass` to `preprocess`
* remove `gpuTypeToString`, `genFunctionType` from `backends`
In the end we never used those variants anyway, as the code is fully
separate in their respective submodules now.
* [CUDA] fix indentation for variable declarations
* improve type deduction for types that look generic
I.e. an array type with a constant in place of the array
size (i.e. *not* a `static int` generic argument) will look like a
generic in the sense that depending on how you look at the type with
`getType*`, you will end up seeing `Ident "Foo"` for the constant.
Therefore, we allow array length determination to fail and return
`gtInvalid`. If that happens for a variable or parameter, we
try to determine the type from the symbol of the parameter or
variable. For some reason that tends to have the fully instantiated
type.
However, for *return types* this is not possible, as there is no easy
symbol to look at (unless one were to go into the proc body and look
at the symbol that is being returned). Therefore, if we encounter this
kind of array as a return type (NOTE: most backends don't allow
returning arrays anyway, but that's beside the point), we treat the function
as a generic and look up its type when we encounter it in a `gpuCall`.
Then, the type will be fully instantiated again.
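E.g. a type like the following (illustrative) triggers this:
```nim
const Limbs = 8
type Coeffs = array[Limbs, uint32]
# Depending on the `getType*` variant, the length may show up as
# `Ident "Limbs"` instead of the literal 8, so length determination
# can fail with `gtInvalid`.
```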
* clone `forwardDeclare` in GpuAst
* support tuples by mapping to object types
Tuples are essentially just objects anyway. So we make the
transformation explicit and thus support tuples (including anonymous).
Note: The Nim compiler already transforms tuple unpacking into
temporary variable + statements to assign the fields.
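E.g. (illustrative):
```nim
proc divmod(a, b: uint32): (uint32, uint32) =
  (a div b, a mod b)

# The anonymous tuple type becomes an object type in the generated code;
# the unpacking below is already lowered by the Nim compiler into a
# temporary plus per-field assignments.
let (q, r) = divmod(7'u32, 2'u32)
```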
* support Nim's implicit `result` variable
Finally supports the implicit `result`. :)
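E.g.:
```nim
proc squareAdd(x, y: uint32): uint32 =
  result = x * x
  result += y   # no explicit `return` needed
```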
* first steps towards supporting expressions
This by itself does not do much. But in order to support (statement
list / block) expressions, some more information about whether something is
an expression / whether a call is an expression is going to be needed.
* move `cIsExpr` in clone to correct branch
Whoops
* also catch tuple types from `nnkBracketExpr`
* handle `inline` pragma same as `forceinline`
* fix detection of custom `result` by looking into `gpuBlock`
* fix handling of `skipSemicolon` for nested usage
If nested, we must not reset the `skipSemicolon` back to false once
the inner one is done!
* if last expression _is_ return, we don't need `result` variable
At least not in theory. We _could_ construct a function in which there
_is_ a branch that returns `result` and another that returns something
else, but uhh, I guess we can leave that for later.
* remove debug output
* doc comment updates for procs in gpu_field_ops
* do not emit semicolon in binary operand child nodes
* handle function overloads by using `iSym` if encountered
* ignore `magic` pragma procs
* ignore forward declarations in Nim -> GpuAst
Any forward declaration will also have its regular implementation
somewhere. Otherwise the code is invalid anyway. As we generate our
own forward declarations, we can ignore them.
* add BigInt comparison and `toCanonical` conversion
We could consider renaming it to `fromMont` to match the regular
Constantine naming, but given that here we (currently) only have a
single type to represent both, I picked a different name.
* add FIPS Montgomery multiplication for fields without spare bits
The Goldilocks field does not have any spare bits. As a result using
CIOS leads to the wrong result.
* [cuda] for now emit `constexpr` for a `gpuConstexpr` (i.e. Nim `const`)
See the added TODO. The main point is that if we write things like
```nim
const M = toBigInt(bigintToUint32Limbs(T.getModulus))
```
we want to have the Nim compiler evaluate the RHS at compile time. If
we were to make the user write
```nim
var M {.constant.} = toBigInt(bigintToUint32Limbs(T.getModulus))
```
instead the Nim compiler would evaluate the RHS at runtime.
We _could_ force the user to write
```nim
const M {.constant.} = toBigInt(bigintToUint32Limbs(T.getModulus))
```
instead though.
It is no problem, however, to emit `__constant__` if it's a global and
`constexpr` for a local (where `__constant__` is forbidden in
CUDA anyway). The only minor annoyance is that _maybe_ someone wants to emit
`constexpr` in global scope as well.
_Maybe_ we'll go with a design that does the above by default but
allows you to override it via
```nim
const M {.constant.} = toBigInt(bigintToUint32Limbs(T.getModulus))
```
-> explicitly force `__constant__`
```nim
const M {.constexpr.} = toBigInt(bigintToUint32Limbs(T.getModulus))
```
-> explicitly force `constexpr` instead.
* [cuda] handle generic instantiations and `UncheckedArray`
* [wgsl] do not suffix generic inst types if they have no args
* handle `raises`, `noinit` pragmas
And use normalized strings, because both `noInit` and `noinit` appear
often enough.
* use `getTypeName` for tuple type fields
Otherwise we get a big mess for objects :)
* map `ntyString` to string explicitly in gpu type kinds
Not supported on some backends
* map `ntyUnused2` (`lent T`) to `ptr T`
* generate type names for bracket expressions e.g. generics
* improve generic inst -> gpu type by handling nnkSym
This is not perfect yet (I think). Needs some tests for different
situations. Works in practice for what I've used it for at least.
* overwrite function names of problematic identifiers
E.g. `[]` is not a sensible name on GPU backends, so we rename it to
`get` for example. Note that binary operators likely never appear
there; we need to handle those via `gpuBinOp` (and call
`maybePatchFnName` from there).
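E.g. an overload like the following (illustrative stand-in type and fields):
```nim
type Limbs4 = object
  data: array[4, uint32]

proc `[]`(x: Limbs4, i: int): uint32 = x.data[i]
# `[]` is not a usable identifier on the GPU backends, so the generated
# function gets a patched name, e.g. `get`.
```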
* refactor proc signature parsing & unify adding types to `ctx.types`
NOTE: We'll change the `maybeAddType` code in the future to be done in
`nimToGpuType` directly. However, at the moment that would require too
many changes, as `GpuContext` would have to be added to a whole
bunch of functions that all call `nimToGpuType` internally.
* handle recursive calls in GPU code
Previously, because we parse procs whenever they are called, we
would recurse infinitely in the parsing logic. We now record the
function signature and function identifier so that we can avoid that.
We need the function signature to get information about the return
type before we actually start parsing a function. Otherwise _inside_
of the recursive function we wouldn't be able to determine the return
type at the callsite of the recursive call (the initial parse, which
would fill the proc into `allFnTab`, hasn't been completed at that point yet).
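A minimal example that previously sent the parser into infinite recursion (illustrative):
```nim
proc fact(n: uint32): uint32 =
  # Parsing `fact` encounters the call to `fact` itself; the recorded
  # signature now provides the return type before the full parse completes.
  if n <= 1'u32: 1'u32
  else: n * fact(n - 1'u32)
```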
* fix `modadd` in debug builds on CUDA
In a debug build (`-d:debugCuda`) modular addition produced off-by-one
errors.
As it turned out the problem was our calculation of `overflowedLimbs`
inside of `finalSubMayOverflow`. The carry flag set in the last
`add_cio` call in `modadd` does not reliably survive into the function
call in a debug build. We now compute it directly after the last
`add_cio` call in `modadd` and simply pass it as an argument. This way
the arithmetic also works reliably in a debug build.
* fix reassignment of `types` in RT codegen for aliases
* handle `nnkObjConstr` correctly when encountering gtGenericInst
* do not emit `f` suffix for float literals
Nim float literals already come with an `f` suffix when converted to strings.
* [wgsl] append `;` for aliases
* rewrite infix as `gpuCall` if arguments not basic types
The idea is that if the arguments are not basic types, we will need a
custom function to perform the infix operation.
Most backends, however, do not support custom operators, so an actual
`gpuBinOp` is not valid for custom types in general. Hence, we
rewrite them as `gpuCall` nodes with non-symbol based naming.
NOTE: Currently this does not handle the case where we use an
inbuilt type like `vec3`, whose infix operators _may_ be defined after all.
We need to find an elegant solution for that, either by checking whether
the arguments are of basic types (like in this commit) or whether they are
of a type annotated with `{.builtin.}`.
Alternatively, we could force the user to define operators for such
inbuilt types (i.e. wrap them); then, if there is a wrapper marked
`{.builtin.}`, we don't replace the infix by a `gpuCall` either.
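To illustrate the main rewrite (stand-in type; the generated call name is an assumption):
```nim
type Fp = object                  # illustrative stand-in for a custom field type
  limbs: array[8, uint32]

proc `+`(a, b: Fp): Fp = discard  # assume a user-defined overload exists

var a, b: Fp
# `a + b` involves a non-basic type, so instead of emitting a `gpuBinOp`
# the infix is rewritten into a `gpuCall`, conceptually `add(a, b)`.
let c = a + b
```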
* support arbitrary values in array literals
* handle generic instantiation with actual initializations
I.e. instead of just:
```nim
Vec3[float32]()
```
being:
```
ObjConstr
BracketExpr
Sym "Vec3"
Sym "float32"
```
Also handle the case of:
```
Vec3[float32](limbs: [1'f32, 1'f32, 1'f32])
```
being:
```
ObjConstr
BracketExpr
Sym "Vec3"
Sym "float32"
ExprColonExpr
Sym "limbs"
Bracket
Float32Lit 1.0
Float32Lit 1.0
Float32Lit 1.0
```
* fix top level type definitions
Because we still had `farmTopLevel` adding a (now empty) element
to `globalBlocks`, our array access to `ctx.globalBlocks[0]` to
generate the types didn't actually emit anything.
* [cuda] add pass to strip `deref` if found inside `index` for pointers
But only strip if it is not a pointer to an array type!
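Roughly the situation (sketch only, `p` is nil here; the typed-AST shape described in the comment is an assumption):
```nim
var p: ptr UncheckedArray[uint32]
# In the typed AST, the index `p[3]` carries a (hidden) dereference of `p`;
# the pass drops it so the emitted CUDA reads `p[3]` rather than `(*p)[3]`.
# For a pointer *to an array type* the deref is kept.
discard p[3]
```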
* remove old comment
* better handle replacement of derefs
This allows us to make a better choice about when to replace and when
not to replace.
* remove Nim gensym'd suffix from `tmpTuple` variables
I.e. variables that correspond to tuple unpacking in Nim
* replace single letter strings by characters
* extend `maybeAddType` to find types behind Ptr/UA/Array
* move all `builtins` into separate files, one for each backend
* update doc comment of `cuda` macro
* improve explanation of `typeOfIndex` in CUDA's `getType` helper
15 files changed: +1922, −488 lines, under constantine/math_compiler/experimental (backends, builtins) and constantine/platforms/abis.