[InferAtomsPass] Instruction scheduling #2

robertzhidealx · 2023-12-19T01:30:06Z

Optimizing atomic regions for (smaller) size.

It's now necessary to have a complete picture of which instructions are tainted and which aren't (whereas before we really only needed to know the boundaries of a region).
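As a rough illustration of what a "complete picture" of taint could look like, the sketch below marks every instruction in a straight-line block as tainted or untainted by iterating to a fixed point. `TaintInst` and `computeTaint` are hypothetical names for this example only, not part of the pass, and memory slots are tracked flow-insensitively, which is coarser than a real analysis would be.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction (not an LLVM type).
struct TaintInst {
    std::string def;                // value or memory slot written; empty if none
    std::vector<std::string> uses;  // values or memory slots read
    bool isInput;                   // true for the annotated IO call
};

// Returns one flag per instruction: true if it is the input call or
// (transitively) reads something the input call produced.
std::vector<bool> computeTaint(const std::vector<TaintInst>& insts) {
    std::vector<bool> tainted(insts.size(), false);
    std::set<std::string> taintedNames;
    bool changed = true;
    while (changed) {  // iterate until no new instruction becomes tainted
        changed = false;
        for (std::size_t i = 0; i < insts.size(); ++i) {
            if (tainted[i]) continue;
            bool hit = insts[i].isInput;
            for (const auto& use : insts[i].uses)
                hit = hit || taintedNames.count(use) > 0;
            if (!hit) continue;
            tainted[i] = true;
            if (!insts[i].def.empty()) taintedNames.insert(insts[i].def);
            changed = true;
        }
    }
    return tainted;
}
```

With per-instruction flags like these, a scheduler can tell which instructions between the region boundaries are candidates for moving out, rather than only knowing where the region starts and ends.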

Test plan: make eg3 for an example where the freshness atomic region size is reduced (quite substantially) thanks to the optimization.

Before optimization:

define void @app() #0 {
entry:
  %x = alloca i32, align 4
  %y = alloca i32, align 4
  %z = alloca i32, align 4
  call void @atomic_start()         ; <--- START
  %call = call i32 @input()
  store i32 %call, ptr %x, align 4
  store i32 1, ptr %y, align 4
  %0 = load i32, ptr %y, align 4
  %add = add nsw i32 %0, 1
  store i32 %add, ptr %z, align 4
  %1 = load i32, ptr %z, align 4
  call void @log(i32 noundef %1)
  %2 = load i32, ptr %x, align 4
  call void @log(i32 noundef %2)
  call void @atomic_end()           ; <--- END
  ret void
}

After optimization (example03.ll):

define void @app() #0 {
entry:
  %x = alloca i32, align 4
  %y = alloca i32, align 4
  %z = alloca i32, align 4
  call void @atomic_start()         ; <--- START
  %call = call i32 @input()
  store i32 %call, ptr %x, align 4
  %0 = load i32, ptr %x, align 4
  call void @log(i32 noundef %0)
  call void @atomic_end()           ; <--- END
  store i32 1, ptr %y, align 4
  %1 = load i32, ptr %y, align 4
  %2 = add nsw i32 %1, 1
  store i32 %2, ptr %z, align 4
  %3 = load i32, ptr %z, align 4
  call void @log(i32 noundef %3)
  ret void
}

For an example of an optimization of FreshConsistent regions, check out example04.*, or run the example yourself via make eg4/make run_eg4.

Loops are also now optimized. Untainted loop instructions are extracted into their own loop, not to be wrapped in the atomic region, thereby reducing its size. Check out example05.* and example07.* for example transformations.

Considerable care was taken to soundly clone and rewire the basic blocks that make up each loop, as the example IR and the code show. The approach is nonetheless fairly simple overall: the cloning logic is straightforward, and special bookkeeping is needed mostly for the blocks that check the loop condition and update the loop variable.

…tor code

Below are the key changes:

- Use LLVM's new pass manager, a major improvement over the legacy one.
- Fix a shortcoming of the inference algorithm to actually collect all uses of a fresh/consistent variable.
- Optimize the inference cleanup algorithm to remove all instructions associated with the arguments of fresh/consistent annotations.
- Thoroughly log debug messages throughout the components of the pass for a clearer view of the process.
- Rename files, structs, functions, variables, etc. to be more descriptive and consistent.
- General code style refactoring (e.g., use `auto` and structured bindings (destructuring) where possible).
- Add simple C tests to `benchmarks/ctests`.

Useful extensible shortcuts to running tests.

Step 1 of optimizing atomic regions for (smaller) size. In essence, it's now necessary to have a complete picture of which instructions are tainted (whereas before we really only needed to know the boundaries of a region). Test plan: `make eg3` for an example where the freshness atomic region size is reduced thanks to the optimization.

robertzhidealx · 2023-12-19T17:54:02Z

ocelot/AtomicRegionInference/src/InferFreshCons.cpp

+    errs() << "[regionsNeeded] Go over all block insts\n";
+#endif
+    std::set<BasicBlock*> seenBlocks;
+    for (auto& [_, B] : blocks) {


Instruction scheduling starts here.


The optimization is now much more robust against general source programs. Freshness annotations now work pretty well!

The main fix to the previous setup involves a mapping from old instructions to cloned ones. Since cloning an instruction (e.g., BinaryOperator) doesn't automatically clone its operands, this mapping is required to replace the operands of cloned instructions with the clones of those operands. Cloning is the only approach to such replacements due to the LLVM IR being in SSA form.

Test plan: Run examples01/02/03 to see the transformations. For example,

```sh
make eg3
```

Before optimization:

```llvm
define void @app() #0 {
entry:
  %x = alloca i32, align 4
  %y = alloca i32, align 4
  %z = alloca i32, align 4
  call void @atomic_start()         ; <--- START
  %call = call i32 @input()
  store i32 %call, ptr %x, align 4
  store i32 1, ptr %y, align 4
  %0 = load i32, ptr %y, align 4
  %add = add nsw i32 %0, 1
  store i32 %add, ptr %z, align 4
  %1 = load i32, ptr %z, align 4
  call void @log(i32 noundef %1)
  %2 = load i32, ptr %x, align 4
  call void @log(i32 noundef %2)
  call void @atomic_end()           ; <--- END
  ret void
}
```

After optimization:

```llvm
define void @app() #0 {
entry:
  %x = alloca i32, align 4
  %y = alloca i32, align 4
  %z = alloca i32, align 4
  call void @atomic_start()         ; <--- START
  %call = call i32 @input()
  store i32 %call, ptr %x, align 4
  %0 = load i32, ptr %x, align 4
  call void @log(i32 noundef %0)
  call void @atomic_end()           ; <--- END
  store i32 1, ptr %y, align 4
  %1 = load i32, ptr %y, align 4
  %2 = add nsw i32 %1, 1
  store i32 %2, ptr %z, align 4
  %3 = load i32, ptr %z, align 4
  call void @log(i32 noundef %3)
  ret void
}
```

You may also link, build, and run an executable via:

```sh
make run_eg3 && ../../benchmarks/ctests/example03.out
```
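The old-to-clone mapping described in this commit can be sketched without LLVM as a two-pass procedure: clone every instruction first, then patch operands through the map. The types and the function name below (`SsaInst`, `SsaBlock`, `cloneAndRemap`) are hypothetical stand-ins chosen for this example, not the pass's actual code.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an SSA instruction: operands point at the
// instructions that define them.
struct SsaInst {
    std::string opcode;
    std::vector<const SsaInst*> operands;
};

using SsaBlock = std::vector<std::unique_ptr<SsaInst>>;

// Clone every instruction in `block`, then rewrite the clones' operands so
// that references to original instructions point at their clones instead.
SsaBlock cloneAndRemap(const SsaBlock& block) {
    SsaBlock clones;
    std::map<const SsaInst*, const SsaInst*> oldToNew;

    // Pass 1: shallow clones. Operands still refer to the originals.
    for (const auto& inst : block) {
        clones.push_back(std::make_unique<SsaInst>(*inst));
        oldToNew[inst.get()] = clones.back().get();
    }
    assert(oldToNew.size() == block.size());

    // Pass 2: patch operands through the map. Operands defined outside the
    // cloned block have no entry and are left untouched.
    for (auto& clone : clones) {
        for (auto& operand : clone->operands) {
            auto it = oldToNew.find(operand);
            if (it != oldToNew.end()) operand = it->second;
        }
    }
    return clones;
}
```

Doing the remap as a separate second pass means an operand is patched correctly regardless of where its definition sits in the block, which matters once loops introduce values that are used before they are defined in block order.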

...by moving non-IO instructions out of regions.

…regions Mostly working, except optimizations done on a FreshConsistent region need to converge back into a single (nested) region.


…ation

When a variable has both freshness and consistency constraints, the overlap between the optimized inferred atomic regions is now properly handled by nesting them such that only the outermost bounds count. See benchmarks/ctests/example04.ll for an example.

Before:

```llvm
define void @app() #0 {
entry:
  %x = alloca i32, align 4
  %y = alloca i32, align 4
  call void @atomic_start()         ; <-- OUTER START
  %call = call i32 @input()
  store i32 %call, ptr %x, align 4
  call void @atomic_start()         ; <-- INNER START
  %call1 = call i32 @input()
  call void @atomic_end()           ; <-- INNER END
  store i32 %call1, ptr %y, align 4
  %0 = load i32, ptr %x, align 4
  call void @log(i32 noundef %0)
  %1 = load i32, ptr %y, align 4
  call void @log(i32 noundef %1)
  call void @atomic_end()           ; <-- OUTER END
  ret void
}
```

After:

```llvm
define void @app() #0 {
entry:
  %x = alloca i32, align 4
  %y = alloca i32, align 4
  call void @atomic_start()         ; <-- OUTER START
  %call = call i32 @input()
  call void @atomic_start()         ; <-- INNER START
  %call1 = call i32 @input()
  call void @atomic_end()           ; <-- INNER END
  store i32 %call1, ptr %y, align 4
  %0 = load i32, ptr %y, align 4
  call void @log(i32 noundef %0)
  call void @atomic_end()           ; <-- OUTER END
  store i32 %call, ptr %x, align 4
  %1 = load i32, ptr %x, align 4
  call void @log(i32 noundef %1)
  ret void
}
```
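One way to read "only the outermost bounds count" is as a depth counter over the start/end markers. The sketch below is an assumption about that interpretation, not code from the pass or its runtime; `Marker` and `outermostRegions` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical classification of an instruction in a linear stream.
enum class Marker { Start, End, Other };

// Returns the [start, end] index pairs of the outermost regions. Nested
// Start/End pairs only adjust the depth and never open or close a region.
std::vector<std::pair<std::size_t, std::size_t>>
outermostRegions(const std::vector<Marker>& stream) {
    std::vector<std::pair<std::size_t, std::size_t>> regions;
    std::size_t depth = 0;
    std::size_t openedAt = 0;
    for (std::size_t i = 0; i < stream.size(); ++i) {
        if (stream[i] == Marker::Start) {
            if (depth == 0) openedAt = i;  // outermost start
            ++depth;
        } else if (stream[i] == Marker::End) {
            assert(depth > 0 && "atomic_end without matching atomic_start");
            --depth;
            if (depth == 0) regions.emplace_back(openedAt, i);  // outermost end
        }
    }
    assert(depth == 0 && "unterminated atomic region");
    return regions;
}
```

Applied to the "After" listing above, the inner start/end pair is absorbed and a single region spanning the outer markers is reported.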


…re optimizing loops

One objective as of now is to make optimizations even more robust by supporting more corner cases. For an example where the IO function is `input(int i)` (`benchmarks/ctests/example06.c`), optimizations shouldn't incorrectly delay the instructions related to the argument `i`, and should instead produce:

```llvm
define void @app() #0 {
entry:
  %i = alloca i32, align 4
  %x = alloca i32, align 4
  store i32 1, ptr %i, align 4              ; <--
  %0 = load i32, ptr %i, align 4            ; <--
  call void @atomic_start()
  %call = call i32 @input(i32 noundef %0)   ; <-- DEPENDS ON THE ABOVE
  store i32 %call, ptr %x, align 4
  %1 = load i32, ptr %x, align 4
  call void @log(i32 noundef %1)
  call void @atomic_end()
  ret void
}
```

As for loop optimizations, unlike WARio (which targets checkpointing runtimes), loop unrolling (i.e., creating multiple smaller copies of the loop) doesn't help in atomic region inference, since these loops must still be in the same region. Thus, the "costliness" of the region won't be lessened.

There are optimizations to be done though. For instance, loops entirely untainted by inputs under constraint(s) can be delayed and moved out of atomic regions just like many other instructions can. The difficulty with this part lies in rewiring the complex branching/connections among the basic blocks that form these loops, making an optimizing analysis harder to devise. `benchmarks/ctests/example05` illustrates an instance where the optimization above applies. I will be working on this as a next step.
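The requirement that instructions feeding the IO call's arguments are not delayed can be illustrated with a backward walk over a straight-line block. This is a minimal sketch under simplifying assumptions (no control flow, memory slots modelled as named values); `DepInst` and `dependenciesOfCall` are hypothetical names, not the pass's API.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR instruction (not an LLVM type).
struct DepInst {
    std::string def;                // value or memory slot written; empty if none
    std::vector<std::string> uses;  // values or memory slots read
};

// Indices of instructions before `callIdx` that the call at `callIdx`
// transitively depends on; these must not be moved after the call.
std::set<std::size_t> dependenciesOfCall(const std::vector<DepInst>& insts,
                                         std::size_t callIdx) {
    assert(callIdx < insts.size());
    std::set<std::size_t> deps;
    std::set<std::string> needed(insts[callIdx].uses.begin(),
                                 insts[callIdx].uses.end());
    // Walk backwards so each definition is visited after all of its later users.
    for (std::size_t i = callIdx; i-- > 0;) {
        if (insts[i].def.empty() || needed.count(insts[i].def) == 0) continue;
        deps.insert(i);
        needed.insert(insts[i].uses.begin(), insts[i].uses.end());
    }
    return deps;
}
```

For the `example06` listing above, modelling the store as defining a slot and the load as reading it yields exactly the two instructions marked with arrows.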

Extract untainted instructions into their own loop that doesn't go into the atomic region. Test plan: `make eg5` and observe the difference between `benchmarks/ctests/example05.ll` (optimized) and `benchmarks/ctests/example05.orig.ll` (original), or `make eg7`.

…and refactoring for concision

Now, the instructions "tainted" by an IO call will be included in the fresh set as well, making it so that they remain preceding the IO call, within their atomic region. This is a more fundamental solution than before, where exceptions were only made to these instructions during optimization.

The optimization now has a more modular structure where common instruction patching logic is extracted into a reusable procedure to be run more than once (`Helpers::patchClonedBlock`). It comes into play after cloning a basic block, to rewire its instructions to properly reference each other.

Test plan: `make`

In the case of loop conditions that depend on fresh/consistent input values, no instruction in the loop body can be extracted out from the atomic region, as shown in the example below:

```rust
fn app() -> () {
    let x = input();
    for _ in 0..10 {
        let y = 1;
        log(y + 2);
        log(x);
    }
    Fresh(x);
}
```

Test plan: `make eg8`
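The rule can be summarised as a partition of the loop body that collapses when the condition is tainted. The sketch below assumes taint flags have already been computed elsewhere; `LoopInst`, `SplitLoop`, and `splitLoopBody` are hypothetical names for illustration and do not model the basic-block cloning and rewiring the pass performs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical loop-body instruction with a precomputed taint flag.
struct LoopInst {
    std::string text;
    bool tainted;
};

struct SplitLoop {
    std::vector<std::string> inRegion;   // stays in the loop inside the atomic region
    std::vector<std::string> extracted;  // moves to a separate loop outside it
};

// Partition a loop body, preserving instruction order within each part.
SplitLoop splitLoopBody(const std::vector<LoopInst>& body, bool conditionTainted) {
    SplitLoop result;
    for (const auto& inst : body) {
        // A tainted loop condition pins the whole body inside the region.
        if (conditionTainted || inst.tainted)
            result.inRegion.push_back(inst.text);
        else
            result.extracted.push_back(inst.text);
    }
    assert(result.inRegion.size() + result.extracted.size() == body.size());
    return result;
}
```

When `conditionTainted` is true, `extracted` is always empty, which is the "nothing can be extracted" case described above.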

Fix an issue with extracting IO functions from source code. Add several tests, including a few in Rust.

Only small changes are required for the optimization to work on Rust programs involving loops. See tests `example.rs`, `example11.rs`, and `example12.rs`.

robertzhidealx added 4 commits November 21, 2023 17:52

[InferAtomsPass] Makefile to simplify testing

c7fd8d0


[InferAtomsPass] Unignore .ll files in ctests and add more comments

cde1b66

robertzhidealx force-pushed the optimizations branch from 6083f0e to cde1b66 Compare December 19, 2023 03:44

robertzhidealx commented Dec 19, 2023


robertzhidealx force-pushed the optimizations branch from f531af3 to c772992 Compare January 31, 2024 04:11

robertzhidealx added 6 commits February 3, 2024 18:33

[InferAtomsPass] More code cleanup & debug logging

7d71c88

[InferAtomsPass] Minimize consistent atomic regions

7c01f9a


[WIP][InferAtomsPass] Optimize Consistent and FreshConsistent atomic …

e7b9bfa


[InferAtomsPass] Demo Consistent region optimization

ebc7cc7

robertzhidealx force-pushed the optimizations branch from 0907a4a to 6859d35 Compare February 13, 2024 03:26

robertzhidealx force-pushed the optimizations branch 4 times, most recently from f5316b2 to f56c010 Compare March 10, 2024 23:23

robertzhidealx force-pushed the optimizations branch from f56c010 to b8b0037 Compare March 11, 2024 02:04

robertzhidealx added 5 commits March 11, 2024 19:05

[InferAtomsPass] More tests and working impl. for some Rust programs

c02ae2c


[InferAtomsPass] Rename unit test folder to "tests"

fd5d5b7

[InferAtomsPass] Slight tweak to support Rust programs with loops

883cb9c


robertzhidealx force-pushed the optimizations branch from f04cba1 to 883cb9c Compare March 16, 2024 03:09

[InferAtomsPass] More Rust loop tests

ca0e667

robertzhidealx changed the title ~~[WIP][InferAtomsPass] Instruction scheduling~~ [InferAtomsPass] Instruction scheduling Mar 18, 2024
