Add Test Suite for Long Running Tests #314
Conversation
yacovm left a comment:
Made a quick pass; I think it's pretty useful overall. We can build something on top of this that generates random test cases and just runs them.
Will make another pass later.
	}
}

func (n *LongRunningInMemoryNetwork) waitUntilAllRoundsEqual() {
Can't this function return even if no blocks were committed, e.g. at round 0?
Shouldn't we perhaps pass in some kind of predicate on the round / sequence?
Hmm, I'm using it as more of a helper function for StopAndAssert, but I can see a predicate being helpful if we decide to expose this function.
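For illustration, a minimal sketch of what a predicate-based variant could look like if the helper is ever exposed. The `Round()` accessor, the polling loop, and the method name are assumptions, not the PR's actual code:

```go
// Sketch only: a predicate-driven variant of waitUntilAllRoundsEqual.
// Assumes each instance exposes a Round() accessor (hypothetical) and that
// polling with a short sleep is acceptable in the test harness.
func (n *LongRunningInMemoryNetwork) waitUntilAll(pred func(round uint64) bool) {
	for {
		allSatisfied := true
		for _, instance := range n.Instances {
			if !pred(instance.Round()) { // hypothetical accessor
				allSatisfied = false
				break
			}
		}
		if allSatisfied {
			return
		}
		time.Sleep(10 * time.Millisecond)
	}
}

// Example: do not return at round 0 — only once every node has advanced past it.
// n.waitUntilAll(func(round uint64) bool { return round > 0 })
```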
}

// AssertHealthy checks that the WAL has at most one of each record type per round.
func (tw *TestWAL) AssertHealthy(bd simplex.BlockDeserializer, qcd simplex.QCDeserializer) {
Can't we have both a notarization and an empty notarization in the WAL, via replicating one of them?
Yeah, we can have one of each for the same round, but we should never have two records of the same type for the same round.
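The invariant under discussion can be sketched as follows. This is illustrative only: `walRecord` and its fields are stand-ins for whatever the TestWAL yields after deserialization via the simplex deserializers, and the helper name is hypothetical.

```go
// walRecord is an illustrative stand-in for a deserialized WAL entry.
type walRecord struct {
	Round uint64
	Type  string // e.g. "notarization", "emptyNotarization", "finalization"
}

type roundAndType struct {
	round      uint64
	recordType string
}

// assertAtMostOnePerType fails if any (round, record type) pair appears more
// than once; a notarization and an empty notarization for the same round are
// therefore still allowed, matching the exchange above.
func assertAtMostOnePerType(t *testing.T, records []walRecord) {
	seen := make(map[roundAndType]struct{})
	for _, r := range records {
		key := roundAndType{round: r.Round, recordType: r.Type}
		if _, ok := seen[key]; ok {
			t.Fatalf("duplicate %s record for round %d", r.Type, r.Round)
		}
		seen[key] = struct{}{}
	}
}
```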
long_running_test.go (Outdated)
func TestLongRunningReplication(t *testing.T) {
	net := testutil.NewDefaultLongRunningNetwork(t, 10)
	for _, instance := range net.Instances {
		instance.SilenceExceptKeywords("Received replication response", "Resending replication requests for missing rounds/sequences")
I don't understand why we're doing this. There is nothing in the test that I see that requires intercepting the log, so why do we care that it's printed?
Can you explain?
I'll remove it. It was there to help debug the test, same with the other comment you had below.
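For context, the idea behind a keyword filter like SilenceExceptKeywords can be sketched as below. The struct, field names, and `Log` signature are assumptions for illustration; the real helper lives on the test instance/logger in the PR.

```go
// Minimal sketch of keyword-based log silencing for a test logger.
type filteringLogger struct {
	silenced bool
	keywords []string
	inner    *log.Logger
}

// SilenceExceptKeywords drops every log line that does not contain one of the keywords.
func (l *filteringLogger) SilenceExceptKeywords(keywords ...string) {
	l.silenced = true
	l.keywords = keywords
}

func (l *filteringLogger) Log(msg string) {
	if !l.silenced {
		l.inner.Println(msg)
		return
	}
	for _, kw := range l.keywords {
		if strings.Contains(msg, kw) {
			l.inner.Println(msg)
			return
		}
	}
	// everything else is dropped while silenced
}
```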
	l *TestLogger
	t *testing.T
	BB ControlledBlockBuilder
	messageTypesSent map[string]uint64
We're only incrementing the message type counters but never reading them... why do we need this?
I accidentally left the code that prints messageTypesSent commented out. Relates to this comment (#314 (comment)); it was useful to print the message types for debugging purposes.
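A sketch of how such a counter can be used for debugging is below. `testInstance`, `recordSend`, and `messageType()` are hypothetical names standing in for the harness's instance type and a helper that names the populated field of a simplex.Message; only the `messageTypesSent` and `t` fields come from the diff above.

```go
// recordSend bumps the per-type counter every time a message is handed to the
// in-memory network (delivery itself omitted in this sketch).
func (ti *testInstance) recordSend(msg *simplex.Message) {
	ti.messageTypesSent[messageType(msg)]++
}

// logMessageCounts dumps the counters, which is handy when debugging
// replication traffic in a long-running test.
func (ti *testInstance) logMessageCounts() {
	for msgType, count := range ti.messageTypesSent {
		ti.t.Logf("sent %d %q messages", count, msgType)
	}
}
```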
	msg *simplex.Message
	from simplex.NodeID
-	}, 1000)}
+	}, 100000)}
Do the tests not work with the previous buffer size?
Some tests were getting flaky at 1000, so I bumped it.
How is that possible? We increment the time only 10 times per second 🤷♂️
}

amount := simplex.DefaultEmptyVoteRebroadcastTimeout / 5
go n.UpdateTime(100*time.Millisecond, amount)
UpdateTime iterates over the instances without taking a lock, but we may re-create an instance via RestartNodes. Isn't that unsafe from a concurrency standpoint?
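One way to remove that race would be to guard the instance slice with a mutex shared by UpdateTime and RestartNodes; a hedged sketch follows. The `lock` field, the `stop` channel, and the per-instance `AdvanceTime` method are assumptions about the harness, not the PR's actual code.

```go
// Sketch: advance time on a ticker while holding a read lock, so a concurrent
// RestartNodes (which would take the write lock) cannot swap instances mid-iteration.
func (n *LongRunningInMemoryNetwork) UpdateTime(interval, amount time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-n.stop: // hypothetical channel closed when the network shuts down
			return
		case <-ticker.C:
			n.lock.RLock() // hypothetical sync.RWMutex protecting n.Instances
			for _, instance := range n.Instances {
				instance.AdvanceTime(amount) // hypothetical per-instance clock bump
			}
			n.lock.RUnlock()
		}
	}
}
```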
}

func (n *LongRunningInMemoryNetwork) RestartNodes(nodeIndexes ...uint64) {
	for _, idx := range nodeIndexes {
Don't we need to stop the previous instance? I can't find where we're doing it.
CrashNodes calls Stop on the instances. I guess there is currently no way for the epoch to know whether we have already called Stop on it. Should the LongRunningInMemoryNetwork struct keep track of whether we already stopped an instance and check it here?
Just stop the nodes in the restart. You never restart the nodes alongside a crash, so it should be fine.
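The suggested fix can be sketched like this: stop the old instance inside RestartNodes before creating its replacement. `Stop`, `newInstance`, and the `lock` field are assumptions about the surrounding harness, not the PR's exact code.

```go
// Sketch: shut down the previous epoch before swapping in a fresh instance,
// holding the (hypothetical) write lock so UpdateTime cannot observe a
// half-replaced slice entry.
func (n *LongRunningInMemoryNetwork) RestartNodes(nodeIndexes ...uint64) {
	n.lock.Lock()
	defer n.lock.Unlock()
	for _, idx := range nodeIndexes {
		n.Instances[idx].Stop()               // stop the previous instance first
		n.Instances[idx] = n.newInstance(idx) // hypothetical: rebuild from the same config/WAL
	}
}
```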
This PR adds functionality for a different style of testing. It allows the tester to spin up tests where they control the behavior of the network rather than the behavior of individual nodes. This framework can test Simplex in ways that would otherwise have been too tedious and cumbersome to cover. For example, a new, simple test allows us to spin up a network of 10 nodes, wait for them to enter a specific round, disconnect a few of them, and then reconnect them at a later time. At the end of the test, we assert that all nodes are functioning properly.
Before, this would have required a tremendous amount of boilerplate code and careful orchestration of replication, block building, etc.
This new struct wraps InMemNetwork, so all of the previous InMemNetwork functionality can still be used in these tests; on top of that, we add an additional set of helper functions.
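To illustrate the testing style described above, here is a hedged sketch of such a test. NewDefaultLongRunningNetwork, CrashNodes, RestartNodes, and StopAndAssert appear in the PR discussion, but their exact signatures are not confirmed here; `WaitUntilRound` is a hypothetical helper used to express "wait for all nodes to reach a specific round".

```go
// Illustrative sketch only; helper names and signatures are assumptions.
func TestDisconnectAndReconnect(t *testing.T) {
	net := testutil.NewDefaultLongRunningNetwork(t, 10) // spin up a 10-node network

	net.WaitUntilRound(5)     // hypothetical: wait until every node reaches round 5
	net.CrashNodes(7, 8, 9)   // disconnect a few nodes
	net.WaitUntilRound(15)    // let the remaining nodes keep making progress
	net.RestartNodes(7, 8, 9) // reconnect them; they should catch up via replication

	net.StopAndAssert() // assert all nodes are healthy and agree on what was committed
}
```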