This repository was archived by the owner on Oct 20, 2024. It is now read-only.
Fix a crash in -[PSWebSocket connect]#80
Open
eunikolsky wants to merge 1 commit intozwopple:developfrom
Open
Conversation
A crash report of an application using `PocketSocket` shows this stacktrace: ``` EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x0000000000000030 Crashed: PSWebSocket <http://1.2.3.4:5/path> 0 CoreFoundation 0x88885e9e0 _CFStreamScheduleWithRunLoop + 60 1 CoreFoundation 0x8888dda34 _CFStreamSetDispatchQueue + 236 2 PocketSocket 0x888803114 -[PSWebSocket connect] (PSWebSocket.m:341) 3 libdispatch.dylib 0x888818aa0 _dispatch_call_block_and_release + 24 … ``` Line 341 is `if(_inputStream.streamStatus == NSStreamStatusNotOpen) {` — it's strange that it would call `CFStreamSetDispatchQueue`, so the line number is probably incorrect. The crash could be reproduced by foregrounding and backgrounding the app quickly (in ~1 second) a few times; it happened on line 338: `CFWriteStreamSetDispatchQueue((__bridge CFWriteStreamRef)_outputStream, _workQueue);`, which seems more correct. The analysis of the disassembled `_CFStreamScheduleWithRunLoop` showed the line `0x1fbc01e58 <+60>: ldr x28, [x19, #0x30]` and the exception was `Thread 35: EXC_BAD_ACCESS (code=1, address=0x30)`, so the `x19` register must be zero. `x19` is set to `x0` a few instructions above, where `x0` is the first argument (according to the ["Procedure Call Standard for the ARM 64-bit Architecture"][arm64]). This pointed to and confirmed the problem: `CFWriteStreamSetDispatchQueue` crashes when the `stream` parameter is `nil`. So when the socket is disconnected in the middle of `-connect`, `_inputStream` and/or `_outputStream` become `nil` and cause the crash. The fix is to check if the streams are no longer valid, and if so, stop. To avoid a race condition when the streams may become `nil` right after the check, they are retained strongly for the rest of the `-connect` method. A unit test to reproduce the crash is possible (setup a server socket, then connect to and disconnect from it without a delay multiple times (~500) in a loop), but not included here because: * it's very unreliable and ugly — reproduced this crash in ~5% of runs and required as much CPU usage as possible to slow down the socket processing to mess with the concurrency; * it caused at least three other types of crashes during the stress test; * it required changes in the `-connect` method to add artificial delays to force the crash. [arm64]: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055b/IHI0055B_aapcs64.pdf
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
A crash report of an application using
PocketSocketshows thisstacktrace:
Line 341 is
if(_inputStream.streamStatus == NSStreamStatusNotOpen) {—it's strange that it would call
CFStreamSetDispatchQueue, so the linenumber is probably incorrect. The crash could be reproduced by
foregrounding and backgrounding the app quickly (in ~1 second) a few
times; it happened on line 338:
CFWriteStreamSetDispatchQueue((__bridge CFWriteStreamRef)_outputStream, _workQueue);, which seems more correct.The analysis of the disassembled
_CFStreamScheduleWithRunLoopshowedthe line
0x1fbc01e58 <+60>: ldr x28, [x19, #0x30]and theexception was
Thread 35: EXC_BAD_ACCESS (code=1, address=0x30), so thex19register must be zero.x19is set tox0a few instructionsabove, where
x0is the first argument (according to the "ProcedureCall Standard for the ARM 64-bit Architecture"). This pointed to
and confirmed the problem:
CFWriteStreamSetDispatchQueuecrashes whenthe
streamparameter isnil. So when the socket is disconnected inthe middle of
-connect,_inputStreamand/or_outputStreambecomeniland cause the crash.The fix is to check if the streams are no longer valid, and if so, stop.
To avoid a race condition when the streams may become
nilright afterthe check, they are retained strongly for the rest of the
-connectmethod.
A unit test to reproduce the crash is possible (setup a server socket,
then connect to and disconnect from it without a delay multiple times
(~500) in a loop), but not included here because:
it's very unreliable and ugly — reproduced this crash in ~5% of runs
and required as much CPU usage as possible to slow down the socket
processing to mess with the concurrency;
it caused at least three other types of crashes during the stress
test;
it required changes in the
-connectmethod to add artificial delaysto force the crash.