Add --stream-logs flag to run-errand for live log tailing via SSH#713
Add --stream-logs flag to run-errand for live log tailing via SSH#713abg wants to merge 2 commits intocloudfoundry:mainfrom
Conversation
5733a01 to
f3c0e0a
Compare
f3c0e0a to
263088b
Compare
When --stream-logs is set, the errand is started asynchronously and
task events are polled to discover running instances. SSH sessions
are established to each instance to tail errand log files in real
time, interleaved with the standard task event output.
Requirements:
- The CLI must have SSH access to the BOSH VMs where errands run
(directly or via a gateway configured with --gw-* flags).
- Errands must write output to log files under /var/vcap/sys/log/.
By default, the CLI tails /var/vcap/sys/log/<errand>/<errand>.{stdout,stderr}.log.
Errands that write to non-standard paths will not have their output
streamed unless --stream-log-path is specified.
- Use --stream-log-path <path> to override the default log location,
e.g. --stream-log-path "my-job/*.log". The path is relative to
/var/vcap/sys/log/ and supports shell glob and brace expansion.
263088b to
1850e8d
Compare
selzoc
left a comment
There was a problem hiding this comment.
This is quite the Rube Goldberg machine! However, without an involved refactoring of how errands work in the BOSH ecosystem, I think this is pretty close to the only way to do it.
Unsure how fragile using SSH streaming will be over long errands.
| } | ||
|
|
||
| func (c RunErrandCmd) runWithStreaming(opts RunErrandOpts) error { | ||
| if c.nonIntSSHRunner == nil { |
There was a problem hiding this comment.
Is there a circumstance in which this could happen?
There was a problem hiding this comment.
No, this was silly defensive code.
| LogsDirectory DirOrCWDArg `long:"logs-dir" description:"Destination directory for logs" default:"."` | ||
|
|
||
| StreamLogs bool `long:"stream-logs" description:"Stream errand log files via SSH while errand runs"` | ||
| StreamLogPath string `long:"stream-log-path" description:"Log file path to tail (default: /var/vcap/sys/log/$ERRAND/$ERRAND.{stdout,stderr}.log)"` |
There was a problem hiding this comment.
I think this is implemented as being relative to /var/vcap/sys/log? Should we restrict it to a subdirectory for security reasons (i.e. ../../../etc/passwd or whatever should be disallowed)
There was a problem hiding this comment.
Or perhaps we just say "we'll stream everything in /var/vcap/sys/log/$ERRAND"
There was a problem hiding this comment.
Hmm. I had thought about this one earlier. I see in my local history was initially rejecting paths with ".." in them - next to the regex safety checks in BuildErrandTailCmd but it seems to have been dropped from the commit. Whoops.
This is definitely something to guard against.
There was a problem hiding this comment.
Or perhaps we just say "we'll stream everything in
/var/vcap/sys/log/$ERRAND"
Yeah, I had started with that concept early on, but there can be a lot of nonsense in the errand directory that's not pretty to look at. bpm.log for instance is probably not terribly informative. Maybe we could filter that out though.
I think I like the idea of dropping --stream-log-path though. I had put it in initially because there are some really oddly named errands in the world that do not necessarily map to the job name or similar.
If this could enforce the bpm convention of /var/vcap/sys/log/$ERRAND/$ERRAND.{stdout,stderr}.log that simplifies some things including these security concerns.
There was a problem hiding this comment.
I'd rather take the approach of starting very narrow, and then expanding as needed to keep thing simpler if we can
| // arguments with spaces before passing them to the remote shell. Without | ||
| // quotes, "bash -c <script>" is interpreted as "bash -c for" with the | ||
| // remaining tokens becoming positional parameters. | ||
| return []string{"sudo", "bash", "-c", "'" + tailScript + "'"}, nil |
There was a problem hiding this comment.
Do we need sudo if we're looking at stuff in /var/vcap/sys/log?
There was a problem hiding this comment.
This is the same sudo wrapping bosh logs --follow does ref.
I think in particular bpm creates logs scoped purely to the vcap user, not the vcap group (0600/-rw-------).
The bosh ssh user is random and part of the vcap group, but not the vcap user so must sudo to read those logs.
Example:
$ bosh ssh ...
$ whoami
bosh_9564f3b73d69470
$ groups
bosh_9564f3b73d69470 admin vcap bosh_sshers bosh_sudoers
$ tail /var/vcap/sys/log/proxy/proxy.stdout.log
tail: cannot open '/var/vcap/sys/log/proxy/proxy.stdout.log' for reading: Permission denied
$ sudo tail ...
# success!
| sshCancel() | ||
| sshWg.Wait() | ||
|
|
||
| for _, slug := range sessions { |
There was a problem hiding this comment.
Will we possibly leak ssh sessions here if an error occurs before we reach this point? Any call for a defer of the cleanup to make sure it runs?
There was a problem hiding this comment.
In this case, I suppose I would defend this by suggesting there is no early return that would skip the cleanup.
But the code could be improved here.
| return err | ||
| } | ||
|
|
||
| // Suppress stdout/stderr in the summary since we already streamed the logs. |
There was a problem hiding this comment.
Maybe if any of the ssh streams fail we can fall back to printing these?
When --stream-logs is set, the errand is started asynchronously and task events are polled to discover running instances. SSH sessions are established to each instance to tail errand log files in real time, interleaved with the standard task event output.
Requirements: