Conversation
In this PR, validations work only when detailed-stats is turned off. It can be worked on in a separate PR. @patsonluk can you please review and also verify that it doesn't break detailed-stats functionality?
    log.info("Query base URL " + baseUrl);
    for (int threads = benchmark.minThreads; threads <= benchmark.maxThreads; threads++) {
-   ControlledExecutor.ExecutionListener<String, NamedList<Object>> listener = benchmark.detailedStats ? new DetailedQueryStatsListener() : new ErrorListener();
+   ControlledExecutor.ExecutionListener<String, SolrBenchQueryResponse> listener = benchmark.detailedStats ? new DetailedQueryStatsListener() : new ValidationListener();
Perhaps we should allow both validation and detailed stats later on :)
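If we do, one option is a small composite listener that forwards each event to both. A rough sketch, assuming onExecutionComplete (shown further down in this diff) is the only callback needed; the CompositeQueryListener name is made up and not part of this PR:

```java
// Hypothetical composite: fans each completion event out to both the
// detailed-stats listener and the validation listener.
public class CompositeQueryListener
    implements ControlledExecutor.ExecutionListener<String, SolrBenchQueryResponse> {

  private final DetailedQueryStatsListener statsListener = new DetailedQueryStatsListener();
  private final ValidationListener validationListener = new ValidationListener();

  @Override
  public void onExecutionComplete(String typeKey, SolrBenchQueryResponse result, long duration) {
    statsListener.onExecutionComplete(typeKey, result, duration);
    validationListener.onExecutionComplete(typeKey, result, duration);
  }
}
```

The benchmark could then pick the composite when both detailed stats and validations are configured, instead of choosing one listener or the other.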
    try {
      validationResults = runQueryValidations(benchmark.validations, ((ValidationListener) listener).queryResponses, benchmark, collection, baseUrl);
    } catch (Exception e) {
      log.error("Problem during validations", e);
Should we just let the exception propagate as-is? The method signature can actually be throws Exception now. There was a change that wraps these tasks in a Callable, which can throw checked exceptions (unlike Runnable, which cannot).
If an exception is thrown, the executor should properly handle task dependencies (not proceed with dependent tasks) and error reporting (print the root cause to the console).
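For reference, a minimal standalone sketch of why Callable helps here (class names and messages are illustrative, not solr-bench code): a Callable may declare checked exceptions, so the root cause surfaces to whoever calls Future.get(), while a Runnable forces the task to catch and log internally.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableVsRunnable {
  public static void main(String[] args) throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();

    // A Runnable cannot declare checked exceptions, so a task body would have to
    // catch-and-log (or wrap in a RuntimeException) any checked failure itself.
    Runnable runnableTask = () -> {
      // throw new Exception("boom");  // would not compile
    };
    executor.execute(runnableTask);

    // A Callable may declare "throws Exception"; the checked exception reaches
    // the caller through Future.get() as an ExecutionException.
    Callable<Void> callableTask = () -> {
      throw new Exception("validation failed");
    };
    Future<Void> future = executor.submit(callableTask);
    try {
      future.get();
    } catch (ExecutionException e) {
      // The root cause is available here, so the caller can skip dependent
      // tasks and print the real failure instead of swallowing it.
      System.err.println("Task failed: " + e.getCause());
    } finally {
      executor.shutdown();
    }
  }
}
```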
    this.queryRequest = queryRequest;
    this.client = client;
    this.collection = collection;
    this.queryString = queryString;
Is this different from queryRequest.toString()? getType() and getQueryString() might return very similar results. getType() might not be the best name I chose 😓, but it was actually used to identify stats for different "types", where each type is just the query.
If the query string is more accurate, perhaps getType() should just return the query string?
I probably named it "type" to keep it flexible (so the implementation can decide what the "key"/"type" is), but that might be too generic and confusing to others.
    import org.slf4j.LoggerFactory;

    // nocommit: javadocs
    public class SolrBenchQueryResponse {
Nice to have this class to avoid reading the stream twice, which triggers errors 💪🏼
    InputStream responseStream = (InputStream) response.get("stream");
    try {
      responseString = getResponseStreamAsString(responseStream); // should only call this once, as this reads the stream!
Is it possible to make reading the stream lazy? My concern is that this always reads the response stream, which seems unnecessary if the caller neither cares about detailed stats nor needs any validations (which I assume is the common case for general benchmarking?).
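One possible shape for that, as a sketch (the class and field names below are illustrative, not the actual SolrBenchQueryResponse): keep the InputStream and only consume it on first access.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical lazy variant: the response stream is read only if a caller
// actually asks for the body (e.g. for detailed stats or validations).
public class LazyQueryResponse {
  private final InputStream responseStream;
  private String responseString;     // cached body, null until first read
  private boolean streamConsumed;    // guards against reading the stream twice

  public LazyQueryResponse(InputStream responseStream) {
    this.responseStream = responseStream;
  }

  /** Reads the stream on the first call only; later calls return the cached string. */
  public synchronized String getResponseString() throws IOException {
    if (!streamConsumed) {
      responseString = new String(responseStream.readAllBytes(), StandardCharsets.UTF_8);
      streamConsumed = true;
    }
    return responseString;
  }
}
```

A plain benchmarking run would then never pay the cost of draining the stream. One caveat worth checking is whether the HTTP client requires the stream to be consumed or closed before the connection can be reused.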
    } catch (IOException e) {
      logger.warn("Failed to read the response stream for " + typeKey);
    }
    if (sbq.responseString != null) {
We can probably remove this check, or only check whether sbq itself is null. sbq.responseString could be null (if the response is erroneous), and in that case we do want to increment the errorCount instead of skipping all the logic :)
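Roughly what that would look like as a sketch (the handleResponse method and errorCount field here are placeholders, not the real listener's members):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the suggested guard: only a missing response object is
// skipped; a null responseString (an erroneous response) still gets counted as
// an error instead of being silently ignored.
class GuardSketch {
  private final AtomicLong errorCount = new AtomicLong();

  void handleResponse(SolrBenchQueryResponse sbq) {
    if (sbq == null) {
      return; // nothing to record at all
    }
    if (sbq.responseString == null) {
      errorCount.incrementAndGet(); // erroneous response counts as an error
      return;
    }
    // ...existing stats / validation bookkeeping on sbq.responseString
  }
}
```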
    @Override
-   public void onExecutionComplete(String typeKey, NamedList<Object> result, long duration) {
+   public void onExecutionComplete(String typeKey, SolrBenchQueryResponse result, long duration) {
      queryResponses.add(result);
I have some concerns about memory usage, as this appears to store all responses until the current query benchmark task run finishes.
I don't think it's a blocker right now, as it should be okay if the response size is small or the total number of queries executed is low.
Perhaps in the future we can consider doing the validation in onExecutionComplete itself? I assume the overall computation is the same: validating each result here vs. iterating over all responses and validating at the end. It might increase the total duration per task, but the per-query duration (which is what matters more?) should be the same, as long as we disregard the time spent in validation :) ?
Another minor benefit of doing it here is that we can fail fast if we want.
Similarly, for the run that generates the validations (instead of performing them), we could just keep the doc count here instead of the whole response?
Anyway, not a blocker I think 😊
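For illustration, a possible shape of that (the InlineValidationListener class and its helpers are made up; the actual validation API in this PR may differ):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical listener that checks/tallies each response as it completes, so the
// full SolrBenchQueryResponse objects never need to be buffered until the end.
public class InlineValidationListener
    implements ControlledExecutor.ExecutionListener<String, SolrBenchQueryResponse> {

  private final List<String> failures = Collections.synchronizedList(new ArrayList<>());
  private final AtomicLong totalDocCount = new AtomicLong();

  @Override
  public void onExecutionComplete(String typeKey, SolrBenchQueryResponse result, long duration) {
    if (result == null || result.responseString == null) {
      failures.add(typeKey + ": empty or unreadable response"); // could also fail fast here
      return;
    }
    // Keep only what validation needs (here: a doc count), not the whole body.
    totalDocCount.addAndGet(extractDocCount(result.responseString));
  }

  // Placeholder: a real implementation would parse e.g. numFound out of the response body.
  private long extractDocCount(String responseBody) {
    return 0L;
  }
}
```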
Sorry about the conflict 😓! I have briefly reviewed and it's mostly LGTM except for a few comments. I will test tomorrow to make sure the detailed stats are still there without issues.
Had to abandon the previous PR (#44) due to various merge conflicts, and re-open it here. Please follow the discussions there.