Improve handling of HTTP response when it is not in JSON form#49
Improve handling of HTTP response when it is not in JSON form#49
Conversation
|
@vkuznet |
|
I can't see any comments myself. I don't know if @vkuznet can. @yuyiguo, I just added you as a requested reviewer. This should give you a green button at the top of the page to "Add your review" which will let you put comments on the code that then @vkuznet can address. At least, this is how I've always handled reviews. @vkuznet if you'd like other reviewers, maybe it would be good to use the reviewers interface on GH to request reviews from them as well? |
|
Thanks @klannon , I used "add your review" button and submitted my comments. I hope everyone can see it this time. :-). |
|
@yuyiguo I don't see your comments. Here is how I review the code:
|
|
I added Alan and Stefano as reviewers too. So far I don't see Yuyi's comments though. |
src/python/dbs/apis/dbsClient.py
Outdated
| self.http_response = method_func(self.url, method, params, data, request_headers) | ||
| except HTTPError as http_error: | ||
| self.__parseForException(http_error) | ||
| if http_error.code == 200: |
There was a problem hiding this comment.
there are json format error returned besides http error code 200. See
#43. It a more general error handling using the error body format than the error code.
There was a problem hiding this comment.
Yuyi, the point is that __parseForException function does not check if given http_error has JSON and what it does check is presence of str(data).find("<html>")!=-1 but the HTTP response can be in different forms, it may not contain such pattern. Moreover, it can be proxy dependent, e.g. apache frontend can return one pattern while another frontend may return another. You either need to rely on HTTP codes and check for Content-Type (this is what I should add to PR) or leave response as is.
I'll change the PR to check for content-type instead of code then.
There was a problem hiding this comment.
@yuyiguo I changed code to check for content type. This is the most robust approach. The HTTP response from any server should provide content type. If it is JSON you can safely parse it, otherwise it is better to print it as is.
src/python/dbs/apis/dbsClient.py
Outdated
| if http_error.code == 200: | ||
| self.__parseForException(http_error) | ||
| else: | ||
| print("### HTTPError ###") |
There was a problem hiding this comment.
Prints are for human, not program. You may want to hide these print in debug mode.
There was a problem hiding this comment.
No, I doubt it, the user should be able to understand the error. If error hides the details of HTTP response I think it is much better to see them in non-verbose mode.
There was a problem hiding this comment.
I realized you like to print out everything in the code and it is OK. but why you think "raise http_error" will not provide enough error info to the clients? In the end, the errors are handled by program that will not look into what you print out, but the exceptions. If the user really want to see the print out, they should turn on debug.
In addition to this, the huge log files w/ unnecessary info will make it difficult for one to find useful content under emergency debugging . This is my two cents.
There was a problem hiding this comment.
Yuyi, I raise because it is already an exception. I print because debug info is too late when exception happen when user do not use debug mode. If exception happened we need to understand what happen. As such the more information user have the better chances to understand what happen. Otherwise we come back to square one that exception is cryptic and users have no clue to understand what went wrong.
|
@vkuznet Can you see now? |
|
@yuyiguo , yes now I can see it and will address it. |
|
A friendly reminder: please squash the commits into one. |
|
Can you put the meaningful information in the exception message ? |
|
@belforte , good point, instead of adding printous which may or may not be visible (depends on app use the logger) I can easily add required info to the exception itself. I'll provide an update. |
|
Based on your feedback I adjust code to be more robust in parsing HTTP headers and through exception with all details. Now the output of exception looks like: So far I decided to not through HTTPError since this class is defined in dbs3-pycurl library and it still hides all relevant details about HTTP error, like it does not provide actual body of HTTP response. If you want a more clean design, then the HTTP error code in dbs3-pycurl should be modified to properly give all details of HTTP response (like body, headers, etc). Please let me know if this would better option and I can re-arrange code to dbs3-pycurl. |
|
@yuyiguo I converted it to working in progress draft as the consensus on how exception should be thrown should be clearly defined. I will squash changes once final version will be confirmed by you and others. |
src/python/dbs/apis/dbsClient.py
Outdated
| http_error.msg, | ||
| http_error.header, | ||
| http_error.body) | ||
| raise Exception(msg) |
There was a problem hiding this comment.
Can you call self.__parseForException instead of raising general Exception ? You may create a json string similar to line 417/msg to pass it to self._parseForException.
There was a problem hiding this comment.
yes, I can please review again. Now the output looks almost identical but I construct custom HTTPError and pass it to self.__parseForException. Here is relevant part of new output:
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400:
URL=https://cmsweb-testbed.cern.ch:8443/dbs/bla/DBSReader/help
Code=400
Message=Bad Request
Header=HTTP/1.1 400 Bad Request
Date: Thu, 11 Nov 2021 18:22:16 GMT
Server: Apache
Content-Length: 226
Connection: close
Content-Type: text/html; charset=iso-8859-1
Body=<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
</body></html>
f0064a0 to
93f122c
Compare
|
ok, now it is squashed. |
amaltaro
left a comment
There was a problem hiding this comment.
Thanks, Valentin and Yuyi. I left a couple of comments along the code for your consideration.
Valentin, can you please update the initial description of this PR with what is actually provided here? I believe you have changed it from the initial proposal, right?
93f122c to
b7e7a7b
Compare
|
@amaltaro please review it before you're going to vacation such that I can merge and proceed with it. |
amaltaro
left a comment
There was a problem hiding this comment.
If the only way to parse the headers is with plain string, then it looks good to me. I would expect headers to be of a dict or list of tuples type though.
|
@amaltaro , the headers in this case comes from pycurl library which provide them as they come on a wire (i.e. HTTP response). The headers never comes as dict, list types as those are language specific data-types. |
Currently, DBSClient incorrectly handle JSON decoding errors when the data input does not represent JSON. For example, the following code:
will produce the following error:
Depending on return data it may be difficult to understand what exactly data had since the data is not returned by Python exception.
The proposed PR improve error handling by checking the actual HTTP code and if it is not 200, it can return the actual data including all details of HTTP request. For instance, running the same code we will get the following output:
Now, the output provide all details of HTTP response and preserve the same Python exception.
These modifications will be extremely useful in order to understand the actual error. For instance, if timeout happens on cmsweb frontends currently we have the following error:
This error contains two exceptions, one from HTTPError and another from JSON decoder parser. But it does not provide any detail of HTTP response. For average user this error is very cryptic and hard to understand.