Improve handling of HTTP response when it is not in JSON form #49

Merged
vkuznet merged 2 commits into dmwm:main from vkuznet:fix-json-err on Nov 16, 2021

@vkuznet
Contributor

@vkuznet vkuznet commented Nov 11, 2021

Currently, DBSClient incorrectly handles JSON decoding errors when the response data is not JSON. For example, the following code:

from dbs.apis.dbsClient import DbsApi

# "bla" is a placeholder path segment, so the server replies with a non-JSON error
url = "https://cmsweb-testbed.cern.ch/dbs/bla/DBSReader"
api = DbsApi(url=url, debug=1)
res = api.help()
print(res)

will produce the following error:

HTTP=GET URL=https://cmsweb-testbed.cern.ch:8443/dbs/bla/DBSReader method=help params={} data={} headers={'Content-Type': 'application/json', 'Accept': 'application/json', 'UserID': 'vk@vkair', 'User-Agent': 'DBSClient/Unknown/'}
Traceback (most recent call last):
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/./t.py", line 6, in <module>
    res = api.help()
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 496, in help
    return self.__callServer("help", params=kwargs)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 403, in __callServer
    self.__parseForException(http_error)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 429, in __parseForException
    raise http_error
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 401, in __callServer
    self.http_response = method_func(self.url, method, params, data, request_headers)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RestApi.py", line 34, in get
    return http_request(self._curl)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RequestHandling/HTTPRequest.py", line 62, in __call__
    raise HTTPError(effective_url, http_code, http_response.msg, http_response.raw_header, http_response.body)
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: Bad Request

Depending on the returned data, it may be difficult to understand what the response actually contained, since the Python exception does not include the data.

The proposed PR improves error handling by checking the actual HTTP code; if it is not 200, it reports the actual data, including all details of the HTTP response. For instance, running the same code we get the following output:

HTTP=GET URL=https://cmsweb-testbed.cern.ch:8443/dbs/bla/DBSReader method=help params={} data={} headers={'Content-Type': 'application/json', 'Accept': 'application/json', 'UserID': 'vk@vkair', 'User-Agent': 'DBSClient/Unknown/'}
### HTTPError ###
URL: https://cmsweb-testbed.cern.ch:8443/dbs/bla/DBSReader/help
HTTP code: 400
HTTP Message: Bad Request
HTTP Header: HTTP/1.1 400 Bad Request
Date: Thu, 11 Nov 2021 14:21:01 GMT
Server: Apache
Content-Length: 226
Connection: close
Content-Type: text/html; charset=iso-8859-1


HTTP Body
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
</body></html>

Traceback (most recent call last):
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/./t.py", line 6, in <module>
    res = api.help()
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 506, in help
    return self.__callServer("help", params=kwargs)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 413, in __callServer
    raise http_error
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 402, in __callServer
    self.http_response = method_func(self.url, method, params, data, request_headers)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RestApi.py", line 34, in get
    return http_request(self._curl)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RequestHandling/HTTPRequest.py", line 62, in __call__
    raise HTTPError(effective_url, http_code, http_response.msg, http_response.raw_header, http_response.body)
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: Bad Request

Now the output provides all details of the HTTP response and preserves the same Python exception.

These modifications will be extremely useful for understanding the actual error. For instance, if a timeout happens on the cmsweb frontends, we currently get the following error:

Traceback (most recent call last):
  File "/data/users/vk/dbs/DBS/DBSClient/src/python/dbs/apis/dbsClient.py", line 440, in __parseForException
    data = json.loads(data)
  File "/data/users/vk/anaconda3/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/data/users/vk/anaconda3/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/data/users/vk/anaconda3/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/data/users/vk/dbs/DBS/DBSClient/tests/dbsclient_t/validation/DBSValidation_t.py", line 648, in test16
    result2 = self.cmswebtestbed_api.listFileParentsByLumi(block_name=block["block_name"])
  File "/data/users/vk/dbs/DBS/DBSClient/src/python/dbs/apis/dbsClient.py", line 737, in listFileParentsByLumi
    return self.__callServer("fileparentsbylumi", data=kwargs, callmethod='POST', aggFunc=aggFileParentsByLumi)
  File "/data/users/vk/dbs/DBS/DBSClient/src/python/dbs/apis/dbsClient.py", line 404, in __callServer
    self.__parseForException(http_error)
  File "/data/users/vk/dbs/DBS/DBSClient/src/python/dbs/apis/dbsClient.py", line 442, in __parseForException
    raise http_error
  File "/data/users/vk/dbs/DBS/DBSClient/src/python/dbs/apis/dbsClient.py", line 402, in __callServer
    self.http_response = method_func(self.url, method, params, data, request_headers)
  File "/data/users/vk/dbs/DBS/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RestApi.py", line 40, in post
    return http_request(self._curl)
  File "/data/users/vk/dbs/DBS/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RequestHandling/HTTPRequest.py", line 62, in __call__
    raise HTTPError(effective_url, http_code, http_response.msg, http_response.raw_header, http_response.body)
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 502: Proxy Error

This error contains two exceptions, one from HTTPError and another from the JSON decoder. But it does not provide any details of the HTTP response. For the average user this error is very cryptic and hard to understand.
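
The two chained tracebacks above can be reproduced in isolation. The following is only a minimal sketch: the HTTPError class and the function names are hypothetical stand-ins, not the RestClient or DBSClient code. It shows that re-raising the HTTP error while the JSON decoding error is being handled is what produces the "During handling of the above exception" chain.

```python
import json


class HTTPError(Exception):
    """Hypothetical stand-in for RestClient's HTTPError (not the real class)."""

    def __init__(self, code, msg, body):
        super().__init__("HTTP Error %s: %s" % (code, msg))
        self.code = code
        self.msg = msg
        self.body = body


def parse_for_exception(http_error):
    """Simplified parser: try to decode the body as JSON, re-raise on failure."""
    try:
        return json.loads(http_error.body)
    except ValueError:
        # Raising inside this except block attaches the JSON error to
        # http_error as its __context__, hence the chained traceback.
        raise http_error


def call_server():
    """Simulate a frontend proxy error whose body is HTML, not JSON."""
    try:
        raise HTTPError(502, "Proxy Error", "<html><body>Proxy Error</body></html>")
    except HTTPError as http_error:
        return parse_for_exception(http_error)
```

Calling call_server() raises the stand-in HTTPError, and its __context__ attribute holds the JSON decoding error, while the HTML body itself never appears in the traceback.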

@vkuznet
Contributor Author

vkuznet commented Nov 11, 2021

@yuyiguo , @amaltaro , @belforte please review and provide your feedback.

@yuyiguo
Member

yuyiguo commented Nov 11, 2021

@vkuznet
Can you see my comments in the code? I hope they showed up this time.

@klannon klannon requested review from klannon and yuyiguo and removed request for klannon November 11, 2021 15:37
@klannon
Contributor

klannon commented Nov 11, 2021

I can't see any comments myself. I don't know if @vkuznet can. @yuyiguo, I just added you as a requested reviewer. This should give you a green button at the top of the page to "Add your review" which will let you put comments on the code that then @vkuznet can address. At least, this is how I've always handled reviews. @vkuznet if you'd like other reviewers, maybe it would be good to use the reviewers interface on GH to request reviews from them as well?

@yuyiguo
Member

yuyiguo commented Nov 11, 2021

Thanks @klannon , I used "add your review" button and submitted my comments. I hope everyone can see it this time. :-).

@vkuznet
Contributor Author

vkuznet commented Nov 11, 2021

@yuyiguo I don't see your comments. Here is how I review the code:

  • I click on the "Files changed" tab, which brings me to the code changes
  • then I use the plus sign and click on the line where I want to make a comment
  • once I have written the comment, I click on the green button
  • finally, there is a "Review changes" button where you choose the action, such as "Request changes" or "Comment", and then click on submit; otherwise I think the comments will not be visible.

@vkuznet vkuznet requested review from amaltaro and belforte November 11, 2021 15:48
@vkuznet
Contributor Author

vkuznet commented Nov 11, 2021

I added Alan and Stefano as reviewers too. So far I don't see Yuyi's comments though.

self.http_response = method_func(self.url, method, params, data, request_headers)
except HTTPError as http_error:
self.__parseForException(http_error)
if http_error.code == 200:
Member

JSON-formatted errors are also returned with HTTP codes other than 200; see #43. Handling errors based on the format of the error body is more general than handling them based on the error code.

Contributor Author

Yuyi, the point is that the __parseForException function does not check whether the given http_error contains JSON. What it does check is str(data).find("<html>")!=-1, but the HTTP response can come in different forms and may not contain such a pattern. Moreover, it can be proxy dependent, e.g. the Apache frontend can return one pattern while another frontend returns a different one. You either need to rely on HTTP codes and check the Content-Type (this is what I should add to the PR) or leave the response as is.

I'll change the PR to check for content-type instead of code then.

Contributor Author

@yuyiguo I changed the code to check the content type. This is the most robust approach. The HTTP response from any server should provide a content type. If it is JSON you can safely parse it; otherwise it is better to print it as is.
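
As an illustration of the content-type check described here, the following is a rough sketch rather than the code in this PR. It assumes the raw header arrives as a single string, as in the output shown in the description; the function names content_type_of and decode_body are made up for the example.

```python
import json


def content_type_of(raw_header):
    """Return the lower-cased Content-Type from a raw HTTP header block, or ''."""
    for line in raw_header.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-type":
            # Drop parameters such as "; charset=iso-8859-1".
            return value.split(";")[0].strip().lower()
    return ""


def decode_body(raw_header, body):
    """Parse the body as JSON only when the server declares it as JSON."""
    if content_type_of(raw_header) == "application/json":
        return json.loads(body)
    return body  # leave HTML or plain text untouched
```

With the Apache response from the description (Content-Type: text/html), decode_body returns the HTML body unchanged instead of failing inside the JSON decoder.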

if http_error.code == 200:
self.__parseForException(http_error)
else:
print("### HTTPError ###")
Member

Prints are for humans, not programs. You may want to show these prints only in debug mode.

Contributor Author

No, I doubt it; the user should be able to understand the error. If the error hides the details of the HTTP response, I think it is much better to show them in non-verbose mode as well.

Member

I realize you like to print out everything in the code, and that is OK. But why do you think "raise http_error" will not provide enough error info to the clients? In the end, the errors are handled by programs, which will not look at what you print out, only at the exceptions. If users really want to see the printout, they should turn on debug.
In addition, huge log files with unnecessary info will make it difficult to find useful content during emergency debugging. This is my two cents.

Contributor Author

Yuyi, I raise because it is already an exception. I print because debug info comes too late when an exception happens and the user is not running in debug mode. If an exception happened, we need to understand what happened. The more information the user has, the better the chances of understanding it. Otherwise we are back to square one: the exception is cryptic and users have no clue what went wrong.

@yuyiguo
Member

yuyiguo commented Nov 11, 2021

@vkuznet Can you see now?

@vkuznet
Contributor Author

vkuznet commented Nov 11, 2021

@yuyiguo , yes now I can see it and will address it.

@vkuznet vkuznet requested a review from yuyiguo November 11, 2021 16:10
@yuyiguo
Member

yuyiguo commented Nov 11, 2021

A friendly reminder: please squash the commits into one.

@belforte
Member

Can you put the meaningful information in the exception message?
That way I will find it in the current logs. I am not sure that prints to stdout would be logged by the
current code, while exceptions will be logged with a traceback pointing to where they happened.
Other than that I am all for this and do not dare to comment on the coding; of all people here I am the one with the worst Python skills!

Member

@belforte belforte left a comment

I defer

@vkuznet
Contributor Author

vkuznet commented Nov 11, 2021

@belforte, good point. Instead of adding printouts, which may or may not be visible (depending on how the application uses the logger), I can easily add the required info to the exception itself. I'll provide an update.
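
To illustrate why details carried in the exception message end up in application logs while plain prints may not, here is a small sketch using only the standard logging module. The logger name, the helper functions and the example URL are hypothetical and not part of DBSClient.

```python
import io
import logging


def format_http_error(url, code, msg, header, body):
    """Build one multi-line message holding every detail of the HTTP response."""
    return "\nURL=%s\nCode=%s\nMessage=%s\nHeader=%s\nBody=%s" % (
        url, code, msg, header, body)


def fail(url, code, msg, header, body):
    """Raise a generic exception whose message carries the full response."""
    raise Exception(format_http_error(url, code, msg, header, body))


# Route log records to an in-memory stream so the effect is easy to inspect.
stream = io.StringIO()
logger = logging.getLogger("dbs_client_sketch")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

try:
    fail("https://example.invalid/dbs/help", 400, "Bad Request",
         "Content-Type: text/html", "<h1>Bad Request</h1>")
except Exception:
    # logger.exception records the message and the traceback together.
    logger.exception("request failed")
```

After this runs, stream.getvalue() contains both the traceback and the URL, code, header and body lines, without any print call being involved.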

@vkuznet
Contributor Author

vkuznet commented Nov 11, 2021

Based on your feedback I adjusted the code to be more robust in parsing HTTP headers and to throw an exception with all details. Now the output of the exception looks like:

HTTP=GET URL=https://cmsweb-testbed.cern.ch:8443/dbs/bla/DBSReader method=help params={} data={} headers={'Content-Type': 'application/json', 'Accept': 'application/json', 'UserID': 'vk@vkair', 'User-Agent': 'DBSClient/Unknown/'}
Traceback (most recent call last):
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 412, in __callServer
    self.http_response = method_func(self.url, method, params, data, request_headers)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RestApi.py", line 34, in get
    return http_request(self._curl)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RequestHandling/HTTPRequest.py", line 62, in __call__
    raise HTTPError(effective_url, http_code, http_response.msg, http_response.raw_header, http_response.body)
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: Bad Request

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/./t.py", line 6, in <module>
    res = api.help()
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 516, in help
    return self.__callServer("help", params=kwargs)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 423, in __callServer
    raise Exception(msg)
Exception:
URL=https://cmsweb-testbed.cern.ch:8443/dbs/bla/DBSReader/help
Code=400
Message=Bad Request
Header=HTTP/1.1 400 Bad Request
Date: Thu, 11 Nov 2021 17:54:57 GMT
Server: Apache
Content-Length: 226
Connection: close
Content-Type: text/html; charset=iso-8859-1


Body=<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
</body></html>

So far I decided not to throw HTTPError, since this class is defined in the dbs3-pycurl library and it still hides all relevant details of the HTTP error; for example, it does not show the actual body of the HTTP response. If you want a cleaner design, then the HTTPError code in dbs3-pycurl should be modified to properly give all details of the HTTP response (body, headers, etc.). Please let me know if this would be a better option, and I can move the code to dbs3-pycurl.

@vkuznet vkuznet marked this pull request as draft November 11, 2021 18:00
@vkuznet
Contributor Author

vkuznet commented Nov 11, 2021

@yuyiguo I converted it to a work-in-progress draft, since we should first reach a clear consensus on how the exception should be thrown. I will squash the changes once the final version is confirmed by you and others.

http_error.msg,
http_error.header,
http_error.body)
raise Exception(msg)
Member

Can you call self.__parseForException instead of raising a general Exception? You may create a JSON string similar to line 417/msg and pass it to self.__parseForException.

Contributor Author

Yes, I can; please review again. Now the output looks almost identical, but I construct a custom HTTPError and pass it to self.__parseForException. Here is the relevant part of the new output:

RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400:
URL=https://cmsweb-testbed.cern.ch:8443/dbs/bla/DBSReader/help
Code=400
Message=Bad Request
Header=HTTP/1.1 400 Bad Request
Date: Thu, 11 Nov 2021 18:22:16 GMT
Server: Apache
Content-Length: 226
Connection: close
Content-Type: text/html; charset=iso-8859-1


Body=<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
</body></html>
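
One way to get output of the shape shown above is to build a second error of the same type whose message embeds the response details. This is only a sketch under assumptions: the HTTPError class below is a hypothetical stand-in whose constructor mirrors the argument order seen in the traceback (url, code, msg, header, body), and the url attribute and the with_details helper are invented for the example.

```python
class HTTPError(Exception):
    """Hypothetical stand-in; the real class lives in dbs3-pycurl's RestClient."""

    def __init__(self, url, code, msg, header, body):
        super().__init__("HTTP Error %s: %s" % (code, msg))
        self.url, self.code, self.msg = url, code, msg
        self.header, self.body = header, body


def with_details(http_error):
    """Return a new HTTPError whose message embeds the full HTTP response."""
    msg = "\nURL=%s\nCode=%s\nMessage=%s\nHeader=%s\nBody=%s" % (
        http_error.url, http_error.code, http_error.msg,
        http_error.header, http_error.body)
    return HTTPError(http_error.url, http_error.code, msg,
                     http_error.header, http_error.body)
```

Because the result is still an HTTPError, callers that catch that type keep working, and the extra details travel in the message instead of being printed.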

@vkuznet vkuznet requested a review from yuyiguo November 11, 2021 18:24
Member

@yuyiguo yuyiguo left a comment

@vkuznet Looked good to me. Feel free to squash changes and commit.

@vkuznet
Contributor Author

vkuznet commented Nov 11, 2021

ok, now it is squashed.

@vkuznet vkuznet requested a review from yuyiguo November 11, 2021 20:35
Member

@yuyiguo yuyiguo left a comment

All looked good to me.

Member

@yuyiguo yuyiguo left a comment

Please merge!

@vkuznet vkuznet marked this pull request as ready for review November 12, 2021 15:13
@vkuznet
Contributor Author

vkuznet commented Nov 12, 2021

@yuyiguo I want to get Alan's confirmation too.

@amaltaro please review and give your ok

@yuyiguo
Member

yuyiguo commented Nov 12, 2021

> @yuyiguo I want to get Alan's confirmation too.

Sure, @vkuznet

Contributor

@amaltaro amaltaro left a comment

Thanks, Valentin and Yuyi. I left a couple of comments along the code for your consideration.
Valentin, can you please update the initial description of this PR with what is actually provided here? I believe you have changed it from the initial proposal, right?

@vkuznet
Contributor Author

vkuznet commented Nov 16, 2021

@amaltaro please review it before you go on vacation so that I can merge and proceed with it.

Contributor

@amaltaro amaltaro left a comment

If the only way to parse the headers is as a plain string, then it looks good to me. I would expect the headers to be a dict or a list of tuples, though.

@vkuznet
Contributor Author

vkuznet commented Nov 16, 2021

@amaltaro, the headers in this case come from the pycurl library, which provides them as they come over the wire (i.e. in the HTTP response). The headers never come as dict or list types, as those are language-specific data types.
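
For readers who do want a dict-like view of such a raw header string, the Python standard library can parse it after the status line is split off. This is a sketch and not what the PR does; the function name parse_raw_header is made up, and it assumes the string holds a single response header block.

```python
from email.parser import Parser


def parse_raw_header(raw_header):
    """Split a raw HTTP response header into (status_line, headers)."""
    text = raw_header.replace("\r\n", "\n").lstrip("\n")
    status_line, _, rest = text.partition("\n")
    # The remaining "Name: value" lines follow the RFC 822 style that
    # email.parser understands; lookups on the result are case-insensitive.
    headers = Parser().parsestr(rest, headersonly=True)
    return status_line, headers
```

The returned headers object supports headers["content-type"] lookups and headers.items(), which yields a list of (name, value) tuples.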

@vkuznet vkuznet merged commit 946c886 into dmwm:main Nov 16, 2021