gunicorn-django-canonical-logs provides extensible canonical log lines for Gunicorn/Django applications.
This is alpha software. It has not (yet!) been battle-tested and does several risky things worth highlighting:
- Overrides Django settings to include custom middleware to gather request/response context
- Modifies Django template rendering and database query execution to gather template exception/database query context
- Runs a separate timeout thread for every request to gather timeout context
- Leverages shared memory between the Gunicorn arbiter and workers to gather saturation context
- There's currently no cleanup, and processes that receive SIGKILL will leak memory
pip install gunicorn-django-canonical-logs

Add the following to your Gunicorn configuration file:
from gunicorn_django_canonical_logs.glogging import Logger
from gunicorn_django_canonical_logs.gunicorn_hooks import * # register Gunicorn hooks and instrumenters
accesslog = "-"
logger_class = Logger

Important
Only sync Gunicorn worker types are supported
Tip
Set GUNICORN_PRESERVE_EXISTING_LOGGER=1 in the environment to preserve existing gunicorn access logs in addition to canonical logs
This library also includes an @on_error(return_value=...) decorator that will emit a partial_failure event with exception context, correlated to the request log via req_id. See tests/server/app.py:partial_failure for an example.
The motivation is that some failures are expected, and in some cases we'd prefer to return a degraded experience. When that's necessary, correlated logs make it possible to monitor the frequency and type of errors observed.
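As a minimal sketch of how the decorator might be used (the import path and the function below are assumptions for illustration; see tests/server/app.py:partial_failure for the real example):

from gunicorn_django_canonical_logs import on_error  # import path assumed

@on_error(return_value=[])
def fetch_recommendations(user_id):
    # Any exception raised here is emitted as a partial_failure event,
    # correlated to the request log via req_id; the caller receives [] instead.
    raise RuntimeError(f"recommendation backend unavailable for user {user_id}")

recommendations = fetch_recommendations(42)  # returns [] and, during a request, logs a partial_failure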
The goal is to enhance observability by providing reasonable defaults and extensibility to answer two questions:
- If a request was processed, what did it do?
- If a request timed out, what had it done and what was it doing?
A request will generate exactly one of these two event_types:
- request - the worker process was able to successfully process the request and return a response
- timeout - the worker process timed out before returning a response
  - timeout events include a timeout_loc/timeout_cause_loc
Examples can be generated from the app used for integration testing:
cd tests/server
DJANGO_SETTINGS_MODULE=settings python app.py migrate
DJANGO_SETTINGS_MODULE=settings gunicorn -c gunicorn_config.py app
And then, from another shell:
curl http://localhost:8080/db_queries/
curl http://localhost:8080/rude_sleep/?duration=10
200 response from Django with DB queries
event_type="request"
req_method="GET"
req_path="/db_queries/"
req_referrer="localhost:8080"
req_user_agent="curl/7.88.1"
resp_view="app.db_queries"
resp_time="0.016"
resp_cpu_time="0.006"
resp_status="200"
db_queries="3"
db_time="0.007"
db_dup_queries="2"
db_dup_time="0.003"
g_w_count="1"
g_w_active="0"
g_backlog="0"
404 response from Django
event_type="request"
req_method="GET"
req_path="/does-no-exist/"
req_referrer="-"
req_user_agent="curl/7.88.1"
resp_view="-"
resp_time="0.000"
resp_cpu_time="0.000"
resp_status="404"
g_w_count="1"
g_w_active="0"
g_backlog="0"
500 response from Django
event_type="request"
req_method="GET"
req_path="/view_exception/"
req_referrer="-"
req_user_agent="curl/7.88.1"
exc_type="MyError"
exc_msg="Oh noes!"
exc_loc="app.py:38:view_exception"
exc_cause_loc="app.py:30:func_that_throws"
resp_view="app.view_exception"
resp_time="0.005"
resp_cpu_time="0.003"
resp_status="500"
g_w_count="1"
g_w_active="0"
g_backlog="0"
app_key="val"
200 response from Whitenoise (static assets)
event_type="request"
req_method="GET"
req_path="/static/foo.txt"
req_referrer="localhost:8080"
req_user_agent="curl/7.88.1"
resp_status="200"
g_w_count="1"
g_w_active="0"
g_backlog="0"
timeout
event_type="timeout"
req_method="GET"
req_path="/rude_sleep/"
req_referrer="localhost:8080"
req_user_agent="curl/7.88.1"
resp_time="0.8"
timeout_loc="gunicorn_django_canonical_logs/instrumenters/request.py:25:context_middleware"
timeout_cause_loc="app.py:103:simulate_blocking_and_ignoring_signals"
g_w_count="1"
g_w_active="0"
g_backlog="0"
partial failure
event_type="partial_failure"
req_id="944e3dc1-14df-4dcd-aafe-00deea240c8b"
exc_type="MyError"
exc_msg="Oh noes!"
exc_loc="app.py:63:will_throw"
exc_cause_loc="app.py:31:func_that_throws"
event_type="request"
req_id="944e3dc1-14df-4dcd-aafe-00deea240c8b"
req_method="GET" req_path="/partial_failure/"
req_referrer="-"
req_user_agent="curl/7.88.1"
resp_view="app.partial_failure"
resp_time="0.006"
resp_cpu_time="0.005"
resp_status="200"
g_w_count="1"
g_w_active="0"
g_backlog="0"
- req_method (string) - HTTP method (e.g. GET/POST)
- req_path (string) - URL path
- req_referrer (string) - Referer HTTP header
- req_user_agent (string) - User-Agent HTTP header
- resp_time (float) - wall time spent processing the request (in seconds)
- resp_view (string) - Django view that generated the response
- resp_cpu_time (float) - CPU time (i.e. ignoring sleep/wait) spent processing the request (in seconds)
- resp_status (int) - HTTP status code of the response
- exc_type (string) - type of the exception
- exc_msg (string) - exception message
- exc_loc (string) - {module}:{line_number}:{name} of the top of the stack (i.e. the last place the exception could've been handled)
- exc_cause_loc (string) - {module}:{line_number}:{name} of the frame that threw the exception
- exc_template (string) - {template_name}:{line_number} (if raised during template rendering)
NB There's some subtlety in how loc/cause_loc work; they attempt to provide application-relevant info by ignoring frames in library code if application frames are available.
- db_queries (int) - total number of queries executed
- db_time (float) - total time spent executing queries (in seconds)
- db_dup_queries (int) - total number of non-unique queries; could indicate N+1 issues
- db_dup_time (float) - total time spent executing non-unique queries (in seconds); could indicate N+1 issues
- g_w_count (int) - total number of Gunicorn workers
- g_w_active (int) - number of active Gunicorn workers
- g_backlog (int) - number of queued requests
NB These values are sampled about once a second, and represent a snapshot. To derive useful data, average the values over time.
The saturation monitor samples and aggregates Gunicorn data; it provides data on the current number of active/idle workers as well as the number of queued requests that have not been assigned to a worker.
The timeout monitor wakes up slightly before the Gunicorn timeout in order to emit stack frame and instrumenter data before Gunicorn recycles the worker.
From anywhere in your application, use:
from gunicorn_django_canonical_logs import Context
Context.set("custom", "my_value")This would add app_custom="my_value" to the log for the current request; context is cleared between requests.
From anywhere in your application, use:
from gunicorn_django_canonical_logs import Context
with Context.time("custom"):
    do_thing_that_takes_time()

This would add app_custom_time="{wall time in seconds}" to the log for the current request based on the execution time of do_thing_that_takes_time(); multiple timings using the same key are summed.
from gunicorn_django_canonical_logs import Context, register_instrumenter
@register_instrumenter
class MyInstrumenter:
    def setup(self):
        pass  # called once after forking a Gunicorn worker

    def call(self, request, response, environ):
        pass  # called every time an event is emitted

Important
The application must import the instrumenter for it to register itself.
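As a sketch of what a concrete instrumenter might look like (reporting values via Context.set and treating environ as the WSGI environ dict are assumptions; only register_instrumenter and the setup/call hooks come from the API above):

from gunicorn_django_canonical_logs import Context, register_instrumenter

@register_instrumenter
class HostInstrumenter:
    def setup(self):
        pass  # no per-worker state needed

    def call(self, request, response, environ):
        # Assuming environ is the WSGI environ dict; response may be absent for timeout events
        Context.set("host", (environ or {}).get("HTTP_HOST", "-"))

Assuming Context.set behaves as described above, this would add app_host="..." to each emitted event.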
gunicorn-django-canonical-logs is distributed under the terms of the MIT license.