What is canonical log lines?
Structured logging augments the practice by giving developers an easy way to annotate lines with additional data. The use of the word structured is ambiguous—it can refer to a natively structured data format like JSON, but it often means that log lines are enhanced by appending
The added structure also makes the emitted logs machine readable (the key=value convention is designed to be a compromise between machine and human readability), which makes them ingestible for a number of different log processing tools, many of which provide the ability to query production logs in near real-time.
The information on whether a request was rate limited and which user performed it is spread across multiple log lines, which makes it harder to query. Most log processing systems can still do so by collating a trace’s data on something like a request ID and querying the result, but that involves scanning a lot of data, and it’s slower to run. It also requires more complex syntax that’s harder for a human to remember, and is more time consuming for them to write.
We use canonical log lines to help address this. They’re a simple idea: in addition to their normal log traces, requests (or some other unit of work that’s executing) also emit one long log line at the end that pulls all its key telemetry into one place. They look something like this:
[2019-03-18 22:48:32.999] canonical-log-line alloc_count=9123 auth_type=api_key database_queries=34 duration=0.009 http_method=POST http_path=/v1/charges http_status=200 key_id=mk_123 permissions_used=account_write rate_allowed=true rate_quota=100 rate_remaining=99 request_id=req_123 team=acquiring user_id=usr_123