January 12, 2020

Using Grafana to visualise syslog files with Loki

I’ve recently started looking at Loki as an alternative to Graylog. While the project is still in its early stages, I already run Grafana and Prometheus to capture Netdata information from my home servers, so this seemed like one less thing to run.


The Mightywomble · Sep 21, 2019 · 6 min read

I run the system using Docker on the central rsyslog server, which all the other servers point their syslogs to. I’m running Ubuntu 18.04.

A useful point of note: if you choose to install Docker during the OS install, it gets installed using Snap. I had permission issues with this, so I used snap remove to get rid of it and deployed docker-ce from the Docker repos instead.
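For reference, the swap from the Snap package to docker-ce looked roughly like this on Ubuntu 18.04. This is a sketch, assuming the official Docker apt repository; check Docker's install docs for the current steps:

```shell
# Remove the Snap-packaged Docker that the OS installer added
sudo snap remove docker

# Add Docker's apt repository and install docker-ce (bionic = Ubuntu 18.04)
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu bionic stable"
sudo apt-get update
sudo apt-get install -y docker-ce docker-compose
```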

The text below is lifted straight from the Loki GitHub repo; I’ve pulled out what helped me as a reference here.

https://github.com/grafana/loki

Run locally using Docker

The Docker images for Loki and Promtail are available on DockerHub.
To test locally, we recommend using the docker-compose.yaml file in this directory.

Docker starts containers for promtail, Loki, and Grafana.

Either git clone this repository locally and cd loki/production, or download a copy of the docker-compose.yaml locally.

Ensure you have the most up-to-date Docker container images:
docker-compose pull

Run the stack on your local Docker:
docker-compose up

Grafana should now be available at http://localhost:3000/. Log in with admin / admin

Note: When running locally, promtail starts before loki is ready. This can lead to the error message “Data source connected, but no labels received.” After a couple seconds, Promtail will forward all newly created log messages correctly. Until this is fixed we recommend building and running from source.
For instructions on how to query Loki, see our usage docs.

Error: Error connecting to datasource: Data source connected, but no labels received. Verify that Loki and Promtail are configured properly.

Editing the docker-compose file so that promtail comes last sorted this out for me:

version: "3"

networks:
  loki:

services:
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - loki

  grafana:
    image: grafana/grafana:master
    ports:
      - "3000:3000"
    networks:
      - loki

  promtail:
    image: grafana/promtail:latest
    volumes:
      - /var/log:/var/log
    command: -config.file=/etc/promtail/docker-config.yaml
    networks:
      - loki
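An alternative to relying on definition order: Compose's depends_on can express the ordering directly. Note that depends_on only waits for the loki container to start, not for Loki to be ready to receive logs, so the "no labels received" message can still appear briefly. A sketch of the promtail service with this added:

```yaml
  promtail:
    image: grafana/promtail:latest
    depends_on:
      - loki            # start the loki container before promtail
    volumes:
      - /var/log:/var/log
    command: -config.file=/etc/promtail/docker-config.yaml
    networks:
      - loki
```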

Grafana

Grafana ships with built-in support for Loki in versions 6.0 and greater; however, using 6.3 or later is highly recommended.

1. Log into your Grafana, e.g, http://localhost:3000 (default username: admin, default password: admin)
2. Go to Configuration > Data Sources via the cog icon on the left side bar.
3. Click the big + Add data source button.
4. Choose Loki from the list.
5. The http URL field should be the address of your Loki server e.g. http://loki:3100
6. To see the logs, click Explore on the sidebar, select the Loki datasource, and then choose a log stream using the Log labels button.

Querying

To get the previously ingested logs back from Loki for analysis, you need a client that supports LogQL. Grafana will be the first choice for most users, nevertheless, LogCLI represents a viable standalone alternative.
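Besides Grafana and LogCLI, any client that can speak Loki's HTTP API can run a LogQL query. The sketch below just builds the request URL; the /loki/api/v1/query_range endpoint path is an assumption that depends on your Loki version, so check the API docs for the release you are running:

```python
from urllib.parse import urlencode

def loki_query_url(base, logql, limit=100):
    """Build a query URL for Loki's HTTP API.

    The endpoint path below is an assumption -- verify it against the
    API documentation for your Loki version.
    """
    params = urlencode({"query": logql, "limit": limit})
    return f"{base}/loki/api/v1/query_range?{params}"

url = loki_query_url("http://loki:3100", '{job="mysql"} |= "error"')
```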

LogQL

Loki has its very own language for querying logs from the Loki server, called LogQL. Think of it as distributed grep with labels for selection.
A log query consists of two parts: log stream selector, and a filter expression. For performance reasons you need to start by choosing a set of log streams using a Prometheus-style log stream selector.

The log stream selector will reduce the number of log streams to a manageable volume and then the regex search expression is used to do a distributed grep over those log streams.
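The two phases can be sketched in a few lines of Python. This is a toy model of the idea, not Loki's implementation; the stream labels and log lines are made up for illustration:

```python
import re

# A "stream" is a label set plus its log lines (illustrative data only).
streams = [
    ({"job": "mysql", "name": "mysql-backup"}, ["backup ok", "error: disk full"]),
    ({"job": "nginx", "name": "web"},          ["GET /index.html 200"]),
]

def select_streams(streams, **labels):
    """Phase 1: the log stream selector narrows the set of streams."""
    return [(l, lines) for l, lines in streams
            if all(l.get(k) == v for k, v in labels.items())]

def grep(selected, pattern):
    """Phase 2: a distributed grep over the selected streams' lines."""
    rx = re.compile(pattern)
    return [line for _, lines in selected for line in lines if rx.search(line)]

hits = grep(select_streams(streams, job="mysql"), "error")
```

The point is the order: label selection happens first and cheaply narrows the data, so the relatively expensive regex only ever runs over the matching streams.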

Log Stream Selector
For the label part of the query expression, wrap it in curly braces {} and then use the key value syntax for selecting labels. Multiple label expressions are separated by a comma:
{app="mysql",name="mysql-backup"}
The following label matching operators are currently supported:
• = exactly equal.
• != not equal.
• =~ regex-match.
• !~ do not regex-match.

Examples:
• {name=~"mysql.+"}
• {name!~"mysql.+"}

The same rules that apply for Prometheus Label Selectors apply for Loki Log Stream Selectors.

Filter Expression
After writing the Log Stream Selector, you can filter the results further by writing a search expression. The search expression can be just text or a regex expression.

Example queries:
• {job="mysql"} |= "error"
• {name="kafka"} |~ "tsdb-ops.*io:2003"
• {instance=~"kafka-[23]",name="kafka"} != "kafka.server:type=ReplicaManager"

Filter operators can be chained and will sequentially filter down the expression; resulting log lines will satisfy every filter, e.g.:
{job="mysql"} |= "error" != "timeout"

The following filter types have been implemented:
• |= line contains string.
• != line does not contain string.
• |~ line matches regular expression.
• !~ line does not match regular expression.

The regex expression accepts RE2 syntax. Matching is case-sensitive by default and can be switched to case-insensitive by prefixing the regex with (?i).
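The four filter operators and chaining can be sketched in Python. Again, this is an illustrative model of the semantics, not Loki's code:

```python
import re

def line_filter(lines, op, arg):
    """Apply one LogQL-style filter stage to a list of log lines.

    |=  contains string        !=  does not contain string
    |~  matches regex          !~  does not match regex
    """
    if op == "|=": return [l for l in lines if arg in l]
    if op == "!=": return [l for l in lines if arg not in l]
    if op == "|~": return [l for l in lines if re.search(arg, l)]
    if op == "!~": return [l for l in lines if not re.search(arg, l)]
    raise ValueError(op)

lines = ["error: timeout", "error: disk full", "all good"]
# Chained filters, as in {job="mysql"} |= "error" != "timeout":
result = line_filter(line_filter(lines, "|=", "error"), "!=", "timeout")
# The (?i) prefix makes a regex filter case-insensitive:
ci = line_filter(["ERROR here"], "|~", "(?i)error")
```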

Query Language Extensions
The query language is still under development to support more features, e.g.,:
• AND / NOT operators
• Number extraction for timeseries based on number in log messages
• JSON accessors for filtering of JSON-structured logs
• Context (like grep -C n)

Counting logs
Loki’s LogQL supports sample expressions, allowing you to count entries per stream after the regex filtering stage.

Range Vector aggregation
The language shares the same range vector concept from Prometheus, except that the selected range of samples contains a value of one for each log entry. You can then apply an aggregation over the selected range to transform it into an instant vector.

rate calculates the number of entries per second, and count_over_time counts the entries for each log stream within the range.
In this example, we count all the log lines recorded within the last 5 minutes for the mysql job.
count_over_time({job="mysql"}[5m])
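Conceptually, count_over_time is just "how many entries fall inside the time window". A minimal sketch of that semantics, with made-up timestamps in seconds:

```python
def count_over_time(entry_timestamps, now, range_seconds):
    """Count log entries whose timestamp falls inside [now - range, now]."""
    return sum(1 for t in entry_timestamps if now - range_seconds <= t <= now)

# Five entries; only the three within the last 300s (5m) are counted.
stamps = [10, 50, 700, 800, 900]
n = count_over_time(stamps, now=1000, range_seconds=300)
```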

A range vector aggregation can also be applied to a Filter Expression, allowing you to select only matching log entries.
rate(({job="mysql"} |= "error" != "timeout")[10s])

The query above will compute the per second rate of all errors except those containing timeout within the last 10 seconds.
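In other words, rate is the count of matching entries in the window divided by the window length. A sketch of that arithmetic, with illustrative timestamps:

```python
def rate(matching_timestamps, now, range_seconds):
    """Per-second rate: matching entries in the window / window length."""
    in_window = [t for t in matching_timestamps if now - range_seconds <= t <= now]
    return len(in_window) / range_seconds

# 5 matching log lines in the last 10s -> 0.5 lines per second
r = rate([991, 993, 995, 997, 999], now=1000, range_seconds=10)
```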

You can then use aggregation operators over the range vector aggregation.
Aggregation operators

Like PromQL, Loki’s LogQL supports a subset of built-in aggregation operators that can be used to aggregate the elements of a single vector, resulting in a new vector with fewer elements and aggregated values:
• sum (calculate sum over dimensions)
• min (select minimum over dimensions)
• max (select maximum over dimensions)
• avg (calculate the average over dimensions)
• stddev (calculate population standard deviation over dimensions)
• stdvar (calculate population standard variance over dimensions)
• count (count number of elements in the vector)
• bottomk (smallest k elements by sample value)
• topk (largest k elements by sample value)

These operators can either be used to aggregate over all label dimensions or preserve distinct dimensions by including a without or by clause.
<aggr-op>([parameter,] <vector expression>) [without|by (<label list>)]
parameter is required only for topk and bottomk. without removes the listed labels from the result vector, while all other labels are preserved in the output. by does the opposite and drops labels that are not listed in the by clause, even if their label values are identical between all elements of the vector.

topk and bottomk are different from other aggregators in that a subset of the input samples, including the original labels, are returned in the result vector. by and without are only used to bucket the input vector.
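The by/without grouping behaviour can be sketched as a sum over label buckets. This is an illustrative model of the semantics (the label sets and values are invented), not Loki's implementation:

```python
from collections import defaultdict

def aggregate_sum(samples, by=None, without=None):
    """sum(...) by (labels) / without (labels) over an instant vector.

    `samples` is a list of (labels_dict, value) pairs; results are keyed
    by the surviving label pairs.
    """
    out = defaultdict(float)
    for labels, value in samples:
        if by is not None:
            key = tuple(sorted((k, v) for k, v in labels.items() if k in by))
        elif without is not None:
            key = tuple(sorted((k, v) for k, v in labels.items() if k not in without))
        else:
            key = ()  # no grouping: everything collapses to one element
        out[key] += value
    return dict(out)

samples = [
    ({"job": "mysql", "level": "error"}, 3.0),
    ({"job": "mysql", "level": "info"},  7.0),
    ({"job": "nginx", "level": "error"}, 2.0),
]
# by (level): the job label is dropped, values merge per level
by_level = aggregate_sum(samples, by={"level"})
```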

Examples
Get top 10 applications by highest log throughput:
topk(10, sum(rate({region="us-east1"}[5m])) by (name))

Get the count of logs during the last 5 minutes by level:
sum(count_over_time({job="mysql"}[5m])) by (level)

Get the rate of HTTP GET requests from nginx logs:
avg(rate(({job="nginx"} |= "GET")[10s])) by (region)