Linux Logging with Grafana Loki & custom Promtail labels from OpenStack or AWS EC2 metadata

Akriotis Kyriakos
11 min read · Mar 22, 2023

Enrich the collected logs of your systems by injecting relabelled OpenStack or AWS EC2 instance metadata into the Promtail data.

Gopher artwork by Ashley Willis @ashleymcnamara

Introduction

In a previous post, we investigated how we could easily consolidate all Kubernetes logs in an intuitive Grafana dashboard by using Grafana Loki and Promtail as an agent. In most systems though, the workloads are not limited to the ones running on a Kubernetes cluster. Various virtual machines might be required that generate a vast amount of logs, and inevitably, sooner or later, the need for a unified log aggregation mechanism will arise.

Of course there are a lot of different ways this can be attained, and thousands of articles focus on using Logstash or Fluentd in combination with Elasticsearch and Kibana; to be truly honest, these approaches are quite solid and I hold nothing against them. But in our case we started the journey with the Grafana/Loki/Promtail stack for our Kubernetes logs, and it would be ideal to consolidate the logs of the rest of our non-containerized workloads using the same tools.

For that matter, I will assume you already have in place a Kubernetes cluster with a Grafana installation. If not, follow the instructions of this article in order to bootstrap one:

We are additionally going to need some Linux servers (personal preference would be Ubuntu) hosted on AWS, on any OpenStack-based cloud (like, for instance, Open Telekom Cloud) or even on an on-premises OpenStack installation.

Make sure you provision your Linux servers in the same private subnet where the Kubernetes worker nodes reside. We are going to need access to the Grafana Loki Gateway service later.

As we’ve explained in the previous post, Grafana Loki is a log aggregation system, more specifically as stated on their website: ”is a horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost effective and easy to operate. It does not index the contents of the logs, but rather a set of labels for each log stream.” It’s a fairly new open source project that was started in 2018 at Grafana Labs.

Loki uses Promtail to aggregate logs. Promtail is a log collector agent that collects, (re)labels and ships logs to Loki. It is built specifically for Loki — an instance of Promtail runs on each Kubernetes node. It uses the exact same service discovery as Prometheus and supports similar methods for labeling, transforming, and filtering logs before their ingestion to Loki.

That means we have to install the Promtail agent on every virtual machine we created in AWS or in OpenStack.

Install Promtail

First, check the current release of Promtail here and adjust the script below to target the actual latest version. We are going to install it on every machine directly from the binaries:

sudo mkdir -p /opt/promtail

sudo wget -qO /opt/promtail/promtail-linux-amd64.gz "https://github.com/grafana/loki/releases/download/v2.7.3/promtail-linux-amd64.zip"

sudo gunzip /opt/promtail/promtail-linux-amd64.gz
sudo chmod a+x /opt/promtail/promtail-linux-amd64
sudo ln -s /opt/promtail/promtail-linux-amd64 /usr/local/bin/promtail

We are definitely going to configure our Promtail agent to run as a service, so let’s create a designated user for that purpose and add it to the adm group, so it can access the various log files without problems:

sudo useradd --system promtail
sudo usermod -a -G adm promtail

Adjust the ownership of /opt/promtail:

sudo chown -R promtail:promtail /opt/promtail 

and make a sanity check of Promtail itself and of the created user:

promtail --version
id promtail

We’ll put a pin in the service setup here, configure Promtail itself, and then come back to configure Promtail’s daemon.

Configure Promtail

Let’s create a simple Promtail configuration file and fill it in:

sudo nano /opt/promtail/config-promtail.yaml

You need to adjust the following content and assign as clients::url the exposed endpoint of your Grafana Loki Gateway service in Kubernetes, GRAFANA_LOKI_GATEWAY_ENDPOINT.

If you are working with an on-premises Kubernetes cluster, you can expose the service via a LoadBalancer; if you don’t have one, check this article on how to provision and configure it:

Alternatively, if you are working on a public cloud, make sure you create an internal LoadBalancer that lives in the same private subnet as the rest of the machines, and expose this service via that LoadBalancer. Be sure to choose an internal one; otherwise you will send all this traffic over the public internet, which is discouraged for both security and cost reasons.

Of course it would be more efficient and robust to expose those services via an Ingress but this is not in the scope of this post.
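For illustration only, a Service of type LoadBalancer in front of the Loki gateway could look roughly like the sketch below. The namespace, the selector labels (here following common grafana/loki Helm chart conventions) and the internal-LoadBalancer annotation key are assumptions you must adapt to your own installation and cloud provider:

# Hypothetical sketch: expose the Loki gateway through an *internal* LoadBalancer.
# Namespace, selector labels and the annotation key depend on your Loki Helm
# release and on your cloud's load-balancer controller; adjust accordingly.
apiVersion: v1
kind: Service
metadata:
  name: loki-gateway-internal
  namespace: loki
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"   # cloud-specific
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: loki
    app.kubernetes.io/component: gateway
  ports:
    - name: http
      port: 80
      targetPort: 8080   # adjust to your gateway container's listening port

The private address of that LoadBalancer (plus its port) is then what you plug in as GRAFANA_LOKI_GATEWAY_ENDPOINT in the Promtail configuration below.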

What we eventually want to configure here are two things: a job that takes care of syslog messages and a second job related to the journald messages.

You will definitely ask why both. Valid question; first and foremost for the sake of demonstrating the various scraping configuration capabilities of Promtail and secondly because the dust of the internal war between syslog & journald hasn’t yet settled down.

syslog’s origins go back to the 80s, when it became the de facto standard for log management, and to this day a lot of distros still use it as their default solution. Its Linux development stopped around 2007, but a bunch of forks like rsyslog, syslog-ng and nxlog are carrying on the legacy.

journald was introduced in 2011 and created quite heated discussions and controversies among syslog traditionalists. It is essentially the part of systemd that deals with logging, supporting a broad range of features like indexing, structured logging, access control and automatic log rotation. Contrary to the text format of syslog, journald holds the information in binary format. Most distributions nowadays include journald as well, and it coexists with syslog without any problem. You can find a very interesting paper by Rainer Gerhards and Andre Lorbach, “rsyslog vs journal”, written as a follow-up to the former’s talk at LinuxTag 2013 in Berlin.

server:
  http_listen_port: 3100
  grpc_listen_port: 0

clients:
  - url: http://{{GRAFANA_LOKI_GATEWAY_ENDPOINT}}/loki/api/v1/push

positions:
  filename: /opt/promtail/positions.yaml

scrape_configs:
  - job_name: ecs/syslog
    syslog:
      listen_address: 127.0.0.1:40514
      idle_timeout: 12h
      use_incoming_timestamp: true
      labels:
        job: ecs/syslog
    relabel_configs:
      - source_labels: ["__syslog_message_hostname"]
        target_label: host_name
      - source_labels: ["__syslog_message_severity"]
        target_label: level
      - source_labels: ["__syslog_message_facility"]
        target_label: syslog_facility
      - source_labels: ["__syslog_message_app_name"]
        target_label: syslog_identifier

Promtail discovers locations of log files and extracts labels from them through the scrape_configs section of the configuration file. Its syntax is identical to what Prometheus uses for the same purpose. Promtail can receive IETF Syslog (RFC5424) messages from either a TCP or UDP stream. Receiving syslog messages is defined by the syslog stanza. The listen_address field is required and demands a valid network address. TCP is the default protocol for receiving messages. Let’s now configure this stream’s endpoint for rsyslog:

sudo nano /etc/rsyslog.d/99-promtail-relay.conf

and add the following content to it:

# https://www.rsyslog.com/doc/v8-stable/concepts/multi_ruleset.html#split-local-and-remote-logging
ruleset(name="remote"){}

# https://www.rsyslog.com/doc/v8-stable/configuration/modules/imudp.html
module(load="imudp")
input(type="imudp" port="50514")

# https://www.rsyslog.com/doc/v8-stable/configuration/modules/imtcp.html
module(load="imtcp")
input(type="imtcp" port="50514")

# forward everything
*.* action(type="omfwd" protocol="tcp" target="127.0.0.1" port="40514" Template="RSYSLOG_SyslogProtocol23Format" TCP_Framing="octet-counted" KeepAlive="on" action.resumeRetryCount="-1" queue.type="linkedlist" queue.size="50000")

Reload the configuration and restart the rsyslog daemon:

sudo systemctl daemon-reload
sudo systemctl restart rsyslog

When Promtail receives syslog messages, it brings along all available header fields that were parsed from the received message, prefixed with __syslog_, as internal labels. In the configuration above, the __syslog_message_hostname field will be transformed (relabelled) into a target label called host_name through relabel_configs, and will serve as an index later in Loki and Grafana.

Let’s test it:

sudo /opt/promtail/promtail-linux-amd64 --config.file=/opt/promtail/config-promtail.yaml --dry-run
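If you want to see a live message flow through the relay, a simple probe with logger is enough; while the dry run is active in another terminal, the relayed message should be printed together with the host_name, level and syslog_identifier labels (the tag promtail-test below is arbitrary):

# send a test message to the local syslog socket; rsyslog relays it to Promtail on 40514
logger -t promtail-test "hello from the rsyslog relay"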

Now let’s add a second job in scrape_configs, one that will point to journald. For that purpose, append the following snippet in the /opt/promtail/config-promtail.yaml:

  - job_name: ecs/journal
    journal:
      json: false
      max_age: 12h
      path: /var/log/journal
      labels:
        job: ecs/systemd-journal
    relabel_configs:
      - action: drop
        source_labels: ["__journal__transport"]
        regex: "kernel"
      - source_labels: ["__journal__systemd_unit"]
        target_label: "unit"
      - source_labels: ["__journal__hostname"]
        target_label: "host_name"
      - source_labels: ["__journal__transport"]
        target_label: "transport"
      - source_labels: ["__journal__cmdline"]
        target_label: "_cmdline"
      - source_labels: ["__journal_priority"]
        target_label: "_priority"
      - source_labels: ["__journal_priority_keyword"]
        target_label: "priority"
      - source_labels: ["__journal_syslog_identifier"]
        target_label: "syslog_identifier"
      - source_labels: ["__journal_syslog_message_severity"]
        target_label: "level"
      - source_labels: ["__journal_syslog_message_facility"]
        target_label: "syslog_facility"

It is almost identical; however, in this case we have a journal stanza and a path parameter that points to the location of the journal files in the filesystem, /var/log/journal, instead of a syslog stanza with a listen_address. The incoming available header fields of the journal, in this case prefixed with __journal_, will be relabelled accordingly so they don’t create overlapping indices later in Grafana Loki.

Let’s try it again, and go have a look in our Grafana Dashboard:

sudo /opt/promtail/promtail-linux-amd64 --config.file=/opt/promtail/config-promtail.yaml --dry-run
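Similarly, you can push a test entry straight into the journal and watch it appear in the dry-run output with the unit, host_name and priority labels attached (the identifier journal-test is arbitrary):

# write a test record directly to the systemd journal
echo "hello journald" | systemd-cat -t journal-test -p info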

Configure Promtail with environment variables

So far so good: we can collect in almost real time the log files from various servers and explore them via Grafana. Nevertheless, there are bits of information missing that would be invaluable for future incident investigations or audits. For example, although we do know the host name of our machines, we have no information about their IP addresses or the availability zone in which they’ve been provisioned.

Both AWS and OpenStack provide a metadata service for their virtual machine instances in order to retrieve instance-specific data.

http://169.254.169.254/latest/meta-data/

OpenStack specifically provides two versions of this API, an OpenStack metadata API and an EC2-compatible API. Both APIs are versioned by date. The latter uses the same endpoint as AWS and the former listens at:

http://169.254.169.254/openstack/latest/meta_data.json

In this article, we will use the EC2-compatible endpoint in order to cover simultaneously the AWS and OpenStack instance metadata retrieval cases.
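As a quick sanity check from inside one of the instances, you can query a few representative keys of the EC2-compatible endpoint with curl (on AWS instances that enforce IMDSv2 you would additionally need to pass a session token):

# a handful of the EC2-compatible metadata keys we care about
curl -s http://169.254.169.254/latest/meta-data/hostname
curl -s http://169.254.169.254/latest/meta-data/local-ipv4
curl -s http://169.254.169.254/latest/meta-data/public-ipv4
curl -s http://169.254.169.254/latest/meta-data/instance-type
curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone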

In order to retrieve those instance metadata, I wrote a small program in Go that simply calls those endpoints and retrieves the following information: hostname, private IPv4 address, public IPv4 address (if any), instance/flavour type and availability zone. The source code and the release binaries can be found in this repository. Download and install it on your Linux servers. (There is a deb package as well, but it is still not fully tested, so I recommend you install the binary directly from the tar.gz to /usr/local/bin.)

You can either execute it without parameters and it will produce a JSON output:

sudo /usr/local/bin/ec2-metadata

or you can export those metadata to a file as environment variables:

sudo /usr/local/bin/ec2-metadata > /opt/promtail/instance-metadata
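The resulting file should contain plain shell-style assignments, one per line, matching the variable names we will reference in the Promtail configuration below; the values here are just made-up examples:

# /opt/promtail/instance-metadata (illustrative values only)
META_EC2_INSTANCE_TYPE=s3.medium.2
META_EC2_AVAILABILITY_ZONE=eu-de-01
META_EC2_PUBLIC_IP=80.158.0.10
META_EC2_PRIVATE_IP=192.168.0.15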

We are going to use the second option here, because we want a way to feed our Promtail agent and decorate the collected log traces with instance-specific metadata. We can refer to environment variables in the Promtail configuration file in order to set values that need to be configurable during deployment. To do this, we need to pass -config.expand-env=true as a startup flag to our Promtail agent and reference our environment variables in the configuration as ${VAR}.

sudo /opt/promtail/promtail-linux-amd64 --config.file=/opt/promtail/config-promtail.yaml -config.expand-env=true

That requires changes in our Promtail configuration file.

First, we need to add a segment that reads the values of these environment variables as hidden metadata (in the labels stanza of the journal job):

    journal:
      json: false
      max_age: 12h
      path: /var/log/journal
      labels:
        job: ecs/systemd-journal
        __meta_ecs_instance_type: ${META_EC2_INSTANCE_TYPE}
        __meta_ecs_availability_zone: ${META_EC2_AVAILABILITY_ZONE}
        __meta_ecs_public_ip: ${META_EC2_PUBLIC_IP}
        __meta_ecs_private_ip: ${META_EC2_PRIVATE_IP}

Pay attention that we created labels with a double leading underscore (__meta_ecs_) in order to ensure that they will remain hidden.

and a second addition to the relabel_configs stanza, where we will instruct Promtail to transform those hidden metadata into actual labels that can consequently become indexable and filterable entities in Loki:

      - source_labels: ["__meta_ecs_availability_zone"]
        target_label: "availability_zone"
      - source_labels: ["__meta_ecs_instance_type"]
        target_label: "instance_type"
      - source_labels: ["__meta_ecs_public_ip"]
        target_label: "public_ip"
      - source_labels: ["__meta_ecs_private_ip"]
        target_label: "private_ip"

Let’s try it now again:

sudo /opt/promtail/promtail-linux-amd64 --config.file=/opt/promtail/config-promtail.yaml -config.expand-env=true

If all went right, you should now be able to see the new labels in the Grafana dashboard when you choose the ecs/systemd-journal job:
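With the labels in place, filtering in Grafana’s Explore view becomes straightforward; here are a couple of illustrative LogQL queries (the label values are of course specific to your environment):

{job="ecs/systemd-journal", availability_zone="eu-de-01"}
{job="ecs/systemd-journal", instance_type="s3.medium.2"} |= "error"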

Assuming your box lives in a fairly volatile and dynamic environment (and it does), the values of some of those metadata might change over time (e.g. the public IP address). We need to find a way to reload both the metadata values themselves and the configuration file of Promtail that points to those variables. We can achieve that with a two-step trick, by creating two services that run periodically: one for ec2-metadata and one for Promtail.

For ec2-metadata let’s create the following Unit file:

sudo nano /etc/systemd/system/ec2-metadata.service

and add the following content to it:

[Unit]
Description=EC2 Metadata Extractor
After=network.target
AssertPathExists=/usr/local/bin/ec2-metadata

[Service]
Type=notify-reload
User=root
WorkingDirectory=/usr/local/bin
ExecStart=/usr/local/bin/ec2-metadata --path /opt/promtail/instance-metadata
SuccessExitStatus=143
Restart=always
RestartSec=180
ReloadSignal=1
TimeoutStopSec=10
WatchdogSec=180

[Install]
WantedBy=multi-user.target

For Promtail now, we will create another Unit file:

sudo nano /etc/systemd/system/promtail.service

and add the following content to it:

[Unit]
Description=Promtail
After=network.target

[Service]
Type=notify-reload
User=promtail
Group=adm
WorkingDirectory=/opt/promtail
ExecStart=/opt/promtail/promtail-linux-amd64 --config.file=/opt/promtail/config-promtail.yaml -config.expand-env=true
EnvironmentFile=/opt/promtail/instance-metadata
SuccessExitStatus=143
Restart=always
RestartSec=5
ReloadSignal=1
TimeoutStopSec=10
WatchdogSec=180

[Install]
WantedBy=multi-user.target

There are plenty of small details in these Unit files. We want them to run periodically, so we define Type as notify-reload and set WatchdogSec to 180 seconds. That will restart our services every 3 minutes.

In ExecStart we instruct the Promtail agent to use the configuration file we created and to expand the environment variables in place; those are provided via the EnvironmentFile directive, which points to the /opt/promtail/instance-metadata file that the ec2-metadata daemon we created above will periodically update with the actual instance metadata. Reload the configuration, then sit back and watch the services recycling themselves at constant intervals, with negligible impact on your system.

sudo systemctl daemon-reload
sudo systemctl enable ec2-metadata.service
sudo systemctl enable promtail.service
sudo systemctl restart ec2-metadata.service
sudo systemctl restart promtail.service
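A quick way to confirm that both units are healthy and that Promtail keeps shipping logs after each recycle:

# check unit state and follow Promtail's own output
systemctl status ec2-metadata.service promtail.service
journalctl -u promtail.service -f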

IMPORTANT NOTE: You can exploit any kind of data or metadata from your systems or your custom applications and, using the very same rationale, inject them as labels and decorate your logs before sending them to Loki. Using the EC2-compatible API endpoint to gather instance-specific metadata was just for demonstration purposes; nevertheless, it is very useful.

In theory, Promtail on EC2 instances provides out of the box many labels prefixed with __meta_ec2_, ready to be consumed and scraped, but to be very honest I didn’t manage to get it working out of the box, and there are a bunch of people raising similar issues and concerns.
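For completeness, that approach relies on an ec2_sd_configs block in scrape_configs, roughly along these lines; the region, the chosen target labels and the __path__ glob are placeholders, and you should check the Promtail documentation for the exact set of __meta_ec2_* labels it exposes:

# rough sketch of EC2 service discovery in Promtail (placeholders, untested here)
scrape_configs:
  - job_name: ec2-logs
    ec2_sd_configs:
      - region: eu-central-1
    relabel_configs:
      - source_labels: ["__meta_ec2_instance_id"]
        target_label: "instance_id"
      - source_labels: ["__meta_ec2_availability_zone"]
        target_label: "availability_zone"
      - action: replace
        replacement: /var/log/**.log
        target_label: "__path__"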

Next steps

Those were a lot of manual steps, and there is definitely a lot of room for improvement. You can easily pack all those steps into a cloud-init file and let every virtual machine you create from now on get configured automatically on its first boot, and receive almost instantly all its logs in your dashboard from the very first second it is up and running. That’s fairly easy, and we will explore it in a future post.

Additionally a nice Grafana dashboard would add extreme value to the whole solution. You can find plenty of them here and tailor the one you like to your needs.

I hope you found the Grafana Loki & Promtail post series useful; feel free to leave comments, and don’t forget to hit the follow button if you liked the content.
