Kubernetes Observability Part 2: Events, Logs & integration with Slack, OpenAI and Grafana
Build a Kubernetes custom Controller to watch Kubernetes Events and forward them to a Slack channel as alerts.
TL;DR: Forward Kubernetes Events and their Logs from a Kubernetes custom controller to a Slack channel.
Introduction
In this article series we explore a one-stop-shop Observability solution for Kubernetes. A custom Controller collects all the Kubernetes Events occurring in a cluster and forwards them to Grafana Loki via Promtail. It decorates the Events that indicate a problem with the Pods’ Logs recorded in the timeframe in which each Event took place, and sends an interactive message to a Slack channel/bot that can, on demand, ask the OpenAI Chat API to help us solve the error, based on the original Event message.
The series consists of three parts:
- Part 1: Create a custom Controller, forward Events to Grafana Loki, present Events & Logs on a Grafana dashboard
- Part 2: Build the integration with Slack and forward alerts.
- Part 3: Build the integration with OpenAI Chat API and get solution proposals.
This is the second part of the series.
What is Slack?
Slack is a cloud-based collaboration platform that allows teams to communicate and work together more effectively. It provides a centralized place for team members to send messages, share files, and collaborate on projects in real-time. Slack supports text-based messaging, voice and video calls, and integration with other tools and services.
Slack organizes communication into channels, which can be public or private. Public channels are visible to all members of a team, while private channels are restricted to selected team members. In addition to channels, Slack also supports direct messages between individual team members.
Slack integrates with a wide range of tools and services. Using its SDK, we can easily integrate a Slack Bot with our custom controller so that it sends real-time alerts to a designated channel whenever it processes an Event of type Warning.
How to configure a Slack Bot
First we must create a new Workspace; open your Slack app and choose Create a new workspace:
Follow the wizard (web & app) and create a new workspace and a new channel:
Next, we have to create a new Slack App. Open https://api.slack.com/apps and click Your Apps and then Create New App:
Choose From scratch:
Fill in the name of the app and connect it with the workspace we just created:
Go to App Home and choose Review Scopes to Add:
Under Scopes add chat:write to the permitted Bot Token Scopes:
Install the app to your workspace:
and copy the generated token:
and, as a last step, invite the app to the channel we created above:
How to send a message to a Slack channel
The easiest way to make REST calls to api.slack.com is with the Slack library for Go, which lives in the github.com/slack-go/slack repo.
Add github.com/slack-go/slack to your imports and these three variables to your event_controller.go:
var (
	slackBotToken = "xoxb-XXXXXXXXXXXXXXXXXXXXXXXX"
	channelID     = "CXXXXXXXXXX"
	slackClient   *slack.Client
)
where slackBotToken is the auth token you got when you installed your app in your Slack workspace and channelID is the ID of the channel you created in your workspace. You can find this value if you choose to view the channel details in Slack:
None of these configuration values should be hard-coded as static variables and magic strings in your controller. It would be better to use a combination of ConfigMap and Secret resources, and we will see later how to pull this off. For the time being, let’s stick to posting an alert to the Slack channel and not worry about that.
Instantiating a Slack client couldn’t be easier. Add the following function to your controller:
func initSlackClient() {
	slackClient = slack.New(slackBotToken, slack.OptionDebug(false))
}
The next step is to amend the reconciliation loop. Every time an Event that is not Normal is processed, we want to send an alert to our Slack channel. Go to the region of the reconciliation where we check the Event.Type value and make the following changes:
if event.Type != "Normal" {
	level = promtail.Warn
	if slackClient == nil {
		initSlackClient()
	}
	err := forwardEvent(event.Note)
	if err != nil {
		logger.Error(err, "failed to forward to slack channel")
		slackClient = nil
	}
}
Initialise the slackClient, if that has not already been done, and forward the event to the designated Slack channel by calling a new function called forwardEvent, which we will analyse right away.
func forwardEvent(note string) error {
	simpleText := fmt.Sprintf("📣 Event: *%s*", note)
	simpleTextTextBlock := slack.NewTextBlockObject("mrkdwn", simpleText, false, false)
	simpleSection := slack.NewSectionBlock(simpleTextTextBlock, nil, nil)
	msgOptionBlocks := slack.MsgOptionBlocks(
		simpleSection,
	)
	_, _, err := slackClient.PostMessage(
		channelID,
		msgOptionBlocks,
		slack.MsgOptionAsUser(true),
	)
	return err
}
The Slack client for Go is not meticulously documented (IMHO quite the contrary; guys, you can do better). To oversimplify, it works like this: each message consists of discrete blocks, and the kind of block determines what content it shows and how it is rendered. These blocks are passed to MsgOptionBlocks, and the resulting message option is handed to the PostMessage method of the Slack client along with the ID of the channel we want to post the message to.
I suggest you start from the examples in the client’s GitHub repo to get a glimpse of how to work with different kinds of message configurations: https://github.com/slack-go/slack/tree/master/examples. Again, don’t expect much; code comments are close to zero. Another good resource is the so-called Block Kit Builder, where you can find or interactively build various combinations of blocks. Unfortunately its output is JSON, which is not good for us, as we have to adapt it for Go. Thanks, but no thanks. I had zero fun working with the Slack client for Go!
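To make the mapping between Block Kit Builder JSON and Go a little more tangible, here is a small standard-library sketch that mirrors a single section block with local structs and prints it as JSON. The types textObject and sectionBlock and the helpers newMarkdownSection and toJSON are made up for this illustration; they only approximate what slack.NewSectionBlock and slack.NewTextBlockObject build, and they are not the library’s own types.

```go
package main

import "encoding/json"

// textObject mirrors a Block Kit text object: {"type": "mrkdwn", "text": "..."}.
type textObject struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

// sectionBlock mirrors a Block Kit section block that wraps one text object.
type sectionBlock struct {
	Type string     `json:"type"`
	Text textObject `json:"text"`
}

// newMarkdownSection builds a section with mrkdwn text, roughly what
// slack.NewSectionBlock(slack.NewTextBlockObject("mrkdwn", text, false, false), nil, nil)
// describes in the controller code.
func newMarkdownSection(text string) sectionBlock {
	return sectionBlock{Type: "section", Text: textObject{Type: "mrkdwn", Text: text}}
}

// toJSON renders the block in the shape the Block Kit Builder works with.
func toJSON(b sectionBlock) (string, error) {
	out, err := json.Marshal(b)
	return string(out), err
}
```

Reading a Block Kit Builder snippet this way (a "type" plus a nested "text" object) makes it easier to spot which constructor of the Go client corresponds to which JSON fragment.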
Run your controller and see what happens. If everything went well you should start seeing your channel flooding with Event alerts:
Be aware that this is a very rough implementation and, considering the number of Events happening in your cluster, you might occasionally be cut off by the Slack API rate limiter. But this is merely a prototype; we can worry about that stuff in the future:
That was a significant milestone. So far we can watch and process our Events and forward them to Slack whenever they state that something didn’t go as expected in our cluster. But the structure and content of our message are not even remotely close to being useful and meaningful, so let’s enhance it a bit. It would be very useful to know which pod recorded the Event, which namespace it lives in, when it occurred for the first and last time, and what the reason for the Event was in the first place. That requires some changes in our forwardEvent function:
func forwardEvent(
	level string,
	note string,
	commonLabels map[string]string,
	extraLabels map[string]string,
	firstSeen time.Time,
	lastSeen time.Time,
) error {
	headerText := fmt.Sprintf("🔔Cluster: *%s*, Type: *%s*, Reason: *%s*, Kind: *%s* \n\n 🚦*Alert:* %s", commonLabels["cluster_name"], level, extraLabels["reason"], extraLabels["kind"], note)
	headerTextBlock := slack.NewTextBlockObject("mrkdwn", headerText, false, false)
	headerSection := slack.NewSectionBlock(headerTextBlock, nil, nil)
	podText := fmt.Sprintf("• *namespace:* %s\n• *pod:* %s", extraLabels["namespace"], extraLabels["pod"])
	podTextBlock := slack.NewTextBlockObject("mrkdwn", podText, false, false)
	podSectionBlock := slack.NewSectionBlock(podTextBlock, nil, nil)
	timeStampText := fmt.Sprintf("🔛 *First seen:* %s\n 🔚 *Last Seen:* %s", firstSeen, lastSeen)
	timeStampTextBlock := slack.NewTextBlockObject("mrkdwn", timeStampText, false, false)
	timeStampSectionBlock := slack.NewSectionBlock(timeStampTextBlock, nil, nil)
	msgOptionBlocks := slack.MsgOptionBlocks(
		headerSection,
		podSectionBlock,
		timeStampSectionBlock,
	)
	_, _, err := slackClient.PostMessage(
		channelID,
		msgOptionBlocks,
		slack.MsgOptionAsUser(true))
	return err
}
and adjust the call to forwardEvent in the Reconcile method so that it conforms to the new forwardEvent signature:
err := forwardEvent(
	event.Type,
	event.Note,
	r.CommonLabels,
	extraLabels,
	event.DeprecatedFirstTimestamp.Time,
	event.DeprecatedLastTimestamp.Time)
if err != nil {
	logger.Error(err, "failed to forward to slack channel")
	slackClient = nil
}
Run your controller again and see what we are getting this time:
That is much better and way cleaner, and it could really point an admin in a meaningful direction. But something is still missing: the person reading that message has no clue why this Event took place and has to jump into a console, or into the container itself, to look at the pod logs for the specific timeframe in which the Event happened. It would be very helpful if we could add those logs to the payload of this message, so that we can check all the necessary details related to that Event in one place before we take any action. Let’s see how we can pull this off.
How to send an attachment to a Slack channel
The easiest way to send an attachment to a Slack channel is by uploading a file. This is quite easy; you can have a look at the examples of the Slack client but, as said before, don’t expect much of an explanation or code comments in them. Our task has two legs: the first is collecting the pod logs for this timeframe and the second is sending those logs to the channel. Let’s begin with the latter. We assume that our logs will be one quite long string that we use as the contents of the attachment. We need to extend the implementation of the forwardEvent function: we add a logs parameter to the signature and, after we are done sending the message itself, we additionally upload a file that contains the logs value as its Content:
func forwardEvent(
	level string,
	note string,
	commonLabels map[string]string,
	extraLabels map[string]string,
	firstSeen time.Time,
	lastSeen time.Time,
	logs string,
) error {
	// ... building msgOptionBlocks omitted for brevity, as it's the same as before ...
	_, _, err := slackClient.PostMessage(
		channelID,
		msgOptionBlocks,
		slack.MsgOptionAsUser(true))
	if err != nil {
		return err
	}
	// Upload the logs as a file only when there is something to upload.
	if strings.TrimSpace(logs) != "" {
		filename := fmt.Sprintf("%s/%s.log",
			extraLabels["namespace"],
			extraLabels["pod"])
		params := slack.FileUploadParameters{
			Title:          filename,
			Filename:       filename,
			Filetype:       "log",
			Content:        logs,
			Channels:       []string{channelID},
			InitialComment: filename,
		}
		_, err = slackClient.UploadFile(params)
		if err != nil {
			return err
		}
	}
	return nil
}
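Since the logs are collected from the first occurrence of the Event until now, the string handed to forwardEvent can become large. As a sketch only, and assuming you want to cap the upload at some size of your own choosing, a helper like the hypothetical tailBytes below keeps the most recent part of the logs and starts the result on a line boundary. Neither the function nor the maxBytes parameter exists in the controller of this series.

```go
package main

import "strings"

// tailBytes returns at most the last maxBytes bytes of logs.
// A non-positive maxBytes means "no limit". When the logs are cut,
// everything up to the first newline of the remainder is dropped so the
// result starts on a line boundary (this may discard one complete line).
func tailBytes(logs string, maxBytes int) string {
	if maxBytes <= 0 || len(logs) <= maxBytes {
		return logs
	}
	cut := logs[len(logs)-maxBytes:]
	if i := strings.IndexByte(cut, '\n'); i >= 0 && i+1 < len(cut) {
		return cut[i+1:]
	}
	return cut
}
```

It could be applied to logs right before the slack.FileUploadParameters value is built.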
Now it’s time to deal with the former part: collecting the pod logs. There is a small nuance here: sigs.k8s.io/controller-runtime/pkg/client is not able to retrieve this kind of information. Its responsibility is retrieving Kubernetes resources along with their metadata in the attempt to reconcile their actual state with their desired state. To collect the logs we have to resort to the Go client for Kubernetes: import k8s.io/client-go/kubernetes and use the kubernetes.Clientset struct, which provides access to the various API groups. The most reasonable place to create it is not the controller but the manager (main.go), which already has a *rest.Config in place, obtained from ctrl.GetConfigOrDie(), for talking to the Kubernetes apiserver.
We are going to create a new kubernetes.Clientset in the manager and expand the EventReconciler by embedding a kubernetes.Clientset. In event_controller.go change the struct definition of the EventReconciler to:
type EventReconciler struct {
	client.Client
	kubernetes.Clientset
	Scheme         *runtime.Scheme
	PromtailClient promtail.Client
	CommonLabels   map[string]string
}
In main.go create a kubernetes.Clientset after you create your manager, mgr:
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), options)
if err != nil {
	setupLog.Error(err, "unable to start manager")
	os.Exit(1)
}
// Reuse the manager's REST config to build the clientset.
clientset, err := kubernetes.NewForConfig(mgr.GetConfig())
if err != nil {
	setupLog.Error(err, "unable to get a kubernetes clientset")
	os.Exit(1)
}
and change the creation of the EventReconciler accordingly:
if err = (&controllers.EventReconciler{
	Client:         mgr.GetClient(),
	Clientset:      *clientset,
	Scheme:         mgr.GetScheme(),
	PromtailClient: promtailJsonClient,
	CommonLabels:   labels,
}).SetupWithManager(mgr); err != nil {
	setupLog.Error(err, "unable to create controller", "controller", "Event")
	os.Exit(1)
}
Now we have to go back to event_controller.go and add the functionality to collect those logs. First, add the following packages to your imports (together with bytes from the standard library and k8s.io/client-go/kubernetes, if they are not already there):
v1core "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
Next, add the following method to EventReconciler:
func (r *EventReconciler) getPodLogs(ctx context.Context, namespace string, podName string, since metav1.Time) (logs string, err error) {
	objectKey := client.ObjectKey{
		Namespace: namespace,
		Name:      podName,
	}
	// Fetch the Pod to find out the name of its first container.
	var pod v1core.Pod
	if err := r.Get(ctx, objectKey, &pod); err != nil {
		return "", err
	}
	logOptions := &v1core.PodLogOptions{
		Container:  pod.Spec.Containers[0].Name,
		Previous:   true,
		Timestamps: true,
		SinceTime:  &since,
		//TailLines: pointer.Int64Ptr(50),
	}
	// Use the caller's context so the request is cancelled with the reconciliation.
	readCloser, err := r.CoreV1().Pods(namespace).GetLogs(podName, logOptions).Stream(ctx)
	if err != nil {
		return "", err
	}
	defer readCloser.Close()
	buffer := new(bytes.Buffer)
	// Surface read errors instead of silently returning truncated logs.
	if _, err := buffer.ReadFrom(readCloser); err != nil {
		return "", err
	}
	return buffer.String(), nil
}
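This method collects the logs from the first container of the pod, from the moment the Event first occurred (SinceTime) until now, and returns them as a string. Note that with Previous set to true the API returns the logs of the previous, terminated instance of the container, which suits crash-looping pods; set it to false if you want the logs of the currently running instance.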
If you want to limit the number of log lines returned, uncomment the TailLines field of the PodLogOptions and give it the number that suits your needs.
One last thing: we have to make some changes to the Reconcile method. We want to collect the pod logs only if the Event.Type is not Normal and the resource that the Event is regarding is of Kind Pod; only then should the reconciler attempt to retrieve the logs and store them in the logs variable:
level := promtail.Info
if event.Type != "Normal" {
	level = promtail.Warn
	var logs string
	if strings.ToLower(event.Regarding.Kind) == "pod" {
		out, err := r.getPodLogs(
			ctx,
			event.Regarding.Namespace,
			event.Regarding.Name,
			event.DeprecatedFirstTimestamp,
		)
		if err != nil {
			logger.V(5).Error(err, "failed to get pod logs")
		}
		logs = out
	}
	if slackClient == nil {
		initSlackClient()
	}
	err := forwardEvent(
		event.Type,
		event.Note,
		r.CommonLabels,
		extraLabels,
		event.DeprecatedFirstTimestamp.Time,
		event.DeprecatedLastTimestamp.Time,
		logs)
	if err != nil {
		logger.Error(err, "failed to forward to slack channel")
		slackClient = nil
	}
}
We are not there yet. The last thing we have to do is give our Slack app permission to write files in the Slack channel. Go back to https://api.slack.com/apps, add the scope files:write to your app and reinstall it in the workspace.
Now run the controller again (make run) and let’s see what is received in our Slack channel:
Next Steps
In the third part of this series, we will see how we can make this Slack message interactive and issue ad-hoc requests to the OpenAI Chat API, asking for guidance on how to resolve the error.
You can find the source code here: