
Dynamically set the logType in the Azure Log Analytics fluentbit plugin #7420

Closed · felfa01 opened this issue May 17, 2023 · 10 comments

felfa01 commented May 17, 2023

Is your feature request related to a problem? Please describe.
I have a multi-tenant setup in K8s clusters where tenants can opt in to have their application logs sent to Azure Log Analytics. In Azure LA, all log entries are separated based on the logType configured in the fluentbit Output. It would be great if logType could be configured dynamically per record so that I don't need to configure Outputs and logTypes for each and every tenant.

Describe the solution you'd like
I'd like to be able to dynamically set the logType in the Azure Log Analytics fluentbit plugin based on a value obtained via a record accessor (in this case a Kubernetes label).

Describe alternatives you've considered
Configure an Output for each tenant with a hard-coded logType.
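
A minimal sketch of that per-tenant workaround (the tenant match rules, log types, IDs and keys below are placeholders):

[OUTPUT]
    Name        azure
    Match       kube.tenant-a.*
    Log_Type    tenant_a_logs
    Customer_ID <workspace-id>
    Shared_Key  <shared-key>

[OUTPUT]
    Name        azure
    Match       kube.tenant-b.*
    Log_Type    tenant_b_logs
    Customer_ID <workspace-id>
    Shared_Key  <shared-key>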

felfa01 commented May 24, 2023

The suggested approach is to follow the pattern established by e.g. the Loki and Splunk outputs and introduce a separate property, logTypeKey. It specifies a record key whose value populates the logType field. If the key is found, it takes precedence over the value set in log_type. Since it is a separate property, the change is not breaking.
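
For illustration, the intended precedence might look something like this (the record key name and values here are placeholders):

[OUTPUT]
    Name         azure
    Match        *
    # used when the record key is missing
    Log_Type     defaultlogs
    # when this key exists in the record, its value takes precedence
    Log_Type_Key tenant_log_type
    Customer_ID  <workspace-id>
    Shared_Key   <shared-key>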

@kforeverisback

@felfa01 Can you provide a sample of the configuration you have in mind? Something like the one below?

[INPUT]
    Name  cpu
    Tag  sample.key

[OUTPUT]
    Name        azure
    Match       *
    log_type_key sample.*
    Customer_ID abcd-abcd
    Shared_Key  +w0erjRzUUrabcdabcd==

[OUTPUT]
    Name        stdout
    Match       *

felfa01 commented Jun 20, 2023

@kforeverisback
My suggestion for logTypeKey is essentially the same as the Loki output's tenant_id_key and the Splunk output's event_index_key: a way to access the record and populate the log type with that key's value (which in my case would be a Kubernetes label):

[INPUT]
    Name  cpu
    Tag  sample.key

[OUTPUT]
    Name        azure
    Match       *
    Log_Type_Key $kubernetes['labels']['log-type']
    Customer_ID abcd-abcd
    Shared_Key  +w0erjRzUUrabcdabcd==

[OUTPUT]
    Name        stdout
    Match       *

kforeverisback commented Jun 21, 2023

@felfa01 I modified the azure plugin to support log_type_key using a record accessor (format based on this doc and your sample config).

Check out my branch-commit for details: kforeverisback@7b39ce1

The config I used for testing is below:

[SERVICE]
    Flush     1
    Log_Level info

[INPUT]
    NAME   dummy
    # Below is a sample fluent-bit record json, collected from a k8s cluster fluent-bit instance
    Dummy  {"stream":"stdout", "logtag":"F", "log":"SAMPLE LOG", "kubernetes":{"pod_name":"ngsa-mem-695df8f557-pq22p", "namespace_name":"default", "pod_id":"c233a4a5-1b14-4b61-a089-dd12f4db617f", "labels":{"app":"ngsa-mem", "pod-template-hash":"695df8f557", "version":"v1"}, "host":"k3d-fluentbit-server-0", "container_name":"ngsa-mem", "docker_id":"XXXXXXXX", "container_hash":"XXXX/ngsa-mem:beta", "container_image":"XXXX/ngsa-mem:beta"}}
    #JSON-MERGED-LOG Dummy  {"stream"=>"stdout", "logtag"=>"F", "Date"=>"2023-06-21T08:07:42.7105166Z", "LogName"=>"Ngsa.RequestLog", "StatusCode"=>200, "TTFB"=>0.350000, "Duration"=>0.360000, "Verb"=>"GET", "Path"=>"/api/XXX/XXXX", "Host"=>"ngsa-mem:8080", "ClientIP"=>"10.42.0.13", "XFF"=>"", "UserAgent"=>"l8r/0.5.0", "TraceID"=>"1df8d6e9d461d0afd90cd20e6aa8a2d5", "SpanID"=>"c600551d74b71295", "ParentSpanID"=>"835086394feaf551", "Category"=>"Movies", "Subcategory"=>"Movies", "Mode"=>"Direct", "Zone"=>"dev", "Region"=>"dev", "CosmosName"=>"in-memory", "kubernetes"=>{"pod_name"=>"ngsa-mem-695df8f557-pq22p", "namespace_name"=>"default", "pod_id"=>"c233a4a5-1b14-4b61-a089-dd12f4db617f", "labels"=>{"app"=>"ngsa-mem", "pod-template-hash"=>"695df8f557", "version"=>"v1"}, "host"=>"k3d-fluentbit-server-0", "container_name"=>"ngsa-mem", "docker_id"=>"XXXX", "container_hash"=>"XXXX/ngsa-mem:beta", "container_image"=>"XXXX/ngsa-mem:beta"}}
    Tag    test_tag

[OUTPUT]
    Name    stdout
    Match   *

[OUTPUT]
    Name    azure
    Match   *
    log_type_key $kubernetes['labels']['app']
    #[IF_LABEL_KEY_NO_EXIST] log_type_key $kubernetes['labels']['app_no_exist']
    #[WITHOUT_LABEL_KEY] log_type sample
    Customer_ID abcd-398e-abcd-b84c-abcd
    Shared_Key  +w0erjabcd-abcd==
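
Roughly, the three variants above should resolve like this against the dummy record (an illustration of the intended behavior, not captured plugin output):

# log_type_key $kubernetes['labels']['app']           -> "ngsa-mem"
# log_type_key $kubernetes['labels']['app_no_exist']  -> key not found, the configured log_type (or its default) is used
# log_type sample, no log_type_key                    -> "sample", same as before this change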

Let me know if you need help testing it out.
If this works for you, I can do an official PR for this issue.

felfa01 commented Jun 21, 2023

@kforeverisback This looks great, thanks a lot for this contribution.

To be honest I am not sure how to test this out from your branch. We run fluent-bit via the Fluent Operator as part of our k8s setup, but I will clone your fork and try to run a local test.

Studying the code, it looks like exactly what I am after, so I think you can create a PR for this and let the community review it.

Thanks again!

kforeverisback commented Jun 22, 2023

@felfa01 You can test it out locally using a k3d Kubernetes cluster and a Docker registry. Of course, you'd need a Log Analytics workspace to push the data to.

Download the k3d binary, make sure you have Docker running, and follow the steps below to create a k8s cluster and deploy fluent-bit with a sample log-generating app:

# Create a custom registry
k3d registry create registry.localhost --port 5000
# Create a simple k8s cluster using that registry
k3d cluster create --registry-use k3d-registry.localhost:5000

# Create a new directory, otherwise the docker context will be large
mkdir deploy && cd deploy
# Copy fluent-bit binary from <FLUENTBIT_REPO_ROOT_DIR>/build/bin/fluent-bit (if using CMake and Unix Makefiles)
cp ../build/bin/fluent-bit .

# Build a debug docker image; it'll take some time to download the base debug image, which is over 1GB
docker build -t localhost:5000/fluent-bit:dev -f - . <<EOF
FROM fluent/fluent-bit:latest-debug
COPY ./fluent-bit /fluent-bit/bin/
EOF
# Push to the registry
docker push localhost:5000/fluent-bit:dev

# Now we'll deploy to our local k8s cluster
# Make sure it's ready
kubectl wait node --for condition=ready --all --timeout=60s
kubectl get po -A # All pods should be running/completed

kubectl apply -f sample-deployment.yaml

Here is the sample-deployment.yaml file:

--- # Simple log generating app deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flog
  labels:
    app: flog
    version: latest
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flog
      version: latest
  template:
    metadata:
      name: flog
      labels:
        app: flog
        version: latest
        la_table: sometable
        is_it_table: nope
        maybe_key: couldbe
    spec:
      containers:
      - name: flog
        image: mingrammer/flog:latest
        args:
          - -l
          - -s
          - 1s
          - -d
          - 1s
          # - -f
          # - json
        imagePullPolicy: Always
--- # FluentBit config
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentbit
data:
  fluent-bit.conf: |
    [SERVICE]
       Flush 1
       Daemon Off
       Log_Level trace
       Parsers_File parsers.conf
       HTTP_Server On
       HTTP_Listen   0.0.0.0
       HTTP_Port     2020
       storage.path  /var/log/flb-storage/
       storage.sync  normal
       storage.backlog.mem_limit 32MB
    @INCLUDE input.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output.conf
  input.conf: |
    [INPUT]
       Name tail
       Path /var/log/containers/*flog*.log
       DB /var/log/flb_kube.db
       Skip_Long_Lines Off
       storage.type filesystem
       Parser cri
       Tag kube.*
       Mem_Buf_Limit 5MB
  output.conf: |
    [OUTPUT]
       name stdout
       match kube.var.log.containers.flog*
    [OUTPUT]
       Name        azure
       Match       kube.var.log.containers.flog*
       log_type_key $kubernetes['labels']['app']
       Customer_ID XXXXXXXXX-398e-XXXX-b84c-XXXXXXXXXXXXXXX
       Shared_Key  +w0erjRzUUrRgloFXXXXXXXXXXXXXXXXXX==
  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Merge_Log_Trim      On
        Keep_Log            Off
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
        Annotations         Off
        Labels              On
  parsers.conf: |
    [PARSER]
        Name cri
        Format regex
        Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentbit
  labels:
    app.kubernetes.io/component: fluentbit
    app.kubernetes.io/name: fluentbit
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: fluentbit
  template:
    metadata:
      labels:
        app.kubernetes.io/component: fluentbit
        app.kubernetes.io/name: fluentbit
    spec:
      priorityClassName: system-node-critical
      serviceAccountName: fluentbit
      terminationGracePeriodSeconds: 10
      containers:
        - name: fluentbit
          image: k3d-registry.localhost:5000/fluent-bit:dev
          imagePullPolicy: Always
          ports:
            - containerPort: 2020
          resources:
            limits:
              memory: "512Mi"
              cpu: "1000m"
            requests:
              memory: "512Mi"
              cpu: "500m"
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluentbit
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentbit
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentbit
rules:
- apiGroups:
    - ""
  resources:
    - namespaces
    - pods
  verbs: 
    - get
    - list
    - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentbit
roleRef:
  kind: ClusterRole
  name: fluentbit
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  namespace: default
  name: fluentbit
  apiGroup: ""

felfa01 commented Jul 4, 2023

@kforeverisback My apologies for the delay in getting back to you. I got stuck testing this out and didn't have time to troubleshoot it further. I'll see if I can find time to test it in the coming days; in the meantime I'd suggest opening a PR and letting the community comment.

@kforeverisback

@felfa01 Created a PR #7663
Hopefully everything will check out and get merged 🤞

edsiper commented Sep 23, 2023

Merged, thank you!

edsiper closed this as completed Sep 23, 2023
@kforeverisback

Much appreciated, @edsiper!

@felfa01 Just to notify you, it's merged :)
