config: complete Yaml support #9513

Merged
merged 26 commits into from
Oct 26, 2024

Conversation

edsiper
Member

@edsiper edsiper commented Oct 22, 2024

This PR implements the final pieces for full Yaml support. The following components are being added to the Yaml configuration format:

  • parsers
  • multiline parsers
  • stream processor
  • plugins
  • upstream servers

Parsers

Parsers can now be defined in the main config file or in an included Yaml file. Example definition:

parsers:
  - name: json
    format: json

  - name: docker2
    format: json
    time_key: time
    time_format: "%Y-%m-%dT%H:%M:%S.%L"
    time_keep: true

Note that this functionality is compatible with the old service parsers_file directive.
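
As a minimal sketch of that compatibility, the configuration below combines an inline parsers section with the service parsers_file directive. The file name parsers.conf and the log path are hypothetical, and how name collisions between the two sources are resolved is not covered here:

```yaml
service:
  # Hypothetical file name: an existing parsers file loaded the old way
  parsers_file: parsers.conf

# Parsers defined inline in the main Yaml file
parsers:
  - name: json
    format: json

pipeline:
  inputs:
    - name: tail
      path: /var/log/app.log   # hypothetical path
      parser: json

  outputs:
    - name: stdout
      match: '*'
```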

Multiline Parsers

Multiline parsers combine logs that are split across multiple events into one, keeping the full message together, like with stack traces or detailed logs.

The Yaml syntax differs slightly from the classic format, but there are no breaking changes. The example below defines one multiline parser (containing 2 rules) in the main file and loads another through an included file that defines a similar multiline_parsers section:

includes:
  - more_parsers.yaml
    
multiline_parsers:
  - name: multiline-regex-test
    type: regex
    flush_timeout: 1000
    rules:
      - state: start_state
        regex: '/([a-zA-Z]+ \d+ \d+:\d+:\d+)(.*)/'
        next_state: cont
      - state: cont
        regex: '/^\s+at.*/'
        next_state: cont


pipeline:
  inputs:
    - name: tail
      path: ../test_multiline.log
      read_from_head: true
      multiline.parser: multiline-regex-test

  outputs:
    - name: stdout
      match: '*'
      format: json_lines
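
The contents of more_parsers.yaml are not shown in this description. As a sketch only, assuming the included file uses the same multiline_parsers layout as the main file, it could look like the following. The parser name and the start_state regex are hypothetical:

```yaml
# more_parsers.yaml (hypothetical contents)
multiline_parsers:
  - name: multiline-regex-extra      # hypothetical parser name
    type: regex
    flush_timeout: 1000
    rules:
      # First line of a message: a bracketed timestamp (illustrative pattern)
      - state: start_state
        regex: '/^\[\d+-\d+-\d+ \d+:\d+:\d+\](.*)/'
        next_state: cont
      # Continuation lines: indented stack trace frames
      - state: cont
        regex: '/^\s+at.*/'
        next_state: cont
```

An input would then reference it by name through multiline.parser, in the same way as the parser defined in the main file.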

Stream Processor

The stream processor is a little-known but powerful feature that creates new streams of data from the results of SQL queries (including aggregation). When configured, it runs right after the last filter (if any is set). The following is the new format for defining stream processor tasks in Yaml:

stream_processor:
  - name: stream_chile
    exec: CREATE STREAM country_test WITH (tag='only_chile') AS SELECT word, num FROM STREAM:tail.0 WHERE country='Chile';
    
parsers:
  - name: json
    format: json

pipeline:
  inputs:
    - name: tail
      path: ../sp-samples*.log
      parser: json
      read_from_head: true

  outputs:
    - name: stdout
      match: 'only_chile'
      format: json_lines
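
Since the stream processor also supports aggregation, a task using it might look like the sketch below. It assumes the same tail input and num field as the example above; the task name, stream name, tag, and 5 second window are hypothetical choices for illustration:

```yaml
stream_processor:
  # Hypothetical task: average of num over 5 second tumbling windows
  - name: stream_avg
    exec: CREATE STREAM avg_test WITH (tag='avg_num') AS SELECT AVG(num) FROM STREAM:tail.0 WINDOW TUMBLING (5 SECOND);

pipeline:
  outputs:
    # Only records produced by the task above carry this tag
    - name: stdout
      match: 'avg_num'
      format: json_lines
```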

Plugins

While Fluent Bit ships with built-in plugins, it also supports loading external plugins at runtime. This mechanism is used to load Go or Wasm plugins, which are built as shared object files (.so). Yaml is extended in two ways:

1. Inline Yaml section:

---

plugins:
  - /home/edsiper/c/fluent-bit-go/examples/out_gstdout/out_gstdout.so

service:
  log_level: info
  plugins_file: other_plugins.yaml

pipeline:
  inputs:
    - name: random

  outputs:
    - name: gstdout
      match: '*'

2. Yaml plugins file included through service section plugins_file option:

service:
  log_level: info
  plugins_file: extra_plugins.yaml

pipeline:
  inputs:
    - name: random

  outputs:
    - name: gstdout
      match: '*'

Here, extra_plugins.yaml might contain the plugins section described in 1:

plugins:
  - /home/edsiper/c/fluent-bit-go/examples/out_gstdout/out_gstdout.so

Upstream Servers

Certain output plugins support connecting to different endpoints through the definition of upstream servers. This PR introduces the new configuration format for them in Yaml.

The new upstream_servers directive defines blocks of upstream servers, each composed of one or more nodes. The example below defines 2 upstreams, each with its own nodes:

upstream_servers:
  - name: forward-balancing
    nodes:
      - name: node-1
        host: 127.0.0.1
        port: 43000

      - name: node-2
        host: 127.0.0.1
        port: 44000

      - name: node-3
        host: 127.0.0.1
        port: 45000
        tls: true
        tls_verify: false
        shared_key: secret

  - name: forward-balancing-2
    nodes:
      - name: node-A
        host: 192.168.1.10
        port: 50000

      - name: node-B
        host: 192.168.1.11
        port: 51000

These upstream servers can now be defined globally. However, the output plugins that support this functionality, such as Forward or Elasticsearch, need an extra configuration option to specify which upstream server to use. Note that in the current classic config mode, these plugins load a file and assume it is the only one supported.


Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
