Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in_docker: stricter config json name matching #5343

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

fopina
Copy link
Contributor

@fopina fopina commented Apr 22, 2022

When a container is part of a docker swarm service, service name was reported (instead of container name) and first char was lost

Currently "name"=>"kron_agents":

/opt/td-agent-bit/bin/td-agent-bit -i docker -o stdout -q
[0] docker.0: [1650624963.504885670, {"id"=>"afef4d705052", "name"=>"kron_agents", "cpu_used"=>1679490969020, "mem_used"=>104538112, "mem_limit"=>104857600}]

Expected "name"=>"dkron2_agents.o2y1zv88kwvw97w14brnqneen.attv68rjfwcyg227xgpcogl1w":

/opt/td-agent-bit/bin/td-agent-bit -e /opt/td-agent-bit/flb-in_docker2.so -i docker2 -o stdout -q
[0] docker2.0: [1650625107.506103721, {"id"=>"afef4d705052", "name"=>"dkron2_agents.o2y1zv88kwvw97w14brnqneen.attv68rjfwcyg227xgpcogl1w", "cpu_used"=>1680827091746, "mem_used"=>104534016, "mem_limit"=>104857600}]

The issue is that this plugin extracts container name by looking at its config json, finding the first "Name" entry and skipping 9 characters, which also skips the leading / in internal container names.

The problem is that when container belongs to a swarm service there's a JSON entry {Service: {Name: ...} that is listed before container name and the extract code does not expect that. Result being service name is extracted instead of container name and, as service names are not prefixed with /, the first character is lost.

This change updates the extract code to look for "Name":"/ specifically, assuming proper JSON parsing was not implemented from the start for performance reasons.

It has also been done in a standalone fork of the plugin so I could test/use straight away - https://github.com/fopina/fluent-bit-in-docker2


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • Debug log output from testing the change
  • [N/A] Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@fopina
Copy link
Contributor Author

fopina commented Apr 22, 2022

debug log

/opt/td-agent-bit/bin/td-agent-bit -i docker -o stdout -m '*' -o exit -m '*' -v
Fluent Bit v1.9.2
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/04/22 12:06:00] [ info] Configuration:
[2022/04/22 12:06:00] [ info]  flush time     | 1.000000 seconds
[2022/04/22 12:06:00] [ info]  grace          | 5 seconds
[2022/04/22 12:06:00] [ info]  daemon         | 0
[2022/04/22 12:06:00] [ info] ___________
[2022/04/22 12:06:00] [ info]  inputs:
[2022/04/22 12:06:00] [ info]      docker
[2022/04/22 12:06:00] [ info] ___________
[2022/04/22 12:06:00] [ info]  filters:
[2022/04/22 12:06:00] [ info] ___________
[2022/04/22 12:06:00] [ info]  outputs:
[2022/04/22 12:06:00] [ info]      stdout.0
[2022/04/22 12:06:00] [ info]      exit.1
[2022/04/22 12:06:00] [ info] ___________
[2022/04/22 12:06:00] [ info]  collectors:
[2022/04/22 12:06:00] [ info] [fluent bit] version=1.9.2, commit=, pid=27218
[2022/04/22 12:06:00] [debug] [engine] coroutine stack size: 196608 bytes (192.0K)
[2022/04/22 12:06:00] [debug] [storage] [cio stream] new stream registered: docker.0
[2022/04/22 12:06:00] [ info] [storage] version=1.1.6, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/04/22 12:06:00] [ info] [cmetrics] version=0.3.0
[2022/04/22 12:06:00] [debug] [docker:docker.0] created event channels: read=21 write=22
[2022/04/22 12:06:00] [debug] [stdout:stdout.0] created event channels: read=23 write=24
[2022/04/22 12:06:00] [debug] [exit:exit.1] created event channels: read=30 write=31
[2022/04/22 12:06:00] [debug] [router] match rule docker.0:stdout.0
[2022/04/22 12:06:00] [debug] [router] match rule docker.0:exit.1
[2022/04/22 12:06:00] [ info] [sp] stream processor started
[2022/04/22 12:06:00] [ info] [output:stdout:stdout.0] worker #0 started
[2022/04/22 12:06:00] [debug] [input chunk] update output instances with new chunk size diff=147
[2022/04/22 12:06:00] [debug] [input chunk] update output instances with new chunk size diff=129
[2022/04/22 12:06:00] [debug] [input chunk] update output instances with new chunk size diff=153
[2022/04/22 12:06:01] [debug] [task] created task=0x7fb6044700 id=0 OK
[2022/04/22 12:06:01] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] docker.0: [1650625560.505582130, {"id"=>"afef4d705052", "name"=>"dkron2_agents.o2y1zv88kwvw97w14brnqneen.attv68rjfwcyg227xgpcogl1w", "cpu_used"=>1684874715504, "mem_used"=>104529920, "mem_limit"=>104857600}]
[1] docker.0: [1650625560.505766441, {"id"=>"f145bdacdd2d", "name"=>"surf_resync-rootbox.1.i1azo708bm4nyhn3zbjzbbakn", "cpu_used"=>4356669333, "mem_used"=>46792704, "mem_limit"=>209715200}]
[2] docker.0: [1650625560.505814773, {"id"=>"d6ff50d4b960", "name"=>"portainer_agent.o2y1zv88kwvw97w14brnqneen.xxanqkshung8ocz0q4ui1b1ek", "cpu_used"=>616098336485, "mem_used"=>33619968, "mem_limit"=>9223372036854771712}]
[2022/04/22 12:06:01] [debug] [out flush] cb_destroy coro_id=0
[2022/04/22 12:06:01] [debug] [input chunk] update output instances with new chunk size diff=147
[2022/04/22 12:06:01] [debug] [input chunk] update output instances with new chunk size diff=129
[2022/04/22 12:06:01] [debug] [input chunk] update output instances with new chunk size diff=153
[2022/04/22 12:06:01] [ info] [input] pausing docker.0
[2022/04/22 12:06:01] [debug] [task] created task=0x7fb6044770 id=1 OK
[2022/04/22 12:06:01] [debug] [output:stdout:stdout.0] task_id=1 assigned to thread #0
[2022/04/22 12:06:01] [ warn] [engine] service will shutdown in max 5 seconds
[0] docker.0: [1650625561.505994702, {"id"=>"afef4d705052", "name"=>"dkron2_agents.o2y1zv88kwvw97w14brnqneen.attv68rjfwcyg227xgpcogl1w", "cpu_used"=>1684879407984, "mem_used"=>104529920, "mem_limit"=>104857600}]
[2022/04/22 12:06:01] [debug] [out flush] cb_destroy coro_id=0
[2022/04/22 12:06:01] [debug] [task] destroy task=0x7fb6044700 (task_id=0)
[1] docker.0: [1650625561.506206642, {"id"=>"f145bdacdd2d", "name"=>"surf_resync-rootbox.1.i1azo708bm4nyhn3zbjzbbakn", "cpu_used"=>4356669333, "mem_used"=>46792704, "mem_limit"=>209715200}]
[2022/04/22 12:06:01] [debug] [out flush] cb_destroy coro_id=1
[2] docker.0: [1650625561.506268641, {"id"=>"d6ff50d4b960", "name"=>"portainer_agent.o2y1zv88kwvw97w14brnqneen.xxanqkshung8ocz0q4ui1b1ek", "cpu_used"=>616100355538, "mem_used"=>33619968, "mem_limit"=>9223372036854771712}]
[2022/04/22 12:06:01] [debug] [out flush] cb_destroy coro_id=1
[2022/04/22 12:06:01] [debug] [task] destroy task=0x7fb6044770 (task_id=1)
[2022/04/22 12:06:02] [ info] [engine] service has stopped (0 pending tasks)
[2022/04/22 12:06:02] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2022/04/22 12:06:02] [ info] [output:stdout:stdout.0] thread worker #0 stopped

@fopina
Copy link
Contributor Author

fopina commented Apr 22, 2022

@edsiper @patrick-stephens seems CODEOWNERS is broken on this one and did not request reviewers...

@patrick-stephens
Copy link
Contributor

It should have been caught by the default one so not sure why that didn't work, although maybe a Github issue.

@patrick-stephens
Copy link
Contributor

Ah, the configuration seems to specify a user that does not map for me at least:

/plugins/in_docker       @ashutoshdhundhara

So that is likely why, I'll raise a separate issue on updating that.

@nokute78
Copy link
Collaborator

nokute78 commented May 4, 2022

I'm not familiar with docker swarm.

Note:
config.v2.json seems to represent below struct Container.
https://github.com/moby/moby/blob/v20.10.14/container/container.go#L61

It contains Name which we want to look up and NetworkSettings.
https://github.com/moby/moby/blob/v20.10.14/container/container.go#L77
https://github.com/moby/moby/blob/v20.10.14/container/container.go#L75

NetworkSettings has a member Service.
https://github.com/moby/moby/blob/v20.10.14/daemon/network/settings.go#L22

Service has a Name.
https://github.com/moby/moby/blob/v20.10.14/daemon/cluster/provider/network.go#L33

Summary:
config.v2.json contains below Name members.

{
    "NetworkSettings": {
         "Service": {
              "Name":"hoge"
         },
    },
    "Name":"/hoge"
}

@fopina
Copy link
Contributor Author

fopina commented May 4, 2022

thanks for the comment @nokute78
Was there a question or remark towards the PR there?
As that is indeed the problem: a non-container “Name:” shows up before the container Name key and the current code extracts it (wrongly).

as container names always start with /, adding that to the extraction fixes it (without doing full json parsing)

@nokute78
Copy link
Collaborator

nokute78 commented May 4, 2022

@fopina
My note is to understand container.v2.json which is the file to get container information.

I didn't confirm that first character of container name should be /.
Could you point a document or source ?
If container name should start with /, I agree the patch.

In my understanding, this patch depends on the struct Container and name of container.
If a member of Container or name rule will be changed, we may need more patches.

@fopina
Copy link
Contributor Author

fopina commented May 4, 2022

I haven’t looked it up documented, I’ll try to find it in the code itself.

But even without that confirmation, I believe it’s safe to assume that, from two different perspectives:

  1. My own experience: every docker inspect I’ve done (CLI and API) always return / in the container name
  2. Current plugin: if you see the number of caracters skipped during name extraction, you see there’s one more - so / is not included. Meaning current code is also expecting /, it’s not really changing that behavior

@github-actions
Copy link
Contributor

github-actions bot commented Aug 2, 2022

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Aug 2, 2022
@fopina
Copy link
Contributor Author

fopina commented May 8, 2023

I was just now going to upgrade my fork from this plugin to fluent-bit v2 and realized it was only to address this issue that is still open.

Is it still up for consideration / Is it worth for me to resolve conflicts and update the PR?

@github-actions github-actions bot removed the Stale label May 9, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Aug 8, 2023

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Aug 8, 2023
@edsiper
Copy link
Member

edsiper commented Oct 4, 2023

let's fix the conflicts to revisit pls

@fopina
Copy link
Contributor Author

fopina commented Oct 5, 2023

let's fix the conflicts to revisit pls

Has anything changed regarding @nokute78 concerns?

They're not related to code itself, so I don't see why resolving the conflicts would help before clearing that up

@github-actions github-actions bot removed the Stale label Oct 5, 2023
Copy link
Contributor

github-actions bot commented Jan 3, 2024

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Jan 3, 2024
@github-actions github-actions bot removed the Stale label Aug 15, 2024
@fopina
Copy link
Contributor Author

fopina commented Nov 2, 2024

hi @edsiper giving this PR another try, let me know if it's ok now

@fopina
Copy link
Contributor Author

fopina commented Nov 2, 2024

I'd like to re-iterate the issue here:

Regarding previous questions about not seeing a slash anywhere, it's important to highlight this plugin is not using docker API, it's reading the config.v2.json file directly.

This is output of a cat /var/lib/docker/containers/53ede05dffac31d9d10cae92fbc6a7b1bfa219f526cbb2db097451be3fdbcd11/config.v2.json in a machine running docker (without swarm)

{"StreamConfig":{},"State":{"Running":true,"Paused":false,"Restarting":false,"OOMKilled":false,"RemovalInProgress":false,"Dead":false,"Pid":236,"ExitCode":0,"Error":"","StartedAt":"2024-11-02T19:56:50.661022417Z","FinishedAt":"0001-01-01T00:00:00Z","Health":null},"ID":"53ede05dffac31d9d10cae92fbc6a7b1bfa219f526cbb2db097451be3fdbcd11","Created":"2024-11-02T19:56:50.483934923Z","Managed":false,"Path":"tail","Args":["-f","/dev/null"],"Config":{"Hostname":"53ede05dffac","Domainname":"","User":"","AttachStdin":false,"AttachStdout":false,"AttachStderr":false,"Tty":false,"OpenStdin":false,"StdinOnce":false,"Env":["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],"Cmd":["tail","-f","/dev/null"],"Image":"alpine","Volumes":null,"WorkingDir":"","Entrypoint":null,"OnBuild":null,"Labels":{}},"Image":"sha256:91ef0af61f39ece4d6710e465df5ed6ca12112358344fd51ae6a3b886634148b","ImageManifest":null,"NetworkSettings":{"Bridge":"","SandboxID":"404058c64e41043d97ace4d5dcb25b304976a252e016f737eaf301070db9846e","SandboxKey":"/var/run/docker/netns/404058c64e41","HairpinMode":false,"LinkLocalIPv6Address":"","LinkLocalIPv6PrefixLen":0,"Networks":{"bridge":{"IPAMConfig":null,"Links":null,"Aliases":null,"MacAddress":"02:42:ac:12:00:02","DriverOpts":null,"NetworkID":"38d188b71dfe64dd67436dc65743636f03fbe49f459b4f82fda16c8fcbc401de","EndpointID":"133c66c22dfe48d473555669bcd108bd4a032d67198eb77f333d62332234e112","Gateway":"172.18.0.1","IPAddress":"172.18.0.2","IPPrefixLen":16,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"DNSNames":null,"IPAMOperational":false,"DesiredMacAddress":""}},"Service":null,"Ports":{},"SecondaryIPAddresses":null,"SecondaryIPv6Addresses":null,"HasSwarmEndpoint":false},"LogPath":"/var/lib/docker/containers/53ede05dffac31d9d10cae92fbc6a7b1bfa219f526cbb2db097451be3fdbcd11/53ede05dffac31d9d10cae92fbc6a7b1bfa219f526cbb2db097451be3fdbcd11-json.log","Name":"/vibrant_proskuriakova","Driver":"vfs","OS":"linux","RestartCount":0,"HasBeenStartedBefore":true,"HasBeenManuallyStopped":false,"MountPoints":{},"SecretReferences":null,"ConfigReferences":null,"MountLabel":"","ProcessLabel":"","AppArmorProfile":"","SeccompProfile":"","NoNewPrivileges":false,"HostnamePath":"/var/lib/docker/containers/53ede05dffac31d9d10cae92fbc6a7b1bfa219f526cbb2db097451be3fdbcd11/hostname","HostsPath":"/var/lib/docker/containers/53ede05dffac31d9d10cae92fbc6a7b1bfa219f526cbb2db097451be3fdbcd11/hosts","ShmPath":"","ResolvConfPath":"/var/lib/docker/containers/53ede05dffac31d9d10cae92fbc6a7b1bfa219f526cbb2db097451be3fdbcd11/resolv.conf","LocalLogCacheMeta":{"HaveNotifyEnabled":false}}

We can see only one "Name" attribute in the whole file and it does have the /, which matches the 9 magic skip ("Name":"/) in the original code.

Now moving to a host using docker swarm, and checking a container that does belong to swarm service

{
  "StreamConfig": {},
    "Service": {
      "Name": "dojo_runnerdinda4",
  },
  "Name": "/dojo_runnerdinda4.1.9qdppb11pqkqlqj791zft1cyy",
}

I've formatted so it's easier to read and removed everything in between:

There's an entry for Service and it also has a Name attribute. And it doesn't start with /.

As it is listed before the container name, strstr used in the original code would just wrongly pick up the service name and the magic 9 would also remove the first character...

By adding / to the DOCKER_NAME_ARG, I make sure it always picks up the second Name - slash is not allowed in container or service names, so that check should be safe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants