[Troubleshooting] Fluentd not sending K8s logs occasionally
Things you can check to narrow down the problem. This post assumes the fluentd in_tail source plugin (@type tail).
1. Is the container writing lots of logs, causing the log files in /var/log/pods to rotate very quickly?
First, check the fluentd log to observe the pattern and frequency of the detected rotation and following tail messages. These should appear in pairs; if a following tail message is missing, fluentd will NOT ship that container’s log.
[info]: #0 detected rotation of /var/log/containers/...
[info]: #0 following tail of /var/log/containers...
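The pairing check above can be automated. Below is a minimal sketch, assuming the fluentd log contains messages shaped like the two lines just quoted; the function name `unpaired_rotations` and the sample paths are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Message shapes taken from the two log lines quoted above; anything after the
# path (for example a trailing suffix introduced by ";") is ignored.
ROTATION = re.compile(r"detected rotation of (?P<path>[^\s;]+)")
FOLLOWING = re.compile(r"following tail of (?P<path>[^\s;]+)")


def unpaired_rotations(lines):
    """Return {path: count} of rotations with no later 'following tail'."""
    pending = Counter()
    for line in lines:
        m = ROTATION.search(line)
        if m:
            pending[m.group("path")] += 1
            continue
        m = FOLLOWING.search(line)
        # A 'following tail' with nothing pending (e.g. at startup) is ignored.
        if m and pending[m.group("path")] > 0:
            pending[m.group("path")] -= 1
    return {path: n for path, n in pending.items() if n > 0}


# Hypothetical sample; real input would be the fluentd pod's own log.
sample = [
    "[info]: #0 detected rotation of /var/log/containers/app-a.log",
    "[info]: #0 following tail of /var/log/containers/app-a.log",
    "[info]: #0 detected rotation of /var/log/containers/app-b.log",
]
print(unpaired_rotations(sample))
```

Any path reported here had a rotation that fluentd noticed but never resumed tailing, which is a good candidate for the container whose logs went missing.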
Then, check whether a container’s log file reaches the maximum size and triggers rotation so frequently that fluentd cannot keep up:
12-30 10:50 /var/log/pods/mycontainer-90162b4a21a9/0.log.20211230-104915.gz
12-30 10:51 /var/log/pods/mycontainer-90162b4a21a9/0.log.20211230-105015.gz
12-30 10:52 /var/log/pods/mycontainer-90162b4a21a9/0.log.20211230-105116.gz
12-30 10:53 /var/log/pods/mycontainer-90162b4a21a9/0.log
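To put a number on "very frequently", the rotation timestamps can be read straight from the rotated file names. This is a rough sketch, assuming the rotated files carry a YYYYMMDD-HHMMSS suffix as in the listing above; the helper name `rotation_gaps` is made up for this example.

```python
import re
from datetime import datetime

# Assumes rotated files carry a YYYYMMDD-HHMMSS suffix, as in the listing above.
# \D accepts whatever single separator sits between the date and the time.
STAMP = re.compile(r"\.log\.(\d{8})\D(\d{6})")


def rotation_gaps(filenames):
    """Return seconds between consecutive rotations, oldest first."""
    stamps = sorted(
        datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")
        for m in map(STAMP.search, filenames)
        if m  # the live 0.log has no timestamp suffix and is skipped
    )
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


files = [
    "0.log.20211230-104915.gz",
    "0.log.20211230-105015.gz",
    "0.log.20211230-105116.gz",
    "0.log",
]
print(rotation_gaps(files))  # → [60, 61]
```

Here the file rotates roughly once a minute, which is about as often as in_tail refreshes its watch list at a 60-second refresh_interval, so a rotation can easily slip by.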
You can also check the logging driver settings (usually cluster-wide) to find the max-size setting for a single log file.
Set pos_file_compaction_interval (e.g. 3m) to a suitable interval so the pos file can keep up with fast log rotation.
With pos_file_compaction_interval 10m, in_tail removes unwatched files from the pos_file entries at 10-minute intervals. This feature is intended for environments with many short-lived containers.
Rotation is detected by the following condition.
If the file’s inode changed or the file size is smaller than the previously recorded size, fluentd concludes that a log rotation occurred.
if @inode != inode || fsize < @fsize (According to https://github.com/fluent/fluentd/issues/2692)
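To make the condition concrete, here is a small Python imitation of it. It is only a sketch of the quoted check, not fluentd's actual code (which is Ruby); the `TailState` class is hypothetical, and the behavior assumes a POSIX filesystem where a recreated file gets a new inode while the old one still exists.

```python
import os
import tempfile


class TailState:
    """Remembers the inode and size last seen for one tailed path."""

    def __init__(self, path):
        self.path = path
        st = os.stat(path)
        self.inode, self.size = st.st_ino, st.st_size

    def check_rotation(self):
        """Mirror of `@inode != inode || fsize < @fsize`, then update state."""
        st = os.stat(self.path)
        rotated = st.st_ino != self.inode or st.st_size < self.size
        self.inode, self.size = st.st_ino, st.st_size
        return rotated


with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "0.log")
    with open(log, "w") as f:
        f.write("line 1\nline 2\n")
    state = TailState(log)

    with open(log, "a") as f:      # plain append: same inode, larger size
        f.write("line 3\n")
    appended = state.check_rotation()

    os.rename(log, log + ".1")     # rename-and-recreate rotation: new inode
    with open(log, "w") as f:
        f.write("x\n")
    renamed = state.check_rotation()

    with open(log, "w") as f:      # truncate in place: same inode, size 0
        pass
    truncated = state.check_rotation()

print(appended, renamed, truncated)
```

A plain append is not treated as a rotation, while either a new inode or a shrinking file is. Note that the two checks are joined with "or", so one of them is enough.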
2. Is network (or Istio) max throughput reached? If yes, set
3. The default refresh_interval is 60 seconds. If log files in /var/log/containers rotate faster than that, try decreasing the interval so the list of watched files is refreshed more frequently.
4. Is the issue caused by inotify getting stuck? Setting enable_stat_watcher to false disables inotify events and uses only the timer watcher for file tailing.
5. Check the fluentd version. Some versions (1.12.x) seem to have in_tail bugs that are fixed in newer versions.