Multi-line log merging with rancher-logging on an RKE2 cluster

Rancher Server setup

  • Rancher version: 2.7.1
  • Installation option:
    • Helm HA installation
    • Local cluster type: RKE2, version 2.7.1
  • Online or air-gapped deployment: online

Downstream cluster information

  • Kubernetes version: v1.24.9+rke2r2
  • Cluster Type (Local/Downstream):
    • Downstream, custom cluster

Rancher logging version: v101.0.0+up3.17.7

User information

  • What is the role of the logged-in user?
    • Administrator

Host operating system:
CentOS 7.9 x64

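Problem description:
This may be specific to the containerd runtime: when Rancher's built-in logging collects logs, it cannot merge multi-line error output such as Java exceptions into a single record, so one exception ends up as dozens of separate records.
In the Flow configuration, the filters section does offer settings for the multi-line merge plugin concat, but I have never managed to make it work. While searching, I found that containerd environments (as opposed to Docker) need a few extra options. However, when I apply them they don't seem to take effect, and I'm not sure whether Rancher logging currently supports them: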

    use_partial_cri_logtag true
    partial_cri_logtag_key logtag
    partial_cri_stream_key stream

The corresponding Flow settings should be:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: rke2-flows-test
  namespace: test
spec:
  filters:
    - concat:
        key: message
        multiline_end_regexp: /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}
        multiline_start_regexp: /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}
        stream_identity_key: container_hash
        use_first_timestamp: true
        use_partial_cri_logtag: true
        partial_cri_logtag_key: logtag
        partial_cri_stream_key: stream
  globalOutputRefs: []
  localOutputRefs:
  - rke2-output-test
  match:
  - select:
      container_names: []
      hosts: []
      labels: {}
  - exclude:
      container_names: []
      hosts: []
      labels:
        app.kubernetes.io/type: nginx

The documentation for the relevant plugin can be found at this link:

Steps to reproduce:
Here is the current rancher logging configuration:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: rke2-output-test
  namespace: test
spec:
  elasticsearch:
    buffer:
      flush_at_shutdown: true
      flush_interval: 10s
      flush_mode: interval
      # flush_mode: immediate
    default_elasticsearch_version: "7"
    host: xxx.xxx.xxx.xxx
    suppress_type_name: true
    include_timestamp: false
    index_name: rke2-output-test
    logstash_format: true
    logstash_prefix: rke2-output-test
    password:
      valueFrom:
        secretKeyRef:
          key: elastic
          name: es-elastic-user
    port: 9200
    scheme: https
    ssl_verify: false
    ssl_version: TLSv1_2
    user: elastic
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: rke2-flows-test
  namespace: test
spec:
  filters:
    - concat:
        key: message
        multiline_end_regexp: /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}
        multiline_start_regexp: /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3}
        stream_identity_key: container_hash
        use_first_timestamp: true
  globalOutputRefs: []
  localOutputRefs:
  - rke2-output-test
  match:
  - select:
      container_names: []
      hosts: []
      labels: {}
  - exclude:
      container_names: []
      hosts: []
      labels:
        app.kubernetes.io/type: nginx

The filters section of the Flow is newly added and is meant to merge multi-line error output.
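To reason about why the concat filter may not group anything, here is a simplified Python model of start/end-pattern grouping. It is not the fluent-plugin-concat source: it ignores timeouts, stream_identity_key and the plugin's exact rules, and group_multiline is a hypothetical name. In this model a line matching the start pattern opens a new event and following lines are appended until the next start line. If the same pattern is also passed as the end pattern, every event closes on its first line and the stack-trace lines fall out as separate records. Whether the real plugin behaves exactly like this is worth checking against its documentation; the Flow values above also lack a closing `/` and leave the millisecond `.` unescaped, which may matter if the plugin expects `/.../` delimiters.

```python
import re
from typing import Iterable, List, Optional


def group_multiline(lines: Iterable[str], start: str,
                    end: Optional[str] = None) -> List[str]:
    """Simplified start/end-pattern grouping; not the fluent-plugin-concat source."""
    start_re = re.compile(start)
    end_re = re.compile(end) if end else None
    events: List[str] = []
    buf: List[str] = []
    for line in lines:
        if start_re.search(line):
            if buf:
                # A new header line flushes the event that is still open.
                events.append("\n".join(buf))
            buf = [line]
        elif buf:
            # Continuation line: append it to the open event.
            buf.append(line)
        else:
            # No open event: the line is emitted on its own.
            events.append(line)
            continue
        if end_re is not None and end_re.search(line):
            # The end pattern closes the open event immediately.
            events.append("\n".join(buf))
            buf = []
    if buf:
        events.append("\n".join(buf))
    return events


# Start pattern with the millisecond dot escaped.
TS = r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}"
sample = [
    "2023-02-23 14:41:27.907 ERROR 7 --- [ XNIO-2 task-12] : [Hanlder Error]",
    "",
    "com.example.BizServiceException: wrong password",
    "\tat java.lang.Thread.run(Thread.java:748)",
    "2023-02-23 14:41:28.001 INFO 7 --- next line",
]
print(len(group_multiline(sample, TS)))      # 2
print(len(group_multiline(sample, TS, TS)))  # 5
```

With only a start pattern, the header, blank line, exception and frame stay together; adding the identical end pattern splits them back into five records in this model.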

The resulting index structure (mapping) in ES is as follows:

{
  "rke2-output-test-2023.02.23" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "@timestamp" : {
          "type" : "date"
        },
        "kubernetes" : {
          "properties" : {
            "annotations" : {
              "properties" : {
                "cattle" : {
                  "properties" : {
                    "io/timestamp" : {
                      "type" : "date"
                    }
                  }
                },
                "kubernetes" : {
                  "properties" : {
                    "io/psp" : {
                      "type" : "text",
                      "fields" : {
                        "keyword" : {
                          "type" : "keyword",
                          "ignore_above" : 256
                        }
                      }
                    }
                  }
                }
              }
            },
            "container_hash" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "container_image" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "container_name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "docker_id" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "host" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "labels" : {
              "properties" : {
                "app" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword",
                      "ignore_above" : 256
                    }
                  }
                },
                "pod-template-hash" : {
                  "type" : "text",
                  "fields" : {
                    "keyword" : {
                      "type" : "keyword",
                      "ignore_above" : 256
                    }
                  }
                }
              }
            },
            "namespace_name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "pod_id" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "pod_name" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "logtag" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "message" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "stream" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "rke2-output-test-2023.02.23",
        "creation_date" : "1677110407526",
        "number_of_replicas" : "1",
        "uuid" : "JwMiJOU9TNaK2JBnCXvUGw",
        "version" : {
          "created" : "7130299"
        }
      }
    }
  }
}

Example of an error log that should be merged:

2023-02-23 14:41:27.907 ERROR 7 --- [ XNIO-2 task-12] c.h.m.c.c.e.ErrorResultHandlerHelper     : [Hanlder Error]

	at sun.reflect.GeneratedMethodAccessor160.invoke(Unknown Source) ~[na:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_222]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_222]
	at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:223) [spring-core-5.0.10.RELEASE.jar!/:5.0.10.RELEASE]
	at org.springframework.cloud.context.scope.GenericScope$LockedScopedProxyFactoryBean.invoke(GenericScope.java:494) [spring-cloud-context-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185) [spring-aop-5.0.10.RELEASE.jar!/:5.0.10.RELEASE]
	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:688) [spring-aop-5.0.10.RELEASE.jar!/:5.0.10.RELEASE]
	at sun.reflect.GeneratedMethodAccessor160.invoke(Unknown Source) ~[na:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_222]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_222]

The raw log output of the corresponding pod under /var/log/containers looks like this:

2023-02-23T14:41:27.913307153+08:00 stdout F 2023-02-23 14:41:27.907 ERROR 7 --- [ XNIO-2 task-12] c.h.m.c.c.e.ErrorResultHandlerHelper     : [Hanlder Error]
2023-02-23T14:41:27.913309444+08:00 stdout F
2023-02-23T14:41:27.913311685+08:00 stdout F com.hletong.miracle.common.core.exception.BizServiceException: 密码不正确
2023-02-23T14:41:27.913322678+08:00 stdout F 	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_222]
2023-02-23T14:41:27.913324835+08:00 stdout F 	at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:223) [spring-core-5.0.10.RELEASE.jar!/:5.0.10.RELEASE]
2023-02-23T14:41:27.91332708+08:00 stdout F 	at org.springframework.cloud.context.scope.GenericScope$LockedScopedProxyFactoryBean.invoke(GenericScope.java:494) [spring-cloud-context-2.0.0.RELEASE.jar!/:2.0.0.RELEASE]
2023-02-23T14:41:27.913329241+08:00 stdout F 	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185) [spring-aop-5.0.10.RELEASE.jar!/:5.0.10.RELEASE]
2023-02-23T14:41:27.913334493+08:00 stdout F 	at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:688) [spring-aop-5.0.10.RELEASE.jar!/:5.0.10.RELEASE]
2023-02-23T14:41:27.913337991+08:00 stdout F 	at com.hletong.miracle.auth.controller.LoginController$$EnhancerBySpringCGLIB$$f0724e86.login(<generated>) [classes!/:1.0.4]
2023-02-23T14:41:27.913342839+08:00 stdout F 	at sun.reflect.GeneratedMethodAccessor160.invoke(Unknown Source) ~[na:na]
2023-02-23T14:41:27.913346357+08:00 stdout F 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_222]
2023-02-23T14:41:27.913349686+08:00 stdout F 	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_222]
2023-02-23T14:41:27.913352809+08:00 stdout F 	at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:209) [spring-web-5.0.10.RELEASE.jar!/:5.0.10.RELEASE]
2023-02-23T14:41:27.913356077+08:00 stdout F 	at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136) [spring-web-5.0.10.RELEASE.jar!/:5.0.10.RELEASE]
2023-02-23T14:41:27.9133591+08:00 stdout F 	at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102) [spring-webmvc-5.0.10.RELEASE.jar!/:5.0.10.RELEASE]
2023-02-23T14:41:27.91364378+08:00 stdout F 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_222]
2023-02-23T14:41:27.913647208+08:00 stdout F 	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_222]
2023-02-23T14:41:27.913650267+08:00 stdout F
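Each line of this raw output follows the CRI log format written by containerd: a timestamp, the stream (stdout/stderr), a log tag, then the content. The tag is P when the runtime split one long line into partial chunks and F when the line is complete, and as far as I understand that is what the use_partial_cri_logtag options act on. The Python sketch below is a minimal model of rejoining partial chunks, not the plugin's code; parse_cri and join_partials are hypothetical names, and it assumes the tag is exactly P or F. Every line above carries F, so in this model each stack-trace line is already a complete record and rejoining partials would not merge them.

```python
import re
from typing import Dict, Iterable, Iterator

# CRI log line: "<RFC3339Nano time> <stream> <logtag> <content>"
CRI_LINE = re.compile(
    r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<logtag>[PF]) ?(?P<message>.*)$"
)


def parse_cri(line: str) -> Dict[str, str]:
    m = CRI_LINE.match(line)
    if m is None:
        raise ValueError(f"not a CRI log line: {line!r}")
    return m.groupdict()


def join_partials(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Rejoin P (partial) chunks per stream until an F (full) chunk arrives."""
    pending: Dict[str, Dict[str, str]] = {}
    for line in lines:
        rec = parse_cri(line)
        stream = rec["stream"]
        if stream in pending:
            pending[stream]["message"] += rec["message"]
        else:
            pending[stream] = rec
        if rec["logtag"] == "F":
            done = pending.pop(stream)
            done["logtag"] = "F"
            yield done


raw = [
    "2023-02-23T14:41:27.913311685+08:00 stdout F com.example.BizServiceException: wrong password",
    "2023-02-23T14:41:27.913322678+08:00 stdout F \tat java.lang.reflect.Method.invoke(Method.java:498)",
]
print(len(list(join_partials(raw))))  # 2: both lines are already complete
```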

Result:
Multi-line merging does not work.

Screenshot:
The whole exception is never merged into a single message:

Could someone take a look? Is something misconfigured, or is multi-line merging on containerd simply not supported yet?

Have you tried the detectExceptions filter?

Thanks! That solved my problem perfectly. Here is the complete Flow:

apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: rke2-flows-busi
  namespace: busi
spec:
  filters:
    - detectExceptions:
        force_line_breaks: true
        languages:
          - java
          - python
        message: message
        multiline_flush_interval: '0.1'
        remove_tag_prefix: kubernetes
  globalOutputRefs: []
  localOutputRefs:
  - rke2-output-busi
  match:
  - select:
      container_names: []
      hosts: []
      labels: {}
  - exclude:
      container_names: []
      hosts: []
      labels:
        app.kubernetes.io/type: nginx
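For comparison with concat, detectExceptions (fluent-plugin-detect-exceptions) recognizes stack traces with language-specific rules instead of a timestamp pattern. The following is a much simplified Python model for Java only; the regexes and detect_java_exceptions are hypothetical and far narrower than the plugin's real rules. An exception header line opens an event, and indented `at ...` frames, `Caused by:` and `... N more` lines are appended to it.

```python
import re
from typing import List

# Hypothetical, simplified Java rules; the real plugin uses a richer state machine.
EXC_HEADER = re.compile(r"^[\w$.]+(Exception|Error)(: .*)?$")
FRAME = re.compile(r"^\s+at [\w$.<>]+\(.*\)")
CAUSED_BY = re.compile(r"^Caused by: ")
MORE = re.compile(r"^\s+\.\.\. \d+ (more|common frames omitted)")


def is_continuation(line: str) -> bool:
    return bool(FRAME.match(line) or CAUSED_BY.match(line) or MORE.match(line))


def detect_java_exceptions(lines: List[str]) -> List[str]:
    events: List[str] = []
    buf: List[str] = []
    for line in lines:
        if buf and is_continuation(line):
            buf.append(line)  # frame belongs to the open exception
            continue
        if buf:
            events.append("\n".join(buf))  # trace ended: flush it
            buf = []
        if EXC_HEADER.match(line):
            buf = [line]  # exception header opens a new event
        else:
            events.append(line)  # ordinary line passes through
    if buf:
        events.append("\n".join(buf))
    return events


sample = [
    "2023-02-23 14:41:27.907 ERROR 7 --- [ XNIO-2 task-12] : [Hanlder Error]",
    "",
    "com.example.BizServiceException: wrong password",
    "\tat java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_222]",
    "\tat java.lang.Thread.run(Thread.java:748) [na:1.8.0_222]",
    "",
]
print(len(detect_java_exceptions(sample)))  # 4
```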

Result:


How can I extract the time from message as a separate field? How should I configure that?
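One common approach is to parse the leading timestamp out of message with a regular expression that has a named time group, which is roughly what a regexp parser filter does. The Python sketch below only illustrates that parsing step; split_time is hypothetical, the pattern assumes the Spring Boot style prefix shown in the sample log, and the +08:00 offset is an assumption taken from the runtime timestamps in the raw log.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Assumed Spring Boot style prefix: "2023-02-23 14:41:27.907 ERROR ..."
LEADING_TS = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+(?P<level>[A-Z]+)\s+(?P<rest>.*)$",
    re.DOTALL,  # keep merged multi-line messages intact in "rest"
)
# Assumed offset, taken from the +08:00 runtime timestamps in the raw log.
CST = timezone(timedelta(hours=8))


def split_time(message: str, tz: timezone = CST) -> Optional[Dict[str, str]]:
    m = LEADING_TS.match(message)
    if m is None:
        return None  # no leading timestamp: leave the record untouched
    ts = datetime.strptime(m.group("time"), "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=tz)
    return {
        "time": ts.isoformat(timespec="milliseconds"),
        "level": m.group("level"),
        "message": m.group("rest"),
    }


print(split_time("2023-02-23 14:41:27.907 ERROR 7 --- [ XNIO-2 task-12] : boom")["time"])
# 2023-02-23T14:41:27.907+08:00
```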

How can I collect only error logs?

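For your question, I suggest looking at this page: Fluentd filters | Logging operator
My current solution isn't perfect, but it's good enough for me... If you solve it, please come back and post your solution :grinning: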

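The time issue was resolved by default after upgrading. For collecting only errors, I see that Fluent Bit supports it, with the format below. Configuration File - Fluent Bit: Official Manual

There is also an example. Of course, that is a standalone Fluent Bit configuration. I found that editing the configmap in Rancher's logging namespace had no effect; it seems this can only be done through a Flow, but I don't know how to write that Flow.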
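The Fluentd filters page of Logging operator mentioned above lists a grep filter that keeps or drops records by matching a regular expression against a record key, which is the Flow-side counterpart of a level filter. The Python sketch below models only that matching logic; keep_errors and the ERROR pattern are hypothetical and assume the Spring Boot log prefix from the sample log. Because it works per record, stack-trace lines survive only if they were already merged into the same record as their ERROR header line, so it should run after the multi-line merge.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical level pattern for the Spring Boot prefix in the sample log.
ERROR_LEVEL = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}\s+ERROR\b")


def keep_errors(records: Iterable[Dict[str, str]], key: str = "message") -> List[Dict[str, str]]:
    """Keep only records whose `key` field starts with an ERROR log prefix."""
    return [r for r in records if ERROR_LEVEL.search(r.get(key, ""))]


records = [
    {"message": "2023-02-23 14:41:27.907 ERROR 7 --- [ XNIO-2 task-12] : [Hanlder Error]\n"
                "\tat java.lang.Thread.run(Thread.java:748)"},
    {"message": "2023-02-23 14:41:28.001  INFO 7 --- [ XNIO-2 task-13] : ok"},
    {"message": "\tat java.lang.Thread.run(Thread.java:748)"},  # unmerged frame is dropped
]
print(len(keep_errors(records)))  # 1
```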