
Filebeat 7 + Kafka: Collecting Logs from a Gunicorn/Flask Web Application


How to build a log collection pipeline with Filebeat, Kafka, and ES that is easy to use and easy to manage

Dropping Logstash in favor of Elasticsearch ingest pipelines

The Gunicorn log format and its Filebeat/ES configuration

The Flask log format, exception log collection, and its Filebeat/ES configuration

The complete configuration for all of the above

Overview

An HTTP request in my setup travels the following path:

Gateway(kong)-->WebContainer(gunicorn)-->WebApp(flask)

I plan to handle my logs with the following flow:

file --> filebeat --> kafka topic --> filebeat --> elastic pipeline --> elasticsearch
                          |
                          +----------> HBase

Why this way

Where did Logstash go?

Logstash is too heavy. That alone is not a deal-breaker; it just means a few more machines and a bit more money, and getting the job done is what matters.

Logstash is not elegant. Yes, it gives you centrally managed configuration, but one Logstash never seems to be enough. You can split the configuration across files, yet you never know how to decide which pieces belong in the same file and which should be separated.

Delete a configuration? Impossible; how would I ever know which parts are safe to delete?

If you use Logstash, you become one of those "poor Ops guys having to understand and keep up with all the crazy input possibilities." ^_^

Filebeat's pain points

Have a look at this issue: thousands of users have pleaded for grok support in Filebeat, but it is simply not supported. Instead we are left two ways out: write your logs as JSON, or use an ingest pipeline.
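The JSON route means the application serializes each record itself, so Filebeat can ship the lines without any parsing (its log input can decode JSON lines natively). A minimal Python sketch; the field names here are illustrative, not anything prescribed by Filebeat:

# json_log.py - sketch of the "just log JSON" alternative
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record):
        # Emit one JSON object per line so Filebeat can decode it
        # without any grok/dissect parsing downstream.
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger().addHandler(handler)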

Filebeat also used to lack a decent Kafka input, so you had to write your own Kafka-to-ES forwarding tool.

Keep it simple

What I want from log collection is plain simplicity, or call it microservice-style cohesion: one log collection line should not be tangled up with any other business. The ideal state looks like this:

one file -> filebeat_config -> kafka_topic -> filebeat_config -> elastic pipeline -> es index

Gunicorn logs

The gunicorn access log should capture the following information:

time

client_ip

http method

http scheme

url

url query string

response status code

client name (user agent)

rt (request duration)

trace id

remote ips

Log format

%(t)s [%(h)s] [%(m)s] [%(H)s] [%(U)s] [%(q)s] [%(s)s] [%(a)s] [%(D)s] [%({Kong-Request-ID}i)s] [%({X-Forwarded-For}i)s]

Example log line

[15/Nov/2019:10:23:37 +0000] [172.31.37.123] [GET] [HTTP/1.1] [/api/v1/_instance/json_schema/Team/list] [a=1] [200] [Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36] [936] [9cbf6a3b-9c3a-4835-a2ef-02e03ee826d7#16] [137.59.103.3, 172.30.17.253, 172.30.18.12]
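This format is configured through gunicorn's access_log_format setting. A minimal gunicorn.conf.py sketch; the log path is the same placeholder used in the Filebeat config further down:

# gunicorn.conf.py - sketch; run with: gunicorn -c gunicorn.conf.py yourapp:app
# Write the access log to the file Filebeat tails (placeholder path).
accesslog = "/yourpath/gunicorn-access.log"
# The format string from this post; %({...}i)s pulls request headers.
access_log_format = (
    "%(t)s [%(h)s] [%(m)s] [%(H)s] [%(U)s] [%(q)s] [%(s)s] [%(a)s] [%(D)s] "
    "[%({Kong-Request-ID}i)s] [%({X-Forwarded-For}i)s]"
)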

Parsing with ES ingest processors

Ingest pipelines (available since ES 5.0) effectively give Elasticsearch a built-in Logstash. For complicated logs there are several processors to choose from; you can use grok or dissect, and in some cases dissect is faster. Since each event passes through Kafka and then a second Filebeat before hitting ES, the pipeline also needs to strip the redundant metadata fields.
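For comparison, a grok processor for the same access line might look roughly like this sketch; the capture is named timestamp here because it would still need a date processor to become @timestamp. The actual pipeline below uses dissect instead:

{
  "grok": {
    "field": "message",
    "patterns": [
      "\\[%{HTTPDATE:timestamp}\\] \\[%{IP:client_ip}\\] \\[%{WORD:method}\\] \\[%{NOTSPACE:scheme}\\] \\[%{NOTSPACE:path}\\] \\[%{DATA:query_string}\\] \\[%{NUMBER:status}\\] \\[%{DATA:client}\\] \\[%{NUMBER:rt_millo}\\] \\[%{NOTSPACE:trace_id}\\] \\[%{GREEDYDATA:remote_ips}\\]"
    ]
  }
}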

PUT _ingest/pipeline/gunicorn
{
  "description": "devops gunicorn pipeline",
  "processors": [
    {
      "remove": { "field": ["agent", "ecs", "host", "input", "kafka"] }
    },
    {
      "json": { "field": "message", "add_to_root": true }
    },
    {
      "remove": { "field": ["@metadata", "ecs", "agent", "input"] }
    },
    {
      "dissect": {
        "field": "message",
        "pattern": "[%{@timestamp}] [%{client_ip}] [%{method}] [%{scheme}] [%{path}] [%{query_string}] [%{status}] [%{client}] [%{rt_millo}] [%{trace_id}] [%{remote_ips}]"
      }
    }
  ],
  "on_failure": [
    {
      "set": { "field": "_index", "value": "failed-{{ _index }}" }
    }
  ]
}
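The pipeline can be sanity-checked with the _simulate API before Filebeat is pointed at it. The doc below is a sketch of what the Kafka-consuming Filebeat would send: the original event wrapped as a JSON string in message, plus stub metadata fields (the user agent is shortened here):

POST _ingest/pipeline/gunicorn/_simulate
{
  "docs": [
    {
      "_source": {
        "agent": {},
        "ecs": {},
        "host": {},
        "input": {},
        "kafka": {},
        "message": "{\"@metadata\":{},\"ecs\":{},\"agent\":{},\"input\":{},\"message\":\"[15/Nov/2019:10:23:37 +0000] [172.31.37.123] [GET] [HTTP/1.1] [/api/v1/_instance/json_schema/Team/list] [a=1] [200] [Mozilla/5.0] [936] [9cbf6a3b-9c3a-4835-a2ef-02e03ee826d7#16] [137.59.103.3, 172.30.17.253, 172.30.18.12]\"}"
      }
    }
  ]
}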

ES mapping

The key piece here is the definition of the ES date format (see the ES date format docs). If we think a field genuinely needs full-text analysis, use text; otherwise use keyword, which makes aggregating and querying the log data much more convenient. _source is kept enabled so we can run some statistics on the raw data.
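Once events land in the gunicorn-* indices defined by the template below, the keyword and numeric fields make aggregations cheap. For example, a sketch that counts the last hour's requests per status code with the average response time:

GET gunicorn-*/_search
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-1h" } }
  },
  "aggs": {
    "by_status": {
      "terms": { "field": "status" },
      "aggs": {
        "avg_rt": { "avg": { "field": "rt_millo" } }
      }
    }
  }
}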

PUT _template/gunicorn
{
  "index_patterns": ["*gunicorn*"],
  "settings": { "number_of_shards": 1 },
  "version": 1,
  "mappings": {
    "_source": { "enabled": true },
    "properties": {
      "@timestamp": { "type": "date", "format": "dd/LLL/yyyy:HH:mm:ss Z" },
      "client_ip": { "type": "ip" },
      "method": { "type": "keyword" },
      "scheme": { "type": "keyword" },
      "path": { "type": "text" },
      "query_string": { "type": "text" },
      "status": { "type": "integer" },
      "client": { "type": "text" },
      "rt_millo": { "type": "long" },
      "trace_id": { "type": "keyword" },
      "remote_ips": { "type": "text" }
    }
  }
}

Filebeat config file: shipping to Kafka

filebeat.inputs:
- type: log
  paths:
    - /yourpath/gunicorn-access.log
  multiline.pattern: '^\['
  multiline.negate: true
  multiline.match: after
  tail_files: true

queue.mem:
  events: 4096
  flush.min_events: 512
  flush.timeout: 5s

output.kafka:
  hosts: ["kafka-01","kafka-02","kafka-03"]
  topic: 'gunicron_access'
  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000

Filebeat config file: consuming from Kafka

filebeat.inputs:
- type: kafka
  hosts: ["kafka-01","kafka-02","kafka-03"]
  topics: ["gunicron_access"]
  group_id: "filebeat_gunicron"

output.elasticsearch:
  hosts: ["es-url"]
  pipeline: "gunicorn"
  index: "gunicorn-%{+yyyy.MM.dd}"

setup.template.name: "gunicorn"
setup.template.pattern: "gunicorn-*"
setup.ilm.enabled: false
setup.template.enabled: false

Flask logs
