<Way 1> Use the Fluentd concat filter plugin to combine split log lines
<filter **>
  @type concat
  key message
  use_first_timestamp true
  partial_key logtag
  partial_value P
  separator ""
</filter>
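With this filter in place, records whose logtag field is "P" (partial) are buffered and joined onto the following records until a final "F" record arrives, so a container line that the runtime split at its 16KB limit becomes a single event again. An illustrative before/after (field values are made up):

{"logtag":"P","message":"first 16KB chunk of a long line"}
{"logtag":"F","message":"rest of the line"}

becomes one event whose message is "first 16KB chunk of a long linerest of the line" (nothing is inserted between the chunks because separator is set to an empty string).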
<Way 2> Turn your log into JSON format in your application
e.g. Java application
pom.xml
<dependency>
  <groupId>ch.qos.logback.contrib</groupId>
  <artifactId>logback-json-classic</artifactId>
  <version>0.1.5</version>
</dependency>
<dependency>
  <groupId>ch.qos.logback.contrib</groupId>
  <artifactId>logback-jackson</artifactId>
  <version>0.1.5</version>
</dependency>
<dependency>
  <groupId>net.logstash.logback</groupId>
  <artifactId>logstash-logback-encoder</artifactId>
  <version>6.6</version>
</dependency>
logback.xml (JsonLayout from logback-contrib and LogstashEncoder are two alternative ways to emit JSON; configure one or the other in an appender, not both)

<!-- Option A: logback-contrib JsonLayout -->
<appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
  <layout class="ch.qos.logback.contrib.json.classic.JsonLayout">
    <jsonFormatter class="ch.qos.logback.contrib.jackson.JacksonJsonFormatter" />
    <timestampFormat>yyyy-MM-dd HH:mm:ss</timestampFormat>
    <appendLineSeparator>true</appendLineSeparator>
  </layout>
</appender>

<!-- Option B: logstash-logback-encoder -->
<appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
  <encoder class="net.logstash.logback.encoder.LogstashEncoder">
    <timeZone>UTC</timeZone>
    <timestampPattern>yyyy/MM/dd HH:mm:ss.SSS</timestampPattern>
    <customFields>{"appname":"ES_DATA_INITIAL"}</customFields>
    <includeMdc>true</includeMdc>
    <includeMdcKeyName>sessionId</includeMdcKeyName>
    <includeCallerData>true</includeCallerData>
    <fieldNames>
      <timestamp>log_timestamp</timestamp>
      <version>[ignore]</version>
      <levelValue>[ignore]</levelValue>
      <stackTrace>exception</stackTrace>
    </fieldNames>
  </encoder>
</appender>
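With the LogstashEncoder option, each log event reaches stdout (and then Fluentd) as a single JSON line, roughly like the following (the values are illustrative; the field names come from the <fieldNames> overrides and <customFields> above):

{"log_timestamp":"2022/01/17 10:15:30.123","message":"bulk indexing finished","logger_name":"com.example.Loader","thread_name":"main","level":"INFO","appname":"ES_DATA_INITIAL","sessionId":"abc123"}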
- K8s YAML Generator: https://k8syaml.com
- VS Code Extension: Kubernetes Templates
  https://marketplace.visualstudio.com/items?itemName=lunuan.kubernetes-templates
According to Elasticsearch's official guide, there are no hard limits on shard size, but experience shows that shards between 10GB and 50GB typically work well for logs and time series data.
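For a quick manual spot check of current shard sizes, the cat shards API can list each shard's on-disk size, largest first (the logstash-* pattern matches the index naming used later in this article):

GET _cat/shards/logstash-*?v&h=index,shard,prirep,store&s=store:desc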
1. Install elasticsearch-exporter
2. Add a Prometheus rule
- alert: ElasticsearchShardTooLarge
  expr: sum by (index, cluster, instance) (elasticsearch_indices_store_size_bytes_primary) / count by (index, cluster, instance) (elasticsearch_indices_shards_docs) / 1024 / 1024 / 1024 > 50
  for: 5m
  labels:
    severity: warning
    service: EFK
    frequency: daily
  annotations:
    summary: Elasticsearch Single Shard > 50G
    action: Edit template setting - number_of_shards
3. Set up alertmanager.yml so this alert fires at most once a day
route:
  routes:
    - match:
        frequency: daily
        service: EFK
      group_by: [cluster, instance]
      receiver: efk-receiver
      active_time_intervals:
        - morning
      repeat_interval: 50m

time_intervals:
  - name: morning
    time_intervals:
      - times:
          - start_time: "00:00"   # 8-9 AM in GMT+8 timezone
            end_time: "01:00"
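When the alert fires, the remediation named in its action annotation is to raise number_of_shards in the index template so each primary shard stays under roughly 50GB. A minimal sketch using the composable index template API (the template name and shard count here are illustrative; clusters older than 7.8 use the legacy _template API instead):

PUT _index_template/logstash-template
{
  "index_patterns": ["logstash-*"],
  "template": {
    "settings": {
      "number_of_shards": 3
    }
  }
}

Note that number_of_shards only applies to indices created after the change; existing oversized indices need a reindex, split, or shrink.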
[Type A] Manually move a shard to another data node
POST _cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "logstash-default-gigacim-2022.01.17",
        "shard": 9,
        "from_node": "f12glog29_d3",
        "to_node": "f12glog21_d2"
      }
    }
  ]
}
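The reroute command returns immediately and the copy runs in the background; one way to follow its progress is the recovery cat API for the same index (shown here with the index used above):

GET _cat/recovery/logstash-default-gigacim-2022.01.17?v&active_only=true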
[Type B] Move only today’s shards away from a data node
Check (sort by node): GET /_cat/shards/*2022.03.02*?v&s=node
Move shards:
PUT logstash-*-2022.03.02/_settings
{
  "index.routing.allocation.exclude._ip": "10.xx.xx.xx"
}

PUT .monitoring-kibana-*-2022.03.02/_settings
{
  "index.routing.allocation.exclude._ip": "10.xx.xx.xx"
}

PUT .monitoring-es-*-2022.03.02/_settings
{
  "index.routing.allocation.exclude._ip": "10.xx.xx.xx"
}
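Once the check above shows nothing left on that node, the per-index exclusion can be removed by setting it back to null, for example:

PUT logstash-*-2022.03.02/_settings
{
  "index.routing.allocation.exclude._ip": null
}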
[Type C] Move ALL shards away from a data node
By IP:
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.xx.xx.xx"
  }
}
Or by node name:
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "es_node_1"
  }
}
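The excluded node stays empty until this setting is cleared, so after it has been drained (GET _cat/allocation?v shows no shards on it), reset the cluster-level exclusion to null:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": null
  }
}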
Situation:
While adding nodes to an existing Elasticsearch cluster, the new nodes cannot be discovered, and the master node log shows "Client did not trust this server's certificate, closing connection" (Netty4TcpChannel).
Possible Reasons:
- elastic-certificate.p12 differs between the Elasticsearch hosts (see the checksum check below).
- The master nodes are not started.
- Connections between the nodes are blocked by a firewall.
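A quick way to rule out the first cause is to compare the certificate file's checksum on every host. The path below is only an assumption; use wherever your xpack.security.transport.ssl keystore actually lives:

sha256sum /etc/elasticsearch/elastic-certificate.p12   # run on each node; the hashes must all match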
Reference: