<Way 1> Use the Fluentd concat filter plugin to combine log lines

<filter **>
  @type concat
  key message
  use_first_timestamp true
  partial_key logtag
  partial_value P
  separator ""
</filter>
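
For context: runtimes using the CRI log format split long lines into partial records and tag each one with a logtag of P (partial) or F (final); the filter above glues P records onto the record that follows. Hypothetical raw lines from a container log file, for illustration:

2022-03-02T08:00:00.000000000Z stdout P first half of a very long log line...
2022-03-02T08:00:00.000000001Z stdout F ...and the rest of it

After the concat filter, they reach Elasticsearch as a single message.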

<Way 2> Emit your logs in JSON format directly from the application

e.g., a Java application:

pom.xml

<dependency>
  <groupId>ch.qos.logback.contrib</groupId>
  <artifactId>logback-json-classic</artifactId>
  <version>0.1.5</version>
</dependency>
<dependency>
  <groupId>ch.qos.logback.contrib</groupId>
  <artifactId>logback-jackson</artifactId>
  <version>0.1.5</version>
</dependency>
<dependency>
  <groupId>net.logstash.logback</groupId>
  <artifactId>logstash-logback-encoder</artifactId>
  <version>6.6</version>
</dependency>

logback.xml

<!-- Option A: logback-contrib JsonLayout -->
<appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
  <layout class="ch.qos.logback.contrib.json.classic.JsonLayout">
    <jsonFormatter class="ch.qos.logback.contrib.jackson.JacksonJsonFormatter" />
    <timestampFormat>yyyy-MM-dd HH:mm:ss</timestampFormat>
    <appendLineSeparator>true</appendLineSeparator>
  </layout>
</appender>

<!-- Option B: logstash-logback-encoder (use one option or the other; an appender takes either a layout or an encoder, not both) -->
<appender name="LOGSTASH" class="ch.qos.logback.core.ConsoleAppender">
  <encoder class="net.logstash.logback.encoder.LogstashEncoder">
    <timeZone>UTC</timeZone>
    <timestampPattern>yyyy/MM/dd HH:mm:ss.SSS</timestampPattern>
    <customFields>{"appname":"ES_DATA_INITIAL"}</customFields>
    <includeMdc>true</includeMdc>
    <includeMdcKeyName>sessionId</includeMdcKeyName>
    <includeCallerData>true</includeCallerData>
    <fieldNames>
      <timestamp>log_timestamp</timestamp>
      <version>[ignore]</version>
      <levelValue>[ignore]</levelValue>
      <stackTrace>exception</stackTrace>
    </fieldNames>
  </encoder>
</appender>
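
With includeMdcKeyName set as above, anything placed in the MDC under sessionId becomes a top-level JSON field. A minimal sketch (the class name and session value are made up for illustration):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class EsDataLoader {
    private static final Logger log = LoggerFactory.getLogger(EsDataLoader.class);

    public static void main(String[] args) {
        // "sessionId" matches <includeMdcKeyName> in logback.xml,
        // so it is emitted as a top-level field in every JSON log line
        MDC.put("sessionId", "demo-session-001");
        try {
            log.info("bulk load started");
        } finally {
            MDC.remove("sessionId");
        }
    }
}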

--

According to Elasticsearch's official guide, there are no hard limits on shard size, but experience shows that shards between 10GB and 50GB typically work well for logs and time series data.
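
Before wiring up alerting, oversized shards can also be spotted manually with the cat shards API, sorted by store size:

GET _cat/shards?v&h=index,shard,prirep,store&s=store:desc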

1. Install elasticsearch-exporter, e.g. as shown below
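
One common way to run the prometheus-community exporter is as a container (the Elasticsearch endpoint is a placeholder; check the project's README for current flags):

docker run -d -p 9114:9114 quay.io/prometheuscommunity/elasticsearch-exporter:latest --es.uri=http://10.xx.xx.xx:9200

Prometheus then scrapes the metrics it exposes on port 9114.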

2. Add a Prometheus rule

- alert: ElasticsearchShardTooLarge
  expr: sum by (index, cluster, instance) (elasticsearch_indices_store_size_bytes_primary) / count by (index, cluster, instance) (elasticsearch_indices_shards_docs) / 1024 / 1024 / 1024 > 50
  for: 5m
  labels:
    severity: warning
    service: EFK
    frequency: daily
  annotations:
    summary: Elasticsearch Single Shard > 50G
    action: Edit template setting - number_of_shards
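
Assuming the rule is saved in a file such as efk-rules.yml (the file name is a placeholder), it can be validated with promtool before reloading Prometheus:

promtool check rules efk-rules.yml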

3. Set up alertmanager.yml to fire this alert once a day

route:
  routes:
  - match:
      frequency: daily
      service: EFK
    group_by: [cluster, instance]
    receiver: efk-receiver
    active_time_intervals:
    - morning
    repeat_interval: 50m

time_intervals:
- name: morning
  time_intervals:
  - times:
    - start_time: '00:00' # 8-9 AM in GMT+8; Alertmanager times are in UTC
      end_time: '01:00'
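
Because the route is only active during a one-hour window and the repeat_interval is close to the window length, notifications are throttled to roughly once a day. The efk-receiver referenced above must also be defined in alertmanager.yml; a minimal sketch with a placeholder webhook endpoint:

receivers:
- name: efk-receiver
  webhook_configs:
  - url: http://alert-handler.example.com/efk # placeholder endpoint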

--

[Type A] Manually move a shard to another data node

POST _cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "logstash-default-gigacim-2022.01.17",
        "shard": 9,
        "from_node": "f12glog29_d3",
        "to_node": "f12glog21_d2"
      }
    }
  ]
}
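
While the move runs, relocation progress can be watched with:

GET _cat/recovery?v&active_only=true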

[Type B] Move only today’s shards away from a data node

Check (sort by node):

GET /_cat/shards/*2022.03.02*?v&s=node

Move shards:

PUT logstash-*-2022.03.02/_settings
{
  "index.routing.allocation.exclude._ip": "10.xx.xx.xx"
}

PUT .monitoring-kibana-*-2022.03.02/_settings
{
  "index.routing.allocation.exclude._ip": "10.xx.xx.xx"
}

PUT .monitoring-es-*-2022.03.02/_settings
{
  "index.routing.allocation.exclude._ip": "10.xx.xx.xx"
}
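
Once the shards have finished relocating, clear the exclusion by setting it back to null, e.g.:

PUT logstash-*-2022.03.02/_settings
{
  "index.routing.allocation.exclude._ip": null
}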

[Type C] Move ALL shards away from a data node

By IP:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.xx.xx.xx"
  }
}

or by node name:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "es_node_1"
  }
}

--

Error message in a Kubernetes pod:

[Faraday::ConnectionFailed] SSL_connect SYSCALL returned=5 errno=0 state=SSLv3/TLS write client hello (OpenSSL::SSL::SSLError)

Possible solutions:

1. Add an AuthorizationPolicy to allow the traffic

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-elasticsearch
spec:
  action: ALLOW
  rules:
  - from:
    - source:
        ipBlocks:
        - 172.0.0.0/8
    to:
    - operation:
        ports: ["9200"]
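
Apply it like any other resource (the file name and namespace are placeholders):

kubectl apply -f allow-elasticsearch.yaml -n elasticsearch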

2. Check whether Istio sidecar injection is enabled for the workload, as sketched below
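
Two quick checks (namespace and pod names are placeholders): injection is usually enabled per namespace via the istio-injection label, and an injected pod carries an istio-proxy container.

kubectl get namespace elasticsearch --show-labels
kubectl get pod my-app-pod -o jsonpath='{.spec.containers[*].name}'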

--

Jasmine H

Data Engineer from Taiwan, recently working on EFK and Kubernetes projects.