Prometheus with Thanos for Long-Term Storage on S3
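- Data is replicated continuously, which avoids transferring large batches at once and the timing problems of periodic synchronization.
- Prometheus needs less local storage because the data is copied to S3.
- The existing Query web page can read the data in S3 directly, and Grafana can use it as a source for dashboards.
- Thanos consists of several components that can be deployed independently.
- This setup uses Receiver instead of Sidecar.
- Receiver does not need to run alongside Prometheus.
- No real-time data is missed.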
Key Thanos Components
• Receiver: Receives data from Prometheus’s remote-write WAL, exposes it, and/or uploads it to cloud storage.
• Store (API): Acts as an API for querying Prometheus metrics stored in the object store.
• Query (GUI): Serves queries through Thanos Query, aggregating data from the underlying components.
• Compact: Compacts, downsamples, and applies retention to the data stored in the cloud storage bucket.
(Note: each bucket must be handled by only one Compact instance.)
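These notes give no manifest for Compact. A minimal sketch of its container section could look like the following. It assumes the same image and the same object storage file path as the Store section; the data directory, container name, and retention values are illustrative and do not come from the original setup.

```yaml
# Hypothetical Compact container; run exactly one instance per bucket.
containers:
  - args:
      - compact
      - --wait                          # keep running instead of exiting after one pass
      - --data-dir=/var/thanos/compact  # assumed scratch path
      - --objstore.config-file=/etc/thanos/objectstorage.yaml
      - --retention.resolution-raw=30d  # illustrative retention values
      - --retention.resolution-5m=90d
      - --retention.resolution-1h=1y
    image: bitnami/thanos:0.22.0
    name: thanos-compact                # hypothetical name
```

Running it as a single-replica workload is one way to respect the one-Compact-per-bucket rule.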
Prometheus configuration
- Add a remote_write URL to prometheus.yml in the ConfigMap:
remote_write:
  - url: "http://thanos-receive:19291/api/v1/receive"
    remote_timeout: 30s
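The hashring file in the Receiver section routes a tenant named tenant-a. One way Prometheus could identify itself as that tenant is sketched below. It assumes the Receiver keeps its default tenant header name (THANOS-TENANT) and that the Prometheus version in use supports custom `headers` under remote_write; neither is stated in these notes.

```yaml
remote_write:
  - url: "http://thanos-receive:19291/api/v1/receive"
    remote_timeout: 30s
    headers:
      # Assumed: must match an entry under "tenants" in the hashring file.
      THANOS-TENANT: tenant-a
```

Without such a header, writes are attributed to the Receiver's default tenant and would be routed by a hashring that has no "tenants" list.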
Receiver configuration
containers:
  - args:
      - receive
      - --receive.replication-factor=1
      - --objstore.config=$(OBJSTORE_CONFIG)
      - --tsdb.path=/var/thanos/receive  # local path where data is stored
      - --label=receive_replica="$(NAME)"
      - --receive.local-endpoint=$(NAME).thanos-receive.$(NAMESPACE).svc.cluster.local:10901
      - --tsdb.retention=12h
      - --receive.hashrings-file=/etc/thanos/thanos-receive-hashrings.json
      - --tsdb.min-block-duration=30m
      - --tsdb.max-block-duration=30m
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: receive-config
  labels:
    name: receive-config
data:
  thanos-receive-hashrings.json: |
    [
      {
        "hashring": "default",
        "endpoints": [
          "thanos-receive-1.thanos-receive.ns.svc.cluster.local:10901"
        ]
      },
      {
        "hashring": "hashring-0",
        "endpoints": [
          "thanos-receive-0.thanos-receive.ns.svc.cluster.local:10901",
          "thanos-receive-2.thanos-receive.ns.svc.cluster.local:10901"
        ],
        "tenants": [
          "tenant-a"
        ]
      }
    ]
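One thing worth checking in the hashring file: Thanos Receive appears to try hashrings in file order, and a hashring without a "tenants" list matches every tenant. If that holds for the version in use, placing `default` first would mean `hashring-0` is never selected for tenant-a. A sketch of the same ConfigMap with the tenant-specific hashring first, endpoints unchanged:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: receive-config
  labels:
    name: receive-config
data:
  thanos-receive-hashrings.json: |
    [
      {
        "hashring": "hashring-0",
        "endpoints": [
          "thanos-receive-0.thanos-receive.ns.svc.cluster.local:10901",
          "thanos-receive-2.thanos-receive.ns.svc.cluster.local:10901"
        ],
        "tenants": [
          "tenant-a"
        ]
      },
      {
        "hashring": "default",
        "endpoints": [
          "thanos-receive-1.thanos-receive.ns.svc.cluster.local:10901"
        ]
      }
    ]
```

With this order, tenant-a writes go to receivers 0 and 2, and every other tenant falls through to the `default` hashring.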
Store configuration
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store
  labels:
    app.kubernetes.io/name: thanos-store
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-store
  serviceName: thanos-store
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-store
    spec:
      containers:
        - args:
            - store
            - --log.level=debug
            - --data-dir=/var/thanos/store
            - --grpc-address=0.0.0.0:10907
            - --http-address=0.0.0.0:10908
            - --objstore.config-file=/etc/thanos/objectstorage.yaml
            #- --experimental.enable-index-header
          image: bitnami/thanos:0.22.0
          livenessProbe:
            failureThreshold: 40
            httpGet:
              path: /-/healthy
              port: 10908
              scheme: HTTP
            periodSeconds: 30
          name: thanos-store
          ports:
            - containerPort: 10907
              name: grpc
            - containerPort: 10908
              name: http
          readinessProbe:
            failureThreshold: 40
            httpGet:
              path: /-/ready
              port: 10908
              scheme: HTTP
            periodSeconds: 5
          resources:
            requests:
              cpu: 1
              memory: 1Gi
            limits:
              cpu: 2
              memory: 2Gi
          terminationMessagePolicy: FallbackToLogsOnError
          volumeMounts:
            # Must match --data-dir above (absolute path).
            - mountPath: /var/thanos/store
              name: data
              readOnly: false
            - name: thanos-objectstorage
              mountPath: /etc/thanos/
      terminationGracePeriodSeconds: 120
      volumes:
        - name: thanos-objectstorage
          secret:
            secretName: thanos-objectstorage
  volumeClaimTemplates:
    - metadata:
        labels:
          app.kubernetes.io/name: thanos-store
        name: data
      spec:
        storageClassName: ceph-ssd-1
        volumeMode: Filesystem
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
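The StatefulSet mounts a Secret named thanos-objectstorage and reads /etc/thanos/objectstorage.yaml from it, but the notes do not show that Secret. A minimal sketch for an S3 bucket might look like this; the bucket name, endpoint, and credentials are placeholders, not values from the original setup.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: thanos-objectstorage
type: Opaque
stringData:
  # The key becomes the file name under the /etc/thanos/ mount.
  objectstorage.yaml: |
    type: S3
    config:
      bucket: thanos-metrics      # hypothetical bucket name
      endpoint: s3.example.com    # hypothetical endpoint, host[:port] without scheme
      access_key: REPLACE_ME
      secret_key: REPLACE_ME
      insecure: false             # set to true only for plain-HTTP endpoints
```

The same file content is what the Receiver would need in its OBJSTORE_CONFIG variable, so that both components point at the same bucket.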
Query configuration
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: thanos-query
  name: thanos-query
spec:
  ports:
    - name: grpc
      port: 10908
      protocol: TCP
      targetPort: 10908
    - name: http
      port: 9099
      protocol: TCP
      targetPort: 9099
  selector:
    app.kubernetes.io/name: thanos-query
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: thanos-query
  name: thanos-query
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-query
    spec:
      affinity: {}
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - args:
            - query
            - --log.level=debug
            - --query.auto-downsampling
            - --grpc-address=0.0.0.0:10908
            - --http-address=0.0.0.0:9099
            - --query.partial-response
            # Deduplicate series that differ only in the Receiver's replica label.
            - --query.replica-label=receive_replica
            # Historical data from S3, served by Store.
            - --store=dnssrv+_grpc._tcp.thanos-store.ns.svc.cluster.local
            #- --store=thanos-store:10907
            # Real-time data, served by Receiver.
            #- --store=dnssrv+_grpc._tcp.thanos-receive.ns.svc.cluster.local
            - --store=thanos-receive-0.thanos-receive.ns.svc.cluster.local:10901
          image: bitnami/thanos:0.22.0
          livenessProbe:
            failureThreshold: 8
            httpGet:
              path: /-/healthy
              port: 9099
              scheme: HTTP
            periodSeconds: 30
          name: thanos-query
          ports:
            - containerPort: 10908
              name: grpc
            - containerPort: 9099
              name: http
          readinessProbe:
            failureThreshold: 20
            httpGet:
              path: /-/ready
              port: 9099
              scheme: HTTP
            periodSeconds: 5
          terminationMessagePolicy: FallbackToLogsOnError
      terminationGracePeriodSeconds: 120
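Since Grafana is meant to draw dashboards from this data, it can be pointed at the thanos-query Service as if it were a Prometheus server. A sketch of a Grafana data source provisioning file, assuming Grafana runs in the same namespace as the Service; the data source name is hypothetical.

```yaml
apiVersion: 1
datasources:
  - name: Thanos                   # hypothetical data source name
    type: prometheus               # Thanos Query serves the Prometheus HTTP API
    access: proxy
    url: http://thanos-query:9099  # HTTP port of the thanos-query Service
    isDefault: false
```

Dashboards built against this data source see both the recent data held by Receiver and the older blocks in S3, because Query merges the two.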