With the rise of cloud-native technology, the container orchestration tool Kubernetes (K8s) has been widely adopted for automated operations and application deployment. Monitoring is an indispensable part of that picture, and Prometheus, the cloud-native monitoring solution, is both lightweight and feature-rich, providing applications with real-time monitoring and alerting. This article walks through deploying Prometheus on K8s from several angles.
I. Deploying Prometheus on K8s for High Availability
In production, high availability is one of the most important considerations. K8s' elastic scaling and service discovery make a highly available Prometheus deployment straightforward. The steps are as follows:
1. First, run Prometheus inside the K8s cluster, for example with a StatefulSet controller. A StatefulSet gives each Pod a persistent name and a stable network identity, which guarantees Pod uniqueness and stability.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  replicas: 3
  serviceName: prometheus
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          args:
            - --storage.tsdb.retention.time=3d
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
              name: web
          volumeMounts:
            - name: prometheus-data
              mountPath: /prometheus
  # One PersistentVolumeClaim per Pod, so data survives Pod restarts
  # (an emptyDir would be deleted together with the Pod).
  volumeClaimTemplates:
    - metadata:
        name: prometheus-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
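A StatefulSet's stable identity means each replica is reachable at a predictable DNS name of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`. A small sketch of how those names are derived (the `default` namespace and `cluster.local` domain are assumptions):

```python
def statefulset_dns_names(name: str, service: str, replicas: int,
                          namespace: str = "default",
                          cluster_domain: str = "cluster.local") -> list[str]:
    """Enumerate the stable per-Pod DNS names a StatefulSet exposes through
    its governing headless Service: <pod>.<service>.<ns>.svc.<domain>."""
    return [f"{name}-{i}.{service}.{namespace}.svc.{cluster_domain}"
            for i in range(replicas)]

# The three replicas declared above:
targets = statefulset_dns_names("prometheus", "prometheus", 3)
```

These are exactly the names used later when the replicas scrape each other.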
2. Create a Service for Prometheus so that it can be discovered and reached from anywhere in the cluster.
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  clusterIP: None  # headless, so each StatefulSet Pod gets its own stable DNS record
  selector:
    app: prometheus
  ports:
    - protocol: TCP
      port: 9090
      targetPort: web
3. Deploy an etcd cluster in K8s, used here for shared state management and synchronization across nodes.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd
spec:
  replicas: 3
  serviceName: etcd
  selector:
    matchLabels:
      app: etcd
  template:
    metadata:
      labels:
        app: etcd
    spec:
      containers:
        - name: etcd
          image: quay.io/coreos/etcd
          env:
            # Each Pod must advertise itself under its own StatefulSet name
            # (etcd-0, etcd-1, ...); a hard-coded --name would collide at replicas: 3.
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          command:
            - /usr/local/bin/etcd
            - --data-dir=/etcd-data
            - --name=$(POD_NAME)
            - --initial-advertise-peer-urls=http://$(POD_NAME).etcd:2380
            - --listen-peer-urls=http://0.0.0.0:2380
            - --advertise-client-urls=http://$(POD_NAME).etcd:2379
            - --listen-client-urls=http://0.0.0.0:2379
            - --initial-cluster=etcd-0=http://etcd-0.etcd:2380,etcd-1=http://etcd-1.etcd:2380,etcd-2=http://etcd-2.etcd:2380
            - --initial-cluster-token=etcd-cluster-1
            - --initial-cluster-state=new
          ports:
            - containerPort: 2380
              name: peer
            - containerPort: 2379
              name: client
          volumeMounts:
            - name: etcd-data
              mountPath: /etcd-data
      volumes:
        - name: etcd-data
          emptyDir: {}
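Every member must be started with the same `--initial-cluster` list of name=peer-URL pairs. Rather than hand-writing it, the flag can be derived from the member names; the helper below is a sketch (the member names and the `.etcd` peer domain are illustrative, following this section's naming):

```python
def initial_cluster(members: list[str], peer_domain: str = "etcd",
                    peer_port: int = 2380) -> str:
    """Build etcd's --initial-cluster value: comma-separated name=peer-URL pairs."""
    return ",".join(f"{m}=http://{m}.{peer_domain}:{peer_port}" for m in members)

flag = "--initial-cluster=" + initial_cluster(["node1", "node2", "node3"])
```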
4. Persist Prometheus state, so that after a node goes down the restarted Pod can pick up where its predecessor left off (same name and data directory) and no data is lost. Note that etcd does not implement Prometheus' remote-write protocol, so `remote_write` must point at a remote-write-compatible backend (such as Thanos Receive or Cortex) rather than at etcd itself.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-conf
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    rule_files:
      - /etc/prometheus/rules
    scrape_configs:
      - job_name: 'prometheus'
        scrape_interval: 5s
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'node-exporter'
        scrape_interval: 5s
        static_configs:
          - targets: ['localhost:9100']
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['localhost:9093']
    remote_write:
      # etcd does not speak the Prometheus remote-write protocol; point this at a
      # remote-write-compatible backend instead (the Thanos Receive URL is a placeholder).
      - url: http://thanos-receive.default.svc.cluster.local:19291/api/v1/receive
5. Adjust the key parameters in the Prometheus configuration so that the Prometheus instances and services are found inside the cluster; below, `scrape_configs` and `remote_write` are modified as examples.
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['prometheus-0.prometheus.default.svc.cluster.local:9090', 'prometheus-1.prometheus.default.svc.cluster.local:9090', 'prometheus-2.prometheus.default.svc.cluster.local:9090']
  - job_name: 'node-exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']
remote_write:
  # As above, the endpoint must implement the remote-write protocol; etcd does not.
  - url: http://thanos-receive.default.svc.cluster.local:19291/api/v1/receive
After the changes, reload or restart Prometheus, and the highly available deployment in the K8s cluster is complete.
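Since the StatefulSet starts Prometheus with `--web.enable-lifecycle`, a full restart is often unnecessary: an HTTP POST to the `/-/reload` endpoint makes a running instance re-read its configuration. A minimal sketch (the in-cluster URL in the comment is an assumption):

```python
import urllib.request

def reload_prometheus(base_url: str) -> int:
    """POST to Prometheus' /-/reload lifecycle endpoint (requires the server to
    have been started with --web.enable-lifecycle); returns the HTTP status."""
    req = urllib.request.Request(f"{base_url}/-/reload", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status

# In-cluster, each replica can be reloaded individually, e.g.:
# reload_prometheus("http://prometheus-0.prometheus.default.svc.cluster.local:9090")
```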
II. Deploying Prometheus with Alertmanager on K8s
Much like the high-availability setup, pairing Prometheus with Alertmanager is quick and convenient, especially for delivering alert notifications. The steps are as follows:
1. First, deploy Alertmanager in the K8s cluster. Start from the alertmanager.yaml provided in the K8s documentation and adjust the service ports, the DNS names used to reach the service, and so on for your environment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      labels:
        app: alertmanager
    spec:
      containers:
        - name: alertmanager
          image: prom/alertmanager
          args:
            - --config.file=/etc/alertmanager/config.yml
          ports:
            - containerPort: 9093
              name: web
          volumeMounts:
            - name: alertmanager-data
              mountPath: /alertmanager
            # Mount the config referenced by --config.file
            - name: alertmanager-conf
              mountPath: /etc/alertmanager
      volumes:
        - name: alertmanager-data
          emptyDir: {}
        - name: alertmanager-conf
          configMap:
            name: alertmanager-conf
---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
spec:
  selector:
    app: alertmanager
  ports:
    - protocol: TCP
      port: 9093
      targetPort: web
  type: NodePort
  externalTrafficPolicy: Cluster
---
# Headless service giving each Alertmanager Pod a stable in-cluster DNS name
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-operated
spec:
  clusterIP: None
  selector:
    app: alertmanager
  ports:
    - protocol: TCP
      port: 9093
      targetPort: web
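The Deployment above mounts an `alertmanager-conf` ConfigMap that is not shown in this section. A minimal sketch of what it could contain, with a placeholder webhook receiver (the URL is not a real endpoint):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-conf
data:
  config.yml: |-
    route:
      receiver: default
      group_by: ['alertname']
      group_wait: 30s
      repeat_interval: 4h
    receivers:
      - name: default
        webhook_configs:
          - url: http://alert-hook.default.svc.cluster.local/hook  # placeholder
```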
2. Configure alerting rules in Prometheus and push the resulting alerts to Alertmanager.
# Rule file, e.g. /etc/prometheus/rules/latency.yml
groups:
  - name: latency
    rules:
      - alert: HighRequestLatency
        expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          description: 'High request latency: {{ $value }}'
          summary: 'High latency for {{ $labels.instance }}'

# In prometheus.yml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "alertmanager-operated.default.svc.cluster.local:9093"
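The `for: 10m` clause means the expression must hold continuously for ten minutes before the alert fires; until then the alert is merely pending, and a single evaluation below the threshold resets it to inactive. A simplified Python sketch of that state machine (real Prometheus evaluates on its `evaluation_interval`; this is illustrative only):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlertState:
    pending_since: Optional[float] = None  # time when the expression first became true

def evaluate(state: AlertState, expr_true: bool, now: float, for_seconds: float) -> str:
    """Return 'inactive', 'pending', or 'firing' for one evaluation tick."""
    if not expr_true:
        state.pending_since = None  # any dip below the threshold resets the timer
        return "inactive"
    if state.pending_since is None:
        state.pending_since = now
    return "firing" if now - state.pending_since >= for_seconds else "pending"

s = AlertState()
evaluate(s, True, now=0, for_seconds=600)    # pending: just crossed the threshold
evaluate(s, True, now=300, for_seconds=600)  # pending: only 5 minutes so far
evaluate(s, True, now=600, for_seconds=600)  # firing: held for the full 10 minutes
```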
3. Deploy Prometheus in the K8s cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
              name: web
          volumeMounts:
            - name: prometheus-data
              mountPath: /prometheus
            # Mount the config referenced by --config.file
            - name: prometheus-conf
              mountPath: /etc/prometheus
      volumes:
        - name: prometheus-data
          emptyDir: {}
        - name: prometheus-conf
          configMap:
            name: prometheus-conf
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
    - protocol: TCP
      port: 9090
      targetPort: web
  type: NodePort
  externalTrafficPolicy: Cluster
With these steps in place, Prometheus and Alertmanager work together, and alerts fire when the configured conditions are met.
III. Deploying Prometheus with Multiple Replicas on K8s
To avoid a single point of failure, we can run several Prometheus instances, preserving both performance and availability.
1. First, deploy multiple Prometheus instances in the K8s cluster.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  replicas: 3
  serviceName: prometheus  # required: the governing Service of the StatefulSet
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          args:
            - --config.file=/etc/prometheus/prometheus.yml
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
              name: web
          volumeMounts:
            - name: prometheus-data
              mountPath: /prometheus
            - name: prometheus-conf
              mountPath: /etc/prometheus
      volumes:
        - name: prometheus-data
          emptyDir: {}
        - name: prometheus-conf
          configMap:
            name: prometheus-conf
2. Create a Service in front of the Prometheus instances so they can be reached.
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
    - protocol: TCP
      port: 9090
      targetPort: web
  type: NodePort
  externalTrafficPolicy: Cluster
With that, a multi-replica Prometheus deployment is in place. Note that the Prometheus configuration must also be adjusted so that each instance does its share of the work in the K8s cluster correctly.
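One concrete way to make each replica do only its share, rather than every replica scraping every target, is Prometheus' `hashmod` relabel action: each target address is hashed into one of N buckets and each replica keeps a different bucket. A sketch for replica 0 of 3 (the other replicas would keep regex '1' and '2'; the job shown is illustrative):

```yaml
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
    relabel_configs:
      # Hash each target address into one of 3 buckets...
      - source_labels: [__address__]
        modulus: 3
        target_label: __tmp_hash
        action: hashmod
      # ...and keep only bucket 0 on this replica (use '1' and '2' on the others).
      - source_labels: [__tmp_hash]
        regex: '0'
        action: keep
```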
IV. The Overall Flow of Deploying Prometheus on K8s
The following walks you through deploying Prometheus on a K8s cluster step by step.
1. Install a K8s cluster and make sure it has enough nodes. If you do not have one, follow the official documentation to set it up.
2. Push the Prometheus image to an image registry, or save the image locally and package it as a .tar archive.
3. Define a Deployment resource for Prometheus to run multiple Pod replicas.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 3
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          args: