A Complete Guide to Deploying Prometheus on K8s

With the rise of cloud-native technology, the container orchestration platform Kubernetes (K8s) has been widely adopted for automated operations and application deployment. Monitoring is an indispensable part of any such platform, and Prometheus, the cloud-native monitoring solution of choice, is both lightweight and feature-rich, providing real-time monitoring and alerting for applications. This article walks through deploying Prometheus on K8s from several angles.

I. Deploying Prometheus on K8s with High Availability

In production, high availability is one of the most important considerations. K8s's elastic scaling and service discovery make it straightforward to run Prometheus in a highly available setup. The steps are as follows:

1. First, run Prometheus inside the K8s cluster, for example with a StatefulSet controller. A StatefulSet gives each Pod a stable name and network identity, which helps guarantee Pod uniqueness and stability.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  replicas: 3
  serviceName: prometheus
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus
        args:
        - --storage.tsdb.retention.time=3d
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
          name: web
        volumeMounts:
        - name: prometheus-data
          mountPath: /prometheus
      volumes:
      - name: prometheus-data
        emptyDir: {}
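
Note that emptyDir storage is lost whenever a Pod is rescheduled. For anything beyond a quick test, a volumeClaimTemplates section gives each replica its own PersistentVolumeClaim; a minimal sketch (the storageClassName and size are assumptions to adapt to your cluster):

  volumeClaimTemplates:
  - metadata:
      name: prometheus-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: standard   # assumption: use a StorageClass available in your cluster
      resources:
        requests:
          storage: 10Gi

With volumeClaimTemplates in place, the emptyDir volume above is no longer needed; the claim is mounted under the same volume name.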

2. Deploy a Service for Prometheus so that every node in the cluster can discover and reach the Prometheus Pods. Because step 5 below addresses the replicas by their per-Pod DNS names (prometheus-0.prometheus, ...), this Service should be headless (clusterIP: None), which is also what the StatefulSet's serviceName field expects.

apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  clusterIP: None   # headless: enables per-Pod DNS and serves as the StatefulSet's governing Service
  selector:
    app: prometheus
  ports:
  - protocol: TCP
    port: 9090
    targetPort: web

3. Install an etcd service in the K8s cluster to handle state management and synchronization across nodes. Because the StatefulSet names its Pods etcd-0, etcd-1, and etcd-2, each member's name and peer URLs are derived from the Pod name via an environment variable rather than being hard-coded:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd
spec:
  replicas: 3
  serviceName: etcd
  selector:
    matchLabels:
      app: etcd
  template:
    metadata:
      labels:
        app: etcd
    spec:
      containers:
      - name: etcd
        image: quay.io/coreos/etcd
        env:
        # each replica needs its own member name; derive it from the Pod name
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /usr/local/bin/etcd
        - --data-dir=/etcd-data
        - --name=$(POD_NAME)
        - --initial-advertise-peer-urls=http://$(POD_NAME).etcd:2380
        - --listen-peer-urls=http://0.0.0.0:2380
        - --advertise-client-urls=http://$(POD_NAME).etcd:2379
        - --listen-client-urls=http://0.0.0.0:2379
        - --initial-cluster=etcd-0=http://etcd-0.etcd:2380,etcd-1=http://etcd-1.etcd:2380,etcd-2=http://etcd-2.etcd:2380
        - --initial-cluster-token=etcd-cluster-1
        - --initial-cluster-state=new
        ports:
        - containerPort: 2380
          name: peer
        - containerPort: 2379
          name: client
        volumeMounts:
        - name: etcd-data
          mountPath: /etcd-data
      volumes:
      - name: etcd-data
        emptyDir: {}
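
The peer URLs above rely on per-Pod DNS entries of the form etcd-0.etcd, which only resolve if a headless Service named etcd governs the StatefulSet; a minimal sketch:

apiVersion: v1
kind: Service
metadata:
  name: etcd
spec:
  clusterIP: None   # headless: provides etcd-0.etcd, etcd-1.etcd, ... DNS records
  selector:
    app: etcd
  ports:
  - name: peer
    port: 2380
  - name: client
    port: 2379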

4. Define the Prometheus configuration in a ConfigMap. Because the StatefulSet gives each Pod a stable identity, a restarted Pod comes back with its previous name and volume, and remote_write can additionally ship samples to an external store so that metrics survive the loss of a node. Note that remote_write must target an endpoint that implements the Prometheus remote-write protocol (for example Thanos Receive, Cortex, or VictoriaMetrics); etcd is not such an endpoint, so the remote_write URL below is a placeholder to be replaced with your backend.

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-conf
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    rule_files:
      - /etc/prometheus/rules
    scrape_configs:
      - job_name: 'prometheus'
        scrape_interval: 5s
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'node-exporter'
        scrape_interval: 5s
        static_configs:
          - targets: ['localhost:9100']
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['localhost:9093']
    remote_write:
      # placeholder: replace with a backend that implements the
      # Prometheus remote-write protocol
      - url: http://remote-storage.example.com/api/v1/write
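
The ConfigMap only takes effect once it is mounted into the Prometheus Pods. A minimal patch to the StatefulSet from step 1, mounting it at /etc/prometheus where the image looks for prometheus.yml by default:

        volumeMounts:
        - name: prometheus-data
          mountPath: /prometheus
        - name: prometheus-conf
          mountPath: /etc/prometheus
      volumes:
      - name: prometheus-data
        emptyDir: {}
      - name: prometheus-conf
        configMap:
          name: prometheus-conf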

5. Adjust a few key parameters in the Prometheus configuration so that Prometheus instances and services are discovered inside the cluster; here we modify the `scrape_configs` and `remote_write` entries as an example. (In a real cluster the node-exporter job would typically use kubernetes_sd_configs rather than a static localhost target.)

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['prometheus-0.prometheus.default.svc.cluster.local:9090', 'prometheus-1.prometheus.default.svc.cluster.local:9090', 'prometheus-2.prometheus.default.svc.cluster.local:9090']
  - job_name: 'node-exporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']
remote_write:
  # placeholder: replace with your remote-write-compatible backend
  - url: http://remote-storage.example.com/api/v1/write

After making these changes, reload Prometheus and the highly available deployment in the K8s cluster is complete. Because the Pods run with --web.enable-lifecycle, the configuration can be reloaded without a restart by sending an HTTP POST to /-/reload on each instance (for example `curl -X POST http://prometheus-0.prometheus:9090/-/reload`).

II. Deploying Prometheus + Alertmanager on K8s

Much like the high-availability setup above, pairing Prometheus with the Alertmanager platform is quick and convenient, especially for delivering alert notifications. The steps are as follows:

1. First, deploy Alertmanager in the K8s cluster. Start from an alertmanager.yaml manifest such as the one below; the service ports, DNS names, and similar values need adjusting to your environment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      labels:
        app: alertmanager
    spec:
      containers:
      - name: alertmanager
        image: prom/alertmanager
        args:
        - --config.file=/etc/alertmanager/config.yml
        ports:
        - containerPort: 9093
          name: web
        volumeMounts:
        - name: alertmanager-data
          mountPath: /alertmanager
        - name: alertmanager-conf
          mountPath: /etc/alertmanager   # config.yml must be mounted where --config.file expects it
      volumes:
      - name: alertmanager-data
        emptyDir: {}
      - name: alertmanager-conf
        configMap:
          name: alertmanager-conf
---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
spec:
  selector:
    app: alertmanager
  ports:
  - protocol: TCP
    port: 9093
    targetPort: web
  type: NodePort
  externalTrafficPolicy: Cluster
---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-operated
spec:
  clusterIP: None   # headless: used for direct Pod discovery, e.g. by Prometheus
  selector:
    app: alertmanager
  ports:
  - protocol: TCP
    port: 9093
    targetPort: web
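
The Deployment above references a ConfigMap named alertmanager-conf holding config.yml, which is not shown here. A minimal sketch with a single webhook receiver (the webhook URL is an assumption, to be replaced with a real notification endpoint):

apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-conf
data:
  config.yml: |-
    route:
      receiver: default
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
    receivers:
      - name: default
        webhook_configs:
          - url: http://alert-webhook.example.com/notify   # assumption: your endpoint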

2. Configure alerting rules in Prometheus and point alert delivery at Alertmanager.

A rule file (for example /etc/prometheus/rules) uses the standard groups layout:

groups:
- name: example
  rules:
  - alert: HighRequestLatency
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    labels:
      severity: warning
    annotations:
      description: 'High request latency: {{ $value }}'
      summary: 'High latency for {{ $labels.instance }}'

The alerting section does not belong in the rule file; it goes into prometheus.yml itself and tells Prometheus where to send firing alerts:

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - "alertmanager-operated.default.svc.cluster.local:9093"

3. Deploy Prometheus in the K8s cluster.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus
        args:
        - --config.file=/etc/prometheus/prometheus.yml
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
          name: web
        volumeMounts:
        - name: prometheus-data
          mountPath: /prometheus
        - name: prometheus-conf
          mountPath: /etc/prometheus   # mount the config where --config.file expects it
      volumes:
      - name: prometheus-data
        emptyDir: {}
      - name: prometheus-conf
        configMap:
          name: prometheus-conf
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
  - protocol: TCP
    port: 9090
    targetPort: web
  type: NodePort
  externalTrafficPolicy: Cluster

With these steps complete, Prometheus and Alertmanager are wired together, and alerts fire when the configured conditions are met.

III. Deploying Prometheus on K8s with Multiple Replicas

To avoid a single point of failure, we can run multiple Prometheus instances, improving both performance and availability.

1. First, deploy multiple Prometheus instances in the K8s cluster.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
spec:
  replicas: 3
  serviceName: prometheus   # required for a StatefulSet; must match a headless Service
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus
        args:
        - --config.file=/etc/prometheus/prometheus.yml
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
          name: web
        volumeMounts:
        - name: prometheus-data
          mountPath: /prometheus
        - name: prometheus-conf
          mountPath: /etc/prometheus   # mount the config referenced by --config.file
      volumes:
      - name: prometheus-data
        emptyDir: {}
      - name: prometheus-conf
        configMap:
          name: prometheus-conf

2. Deploy a Service in front of the Prometheus instances. The NodePort Service below load-balances client traffic across all replicas; in addition, the StatefulSet's serviceName field expects a headless Service for per-Pod DNS, like the one from section I.

apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
  - protocol: TCP
    port: 9090
    targetPort: web
  type: NodePort
  externalTrafficPolicy: Cluster

With these steps complete, a multi-replica Prometheus deployment is in place. Note that with multiple replicas you must also adjust the Prometheus configuration so that every instance works correctly in the K8s cluster; since each replica scrapes the same targets independently, it is good practice to give every instance a distinguishing external label so that downstream consumers can tell the replicas apart (see the sketch below).
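
A minimal sketch, assuming a Prometheus version recent enough to support the --enable-feature=expand-external-labels flag, which expands environment variables inside external_labels:

# container spec (StatefulSet): expose the Pod name and enable expansion
args:
- --config.file=/etc/prometheus/prometheus.yml
- --enable-feature=expand-external-labels
env:
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name

# prometheus.yml: each replica labels its samples with its own Pod name
global:
  external_labels:
    replica: ${POD_NAME}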

IV. The Process and Approach for Deploying Prometheus on K8s

Below we walk step by step through deploying Prometheus in a K8s cluster.

1. Install a K8s cluster and make sure it has enough nodes. If you do not have a K8s cluster yet, follow the official documentation to set one up.

2. Push the Prometheus image to an image registry, or save it locally as a .tar archive (e.g. `docker save -o prometheus.tar prom/prometheus`) and load it onto the nodes.

3. Define a Deployment resource for Prometheus to run multiple Pod replicas.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 3
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus
        args:
        - --config.file=/etc/prometheus/prometheus.yml
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
          name: web
        volumeMounts:
        - name: prometheus-conf
          mountPath: /etc/prometheus
      volumes:
      - name: prometheus-conf
        configMap:
          name: prometheus-conf
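
As in the earlier sections, the Deployment is then exposed with a Service so the instances can be reached; a minimal sketch:

apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus
  ports:
  - protocol: TCP
    port: 9090
    targetPort: web
  type: NodePort

After applying the manifests with kubectl apply, the Prometheus web UI is reachable on the Service's allocated NodePort on any cluster node.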