Advanced Configuration

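So far we have learned how to define a custom monitoring target under Prometheus Operator and how to use custom alerting rules. Can we still use the service discovery feature from the earlier lessons? If our Kubernetes cluster contains many Services and Pods, do we have to create a matching ServiceMonitor or PodMonitor object for each of them one by one? That would make things tedious again.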

Service Discovery Configuration

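To solve this, Prometheus Operator lets us supply additional scrape configuration, through which we can use service discovery for automatic monitoring. Just as in the earlier hand-written setup, we can have Prometheus Operator automatically discover and monitor every Service that carries the prometheus.io/scrape=true annotation. The Prometheus configuration we defined earlier looks like this: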

- job_name: "endpoints"
  kubernetes_sd_configs:
    - role: endpoints
  relabel_configs: # relabeling applied to discovered targets before they are scraped
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep # keep only Services annotated with prometheus.io/scrape=true
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels:
        [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+) # RE2 syntax: + means one or more, ? means zero or one, (?:...) is a non-capturing group
      replacement: $1:$2
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
      replacement: $1
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: kubernetes_service
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod
    - source_labels: [__meta_kubernetes_node_name]
      action: replace
      target_label: kubernetes_node
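
To see what the __address__ rule in this configuration does, the following Python sketch mimics it outside Prometheus. It is only an approximation for illustration: Prometheus joins the source label values with ";" and requires the regex to match the whole string, which is modelled here with re.fullmatch. The function name rewrite_address and the sample addresses are made up for this example.

```python
import re

# Same pattern as the relabel rule; Prometheus anchors it to the whole string.
ADDRESS_RULE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")


def rewrite_address(address: str, port_annotation: str) -> str:
    """Approximate the `replace` action that rewrites __address__."""
    # Prometheus concatenates the source_labels values with ";" by default.
    joined = f"{address};{port_annotation}"
    match = ADDRESS_RULE.fullmatch(joined)
    if match is None:
        # No match: a replace action leaves the target label unchanged.
        return address
    # "$1:$2" in Prometheus corresponds to r"\1:\2" in Python.
    return match.expand(r"\1:\2")


if __name__ == "__main__":
    print(rewrite_address("10.244.1.5:8080", "9100"))
    print(rewrite_address("10.244.1.5:8080", ""))
```

With the port annotation set to 9100, the address 10.244.1.5:8080 becomes 10.244.1.5:9100. Without the annotation the pattern does not match and the address is left untouched.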

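If this configuration is still unfamiliar, review the earlier chapter on monitoring common Kubernetes resource objects. For a Service to be discovered automatically, it must declare prometheus.io/scrape=true in its annotations. Save the scrape configuration above as prometheus-additional.yaml, then create a Secret from that file: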

$ kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
secret "additional-configs" created

Next, reference this extra configuration through the additionalScrapeConfigs property in the manifest that declares the Prometheus resource:

# prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.35.0
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
      - apiVersion: v2
        name: alertmanager-main
        namespace: monitoring
        port: web
  enableFeatures: []
  externalLabels: {}
  image: quay.io/prometheus/prometheus:v2.35.0
  nodeSelector:
    kubernetes.io/os: linux
  podMetadata:
    labels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/instance: k8s
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: kube-prometheus
      app.kubernetes.io/version: 2.35.0
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  probeNamespaceSelector: {}
  probeSelector: {}
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleNamespaceSelector: {}
  ruleSelector: {}
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: 2.35.0
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml

For details about the additionalScrapeConfigs property, we can use the kubectl explain command:

$ kubectl explain prometheus.spec.additionalScrapeConfigs
KIND:     Prometheus
VERSION:  monitoring.coreos.com/v1

RESOURCE: additionalScrapeConfigs <Object>

DESCRIPTION:
     AdditionalScrapeConfigs allows specifying a key of a Secret containing
     additional Prometheus scrape configurations. Scrape configurations
     specified are appended to the configurations generated by the Prometheus
     Operator. Job configurations specified must have the form as specified in
     the official Prometheus documentation:
     https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config.
     As scrape configs are appended, the user is responsible to make sure it is
     valid. Note that using this feature may expose the possibility to break
     upgrades of Prometheus. It is advised to review Prometheus release notes to
     ensure that no incompatible scrape configs are going to break Prometheus
     after the upgrade.

FIELDS:
   key  <string> -required-
     The key of the secret to select from. Must be a valid secret key.

   name <string>
     Name of the referent. More info:
     https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
     TODO: Add other useful fields. apiVersion, kind, uid?

   optional     <boolean>
     Specify whether the Secret or its key must be defined

Once the property is added, simply update the Prometheus custom resource:

$ kubectl apply -f prometheus-prometheus.yaml
prometheus.monitoring.coreos.com "k8s" configured

After a short while, the Prometheus dashboard shows that the configuration has taken effect:

Prometheus configuration

However, the targets page does not show the corresponding scrape job. Check the logs of the Prometheus Pod:

$ kubectl logs -f prometheus-k8s-0 -c prometheus -n monitoring
......
ts=2022-05-26T09:34:30.845Z caller=klog.go:108 level=warn component=k8s_client_runtime func=Warningf msg="pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" at the cluster scope"
ts=2022-05-26T09:34:30.845Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" at the cluster scope"
ts=2022-05-26T09:34:40.552Z caller=klog.go:108 level=warn component=k8s_client_runtime func=Warningf msg="pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" at the cluster scope"
ts=2022-05-26T09:34:40.552Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.23.5/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" at the cluster scope"

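The log is full of errors of the form "xxx is forbidden", which points to an RBAC permission problem. The Prometheus resource shows that Prometheus runs under a ServiceAccount named prometheus-k8s, and that ServiceAccount is bound to a ClusterRole that is also named prometheus-k8s: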

# prometheus-clusterRole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.35.0
  name: prometheus-k8s
rules:
  - apiGroups:
      - ""
    resources:
      - nodes/metrics
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get

These rules clearly grant no list permission on Services or Pods, hence the errors. To fix this, we only need to add the missing permissions:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.35.0
  name: prometheus-k8s
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
      - nodes/metrics
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
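
The difference between the two ClusterRoles can be checked with a toy evaluator. The Python sketch below is a simplified model, not the Kubernetes authorizer: it ignores wildcards, resourceNames, nonResourceURLs and role aggregation, and the helper name allows is invented for this example. The rule lists mirror the two manifests above.

```python
def allows(rules, verb, resource, api_group=""):
    """Return True if any rule grants `verb` on `resource` in `api_group`."""
    for rule in rules:
        if (
            api_group in rule.get("apiGroups", [])
            and resource in rule.get("resources", [])
            and verb in rule.get("verbs", [])
        ):
            return True
    return False


# Rules of the original prometheus-k8s ClusterRole.
ORIGINAL_RULES = [
    {"apiGroups": [""], "resources": ["nodes/metrics"], "verbs": ["get"]},
    {"nonResourceURLs": ["/metrics"], "verbs": ["get"]},
]

# Rules after adding the permissions needed by service discovery.
UPDATED_RULES = [
    {
        "apiGroups": [""],
        "resources": ["nodes", "services", "endpoints", "pods", "nodes/proxy"],
        "verbs": ["get", "list", "watch"],
    },
    {"apiGroups": [""], "resources": ["configmaps", "nodes/metrics"], "verbs": ["get"]},
    {"nonResourceURLs": ["/metrics"], "verbs": ["get"]},
]

if __name__ == "__main__":
    for resource in ("services", "pods", "endpoints"):
        before = allows(ORIGINAL_RULES, "list", resource)
        after = allows(UPDATED_RULES, "list", resource)
        print(f"list {resource}: before={before} after={after}")
```

The original rules grant none of the list requests seen in the error log, while the updated rules grant all three.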

Apply the updated ClusterRole and then recreate all Prometheus Pods. The targets page should now show the endpoints scrape job:

endpoints targets

The scrape targets discovered here appear because their Services all carry the prometheus.io/scrape=true annotation.

scrape service

Data Persistence

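After changing the permissions above we restarted the Prometheus Pods. A closer look shows that the previously collected data is gone. This is because the Prometheus instance created from the Prometheus custom resource has no data persistence configured, which becomes obvious when we look at the volumes mounted in the generated Prometheus Pod: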

$ kubectl get pod prometheus-k8s-0 -n monitoring -o yaml
......
    volumeMounts:
    - mountPath: /prometheus
      name: prometheus-k8s-db
......
  volumes:
......
  - emptyDir: {}
    name: prometheus-k8s-db
......

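The Prometheus data directory /prometheus is mounted from an emptyDir volume. An emptyDir volume shares the lifecycle of its Pod, so when the Pod is deleted or recreated the data is lost, which is why the earlier data disappeared after we recreated the Pods. Production monitoring data certainly needs to be persisted, and the Prometheus custom resource provides a way to configure this.

Because the Operator ultimately deploys Prometheus through a StatefulSet controller, we use a StorageClass for persistence here. Note that Prometheus does not support NFS storage, so never use NFS for data persistence in production. To learn how to configure storage for the Prometheus resource, consult the official API documentation or use the kubectl explain command: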

$ kubectl explain prometheus.spec.storage
KIND:     Prometheus
VERSION:  monitoring.coreos.com/v1

RESOURCE: storage <Object>

DESCRIPTION:
     Storage spec to specify how storage shall be used.

FIELDS:
   disableMountSubPath  <boolean>
     Deprecated: subPath usage will be disabled by default in a future release,
     this option will become unnecessary. DisableMountSubPath allows to remove
     any subPath usage in volume mounts.

   emptyDir     <Object>
     EmptyDirVolumeSource to be used by the Prometheus StatefulSets. If
     specified, used in place of any volumeClaimTemplate. More info:
     https://kubernetes.io/docs/concepts/storage/volumes/#emptydir

   ephemeral    <Object>
     EphemeralVolumeSource to be used by the Prometheus StatefulSets. This is a
     beta field in k8s 1.21, for lower versions, starting with k8s 1.19, it
     requires enabling the GenericEphemeralVolume feature gate. More info:
     https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes

   volumeClaimTemplate  <Object>
     A PVC spec to be used by the Prometheus StatefulSets.

So all we need to do is configure a volumeClaimTemplate under the storage property of the Prometheus resource:

# prometheus-prometheus.yaml
......
storage:
  volumeClaimTemplate:
    spec:
      storageClassName: local-path
      resources:
        requests:
          storage: 20Gi

Then update the Prometheus resource. Once the update completes, two PVCs and their PVs are created automatically, one for each replica:

$ kubectl apply -f prometheus-prometheus.yaml
prometheus.monitoring.coreos.com/k8s configured
$ kubectl get pvc -n monitoring
NAME                                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
prometheus-k8s-db-prometheus-k8s-0   Bound    pvc-4c67d0a9-e97c-4820-9c66-340a7da3c53f   20Gi       RWO            local-path     4s
prometheus-k8s-db-prometheus-k8s-1   Bound    pvc-6f26e85c-01c6-483c-9d42-55166f77e5d0   20Gi       RWO            local-path     4s
$ kubectl get pv |grep monitoring
pvc-4c67d0a9-e97c-4820-9c66-340a7da3c53f   20Gi       RWO            Delete           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-0                       local-path               17s
pvc-6f26e85c-01c6-483c-9d42-55166f77e5d0   20Gi       RWO            Delete           Bound    monitoring/prometheus-k8s-db-prometheus-k8s-1                       local-path               17s

Looking at the Prometheus Pod again, the data directory is now backed by a PVC:

$ kubectl get pod prometheus-k8s-0 -n monitoring -o yaml
......
    volumeMounts:
    - mountPath: /prometheus
      name: prometheus-k8s-db
......
  volumes:
  - name: prometheus-k8s-db
    persistentVolumeClaim:
      claimName: prometheus-k8s-db-prometheus-k8s-0
......

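Now the data survives even if a Pod goes down. This completes the basic configuration of Prometheus Operator. Large monitoring clusters need some further configuration, such as the Thanos and VictoriaMetrics setups covered earlier for Prometheus high availability and remote storage. With Prometheus Operator, configuring Thanos is fairly simple because the Prometheus resource supports it directly.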

For how to configure Thanos with Prometheus Operator, see the official documentation: https://github.com/coreos/prometheus-operator/blob/master/Documentation/thanos.md

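Prometheus Operator does not provide support for VictoriaMetrics, although VM Operator can recognize Prometheus Operator's ServiceMonitor, PodMonitor, PrometheusRule and Probe objects. If we are running Prometheus Operator and want to use VM as remote storage for monitoring data, the only option is to configure Prometheus remote write. The Prometheus resource supports remote storage configuration as well: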

$ kubectl explain prometheus.spec.remoteWrite
KIND:     Prometheus
VERSION:  monitoring.coreos.com/v1

RESOURCE: remoteWrite <[]Object>

DESCRIPTION:
     remoteWrite is the list of remote write configurations.

     RemoteWriteSpec defines the configuration to write samples from Prometheus
     to a remote endpoint.

FIELDS:
   authorization        <Object>
     Authorization section for remote write

   basicAuth    <Object>
     BasicAuth for the URL.

   bearerToken  <string>
     Bearer token for remote write.

   bearerTokenFile      <string>
     File to read bearer token for remote write.

   headers      <map[string]string>
     Custom HTTP headers to be sent along with each remote write request. Be
     aware that headers that are set by Prometheus itself can't be overwritten.
     Only valid in Prometheus versions 2.25.0 and newer.

   metadataConfig       <Object>
     MetadataConfig configures the sending of series metadata to the remote
     storage.

   name <string>
     The name of the remote write queue, it must be unique if specified. The
     name is used in metrics and logging in order to differentiate queues. Only
     valid in Prometheus versions 2.15.0 and newer.

   oauth2       <Object>
     OAuth2 for the URL. Only valid in Prometheus versions 2.27.0 and newer.

   proxyUrl     <string>
     Optional ProxyURL.

   queueConfig  <Object>
     QueueConfig allows tuning of the remote write queue parameters.

   remoteTimeout        <string>
     Timeout for requests to the remote write endpoint.

   sendExemplars        <boolean>
     Enables sending of exemplars over remote write. Note that exemplar-storage
     itself must be enabled using the enableFeature option for exemplars to be
     scraped in the first place. Only valid in Prometheus versions 2.27.0 and
     newer.

   sigv4        <Object>
     Sigv4 allows to configures AWS's Signature Verification 4

   tlsConfig    <Object>
     TLS Config to use for remote write.

   url  <string> -required-
     The URL of the endpoint to send samples to.

   writeRelabelConfigs  <[]Object>
     The list of remote write relabel configurations.

Here we use the VM cluster from the earlier VM Operator chapter as remote storage for Prometheus. The state of the cluster is as follows:

$ kubectl get pods
NAME                                       READY   STATUS    RESTARTS      AGE
grafana-fc7c7d476-5wgdh                    1/1     Running   4 (30m ago)   8d
vmagent-vmagent-demo-6cb84cfc55-6p4sv      2/2     Running   4 (30m ago)   6d6h
vminsert-vmcluster-demo-6bbd664f6f-dn82f   1/1     Running   2 (30m ago)   6d5h
vminsert-vmcluster-demo-6bbd664f6f-xnx2c   1/1     Running   2 (30m ago)   6d5h
vmselect-vmcluster-demo-0                  1/1     Running   2 (30m ago)   6d5h
vmselect-vmcluster-demo-1                  1/1     Running   2 (30m ago)   6d5h
vmstorage-vmcluster-demo-0                 1/1     Running   2 (30m ago)   6d5h
vmstorage-vmcluster-demo-1                 1/1     Running   2 (30m ago)   6d5h
$ kubectl get svc
NAME                       TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
grafana                    ClusterIP   10.106.109.228   <none>        80/TCP                       8d
kubernetes                 ClusterIP   10.96.0.1        <none>        443/TCP                      61d
vmagent-vmagent-demo       ClusterIP   10.106.129.4     <none>        8429/TCP                     8d
vminsert-vmcluster-demo    ClusterIP   10.97.108.211    <none>        8480/TCP                     8d
vmselect-vmcluster-demo    ClusterIP   None             <none>        8481/TCP                     8d
vmstorage-vmcluster-demo   ClusterIP   None             <none>        8482/TCP,8400/TCP,8401/TCP   8d

We only need to set the address of the vminsert component as the Prometheus remote storage address. Add the following to prometheus-prometheus.yaml:

# prometheus-prometheus.yaml
......
remoteWrite:
  - url: http://vminsert-vmcluster-demo.default:8480/insert/0/prometheus/
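
The URL above follows the ingestion pattern of a VictoriaMetrics cluster, http://<vminsert>:8480/insert/<accountID>/prometheus/, where the 0 is the tenant (account) ID. As a minimal sketch, the following Python helper assembles such a URL from its parts. The function name vminsert_remote_write_url and its parameters are hypothetical and exist only for this illustration.

```python
def vminsert_remote_write_url(service, namespace, account_id=0, port=8480):
    """Build a Prometheus remote-write URL for a vminsert Service.

    `service` and `namespace` form the in-cluster DNS name of vminsert,
    `account_id` is the VictoriaMetrics tenant that receives the samples.
    """
    host = f"{service}.{namespace}"
    return f"http://{host}:{port}/insert/{account_id}/prometheus/"


if __name__ == "__main__":
    print(vminsert_remote_write_url("vminsert-vmcluster-demo", "default"))
```

Called with the Service name and namespace from the listing above, it yields the same URL that is used in the remoteWrite entry.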

Then update the Prometheus object:

$ kubectl apply -f prometheus-prometheus.yaml

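After the update, the Prometheus instances remote-write their data to the VM cluster. We can confirm that the data arrives through vmui, which is served by the vmselect component: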

node_load1

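The query node_load1{prometheus="monitoring/k8s", instance="node2"} in the figure returns two identical-looking series, because both Prometheus instances are remote-writing their data to VM. Deduplication is enabled by setting the -dedup.minScrapeInterval flag on the vmselect and vmstorage components. When a replication factor greater than one is used, this flag is already configured by default.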

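The duplicates remain because of how VM deduplication works: it only merges samples that belong to the same series, so the extra replica label that each of the two Prometheus instances attaches has to be removed. Setting the replicaExternalLabelName property to an empty string does that: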

remoteWrite:
  - url: http://vminsert-vmcluster-demo.default:8480/insert/0/prometheus/
replicaExternalLabelName: ""

After the update, the default prometheus_replica label no longer appears in the Prometheus global configuration:

global:
  scrape_interval: 30s
  scrape_timeout: 10s
  evaluation_interval: 30s
  external_labels:
    prometheus: monitoring/k8s

Checking vmui again, the data is now deduplicated and only one copy remains:

dedup
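
Why clearing the replica label matters can be illustrated with a toy model in Python. This is a sketch of the idea only, not VictoriaMetrics' implementation: a series is treated as being identified by its full label set, so two replicas that differ only in prometheus_replica count as two series until that label is dropped. The label values below are illustrative.

```python
def series_key(labels, drop=()):
    """Identity of a series: its label set, minus any labels in `drop`."""
    return frozenset((k, v) for k, v in labels.items() if k not in drop)


# The same metric as written by the two Prometheus replicas.
replica_0 = {
    "__name__": "node_load1",
    "instance": "node2",
    "prometheus": "monitoring/k8s",
    "prometheus_replica": "prometheus-k8s-0",
}
replica_1 = dict(replica_0, prometheus_replica="prometheus-k8s-1")

if __name__ == "__main__":
    with_label = {series_key(replica_0), series_key(replica_1)}
    without_label = {
        series_key(replica_0, drop=("prometheus_replica",)),
        series_key(replica_1, drop=("prometheus_replica",)),
    }
    print(len(with_label), len(without_label))
```

With the replica label present the two writes form two distinct series. Once it is dropped they collapse into a single series, and only then can samples from both replicas be deduplicated against each other.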

For other advanced uses of Prometheus Operator, see the official documentation at https://prometheus-operator.dev.