# 8.3 Prometheus configuration file explained

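This article summarizes the configuration syntax of Prometheus and the features it provides, based on the relevant parts of the official documentation.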

## 1. Main structure of the Prometheus configuration file

```yaml
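# Global settings for Prometheus, such as the scrape interval and the scrape timeout.
global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]

  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]

  # Labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).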
  external_labels:
    [ <labelname>: <labelvalue> ... ]

  # File to which PromQL queries are logged.
  # Reloading the configuration will reopen the file.
  [ query_log_file: <string> ]

# Rule files. Prometheus evaluates the rules in these files and sends the resulting alerts to Alertmanager.
rule_files:
  [ - <filepath_glob> ... ]

# Scrape configurations. This is where Prometheus data collection is configured.
scrape_configs:
  [ - <scrape_config> ... ]

# Alerting settings. This mainly specifies the Alertmanager instances to which Prometheus sends alerts.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# Write API endpoints of the remote storage backend.
remote_write:
  [ - <remote_write> ... ]

# Read API endpoints of the remote storage backend.
remote_read:
  [ - <remote_read> ... ]
```
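As an illustration, the skeleton above could be filled in as follows. This is a minimal sketch rather than a recommended setup: the rule file path, the `cluster` label value, the Alertmanager address `alertmanager:9093`, and the remote write URL are hypothetical placeholders, not values taken from a real deployment.

```yaml
global:
  scrape_interval: 30s       # scrape every target every 30 seconds
  scrape_timeout: 10s        # must not exceed scrape_interval
  evaluation_interval: 30s   # evaluate rules every 30 seconds
  external_labels:
    cluster: demo            # hypothetical label value

rule_files:
  - /etc/prometheus/rules/*.yml   # hypothetical path

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - localhost:9090

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093   # hypothetical Alertmanager address

remote_write:
  - url: http://remote-storage.example.com/api/v1/write   # hypothetical endpoint
```

Every top-level section except `global` and `scrape_configs` can be left out when the corresponding feature is not used.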

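## 2. scrape\_configs in detail

A scrape\_config section specifies a set of targets and the parameters used to scrape them. A target is an instance, that is, an endpoint to collect from; the parameters describe how those instances are scraped. The format is as follows: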

```yaml
# The job name assigned to scraped metrics by default.
job_name: <job_name>

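# Scrape interval; inherits the global value by default.
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ]

# Scrape timeout; inherits the global value by default.
[ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ]

# HTTP path from which metrics are fetched; defaults to /metrics.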
[ metrics_path: <path> | default = /metrics ]

# honor_labels controls how Prometheus handles conflicts between labels that are
# already present in scraped data and labels that Prometheus would attach
# server-side ("job" and "instance" labels, manually configured target
# labels, and labels generated by service discovery implementations).
#
# If honor_labels is set to "true", label conflicts are resolved by keeping label
# values from the scraped data and ignoring the conflicting server-side labels.
#
# If honor_labels is set to "false", label conflicts are resolved by renaming
# conflicting labels in the scraped data to "exported_<original-label>" (for
# example "exported_instance", "exported_job") and then attaching server-side
# labels.
#
# Setting honor_labels to "true" is useful for use cases such as federation and
# scraping the Pushgateway, where all labels specified in the target should be
# preserved.
#
# Note that any globally configured "external_labels" are unaffected by this
# setting. In communication with external systems, they are always applied only
# when a time series does not have a given label yet and are ignored otherwise.
[ honor_labels: <boolean> | default = false ]

# honor_timestamps controls whether Prometheus respects the timestamps present
# in scraped data.
#
# If honor_timestamps is set to "true", the timestamps of the metrics exposed
# by the target will be used.
#
# If honor_timestamps is set to "false", the timestamps of the metrics exposed
# by the target will be ignored.
[ honor_timestamps: <boolean> | default = true ]

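# Protocol scheme used for requests: http or https.
[ scheme: <scheme> | default = http ]

# Optional HTTP URL parameters.
params:
  [ <string>: [<string>, ...] ]

# Basic authentication credentials sent with every scrape request.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <secret> ]
  [ password_file: <string> ]

# Bearer token sent with every scrape request to authenticate against the target.
[ bearer_token: <secret> ]

# File from which the bearer token is read (mutually exclusive with bearer_token).
[ bearer_token_file: /path/to/bearer/token/file ]

# TLS settings used for the scrape request.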
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# List of Azure service discovery configurations.
azure_sd_configs:
  [ - <azure_sd_config> ... ]

# List of Consul service discovery configurations.
consul_sd_configs:
  [ - <consul_sd_config> ... ]

# List of DNS service discovery configurations.
dns_sd_configs:
  [ - <dns_sd_config> ... ]

# List of EC2 service discovery configurations.
ec2_sd_configs:
  [ - <ec2_sd_config> ... ]

# List of OpenStack service discovery configurations.
openstack_sd_configs:
  [ - <openstack_sd_config> ... ]

# List of file service discovery configurations.
file_sd_configs:
  [ - <file_sd_config> ... ]

# List of GCE service discovery configurations.
gce_sd_configs:
  [ - <gce_sd_config> ... ]

# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ - <kubernetes_sd_config> ... ]

# List of Marathon service discovery configurations.
marathon_sd_configs:
  [ - <marathon_sd_config> ... ]

# List of AirBnB's Nerve service discovery configurations.
nerve_sd_configs:
  [ - <nerve_sd_config> ... ]

# List of Zookeeper Serverset service discovery configurations.
serverset_sd_configs:
  [ - <serverset_sd_config> ... ]

# List of Triton service discovery configurations.
triton_sd_configs:
  [ - <triton_sd_config> ... ]

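# List of statically configured targets for this job.
static_configs:
  [ - <static_config> ... ]

# Target relabeling, applied before the scrape: rewrites target labels and
# decides which targets are kept.
relabel_configs:
  [ - <relabel_config> ... ]

# Metric relabeling, applied to scraped samples before ingestion: adds, edits,
# or removes labels and can drop unwanted series.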
metric_relabel_configs:
  [ - <relabel_config> ... ]

# Per-scrape limit on number of scraped samples that will be accepted.
# If more than this number of samples are present after metric relabelling
# the entire scrape will be treated as failed. 0 means no limit.
[ sample_limit: <int> | default = 0 ]
```
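To make the schema more concrete, here is a hedged sketch of a single job that uses several of the fields above. The job name, metrics path, file paths, URL parameter, and target address are hypothetical and only illustrate where each field goes.

```yaml
scrape_configs:
  - job_name: example-app                  # hypothetical job name
    scrape_interval: 15s                   # overrides global.scrape_interval
    scrape_timeout: 10s
    metrics_path: /actuator/prometheus     # hypothetical path instead of /metrics
    scheme: https
    params:
      format: [prometheus]                 # sent as ?format=prometheus
    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/secrets/app-password   # hypothetical file
    tls_config:
      ca_file: /etc/prometheus/certs/ca.crt                 # hypothetical file
    sample_limit: 10000                    # fail the scrape above 10000 samples
    static_configs:
      - targets:
          - app.example.com:8443           # hypothetical target
```

With this job, Prometheus would request `https://app.example.com:8443/actuator/prometheus?format=prometheus` every 15 seconds.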

Because my deployment runs on Kubernetes, I only cover service discovery based on kubernetes\_sd\_configs and statically configured targets based on static\_configs.

### 2.1 relabel\_configs

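relabel\_configs is a powerful tool: before Prometheus scrapes a target, relabeling can dynamically rewrite label values based on the target's metadata. In addition, the metadata can be used to decide whether a target is scraped or ignored.

The relabel\_configs format is as follows: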

```yaml
# The source labels select values from existing labels. Their content is concatenated
# using the configured separator and matched against the configured regular expression
# for the replace, keep, and drop actions.
[ source_labels: '[' <labelname> [, ...] ']' ]

# Separator placed between concatenated source label values.
[ separator: <string> | default = ; ]

# Label to which the resulting value is written in a replace action.
# It is mandatory for replace actions. Regex capture groups are available.
[ target_label: <labelname> ]

# Regular expression against which the extracted value is matched.
[ regex: <regex> | default = (.*) ]

# Modulus to take of the hash of the source label values.
[ modulus: <uint64> ]

# Replacement value against which a regex replace is performed if the
# regular expression matches. Regex capture groups are available.
[ replacement: <string> | default = $1 ]

# Action to perform based on regex matching.
[ action: <relabel_action> | default = replace ]
```

The main actions are:

* [replace](https://book.gxd88.cn/kubernetes-monitor/pages/-M8_4BmBa3vqSYrGBHNs#2.1.1%20replace用法) (default)
* [keep](https://book.gxd88.cn/kubernetes-monitor/pages/-M8_4BmBa3vqSYrGBHNs#2.1.2%20keep用法)
* [drop](https://book.gxd88.cn/kubernetes-monitor/pages/-M8_4BmBa3vqSYrGBHNs#2.1.3%20drop用法)
* [hashmod](https://book.gxd88.cn/kubernetes-monitor/pages/-M8_4BmBa3vqSYrGBHNs#2.1.5%20hashmod用法)
* [labelmap](https://book.gxd88.cn/kubernetes-monitor/pages/-M8_4BmBa3vqSYrGBHNs#2.1.4%20labelmap用法)
* [labeldrop](https://book.gxd88.cn/kubernetes-monitor/pages/-M8_4BmBa3vqSYrGBHNs#2.1.6%20labeldrop用法)
* [labelkeep](https://book.gxd88.cn/kubernetes-monitor/pages/-M8_4BmBa3vqSYrGBHNs#2.1.7%20labelkeep用法)

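What each action does:

* replace: the default. Matches regex against the concatenated source\_labels and writes replacement, which may reference the capture groups, to target\_label.
* keep: drops targets for which regex does not match the concatenated source\_labels.
* drop: drops targets for which regex matches the concatenated source\_labels.
* hashmod: sets target\_label to the modulus of a hash of the concatenated source\_labels.
* labelmap: matches regex against all label names, then copies the values of the matching labels to the label names given by replacement, with capture-group references (${1}, ${2}, ...) substituted.
* labeldrop: removes the labels whose names match regex.
* labelkeep: removes the labels whose names do not match regex.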

*Labels in Prometheus are key:value pairs. replace, keep, and drop operate on the value, while labelmap, labeldrop, and labelkeep operate on the key (the label name).*

#### 2.1.1 Using replace

replace is the default action. It matches regex against the value of source\_labels and uses replacement to reference the groups captured by the expression.

```yaml
  - action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    source_labels:
    - __address__
    - __meta_kubernetes_service_annotation_prometheus_io_port
    target_label: __address__
```

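In the example above, `__address__` is set to `$1:$2`. `$1` is the host captured by `([^:]+)` from `__address__` (an existing port, if any, is matched by `(?::\d+)?` and discarded), and `$2` is the port captured by `(\d+)` from `__meta_kubernetes_service_annotation_prometheus_io_port`. The final value of `__address__` is something like 192.168.1.1:9100.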
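The substitution can be traced step by step. The sketch below repeats the rule with the intermediate values written as comments; the input values (a discovered address of `192.168.1.1:8080` and an annotation value of `9100`) are hypothetical and chosen only for illustration.

```yaml
# Hypothetical target labels before relabeling:
#   __address__                                              = "192.168.1.1:8080"
#   __meta_kubernetes_service_annotation_prometheus_io_port  = "9100"
#
# 1. The source label values are joined with the separator ";":
#      "192.168.1.1:8080;9100"
# 2. The regex is matched against the whole joined string:
#      $1 = "192.168.1.1"   (the old port ":8080" is matched but not captured)
#      $2 = "9100"
# 3. The replacement "$1:$2" is written to the target label:
#      __address__ = "192.168.1.1:9100"
relabel_configs:
  - action: replace
    source_labels:
      - __address__
      - __meta_kubernetes_service_annotation_prometheus_io_port
    separator: ";"
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
```

If the annotation is missing, the joined string ends with an empty port, the regex does not match, and `__address__` is left unchanged.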

#### 2.1.2 Using keep

```yaml
relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
    action: keep
    regex: true
```

In the example above, a target is kept only if \_\_meta\_kubernetes\_service\_annotation\_prometheus\_io\_probe equals true. Conversely, a target whose source\_labels value does not match regex is dropped.

#### 2.1.3 Using drop

drop is the exact opposite of keep. Taking the keep example and changing the action to drop:

```yaml
relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
    action: drop
    regex: true
```

In the example above, a target is dropped if its \_\_meta\_kubernetes\_service\_annotation\_prometheus\_io\_probe label equals true. Conversely, targets where \_\_meta\_kubernetes\_service\_annotation\_prometheus\_io\_probe is not true are kept.

#### 2.1.4 Using labelmap

labelmap differs from replace, keep, and drop described above: *labelmap matches label names, whereas replace, keep, and drop match label values.*

```yaml
relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
```

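In the example above, every label whose name matches the regular expression `__meta_kubernetes_service_label_(.+)` is copied to a new label named after the part captured by `(.+)`. The effect is:

```
Original label:    __meta_kubernetes_service_label_test=111
After relabeling:  test=111
```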
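labelmap also honors replacement, which is used as the template for the new label name. The sketch below is an illustrative variation, not part of the original configuration: the `svc_` prefix is a hypothetical choice that keeps Service labels from colliding with other target labels.

```yaml
relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
    # ${1} is the captured suffix of the label name; "svc_" is a hypothetical prefix.
    replacement: svc_${1}
# Original label:    __meta_kubernetes_service_label_test=111
# After relabeling:  svc_test=111
```

Writing the reference as `${1}` rather than `$1` avoids ambiguity when other characters follow the group reference.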

#### 2.1.5 Using hashmod

To be continued.
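A common use of hashmod is splitting the targets of one job across several Prometheus servers. The following is a minimal sketch of that pattern, assuming three servers; the modulus `3`, the temporary label name `__tmp_hash`, and the bucket number `0` are illustrative choices rather than values from the original text.

```yaml
relabel_configs:
  # Hash the target address and store (hash mod 3), i.e. 0, 1, or 2,
  # in a temporary label. Labels prefixed with __tmp are reserved for
  # this kind of intermediate value.
  - source_labels: [__address__]
    action: hashmod
    modulus: 3
    target_label: __tmp_hash
  # This server keeps only the targets that fall into bucket 0.
  # The other two servers would use regex "1" and "2".
  - source_labels: [__tmp_hash]
    action: keep
    regex: "0"
```

Because the hash depends only on the source label values, each target lands in the same bucket on every server, so each target ends up being scraped by exactly one of them.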

#### 2.1.6 Using labeldrop

labeldrop filters the labels of a target and removes the labels that match the filter, for example:

```yaml
relabel_configs:
  - action: labeldrop
    regex: __meta_kubernetes_service_label_(.+)
```

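This configuration matches regex against the names of all labels of the current target, removes the labels that match, and keeps the ones that do not.

#### 2.1.7 Using labelkeep

labelkeep filters the labels of a target and keeps only the labels that match the filter, for example:

```yaml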
relabel_configs:
  - action: labelkeep
    regex: __meta_kubernetes_service_label_(.+)
```

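This configuration matches regex against the names of all labels of the current target, keeps the labels that match, and removes the rest. Because this applies to every label, including internal ones such as `__address__`, a regex used in practice usually has to match those labels as well.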

### 2.2 metric\_relabel\_configs

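As described above, relabel\_configs rewrites labels before the metrics are scraped. metric\_relabel\_configs, in contrast, operates on labels after the metrics have been scraped: it determines which metrics are stored, which are dropped, and what the stored metrics look like.

The configuration of metric\_relabel\_configs is essentially the same as that of relabel\_configs. For the available parameters, see [2.scrape\_configs](https://book.gxd88.cn/kubernetes-monitor/pages/-M8_4BmBa3vqSYrGBHNs#2.scrape_configs)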
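A typical use is discarding series that are not needed before they reach storage. The sketch below assumes a node\_exporter target and drops every series whose metric name starts with `go_`; the `go_.*` pattern is only an illustrative choice of what to discard.

```yaml
scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - 10.40.58.153:9100
    metric_relabel_configs:
      # __name__ holds the metric name of each scraped sample.
      # Samples whose name matches the regex are dropped before ingestion.
      - source_labels: [__name__]
        action: drop
        regex: go_.*
```

Unlike relabel\_configs, these rules run on every scraped sample, so an expensive or very long rule list adds work to each scrape.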

### 2.2 static\_configs

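static\_configs is mainly used to specify the targets, typically exporters, from which metrics are fetched. Targets can be Prometheus itself, MySQL, nginx, and so on.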

```yaml
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
        - localhost:9090
```

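This rule scrapes Prometheus's own metrics. The targets list holds the address and port from which Prometheus fetches metrics; because metrics\_path is not set, the default /metrics is used.

*Put simply, Prometheus requests* <http://localhost:9090/metrics> *to collect the monitoring data.*

static\_configs can also point at the address of an exporter, for example to collect data from node\_exporter: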

```yaml
scrape_configs:
- job_name: node
  static_configs:
  - targets:
    - 10.40.58.153:9100
    - 10.40.61.116:9100
    - 10.40.58.154:9100
```

*Put simply, Prometheus requests* <http://10.40.58.153:9100/metrics>, <http://10.40.61.116:9100/metrics>, *and* <http://10.40.58.154:9100/metrics> *to collect the metrics.*
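Each entry in static\_configs can also attach extra labels to all of its targets. The following sketch groups the same node\_exporter addresses into two entries; the `env` label and its values are hypothetical and serve only to show the `labels` field.

```yaml
scrape_configs:
  - job_name: node
    static_configs:
      # Every series scraped from these two targets gets env="prod".
      - targets:
          - 10.40.58.153:9100
          - 10.40.58.154:9100
        labels:
          env: prod        # hypothetical label
      # Series from this target get env="staging".
      - targets:
          - 10.40.61.116:9100
        labels:
          env: staging     # hypothetical label
```

The added label then appears next to `job` and `instance` on every scraped series, which makes it easy to filter by group in queries.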

### 2.3 kubernetes\_sd\_configs

Kubernetes service discovery can discover targets for the following roles:

* node
* service
* pod
* endpoints
* ingress

When the role of a kubernetes\_sd\_config is set to endpoints, Prometheus automatically discovers all endpoints in Kubernetes and uses them as the targets of the current job, as shown below:

```yaml
kubernetes_sd_configs:
- role: endpoints
```
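Discovery can also be narrowed to specific namespaces. The sketch below uses the pod role and restricts discovery to a single namespace; the namespace name `devops` is an assumption chosen for illustration.

```yaml
kubernetes_sd_configs:
  - role: pod
    # Without this block, targets are discovered in all namespaces.
    namespaces:
      names:
        - devops   # assumed namespace name
```

When `api_server` is omitted, as it is here, Prometheus assumes it runs inside the cluster and uses the service account credentials mounted into its pod.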

#### Configuration example 1

*This configuration uses Kubernetes service discovery to find the kube-apiservers.*

```yaml
scrape_configs:
- bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  job_name: kubernetes-apiservers 
  kubernetes_sd_configs:   
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: default;kubernetes;https
    source_labels:
    - __meta_kubernetes_namespace
    - __meta_kubernetes_service_name
    - __meta_kubernetes_endpoint_port_name
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
```

The scrape configuration above defines the following:

* The job name is kubernetes-apiservers (job\_name: kubernetes-apiservers)
* Endpoints are discovered from Kubernetes (role: endpoints)
* Metrics are fetched over HTTPS (scheme: https)
* A target is kept only if it is in the default namespace, belongs to the Service named kubernetes, and its port is named https
  * \_\_meta\_kubernetes\_namespace =~ default
  * \_\_meta\_kubernetes\_service\_name =~ kubernetes
  * \_\_meta\_kubernetes\_endpoint\_port\_name =~ https

#### Configuration example 2

*This configuration automatically discovers the endpoints in Kubernetes.*

```yaml
- job_name: 'kubernetes-service-endpoints'

  kubernetes_sd_configs:
    - role: endpoints

  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true

    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2

    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)

    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: kubernetes_name
    - source_labels: [__meta_kubernetes_pod_node_name]
      action: replace
      target_label: kubernetes_node

As you can see, relabel\_configs contains many rules. In detail:

* The job name is kubernetes-service-endpoints (job\_name: kubernetes-service-endpoints)
* Endpoints are discovered from Kubernetes (role: endpoints)
* Metrics are fetched over HTTP (no scheme is configured, so the default http applies)
* The relabel rules:
  * The Service annotations must contain `prometheus.io/scrape: "true"` for the target to be discovered by Prometheus
  * `__scheme__` is set to the value of \_\_meta\_kubernetes\_service\_annotation\_prometheus\_io\_scheme, which must match the regular expression `(https?)`
  * `__metrics_path__` is set to the value of \_\_meta\_kubernetes\_service\_annotation\_prometheus\_io\_path, which must match the regular expression `(.+)`
  * The value of `__address__` is rewritten to the form IP:port
  * The labels of the Service are copied to the target by the labelmap rule
  * kubernetes\_namespace is set to the value of \_\_meta\_kubernetes\_namespace
  * kubernetes\_name is set to the value of \_\_meta\_kubernetes\_service\_name
  * kubernetes\_node is set to the value of \_\_meta\_kubernetes\_pod\_node\_name

A resulting metric looks like this:

```
up{app="prometheus",app_kubernetes_io_managed_by="Helm",chart="prometheus-11.3.0",component="node-exporter",heritage="Helm",instance="10.40.61.116:9100",job="kubernetes-service-endpoints",kubernetes_name="prometheus-node-exporter",kubernetes_namespace="devops",kubernetes_node="py-modelo2o08cn-p005.pek3.example.com",release="prometheus"}
```
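For reference, a Service that this job would pick up might look roughly like the manifest below. It is a sketch reconstructed from the labels in the sample metric, not the manifest actually deployed: the selector, the port name, and the exact set of labels are assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus-node-exporter
  namespace: devops
  labels:
    app: prometheus              # copied to the target by the labelmap rule
    component: node-exporter
  annotations:
    # Becomes __meta_kubernetes_service_annotation_prometheus_io_scrape
    prometheus.io/scrape: "true"
    # Becomes __meta_kubernetes_service_annotation_prometheus_io_port
    prometheus.io/port: "9100"
spec:
  selector:                      # assumed selector
    app: prometheus
    component: node-exporter
  ports:
    - name: metrics              # assumed port name
      port: 9100
      targetPort: 9100
```

Characters in annotation names that are not valid in label names, such as `.` and `/`, are replaced with underscores, which is why `prometheus.io/scrape` shows up as `prometheus_io_scrape` in the relabel rules.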

