Prometheus monitoring: black-box probing with blackbox_exporter [ICMP, TCP, HTTP (GET/POST), DNS, SSL certificate expiry]

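Use cases

HTTP probes
  Define request header fields
  Check the HTTP status, HTTP response headers, and HTTP body content
TCP probes
  Monitor the port status of business components
  Define and monitor application-layer protocols
ICMP probes
  Host liveness checks
POST probes
  API connectivity
SSL certificate expiry time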

1 Install blackbox_exporter

 cd /usr/local/
 wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.21.1/blackbox_exporter-0.21.1.linux-amd64.tar.gz
 tar -xf blackbox_exporter-0.21.1.linux-amd64.tar.gz 
 mv blackbox_exporter-0.21.1.linux-amd64 blackbox_exporter
 cd blackbox_exporter/

2 Configure blackbox.yml

modules:
  http_2xx:  # HTTP probe module. Every probe in blackbox_exporter is configured as a module.
    prober: http
    timeout: 30s
    http:
      # Go reports HTTP/2 responses as "HTTP/2.0", so "HTTP/2" would never match.
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200,301,302,303]  # List the accepted status codes explicitly; this keeps Grafana graphs unambiguous.
      method: GET
      preferred_ip_protocol: "ip4"
  http_post_2xx:     # HTTP POST probe module
    prober: http
    timeout: 10s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      method: POST
      preferred_ip_protocol: "ip4"
  tcp_connect:   # TCP probe module
    prober: tcp
    timeout: 10s
  icmp:          # ICMP (ping) probe module
    prober: icmp
    timeout: 10s
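
The use cases at the top mention custom request headers and checks on the response headers and body, which the modules above do not exercise. A minimal sketch of two additional modules follows. The module names, header values, request payload, and regular expressions are illustrative assumptions, not part of the original setup. When merging into an existing blackbox.yml, place the entries under the existing modules key.

```yaml
modules:
  http_2xx_checked:              # hypothetical module name
    prober: http
    timeout: 10s
    http:
      method: GET
      preferred_ip_protocol: "ip4"
      headers:                   # request headers sent with every probe
        Accept: "application/json"
      fail_if_not_ssl: true      # fail if the final response was not served over TLS
      fail_if_header_not_matches:
        - header: Content-Type
          regexp: "application/json"
      fail_if_body_not_matches_regexp:
        - '"status"\s*:\s*"ok"'  # assumed response body format
  http_post_json:                # hypothetical module name
    prober: http
    timeout: 10s
    http:
      method: POST
      preferred_ip_protocol: "ip4"
      headers:
        Content-Type: "application/json"
      body: '{"ping": "pong"}'   # assumed request payload
```

When a regular expression check fails, the probe reports probe_failed_due_to_regex 1 and probe_success 0.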

3 Start blackbox_exporter

Create the systemd unit file:

 vim /usr/lib/systemd/system/blackbox_exporter.service

[Unit]
Description=blackbox_exporter v0.21.1 for sccin production environment.
ConditionFileIsExecutable=/usr/local/blackbox_exporter/blackbox_exporter
Requires=network-online.target
After=network-online.target

[Service]
Type=simple
User=root
Group=root
WorkingDirectory=/usr/local/blackbox_exporter/
ExecStart=/usr/local/blackbox_exporter/blackbox_exporter --config.file /usr/local/blackbox_exporter/blackbox.yml
PrivateTmp=true
StartLimitInterval=0
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target

 systemctl daemon-reload 
 systemctl restart blackbox_exporter
 systemctl status blackbox_exporter

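4 HTTP monitoring

Add the following job under scrape_configs in prometheus.yml:

  - job_name: 'domain-monitoring'  # uses the blackbox_exporter http_2xx module
    scrape_interval: 30s
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://devopstack.cn
    relabel_configs:
      # Pass the configured target to the exporter as the ?target= URL parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Scrape blackbox_exporter itself instead of the target.
      - target_label: __address__
        replacement: 127.0.0.1:9115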

The probe returns metrics such as the following:

    # DNS lookup time, in seconds
    probe_dns_lookup_time_seconds 0.000199105
    # Total probe duration from start to finish, in seconds (the response time for this page)
    probe_duration_seconds 0.010889113
    # HELP probe_failed_due_to_regex Indicates if probe failed due to regex
    # TYPE probe_failed_due_to_regex gauge
    probe_failed_due_to_regex 0
    # Length of the HTTP response content (-1 when the length is unknown)
    probe_http_content_length -1
    # Duration of each phase of the HTTP request
    probe_http_duration_seconds{phase="connect"} 0.001083728    # TCP connection time
    probe_http_duration_seconds{phase="processing"} 0.008365885 # time spent waiting for the server to process the request
    probe_http_duration_seconds{phase="resolve"} 0.000199105    # DNS resolution time
    probe_http_duration_seconds{phase="tls"} 0                  # TLS handshake time
    probe_http_duration_seconds{phase="transfer"} 0.000446424   # response transfer time
    # Number of redirects
    probe_http_redirects 0
    # Indicates whether SSL was used for the final redirect
    probe_http_ssl 0
    # Returned status code
    probe_http_status_code 200
    # Length of the uncompressed response body
    probe_http_uncompressed_body_length 1766
    # HTTP protocol version
    probe_http_version 1.1
    # HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
    probe_ip_addr_hash 3.24030434e+09
    # IP protocol version used
    probe_ip_protocol 4
    # Whether the probe succeeded
    probe_success 1

5 TCP monitoring

  - job_name: "port-liveness-check"
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
    - targets:
      - 192.168.33.11:80
      - 192.168.33.11:10050
    relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9115
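
The tcp_connect module only verifies that a port accepts connections. For the application-layer protocol checks and certificate monitoring listed in the use cases, the tcp prober can also speak TLS or expect a banner. The following is a sketch for blackbox.yml; the module names are assumptions, and the SSH banner check only makes sense against a port that actually serves SSH.

```yaml
modules:
  tcp_tls:                     # hypothetical module name
    prober: tcp
    timeout: 10s
    tcp:
      tls: true                # perform a TLS handshake after connecting
      preferred_ip_protocol: "ip4"
  ssh_banner:                  # hypothetical module name
    prober: tcp
    timeout: 10s
    tcp:
      preferred_ip_protocol: "ip4"
      query_response:
        - expect: "^SSH-2.0-"  # the probe fails unless the server greeting matches
```

To use one of these, reference it in the job's params (for example module: [tcp_tls]) and keep the same relabel_configs as above. With tls enabled, the probe also reports probe_ssl_earliest_cert_expiry, so the certificate expiry rule applies to non-HTTP TLS ports as well.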

6 ICMP monitoring

  - job_name: "host-liveness"
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
    - targets:
      - 192.168.33.11
    relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9115
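
The title also lists DNS probing, which follows the same pattern. The sketch below is an assumption-laden example rather than part of the original setup: the module name, the job name, and the use of 192.168.33.11 as a DNS server are all hypothetical. Note that for the dns prober the scrape target is the DNS server to query, while the name to resolve is set in the module.

```yaml
# blackbox.yml (add under the existing modules key)
modules:
  dns_a_record:                    # hypothetical module name
    prober: dns
    timeout: 5s
    dns:
      query_name: "devopstack.cn"  # name to resolve
      query_type: "A"
      transport_protocol: "udp"
      preferred_ip_protocol: "ip4"
      valid_rcodes:
        - NOERROR
---
# prometheus.yml (add under scrape_configs)
- job_name: "dns-probe"            # hypothetical job name
  metrics_path: /probe
  params:
    module: [dns_a_record]
  static_configs:
    - targets:
      - 192.168.33.11:53           # assumed DNS server address
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9115
```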

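7 Grafana dashboards

Import the following dashboards by ID:
13659 HTTP status monitoring
9965 Combined SSL, TCP, and HTTP monitoring charts
13230 SSL certificate monitoring
Select the Prometheus data source when importing.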

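8 Alertmanager alerts

SSL certificate expiry warning (a Prometheus alerting rule; it fires when fewer than 30 days remain):
groups:
- name: check_ssl_status
  rules:
  - alert: "SSL certificate expiry warning"
    expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 30
    for: 1h
    labels:
      severity: warn
      status: very severe
    annotations:
      description: 'The certificate for domain {{ $labels.instance }} expires in {{ printf "%.1f" $value }} days; please renew it soon'
      summary: "SSL certificate expiry warning"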
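
The same rule file can also alert on failed probes using the probe_success metric shown in the HTTP section. This is a minimal sketch; the group name, alert name, 5m duration, and severity are assumptions to adjust for your environment.

```yaml
groups:
- name: check_probe_status         # hypothetical group name
  rules:
  - alert: "ProbeFailed"           # hypothetical alert name
    expr: probe_success == 0       # 0 means the last probe of this target failed
    for: 5m
    labels:
      severity: warn
    annotations:
      description: 'The probe of {{ $labels.instance }} (job {{ $labels.job }}) has been failing for 5 minutes'
      summary: "Probe failed"
```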