====== Prometheus Service ======

  * [[https://habr.com/ru/company/selectel/blog/275803/|Monitoring services with Prometheus (in Russian)]]
  * [[https://habr.com/ru/company/southbridge/blog/455290/|The complete guide to Prometheus in 2019 (in Russian)]]
  * [[https://www.shellhacks.com/ru/prometheus-delete-time-series-metrics/|Prometheus: Deleting Time Series Metrics (in Russian)]]
  * [[https://habr.com/ru/companies/tochka/articles/683608/|Metrics explained in plain language (in Russian)]]
  * [[https://selectel.ru/blog/tutorials/monitoring-in-k8s-with-prometheus/|Monitoring in K8s with Prometheus (in Russian)]]

===== Installation on Debian/Ubuntu =====

<code>
# apt install prometheus

# less /etc/prometheus/prometheus.yml
</code><code>
...
global:
  scrape_interval:     15s
...
</code>

==== Checking the configuration and restarting ====

<code>
# promtool check config /etc/prometheus/prometheus.yml

# service prometheus restart
</code>
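
The two commands above can be chained so that the service is restarted only when the check passes. The following is a minimal Python sketch, assuming it runs as root on the Prometheus host. The name `check_and_restart` and the injectable `run` parameter are illustrative and not part of Prometheus.

```python
import subprocess

CONFIG = "/etc/prometheus/prometheus.yml"  # path used on this page


def check_and_restart(config=CONFIG, run=subprocess.run):
    """Restart prometheus only if promtool accepts the configuration.

    `run` is injectable so the logic can be exercised without touching
    the real service (hypothetical helper, not a Prometheus API).
    """
    check = run(["promtool", "check", "config", config])
    if check.returncode != 0:
        # promtool prints the reason itself; leave the service untouched
        return False
    restart = run(["service", "prometheus", "restart"])
    return restart.returncode == 0
```

Calling `check_and_restart()` with no arguments runs the real commands; passing a stub as `run` lets you try the logic on a machine without Prometheus installed.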

==== Connecting ====

!!! The links below use DNS names

  * [[http://192.168.X.10:9090/]]
  * [[http://server.corpX.un:9090/]]
  * [[http://server:9090/classic/targets]]

==== Data sources ====

  * [[#Exporters]]
===== prometheus-alertmanager =====

  * [[https://samber.github.io/awesome-prometheus-alerts/rules|Awesome Prometheus alerts]]
  * [[https://github.com/samber/awesome-prometheus-alerts/tree/master/dist/rules]]

  * [[Сервис MTA#Установка и настройка MTA на обработку почты домена hostname|MTA service: installing and configuring an MTA to handle mail for the hostname domain]]

<code>
# apt install prometheus-alertmanager

# cat /etc/prometheus/alertmanager.yml
</code><code>
global:
  smtp_smarthost: 'server.corpX.un:25'
  smtp_from: 'alertmanager@corpX.un'
  smtp_require_tls: false
templates:
- '/etc/prometheus/alertmanager_templates/*.tmpl'

route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h
  receiver: team-X-mails

receivers:
- name: 'team-X-mails'
  email_configs:
  - to: 'student@corpX.un'
    send_resolved: true
</code><code>
# service prometheus-alertmanager restart

# cat /etc/prometheus/first_rules.yml
</code><code>
groups:
- name: alert.rules
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: "critical"
    annotations:
      summary: "Endpoint {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

  - alert: EndpointDown
    expr: probe_success == 0
    for: 1m
    labels:
      severity: "critical"
    annotations:
      summary: "Endpoint {{ $labels.instance }} down"

  - alert: CriticalTraffic
    expr: rate(ifInOctets{instance="router"}[1m])>125000
    for: 1m
    labels:
      severity: "critical"
    annotations:
      summary: "CriticalTraffic {{ $labels.instance }}"
</code><code>
# cat /etc/prometheus/prometheus.yml
</code><code>
...
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093']

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "first_rules.yml"
  # - "second_rules.yml"

...
</code>

  * [[#Checking the configuration and restarting]]

<code>
...
Checking /etc/prometheus/first_rules.yml
  SUCCESS: N rules found
...
</code>

  * [[http://server.corpX.un:9090/classic/alerts]]
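
The `for: 1m` clause in the rules above means that an alert does not fire on the first failed evaluation: it stays pending until the expression has been true continuously for that long. The following Python sketch is a simplified model of that state machine. The function `alert_states` and the sample data are hypothetical; real Prometheus tracks this per label set at every `evaluation_interval`.

```python
def alert_states(samples, for_seconds=60):
    """Return the alert state after each evaluation.

    samples: list of (timestamp_seconds, condition_is_true) pairs,
             one per rule evaluation, in time order.
    """
    states = []
    active_since = None  # time when the condition first became true
    for ts, is_true in samples:
        if not is_true:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        if ts - active_since >= for_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states


# `up == 0` evaluated every 15 s: true from t=15 on, except a recovery at t=45
evals = [(0, False), (15, True), (30, True), (45, False),
         (60, True), (75, True), (90, True), (105, True), (120, True)]
for (ts, _), state in zip(evals, alert_states(evals)):
    print(ts, state)
```

In this run the short recovery at t=45 resets the timer, so the alert only reaches "firing" at t=120, a full minute after the condition became true again at t=60.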
===== Exporters =====

==== prometheus-node-exporter ====

  * On Debian/Ubuntu it is installed as a dependency of the prometheus package and is already added to the configuration
  * [[http://server.corpX.un:9100/metrics]]

=== Metric examples ===

== node_filesystem_free_bytes ==
<code>
$ df /
...
/dev/mapper/debian--vg-root  15662008 1877488  12969212  13% /
...
# TYPE node_filesystem_free_bytes gauge
node_filesystem_free_bytes{device="/dev/mapper/debian--vg-root",fstype="ext4",mountpoint="/"} = (15662008 - 1877488) * 1024 
</code>
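
The arithmetic above can be checked in a few lines. This is a sketch assuming `df` reports 1 KiB blocks (its default on Linux); `free_bytes_from_df` is an illustrative name. Note that the "Available" column of `df` excludes the blocks reserved for root and corresponds to a different metric, `node_filesystem_avail_bytes`.

```python
import os

KIB = 1024  # df prints sizes in 1 KiB blocks by default


def free_bytes_from_df(total_kib, used_kib):
    """Free space in bytes as computed from the df columns (total - used)."""
    return (total_kib - used_kib) * KIB


# Numbers from the df output above
print(free_bytes_from_df(15662008, 1877488))

# The same quantities read directly from the kernel for "/"
st = os.statvfs("/")
print("free :", st.f_bfree * st.f_frsize)   # includes root-reserved blocks
print("avail:", st.f_bavail * st.f_frsize)  # what unprivileged users can use
```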

== node_network_receive_bytes_total ==
<code>
$ cat /sys/class/net/eth0/statistics/rx_bytes
  or
$ cat /sys/class/net/bond0/statistics/rx_bytes

# TYPE node_network_receive_bytes_total counter
node_network_receive_bytes_total{device="bond0"}
</code>

=== Connecting to Prometheus ===

<code>
# less /etc/prometheus/prometheus.yml
...
  - job_name: node
    # If prometheus-node-exporter is installed, grab stats about the local
    # machine by default.
    static_configs:
      - targets: ['localhost:9100']
</code>

=== PromQL queries ===
<code>
8*rate(node_network_receive_bytes_total[1m])

8*rate(node_network_receive_bytes_total{device="bond0"}[1m])

8*rate(node_network_receive_bytes_total{device="eth0",instance="localhost:9100",job="node"}[1m])
</code>
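
To see what these queries compute, here is a simplified Python model of `rate()` applied to a counter. It is only a sketch: the real function also extrapolates to the edges of the window, and the sample values are invented for illustration.

```python
def simple_rate(samples):
    """Approximate per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value) in time order.
    A drop in value is treated as a counter reset (e.g. a reboot),
    so counting restarts from zero at that sample.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples in the window
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            increase += value          # counter restarted from 0
        else:
            increase += value - prev
        prev = value
    duration = samples[-1][0] - samples[0][0]
    return increase / duration


# rx_bytes scraped every 15 s over a one-minute window (made-up numbers)
window = [(0, 1_000_000), (15, 1_187_500), (30, 1_375_000),
          (45, 1_562_500), (60, 1_750_000)]
bytes_per_second = simple_rate(window)
print(bytes_per_second)       # bytes/s
print(8 * bytes_per_second)   # bits/s, as in the queries above
```

The factor of 8 in the queries converts the byte counter into bits per second, the unit in which link speeds are usually quoted.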


==== prometheus-blackbox-exporter ====

  * [[https://geekflare.com/monitor-website-with-blackbox-prometheus-grafana/|How to Monitor Website Performance with Blackbox Exporter and Grafana?]]
  * [[https://habr.com/ru/company/otus/blog/500448/|Prometheus: HTTP monitoring with the Blackbox exporter (in Russian)]]

<code>
# apt install prometheus-blackbox-exporter
</code>

=== Configuration example ===
<code>
# cat /etc/prometheus/blackbox.yml
</code><code>
...
  http_2xx:
    prober: http
    http:
      preferred_ip_protocol: "ip4"
...
</code><code>
# service prometheus-blackbox-exporter restart

# cat /etc/prometheus/prometheus.yml
</code><code>
...
  - job_name: check_http
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://val.bmstu.ru
        - https://ya.ru
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115

  - job_name: check_ssh
    metrics_path: /probe
    params:
      module: [ssh_banner]
    static_configs:
      - targets:
        - switch1:22
        - switch2:22
        - switch3:22
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
</code>

  * [[#Checking the configuration and restarting]]
  * [[http://server.corpX.un:9115/]] Blackbox Exporter->Logs and [[http://server.corpX.un:9090/graph]]

<code>
probe_success...

probe_duration_seconds...

probe_http_duration_seconds...
</code>
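
The three `relabel_configs` steps in the jobs above are easier to follow when applied by hand. The Python sketch below imitates them for a single target. It is a simplified model (only plain copy and replace are handled), and `apply_blackbox_relabel` and `probe_url` are illustrative names.

```python
from urllib.parse import urlencode


def apply_blackbox_relabel(target, exporter="localhost:9115"):
    """Imitate the three relabel steps used for blackbox jobs."""
    labels = {"__address__": target}
    # 1. copy the original target into the ?target= URL parameter
    labels["__param_target"] = labels["__address__"]
    # 2. keep the original target visible as the `instance` label
    labels["instance"] = labels["__param_target"]
    # 3. scrape the exporter itself instead of the target
    labels["__address__"] = exporter
    return labels


def probe_url(labels, module, metrics_path="/probe"):
    """URL that ends up being requested for this target."""
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return "http://{}{}?{}".format(labels["__address__"], metrics_path, query)


labels = apply_blackbox_relabel("https://ya.ru")
print(labels["instance"])
print(probe_url(labels, "http_2xx"))
```

The result is that Prometheus talks only to the exporter on port 9115, while graphs and alerts still show the probed site in the `instance` label.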
=== Example: file-based service discovery and the ping service ===

  * [[https://www.robustperception.io/icmp-pings-with-the-blackbox-exporter|ICMP Pings with the Blackbox exporter]], [[https://github.com/prometheus/blackbox_exporter?tab=readme-ov-file#permissions|github blackbox_exporter permissions]], [[POSIX capabilities]] or [[Управление ядром и модулями в Linux#Переменные ядра|Managing the Linux kernel and modules: kernel variables]]

<code>
# cat /etc/prometheus/prometheus.yml
</code><code>
...
  - job_name: check_ping
    metrics_path: /probe
    params:
      module: [icmp]
    file_sd_configs:
      - files:
        # uncomment one of the two files below
#        - switchs.yml
#        - switchs.json
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
</code><code>
# cat /etc/prometheus/switchs.json
</code><code>
[
  {
    "targets": [ "switch1", "switch2", "switch3" ]
  }
]
</code><code>
# cat /etc/prometheus/switchs.yml
</code><code>
- targets:
  - switch1
  - switch2
  - switch3
</code>

  * [[#Checking the configuration and restarting]]
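
Because Prometheus re-reads `file_sd_configs` files when they change, the target list can be produced by a script. The following is a minimal Python sketch, assuming the host names come from an inventory of your own. `render_file_sd` is a hypothetical helper, and the optional `labels` argument is shown only for illustration.

```python
import json


def render_file_sd(hosts, labels=None):
    """Build the JSON document expected by file_sd_configs."""
    group = {"targets": list(hosts)}
    if labels:
        group["labels"] = dict(labels)
    return json.dumps([group], indent=2)


text = render_file_sd(["switch1", "switch2", "switch3"])
print(text)

# To publish it, write the text to the file named in prometheus.yml, e.g.:
# with open("/etc/prometheus/switchs.json", "w") as f:
#     f.write(text + "\n")
```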
==== prometheus-snmp-exporter ====

  * [[https://blogspot.sysadm.kz/2018/07/grafana-prometheus-cisco-snmp.html|Grafana + Prometheus: monitoring Cisco traffic over SNMP (in Russian)]]
  * [[https://medium.com/@openmohan/snmp-monitoring-and-easing-it-with-prometheus-b157c0a42c0c|SNMP monitoring and easing it with Prometheus]]
  * [[https://grafana.com/blog/2022/02/01/an-advanced-guide-to-network-monitoring-with-grafana-and-prometheus/|An advanced guide to network monitoring with Grafana and Prometheus]]

<code>
# apt install prometheus-snmp-exporter
</code>

=== Creating the configuration file "manually" ===

<code>
# cat /etc/prometheus/snmp.yml
</code><code>
auths:
  public_v2:
    community: public
    version: 2
modules:
  if_mib:
    walk:
    - 1.3.6.1.2.1.2.2.1.10
    - 1.3.6.1.2.1.2.2.1.16
    - 1.3.6.1.2.1.2.2.1.2
    metrics:
    - name: ifInOctets
      oid: 1.3.6.1.2.1.2.2.1.10
      type: counter
      indexes:
      - labelname: ifIndex
        type: Integer
      lookups:
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
    - name: ifOutOctets
      oid: 1.3.6.1.2.1.2.2.1.16
      type: counter
      indexes:
      - labelname: ifIndex
        type: Integer
      lookups:
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
</code>
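
The `indexes` and `lookups` entries above tell the exporter how to turn SNMP table rows into labelled metrics: the last OID component becomes `ifIndex`, and the value found at the `ifDescr` OID with the same index becomes the `ifDescr` label. Below is a rough Python model of that mapping. The walk results and the dictionary input format are invented for illustration, and `to_metrics` is a hypothetical name.

```python
IF_DESCR = "1.3.6.1.2.1.2.2.1.2"
IF_IN_OCTETS = "1.3.6.1.2.1.2.2.1.10"


def to_metrics(walk, name, oid, lookup_oid):
    """Render one table column as Prometheus-style sample lines.

    walk: dict mapping full OID strings to values (made-up input format).
    """
    lines = []
    prefix = oid + "."
    for full_oid, value in sorted(walk.items()):
        if not full_oid.startswith(prefix):
            continue
        index = full_oid[len(prefix):]                  # table index -> ifIndex
        descr = walk.get(lookup_oid + "." + index, "")  # lookup -> ifDescr
        lines.append('{}{{ifDescr="{}",ifIndex="{}"}} {}'.format(
            name, descr, index, value))
    return lines


walk = {
    IF_DESCR + ".1": "FastEthernet0/0",
    IF_DESCR + ".2": "FastEthernet0/1",
    IF_IN_OCTETS + ".1": 123456,
    IF_IN_OCTETS + ".2": 7890,
}
for line in to_metrics(walk, "ifInOctets", IF_IN_OCTETS, IF_DESCR):
    print(line)
```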

=== Creating the configuration file with the "generator" ===

<code>
# cp /usr/share/doc/prometheus-snmp-exporter/examples/generator.yml .
  you may need to remove all modules except if_mib

# prometheus-snmp-generator generate

# cp snmp.yml /etc/prometheus/snmp.yml
</code>


=== Checking the configuration and restarting prometheus-snmp-exporter ===
<code>
# prometheus-snmp-exporter --dry-run

# service prometheus-snmp-exporter restart
</code>

=== Usage examples ===

  * [[http://server.corpX.un:9116/]]

<code>
# curl 'http://127.0.0.1:9116/snmp?target=router'
</code><code>
# cat /etc/prometheus/prometheus.yml
</code><code>
...
  - job_name: 'snmp'
    static_configs:
      - targets:
        - router
    metrics_path: /snmp
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9116
</code>

  * [[#Checking the configuration and restarting]]

<code>
rate(ifInOctets{ifDescr="FastEthernet0/0",ifIndex="1",instance="router",job="snmp"}[1m])
  or
rate(ifOutOctets{ifIndex="5",instance="router",job="snmp"}[1m])

8*rate(ifInOctets{ifDescr="FastEthernet0/0",instance="router"}[1m])
  or
8*rate(ifOutOctets{ifDescr="Port-channel1",instance="router"}[1m])
</code>

===== prometheus-pushgateway =====

==== Installation and configuration ====
<code>
# apt install prometheus-pushgateway

# cat /etc/prometheus/prometheus.yml
</code><code>
...
  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
    - targets: ['localhost:9091']
</code>

  * [[#Checking the configuration and restarting]]
==== Example of using prometheus pushgateway from bash ====

  * [[https://vinayakpandey-7997.medium.com/pushing-bash-script-result-to-prometheus-using-pushgateway-a0760cd261e|Scrape data using Bash script and push it to Prometheus using PushGateway]]

<code>
# cat ip_dhcp_binding.sh
</code><code>
#!/bin/sh

unset http_proxy
DHCP_SERVER=router
NET=192.168

# count the binding lines that mention the ${NET} network
COUNT=$(rsh "${DHCP_SERVER}" show ip dhcp binding | grep -cF "${NET}")

cat << EOF | curl --data-binary @- http://127.0.0.1:9091/metrics/job/cisco_dhcp/dhcp_server/${DHCP_SERVER}/net/${NET}
  ip_dhcp_binding ${COUNT}
EOF

</code><code>
ip_dhcp_binding{dhcp_server="router",job="cisco_dhcp",net="192.168"}
</code><code>
# crontab -l
</code><code>
* * * * * /root/ip_dhcp_binding.sh
</code>
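
For comparison, the same push can be written in Python using only the standard library. This is a sketch under assumptions: the binding count is passed in rather than fetched with `rsh`, and `push_url`, `push_body` and `push` are illustrative names. The URL layout (`/metrics/job/<job>/<label>/<value>`) mirrors the shell script above.

```python
import urllib.request

PUSHGATEWAY = "http://127.0.0.1:9091"  # same address as in the script above


def push_url(job, grouping, base=PUSHGATEWAY):
    """Build /metrics/job/<job>/<label>/<value>/... for the given grouping."""
    parts = [base, "metrics", "job", job]
    for label, value in grouping:
        parts += [label, value]
    return "/".join(parts)


def push_body(metric, value):
    """Text exposition format: one sample per line, newline-terminated."""
    return "{} {}\n".format(metric, value).encode()


def push(job, grouping, metric, value):
    """POST the sample to the Pushgateway (needs a running Pushgateway)."""
    req = urllib.request.Request(push_url(job, grouping),
                                 data=push_body(metric, value), method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status


url = push_url("cisco_dhcp", [("dhcp_server", "router"), ("net", "192.168")])
print(url)
print(push_body("ip_dhcp_binding", 42))
# push("cisco_dhcp", [("dhcp_server", "router"), ("net", "192.168")],
#      "ip_dhcp_binding", 42)
```

The grouping labels in the URL path become the `dhcp_server` and `net` labels on the stored series, which is why the query shown earlier can select by them.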