====== Prometheus Service ======
* [[https://habr.com/ru/company/selectel/blog/275803/|Monitoring services with Prometheus (in Russian)]]
* [[https://habr.com/ru/company/southbridge/blog/455290/|The complete guide to Prometheus in 2019 (in Russian)]]
* [[https://www.shellhacks.com/ru/prometheus-delete-time-series-metrics/|Prometheus: Deleting Time Series Metrics (in Russian)]]
* [[https://habr.com/ru/companies/tochka/articles/683608/|Metrics explained in plain language (in Russian)]]
* [[https://selectel.ru/blog/tutorials/monitoring-in-k8s-with-prometheus/|Monitoring in K8s with Prometheus (in Russian)]]
===== Installation on Debian/Ubuntu =====
# apt install prometheus
# less /etc/prometheus/prometheus.yml
...
global:
  scrape_interval: 15s
...
==== Checking the configuration and restarting ====
# promtool check config /etc/prometheus/prometheus.yml
# service prometheus restart
==== Connecting ====
!!! Some of the links use DNS names
* [[http://192.168.X.10:9090/]]
* [[http://server.corpX.un:9090/targets]]
* [[http://server:9090/classic/targets]]
==== Data sources ====
* [[#Exporters]]
===== prometheus-alertmanager =====
* [[https://medium.com/devops-dudes/prometheus-alerting-with-alertmanager-e1bbba8e6a8e|Prometheus Alerting with AlertManager]]
* [[https://alex.dzyoba.com/blog/prometheus-alerts/|Prometheus alerts examples]]
* [[https://www.digitalocean.com/community/tutorials/how-to-use-alertmanager-and-blackbox-exporter-to-monitor-your-web-server-on-ubuntu-16-04|How To Use Alertmanager And Blackbox Exporter To Monitor Your Web Server On Ubuntu 16.04]]
* [[https://awesome-prometheus-alerts.grep.to/|Awesome Prometheus alerts]]
* [[Сервис MTA#Установка и настройка MTA на обработку почты домена hostname|MTA Service: installing and configuring an MTA to handle mail for the hostname domain]]
# apt install prometheus-alertmanager
# cat /etc/prometheus/alertmanager.yml
...
global:
  smtp_smarthost: 'localhost:25'
  smtp_from: 'prometheus@server.corpX.un'
  smtp_require_tls: false
#  smtp_auth_username: 'alertmanager'
#  smtp_auth_password: 'password'
...
  # A default receiver
  receiver: team-X-mails
...
receivers:
- name: 'team-X-mails'
  email_configs:
  - to: 'student@corpX.un'
    send_resolved: true
...
# service prometheus-alertmanager restart
# cat /etc/prometheus/first_rules.yml
groups:
- name: alert.rules
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: "critical"
    annotations:
      summary: "Endpoint {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
  - alert: EndpointDown
    expr: probe_success == 0
    for: 1m
    labels:
      severity: "critical"
    annotations:
      summary: "Endpoint {{ $labels.instance }} down"
  - alert: CriticalTraffic
    # 125000 bytes/s = 1 Mbit/s
    expr: rate(ifInOctets{instance="router"}[1m])>125000
    for: 1m
    labels:
      severity: "critical"
    annotations:
      summary: "CriticalTraffic {{ $labels.instance }}"
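The "for: 1m" clause in the rules above keeps an alert in the pending state until the expression has been true on every evaluation for a full minute; only then does it fire. The Python sketch below is a simplified model of that behaviour, not Prometheus code. The function name alert_states and the 15-second evaluation timestamps are assumptions made for illustration.

```python
def alert_states(evaluations, hold_seconds):
    """Model the inactive -> pending -> firing transitions of one alert.

    evaluations: list of (timestamp_seconds, condition_is_true), oldest first.
    hold_seconds: the duration given in the rule's "for:" clause.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for timestamp, condition in evaluations:
        if not condition:
            # Any false evaluation resets the alert completely.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held_for = timestamp - active_since
        states.append("firing" if held_for >= hold_seconds else "pending")
    return states


# Hypothetical evaluations every 15 s; the target goes down at t=15.
evaluations = [(0, False), (15, True), (30, True), (45, True), (60, True), (75, True)]
states = alert_states(evaluations, hold_seconds=60)
```

With these sample values the alert is pending from t=15 to t=60 and fires at t=75, one minute after the condition first held.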
# cat /etc/prometheus/prometheus.yml
...
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093']
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "first_rules.yml"
  # - "second_rules.yml"
...
* [[#Checking the configuration and restarting]]
...
Checking /etc/prometheus/first_rules.yml
SUCCESS: N rules found
...
* [[http://192.168.X.10:9090/alerts]]
===== Exporters =====
==== prometheus-node-exporter ====
* On Debian/Ubuntu it is installed automatically together with the prometheus package and is already present in the configuration
* [[http://server.corpX.un:9100/metrics]]
=== Metric examples ===
== node_filesystem_free_bytes ==
$ df /
...
/dev/mapper/debian--vg-root 15662008 1877488 12969212 13% /
...
# TYPE node_filesystem_free_bytes gauge
node_filesystem_free_bytes{device="/dev/mapper/debian--vg-root",fstype="ext4",mountpoint="/"} = (15662008 - 1877488) * 1024
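The formula above can be checked by hand: df reports sizes in 1 KiB blocks, while the exporter reports bytes. A small Python sketch of the arithmetic, using the numbers from the df output above. The comparison with node_filesystem_avail_bytes is an addition for illustration: that metric corresponds to the "Available" column, which excludes blocks reserved for root.

```python
KIB = 1024  # df prints 1 KiB blocks by default

# Columns from the df output above: total, used, available (in KiB).
total_kib = 15662008
used_kib = 1877488
avail_kib = 12969212

# node_filesystem_free_bytes: all free blocks, including those reserved for root.
free_bytes = (total_kib - used_kib) * KIB

# node_filesystem_avail_bytes: blocks available to unprivileged users.
avail_bytes = avail_kib * KIB

# The difference is the space reserved for root on this filesystem.
reserved_bytes = free_bytes - avail_bytes
```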
== node_network_receive_bytes_total ==
$ cat /sys/class/net/eth0/statistics/rx_bytes
or
$ cat /sys/class/net/bond0/statistics/rx_bytes
# TYPE node_network_receive_bytes_total counter
node_network_receive_bytes_total{device="bond0"}
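Each line on the /metrics page has the form "name{labels} value". The Python sketch below parses one such line. It is a minimal illustration, not a full parser for the exposition format: it ignores optional timestamps and does not unescape label values. The function name parse_sample and the value 1.5e+09 are made up for the example.

```python
import re

# metric name, optional {label="value",...} block, then the sample value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (name, labels, value) for a sample line, or None for anything else."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        return None  # comments such as "# TYPE ..." and blank lines end up here
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


# Hypothetical sample line; the value is invented for illustration.
sample = parse_sample('node_network_receive_bytes_total{device="bond0"} 1.5e+09')
```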
=== Connecting to Prometheus ===
# less /etc/prometheus/prometheus.yml
...
  - job_name: node
    # If prometheus-node-exporter is installed, grab stats about the local
    # machine by default.
    static_configs:
      - targets: ['localhost:9100']
=== PromQL queries ===
8*rate(node_network_receive_bytes_total[1m])
8*rate(node_network_receive_bytes_total{device="bond0"}[1m])
8*rate(node_network_receive_bytes_total{device="eth0",instance="localhost:9100",job="node"}[1m])
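The queries above turn a byte counter into bits per second: rate() gives the per-second increase over the window, and multiplying by 8 converts bytes to bits. The Python sketch below shows the idea on raw samples. It is a simplification under stated assumptions: the real rate() also extrapolates to the window boundaries, which this version omits. The function name simple_rate and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Per-second increase of a counter over the given samples.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # The counter was reset (e.g. reboot): it restarted from zero.
            increase += value
        else:
            increase += value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical scrapes every 15 s within a 1-minute window.
window = [(0, 1000), (15, 2500), (30, 4000), (45, 5500), (60, 7000)]
bytes_per_second = simple_rate(window)
bits_per_second = 8 * bytes_per_second
```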
==== prometheus-blackbox-exporter ====
* [[https://geekflare.com/monitor-website-with-blackbox-prometheus-grafana/|How to Monitor Website Performance with Blackbox Exporter and Grafana?]]
* [[https://habr.com/ru/company/otus/blog/500448/|Prometheus: monitoring HTTP with the Blackbox exporter (in Russian)]]
# apt install prometheus-blackbox-exporter
=== Configuration example ===
# cat /etc/prometheus/blackbox.yml
...
  http_2xx:
    prober: http
    http:
      preferred_ip_protocol: "ip4"
...
# service prometheus-blackbox-exporter restart
# cat /etc/prometheus/prometheus.yml
...
  - job_name: check_http
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
        - https://val.bmstu.ru
        - https://ya.ru
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
  - job_name: check_ssh
    metrics_path: /probe
    params:
      module: [ssh_banner]
    static_configs:
      - targets:
        - switch1:22
        - switch2:22
        - switch3:22
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
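The three relabel_configs steps above are what redirect the scrape: the probed URL moves into the "target" query parameter, is copied to the instance label, and the scrape address itself is replaced with the exporter at localhost:9115. The Python sketch below imitates this for the default "replace" action. It is an illustrative model only: it supports just the "$1" reference and skips details such as dropping empty labels. The names apply_relabel and BLACKBOX_RULES are hypothetical.

```python
import re


def apply_relabel(labels, rules):
    """Apply 'replace' relabel rules in order and return the new label set."""
    result = dict(labels)
    for rule in rules:
        # Source label values are joined with ';' (the default separator).
        value = ";".join(result.get(name, "") for name in rule.get("source_labels", []))
        match = re.fullmatch(rule.get("regex", "(.*)"), value, flags=re.DOTALL)
        if match is None:
            continue  # the regex did not match, so this rule changes nothing
        # Only the "$1" reference is handled in this sketch.
        template = rule.get("replacement", "$1").replace("$1", r"\1")
        result[rule["target_label"]] = match.expand(template)
    return result


# The same three steps as in the check_http job above.
BLACKBOX_RULES = [
    {"source_labels": ["__address__"], "target_label": "__param_target"},
    {"source_labels": ["__param_target"], "target_label": "instance"},
    {"target_label": "__address__", "replacement": "localhost:9115"},
]

relabeled = apply_relabel({"__address__": "https://ya.ru"}, BLACKBOX_RULES)
```

The resulting scrape goes to localhost:9115/probe with target=https://ya.ru and module=http_2xx, while the stored series keep instance="https://ya.ru".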
* [[#Checking the configuration and restarting]]
* [[http://server.corpX.un:9115/]] Blackbox Exporter->Logs and [[http://server.corpX.un:9090/graph]]
probe_success...
probe_duration_seconds...
probe_http_duration_seconds...
=== Example: file-based service discovery and ping (ICMP) probes ===
* [[https://www.robustperception.io/icmp-pings-with-the-blackbox-exporter|ICMP Pings with the Blackbox exporter]], [[https://github.com/prometheus/blackbox_exporter?tab=readme-ov-file#permissions|github blackbox_exporter permissions]], [[POSIX capabilities]] or [[Управление ядром и модулями в Linux#Переменные ядра|Kernel and module management in Linux: kernel variables]]
# cat /etc/prometheus/prometheus.yml
...
  - job_name: check_ping
    metrics_path: /probe
    params:
      module: [icmp]
    file_sd_configs:
      - files:
        # use one of the two formats shown below
        - switchs.yml
        # - switchs.json
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
# cat /etc/prometheus/switchs.json
[
{
"targets": [ "switch1", "switch2", "switch3" ]
}
]
# cat /etc/prometheus/switchs.yml
- targets:
  - switch1
  - switch2
  - switch3
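A target file like the ones above is usually generated by a script rather than edited by hand, for example from an inventory. The Python sketch below builds the JSON variant. The function name file_sd_document and the optional labels argument are assumptions for illustration; how the result is written to /etc/prometheus/switchs.json is left to the caller.

```python
import json


def file_sd_document(targets, labels=None):
    """Return the text of a file_sd JSON file with a single target group."""
    group = {"targets": list(targets)}
    if labels:
        # Optional labels are attached to every target in the group.
        group["labels"] = dict(labels)
    return json.dumps([group], indent=2)


document = file_sd_document(["switch1", "switch2", "switch3"])
```

Prometheus re-reads such files when they change, so no restart is needed after the target list is regenerated.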
* [[#Checking the configuration and restarting]]
==== prometheus-snmp-exporter ====
* [[https://blogspot.sysadm.kz/2018/07/grafana-prometheus-cisco-snmp.html|Grafana + Prometheus: monitoring Cisco traffic via SNMP (in Russian)]]
* [[https://medium.com/@openmohan/snmp-monitoring-and-easing-it-with-prometheus-b157c0a42c0c|SNMP monitoring and easing it with Prometheus]]
* [[https://grafana.com/blog/2022/02/01/an-advanced-guide-to-network-monitoring-with-grafana-and-prometheus/|An advanced guide to network monitoring with Grafana and Prometheus]]
# apt install prometheus-snmp-exporter
=== Creating the configuration file manually ===
# cat /etc/prometheus/snmp.yml
auths:
  public_v2:
    community: public
    version: 2
modules:
  if_mib:
    walk:
    - 1.3.6.1.2.1.2.2.1.10
    - 1.3.6.1.2.1.2.2.1.16
    - 1.3.6.1.2.1.2.2.1.2
    metrics:
    - name: ifInOctets
      oid: 1.3.6.1.2.1.2.2.1.10
      type: counter
      indexes:
      - labelname: ifIndex
        type: Integer
      lookups:
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
    - name: ifOutOctets
      oid: 1.3.6.1.2.1.2.2.1.16
      type: counter
      indexes:
      - labelname: ifIndex
        type: Integer
      lookups:
      - labels:
        - ifIndex
        labelname: ifDescr
        oid: 1.3.6.1.2.1.2.2.1.2
        type: DisplayString
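In the module above, "indexes" turns the last component of each OID into the ifIndex label, and "lookups" adds an ifDescr label by reading the description with the same index from the walked data. The Python sketch below models that join on a plain dictionary. It is a conceptual illustration, not the exporter's implementation; the function name build_samples and the walk values are hypothetical.

```python
IF_DESCR = "1.3.6.1.2.1.2.2.1.2"
COUNTERS = {
    "1.3.6.1.2.1.2.2.1.10": "ifInOctets",
    "1.3.6.1.2.1.2.2.1.16": "ifOutOctets",
}


def build_samples(walk):
    """Turn SNMP walk results {oid: value} into (name, labels, value) samples."""
    samples = []
    for oid, value in walk.items():
        # The table column is everything before the last dot, the row index is after it.
        column, _, index = oid.rpartition(".")
        name = COUNTERS.get(column)
        if name is None:
            continue  # not one of the counter columns (e.g. ifDescr itself)
        labels = {
            "ifIndex": index,
            # Lookup: the description stored under the same index.
            "ifDescr": walk.get(f"{IF_DESCR}.{index}", ""),
        }
        samples.append((name, labels, value))
    return samples


# Hypothetical walk result for a single interface.
walk = {
    "1.3.6.1.2.1.2.2.1.2.1": "FastEthernet0/0",
    "1.3.6.1.2.1.2.2.1.10.1": 1000,
    "1.3.6.1.2.1.2.2.1.16.1": 2000,
}
samples = build_samples(walk)
```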
=== Creating the configuration file with the generator ===
# cp /usr/share/doc/prometheus-snmp-exporter/examples/generator.yml .
You may need to remove all modules except if_mib
# prometheus-snmp-generator generate
# cp snmp.yml /etc/prometheus/snmp.yml
=== Usage examples ===
# service prometheus-snmp-exporter restart
* [[http://server.corpX.un:9116/]]
# curl 'http://127.0.0.1:9116/snmp?target=router'
# cat /etc/prometheus/prometheus.yml
...
  - job_name: 'snmp'
    static_configs:
      - targets:
        - router
    metrics_path: /snmp
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9116
* [[#Checking the configuration and restarting]]
rate(ifInOctets{ifDescr="FastEthernet0/0",ifIndex="1",instance="router",job="snmp"}[1m])
or
rate(ifOutOctets{ifIndex="5",instance="router",job="snmp"}[1m])
8*rate(ifInOctets{ifDescr="FastEthernet0/0",instance="router"}[1m])
or
8*rate(ifOutOctets{ifDescr="Port-channel1",instance="router"}[1m])
===== prometheus-pushgateway =====
==== Installation and configuration ====
# apt install prometheus-pushgateway
# cat /etc/prometheus/prometheus.yml
...
  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['localhost:9091']
* [[#Checking the configuration and restarting]]
==== Pushgateway example in bash ====
* [[https://vinayakpandey-7997.medium.com/pushing-bash-script-result-to-prometheus-using-pushgateway-a0760cd261e|Scrape data using Bash script and push it to Prometheus using PushGateway]]
# cat ip_dhcp_binding.sh
#!/bin/sh
# Count DHCP leases on the router and push the number to the Pushgateway.
unset http_proxy
DHCP_SERVER=router
NET=192.168
# grep -cF counts the lines containing ${NET} as a fixed string
COUNT=$(rsh "${DHCP_SERVER}" show ip dhcp binding | grep -cF "${NET}")
cat << EOF | curl --data-binary @- "http://127.0.0.1:9091/metrics/job/cisco_dhcp/dhcp_server/${DHCP_SERVER}/net/${NET}"
ip_dhcp_binding ${COUNT}
EOF
ip_dhcp_binding{dhcp_server="router",job="cisco_dhcp",net="192.168"}
# crontab -l
* * * * * /root/ip_dhcp_binding.sh
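The push URL in the script above encodes the grouping key as path segments: /metrics/job/<job> followed by label name and value pairs. The Python sketch below builds the same request with the standard library only. The function names pushgateway_url and build_push_request are hypothetical, and the lease count 7 is a made-up value; the request is only constructed here, not sent.

```python
import urllib.request
from urllib.parse import quote


def pushgateway_url(base, job, grouping):
    """Build the push URL: <base>/metrics/job/<job>/<label>/<value>/..."""
    parts = [base.rstrip("/"), "metrics", "job", quote(job, safe="")]
    for name, value in grouping.items():
        parts += [quote(name, safe=""), quote(value, safe="")]
    return "/".join(parts)


def build_push_request(base, job, grouping, metric, value):
    """Prepare a POST request carrying one sample in the text exposition format."""
    body = f"{metric} {value}\n".encode()  # the trailing newline is required
    url = pushgateway_url(base, job, grouping)
    return urllib.request.Request(url, data=body, method="POST")


request = build_push_request(
    "http://127.0.0.1:9091",
    "cisco_dhcp",
    {"dhcp_server": "router", "net": "192.168"},
    "ip_dhcp_binding",
    7,  # hypothetical lease count
)
```

To actually push the sample, pass the prepared request to urllib.request.urlopen(request).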