VictoriaMetrics Service
Metrics
(venv1) server# ansible all -f 4 -m apt -a 'pkg=prometheus-node-exporter state=present update_cache=true' -i /root/kubespray/inventory/mycluster/hosts.yaml
root@9f823e41e1c4:/kubespray# ansible all -f 4 -m apt -a 'pkg=prometheus-node-exporter state=present update_cache=true' -i /inventory/inventory.ini
kube1# helm repo add vm https://victoriametrics.github.io/helm-charts/
kube1# helm repo update
kube1# helm search repo vm
kube1:~/vm# helm show values vm/victoria-metrics-single --version 0.35.0 > vmsingle-values.yaml
kube1:~/vm# cat vmsingle-values.yaml
...
  scrape:
    enabled: true
...
        # End of COPY
        - job_name: kube-state-metrics
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_container_name]
              regex: kube-state-metrics
              action: keep
            - source_labels: [__meta_kubernetes_pod_container_port_number]
              regex: "8080"
              action: keep
        - job_name: node-exporter
          static_configs:
            - targets:
                - kube1.corpX.un:9100
                - kube2.corpX.un:9100
                - kube3.corpX.un:9100
kube1:~/vm# helm upgrade -i vmsingle vm/victoria-metrics-single -f vmsingle-values.yaml -n vm --create-namespace --version 0.35.0
cmder> kubectl port-forward svc/vmsingle-victoria-metrics-single-server 8428 -n vm
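Once the port-forward is running, the scrape jobs can be checked from the command line through the Prometheus-compatible HTTP API. A minimal sketch, assuming the forwarded port is reachable at 127.0.0.1:8428 and curl is installed; the helper names vm_api_url, vm_targets and vm_query are hypothetical:

```shell
# Assumption: "kubectl port-forward ... 8428" is running on this machine.
VM_ADDR="${VM_ADDR:-http://127.0.0.1:8428}"

# Build the URL of a Prometheus-compatible API endpoint (hypothetical helper).
vm_api_url() {
    printf '%s/api/v1/%s\n' "$VM_ADDR" "$1"
}

# List the scrape targets and their health.
vm_targets() {
    curl -s "$(vm_api_url targets)"
}

# Run an instant query passed as the first argument.
vm_query() {
    curl -s "$(vm_api_url query)" --data-urlencode "query=$1"
}
```

For example, vm_query 'up{job="node-exporter"}' should return one series per node listed in static_configs if the exporters are reachable.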
Notifications
(venv1) server# ansible-playbook /root/conf/ansible/roles/mail.yml
kube1:~/vm# helm show values vm/victoria-metrics-alert --version 0.37.0 > vm-alert-values.yaml
$ wget -qO - https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/master/dist/rules/host-and-hardware/node-exporter.yml | sed 's/^/ /'
$ wget -qO - https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/master/dist/rules/kubernetes/kubestate-exporter.yml | sed 's/^/ /'
kube1:~/vm# cat vm-alert-values.yaml
...
server:
...
  datasource:
    url: "http://vmsingle-victoria-metrics-single-server:8428"
    # url: "http://vmsingle-victoria-metrics-single-server.vm.svc.cluster.local:8428"
    # url: "http://vmcluster-victoria-metrics-cluster-vmselect.vm.svc.cluster.local:8481/select/0/prometheus/"
...
  config:
    alerts:
      # groups: []
      groups:
      - name: NodeExporter
...
          - alert: HostFilesystemDeviceError
            expr: 'node_filesystem_device_error{fstype!~"^(fuse.*|tmpfs|cifs|nfs)", mountpoint!~".*/kubelet.*"} == 1'
...
      - name: KubestateExporter
...
alertmanager:
...
  enabled: true
...
  config:
    global:
      smtp_smarthost: 'server.corpX.un:25'
      smtp_from: 'alertmanager@corpX.un'
      smtp_require_tls: false
    route:
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 3h
      receiver: team-mails
    receivers:
      - name: 'team-mails'
        email_configs:
          - to: 'student@corpX.un'
            send_resolved: true
...
kube1:~/vm# helm upgrade -i vma vm/victoria-metrics-alert -f vm-alert-values.yaml -n vm --version 0.37.0
kube1:~/vm# kubectl -n vm exec -ti pods/vma-victoria-metrics-alert-server-<TAB> -- sh
/ # ps|cat
...
1 root 0:00 /vmalert-prod --datasource.url=http://vmsingle-victoria-metrics-single-server:8428 --envflag.enable --envflag.prefix=VM_ --httpListenAddr=:8880 --loggerFormat=json --notifier.url=http://vma-victoria-metrics-alert-alertmanager.vm.svc.cluster.local.:9093 --rule=/config/alert-rules.yaml
...
/ # cat /config/alert-rules.yaml
...
cmder> kubectl -n vm port-forward svc/vma-victoria-metrics-alert-server 8880
kube1:~/vm# kubectl -n vm exec -ti pods/vma-victoria-metrics-alert-alertmanager-<TAB> -- sh
/alertmanager $ cat /config/alertmanager.yaml
...
cmder> kubectl -n vm port-forward svc/vma-victoria-metrics-alert-alertmanager 9093
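To check the mail route without waiting for a real rule to fire, a test alert can be posted directly to Alertmanager. A minimal sketch, assuming the port-forward above is running on 127.0.0.1:9093 and curl is installed; the helper names and the alert name ManualTest are hypothetical, and the endpoint is the Alertmanager v2 API:

```shell
# Assumption: "kubectl ... port-forward ... 9093" is running on this machine.
AM_ADDR="${AM_ADDR:-http://127.0.0.1:9093}"

# Build a one-alert JSON payload; the first argument is the alert name.
am_test_payload() {
    printf '[{"labels":{"alertname":"%s","severity":"warning"},"annotations":{"summary":"manual test alert"}}]\n' "$1"
}

# Post the test alert to the Alertmanager v2 API.
am_send_test() {
    curl -s -X POST -H 'Content-Type: application/json' \
         -d "$(am_test_payload "${1:-ManualTest}")" \
         "$AM_ADDR/api/v2/alerts"
}
```

After am_send_test, the alert should appear in the Alertmanager web UI, and with the route shown above a message to student@corpX.un is expected after the 30s group_wait.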
Logs
kube1:~/vm# helm show values vm/victoria-logs-single --version 0.12.0 > vls-values.yaml
kube1:~/vm# cat vls-values.yaml
...
vector:
...
  enabled: true
...
kube1:~/vm# helm upgrade -i vls vm/victoria-logs-single -f vls-values.yaml -n vm --version 0.12.0
cmder> kubectl port-forward svc/vls-victoria-logs-single-server 9428 -n vm
VMUI-> Log Query: kubernetes.pod_name: my-debian
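The same query can be run without the web UI through the VictoriaLogs HTTP API. A minimal sketch, assuming the port-forward above is running on 127.0.0.1:9428 and curl is installed; the helper names vl_query_url and vl_query are hypothetical:

```shell
# Assumption: "kubectl port-forward ... 9428" is running on this machine.
VL_ADDR="${VL_ADDR:-http://127.0.0.1:9428}"

# URL of the LogsQL query endpoint.
vl_query_url() {
    printf '%s/select/logsql/query\n' "$VL_ADDR"
}

# Run a LogsQL query; the optional second argument limits the number of rows.
vl_query() {
    curl -s "$(vl_query_url)" \
         --data-urlencode "query=$1" \
         --data-urlencode "limit=${2:-10}"
}
```

For example, vl_query 'kubernetes.pod_name: my-debian' 5 prints up to five matching log entries, one JSON object per line.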
Connecting Grafana
Grafana -> Connections ->
Add new connection: VictoriaLogs ->
Install -> New Datasources
http://vls-victoria-logs-single-server:9428 ->
Explore -> LogsQL: _time:5m
Builder -> Filter: kubernetes.pod_name = my-debian
Draft
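- alert: CriticalCPU
  # Both sides are grouped by kubernetes_io_hostname so that the division matches per node
  expr: sum by (kubernetes_io_hostname) (rate(container_cpu_usage_seconds_total[1m])) / sum by (kubernetes_io_hostname) (machine_cpu_cores) * 100 > 40
  for: 1m
  labels:
    severity: "critical"
  annotations:
    summary: "CriticalCPU {{ $labels.kubernetes_io_hostname }}"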
- alert: CriticalFS
  expr: container_fs_usage_bytes{device=~"^/dev/[sv]d[a-z][1-9]$"} / container_fs_limit_bytes * 100 > 80
  for: 1m
  labels:
    severity: "critical"
  annotations:
    summary: "CriticalFS {{ $labels.instance }}"
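- alert: CriticalMEM
  # Both sides are grouped by kubernetes_io_hostname so that the division matches per node
  expr: sum by (kubernetes_io_hostname) (container_memory_working_set_bytes) / sum by (kubernetes_io_hostname) (machine_memory_bytes) * 100 > 80
  for: 1m
  labels:
    severity: "critical"
  annotations:
    summary: "CriticalMEM {{ $labels.kubernetes_io_hostname }}"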