====== VictoriaMetrics Service ======

===== Metrics =====

  * [[https://docs.victoriametrics.com/sd_configs/|Prometheus-compatible service discovery supported by vmagent and single-node VictoriaMetrics]]
  * [[https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-cluster/|Kubernetes monitoring with VictoriaMetrics Cluster]]
  * [[https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-single/|Kubernetes monitoring via VictoriaMetrics Single]]
  * [[https://docs.victoriametrics.com/scrape_config_examples/]]
  * [[https://github.com/VictoriaMetrics/helm-charts/tags]]
  * [[Система Kubernetes#kube-state-metrics|kube-state-metrics]] collects information about the state of objects inside the Kubernetes cluster (pods, nodes, deployments, namespaces)

<code>
(venv1) server# ansible all -f 4 -m apt -a 'pkg=prometheus-node-exporter state=present update_cache=true' -i /root/kubespray/inventory/mycluster/hosts.yaml

root@9f823e41e1c4:/kubespray# ansible all -f 4 -m apt -a 'pkg=prometheus-node-exporter state=present update_cache=true' -i /inventory/inventory.ini
</code>

<code>
kube1# helm repo add vm https://victoriametrics.github.io/helm-charts/

kube1# helm repo update

kube1# helm search repo vm

kube1:~/vm# helm show values vm/victoria-metrics-single --version 0.35.0 > vmsingle-values.yaml

kube1:~/vm# cat vmsingle-values.yaml
</code>
<code>
...
    size: 16Gi    # replace with 6
...
  scrape:
    enabled: true
...
        # End of COPY

        - job_name: kube-state-metrics
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_container_name]
              regex: kube-state-metrics
              action: keep
            - source_labels: [__meta_kubernetes_pod_container_port_number]
              regex: "8080"
              action: keep
        - job_name: node-exporter
          static_configs:
            - targets:
                - kube1.corpX.un:9100
                - kube2.corpX.un:9100
                - kube3.corpX.un:9100
...
</code>
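The `node-exporter` job above lists its targets by hand. For a longer host list the entries could be generated instead. The following is a minimal sketch, assuming all hosts share the `corpX.un` domain and port 9100 used on this page; the function name `node_exporter_targets` and the variables `NE_DOMAIN` and `NE_PORT` are hypothetical and not part of any chart or tool.

```shell
#!/bin/sh
# Hypothetical helper: print one YAML list item per host name,
# in the form expected under "static_configs: - targets:".
# NE_DOMAIN and NE_PORT are assumed variable names; their defaults
# are the domain and port that appear in the values file excerpt.
node_exporter_targets() {
    ne_domain="${NE_DOMAIN:-corpX.un}"
    ne_port="${NE_PORT:-9100}"
    for ne_host in "$@"; do
        # The dash is passed as an argument so printf never sees
        # a format string that starts with "-".
        printf '%s %s.%s:%s\n' '-' "$ne_host" "$ne_domain" "$ne_port"
    done
}
```

Calling `node_exporter_targets kube1 kube2 kube3` prints three list items without leading indentation; they still have to be indented to the depth of the `targets:` key before being pasted into `vmsingle-values.yaml`.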
<code>
kube1:~/vm# helm upgrade -i vmsingle vm/victoria-metrics-single -f vmsingle-values.yaml -n vm --create-namespace --version 0.35.0

cmder> kubectl port-forward svc/vmsingle-victoria-metrics-single-server 8428 -n vm
</code>

  * Grafana service in Kubernetes: [[Сервис Grafana#Kubernetes]]

===== Notifications =====

  * [[Сервис Prometheus#prometheus-alertmanager]] (links to rules)

<code>
(venv1) server# ansible-playbook /root/conf/ansible/roles/mail.yml
</code>

==== Configuration ====

<code>
kube1:~/vm# helm show values vm/victoria-metrics-alert --version 0.37.0 > vm-alert-values.yaml

# sed indents every downloaded line so the rules can be pasted under "groups:";
# widen the inserted whitespace to match the nesting depth in the values file
$ wget -qO - https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/master/dist/rules/host-and-hardware/node-exporter.yml | sed 's/^/ /'

$ wget -qO - https://raw.githubusercontent.com/samber/awesome-prometheus-alerts/master/dist/rules/kubernetes/kubestate-exporter.yml | sed 's/^/ /'

kube1:~/vm# cat vm-alert-values.yaml
</code>
<code>
...
server:
...
  datasource:
    url: "http://vmsingle-victoria-metrics-single-server:8428"
#    url: "http://vmsingle-victoria-metrics-single-server.vm.svc.cluster.local:8428"
#    url: "http://vmcluster-victoria-metrics-cluster-vmselect.vm.svc.cluster.local:8481/select/0/prometheus/"
...
  config:
    alerts:
#      groups: []
      groups:
        - name: NodeExporter
...
            - alert: HostFilesystemDeviceError
              expr: 'node_filesystem_device_error{fstype!~"^(fuse.*|tmpfs|cifs|nfs)", mountpoint!~".*/kubelet.*"} == 1'
...
        - name: KubestateExporter
...
alertmanager:
...
  enabled: true
...
  config:
    global:
      smtp_smarthost: 'server.corpX.un:25'
      smtp_from: 'alertmanager@corpX.un'
      smtp_require_tls: false
    route:
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 3h
      receiver: team-mails
    receivers:
      - name: 'team-mails'
        email_configs:
          - to: 'student@corpX.un'
            send_resolved: true
...
</code>
<code>
kube1:~/vm# helm upgrade -i vma vm/victoria-metrics-alert -f vm-alert-values.yaml -n vm --version 0.37.0
</code>

==== Debugging ====

<code>
kube1:~/vm# kubectl -n vm exec -ti pods/vma-victoria-metrics-alert-server- -- sh

/ # ps|cat
...
</code>
<code>
    1 root      0:00 /vmalert-prod --datasource.url=http://vmsingle-victoria-metrics-single-server:8428 --envflag.enable --envflag.prefix=VM_ --httpListenAddr=:8880 --loggerFormat=json --notifier.url=http://vma-victoria-metrics-alert-alertmanager.vm.svc.cluster.local.:9093 --rule=/config/alert-rules.yaml
...

/ # cat /config/alert-rules.yaml
...
</code>

==== Connecting ====

<code>
cmder> kubectl -n vm port-forward svc/vma-victoria-metrics-alert-server 8880

kube1:~/vm# kubectl -n vm exec -ti pods/vma-victoria-metrics-alert-alertmanager- -- sh

/alertmanager $ cat /config/alertmanager.yaml
...

cmder> kubectl -n vm port-forward svc/vma-victoria-metrics-alert-alertmanager 9093
</code>

==== Testing ====

  * [[Управление сервисами в Linux#Управление юнитами Systemd]]

<code>
systemctl reset-failed
</code>

  * Use [[Система Kubernetes#Базовые объекты k8s]] for stress testing from [[Анализ производительности системы]]
  * [[Команда dd]] to simulate running out of disk space

===== Logs =====

  * [[https://docs.victoriametrics.com/helm/victorialogs-single/]]
  * [[https://docs.victoriametrics.com/victorialogs/logsql-examples/]]

<code>
kube1:~/vm# helm show values vm/victoria-logs-single --version 0.12.0 > vls-values.yaml

kube1:~/vm# cat vls-values.yaml
</code>
<code>
...
    size: 10Gi    # replace with 5
...
vector:
...
  enabled: true
...
</code>
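The `replace with 5` note above is a manual edit of the generated values file. It could also be applied non-interactively. The following is a minimal sketch, assuming the new value is meant as `5Gi` and that the volume size is written as an unquoted `size: 10Gi` line; `set_volume_size` is a hypothetical helper name, not a Helm feature.

```shell
#!/bin/sh
# Hypothetical helper: in a values file, change "size: OLD" to "size: NEW".
# Only lines whose key is exactly "size:" and whose value is exactly OLD
# are touched; indentation and trailing comments are preserved.
# Usage: set_volume_size FILE OLD NEW
set_volume_size() {
    svs_file="$1"
    svs_old="$2"
    svs_new="$3"
    awk -v old="$svs_old" -v new="$svs_new" '
        $1 == "size:" && $2 == old { sub(old, new) }
        { print }
    ' "$svs_file" > "$svs_file.tmp" && mv "$svs_file.tmp" "$svs_file"
}
```

For example, `set_volume_size vls-values.yaml 10Gi 5Gi` would rewrite the line shown in the excerpt; a `diff` against a fresh `helm show values` output is a simple way to confirm that nothing else changed.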
<code>
kube1:~/vm# helm upgrade -i vls vm/victoria-logs-single -f vls-values.yaml -n vm --version 0.12.0

cmder> kubectl port-forward svc/vls-victoria-logs-single-server 9428 -n vm
</code>

VMUI -> Log Query:

<code>
kubernetes.pod_name: my-debian
</code>

==== Connecting Grafana ====

Grafana -> Connections -> Add new connection: VictoriaLogs -> Install -> New Datasources

<code>
http://vls-victoria-logs-single-server:9428
</code>

-> Explore -> LogsQL:

<code>
kubernetes.pod_labels.app: "my-debian" _time: 5m
</code>

====== Draft ======

<code>
# CPU and memory rules: both sides of the division are grouped by
# kubernetes_io_hostname so that the series can be matched
# (assumes machine_cpu_cores and machine_memory_bytes carry that label)
- alert: CriticalCPU
  expr: sum by (kubernetes_io_hostname) (rate(container_cpu_usage_seconds_total[1m])) / sum by (kubernetes_io_hostname) (machine_cpu_cores) * 100 > 40
  for: 1m
  labels:
    severity: "critical"
  annotations:
    summary: "CriticalCPU {{ $labels.kubernetes_io_hostname }}"
- alert: CriticalFS
  expr: container_fs_usage_bytes{device=~"^/dev/[sv]d[a-z][1-9]$"} / container_fs_limit_bytes * 100 > 80
  for: 1m
  labels:
    severity: "critical"
  annotations:
    summary: "CriticalFS {{ $labels.instance }}"
- alert: CriticalMEM
  expr: sum by (kubernetes_io_hostname) (container_memory_working_set_bytes) / sum by (kubernetes_io_hostname) (machine_memory_bytes) * 100 > 80
  for: 1m
  labels:
    severity: "critical"
  annotations:
    summary: "CriticalMEM {{ $labels.kubernetes_io_hostname }}"
</code>
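Before draft rules like these are moved into `vm-alert-values.yaml`, their expressions can be tried against the single-node server through its Prometheus-compatible `/api/v1/query` endpoint. The following is a minimal sketch, assuming the port-forward to 8428 from the Metrics section is running on the same machine; `vm_query` and `VM_URL` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: evaluate one expression with an instant query.
# VM_URL is an assumed variable name; the default matches the
# "kubectl port-forward ... 8428" command shown earlier on this page.
# Usage: vm_query 'EXPRESSION'
vm_query() {
    # -G sends the URL-encoded data as a query string of a GET request.
    curl -sS -G "${VM_URL:-http://localhost:8428}/api/v1/query" \
        --data-urlencode "query=$1"
}
```

For example, `vm_query 'machine_memory_bytes'` returns a JSON document; an empty `result` array suggests that the metric or label used in a rule is not being scraped, in which case the rule would never fire.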