This shows you the differences between two versions of the page.
мониторинг_кластера_kubernetes [2025/03/25 07:48] val [Metrics] |
мониторинг_кластера_kubernetes [2025/04/02 11:42] (current) val [Kubernetes cluster monitoring] |
||
---|---|---|---|
Line 7: | Line 7: | ||
* [[https://selectel.ru/blog/tutorials/monitoring-in-k8s-with-prometheus/|Selectel: Monitoring in K8s with Prometheus]] | * [[https://selectel.ru/blog/tutorials/monitoring-in-k8s-with-prometheus/|Selectel: Monitoring in K8s with Prometheus]] | ||
* [[https://www.groundcover.com/blog/kubernetes-observability|Kubernetes Observability Guide: Best Practices & Tools]] | * [[https://www.groundcover.com/blog/kubernetes-observability|Kubernetes Observability Guide: Best Practices & Tools]] | ||
+ | * [[https://sysdig.com/blog/how-to-monitor-kubelet/|How to Monitor the Kubelet]] | ||
+ | * [[https://valyala.medium.com/how-to-use-relabeling-in-prometheus-and-victoriametrics-8b90fc22c4b2|How to use relabeling in Prometheus and VictoriaMetrics]] | ||
| | ||
Line 21: | Line 23: | ||
===== Webinar recording ===== | ===== Webinar recording ===== | ||
- | * Tags: | + | * Tags: Kubernetes, Monitoring, Observability, Metrics Server, VictoriaMetrics, Prometheus, Grafana |
+ | * https://youtu.be/qYKvsOFcpf4 | ||
+ | * https://rutube.ru/video/private/5bfc09467bd36c30276600e7b17b3bfc/ | ||
===== Methodological preparation ===== | ===== Methodological preparation ===== | ||
Line 46: | Line 50: | ||
===== Step 2. Metrics Server ===== | ===== Step 2. Metrics Server ===== | ||
- | * [[https://kubernetes-sigs.github.io/metrics-server/|Kubernetes Metrics Server]] | + | * [[Система Kubernetes#Metrics Server|Kubernetes system: Metrics Server]] |
- | * [[https://medium.com/@cloudspinx/fix-error-metrics-api-not-available-in-kubernetes-aa10766e1c2f|Fix “error: Metrics API not available” in Kubernetes]] | + | |
- | <code> | ||
- | kube1:~/metrics-server# curl -L https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.7.2/components.yaml | tee metrics-server-components.yaml | ||
- | |||
- | kube1:~/metrics-server# cat metrics-server-components.yaml | ||
- | </code><code> | ||
- | ... | ||
- |       containers: | ||
- |       - args: | ||
- |         - --cert-dir=/tmp | ||
- |         - --kubelet-insecure-tls # add this | ||
- | ... | ||
- | </code><code> | ||
- | kube1:~/metrics-server# kubectl apply -f metrics-server-components.yaml | ||
- | |||
- | kube1# kubectl get pods -A | grep metrics-server | ||
- | |||
- | kube1# kubectl top pod #-n kube-system | ||
- | |||
- | kube1# kubectl top pod -A --sort-by=mem | ||
- | |||
- | kube1# kubectl top node | ||
- | </code> | ||
===== Step 3. VictoriaMetrics ===== | ===== Step 3. VictoriaMetrics ===== | ||
- | ==== Metrics ==== | + | * [[Сервис VictoriaMetrics|VictoriaMetrics service]] |
- | * [[https://docs.victoriametrics.com/sd_configs/|vmagent and single-node VictoriaMetrics supports the following Prometheus-compatible service discovery]] | ||
- | * [[https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-single/|Kubernetes monitoring via VictoriaMetrics Single]] | + | ===== Questions ===== |
- | <code> | ||
- | (venv1) server# ansible all -f 4 -m apt -a 'pkg=prometheus-node-exporter state=present update_cache=true' -i /root/kubespray/inventory/mycluster/hosts.yaml | ||
- | |||
- | kube1# helm repo add vm https://victoriametrics.github.io/helm-charts/ | ||
- | kube1# helm repo update | ||
- | |||
- | kube1:~/vm# cat guide-vmsingle-values.yaml | ||
- | </code><code> | ||
- | ... | ||
- |   - job_name: node-exporter | ||
- |     static_configs: | ||
- |       - targets: | ||
- |           - kube1.corpX.un:9100 | ||
- |           - kube2.corpX.un:9100 | ||
- |           - kube3.corpX.un:9100 | ||
- | </code><code> | ||
- | kube1:~/vm# helm upgrade -i vmsingle vm/victoria-metrics-single -f guide-vmsingle-values.yaml -n vm --create-namespace | ||
- | |||
- | kube1:~/vm# kubectl -n vm scale --replicas 1 statefulset vmsingle-victoria-metrics-single-server | ||
- | |||
- | cmder> kubectl -n vm port-forward svc/vmsingle-victoria-metrics-single-server 8428 | ||
- | |||
- | kube1# helm repo add grafana https://grafana.github.io/helm-charts | ||
- | kube1# helm repo update | ||
- | |||
- | kube1:~/vm# cat my-grafana-values.yaml | ||
- | </code><code> | ||
- | datasources: | ||
- |   datasources.yaml: | ||
- |     apiVersion: 1 | ||
- |     datasources: | ||
- |       - name: victoriametrics | ||
- |         type: prometheus | ||
- |         orgId: 1 | ||
- |         url: http://vmsingle-victoria-metrics-single-server:8428 | ||
- |         access: proxy | ||
- |         isDefault: true | ||
- |         updateIntervalSeconds: 10 | ||
- |         editable: true | ||
- | | ||
- | dashboardProviders: | ||
- |   dashboardproviders.yaml: | ||
- |     apiVersion: 1 | ||
- |     providers: | ||
- |       - name: 'default' | ||
- |         orgId: 1 | ||
- |         folder: '' | ||
- |         type: file | ||
- |         disableDeletion: true | ||
- |         editable: true | ||
- |         options: | ||
- |           path: /var/lib/grafana/dashboards/default | ||
- | | ||
- | dashboards: | ||
- |   default: | ||
- |     victoriametrics: | ||
- |       gnetId: 10229 | ||
- |       revision: 22 | ||
- |       datasource: victoriametrics | ||
- |     kubernetes: | ||
- |       gnetId: 14205 | ||
- |       revision: 1 | ||
- |       datasource: victoriametrics | ||
- |     node-exporter: | ||
- |       gnetId: 1860 | ||
- |       revision: 37 | ||
- |       datasource: victoriametrics | ||
- | </code><code> | ||
- | kube1:~/vm# helm upgrade -i my-grafana grafana/grafana -f my-grafana-values.yaml -n vm --create-namespace | ||
- | |||
- | kube1# kubectl get secret --namespace vm my-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo | ||
- | |||
- | cmder> kubectl -n vm port-forward svc/my-grafana 3000:80 | ||
- | </code> | ||
- | |||
- | * [[https://docs.victoriametrics.com/guides/k8s-monitoring-via-vm-cluster/|Kubernetes monitoring with VictoriaMetrics Cluster]] | ||
- | ==== Notifications ==== | ||
- | <code> | ||
- | kube1:~/vm# helm show values vm/victoria-metrics-alert > values-vm-alert.yaml | ||
- | |||
- | kube1:~/vm# cat values-vm-alert.yaml | ||
- | </code><code> | ||
- | ... | ||
- | server: | ||
- | ... | ||
- |   datasource: | ||
- |     url: "http://vmsingle-victoria-metrics-single-server:8428" | ||
- |     # url: "http://vmsingle-victoria-metrics-single-server.default.svc.cluster.local:8428" | ||
- |     # url: "http://vmcluster-victoria-metrics-cluster-vmselect.default.svc.cluster.local:8481/select/0/prometheus/" | ||
- | ... | ||
- |   notifier: | ||
- |     alertmanager: | ||
- |       url: "http://alertmanager:9093" | ||
- | ... | ||
- |   config: | ||
- |     alerts: | ||
- |       groups: | ||
- |         - name: alert.rules | ||
- |           rules: | ||
- |             - alert: CriticalCPU | ||
- |               expr: sum by (kubernetes_io_hostname) (rate (container_cpu_usage_seconds_total[1m])) / sum (machine_cpu_cores) * 100 > 40 | ||
- |               for: 1m | ||
- |               labels: | ||
- |                 severity: "critical" | ||
- |               annotations: | ||
- |                 summary: "CriticalCPU {{ $labels.instance }}" | ||
- | | ||
- |             - alert: CriticalFS | ||
- |               expr: container_fs_usage_bytes{device=~"^/dev/[sv]d[a-z][1-9]$"} / container_fs_limit_bytes * 100 > 80 | ||
- |               for: 1m | ||
- |               labels: | ||
- |                 severity: "critical" | ||
- |               annotations: | ||
- |                 summary: "CriticalFS {{ $labels.instance }}" | ||
- | | ||
- |             - alert: CriticalMEM | ||
- |               expr: sum by (kubernetes_io_hostname) (container_memory_working_set_bytes) / sum (machine_memory_bytes) * 100 > 80 | ||
- |               for: 1m | ||
- |               labels: | ||
- |                 severity: "critical" | ||
- |               annotations: | ||
- |                 summary: "CriticalMEM {{ $labels.instance }}" | ||
- | | ||
- | ... | ||
- | alertmanager: | ||
- | ... | ||
- |   enabled: true | ||
- | ... | ||
- |   config: | ||
- |     global: | ||
- |       smtp_smarthost: 'server.corpX.un:25' | ||
- |       smtp_from: 'alertmanager@corpX.un' | ||
- |       smtp_require_tls: false | ||
- | | ||
- |     route: | ||
- |       group_wait: 30s | ||
- |       group_interval: 5m | ||
- |       repeat_interval: 3h | ||
- |       receiver: team-X-mails | ||
- | | ||
- |     receivers: | ||
- |       - name: 'team-X-mails' | ||
- |         email_configs: | ||
- |           - to: 'student@corpX.un' | ||
- |             send_resolved: true | ||
- | ... | ||
- | </code><code> | ||
- | kube1:~/vm# helm upgrade -i vma vm/victoria-metrics-alert -f values-vm-alert.yaml | ||
- | |||
- | kube1:~/vm# kubectl exec -ti pods/vma-victoria-metrics-alert-server-<TAB> -- sh | ||
- | </code><code> | ||
- | / # cat /config/alert-rules.yaml | ||
- | ... | ||
- | </code><code> | ||
- | kube1:~/vm# kubectl exec -ti pods/vma-victoria-metrics-alert-alertmanager-<TAB> -- sh | ||
- | </code><code> | ||
- | /alertmanager $ cat /config/alertmanager.yaml | ||
- | ... | ||
- | </code> | ||
- | |||
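- | * Use [[Система Kubernetes#Базовые объекты k8s|basic k8s objects]] for the stress testing described in [[Основы администрирования систем Linux#Модуль 11. Анализ производительности и оптимизация системы|Linux system administration basics, Module 11: Performance analysis and system optimization]] | ||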
- | |||
- | ==== Logs ==== | ||
- | |||
- | * [[https://docs.victoriametrics.com/helm/victorialogs-single/]] | ||
- | * [[https://docs.victoriametrics.com/victorialogs/logsql-examples/]] | ||
- | |||
- | <code> | ||
- | kube1:~/vm# helm show values vm/victoria-logs-single > values-vls.yaml | ||
- | |||
- | kube1:~/vm# cat values-vls.yaml | ||
- | ... | ||
- | vector: | ||
- | ... | ||
- |   enabled: true | ||
- | ... | ||
- | </code><code> | ||
- | kube1:~/vm# helm upgrade -i vls oci://ghcr.io/victoriametrics/helm-charts/victoria-logs-single -f values-vls.yaml | ||
- | |||
- | cmder$ kubectl port-forward svc/vls-victoria-logs-single-server 9428 | ||
- | Forwarding from 127.0.0.1:9428 -> 9428 | ||
- | ... | ||
- | </code><code> | ||
- | Grafana -> Connections -> | ||
- | Add new connection: VictoriaLogs -> | ||
- | |||
- | Install -> New Datasources | ||
- | http://vls-victoria-logs-single-server:9428 -> | ||
- | |||
- | Explore -> LogsQL: _time:5m | ||
- | |||
- | Builder-> Filter: kubernetes.pod_name = my-debian | ||
- | </code> | ||
- | |||
- | |||
- | ===== History ===== | ||
- | |||
- | ==== loki-stack ==== | ||
- | |||
- | * [[https://github.com/grafana/helm-charts/tree/main/charts/loki-stack|Deploy Loki and Promtail to your cluster]] | ||
- | |||
- | <code> | ||
- | http://loki.loki-stack.svc.cluster.local:3100 | ||
- | http://loki-prometheus-server.loki-stack.svc.cluster.local:80 | ||
- | |||
- | |||
- | kube1:~/loki-stack# helm pull grafana/loki-stack | ||
- | |||
- | kube1:~/loki-stack# less loki-stack/charts/loki/values.yaml | ||
- | |||
- | persistence: | ||
- |   enabled: false | ||
- |   accessModes: | ||
- |   - ReadWriteOnce | ||
- |   size: 10Gi | ||
- | |||
- | |||
- | kube1:~/loki-stack# cat values.yaml | ||
- | loki: | ||
- |   persistence: | ||
- |     enabled: true | ||
- | | ||
- | prometheus: | ||
- |   enabled: true | ||
- |   alertmanager: | ||
- |     config: | ||
- |       global: | ||
- |         smtp_smarthost: 'server.corp13.un:25' | ||
- |         smtp_from: 'alertmanager@corp13.un' | ||
- |         smtp_require_tls: false | ||
- | | ||
- |       templates: | ||
- |         - '/etc/alertmanager/*.tmpl' | ||
- | | ||
- |       route: | ||
- |         group_wait: 30s | ||
- |         group_interval: 5m | ||
- |         repeat_interval: 3h | ||
- |         receiver: team-X-mails | ||
- | | ||
- |       receivers: | ||
- |         - name: 'team-X-mails' | ||
- |           email_configs: | ||
- |             - to: 'student@corp13.un' | ||
- |               send_resolved: true | ||
- | | ||
- |   serverFiles: | ||
- |     alerting_rules.yml: | ||
- |       groups: | ||
- | |||
- | |||
- | kube1:~/loki-stack# helm upgrade --install loki --namespace=loki-stack grafana/loki-stack --create-namespace -f values.yaml | ||
- | |||
- | ### helm delete loki --namespace=loki-stack | ||
- | </code> |
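
The CriticalFS and CriticalMEM rules in the removed Notifications block both reduce to the same arithmetic: divide a usage figure by a capacity figure, scale to percent, and compare the result with a threshold. A minimal shell sketch of that check is given here to make the threshold logic easy to try by hand. The function names `usage_pct` and `is_critical` and the sample numbers are hypothetical, chosen only for illustration; the snippet does not query VictoriaMetrics and assumes the capacity value is greater than zero.

```shell
#!/bin/sh
# Illustrative only: the numbers used below are made-up sample values
# (think "MiB used" and "MiB total"), not measurements from a cluster.

# usage_pct USED TOTAL
# Prints USED / TOTAL * 100 with one decimal place.
usage_pct() {
    awk -v used="$1" -v total="$2" 'BEGIN { printf "%.1f\n", used / total * 100 }'
}

# is_critical USED TOTAL THRESHOLD
# Exit status 0 when the percentage is strictly above THRESHOLD, 1 otherwise.
# This mirrors the "... * 100 > 80" comparison in the alert expressions.
is_critical() {
    awk -v used="$1" -v total="$2" -v limit="$3" \
        'BEGIN { exit !(used / total * 100 > limit) }'
}

# Hypothetical node: 3500 of 4000 units in use, threshold 80 percent.
usage_pct 3500 4000
if is_critical 3500 4000 80; then
    echo "CriticalMEM would fire"
else
    echo "CriticalMEM would not fire"
fi
```

The comparison is strict, as in the rules themselves, so a value of exactly 80 percent does not count as critical. In the real rules the `for: 1m` clause additionally requires the condition to hold for a minute before the alert fires, which this one-shot check does not model.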