374 Commits

Author SHA1 Message Date
Arthur Silva Sens
b884c9465c Add dashboard URL for KubeCPUOvercommit and KubeMemoryOvercommit 2022-09-09 08:03:24 +02:00
ArthurSens
9b382b6f69 Fix PrometheusRule name
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-08 20:31:23 +02:00
ArthurSens
7c354c9a38 Replace workspace alerts from jsonnet to YAML
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-08 15:46:23 +02:00
ArthurSens
e0bed466e7 Replace webapp alerts from jsonnet to YAML
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-08 14:07:23 +02:00
ArthurSens
bb38ad3ba1 Remove leeway build instructions
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-08 13:33:23 +02:00
ArthurSens
28b014fdc1 Replace IDE alerts from jsonnet to raw YAML
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-08 13:33:23 +02:00
Anton Kosyakov
dc9fbe40a7 [code-browser] extensions observability 2022-09-05 12:35:20 +02:00
ArthurSens
894359d4a4 Add prometheus-operator alerts
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 13:54:17 +02:00
ArthurSens
3b94dd1e63 Add Prometheus alerts
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 13:54:17 +02:00
Gero Posmyk-Leinemann
81c50611b4 [ops] WebApp: Fix WebAppServicesCrashlooping 2022-09-02 10:02:17 +02:00
Gero Posmyk-Leinemann
b2d0edde79 [ops] WebApp: Remove rate(memory): rate(gauge) does not work 2022-09-02 10:02:17 +02:00
Gero Posmyk-Leinemann
328f48664b [ops] WebApp: Fix alert WebsocketConnectionRateHigh by using a rate(total) instead of rate(gauge) 2022-09-02 10:02:17 +02:00
ArthurSens
d31b43fed3 Add kube-state-metrics alerts
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 09:33:17 +02:00
ArthurSens
18a533db56 Create alertmanager alerts
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 09:33:17 +02:00
ArthurSens
1d013d794c Add alerts for kubernetes nodes
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 09:26:18 +02:00
mustard
2f3b432707 [observability] add cluster selector for browser overview dashboard 2022-09-01 08:30:16 +02:00
Gero Posmyk-Leinemann
73cbd09b66 [ops] WebApp: review comments 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
cb30274ccc [ops] WebApp: Alert on services crashlooping 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
ddf5651b7c [ops] WebApp: Alerts on excessive RAM and CPU usage 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
8701732104 [ops] WebApp: alert if db-sync is not running 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
4035d1242a [ops] WebApp: alert on messagebus not running 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
d394c24727 [ops] WebApp: high websocket connection rate 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
5a00651ffd [ops] WebApp: Internal alert on JSON RPC error rates 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
313b522c59 [ops] Meta Overview/server: Fix unit of "API Request Error rate" to be reqps 2022-08-31 16:07:16 +02:00
Thomas Schubart
ccb148f2a6 [observability] Add dashboard for network limiting 2022-08-31 10:27:15 +02:00
ArthurSens
aee56a583b Add alerts related to kubernetes resources
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-08-30 22:20:15 +02:00
mustard
f62bc58f7f [observability] add browser overview dashboard 2022-08-30 15:02:15 +02:00
ArthurSens
912410cdb0 Create alerts for certmanager
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-08-29 21:12:14 +02:00
Kyle Brennan
074d9842ca Sum by type, phase (avoids nodepools dupes) 2022-08-25 18:34:10 +02:00
JenTing Hsiao
b31278f1f1 observability: assign default zero if no data found
Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>
2022-08-25 13:41:10 +02:00
Pavel Tumik @ GitPod
cc79d75a96 [alerts] increase GitpodWorkspaceStuckOnStopping for time to 30min to reduce flakiness 2022-08-23 20:32:39 +02:00
Manuel Alejandro de Brito Fontes
7b4a885ee3 Update k8s dependencies to v0.24.3 2022-08-23 08:18:39 +02:00
JenTing Hsiao
d5462c0d02 observability: add #workspace > 20 in alert GitpodWorkspaceTooManyRegularNotActive
This prevents the alert from being triggered once we start traffic shifting.
At that point the number of workspaces might be low, so
gitpod_workspace_regular_not_active_percentage is easily hit because
gitpod_ws_manager_workspace_activity_total is a low number.

Therefore, we add #workspace > 20 as another criterion for the alert.

Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>
2022-08-11 16:35:28 +02:00
Arthur Silva Sens
79fdcdd0be Update scrape.libsonnet 2022-08-10 14:11:54 +02:00
utam0k
2d1f66ae25 observability: Add an alert for the network connections. 2022-08-10 05:55:54 +02:00
JenTing Hsiao
a986791728 Update the alert description unit
Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>
2022-08-09 02:40:52 -03:00
Andrew Farries
c4363513a5 Run gofmt
gofmt -w .

From the repository root.
2022-08-08 10:54:52 -03:00
Pavel Tumik
06a686acf1 [alerts] change load avg alert to critical 2022-08-05 16:11:49 -03:00
Milan Pavlik
fc2355c241 [usage] Add runbook link for GitpodUsageScheduledReconciliationFailures 2022-08-05 09:34:49 -03:00
Milan Pavlik
9a947a5a81 [usage] Fix UsageReconciliationFailures alert 2022-08-05 02:42:49 -03:00
Milan Pavlik
63f3bb78ae [usage] Add alert on failed reconciliations 2022-08-04 05:32:48 -03:00
Arthur Silva Sens
5bdda8ecea Remove check for absence 2022-08-01 10:31:45 -03:00
ArthurSens
1041c76306 Add alert for target down
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-07-29 17:28:24 -03:00
ArthurSens
5092ab3934 Add alert for OpenVSX-proxy scraping failures
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-07-29 09:41:23 -03:00
Jean Pierre
eec738731c Add openvsx alert 2022-07-29 01:46:23 -03:00
Manuel Alejandro de Brito Fontes
a5dd648f06 Add dashboard for node problem detector 2022-07-26 16:05:21 -03:00
Manuel Alejandro de Brito Fontes
70eaa01676 Add dashboard for ephemeral storage 2022-07-26 15:24:21 -03:00
Arthur Silva Sens
cd28f4c34d Route GitpodWorkspaceStuckOnStarting to #t_workspace_alerts 2022-07-26 14:15:21 -03:00
Manuel Alejandro de Brito Fontes
c7474500ae Improve Summary dashboard row 2022-07-26 06:07:20 -03:00
Milan Pavlik
6893677724 [usage] Add grafana dashboard 2022-07-26 03:15:20 -03:00