311 Commits

Author SHA1 Message Date
ArthurSens
18a533db56 Create alertmanager alerts
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 09:33:17 +02:00
ArthurSens
1d013d794c Add alerts for kubernetes nodes
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 09:26:18 +02:00
mustard
2f3b432707 [observability] add cluster selector for browser overview dashboard 2022-09-01 08:30:16 +02:00
Gero Posmyk-Leinemann
73cbd09b66 [ops] WebApp: review comments 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
cb30274ccc [ops] WebApp: Alert on services crashlooping 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
ddf5651b7c [ops] WebApp: Alerts on excessive RAM and CPU usage 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
8701732104 [ops] WebApp: alert if db-sync is not running 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
4035d1242a [ops] WebApp: alert on messagebus not running 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
d394c24727 [ops] WebApp: high websocket connection rate 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
5a00651ffd [ops] WebApp: Internal alert on JSON RPC error rates 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
313b522c59 [ops] Meta Overview/server: Fix unit of "API Request Error rate" to be reqps 2022-08-31 16:07:16 +02:00
Thomas Schubart
ccb148f2a6 [observability] Add dashboard for network limiting 2022-08-31 10:27:15 +02:00
ArthurSens
aee56a583b Add alerts related to kubernetes resources
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-08-30 22:20:15 +02:00
mustard
f62bc58f7f [observability] add browser overview dashboard 2022-08-30 15:02:15 +02:00
ArthurSens
912410cdb0 Create alerts for certmanager
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-08-29 21:12:14 +02:00
Kyle Brennan
074d9842ca Sum by type, phase (avoids nodepools dupes) 2022-08-25 18:34:10 +02:00
JenTing Hsiao
b31278f1f1 observability: assign default zero if no data found
Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>
2022-08-25 13:41:10 +02:00
Pavel Tumik @ GitPod
cc79d75a96 [alerts] increase GitpodWorkspaceStuckOnStopping for time to 30min to reduce flakiness 2022-08-23 20:32:39 +02:00
Manuel Alejandro de Brito Fontes
7b4a885ee3 Update k8s dependencies to v0.24.3 2022-08-23 08:18:39 +02:00
JenTing Hsiao
d5462c0d02 observability: add #workspace > 20 in alert GitpodWorkspaceTooManyRegularNotActive
This prevents the alert from being triggered once we start traffic shifting.
While the number of workspaces is low,
gitpod_ws_manager_workspace_activity_total is a small number, so the
gitpod_workspace_regular_not_active_percentage threshold is easily hit.

Therefore, we add #workspace > 20 as another criterion for the alert.

Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>
2022-08-11 16:35:28 +02:00
Arthur Silva Sens
79fdcdd0be Update scrape.libsonnet 2022-08-10 14:11:54 +02:00
utam0k
2d1f66ae25 observability: Add an alert for the network connections. 2022-08-10 05:55:54 +02:00
JenTing Hsiao
a986791728 Update the alert description unit
Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>
2022-08-09 02:40:52 -03:00
Andrew Farries
c4363513a5 Run gofmt
gofmt -w .

From the repository root.
2022-08-08 10:54:52 -03:00
Pavel Tumik
06a686acf1 [alerts] change load avg alert to critical 2022-08-05 16:11:49 -03:00
Milan Pavlik
fc2355c241 [usage] Add runbook link for GitpodUsageScheduledReconciliationFailures 2022-08-05 09:34:49 -03:00
Milan Pavlik
9a947a5a81 [usage] Fix UsageReconciliationFailures alert 2022-08-05 02:42:49 -03:00
Milan Pavlik
63f3bb78ae [usage] Add alert on failed reconciliations 2022-08-04 05:32:48 -03:00
Arthur Silva Sens
5bdda8ecea Remove check for absence 2022-08-01 10:31:45 -03:00
ArthurSens
1041c76306 Add alert for target down
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-07-29 17:28:24 -03:00
ArthurSens
5092ab3934 Add alert for OpenVSX-proxy scraping failures
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-07-29 09:41:23 -03:00
Jean Pierre
eec738731c Add openvsx alert 2022-07-29 01:46:23 -03:00
Manuel Alejandro de Brito Fontes
a5dd648f06 Add dashboard for node problem detector 2022-07-26 16:05:21 -03:00
Manuel Alejandro de Brito Fontes
70eaa01676 Add dashboard for ephemeral storage 2022-07-26 15:24:21 -03:00
Arthur Silva Sens
cd28f4c34d Route GitpodWorkspaceStuckOnStarting to #t_workspace_alerts 2022-07-26 14:15:21 -03:00
Manuel Alejandro de Brito Fontes
c7474500ae Improve Summary dashboard row 2022-07-26 06:07:20 -03:00
Milan Pavlik
6893677724 [usage] Add grafana dashboard 2022-07-26 03:15:20 -03:00
Manuel Alejandro de Brito Fontes
b9db8b349b Add Summary row to Gitpod overview dashboard 2022-07-20 13:32:15 -03:00
Manuel Alejandro de Brito Fontes
18c764cbac Add dashboard for swap utilization per cluster and node 2022-07-19 19:40:14 -03:00
ArthurSens
735a30899f Update Preview env's dashboard with new metrics
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-07-18 16:39:13 +02:00
Pudong Zheng
25c5bfbecb [alerts] change alert for adding new nodes rapidly to only count if node type is regular workspace 2022-07-18 03:40:13 +02:00
Nandaja Varma
d13bdfd0cd Improve GitpodWorkspaceTooManyRegularNotActive alert 2022-07-11 06:09:58 +05:30
utam0k
04d945d216 observability: Add an alert for AutoscaleFailure. 2022-07-06 00:36:52 +05:30
JenTing Hsiao
7800a21c4d [alerts] fix pod/container/namespace not rendering
Every time series is uniquely identified by its metric name and
a set of labels, and every unique combination of key-value label pairs
produces a separate alert for that time series.

There is no common value for these metrics:
- kube_pod_container_status_restarts_total
- gitpod_ws_manager_workspace_backups_failure_total

Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>
2022-07-01 06:23:39 +05:30
Arthur Silva Sens
028ef2608b Update overview.json 2022-06-27 13:46:36 +05:30
utam0k
6c2705fbe4 observability: Ring the phone only when a data loss occurs with GitpodWsDaemonCrashLoopingg 2022-06-23 19:06:32 +05:30
Pavel Tumik
cf35903aff Apply suggestions from code review 2022-06-23 02:46:31 +05:30
Pavel Tumik
7e0fe457fb Apply suggestions from code review 2022-06-23 02:46:31 +05:30
utam0k
62859996d5 observability: Add GitpodWorkspaceTooLongTerminating alert. 2022-06-23 02:46:31 +05:30
ArthurSens
9be43de166 Add SLIs to preview-environment dashboard
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-06-22 12:15:31 +05:30