Arthur Silva Sens
b884c9465c
Add dashboard URL for KubeCPUOvercommit and KubeMemoryOvercommit
2022-09-09 08:03:24 +02:00
ArthurSens
9b382b6f69
Fix PrometheusRule name
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-08 20:31:23 +02:00
ArthurSens
7c354c9a38
Replace workspace alerts from jsonnet to YAML
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-08 15:46:23 +02:00
ArthurSens
e0bed466e7
Replace webapp alerts from jsonnet to YAML
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-08 14:07:23 +02:00
ArthurSens
bb38ad3ba1
Remove leeway build instructions
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-08 13:33:23 +02:00
ArthurSens
28b014fdc1
Replace IDE alerts from jsonnet to raw YAML
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-08 13:33:23 +02:00
Anton Kosyakov
dc9fbe40a7
[code-browser] extensions observability
2022-09-05 12:35:20 +02:00
ArthurSens
894359d4a4
Add prometheus-operator alerts
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 13:54:17 +02:00
ArthurSens
3b94dd1e63
Add Prometheus alerts
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 13:54:17 +02:00
Gero Posmyk-Leinemann
81c50611b4
[ops] WebApp: Fix WebAppServicesCrashlooping
2022-09-02 10:02:17 +02:00
Gero Posmyk-Leinemann
b2d0edde79
[ops] WebApp: Remove rate(memory): rate(gauge) does not work
2022-09-02 10:02:17 +02:00
Gero Posmyk-Leinemann
328f48664b
[ops] WebApp: Fix alert WebsocketConnectionRateHigh by using a rate(total) instead of rate(gauge)
2022-09-02 10:02:17 +02:00
ArthurSens
d31b43fed3
Add kube-state-metrics alerts
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 09:33:17 +02:00
ArthurSens
18a533db56
Create alertmanager alerts
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 09:33:17 +02:00
ArthurSens
1d013d794c
Add alerts for kubernetes nodes
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 09:26:18 +02:00
mustard
2f3b432707
[observability] add cluster selector for browser overview dashboard
2022-09-01 08:30:16 +02:00
Gero Posmyk-Leinemann
73cbd09b66
[ops] WebApp: review comments
2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
cb30274ccc
[ops] WebApp: Alert on services crashlooping
2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
ddf5651b7c
[ops] WebApp: Alerts on excessive RAM and CPU usage
2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
8701732104
[ops] WebApp: alert if db-sync is not running
2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
4035d1242a
[ops] WebApp: alert on messagebus not running
2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
d394c24727
[ops] WebApp: high websocket connection rate
2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
5a00651ffd
[ops] WebApp: Internal alert on JSON RPC error rates
2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
313b522c59
[ops] Meta Overview/server: Fix unit of "API Request Error rate" to be reqps
2022-08-31 16:07:16 +02:00
Thomas Schubart
ccb148f2a6
[observability] Add dashboard for network limiting
2022-08-31 10:27:15 +02:00
ArthurSens
aee56a583b
Add alerts related to kubernetes resources
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-08-30 22:20:15 +02:00
mustard
f62bc58f7f
[observability] add browser overview dashboard
2022-08-30 15:02:15 +02:00
ArthurSens
912410cdb0
Create alerts for certmanager
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-08-29 21:12:14 +02:00
Kyle Brennan
074d9842ca
Sum by type, phase (avoids nodepools dupes)
2022-08-25 18:34:10 +02:00
JenTing Hsiao
b31278f1f1
observability: assign default zero if no data found
...
Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>
2022-08-25 13:41:10 +02:00
Pavel Tumik @ GitPod
cc79d75a96
[alerts] increase GitpodWorkspaceStuckOnStopping for time to 30min to reduce flakiness
2022-08-23 20:32:39 +02:00
Manuel Alejandro de Brito Fontes
7b4a885ee3
Update k8s dependencies to v0.24.3
2022-08-23 08:18:39 +02:00
JenTing Hsiao
d5462c0d02
observability: add #workspace > 20 in alert GitpodWorkspaceTooManyRegularNotActive
...
This prevents the alert from being triggered once we start traffic shifting.
When the number of workspaces is low,
gitpod_workspace_regular_not_active_percentage easily crosses the threshold
because gitpod_ws_manager_workspace_activity_total is a small number.
Therefore, we add #workspace > 20 as another criterion for the alert.
Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>
2022-08-11 16:35:28 +02:00
Arthur Silva Sens
79fdcdd0be
Update scrape.libsonnet
2022-08-10 14:11:54 +02:00
utam0k
2d1f66ae25
observability: Add an alert for network connections.
2022-08-10 05:55:54 +02:00
JenTing Hsiao
a986791728
Update the alert description unit
...
Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>
2022-08-09 02:40:52 -03:00
Andrew Farries
c4363513a5
Run gofmt
...
gofmt -w .
From the repository root.
2022-08-08 10:54:52 -03:00
Pavel Tumik
06a686acf1
[alerts] change load avg alert to critical
2022-08-05 16:11:49 -03:00
Milan Pavlik
fc2355c241
[usage] Add runbook link for GitpodUsageScheduledReconciliationFailures
2022-08-05 09:34:49 -03:00
Milan Pavlik
9a947a5a81
[usage] Fix UsageReconciliationFailures alert
2022-08-05 02:42:49 -03:00
Milan Pavlik
63f3bb78ae
[usage] Add alert on failed reconciliations
2022-08-04 05:32:48 -03:00
Arthur Silva Sens
5bdda8ecea
Remove check for absence
2022-08-01 10:31:45 -03:00
ArthurSens
1041c76306
Add alert for target down
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-07-29 17:28:24 -03:00
ArthurSens
5092ab3934
Add alert for OpenVSX-proxy scraping failures
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-07-29 09:41:23 -03:00
Jean Pierre
eec738731c
Add openvsx alert
2022-07-29 01:46:23 -03:00
Manuel Alejandro de Brito Fontes
a5dd648f06
Add dashboard for node problem detector
2022-07-26 16:05:21 -03:00
Manuel Alejandro de Brito Fontes
70eaa01676
Add dashboard for ephemeral storage
2022-07-26 15:24:21 -03:00
Arthur Silva Sens
cd28f4c34d
Route GitpodWorkspaceStuckOnStarting to #t_workspace_alerts
2022-07-26 14:15:21 -03:00
Manuel Alejandro de Brito Fontes
c7474500ae
Improve Summary dashboard row
2022-07-26 06:07:20 -03:00
Milan Pavlik
6893677724
[usage] Add grafana dashboard
2022-07-26 03:15:20 -03:00