298 Commits

Author SHA1 Message Date
Milan Pavlik
d9dc8f0623 [grafana] Fix grpc / client dashboard 2022-09-26 14:15:26 +02:00
Milan Pavlik
95b5fb128c [usage] Fix alerting rules 2022-09-26 09:11:26 +02:00
Milan Pavlik
a9c85bd4b3 [grafana] Add gRPC Client dashboard with metrics 2022-09-25 01:12:25 +02:00
Kyle Brennan
f2cf455e4c Link existing KubeNodeNotReady runbook to alert 2022-09-24 02:25:24 +02:00
Milan Pavlik
960f82a3cb [server] Add dashboard for garbage collection 2022-09-23 10:13:23 +02:00
mustard
90e3308f89 [observability] add resources load failed ratio to code-browser 2022-09-22 13:08:23 +02:00
Aleksandar Aleksandrov
4e4e2b5a79 Increase the evaluation time for CPU|MemoryOvercommit alerts 2022-09-22 10:04:23 +02:00
Pavel Tumik @ GitPod
ea8fbdc4dd fix prometheus rule for workspaces 2022-09-21 22:16:22 +02:00
Milan Pavlik
91c40c79a5 [usage] Alert on Stripe webhook handling 2022-09-21 13:56:22 +02:00
Milan Pavlik
7406ac171a [usage] Add link to GCP logs and grpc server dashboard 2022-09-21 13:55:22 +02:00
Milan Pavlik
2206f7e8b4 [usage] Alert on stripe webhooks and invoice finalize 2022-09-21 13:54:22 +02:00
Thomas Schubart
f61eacf1e8 [obs] Fix dashboard import error 2022-09-20 22:46:21 +02:00
Milan Pavlik
0d7563a24c [dashboard] gRPC dashboard filters requests started on selected method 2022-09-20 12:21:21 +02:00
Milan Pavlik
0fb5171b6d [usage] Remove trailing quote from alerts 2022-09-20 10:03:21 +02:00
Milan Pavlik
30cffea01a [image-builder] Move dashboard to team Workspace 2022-09-19 20:24:20 +02:00
Thomas Schubart
bf5917f631 [obs] Add network limiting overview panel 2022-09-19 20:22:20 +02:00
Thomas Schubart
243ee21379 [obs] Fix display of network limiting stats
- Ensure data source is selected
- Use network limiting stats for sourcing workspace and node
2022-09-16 01:00:16 +02:00
Milan Pavlik
2c4ffa4de5 [usage] Extend usage component dashboard with time since last ledger run 2022-09-15 13:54:17 +02:00
Milan Pavlik
df234b9126 [usage] Add warning alert when usage reconciliation has not been successful for more than an hour 2022-09-15 13:45:16 +02:00
Milan Pavlik
22909356d3 [usage] Alert on Usage and Invoice Reconciliations 2022-09-15 09:09:16 +02:00
Milan Pavlik
49fc7b9187 [usage] Update usage dashboard to track scheduled jobs 2022-09-14 15:47:15 +02:00
Milan Pavlik
b3d61ac9ca [dashboards] Fix gRPC/Server variable selection 2022-09-14 11:06:15 +02:00
Milan Pavlik
197d9b897c [dashboards] Add gRPC / Server dashboard 2022-09-12 03:33:12 +02:00
Thomas Schubart
fc2b4422c6 Import network limiting dashboard 2022-09-09 13:59:24 +02:00
Arthur Silva Sens
b884c9465c Add dashboard URL for KubeCPUOvercommit and KubeMemoryOvercommit 2022-09-09 08:03:24 +02:00
ArthurSens
9b382b6f69 Fix PrometheusRule name
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-08 20:31:23 +02:00
ArthurSens
7c354c9a38 Replace workspace alerts from jsonnet to YAML
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-08 15:46:23 +02:00
ArthurSens
e0bed466e7 Replace webapp alerts from jsonnet to YAML
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-08 14:07:23 +02:00
ArthurSens
bb38ad3ba1 Remove leeway build instructions
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-08 13:33:23 +02:00
ArthurSens
28b014fdc1 Replace IDE alerts from jsonnet to raw YAML
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-08 13:33:23 +02:00
Anton Kosyakov
dc9fbe40a7 [code-browser] extensions observability 2022-09-05 12:35:20 +02:00
ArthurSens
894359d4a4 Add prometheus-operator alerts
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 13:54:17 +02:00
ArthurSens
3b94dd1e63 Add Prometheus alerts
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 13:54:17 +02:00
Gero Posmyk-Leinemann
81c50611b4 [ops] WebApp: Fix WebAppServicesCrashlooping 2022-09-02 10:02:17 +02:00
Gero Posmyk-Leinemann
b2d0edde79 [ops] WebApp: Remove rate(memory): rate(gauge) does not work 2022-09-02 10:02:17 +02:00
Gero Posmyk-Leinemann
328f48664b [ops] WebApp: Fix alert WebsocketConnectionRateHigh by using a rate(total) instead of rate(gauge) 2022-09-02 10:02:17 +02:00
ArthurSens
d31b43fed3 Add kube-state-metrics alerts
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 09:33:17 +02:00
ArthurSens
18a533db56 Create alertmanager alerts
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 09:33:17 +02:00
ArthurSens
1d013d794c Add alerts for kubernetes nodes
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-09-02 09:26:18 +02:00
mustard
2f3b432707 [observability] add cluster selector for browser overview dashboard 2022-09-01 08:30:16 +02:00
Gero Posmyk-Leinemann
73cbd09b66 [ops] WebApp: review comments 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
cb30274ccc [ops] WebApp: Alert on services crashlooping 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
ddf5651b7c [ops] WebApp: Alerts on excessive RAM and CPU usage 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
8701732104 [ops] WebApp: alert if db-sync is not running 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
4035d1242a [ops] WebApp: alert on messagebus not running 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
d394c24727 [ops] WebApp: high websocket connection rate 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
5a00651ffd [ops] WebApp: Internal alert on JSON RPC error rates 2022-08-31 16:07:16 +02:00
Gero Posmyk-Leinemann
313b522c59 [ops] Meta Overview/server: Fix unit of "API Request Error rate" to be reqps 2022-08-31 16:07:16 +02:00
Thomas Schubart
ccb148f2a6 [observability] Add dashboard for network limiting 2022-08-31 10:27:15 +02:00
ArthurSens
aee56a583b Add alerts related to kubernetes resources
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-08-30 22:20:15 +02:00