298 Commits

Author SHA1 Message Date
Gero Posmyk-Leinemann
c025034894 [cleanup] Remove kedge 2022-05-27 14:51:45 +05:30
ArthurSens
49de2b95a0 Add example dashboard for monitoring self-hosted 2022-05-26 21:52:45 +05:30
Pavel Tumik
a6d5ec1055 [observability] add horizontal line for load avg panel 2022-05-24 19:19:39 +05:30
Pavel Tumik
ad8d971176 [alerts] add alert when autoscaler adds nodes rapidly 2022-05-19 12:55:34 +05:30
Prince Rachit Sinha
5045e85f2a [observability] Add alerts for pending phase 2022-05-02 16:39:18 +05:30
Milan Pavlik
0b2aa0d34b [wsm-bridge] Dashboard with health metrics 2022-04-29 18:59:15 +05:30
Prince Rachit Sinha
6f55a53d4e [observability] Add filtering n update link n vars 2022-04-27 17:09:13 +05:30
Prince Rachit Sinha
4515e369fc [observability] Fix incorrect datasource (#9543) 2022-04-26 11:46:49 +02:00
Prince Rachit Sinha
994ddfe03e [observability] Fix Dashboard Error and add Nodepool 2022-04-26 14:07:38 +05:30
Prince Rachit Sinha
de0c0e80a4 [observability] Add GitpodWorkspacesNotStarting alert 2022-04-26 08:53:38 +05:30
Milan Pavlik
75bd9fc4be [ws-man-bridge] Fix Workspace Instance status update dashboard to work across web-app clusters 2022-04-25 17:33:37 +05:30
Prince Rachit Sinha
d7bf168392 [Observability] Add node resource consumption dashboards
Add Grafana dashboard with id 11074 and update it to include cluster
level filtering and translation of Chinese language to English.
2022-04-22 13:17:34 +05:30
Prince Rachit Sinha
64fbd1e841 [observability] Add alert rule for high ws failure 2022-04-18 22:09:31 +05:30
Prince Rachit Sinha
1d937922ca [observability] Update success criteria formula 2022-04-06 22:22:20 +05:30
Prince Rachit Sinha
568ed40373 [dashboard] Update variable refresh option 2022-04-05 03:12:18 +05:30
Prince Rachit Sinha
d1c610e55d [dashboard] Update success criteria dashboard 2022-04-04 15:24:18 +05:30
Prince Rachit Sinha
d73025c49b [observability] Update agent smith graph 2022-03-24 11:31:07 +05:30
Prince Rachit Sinha
306c7e9179 [observability] Add agent-smith egress violations graph 2022-03-17 18:01:23 +05:30
Gero Posmyk-Leinemann
76ef1af38e [ops] Add alert 'InstanceStartFailures' 2022-03-07 22:30:14 +05:30
Gero Posmyk-Leinemann
8aa11bd566 [ops] WebApp Overview: Add graph for "Instance Starts Success/Failure Rate" 2022-03-07 22:30:14 +05:30
Prince Rachit Sinha
aea24d85f8 Update runbook url for GitpodWsDaemonExcessiveGC 2022-03-01 21:19:08 +05:30
Pavel Tumik
ebb2a33667 change alert period for ws stuck on starting or stopping to 20m 2022-03-01 06:22:08 +05:30
Manuel Alejandro de Brito Fontes
90fe82a508 Remove ghost from the codebase 2022-02-28 14:17:07 +05:30
Prince Rachit Sinha
a48e177120 Add alerts for excessive GC of ws-daemon 2022-02-14 11:25:35 +01:00
Prince Rachit Sinha
95592d00d8 Update run book ref for GitpodWorkspaceTooManyRegularNotActive 2022-02-09 20:04:31 +01:00
Prince Rachit Sinha
2a3e4d60f3 Update GitpodWorkspaceTooManyRegularNotActive severity level 2022-02-09 20:04:31 +01:00
Mads Hartmann
dd8b5b728a Remove OWNERS related files
Fixes https://github.com/gitpod-io/ops/issues/844
2022-02-08 09:15:30 +01:00
Pavel Tumik
8c7cb822ed add alert for conntrack table getting full 2022-02-08 04:42:30 +01:00
Pavel Tumik
a33a4a08a8 [observability] add coredns dashboard 2022-02-04 20:36:26 +01:00
ArthurSens
f6575f7f91 observability/mixins: Add Makefile step that generates dashboards
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-02-02 11:32:24 +01:00
ArthurSens
ac87de0a4c observability/mixins: Remove dashboard linter
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-02-02 11:23:24 +01:00
Laurie T. Malau
9ad62ce3fa Fix server dashboard default time range 2022-01-31 13:38:23 +01:00
Thomas Schubart
b396c23d3a Show difference between agent-smith and ws-manager 2022-01-31 10:28:22 +01:00
Thomas Schubart
c694273d37 Update dashboard 2022-01-31 10:28:22 +01:00
Pavel Tumik
2fb5775ef7 add metric to track failed manifest requests from registry-facade 2022-01-31 10:11:22 +01:00
ArthurSens
d5f92dc8e9 Update monitoring-satellite documentation
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-01-28 10:54:20 +01:00
George Tsiolis
31dfc5bd6b Update WebApp team label in component owners 2022-01-26 10:17:18 +01:00
Jan Koehnlein
d30815e685 [owners] rename team meta to webapp 2022-01-26 08:27:17 +01:00
Manuel Alejandro de Brito Fontes
82d786e2bb Remove ws-scheduler 2022-01-24 20:08:17 +01:00
Kyle Brennan
71f543110f Trigger node high load warnings sooner 2022-01-21 22:24:14 +01:00
Thomas Schubart
c62ec6633b Trigger node high load warnings sooner 2022-01-21 17:41:13 +01:00
Thomas Schubart
ae1d476e34 Update workspace success criteria
The threshold line for workspace startup 95% case should be 40 seconds,
not 30 seconds.
2022-01-20 10:27:12 +01:00
Laurie T. Malau
ea76aec273 Add metric and plug in 2022-01-13 15:52:06 +01:00
Manuel Alejandro de Brito Fontes
4935b242b7 Remove workspace deployment 2022-01-01 13:34:55 +01:00
Kyle Brennan
821d463fb9 Helm is needed to support Observability 2021-12-23 22:43:47 +01:00
Kyle Brennan
efe25d96f2 Fixes #7335
Handle "no data" by adding 'on() vector(0)' to each numerator
Relies on new variable $datasource
Also fixes legend for workspace startup panel
When exporting from Grafana, disable "export for sharing externally"
2021-12-23 22:43:47 +01:00
Gero Posmyk-Leinemann
893036754e [ops] Meta: Add alert ServerEventLoopLagTooHigh 2021-12-18 12:06:42 +01:00
Gero Posmyk-Leinemann
05ec1e39a8 [ops] Dashboard: Fix all server dashboard queries 2021-12-18 12:06:42 +01:00
Gero Posmyk-Leinemann
f21bd2fa59 [ops] Dashboard: fix Meta Overview 2021-12-18 12:06:42 +01:00
Prince Rachit Sinha
6285314e0e Add K3s cluster autoscaler dashboard 2021-12-16 07:36:40 +01:00