Gero Posmyk-Leinemann
c025034894
[cleanup] Remove kedge
2022-05-27 14:51:45 +05:30
ArthurSens
49de2b95a0
Add example dashboard for monitoring self-hosted
2022-05-26 21:52:45 +05:30
Pavel Tumik
a6d5ec1055
[observability] add horizontal line for load avg panel
2022-05-24 19:19:39 +05:30
Pavel Tumik
ad8d971176
[alerts] add alert when autoscaler adds nodes rapidly
2022-05-19 12:55:34 +05:30
Prince Rachit Sinha
5045e85f2a
[observability] Add alerts for pending phase
2022-05-02 16:39:18 +05:30
Milan Pavlik
0b2aa0d34b
[wsm-bridge] Dashboard with health metrics
2022-04-29 18:59:15 +05:30
Prince Rachit Sinha
6f55a53d4e
[observability] Add filtering n update link n vars
2022-04-27 17:09:13 +05:30
Prince Rachit Sinha
4515e369fc
[observability] Fix incorrect datasource (#9543)
2022-04-26 11:46:49 +02:00
Prince Rachit Sinha
994ddfe03e
[observability] Fix Dashboard Error and add Nodepool
2022-04-26 14:07:38 +05:30
Prince Rachit Sinha
de0c0e80a4
[observability] Add GitpodWorkspacesNotStarting alert
2022-04-26 08:53:38 +05:30
Milan Pavlik
75bd9fc4be
[ws-man-bridge] Fix Workspace Instance status update dashboard to work across web-app clusters
2022-04-25 17:33:37 +05:30
Prince Rachit Sinha
d7bf168392
[Observability] Add node resource consumption dashboards
Add Grafana dashboard with id 11074 and update it to include cluster
level filtering and translation from Chinese to English.
2022-04-22 13:17:34 +05:30
Prince Rachit Sinha
64fbd1e841
[observability] Add alert rule for high ws failure
2022-04-18 22:09:31 +05:30
Prince Rachit Sinha
1d937922ca
[observability] Update success criteria formula
2022-04-06 22:22:20 +05:30
Prince Rachit Sinha
568ed40373
[dashboard] Update variable refresh option
2022-04-05 03:12:18 +05:30
Prince Rachit Sinha
d1c610e55d
[dashboard] Update success criteria dashboard
2022-04-04 15:24:18 +05:30
Prince Rachit Sinha
d73025c49b
[observability] Update agent smith graph
2022-03-24 11:31:07 +05:30
Prince Rachit Sinha
306c7e9179
[observability] Add agent-smith egress violations graph
2022-03-17 18:01:23 +05:30
Gero Posmyk-Leinemann
76ef1af38e
[ops] Add alert 'InstanceStartFailures'
2022-03-07 22:30:14 +05:30
Gero Posmyk-Leinemann
8aa11bd566
[ops] WebApp Overview: Add graph for "Instance Starts Success/Failure Rate"
2022-03-07 22:30:14 +05:30
Prince Rachit Sinha
aea24d85f8
Update runbook url for GitpodWsDaemonExcessiveGC
2022-03-01 21:19:08 +05:30
Pavel Tumik
ebb2a33667
change alert period for ws stuck on starting or stopping to 20m
2022-03-01 06:22:08 +05:30
Manuel Alejandro de Brito Fontes
90fe82a508
Remove ghost from the codebase
2022-02-28 14:17:07 +05:30
Prince Rachit Sinha
a48e177120
Add alerts for excessive GC of ws-daemon
2022-02-14 11:25:35 +01:00
Prince Rachit Sinha
95592d00d8
Update run book ref for GitpodWorkspaceTooManyRegularNotActive
2022-02-09 20:04:31 +01:00
Prince Rachit Sinha
2a3e4d60f3
Update GitpodWorkspaceTooManyRegularNotActive severity level
2022-02-09 20:04:31 +01:00
Mads Hartmann
dd8b5b728a
Remove OWNERS related files
Fixes https://github.com/gitpod-io/ops/issues/844
2022-02-08 09:15:30 +01:00
Pavel Tumik
8c7cb822ed
add alert for conntrack table getting full
2022-02-08 04:42:30 +01:00
Pavel Tumik
a33a4a08a8
[observability] add coredns dashboard
2022-02-04 20:36:26 +01:00
ArthurSens
f6575f7f91
observability/mixins: Add Makefile step that generates dashboards
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-02-02 11:32:24 +01:00
ArthurSens
ac87de0a4c
observability/mixins: Remove dashboard linter
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-02-02 11:23:24 +01:00
Laurie T. Malau
9ad62ce3fa
Fix server dashboard default time range
2022-01-31 13:38:23 +01:00
Thomas Schubart
b396c23d3a
Show difference between agent-smith and ws-manager
2022-01-31 10:28:22 +01:00
Thomas Schubart
c694273d37
Update dashboard
2022-01-31 10:28:22 +01:00
Pavel Tumik
2fb5775ef7
add metric to track failed manifest requests from registry-facade
2022-01-31 10:11:22 +01:00
ArthurSens
d5f92dc8e9
Update monitoring-satellite documentation
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-01-28 10:54:20 +01:00
George Tsiolis
31dfc5bd6b
Update WebApp team label in component owners
2022-01-26 10:17:18 +01:00
Jan Koehnlein
d30815e685
[owners] rename team meta to webapp
2022-01-26 08:27:17 +01:00
Manuel Alejandro de Brito Fontes
82d786e2bb
Remove ws-scheduler
2022-01-24 20:08:17 +01:00
Kyle Brennan
71f543110f
Trigger node high load warnings sooner
2022-01-21 22:24:14 +01:00
Thomas Schubart
c62ec6633b
Trigger node high load warnings sooner
2022-01-21 17:41:13 +01:00
Thomas Schubart
ae1d476e34
Update workspace success criteria
The threshold line for workspace startup 95% case should be 40 seconds,
not 30 seconds.
2022-01-20 10:27:12 +01:00
Laurie T. Malau
ea76aec273
Add metric and plug in
2022-01-13 15:52:06 +01:00
Manuel Alejandro de Brito Fontes
4935b242b7
Remove workspace deployment
2022-01-01 13:34:55 +01:00
Kyle Brennan
821d463fb9
Helm is needed to support Observability
2021-12-23 22:43:47 +01:00
Kyle Brennan
efe25d96f2
Fixes #7335
Handle "no data" by adding 'on() vector(0)' to each numerator
Relies on new variable $datasource
Also fixes legend for workspace startup panel
When exporting from Grafana, disable "export for sharing externally"
2021-12-23 22:43:47 +01:00
Gero Posmyk-Leinemann
893036754e
[ops] Meta: Add alert ServerEventLoopLagTooHigh
2021-12-18 12:06:42 +01:00
Gero Posmyk-Leinemann
05ec1e39a8
[ops] Dashboard: Fix all server dashboard queries
2021-12-18 12:06:42 +01:00
Gero Posmyk-Leinemann
f21bd2fa59
[ops] Dashboard: fix Meta Overview
2021-12-18 12:06:42 +01:00
Prince Rachit Sinha
6285314e0e
Add K3s cluster autoscaler dashboard
2021-12-16 07:36:40 +01:00