311 Commits

Author SHA1 Message Date
Mads Hartmann
7b7282e427 Add text panel with link to Preview Start SLO 2022-06-20 17:36:29 +05:30
Anton Kosyakov
bbee5a8df2 [jb] actually fix dashboard 2022-06-17 13:49:26 +05:30
Anton Kosyakov
92ddbcdf24 [jb] fix dashboard 2022-06-17 13:00:26 +05:30
Jan Keromnes
1858e5d61b Remove critical alert GitpodWsDaemonExcessiveGC > 60s (but keep the non-critical warning for now) 2022-06-16 20:56:26 +05:30
Anton Kosyakov
4ff0c7e3b8 [jb]: monitor low memory notifications and gc overhead 2022-06-16 19:41:25 +05:30
Anton Kosyakov
d3dd3be018 Update ssh gw dashboard
Simplify it to show most important info:
- throughput of valid and suspicious traffic
- total failure ratios and error ratios by kind
2022-06-16 17:45:25 +05:30
Arthur Silva Sens
0d02877ace Update preview environment dashboard 2022-06-09 15:39:19 +05:30
Pudong Zheng
931aed87d2 [observability] add SSH gateway overview dashboard 2022-06-08 08:59:17 +05:30
ArthurSens
3a52ba0d10 Add dashboard to monitor preview environments
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-06-06 21:04:17 +05:30
Moritz Eysholdt
65beac91df Fix runbook URL 2022-06-03 22:59:52 +05:30
André Duarte
135a7de765 [observability] Add SLI numbers to the Workspace Success Criteria Dashboard 2022-06-01 17:44:50 +05:30
André Duarte
24031c69ea Improve appearance of Workspace Success Criteria Dashboard 2022-05-30 22:09:48 +05:30
André Duarte
4be3bc9643 Improve Workspace Success Criteria Dashboard
* Ignore non-production clusters (i.e. `prod-meta-.*` and `ephemeral.*`)
* Create a view that represents the overall P50 and P95 workspace startup times
* Group "By Cluster" views in a collapsable row
2022-05-30 22:09:48 +05:30
Gero Posmyk-Leinemann
c025034894 [cleanup] Remove kedge 2022-05-27 14:51:45 +05:30
ArthurSens
49de2b95a0 Add example dashboard for monitoring self-hosted 2022-05-26 21:52:45 +05:30
Pavel Tumik
a6d5ec1055 [observability] add horizontal line for load avg panel 2022-05-24 19:19:39 +05:30
Pavel Tumik
ad8d971176 [alerts] add alert when autoscaler adds nodes rapidly 2022-05-19 12:55:34 +05:30
Prince Rachit Sinha
5045e85f2a [observability] Add alerts for pending phase 2022-05-02 16:39:18 +05:30
Milan Pavlik
0b2aa0d34b [wsm-bridge] Dashboard with health metrics 2022-04-29 18:59:15 +05:30
Prince Rachit Sinha
6f55a53d4e [observability] Add filtering n update link n vars 2022-04-27 17:09:13 +05:30
Prince Rachit Sinha
4515e369fc [observability] Fix incorrect datasource (#9543) 2022-04-26 11:46:49 +02:00
Prince Rachit Sinha
994ddfe03e [observability] Fix Dashboard Error and add Nodepool 2022-04-26 14:07:38 +05:30
Prince Rachit Sinha
de0c0e80a4 [observability] Add GitpodWorkspacesNotStarting alert 2022-04-26 08:53:38 +05:30
Milan Pavlik
75bd9fc4be [ws-man-bridge] Fix Workspace Instance status update dashboard to work across web-app clusters 2022-04-25 17:33:37 +05:30
Prince Rachit Sinha
d7bf168392 [Observability] Add node resource consumption dashboards
Add grafana dashboard with id 11074 and update it to include cluster
level filtering and translation of chinese language to english.
2022-04-22 13:17:34 +05:30
Prince Rachit Sinha
64fbd1e841 [observability] Add alert rule for high ws failure 2022-04-18 22:09:31 +05:30
Prince Rachit Sinha
1d937922ca [observability] Update success criteria formula 2022-04-06 22:22:20 +05:30
Prince Rachit Sinha
568ed40373 [dashboard] Update variable refresh option 2022-04-05 03:12:18 +05:30
Prince Rachit Sinha
d1c610e55d [dashboard] Update success criteria dashboard 2022-04-04 15:24:18 +05:30
Prince Rachit Sinha
d73025c49b [observability] Update agent smith graph 2022-03-24 11:31:07 +05:30
Prince Rachit Sinha
306c7e9179 [observability] Add agent-smith egress violations graph 2022-03-17 18:01:23 +05:30
Gero Posmyk-Leinemann
76ef1af38e [ops] Add alert 'InstanceStartFailures' 2022-03-07 22:30:14 +05:30
Gero Posmyk-Leinemann
8aa11bd566 [ops] WebApp Overview: Add graph for "Instance Starts Success/Failure Rate" 2022-03-07 22:30:14 +05:30
Prince Rachit Sinha
aea24d85f8 Update runbook url for GitpodWsDaemonExcessiveGC 2022-03-01 21:19:08 +05:30
Pavel Tumik
ebb2a33667 change alert period for ws stuck on starting or stopping to 20m 2022-03-01 06:22:08 +05:30
Manuel Alejandro de Brito Fontes
90fe82a508 Remove ghost from the codebase 2022-02-28 14:17:07 +05:30
Prince Rachit Sinha
a48e177120 Add alerts for excessive GC of ws-daemon 2022-02-14 11:25:35 +01:00
Prince Rachit Sinha
95592d00d8 Update run book ref for GitpodWorkspaceTooManyRegularNotActive 2022-02-09 20:04:31 +01:00
Prince Rachit Sinha
2a3e4d60f3 Update GitpodWorkspaceTooManyRegularNotActive severity level 2022-02-09 20:04:31 +01:00
Mads Hartmann
dd8b5b728a Remove OWNERS related files
Fixes https://github.com/gitpod-io/ops/issues/844
2022-02-08 09:15:30 +01:00
Pavel Tumik
8c7cb822ed add alert for conntrack table getting full 2022-02-08 04:42:30 +01:00
Pavel Tumik
a33a4a08a8 [observability] add coredns dashboard 2022-02-04 20:36:26 +01:00
ArthurSens
f6575f7f91 observability/mixins: Add Makefile step that generates dashboards
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-02-02 11:32:24 +01:00
ArthurSens
ac87de0a4c observability/mixins: Remove dashboard linter
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-02-02 11:23:24 +01:00
Laurie T. Malau
9ad62ce3fa Fix server dashboard default time range 2022-01-31 13:38:23 +01:00
Thomas Schubart
b396c23d3a Show difference between agent-smith and ws-manager 2022-01-31 10:28:22 +01:00
Thomas Schubart
c694273d37 Update dashboard 2022-01-31 10:28:22 +01:00
Pavel Tumik
2fb5775ef7 add metric to track failed manifest requests from registry-facade 2022-01-31 10:11:22 +01:00
ArthurSens
d5f92dc8e9 Update monitoring-satellite documentation
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-01-28 10:54:20 +01:00
George Tsiolis
31dfc5bd6b Update WebApp team label in component owners 2022-01-26 10:17:18 +01:00