Manuel Alejandro de Brito Fontes
b9db8b349b
Add Summary row to Gitpod overview dashboard
2022-07-20 13:32:15 -03:00
Manuel Alejandro de Brito Fontes
18c764cbac
Add dashboard for swap utilization per cluster and node
2022-07-19 19:40:14 -03:00
ArthurSens
735a30899f
Update Preview env's dashboard with new metrics
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-07-18 16:39:13 +02:00
Pudong Zheng
25c5bfbecb
[alerts] change alert for adding new nodes rapidly to only count if node type is regular workspace
2022-07-18 03:40:13 +02:00
Nandaja Varma
d13bdfd0cd
Improve GitpodWorkspaceTooManyRegularNotActive alert
2022-07-11 06:09:58 +05:30
utam0k
04d945d216
observability: Add an alert for AutoscaleFailure.
2022-07-06 00:36:52 +05:30
JenTing Hsiao
7800a21c4d
[alerts] fix pod/container/namespace not rendering
...
Every time series is uniquely identified by its metric name and
a set of labels, and every unique combination of key-value label pairs
represents a new alert for that time series.
There is no common value for these metrics:
- kube_pod_container_status_restarts_total
- gitpod_ws_manager_workspace_backups_failure_total
Signed-off-by: JenTing Hsiao <hsiaoairplane@gmail.com>
2022-07-01 06:23:39 +05:30
Arthur Silva Sens
028ef2608b
Update overview.json
2022-06-27 13:46:36 +05:30
utam0k
6c2705fbe4
observability: Ring the phone only when a data loss occurs with GitpodWsDaemonCrashLooping
2022-06-23 19:06:32 +05:30
Pavel Tumik
cf35903aff
Apply suggestions from code review
2022-06-23 02:46:31 +05:30
Pavel Tumik
7e0fe457fb
Apply suggestions from code review
2022-06-23 02:46:31 +05:30
utam0k
62859996d5
observability: Add GitpodWorkspaceTooLongTerminating alert.
2022-06-23 02:46:31 +05:30
ArthurSens
9be43de166
Add SLIs to preview-environment dashboard
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-06-22 12:15:31 +05:30
Mads Hartmann
7b7282e427
Add text panel with link to Preview Start SLO
2022-06-20 17:36:29 +05:30
Anton Kosyakov
bbee5a8df2
[jb] actually fix dashboard
2022-06-17 13:49:26 +05:30
Anton Kosyakov
92ddbcdf24
[jb] fix dashboard
2022-06-17 13:00:26 +05:30
Jan Keromnes
1858e5d61b
Remove critical alert GitpodWsDaemonExcessiveGC > 60s (but keep the non-critical warning for now)
2022-06-16 20:56:26 +05:30
Anton Kosyakov
4ff0c7e3b8
[jb]: monitor low memory notifications and gc overhead
2022-06-16 19:41:25 +05:30
Anton Kosyakov
d3dd3be018
Update ssh gw dashboard
...
Simplify it to show most important info:
- throughput of valid and suspicious traffic
- total failure ratios and error ratios by kind
2022-06-16 17:45:25 +05:30
Arthur Silva Sens
0d02877ace
Update preview environment dashboard
2022-06-09 15:39:19 +05:30
Pudong Zheng
931aed87d2
[observability] add SSH gateway overview dashboard
2022-06-08 08:59:17 +05:30
ArthurSens
3a52ba0d10
Add dashboard to monitor preview environments
...
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-06-06 21:04:17 +05:30
Moritz Eysholdt
65beac91df
Fix runbook URL
2022-06-03 22:59:52 +05:30
André Duarte
135a7de765
[observability] Add SLI numbers to the Workspace Success Criteria Dashboard
2022-06-01 17:44:50 +05:30
André Duarte
24031c69ea
Improve appearance of Workspace Success Criteria Dashboard
2022-05-30 22:09:48 +05:30
André Duarte
4be3bc9643
Improve Workspace Success Criteria Dashboard
...
* Ignore non-production clusters (i.e. `prod-meta-.*` and `ephemeral.*`)
* Create a view that represents the overall P50 and P95 workspace startup times
* Group "By Cluster" views in a collapsible row
2022-05-30 22:09:48 +05:30
Gero Posmyk-Leinemann
c025034894
[cleanup] Remove kedge
2022-05-27 14:51:45 +05:30
ArthurSens
49de2b95a0
Add example dashboard for monitoring self-hosted
2022-05-26 21:52:45 +05:30
Pavel Tumik
a6d5ec1055
[observability] add horizontal line for load avg panel
2022-05-24 19:19:39 +05:30
Pavel Tumik
ad8d971176
[alerts] add alert when autoscaler adds nodes rapidly
2022-05-19 12:55:34 +05:30
Prince Rachit Sinha
5045e85f2a
[observability] Add alerts for pending phase
2022-05-02 16:39:18 +05:30
Milan Pavlik
0b2aa0d34b
[wsm-bridge] Dashboard with health metrics
2022-04-29 18:59:15 +05:30
Prince Rachit Sinha
6f55a53d4e
[observability] Add filtering and update link and vars
2022-04-27 17:09:13 +05:30
Prince Rachit Sinha
4515e369fc
[observability] Fix incorrect datasource (#9543)
2022-04-26 11:46:49 +02:00
Prince Rachit Sinha
994ddfe03e
[observability] Fix Dashboard Error and add Nodepool
2022-04-26 14:07:38 +05:30
Prince Rachit Sinha
de0c0e80a4
[observability] Add GitpodWorkspacesNotStarting alert
2022-04-26 08:53:38 +05:30
Milan Pavlik
75bd9fc4be
[ws-man-bridge] Fix Workspace Instance status update dashboard to work across web-app clusters
2022-04-25 17:33:37 +05:30
Prince Rachit Sinha
d7bf168392
[Observability] Add node resource consumption dashboards
...
Add Grafana dashboard with ID 11074 and update it to include cluster-level
filtering and translation from Chinese to English.
2022-04-22 13:17:34 +05:30
Prince Rachit Sinha
64fbd1e841
[observability] Add alert rule for high ws failure
2022-04-18 22:09:31 +05:30
Prince Rachit Sinha
1d937922ca
[observability] Update success criteria formula
2022-04-06 22:22:20 +05:30
Prince Rachit Sinha
568ed40373
[dashboard] Update variable refresh option
2022-04-05 03:12:18 +05:30
Prince Rachit Sinha
d1c610e55d
[dashboard] Update success criteria dashboard
2022-04-04 15:24:18 +05:30
Prince Rachit Sinha
d73025c49b
[observability] Update agent smith graph
2022-03-24 11:31:07 +05:30
Prince Rachit Sinha
306c7e9179
[observability] Add agent-smith egress violations graph
2022-03-17 18:01:23 +05:30
Gero Posmyk-Leinemann
76ef1af38e
[ops] Add alert 'InstanceStartFailures'
2022-03-07 22:30:14 +05:30
Gero Posmyk-Leinemann
8aa11bd566
[ops] WebApp Overview: Add graph for "Instance Starts Success/Failure Rate"
2022-03-07 22:30:14 +05:30
Prince Rachit Sinha
aea24d85f8
Update runbook url for GitpodWsDaemonExcessiveGC
2022-03-01 21:19:08 +05:30
Pavel Tumik
ebb2a33667
change alert period for ws stuck on starting or stopping to 20m
2022-03-01 06:22:08 +05:30
Manuel Alejandro de Brito Fontes
90fe82a508
Remove ghost from the codebase
2022-02-28 14:17:07 +05:30
Prince Rachit Sinha
a48e177120
Add alerts for excessive GC of ws-daemon
2022-02-14 11:25:35 +01:00