362 Commits

Author SHA1 Message Date
Thomas Schubart
7c41572bc9
[wsman-mk2] Remove mk1 from workspace success (#17497) 2023-05-25 17:37:59 +08:00
Wouter Verlaek
3bdd523046
[dashboards] Fix min/avg/max-over-time aggregate (#17536) 2023-05-08 20:48:46 +08:00
Thomas Schubart
44e5152607
[wsman-mk2] Make mk2 dashboard the default (#17496) 2023-05-04 17:59:42 +08:00
Thomas Schubart
e8a3c4e3bc
[obs] Include p01, p25 and p75 in success criteria (#17468)
* [obs] Include p25 and p75 in success criteria

* [obs] Include p01 workspace startup
2023-05-03 17:23:41 +08:00
Kyle Brennan
4cd2d5519f
[obs] change startup time rate to rate interval (#17453)
A rate of 5m makes the graph too dense, and it doesn't match the overview dashboard's heatmap.
2023-05-02 22:56:40 +08:00
Kyle Brennan
e952b75dd6
[obs] right align image builds in the Running Workspaces pane (#17454)
Without this change, it becomes difficult to see image builds, because they're plotted on the left Y axis rather than the right.
2023-05-02 19:44:40 +08:00
Thomas Schubart
958db8be9a
[obs] Fix casing of names (#17418) 2023-04-27 22:23:35 +08:00
Thomas Schubart
bea298ae17
[wsman-mk2] Include in workspace success criteria (#17375) 2023-04-27 03:39:35 +08:00
Thomas Schubart
307a7502e1
[wsman-mk2] Add overview dashboard (#17368) 2023-04-27 03:38:35 +08:00
Thomas Schubart
c55d1f911e
[wsman-mk2] Add alerts for ws-manager-mk2 (#17362) 2023-04-26 00:07:46 +08:00
Milan Pavlik
b371e258ed
Remove payment endpoint component (#17278)
* Remove payment endpoint component

* fix

* Fix
2023-04-18 22:15:51 +08:00
Wouter Verlaek
7050e289b4
[ws-manager-mk2] Dashboard controller heatmaps (WKS-21) (#17093)
* [ws-manager-mk2] Dashboard controller heatmaps

* [ws-daemon] Use heatmaps
2023-04-03 10:28:43 +02:00
Wouter Verlaek
e7b89d60d6
[ws-manager-mk2] Dashboard improvements (#17120) 2023-03-31 23:32:41 +02:00
Gero Posmyk-Leinemann
0095dcefd8
[prometheus] Remove references to db_write (#17041) 2023-03-28 20:49:26 +02:00
Huiwen
d9f1988f81
[observability] add supervisor dashboard (#17031) 2023-03-25 14:24:23 +01:00
Milan Pavlik
dbc8574c50
[redis] Adjust dashboard to include variables for instances (#16963) 2023-03-22 10:04:13 +01:00
Milan Pavlik
55719ede91
[redis] Add grafana dashboard (#16939) 2023-03-21 14:02:14 +01:00
Huiwen
77764b8d83
[observability] add supervisor dashboard (#16853) 2023-03-17 09:27:09 +01:00
Milan Pavlik
b3ca36eb2b
[spicedb] Add grafana dashboard (#16722) 2023-03-09 09:10:45 +01:00
Wouter Verlaek
a9810d6a0a
[ws-manager-mk2] Fix race where pod gets recreated in Stopped phase (#16622)
* [ws-manager-mk2] Fix race where pod gets recreated in Stopped phase

* [ws-manager-mk2] Add pod creation logs

* Change to Patch
2023-03-02 13:27:59 +01:00
Wouter Verlaek
cf0dd5571f
[ws-manager-mk2] Show start failures in dashboard, show daemon ctrl metrics (#16612) 2023-03-01 12:13:58 +01:00
Milan Pavlik
5317b53ef8
[db-sync] Remove comment references (#16602) 2023-03-01 11:06:58 +01:00
Milan Pavlik
dade6f7e9f
[db-sync] Remove alerts and dashboards (#16584) 2023-02-28 13:46:58 +01:00
Wouter Verlaek
d827a2b9dd
[ws-manager-mk2] Add queue depth and work duration panels (#16555) 2023-02-24 13:47:54 +01:00
Wouter Verlaek
733c37b2f8
[ws-manager-mk2] Import dashboard (#16532) 2023-02-23 15:12:53 +01:00
Wouter Verlaek
7440f00796
[ws-manager-mk2] Add Grafana dashboard (#16455)
* [ws-manager-mk2] Add Grafana dashboard

* [ws-manager-mk2] Add reconciliations by controller panel
2023-02-23 00:19:52 +01:00
Milan Pavlik
a02a5d9db8 [alert] Page on failing workspace starts 2023-02-17 13:23:21 +01:00
Kyle Brennan
598b5372e8 [obs] Refactor alerts for image builds
For the last 30 days:

GitpodImagebuildDoneSuccess would have triggered once, on January 26, if set to 2h instead of 4h. A customer was potentially struggling with the outer loop. We hit a 60% error rate in Pyrra briefly: https://pyrra.gitpod.io/objectives?expr={__name__=%22workspace-imagebuild-buildsdone-success-ratio%22,%20namespace=%22monitoring-central%22,%20team=%22workspace%22}&grouping={}&from=1673297716785&to=1675716916785

GitpodImagebuildStartSuccess would have fired once, on Jan 8, when GCP was having scaling issues, and would have been correct to do so. https://gitpod.slack.com/archives/C01TNS8EVQT/p1673173223060219

Removed the warnings because they're unnecessary. Why? Pyrra sends them now for SLOs to #team-workspace-alerts.
2023-02-16 14:51:21 +01:00
Milan Pavlik
7a8f76f9e5 [ws-man-bridge] Adjust CPU alert to provide better signal 2023-02-16 14:17:20 +01:00
Milan Pavlik
994debf5c0 [dashboard] k8s applications 2023-02-16 08:56:21 +01:00
Kyle Brennan
fc1b4af8e0 [obs] Temporarily avoid imageBuildFailure reason
Why? This alert fires too often and is generally a false positive. In other words, in its current form, it's not a signal of a system failure.
2023-02-07 07:52:45 +01:00
Milan Pavlik
4628ccb5e6 [grafana] Cleanup server component dashboard 2023-01-27 16:27:34 +01:00
Milan Pavlik
961a3c33ed [alerts] Exclude all of 2xx, 3xx, 4xx from JSON RPC API Error Rates 2023-01-25 16:37:32 +01:00
Milan Pavlik
8b88c8f99d [dashboards] Fix double comma 2023-01-25 16:15:33 +01:00
Milan Pavlik
324b8d4950 [dashboard] Migrate server dashboard to timeseries visualization 2023-01-25 14:31:33 +01:00
Milan Pavlik
63817fdff0 [alerts] Reduce trigger duration for Stripe Webhook Failure alert 2023-01-23 11:45:30 +01:00
utam0k
33e6d1f540 obs: Make AutoscaleFailure go down to warning level 2023-01-20 06:20:27 +01:00
Milan Pavlik
51c4adf124 [obs] Adjust CPU alert thresholds for webapp services 2023-01-18 15:07:26 +01:00
Milan Pavlik
dec43f11fe [obs] Fix webapp monitoring rule names 2023-01-18 14:25:25 +01:00
Milan Pavlik
0ceaa6532f [webapp] Group CPU alerts by deployment 2023-01-17 10:06:25 +01:00
Wouter Verlaek
b32eb221e7 Switch image builds axis on overview dashboard 2023-01-12 19:34:52 +01:00
Wouter Verlaek
e3ce970423 [observability] Add image build rate panels 2023-01-09 17:00:48 +01:00
Kyle Brennan
f08784fbc8 [obs] fix image-builder-mk3 dashboard 2022-12-26 02:24:34 +01:00
Kyle Brennan
c01d43b809 [obs] move blobserve from Workspace to IDE 2022-12-26 02:22:34 +01:00
ArthurSens
5d96084625 Delete unused PrometheusRules
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2022-12-14 04:38:23 -03:00
mustard
0576091fe1 [observability] add job variable for grpc client 2022-12-14 03:53:23 -03:00
Andrea Falzetti
729e0d8aa7 [ide-service]: update grafana dashboard
Co-authored-by: Victor Nogueira <victor@gitpod.io>
2022-12-09 06:56:18 -03:00
Pudong Zheng
fc6355a8d2 [observability] fix datasource in preview environment 2022-12-09 06:54:19 -03:00
Christian Weichel
478a75e744 Switch license to AGPL 2022-12-08 13:05:19 -03:00
Pudong Zheng
422c7cb690 [observability] fix ide-service dashboard 2022-12-08 05:37:18 -03:00