Gero Posmyk-Leinemann
|
0095dcefd8
|
[prometheus] Remove references to db_write (#17041)
|
2023-03-28 20:49:26 +02:00 |
|
Huiwen
|
d9f1988f81
|
[observability] add supervisor dashboard (#17031)
|
2023-03-25 14:24:23 +01:00 |
|
Milan Pavlik
|
dbc8574c50
|
[redis] Adjust dashboard to include variables for instances (#16963)
|
2023-03-22 10:04:13 +01:00 |
|
Milan Pavlik
|
55719ede91
|
[redis] Add grafana dashboard (#16939)
|
2023-03-21 14:02:14 +01:00 |
|
Huiwen
|
77764b8d83
|
[observability] add supervisor dashboard (#16853)
|
2023-03-17 09:27:09 +01:00 |
|
Milan Pavlik
|
b3ca36eb2b
|
[spicedb] Add grafana dashboard (#16722)
|
2023-03-09 09:10:45 +01:00 |
|
Wouter Verlaek
|
a9810d6a0a
|
[ws-manager-mk2] Fix race where pod gets recreated in Stopped phase (#16622)
* [ws-manager-mk2] Fix race where pod gets recreated in Stopped phase
* [ws-manager-mk2] Add pod creation logs
* Change to Patch
|
2023-03-02 13:27:59 +01:00 |
|
Wouter Verlaek
|
cf0dd5571f
|
[ws-manager-mk2] Show start failures in dashboard, show daemon ctrl metrics (#16612)
|
2023-03-01 12:13:58 +01:00 |
|
Milan Pavlik
|
5317b53ef8
|
[db-sync] Remove comment references (#16602)
|
2023-03-01 11:06:58 +01:00 |
|
Milan Pavlik
|
dade6f7e9f
|
[db-sync] Remove alerts and dashboards (#16584)
|
2023-02-28 13:46:58 +01:00 |
|
Wouter Verlaek
|
d827a2b9dd
|
[ws-manager-mk2] Add queue depth and work duration panels (#16555)
|
2023-02-24 13:47:54 +01:00 |
|
Wouter Verlaek
|
733c37b2f8
|
[ws-manager-mk2] Import dashboard (#16532)
|
2023-02-23 15:12:53 +01:00 |
|
Wouter Verlaek
|
7440f00796
|
[ws-manager-mk2] Add Grafana dashboard (#16455)
* [ws-manager-mk2] Add Grafana dashboard
* [ws-manager-mk2] Add reconciliations by controller panel
|
2023-02-23 00:19:52 +01:00 |
|
Milan Pavlik
|
a02a5d9db8
|
[alert] Page on failing workspace starts
|
2023-02-17 13:23:21 +01:00 |
|
Kyle Brennan
|
598b5372e8
|
[obs] Refactor alerts for image builds
For the last 30 days:
GitpodImagebuildDoneSuccess would have triggered once, on January 26 if set to 2h, instead of 4h. A customer was potentially struggling with the outer loop. We hit a 60% error rate in Pyrra briefly: https://pyrra.gitpod.io/objectives?expr={__name__=%22workspace-imagebuild-buildsdone-success-ratio%22,%20namespace=%22monitoring-central%22,%20team=%22workspace%22}&grouping={}&from=1673297716785&to=1675716916785
GitpodImagebuildStartSuccess would have fired once, on Jan 8, when GCP was having scaling issues, and would have been correct to do so. https://gitpod.slack.com/archives/C01TNS8EVQT/p1673173223060219
Removed the warnings because they're unnecessary. Why? Pyrra sends them now for SLOs to #team-workspace-alerts.
|
2023-02-16 14:51:21 +01:00 |
|
Milan Pavlik
|
7a8f76f9e5
|
[ws-man-bridge] Adjust CPU alert to provide better signal
|
2023-02-16 14:17:20 +01:00 |
|
Milan Pavlik
|
994debf5c0
|
[dashboard] k8s applications
|
2023-02-16 08:56:21 +01:00 |
|
Kyle Brennan
|
fc1b4af8e0
|
[obs] Temporarily avoid imageBuildFailure reason
Why? This alert fires too often / is generally a false positive. In other words, in its current form, it's not a signal of a system failure.
|
2023-02-07 07:52:45 +01:00 |
|
Milan Pavlik
|
4628ccb5e6
|
[grafana] Cleanup server component dashboard
|
2023-01-27 16:27:34 +01:00 |
|
Milan Pavlik
|
961a3c33ed
|
[alerts] Exclude all of 2xx, 3xx, 4xx from JSON RPC API Error Rates
|
2023-01-25 16:37:32 +01:00 |
|
Milan Pavlik
|
8b88c8f99d
|
[dashboards] Fix double comma
|
2023-01-25 16:15:33 +01:00 |
|
Milan Pavlik
|
324b8d4950
|
[dashboard] Migrate server dashboard to timeseries visualization
|
2023-01-25 14:31:33 +01:00 |
|
Milan Pavlik
|
63817fdff0
|
[alerts] Reduce trigger duration for Stripe Webhook Failure alert
|
2023-01-23 11:45:30 +01:00 |
|
utam0k
|
33e6d1f540
|
obs: Make AutoscaleFailure go down to warning level
|
2023-01-20 06:20:27 +01:00 |
|
Milan Pavlik
|
51c4adf124
|
[obs] Adjust CPU alert thresholds for webapp services
|
2023-01-18 15:07:26 +01:00 |
|
Milan Pavlik
|
dec43f11fe
|
[obs] Fix webapp monitoring rule names
|
2023-01-18 14:25:25 +01:00 |
|
Milan Pavlik
|
0ceaa6532f
|
[webapp] Group CPU alerts by deployment
|
2023-01-17 10:06:25 +01:00 |
|
Wouter Verlaek
|
b32eb221e7
|
Switch image builds axis on overview dashboard
|
2023-01-12 19:34:52 +01:00 |
|
Wouter Verlaek
|
e3ce970423
|
[observability] Add image build rate panels
|
2023-01-09 17:00:48 +01:00 |
|
Kyle Brennan
|
f08784fbc8
|
[obs] fix image-builder-mk3 dashboard
|
2022-12-26 02:24:34 +01:00 |
|
Kyle Brennan
|
c01d43b809
|
[obs] move blobserve from Workspace to IDE
|
2022-12-26 02:22:34 +01:00 |
|
ArthurSens
|
5d96084625
|
Delete unused PrometheusRules
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
|
2022-12-14 04:38:23 -03:00 |
|
mustard
|
0576091fe1
|
[observability] add job variable for grpc client
|
2022-12-14 03:53:23 -03:00 |
|
Andrea Falzetti
|
729e0d8aa7
|
[ide-service]: update grafana dashboard
Co-authored-by: Victor Nogueira <victor@gitpod.io>
|
2022-12-09 06:56:18 -03:00 |
|
Pudong Zheng
|
fc6355a8d2
|
[observability] fix datasource in preview environment
|
2022-12-09 06:54:19 -03:00 |
|
Christian Weichel
|
478a75e744
|
Switch license to AGPL
|
2022-12-08 13:05:19 -03:00 |
|
Pudong Zheng
|
422c7cb690
|
[observability] fix ide-service dashboard
|
2022-12-08 05:37:18 -03:00 |
|
Pavel Tumik @ GitPod
|
11b9774e3a
|
[alerts] improve autoscale alert to provide actual reason for failure in alert message
|
2022-12-07 22:49:17 -03:00 |
|
Milan Pavlik
|
227beab32b
|
[alerts] Usage - increase duration of expression to 30m
|
2022-12-07 05:01:17 -03:00 |
|
ArthurSens
|
4f8927deea
|
Increase scrape interval to decrease datapoints per minute
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
|
2022-12-06 12:19:16 -03:00 |
|
Victor Nogueira
|
d04294d791
|
Add IDE Service Dashboard
Co-authored-by: Anton Kosyakov <anton@gitpod.io>
|
2022-12-06 10:04:16 -03:00 |
|
Kyle Brennan
|
e845faae3c
|
Update operations/observability/mixins/workspace/rules/central/nodes.yaml
Co-authored-by: Wouter Verlaek <wouter@gitpod.io>
|
2022-12-02 05:56:01 -03:00 |
|
Kyle Brennan
|
171ec14d53
|
Add alerts for image build success rate
|
2022-12-02 05:56:01 -03:00 |
|
Kyle Brennan
|
fc3586b5e2
|
Change GitpodWorkspaceStuckOnStarting to GitpodWorkspaceStuckOnStopped
|
2022-12-02 05:56:01 -03:00 |
|
Kyle Brennan
|
603446291f
|
Reduce noise with GitpodWorkspaceNodeHighNormalizedLoadAverage now that we have network limiting and PSI so it's more actionable
|
2022-12-02 05:56:01 -03:00 |
|
ArthurSens
|
d2eea10fbc
|
Drop unused ArgoCD Metrics
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
|
2022-12-01 14:44:00 -03:00 |
|
Pudong Zheng
|
71118933d8
|
[observability] merge all ide kind for ide startup time
|
2022-11-30 06:54:59 -03:00 |
|
Pudong Zheng
|
82eaa40d3a
|
[observability] add IDE startup time dashboard
|
2022-11-28 09:00:57 -03:00 |
|
Pudong Zheng
|
580e20fd20
|
[observability] add ide startup time metrics
|
2022-11-23 11:51:53 -03:00 |
|
Thomas Schubart
|
6469258f28
|
Update StuckOnStopping alert
|
2022-11-21 14:19:51 -03:00 |
|