Milan Pavlik
|
a02a5d9db8
|
[alert] Page on failing workspace starts
|
2023-02-17 13:23:21 +01:00 |
|
Kyle Brennan
|
598b5372e8
|
[obs] Refactor alerts for image builds
For the last 30 days:
GitpodImagebuildDoneSuccess would have triggered once, on January 26, if set to 2h instead of 4h; a customer was potentially struggling with the outer loop. We briefly hit a 60% error rate in Pyrra: https://pyrra.gitpod.io/objectives?expr={__name__=%22workspace-imagebuild-buildsdone-success-ratio%22,%20namespace=%22monitoring-central%22,%20team=%22workspace%22}&grouping={}&from=1673297716785&to=1675716916785
GitpodImagebuildStartSuccess would have fired once, on January 8, when GCP was having scaling issues, and would have been correct to do so. https://gitpod.slack.com/archives/C01TNS8EVQT/p1673173223060219
Removed the warnings because they're unnecessary: Pyrra now sends SLO warnings to #team-workspace-alerts.
|
2023-02-16 14:51:21 +01:00 |
|
Milan Pavlik
|
7a8f76f9e5
|
[ws-man-bridge] Adjust CPU alert to provide better signal
|
2023-02-16 14:17:20 +01:00 |
|
Milan Pavlik
|
994debf5c0
|
[dashboard] k8s applications
|
2023-02-16 08:56:21 +01:00 |
|
Kyle Brennan
|
fc1b4af8e0
|
[obs] Temporarily avoid imageBuildFailure reason
Why? This alert fires too often and is generally a false positive. In other words, in its current form, it's not a signal of a system failure.
|
2023-02-07 07:52:45 +01:00 |
|
Milan Pavlik
|
4628ccb5e6
|
[grafana] Cleanup server component dashboard
|
2023-01-27 16:27:34 +01:00 |
|
Milan Pavlik
|
961a3c33ed
|
[alerts] Exclude all of 2xx, 3xx, 4xx from JSON RPC API Error Rates
|
2023-01-25 16:37:32 +01:00 |
|
Milan Pavlik
|
8b88c8f99d
|
[dashboards] Fix double comma
|
2023-01-25 16:15:33 +01:00 |
|
Milan Pavlik
|
324b8d4950
|
[dashboard] Migrate server dashboard to timeseries visualization
|
2023-01-25 14:31:33 +01:00 |
|
Milan Pavlik
|
63817fdff0
|
[alerts] Reduce trigger duration for Stripe Webhook Failure alert
|
2023-01-23 11:45:30 +01:00 |
|
utam0k
|
33e6d1f540
|
obs: Downgrade AutoscaleFailure to warning level
|
2023-01-20 06:20:27 +01:00 |
|
Milan Pavlik
|
51c4adf124
|
[obs] Adjust CPU alert thresholds for webapp services
|
2023-01-18 15:07:26 +01:00 |
|
Milan Pavlik
|
dec43f11fe
|
[obs] Fix webapp monitoring rule names
|
2023-01-18 14:25:25 +01:00 |
|
Milan Pavlik
|
0ceaa6532f
|
[webapp] Group CPU alerts by deployment
|
2023-01-17 10:06:25 +01:00 |
|
Wouter Verlaek
|
b32eb221e7
|
Switch image builds axis on overview dashboard
|
2023-01-12 19:34:52 +01:00 |
|
Wouter Verlaek
|
e3ce970423
|
[observability] Add image build rate panels
|
2023-01-09 17:00:48 +01:00 |
|
Kyle Brennan
|
f08784fbc8
|
[obs] fix image-builder-mk3 dashboard
|
2022-12-26 02:24:34 +01:00 |
|
Kyle Brennan
|
c01d43b809
|
[obs] move blobserve from Workspace to IDE
|
2022-12-26 02:22:34 +01:00 |
|
ArthurSens
|
5d96084625
|
Delete unused PrometheusRules
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
|
2022-12-14 04:38:23 -03:00 |
|
mustard
|
0576091fe1
|
[observability] add job variable for grpc client
|
2022-12-14 03:53:23 -03:00 |
|
Andrea Falzetti
|
729e0d8aa7
|
[ide-service]: update grafana dashboard
Co-authored-by: Victor Nogueira <victor@gitpod.io>
|
2022-12-09 06:56:18 -03:00 |
|
Pudong Zheng
|
fc6355a8d2
|
[observability] fix datasource in preview environment
|
2022-12-09 06:54:19 -03:00 |
|
Christian Weichel
|
478a75e744
|
Switch license to AGPL
|
2022-12-08 13:05:19 -03:00 |
|
Pudong Zheng
|
422c7cb690
|
[observability] fix ide-service dashboard
|
2022-12-08 05:37:18 -03:00 |
|
Pavel Tumik @ GitPod
|
11b9774e3a
|
[alerts] improve autoscale alert to provide actual reason for failure in alert message
|
2022-12-07 22:49:17 -03:00 |
|
Milan Pavlik
|
227beab32b
|
[alerts] Usage - increase duration of expression to 30m
|
2022-12-07 05:01:17 -03:00 |
|
ArthurSens
|
4f8927deea
|
Increase scrape interval to decrease datapoints per minute
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
|
2022-12-06 12:19:16 -03:00 |
|
Victor Nogueira
|
d04294d791
|
Add IDE Service Dashboard
Co-authored-by: Anton Kosyakov <anton@gitpod.io>
|
2022-12-06 10:04:16 -03:00 |
|
Kyle Brennan
|
e845faae3c
|
Update operations/observability/mixins/workspace/rules/central/nodes.yaml
Co-authored-by: Wouter Verlaek <wouter@gitpod.io>
|
2022-12-02 05:56:01 -03:00 |
|
Kyle Brennan
|
171ec14d53
|
Add alerts for image build success rate
|
2022-12-02 05:56:01 -03:00 |
|
Kyle Brennan
|
fc3586b5e2
|
Change GitpodWorkspaceStuckOnStarting to GitpodWorkspaceStuckOnStopped
|
2022-12-02 05:56:01 -03:00 |
|
Kyle Brennan
|
603446291f
|
Reduce noise from GitpodWorkspaceNodeHighNormalizedLoadAverage now that we have network limiting and PSI, so it's more actionable
|
2022-12-02 05:56:01 -03:00 |
|
ArthurSens
|
d2eea10fbc
|
Drop unused ArgoCD Metrics
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
|
2022-12-01 14:44:00 -03:00 |
|
Pudong Zheng
|
71118933d8
|
[observability] merge all IDE kinds for IDE startup time
|
2022-11-30 06:54:59 -03:00 |
|
Pudong Zheng
|
82eaa40d3a
|
[observability] add IDE startup time dashboard
|
2022-11-28 09:00:57 -03:00 |
|
Pudong Zheng
|
580e20fd20
|
[observability] add ide startup time metrics
|
2022-11-23 11:51:53 -03:00 |
|
Thomas Schubart
|
6469258f28
|
Update StuckOnStopping alert
|
2022-11-21 14:19:51 -03:00 |
|
mustard
|
47865a0c76
|
[observability] add alert for upstream down
|
2022-11-15 19:14:45 +02:00 |
|
ArthurSens
|
793877aa5f
|
Create resources to monitor Pyrra
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
|
2022-11-15 17:35:45 +02:00 |
|
mustard
|
6fba3543bb
|
[observability] add vscode install extension failure alert rule
|
2022-11-11 20:03:41 +02:00 |
|
Arthur Silva Sens
|
77c6026f65
|
Fix conflicting PrometheusRules
|
2022-11-10 16:37:40 +02:00 |
|
ArthurSens
|
c0c6b3a150
|
Fix syntax errors
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
|
2022-11-09 17:58:39 +02:00 |
|
mustard
|
ebe2dad066
|
[observability] add threads and latency charts
|
2022-11-09 11:10:39 +01:00 |
|
Kyle Brennan
|
6ff821261d
|
Reduce noise for GitpodWorkspaceNodeHighNormalizedLoadAverage
|
2022-11-07 23:10:37 +01:00 |
|
ArthurSens
|
ebed98a31c
|
Split workspace alerts into central and satellite
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
|
2022-11-07 23:10:37 +01:00 |
|
ArthurSens
|
88bfdb998a
|
Prepare workspace alerts to centralized alerting
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
|
2022-11-07 23:10:37 +01:00 |
|
Anton Kosyakov
|
e2f03743b5
|
[openvsx-mirror] add incoming requests monitoring
|
2022-11-07 11:53:36 +01:00 |
|
Jean Pierre
|
9e6f2f64b6
|
Fix galleryHost label
|
2022-11-04 16:39:09 +01:00 |
|
Pudong Zheng
|
b835a407e3
|
[observability] Adjusting openvsx-mirror and code-browser dashboard
|
2022-11-04 14:42:08 +01:00 |
|
Pudong Zheng
|
1e2ec46e64
|
[observability] Adjusting openvsx-mirror dashboard
|
2022-11-04 12:18:09 +01:00 |
|