25 Commits

Author SHA1 Message Date
Wouter Verlaek
cb192728d1 Update Go dependencies 2023-02-08 16:47:46 +01:00
Tarun Pothulapati
f05f5bd1c3 [ws-rollout] Add prometheus init check
When given an unusable `--prometheus-url`, we start the
rollout without verifying that Prometheus is reachable. This
is a problem because we cannot fetch metrics from Prometheus,
so the rollout is reverted later, which wastes time
unnecessarily.

This can be prevented by performing a simple check to see whether
Prometheus is reachable. The `up` query is used instead of the
key metrics because we can't be sure that they exist.
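
A minimal sketch of such a reachability check, assuming the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The names `checkPrometheus` and `parseQueryResponse` are hypothetical and are not taken from this commit; the test server only stands in for a real Prometheus so the example runs offline.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
	"time"
)

// queryResponse mirrors the top-level envelope of a Prometheus API response.
type queryResponse struct {
	Status string `json:"status"`
	Error  string `json:"error"`
}

// parseQueryResponse (hypothetical helper) returns nil only when the
// response envelope reports status "success".
func parseQueryResponse(body []byte) error {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return fmt.Errorf("decoding prometheus response: %w", err)
	}
	if resp.Status != "success" {
		return fmt.Errorf("prometheus query failed: status=%q error=%q", resp.Status, resp.Error)
	}
	return nil
}

// checkPrometheus (hypothetical helper) runs the `up` query against baseURL
// and fails if the server is unreachable or rejects the query.
func checkPrometheus(ctx context.Context, client *http.Client, baseURL string) error {
	endpoint := strings.TrimSuffix(baseURL, "/") + "/api/v1/query?" +
		url.Values{"query": {"up"}}.Encode()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
	if err != nil {
		return err
	}
	res, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("prometheus not reachable: %w", err)
	}
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	if err != nil {
		return err
	}
	return parseQueryResponse(body)
}

func main() {
	// Stand-in for a real Prometheus server (assumption for this sketch).
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, `{"status":"success","data":{"resultType":"vector","result":[]}}`)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := checkPrometheus(ctx, srv.Client(), srv.URL); err != nil {
		fmt.Println("aborting rollout:", err)
		return
	}
	fmt.Println("prometheus reachable, starting rollout")
}
```

Running the check before the first rollout step means a bad URL fails immediately instead of surfacing later as missing metrics.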

Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-30 22:04:38 +01:00
Tarun Pothulapati
f3efa0be29 [ws-rollout] Fix Build Versioning
This PR sets the Version field in root correctly
and uses the same value for the cobra CLI.

Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-24 08:10:31 +01:00
Tarun Pothulapati
3f7eb00841 fix go build pkg error
Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
ace1c2f56b rollout: Support rollout step scores that are not factors of 100
This is done by capping the next score at 100:
 `int32(math.Min(float64(r.currentScore+r.rolloutStep), 100))`
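
As an illustration of the capping expression above, here is a small standalone sketch. The function name `nextScore` and the step value of 30 are assumptions made for the example; only the `math.Min` expression comes from the commit message.

```go
package main

import (
	"fmt"
	"math"
)

// nextScore (hypothetical helper) advances the score by rolloutStep and
// caps the result at 100, so steps that do not divide 100 still finish.
func nextScore(currentScore, rolloutStep int32) int32 {
	return int32(math.Min(float64(currentScore+rolloutStep), 100))
}

func main() {
	// With a step of 30 the score goes 30, 60, 90 and is then capped at 100.
	score := int32(0)
	for score < 100 {
		score = nextScore(score, 30)
		fmt.Println(score)
	}
}
```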

Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
56ebbfd525 remove mentions of logrus
Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
0a03b1ce35 analysis: Switch to enums in MoveForward
DecisionRevert means to revert
DecisionNoData means that there is no data to make an informed decision
DecisionMoveForward means to move forward based on data
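
A sketch of how these three values could be declared as a Go enum. The constant names come from the message above; the underlying type, the `iota` ordering, and the `String` method are assumptions for illustration.

```go
package main

import "fmt"

// Decision is the result of an analysis step (type name is an assumption).
type Decision int

const (
	DecisionRevert      Decision = iota // revert the rollout
	DecisionNoData                      // not enough data for an informed decision
	DecisionMoveForward                 // data supports moving forward
)

// String makes decisions readable in logs.
func (d Decision) String() string {
	switch d {
	case DecisionRevert:
		return "revert"
	case DecisionNoData:
		return "no data"
	case DecisionMoveForward:
		return "move forward"
	default:
		return "unknown"
	}
}

func main() {
	for _, d := range []Decision{DecisionRevert, DecisionNoData, DecisionMoveForward} {
		fmt.Println(d)
	}
}
```

Compared with a boolean return, a three-valued enum lets the caller distinguish "the data is bad" from "there is no data yet".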

Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
65203f34f1 Update README.md & grammatical improvements
Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
250c4493bf hack: never re-run the job on error
Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
46592082a0 analysis: Use prometheusURL instead of port-forward'ing
This also removes the unnecessary FindAnyPodOwnedBy helper function
that was added

Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
98892234ff job: return the right exit status
Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
14e015390f rollout: don't wait for the first ticker
Also fix unit test regression

Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
5b2f604e4a rollout: revert on interrupt signal
Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
061167dba7 return correctly after rollout or failure
Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
2f8cbb78d9 replace error ratio analyzer with workspace key metrics analyzer
This commit also removes the old error rate analyzer.

Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
ed79a89d6b analysis: Add new No Data Variant & Target Based Metric Analysis
# `No Data` Variant

This commit adds a new variant that the `Analysis.MoveForward`
function can return. This variant is used to indicate that there
isn't enough data to make a concrete decision about the rollout.

This is coupled with `Rollout.OkayScoreUntilNoData`, which lets the
rollout move forward up to a specific score even when there is
no data, so that data can actually be generated. If there is still
no data after `OkayScoreUntilNoData` is reached, we roll back,
because we would not be making an informed rollout. When data is
available, we move forward if it is positive and roll back if it
is negative.
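
The rule described above could be sketched as follows. The function `shouldMoveForward`, the `Decision` type, and the strict `<` comparison against the threshold are assumptions for illustration, not the code from this commit.

```go
package main

import "fmt"

// Decision is a hypothetical three-valued analysis result.
type Decision int

const (
	DecisionRevert Decision = iota
	DecisionNoData
	DecisionMoveForward
)

// shouldMoveForward (hypothetical helper) reports whether the rollout may
// continue. Without data it continues only while the current score is still
// below okayScoreUntilNoData; after that, missing data causes a rollback.
func shouldMoveForward(d Decision, currentScore, okayScoreUntilNoData int32) bool {
	switch d {
	case DecisionMoveForward:
		return true
	case DecisionNoData:
		return currentScore < okayScoreUntilNoData
	default:
		return false
	}
}

func main() {
	const okayScoreUntilNoData = 20
	for _, score := range []int32{0, 10, 20, 30} {
		fmt.Printf("score=%d noData -> continue=%v\n",
			score, shouldMoveForward(DecisionNoData, score, okayScoreUntilNoData))
	}
}
```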

# Target Based Metric Analysis

In this commit, we add a new `ErrorRatioAnalyzer` that calculates
the success percentage as
((totalRequests - errorRequests)/totalRequests) * 100 and compares it
with the target percentage provided by the user. This means users can
specify a target percentage (e.g. 99%) at which a new cluster
can be considered safe. If the success percentage is below the
target, we roll back.
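
A worked sketch of the formula above. The function names, the use of `float64`, and the handling of zero total requests as "no data" are assumptions for illustration; the real `ErrorRatioAnalyzer` may differ.

```go
package main

import "fmt"

// successPercentage applies ((total - errors) / total) * 100.
// The second result is false when there are no requests to analyze.
func successPercentage(totalRequests, errorRequests float64) (float64, bool) {
	if totalRequests <= 0 {
		return 0, false
	}
	return ((totalRequests - errorRequests) / totalRequests) * 100, true
}

// meetsTarget (hypothetical helper) reports whether the success percentage
// reaches targetPercentage, and whether there was any data at all.
func meetsTarget(totalRequests, errorRequests, targetPercentage float64) (bool, bool) {
	pct, ok := successPercentage(totalRequests, errorRequests)
	if !ok {
		return false, false
	}
	return pct >= targetPercentage, true
}

func main() {
	// 200 requests with 50 errors: 150/200 succeeded.
	pct, _ := successPercentage(200, 50)
	safe, hasData := meetsTarget(200, 50, 99)
	fmt.Printf("success=%.1f%% safe=%v hasData=%v\n", pct, safe, hasData)
}
```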

Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
7d89d674b6 rollout: Handle analysis error gracefully
Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
6bad28232a cleanup: move settings out
Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
05f06ee744 Make the job work inside a cluster
Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
4681062c30 rollout: remove error wrapping and waiting in tests
Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
c134bd8798 add hack directory for rollout job
Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
9267d33f1a add new prometheus metrics endpoint using common-go/base-server
Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
f24191c769 Move rollout and analysis into separate packages
This also includes:
- make actions using RolloutAction Interface
- abstract out the analysis logic into a separate package
- bugfix: don't close the channel
- working prototype with metric analysis
- logs refactor

Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
13f6345a3c add prometheus package for metric analysis
Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00
Tarun Pothulapati
db8b7b7ec2 Add new `workspace-rollout-job` to components
Basic rollout works, but without metric analysis

Signed-off-by: Tarun Pothulapati <tarun@gitpod.io>
2023-01-23 18:01:31 +01:00