We will use Change Data Capture (CDC) to stream WAL updates from postgres into clickhouse so that features like issue search will be able to join event data and metadata (from postgres) through Snuba.
This requires the following:

- A logical replication plugin to be installed in postgres (https://github.com/getsentry/wal2json)
- A service that streams from the replication log to Kafka (https://github.com/getsentry/cdc)
- Datasets in Snuba.
This PR is preparing postgres to stream updates via the replication log.
The idea is to:

- download the replication log plugin binary during `install.sh`,
- mount a volume with the binary when starting postgres,
- provide a new entrypoint to postgres that ensures everything is correctly configured.
There is a difference between how this is set up and how we do the same in the development environment.
In the development environment we download the library from the entrypoint itself and store it in a persistent volume, so we do not have to download it every time.
Unfortunately this does not work here: the postgres image here is `postgres:9.6`, while the development environment uses `postgres:9.6-alpine`. The former comes with neither wget nor curl, and I don't think installing them in the entrypoint would be a good idea, so the download happens in `install.sh`. I actually think this way is safer, as we never depend on connectivity for postgres to start properly.
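Roughly, the `install.sh` side could look like the sketch below; the directory layout and the release asset URL are assumptions:

```bash
# Hedged sketch: fetch the wal2json plugin binary once at install time,
# so postgres never needs network access to start properly.
WAL2JSON_DIR="postgres/wal2json"  # assumed layout
mkdir -p "$WAL2JSON_DIR"
if [ ! -f "$WAL2JSON_DIR/wal2json.so" ]; then
  echo "Downloading wal2json plugin..."
  curl -L --fail -o "$WAL2JSON_DIR/wal2json.so" \
    "https://github.com/getsentry/wal2json/releases/latest/download/wal2json.so"
fi
```

The directory would then be mounted into the postgres container (something like `./postgres/wal2json:/wal2json:ro` in `docker-compose.yml`), and the new entrypoint would copy the plugin into postgres's lib directory and make sure `wal_level = logical` is set before handing off to the stock entrypoint.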
We used to build local images for Sentry services to be able to
include required plugins in the image. With this change we instead
do this in a custom entrypoint script and use the volume `/data`
to store the plugins permanently.
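For illustration, such an entrypoint could look like the sketch below; the `requirements.txt` location, the `/data` subdirectory, and the stock entrypoint path are all assumptions:

```bash
#!/usr/bin/env bash
# Hedged sketch of the custom entrypoint: install plugins into the
# persistent /data volume so they survive container recreation, then
# hand off to the image's original entrypoint.
set -euo pipefail

# Installed packages land under /data via pip's --user mechanism.
export PYTHONUSERBASE=/data/custom-packages
if [ -s /etc/sentry/requirements.txt ]; then
  echo "Installing additional plugins from requirements.txt..."
  pip install --user -r /etc/sentry/requirements.txt
fi

exec /docker-entrypoint.sh "$@"
```

(The same `PYTHONUSERBASE` would have to be set for the service itself so the installed packages are importable at runtime.)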
This should resolve many issues people have around building local
images and pushing them to places like private repositories or swarm
clusters.
This is not 100% compatible with the old way but it should still be
a mostly transparent change to many folks.
* ref(requirements): Add min CPU requirement, relax soft RAM
Adds a minimum requirement of 4 CPU cores, as anything below that performs quite poorly even under light load. Relaxes the soft RAM requirement from 8000 MB to 7800 MB: even when 8 GB of RAM is installed, the system reserves some of it for itself and under-reports the total.
* pass on CI with soft limit
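For reference, the checks could look roughly like this in `install.sh` (the numbers are from the commit above; the `docker info` approach is an assumption):

```bash
# Hedged sketch of the requirement checks (numbers from the text above).
MIN_CPU_COUNT=4
MIN_RAM_SOFT_MB=7800  # soft limit: warn but continue, e.g. on CI

CPU_COUNT=$(docker info --format '{{.NCPU}}')
RAM_MB=$(( $(docker info --format '{{.MemTotal}}') / 1024 / 1024 ))

if [ "$CPU_COUNT" -lt "$MIN_CPU_COUNT" ]; then
  echo "FAIL: at least $MIN_CPU_COUNT CPU cores are required, found $CPU_COUNT"
  exit 1
fi
if [ "$RAM_MB" -lt "$MIN_RAM_SOFT_MB" ]; then
  echo "WARN: at least ${MIN_RAM_SOFT_MB} MB of RAM is recommended, found ${RAM_MB} MB"
fi
```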
Fixes #823. In `install.sh` we build a local Sentry image from the build context under the `./sentry` directory, and that image is used by many services. To avoid building it multiple times, we also give it a specific name that is referenced by those services. The issue is that we also run `docker-compose build --parallel`, which creates a race condition when building this image, as `docker-compose` doesn't check whether the image is already there. This is the root cause of all these random failures: an unsurprising race condition.
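One possible shape of the fix, assuming the image name lives in `$SENTRY_IMAGE` and that the shared image can simply be built first (both assumptions):

```bash
# Hedged sketch: build the shared image once, serially, so the later
# parallel build of the remaining services never races on it.
docker build -t "$SENTRY_IMAGE" ./sentry
docker-compose build --parallel
```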
This is in preparation to make the PY3 version the default for Docker images and self-hosted. It is part **4/5**:
1. ~~Add `-py2` variants for the Python 2 build tags and introduce the `SENTRY_PYTHON2` env variable usage~~ (getsentry/sentry#22460)
2. ~~Switch getsentry/onpremise to Python 3 by default*, introducing the `SENTRY_PYTHON2` env var for Py2 builds via the `-py2` suffix~~ (getsentry/onpremise#763)
3. ~~Move the unsuffixed version of the builds to Python 3~~ (getsentry/sentry#22466)
4. **Remove the `SENTRY_PYTHON3` env var support and `-py3` suffix usage from getsentry/onpremise**
5. Remove tagging of `-py3` builds from getsentry/sentry
This is in preparation to make the PY3 version the default* for Docker images and self-hosted. It is part **2/5**:
1. ~~Add `-py2` variants for the Python 2 build tags and introduce the `SENTRY_PYTHON2` env variable usage~~ (getsentry/sentry#22460)
2. **Switch getsentry/onpremise to Python 3 by default*, introducing the `SENTRY_PYTHON2` env var for Py2 builds via the `-py2` suffix**
3. Move the unsuffixed version of the builds to Python 3
4. Remove the `SENTRY_PYTHON3` env var support and `-py3` suffix usage from getsentry/onpremise
5. Remove tagging of `-py3` builds from here
_* this will only happen when item 3 above gets landed_
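For illustration, the toggle could map to image tags roughly like this in `install.sh` (the exact mechanism is an assumption):

```bash
# Hedged sketch: opt into Python 2 builds via the -py2 suffix.
if [ -n "${SENTRY_PYTHON2:-}" ]; then
  SENTRY_IMAGE="${SENTRY_IMAGE:-getsentry/sentry:nightly}-py2"
  SNUBA_IMAGE="${SNUBA_IMAGE:-getsentry/snuba:nightly}-py2"
fi
```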
We kept getting issue reports that we traced down to `git-bash` which doesn't seem to play nice with Docker for Windows with bind mounts. This PR uses the existence of `$MSYSTEM` to detect `git-bash` and exit early with a relevant message.
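A minimal sketch of the guard (the message wording is an assumption):

```bash
# Hedged sketch: MSYSTEM is set by MSYS2-based shells such as git-bash.
if [ -n "${MSYSTEM:-}" ]; then
  echo "Seems like you are using an MSYS2-based shell such as git-bash."
  echo "git-bash is known to cause issues with Docker bind mounts,"
  echo "please use WSL or cmd.exe instead."
  exit 1
fi
```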
Co-authored-by: Chad Whitacre <chadwhitacre@sentry.io>
This is to ensure a clean shutdown of Celery, with fully drained queues. This is needed as new versions may change the event format in ways that are not backwards compatible. FWIW this is a hacky workaround without a strong guarantee that the queues will be empty. Ideally we'd shut down everything first, spin up the workers, and check for the queues being drained every second or so.
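A sketch of that ideal flow, assuming Redis is the Celery broker and that queue lengths can be read with `LLEN` (the service and queue names are assumptions):

```bash
# Hedged sketch: stop producers first, let the workers drain the queues,
# then shut the workers down cleanly.
$dc stop web relay  # stop accepting new events
for queue in default events; do
  until [ "$($dc exec -T redis redis-cli llen "$queue" | tr -d '\r')" = "0" ]; do
    echo "Waiting for the '$queue' queue to drain..."
    sleep 1
  done
done
$dc stop worker
```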
This fixes a serious bug in `install.sh` where it ignored externally set env variable values such as `SENTRY_IMAGE` in favor of the ones defined in `.env`, essentially making all our e2e tests useless.
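The fix boils down to letting already-exported values win over the `.env` defaults, roughly like this (a sketch; the actual loading code may differ):

```bash
# Hedged sketch: source .env, but let already-set environment variables win.
set -a  # export everything assigned below
while IFS= read -r line; do
  case "$line" in ''|\#*) continue ;; esac  # skip blanks and comments
  var="${line%%=*}"
  # Only take the .env value if the variable isn't already set externally.
  if [ -z "$(printenv "$var")" ]; then
    eval "$line"
  fi
done < .env
set +a
```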
We need `docker-compose ps -a` for CI, so we were already using 1.24.1; this aligns the rest with that.
For Docker, there are a bunch of network-related fixes in 19.03.12 and prior (DNS fallback and IPv6 advertising) that we'd like to have to see if they are going to fix some reported connectivity issues w/ onpremise.
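For reference, a version gate along these lines (the comparison helper is an assumption; the version numbers are the ones above):

```bash
# Hedged sketch of the minimum-version checks.
MIN_DOCKER_VERSION='19.03.12'
MIN_COMPOSE_VERSION='1.24.1'

# Succeeds when $1 >= $2 in version ordering.
ver_gte() { [ "$(printf '%s\n%s' "$1" "$2" | sort -V | head -n1)" = "$2" ]; }

DOCKER_VERSION=$(docker version --format '{{.Server.Version}}')
COMPOSE_VERSION=$(docker-compose version --short)

ver_gte "$DOCKER_VERSION" "$MIN_DOCKER_VERSION" || {
  echo "FAIL: Docker $MIN_DOCKER_VERSION or higher is required, found $DOCKER_VERSION"
  exit 1
}
ver_gte "$COMPOSE_VERSION" "$MIN_COMPOSE_VERSION" || {
  echo "FAIL: docker-compose $MIN_COMPOSE_VERSION or higher is required, found $COMPOSE_VERSION"
  exit 1
}
```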
Fixes https://github.com/getsentry/onpremise/issues/672
I split this PR up into 4 commits. The first one is the bare minimum for the issue. The rest are just consistency corrections that we neckbeards at irc://chat.freenode.net/%23bash would always make.
This is for the onpremise release on Sept 15th.
The new migration system has a migration to handle recreating the
transaction table if the old one is present, we no longer need to do
this in `install.sh`.
This continues on the ideas from #607. By "downtime" here I mean "not accepting events" - web, smtp and background processes are out of scope.
This PR adds a `--minimize-downtime` option to `install.sh`. This option changes the behaviour of the script by:
1. keeping nginx and relay running until the very end,
2. disabling cleanup on exit and failure,
3. explicitly reloading nginx configuration,
4. and starting the whole cluster at the end.
The results are promising: no downtime if the Relay version doesn't change, and only about a second when it does. So far this has only been tested with a curl loop, so I'm still not sure whether Relay flushes its events to Sentry before getting recreated by `$dc up`.
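A sketch of how the flag could reshape the bring-up (service names from this repo; the exact commands are assumptions):

```bash
# Hedged sketch of the --minimize-downtime flow described above.
# (2. disabling cleanup-on-exit would happen elsewhere in the script.)
if [ -n "$MINIMIZE_DOWNTIME" ]; then
  # 1. Recreate everything except the edge, so nginx and relay keep serving.
  $dc up -d --remove-orphans $($dc config --services | grep -Ev '^(nginx|relay)$')
  # 3. Reload the nginx configuration in place instead of restarting it.
  $dc exec nginx nginx -s reload
fi
# 4. Finally bring the whole cluster up, recreating nginx/relay only if needed.
$dc up -d
```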
This is a long-needed test that exercises the whole pipeline from Nginx through Relay to Kafka and Snuba. The final missing piece is testing the Symbolicator integration.
This PR is also a follow up to #576 as it didn't solve the Relay issues fully (the earlier fix was a coincidence or is not as reliable as it seemed).
Fixes #486 (finally?).
This patch adds an `INTERNAL_IPS` definition to `sentry.conf.py` by sniffing the network from eth0, and relies on this for trusted Relays instead of the allowlisted public keys. This removes the need to sync Relay public keys into `sentry.conf.py`.
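A minimal sketch of the idea for `sentry.conf.py`, assuming eth0 is the container's interface, the `ip` tool is available in the image, and Python 3's `ipaddress` module (or its backport) can be used; the helper name is made up:

```python
# Hedged sketch: trust every address on the container network so Relay
# doesn't need its public key allowlisted.
import ipaddress
import subprocess


def get_internal_network():
    # "ip -o -4 addr show dev eth0" prints one line whose fourth column
    # is the interface's CIDR, e.g. "10.0.0.5/24".
    output = subprocess.check_output(
        ["ip", "-o", "-4", "addr", "show", "dev", "eth0"]
    ).decode()
    cidr = output.split()[3]
    return ipaddress.ip_network(cidr, strict=False)


INTERNAL_IPS = tuple(str(ip) for ip in get_internal_network().hosts())
```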
This PR needs getsentry/sentry#19798 to work.
I would like to be able to customize the configuration of my Sentry 10 Symbolicator instance, and this change allows me to do that easily.
See related: https://github.com/getsentry/symbolicator/issues/245
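One way this can look in `docker-compose.yml`, mounting a local config directory into the service (the paths and command are assumptions):

```yaml
# Hedged sketch: point symbolicator at a user-editable config file.
symbolicator:
  volumes:
    - ./symbolicator:/etc/symbolicator:ro
  command: run -c /etc/symbolicator/config.yml
```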
Co-authored-by: Burak Yigit Kaya <ben@byk.im>
Hi,
I've been through quite a few different ways of implementing this fix and settled on storing the result of checking whether the zookeeper copy target folder exists in a variable, and copying the snapshot file only when that folder exists. I've run quite a few manual tests for each option as well. Currently the PR sits on Option 3 of the options below.
**Option 1**
Judging from the [Jira issue](https://issues.apache.org/jira/browse/ZOOKEEPER-3056), it seems like the workaround for zookeeper upgrades could be omitted entirely, since the issue relates to upgrades from v3.4.10 to v3.5.4. I've tested removing the zookeeper workaround entirely, and the install ran smoothly both on a clean install of Sentry (no existing data) and on an install of Sentry with a very minimal amount of log entries (roughly 100). Could we possibly remove the workaround entirely?
**Option 2**
The second option was to simply add a check to the currently [existing line](https://github.com/getsentry/onpremise/blob/master/install.sh#L178) for whether the copy target folder exists, and perform the snapshot file copy only if it does. This is the least amount of code and possibly the simplest fix, while also setting the `ZOOKEEPER_SNAPSHOT_TRUST_EMPTY` env var to `true`; however, some unnecessary calculations would still be done to determine `ZOOKEEPER_LOG_FILE_COUNT` and `ZOOKEEPER_SNAPSHOT_FILE_COUNT`.
**Option 3**
I've created a variable to store whether the copy target folder exists, and the zookeeper upgrade workaround proceeds only if it does. This means that if the copy target folder does not exist, the `ZOOKEEPER_SNAPSHOT_TRUST_EMPTY` env var will not be set.
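For illustration, Option 3 in `install.sh` could look roughly like this (the data path and the `$dcr` docker-compose-run helper are assumptions based on the rest of the script):

```bash
# Hedged sketch of Option 3: gate the whole upgrade workaround on the
# snapshot folder actually existing.
ZOOKEEPER_DATA_DIR="/var/lib/zookeeper/data/version-2"  # assumed path
ZOOKEEPER_SNAPSHOT_FOLDER_EXISTS=$(
  $dcr zookeeper bash -c "[ -d $ZOOKEEPER_DATA_DIR ] && echo 1 || echo 0"
)
if [ "$ZOOKEEPER_SNAPSHOT_FOLDER_EXISTS" = "1" ]; then
  # ...count log/snapshot files, copy the snapshot if needed, and set
  # ZOOKEEPER_SNAPSHOT_TRUST_EMPTY=true as before...
  :
fi
```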
Fixes #547.
Co-authored-by: chamirb <chamirb@globalkinetic.com>