self-hosted/postgres/postgres-entrypoint.sh
Filippo Pacifici 8dc84600c5
feat(cdc): Prepare the self hosted environment for the Change Data Capture pipeline (#938)
We will use Change Data Capture to stream WAL updates from postgres into clickhouse so that features like issue search will be able to join event data and metadata (from postgres) through Snuba.

This requires the followings:

A logical replicaiton plugin to be installed in postgres (https://github.com/getsentry/wal2json)
A service to run that streams from the replication log to Kafka (https://github.com/getsentry/cdc)
Datasets in Snuba.
This PR is preparing postgres to stream updates via the replication log.
The idea is to

download the the replication log plugin binary during install.sh
mount a volume with the binary when starting postgres
providing a new entrypoint to postgres that ensures everything is correctly configured.
There is a difference between how this is set up and how we do the same in the development environment.
In the development environment we download the library from the entrypoint itself and store it in a persistent volume, so we do not have to download it every time.
Unfortunately this does not work here as the postgres image is postgres:9.6 while it is postgres:9.6-alpine. This one does not come with either wget or curl. I don't think installing that in the entrypoint would be a good idea, so the download happens in install.sh. I actually think this way is safer so we never depend on connectivity for postgres to start properly.
2021-05-24 17:51:36 -07:00

47 lines
1.5 KiB
Bash
Executable File

#!/bin/bash
# This script replaces the default docker entrypoint for postgres in the
# development environment.
# Its job is to ensure postgres is properly configured to support the
# Change Data Capture pipeline (by setting access permissions and installing
# the replication plugin we use for CDC). Unfortunately the default
# Postgres image does not allow this level of configurability so we need
# to do it this way in order not to have to publish and maintain our own
# Postgres image.
#
# This then, at the end, transfers control to the default entrypoint.
set -e
prep_init_db() {
cp /opt/sentry/init_hba.sh /docker-entrypoint-initdb.d/init_hba.sh
}
cdc_setup_hba_conf() {
# Ensure pg-hba is properly configured to allow connections
# to the replication slots.
PG_HBA="$PGDATA/pg_hba.conf"
if [ ! -f "$PG_HBA" ]; then
echo "DB not initialized. Postgres will take care of pg_hba"
elif [ "$(grep -c -E "^host\s+replication" "$PGDATA"/pg_hba.conf)" != 0 ]; then
echo "Replication config already present in pg_hba. Not changing anything."
else
# Execute the same script we run on DB initialization
/opt/sentry/init_hba.sh
fi
}
bind_wal2json() {
# Copy the file in the right place
cp /opt/sentry/wal2json/wal2json.so `pg_config --pkglibdir`/wal2json.so
}
echo "Setting up Change Data Capture"
prep_init_db
if [ "$1" = 'postgres' ]; then
cdc_setup_hba_conf
bind_wal2json
fi
exec /docker-entrypoint.sh "$@"