Statetrace Config
The STATETRACE_INBOUND_CONFIG
controls inbounds and outbounds connected to Statetrace.
An example of a STATETRACE_INBOUND_CONFIG
is below:
inbounds:
- name: "Postgres DB"
database_type: "postgres"
host: "application_db"
username: "postgres"
password: "postgres"
port: "5432"
database: "postgres"
publication: "statetrace"
slot: "statetrace"
log_new_attributes: true
tables:
- schema: public
name: secret_table
include: false
- schema: public
name: users
columns:
social_security_number:
hash_value_before_logging: true
outbounds:
- name: Postgres Search
handler: search
Setting the environment
In Docker-Compose
Use the |
operator to easily embed the configuration in docker-compose
...
statetrace:
image: statetraceofficial/statetrace-beta
environment:
- STATETRACE_INBOUND_CONFIG: |
inbounds:
name: Postgres DB
database_type: postgres
...
From a file
Read the Yaml from a file:
export STATETRACE_INBOUND_CONFIG=$(cat ./statetrace-config.yaml)
Configuring Inbounds
Required fields
name
, database_type
, and log_new_attributes
are required for each inbound. Setting log_new_attributes
to true will automatically include any new columns that haven't been configured in the YAML.
MySQL
Connect to MySQL by providing the required fields.
inbounds:
- name: "MySQL DB"
database_type: "mysql"
host: "application_db"
username: "mysql"
password: "mysql"
port: "5432"
database: "mysql"
log_new_attributes: true
Postgres
For postgres publication
and slot
are also required. Statetrace will create these when it frist connects to the database. If you stop using Statetrace you will need to manually cleanup the slot so that the WAL doesn't grow.
inbounds:
- name: "Postgres DB"
database_type: "postgres"
host: "application_db"
username: "postgres"
password: "postgres"
port: "5432"
database: "postgres"
publication: "statetrace"
slot: "statetrace"
log_new_attributes: true
Configuring Outbounds
Assign outbounds to an inbound to consume the changes.
Search
The search outbound will index the rows in the Buffer DB for fast searching with search
GraphQL.
inbounds:
- name: "MySQL DB"
database_type: "mysql"
...
outbounds:
- name: "MySQL Search"
handler: "search"
Redshift
The redshift outbound will stream changes into the data warehouse into the table designated under table_name
inbounds:
- name: "MySQL DB"
database_type: "mysql"
...
outbounds:
- name: "MySQL Redshift"
handler: "redshift"
host: "examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com"
username: "postgres"
password: "postgres"
port: "5439"
database: "dev"
table_name: "statetrace_dump"
Postgres
The postgres outbound is the same as redshift except pipes the changes into a local postgres instance.
inbounds:
- name: "MySQL DB"
database_type: "mysql"
...
outbounds:
- name: "MySQL Postgres"
handler: "postgres"
host: "postgres_datawarehouse"
username: "postgres"
password: "postgres"
port: "5432"
database: "dev"
table_name: "statetrace_dump"
Webhook
The webhook outbound will batch your changes and POST them to a the target_url
. You can further customize the behavior by supplying a request_builder
to supply additional headers or change the method.
inbounds:
- name: "MySQL DB"
database_type: "mysql"
...
outbounds:
- name: "MySQL Webhook"
handler: "webhook"
target_url: "http://localhost:9999/webhook"
request_builder: |
auth_token = System.get_env("AUTH_TOKEN")
fn rows -> %WebhookRequest{headers: [{"Authorization", "Bearer #{auth_token}"}], method: :put, payload: %{rows: rows}} end
Middleware
All Outbounds have a middleware that lets you skip rows from being processed. Below is how you would index only a particular table for search:
inbounds:
- name: "MySQL DB"
database_type: "mysql"
...
outbounds:
- name: "MySQL Search"
handler: "search"
middleware: |
fn row ->
if row.relation_name == "my_special_table" do
:ok
else
:skip
end
end
Controlling tables and columns
You can control what tables and columns Statetrace includes in its buffer by configuring the tables
property.
Don't include a table
Below is how you exclude a table completely from being processed:
inbounds:
- name: "Postgres DB"
database_type: "postgres"
...
tables:
- schema: public
name: secret_table
include: false
Don't include a column
Below is how you exclude a column completely from being processed:
inbounds:
- name: "Postgres DB"
database_type: "postgres"
...
tables:
- schema: public
name: secret_table
columns:
social_security_number:
include: false
Hash a column value
Sometimes you may want a column value to remain secret. Statetrace can hash the value so that the hash is unique to the value and the row (for security you can't compare values across rows).
inbounds:
- name: "Postgres DB"
database_type: "postgres"
...
tables:
- schema: public
name: users
columns:
social_security_number:
hash_value_before_logging: true
The Schema
STATETRACE_INBOUND_CONFIG
is a YAML document with the JSON schema below:
{
"$id": "https://statetrace.com/schemas/config",
"properties": {
"inbounds": {
"items": {
"anyOf": [
{
"properties": {
"database_type": {
"const": "postgres"
},
"publication": {
"type": "string"
},
"slot": {
"type": "string"
}
},
"required": [
"slot",
"publication"
]
},
{
"properties": {
"database_type": {
"const": "mysql"
}
}
}
],
"properties": {
"database": {
"type": "string"
},
"database_type": {
"enum": [
"postgres",
"mysql"
],
"type": "string"
},
"host": {
"type": "string"
},
"log_new_attributes": {
"type": "boolean"
},
"name": {
"type": "string"
},
"outbounds": {
"items": {
"anyOf": [
{
"properties": {
"handler": {
"const": "webhook"
},
"target_url": {
"type": "string"
}
},
"required": [
"target_url"
]
},
{
"properties": {
"handler": {
"const": "search"
}
}
},
{
"properties": {
"database": {
"type": "string"
},
"handler": {
"const": "postgres"
},
"host": {
"type": "string"
},
"name": {
"type": "string"
},
"password": {
"type": "string"
},
"port": {
"type": "integer"
},
"table_name": {
"type": "string"
},
"username": {
"type": "string"
}
},
"required": [
"database",
"username",
"host",
"password",
"port",
"table_name"
]
},
{
"properties": {
"database": {
"type": "string"
},
"handler": {
"const": "redshift"
},
"host": {
"type": "string"
},
"name": {
"type": "string"
},
"password": {
"type": "string"
},
"port": {
"type": "integer"
},
"table_name": {
"type": "string"
},
"username": {
"type": "string"
}
},
"required": [
"database",
"username",
"host",
"password",
"port",
"table_name"
]
}
],
"properties": {
"batch_size": {
"type": "integer"
},
"handler": {
"enum": [
"webhook",
"redshift",
"postgres",
"search"
]
},
"middleware": {
"type": "string"
},
"name": {
"type": "string"
},
"request_builder": {
"type": "string"
}
},
"required": [
"name"
],
"type": "object"
},
"type": "array"
},
"password": {
"type": "string"
},
"port": {
"type": "integer"
},
"tables": {
"items": {
"properties": {
"columns": {
"additionalProperties": {
"properties": {
"hash_value_before_logging": {
"type": "boolean"
},
"include": {
"type": "boolean"
}
},
"type": "object"
},
"type": "object"
},
"include": {
"type": "boolean"
},
"name": {
"type": "string"
},
"schema": {
"type": "string"
}
},
"required": [
"name",
"schema"
],
"type": "object"
},
"type": "array"
},
"username": {
"type": "string"
}
},
"required": [
"database_type",
"database",
"username",
"host",
"password",
"name",
"port",
"log_new_attributes"
],
"type": "object"
},
"type": "array"
}
},
"type": "object"
}