Skip to main content

Statetrace in 10 minutes

Statetrace Core is the free version of Statetrace. There are four major steps to configure it:

  • Configuring your application Postgres database
  • Setup the buffer database
  • Setup Statetrace
  • Change application to annotate transactions

There is no better way to learn than to see a complete example. For a more detailed walkthough, follow along below.

Get the Statetrace License

Go to https://www.statetrace.com/statetrace-core to get your free STATETRACE_LICENSE key. No email or sign-up required.

Configuring Statetrace

Statetrace runs inside a container. The current official image is statetraceofficial/statetrace-beta Either run the container with podman or docker.

docker run \
--net="host"\
-e STATETRACE_DATABASE_URL='postgres://postgres:postgres@localhost:5432/postgres' \
-e STATETRACE_SECRET_KEY_BASE='<...>' \
-e STATETRACE_LICENSE='<...>' \
-e STATETRACE_GQL_API_KEY='<...>' \
-e STATETRACE_INBOUND_CONFIG="$(cat ./statetrace-config.yaml)" \
-p 4000:4000 \
-it \
statetraceofficial/statetrace-beta

The following environment variables are required to configure the statetrace daemon.

  • STATETRACE_DATABASE_URL the database url of where to store the statetrace metadata and buffer information
  • STATETRACE_SECRET_KEY_BASE a randomly generated 32 character string (used for cryptographically hashing columns)
  • STATETRACE_LICENSE the license key downloaded from https://www.statetrace.com/.
  • STATETRACE_INBOUND_CONFIG the configuration of what databases to connect to. See config for more info.
  • STATETRACE_GQL_API_KEY a randomly generated 32 character string to use as authentication for the GQL API

Generating Random Keys

For STATETRACE_SECRET_KEY_BASE and STATETRACE_GQL_API_KEY you can use whatever method you want to generate the string or use the following linux command below:

head -c 500 /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1

Configuring your application database

Statetrace works by acting as a replica connected to your database. Having a direct source of changes enables statetrace to accurately record your data's story. Each database requires a bit of extra configuration.

Postgres

Your postgres application database will need to be configured so that the WAL level is set to logical. You will need to set the following variables:

  • wal_level=logical
  • max_wal_senders=2
  • max_replication_slots=2

In a production environment set max_wal_senders and max_replication_slots higher if you have other replicas attached to postgres. See specific instructions for RDS and Google Cloud SQL.

Mysql

MySQL requires GTID and metadata to be enabled. To enable the features required for Statetrace, run MySQL with the following options:

  • --gtid_mode=ON
  • --enforce_gtid_consistency=ON
  • --log_bin=ON
  • --binlog_format=ROW
  • --binlog_row_image=FULL
  • --server-id=1
  • --binlog-row-metadata=FULL
  • --binlog_rows_query_log_events=ON

In a production environment set max_wal_senders and max_replication_slots higher if you have other replicas attached to postgres. See specific instructions for RDS and Google Cloud SQL.

Setting up the Statetrace Buffer Database

Statetrace stores events and session information in a seperate Postgres datatabase as a durable buffer. This database can be pruned as events are dispatched to the outbound handlers.

Nothing special regarding the WAL needs to be configured for the buffer database, however it should be a seperate database from your application DB.

Complete configuration example

Putting this all together depends on your platform. As an example, the following is a docker-compose file that shows how to configure statetrace for use with Postgres.

version: '3'

services:
# This is the database to which your application would connect to store application data
application_db:
image: postgres
environment:
- POSTGRES_DB=postgres
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres

# We need to start postgres with special WAL settings
command:
- "postgres"
- "-c"
- "wal_level=logical"
- "-c"
- "max_wal_senders=2"
- "-c"
- "max_replication_slots=2"
ports:
- "5432:5432"

# This is the statetrace daemon that will monitor changes
statetrace:
image: statetraceofficial/statetrace-beta
environment:
STATETRACE_DATABASE_URL: postgres://postgres:postgres@statetrace_db:5432/postgres
STATETRACE_SECRET_KEY_BASE: <...>
STATETRACE_LICENSE: <...>
STATETRACE_GQL_API_KEY: <...>
STATETRACE_INBOUND_CONFIG: |
inbounds:
- name: Postgres DB
database_type: postgres
host: application_db
username: postgres
password: postgres
port: 5432
database: postgres
publication: statetrace
slot: statetrace
log_new_attributes: true
tables:
- schema: public
name: secret_table
include: false
- schema: public
name: users
columns:
social_security_number:
hash_value_before_logging: true
outbounds:
- name: Postgres Search
handler: search
STATETRACE_PORT: 4000
depends_on:
- statetrace_db
- application_db
ports:
- "4000:4000"

# This is the database that stores the buffer and meta-information
statetrace_db:
image: postgres
environment:
- POSTGRES_DB=postgres
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres

Annotating information

There are already some ready-made integrations to use annotations with your favorite framework, however statetrace works effortlessly on anything.

Statetrace integrates on the transaction level. To annotate your changes in your application, simply insert an entry into a special "statetrace_annotations" table in your application's database.

Steps to annotating

To setup annotations, you will need to create a table in your application database:

CREATE TABLE statetrace_annotations(
id TEXT PRIMARY KEY,
kind TEXT,
meta JSON,
payload JSON,
);

This table can be safely truncated at any time because Statetrace reads its values from the logical replication log.

Now insert the session annotation as JSON data. This requires a unique id that you will later use to link transactions to this session.

BEGIN;
INSERT INTO
statetrace_annotations
(id, kind, meta, payload)
VALUES (
(
'bc573b1d-5ed0-4018-bcf4-74ce1b23875d',
'_st.app.sess',
'{}'::JSON,
'{"session_actor_email": "[email protected]", "session_actor_id": "123"}'::JSON,
)
);
COMMIT;

You only need to insert a session once. Now your application business logic can be wrapped in a transaction that associates it with the session ID from above.

BEGIN;

--Do all of your work
INSERT ... VALUES ...;
UPDATE ... WHERE ...;
DELETE ... WHERE ...;

-- Annotate transaction with link to session
INSERT INTO
statetrace_annotations
(id, kind, meta, payload)
VALUES (
(
'9c4c25de-1bed-4037-b981-e3127fadfc32',
'_st.app.act',
'{}'::JSON,
'{"action_session_id": "bc573b1d-5ed0-4018-bcf4-74ce1b23875d"}'::JSON,
)
);
COMMIT;

For more information about what additional information can be passed along in the annotations, follow the guide.

Using the GQL API

To access the buffer and search information, Statetrace provides a GraphQL API. This API can be accessed by submitting HTTP requests to the endpoint http://localhost:4000/graphql/api.

To quickly try out the API we can use GraphiQL. GraphiQL is an optional GUI for running GQL queries. It is unsafe to run in production, however to use it locally, just enable graphiQL by setting STATETRACE_ENABLE_GRAPHIQL to a truthy value:

version: '3'

...

# This is the statetrace daemon that will monitor changes
statetrace:
image: statetraceofficial/statetrace-beta
environment:
- STATETRACE_ENABLE_GRAPHIQL=1

Next go to the graphiql interface: http://localhost:4000/graphql/graphiql.

Now you are ready to run a GQL Query:

query {
scan(inboundName:"Postgres DB") {
rowLsnId
relationSchema
relationName
rowPk
rowRecord
}
}

For a more detailed exploration of the GraphQL API read the reference.