Database

Starnion uses PostgreSQL 16 + the pgvector extension as its primary data store. Text data and 768-dimensional vector embeddings are managed in the same database, implementing a hybrid RAG (vector similarity + full-text search).


Full Schema Overview

PostgreSQL 16 + pgvector
โ”‚
โ”œโ”€โ”€ Auth & Identity
โ”‚   โ”œโ”€โ”€ users                    # Central user table
โ”‚   โ”œโ”€โ”€ platform_identities      # Platform-specific ID mapping (Telegram, web, etc.)
โ”‚   โ””โ”€โ”€ platform_link_codes      # Account link codes (10-minute TTL)
โ”‚
โ”œโ”€โ”€ Conversations
โ”‚   โ”œโ”€โ”€ conversations            # Conversation sessions (includes AI SDK thread_id)
โ”‚   โ””โ”€โ”€ messages                 # Conversation messages (attachments as JSONB)
โ”‚
โ”œโ”€โ”€ Finance
โ”‚   โ”œโ”€โ”€ finances                 # Expense tracker transactions
โ”‚   โ””โ”€โ”€ (budget: profiles.preferences JSONB)
โ”‚
โ”œโ”€โ”€ Personal Records (with vector embeddings)
โ”‚   โ”œโ”€โ”€ daily_logs               # Daily logs and diary entries (vector 768)
โ”‚   โ”œโ”€โ”€ diary_entries            # Diary entries (vector 768)
โ”‚   โ”œโ”€โ”€ memos                    # Memos (vector 768)
โ”‚   โ”œโ”€โ”€ goals                    # Goal management
โ”‚   โ”œโ”€โ”€ goal_checkins            # Goal check-in records
โ”‚   โ””โ”€โ”€ ddays                    # D-Days
โ”‚
โ”œโ”€โ”€ Media & Documents (with vector embeddings)
โ”‚   โ”œโ”€โ”€ documents                # Uploaded document metadata
โ”‚   โ”œโ”€โ”€ document_sections        # Document chunks (vector 768)
โ”‚   โ”œโ”€โ”€ images                   # Image gallery
โ”‚   โ””โ”€โ”€ audios                   # Audio gallery
โ”‚
โ”œโ”€โ”€ Knowledge & Search (with vector embeddings)
โ”‚   โ”œโ”€โ”€ knowledge_base           # Pattern analysis results and knowledge (vector 768)
โ”‚   โ””โ”€โ”€ searches                 # Web search history (vector 768)
โ”‚
โ”œโ”€โ”€ Settings & Integrations
โ”‚   โ”œโ”€โ”€ skills                   # Skill catalog
โ”‚   โ”œโ”€โ”€ user_skills              # Per-user skill activation state
โ”‚   โ”œโ”€โ”€ providers                # LLM provider settings
โ”‚   โ”œโ”€โ”€ personas                 # AI personas
โ”‚   โ”œโ”€โ”€ google_tokens            # Google OAuth2 tokens
โ”‚   โ””โ”€โ”€ integration_keys         # External service API keys
โ”‚
โ”œโ”€โ”€ Channels & Notifications
โ”‚   โ”œโ”€โ”€ channel_settings         # Telegram channel settings
โ”‚   โ”œโ”€โ”€ telegram_approved_contacts  # Telegram approved contacts
โ”‚   โ”œโ”€โ”€ telegram_pairing_requests   # Telegram pairing requests
โ”‚   โ””โ”€โ”€ notifications            # Notification history
โ”‚
โ”œโ”€โ”€ Usage
โ”‚   โ””โ”€โ”€ usage_logs               # LLM token usage logs
โ”‚
โ””โ”€โ”€ Meta
    โ””โ”€โ”€ schema_migrations        # Schema version management

Core Table Details

users โ€” Users

The root table for all user data. Supports both email/password authentication and platform-based authentication.

CREATE TABLE users (
    id            TEXT        PRIMARY KEY,          -- UUID
    display_name  TEXT,
    email         TEXT        UNIQUE,               -- Email auth users only
    password_hash TEXT,                             -- bcrypt
    role          TEXT        DEFAULT 'user',       -- 'admin' | 'user'
    preferences   JSONB       DEFAULT '{}',
    created_at    TIMESTAMPTZ DEFAULT NOW(),
    updated_at    TIMESTAMPTZ DEFAULT NOW()
);

platform_identities โ€” Platform ID Mapping

Maps user IDs from various platforms (Telegram, web, Discord, etc.) to a single user_id.

CREATE TABLE platform_identities (
    user_id        TEXT  REFERENCES users(id),
    platform       TEXT,       -- 'telegram' | 'web' | 'discord' | 'credential'
    platform_id    TEXT,       -- Unique ID within the platform (telegram chat_id, email, etc.)
    display_name   TEXT,
    metadata       JSONB DEFAULT '{}',
    last_active_at TIMESTAMPTZ DEFAULT NOW(),
    PRIMARY KEY (platform, platform_id)
);

conversations / messages โ€” Conversations

Stores conversation sessions. thread_id is used to maintain conversation context in the AI SDK.

CREATE TABLE conversations (
    id         UUID    PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id    TEXT    REFERENCES users(id),
    title      TEXT    DEFAULT 'New Conversation',
    platform   TEXT    DEFAULT 'web',   -- 'web' | 'telegram'
    thread_id  TEXT,                    -- Conversation thread ID
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE messages (
    id              UUID  PRIMARY KEY DEFAULT gen_random_uuid(),
    conversation_id UUID  REFERENCES conversations(id),
    role            TEXT  CHECK (role IN ('user', 'assistant')),
    content         TEXT,
    attachments     JSONB,          -- Array of attachment URLs
    created_at      TIMESTAMPTZ DEFAULT NOW()
);

finances โ€” Expense Tracker

CREATE TABLE finances (
    id          BIGSERIAL PRIMARY KEY,
    user_id     TEXT REFERENCES users(id),
    amount      INTEGER,    -- Amount in KRW. Income: positive, Expense: negative
    category    TEXT,       -- 'food' | 'transport' | 'shopping' | 'income' | etc.
    description TEXT,
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

daily_logs โ€” Daily Logs (Vector Embeddings)

Stores conversation content and diary entries as vectors. This corresponds to Layer 1 of the 4-Layer RAG memory.

CREATE TABLE daily_logs (
    id          BIGSERIAL PRIMARY KEY,
    user_id     TEXT REFERENCES users(id),
    content     TEXT,
    sentiment   TEXT,           -- 'good' | 'neutral' | 'bad' | 'tired' | 'happy'
    embedding   vector(768),    -- Gemini text-embedding-004
    content_tsv tsvector,       -- For full-text search (auto-updated by trigger)
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

-- HNSW index: fast approximate nearest neighbor search
CREATE INDEX ON daily_logs USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Full-text search GIN index
CREATE INDEX ON daily_logs USING gin(content_tsv);

document_sections โ€” Document Chunks (Vector Embeddings)

Stores uploaded documents split into chunks. This corresponds to Layer 3 of the 4-Layer RAG memory.

CREATE TABLE document_sections (
    id          BIGSERIAL PRIMARY KEY,
    document_id BIGINT REFERENCES documents(id),
    content     TEXT,
    embedding   vector(768),
    content_tsv tsvector,
    metadata    JSONB DEFAULT '{}'     -- Page number, position, etc.
);

knowledge_base โ€” Knowledge Base (Vector Embeddings)

Stores spending pattern analysis results, user preferences, and personalization data. This corresponds to Layer 2 of the 4-Layer RAG memory.

CREATE TABLE knowledge_base (
    id          BIGSERIAL PRIMARY KEY,
    user_id     TEXT REFERENCES users(id),
    key         TEXT,   -- Knowledge type (e.g., 'pattern_analysis', 'user_preference')
    value       TEXT,   -- Knowledge content
    source      TEXT,   -- Source skill
    embedding   vector(768),
    content_tsv tsvector,
    created_at  TIMESTAMPTZ DEFAULT NOW()
);

skills / user_skills โ€” Skill Management

CREATE TABLE skills (
    id                 TEXT PRIMARY KEY,    -- Skill ID (e.g., 'finance', 'weather')
    name               TEXT,
    description        TEXT,
    category           TEXT,
    emoji              TEXT DEFAULT '',
    tools              TEXT[] DEFAULT '{}', -- List of tools provided by the skill
    reports            TEXT[] DEFAULT '{}', -- Types of reports generated
    cron_rules         TEXT[] DEFAULT '{}', -- Cron schedule rules
    enabled_by_default BOOLEAN DEFAULT TRUE,
    permission_level   INT DEFAULT 1,
    sort_order         INT DEFAULT 0,
    updated_at         TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE user_skills (
    user_id    TEXT,
    skill_id   TEXT REFERENCES skills(id),
    enabled    BOOLEAN,
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    PRIMARY KEY (user_id, skill_id)
);

Vector Search (pgvector)

Overview

The pgvector extension is used to store 768-dimensional embedding vectors and perform cosine similarity searches.

  • Embedding model: Google text-embedding-004 (768 dimensions)
  • Index type: HNSW (Hierarchical Navigable Small World)
  • Similarity function: Cosine similarity (<=> operator)

Tables Using Vectors

Table Purpose RAG Layer
daily_logs Conversation and diary memory search Layer 1
knowledge_base User pattern and preference search Layer 2
document_sections Uploaded document content search Layer 3
diary_entries Diary semantic search -
memos Memo semantic search -
searches Web search history search -

match_logs Function

A vector similarity search function used in the Agentโ€™s memory search.

SELECT * FROM match_logs(
    query_embedding := $1::vector,  -- 768-dimensional query vector
    match_threshold := 0.7,         -- Minimum similarity threshold
    match_count     := 5,           -- Maximum number of results to return
    p_user_id       := 'uuid...'
);
-- Returns: id, content, similarity (cosine similarity 0โ€“1)

Combines vector similarity search with PostgreSQL Full-Text Search.

User query: "food I ate last week"
                โ”‚
      โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
      โ–ผ                    โ–ผ
  pgvector search       FTS search
  (semantic similarity) (keyword matching)
  embedding <=>         tsvector @@ tsquery
  query_vector          to_tsquery('simple', 'ate & food')
      โ”‚                    โ”‚
      โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                โ–ผ
          Merge & re-rank results
          (vector similarity + FTS score)

Automatic tsvector Updates

On INSERT/UPDATE, a PostgreSQL trigger automatically updates content_tsv.

-- Example: daily_logs trigger
CREATE TRIGGER trg_daily_logs_tsv
    BEFORE INSERT OR UPDATE OF content ON daily_logs
    FOR EACH ROW EXECUTE FUNCTION daily_logs_tsv_trigger();
-- Internally: NEW.content_tsv := to_tsvector('simple', COALESCE(NEW.content, ''))

The same type of trigger is applied to the knowledge_base, document_sections, diary_entries, memos, and searches tables.


Schema Version Management

Fresh Installation

Use docker/init.sql. This is a baseline file that creates the entire schema at once.

# Runs automatically on Docker initialization
docker compose up -d postgres

Version Upgrade

Apply incremental migration files from the docker/migrations/incremental/ directory in order.

# Example: apply a new migration
psql $DATABASE_URL -f docker/migrations/incremental/031_new_feature.sql

The currently applied version is recorded in the schema_migrations table.

SELECT version, applied_at FROM schema_migrations ORDER BY applied_at;
-- 1.0.0 | 2025-01-01 00:00:00+00

Connection Methods

Gateway (Go)

Uses database/sql + lib/pq driver.

DATABASE_URL=postgres://user:pass@localhost:5432/starnion?sslmode=disable

Agent (Python)

Uses psycopg (psycopg3) + psycopg-pool connection pool.

DATABASE_URL=postgresql://user:pass@localhost:5432/starnion

The Agentโ€™s conversation state is persisted in the same PostgreSQL instance.


Data Isolation

Each userโ€™s data is completely isolated by the user_id foreign key. One user cannot access another userโ€™s data, and all queries include a WHERE user_id = $1 condition.


Performance Considerations

Index Target Tables Purpose
HNSW (m=16, ef=64) daily_logs, document_sections, knowledge_base, diary_entries, memos, searches Approximate nearest neighbor vector search
GIN content_tsv column in above tables Full-text search
B-tree user_id, created_at columns Filtering and sorting
Composite index conversations(user_id, updated_at DESC) Conversation list retrieval

HNSW parameters:

  • m = 16: Maximum number of connections per node (higher = more accurate but more memory)
  • ef_construction = 64: Search scope during index construction (higher = better quality, longer build time)

Copyright © 2025 StarNion. All rights reserved.  |  v0.1.2