Database
Starnion uses PostgreSQL 16 with the pgvector extension as its primary data store. Text data and 768-dimensional vector embeddings live in the same database, enabling hybrid RAG retrieval (vector similarity + full-text search).
Full Schema Overview
PostgreSQL 16 + pgvector
│
├── Auth & Identity
│   ├── users                        # Central user table
│   ├── platform_identities          # Platform-specific ID mapping (Telegram, web, etc.)
│   └── platform_link_codes          # Account link codes (10-minute TTL)
│
├── Conversations
│   ├── conversations                # Conversation sessions (includes AI SDK thread_id)
│   └── messages                     # Conversation messages (attachments as JSONB)
│
├── Finance
│   ├── finances                     # Expense tracker transactions
│   └── (budget: profiles.preferences JSONB)
│
├── Personal Records (with vector embeddings)
│   ├── daily_logs                   # Daily logs and diary entries (vector 768)
│   ├── diary_entries                # Diary entries (vector 768)
│   ├── memos                        # Memos (vector 768)
│   ├── goals                        # Goal management
│   ├── goal_checkins                # Goal check-in records
│   └── ddays                        # D-Days
│
├── Media & Documents (with vector embeddings)
│   ├── documents                    # Uploaded document metadata
│   ├── document_sections            # Document chunks (vector 768)
│   ├── images                       # Image gallery
│   └── audios                       # Audio gallery
│
├── Knowledge & Search (with vector embeddings)
│   ├── knowledge_base               # Pattern analysis results and knowledge (vector 768)
│   └── searches                     # Web search history (vector 768)
│
├── Settings & Integrations
│   ├── skills                       # Skill catalog
│   ├── user_skills                  # Per-user skill activation state
│   ├── providers                    # LLM provider settings
│   ├── personas                     # AI personas
│   ├── google_tokens                # Google OAuth2 tokens
│   └── integration_keys             # External service API keys
│
├── Channels & Notifications
│   ├── channel_settings             # Telegram channel settings
│   ├── telegram_approved_contacts   # Telegram approved contacts
│   ├── telegram_pairing_requests    # Telegram pairing requests
│   └── notifications                # Notification history
│
├── Usage
│   └── usage_logs                   # LLM token usage logs
│
└── Meta
    └── schema_migrations            # Schema version management
Core Table Details
users – Users
The root table for all user data. Supports both email/password authentication and platform-based authentication.
CREATE TABLE users (
id TEXT PRIMARY KEY, -- UUID
display_name TEXT,
email TEXT UNIQUE, -- Email auth users only
password_hash TEXT, -- bcrypt
role TEXT DEFAULT 'user', -- 'admin' | 'user'
preferences JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
platform_identities – Platform ID Mapping
Maps user IDs from various platforms (Telegram, web, Discord, etc.) to a single user_id.
CREATE TABLE platform_identities (
user_id TEXT REFERENCES users(id),
platform TEXT, -- 'telegram' | 'web' | 'discord' | 'credential'
platform_id TEXT, -- Unique ID within the platform (telegram chat_id, email, etc.)
display_name TEXT,
metadata JSONB DEFAULT '{}',
last_active_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (platform, platform_id)
);
conversations / messages – Conversations
Stores conversation sessions. thread_id is used to maintain conversation context in the AI SDK.
CREATE TABLE conversations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT REFERENCES users(id),
title TEXT DEFAULT 'New Conversation',
platform TEXT DEFAULT 'web', -- 'web' | 'telegram'
thread_id TEXT, -- Conversation thread ID
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE messages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
conversation_id UUID REFERENCES conversations(id),
role TEXT CHECK (role IN ('user', 'assistant')),
content TEXT,
attachments JSONB, -- Array of attachment URLs
created_at TIMESTAMPTZ DEFAULT NOW()
);
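As a sketch of how the attachments JSONB array might be written from the Python agent — the URLs, conversation ID, and helper code below are illustrative, not from the source; only the table and column names come from the schema:

```python
import json

# Hypothetical attachment list for one message; the schema stores it as a
# JSONB array of URLs in messages.attachments.
attachments = [
    "https://example.com/uploads/receipt.png",
    "https://example.com/uploads/memo.pdf",
]

# Serialize to a JSON string suitable for binding to a JSONB parameter.
attachments_json = json.dumps(attachments)

# Parameterized INSERT as it might be issued from the Python agent
# (%s placeholders follow psycopg's parameter style).
insert_sql = """
INSERT INTO messages (conversation_id, role, content, attachments)
VALUES (%s, %s, %s, %s::jsonb)
"""
params = (
    "11111111-1111-1111-1111-111111111111",  # illustrative conversation UUID
    "user",
    "Here are the files",
    attachments_json,
)
```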
finances – Expense Tracker
CREATE TABLE finances (
id BIGSERIAL PRIMARY KEY,
user_id TEXT REFERENCES users(id),
amount INTEGER, -- Amount in KRW. Income: positive, Expense: negative
category TEXT, -- 'food' | 'transport' | 'shopping' | 'income' | etc.
description TEXT,
created_at TIMESTAMPTZ DEFAULT NOW()
);
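The sign convention in finances.amount (income positive, expense negative) means a plain SUM yields the net balance. A minimal Python sketch; the sample rows are illustrative, not real data:

```python
# Each dict mirrors a finances row: amount in KRW, income > 0, expense < 0.
rows = [
    {"amount": 3_000_000, "category": "income",    "description": "Salary"},
    {"amount":   -12_000, "category": "food",      "description": "Lunch"},
    {"amount":    -1_500, "category": "transport", "description": "Bus"},
]

income  = sum(r["amount"] for r in rows if r["amount"] > 0)
expense = sum(r["amount"] for r in rows if r["amount"] < 0)
balance = income + expense  # equivalent to SELECT SUM(amount) FROM finances

print(income, expense, balance)  # 3000000 -13500 2986500
```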
daily_logs – Daily Logs (Vector Embeddings)
Stores conversation content and diary entries as vectors. This corresponds to Layer 1 of the 4-Layer RAG memory.
CREATE TABLE daily_logs (
id BIGSERIAL PRIMARY KEY,
user_id TEXT REFERENCES users(id),
content TEXT,
sentiment TEXT, -- 'good' | 'neutral' | 'bad' | 'tired' | 'happy'
embedding vector(768), -- Gemini text-embedding-004
content_tsv tsvector, -- For full-text search (auto-updated by trigger)
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- HNSW index: fast approximate nearest neighbor search
CREATE INDEX ON daily_logs USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Full-text search GIN index
CREATE INDEX ON daily_logs USING gin(content_tsv);
document_sections – Document Chunks (Vector Embeddings)
Stores uploaded documents split into chunks. This corresponds to Layer 3 of the 4-Layer RAG memory.
CREATE TABLE document_sections (
id BIGSERIAL PRIMARY KEY,
document_id BIGINT REFERENCES documents(id),
content TEXT,
embedding vector(768),
content_tsv tsvector,
metadata JSONB DEFAULT '{}' -- Page number, position, etc.
);
knowledge_base – Knowledge Base (Vector Embeddings)
Stores spending pattern analysis results, user preferences, and personalization data. This corresponds to Layer 2 of the 4-Layer RAG memory.
CREATE TABLE knowledge_base (
id BIGSERIAL PRIMARY KEY,
user_id TEXT REFERENCES users(id),
key TEXT, -- Knowledge type (e.g., 'pattern_analysis', 'user_preference')
value TEXT, -- Knowledge content
source TEXT, -- Source skill
embedding vector(768),
content_tsv tsvector,
created_at TIMESTAMPTZ DEFAULT NOW()
);
skills / user_skills – Skill Management
CREATE TABLE skills (
id TEXT PRIMARY KEY, -- Skill ID (e.g., 'finance', 'weather')
name TEXT,
description TEXT,
category TEXT,
emoji TEXT DEFAULT '',
tools TEXT[] DEFAULT '{}', -- List of tools provided by the skill
reports TEXT[] DEFAULT '{}', -- Types of reports generated
cron_rules TEXT[] DEFAULT '{}', -- Cron schedule rules
enabled_by_default BOOLEAN DEFAULT TRUE,
permission_level INT DEFAULT 1,
sort_order INT DEFAULT 0,
updated_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE TABLE user_skills (
user_id TEXT,
skill_id TEXT REFERENCES skills(id),
enabled BOOLEAN,
updated_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY (user_id, skill_id)
);
Vector Search (pgvector)
Overview
The pgvector extension is used to store 768-dimensional embedding vectors and perform cosine similarity searches.
- Embedding model: Google text-embedding-004 (768 dimensions)
- Index type: HNSW (Hierarchical Navigable Small World)
- Similarity function: cosine similarity (<=> operator)
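pgvector's <=> operator computes cosine distance, i.e. 1 - cosine similarity. A minimal pure-Python sketch of the same computation, for intuition only:

```python
import math

def cosine_distance(a, b):
    """What pgvector's <=> operator computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Same direction -> distance 0; orthogonal -> distance 1.
assert cosine_distance([1.0, 0.0], [2.0, 0.0]) == 0.0
assert cosine_distance([1.0, 0.0], [0.0, 1.0]) == 1.0
```

Similarity is then recovered as 1 - distance, which is why search results can report a cosine similarity in the 0–1 range.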
Tables Using Vectors
| Table | Purpose | RAG Layer |
|---|---|---|
| daily_logs | Conversation and diary memory search | Layer 1 |
| knowledge_base | User pattern and preference search | Layer 2 |
| document_sections | Uploaded document content search | Layer 3 |
| diary_entries | Diary semantic search | - |
| memos | Memo semantic search | - |
| searches | Web search history search | - |
match_logs Function
A vector similarity search function used in the Agent's memory search.
SELECT * FROM match_logs(
query_embedding := $1::vector, -- 768-dimensional query vector
match_threshold := 0.7, -- Minimum similarity threshold
match_count := 5, -- Maximum number of results to return
p_user_id := 'uuid...'
);
-- Returns: id, content, similarity (cosine similarity 0–1)
Hybrid Search
Combines vector similarity search with PostgreSQL Full-Text Search.
User query: "food I ate last week"
                  │
         ┌────────┴────────┐
         ▼                 ▼
  pgvector search       FTS search
  (semantic similarity) (keyword matching)
  embedding <=>         tsvector @@ tsquery
  query_vector          to_tsquery('simple', 'ate & food')
         │                 │
         └────────┬────────┘
                  ▼
       Merge & re-rank results
    (vector similarity + FTS score)
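The merge-and-re-rank step can be sketched as a weighted score fusion. The weights and the max-based normalization below are illustrative assumptions, not values from the source:

```python
def hybrid_rerank(vector_hits, fts_hits, w_vec=0.7, w_fts=0.3):
    """Fuse two result sets into one ranking.

    vector_hits: {doc_id: cosine similarity, already 0-1}
    fts_hits:    {doc_id: ts_rank score, unbounded, normalized below}
    """
    max_fts = max(fts_hits.values(), default=1.0) or 1.0
    scores = {}
    for doc_id, sim in vector_hits.items():
        scores[doc_id] = scores.get(doc_id, 0.0) + w_vec * sim
    for doc_id, rank in fts_hits.items():
        scores[doc_id] = scores.get(doc_id, 0.0) + w_fts * (rank / max_fts)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)

order = hybrid_rerank({1: 0.92, 2: 0.75}, {2: 0.6, 3: 0.6})
# Doc 2 appears in both result sets, so it outranks docs 1 and 3: [2, 1, 3]
```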
Automatic tsvector Updates
On INSERT/UPDATE, a PostgreSQL trigger automatically updates content_tsv.
-- Example: daily_logs trigger
CREATE TRIGGER trg_daily_logs_tsv
BEFORE INSERT OR UPDATE OF content ON daily_logs
FOR EACH ROW EXECUTE FUNCTION daily_logs_tsv_trigger();
-- Internally: NEW.content_tsv := to_tsvector('simple', COALESCE(NEW.content, ''))
The same type of trigger is applied to the knowledge_base, document_sections, diary_entries, memos, and searches tables.
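Conceptually, the 'simple' text-search configuration lowercases the input and splits it into lexemes with positions, without stemming or stop-word removal. A rough Python approximation for intuition (the real to_tsvector tokenizer is more elaborate):

```python
import re

def simple_tsvector(text):
    """Rough sketch of to_tsvector('simple', text): lowercase, split on
    non-word characters, no stemming, record 1-based token positions."""
    lexemes = {}
    for pos, token in enumerate(re.findall(r"\w+", text.lower()), start=1):
        lexemes.setdefault(token, []).append(pos)
    return lexemes

print(simple_tsvector("Ate kimchi stew; kimchi again"))
# {'ate': [1], 'kimchi': [2, 4], 'stew': [3], 'again': [5]}
```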
Schema Version Management
Fresh Installation
Use docker/init.sql. This is a baseline file that creates the entire schema at once.
# Runs automatically on Docker initialization
docker compose up -d postgres
Version Upgrade
Apply incremental migration files from the docker/migrations/incremental/ directory in order.
# Example: apply a new migration
psql $DATABASE_URL -f docker/migrations/incremental/031_new_feature.sql
The currently applied version is recorded in the schema_migrations table.
SELECT version, applied_at FROM schema_migrations ORDER BY applied_at;
-- 1.0.0 | 2025-01-01 00:00:00+00
Connection Methods
Gateway (Go)
Uses database/sql with the lib/pq driver.
DATABASE_URL=postgres://user:pass@localhost:5432/starnion?sslmode=disable
Agent (Python)
Uses psycopg (psycopg3) + psycopg-pool connection pool.
DATABASE_URL=postgresql://user:pass@localhost:5432/starnion
The Agent's conversation state is persisted in the same PostgreSQL instance.
Data Isolation
Each user's data is completely isolated by the user_id foreign key. One user cannot access another user's data, and all queries include a WHERE user_id = $1 condition.
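This convention can be sketched as a small query helper that always emits the user_id filter; the helper and its table whitelist are hypothetical, only the table names come from the schema:

```python
# Tables the helper may touch; identifiers cannot be bound as parameters,
# so whitelist them to avoid SQL injection through table names.
ALLOWED_TABLES = {"finances", "daily_logs", "memos", "goals"}

def scoped_select(table, user_id, columns="*"):
    """Build a SELECT that is always scoped to one user's rows."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table}")
    sql = f"SELECT {columns} FROM {table} WHERE user_id = %s"
    return sql, (user_id,)

sql, params = scoped_select("finances", "user-123", "amount, category")
# sql == "SELECT amount, category FROM finances WHERE user_id = %s"
```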
Performance Considerations
| Index | Target Tables | Purpose |
|---|---|---|
| HNSW (m=16, ef=64) | daily_logs, document_sections, knowledge_base, diary_entries, memos, searches | Approximate nearest neighbor vector search |
| GIN | content_tsv column in the above tables | Full-text search |
| B-tree | user_id, created_at columns | Filtering and sorting |
| Composite | conversations(user_id, updated_at DESC) | Conversation list retrieval |
HNSW parameters:
- m = 16: maximum number of connections per node (higher = more accurate, but more memory)
- ef_construction = 64: search scope during index construction (higher = better index quality, longer build time)