Agent (Python)

Role

The Agent is the AI brain of Starnion. Written in Python, it is built on LangGraph and the LangChain LLM clients. It receives gRPC requests from the Gateway, performs AI reasoning and skill execution, and returns the final response.

Core roles:

  • Analyze user messages to understand intent
  • Select and execute the appropriate skill (diary, finance, goals, image)
  • Generate responses using the user's configured LLM provider (Gemini, OpenAI, Anthropic, Z.AI, or custom)
  • Deliver real-time responses via gRPC streaming

LangGraph ReAct Architecture

The Agent uses LangGraph's ReAct (Reasoning + Acting) pattern.

User message
      │
      ▼
┌─────────────────────────────────────────────┐
│                 ReAct Loop                  │
│                                             │
│   ┌───────────┐   Think                     │
│   │    LLM    │────────────────────────┐    │
│   │(Reasoning)│                        │    │
│   └───────────┘                        ▼    │
│        ▲             ┌──────────────────┐   │
│        │ Observe     │  Skill Selection │   │
│        │             │ (Tool Selection) │   │
│   ┌────┴───────┐     └────────┬─────────┘   │
│   │ Skill      │              │ Execute     │
│   │ Result     │◄─────────────┘             │
│   │ (Tool Res) │                            │
│   └────────────┘                            │
│                                             │
│  [Repeat: continue if more skills needed]   │
└─────────────────────────────────────────────┘
      │ Final response decided
      ▼
   gRPC streaming response

Operation Flow Summary

  1. Receive input: Receive gRPC request from Gateway (user message + conversation ID + user ID)
  2. Load context: Load conversation history, user profile, and current persona
  3. Memory search: Search 4-Layer memory for relevant information (pgvector similarity search)
  4. LLM reasoning: Pass system prompt + conversation history + memory context to LLM
  5. Skill execution: When LLM selects a needed skill, execute the corresponding function
  6. Loop: Repeat the loop if additional reasoning is needed based on skill results
  7. Stream response: Send the final answer as a gRPC stream in real time
  8. Save memory: Record the conversation content in the daily log
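The loop in steps 4-6 can be sketched in plain Python. This is a minimal illustration of the ReAct control flow only, not the actual LangGraph implementation; `llm` and `tools` are stand-ins for the real model client and skill functions.

```python
from typing import Callable

def react_loop(llm: Callable, tools: dict[str, Callable],
               message: str, max_steps: int = 5) -> str:
    """Minimal ReAct loop sketch: think -> act -> observe, repeated until
    the LLM returns a final answer instead of a tool call."""
    history = [("user", message)]
    for _ in range(max_steps):
        # Think: the LLM returns either a tool call or a final answer
        decision = llm(history)
        if "answer" in decision:
            return decision["answer"]
        # Act: execute the selected skill with the LLM-chosen arguments
        result = tools[decision["tool"]](**decision["args"])
        # Observe: feed the skill result back into the conversation
        history.append(("tool", result))
    return "Step limit reached."
```

In production the same cycle is driven by LangGraph's graph state rather than an explicit `for` loop, but the think/act/observe shape is identical.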

Message Processing Flow

User input: "How much did I spend on food this month?"
      │
      ▼
[Identify intent]
  → Detect "expense query" intent
      │
      ▼
[Memory search]
  → Search for relevant expense data (Layer 4: SQL)
  → Search memory for previous similar questions (Layer 1: pgvector)
      │
      ▼
[Skill selection]
  → Call get_finance_summary(category="food", period="this_month")
      │
      ▼
[Skill execution]
  → Aggregate this month's food transactions from DB
  → Result: {"total": 234500, "transactions": [...]}
      │
      ▼
[LLM final response generation]
  → "Your food spending this month is 234,500 won. That's up 18% from last month (198,000 won)."
      │
      ▼
[gRPC streaming]
  → Stream response tokens to Gateway in real time
      │
      ▼
[Save memory]
  → Record this conversation in the daily log

Multi-LLM Routing

The Agent determines which model to call based on the LLM provider registered per user and the currently selected Persona.

Model Selection Priority

1. Model explicitly selected in the current conversation
      ↓ (if none)
2. Model linked to the current persona
      ↓ (if none)
3. First active model of the user's default provider
      ↓ (if none)
4. System default (Gemini Flash)

Supported Providers

Provider    Implementation
Gemini      google-generativeai SDK
OpenAI      openai SDK (ChatCompletion API)
Anthropic   anthropic SDK (Messages API)
Z.AI        OpenAI-compatible endpoint
Custom      OpenAI-compatible base URL

4-Layer Memory System

The Agent manages user context through a memory system composed of four layers.

┌─────────────────────────────────────────────────────┐
│                 4-Layer Memory                      │
│                                                     │
│  Layer 1: Daily Logs                                │
│  ┌──────────────────────────────┐                   │
│  │ pgvector, 768-dim embeddings │                   │
│  │ Conversation records,        │                   │
│  │ emotions, keywords           │                   │
│  └──────────────────────────────┘                   │
│                 ↑ similarity search                 │
│  Layer 2: Knowledge Base                            │
│  ┌──────────────────────────────┐                   │
│  │ pgvector, 768-dim embeddings │                   │
│  │ User preferences,            │                   │
│  │ learned patterns             │                   │
│  └──────────────────────────────┘                   │
│                 ↑ similarity search                 │
│  Layer 3: Document Sections                         │
│  ┌──────────────────────────────┐                   │
│  │ pgvector, 768-dim embeddings │                   │
│  │ Chunks of uploaded documents │                   │
│  └──────────────────────────────┘                   │
│                 ↑ SQL query                         │
│  Layer 4: Recent Finance                            │
│  ┌──────────────────────────────┐                   │
│  │ PostgreSQL SQL               │                   │
│  │ Last 30 days of transactions │                   │
│  └──────────────────────────────┘                   │
└─────────────────────────────────────────────────────┘

Layer 1: Daily Logs

  • Store: PostgreSQL + pgvector extension
  • Embedding dimensions: 768 (Gemini text-embedding-004)
  • Content: Conversation content, emotional state, key keywords, summaries
  • Search method: Cosine-similarity-based semantic search
  • Use case: Recalling past conversations, e.g. "What did I say last time?"

Layer 2: Knowledge Base

  • Store: PostgreSQL + pgvector
  • Embedding dimensions: 768
  • Content: User preferences, recurring patterns, learned personalization data
  • Use case: Personalization context such as "the user likes coffee" or "salary arrives on the 25th of every month"

Layer 3: Document Sections

  • Store: PostgreSQL + pgvector
  • Embedding dimensions: 768
  • Content: Chunks of PDFs, Word docs, etc. uploaded by the user
  • Chunking method: Split into semantic units (default 512 tokens)
  • Use case: "Find the penalty clause in the contract I uploaded"

Layer 4: Recent Finance

  • Store: PostgreSQL (plain SQL, no vectors)
  • Content: Transactions from the last 30 days
  • Search method: SQL aggregate queries
  • Use case: "How much did I spend on food this month?", "Were there any café expenses yesterday?"
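The search over Layers 1-3 is a cosine top-k over embeddings; Layer 4 is plain SQL. The vector ranking can be sketched in pure Python. In production this runs inside PostgreSQL via pgvector's `<=>` operator; this sketch only shows the math.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (what pgvector's <=> distance inverts)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], entries: list[tuple[str, list[float]]],
          k: int = 3) -> list[str]:
    """Rank stored memory entries by similarity to the query embedding."""
    ranked = sorted(entries, key=lambda e: cosine_similarity(query, e[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```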

Embeddings

All vector embeddings use Google's text-embedding-004 model.

Item                 Value
Model                text-embedding-004
Dimensions           768
Similarity function  Cosine similarity (<=> operator)
Language             Multilingual including Korean

Embedding generation flow:

Text input
    │
    ▼
Call Gemini Embedding API
    │
    ▼
Returns 768-dimensional float vector
    │
    ▼
Store in PostgreSQL pgvector column
(e.g., VECTOR(768))
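pgvector accepts a bracketed text literal for vector values, so the store step reduces to formatting the float list. A sketch, with the table and column names in the commented INSERT being illustrative assumptions, not the real schema:

```python
def to_pgvector_literal(embedding: list[float]) -> str:
    """Format a float vector as a pgvector text literal, e.g. '[0.1,0.2,0.3]'.
    Passed as a parameter and cast to the VECTOR(768) column type."""
    return "[" + ",".join(repr(x) for x in embedding) + "]"

# Hypothetical usage with psycopg (table/column names are illustrative):
# cur.execute(
#     "INSERT INTO daily_logs (content, embedding) VALUES (%s, %s::vector)",
#     (text, to_pgvector_literal(vec)),
# )
```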

gRPC Interface

The Agent operates as a gRPC server using the default port 50051.

Service Definition (protobuf)

service AgentService {
  // Unary chat request/response
  rpc Chat(ChatRequest) returns (ChatResponse);

  // Server streaming: send response tokens in real time
  rpc ChatStream(ChatRequest) returns (stream ChatStreamResponse);
}

Communication Flow

Gateway (Go)                    Agent (Python)
    │                               │
    │── ChatRequest ───────────────►│
    │   (message, user_id,          │
    │    conversation_id,           │  ReAct loop executes
    │    context, files)            │  Skill execution
    │                               │
    │◄── ChatStreamResponse ────────│ (token-by-token streaming)
    │◄── ChatStreamResponse ────────│
    │◄── ChatStreamResponse ────────│
    │         ...                   │
    │◄── [stream end] ──────────────│

The Gateway receives the streaming response and delivers it to the client via WebSocket or SSE (Server-Sent Events).
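Turning LLM output into a stream of `ChatStreamResponse` messages amounts to wrapping a token iterator and marking the end. A sketch; the dict shape only loosely mirrors the protobuf message and its field names are assumptions:

```python
from typing import Iterator

def stream_chunks(tokens: Iterator[str]) -> Iterator[dict]:
    """Wrap LLM tokens as stream messages; a final message marks stream end.
    Field names ('delta', 'done') are illustrative, not the real proto fields."""
    for token in tokens:
        yield {"delta": token, "done": False}
    yield {"delta": "", "done": True}
```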


Skill Execution Mechanism

Skills are implemented as LangChain Tools. The LLM decides which skill to call and emits the parameters as JSON; the Agent then executes the corresponding Python function.
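Concretely, the LLM sees each skill as a named tool with a JSON parameter schema, and the Agent dispatches the resulting call. Both the schema below and the dispatch helper are illustrative sketches, not the production definitions; only the tool name `get_finance_summary` and its `category`/`period` parameters appear earlier in this document.

```python
import json

# Illustrative tool description as exposed to the LLM (not the exact schema)
GET_FINANCE_SUMMARY_TOOL = {
    "name": "get_finance_summary",
    "description": "Aggregate the user's transactions for a category and period.",
    "parameters": {
        "type": "object",
        "properties": {
            "category": {"type": "string"},
            "period": {"type": "string"},
        },
        "required": ["category", "period"],
    },
}

def dispatch(tool_call: dict, registry: dict) -> dict:
    """Execute the Python function matching the LLM's JSON tool call."""
    args = tool_call["arguments"]
    if isinstance(args, str):          # the LLM may return arguments as a JSON string
        args = json.loads(args)
    return registry[tool_call["name"]](**args)
```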

Skill Categories

Category     Example Skills
Finance      Add/view transactions, check budget, statistics
Schedule     Google Calendar integration
Memo         Create/view/delete memos
Diary        Write/view diary entries
Goals        Set goals/check in/evaluate
D-Day        Register/view D-Days
Documents    Document search, PDF summary
Web search   Tavily, Naver Search API
Weather      Current weather lookup
Calculator   Expression calculation
Translation  Multi-language translation

Skill Activation

Skills can be enabled/disabled per user. Disabled skills are not included in the LLM's Tool list, so they cannot be called at all.

Control this with the toggle under Settings → Skills, or via the API POST /api/v1/skills/:id/toggle.


Docker Configuration

The Agent uses docker/Dockerfile.agent and is defined in docker-compose.yml as follows.

agent:
  build:
    context: ../agent
    dockerfile: ../docker/Dockerfile.agent
  container_name: starnion-agent
  ports:
    - "${GRPC_PORT:-50051}:50051"  # gRPC server
  environment:
    DATABASE_URL: postgres://...   # PostgreSQL connection
    GRPC_PORT: 50051
  depends_on:
    postgres:
      condition: service_healthy

The Agent starts after PostgreSQL is ready. The Gateway attempts to connect after the Agent starts.


Technology Stack Summary

Item                        Choice                                                         Version
Language                    Python                                                         3.13+
AI orchestration            LangGraph                                                      0.4+
LLM clients                 langchain-google-genai, langchain-anthropic, langchain-openai  latest
Conversation state storage  langgraph-checkpoint-postgres                                  2.0+
DB driver                   psycopg (psycopg3) + psycopg-pool                              3.2+
gRPC server                 grpcio                                                         1.70+
Image generation/analysis   google-genai (Gemini)                                          1.0+
Document parsing            pypdf, python-docx, openpyxl, python-pptx                      latest
Web search                  tavily-python                                                  0.5+
Browser automation          playwright                                                     1.40+
QR code                     qrcode[pil]                                                    8.0+
PDF generation              reportlab                                                      4.4+

Skill Architecture

Each skill is implemented as an independent Python package.

agent/src/starnion_agent/skills/
├── finance/          # Expense tracker
│   ├── __init__.py   # Skill registration
│   ├── tools.py      # LangChain Tool function definitions
│   └── SKILL.md      # Skill description (injected into LLM system prompt)
├── weather/
│   ├── __init__.py
│   ├── tools.py
│   └── SKILL.md
├── loader.py         # Dynamic skill loading
├── guard.py          # Skill access permission check
└── registry.py       # Full skill registry
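The registry/guard pattern can be sketched as a registry dict that each skill package populates, with the guard filtering tools by the user's enabled set. This is a minimal illustration of the layout above; the real `registry.py` and `guard.py` interfaces may differ.

```python
SKILL_REGISTRY: dict[str, dict] = {}

def register_skill(name: str, tools: list, skill_md: str) -> None:
    """Register a skill package: its tool functions and its SKILL.md text.
    Conceptually called from each skill's __init__.py (sketch only)."""
    SKILL_REGISTRY[name] = {"tools": tools, "skill_md": skill_md}

def active_tools(enabled: set[str]) -> list:
    """guard.py behavior in miniature: only enabled skills expose tools."""
    return [t for name, s in SKILL_REGISTRY.items()
            if name in enabled
            for t in s["tools"]]
```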

Role of SKILL.md

The SKILL.md file in each skill directory is injected directly into the LLM system prompt. This lets the LLM know exactly when and how to use each skill.

System prompt = base persona + SKILL.md content from active skills
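That composition can be sketched as simple string assembly; the section header format is an illustrative assumption, not the exact production formatting.

```python
def build_system_prompt(persona: str, skill_docs: dict[str, str],
                        enabled: set[str]) -> str:
    """Compose the system prompt from the base persona plus the SKILL.md
    text of each active skill (sketch; real formatting may differ)."""
    sections = [persona]
    for name, doc in skill_docs.items():
        if name in enabled:                    # inactive skills contribute nothing
            sections.append(f"## Skill: {name}\n{doc}")
    return "\n\n".join(sections)
```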

Skill Guard

Skills disabled by the user are blocked in guard.py. The tools of inactive skills are not exposed to the LLM, making it impossible for them to be called at all.


Logs and HTTP Server

In addition to the gRPC port (50051), the Agent also runs an HTTP server (port 8082).

Port Purpose
50051 gRPC server (communication with Gateway)
8082 HTTP server (log streaming, document indexing, search embedding)

The Gateway's /api/v1/logs/agent endpoint proxies to the Agent's port 8082 to provide real-time Agent logs.


Copyright © 2025 StarNion. All rights reserved.  |  v0.1.1