# Database Schema

Snoopy uses a local SQLite database.
## Storage Location

Path pattern: `<rootDir>/snoopy.db`

The default root directory is `<home>/.snoopy` on all supported operating systems:

- macOS: `~/.snoopy/snoopy.db`
- Linux: `~/.snoopy/snoopy.db`
- Windows: `C:\Users\<you>\.snoopy\snoopy.db`

To override the root directory, set the `SNOOPY_ROOT_DIR` environment variable; the database path then becomes `<SNOOPY_ROOT_DIR>/snoopy.db`.
## Schema Sources

- Migration SQL files in `src/services/db/migrations/`
- Runtime bootstrap in `src/services/db/sqlite.ts` for backward-compatible column additions
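The runtime bootstrap pattern amounts to adding any columns missing from an older local database with `ALTER TABLE`. A minimal sketch (the specific column shown is illustrative, taken from the `job_runs` schema documented below):

```sql
-- Sketch: backward-compatible column addition for an older local DB.
-- SQLite's ALTER TABLE ... ADD COLUMN keeps existing rows and fills the
-- new column with the declared DEFAULT.
ALTER TABLE job_runs ADD COLUMN prompt_tokens INTEGER NOT NULL DEFAULT 0;
```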
## Tables

### settings

Purpose:

- stores key-value app settings and credentials metadata

Columns:

- `key TEXT PRIMARY KEY`
- `value TEXT NOT NULL`
- `updated_at TEXT NOT NULL DEFAULT (datetime('now'))`

Common keys: `model`, `model_settings_json`, `reddit_app_name`, `reddit_client_id`
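Reading a setting is a single-row lookup; for example (key name from the list above):

```sql
-- Fetch the configured model name; returns no rows if the key is unset.
SELECT value
FROM settings
WHERE key = 'model';
```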
### jobs

Purpose:

- defines monitoring jobs

Columns:

- `id TEXT PRIMARY KEY`
- `slug TEXT UNIQUE`
- `name TEXT NOT NULL UNIQUE`
- `description TEXT NOT NULL`
- `qualification_prompt TEXT NOT NULL`
- `subreddits_json TEXT NOT NULL`
- `schedule_cron TEXT NOT NULL DEFAULT '*/30 * * * *'`
- `enabled INTEGER NOT NULL DEFAULT 1`
- `monitor_comments INTEGER NOT NULL DEFAULT 1`
- `created_at TEXT NOT NULL DEFAULT (datetime('now'))`
- `updated_at TEXT NOT NULL DEFAULT (datetime('now'))`

Indexes:

- `idx_jobs_slug`: unique index on `slug`

Notes:

- Commands accept a job ID or slug.
- Slugs are generated and made unique in repository logic.
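A typical inspection query against this table (column names from the schema above):

```sql
-- List enabled jobs with their cron schedules.
SELECT id, slug, name, schedule_cron
FROM jobs
WHERE enabled = 1
ORDER BY name;
```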
### job_runs

Purpose:

- stores each scheduled/manual execution attempt

Columns in the active runtime schema:

- `id TEXT PRIMARY KEY`
- `job_id TEXT NOT NULL` (FK to `jobs.id`)
- `status TEXT NOT NULL` (`running`, `completed`, `failed`, `skipped`)
- `message TEXT`
- `started_at TEXT`
- `finished_at TEXT`
- `items_discovered INTEGER NOT NULL DEFAULT 0`
- `items_new INTEGER NOT NULL DEFAULT 0`
- `items_qualified INTEGER NOT NULL DEFAULT 0`
- `prompt_tokens INTEGER NOT NULL DEFAULT 0`
- `completion_tokens INTEGER NOT NULL DEFAULT 0`
- `estimated_cost_usd REAL`
- `log_file_path TEXT`
- `created_at TEXT NOT NULL DEFAULT (datetime('now'))`

Notes:

- Migration 001 initializes a minimal version of this table.
- The runtime bootstrap upgrades older local DBs by adding the newer analytics columns.
- `log_file_path` points to the per-run log file under `~/.snoopy/logs/` when detailed logging is available.
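The analytics columns make per-job usage rollups straightforward; a sketch:

```sql
-- Aggregate token usage and estimated cost per job across completed runs.
SELECT job_id,
       SUM(prompt_tokens)      AS total_prompt_tokens,
       SUM(completion_tokens)  AS total_completion_tokens,
       SUM(estimated_cost_usd) AS total_estimated_cost_usd
FROM job_runs
WHERE status = 'completed'
GROUP BY job_id;
```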
### scan_items

Purpose:

- deduplicated store of scanned posts/comments and their qualification outcome
- supports lightweight result lifecycle tracking for downstream automation

Columns:

- `id TEXT PRIMARY KEY`
- `job_id TEXT NOT NULL` (FK to `jobs.id`)
- `run_id TEXT NOT NULL` (FK to `job_runs.id`)
- `type TEXT NOT NULL CHECK (type IN ('post','comment'))`
- `reddit_post_id TEXT NOT NULL`
- `reddit_comment_id TEXT`
- `subreddit TEXT NOT NULL`
- `author TEXT NOT NULL`
- `title TEXT`
- `body TEXT NOT NULL`
- `url TEXT NOT NULL`
- `reddit_posted_at TEXT NOT NULL`
- `qualified INTEGER NOT NULL DEFAULT 0`
- `viewed INTEGER NOT NULL DEFAULT 0`
- `validated INTEGER NOT NULL DEFAULT 0`
- `processed INTEGER NOT NULL DEFAULT 0`
- `qualification_reason TEXT`
- `created_at TEXT NOT NULL DEFAULT (datetime('now'))`

Indexes:

- `idx_scan_items_dedup`: unique index on `(job_id, reddit_post_id, COALESCE(reddit_comment_id, ''))`

Behavior:

- prevents re-processing the same post/comment per job
- stores the final qualification reason for auditability
- lifecycle flag semantics:
  - `viewed = 1`: result has been reviewed by an operator or agent
  - `validated = 1`: result has been quality-checked/accepted
  - `processed = 1`: result has been handed off to a downstream workflow
Notes:

- SQLite stores booleans as integers (`0` = false, `1` = true).
- Newer runtime versions backfill missing lifecycle columns with `ALTER TABLE` during startup for older local DBs.
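Given the unique dedup index, re-inserting an already-seen item can be made a no-op with `INSERT OR IGNORE`; a minimal sketch (placeholders stand for application-supplied values):

```sql
-- Second and later attempts for the same (job, post, comment) combination
-- hit the unique index idx_scan_items_dedup and are silently skipped.
INSERT OR IGNORE INTO scan_items
  (id, job_id, run_id, type, reddit_post_id, reddit_comment_id,
   subreddit, author, body, url, reddit_posted_at)
VALUES
  (?, ?, ?, 'post', ?, NULL, ?, ?, ?, ?, ?);
```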
### daemon_state

Purpose:

- reserved runtime state table for the daemon lifecycle

Columns:

- `id INTEGER PRIMARY KEY CHECK (id = 1)`
- `is_running INTEGER NOT NULL`
- `updated_at TEXT NOT NULL DEFAULT (datetime('now'))`
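Because the `CHECK (id = 1)` constraint forces a single row, updating daemon state is naturally an upsert; a sketch (requires SQLite 3.24+ for `ON CONFLICT ... DO UPDATE`):

```sql
-- Mark the daemon as running, creating the singleton row if it is missing.
INSERT INTO daemon_state (id, is_running, updated_at)
VALUES (1, 1, datetime('now'))
ON CONFLICT(id) DO UPDATE
SET is_running = excluded.is_running,
    updated_at = excluded.updated_at;
```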
## Deletion and Data Lifecycle

Deleting a job via the CLI/repository removes:

- `scan_items` rows for the job
- `job_runs` rows for the job
- associated run log files referenced by `job_runs.log_file_path` (when present)
- the `jobs` row itself

This deletion is executed in a DB transaction in repository logic.
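The database side of that deletion can be sketched as a single transaction (log-file removal happens in application code, outside SQL):

```sql
BEGIN;
DELETE FROM scan_items WHERE job_id = ?;
DELETE FROM job_runs   WHERE job_id = ?;
DELETE FROM jobs       WHERE id = ?;
COMMIT;
```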
## Query Examples

Latest runs for one job:

```sql
SELECT *
FROM job_runs
WHERE job_id = ?
ORDER BY created_at DESC
LIMIT 20;
```

Latest items for one run, with qualification outcome:

```sql
SELECT reddit_post_id, reddit_comment_id, qualified, qualification_reason
FROM scan_items
WHERE run_id = ?
ORDER BY created_at DESC;
```

Unprocessed qualified items for one job:

```sql
SELECT
  id,
  url,
  author,
  title,
  qualification_reason,
  viewed,
  validated,
  processed,
  reddit_posted_at
FROM scan_items
WHERE job_id = ?
  AND qualified = 1
  AND processed = 0
ORDER BY datetime(reddit_posted_at) DESC;
```

Mark one result as viewed + validated:

```sql
UPDATE scan_items
SET viewed = 1,
    validated = 1
WHERE id = ?;
```

Bulk mark qualified results as processed for one run:

```sql
UPDATE scan_items
SET processed = 1
WHERE run_id = ?
  AND qualified = 1;
```
## Agent Workflow Reference

For end-to-end direct DB workflows (list jobs, insert jobs, verify daemon/startup state, run jobs, read/update results), see Agent Operations.