Documents and sources API — Signet Docs

Document ingestion and source-backed recall endpoints.

Documents

The documents API ingests external content (text, URLs, files) for chunking and embedding. Each document generates linked memory records via the pipeline. All document endpoints require documents permission.

POST /api/documents

Submit a document for ingestion. The document is queued and processed asynchronously. Returns 201 on success, or the existing document’s ID and status if a duplicate URL is detected.

Request body

{
  "source_type": "text",
  "content": "Full text content here",
  "title": "My Document",
  "content_type": "text/plain",
  "connector_id": null,
  "metadata": { "author": "nicholai" }
}

For source_type: "url":

{
  "source_type": "url",
  "url": "https://example.com/page",
  "title": "Example Page"
}

source_type is required and must be text, url, or file. content is required when source_type is text. url is required when source_type is url.
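
The field rules above can be mirrored client-side before calling POST /api/documents. The following is a minimal sketch, not Signet's own validation: build_document_payload is a hypothetical helper, and because the rules above do not say what a file submission requires, the sketch leaves that case unchecked.

```python
ALLOWED_SOURCE_TYPES = {"text", "url", "file"}


def build_document_payload(source_type, **fields):
    """Hypothetical helper: apply the documented field rules client-side."""
    if source_type not in ALLOWED_SOURCE_TYPES:
        raise ValueError(f"source_type must be one of {sorted(ALLOWED_SOURCE_TYPES)}")
    if source_type == "text" and not fields.get("content"):
        raise ValueError("content is required when source_type is 'text'")
    if source_type == "url" and not fields.get("url"):
        raise ValueError("url is required when source_type is 'url'")
    # Requirements for 'file' are not stated in this reference, so none are checked.
    return {"source_type": source_type, **fields}
```

The returned dict can be serialized with json.dumps and sent as the request body.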

Response

{ "id": "uuid", "status": "queued" }

Or if deduplicated:

{ "id": "existing-uuid", "status": "processing", "deduplicated": true }

GET /api/documents

List documents, optionally filtered by status. Results are paginated with limit and offset.

Query parameters

Parameter   Description
status      Filter by status (queued, processing, done, failed, deleted)
limit       Page size (default: 50, max: 500)
offset      Pagination offset (default: 0)

Response

{
  "documents": [...],
  "total": 42,
  "limit": 50,
  "offset": 0
}

Each document includes all columns from the documents table.
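
Paging through the list can be sketched as below. The helper names and the base_url argument are hypothetical placeholders, and clamping limit to 500 on the client is a choice made for this sketch; how the server treats a larger value is not stated here.

```python
from urllib.parse import urlencode

MAX_LIMIT = 500  # documented maximum page size


def documents_url(base_url, status=None, limit=50, offset=0):
    """Build the GET /api/documents URL with the documented query parameters."""
    params = {"limit": min(limit, MAX_LIMIT), "offset": offset}
    if status is not None:
        params["status"] = status
    return f"{base_url}/api/documents?{urlencode(params)}"


def page_offsets(total, limit):
    """Offsets needed to walk `total` documents in pages of `limit`."""
    return list(range(0, total, limit))
```

A client would read total from the first response and then request each remaining offset in turn.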

GET /api/documents/:id

Get a single document by ID.

Response: the full document row, or 404 if the document is not found.
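
Because ingestion is asynchronous, a client may want to poll this endpoint after submitting a document. The sketch below assumes the document row carries the same status values used by the list filter, and it treats done, failed, and deleted as terminal; both are assumptions. fetch_document is a caller-supplied, hypothetical function that performs the GET and returns the parsed row.

```python
import time

# Assumed terminal states, taken from the documented status filter values.
TERMINAL_STATUSES = {"done", "failed", "deleted"}


def wait_for_document(fetch_document, doc_id, interval=1.0, max_attempts=30, sleep=time.sleep):
    """Poll until the document reaches a terminal status or attempts run out."""
    for attempt in range(max_attempts):
        row = fetch_document(doc_id)
        if row.get("status") in TERMINAL_STATUSES:
            return row
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"document {doc_id} did not finish after {max_attempts} polls")
```

Injecting fetch_document and sleep keeps the loop independent of any particular HTTP client and easy to exercise without a running daemon.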

GET /api/documents/:id/chunks

List the memory records derived from this document, ordered by chunk index.

Response

{
  "chunks": [
    {
      "id": "memory-uuid",
      "content": "Chunk text...",
      "type": "fact",
      "created_at": "2026-02-21T10:00:00.000Z",
      "chunk_index": 0
    }
  ],
  "count": 12
}
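
As a rough illustration of consuming this response, the sketch below joins chunk contents in chunk_index order. join_chunks is a hypothetical helper; the sort is only defensive, since the endpoint already returns chunks in order, and the joined text is not guaranteed to reproduce the original document because the pipeline may transform or overlap chunks.

```python
def join_chunks(response):
    """Concatenate chunk contents from a parsed chunks response, by chunk_index."""
    ordered = sorted(response["chunks"], key=lambda chunk: chunk["chunk_index"])
    return "\n".join(chunk["content"] for chunk in ordered)
```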

DELETE /api/documents/:id

Soft-delete a document and all its derived memory records. Memories linked to the document are soft-deleted one at a time with audit history.

Query parameters

Parameter   Description
reason      Required. Deletion reason.

Response

{ "deleted": true, "memoriesRemoved": 12 }
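
Since reason travels in the query string, it needs URL encoding. A minimal sketch, with delete_document_url as a hypothetical helper and base_url as a placeholder for the daemon address:

```python
from urllib.parse import quote, urlencode


def delete_document_url(base_url, doc_id, reason):
    """Build the DELETE /api/documents/:id URL with the required reason."""
    if not reason or not reason.strip():
        raise ValueError("reason is required")
    query = urlencode({"reason": reason})  # encodes spaces and punctuation
    return f"{base_url}/api/documents/{quote(doc_id, safe='')}?{query}"
```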

Sources

Sources connect read-only external knowledge bases to Signet recall without turning them into ordinary saved memories. Supported source kinds are obsidian, discord, and github.

GET /api/sources

List configured sources and lightweight index stats for the current daemon agent.

Response

{
  "version": 1,
  "sources": [
    {
      "id": "obsidian:abc123",
      "kind": "obsidian",
      "name": "Research Vault",
      "root": "/home/user/ObsidianVault",
      "enabled": true,
      "mode": "read-only",
      "createdAt": "2026-05-06T09:00:00.000Z",
      "updatedAt": "2026-05-06T09:00:00.000Z",
      "lastIndexedAt": "2026-05-06T09:01:00.000Z",
      "excludeGlobs": ["**/.obsidian/**", "**/.trash/**", "**/.hermes/**"],
      "stats": { "artifacts": 42, "chunks": 108, "indexed": 42 },
      "health": {
        "status": "healthy",
        "generatedAt": "2026-05-06T09:01:00.000Z",
        "latestArtifactAt": "2026-05-06T09:01:00.000Z",
        "latestCheckpointAt": null,
        "chunkCoverage": 1,
        "failures": { "total": 0, "recoverable": 0 },
        "checkpoints": { "total": 0, "partial": 0, "stale": 0 },
        "purge": { "deletedArtifacts": 0, "orphanChunks": 0 },
        "semantic": { "entities": 8, "attributes": 0, "dependencies": 12, "communities": 3, "total": 23 }
      }
    }
  ]
}

POST /api/sources/obsidian

Add or update an Obsidian vault source and queue a source index job. The vault stays read-only; Signet writes only derived source artifacts, graph rows, and chunk embeddings to its own database.

Request body

{
  "path": "/home/user/ObsidianVault",
  "name": "Research Vault",
  "excludeGlobs": ["private/**"]
}

root is also accepted as an alias for path.

Response

{
  "source": { "id": "obsidian:abc123", "kind": "obsidian" },
  "created": true,
  "indexed": 0,
  "queued": true,
  "job": { "status": "queued", "sourceId": "obsidian:abc123" }
}

POST /api/sources/discord

Add or update a Discord source and queue a shared source index job. REST and gateway modes require a bot token secret reference; raw Discord tokens are rejected at the config boundary. Desktop cache mode reads local Discord Desktop cache artifacts and does not require a token.

Request body

{
  "guildIds": ["123456789012345678"],
  "tokenRef": "DISCORD_BOT_TOKEN",
  "name": "Team Discord",
  "channelFilter": ["general", "234567890123456789"],
  "maxMessagesPerChannel": 1000,
  "includeThreads": true,
  "includeArchivedThreads": true,
  "includePrivateArchivedThreads": false,
  "includeMembers": true,
  "includeAttachments": true,
  "includeAttachmentText": false,
  "maxAttachmentTextBytes": 262144,
  "includeEmbeds": true,
  "includePolls": true,
  "includeThreadMembers": true,
  "since": "2026-01-01T00:00:00.000Z",
  "syncMode": "rest"
}

guildId is accepted as a single-guild alias. channels is accepted as an alias for channelFilter.

For local Discord Desktop cache import:

{
  "name": "Local Discord Cache",
  "syncMode": "desktop-cache",
  "desktopCachePath": "/home/user/.config/discord",
  "desktopCacheFullScan": false
}

desktopCachePath is optional when the platform default Discord Desktop data folder exists. The selected cache root must be a known Discord-compatible application data folder. desktopCacheFullScan expands cache file scanning; the default scans LevelDB/log JSON and route-bearing Chromium cache entries.

For live gateway tailing:

{
  "guildIds": ["123456789012345678"],
  "tokenRef": "DISCORD_BOT_TOKEN",
  "name": "Team Discord Tail",
  "syncMode": "gateway-tail"
}

Response

{
  "source": { "id": "discord:abc123", "kind": "discord" },
  "created": true,
  "indexed": 0,
  "queued": true,
  "job": { "status": "queued", "sourceId": "discord:abc123" }
}

The REST sync path indexes guilds, categories, channels, announcement channels, forums, active and archived threads, member snapshots, thread member snapshots, per-message artifacts, message windows, mentions, attachment metadata, optional bounded text-like attachment contents, embeds, polls, checkpoints, and partial-failure artifacts. Partial Discord listings are not used as authoritative deletes. Attachment text extraction is opt-in and skips binary/media uploads by default.

The gateway-tail sync path keeps the shared source job open while it listens for Discord gateway events. It indexes message create/update/delete lifecycle events, deleted-message tombstones, channel/thread upserts, member upserts, member removals, and per-channel tail checkpoints. Canceling or removing the source closes the gateway connection.

The desktop-cache sync path indexes classifiable route-bearing cached messages, DMs under the synthetic guild id @me, cache-observed channel metadata, message windows, attachments, mentions, embeds, polls, checkpoints, and import stats. Cache imports are observational and never reconcile deletes from missing or evicted local cache files.
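
The token rule for POST /api/sources/discord can be checked before sending a request. This is a sketch under stated assumptions: check_discord_source is a hypothetical helper, it only inspects an explicit syncMode because the default mode is not stated here, and it does not try to detect raw tokens, since the daemon's detection logic is not described.

```python
# Modes documented as requiring a bot token secret reference.
TOKEN_MODES = {"rest", "gateway-tail"}


def check_discord_source(body):
    """Return a list of problems found in a Discord source request body."""
    problems = []
    mode = body.get("syncMode")
    if mode in TOKEN_MODES and not body.get("tokenRef"):
        problems.append(f"syncMode '{mode}' requires tokenRef")
    # desktop-cache reads local cache artifacts, so no token check applies.
    return problems
```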

POST /api/sources/github

Add or update a GitHub source and queue a shared source index job. Without a token reference, GitHub sources default to issues, pull requests, and selected Markdown docs. Discussions require tokenRef because they use the GitHub GraphQL API. Raw GitHub tokens are rejected; pass a Signet secret name or external secret reference instead.

Request body

{
  "repos": ["Signet-AI/signetai"],
  "tokenRef": "GITHUB_TOKEN",
  "name": "Signet GitHub",
  "resourceTypes": ["issues", "pulls", "discussions", "docs"],
  "state": "all",
  "includeComments": true,
  "labels": ["bug", "needs review"],
  "docPaths": ["README.md", "docs/**/*.md"],
  "maxItemsPerRepo": 500
}

repo is accepted as a single-repository alias. docPaths are limited to Markdown files or Markdown globs so GitHub source indexing stays focused on chosen docs instead of broad source-code ingestion.
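
The two constraints on this request body, discussions needing tokenRef and docPaths being Markdown-only, can be sketched as a client-side check. check_github_source is a hypothetical helper, and treating .md and .markdown as the Markdown extensions is an assumption; the daemon's actual rule may differ.

```python
# Assumed Markdown extensions; the server-side rule is not spelled out here.
MARKDOWN_SUFFIXES = (".md", ".markdown")


def check_github_source(body):
    """Return a list of problems found in a GitHub source request body."""
    problems = []
    if "discussions" in body.get("resourceTypes", []) and not body.get("tokenRef"):
        problems.append("discussions require tokenRef")
    for path in body.get("docPaths", []):
        if not path.lower().endswith(MARKDOWN_SUFFIXES):
            problems.append(f"docPaths entry is not Markdown: {path}")
    return problems
```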

Response

{
  "source": { "id": "github:abc123", "kind": "github" },
  "created": true,
  "indexed": 0,
  "queued": true,
  "job": { "status": "queued", "sourceId": "github:abc123" }
}

The sync path indexes source-owned artifacts for issues, pull requests, discussions, selected Markdown docs, comments, and partial-failure artifacts. Partial GitHub failures cause the shared source job to report failure while preserving source-owned rows that were indexed successfully.

DELETE /api/sources/:sourceId

Remove a source config and purge Signet-owned source artifacts, graph rows, and source chunk embeddings. Source files are not modified.

Response

{
  "source": { "id": "obsidian:abc123", "kind": "obsidian" },
  "purged": 150
}

GET /api/sources/:sourceId/health

Return operational diagnostics for a configured source. The payload is the same health object embedded in GET /api/sources, plus the source config and index stats.

Diagnostics include artifact/chunk counts, latest artifact and checkpoint timestamps, Discord partial-failure/checkpoint counts, stale checkpoint counts, purge residue, and source-provenance graph row counts. If diagnostic queries fail, the route returns status: "unhealthy" with an error field instead of synthesizing a healthy source.

Response

{
  "source": { "id": "discord:abc123", "kind": "discord", "name": "Team Discord" },
  "stats": { "artifacts": 420, "chunks": 250, "indexed": 420 },
  "health": {
    "status": "degraded",
    "generatedAt": "2026-05-24T00:00:00.000Z",
    "latestArtifactAt": "2026-05-24T00:00:00.000Z",
    "latestCheckpointAt": "2026-05-24T00:00:00.000Z",
    "chunkCoverage": 0.6,
    "failures": { "total": 1, "recoverable": 1 },
    "checkpoints": { "total": 20, "partial": 1, "stale": 0 },
    "purge": { "deletedArtifacts": 0, "orphanChunks": 0 },
    "semantic": { "entities": 12, "attributes": 4, "dependencies": 6, "communities": 2, "total": 24 }
  }
}
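
One way to consume this payload is to reduce it to a short list of warnings. The sketch below uses only fields shown in the sample response; health_warnings is a hypothetical helper, and the choice of which fields are worth flagging is an assumption of the sketch, not part of the API.

```python
def health_warnings(payload):
    """Summarize a parsed source health response as human-readable warnings."""
    health = payload["health"]
    warnings = []
    if health.get("status") != "healthy":
        warnings.append(f"status is {health.get('status')}")
    coverage = health.get("chunkCoverage")
    if coverage is not None and coverage < 1:
        warnings.append(f"chunk coverage is {coverage:.0%}")
    failures = health.get("failures", {})
    if failures.get("total", 0) > 0:
        warnings.append(f"{failures['total']} failure(s), {failures.get('recoverable', 0)} recoverable")
    checkpoints = health.get("checkpoints", {})
    for key in ("partial", "stale"):
        if checkpoints.get(key, 0) > 0:
            warnings.append(f"{checkpoints[key]} {key} checkpoint(s)")
    if health.get("purge", {}).get("orphanChunks", 0) > 0:
        warnings.append("orphan chunks remain after purge")
    return warnings
```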

GET /api/sources/:sourceId/snapshot

Export source-owned artifact rows as a Signet source snapshot. Snapshots use memory_artifacts provenance instead of a provider-specific archive database.

Query parameters

Parameter             Description
includeLocalDiscord   Include local Discord Desktop @me cache artifacts. Defaults to false.

By default, Discord Desktop cache DMs under the synthetic guild id @me are excluded so shared snapshots do not publish local-only private data.

Response

{
  "version": 1,
  "exportedAt": "2026-05-24T00:00:00.000Z",
  "source": { "id": "discord:abc123", "kind": "discord", "name": "Team Discord", "root": "discord://123" },
  "agentId": "default",
  "artifacts": [
    {
      "sourcePath": "discord://guild/123/channel/456/message/789",
      "sourceKind": "source_discord_message",
      "sourceId": "discord:abc123",
      "content": "# Discord Message\n..."
    }
  ],
  "skipped": { "localDiscordArtifacts": 0 }
}

POST /api/sources/:sourceId/snapshot/import

Import a Signet source snapshot into an existing configured source. The import replaces source-owned artifact rows for that source and reuses the normal artifact upsert path so FTS and provenance stay consistent.

Query parameters

Parameter             Description
includeLocalDiscord   Import local Discord Desktop @me cache artifacts from the snapshot. Defaults to false.

Default imports preserve existing local @me Discord cache artifacts and skip any @me artifacts present in the incoming snapshot.

Request body

The JSON returned by GET /api/sources/:sourceId/snapshot.

Response

{
  "ok": true,
  "imported": 42,
  "skipped": { "localDiscordArtifacts": 3 }
}
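
Export and import share the same URL shape, so one helper can build both. This is a sketch with hypothetical names; in particular, sending the flag as the literal string true is an assumption, because the accepted boolean format for includeLocalDiscord is not stated here.

```python
from urllib.parse import quote, urlencode


def snapshot_url(base_url, source_id, include_local_discord=False, for_import=False):
    """Build the snapshot export URL, or the import URL when for_import is set."""
    # Keep ':' readable in ids such as "discord:abc123"; it is valid in a path segment.
    url = f"{base_url}/api/sources/{quote(source_id, safe=':')}/snapshot"
    if for_import:
        url += "/import"
    if include_local_discord:
        url += "?" + urlencode({"includeLocalDiscord": "true"})  # assumed format
    return url
```

To copy a source between daemons, a client would GET the export URL on one side and POST the returned JSON unchanged to the import URL on the other.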

POST /api/sources/pick-directory

Best-effort local directory picker used by dashboard/browser flows. It returns 501 when no OS picker command is available.

Request body

{ "title": "Choose Obsidian vault" }

Response

{ "path": "/home/user/ObsidianVault" }
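
Because the picker is best-effort, callers should plan for the 501 case. A minimal sketch, assuming the caller already has the status code and parsed body; resolve_vault_path and its prompt fallback are hypothetical, and other outcomes such as the user canceling the picker are not covered because they are not described here.

```python
def resolve_vault_path(status_code, body, prompt=input):
    """Return the picked path, or ask the user to type one when no picker exists."""
    if status_code == 501:
        # No OS picker command is available on this machine.
        return prompt("No directory picker available; enter a path: ").strip()
    return body["path"]
```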