Back to Projects
KNOWLEDGE MINING

Ask any YouTube creator anything. Get answers from their actual words.

3-hour videos. Brilliant insights buried at minute 47. No way to search, no way to find them again. So I built a system that turns every spoken word into searchable, queryable knowledge.

View on GitHub
THE PIPELINE

From YouTube URL to searchable knowledge.

Ingest

Paste a YouTube URL. Transcript fetched automatically.

Chunk

Content split into searchable segments automatically.

Embed

Each segment converted to a searchable representation - locally, no external APIs.

Search

Search that understands meaning and catches exact words.

YouTube Transcript API pulls the full spoken content. Metadata via oEmbed. Background job queue with retry logic.

Sentence-boundary aware splitting preserves meaning. Overlapping segments ensure nothing is lost between sections.

The model runs locally - zero external API calls, zero cost per query. Batch processing with automatic caching on cold start.

Two search strategies - semantic similarity and exact keyword matching - fused into one ranked result set. Optional temporal decay for recency bias.

SEARCH SYSTEM

Search that understands what you mean - and catches what you typed.

Two search strategies work in parallel. One understands meaning - find results about “state management” even if those exact words never appear. The other catches exact terms - find every mention of “useEffect.” Results found by both methods get ranked higher.

search.ts
async function search(query, options = {}) {
  const { mode = 'hybrid', limit = 10 } = options

  // Run both search strategies in parallel
  const [semanticResults, keywordResults] = await Promise.all([
    searchByMeaning(query, limit * 2),
    searchByKeywords(query, limit * 2),
  ])

  // Merge and rank - results found by both methods score higher
  return mergeAndRank(semanticResults, keywordResults)
    .slice(0, limit)
}
mergeAndRank()
function mergeAndRank(semanticResults, keywordResults) {
  const combined = new Map()

  // Each result gets a position-based score
  // Results appearing in BOTH lists get boosted
  for (const [results, weight] of [
    [semanticResults, 1.0],
    [keywordResults, 1.0],
  ]) {
    results.forEach((result, position) => {
      const score = scoreByPosition(position)
      const existing = combined.get(result.id)
      if (existing) existing.score += score
      else combined.set(result.id, { result, score })
    })
  }

  return Array.from(combined.values())
    .sort((a, b) => b.score - a.score)
    .map(({ result, score }) => ({ ...result, relevance: score }))
}
ZERO EXTERNAL API CALLS

The model runs locally. No OpenAI. No Cohere. No API keys.

A lightweight model runs directly on the server - no external API calls, no per-query costs. Batch processing handles large content libraries. ~23MB footprint, cached on cold start.

local-model.ts
// Local model - no external API calls, no usage costs
const model = await loadModel({ cache: true })

// Process text into a searchable representation
const representation = await model.encode(text, {
  strategy: 'mean-pooling',
  normalize: true,
})

// Store directly in the database
await db.insert(representation)
CREATOR PERSONAS

Ask the creator. Get answers grounded in their content.

AI-generated personas capture each YouTube channel’s expertise and communication style. Answers are always grounded in actual spoken content - not hallucinated, not generic.

Auto-generated at 5+ videos per creator
Expertise profile built from all ingested content
"Who's best?" routing matches questions to the right creator
Ensemble mode: top 3 personas stream in parallel via SSE
Every answer grounded in actual spoken content - not hallucinated, not generic
0database tables
0%local processing
0MCP tools
0+searchable resources
MCP INTEGRATION

Your knowledge bank, inside Claude Code.

4 MCP tools expose the entire knowledge bank to Claude Code workflows. Search, list creators, chat with personas, or ask the panel - all from your terminal.

search_knowledge

Dual-mode search with semantic and keyword matching, optional creator filtering

topic, creator?, limit?
get_list_of_creators

List all YouTube channels in knowledge bank with video counts

(no input)
chat_with_persona

Query a specific creator persona with responses grounded in their actual content

personaName, question
ensemble_query

Ask multiple personas simultaneously, top 3 parallel responses

question
mcp.json
{
  "mcpServers": {
    "gold-miner": {
      "type": "sse",
      "url": "http://localhost:3001/api/mcp/sse"
    }
  }
}
UNDER THE HOOD
Next.js 16React 19TypeScriptPostgreSQLDrizzle ORMLocal ML ModelClaude APIMCP SDKTailwind CSS v4Vitest
github.com/devobsessed/sluice