Introduction
The idea for this actually came to me after spending time on Maxime Heckel’s blog. His blog is pretty cool and covers topics around web development, shaders, and real-time 3D on the web. He also wrote about how he implemented AI-powered semantic search from scratch on his own website. His write-up was comprehensive and an amazing read, but the plan I had for my implementation was quite different. The main reason is that I wanted to spend $0 on API fees for embedding models provided by the leading AI labs, and I also wanted privacy for the readers of this blog - a way for all their search queries to stay in the browser, potentially even working offline, without being sent to third parties for processing.
So this write-up covers everything: how I built the search interface to match the theme and the look and feel of my website, how the semantic search actually works, and the considerations and challenges I ran into while developing it.
Understanding Semantic Search
Traditional search relies on keyword matching - if you search for “machine learning,” it only finds content containing those exact words. Semantic search, however, understands the meaning behind words. It knows that “AI,” “machine learning,” “neural networks,” and “deep learning” are related concepts, even if they’re different terms.
This is achieved through embeddings - numerical representations of text that capture semantic meaning. Similar concepts have similar embeddings, allowing us to find related content mathematically using cosine similarity.
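As a toy illustration (the numbers and the three-dimensional vectors are made up - real embeddings from gte-small have 384 dimensions): texts about related topics end up with vectors pointing in roughly the same direction, and cosine similarity measures exactly that.

```ts
// Toy 3-dimensional "embeddings" (real gte-small embeddings have 384 dimensions).
const machineLearning = [0.8, 0.1, 0.1];
const neuralNetworks  = [0.7, 0.2, 0.1];
const cookingRecipes  = [0.1, 0.1, 0.9];

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot   = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, ai) => sum + ai * ai, 0));
  const normB = Math.sqrt(b.reduce((sum, bi) => sum + bi * bi, 0));
  return dot / (normA * normB);
}

console.log(cosineSimilarity(machineLearning, neuralNetworks)); // high (~0.99)
console.log(cosineSimilarity(machineLearning, cookingRecipes)); // low  (~0.24)
```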
The Technology Stack
For my implementation, I used:
- Transformers.js: A JavaScript library that brings Hugging Face transformers to the browser
- GTE-Small Model: A transformer model optimized for semantic similarity
- Astro: This is what this site runs on (this approach should work with any framework)
- React: For the search UI components (we also used shadcn/ui’s Dialog component)
- Tailwind CSS: For styling with a glass-morphism design
Why Local Over Cloud?
Before diving into the implementation, let’s understand why running semantic search locally is advantageous, using the embeddings model Maxime used as the benchmark for comparison:
Cost Comparison
OpenAI Embeddings API:
- ~$0.13 per million tokens for embeddings
- ~$0.002 per search query
- For a blog with 1,000 daily searches: ~$60/month
- Scales linearly with usage
Local Transformers.js:
- One-time 33MB download per user
- Zero ongoing costs
- Scales infinitely without additional expense
- No rate limits or quotas
Privacy Benefits
- No user data leaves the browser
- Search queries remain completely private - even from me, the person who runs this website
- No tracking or analytics by third parties and no data to sell
- Complies with strict privacy regulations (GDPR, CCPA) by default, so I don’t have to worry about them - especially nice for my very basic portfolio website
Performance After Initial Load
- Sub-500ms search response time
- Works offline once cached
- No network latency
- Consistent performance regardless of server location - or rather, regardless of where our users/readers are searching from
Step 1: Setting Up the Project Structure
First, create a dedicated folder structure to keep the semantic search components isolated and maintainable:
```
src/
  components/
    search/
      ├── SearchModal.tsx        # Main search UI
      ├── SearchTrigger.tsx      # CMD+K handler
      ├── SearchEngine.ts        # Semantic search logic
      └── SearchPreloader.astro  # Background model loading
scripts/
  └── generate-embeddings.js     # Build-time embedding generator
public/
  └── search-embeddings.json     # Generated embeddings (gitignored)
```
This structure makes the feature easy to remove if needed - just delete the search folder and remove a few import lines.
Step 2: Installing Dependencies
Install the necessary packages:
```bash
# For semantic search
npm install @xenova/transformers

# For UI components (if using shadcn/ui)
npx shadcn@latest add dialog
```
The @xenova/transformers package is the key dependency that enables running transformer models in the browser using WebAssembly.
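Just to illustrate the core API before we wire anything up (this snippet isn’t part of the final setup - the real integration comes in the following steps), loading a feature-extraction pipeline and embedding a sentence looks like this:

```ts
import { pipeline } from '@xenova/transformers';

// Load the quantized GTE-Small model (downloaded and cached on first use).
const extractor = await pipeline('feature-extraction', 'Xenova/gte-small', { quantized: true });

// Mean-pooled, normalized embedding: a typed array of 384 numbers for gte-small.
const output = await extractor('Semantic search in the browser', {
  pooling: 'mean',
  normalize: true,
});
console.log(output.data.length); // 384
```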
Step 3: Generating Embeddings at Build Time
The most critical performance optimization is pre-computing embeddings for our content at build time, which avoids processing every blog post in the user’s browser. Even though I only had 5 blog posts when I implemented this, it was the better approach for scalability because I intend to keep writing, and it keeps the compute users spend on the semantic search feature to a minimum.
Create scripts/generate-embeddings.js:
````js
import { pipeline } from '@xenova/transformers';
import fs from 'fs';
import path from 'path';
import { fileURLToPath } from 'url';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// Initialize the embedding model
console.log('Loading embedding model...');
const extractor = await pipeline(
  'feature-extraction',
  'Xenova/gte-small',
  { quantized: true }
);

function createOptimalChunks(content, title) {
  const chunks = [];

  // Remove frontmatter if present
  content = content.replace(/^---[\s\S]*?---\n/, '');

  // Split by section headers for semantic boundaries
  const sections = content.split(/^## /gm);

  sections.forEach((section, index) => {
    if (!section.trim() || section.length < 100) return;

    const sectionText = index === 0 ? section : `## ${section}`;

    // Keep code-heavy sections together
    const hasCode = sectionText.includes('```');

    if (sectionText.length <= 1500 || hasCode) {
      chunks.push({ text: sectionText.trim(), context: title });
    } else {
      // Split large sections by paragraphs
      const paragraphs = sectionText.split(/\n\n+/);
      let currentChunk = '';

      for (const para of paragraphs) {
        if ((currentChunk + para).length > 1200) {
          if (currentChunk.length > 200) {
            chunks.push({ text: currentChunk.trim(), context: title });
          }
          currentChunk = para;
        } else {
          currentChunk += (currentChunk ? '\n\n' : '') + para;
        }
      }

      if (currentChunk.length > 200) {
        chunks.push({ text: currentChunk.trim(), context: title });
      }
    }
  });

  return chunks;
}

async function getEmbedding(text) {
  const output = await extractor(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data);
}

async function generateEmbeddings() {
  const blogDir = path.join(__dirname, '..', 'src', 'content', 'blog');
  const outputPath = path.join(__dirname, '..', 'public', 'search-embeddings.json');

  const files = fs.readdirSync(blogDir).filter(f => f.endsWith('.md'));
  const embeddings = [];

  for (const file of files) {
    console.log(`Processing ${file}...`);
    const filePath = path.join(blogDir, file);
    const content = fs.readFileSync(filePath, 'utf-8');

    // Parse frontmatter for title
    const frontmatterMatch = content.match(/^---\n([\s\S]*?)\n---/);
    let title = file.replace('.md', '').replace(/-/g, ' ');

    if (frontmatterMatch) {
      const titleMatch = frontmatterMatch[1].match(/title:\s*["']?(.+?)["']?\s*$/m);
      if (titleMatch) {
        title = titleMatch[1];
      }
    }

    const mainContent = content.replace(/^---[\s\S]*?---\n/, '');
    const chunks = createOptimalChunks(mainContent, title);

    const embeddedChunks = [];
    for (const chunk of chunks) {
      // Include title for context
      const textForEmbedding = `${title}\n\n${chunk.text}`;
      const embedding = await getEmbedding(textForEmbedding);

      // Clean preview text
      const cleanPreview = chunk.text
        .replace(/^##\s+/gm, '')
        .replace(/^###\s+/gm, '')
        .replace(/\*\*/g, '')
        .replace(/\*/g, '')
        .replace(/\[([^\]]+)\]\([^)]+\)/g, '$1')
        .substring(0, 300);

      embeddedChunks.push({ text: cleanPreview, embedding });
    }

    const slug = file.replace('.md', '');
    embeddings.push({
      id: slug,
      title,
      url: `/blog/${slug}`,
      chunks: embeddedChunks
    });
  }

  fs.writeFileSync(outputPath, JSON.stringify(embeddings));

  const stats = fs.statSync(outputPath);
  const fileSizeInKB = stats.size / 1024;
  console.log(`Generated embeddings for ${embeddings.length} posts`);
  console.log(`File size: ${fileSizeInKB.toFixed(2)} KB`);
}

generateEmbeddings().catch(console.error);
````
Key Optimization: Semantic Chunking
The chunking strategy is crucial for search quality. We split content at semantic boundaries (section headers) rather than arbitrary character counts. This ensures each chunk contains coherent, complete thoughts that can be properly understood by the embedding model.
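As a rough illustration of what the chunker produces (the post body here is made up, and each section is deliberately longer than the 100-character minimum the function enforces):

```js
// Hypothetical post body with two "## " sections, each long enough to survive the length filter.
const body = [
  '## Getting started',
  'A couple of sentences explaining the initial setup in enough detail that this section comfortably clears the 100-character minimum enforced by the chunker.',
  '',
  '## Deploying',
  'Another few sentences describing how the project gets deployed, again long enough that the chunker keeps this section as its own chunk.',
].join('\n');

const chunks = createOptimalChunks(body, 'Hypothetical Post');
// => two chunks, one per section, each shaped { text, context }:
//    chunks[0].text starts with "## Getting started", chunks[1].text with "## Deploying",
//    and both carry context: 'Hypothetical Post'.
```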
Step 4: Building the Search Engine
Create src/components/search/SearchEngine.ts:
```ts
interface BlogEmbedding {
  id: string;
  title: string;
  url: string;
  chunks: {
    text: string;
    embedding: number[];
  }[];
}

interface SearchResult {
  id: string;
  title: string;
  excerpt: string;
  url: string;
  similarity: number;
}

export class SemanticSearchEngine {
  private model: any = null;
  private embeddings: BlogEmbedding[] = [];
  private isInitialized = false;

  async initialize(): Promise<void> {
    if (this.isInitialized) return;

    try {
      // Check if embeddings are preloaded
      if ((window as any).searchEmbeddings) {
        this.embeddings = (window as any).searchEmbeddings;
      } else {
        const response = await fetch('/search-embeddings.json');
        if (!response.ok) {
          throw new Error('Failed to load embeddings');
        }
        this.embeddings = await response.json();
      }

      // Check if model is preloaded
      if ((window as any).searchModel) {
        this.model = (window as any).searchModel;
        console.log('Using preloaded search model');
      } else {
        const { pipeline, env } = await import('@xenova/transformers');

        // Configure to use CDN
        env.allowLocalModels = false;
        env.remoteURL = 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.17.2/';

        this.model = await pipeline(
          'feature-extraction',
          'Xenova/gte-small',
          {
            quantized: true,
            progress_callback: (data: any) => {
              if (data.status === 'downloading') {
                console.log(`Downloading model: ${Math.round(data.progress)}%`);
              }
            }
          }
        );
      }

      this.isInitialized = true;
    } catch (error) {
      console.error('Failed to initialize search engine:', error);
      throw error;
    }
  }

  async search(query: string): Promise<SearchResult[]> {
    if (!this.isInitialized) {
      throw new Error('Search engine not initialized');
    }

    // Generate embedding for the query
    const queryEmbedding = await this.getEmbedding(query);

    // Calculate similarity scores for all chunks
    const results: Array<{
      blogId: string;
      title: string;
      url: string;
      chunk: string;
      similarity: number;
    }> = [];

    for (const blog of this.embeddings) {
      for (const chunk of blog.chunks) {
        const similarity = this.cosineSimilarity(queryEmbedding, chunk.embedding);
        results.push({
          blogId: blog.id,
          title: blog.title,
          url: blog.url,
          chunk: chunk.text,
          similarity
        });
      }
    }

    // Sort by similarity
    results.sort((a, b) => b.similarity - a.similarity);

    // Filter out low similarity results (threshold: 0.80)
    const relevantResults = results.filter(r => r.similarity > 0.80);

    // If no results meet threshold, take top 3
    const finalResults = relevantResults.length > 0 ? relevantResults : results.slice(0, 3);

    // Group by blog post and take best chunk per post
    const blogResults = new Map<string, SearchResult>();

    for (const result of finalResults) {
      if (!blogResults.has(result.blogId)) {
        blogResults.set(result.blogId, {
          id: result.blogId,
          title: result.title,
          excerpt: this.truncateExcerpt(result.chunk),
          url: result.url,
          similarity: result.similarity
        });
      }

      if (blogResults.size >= 5) break;
    }

    return Array.from(blogResults.values());
  }

  private async getEmbedding(text: string): Promise<number[]> {
    const output = await this.model(text, { pooling: 'mean', normalize: true });
    return Array.from(output.data);
  }

  private cosineSimilarity(a: number[], b: number[]): number {
    let dotProduct = 0;
    let normA = 0;
    let normB = 0;

    for (let i = 0; i < a.length; i++) {
      dotProduct += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }

    normA = Math.sqrt(normA);
    normB = Math.sqrt(normB);

    if (normA === 0 || normB === 0) {
      return 0;
    }

    return dotProduct / (normA * normB);
  }

  private truncateExcerpt(text: string, maxLength: number = 150): string {
    if (text.length <= maxLength) return text;

    const truncated = text.substring(0, maxLength);
    const lastSpace = truncated.lastIndexOf(' ');

    return truncated.substring(0, lastSpace) + '...';
  }
}
```
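For reference, the engine is used roughly like this (the query string is just an example - the modal in Step 6 does exactly this behind a loading state):

```ts
const engine = new SemanticSearchEngine();
await engine.initialize(); // loads embeddings + model, or reuses preloaded ones

const results = await engine.search('how do I run AI models locally?');
// => up to 5 results, one per post, each shaped { id, title, excerpt, url, similarity }
```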
Step 5: The Challenge of Model Selection
Initially, we used the all-MiniLM-L6-v2 model, which is popular for general semantic similarity. However, we encountered significant false positives - for example, a blog post about building an ARM operating system would appear in searches for “machine learning” with 79% similarity, despite having nothing to do with ML (to be fair, I did mention AI agents once in my ARM OS write-up, but still, this was just way off).
The issue was that MiniLM-L6-v2 was trained on general text and couldn’t properly distinguish between technical contexts. The model was intended to be used as a sentence and short paragraph encoder. It would match on superficial similarities like the word “approaches” appearing in both contexts.
The Solution: GTE-Small
We switched to the Xenova/gte-small model because:
- Trained on technical content: Including GitHub, StackOverflow, and technical documentation
- Better context understanding: Distinguishes between “approaches” in OS kernels vs. machine learning
- Minimal size increase: Only 33MB vs. 25MB for MiniLM - the difference felt minimal enough for me to feel confident about it
- 40% better accuracy for technical queries - which is what most searches on my blog would be
This change dramatically improved search quality, eliminating most false positives while maintaining fast performance.
Step 6: Building the UI with Glass-Morphism
Create src/components/search/SearchModal.tsx - a modern search modal with a frosted glass effect:
```tsx
import React, { useState, useEffect, useCallback, useRef } from 'react';
import {
  Dialog,
  DialogContent,
  DialogHeader,
} from '@/components/ui/dialog';
import { Input } from '@/components/ui/input';
import { Search, Loader2, FileText, ArrowRight } from 'lucide-react';

// Minimal local types so the component compiles on its own.
// SearchResult matches the shape returned by SemanticSearchEngine.search().
interface SearchResult {
  id: string;
  title: string;
  excerpt: string;
  url: string;
  similarity: number;
}

interface SearchModalProps {
  isOpen: boolean;
  onClose: () => void;
}

const SAMPLE_QUESTIONS = [
  "How do I build an ARM operating system?",
  "Tell me about RAG pipelines",
  "What's LM Studio for local AI?",
  "How to use Digital Ocean Spaces?",
  "Building a local AI agent"
];

export function SearchModal({ isOpen, onClose }: SearchModalProps) {
  const [query, setQuery] = useState('');
  const [results, setResults] = useState<SearchResult[]>([]);
  const [isSearching, setIsSearching] = useState(false);
  const [isModelLoading, setIsModelLoading] = useState(false);
  const [modelReady, setModelReady] = useState(false);
  const searchEngineRef = useRef<any>(null);
  const inputRef = useRef<HTMLInputElement>(null);

  useEffect(() => {
    if (isOpen && !searchEngineRef.current && !isModelLoading) {
      initializeSearchEngine();
    }
  }, [isOpen]);

  const initializeSearchEngine = async () => {
    try {
      setIsModelLoading(true);
      const { SemanticSearchEngine } = await import('./SearchEngine');
      searchEngineRef.current = new SemanticSearchEngine();
      await searchEngineRef.current.initialize();
      setModelReady(true);
    } catch (error) {
      console.error('Failed to initialize search:', error);
    } finally {
      setIsModelLoading(false);
    }
  };

  const performSearch = useCallback(async (searchQuery: string) => {
    if (!searchQuery.trim() || !searchEngineRef.current || !modelReady) return;

    setIsSearching(true);
    try {
      const searchResults = await searchEngineRef.current.search(searchQuery);
      setResults(searchResults);
    } catch (error) {
      console.error('Search failed:', error);
      setResults([]);
    } finally {
      setIsSearching(false);
    }
  }, [modelReady]);

  // Debounced search
  useEffect(() => {
    const timer = setTimeout(() => {
      if (query && modelReady) {
        performSearch(query);
      } else {
        setResults([]);
      }
    }, 300);

    return () => clearTimeout(timer);
  }, [query, performSearch, modelReady]);

  return (
    <Dialog open={isOpen} onOpenChange={onClose}>
      <DialogContent
        className="max-w-4xl max-h-[85vh] p-0 flex flex-col bg-white/80 dark:bg-neutral-900/90 backdrop-blur-xl border-black/10 dark:border-white/10"
        showCloseButton={false}
      >
        <div className="p-4 border-b border-black/5 dark:border-white/10 bg-black/5 dark:bg-white/5">
          <div className="relative">
            <Search className="absolute left-3 top-1/2 transform -translate-y-1/2 w-4 h-4 text-muted-foreground" />
            <Input
              ref={inputRef}
              type="text"
              placeholder={isModelLoading ? "Preparing semantic search..." : "Search anything in my blog..."}
              value={query}
              onChange={(e) => setQuery(e.target.value)}
              className="pl-10 pr-4 h-12 text-base bg-black/5 dark:bg-white/5 border border-black/10 dark:border-white/10 backdrop-blur-sm transition-all"
              disabled={isModelLoading}
            />
          </div>
        </div>

        <div className="max-h-[60vh] overflow-y-auto">
          {!query && !isModelLoading && (
            <div className="p-4">
              <p className="text-sm text-muted-foreground mb-3">Try asking:</p>
              <div className="flex flex-wrap gap-2">
                {SAMPLE_QUESTIONS.map((sample, index) => (
                  <button
                    key={index}
                    className="px-3 py-1.5 text-sm bg-black/5 dark:bg-white/5 backdrop-blur-sm border border-black/10 dark:border-white/10 rounded-full hover:bg-black/10 dark:hover:bg-white/10 transition-all duration-200 hover:scale-105"
                    onClick={() => setQuery(sample)}
                  >
                    {sample}
                  </button>
                ))}
              </div>
            </div>
          )}

          {results.length > 0 && (
            <div className="px-4 py-2">
              {results.map((result) => (
                <button
                  key={result.id}
                  onClick={() => window.location.href = result.url}
                  className="w-full mb-2 text-left p-4 rounded-lg bg-black/5 dark:bg-white/5 backdrop-blur-sm border border-black/10 dark:border-white/10 hover:bg-black/10 dark:hover:bg-white/10 transition-all duration-200 group"
                >
                  <div className="flex items-start gap-3">
                    <FileText className="w-4 h-4 mt-1 text-muted-foreground shrink-0" />
                    <div className="flex-1 min-w-0">
                      <h3 className="font-medium text-sm mb-1 group-hover:text-primary transition-colors">
                        {result.title}
                      </h3>
                      <p className="text-sm text-muted-foreground line-clamp-2">
                        {result.excerpt}
                      </p>
                      <div className="flex items-center gap-2 mt-2">
                        <span className="text-xs text-muted-foreground">
                          {Math.round(result.similarity * 100)}% match
                        </span>
                      </div>
                    </div>
                  </div>
                </button>
              ))}
            </div>
          )}
        </div>

        <div className="p-3 border-t border-black/5 dark:border-white/10 bg-black/5 dark:bg-white/5 backdrop-blur-sm">
          <div className="flex items-center justify-between text-xs text-muted-foreground">
            <span className="text-foreground/60">AI search running locally in your browser</span>
            <kbd className="px-2 py-1 rounded bg-black/10 dark:bg-white/10 border border-black/10 dark:border-white/10 text-[10px] font-mono backdrop-blur-sm">ESC</kbd>
          </div>
        </div>
      </DialogContent>
    </Dialog>
  );
}
```
Step 7: Implementing Background Preloading
To ensure search is instant when users need it, SearchPreloader.astro preloads the embeddings and the model in the background after the page loads:
```astro
<script>
  if (typeof window !== 'undefined') {
    // Preload embeddings after 1 second
    setTimeout(() => {
      fetch('/search-embeddings.json')
        .then(response => response.json())
        .then(data => {
          (window as any).searchEmbeddings = data;
          console.log('Search embeddings preloaded');
        })
        .catch(err => console.log('Failed to preload embeddings:', err));
    }, 1000);

    // Preload model after 3 seconds
    setTimeout(async () => {
      if ('requestIdleCallback' in window) {
        requestIdleCallback(async () => {
          try {
            const { pipeline, env } = await import('@xenova/transformers');

            env.allowLocalModels = false;
            env.remoteURL = 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.17.2/';

            console.log('Preloading search model...');
            const model = await pipeline(
              'feature-extraction',
              'Xenova/gte-small',
              {
                quantized: true,
                progress_callback: (data: any) => {
                  if (data.status === 'downloading') {
                    console.log(`Background model download: ${Math.round(data.progress)}%`);
                  }
                }
              }
            );

            (window as any).searchModel = model;
            console.log('Search model preloaded and ready');
          } catch (error) {
            console.log('Failed to preload model:', error);
          }
        });
      }
    }, 3000);
  }
</script>
```
This approach ensures:
- The page loads normally without any delay
- Embeddings load after 1 second
- Model downloads after 3 seconds when the browser is idle
- By the time users press CMD+K, everything is ready
Step 8: Handling Edge Cases and Optimizations
The Similarity Threshold Challenge
One of the biggest challenges was determining the right similarity threshold. Too low, and you get false positives. Too high, and relevant results are filtered out. After testing, we settled on 80% as the threshold, with a fallback to show the top 3 results if nothing meets the threshold.
Dealing with Horizontal Scroll
We encountered an interesting CSS challenge where the search result cards would cause horizontal scrolling. The issue was that cards with `w-full` and `m-2` (margin) would extend beyond their container. The solution was to adjust the container padding and remove horizontal margins from the cards.
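In Tailwind terms, the fix looked roughly like this (a simplified before/after sketch - the real cards live in SearchModal.tsx above):

```tsx
import React from 'react';

// Before: each full-width card also carried horizontal margin (m-2),
// so its total width exceeded the container and caused horizontal scroll.
const Before = () => (
  <button className="w-full m-2 p-4 rounded-lg text-left">Result card</button>
);

// After: the container owns the horizontal padding (px-4),
// and the card only keeps a bottom margin (mb-2).
const After = () => (
  <div className="px-4 py-2">
    <button className="w-full mb-2 p-4 rounded-lg text-left">Result card</button>
  </div>
);
```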
Dark Mode Visibility
The glass-morphism effect initially had poor visibility in dark mode. We solved this by:
- Using bg-neutral-900/90 instead of bg-black/80 for better contrast
- Increasing the opacity slightly while maintaining the frosted effect
- Adding subtle borders with border-white/10
Step 9: Integration with Your Site
Add the search trigger to our site header:
```tsx
// Imports added for completeness - paths assume the shadcn/ui + lucide-react setup used in SearchModal.tsx.
import { useState, useEffect } from 'react';
import { Button } from '@/components/ui/button';
import { Search } from 'lucide-react';
import { SearchModal } from './SearchModal';

export function SearchTrigger() {
  const [isOpen, setIsOpen] = useState(false);

  useEffect(() => {
    const handleKeyDown = (event: KeyboardEvent) => {
      if ((event.metaKey || event.ctrlKey) && event.key === 'k') {
        event.preventDefault();
        setIsOpen(true);
      }
    };

    document.addEventListener('keydown', handleKeyDown);
    return () => document.removeEventListener('keydown', handleKeyDown);
  }, []);

  return (
    <>
      <Button
        variant="ghost"
        size="sm"
        onClick={() => setIsOpen(true)}
      >
        <Search className="w-4 h-4" />
        <span>Search</span>
        <kbd className="ml-2 text-xs">⌘K</kbd>
      </Button>

      <SearchModal isOpen={isOpen} onClose={() => setIsOpen(false)} />
    </>
  );
}
```
Step 10: Build Script Integration
Update our package.json to generate embeddings during the build process:
{ "scripts": { "build": "node scripts/generate-embeddings.js && astro build", "update-embeddings": "node scripts/generate-embeddings.js" }}
Performance Metrics
After implementation, here are the real-world performance metrics:
- Embedding generation: ~329KB for 5 blog posts (66KB per post)
- Model download: 33MB (one-time, cached forever)
- Search response time: <500ms after model loads
- Initial page load impact: +100KB (embeddings only)
- Model loading: Background after 3 seconds, non-blocking
Troubleshooting Common Issues
False Positives in Search Results
If you’re seeing irrelevant results after implementing your Semantic Search, check:
- Your chunking strategy - ensure semantic boundaries are respected (the section-based chunking from Step 3 handles this)
- The similarity threshold - we found 80% works well for technical content
- Consider switching models if your content is specialized
Model Download Failures
If the model fails to download:
- Check CDN configuration in the code
- Ensure CORS headers are properly set
- Consider hosting the model files yourself for reliability, especially for production use cases serving a lot of users every day (a minimal sketch follows below)
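If you go the self-hosted route, Transformers.js can be pointed at your own copy of the model files instead of the CDN. A minimal sketch, assuming you have copied the Xenova/gte-small files into your site’s public/models/ folder:

```ts
import { env, pipeline } from '@xenova/transformers';

// Serve the model files yourself, e.g. from public/models/Xenova/gte-small/.
env.allowRemoteModels = false;   // never fall back to the remote hub/CDN
env.allowLocalModels = true;
env.localModelPath = '/models/'; // resolved relative to your site's origin

const extractor = await pipeline('feature-extraction', 'Xenova/gte-small', { quantized: true });
```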
Memory Usage Concerns
The model uses approximately 100MB of RAM when loaded. For mobile devices:
- Consider using a smaller model like paraphrase-MiniLM-L3-v2 (14MB)
- Implement device detection to load different models (see the sketch after this list)
- Add a toggle for users to enable/disable semantic search
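A rough sketch of the device-detection idea. Two assumptions here: navigator.deviceMemory is only available in Chromium-based browsers, and a converted paraphrase-MiniLM-L3-v2 build is assumed to exist under the Xenova namespace. Also note that query embeddings must come from the same model as the pre-computed embeddings, so switching models at runtime means generating a second embeddings file at build time.

```ts
import { pipeline } from '@xenova/transformers';

// Pick a lighter model on low-memory / mobile devices.
// navigator.deviceMemory is Chromium-only, so fall back to a user-agent check.
const lowEndDevice =
  ((navigator as any).deviceMemory ?? 8) <= 4 ||
  /Android|iPhone|iPad/i.test(navigator.userAgent);

const modelId = lowEndDevice
  ? 'Xenova/paraphrase-MiniLM-L3-v2' // lighter, weaker (assumed to be available)
  : 'Xenova/gte-small';              // better for technical content

// Reminder: the embeddings JSON must have been generated with the same model.
const extractor = await pipeline('feature-extraction', modelId, { quantized: true });
```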
Future Enhancements
While our implementation provides a pretty decent semantic search experience, there are several enhancements you could add:
- Hybrid Search: Combine semantic search with traditional keyword matching for the best of both worlds (see the sketch after this list)
- Search Analytics: Track what users search for (locally) to improve sample questions
- Multi-language Support: Use multilingual models for international audiences
- Citation Extraction: Show specific paragraphs that match the query
- Query Expansion: Automatically expand queries with synonyms for better coverage
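As an example of the hybrid idea (a sketch only - the keyword score is deliberately naive, and the 0.7/0.3 weights are arbitrary starting points, not tuned values): blend the cosine similarity we already compute with a simple keyword-overlap score per chunk.

```ts
// Naive keyword overlap: fraction of query terms (>2 chars) that appear in the chunk text.
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\s+/).filter(t => t.length > 2);
  if (terms.length === 0) return 0;
  const haystack = text.toLowerCase();
  const hits = terms.filter(term => haystack.includes(term)).length;
  return hits / terms.length; // 0..1
}

// Hybrid score: weighted blend of semantic similarity and keyword overlap.
function hybridScore(query: string, chunkText: string, cosine: number): number {
  return 0.7 * cosine + 0.3 * keywordScore(query, chunkText);
}
```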
Conclusion
Building local semantic search with Transformers.js provides a powerful, privacy-focused alternative to cloud-based solutions. While there were challenges - from model selection to UI refinements - the end result is a search experience that rivals commercial offerings while keeping user data private and eliminating ongoing costs.
The key insights from our implementation are:
- Pre-computing embeddings at build time is crucial for performance
- Model selection matters significantly for search quality
- Background preloading ensures instant search when needed
- Glass-morphism UI provides a modern, premium feel
- Local execution means infinite scalability at zero marginal cost
This implementation delivers intelligent, context-aware search that operates offline, preserves user privacy, and maintains site performance. The solution eliminates recurring costs and API key management while providing enterprise-grade search capabilities.
I truly love local-first AI and I believe that as transformer models become more efficient and browser capabilities expand, we can expect to see increased adoption of edge-based AI features that balance functionality with privacy considerations.