Full-stack semantic search application with FAISS and Pinecone vector store integrations, featuring an embedding cache that cuts embedding API costs by roughly 40%.
Traditional keyword search fails when users don't know the exact terminology, leaving organizations unable to surface relevant documents whose wording differs from the query.
A full-stack semantic search application with clean abstraction layers:
Modular design with clean separation between UI, search engine, and storage layers.
Interactive dashboard with real-time analytics, search interface, and document indexing—all in pure Python.
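As a minimal sketch of what such a dashboard page can look like, assuming the UI is built with Streamlit (a common way to get a "pure Python" interface with built-in state management; the framework, widget choices, and the engine object are illustrative assumptions, not taken from this document):

import streamlit as st

st.title("Semantic Search Dashboard")

query = st.text_input("Search", placeholder="Ask in natural language...")
top_k = st.slider("Results", min_value=1, max_value=20, value=5)

# The backend search engine is assumed to be created at app start and stored in session state.
engine = st.session_state.get("engine")

if query and engine:
    response = engine.search(query, top_k=top_k)
    col1, col2 = st.columns(2)
    col1.metric("Search time (ms)", response.search_time_ms)
    col2.metric("Embedding time (ms)", response.embedding_time_ms)
    for result in response.results:
        st.write(result)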
Facebook AI Similarity Search for local, high-speed vector operations. Perfect for development and small-scale deployments.
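A minimal sketch of the FAISS usage this implies (the index type shown is the simplest exact-search option; the dimension and file name are placeholder assumptions):

import faiss
import numpy as np

dimension = 384  # must match the embedding model's output size (assumed value)
index = faiss.IndexFlatL2(dimension)  # exact L2-distance search, held in memory

# Index a batch of embeddings; FAISS expects float32 rows
vectors = np.random.rand(100, dimension).astype("float32")
index.add(vectors)

# Query: returns distances and row ids of the top_k nearest vectors
query = np.random.rand(1, dimension).astype("float32")
distances, ids = index.search(query, 5)

# Persistence: save and reload the index across sessions
faiss.write_index(index, "index.faiss")
index = faiss.read_index("index.faiss")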
Cloud-native vector database for production scale. Serverless infrastructure supporting billions of vectors.
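A rough sketch of the equivalent Pinecone path, assuming the v3+ pinecone Python SDK; the index name, dimension, cloud, and region are placeholders, not project settings:

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

# One-time setup: a serverless index using cosine similarity (skip if it already exists)
pc.create_index(
    name="semantic-search",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("semantic-search")

# Upsert document vectors with metadata for filtering
index.upsert(vectors=[
    {"id": "doc-1", "values": [0.01] * 1536, "metadata": {"category": "billing"}},
])

# Query by vector
matches = index.query(vector=[0.01] * 1536, top_k=5, include_metadata=True)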
OpenAI's text-embedding-3-small for production quality; Sentence Transformers as a free, local alternative.
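For reference, generating an embedding with either backend looks roughly like this (the OpenAI call uses the openai Python SDK v1+; the Sentence Transformers model name is an assumption, and in practice the model would be loaded once rather than per call):

from typing import List


def embed_openai(text: str) -> List[float]:
    # Paid API path, used for production quality
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.data[0].embedding


def embed_local(text: str) -> List[float]:
    # Free, local path; model choice here is illustrative
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(text).tolist()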
from abc import ABC, abstractmethod
from typing import List

# `Document` is the application's document model, defined elsewhere in the project.


class VectorStoreBase(ABC):
    """Abstract base class for vector store implementations."""

    @abstractmethod
    def add_documents(self, documents: List[Document]) -> int:
        """Add documents to the vector store."""
        pass

    @abstractmethod
    def search(self, query_embedding: List[float], top_k: int = 5):
        """Search for similar documents."""
        pass
class VectorStoreFactory:
    """Instantiates the configured vector store backend."""

    @staticmethod
    def create(provider: str, dimension: int) -> VectorStoreBase:
        if provider == "faiss":
            return FAISSVectorStore(dimension)
        elif provider == "pinecone":
            return PineconeVectorStore(dimension)
        raise ValueError(f"Unsupported vector store provider: {provider}")
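Usage is then a one-line, configuration-driven choice, for example via an environment variable (the variable name is illustrative; 1536 is the output size of text-embedding-3-small):

import os

provider = os.getenv("VECTOR_STORE_PROVIDER", "faiss")  # "faiss" locally, "pinecone" in production
store = VectorStoreFactory.create(provider, dimension=1536)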
import hashlib
from typing import Dict, List


class EmbeddingService:
    """High-level embedding service with caching."""

    def __init__(self, provider="openai", cache_enabled=True):
        # self.provider (the concrete backend selected by `provider`) is configured here;
        # that wiring is omitted from this excerpt.
        self.cache: Dict[str, List[float]] = {}
        self.stats = {"hits": 0, "misses": 0}

    def embed(self, text: str) -> List[float]:
        cache_key = hashlib.md5(text.encode()).hexdigest()
        if cache_key in self.cache:
            self.stats["hits"] += 1
            return self.cache[cache_key]  # cache hit: no embedding API call needed
        self.stats["misses"] += 1
        embedding = self.provider.embed(text)
        self.cache[cache_key] = embedding
        return embedding
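A quick illustration of the cache behaviour, assuming the provider wiring in __init__ is completed; the hit-rate calculation is an illustrative addition that feeds the dashboard's cache-rate metric:

service = EmbeddingService(provider="openai")
service.embed("How do I reset my password?")  # miss: calls the embedding API
service.embed("How do I reset my password?")  # hit: served from the in-memory cache

total = service.stats["hits"] + service.stats["misses"]
hit_rate = service.stats["hits"] / total if total else 0.0
print(f"Cache hit rate: {hit_rate:.0%}")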
def search(self, query: str, top_k: int = 5) -> SearchResponse:
    # Generate query embedding (cached)
    embed_start = time.perf_counter()
    query_embedding = self.embedding_service.embed(query)
    embed_time = (time.perf_counter() - embed_start) * 1000

    # Search vector store (FAISS or Pinecone)
    search_start = time.perf_counter()
    results = self.vector_store.search(query_embedding, top_k)
    search_time = (time.perf_counter() - search_start) * 1000

    return SearchResponse(
        query=query,
        results=results,
        search_time_ms=round(search_time, 2),
        embedding_time_ms=round(embed_time, 2),
    )
Low-latency queries: FAISS in-memory search with L2 distance calculation.
Fast ingestion: batch indexing with parallel embedding generation.
Lower API spend: the embedding cache eliminates redundant embedding calls (the roughly 40% cost reduction noted above).
Production-ready capabilities for enterprise semantic search.
Find documents by meaning, not keywords. Natural language queries supported.
Seamlessly switch between FAISS (local) and Pinecone (cloud) at runtime.
LRU cache eliminates redundant embedding calls for 40% cost savings.
Live dashboard with search latency, cache rates, and system stats.
JSON file upload for batch processing thousands of documents.
Categories, tags, and custom fields for enhanced filtering.
Deterministic testing without API costs using hash-based embeddings.
Save and load FAISS indexes for persistence across sessions.
Runtime provider selection enables FAISS for development (free, fast) and Pinecone for production (managed, scalable). Same application code, different infrastructure.
OpenAI for production quality, Sentence Transformers for privacy-sensitive local use, Mock for deterministic testing. Each serves a distinct purpose.
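The mock provider can be as simple as a deterministic hash-based embedding, which is presumably what "hash-based embeddings" above refers to; this sketch is illustrative, not the project's actual implementation:

import hashlib
from typing import List


class MockEmbeddingProvider:
    """Deterministic, offline embeddings for tests: same text yields the same vector, no API calls."""

    def __init__(self, dimension: int = 384):
        self.dimension = dimension

    def embed(self, text: str) -> List[float]:
        # Stretch the text's SHA-256 digest into `dimension` pseudo-random floats in [0, 1)
        values: List[float] = []
        counter = 0
        while len(values) < self.dimension:
            digest = hashlib.sha256(f"{text}:{counter}".encode()).digest()
            values.extend(byte / 255.0 for byte in digest)
            counter += 1
        return values[: self.dimension]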
Pure Python rapid prototyping—no JavaScript required. Built-in state management and caching complement the backend. Perfect for data-centric applications.
L2 (Euclidean) distance works well with normalized embeddings, and for unit-length vectors it ranks results exactly as cosine similarity would. The Pinecone index uses cosine similarity, the metric its infrastructure is optimized for.
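The reason mixing the two metrics is safe: for unit-length vectors, squared L2 distance and cosine similarity are monotonically related, since ||a - b||^2 = 2 - 2*cos(a, b), so both backends produce the same ranking as long as embeddings are normalized. A quick numeric check:

import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=384)
b = rng.normal(size=384)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)  # normalize to unit length

l2_squared = np.sum((a - b) ** 2)
cosine = np.dot(a, b)
print(np.isclose(l2_squared, 2 - 2 * cosine))  # True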
Full source code with documentation available on GitHub.