
Overview

A vector store stores data as embeddings and performs similarity search over them.

Interface

LangChain provides a unified interface for vector stores, allowing you to:
  • addDocuments - add documents to the store.
  • delete - remove stored documents by ID.
  • similaritySearch - query for semantically similar documents.
This abstraction lets you switch between implementations without changing your application logic.
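The three methods above are enough to sketch the contract. `VectorStoreLike` and `ToyStore` below are hypothetical stand-ins, not LangChain's real types: the "similarity" here is just a shared-word count in place of a real embedding comparison, but any real store exposes the same three calls, which is why implementations are interchangeable.

```typescript
// Hypothetical sketch of the shared surface (not LangChain's real types):
interface VectorStoreLike {
  addDocuments(docs: Doc[], options?: { ids?: string[] }): Promise<void>;
  delete(params: { ids: string[] }): Promise<void>;
  similaritySearch(query: string, k: number): Promise<Doc[]>;
}

type Doc = { pageContent: string; metadata?: Record<string, unknown> };

// Toy implementation: "similarity" is a shared-word count, standing in
// for a real embedding comparison.
class ToyStore implements VectorStoreLike {
  private docs = new Map<string, Doc>();
  private nextId = 0;

  async addDocuments(docs: Doc[], options?: { ids?: string[] }): Promise<void> {
    docs.forEach((doc, i) => {
      this.docs.set(options?.ids?.[i] ?? String(this.nextId++), doc);
    });
  }

  async delete(params: { ids: string[] }): Promise<void> {
    for (const id of params.ids) this.docs.delete(id);
  }

  async similaritySearch(query: string, k: number): Promise<Doc[]> {
    const queryWords = new Set(query.toLowerCase().split(/\s+/));
    return [...this.docs.values()]
      .map((doc) => ({
        doc,
        score: doc.pageContent
          .toLowerCase()
          .split(/\s+/)
          .filter((w) => queryWords.has(w)).length,
      }))
      .sort((a, b) => b.score - a.score) // best match first
      .slice(0, k)                       // keep the top k
      .map((entry) => entry.doc);
  }
}
```

Swapping `ToyStore` for a real store changes only the constructor call; the calling code stays the same.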

Initialization

Most vector stores in LangChain accept an embedding model as an argument at initialization.
import { OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "@langchain/classic/vectorstores/memory";

const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});
const vectorStore = new MemoryVectorStore(embeddings);

Adding documents

You can add documents to the vector store with the addDocuments function.
import { Document } from "@langchain/core/documents";
const document = new Document({
  pageContent: "Hello world",
});
await vectorStore.addDocuments([document]);

Deleting documents

You can delete documents from the vector store with the delete function, passing the IDs of the documents to remove.
await vectorStore.delete({
  ids: ["some-document-id"],
});

Similarity search

Use similaritySearch to run a semantic query; it returns the documents whose embeddings are closest to the query:
const results = await vectorStore.similaritySearch("Hello world", 10);
Many vector stores support parameters such as:
  • k — the number of results to return
  • filter — conditional filtering based on metadata

Similarity metrics and indexing

Embedding similarity can be computed with:
  • Cosine similarity
  • Euclidean distance
  • Dot product
Efficient search typically relies on an indexing method such as HNSW (Hierarchical Navigable Small World), though the details depend on the vector store.
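As a toy illustration of these three metrics (3-dimensional vectors; `dot`, `norm`, and the variable names are ours, not a library API):

```typescript
// Toy 3-dimensional "embeddings"; real models emit hundreds or thousands
// of dimensions, but the formulas are identical.
const a = [1, 2, 3];
const b = [2, 4, 6]; // b = 2a, i.e. the same direction as a

const dot = (x: number[], y: number[]) =>
  x.reduce((sum, xi, i) => sum + xi * y[i], 0);
const norm = (x: number[]) => Math.sqrt(dot(x, x));

const dotProduct = dot(a, b);                        // 2 + 8 + 18 = 28
const euclidean = norm(a.map((xi, i) => xi - b[i])); // sqrt(1 + 4 + 9) ≈ 3.74
const cosine = dot(a, b) / (norm(a) * norm(b));      // 1: same direction
```

Note that cosine similarity ignores magnitude: b points in the same direction as a, so their cosine similarity is 1 even though their Euclidean distance is not 0.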

Metadata filtering

Filtering by metadata (e.g., source, date) refines search results:
vectorStore.similaritySearch("query", 2, { source: "tweets" });
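Conceptually, the filter restricts which documents are candidates before ranking by similarity. A minimal, store-agnostic sketch with plain objects (the exact filter syntax varies by integration):

```typescript
const docs = [
  { pageContent: "LangChain ships a new release", metadata: { source: "news" } },
  { pageContent: "check out this demo!", metadata: { source: "tweets" } },
];

// The equivalent of a `{ source: "tweets" }` filter: keep only matching
// documents, then rank the survivors by embedding similarity.
const candidates = docs.filter((d) => d.metadata.source === "tweets");
```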

Top integrations

Select an embeddings model:

OpenAI

Install dependencies:
npm i @langchain/openai
Add environment variables:
OPENAI_API_KEY=your-api-key
Instantiate the model:
import { OpenAIEmbeddings } from "@langchain/openai";

const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-large"
});

Azure OpenAI

Install dependencies:
npm i @langchain/openai
Add environment variables:
AZURE_OPENAI_API_INSTANCE_NAME=<YOUR_INSTANCE_NAME>
AZURE_OPENAI_API_KEY=<YOUR_KEY>
AZURE_OPENAI_API_VERSION="2024-02-01"
Instantiate the model:
import { AzureOpenAIEmbeddings } from "@langchain/openai";

const embeddings = new AzureOpenAIEmbeddings({
  azureOpenAIApiEmbeddingsDeploymentName: "text-embedding-ada-002"
});

AWS Bedrock

Install dependencies:
npm i @langchain/aws
Add environment variables:
BEDROCK_AWS_REGION=your-region
Instantiate the model:
import { BedrockEmbeddings } from "@langchain/aws";

const embeddings = new BedrockEmbeddings({
  model: "amazon.titan-embed-text-v1"
});

Google Gemini

Install dependencies:
npm i @langchain/google-genai
Add environment variables:
GOOGLE_API_KEY=your-api-key
Instantiate the model:
import { GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";

const embeddings = new GoogleGenerativeAIEmbeddings({
  model: "text-embedding-004"
});

Google Vertex AI

Install dependencies:
npm i @langchain/google-vertexai
Add environment variables:
GOOGLE_APPLICATION_CREDENTIALS=credentials.json
Instantiate the model:
import { VertexAIEmbeddings } from "@langchain/google-vertexai";

const embeddings = new VertexAIEmbeddings({
  model: "gemini-embedding-001"
});

MistralAI

Install dependencies:
npm i @langchain/mistralai
Add environment variables:
MISTRAL_API_KEY=your-api-key
Instantiate the model:
import { MistralAIEmbeddings } from "@langchain/mistralai";

const embeddings = new MistralAIEmbeddings({
  model: "mistral-embed"
});

Cohere

Install dependencies:
npm i @langchain/cohere
Add environment variables:
COHERE_API_KEY=your-api-key
Instantiate the model:
import { CohereEmbeddings } from "@langchain/cohere";

const embeddings = new CohereEmbeddings({
  model: "embed-english-v3.0"
});

Ollama

Install dependencies:
npm i @langchain/ollama
Instantiate the model:
import { OllamaEmbeddings } from "@langchain/ollama";

const embeddings = new OllamaEmbeddings({
  model: "llama2",
  baseUrl: "http://localhost:11434", // Default value
});
Select a vector store:

In-memory
npm i @langchain/classic
import { MemoryVectorStore } from "@langchain/classic/vectorstores/memory";

const vectorStore = new MemoryVectorStore(embeddings);

Chroma
npm i @langchain/community
import { Chroma } from "@langchain/community/vectorstores/chroma";

const vectorStore = new Chroma(embeddings, {
  collectionName: "a-test-collection",
});

Faiss
npm i @langchain/community
import { FaissStore } from "@langchain/community/vectorstores/faiss";

const vectorStore = new FaissStore(embeddings, {});

MongoDB Atlas
npm i @langchain/mongodb
import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_ATLAS_URI || "");
const collection = client
  .db(process.env.MONGODB_ATLAS_DB_NAME)
  .collection(process.env.MONGODB_ATLAS_COLLECTION_NAME);

const vectorStore = new MongoDBAtlasVectorSearch(embeddings, {
  collection,
  indexName: "vector_index",
  textKey: "text",
  embeddingKey: "embedding",
});

PGVector
npm i @langchain/community
import { PGVectorStore } from "@langchain/community/vectorstores/pgvector";

const vectorStore = await PGVectorStore.initialize(embeddings, {
  // Postgres connection options for your instance go here
});

Pinecone
npm i @langchain/pinecone
import { PineconeStore } from "@langchain/pinecone";
import { Pinecone as PineconeClient } from "@pinecone-database/pinecone";

const pinecone = new PineconeClient();
const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX!);
const vectorStore = new PineconeStore(embeddings, {
  pineconeIndex,
  maxConcurrency: 5,
});

Qdrant
npm i @langchain/qdrant
import { QdrantVectorStore } from "@langchain/qdrant";

const vectorStore = await QdrantVectorStore.fromExistingCollection(embeddings, {
  url: process.env.QDRANT_URL,
  collectionName: "langchainjs-testing",
});
LangChain.js integrates with many vector stores. You can browse the full list below:

All vector stores

AnalyticDB

Astra DB

Azion EdgeSQL

Azure AI Search

Azure Cosmos DB for MongoDB vCore

Azure Cosmos DB for NoSQL

Cassandra

Chroma

ClickHouse

CloseVector

Cloudflare Vectorize

Convex

Couchbase

Elasticsearch

Faiss

Google Cloud SQL for PostgreSQL

Google Vertex AI Matching Engine

SAP HANA Cloud Vector Engine

HNSWLib

LanceDB

libSQL

MariaDB

In-memory

Milvus

Momento Vector Index (MVI)

MongoDB Atlas

MyScale

Neo4j Vector Index

Neon Postgres

OpenSearch

PGVector

Pinecone

Prisma

Qdrant

Redis

Rockset

SingleStore

Supabase

Tigris

Turbopuffer

TypeORM

Typesense

Upstash Vector

USearch

Vectara

Vercel Postgres

Voy

Weaviate

Xata

Zep Open Source

Zep Cloud
