Intel's Visual Data Management System (VDMS)
This notebook covers how to get started with VDMS as a vector store.
Intel's Visual Data Management System (VDMS) is a storage solution for efficient access of big-”visual”-data that aims to achieve cloud scale by searching for relevant visual data via visual metadata stored as a graph and enabling machine friendly enhancements to visual data for faster access. VDMS is licensed under MIT. For more information on
VDMS, visit this page, and find the LangChain API reference here.
VDMS supports:
- K nearest neighbor search
- Euclidean distance (L2) and inner product (IP)
- Libraries for indexing and computing distances: FaissFlat (Default), FaissHNSWFlat, FaissIVFFlat, Flinng, TileDBDense, TileDBSparse
- Embeddings for text, images, and video
- Vector and metadata searches
Setup
To access VDMS vector stores you'll need to install the langchain-vdms integration package and deploy a VDMS server via the publicly available Docker image.
For simplicity, this notebook will deploy a VDMS server on local host using port 55555.
%pip install -qU "langchain-vdms>=0.1.3"
!docker run --no-healthcheck --rm -d -p 55555:55555 --name vdms_vs_test_nb intellabs/vdms:latest
!sleep 5
Note: you may need to restart the kernel to use updated packages.
c464076e292613df27241765184a673b00c775cecb7792ef058591c2cbf0bde8
Credentials
You can use VDMS without any credentials.
To enable automated tracing of your model calls, set your LangSmith API key:
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Initialization
Use the VDMS Client to connect to a VDMS vectorstore using FAISS IndexFlat indexing (default) and Euclidean distance (default) as the distance metric for similarity search.
pip install -qU langchain-openai
import getpass
import os
if not os.environ.get("OPENAI_API_KEY"):
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
from langchain_vdms.vectorstores import VDMS, VDMS_Client
collection_name = "test_collection_faiss_L2"
vdms_client = VDMS_Client(host="localhost", port=55555)
vector_store = VDMS(
client=vdms_client,
embedding=embeddings,
collection_name=collection_name,
engine="FaissFlat",
distance_strategy="L2",
)
Manage vector store
Add items to vector store
import logging
logging.basicConfig()
logging.getLogger("langchain_vdms.vectorstores").setLevel(logging.INFO)
from langchain_core.documents import Document
document_1 = Document(
page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
metadata={"source": "tweet"},
id=1,
)
document_2 = Document(
page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
metadata={"source": "news"},
id=2,
)
document_3 = Document(
page_content="Building an exciting new project with LangChain - come check it out!",
metadata={"source": "tweet"},
id=3,
)
document_4 = Document(
page_content="Robbers broke into the city bank and stole $1 million in cash.",
metadata={"source": "news"},
id=4,
)
document_5 = Document(
page_content="Wow! That was an amazing movie. I can't wait to see it again.",
metadata={"source": "tweet"},
id=5,
)
document_6 = Document(
page_content="Is the new iPhone worth the price? Read this review to find out.",
metadata={"source": "website"},
id=6,
)
document_7 = Document(
page_content="The top 10 soccer players in the world right now.",
metadata={"source": "website"},
id=7,
)
document_8 = Document(
page_content="LangGraph is the best framework for building stateful, agentic applications!",
metadata={"source": "tweet"},
id=8,
)
document_9 = Document(
page_content="The stock market is down 500 points today due to fears of a recession.",
metadata={"source": "news"},
id=9,
)
document_10 = Document(
page_content="I have a bad feeling I am going to get deleted :(",
metadata={"source": "tweet"},
id=10,
)
documents = [
document_1,
document_2,
document_3,
document_4,
document_5,
document_6,
document_7,
document_8,
document_9,
document_10,
]
doc_ids = [str(i) for i in range(1, 11)]
vector_store.add_documents(documents=documents, ids=doc_ids)
['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
If an id is provided multiple times, add_documents does not check whether the ids are unique. For this reason, use upsert to delete existing id entries prior to adding.
vector_store.upsert(documents, ids=doc_ids)
{'succeeded': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
'failed': []}