Skip to content

rosaia/vecworks

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

vecworks

Seamlessly manage vectorized data in Python

| Documentation |

About

Vecworks is an open-source framework build on top of Pypeworks to seamlessly manage vectorized data in Python. It offers the following features:

  • Standardized access to various vectorization platforms, like OpenAI's, SBERT and fastText
  • Mixing and matching of vectorization procedures, reducable to a single output using ensemblers
  • Remote serving of custom vectorization procedures using the built-in server

Install

Vecworks is available through the PyPI repository and can be installed using pip:

pip install vecworks

Quickstart

Vecworks' key concept is that of the Retriever. A retriever is a specialised Pypeworks node that vectorizes inputs, allowing to cross-reference these inputs with data in a vector store. As nodes retrievers may be embedded in Pypeworks pipeworks, enabling various applications, including semantic text matching, document classification, and RAG (when combined with Langworks).

Assuming a vector store has been set-up on a PostgreSQL-database, a retriever may be instantiated as follows:

import vecworks

from vecworks.retrievers.pgvector import (
    pgvectorRetriever
)

from vecworks.vectorizers.sbert import (
    sbertVectorizer
)

match = pgvectorRetriever(

    url   = "postgresql://127.0.0.1:5432/rag-mini-wikipedia",
    # Populated using https://huggingface.co/datasets/rag-datasets/rag-mini-wikipedia

    authenticator = vecworks.auth.UsernameCredentials("username", "password"),

    table = '"text-corpus"',

    index = [

        vecworks.Index(

            name         = "passage-e5-ml-large-q",
            # Column derived from 'passage', populated with vectorized contents of 'passage'.

            bind         = "input",

            distance     = vecworks.DISTANCES.cosine,
            max_distance = 0.2,
            top_k        = 5,

            vectorizer   = sbertVectorizer.create_from_string(

                "intfloat/multilingual-e5-large",

                prompt_format = "query: "
                normalize     = True

            ),

            density      = vecworks.DENSITY.dense

        ),

    ],

    return_columns = ["passage", "id"]

    top_k       = 3

) 

Releases

No releases published

Packages

 
 
 

Contributors

Languages