This repository was archived by the owner on Dec 10, 2025. It is now read-only.
9 changes: 9 additions & 0 deletions code_samples/azure_cosmosdb_nosql/.env-template
@@ -0,0 +1,9 @@
cosmos_db_api_endpoint=
cosmos_db_api_key=
cosmos_db_connection_string=
cog_search_endpoint=
cog_search_key=
AOAI_ENDPOINT=
AOAI_API_VERSION=
AOAI_EMBEDDING_DEPLOYED_MODEL=
AZURE_OPENAI_KEY=
78 changes: 78 additions & 0 deletions code_samples/azure_cosmosdb_nosql/README.md
@@ -0,0 +1,78 @@
# Azure Cosmos DB Samples

This folder contains notebooks that demonstrate the vector search capabilities of [Azure Cosmos DB for NoSQL](https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/) for text and documents.

## Run the Code Locally

Follow the steps to run the code locally.

1. The samples use Conda to manage virtual environments. Create a conda environment from the [azure_cosmosdb_nosql_conda.yml](./azure_cosmosdb_nosql_conda.yml) file to install all necessary Python dependencies.

`conda env create -n cosmostest -f azure_cosmosdb_nosql_conda.yml`



**Alternatively**

a. You can install the packages listed in [requirements.txt](./requirements.txt) into your environment **instead of** using the yml file.

`pip install -r /path/to/requirements.txt`

b. Or run the `pip install` cells in the ingestion sample notebook, [azure_cosmos_ingestion.ipynb](./cosmos_ingestion.ipynb).



2. Create a *.env* file from the *.env-template* and populate it with all necessary keys.

3. Finally, follow the [Run the Code Locally](../README.md#run-the-code-locally) instructions to run the code locally using VS Code.
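
Before opening the notebooks, it can help to confirm that the *.env* file from step 2 is fully populated. The following is a minimal sketch, not part of the samples: it uses only the Python standard library, the key names are copied from *.env-template*, and the helper names `parse_env_text` and `missing_keys` are hypothetical. It assumes the *.env* file sits in the current working directory, and a given notebook may not need every key.

```python
from pathlib import Path

# Key names copied from .env-template in this folder.
REQUIRED_KEYS = [
    "cosmos_db_api_endpoint",
    "cosmos_db_api_key",
    "cosmos_db_connection_string",
    "cog_search_endpoint",
    "cog_search_key",
    "AOAI_ENDPOINT",
    "AOAI_API_VERSION",
    "AOAI_EMBEDDING_DEPLOYED_MODEL",
    "AZURE_OPENAI_KEY",
]


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values such as connection
        # strings may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")  # assumed location: current working directory
    if env_path.exists():
        missing = missing_keys(parse_env_text(env_path.read_text()))
        print("Missing keys:", missing or "none")
    else:
        print("No .env file found; copy .env-template to .env first.")
```

Run it from the folder that holds the *.env* file; any key it lists still needs a value.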


## Resources Deployment

- Azure Cosmos DB

*Create resource*

Augment the Azure Cosmos DB data with the semantic and vector search capabilities of Azure AI Search.

For IaC deployment, the **[infrastructure](./infrastructure/)** folder contains a Bicep script that deploys Azure Cosmos DB. **Fill out the parameter values** in the Bicep script according to your environment, then run the following command.
Note: This deploys an empty Azure Cosmos DB for NoSQL resource; the containers are created in the ingestion step.

`az deployment group create --resource-group resource_group_name --template-file azure_cosmosdb_nosql.bicep`



- Azure OpenAI

An Azure OpenAI Service resource can be deployed using the [Azure Portal](https://learn.microsoft.com/azure/ai-services/openai/how-to/create-resource?pivots=web-portal), [Azure CLI](https://learn.microsoft.com/azure/ai-services/openai/how-to/create-resource?pivots=cli) or [Azure PowerShell](https://learn.microsoft.com/azure/ai-services/openai/how-to/create-resource?pivots=ps). [Private endpoints](https://learn.microsoft.com/azure/ai-services/cognitive-services-virtual-networks?context=%2Fazure%2Fai-services%2Fopenai%2Fcontext%2Fcontext&tabs=portal#use-private-endpoints) can be used for Azure AI services resources to allow clients on a virtual network to securely access data over Azure Private Link.

Please note: to use semantic search, you need to enable it on the search service. See [Semantic](https://learn.microsoft.com/en-us/azure/search/semantic-how-to-enable-disable?tabs=enable-portal).

**In summary:** You need an Azure OpenAI Service resource with an embedding model deployed (for example, text-embedding-ada-002), and an Azure AI Search (formerly Cognitive Search) service with semantic search enabled.
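
To illustrate how the Azure OpenAI values in the *.env* file fit together, here is a hedged sketch that builds (but does not send) an embeddings request against the Azure OpenAI REST endpoint. It is not taken from the notebooks, which may use the `openai` package instead. The function name `build_embedding_request`, the placeholder endpoint, and the example API version `2023-05-15` are assumptions for illustration; substitute the values for your own resource.

```python
import json
import os
import urllib.request


def build_embedding_request(endpoint, deployment, api_version, api_key, text):
    """Build a POST request for the Azure OpenAI embeddings REST endpoint."""
    url = (
        f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
        f"/embeddings?api-version={api_version}"
    )
    body = json.dumps({"input": text}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Defaults below are placeholders (assumptions), used only when the
    # corresponding environment variable is not set.
    request = build_embedding_request(
        endpoint=os.environ.get("AOAI_ENDPOINT", "https://example.openai.azure.com"),
        deployment=os.environ.get("AOAI_EMBEDDING_DEPLOYED_MODEL", "text-embedding-ada-002"),
        api_version=os.environ.get("AOAI_API_VERSION", "2023-05-15"),
        api_key=os.environ.get("AZURE_OPENAI_KEY", "placeholder-key"),
        text="sample text to embed",
    )
    # Only the URL is printed here; sending the request requires a real
    # resource and key, e.g. urllib.request.urlopen(request).
    print(request.full_url)
```

The deployment name in the URL is the name you chose when deploying the model, which is not necessarily the same as the model name.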

## Datasets

- [text](../data/text/) - for text search sample

- [docs](../data/docs/) - for document search sample



## Sample Notebooks

- [azure_cosmos_ingestion.ipynb](./cosmos_ingestion.ipynb)
- [azure_cosmos_vector_query.ipynb](./cosmosdb_vector_query.ipynb)

## Reference

- [Vector database - Azure Cosmos DB | Microsoft Learn](https://learn.microsoft.com/en-us/azure/cosmos-db/vector-database#implement-vector-database-functionalities-using-our-nosql-api-and-ai-search)
- [Azure AI Search Documentation](https://learn.microsoft.com/azure/search/)
- [Retrieval Augmented Generation (RAG) in Azure AI Search](https://learn.microsoft.com/azure/search/retrieval-augmented-generation-overview)
- [Vector search overview](https://learn.microsoft.com/azure/search/vector-search-overview)
- [Hybrid search overview](https://learn.microsoft.com/azure/search/hybrid-search-overview)
- [Create a vector index](https://learn.microsoft.com/azure/search/vector-search-how-to-create-index)
- [Query a vector index](https://learn.microsoft.com/azure/search/vector-search-how-to-query)
- [Vector search algorithms](https://learn.microsoft.com/azure/search/vector-search-ranking)
- [Create a Service](https://learn.microsoft.com/en-us/azure/search/search-create-service-portal)
- [Vector Store](https://learn.microsoft.com/en-us/azure/search/vector-search-how-to-create-index?tabs=config-2023-11-01%2Crest-2023-11-01%2Cpush%2Cportal-check-index)
- [Deploy Models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models#embeddings-models)
187 changes: 187 additions & 0 deletions code_samples/azure_cosmosdb_nosql/azure_cosmosdb_nosql_conda.yml
@@ -0,0 +1,187 @@
name: Cosmos_Nosql
channels:
- conda-forge
- defaults
dependencies:
- aiohttp=3.8.5=py310h8d17308_0
- aiosignal=1.3.1=pyhd8ed1ab_0
- anyio=3.7.1=pyhd8ed1ab_0
- appdirs=1.4.4=pyh9f0ad1d_0
- asttokens=2.4.0=pyhd8ed1ab_0
- async-timeout=4.0.3=pyhd8ed1ab_0
- attrs=23.1.0=pyh71513ae_1
- backcall=0.2.0=pyhd3eb1b0_0
- backports=1.1=pyhd3eb1b0_0
- backports.functools_lru_cache=1.6.5=pyhd8ed1ab_0
- brotli=1.1.0=hcfcfb64_0
- brotli-bin=1.1.0=hcfcfb64_0
- brotli-python=1.1.0=py310h00ffb61_0
- bzip2=1.0.8=he774522_0
- ca-certificates=2023.12.12=haa95532_0
- cachetools=5.3.1=pyhd8ed1ab_0
- certifi=2023.11.17=py310haa95532_0
- cffi=1.15.1=py310h8d17308_5
- charset-normalizer=3.2.0=pyhd8ed1ab_0
- click=8.1.7=win_pyh7428d3b_0
- colorama=0.4.6=py310haa95532_0
- comm=0.1.4=pyhd8ed1ab_0
- contourpy=1.1.1=py310h232114e_1
- cryptography=41.0.4=py310h6e82f81_0
- cycler=0.11.0=pyhd8ed1ab_0
- dataclasses-json=0.5.7=pyhd8ed1ab_0
- debugpy=1.8.0=py310h00ffb61_1
- decorator=5.1.1=pyhd3eb1b0_0
- docker-pycreds=0.4.0=py_0
- et_xmlfile=1.1.0=pyhd8ed1ab_0
- exceptiongroup=1.2.0=py310haa95532_0
- executing=1.2.0=pyhd8ed1ab_0
- fonttools=4.42.1=py310h8d17308_0
- freetype=2.12.1=hdaf720e_2
- frozenlist=1.4.0=py310h8d17308_1
- gitdb=4.0.10=pyhd8ed1ab_0
- gitpython=3.1.37=pyhd8ed1ab_0
- greenlet=2.0.2=py310h00ffb61_1
- idna=3.4=pyhd8ed1ab_0
- importlib-metadata=7.0.1=py310haa95532_0
- importlib_metadata=7.0.1=hd3eb1b0_0
- intel-openmp=2023.2.0=h57928b3_49503
- ipykernel=6.28.0=py310haa95532_0
- ipython=8.20.0=py310haa95532_0
- jedi=0.19.0=pyhd8ed1ab_0
- joblib=1.3.2=pyhd8ed1ab_0
- jsonpatch=1.33=pyhd8ed1ab_0
- jsonpointer=2.4=py310h5588dad_3
- jupyter_client=8.6.0=py310haa95532_0
- jupyter_core=5.5.0=py310haa95532_0
- kiwisolver=1.4.5=py310h232114e_1
- langchain=0.0.304=pyhd8ed1ab_0
- langsmith=0.0.41=pyhd8ed1ab_0
- lcms2=2.15=he9d350c_2
- lerc=4.0.0=h63175ca_0
- libabseil=20230802.1=cxx17_h63175ca_0
- libblas=3.9.0=18_win64_mkl
- libbrotlicommon=1.1.0=hcfcfb64_0
- libbrotlidec=1.1.0=hcfcfb64_0
- libbrotlienc=1.1.0=hcfcfb64_0
- libcblas=3.9.0=18_win64_mkl
- libdeflate=1.19=hcfcfb64_0
- libffi=3.4.4=hd77b12b_0
- libhwloc=2.9.3=default_haede6df_1009
- libiconv=1.17=h8ffe710_0
- libjpeg-turbo=2.1.5.1=hcfcfb64_1
- liblapack=3.9.0=18_win64_mkl
- libpng=1.6.39=h19919ed_0
- libprotobuf=4.24.3=hb8276f3_0
- libsodium=1.0.18=h62dcd97_0
- libsqlite=3.44.2=hcfcfb64_0
- libtiff=4.6.0=h4554b19_1
- libwebp-base=1.3.2=hcfcfb64_0
- libxcb=1.15=hcd874cb_0
- libxml2=2.11.5=hc3477c8_1
- libzlib=1.2.13=hcfcfb64_5
- m2w64-gcc-libgfortran=5.3.0=6
- m2w64-gcc-libs=5.3.0=7
- m2w64-gcc-libs-core=5.3.0=7
- m2w64-gmp=6.1.0=2
- m2w64-libwinpthread-git=5.0.0.4634.697f757=2
- matplotlib-base=3.8.0=py310hc9baf74_1
- matplotlib-inline=0.1.6=py310haa95532_0
- mkl=2022.1.0=h6a75c08_874
- msys2-conda-epoch=20160418=1
- multidict=6.0.4=py310h8d17308_0
- munkres=1.1.4=pyh9f0ad1d_0
- mypy_extensions=1.0.0=pyha770c72_0
- nest-asyncio=1.5.6=py310haa95532_0
- numexpr=2.8.7=mkl_py310hd551296_0
- numpy=1.26.0=py310hf667824_0
- openai=0.28.1=pyhd8ed1ab_0
- openapi-schema-pydantic=1.2.4=pyhd8ed1ab_0
- openjpeg=2.5.0=h3d672ee_3
- openpyxl=3.1.2=py310h8d17308_1
- openssl=3.1.3=hcfcfb64_0
- packaging=23.1=py310haa95532_0
- pandas=2.1.1=py310hecd3228_0
- pandas-stubs=2.0.3.230814=pyhd8ed1ab_0
- parso=0.8.3=pyhd3eb1b0_0
- pathtools=0.1.2=py_1
- pickleshare=0.7.5=pyhd3eb1b0_1003
- pillow=10.0.1=py310h6abe1ea_1
- pip=23.3.1=py310haa95532_0
- platformdirs=3.10.0=py310haa95532_0
- plotly=5.17.0=pyhd8ed1ab_0
- prompt-toolkit=3.0.43=py310haa95532_0
- prompt_toolkit=3.0.43=hd3eb1b0_0
- protobuf=4.24.3=py310h19be30a_0
- psutil=5.9.5=py310h8d17308_1
- pthread-stubs=0.4=hcd874cb_1001
- pthreads-win32=2.9.1=hfa6e2cd_3
- pure_eval=0.2.2=pyhd3eb1b0_0
- pyasn1=0.5.0=pyhd8ed1ab_0
- pyasn1-modules=0.3.0=pyhd8ed1ab_0
- pycparser=2.21=pyhd8ed1ab_0
- pydantic=1.10.13=py310h8d17308_0
- pygments=2.16.1=pyhd8ed1ab_0
- pyopenssl=23.2.0=pyhd8ed1ab_1
- pyparsing=3.1.1=pyhd8ed1ab_0
- pypdf2=2.11.1=pyhd8ed1ab_0
- pysocks=1.7.1=pyh0701188_6
- python=3.10.13=he1021f5_0
- python-dateutil=2.8.2=pyhd3eb1b0_0
- python-dotenv=1.0.0=pyhd8ed1ab_1
- python-tzdata=2023.3=pyhd8ed1ab_0
- python_abi=3.10=2_cp310
- pytz=2023.3.post1=pyhd8ed1ab_0
- pyu2f=0.1.5=pyhd8ed1ab_0
- pywin32=306=py310h00ffb61_1
- pyyaml=6.0.1=py310h8d17308_1
- pyzmq=25.1.2=py310hd77b12b_0
- requests=2.31.0=pyhd8ed1ab_0
- rsa=4.9=pyhd8ed1ab_0
- scikit-learn=1.3.1=py310hfd2573f_0
- scipy=1.11.2=py310h70e3499_1
- sentry-sdk=1.31.0=pyhd8ed1ab_0
- setproctitle=1.3.2=py310h8d17308_2
- setuptools=68.2.2=py310haa95532_0
- six=1.16.0=pyhd3eb1b0_1
- smmap=3.0.5=pyh44b312d_0
- sniffio=1.3.0=pyhd8ed1ab_0
- sqlalchemy=2.0.21=py310h8d17308_0
- sqlite=3.41.2=h2bbff1b_0
- stack_data=0.6.2=pyhd8ed1ab_0
- stringcase=1.2.0=py_0
- tbb=2021.10.0=h91493d7_1
- tenacity=8.2.3=pyhd8ed1ab_0
- threadpoolctl=3.2.0=pyha21a80b_0
- tk=8.6.13=hcfcfb64_0
- tornado=6.3.3=py310h2bbff1b_0
- tqdm=4.66.1=pyhd8ed1ab_0
- traitlets=5.10.1=pyhd8ed1ab_0
- types-pytz=2023.3.1.1=pyhd8ed1ab_0
- typing-extensions=4.9.0=py310haa95532_1
- typing_extensions=4.9.0=py310haa95532_1
- typing_inspect=0.9.0=pyhd8ed1ab_0
- tzdata=2023d=h04d1e81_0
- ucrt=10.0.22621.0=h57928b3_0
- unicodedata2=15.1.0=py310h8d17308_0
- urllib3=2.0.5=pyhd8ed1ab_0
- vc=14.3=h64f974e_17
- vc14_runtime=14.38.33130=h82b7239_18
- vs2015_runtime=14.38.33130=hcb4865c_18
- wandb=0.15.11=pyhd8ed1ab_0
- wcwidth=0.2.6=pyhd8ed1ab_0
- wheel=0.41.2=py310haa95532_0
- win_inet_pton=1.1.0=pyhd8ed1ab_6
- xorg-libxau=1.0.11=hcd874cb_0
- xorg-libxdmcp=1.1.3=hcd874cb_0
- xz=5.4.5=h8cc25b3_0
- yaml=0.2.5=h8ffe710_2
- yarl=1.9.2=py310h8d17308_0
- zeromq=4.3.5=hd77b12b_0
- zipp=3.17.0=py310haa95532_0
- zlib=1.2.13=hcfcfb64_5
- zstd=1.5.5=h12be248_0
- pip:
  - azure-common==1.1.28
  - azure-core==1.29.4
  - azure-search-documents==11.4.0b9
  - isodate==0.6.1