anyvar.storage.duckdb

Provide DuckDB-based storage implementation.

Most live, persistent variant registration and search services will want to employ the PostgreSQL storage option for its performance, particularly for concurrent writes and large-scale datasets. However, DuckDB may be better in certain use cases:

  • The DuckDB file-based option can be used to assemble a cohort or a dataset into a static file, like an index, that can be easily passed along to other users for later lookup. This option can also function as a simple registration service in cases where a separately provisioned PostgreSQL server is logistically prohibitive, although this is not ideal.

  • The DuckDB in-memory option can be used for simple testing and demonstration purposes, and also works as a less-performant “stateless” translation service. Note that the database is wiped and recreated every time a FastAPI service restarts.

>>> from anyvar.storage.duckdb import DuckDbObjectStore
>>> file_based = DuckDbObjectStore("duckdb:///path/to/my/variants.duckdb")
>>> in_memory = DuckDbObjectStore("duckdb:///:memory:")

Under the hood, this should behave like a simpler but less-performant equivalent of the PostgreSQL backend. Our implementation is designed to employ common SQLAlchemy resources, so there should be minimal DuckDB-specific maintenance required here.

class anyvar.storage.duckdb.DuckDbObjectStore(db_uri, *args, **kwargs)[source]

DuckDB-backed AnyVar object store.

__init__(db_uri, *args, **kwargs)[source]

Initialize DuckDB storage.

Parameters:

db_uri (str) – DuckDB connection URI. See above for options.

close()[source]

Close the storage backend.

Return type:

None
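
Because the store exposes a close() method, the standard library's contextlib.closing can be used to guarantee cleanup. The following is a minimal sketch of that pattern, not AnyVar code: StandInStore and use_store are hypothetical names, used so the example runs without AnyVar or DuckDB installed. The assumption is that a real DuckDbObjectStore could be substituted for the stand-in in the same way.

```python
from contextlib import closing


class StandInStore:
    """Hypothetical stand-in that exposes only close(); not part of AnyVar."""

    def __init__(self, db_uri: str) -> None:
        self.db_uri = db_uri
        self.closed = False

    def close(self) -> None:
        self.closed = True


def use_store(db_uri: str) -> StandInStore:
    # closing() calls store.close() on exit, even if the body raises.
    with closing(StandInStore(db_uri)) as store:
        assert not store.closed  # still open inside the block
    return store
```

For the in-memory option this matters less, since the data is discarded anyway, but for a file-based database an explicit close is a reasonable habit.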

delete_extensions(object_id, name=None, value=None)[source]

Delete extension(s) for an object.

Supports gradual specificity – either delete all extensions, or delete all extensions under a given key/name, or delete all extensions with a given name AND value.

If no extension matching given args exists, do nothing.

Note that this gets a little slow in DuckDB, because we have to manually query the number of matching rows first.

Parameters:
  • object_id (str) – The object ID

  • name (str | None) – Optional extension key/name to delete

  • value (Optional[TypeAliasType]) – Optional extension value to delete. Ignored if name is not provided

Return type:

int

Returns:

Number of deleted rows
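
The matching rules described for delete_extensions can be modeled in a few lines of plain Python. This is only an illustrative sketch over an in-memory list of dictionaries, not the actual SQL-backed implementation: the row layout, the matches helper, and the use of None to mean "not provided" are all assumptions made for the example.

```python
from typing import Any, Dict, List, Optional

# Hypothetical in-memory rows standing in for the extensions table.
Row = Dict[str, Any]


def matches(row: Row, object_id: str, name: Optional[str] = None, value: Any = None) -> bool:
    """Return True if a row would be selected for deletion."""
    if row["object_id"] != object_id:
        return False
    if name is None:
        return True  # value is ignored when name is not provided
    if row["name"] != name:
        return False
    return value is None or row["value"] == value


def delete_extensions(rows: List[Row], object_id: str, name: Optional[str] = None, value: Any = None) -> int:
    """Remove matching rows in place and return how many were removed."""
    # Count the matches first, then delete, echoing the count-then-delete note.
    doomed = [r for r in rows if matches(r, object_id, name, value)]
    rows[:] = [r for r in rows if not matches(r, object_id, name, value)]
    return len(doomed)
```

In this model, passing only object_id removes every extension for that object, adding name narrows the deletion to that key, and adding value narrows it further. A call that matches nothing returns 0 and leaves the rows unchanged.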

wait_for_writes()[source]

Wait for all background writes to complete. NOTE: This is a no-op for synchronous storage backends.

Return type:

None

wipe_db()[source]

Wipe all data from the storage backend.

Return type:

None