Using the Python API
This page demonstrates basic usage of the AnyVar Python interface. See the Python API reference for a more exhaustive description of public functions and classes.
Note
For brevity, some code samples assume variables or setup defined earlier on the page.
Instantiating an AnyVar Instance
The AnyVar class requires implementations of the Storage and Translator abstractions. The create_storage() and create_translator() factory functions provide a convenient way to instantiate them:
>>> from anyvar import AnyVar, create_storage, create_translator
>>> av = AnyVar(create_translator(), create_storage())
Basic Variant Object Operations
Use the anyvar.anyvar.AnyVar.put_objects() method to add supported objects, such as VRS alleles.
>>> from ga4gh.vrs import models
>>> allele = models.Allele(**{
...     "id": "ga4gh:VA.K7akyz9PHB0wg8wBNVlWAAdvMbJUJJfU",
...     "digest": "K7akyz9PHB0wg8wBNVlWAAdvMbJUJJfU",
...     "location": {
...         "id": "ga4gh:SL.aCMcqLGKClwMWEDx3QWe4XSiGDlKXdB8",
...         "digest": "aCMcqLGKClwMWEDx3QWe4XSiGDlKXdB8",
...         "end": 87894077,
...         "start": 87894076,
...         "sequenceReference": {
...             "refgetAccession": "SQ.ss8r_wB0-b9r44TQTMmVTI92884QvBiB",
...             "type": "SequenceReference"
...         },
...         "type": "SequenceLocation"
...     },
...     "state": {
...         "sequence": "T",
...         "type": "LiteralSequenceExpression"
...     },
...     "type": "Allele"
... })
>>> av.put_objects([allele])
Retrieve variation objects by ID:
>>> av.get_object("ga4gh:VA.K7akyz9PHB0wg8wBNVlWAAdvMbJUJJfU")
Allele(id='ga4gh:VA.K7akyz9PHB0wg8wBNVlWAAdvMbJUJJfU', type='Allele', name=None, description=None, aliases=None, extensions=None, digest='K7akyz9PHB0wg8wBNVlWAAdvMbJUJJfU', expressions=None, location=SequenceLocation(id='ga4gh:SL.01EH5o6V6VEyNUq68gpeTwKE7xOo-WAy', type='SequenceLocation', name=None, description=None, aliases=None, extensions=None, digest='01EH5o6V6VEyNUq68gpeTwKE7xOo-WAy', sequenceReference=SequenceReference(id=None, type='SequenceReference', name=None, description=None, aliases=None, extensions=None, refgetAccession='SQ.ss8r_wB0-b9r44TQTMmVTI92884QvBiB', residueAlphabet=None, circular=None, sequence=None, moleculeType=None), start=87894076, end=87894077, sequence=None), state=LiteralSequenceExpression(id=None, type='LiteralSequenceExpression', name=None, description=None, aliases=None, extensions=None, sequence=sequenceString(root='T')))
When an object is registered, any objects contained within it are also registered. They may similarly be retrieved:
>>> av.get_object("ga4gh:SL.aCMcqLGKClwMWEDx3QWe4XSiGDlKXdB8")
SequenceLocation(id='ga4gh:SL.aCMcqLGKClwMWEDx3QWe4XSiGDlKXdB8', type='SequenceLocation', name=None, description=None, aliases=None, extensions=None, digest='aCMcqLGKClwMWEDx3QWe4XSiGDlKXdB8', sequenceReference=SequenceReference(id=None, type='SequenceReference', name=None, description=None, aliases=None, extensions=None, refgetAccession='SQ.ss8r_wB0-b9r44TQTMmVTI92884QvBiB', residueAlphabet=None, circular=None, sequence=None, moleculeType=None), start=87894076, end=87894077, sequence=None)
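To make "objects contained within it" concrete, the following sketch walks a VRS-style dictionary and lists every nested object that carries its own ID. It is only an illustration using the standard library: iter_identified_objects is a hypothetical helper, not part of AnyVar, and AnyVar's own registration logic may differ.

```python
from collections.abc import Iterator
from typing import Any


def iter_identified_objects(obj: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield each nested dict that has an "id" key, innermost objects first."""
    for value in obj.values():
        if isinstance(value, dict):
            yield from iter_identified_objects(value)
    if "id" in obj:
        yield obj


# Trimmed copy of the allele registered above (digest fields omitted).
allele_dict = {
    "id": "ga4gh:VA.K7akyz9PHB0wg8wBNVlWAAdvMbJUJJfU",
    "type": "Allele",
    "location": {
        "id": "ga4gh:SL.aCMcqLGKClwMWEDx3QWe4XSiGDlKXdB8",
        "type": "SequenceLocation",
        "start": 87894076,
        "end": 87894077,
        "sequenceReference": {
            "type": "SequenceReference",
            "refgetAccession": "SQ.ss8r_wB0-b9r44TQTMmVTI92884QvBiB",
        },
    },
    "state": {"type": "LiteralSequenceExpression", "sequence": "T"},
}

# The sequence reference and state have no "id", so only two objects are listed.
contained_ids = [o["id"] for o in iter_identified_objects(allele_dict)]
print(contained_ids)
```

Each ID collected this way is a candidate argument for get_object(), as in the location lookup above.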
Variation Translation
AnyVar’s variation translation feature constructs supported objects from expressions in other variant nomenclatures. For example, to register a variant described in HGVS nomenclature:
>>> av = AnyVar(create_translator(), create_storage())
>>> braf_variant = av.translator.translate_allele("NM_004333.6:c.1799T>A")
>>> braf_variant.id
'ga4gh:VA.XbRlw94yRqkcqY59FKba99Lsd1oc5AE_'
>>> av.put_objects([braf_variant])
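For readers less familiar with HGVS strings, this sketch splits a simple substitution expression like the one above into its parts. It is a standard-library illustration only: parse_simple_substitution and HgvsSubstitution are hypothetical names, the pattern handles nothing beyond single-base substitutions, and it does not reflect how AnyVar's translator works internally.

```python
import re
from typing import NamedTuple


class HgvsSubstitution(NamedTuple):
    accession: str        # versioned sequence accession, e.g. "NM_004333.6"
    coordinate_type: str  # "c" (coding), "g" (genomic), "m", or "n"
    position: int         # 1-based position as written in the expression
    ref: str
    alt: str


# Deliberately narrow: one accession, one position, one base swapped for another.
_SUBSTITUTION = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+)"
    r":(?P<kind>[cgmn])\."
    r"(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_simple_substitution(expression: str) -> HgvsSubstitution:
    """Split a single-base HGVS substitution into its components."""
    match = _SUBSTITUTION.match(expression)
    if match is None:
        raise ValueError(f"not a simple HGVS substitution: {expression!r}")
    return HgvsSubstitution(
        accession=match["accession"],
        coordinate_type=match["kind"],
        position=int(match["pos"]),
        ref=match["ref"],
        alt=match["alt"],
    )


parsed = parse_simple_substitution("NM_004333.6:c.1799T>A")
print(parsed)
```

Real HGVS covers far more than this pattern (deletions, insertions, intronic offsets, protein changes), which is why translation is delegated to the translator rather than done by hand.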
Variation Liftover
AnyVar employs the agct library to lift genomic variation locations between equivalent positions on GRCh37 and GRCh38. The get_liftover_variant() function takes a variation, determines its reference assembly, and returns the corresponding allele on the opposite assembly.
>>> from anyvar.mapping.liftover import get_liftover_variant
>>> (allele.location.start, allele.location.end)
(87894076, 87894077)
>>> lifted_variant = get_liftover_variant(allele)
>>> (lifted_variant.location.start, lifted_variant.location.end)
(89653833, 89653834)
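A quick sanity check after liftover is to confirm whether the interval kept its length and to see how far it moved. The sketch below relies only on the location.start and location.end attributes shown above; describe_liftover is a hypothetical helper, and the SimpleNamespace objects are stand-ins so that the snippet runs without AnyVar installed.

```python
from types import SimpleNamespace
from typing import Any


def describe_liftover(original: Any, lifted: Any) -> dict[str, int]:
    """Summarize how a liftover changed an interval's length and start."""
    return {
        "original_length": original.location.end - original.location.start,
        "lifted_length": lifted.location.end - lifted.location.start,
        "start_shift": lifted.location.start - original.location.start,
    }


# Stand-ins carrying the coordinates from the example above.
original = SimpleNamespace(location=SimpleNamespace(start=87894076, end=87894077))
lifted = SimpleNamespace(location=SimpleNamespace(start=89653833, end=89653834))

summary = describe_liftover(original, lifted)
print(summary)
```

With real objects, allele and lifted_variant can be passed directly. A length mismatch is not necessarily an error, since assemblies can differ by insertions or deletions, but it is worth a closer look.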
Object Mappings
AnyVar can add basic mappings between objects with AnyVar.put_mapping().
>>> from anyvar.core.metadata import VariationMapping, VariationMappingType
>>> genomic_allele = av.translator.translate_allele("NC_000007.14:g.140753336A>T")
>>> tx_allele = av.translator.translate_allele("NM_004333.6:c.1799T>A")
>>> av.put_objects([genomic_allele, tx_allele])
>>> mapping = VariationMapping(
...     source_id=genomic_allele.id,
...     dest_id=tx_allele.id,
...     mapping_type=VariationMappingType.TRANSCRIPTION
... )
>>> av.put_mapping(mapping)
Mappings can be retrieved with AnyVar.get_object_mappings(), given the source object ID and a mapping type.
>>> av.get_object_mappings(genomic_allele.id, VariationMappingType.TRANSCRIPTION)
[VariationMapping(source_id='ga4gh:VA.Otc5ovrw906Ack087o1fhegB4jDRqCAe', dest_id='ga4gh:VA.W6xsV-aFm9yT2Bic5cFAV2j0rll6KK5R', mapping_type=<VariationMappingType.TRANSCRIPTION: 'transcription'>)]
See the object mappings documentation for more information about object mappings in AnyVar.
The liftover module provides the add_liftover_mapping() function as a convenient way to find the lifted-over equivalent of a variation, register it, and add mappings of type liftover between them.
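The three steps just described (lift over, register, map) can be sketched generically. This is not the implementation of add_liftover_mapping(): register_with_liftover and its callable parameters are hypothetical, and recording the mapping in both directions is an assumption made for illustration.

```python
from collections.abc import Callable
from types import SimpleNamespace
from typing import Any


def register_with_liftover(
    variant: Any,
    lift: Callable[[Any], Any],
    put_objects: Callable[[list[Any]], None],
    put_mapping: Callable[[str, str, str], None],
) -> Any:
    """Lift a variant over, register both objects, and link them."""
    lifted = lift(variant)
    put_objects([variant, lifted])
    # Assumption: the link is recorded in both directions.
    put_mapping(variant.id, lifted.id, "liftover")
    put_mapping(lifted.id, variant.id, "liftover")
    return lifted


# In-memory stand-ins so the sketch runs without AnyVar or a database.
stored: list[Any] = []
mappings: list[tuple[str, str, str]] = []

result = register_with_liftover(
    SimpleNamespace(id="example:grch38-variant"),
    lift=lambda v: SimpleNamespace(id="example:grch37-variant"),
    put_objects=stored.extend,
    put_mapping=lambda src, dst, kind: mappings.append((src, dst, kind)),
)
print(mappings)
```

Passing the operations in as callables keeps the sketch independent of any particular storage backend; in practice, prefer the library's own function.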
Object Extensions
AnyVar can attach basic extensions to objects with AnyVar.put_extension().
>>> from anyvar.core.metadata import Extension
>>> av.put_extension(Extension(
...     object_id=genomic_allele.id,
...     name="clinvar_accession",
...     value="VCV000012345.6"
... ))
>>> av.put_extension(Extension(
...     object_id=genomic_allele.id,
...     name="sample_id",
...     value="SAMPLE-001"
... ))
Extensions can be retrieved with AnyVar.get_object_extensions() by passing the object ID and, optionally, an extension name to filter on.
>>> av.get_object_extensions(genomic_allele.id)
[Extension(object_id='ga4gh:VA.Otc5ovrw906Ack087o1fhegB4jDRqCAe', name='clinvar_accession', value='VCV000012345.6'),
Extension(object_id='ga4gh:VA.Otc5ovrw906Ack087o1fhegB4jDRqCAe', name='sample_id', value='SAMPLE-001')]
>>> av.get_object_extensions(genomic_allele.id, "sample_id")
[Extension(object_id='ga4gh:VA.Otc5ovrw906Ack087o1fhegB4jDRqCAe', name='sample_id', value='SAMPLE-001')]
See the object extensions documentation for more information about object extensions in AnyVar.
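Because one object can carry several extensions, it is often handy to group the retrieved values by name. The sketch below assumes only the name and value attributes visible in the output above; group_extensions is a hypothetical helper, and the SimpleNamespace records stand in for real Extension objects.

```python
from collections import defaultdict
from collections.abc import Iterable
from types import SimpleNamespace
from typing import Any


def group_extensions(extensions: Iterable[Any]) -> dict[str, list[Any]]:
    """Group extension values by extension name, preserving input order."""
    grouped: defaultdict[str, list[Any]] = defaultdict(list)
    for ext in extensions:
        grouped[ext.name].append(ext.value)
    return dict(grouped)


# Stand-ins mirroring the two extensions added above.
records = [
    SimpleNamespace(name="clinvar_accession", value="VCV000012345.6"),
    SimpleNamespace(name="sample_id", value="SAMPLE-001"),
]

by_name = group_extensions(records)
print(by_name)
```

Each name maps to a list rather than a single value, so the helper still works if the same name is attached to an object more than once.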
VCF Ingest and Annotation
AnyVar can consume a Variant Call Format (VCF) file, register all contained variants, and return a copy annotated with variant IDs for later lookup.
>>> from anyvar.vcf.ingest import VcfRegistrar
>>> from pathlib import Path
>>> registrar = VcfRegistrar(data_proxy=av.translator.dp, av=av)
>>> registrar.annotate(Path("my_vcf.vcf"), Path("out.vcf"))
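Once a VCF has been annotated, the IDs can be read back from each record's INFO column. The sketch below uses only the standard library. The INFO key VRS_Allele_IDs and the sample record are assumptions made for illustration, and info_values is a hypothetical helper; check the header of your own output file for the actual field name.

```python
def info_values(vcf_line: str, key: str = "VRS_Allele_IDs") -> list[str]:
    """Return the comma-separated values stored under `key` in a record's INFO column."""
    columns = vcf_line.rstrip("\n").split("\t")
    if len(columns) < 8:
        raise ValueError("expected at least 8 tab-separated VCF columns")
    # INFO is the eighth column: semicolon-separated KEY=VALUE entries.
    for entry in columns[7].split(";"):
        name, _, value = entry.partition("=")
        if name == key:
            return value.split(",") if value else []
    return []


# Invented record for illustration; real output may carry more IDs per site.
record = (
    "chr10\t87894077\t.\tC\tT\t.\tPASS\t"
    "VRS_Allele_IDs=ga4gh:VA.K7akyz9PHB0wg8wBNVlWAAdvMbJUJJfU"
)

allele_ids = info_values(record)
print(allele_ids)
```

Any ID recovered this way can then be passed to get_object() to look up the registered variation.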