How to get GFF information for a specific SNP

Let's say we're interested in getting information about a SNP located on chromosome 2R at position 28,492,890.

In [45]:
# import modules
import allel
import pandas

I downloaded the An. gambiae PEST base-feature annotation (AgamP4.4, GFF3) from VectorBase.

In [46]:
gff_location = '/home/sean/Downloads/Anopheles-gambiae-PEST_BASEFEATURES_AgamP4.4.gff3.gz'
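If your version of scikit-allel no longer provides `FeatureTable.from_gff3`, the same file can be read with only the standard library and pandas. This is a sketch of an alternative, not what the notebook itself uses. `read_gff3` is a hypothetical helper, and it makes several assumptions: the nine tab-separated GFF3 columns, only the `ID` and `Parent` attributes kept, parsing stopped at a `##FASTA` section if the file has one, and `score` and `phase` left as raw strings (`.` where empty) instead of being filled with -1.

```python
import gzip
import urllib.parse

import pandas

# The first eight GFF3 columns; the ninth holds the attributes.
GFF3_COLUMNS = ['seqid', 'source', 'type', 'start', 'end',
                'score', 'strand', 'phase']


def read_gff3(path, attributes=('ID', 'Parent')):
    """Read a plain or gzipped GFF3 file into a DataFrame (hypothetical helper)."""
    opener = gzip.open if path.endswith('.gz') else open
    records = []
    with opener(path, 'rt') as handle:
        for line in handle:
            if line.startswith('##FASTA'):
                break  # embedded sequences follow; no more feature lines
            if line.startswith('#') or not line.strip():
                continue  # directives, comments and blank lines
            fields = line.rstrip('\r\n').split('\t')
            if len(fields) != 9:
                continue  # skip malformed lines rather than guessing
            # attributes look like "ID=AGAP002865-RA;Parent=AGAP002865"
            attrs = dict(pair.strip().split('=', 1)
                         for pair in fields[8].split(';') if '=' in pair)
            record = dict(zip(GFF3_COLUMNS, fields[:8]))
            for key in attributes:
                # values are percent-encoded in GFF3; '.' marks a missing one.
                # A Parent listing several comma-separated IDs stays one string.
                record[key] = urllib.parse.unquote(attrs.get(key, '.'))
            records.append(record)
    df = pandas.DataFrame(records, columns=GFF3_COLUMNS + list(attributes))
    df['start'] = df['start'].astype(int)
    df['end'] = df['end'].astype(int)
    return df
```

Calling `read_gff3(gff_location)` should give a frame with the same `seqid` to `Parent` columns as the one `get_gff_info` builds, apart from the `score`/`phase` fill values.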
In [47]:
def get_gff_info(position, chrom, gff_path):
    """Return all features in the GFF3 file at `gff_path` whose
    interval on chromosome `chrom` contains `position`."""
    
    # Thanks to Alistair for this conversion function
    # (http://alimanfoo.github.io/2017/01/25/vgsc-gene-models.html)
    # slightly altered to account for deprecated methods
    def geneset_to_pandas(geneset):
        """Life is a bit easier when a geneset is a pandas DataFrame."""
        items = []
        for n in geneset.dtype.names:
            v = geneset[n]
            # convert bytes columns to unicode (which pandas then converts to object)
            if v.dtype.kind == 'S':
                v = v.astype('U')
            items.append((n, v))
        return pandas.DataFrame.from_dict(dict(items))

    gff = allel.FeatureTable.from_gff3(gff_path,
                                       attributes=['ID', 'Parent'])

    gff = geneset_to_pandas(gff)
    
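    # GFF3 coordinates are 1-based and closed, so both start and end are
    # inclusive (range(start, end) would miss a SNP on the last base);
    # a vectorised mask is also much faster than a row-wise apply
    mask = ((gff['seqid'] == chrom) &
            (gff['start'] <= position) &
            (gff['end'] >= position))

    return gff[mask]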
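`get_gff_info` re-reads and converts the whole GFF3 on every call, and that parsing dominates the run time once you want to annotate more than a handful of SNPs. Below is a minimal sketch of the alternative. It assumes the annotation has already been loaded once into a DataFrame with `seqid`, `start` and `end` columns, for example via the `FeatureTable` call and conversion above. The helper names `features_at` and `annotate_snps` are hypothetical. The lookup is still a linear scan per SNP; the only saving is that the file is parsed once.

```python
import pandas


def features_at(gff, chrom, position):
    """Rows of `gff` on `chrom` whose [start, end] interval contains `position`.

    GFF3 coordinates are 1-based and closed, so both ends are inclusive.
    """
    mask = ((gff['seqid'] == chrom) &
            (gff['start'] <= position) &
            (gff['end'] >= position))
    return gff[mask]


def annotate_snps(gff, snps):
    """Look up many (chrom, position) pairs against one pre-loaded GFF DataFrame."""
    hits = []
    for chrom, position in snps:
        found = features_at(gff, chrom, position).copy()
        if found.empty:
            continue
        # record which SNP each feature row was matched to
        found.insert(0, 'snp_position', position)
        found.insert(0, 'snp_chrom', chrom)
        hits.append(found)
    if not hits:
        return pandas.DataFrame(columns=['snp_chrom', 'snp_position'] + list(gff.columns))
    return pandas.concat(hits)
```

For example, `annotate_snps(gff, [('2R', 28492890), ('2R', 28493141)])` returns one row per overlapping feature, tagged with the SNP it matched. For very large SNP sets, an interval index or a sorted search per chromosome would scale better. The plain mask is used here because it mirrors the single-SNP function.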
In [48]:
chr2R_28492890 = get_gff_info(
    position=28492890, chrom='2R', gff_path=gff_location)
In [49]:
chr2R_28492890
Out[49]:
       seqid  source      type        start     end       score  strand  phase  ID             Parent
42235  2R     VectorBase  chromosome  1         61545105  -1.0   .       -1     2R             .
68752  2R     VectorBase  gene        28420677  28511124  -1.0   +       -1     AGAP002859     .
68753  2R     VectorBase  mRNA        28420677  28511124  -1.0   +       -1     AGAP002859-RA  AGAP002859
68801  2R     VectorBase  gene        28491415  28493141  -1.0   -       -1     AGAP002865     .
68802  2R     VectorBase  mRNA        28491415  28493141  -1.0   -       -1     AGAP002865-RA  AGAP002865
68806  2R     VectorBase  exon        28492028  28493141  -1.0   -       -1     .              AGAP002865-RA
68807  2R     VectorBase  CDS         28492028  28493141  -1.0   -       0      AGAP002865-PA  AGAP002865-RA
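The table shows AGAP002865 lying entirely inside the span of AGAP002859, on the opposite strand. Only the AGAP002865-RA transcript has `exon` and `CDS` rows covering the SNP. AGAP002859-RA has no exon row here, which suggests the SNP sits in one of its introns. The sketch below makes that call per transcript from the result frame. It assumes, as in this VectorBase file, that exon and CDS rows name their transcript in `Parent` and that each row has a single parent. `classify_transcripts` is a hypothetical helper.

```python
import pandas


def classify_transcripts(hits):
    """Map each mRNA ID in `hits` to 'CDS', 'UTR' or 'intron' (hypothetical helper).

    `hits` should hold only the features overlapping one position, with
    'type', 'ID' and 'Parent' columns as in the output above.
    """
    calls = {}
    for transcript in hits.loc[hits['type'] == 'mRNA', 'ID']:
        # feature types of the rows whose single Parent is this transcript
        child_types = set(hits.loc[hits['Parent'] == transcript, 'type'])
        if 'CDS' in child_types:
            calls[transcript] = 'CDS'
        elif 'exon' in child_types:
            calls[transcript] = 'UTR'  # exonic but outside the coding sequence
        else:
            calls[transcript] = 'intron'  # inside the mRNA span, no exon overlaps
    return calls
```

Applied to `chr2R_28492890`, this should report `'intron'` for AGAP002859-RA and `'CDS'` for AGAP002865-RA.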

Hope that's of some use to someone. If there's an actual implemented function for this, please do let me know.

Happy coding,

Sean