Review APARENT predictions #346

@Hoeze

Currently, the APARENT dataloader gets the PolyA sites from the transcript GTF annotation:

def get_roi_from_transcript(transcript_start: int, transcript_end: int, is_on_negative_strand: bool) -> (int, int):
    """
    Get region-of-interest for APARENT in relation to the 3'UTR of a transcript
    :param transcript_start: 0-based start position of the transcript
    :param transcript_end: 1-based end position of the transcript
    :param is_on_negative_strand: is the gene on the negative strand?
    :return: Tuple of (start, end) position for the region of interest
    """
    # CSE should be roughly around position 70 of the 205bp sequence.
    # Since CSE is likely 30bp upstream of the cut site, we shift the cut site
    # by 100bp upstream and 105bp downstream
    if is_on_negative_strand:
        end = transcript_start + 100
        # convert 0-based to 1-based
        end += 1
        start = end - 205
    else:
        start = transcript_end - 100
        # convert 1-based to 0-based
        start -= 1
        end = start + 205
    return start, end
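
As a quick sanity check of the arithmetic, here is a minimal sketch that runs the same logic on made-up coordinates. The transcript coordinates (1000, 5000) and the helper `cut_site_offset` are hypothetical and only for illustration. It also assumes the returned interval is 0-based and end-exclusive, which is what the conversion comments above seem to imply but is not confirmed here.

```python
def get_roi_from_transcript(transcript_start, transcript_end, is_on_negative_strand):
    # Same arithmetic as the function quoted above, condensed.
    if is_on_negative_strand:
        end = transcript_start + 100 + 1
        start = end - 205
    else:
        start = transcript_end - 100 - 1
        end = start + 205
    return start, end


def cut_site_offset(transcript_start, transcript_end, is_on_negative_strand):
    """Hypothetical helper: 0-based offset of the transcript's 3'-terminal base
    inside the 205 bp window, counted in the transcript's own 5'->3' direction.
    Assumes (start, end) is a 0-based, end-exclusive interval."""
    start, end = get_roi_from_transcript(
        transcript_start, transcript_end, is_on_negative_strand
    )
    if is_on_negative_strand:
        # On the negative strand the 3'-terminal base sits at 0-based index
        # transcript_start, and the window is read from its right edge.
        return (end - 1) - transcript_start
    # On the positive strand the 3'-terminal base sits at 0-based index
    # transcript_end - 1.
    return (transcript_end - 1) - start


# Made-up transcript: 0-based start 1000, 1-based end 5000.
print(get_roi_from_transcript(1000, 5000, False))  # (4899, 5104)
print(get_roi_from_transcript(1000, 5000, True))   # (896, 1101)
print(cut_site_offset(1000, 5000, False))          # 100
print(cut_site_offset(1000, 5000, True))           # 100
```

Under these assumptions the window is 205 bp wide on both strands and the annotated transcript end lands at offset 100, so a CSE about 30 bp upstream would fall near position 70, as the comment in the function intends.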

@johli Is this a viable implementation of the strategy you explained here?
Do you have some example data (a VCF file plus scores) against which we could compare the Kipoi predictions?

xref: #342
xref: johli/aparent#8
