Position and interval notation #46
h-2 started this conversation in Design Questions
Replies: 0 comments
There are typically two notations:

1. `[beg, end)`
2. `[beg, end]`

Some formats use 1., others use 2.
We would like to use 1. everywhere, but this means that when reading a SAM or VCF record, the reported `.pos()` will differ from the position in the plaintext file and the position shown by `samtools`/`bcftools`. That's why I initially proposed to use whatever the common formats use, even if this is inconsistent within the library.

However, with the introduction of `bio::genome_region` and subregion reading, this becomes more difficult. The algorithm to compute an overlap, for example, needs to know whether the interval is half-open or closed. It's possible to add a template parameter to `genome_region` that differentiates between half-open and closed, but this adds complexity 😒

I see three solutions:
`genome_region`. Document everywhere which notation is used.
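
To make the difference concrete, here is a minimal sketch of why the overlap check depends on the notation. The types `half_open_interval` and `closed_interval` and the helper functions are hypothetical stand-ins for illustration, not the `bio::genome_region` API. The sketch also assumes that the half-open form is 0-based while the closed form is 1-based, as the `POS` field is in SAM and VCF text.

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration only; not part of the bio:: API.
struct half_open_interval { std::int64_t beg; std::int64_t end; }; // assumed 0-based, [beg, end)
struct closed_interval    { std::int64_t beg; std::int64_t end; }; // assumed 1-based, [beg, end]

// 1-based closed -> 0-based half-open: the start shifts by one, the end value stays.
constexpr half_open_interval to_half_open(closed_interval c) { return {c.beg - 1, c.end}; }

// 0-based half-open -> 1-based closed: the inverse of the above.
constexpr closed_interval to_closed(half_open_interval h) { return {h.beg + 1, h.end}; }

// Half-open: `end` is excluded, so touching intervals do not overlap (strict <).
constexpr bool overlaps(half_open_interval a, half_open_interval b)
{
    return a.beg < b.end && b.beg < a.end;
}

// Closed: `end` is included, so a shared endpoint counts as overlap (<=).
constexpr bool overlaps(closed_interval a, closed_interval b)
{
    return a.beg <= b.end && b.beg <= a.end;
}

// The same pair of numbers gives different answers depending on the notation.
static_assert(!overlaps(half_open_interval{0, 10}, half_open_interval{10, 20}), "touching, no overlap");
static_assert(overlaps(closed_interval{1, 10}, closed_interval{10, 20}), "position 10 is shared");
```

Under these assumptions, a record whose text `POS` is 1 would be reported at position 0 in half-open notation, which is the mismatch with the plaintext file described above.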