Commit

Typos workflow
cmdcolin committed Jan 27, 2023
1 parent d758d3c commit c5d5278
Showing 10 changed files with 18 additions and 12 deletions.
4 changes: 2 additions & 2 deletions CRAMv2.1.tex
Original file line number Diff line number Diff line change
@@ -694,7 +694,7 @@ \subsubsection*{Encoding tag values}
keys composed of the two letter tag abbreviation followed by the tag type as defined
in the SAM specification, for example `OQZ' for `OQ:Z'. The three bytes form a
big endian integer and are written as ITF8. For example, 3-byte representation
- of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are intepreted as the integer 0x004F515A.
+ of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are interpreted as the integer 0x004F515A.
The integer is finally written as ITF8.

\begin{tabular}{|l|l|l|>{\raggedright}p{160pt}|}
@@ -1640,7 +1640,7 @@ \subsubsection*{BYTE\_ARRAY\_LEN }

\subsubsection*{BYTE\_ARRAY\_STOP }

- Byte arrays are captured as a sequence of bytes teminated by a special stop byteFor
+ Byte arrays are captured as a sequence of bytes terminated by a special stop byteFor
example this could be a golomb encoding. The parameter for BYTE\_ARRAY\_STOP are
listed below:

2 changes: 1 addition & 1 deletion CRAMv3.tex
@@ -850,7 +850,7 @@ \subsubsection*{Tag values}
The encodings used for different tags are stored in a map.
The key is 3 bytes formed from the BAM tag id and type code, matching the TD dictionary described above.
Unlike the Data Series Encoding Map, the key is stored in the map as an ITF8 encoded integer, constructed using $(char1<<16) + (char2<<8) + type$.
- For example, the 3-byte representation of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are intepreted as the integer key 0x004F515A, leading to an ITF8 byte stream \{0xE0, 0x4F, 0x51, 0x5A\}.
+ For example, the 3-byte representation of OQ:Z is \{0x4F, 0x51, 0x5A\} and these bytes are interpreted as the integer key 0x004F515A, leading to an ITF8 byte stream \{0xE0, 0x4F, 0x51, 0x5A\}.
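
The key construction described in this hunk can be illustrated with a short Python sketch. It is not part of the commit or the specification text: the function names `tag_key` and `itf8_encode` are hypothetical, and the ITF8 byte layout is written here as an assumption based on the CRAM ITF8 definition, for non-negative 32-bit values only.

```python
def tag_key(tag: str, type_code: str) -> int:
    """Build the integer key (char1<<16) + (char2<<8) + type.

    Hypothetical helper: `tag` is the two-letter tag id, `type_code`
    is the single-letter type, for example "OQ" and "Z".
    """
    if len(tag) != 2 or len(type_code) != 1:
        raise ValueError("expected a 2-character tag and a 1-character type")
    return (ord(tag[0]) << 16) + (ord(tag[1]) << 8) + ord(type_code)


def itf8_encode(value: int) -> bytes:
    """Encode a non-negative 32-bit integer as ITF8 (assumed layout).

    The count of leading 1 bits in the first byte gives the number of
    extra bytes that follow.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value out of range for this sketch")
    if value < 0x80:            # 1 byte:  0xxxxxxx
        return bytes([value])
    if value < 0x4000:          # 2 bytes: 10xxxxxx + 8 bits
        return bytes([0x80 | (value >> 8), value & 0xFF])
    if value < 0x200000:        # 3 bytes: 110xxxxx + 16 bits
        return bytes([0xC0 | (value >> 16), (value >> 8) & 0xFF, value & 0xFF])
    if value < 0x10000000:      # 4 bytes: 1110xxxx + 24 bits
        return bytes([
            0xE0 | (value >> 24),
            (value >> 16) & 0xFF,
            (value >> 8) & 0xFF,
            value & 0xFF,
        ])
    # 5 bytes: 1111xxxx + 24 bits + low 4 bits in the final byte
    return bytes([
        0xF0 | (value >> 28),
        (value >> 20) & 0xFF,
        (value >> 12) & 0xFF,
        (value >> 4) & 0xFF,
        value & 0x0F,
    ])
```

Under these assumptions, `itf8_encode(tag_key("OQ", "Z"))` produces the four bytes 0xE0, 0x4F, 0x51, 0x5A quoted in the example above.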

\begin{tabular}{|l|l|l|>{\raggedright}p{160pt}|}
\hline
2 changes: 1 addition & 1 deletion SAMtags.tex
@@ -494,7 +494,7 @@ \subsection{Base modifications}

Following the base modification codes is a recommended but optional `{\tt .}' or `{\tt ?}' describing how skipped seq bases of the stated base type should be interpreted by downstream tools.
When this flag is `{\tt ?}' there is no information about the modification status of the skipped bases provided.
- When this flag is not present, or it is `{\tt .}', these bases should be assumed to have low probability of modification.\footnote{The decision whether a base is assumed to be unmodified or has a probability explicitly provided is up to the modification calling program. Some programs will elide calls with modification probabilites below a threshold to provide a more compact modification tag.}
+ When this flag is not present, or it is `{\tt .}', these bases should be assumed to have low probability of modification.\footnote{The decision whether a base is assumed to be unmodified or has a probability explicitly provided is up to the modification calling program. Some programs will elide calls with modification probabilities below a threshold to provide a more compact modification tag.}

This is then followed by a comma separated list of how many seq bases of the stated base type to skip, stored as a delta to the last and starting with 0 as the first (or next) base, starting from the uncomplemented 5' end of the {\sf SEQ} field.
This number series is comparable to the numbers in an {\tt MD} tag,
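
The delta-encoded skip list described in this hunk can be sketched in Python as follows. This is an illustrative reading of the text, not code from the commit: the function name `mm_positions` is hypothetical, only the forward (uncomplemented) strand is handled, and the base `N` is assumed to match any sequence base, as in the `parse_mm.pl` test script changed by this commit.

```python
def mm_positions(seq: str, base: str, deltas: list[int]) -> list[int]:
    """Return 0-based SEQ positions selected by a delta-encoded skip list.

    Each delta is the number of occurrences of `base` to skip before the
    next selected occurrence, counted from the previous selection.
    """
    positions = []
    i = 0  # current index into seq
    for delta in deltas:
        while True:
            if i >= len(seq):
                raise ValueError("skip list runs past the end of the sequence")
            if base == "N" or seq[i] == base:
                if delta == 0:
                    break       # this occurrence is the selected one
                delta -= 1      # skip this occurrence
            i += 1
        positions.append(i)
        i += 1                  # continue after the selected base
    return positions
```

For instance, with the sequence `ACGCTCC`, base `C` and deltas `1,0`, the first `C` (index 1) is skipped, the second (index 3) is selected, and the next `C` (index 5) is selected immediately.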
6 changes: 6 additions & 0 deletions _typos.toml
@@ -0,0 +1,6 @@
+ [default.extend-words]
+ FO="FO"
+ BA="BA"
+ nd="nd"
+ Hsi="Hsi"
+ Apon="Apon"
2 changes: 1 addition & 1 deletion crypt4gh.tex
@@ -271,7 +271,7 @@ \subsection{File Structure}
\draw (header packet.four split south) to (data encryption packet.north west);
\draw (header packet.five split south) to (data encryption packet.north east);
\node (data encryption packet notes) at (data encryption packet -| file notes) [notes] {
- \textbf{Data Encyption Packet (plain-text)} \\
+ \textbf{Data Encryption Packet (plain-text)} \\
Stores $K_{data}$
};

2 changes: 1 addition & 1 deletion refget.md
@@ -571,7 +571,7 @@ Key to generating reproducible checksums is the normalisation algorithm applied
- VMC
- VMC requires sequence to be a string of IUPAC codes for either nucelotide or protein sequence

- Considering the requirements of the three systems the specification designers felt it was sufficient to restrict input to the inclusive range `65` (`0x41`/`A`) to `90` (`0x5A`/`Z`). Changes to this normalisation algorthim would require a new checksum identifier to be used.
+ Considering the requirements of the three systems the specification designers felt it was sufficient to restrict input to the inclusive range `65` (`0x41`/`A`) to `90` (`0x5A`/`Z`). Changes to this normalisation algorithm would require a new checksum identifier to be used.
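
As a minimal sketch of the range restriction stated above, the following Python function checks that every character of a sequence falls in the inclusive range 65 to 90. The name `in_allowed_range` is hypothetical, and the sketch covers only this range check, not the full normalisation algorithm, which is not shown in this hunk.

```python
def in_allowed_range(sequence: str) -> bool:
    """Return True if every character code is within 65 ('A') to 90 ('Z')."""
    return all(65 <= ord(ch) <= 90 for ch in sequence)
```

Note that an empty string passes this check trivially; whether empty input should be accepted is a separate decision that this sketch does not address.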

### Checksum Choice

2 changes: 1 addition & 1 deletion test/SAMtags/README.md
@@ -2,7 +2,7 @@ Mm and Ml auxiliary tags
========================

The purpose of these test files is to test parsing of the Mm and Ml
- tags. These succint Mm and Ml tags are present in the .sam files,
+ tags. These succinct Mm and Ml tags are present in the .sam files,
with a more human readable expanded form in the .txt files.
Developers should check whether their implementation is able to
convert between the two forms.
2 changes: 1 addition & 1 deletion test/SAMtags/parse_mm.pl
@@ -73,7 +73,7 @@ sub rc {

my $i = 0; # I^{th} bosition in sequence
foreach my $delta (split(",", $pos)) {
- # Skip $delta occurences of $base
+ # Skip $delta occurrences of $base
do {
$delta-- if ($base eq "N" || $base eq $seq[$i]);
$i++;
4 changes: 2 additions & 2 deletions test/sam/README.md
@@ -234,7 +234,7 @@ CIGAR
- Reads entirely consisting of insertions (no bases on ref)
- At pos 1; every base is prior to start of ref
- Neighbouring matching ops, eg 1D1D, 10M10M
- - (Cicular genomes? needs more work.)
+ - (Circular genomes? needs more work.)
- Very large CIGAR strings (BAM has a 64K limit so tools that parse
SAM into in-memory BAM may fail).

@@ -403,7 +403,7 @@ Aux
- General syntax
- Other types (including case change variants of above; I, z, etc)
- Aux tag not 2 chars
- - Aux tag occuring multiple times
+ - Aux tag occurring multiple times


Todo
4 changes: 2 additions & 2 deletions test/sam/compare_sam.pl
@@ -85,8 +85,8 @@
# Validate MD and NM only if partialmd & 'file' set, otherwise
# discard it. Ie:
#
- # 1: if file 1 has NM/MD keep in file 2, othewise discard from file2
- # 2: if file 2 has NM/MD keep in file 1, othewise discard from file1
+ # 1: if file 1 has NM/MD keep in file 2, otherwise discard from file2
+ # 2: if file 2 has NM/MD keep in file 1, otherwise discard from file1
# 3: if file 1 and file 2 both have NM/MD keep, otherwise discard.
if (exists $opts{partialmd}) {
if ($opts{partialmd} & 2) {