streq package

Submodules

streq.circular module

Classes relating to circular strings.

class streq.circular.Circular

Bases: str

A string-like object which can be circularly sliced.

Useful for sequences that represent a bacterial genome or a plasmid.

Currently this only works if the length of the slice is shorter than the sequence’s total length.

__getitem__()

Slice string as if it were a circle.

Examples

>>> Circular('ATCG')[:3]
'ATC'
>>> Circular('ATCG')[-1:3]
'GATC'
>>> from streq import reverse_complement
>>> reverse_complement(Circular('ATCg'))
'cGAT'
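The wrap-around behaviour in the examples above can be illustrated with a small str subclass. This is only a sketch under stated assumptions: CircularSketch is a hypothetical name, it is not the code used by streq, and it handles only the case shown above, where a negative start is paired with a non-negative stop.

```python
class CircularSketch(str):
    """Hypothetical stand-in for streq.circular.Circular (not the library's code)."""

    def __getitem__(self, key):
        # Wrap only for plain slices that begin before the origin (negative
        # start) and end at or after it (non-negative stop), e.g. [-1:3].
        if (
            isinstance(key, slice)
            and key.step in (None, 1)
            and key.start is not None
            and key.stop is not None
            and key.start < 0 <= key.stop
        ):
            tail = super().__getitem__(slice(key.start, None))
            head = super().__getitem__(slice(0, key.stop))
            return tail + head
        # Everything else falls back to ordinary str indexing.
        return super().__getitem__(key)


print(CircularSketch("ATCG")[:3])    # ATC
print(CircularSketch("ATCG")[-1:3])  # GATC
```

The sketch returns a plain str from a slice; whether the real class returns a Circular instance is not stated here.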

streq.distance module

Functions for calculating distances and similarities between sequences.

streq.distance.correlation(x: str, y: str = '', wobble: bool = False) → float

Calculate autocorrelation of a single sequence or correlation between two sequences.

If a single sequence is provided, its correlation with its reverse complement is calculated, which might be an indicator of secondary structure.

If two sequences are provided, then the correlation between the first sequence and the reverse complement of the second sequence is calculated. This might be an indicator of binding affinity.

If wobble is True, then the G.U basepairing is also taken into account.

Parameters:
  • x (str) – Sequence.

  • y (str, optional) – Second sequence for correlation with x.

  • wobble (bool, optional) – Whether to take G.U wobble base pairing into account when calculating correlations.

Returns:

Correlation.

Return type:

float

Examples

>>> correlation('AACC')
0.0
>>> correlation('AAATTT')
2.3
>>> correlation('AAATTCT')
1.3047619047619046
>>> correlation('AAACTTT')
1.9238095238095236
>>> correlation('AAA', 'TTT')
3.0
>>> correlation('AAA', 'AAA')
0.0
>>> correlation('GGGTTT')
0.0
>>> correlation('GGGTTT', wobble=True)
2.3
>>> correlation('GGGUUU', wobble=True)
2.3
>>> correlation('GGG', 'UUU')
0.0
>>> correlation('GGG', 'UUU', wobble=True)
3.0
streq.distance.hamming(x: str, y: str) → int

Calculate the Hamming distance between two sequences.

The Hamming distance is the number of mismatches between two sequences of identical length. If the lengths differ, this function truncates the longer sequence to the length of the shorter one.

Parameters:
  • x (str) – Sequence.

  • y (str) – Second sequence to compare with x.

Returns:

Hamming distance.

Return type:

int

Examples

>>> hamming('AAA', 'ATA')
1
>>> hamming('AAA', 'ATT')
2
>>> hamming('AAA', 'TTT')
3
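The truncating behaviour described for hamming can be sketched in a few lines. The function name hamming_sketch is hypothetical and this is not necessarily how streq implements it; the sketch relies on zip() stopping at the end of the shorter input.

```python
def hamming_sketch(x: str, y: str) -> int:
    """Count mismatching positions, ignoring any overhang of the longer input."""
    # zip() stops at the end of the shorter sequence, which gives the truncation.
    return sum(a != b for a, b in zip(x, y))


print(hamming_sketch("AAA", "ATA"))   # 1
print(hamming_sketch("AAAT", "TTT"))  # 3
```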
streq.distance.levenshtein(x: str, y: str) → int

Calculate the Levenshtein distance between two sequences.

The Levenshtein distance is the number of insertions, deletions, and mutations required to make two sequences match.

Parameters:
  • x (str) – Sequence.

  • y (str) – Second sequence to compare with x.

Returns:

Levenshtein distance.

Return type:

int

Examples

>>> levenshtein('AAATTT', 'AAATTT')
0
>>> levenshtein('AAATTT', 'ACTTT')
2
>>> levenshtein('AAATTT', 'AACTTT')
1
>>> levenshtein('AAAG', 'TCGA')
4
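For readers unfamiliar with the metric, the textbook dynamic-programming formulation of the Levenshtein distance is sketched below. The name levenshtein_sketch is hypothetical, and streq may compute the distance differently; the sketch only reproduces the values in the examples above.

```python
def levenshtein_sketch(x: str, y: str) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    # previous[j] holds the distance between the processed prefix of x and y[:j].
    previous = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        current = [i]  # distance from x[:i] to the empty prefix of y
        for j, b in enumerate(y, start=1):
            current.append(
                min(
                    previous[j] + 1,             # delete a from x
                    current[j - 1] + 1,          # insert b into x
                    previous[j - 1] + (a != b),  # match or substitute
                )
            )
        previous = current
    return previous[-1]


print(levenshtein_sketch("AAATTT", "ACTTT"))  # 2
```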
streq.distance.mismatch_fun(x, y, n, wobble)
streq.distance.ratcliff_obershelp(x: str, y: str) → int

Calculate the Ratcliff-Obershelp distance between two sequences.

The Ratcliff-Obershelp distance is the number of grouped insertions, deletions, and mutations required to make two sequences match.

Parameters:
  • x (str) – Sequence.

  • y (str) – Second sequence to compare with x.

Returns:

Ratcliff-Obershelp distance.

Return type:

int

Examples

>>> ratcliff_obershelp('AAATTT', 'AAATTT')
0
>>> ratcliff_obershelp('AAATTT', 'ACTTT')
1
>>> ratcliff_obershelp('AAATTT', 'AACTTT')
1
>>> ratcliff_obershelp('AAAG', 'TCGA')
2
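One way to count grouped edits is with the standard library's difflib.SequenceMatcher, whose matching is based on the Ratcliff-Obershelp approach. The sketch below assumes that a "grouped" edit corresponds to one non-equal opcode; the function name grouped_edits_sketch is hypothetical, and whether streq uses difflib is an assumption.

```python
from difflib import SequenceMatcher


def grouped_edits_sketch(x: str, y: str) -> int:
    """Count contiguous insert/delete/replace blocks needed to turn x into y."""
    matcher = SequenceMatcher(None, x, y, autojunk=False)
    # Each opcode is (tag, i1, i2, j1, j2); 'equal' blocks need no edit.
    return sum(tag != "equal" for tag, *_ in matcher.get_opcodes())


print(grouped_edits_sketch("AAATTT", "ACTTT"))  # 1
print(grouped_edits_sketch("AAAG", "TCGA"))     # 2
```

Unlike the Levenshtein distance, a run of adjacent changes counts once here, which is why 'AAATTT' versus 'ACTTT' gives 1 rather than 2.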

streq.seqtools module

Python utilities for working with nucleotide sequence strings.

A variety of utilities for converting, searching, and performing calculations on nucleotide sequences.

streq.seqtools.complement(x: str) → str

Complement (but don’t reverse) a sequence.

Parameters:

x (str) – Sequence to convert.

Returns:

Converted sequence.

Return type:

str

Note: Preserves case.

Note: Preserves circularity.

streq.seqtools.count_re_sites(x: str) → bool

Count Type IIS restriction sites in sequence.

Currently only searches for the most commonly used Type IIS restriction sites for Golden Gate Cloning:

  • BbsI: GAAGAC
  • BsmBI: CGTCTC
  • BtgZI: GCGATG
  • PaqCI: CACCTGC
  • SapI: GCTCTTC
  • BsaI: GGTCTC

Parameters:

x (str) – Sequence to check.

Returns:

Number of Type IIS restriction sites in x.

Return type:

int

Examples

>>> count_re_sites('AAAGAAG')
0
>>> count_re_sites('AAAGAAGAC')
1
>>> count_re_sites('AAAGAAGACACCTGC')
2
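The counting shown in the examples above can be approximated with a lookup table of the six recognition sequences. This is a sketch only: TYPE_IIS_SITES and count_sites_sketch are hypothetical names, and it assumes that only the given strand is searched, that matching is case-insensitive, and that occurrences do not overlap. None of these assumptions is confirmed for streq.

```python
# Recognition sequences copied from the list above.
TYPE_IIS_SITES = {
    "BbsI": "GAAGAC",
    "BsmBI": "CGTCTC",
    "BtgZI": "GCGATG",
    "PaqCI": "CACCTGC",
    "SapI": "GCTCTTC",
    "BsaI": "GGTCTC",
}


def count_sites_sketch(x: str) -> int:
    """Count forward-strand occurrences of the listed recognition sequences."""
    upper = x.upper()
    # str.count() counts non-overlapping occurrences of each site.
    return sum(upper.count(site) for site in TYPE_IIS_SITES.values())


print(count_sites_sketch("AAAGAAGACACCTGC"))  # 2
```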
streq.seqtools.find_iupac(query: str, sequence: str) → Generator[Sequence[int], str]

Find occurrences of a query in a larger sequence.

IUPAC codes in the query will be interpreted as ambiguities:

  • A: A
  • C: C
  • G: G
  • T: T
  • U: U
  • N: .
  • R: [AG]
  • Y: [TUC]
  • W: [ATU]
  • S: [CG]
  • V: [ACG]
  • B: [TUGC]

Parameters:
  • query (str) – Sequence to search for. Accepts IUPAC codes: N, R, Y, S, W, V, B.

  • sequence (str) – Sequence to search within.

Yields:
  • Generator – Generator of tuples containing the match indices and matched sequence.

  • indices (tuple) – Start and stop indices of the match.

  • sequence (str) – Matched sequence.

Examples

>>> for (start_idx, end_idx), match in find_iupac('ARY', 'AATAGCAGTGTGAAC'):
...     print(f"Found ARY at {start_idx}:{end_idx}: {match}")
...
Found ARY at 0:3: AAT
Found ARY at 3:6: AGC
Found ARY at 6:9: AGT
Found ARY at 12:15: AAC
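The mapping from IUPAC codes to regular-expression fragments suggests a regex-based search. The sketch below builds a pattern from the codes listed above and yields non-overlapping matches; the names IUPAC_TO_REGEX and find_iupac_sketch are hypothetical, and case-insensitive matching is an assumption rather than documented behaviour.

```python
import re
from typing import Iterator, Tuple

# Code-to-regex table copied from the list above.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U", "N": ".",
    "R": "[AG]", "Y": "[TUC]", "W": "[ATU]", "S": "[CG]",
    "V": "[ACG]", "B": "[TUGC]",
}


def find_iupac_sketch(query: str, sequence: str) -> Iterator[Tuple[Tuple[int, int], str]]:
    """Yield ((start, stop), matched_text) for each non-overlapping match."""
    pattern = re.compile(
        "".join(IUPAC_TO_REGEX[code] for code in query.upper()),
        flags=re.IGNORECASE,
    )
    for match in pattern.finditer(sequence):
        yield match.span(), match.group()


for (start, stop), hit in find_iupac_sketch("ARY", "AATAGCAGTGTGAAC"):
    print(start, stop, hit)
```

Because re.finditer reports non-overlapping matches, a site that overlaps an earlier hit would be skipped by this sketch.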
streq.seqtools.gc_content(x: str) → float

Calculate proportional GC content.

Recognises IUPAC codes.

Parameters:

x (str) – Sequence.

Returns:

GC content.

Return type:

float

Examples

>>> gc_content('AGGG')
0.75
streq.seqtools.purine_content(x: str) → float

Calculate proportional purine content.

Recognises IUPAC codes.

Parameters:

x (str) – Sequence.

Returns:

Purine content.

Return type:

float

Examples

>>> purine_content('AUGGR')
0.8
streq.seqtools.pyrimidine_content(x: str) → float

Calculate proportional pyrimidine content.

Recognises IUPAC codes.

Parameters:

x (str) – Sequence.

Returns:

Pyrimidine content.

Return type:

float

Examples

>>> pyrimidine_content('AUGGG')
0.2
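The three content functions above share one idea: the fraction of positions whose symbol belongs to a given set. The sketch below assumes that an ambiguity code counts fully when every base it stands for is in the set (S for GC, R for purines, Y for pyrimidines) and that other ambiguity codes count as zero. The helper fraction_in and the three sets are hypothetical, and streq may weight ambiguity codes differently.

```python
GC_SYMBOLS = frozenset("GCS")            # S = G or C
PURINE_SYMBOLS = frozenset("AGR")        # R = A or G
PYRIMIDINE_SYMBOLS = frozenset("CTUY")   # Y = C, T, or U


def fraction_in(x: str, symbols: frozenset) -> float:
    """Fraction of positions in x whose (upper-cased) symbol is in symbols."""
    if not x:
        return 0.0  # assumption: an empty sequence has zero content
    return sum(base in symbols for base in x.upper()) / len(x)


print(fraction_in("AGGG", GC_SYMBOLS))           # 0.75
print(fraction_in("AUGGR", PURINE_SYMBOLS))      # 0.8
print(fraction_in("AUGGG", PYRIMIDINE_SYMBOLS))  # 0.2
```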
streq.seqtools.reverse(x: str) → str

Reverse a sequence.

Parameters:

x (str) – Sequence to convert.

Returns:

Converted sequence.

Return type:

str

Note: Preserves circularity.

streq.seqtools.reverse_complement(x: str) → str

Reverse complement a sequence.

Parameters:

x (str) – Sequence to convert.

Returns:

Converted sequence.

Return type:

str

Examples

>>> reverse_complement('ATCG')
'CGAT'
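A case-preserving reverse complement can be sketched with a translation table followed by a reversed slice. The names below are hypothetical and cover only the four DNA bases in both cases; this is not streq's implementation, and handling of RNA or ambiguity codes is left out.

```python
# Translation table for the four DNA bases, keeping upper/lower case.
_COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def complement_sketch(x: str) -> str:
    """Complement each base without changing the order."""
    return x.translate(_COMPLEMENT_TABLE)


def reverse_complement_sketch(x: str) -> str:
    """Complement each base, then reverse the result."""
    return complement_sketch(x)[::-1]


print(reverse_complement_sketch("ATCG"))  # CGAT
print(reverse_complement_sketch("ATCg"))  # cGAT
```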
streq.seqtools.to_dna(x: str) → str

Convert nucleotides to DNA.

Parameters:

x (str) – Sequence to convert.

Returns:

Converted sequence.

Return type:

str

Examples

>>> to_dna('AUCG')
'ATCG'

Note: Preserves case.

Note: Preserves circularity.

streq.seqtools.to_rna(x: str) → str

Convert nucleotides to RNA.

Parameters:

x (str) – Sequence to convert.

Returns:

Converted sequence.

Return type:

str

Examples

>>> to_rna('ATCG')
'AUCG'

Note: Preserves case.

Note: Preserves circularity.
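The case-preserving conversion between DNA and RNA alphabets amounts to swapping T and U in both cases. A minimal sketch with hypothetical names follows; it does not attempt to reproduce the circularity-preserving behaviour noted above.

```python
_TO_DNA_TABLE = str.maketrans("Uu", "Tt")
_TO_RNA_TABLE = str.maketrans("Tt", "Uu")


def to_dna_sketch(x: str) -> str:
    """Replace U with T, keeping the case of each character."""
    return x.translate(_TO_DNA_TABLE)


def to_rna_sketch(x: str) -> str:
    """Replace T with U, keeping the case of each character."""
    return x.translate(_TO_RNA_TABLE)


print(to_dna_sketch("AUCG"))  # ATCG
print(to_rna_sketch("ATCG"))  # AUCG
```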

streq.seqtools.which_re_sites(x: str) → Sequence[str]

List Type IIS restriction sites in sequence.

Currently only searches for the most commonly used Type IIS restriction sites for Golden Gate Cloning:

  • BbsI: GAAGAC
  • BsmBI: CGTCTC
  • BtgZI: GCGATG
  • PaqCI: CACCTGC
  • SapI: GCTCTTC
  • BsaI: GGTCTC

Parameters:

x (str) – Sequence to check.

Returns:

Names of the Type IIS restriction sites found in x.

Return type:

tuple

Examples

>>> which_re_sites('AAAGAAG')
()
>>> which_re_sites('AAAGAAGAC')
('BbsI',)
>>> which_re_sites('AAAGAAGACACCTGC')
('BbsI', 'PaqCI')

streq.utils module

Miscellaneous utilities used in streq.

class streq.utils.SequenceCollection(complementer, re_sites, DNA, RNA, base2regex, PAMs)

Bases: tuple

DNA

Alias for field number 2

PAMs

Alias for field number 5

RNA

Alias for field number 3

base2regex

Alias for field number 4

complementer

Alias for field number 0

re_sites

Alias for field number 1
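The "Alias for field number" entries above are what collections.namedtuple generates: each field is reachable by name or by position. The sketch below rebuilds a tuple with the same field order to show the correspondence. SequenceCollectionSketch is a hypothetical name and all field values are placeholders, not the data held by streq.

```python
from collections import namedtuple

# Same field order as in the class signature above.
SequenceCollectionSketch = namedtuple(
    "SequenceCollectionSketch",
    ["complementer", "re_sites", "DNA", "RNA", "base2regex", "PAMs"],
)

# Placeholder values for illustration only.
example = SequenceCollectionSketch(
    complementer={"A": "T"},
    re_sites={"BbsI": "GAAGAC"},
    DNA="ACGT",
    RNA="ACGU",
    base2regex={"N": "."},
    PAMs={},
)

print(example.DNA is example[2])    # True: DNA is field number 2
print(example._fields.index("PAMs"))  # 5
```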

Module contents