streq package

Submodules

streq.circular module

Classes relating to circular strings.

class streq.circular.Circular

Bases: str

A string-like object which can be circularly sliced.

Useful for sequences that represent a bacterial genome or a plasmid.

Currently this only works if the length of the slice is shorter than the sequence’s total length.

__getitem__(): Slice the string as if it were a circle.

Examples

>>> Circular('ATCG')[:3]
'ATC'
>>> Circular('ATCG')[-1:3]
'GATC'
>>> from streq import reverse_complement
>>> reverse_complement(Circular('ATCg'))
'cGAT'
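The circular slicing shown above can be approximated in plain Python. The following is a minimal sketch and not streq's implementation: the class name CircularSketch is hypothetical, and both the wrap-around rule for a stop that precedes the start and the ValueError for over-long slices are assumptions made for illustration.

```python
class CircularSketch(str):
    """Hypothetical stand-in for a circularly sliceable string."""

    def __getitem__(self, key):
        # Integer indices and stepped slices fall back to ordinary str behaviour.
        if not isinstance(key, slice) or key.step not in (None, 1):
            return str.__getitem__(self, key)
        n = len(self)
        if n == 0:
            return ""
        start = 0 if key.start is None else key.start
        stop = n if key.stop is None else key.stop
        # Assumption: a stop that precedes the start wraps past the origin.
        if stop < start:
            stop += n
        length = stop - start
        # Assumption: slices longer than one full turn are rejected.
        if length > n:
            raise ValueError("slice is longer than the sequence")
        # Doubling the string lets a wrapped slice be read as one plain slice.
        doubled = str.__add__(self, self)
        offset = start % n
        return doubled[offset:offset + length]


print(CircularSketch("ATCG")[:3])    # ATC
print(CircularSketch("ATCG")[-1:3])  # GATC
```

Doubling the string keeps the logic short at the cost of a temporary copy; for genome-scale sequences, concatenating the two wrapped pieces directly would avoid that copy.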

streq.distance module

streq.distance.correlation(x: str, y: str = '', wobble: bool = False) → float

Get autocorrelation of a single sequence or correlation between two sequences.

If a single sequence is provided, its autocorrelation is calculated, which may indicate secondary structure.

Note: the wobble parameter is not yet implemented.

Parameters:

x (str) – Sequence.
y (str, optional) – Second sequence for correlation with x.
wobble (bool, optional) – Whether to calculate correlations taking into account G.U wobble. Not yet implemented.

Returns:

Correlation.

Return type:

float

Examples

>>> correlation('AACC')
0.0
>>> correlation('AAATTT')
2.3
>>> correlation('AAATTCT')
1.3047619047619046
>>> correlation('AAACTTT')
1.9238095238095236
>>> correlation('AAA', 'TTT')
0.0
>>> correlation('AAA', 'AAA')
3.0

streq.distance.hamming(x: str, y: str, wobble: bool = False) → int

Calculate the Hamming distance between two sequences.

The Hamming distance is the number of mismatches.

Note: the wobble parameter is not yet implemented.

Parameters:

x (str) – Sequence.
y (str) – Second sequence to compare with x.
wobble (bool, optional) – Whether to calculate the distance taking into account G.U wobble. Not yet implemented.

Returns:

Hamming distance.

Return type:

int

Examples

>>> hamming('AAA', 'ATA')
1
>>> hamming('AAA', 'ATT')
2
>>> hamming('AAA', 'TTT')
3
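As an illustration of the definition above, a mismatch count can be written in a few lines. This is a sketch rather than streq's code: the name hamming_sketch is hypothetical, the unimplemented wobble option is left out, and raising an error for unequal lengths is an assumption.

```python
def hamming_sketch(x: str, y: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    # Assumption: unequal lengths are treated as an error.
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    # Each mismatching pair contributes True (1) to the sum.
    return sum(a != b for a, b in zip(x, y))


print(hamming_sketch("AAA", "ATA"))  # 1
print(hamming_sketch("AAA", "TTT"))  # 3
```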

streq.distance.levenshtein(x: str, y: str) → int

Calculate the Levenshtein distance between two sequences.

The Levenshtein distance is the number of insertions, deletions, and mutations required to make two sequences match.

Parameters:

x (str) – Sequence.
y (str) – Second sequence to compare with x.

Returns:

Levenshtein distance.

Return type:

int

Examples

>>> levenshtein('AAATTT', 'AAATTT')
0
>>> levenshtein('AAATTT', 'ACTTT')
2
>>> levenshtein('AAATTT', 'AACTTT')
1
>>> levenshtein('AAAG', 'TCGA')
4
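The edit count described above is commonly computed with dynamic programming. The sketch below uses the textbook two-row formulation; the name levenshtein_sketch is hypothetical, and streq may compute the distance differently.

```python
def levenshtein_sketch(x: str, y: str) -> int:
    """Textbook dynamic-programming edit distance, keeping two rows."""
    # previous[j] is the distance between the processed prefix of x and y[:j].
    previous = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        current = [i]  # turning x[:i] into the empty string takes i deletions
        for j, b in enumerate(y, start=1):
            current.append(min(
                previous[j] + 1,             # delete a from x
                current[j - 1] + 1,          # insert b into x
                previous[j - 1] + (a != b),  # match, or substitute a with b
            ))
        previous = current
    return previous[-1]


print(levenshtein_sketch("AAATTT", "ACTTT"))  # 2
print(levenshtein_sketch("AAAG", "TCGA"))     # 4
```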

streq.distance.ratcliff_obershelp(x: str, y: str) → int

Calculate the Ratcliff-Obershelp distance between two sequences.

The Ratcliff-Obershelp distance is the number of grouped insertions, deletions, and mutations required to make two sequences match.

Parameters:

x (str) – Sequence.
y (str) – Second sequence to compare with x.

Returns:

Ratcliff-Obershelp distance.

Return type:

int

Examples

>>> ratcliff_obershelp('AAATTT', 'AAATTT')
0
>>> ratcliff_obershelp('AAATTT', 'ACTTT')
1
>>> ratcliff_obershelp('AAATTT', 'AACTTT')
1
>>> ratcliff_obershelp('AAAG', 'TCGA')
2
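One way to read "grouped" edits is to count each contiguous run of changes once. The standard library's difflib.SequenceMatcher uses a matching algorithm in the Ratcliff-Obershelp family, so the sketch below counts its non-equal opcodes. Whether streq works this way is an assumption; the name grouped_edits_sketch is hypothetical.

```python
from difflib import SequenceMatcher


def grouped_edits_sketch(x: str, y: str) -> int:
    """Count contiguous insert/delete/replace blocks between x and y."""
    # autojunk=False disables difflib's heuristic for long, repetitive inputs.
    matcher = SequenceMatcher(None, x, y, autojunk=False)
    # Each opcode is (tag, i1, i2, j1, j2); "equal" blocks need no edit.
    return sum(tag != "equal" for tag, *_ in matcher.get_opcodes())


print(grouped_edits_sketch("AAATTT", "ACTTT"))  # 1
print(grouped_edits_sketch("AAAG", "TCGA"))     # 2
```

Unlike the Levenshtein distance, a run of several adjacent changes contributes a single unit here, which is why 'AAATTT' against 'ACTTT' gives 1 rather than 2.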

streq.seqtools module

Python utilities for working with nucleotide sequence strings.

Variety of utilities for converting, searching, and doing calculations on nucleotide sequences.

streq.seqtools.complement(x: str) → str

Complement (but don’t reverse) a sequence.

Parameters: x (str) – Sequence to convert.
Returns: Converted sequence.
Return type: str

Note: Preserves case.

Note: Preserves circularity.

streq.seqtools.count_re_sites(x: str) → int

Count Type IIS restriction sites in sequence.

Currently only searches for the most commonly used Type IIS restriction sites for Golden Gate Cloning:

BbsI: GAAGAC
BsmBI: CGTCTC
BtgZI: GCGATG
PaqCI: CACCTGC
SapI: GCTCTTC
BsaI: GGTCTC

Parameters: x (str) – Sequence to check.
Returns: Number of Type IIS restriction sites in x.
Return type: int

Examples

>>> count_re_sites('AAAGAAG')
0
>>> count_re_sites('AAAGAAGAC')
1
>>> count_re_sites('AAAGAAGACACCTGC')
2
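The counting above can be reproduced with plain substring searches over the recognition sequences listed for this function. This is a sketch only: the names TYPE_IIS_SITES and count_sites_sketch are hypothetical, and it assumes that only the given strand is searched and that case is ignored, neither of which is confirmed for streq.

```python
# Recognition sequences as listed in the documentation above.
TYPE_IIS_SITES = {
    "BbsI": "GAAGAC",
    "BsmBI": "CGTCTC",
    "BtgZI": "GCGATG",
    "PaqCI": "CACCTGC",
    "SapI": "GCTCTTC",
    "BsaI": "GGTCTC",
}


def count_sites_sketch(x: str) -> int:
    """Sum the occurrences of every listed site on the given strand."""
    upper = x.upper()  # assumption: matching ignores case
    return sum(upper.count(site) for site in TYPE_IIS_SITES.values())


print(count_sites_sketch("AAAGAAGAC"))        # 1
print(count_sites_sketch("AAAGAAGACACCTGC"))  # 2
```

Sites belonging to different enzymes may overlap, as the BbsI and PaqCI sites do in the last example, and both are counted.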

streq.seqtools.find_iupac(query: str, sequence: str) → Generator[Sequence[int], str]

Find occurrences of a query in a larger sequence.

IUPAC codes in the query will be interpreted as ambiguities:

A: A
C: C
G: G
T: T
U: U
N: .
R: [AG]
Y: [TUC]
W: [ATU]
S: [CG]
V: [ACG]
B: [TUGC]

Parameters:

query (str) – Sequence to search for. Accepts IUPAC codes: N, R, Y, S, W, V, B.
sequence (str) – Sequence to search within.

Yields:

Generator – Generator of tuples containing the match indices and matched sequence.
indices (tuple) – Start and stop indices of the match
sequence (str) – matched sequence

Examples

>>> for (start_idx, end_idx), match in find_iupac('ARY', 'AATAGCAGTGTGAAC'):
...     print(f"Found ARY at {start_idx}:{end_idx}: {match}")
...
Found ARY at 0:3: AAT
Found ARY at 3:6: AGC
Found ARY at 6:9: AGT
Found ARY at 12:15: AAC
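The code-to-pattern table for this function suggests a regular-expression search. The sketch below builds a pattern from that table and scans with re.finditer; the names IUPAC_TO_REGEX and find_iupac_sketch are hypothetical, and upper-casing both inputs is an assumption rather than documented behaviour.

```python
import re
from typing import Iterator, Tuple

# Mapping copied from the table above; codes not listed there are not handled.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U", "N": ".",
    "R": "[AG]", "Y": "[TUC]", "W": "[ATU]", "S": "[CG]",
    "V": "[ACG]", "B": "[TUGC]",
}


def find_iupac_sketch(query: str, sequence: str) -> Iterator[Tuple[Tuple[int, int], str]]:
    """Yield ((start, end), matched_text) for non-overlapping matches."""
    pattern = re.compile("".join(IUPAC_TO_REGEX[base] for base in query.upper()))
    for match in pattern.finditer(sequence.upper()):
        yield match.span(), match.group()


for (start, end), hit in find_iupac_sketch("ARY", "AATAGCAGTGTGAAC"):
    print(f"{start}:{end} {hit}")
```

Because re.finditer resumes after the end of each match, overlapping occurrences are not reported by this sketch.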

streq.seqtools.gc_content(x: str) → float

Calculate proportional GC content.

Recognises IUPAC codes.

Parameters: x (str) – Sequence.
Returns: GC content.
Return type: float

Examples

>>> gc_content('AGGG')
0.75

streq.seqtools.purine_content(x: str) → float

Calculate proportional purine content.

Recognises IUPAC codes.

Parameters: x (str) – Sequence.
Returns: Purine content.
Return type: float

Examples

>>> purine_content('AUGGR')
0.8

streq.seqtools.pyrimidine_content(x: str) → float

Calculate proportional pyrimidine content.

Recognises IUPAC codes.

Parameters: x (str) – Sequence.
Returns: Pyrimidine content.
Return type: float

Examples

>>> pyrimidine_content('AUGGG')
0.2
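The three content functions (GC, purine, pyrimidine) share one pattern: the fraction of positions whose code belongs to a given set. The sketch below makes that explicit. The helper name fraction_of and the code sets are assumptions; in particular, counting S as GC, R as purine and Y as pyrimidine is a guess at what "recognises IUPAC codes" means, chosen to be consistent with the examples.

```python
# Assumed code sets: unambiguous bases plus the IUPAC code that implies them.
GC_CODES = frozenset("GCS")
PURINE_CODES = frozenset("AGR")
PYRIMIDINE_CODES = frozenset("CTUY")


def fraction_of(x: str, codes: frozenset) -> float:
    """Fraction of positions in x whose (upper-cased) code is in codes."""
    if not x:
        return 0.0  # assumption: an empty sequence has zero content
    upper = x.upper()
    return sum(base in codes for base in upper) / len(upper)


print(fraction_of("AGGG", GC_CODES))           # 0.75
print(fraction_of("AUGGR", PURINE_CODES))      # 0.8
print(fraction_of("AUGGG", PYRIMIDINE_CODES))  # 0.2
```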

streq.seqtools.reverse(x: str) → str

Reverse a sequence.

Parameters: x (str) – Sequence to convert.
Returns: Converted sequence.
Return type: str

Note: Preserves circularity.

streq.seqtools.reverse_complement(x: str) → str

Reverse complement a sequence.

Parameters: x (str) – Sequence to convert.
Returns: Converted sequence.
Return type: str

Examples

>>> reverse_complement('ATCG')
'CGAT'
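Complementing and reversing can be expressed with a translation table and a reversed slice. This is a sketch restricted to unambiguous DNA bases: the names complement_sketch and reverse_complement_sketch are hypothetical, and streq's handling of RNA, IUPAC codes and Circular inputs is not reproduced.

```python
# Case-preserving complement table for unambiguous DNA bases only.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement_sketch(x: str) -> str:
    """Complement each base without changing the order."""
    return x.translate(_COMPLEMENT)


def reverse_complement_sketch(x: str) -> str:
    """Complement each base, then reverse the result."""
    return complement_sketch(x)[::-1]


print(reverse_complement_sketch("ATCG"))  # CGAT
print(reverse_complement_sketch("ATCg"))  # cGAT
```

Characters missing from the table pass through str.translate unchanged, so ambiguity codes would be left as they are in this sketch.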

streq.seqtools.to_dna(x: str) → str

Convert nucleotides to DNA.

Parameters: x (str) – Sequence to convert.
Returns: Converted sequence.
Return type: str

Examples

>>> to_dna('AUCG')
'ATCG'

Note: Preserves case.

Note: Preserves circularity.

streq.seqtools.to_rna(x: str) → str

Convert nucleotides to RNA.

Parameters: x (str) – Sequence to convert.
Returns: Converted sequence.
Return type: str

Examples

>>> to_rna('ATCG')
'AUCG'

Note: Preserves case.

Note: Preserves circularity.
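Both conversions amount to swapping T and U while keeping case. A minimal sketch with hypothetical names (to_dna_sketch, to_rna_sketch) follows; it returns plain str objects and so does not reproduce the circularity-preserving behaviour noted above.

```python
# Case-preserving translation tables between the DNA and RNA alphabets.
_TO_DNA = str.maketrans("Uu", "Tt")
_TO_RNA = str.maketrans("Tt", "Uu")


def to_dna_sketch(x: str) -> str:
    """Replace U with T, leaving every other character untouched."""
    return x.translate(_TO_DNA)


def to_rna_sketch(x: str) -> str:
    """Replace T with U, leaving every other character untouched."""
    return x.translate(_TO_RNA)


print(to_dna_sketch("AUCG"))  # ATCG
print(to_rna_sketch("atcg"))  # aucg
```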

streq.seqtools.which_re_sites(x: str) → Sequence[str]

List Type IIS restriction sites in sequence.

Currently only searches for the most commonly used Type IIS restriction sites for Golden Gate Cloning:

BbsI: GAAGAC
BsmBI: CGTCTC
BtgZI: GCGATG
PaqCI: CACCTGC
SapI: GCTCTTC
BsaI: GGTCTC

Parameters: x (str) – Sequence to check.
Returns: Names of the Type IIS restriction sites found in x.
Return type: tuple

Examples

>>> which_re_sites('AAAGAAG')
()
>>> which_re_sites('AAAGAAGAC')
('BbsI',)
>>> which_re_sites('AAAGAAGACACCTGC')
('BbsI', 'PaqCI')
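Listing the enzymes whose sites occur is a membership test per recognition sequence. In the sketch below the names SITES_BY_ENZYME and which_sites_sketch are hypothetical; searching only the given strand, ignoring case, and reporting names in the table's order are assumptions that happen to agree with the examples above.

```python
# Recognition sequences as listed in the documentation above.
SITES_BY_ENZYME = {
    "BbsI": "GAAGAC",
    "BsmBI": "CGTCTC",
    "BtgZI": "GCGATG",
    "PaqCI": "CACCTGC",
    "SapI": "GCTCTTC",
    "BsaI": "GGTCTC",
}


def which_sites_sketch(x: str) -> tuple:
    """Return the enzyme names whose site occurs at least once in x."""
    upper = x.upper()  # assumption: matching ignores case
    return tuple(name for name, site in SITES_BY_ENZYME.items() if site in upper)


print(which_sites_sketch("AAAGAAGAC"))        # ('BbsI',)
print(which_sites_sketch("AAAGAAGACACCTGC"))  # ('BbsI', 'PaqCI')
```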

streq.utils module

Miscellaneous utilities used in streq.

class streq.utils.SequenceCollection(complementer, re_sites, DNA, RNA, base2regex, PAMs)

Bases: tuple

DNA: Alias for field number 2

PAMs: Alias for field number 5

RNA: Alias for field number 3

base2regex: Alias for field number 4

complementer: Alias for field number 0

re_sites: Alias for field number 1
