streq package
Submodules
streq.circular module
Classes relating to circular strings.
- class streq.circular.Circular
Bases: str
A string-like object which can be circularly sliced.
Useful for sequences that represent a bacterial genome or a plasmid.
Currently this only works if the length of the slice is shorter than the sequence’s total length.
Examples
>>> Circular('ATCG')[:3]
'ATC'
>>> Circular('ATCG')[-1:3]
'GATC'
>>> from streq import reverse_complement
>>> reverse_complement(Circular('ATCg'))
'cGAT'
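The wrap-around behaviour in the examples above can be illustrated with a minimal sketch. The helper `circular_slice` is hypothetical and not part of streq; it assumes integer start and stop indices and rejects spans longer than the sequence, in line with the limitation noted above.

```python
def circular_slice(seq: str, start: int, stop: int) -> str:
    """Slice seq from start to stop, wrapping around the ends (illustrative only)."""
    n = len(seq)
    if n == 0 or stop - start > n:
        # Assumption: spans longer than the sequence are rejected.
        raise ValueError("slice must not be longer than the sequence")
    # Modular indexing maps out-of-range positions back onto the circle.
    return "".join(seq[i % n] for i in range(start, stop))


print(circular_slice("ATCG", 0, 3))   # ATC
print(circular_slice("ATCG", -1, 3))  # GATC
```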
streq.distance module
Functions for calculating distances and similarities between sequences.
- streq.distance.correlation(x: str, y: str = '', wobble: bool = False) → float
Calculate autocorrelation of a single sequence or correlation between two sequences.
If a single sequence is provided, its correlation with its reverse complement is calculated, which might be an indicator of secondary structure.
If two sequences are provided, then the correlation between the first sequence and the reverse complement of the second sequence is calculated. This might be an indicator of binding affinity.
If wobble is True, then the G.U basepairing is also taken into account.
- Parameters:
x (str) – Sequence.
y (str, optional) – Second sequence for correlation with x.
wobble (bool, optional) – Whether to calculate correlations taking G.U wobble base pairing into account.
- Returns:
Correlation.
- Return type:
float
Examples
>>> correlation('AACC')
0.0
>>> correlation('AAATTT')
2.3
>>> correlation('AAATTCT')
1.3047619047619046
>>> correlation('AAACTTT')
1.9238095238095236
>>> correlation('AAA', 'TTT')
3.0
>>> correlation('AAA', 'AAA')
0.0
>>> correlation('GGGTTT')
0.0
>>> correlation('GGGTTT', wobble=True)
2.3
>>> correlation('GGGUUU', wobble=True)
2.3
>>> correlation('GGG', 'UUU')
0.0
>>> correlation('GGG', 'UUU', wobble=True)
3.0
- streq.distance.hamming(x: str, y: str) → int
Calculate the Hamming distance between two sequences.
The Hamming distance is the number of mismatched positions between two sequences of identical length. If the sequences differ in length, this function truncates the longer one to the length of the shorter one before comparing.
- Parameters:
x (str) – Sequence.
y (str) – Second sequence to compare with x.
- Returns:
Hamming distance.
- Return type:
int
Examples
>>> hamming('AAA', 'ATA')
1
>>> hamming('AAA', 'ATT')
2
>>> hamming('AAA', 'TTT')
3
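The truncating behaviour described above can be sketched with `zip`, which stops at the end of the shorter input. The name `hamming_sketch` is hypothetical and this is not necessarily how streq implements the function.

```python
def hamming_sketch(x: str, y: str) -> int:
    """Count mismatched positions over the shared length of x and y."""
    # zip() stops at the shorter sequence, so any extra tail is ignored.
    return sum(a != b for a, b in zip(x, y))


print(hamming_sketch("AAA", "ATA"))    # 1
print(hamming_sketch("AAAT", "AT"))    # 1, only the first two positions are compared
```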
- streq.distance.levenshtein(x: str, y: str) → int
Calculate the Levenshtein distance between two sequences.
The Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions (mutations) required to make two sequences match.
- Parameters:
x (str) – Sequence.
y (str) – Second sequence to compare with x.
- Returns:
Levenshtein distance.
- Return type:
int
Examples
>>> levenshtein('AAATTT', 'AAATTT')
0
>>> levenshtein('AAATTT', 'ACTTT')
2
>>> levenshtein('AAATTT', 'AACTTT')
1
>>> levenshtein('AAAG', 'TCGA')
4
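For reference, the edit distance can be computed with the standard row-by-row dynamic programme. This is a generic textbook sketch under the name `levenshtein_sketch` (hypothetical); the implementation in streq may differ.

```python
def levenshtein_sketch(x: str, y: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning x into y."""
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        curr = [i]  # distance from x[:i] to the empty prefix of y
        for j, b in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from x
                curr[j - 1] + 1,         # insert b into x
                prev[j - 1] + (a != b),  # substitute, free if the characters match
            ))
        prev = curr
    return prev[-1]


print(levenshtein_sketch("AAATTT", "ACTTT"))  # 2
```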
- streq.distance.ratcliff_obershelp(x: str, y: str) → int
Calculate the Ratcliff-Obershelp distance between two sequences.
The Ratcliff-Obershelp distance is the number of grouped insertions, deletions, and mutations required to make two sequences match.
- Parameters:
x (str) – Sequence.
y (str) – Second sequence to compare with x.
- Returns:
Ratcliff-Obershelp distance.
- Return type:
int
Examples
>>> ratcliff_obershelp('AAATTT', 'AAATTT')
0
>>> ratcliff_obershelp('AAATTT', 'ACTTT')
1
>>> ratcliff_obershelp('AAATTT', 'AACTTT')
1
>>> ratcliff_obershelp('AAAG', 'TCGA')
2
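One way to read "grouped" edits is as the non-matching blocks reported by Python's `difflib.SequenceMatcher`, whose matching is closely related to the Ratcliff-Obershelp approach. Whether streq computes its distance this way is an assumption, and `grouped_edit_count` is a hypothetical name used only for this sketch.

```python
from difflib import SequenceMatcher


def grouped_edit_count(x: str, y: str) -> int:
    """Count contiguous insert/delete/replace blocks between x and y."""
    # autojunk is disabled so that frequent characters are never ignored.
    matcher = SequenceMatcher(None, x, y, autojunk=False)
    # Each opcode is (tag, i1, i2, j1, j2); every tag other than "equal" is one grouped edit.
    return sum(tag != "equal" for tag, *_ in matcher.get_opcodes())


print(grouped_edit_count("AAATTT", "ACTTT"))  # 1, the run 'AA' is replaced by 'C' in one block
```

Because a whole run of changed characters counts once, this value can be smaller than the Levenshtein distance for the same pair.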
streq.seqtools module
Python utilities for working with nucleotide sequence strings.
A variety of utilities for converting, searching, and doing calculations on nucleotide sequences.
- streq.seqtools.complement(x: str) → str
Complement (but don’t reverse) a sequence.
- Parameters:
x (str) – Sequence to convert.
- Returns:
Converted sequence.
- Return type:
str
Note: Preserves case.
Note: Preserves circularity.
- streq.seqtools.count_re_sites(x: str) → int
Count Type IIS restriction sites in sequence.
Currently only searches for the most commonly used Type IIS restriction sites for Golden Gate Cloning:
BbsI: GAAGAC
BsmBI: CGTCTC
BtgZI: GCGATG
PaqCI: CACCTGC
SapI: GCTCTTC
BsaI: GGTCTC
- Parameters:
x (str) – Sequence to check.
- Returns:
Number of Type IIS restriction sites in x.
- Return type:
int
Examples
>>> count_re_sites('AAAGAAG')
0
>>> count_re_sites('AAAGAAGAC')
1
>>> count_re_sites('AAAGAAGACACCTGC')
2
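Counting sites amounts to a substring search for each recognition sequence. The sketch below uses the six sequences listed above; the names `TYPE_IIS_SITES` and `count_sites_sketch` are hypothetical, and searching only the given strand with non-overlapping matches is an assumption, not a statement about streq's behaviour.

```python
# Recognition sequences as listed in the documentation above.
TYPE_IIS_SITES = {
    "BbsI": "GAAGAC",
    "BsmBI": "CGTCTC",
    "BtgZI": "GCGATG",
    "PaqCI": "CACCTGC",
    "SapI": "GCTCTTC",
    "BsaI": "GGTCTC",
}


def count_sites_sketch(x: str) -> int:
    """Total number of listed recognition sequences found in x."""
    # Assumption: forward strand only; str.count counts non-overlapping hits.
    return sum(x.count(site) for site in TYPE_IIS_SITES.values())


print(count_sites_sketch("AAAGAAGACACCTGC"))  # 2 (one BbsI site, one PaqCI site)
```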
- streq.seqtools.find_iupac(query: str, sequence: str) → Generator[Sequence[int], str]
Find occurrences of a query in a larger sequence.
IUPAC codes in the query will be interpreted as ambiguities:
A: A
C: C
G: G
T: T
U: U
N: .
R: [AG]
Y: [TUC]
W: [ATU]
S: [CG]
V: [ACG]
B: [TUGC]
- Parameters:
query (str) – Sequence to search for. Accepts IUPAC codes: N, R, Y, S, W, V, B.
sequence (str) – Sequence to search within.
- Yields:
Generator – Generator of (indices, sequence) tuples, one per match.
indices (tuple) – Start and stop indices of the match.
sequence (str) – Matched sequence.
Examples
>>> for (start_idx, end_idx), match in find_iupac('ARY', 'AATAGCAGTGTGAAC'):
...     print(f"Found ARY at {start_idx}:{end_idx}: {match}")
...
Found ARY at 0:3: AAT
Found ARY at 3:6: AGC
Found ARY at 6:9: AGT
Found ARY at 12:15: AAC
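The mapping above suggests a simple strategy: translate the query into a regular expression and scan the target. The following is a minimal sketch of that idea, with the hypothetical names `IUPAC_TO_REGEX` and `find_iupac_sketch`; reporting non-overlapping matches only is an assumption about the intended behaviour.

```python
import re
from typing import Iterator, Tuple

# Per-base regex fragments, copied from the table above.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U", "N": ".",
    "R": "[AG]", "Y": "[TUC]", "W": "[ATU]", "S": "[CG]",
    "V": "[ACG]", "B": "[TUGC]",
}


def find_iupac_sketch(query: str, sequence: str) -> Iterator[Tuple[Tuple[int, int], str]]:
    """Yield ((start, stop), matched_text) for each match of query in sequence."""
    # Unknown codes raise KeyError rather than being silently passed through.
    pattern = re.compile("".join(IUPAC_TO_REGEX[base] for base in query))
    for m in pattern.finditer(sequence):  # finditer reports non-overlapping matches
        yield (m.start(), m.end()), m.group()


for (start, stop), hit in find_iupac_sketch("ARY", "AATAGCAGTGTGAAC"):
    print(f"{start}:{stop} {hit}")
```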
- streq.seqtools.gc_content(x: str) → float
Calculate proportional GC content.
Recognises IUPAC codes.
- Parameters:
x (str) – Sequence.
- Returns:
GC content.
- Return type:
float
Examples
>>> gc_content('AGGG')
0.75
- streq.seqtools.purine_content(x: str) → float
Calculate proportional purine content.
Recognises IUPAC codes.
- Parameters:
x (str) – Sequence.
- Returns:
Purine content.
- Return type:
float
Examples
>>> purine_content('AUGGR')
0.8
- streq.seqtools.pyrimidine_content(x: str) → float
Calculate proportional pyrimidine content.
Recognises IUPAC codes.
- Parameters:
x (str) – Sequence.
- Returns:
Pyrimidine content.
- Return type:
float
Examples
>>> pyrimidine_content('AUGGG')
0.2
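The three content functions above share one pattern: count the letters that belong to a class and divide by the sequence length. The sketch below is an illustration under stated assumptions, not streq's implementation: only the unambiguous IUPAC codes S (G or C), R (A or G) and Y (C, T or U) are counted alongside the plain bases, and all function names are hypothetical.

```python
def fraction_in(x: str, letters: str) -> float:
    """Fraction of positions in x whose (upper-cased) base is in letters."""
    # Note: an empty sequence would raise ZeroDivisionError in this sketch.
    return sum(base in letters for base in x.upper()) / len(x)


def gc_sketch(x: str) -> float:
    return fraction_in(x, "GCS")   # S means G or C


def purine_sketch(x: str) -> float:
    return fraction_in(x, "AGR")   # R means A or G


def pyrimidine_sketch(x: str) -> float:
    return fraction_in(x, "CTUY")  # Y means C, T or U


print(gc_sketch("AGGG"), purine_sketch("AUGGR"), pyrimidine_sketch("AUGGG"))
# 0.75 0.8 0.2
```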
- streq.seqtools.reverse(x: str) → str
Reverse a sequence.
- Parameters:
x (str) – Sequence to convert.
- Returns:
Converted sequence.
- Return type:
str
Note: Preserves circularity.
- streq.seqtools.reverse_complement(x: str) → str
Reverse complement a sequence.
- Parameters:
x (str) – Sequence to convert.
- Returns:
Converted sequence.
- Return type:
str
Examples
>>> reverse_complement('ATCG')
'CGAT'
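A case-preserving reverse complement can be sketched with a translation table followed by a reversed slice. The name `reverse_complement_sketch` is hypothetical, and handling only the four DNA bases is a simplifying assumption; streq's own function may cover more codes.

```python
# Translation table pairing each DNA base with its complement, in both cases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement_sketch(x: str) -> str:
    """Complement every base, then reverse the result."""
    # Characters not in the table (for example N) pass through unchanged.
    return x.translate(_COMPLEMENT)[::-1]


print(reverse_complement_sketch("ATCG"))  # CGAT
print(reverse_complement_sketch("ATCg"))  # cGAT
```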
- streq.seqtools.to_dna(x: str) → str
Convert nucleotides to DNA.
- Parameters:
x (str) – Sequence to convert.
- Returns:
Converted sequence.
- Return type:
str
Examples
>>> to_dna('AUCG')
'ATCG'
Note: Preserves case.
Note: Preserves circularity.
- streq.seqtools.to_rna(x: str) → str
Convert nucleotides to RNA.
- Parameters:
x (str) – Sequence to convert.
- Returns:
Converted sequence.
- Return type:
str
Examples
>>> to_rna('ATCG')
'AUCG'
Note: Preserves case.
Note: Preserves circularity.
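Both conversions reduce to swapping T and U while keeping case. A minimal sketch, using the hypothetical names `to_dna_sketch` and `to_rna_sketch` (preserving circularity is not modelled here):

```python
_TO_DNA = str.maketrans("Uu", "Tt")
_TO_RNA = str.maketrans("Tt", "Uu")


def to_dna_sketch(x: str) -> str:
    """Replace uracil with thymine, keeping upper/lower case."""
    return x.translate(_TO_DNA)


def to_rna_sketch(x: str) -> str:
    """Replace thymine with uracil, keeping upper/lower case."""
    return x.translate(_TO_RNA)


print(to_dna_sketch("AUCG"))  # ATCG
print(to_rna_sketch("atCG"))  # auCG
```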
- streq.seqtools.which_re_sites(x: str) → Sequence[str]
List Type IIS restriction sites in sequence.
Currently only searches for the most commonly used Type IIS restriction sites for Golden Gate Cloning:
BbsI: GAAGAC
BsmBI: CGTCTC
BtgZI: GCGATG
PaqCI: CACCTGC
SapI: GCTCTTC
BsaI: GGTCTC
- Parameters:
x (str) – Sequence to check.
- Returns:
Names of the enzymes whose Type IIS restriction sites occur in x.
- Return type:
tuple
Examples
>>> which_re_sites('AAAGAAG')
()
>>> which_re_sites('AAAGAAGAC')
('BbsI',)
>>> which_re_sites('AAAGAAGACACCTGC')
('BbsI', 'PaqCI')
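Reporting which enzymes have a site present is a membership test per recognition sequence. In this sketch the names `SITES` and `which_sites_sketch` are hypothetical, the result follows the order in which the enzymes are listed above, and only the given strand is searched (an assumption).

```python
from typing import Tuple

# Enzyme name -> recognition sequence, in the order listed above.
SITES = {
    "BbsI": "GAAGAC",
    "BsmBI": "CGTCTC",
    "BtgZI": "GCGATG",
    "PaqCI": "CACCTGC",
    "SapI": "GCTCTTC",
    "BsaI": "GGTCTC",
}


def which_sites_sketch(x: str) -> Tuple[str, ...]:
    """Names of enzymes whose recognition sequence occurs in x."""
    # Dicts preserve insertion order, so names come out in listing order.
    return tuple(name for name, site in SITES.items() if site in x)


print(which_sites_sketch("AAAGAAGACACCTGC"))  # ('BbsI', 'PaqCI')
```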
streq.utils module
Miscellaneous utilities used in streq.
- class streq.utils.SequenceCollection(complementer, re_sites, DNA, RNA, base2regex, PAMs)
Bases: tuple
- DNA
Alias for field number 2
- PAMs
Alias for field number 5
- RNA
Alias for field number 3
- base2regex
Alias for field number 4
- complementer
Alias for field number 0
- re_sites
Alias for field number 1
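The "Alias for field number N" entries are what Sphinx emits for a named tuple: each attribute is a named view onto a tuple position. The sketch below rebuilds a tuple with the same field order to show the correspondence. The values are placeholders invented for illustration, not the data that streq stores.

```python
from collections import namedtuple

# Field order taken from the class signature above.
SequenceCollection = namedtuple(
    "SequenceCollection",
    ["complementer", "re_sites", "DNA", "RNA", "base2regex", "PAMs"],
)

# Placeholder values only (assumptions for this example).
example = SequenceCollection(
    complementer={"A": "T"},
    re_sites={"BsaI": "GGTCTC"},
    DNA="ACGT",
    RNA="ACGU",
    base2regex={"N": "."},
    PAMs={},
)

print(example[2] == example.DNA)          # True: DNA is field number 2
print(example._fields.index("PAMs"))      # 5
```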