streq package
Submodules
streq.circular module
Classes relating to circular strings.
- class streq.circular.Circular
Bases: str
A string-like object which can be circularly sliced.
Useful for sequences that represent a bacterial genome or a plasmid.
Currently this only works if the length of the slice is shorter than the sequence’s total length.
Examples
>>> Circular('ATCG')[:3]
'ATC'
>>> Circular('ATCG')[-1:3]
'GATC'
>>> from streq import reverse_complement
>>> reverse_complement(Circular('ATCg'))
'cGAT'
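The wrap-around behaviour in the examples above can be sketched without streq. This is a minimal illustration, not the library's implementation: `circular_slice` is a hypothetical helper, and it assumes explicit integer bounds and a slice no longer than the sequence.

```python
def circular_slice(seq: str, start: int, stop: int) -> str:
    """Slice seq from start to stop, wrapping around the ends (hypothetical helper)."""
    n = len(seq)
    if stop - start > n:
        raise ValueError("slice is longer than the sequence")
    # Modular indexing maps out-of-range positions back onto the circle.
    return "".join(seq[i % n] for i in range(start, stop))


print(circular_slice("ATCG", 0, 3))   # ATC
print(circular_slice("ATCG", -1, 3))  # GATC
```

Unlike ordinary string slicing, a negative start here means "begin that many positions before the origin", which is what makes the slice span the junction of a plasmid or genome.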
streq.distance module
- streq.distance.correlation(x: str, y: str = '', wobble: bool = False) → float
Get autocorrelation of a single sequence or correlation between two sequences.
If only a single sequence is provided, its autocorrelation is calculated, which may be an indicator of secondary structure.
Note: the wobble parameter is not yet implemented.
- Parameters:
x (str) – Sequence.
y (str, optional) – Second sequence for correlation with x.
wobble (bool, optional) – Whether to calculate correlations taking into account G.U wobble. Not yet implemented.
- Returns:
Correlation.
- Return type:
float
Examples
>>> correlation('AACC')
0.0
>>> correlation('AAATTT')
2.3
>>> correlation('AAATTCT')
1.3047619047619046
>>> correlation('AAACTTT')
1.9238095238095236
>>> correlation('AAA', 'TTT')
0.0
>>> correlation('AAA', 'AAA')
3.0
- streq.distance.hamming(x: str, y: str, wobble: bool = False) → int
Calculate the Hamming distance between two sequences.
The Hamming distance is the number of mismatches.
Note: the wobble parameter is not yet implemented.
- Parameters:
x (str) – Sequence.
y (str) – Second sequence to compare with x.
wobble (bool, optional) – Whether to take G.U wobble into account when counting mismatches. Not yet implemented.
- Returns:
Hamming distance.
- Return type:
int
Examples
>>> hamming('AAA', 'ATA')
1
>>> hamming('AAA', 'ATT')
2
>>> hamming('AAA', 'TTT')
3
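Counting mismatches position by position can be sketched as follows. `hamming_sketch` is a hypothetical name, not streq's code, and rejecting sequences of unequal length is an assumption; the documentation above does not say how streq handles that case.

```python
def hamming_sketch(x: str, y: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        # Assumption: unequal lengths are treated as an error in this sketch.
        raise ValueError("sequences must have the same length")
    # Each mismatching pair contributes True (1) to the sum.
    return sum(a != b for a, b in zip(x, y))


print(hamming_sketch("AAA", "ATT"))  # 2
```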
- streq.distance.levenshtein(x: str, y: str) → int
Calculate the Levenshtein distance between two sequences.
The Levenshtein distance is the minimum number of single-character insertions, deletions, and mutations (substitutions) required to make two sequences match.
- Parameters:
x (str) – Sequence.
y (str) – Second sequence to compare with x.
- Returns:
Levenshtein distance.
- Return type:
int
Examples
>>> levenshtein('AAATTT', 'AAATTT')
0
>>> levenshtein('AAATTT', 'ACTTT')
2
>>> levenshtein('AAATTT', 'AACTTT')
1
>>> levenshtein('AAAG', 'TCGA')
4
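The edit count described above is commonly computed with dynamic programming. The following is a minimal sketch of that standard approach using two rolling rows; `levenshtein_sketch` is a hypothetical name, and streq may well compute the distance differently.

```python
def levenshtein_sketch(x: str, y: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning x into y."""
    # previous[j] holds the distance between the processed prefix of x and y[:j].
    previous = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        current = [i]  # distance from x[:i] to the empty prefix of y
        for j, b in enumerate(y, start=1):
            current.append(min(
                previous[j] + 1,             # delete a from x
                current[j - 1] + 1,          # insert b into x
                previous[j - 1] + (a != b),  # substitute, free when a == b
            ))
        previous = current
    return previous[-1]


print(levenshtein_sketch("AAATTT", "ACTTT"))  # 2
```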
- streq.distance.ratcliff_obershelp(x: str, y: str) → int
Calculate the Ratcliff-Obershelp distance between two sequences.
The Ratcliff-Obershelp distance is the number of grouped insertions, deletions, and mutations required to make two sequences match.
- Parameters:
x (str) – Sequence.
y (str) – Second sequence to compare with x.
- Returns:
Ratcliff-Obershelp distance.
- Return type:
int
Examples
>>> ratcliff_obershelp('AAATTT', 'AAATTT')
0
>>> ratcliff_obershelp('AAATTT', 'ACTTT')
1
>>> ratcliff_obershelp('AAATTT', 'AACTTT')
1
>>> ratcliff_obershelp('AAAG', 'TCGA')
2
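One way to count grouped edits is with the standard library's `difflib.SequenceMatcher`, whose matching strategy is in the Ratcliff-Obershelp family. Whether streq is built on difflib is an assumption; `grouped_edits_sketch` is a hypothetical name used only for this illustration.

```python
from difflib import SequenceMatcher


def grouped_edits_sketch(x: str, y: str) -> int:
    """Count contiguous insert, delete and replace blocks needed to turn x into y."""
    # autojunk=False disables difflib's popularity heuristic for long inputs.
    matcher = SequenceMatcher(None, x, y, autojunk=False)
    # Each opcode is (tag, i1, i2, j1, j2); every non-'equal' tag is one grouped edit.
    return sum(tag != "equal" for tag, *_ in matcher.get_opcodes())


print(grouped_edits_sketch("AAATTT", "ACTTT"))  # 1
```

Because a run of adjacent changes counts once, `'AAATTT'` against `'ACTTT'` scores 1 here, where the Levenshtein distance for the same pair is 2.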
streq.seqtools module
Python utilities for working with nucleotide sequence strings.
Variety of utilities for converting, searching, and doing calculations on nucleotide sequences.
- streq.seqtools.complement(x: str) → str
Complement (but don’t reverse) a sequence.
- Parameters:
x (str) – Sequence to convert.
- Returns:
Converted sequence.
- Return type:
str
Note: Preserves case.
Note: Preserves circularity.
- streq.seqtools.count_re_sites(x: str) → bool
Count Type IIS restriction sites in sequence.
Currently only searches for the most commonly used Type IIS restriction sites for Golden Gate Cloning:
BbsI: GAAGAC
BsmBI: CGTCTC
BtgZI: GCGATG
PaqCI: CACCTGC
SapI: GCTCTTC
BsaI: GGTCTC
- Parameters:
x (str) – Sequence to check.
- Returns:
Number of Type IIS restriction sites in x.
- Return type:
int
Examples
>>> count_re_sites('AAAGAAG')
0
>>> count_re_sites('AAAGAAGAC')
1
>>> count_re_sites('AAAGAAGACACCTGC')
2
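Counting recognition sites can be sketched with a zero-width regular-expression lookahead, which lets overlapping sites each be counted. This is not streq's implementation: `count_sites_sketch` and `TYPE_IIS_SITES` are hypothetical names, and searching only the given strand and upper-casing the input are assumptions.

```python
import re

# Recognition sequences copied from the list above.
TYPE_IIS_SITES = {
    "BbsI": "GAAGAC",
    "BsmBI": "CGTCTC",
    "BtgZI": "GCGATG",
    "PaqCI": "CACCTGC",
    "SapI": "GCTCTTC",
    "BsaI": "GGTCTC",
}


def count_sites_sketch(x: str) -> int:
    """Count occurrences of the listed sites on the given strand only (assumption)."""
    x = x.upper()
    total = 0
    for site in TYPE_IIS_SITES.values():
        # The lookahead matches at every start position without consuming text,
        # so overlapping occurrences are all found.
        total += len(re.findall(f"(?={site})", x))
    return total


print(count_sites_sketch("AAAGAAGACACCTGC"))  # 2
```

In the last documented example the BbsI and PaqCI sites share a `C`, which is why a search that consumes matched text would not be a safe way to count across enzymes.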
- streq.seqtools.find_iupac(query: str, sequence: str) → Generator[Sequence[int], str]
Find occurrences of a query in a larger sequence.
IUPAC codes in the query will be interpreted as ambiguities:
A: A
C: C
G: G
T: T
U: U
N: .
R: “[AG]”
Y: “[TUC]”
W: “[ATU]”
S: “[CG]”
V: “[ACG]”
B: “[TUGC]”
- Parameters:
query (str) – Sequence to search for. Accepts IUPAC codes: N, R, Y, S, W, V, B.
sequence (str) – Sequence to search within.
- Yields:
Generator – Generator of tuples containing the match indices and matched sequence.
indices (tuple) – Start and stop indices of the match
sequence (str) – matched sequence
Examples
>>> for (start_idx, end_idx), match in find_iupac('ARY', 'AATAGCAGTGTGAAC'):
...     print(f"Found ARY at {start_idx}:{end_idx}: {match}")
...
Found ARY at 0:3: AAT
Found ARY at 3:6: AGC
Found ARY at 6:9: AGT
Found ARY at 12:15: AAC
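The code-to-pattern table above suggests a search built on regular expressions. The sketch below translates a query through that table and yields non-overlapping matches; `find_iupac_sketch` and `IUPAC_TO_REGEX` are hypothetical names, and the real function may differ, for example in how it treats overlapping matches or letter case.

```python
import re
from typing import Iterator, Tuple

# Mapping copied from the table above.
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "N": ".", "R": "[AG]", "Y": "[TUC]", "W": "[ATU]",
    "S": "[CG]", "V": "[ACG]", "B": "[TUGC]",
}


def find_iupac_sketch(query: str, sequence: str) -> Iterator[Tuple[Tuple[int, int], str]]:
    """Yield ((start, stop), matched_text) for each non-overlapping match."""
    # Build one pattern by replacing every query letter with its regex fragment.
    pattern = re.compile("".join(IUPAC_TO_REGEX[base] for base in query))
    for match in pattern.finditer(sequence):
        yield match.span(), match.group()


for (start, stop), text in find_iupac_sketch("ARY", "AATAGCAGTGTGAAC"):
    print(f"{start}:{stop} {text}")
```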
- streq.seqtools.gc_content(x: str) → float
Calculate proportional GC content.
Recognises IUPAC codes.
- Parameters:
x (str) – Sequence.
- Returns:
GC content.
- Return type:
float
Examples
>>> gc_content('AGGG')
0.75
- streq.seqtools.purine_content(x: str) → float
Calculate proportional purine content.
Recognises IUPAC codes.
- Parameters:
x (str) – Sequence.
- Returns:
Purine content.
- Return type:
float
Examples
>>> purine_content('AUGGR')
0.8
- streq.seqtools.pyrimidine_content(x: str) → float
Calculate proportional pyrimidine content.
Recognises IUPAC codes.
- Parameters:
x (str) – Sequence.
- Returns:
Pyrimidine content.
- Return type:
float
Examples
>>> pyrimidine_content('AUGGG')
0.2
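The three content functions above each report the fraction of symbols belonging to a class of bases. A minimal sketch of that idea follows. `fraction_of` and the three symbol sets are hypothetical, and counting only the unambiguous IUPAC codes S (G or C), R (purine) and Y (pyrimidine) is an assumption; streq may weight other ambiguity codes differently.

```python
GC_SYMBOLS = "GCS"           # S is the IUPAC code for G or C
PURINE_SYMBOLS = "AGR"       # R is the IUPAC code for a purine
PYRIMIDINE_SYMBOLS = "CTUY"  # Y is the IUPAC code for a pyrimidine


def fraction_of(x: str, symbols: str) -> float:
    """Fraction of characters in x that belong to symbols (case-insensitive)."""
    x = x.upper()
    # Note: an empty sequence raises ZeroDivisionError in this sketch.
    return sum(base in symbols for base in x) / len(x)


print(fraction_of("AGGG", GC_SYMBOLS))           # 0.75
print(fraction_of("AUGGR", PURINE_SYMBOLS))      # 0.8
print(fraction_of("AUGGG", PYRIMIDINE_SYMBOLS))  # 0.2
```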
- streq.seqtools.reverse(x: str) → str
Reverse a sequence.
- Parameters:
x (str) – Sequence to convert.
- Returns:
Converted sequence.
- Return type:
str
Note: Preserves circularity.
- streq.seqtools.reverse_complement(x: str) → str
Reverse complement a sequence.
- Parameters:
x (str) – Sequence to convert.
- Returns:
Converted sequence.
- Return type:
str
Examples
>>> reverse_complement('ATCG')
'CGAT'
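Reverse complementing combines a base-by-base translation with a reversal. The sketch below covers unambiguous DNA letters only and preserves case; `reverse_complement_sketch` is a hypothetical name, and RNA or IUPAC ambiguity codes, which streq may support, are left untouched here.

```python
# Translation table for unambiguous DNA bases, upper and lower case.
_DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement_sketch(x: str) -> str:
    """Complement each base, then reverse the result."""
    return x.translate(_DNA_COMPLEMENT)[::-1]


print(reverse_complement_sketch("ATCG"))  # CGAT
print(reverse_complement_sketch("ATCg"))  # cGAT
```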
- streq.seqtools.to_dna(x: str) → str
Convert nucleotides to DNA.
- Parameters:
x (str) – Sequence to convert.
- Returns:
Converted sequence.
- Return type:
str
Examples
>>> to_dna('AUCG')
'ATCG'
Note: Preserves case.
Note: Preserves circularity.
- streq.seqtools.to_rna(x: str) → str
Convert nucleotides to RNA.
- Parameters:
x (str) – Sequence to convert.
- Returns:
Converted sequence.
- Return type:
str
Examples
>>> to_rna('ATCG')
'AUCG'
Note: Preserves case.
Note: Preserves circularity.
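Case-preserving conversion between DNA and RNA amounts to swapping T and U in both cases. The sketch below illustrates this with translation tables; `to_dna_sketch` and `to_rna_sketch` are hypothetical names, and the sketch does not attempt the circularity handling mentioned in the notes above.

```python
_U_TO_T = str.maketrans("Uu", "Tt")
_T_TO_U = str.maketrans("Tt", "Uu")


def to_dna_sketch(x: str) -> str:
    """Replace uracil with thymine, keeping the case of each letter."""
    return x.translate(_U_TO_T)


def to_rna_sketch(x: str) -> str:
    """Replace thymine with uracil, keeping the case of each letter."""
    return x.translate(_T_TO_U)


print(to_dna_sketch("AUCG"))  # ATCG
print(to_rna_sketch("atcG"))  # aucG
```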
- streq.seqtools.which_re_sites(x: str) → Sequence[str]
List Type IIS restriction sites in sequence.
Currently only searches for the most commonly used Type IIS restriction sites for Golden Gate Cloning:
BbsI: GAAGAC
BsmBI: CGTCTC
BtgZI: GCGATG
PaqCI: CACCTGC
SapI: GCTCTTC
BsaI: GGTCTC
- Parameters:
x (str) – Sequence to check.
- Returns:
Names of the Type IIS restriction sites found in x.
- Return type:
tuple
Examples
>>> which_re_sites('AAAGAAG')
()
>>> which_re_sites('AAAGAAGAC')
('BbsI',)
>>> which_re_sites('AAAGAAGACACCTGC')
('BbsI', 'PaqCI')
streq.utils module
Miscellaneous utilities used in streq.
- class streq.utils.SequenceCollection(complementer, re_sites, DNA, RNA, base2regex, PAMs)
Bases: tuple
- DNA
Alias for field number 2
- PAMs
Alias for field number 5
- RNA
Alias for field number 3
- base2regex
Alias for field number 4
- complementer
Alias for field number 0
- re_sites
Alias for field number 1
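The "Alias for field number" entries are what Sphinx emits for a `collections.namedtuple`, where each field can be read by name or by position. The sketch below rebuilds a tuple with the same field order to show the correspondence; `SequenceCollectionSketch` is a hypothetical name and every field value is a placeholder, not streq's actual data.

```python
from collections import namedtuple

# Same field order as the signature above, so positions match the aliases.
SequenceCollectionSketch = namedtuple(
    "SequenceCollectionSketch",
    ["complementer", "re_sites", "DNA", "RNA", "base2regex", "PAMs"],
)

# Placeholder values, chosen only to make the example runnable.
example = SequenceCollectionSketch(
    complementer={"A": "T"},
    re_sites={"BsaI": "GGTCTC"},
    DNA="ACGT",
    RNA="ACGU",
    base2regex={"N": "."},
    PAMs={},
)

print(example.DNA, example[2])                            # ACGT ACGT
print(SequenceCollectionSketch._fields.index("re_sites"))  # 1
```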