Codon optimization for DNA synthesis

A set of scripts reliant on BioPython and DNAChisel to perform codon optimization with the dual objectives of efficent translation and efficent DNA synthesis. Sequences are optimized towards the fastest translating codons while avoiding common motifs that inhibit DNA synthesis.

Avoided motifs include:

Repeated K-mers
Homopolymers
Hairpins

Environment

conda install -c bioconda dnachisel biopython

Workflow

Start with a fasta file of coding sequences to be optimized and run codonOptimizeSynthesis.py. Adjust the weights and thresholds as necessary to acheive the desired result.

usage: codonOptimizeSynthesis.py [options] input_file

positional arguments:
  input_file            Fasta file to be optimized

options:
  -h, --help            show this help message and exit
  -o outfile            Output file name
  -s species            Species codon usage to optimize towards ['A_thaliana', 'G_max']
  --rscu RSCU           JSON file containing RSCU values
  --rscuGenerate CDS.fasta
                        Generate rscu values from CDS fasta file
  --topCodonWeight TOPCODONWEIGHT
                        Adjust weight optimizing towards highest frequency codon (default: 25)
  --rareCodonThreshold RARECODONTHRESHOLD
                        Attempt to avoid codons with frequencies below this number (default: 0.1)
  --rareCodonWeight RARECODONWEIGHT
                        Adjust weight for avoiding rare codons (default: 500)
  --repeatedKmerWeight REPEATEDKMERWEIGHT
                        Adjust weight for avoiding repeated kmers (default: 100)
  --repeatedHomopolymerWeight REPEATEDHOMOPOLYMERWEIGHT
                        Adjust weight for avoiding repeated homopolymers (default: 50)
  --hairpinWeight HAIRPINWEIGHT
                        Adjust weight for avoiding hairpins (default: 1000)
  --uniquifyKmersWeight UNIQUIFYKMERSWEIGHT
                        Adjust weight for avoiding kmer repeats of 8 bases and longer (default: 1000)

Next, you can verify your optimized sequences using checkCodons.py. This will print out a report showing the original and optimized protein alignment and the usage of top codons over the coding sequence.

checkCodons.py [-h] [-s species] original optimized


positional arguments:
  original    Original fasta file
  optimized   Optimized fasta file

options:
  -h, --help  show this help message and exit
  -s species  Species codon usage to use ['A_thaliana', 'G_max']

Codon Usage Tables

Codon usage tables for A. thaliana and Glycine max (soybean) are provided in the data/ folder. The values represent fractions among synonymous codons. You can provide your own codon usage table via the --rscu argument.

Note: Many measures of codon translation efficiency exist and there may be a more effective formulation for your use case... These include (CAI, ENC, tAI, expression normalized tAI, etc.)

Arabidopsis: https://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=3702&aa=1&style=N

Soybean https://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=3847&aa=1&style=N

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
checkCodons.py		checkCodons.py
codonOptimize.py		codonOptimize.py
codonOptimizeSynthesis.py		codonOptimizeSynthesis.py
codonUsageCalculations.py		codonUsageCalculations.py
compareRscuSeqs.R		compareRscuSeqs.R
extractCisElements.py		extractCisElements.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Codon optimization for DNA synthesis

Environment

Workflow

Codon Usage Tables

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

JohnnyLomas/Codon_Optimization_synDNA

Folders and files

Latest commit

History

Repository files navigation

Codon optimization for DNA synthesis

Environment

Workflow

Codon Usage Tables

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages