Abstract
Designing synthetic genes for heterologous expression is a cornerstone of synthetic biology. Because there are 61 sense codons but only 20 standard amino acids, most amino acids are encoded by more than one codon. Although such synonymous codons do not alter the encoded amino acid sequence, they are not redundant: choosing certain codons over others can improve gene expression by up to 1000-fold. Industry-standard codon optimization techniques based on biological indices replace each synonymous codon with the most abundant codon found in the host organism's genome. However, this approach may result in an imbalanced tRNA pool, metabolic stress, and translational errors, which lead to greater cell toxicity and reduced protein expression. In this research, recurrent neural networks are used to capture sequential and contextual patterns in codon usage. By predicting synonymous codons from the sequential information of the host organism, protein expression can be increased while preventing translational error and plasmid toxicity.
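To make the frequency-based baseline concrete, the following is a minimal Python sketch of the "most abundant codon" substitution that the abstract contrasts with the recurrent-network approach. It is an illustration under stated assumptions, not the authors' implementation: the function names (`codon_usage`, `most_frequent_codons`, `naive_optimize`) and the toy host sequence are hypothetical, and the standard genetic code (NCBI translation table 1) is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# 64 codons minus 3 stop codons leaves 61 sense codons.
SENSE_CODONS = [c for c, aa in CODON_TABLE.items() if aa != "*"]


def codon_usage(host_cds_list):
    """Count in-frame codons across a list of host coding sequences (DNA)."""
    counts = Counter()
    for cds in host_cds_list:
        cds = cds.upper()
        # Ignore any trailing bases that do not form a complete codon.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def most_frequent_codons(counts):
    """Map each amino acid to its most frequent codon (ties: first in table order)."""
    best = {}
    for codon in SENSE_CODONS:
        aa = CODON_TABLE[codon]
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


def naive_optimize(protein, best):
    """Back-translate a protein using only the single preferred codon per amino acid."""
    return "".join(best[aa] for aa in protein.upper())


# Toy host data (hypothetical): one short coding sequence.
host = ["ATGGCTGCTGCCAAATAA"]
best = most_frequent_codons(codon_usage(host))
optimized = naive_optimize("MAK", best)
print(optimized)
```

Because every occurrence of an amino acid is mapped to the same codon regardless of its neighbours, this baseline ignores sequence context and concentrates demand on a few tRNA species, which is the limitation the abstract attributes to frequency-based methods.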