ECS 129: Computational Structural Bioinformatics

From the course description:

Fundamental biological, chemical and algorithmic models underlying computational structural biology; protein structure and nucleic acids structure; comparison of protein structures; protein structure prediction; molecular simulations; databases and online services in computational structural biology.

The final project in this class was to write a program that solved basic structural questions regarding proteins.

The first part of my program found the longest ‘gene’ (open reading frame, or ORF) in a DNA base sequence supplied in FASTA format, transcribed it to an mRNA sequence, and translated that into an amino acid sequence. The second part of my program reverse-translated amino acid symbol sequences into most probable and compressed codon sequences using GCG-format codon frequency tables.
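To make the first part concrete, here is a minimal Python sketch of the idea, not the code from the project itself. It assumes a single-record FASTA input, searches only the three forward reading frames, and treats an ORF as the stretch from the first ATG to the next in-frame stop codon. The function names (read_fasta, longest_orf, transcribe, translate) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional U, C, A, G codon ordering.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(text):
    """Return the sequence of a single-record FASTA string, header dropped."""
    lines = text.splitlines()
    return "".join(
        line.strip() for line in lines
        if line.strip() and not line.startswith(">")
    ).upper()


def longest_orf(dna):
    """Return the longest ATG...stop stretch in the three forward frames.

    Assumption: the reverse strand is not searched. Returns "" if no
    complete ORF (start codon followed by an in-frame stop) exists.
    """
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = dna[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


def transcribe(dna):
    """Transcribe a coding-strand DNA sequence to mRNA (T becomes U)."""
    return dna.replace("T", "U")


def translate(mrna):
    """Translate mRNA to one-letter amino acids, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino = CODON_TABLE.get(mrna[i:i + 3], "X")  # X marks an unknown codon
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)
```

For example, longest_orf("CCATGAAATGGTAACC") returns "ATGAAATGGTAA", which transcribes to "AUGAAAUGGUAA" and translates to "MKW".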

This was the general program structure:

  1. Gene Finder
    a. Setup by ingesting DNA sequence
    b. Process sequence by finding longest ORF
    c. Output transcribed mRNA and translated amino acid sequence
  2. Reverse Translator
    a. Setup by ingesting codon frequency table
    b. Produce probability table for amino acid sequence
    c. Output most likely and compressed sequences
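
The reverse translator steps above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the project's actual code: it assumes each data row of the GCG-format table starts with a three-letter amino acid name, then the codon, then a count (for example "Lys  AAA  30.00  ..."), and that the input protein uses one-letter symbols. The names parse_codon_table, codon_probabilities and most_likely_codons are hypothetical. Only the most likely sequence is shown; the compressed output is left out because its exact definition is given in the writeup rather than here.

```python
from collections import defaultdict

# Three-letter names as they are assumed to appear in the table,
# mapped to one-letter symbols ("End" marks stop codons).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "End": "*",
}


def parse_codon_table(text):
    """Collect codon counts per amino acid; non-data lines are skipped."""
    counts = defaultdict(dict)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] not in THREE_TO_ONE:
            continue  # header, blank line, or comment
        try:
            count = float(fields[2])
        except ValueError:
            continue
        counts[THREE_TO_ONE[fields[0]]][fields[1].upper()] = count
    return dict(counts)


def codon_probabilities(counts):
    """Normalize counts so each amino acid's codon probabilities sum to 1."""
    probs = {}
    for amino, codons in counts.items():
        total = sum(codons.values())
        probs[amino] = {
            codon: (n / total if total else 1 / len(codons))
            for codon, n in codons.items()
        }
    return probs


def most_likely_codons(protein, probs):
    """Pick the highest-probability codon for each amino acid symbol."""
    return "".join(max(probs[aa], key=probs[aa].get) for aa in protein)
```

With a table in which Lys is encoded by AAA 30 times and AAG 10 times, codon_probabilities assigns AAA a probability of 0.75, so most_likely_codons chooses AAA for every K in the input.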

These are sample outputs for each part of the program:

Please read through the Project Writeup below for a full, detailed look at the project.

Project Writeup