Characterization of Opsins from the Registry

Author

Samuel Ortion

Published

September 5, 2023

Goal: Characterize some opsin parts from the iGEM Registry by producing:

1. PDB files with a model of each protein’s 3D structure (using AlphaFold2/ColabFold)

Search for opsins in the iGEM Registry

  • BBa_K317000: http://parts.igem.org/Part:BBa_K317000
    • Downloaded the sequence from this page to ./data/opsins/registry/BBa_K317000.fna. Unfortunately, it is only a DNA sequence. However, this DNA sequence is the coding sequence of the associated peptide, so it can be translated:
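Before translating, it can be useful to check programmatically that a downloaded registry file really holds a DNA coding sequence. The Python sketch below is not part of the original workflow. `read_fasta`, `parse_fasta`, `is_dna` and `looks_like_cds` are hypothetical helpers, and the CDS test is only a heuristic: the length must be divisible by 3, the sequence must start with ATG, and it must end with a stop codon.

```python
from __future__ import annotations

from pathlib import Path

DNA_ALPHABET = set("ACGTN")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped sequence lines."""
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def read_fasta(path: str | Path) -> list[tuple[str, str]]:
    """Read and parse a FASTA file from disk."""
    return parse_fasta(Path(path).read_text())


def is_dna(seq: str) -> bool:
    """True if the sequence only contains nucleotide letters (case-insensitive)."""
    return set(seq.upper()) <= DNA_ALPHABET


def looks_like_cds(seq: str) -> bool:
    """Heuristic CDS check: length multiple of 3, leading ATG, trailing stop codon."""
    s = seq.upper()
    return len(s) % 3 == 0 and s.startswith("ATG") and s[-3:] in STOP_CODONS
```

For example, `read_fasta("./data/opsins/registry/BBa_K317000.fna")` would return the single record downloaded above. `is_dna` and `looks_like_cds` can then be applied to its sequence.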

DNA to Peptide Sequence using EMBOSS transeq

Using mamba to install the dependencies:

environment.yml
name: transeq
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - bioconda::emboss
  - conda-forge::libiconv

Run the following in a terminal:

mamba env create -f environment.yml
mamba activate transeq

Then, to run transeq on a DNA sequence in FASTA format:

transeq path/to/your/sequence.fna -outseq path/to/your/sequence.faa
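To run the same command over several downloaded parts, one option is to call transeq from Python with `subprocess`. This minimal sketch assumes the `transeq` environment is active, so the `transeq` binary is on `PATH`. The helper names and the idea of batch-processing every `*.fna` file in `./data/opsins/registry` are illustrative assumptions, not part of the original notes.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def transeq_command(dna_fasta: str, protein_fasta: str) -> list[str]:
    """Build the transeq call shown above: input FASTA, then -outseq for the output."""
    return ["transeq", dna_fasta, "-outseq", protein_fasta]


def translate_directory(directory: str) -> list[Path]:
    """Run transeq on every *.fna file in `directory`, writing a *.faa file next to it."""
    if shutil.which("transeq") is None:
        raise RuntimeError("transeq not found on PATH: activate the 'transeq' environment first")
    outputs: list[Path] = []
    for fna in sorted(Path(directory).glob("*.fna")):
        faa = fna.with_suffix(".faa")
        # check=True raises CalledProcessError if transeq exits with a non-zero status
        subprocess.run(transeq_command(str(fna), str(faa)), check=True)
        outputs.append(faa)
    return outputs
```

Calling `translate_directory("./data/opsins/registry")` would then write one `.faa` file per downloaded part.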

For instance:

BBa_K317000.fna
>BBa_K317000
atggtgggacttacgaccctcttttggctcggcgcaatcggcatgctcgtcggcacgctc
gcgttcgcgtgggccggccgtgacgccggaagcggcgagcgacggtactacgtgacactt
gtcggcatcagtggtatcgcagcagtcgcctacgccgttatggcgctgggtgtcggctgg
gttcccgtggccgaacggactgttttcgtcccccggtacatcgactggattctcacaacc
ccgctcatcgtctacttcctcgggctgcttgcggggcttgatagtcgggagttcggcatc
gtcatcacgctcaacaccgtggtcatgctcgccggcttcgccggggcgatggtgcccggt
atcgagcgctacgcgctgttcggcatgggggcggtcgcattcatcggactggtctactac
ctcgtcgggccgatgaccgaaagcgccagccagcggtcctccggaatcaagtcgctgtac
gtccgcctccgaaacctgacggtcgtcctctgggcgatttatccgttcatctggctgctt
ggaccgccgggcgtggcgctgctgacaccgactgtcgacgtggcgcttatcgtctacctt
gacctcgtcacgaaggtcgggttcggcttcatcgcactcgatgctgcggcgacacttcgg
gccgaacacggcgaatcgctcgctggcgtcgatactgacacgcctgcggtcgccgactaa

Translating it with:

transeq BBa_K317000.fna BBa_K317000.faa

gives:

BBa_K317000.faa
>BBa_K317000_1
MVGLTTLFWLGAIGMLVGTLAFAWAGRDAGSGERRYYVTLVGISGIAAVAYAVMALGVGW
VPVAERTVFVPRYIDWILTTPLIVYFLGLLAGLDSREFGIVITLNTVVMLAGFAGAMVPG
IERYALFGMGAVAFIGLVYYLVGPMTESASQRSSGIKSLYVRLRNLTVVLWAIYPFIWLL
GPPGVALLTPTVDVALIVYLDLVTKVGFGFIALDAAATLRAEHGESLAGVDTDTPAVAD*
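As a cross-check of the transeq output, the frame-1 translation can be reproduced in plain Python with the standard genetic code, which is transeq's default table. The sketch below is an independent re-implementation, not what transeq does internally. It also assumes the trailing `*` stop symbol should be removed before the sequence is passed to AlphaFold2/ColabFold, on the assumption that the predictor expects amino-acid letters only. `translate`, `for_structure_prediction` and `to_fasta` are hypothetical helper names.

```python
from __future__ import annotations

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str, unknown: str = "X") -> str:
    """Translate reading frame 1; '*' marks stops, unrecognized codons become `unknown`.

    Any incomplete trailing codon is ignored.
    """
    seq = dna.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], unknown) for i in range(0, usable, 3))


def for_structure_prediction(protein: str) -> str:
    """Strip trailing stop symbols ('*'), assumed unwanted in a ColabFold input."""
    return protein.rstrip("*")


def to_fasta(header: str, seq: str, width: int = 60) -> str:
    """Format a FASTA record, wrapping the sequence at `width` characters per line."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + header, *lines]) + "\n"
```

For example, `translate` applied to the first line of BBa_K317000.fna returns `MVGLTTLFWLGAIGMLVGTL`, the first 20 residues of the transeq output above.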