Querying iGEM Registry SQL Dump Intro
Loading in MariaDB/MySQL
gunzip sql_parts.sql.gzCREATE DATABASE igem_registry; mysql igem_registry < sql_parts.sqlPreliminary Exploration
show tables;| Tables_in_igem_registry |
|---|
| parts |
| parts_seq_features |
describe parts;select label from parts_seq_features where label like "%OMPR%";select label from parts_seq_features where label like "%OMPC%";Select the part labels containing “OmpR”, and count how many parts contains this label, ordering by decreasing count:
select label, count(label) from parts_seq_features where lab
el like "%OMPR%" group by label order by count(label) desc;Results Table
| label | count(label) |
|---|---|
| C1 OmpR | 209 |
| C3 OmpR | 184 |
| C2 OmpR | 184 |
| F3 OmpR | 33 |
| F2 OmpR | 33 |
| F1 OmpR | 33 |
| ompR | 11 |
| OmpR binding site | 8 |
| OmpR binding | 5 |
| ompR(sph) | 4 |
| OmpR dependent promoter | 2 |
| OmpR site b | 1 |
| OmpR Scrambled promoter | 1 |
| Location of wt C3 OmpR | 1 |
| Location of wt C2 OmpR | 1 |
| Strong OmpR | 1 |
| Location of wt C1 OmpR | 1 |
| OmpR (R0082) | 1 |
| OmpR site a | 1 |
| BBa_R0082 - pOmpR | 1 |
| OmpR site c | 1 |
What are the available part types:
select part_type from parts group by part_type;Results Table
| part_type |
|---|
| Cell |
| Coding |
| Composite |
| Conjugation |
| Device |
| DNA |
| Generator |
| Intermediate |
| Inverter |
| Measurement |
| Other |
| Plasmid |
| Plasmid_Backbone |
| Primer |
| Project |
| Promoter |
| Protein_Domain |
| RBS |
| Regulatory |
| Reporter |
| RNA |
| Scar |
| Signalling |
| T7 |
| Tag |
| Temporary |
| Terminator |
| Translational_Unit |
How many part has the part_type Promoter?
select count(part_id) from parts where part_type = "Promoter"3
Idem for Regulatory part type:
select count(part_id) from parts where part_type = "Regulatory"6981
Interested Queries
What are the part biobicks registry IDs associated with “OmpR binding site” label?
select parts.part_name
from parts
inner join parts_seq_features
on parts.part_id = parts_seq_features.part_id
where
label = "OmpR binding site";Results Table
| part_name |
|---|
| BBa_K136004 |
| BBa_K2845017 |
| BBa_K2845019 |
| BBa_K2845021 |
| BBa_K2845023 |
| BBa_K2845027 |
| BBa_K2845029 |
| BBa_K2845031 |
What’s the length of those feature sequence?
select parts.part_name, parts.sequence_length,
parts_seq_features.start_pos, parts_seq_features.end_pos,
parts_seq_features.end_pos - parts_seq_features.start_pos as feature_length
from parts
inner join parts_seq_features
on parts.part_id = parts_seq_features.part_id
where
label = "OmpR binding site";Results Table
| part_name | sequence_length | start_pos | end_pos | feature_length |
|---|---|---|---|---|
| BBa_K136004 | 230 | 43 | 63 | 20 |
| BBa_K2845017 | 58 | 11 | 28 | 17 |
| BBa_K2845019 | 63 | 10 | 28 | 18 |
| BBa_K2845021 | 68 | 11 | 28 | 17 |
| BBa_K2845023 | 108 | 11 | 28 | 17 |
| BBa_K2845027 | 63 | 36 | 53 | 17 |
| BBa_K2845029 | 108 | 41 | 68 | 27 |
| BBa_K2845031 | 73 | 46 | 73 | 27 |
Extracting the feature sequence from the part into a fasta file:
select
parts.part_name,
substring(parts.sequence,
parts_seq_features.start_pos,
parts_seq_features.end_pos
) as feature_sequence
from parts, parts_seq_features
where parts.part_id = parts_seq_features.part_id
and parts_seq_features.label = "OmpR binding site"
into outfile "/tmp/OmpR_binding_site_part_features.csv";sudo cp /tmp/systemd-private-90c6e33e66b14aad94f288d77b339520-mariadb.service-0LgiF0/tmp/OmpR_binding_site_part_features.csv ./data/pOmpR/OmpR_binding_site_part_features.csv
sudo chown $USER:$USER /home/sortion/Documents/Projects/asso/igem/2023/bioinfo/OmpR_binding_site_part_features.csv
head ./data/pOmpR/OmpR_binding_site_part_features.csvBBa_K136004 gaaaaatcttagataagtgtaaagacccatttctatttgtaaggacatattaaaccaaaaagg
BBa_K2845017 acattttgaaacatctatagcgatctct
BBa_K2845019 tacattttgaaacatctatagcgatctc
BBa_K2845021 acattttgaaacatctatagcgatctct
BBa_K2845023 ttacattttgaaacatctatagcgataa
BBa_K2845027 acattttgaaacatctatagcgatctct
BBa_K2845029 gaaacatcttaaaagttttagtatcatattcgtgttggattattctgcatttttggggagaatggact
BBa_K2845031 acattttgaaacatctatagcgatctct
*Converting this TSV into a FASTA file`:
awk '{ printf ">%s\n", $1; print $2}' ./data/pOmpR/OmpR_binding_site_part_features.csv > ./data/pOmpR/OmpR_binding_site_part_features.fastaMultiple Sequence Alignment
muscle Alignment with color
musclor() {
local source_file
source_file="$1"
muscle -align ./data/${source_file}.fasta -output ./results/${source_file}.afa
muscle -align ./data/${source_file}.fasta -stratified -output ./results/${source_file}.efa
muscle -letterconf ./results/${source_file}.efa -ref ./results/${source_file}.afa -output ./results/${source_file}.lc.afa
muscle -letterconf ./results/${source_file}.efa -ref ./results/${source_file}.afa \
-output ./results/${source_file}.lc.afa \
-html ./results/${source_file}.lc.html \
-jalview ./results/${source_file}/jalview.features
}
musclor "pOmpR/OmpR_binding_site_part_features"Two sequence seems to contain relatively large insert.
awk 'length($2) < 30 { printf ">%s\n", $1; print $2}' ./data/pOmpR/OmpR_binding_site_part_features.csv > ./data/pOmpR/OmpR_binding_site_part_features_without_inserted.fastamusclor "pOmpR/OmpR_binding_site_part_features_without_inserted"Extract the interesting html content into a .qmd file, for import in this notebook:
for file in OmpR_binding_site_part_features_without_inserted.lc.html OmpR_binding_site_part_features.lc.html
do
./src/extract_inner_muscle_html.sh ./results/pOmpR/$file > ./media/$file.qmd
done
BBa_K2845027 —acattttgaaa——————————–catc-tatagc———————–
BBa_K136004 ga-aaaatcttagataagtgtaaagacccatttctatttgtaaggacatattaaacc———————–
BBa_K2845019 t–acattttgaaa——————————–catc-tatagc———————–
BBa_K2845029 gaaacatcttaaaa——————————–gttt-tagtatcatattcgtgttggattattctg
BBa_K2845031 —acattttgaaa——————————–catc-tatagc———————–
BBa_K2845017 —acattttgaaa——————————–catc-tatagc———————–
BBa_K2845023 tt-acattttgaaa——————————–catc-tatagc———————–
BBa_K2845021 —acattttgaaa——————————–catc-tatagc———————–
BBa_K2845027 ————–gatctct
BBa_K136004 ————–aaaaagg
BBa_K2845019 ————–gatct-c
BBa_K2845029 catttttggggagaatggact
BBa_K2845031 ————–gatctct
BBa_K2845017 ————–gatctct
BBa_K2845023 ————–gat–aa
BBa_K2845021 ————–gatctct
Confidence high 9876543210 low
BBa_K2845031 –acattttgaaacatctatagcgatctct
BBa_K2845023 ttacattttgaaacatctatagcgata–a
BBa_K2845017 –acattttgaaacatctatagcgatctct
BBa_K2845019 -tacattttgaaacatctatagcgatct-c
BBa_K2845021 –acattttgaaacatctatagcgatctct
BBa_K2845027 –acattttgaaacatctatagcgatctct
Confidence high 9876543210 low
Appendix
Easier connection to the database
To connect to the MariaDB database without using sudo all the time, one can create a user:
GRANT ALL PRIVILEGES ON igem_registry.* TO 'user'@'localhost' identified by 'secret';
FLUSH PRIVILEGES;Then, to avoid having to type the login / password at each connection, he can add the following to the ~/.my.cnf config file:
~/.my.cnf
[client-server]
user=user
password=secret
host=localhost
Thus, the connection is straightforward:
mysql igem_registryAlso, to enable file system access, from an administrator mariadb account:
GRANT FILE ON *.* TO 'user'@'localhost';References
- https://parts.igem.org/Registry_API (database dumps available there).