Where the Minor Things Are (WtMTA): (Yet Another) Minor Intron Database
Data license: ODbL · Data source: Larue & Roy, 2023
Tables
genomes
taxonomy_id, species, family, order, phylum, accession, n_minor_introns, n_major_introns, percent_minor_introns, busco_score, minor_snRNAs, genome_version, source_url, source_metadata, minor_intron+
1,575 rows
introns
id, dinucleotide_pair, is_minor, score, length, transcript_id, ordinal_index, start, end, taxonomy_id, scored_motifs, phase, in_cds, relative_position
214,855,132 rows
transcripts
id, taxonomy_id, transcript_id, gene_id, chromosome, strand, start, end, coding_length, introns_per_kbp_cds, proportion_minor_introns, n_introns, n_minor_introns
35,702,876 rows
Download SQLite DB: WtMTA.db 37.8 GB
The Where the Minor Things Are (WtMTA) intron database contains information about introns in more than 1,500 species identified by Larue & Roy, 2023 as containing minor introns, with a total of more than 250 million rows. The data include intron information such as type classification (major or minor), phase and genomic coordinates for all annotated introns included in our analyses, as well as additional metadata about parent genes, transcripts, and genomes.
Intron classifications were generated using intronIC, and other intron-based metadata (introns per kbp of coding sequence, etc.) were obtained using custom Python workflows. All underlying data were sourced from publicly available genomic resources such as NCBI, Ensembl and JGI.
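As a rough illustration of the kind of derived metadata mentioned above, the sketch below computes an intron density and a minor-intron percentage from raw counts. The formulas are assumptions based on the column names (introns_per_kbp_cds, percent_minor_introns); the function names are hypothetical, and this is not the authors' actual workflow.

```python
def introns_per_kbp(n_introns: int, coding_length_bp: int) -> float:
    """Introns per 1,000 bp of coding sequence (assumes length is given in bp)."""
    if coding_length_bp <= 0:
        raise ValueError("coding_length_bp must be positive")
    return n_introns / (coding_length_bp / 1000)


def percent_minor(n_minor: int, n_major: int) -> float:
    """Minor introns as a percentage of all classified introns (assumed definition)."""
    total = n_minor + n_major
    if total <= 0:
        raise ValueError("at least one intron is required")
    return 100 * n_minor / total


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(introns_per_kbp(6, 1500))
    print(percent_minor(1, 399))
```

The values stored in the database may be defined differently (for example, with different rounding or denominators), so check them against the stored columns before relying on a recomputation.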
Exploring the database
Unless you are interested in the entirety of the data (see the section on running the database locally), the best place to start exploring may be the genomes table. There, you can select a species of interest and drill down to the associated introns and/or transcripts for further filtering.
The results of any query can be downloaded in a number of plaintext formats (e.g., CSV), provided they don’t exceed 1 GB (see Advanced Export below the paginated results; select "stream all rows" to ensure the full dataset is returned). This should be sufficient to retrieve, for example, the complete intron/transcript set for any individual genome, or a subset of introns/transcripts across a number of different genomes.
Searching within tables
The genomes and transcripts tables provide limited search functionality, allowing for queries of complete words only (i.e., no wildcards). For example, to return information about all cnidarian genomes, the genomes table should be searched for cnidaria, but not (for example) cnidar*.
Obtaining a local copy of the DB
The SQLite database file was created using sqlite-utils and Datasette.
You are free to download the entire WtMTA database file via the link at the bottom of this page. After doing so, you can recreate most of the functionality of this website on a local computer/server.
To explore a local version of this database using Datasette, first install Datasette (for example, with pip install datasette).
Then, run Datasette with the SQLite database file (for example, datasette WtMTA.db).
This command will start a local web server (the default URL will be displayed by Datasette automatically), and you can explore the database interactively using your web browser. See Datasette’s documentation for details and additional options.
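A local copy can also be queried directly with SQL, without Datasette. The sketch below shows the genome-to-intron drill-down described under "Exploring the database" using Python's built-in sqlite3 module. It assumes that genomes and introns can be joined on their shared taxonomy_id column, that is_minor is stored as 0/1, and that species holds the full species name. The function names and the toy rows are hypothetical placeholders, not real WtMTA data.

```python
import sqlite3


def introns_for_species(conn, species, minor_only=False):
    """Return introns for one species by joining genomes to introns on taxonomy_id."""
    sql = """
        SELECT i.id, i.transcript_id, i.is_minor, i.phase, i.length
        FROM introns AS i
        JOIN genomes AS g ON g.taxonomy_id = i.taxonomy_id
        WHERE g.species = ?
    """
    if minor_only:
        # Assumption: minor introns are flagged with is_minor = 1.
        sql += " AND i.is_minor = 1"
    sql += " ORDER BY i.id"
    return conn.execute(sql, (species,)).fetchall()


def build_toy_db():
    """Build a tiny in-memory stand-in with placeholder rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE genomes (taxonomy_id INTEGER, species TEXT);
        CREATE TABLE introns (id INTEGER, transcript_id TEXT, is_minor INTEGER,
                              phase INTEGER, length INTEGER, taxonomy_id INTEGER);
        INSERT INTO genomes VALUES
            (1, 'Placeholder species A'),
            (2, 'Placeholder species B');
        INSERT INTO introns VALUES
            (10, 'tx1', 0, 0, 120, 1),
            (11, 'tx1', 1, 2, 95, 1),
            (12, 'tx2', 0, 1, 300, 2);
    """)
    return conn


if __name__ == "__main__":
    # For the downloaded file, connect with sqlite3.connect("WtMTA.db") instead.
    conn = build_toy_db()
    for row in introns_for_species(conn, "Placeholder species A", minor_only=True):
        print(row)
```

On the full database, a query like this may be slow if taxonomy_id is not indexed in the introns table, so it can be worth inspecting the schema first.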