Where the Minor Things Are (WtMTA): (Yet Another) Minor Intron Database

The Where the Minor Things Are (WtMTA) intron database contains information on the introns of more than 1,500 species identified by Larue & Roy (2023) as containing minor introns, totaling more than 250 million rows. For every annotated intron included in our analyses, it records information such as type classification (major or minor), phase, and genomic coordinates, along with additional metadata about the parent genes, transcripts, and genomes.

Introns were classified using intronIC, and other intron-based metadata (e.g., introns per kbp of coding sequence) were computed with custom Python workflows. All source data were obtained from publicly available genomic resources such as NCBI, Ensembl, and JGI.
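As an illustration of the kind of intron-based metric mentioned above, the sketch below computes intron density as introns per kbp (1 kbp = 1,000 bp) of coding sequence. It assumes the metric is simply an intron count divided by coding-sequence length in kbp; the exact definition used for WtMTA (for example, which transcript per gene is counted) may differ, and the function name is hypothetical.

```python
def introns_per_kbp(n_introns: int, cds_length_bp: int) -> float:
    """Hypothetical helper: introns per 1,000 bp of coding sequence.

    Assumes density = n_introns / (CDS length in kbp); WtMTA's own
    definition may differ in detail.
    """
    if cds_length_bp <= 0:
        raise ValueError("coding-sequence length must be positive")
    # Multiply first so that exact ratios (e.g. 8000 / 1600) stay exact floats.
    return n_introns * 1000 / cds_length_bp


if __name__ == "__main__":
    # 8 introns in a 1,600 bp coding sequence -> 5.0 introns per kbp
    print(introns_per_kbp(8, 1600))
```

Multiplying by 1,000 before dividing avoids an intermediate fractional length such as 1.6 kbp, which is not exactly representable as a float.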

Quick start: get minor introns for your species

To quickly retrieve all minor introns for a species in BED format, use the BED-formatted introns query. Enter a species name (e.g., Homo sapiens) or NCBI taxonomy ID (e.g., 9606), a minimum score of 90, and a maximum score of 100; this score range corresponds to intronIC's default classification threshold for minor introns. Partial species names also work (e.g., sapiens), as long as they match exactly one species. If you need to find the exact name first, use the species search query.
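If you have already downloaded a BED export with a wider score range, you could re-apply score thresholds locally. The sketch below assumes the intronIC score is stored in the fifth BED column (the standard BED "score" field); that column layout is an assumption about the WtMTA export, not something documented here, and `filter_bed_by_score` is a hypothetical helper.

```python
from typing import Iterable, Iterator, List


def filter_bed_by_score(
    lines: Iterable[str], min_score: float = 90.0, max_score: float = 100.0
) -> Iterator[List[str]]:
    """Yield BED records whose 5th column lies in [min_score, max_score].

    Assumption: column 5 holds the intronIC score (parsed as a float so
    that non-integer scores are accepted).
    """
    for line in lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        score = float(fields[4])
        if min_score <= score <= max_score:
            yield fields


if __name__ == "__main__":
    sample = [
        "track name=introns\n",
        "chr1\t100\t200\tintA\t95.2\t+\n",
        "chr1\t300\t400\tintB\t12.0\t-\n",
    ]
    for record in filter_bed_by_score(sample):
        print(record[3])  # intron name
```

For a real file, pass an open file handle (e.g., `open("introns.bed")`) instead of the sample list.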

Exploring the database

Unless you are interested in the entire dataset (see the section on obtaining a local copy of the DB), the genomes table is probably the best place to start exploring. There, you can select a species of interest and drill down to its associated introns and/or transcripts for further filtering.
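With a local copy of the database, the same genome-to-intron drill-down can be expressed as a SQL join. The sketch below runs against a tiny in-memory toy database; the column names (`genome_id`, `species`, `intron_id`, `type`, `phase`) are hypothetical stand-ins, and the real WtMTA schema will differ, so inspect it before adapting the query.

```python
import sqlite3

# Hypothetical toy schema; the real WtMTA column names may differ.
SCHEMA = """
CREATE TABLE genomes (genome_id TEXT PRIMARY KEY, species TEXT);
CREATE TABLE introns (
    intron_id TEXT PRIMARY KEY,
    genome_id TEXT REFERENCES genomes (genome_id),
    type TEXT,
    phase INTEGER
);
"""


def make_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO genomes VALUES (?, ?)",
                     [("g1", "Homo sapiens"), ("g2", "Danio rerio")])
    conn.executemany("INSERT INTO introns VALUES (?, ?, ?, ?)",
                     [("i1", "g1", "minor", 0), ("i2", "g1", "major", 1),
                      ("i3", "g2", "minor", 2)])
    return conn


def introns_for_species(conn, species, intron_type=None):
    """Return (intron_id, type, phase) rows for one species, optionally by type."""
    sql = """
        SELECT i.intron_id, i.type, i.phase
        FROM introns AS i
        JOIN genomes AS g ON g.genome_id = i.genome_id
        WHERE g.species = ?
    """
    params = [species]
    if intron_type is not None:
        sql += " AND i.type = ?"
        params.append(intron_type)
    sql += " ORDER BY i.intron_id"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    print(introns_for_species(make_toy_db(), "Homo sapiens", "minor"))
```

Using `?` placeholders keeps species names with spaces or quotes from breaking the SQL.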

The results of any query can be downloaded in a number of plain-text formats (e.g., CSV), provided they don’t exceed 1 GB. Use the Advanced export options below the paginated results, and select stream all rows to ensure the full result set is returned rather than only the current page. This should be sufficient to retrieve, for example, the complete intron or transcript set for any individual genome, or a subset of introns or transcripts across several genomes.
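Datasette also exposes these exports as URLs: appending `.csv` to a table path returns CSV, `_stream=on` requests all rows, and column filters use query parameters such as `column__gte=value`. The sketch below only builds such a URL; the base URL `https://example.org/WtMTA` and the column name `score` are hypothetical placeholders, not the real site address or schema.

```python
from urllib.parse import quote, urlencode


def csv_export_url(base_url, table, filters=None, stream=True):
    """Build a Datasette-style CSV export URL for a (filtered) table.

    base_url: database URL, e.g. a hypothetical "https://example.org/WtMTA".
    filters:  Datasette filter parameters, e.g. {"score__gte": 90}
              (column names here are assumptions).
    stream:   add _stream=on so all rows are returned, not just one page.
    """
    params = dict(filters or {})
    if stream:
        params["_stream"] = "on"
    query = urlencode(params)
    url = f"{base_url.rstrip('/')}/{quote(table)}.csv"
    return f"{url}?{query}" if query else url


if __name__ == "__main__":
    print(csv_export_url("https://example.org/WtMTA", "introns",
                         {"score__gte": 90}))
```

The resulting URL could then be fetched with any HTTP client; the 1 GB download limit still applies.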

Searching within tables

The genomes and transcripts tables provide limited search functionality: queries match complete words only (i.e., no wildcards). For example, to return information about all cnidarian genomes, search the genomes table for cnidaria; a partial term such as cnidar* will not work.
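The complete-word behavior follows from how SQLite full-text search tokenizes text. Datasette by default quotes each search term, so a `*` is treated as literal text rather than a prefix wildcard. The sketch below roughly mimics that behavior on a toy in-memory index. It assumes an FTS5 index (and a Python build whose SQLite includes FTS5). The table `genomes_fts`, its columns, and the toy rows are hypothetical.

```python
import sqlite3


def make_toy_fts() -> sqlite3.Connection:
    """Build a tiny in-memory FTS5 index over hypothetical lineage strings."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table/columns; requires SQLite compiled with FTS5.
    conn.execute("CREATE VIRTUAL TABLE genomes_fts USING fts5(species, lineage)")
    conn.executemany(
        "INSERT INTO genomes_fts (species, lineage) VALUES (?, ?)",
        [("Nematostella vectensis", "Metazoa; Cnidaria; Anthozoa"),
         ("Homo sapiens", "Metazoa; Chordata; Mammalia")],
    )
    return conn


def search(conn, term):
    """Search with the term quoted as a phrase, so '*' is not a wildcard."""
    quoted = '"' + term.replace('"', '""') + '"'
    rows = conn.execute(
        "SELECT species FROM genomes_fts WHERE genomes_fts MATCH ? "
        "ORDER BY species",
        (quoted,),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = make_toy_fts()
    print(search(conn, "cnidaria"))  # whole word: matches
    print(search(conn, "cnidar*"))   # '*' dropped by tokenizer: no match
```

Inside a quoted phrase, the default tokenizer treats `*` as a separator, so `cnidar*` becomes the token `cnidar`, which matches no indexed word.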

Obtaining a local copy of the DB

The SQLite database file was created using sqlite-utils and Datasette.

You are free to download the entire WtMTA database file via the link at the bottom of this page. After doing so, you can recreate most of the functionality of this website on a local computer/server.

To explore a local version of this database using Datasette, first install Datasette:

python3 -m pip install datasette

Then, run Datasette with the SQLite database file:

datasette -i WtMTA.db

This command starts a local web server (Datasette prints its URL on startup; by default this is http://127.0.0.1:8001), and you can then explore the database interactively in your web browser. The -i (--immutable) flag tells Datasette that the file will not be modified. See Datasette’s documentation for details and additional options.
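You can also inspect the downloaded file directly with Python's built-in `sqlite3` module, without Datasette. The sketch below opens the file read-only and reports the row count of every table. The helper name `table_row_counts` is hypothetical. The list will include any internal tables, such as full-text-search support tables. Counting rows in the largest tables (over 250 million rows in total) may take a while.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for each table in a SQLite file."""
    # Open via a read-only URI so the database file cannot be modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # safe identifier
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    for table, n in table_row_counts("WtMTA.db").items():
        print(f"{table}\t{n}")
```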
