Cenote-Taker3

Discover and annotate the virome.

Works on your laptop or an HPC cluster (compatible with macOS and Linux).

Cenote-Taker 3 is a virus bioinformatics tool that scales from individual genome sequences to massive metagenome assemblies. It can:

Identify sequences containing genes specific to viruses (virus hallmark genes)
Annotate virus sequences, including:

  a) adaptive ORF calling
  b) a large catalog of HMMs from virus gene families for functional annotation
  c) hierarchical taxonomy assignment based on hallmark genes
  d) mmseqs2-based CDD database search
  e) tabular (.tsv) and interactive genome map (.gbf) outputs

Cenote-Taker 3 is also very fast: in my hands, it runs many times faster than Cenote-Taker 2 on large datasets, and faster than comparable annotation with pharokka while providing more functional annotation for virus genes.

Image of example genome map:

Use Cases

Discovering virus contigs in metagenomic data
Annotating virus sequences that lack a highly similar, well-annotated reference
Finding prophages (or proviruses) in microbial genomes

Not-Use Cases

Not for read-level classification of known viruses (see Marker-MAGu or EsViritu for this task)
Not ideal for annotating virus genomes that are highly similar to known references (e.g. phage lambda with a few mutations).

Schematic

Installation Instructions

Most recent versions

Cenote-Taker 3 scripts: v3.4.3
Cenote-Taker 3 databases: v3.1.1

This should work on macOS and Linux.

Versions used in test installations

mamba 1.5.8

conda 24.7.1

Bioconda package (most users)

mamba is faster than conda at solving and installing environments in almost all cases.

Use mamba to install the bioconda package

macOS (specify the osx-64 platform regardless of which chip you have). I've also noticed a macOS-specific issue with newer mmseqs2 versions, so pin mmseqs2=15.6f452:

mamba create --platform osx-64 -n ct3_env -c conda-forge -c bioconda cenote-taker3=3.4.3 mmseqs2=15.6f452

Linux

mamba create -n ct3_env -c conda-forge -c bioconda cenote-taker3=3.4.3
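If you script the installation, a small helper can pick the right command for your OS. This is a minimal sketch, not part of Cenote-Taker 3: the function name `ct3_install_cmd` is hypothetical, and the two commands are copied verbatim from the macOS and Linux lines above. It only prints the command, so you can review it before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Cenote-Taker 3): print the mamba
# install command from this README that matches the given OS
# (defaults to the current machine via uname -s).
ct3_install_cmd() {
  local os="${1:-$(uname -s)}"
  case "$os" in
    Darwin)
      # macOS: force osx-64 on any chip and pin mmseqs2 to avoid the macOS issue
      echo "mamba create --platform osx-64 -n ct3_env -c conda-forge -c bioconda cenote-taker3=3.4.3 mmseqs2=15.6f452"
      ;;
    Linux)
      echo "mamba create -n ct3_env -c conda-forge -c bioconda cenote-taker3=3.4.3"
      ;;
    *)
      echo "unsupported OS: $os (Cenote-Taker 3 targets macOS and Linux)" >&2
      return 1
      ;;
  esac
}
```

Running `ct3_install_cmd` with no argument uses `uname -s`; paste the printed line into your terminal.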

Using conda instead

macOS (specify the osx-64 platform regardless of which chip you have, and pin mmseqs2=15.6f452 as above)

conda create --platform osx-64 -n ct3_env -c conda-forge -c bioconda cenote-taker3=3.4.3 mmseqs2=15.6f452

Linux

conda create -n ct3_env -c conda-forge -c bioconda cenote-taker3=3.4.3

Activate the conda environment.

conda activate ct3_env

You should now be able to type cenotetaker3 or get_ct3_dbs in the terminal to bring up each tool's help menu.

Change to the directory where you'd like to install the databases, then run the database script, specifying the DB directory with -o.

The databases total 3.0 GB after decompression.

cd ..

get_ct3_dbs -o ct3_DBs --hmm T --hallmark_tax T --refseq_tax T --mmseqs_cdd T --domain_list T

With optional hhsuite databases

Warning: due to inconsistent server speed, these downloads may take over 2 hours.

You may download one or more of the hhsuite DBs.

The data footprint is:

Database	Size
CDD	6.1 GB
pfam	4.6 GB
pdb70	56 GB

get_ct3_dbs -o ct3_DBs --hmm T --hallmark_tax T --refseq_tax T --mmseqs_cdd T --domain_list T --hhCDD T --hhPFAM T --hhPDB T
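With all three hhsuite DBs, the sizes above add up to roughly 3.0 + 6.1 + 4.6 + 56 = 69.7 GB. The sketch below uses hypothetical helpers (not part of get_ct3_dbs) to sum the listed sizes for the DBs you plan to download and to report free space with `df`. It assumes the listed sizes are the final on-disk footprint; downloads may need extra room while archives are being decompressed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of get_ct3_dbs). Sizes in GB come from the
# table above; the 3.0 GB base covers the default (non-hhsuite) databases.
needed_gb() {
  local total=3.0 db add
  for db in "$@"; do
    case "$db" in
      CDD)   add=6.1 ;;
      pfam)  add=4.6 ;;
      pdb70) add=56 ;;
      *) echo "unknown hhsuite DB: $db" >&2; return 1 ;;
    esac
    # bash has no floating-point arithmetic, so let awk do the addition
    total=$(awk -v t="$total" -v a="$add" 'BEGIN { print t + a }')
  done
  echo "$total"
}

available_gb() {
  # -P: portable one-line-per-filesystem output; -k: 1024-byte blocks.
  # Field 4 of the data row is available space; convert KiB to whole GiB
  # (close enough to GB for a rough pre-download check).
  df -Pk "${1:-.}" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}
```

For example, compare `needed_gb CDD pfam pdb70` with `available_gb .` run in the directory that will hold ct3_DBs.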

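Set the database directory as a conda environment variable, then reactivate the environment (conda deactivate, then conda activate ct3_env) so the change takes effect.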

conda env config vars set CENOTE_DBS=/path/to/ct3_DBs
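Before a long run, it can help to confirm that the variable is visible in your shell. A minimal sketch, assuming only that Cenote-Taker 3 reads the database location from CENOTE_DBS as configured above; `check_cenote_dbs` is a hypothetical helper, and it checks that the directory exists, not that every database inside it downloaded correctly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cenote-Taker 3): verify that CENOTE_DBS
# is set and points at an existing directory. Does not inspect DB contents.
check_cenote_dbs() {
  if [ -z "${CENOTE_DBS:-}" ]; then
    echo "CENOTE_DBS is not set; set it with 'conda env config vars set' and reactivate ct3_env" >&2
    return 1
  fi
  if [ ! -d "$CENOTE_DBS" ]; then
    echo "CENOTE_DBS=$CENOTE_DBS is not a directory" >&2
    return 1
  fi
  echo "Using databases in $CENOTE_DBS"
}
```

After activating ct3_env, `conda env config vars list` should show CENOTE_DBS, and `check_cenote_dbs` should print the path.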

From source (development versions)

Clone this GitHub repo (mtisza1/Cenote-Taker3):

git clone https://github.com/mtisza1/Cenote-Taker3.git

Using mamba (a package manager that works within conda) and the provided YAML file, create the environment:

mamba env create -f Cenote-Taker3/environment/ct3_env.yaml

Activate the conda environment.

conda activate ct3_env

Change into the repo directory and pip install the command-line tool.

cd Cenote-Taker3

pip install .

You should now be able to type cenotetaker3 or get_ct3_dbs in the terminal to bring up each tool's help menu.

Change to the directory where you'd like to install the databases, then run the database script, specifying the DB directory with -o.

The databases total 3.0 GB after decompression.

cd ..

get_ct3_dbs -o ct3_DBs --hmm T --hallmark_tax T --refseq_tax T --mmseqs_cdd T --domain_list T

With optional hhsuite databases

Warning: due to inconsistent server speed, these downloads may take over 2 hours.

You may download one or more of the hhsuite DBs.

The data footprint is:

Database	Size
CDD	6.1 GB
pfam	4.6 GB
pdb70	56 GB

get_ct3_dbs -o ct3_DBs --hmm T --hallmark_tax T --refseq_tax T --mmseqs_cdd T --domain_list T --hhCDD T --hhPFAM T --hhPDB T

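Set the database directory as a conda environment variable, then reactivate the environment (conda deactivate, then conda activate ct3_env) so the change takes effect.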

conda env config vars set CENOTE_DBS=/path/to/ct3_DBs

Running Cenote-Taker 3

Make sure the conda environment is activated.

Help Menu

cenotetaker3 -h

Test contigs

cenotetaker3 -c Cenote-Taker3/test_data/testcontigs_DNA_ct2.fasta -r test_ct3 -p T

Default Discover and Annotate

cenotetaker3 -c my_metagenome_contigs.fna -r my_meta_ct3 -p T
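To process several assemblies with the default settings, you can loop over them and derive each run title from the file name. This sketch uses only the -c, -r and -p flags shown above; the function name `run_ct3_batch`, the `_ct3` suffix, and the `DRY_RUN` switch are hypothetical conveniences. It defaults to printing the commands so you can check them first. Each run writes its own {run_title}/ directory (see Output Files), so distinct titles keep the outputs separate.

```shell
#!/usr/bin/env bash
# Hypothetical batch wrapper: one default discover-and-annotate run per file.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
run_ct3_batch() {
  local fasta title
  for fasta in "$@"; do
    title="$(basename "$fasta")"
    title="${title%.*}_ct3"          # e.g. my_sample.fna -> my_sample_ct3
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "cenotetaker3 -c $fasta -r $title -p T"
    else
      cenotetaker3 -c "$fasta" -r "$title" -p T
    fi
  done
}
```

For example, `DRY_RUN=0 run_ct3_batch assemblies/*.fna` runs each file in a hypothetical assemblies/ directory in turn.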

Recommended settings for microbial genomes

cenotetaker3 -c my_metagenome_contigs.fna -r my_meta_ct3 -p T --lin_minimum_hallmark_genes 2

Discover and Annotate, Force `prodigal` (`prodigal-gv` is default)

cenotetaker3 -c my_metagenome_contigs.fna -r my_meta_ct3pr -p T --caller prodigal

Just Annotate

cenotetaker3 -c my_virus_contigs.fna -r my_virs_ct3 -p F -am T

Choose which HMM DBs are used as hallmark DBs (default: virion rdrp)

cenotetaker3 -c my_metagenome_contigs.fna -r my_meta_ct3 -p T -db virion rdrp dnarep

Calculate coverage level with reads

cenotetaker3 -c my_metagenome_contigs.fna -r my_meta_ct3 -p T --reads my_reads/*fastq
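In bash, a glob that matches nothing is passed through literally, so a typo in the reads path would hand cenotetaker3 the string my_reads/*fastq instead of files. A minimal pre-flight check (the helper `collect_reads` is hypothetical) lists what the same `*fastq` pattern matches and fails if it matches nothing. Note that this pattern does not match compressed names such as .fastq.gz.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: print the files matched by DIR/*fastq,
# or fail if there are none. Uses the same pattern as the --reads example.
collect_reads() {
  local dir="$1" restore_nullglob
  local -a reads
  # Remember the current nullglob setting so it can be restored afterwards
  restore_nullglob=$(shopt -p nullglob)
  shopt -s nullglob          # unmatched globs now expand to nothing
  reads=("$dir"/*fastq)
  eval "$restore_nullglob"
  if [ "${#reads[@]}" -eq 0 ]; then
    echo "no *fastq files found in $dir" >&2
    return 1
  fi
  printf '%s\n' "${reads[@]}"
}
```

For example: `collect_reads my_reads >/dev/null && cenotetaker3 -c my_metagenome_contigs.fna -r my_meta_ct3 -p T --reads my_reads/*fastq`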

Output Files

{run_title}/
│   {run_title}_virus_summary.tsv                 <- main summary file for each virus
│   {run_title}_virus_sequences.fna               <- all virus genome seqs
│   {run_title}_virus_AA.faa                      <- all virus AA seqs
│   {run_title}_prune_summary.tsv                 <- summary of pruning of each sequence
│   final_genes_to_contigs_annotation_summary.tsv <- annotation info, all genes
│   run_arguments.txt                             <- arguments used in this run
│   {run_title}_cenotetaker.log                   <- main log file
│
├───sequin_and_genome_maps/
│   │   {run_title}*gbf                           <- genome maps
│   │   {run_title}*fsa                           <- genome sequences
│   │   {run_title}*gtf                           <- feature tables, GTF format
│   │   {run_title}*tbl                           <- feature tables, Sequin format
│   │   {run_title}*sqn                           <- non-human-readable Sequin files for GenBank submission
│   │   {run_title}*cmt                           <- Sequin comment files
│
└───ct_processing/
    │   --- many intermediate files ---
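For a first look at a finished run, counting FASTA headers gives the number of virus sequences and proteins. A minimal sketch, assuming you run it from the directory where cenotetaker3 created {run_title}/ and that the file names match the tree above; `summarize_ct3_run` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count records in the main FASTA outputs of a run.
# Assumes ./RUN_TITLE/RUN_TITLE_virus_sequences.fna and ..._virus_AA.faa exist.
summarize_ct3_run() {
  local run_title="$1"
  local fna="$run_title/${run_title}_virus_sequences.fna"
  local faa="$run_title/${run_title}_virus_AA.faa"
  local f
  for f in "$fna" "$faa"; do
    if [ ! -f "$f" ]; then
      echo "missing output: $f" >&2
      return 1
    fi
  done
  # Every FASTA record begins with a '>' header line
  echo "virus sequences: $(grep -c '^>' "$fna")"
  echo "virus proteins: $(grep -c '^>' "$faa")"
}
```

For example, `summarize_ct3_run my_meta_ct3` after the default discover-and-annotate run.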

Ideas for downstream analyses

CheckV for virus genome completeness estimation.

BACPHLIP for phage lifestyle prediction (only use complete/near-complete phage genomes).

VContact3 for genome clustering and taxonomy.

iPHoP for prokaryotic virus host prediction.

Notes

Cenote-Taker 3 is under active development, so please open an issue if anything seems unusual or any errors occur. I likely haven't tested every parameter combination, and most bugs should be simple to fix.

Citation

Cenote-Taker 3 for Fast and Accurate Virus Discovery and Annotation of the Virome.

Michael J. Tisza, Joseph F. Petrosino, Sara J. Javornik Cregeen

doi: https://doi.org/10.1101/2025.08.20.671380
