BUSCO Genome Quality Assessment
BUSCO (Benchmarking Universal Single-Copy Orthologs) is a tool for evaluating genome assembly completeness by searching for conserved orthologs across different lineages. This guide outlines the steps for using BUSCO to evaluate insect genome quality.
This is the methodology adopted by the InsectBase database team for assessing insect genome quality.
Workflow Steps
Install BUSCO
Ensure that BUSCO is installed on your system.
Installation with Conda
conda install -c conda-forge -c bioconda busco
Download the Appropriate Insect Lineage Dataset
For insect genomes, you need to download the correct lineage dataset. BUSCO provides several pre-built datasets. For insects, you can use the insecta_odb10 dataset.
Download Insect Lineage Dataset
busco --download insecta_odb10
This will download the ortholog dataset for insect species.
Run BUSCO to Assess Genome Quality
Run BUSCO with the downloaded lineage dataset to evaluate your insect genome. The command requires your genome file in FASTA format.
Run BUSCO Command
busco -i your_insect_genome.fasta -l insecta_odb10 -o busco_results -m genome
- -i: specifies the input genome file (in FASTA format)
- -l: specifies the lineage dataset (here, insecta_odb10)
- -o: specifies the output directory for results
- -m genome: indicates you're analyzing a genome assembly (as opposed to a transcriptome)
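Because BUSCO needs a FASTA input, a quick sanity check before launching a long run can save time. The following is a minimal sketch, not part of BUSCO itself: the helper name is_fasta is hypothetical, and the thread count passed through BUSCO's -c option is an assumed value to adjust for your machine.

```shell
#!/bin/sh
# is_fasta (hypothetical helper): succeed only if the file exists,
# is non-empty, and its first line starts with '>'.
is_fasta() {
  [ -s "$1" ] || return 1
  first=""
  IFS= read -r first < "$1" || true
  case "$first" in
    '>'*) return 0 ;;
    *)    return 1 ;;
  esac
}

genome="your_insect_genome.fasta"

if is_fasta "$genome"; then
  # -c 8 is an assumed thread count; the other flags match the command above.
  busco -i "$genome" -l insecta_odb10 -o busco_results -m genome -c 8
else
  echo "Skipping BUSCO: $genome is missing, empty, or not FASTA" >&2
fi
```

The check only inspects the first line, so it catches a wrong path or a wrong file type, not malformed records deeper in the file.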
Interpret BUSCO Results
Once BUSCO completes, it will generate a report with the following categories:
- Complete (single-copy): The number of BUSCOs that are present as complete single-copy orthologs
- Complete (duplicated): The number of BUSCOs that are complete but found in more than one copy
- Fragmented: BUSCOs that are only partially recovered
- Missing: BUSCOs that are completely absent from the genome
For high-quality genomes, you should expect a high percentage of complete BUSCOs, most of them single-copy, and few fragmented or missing ones.
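A completeness check like this can be automated. The sketch below assumes the compact one-line score notation that BUSCO prints in its short summary (for example C:85.0%[S:80.0%,D:5.0%],F:10.0%,M:5.0%,n:1000). The function names and the 90% cutoff are illustrative assumptions, not values prescribed by BUSCO or by this guide.

```shell
#!/bin/sh
# complete_pct (hypothetical helper): print the overall "C:" percentage
# from a BUSCO one-line score string.
complete_pct() {
  printf '%s\n' "$1" | sed -n 's/.*C:\([0-9.]*\)%.*/\1/p'
}

# meets_threshold (hypothetical helper): succeed when completeness is at
# least the given minimum. The default of 90 is an assumed cutoff.
meets_threshold() {
  pct=$(complete_pct "$1")
  [ -n "$pct" ] || return 2
  awk -v p="$pct" -v t="${2:-90}" 'BEGIN { exit !(p >= t) }'
}

score='C:85.0%[S:80.0%,D:5.0%],F:10.0%,M:5.0%,n:1000'

if meets_threshold "$score" 90; then
  echo "PASS"
else
  echo "BELOW THRESHOLD"   # printed for this example, since 85.0 < 90
fi
```

Choose the cutoff to suit the project; a single number cannot replace looking at the duplicated, fragmented, and missing fractions together.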
Example Output
# BUSCO analysis summary
Total BUSCOs: 1,000
Complete: 850 (85%)
Complete (single-copy): 800 (80%)
Complete (duplicated): 50 (5%)
Fragmented: 100 (10%)
Missing: 50 (5%)
- 85% complete (80% single-copy plus 5% duplicated) means most of the expected orthologs were recovered in full
- 10% fragmented means some orthologs were only partially recovered
- 5% missing suggests that some BUSCOs are absent, which could indicate gaps in the assembly
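The percentages in the example follow directly from the four counts: single-copy and duplicated add up to the complete total, and all four categories add up to the total number of BUSCOs. The sketch below recomputes them with awk and prints the result in the compact notation BUSCO uses in its short summary. The function name busco_percentages is hypothetical.

```shell
#!/bin/sh
# busco_percentages (hypothetical helper)
# Arguments: single-copy, duplicated, fragmented, missing (counts).
busco_percentages() {
  awk -v s="$1" -v d="$2" -v f="$3" -v m="$4" 'BEGIN {
    n = s + d + f + m                      # total BUSCOs searched
    if (n == 0) {
      print "error: all counts are zero" > "/dev/stderr"
      exit 1
    }
    printf "C:%.1f%%[S:%.1f%%,D:%.1f%%],F:%.1f%%,M:%.1f%%,n:%d\n",
           100 * (s + d) / n, 100 * s / n, 100 * d / n,
           100 * f / n, 100 * m / n, n
  }'
}

# Counts taken from the example output above.
busco_percentages 800 50 100 50
# prints: C:85.0%[S:80.0%,D:5.0%],F:10.0%,M:5.0%,n:1000
```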
Alternative with compleasm
compleasm is a reimplementation of BUSCO built on the miniprot protein-to-genome aligner. Its authors report that it is faster and more accurate than BUSCO, and it can be used as an alternative.
Install compleasm
pip install compleasm
Run compleasm
compleasm.py run -t16 -l insecta -L /data/ -a genome.fa -o busco
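Note: compleasm downloads lineage files organized differently than BUSCO, so point -L at a directory that compleasm manages (here, /data/) rather than at an existing BUSCO download. In the command above, -t sets the number of threads, -l the lineage, -a the assembly file, and -o the output directory.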
References
- BUSCO Official Website
- BUSCO GitHub Repository
- compleasm GitHub Repository