BUSCO Genome Quality Assessment

BUSCO (Benchmarking Universal Single-Copy Orthologs) is a tool for evaluating genome assembly completeness by searching for conserved single-copy orthologs that are expected to be present in a given lineage. This guide outlines the steps for using BUSCO to evaluate insect genome quality.

This is the methodology adopted by the InsectBase database team for assessing insect genome quality.

Workflow Steps

1

Install BUSCO

Ensure that BUSCO is installed on your system.

Installation with Conda

conda install -c conda-forge -c bioconda busco
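A quick check that the installation succeeded can save a failed run later. The sketch below only assumes that the busco executable should be on PATH once the conda environment is active; check_tool is an illustrative helper name, not part of BUSCO.

```shell
#!/bin/sh
# Illustrative helper (not part of BUSCO): succeed if a command is on PATH.
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

if check_tool busco; then
    busco --version    # prints the installed BUSCO version
else
    echo "busco not found on PATH; activate the conda environment first" >&2
fi
```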

2

Download the Appropriate Insect Lineage Dataset

For insect genomes, you need to download the correct lineage dataset. BUSCO provides several pre-built datasets. For insects, you can use the insecta_odb10 dataset.

Download Insect Lineage Dataset

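busco --download insecta_odb10

This will download the ortholog dataset for insect species. If the dataset is not already present, BUSCO will also fetch it automatically the first time it is requested with -l.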
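To confirm the dataset is in place before a long run, one can look for it on disk. This sketch assumes BUSCO's default download location, a busco_downloads/lineages/ directory under the working directory; if a different download path was configured, pass it as the second argument. lineage_present is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a lineage directory exists under the download root.
# $1 = lineage name, $2 = download root (assumed default: ./busco_downloads)
lineage_present() {
    [ -d "${2:-busco_downloads}/lineages/$1" ]
}

if lineage_present insecta_odb10; then
    echo "insecta_odb10 found"
else
    echo "insecta_odb10 not found in busco_downloads/lineages" >&2
fi
```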

3

Run BUSCO to Assess Genome Quality

Run BUSCO with the downloaded lineage dataset to evaluate your insect genome. The command requires your genome file in FASTA format.

Run BUSCO Command

busco -i your_insect_genome.fasta -l insecta_odb10 -o busco_results -m genome
  • -i specifies the input genome file (in FASTA format)
  • -l specifies the lineage dataset (here, insecta_odb10)
  • -o specifies the name of the output directory (the run name) for results
  • -m genome indicates you're analyzing a genome assembly (as opposed to a transcriptome)
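When several assemblies need to be assessed, the same command can be generated in a loop. This is a minimal sketch, assuming the assemblies sit in a hypothetical genomes/ directory and end in .fasta; build_busco_cmd and the busco_<name> output naming are illustrative choices, not BUSCO conventions. It prints the commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print the BUSCO command for one assembly.
# The output name is derived from the file name (an assumption, not a BUSCO rule).
build_busco_cmd() {
    genome=$1
    name=$(basename "$genome" .fasta)
    printf 'busco -i %s -l insecta_odb10 -o busco_%s -m genome\n' "$genome" "$name"
}

# Print one command per FASTA file in genomes/ (assumed layout).
for genome in genomes/*.fasta; do
    [ -e "$genome" ] || continue    # skip when the glob matches nothing
    build_busco_cmd "$genome"
done
```

Once the printed commands look right, they can be piped to sh to run them; file names containing spaces would need extra quoting.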

4

Interpret BUSCO Results

Once BUSCO completes, it will generate a report with the following categories:

  • Complete (single-copy): BUSCOs found as complete orthologs in exactly one copy
  • Complete (duplicated): BUSCOs found as complete orthologs in more than one copy
  • Fragmented: BUSCOs that are present but only partially recovered
  • Missing: BUSCOs that were not found in the genome

For high-quality genomes, you should expect a high percentage of complete BUSCOs, most of them single-copy, and low percentages of fragmented and missing BUSCOs.
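BUSCO also writes these categories as a compact one-line summary of the form C:..%[S:..%,D:..%],F:..%,M:..%,n:.. in its short summary file. The sketch below pulls the complete percentage out of such a line and compares it with a cutoff. The sample line, the cutoff of 90 and the helper names are assumptions for illustration, not BUSCO defaults; some versions may append extra fields to the line, which this pattern ignores.

```shell
#!/bin/sh
# Extract the complete (C) percentage from a BUSCO one-line summary string.
complete_pct() {
    printf '%s\n' "$1" | sed -n 's/.*C:\([0-9.]*\)%.*/\1/p'
}

# Succeed if the complete percentage is at least the cutoff given as $2.
meets_cutoff() {
    awk -v p="$(complete_pct "$1")" -v t="$2" 'BEGIN { exit !(p + 0 >= t + 0) }'
}

line='C:85.0%[S:80.0%,D:5.0%],F:10.0%,M:5.0%,n:1000'    # illustrative values
if meets_cutoff "$line" 90; then
    echo "complete: $(complete_pct "$line")% (meets cutoff)"
else
    echo "complete: $(complete_pct "$line")% (below cutoff)"
fi
```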

Example Output (illustrative values)

# BUSCO analysis summary
Total BUSCOs: 1,000
Complete: 850 (85%)
Complete (single-copy): 800 (80%)
Complete (duplicated): 50 (5%)
Fragmented: 100 (10%)
Missing: 50 (5%)
  • 85% complete indicates that most of the expected orthologs were recovered in full
  • 10% fragmented means some orthologs were only partially recovered
  • 5% missing means some BUSCOs were not found, which could indicate gaps in the assembly
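The figures in the example are related by simple arithmetic: complete is the sum of single-copy and duplicated, and complete, fragmented and missing together account for every BUSCO searched. A small sketch recomputing the percentages from the counts in the example (pct is an illustrative helper):

```shell
#!/bin/sh
# Counts taken from the example output above.
single=800; duplicated=50; fragmented=100; missing=50

complete=$((single + duplicated))              # 800 + 50
total=$((complete + fragmented + missing))     # every BUSCO falls in one category

# Illustrative helper: percentage of n out of d, one decimal place.
pct() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f", 100 * n / d }'
}

echo "Complete:   $complete / $total = $(pct "$complete" "$total")%"
echo "Fragmented: $fragmented / $total = $(pct "$fragmented" "$total")%"
echo "Missing:    $missing / $total = $(pct "$missing" "$total")%"
```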

5

Alternative with compleasm

compleasm is a reimplementation of BUSCO that its authors report to be faster and more accurate. It can be used as an alternative.

Install compleasm

pip install compleasm

Run compleasm

compleasm.py run -t16 -l insecta -L /data/ -a genome.fa -o busco

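In this command, -t sets the number of threads, -l the lineage, -L the directory where lineage files are stored (here /data/), -a the input assembly, and -o the output directory.

Note: compleasm downloads lineage files organized differently than BUSCO, so the directory given with -L should be kept separate from BUSCO's own lineage downloads.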
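Depending on how compleasm was installed, the executable may be exposed as compleasm or as compleasm.py; this is an assumption worth checking on your own system. The sketch below picks whichever name is found on PATH and prints the run command without executing it. first_available is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the first argument that is an executable on PATH.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

if exe=$(first_available compleasm compleasm.py); then
    # Print the command for review instead of running it.
    echo "$exe run -t16 -l insecta -L /data/ -a genome.fa -o busco"
else
    echo "compleasm not found on PATH" >&2
fi
```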

References