In recent years, high-throughput next-generation sequencing (NGS) techniques have provided fascinating opportunities to understand the biology of non-model organisms, especially insect species. Falling sequencing costs and the extensive services offered by NGS providers have led many entomologists to become involved in genome sequencing. However, poor planning can lead to extremely fragmented genome assemblies that prevent high-quality gene annotation and other desired analyses. Insect genomes can be problematic to assemble because of a combination of high polymorphism, the inability to breed for genome homozygosity, and small body sizes that limit the quantity of DNA that can be isolated from a single individual. Given the rapid development of insect resistance to multiple classes of insecticides, comprehensive genomic information on insects is indispensable. Recent advances in sequencing technology and assembly strategies can bring breakthroughs in deciphering the genetic information of insects. Here, we present cost-effective high-throughput genome sequencing and assembly strategies for insect species with respect to taxonomy, evolutionary history, immune response, drug development, insect host-virus interactions, and pest management.
Leptotrombidium pallidum is the major vector mite for Orientia tsutsugamushi, the causative agent of scrub typhus. To understand the molecular biology of L. pallidum, we sequenced its whole genome using Illumina sequencing technology. In total, four genomic libraries with insert sizes ranging from 280 bp to 8 kb were used to generate 45.1 Gb of paired-end and mate-pair sequencing reads. Quality filtering and error correction of the paired-end reads, removing very short and/or low-quality sequences, yielded 26.9 Gb of high-quality sequence. These reads were used to estimate the genome size at 175 Mb by k-mer analysis and were assembled into 193.7 Mb of genomic scaffolds with an N50 length of 92,945 bp. The scaffold assembly achieved a CEGMA completeness score of 94%. For gene annotation, a combination of evidence-based (homology) and de novo approaches predicted 15,842 high-confidence protein-coding genes with an average transcript length of 1,511 bp and 2.4 exons per gene, corresponding to a total gene length of about 12.4% of the assembly. Bacterial endosymbionts are very common in mite species, and their associations range from mutualistic to pathogenic. Accordingly, endosymbionts in L. pallidum were predicted using NCBI microbial draft genomes and the mitochondrial genome. In addition, this L. pallidum draft genome can serve as an important reference for comparative genomic studies across mite species.
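As a quick consistency check on the figures above, the mean sequencing depth and the share of the assembly occupied by predicted genes can be computed directly. This is a minimal Python sketch, and the helper names are hypothetical. It computes depth against the 175 Mb k-mer estimate. It also assumes that the 12.4% figure means the summed transcript length (15,842 × 1,511 bp) divided by the 193.7 Mb scaffold assembly. The raw figure includes mate-pair reads, so it overstates the effective sequence coverage.

```python
# Sanity checks on the L. pallidum sequencing and annotation figures quoted above.
# Units: 1 Gb = 1e9 bp, 1 Mb = 1e6 bp. Helper names are illustrative only.

def mean_depth(read_bases: float, genome_size_bp: float) -> float:
    """Mean fold coverage: total sequenced bases divided by genome size."""
    return read_bases / genome_size_bp

def gene_space_fraction(n_genes: int, mean_transcript_bp: float, assembly_bp: float) -> float:
    """Share of the assembly spanned by n_genes transcripts of the given mean length.
    Assumption: this is how the 12.4% figure in the text was derived."""
    return n_genes * mean_transcript_bp / assembly_bp

KMER_GENOME_BP = 175e6  # k-mer genome size estimate from the text

raw_depth = mean_depth(45.1e9, KMER_GENOME_BP)    # all reads, including mate pairs
clean_depth = mean_depth(26.9e9, KMER_GENOME_BP)  # high-quality reads after filtering
gene_frac = gene_space_fraction(15_842, 1_511, 193.7e6)

print(f"raw depth   ~{raw_depth:.0f}x")    # -> raw depth   ~258x
print(f"clean depth ~{clean_depth:.0f}x")  # -> clean depth ~154x
print(f"gene space  ~{gene_frac:.1%}")     # -> gene space  ~12.4%
```

The roughly 154x clean depth is comfortably high for a short-read assembly. The recomputed 12.4% matches the text, which supports reading that figure as summed transcript length over assembly size.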
Leptotrombidium pallidum is the major vector mite for Orientia tsutsugamushi, the causative agent of scrub typhus, in Asian countries, including Korea. The genome size of L. pallidum was previously estimated to be 191 ± 7 Mb (Kim et al., 2014). Genomic DNA (gDNA) was extracted from a single female from a nine-generation inbred L. pallidum colony and used for whole genome amplification (WGA). The amplified gDNA was used to construct paired-end and mate-pair libraries, which were sequenced on Illumina platforms (HiSeq 2000 and MiSeq). An unamplified gDNA sample extracted from 20 female mites was also sequenced in parallel. More than 45 Gb of sequence reads from both the paired-end and mate-pair libraries of the WGA gDNA were trimmed and then de novo assembled, using CLC Assembly Cell v4.0 for contig assembly and SSPACE for scaffolding. The assembly comprised 6,545 scaffolds with an N50 of 92,945 bp and a total size of ~193 Mb, in good agreement with our previous estimate. Repeat analysis showed that about 30% of the genome (~58 Mb) was masked as repeats, most of which were unclassified novel elements. For gene prediction, PASA models were generated from genomic alignments of RNA-seq reads from four chigger mite samples (male, female, larva, and protonymph), and GeneWise models were generated from genomic alignments of protein sequences from four species closely related to the chigger mite. Independently, ab initio gene predictions were performed with AUGUSTUS and FGENESH using custom-trained matrices optimized for L. pallidum, and with GENEID using a pre-trained matrix for Acyrthosiphon pisum. Combining all of these predictions yielded a final set of 15,842 genes. Manual curation is in progress for several gene groups, including chemosensory receptor genes, immune-related genes, and acaricide target genes.
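The N50 reported above is a length statistic. Sort the scaffolds from longest to shortest, and N50 is the scaffold length at which the running total first reaches half of all assembled bases. Below is a minimal Python sketch of that definition. The scaffold lengths in the usage lines are made-up toy values, not L. pallidum data. The repeat fraction simply reuses the rounded ~58 Mb and ~193 Mb figures from the text.

```python
from typing import Iterable

def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the largest length L such that scaffolds of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:  # first scaffold that pushes the total past 50%
            return length
    raise ValueError("assembly has no scaffolds")

# Toy scaffold lengths (hypothetical): total 300 bp, so the 50% mark is 150 bp.
toy_scaffolds = [100, 80, 50, 40, 30]
print(n50(toy_scaffolds))  # -> 80

# Repeat content, using the rounded figures from the text (Mb).
repeat_fraction = 58 / 193
print(f"{repeat_fraction:.0%}")  # -> 30%
```

Because N50 weights scaffolds by length, a few long scaffolds can dominate it. That is why it is usually reported together with scaffold count and total size, as in the text.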
Leptotrombidium pallidum and Leptotrombidium scutellare are the major vector mites for Orientia tsutsugamushi, the causative agent of scrub typhus. Before whole-genome sequencing of these species, the genome sizes of L. pallidum and L. scutellare were estimated using a quantitative real-time PCR (qPCR)-based method. In addition, k-mer analysis of genome sequence reads obtained from Illumina sequencing was performed to assess the consistency and reliability of the results. The genome sizes estimated by qPCR were 191.3 ± 7 Mb for L. pallidum and 262.1 ± 13 Mb for L. scutellare. The genome sizes estimated by k-mer analysis were 175.5 Mb for L. pallidum and 286.6 Mb for L. scutellare. The estimates from the two independent methods were consistent with each other and within a similar range to those of other acariform mites. The relatively small genome sizes should facilitate genome analysis, which could contribute to understanding arachnid genome evolution and mite vector competence and provide key information for scrub typhus prevention.
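The k-mer method rests on a simple relation. If each genomic k-mer is sequenced about C times, where C is the depth of the main peak in the k-mer frequency histogram, then genome size ≈ (number of non-error k-mers) / C. The Python sketch below illustrates this under simplifying assumptions. It uses a toy histogram, and it treats low-depth k-mers as sequencing errors via a hypothetical `min_depth` cutoff. It also ignores heterozygosity and repeats, which dedicated k-mer profiling tools model explicitly. Finally, it computes the relative difference between the qPCR and k-mer estimates reported above.

```python
from collections import Counter
from typing import Iterable, Mapping

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement,
    so both strands of the same locus are counted together."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])

def kmer_histogram(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers, then tally how many distinct k-mers occur at each depth."""
    counts = Counter(
        canonical(read[i:i + k])
        for read in reads
        for i in range(len(read) - k + 1)
    )
    return Counter(counts.values())

def genome_size_from_histogram(hist: Mapping[int, int], min_depth: int = 2) -> float:
    """Estimate genome size as (total k-mers at or above the error cutoff) / (peak depth).

    Depths below the hypothetical `min_depth` cutoff are treated as
    sequencing-error k-mers and excluded from both the total and the peak search.
    """
    kept = {d: n for d, n in hist.items() if d >= min_depth}
    total_kmers = sum(d * n for d, n in kept.items())
    peak_depth = max(kept, key=kept.get)  # depth with the most distinct k-mers
    return total_kmers / peak_depth

# Toy histogram (made-up numbers): many depth-1 error k-mers plus a peak at depth 10.
toy_hist = {1: 500, 9: 40, 10: 100, 11: 40}
print(genome_size_from_histogram(toy_hist))  # -> 180.0

# Relative difference of the k-mer estimate from the qPCR estimate (values from the text, in Mb).
qpcr_mb = {"L. pallidum": 191.3, "L. scutellare": 262.1}
kmer_mb = {"L. pallidum": 175.5, "L. scutellare": 286.6}
rel_diff = {sp: (kmer_mb[sp] - qpcr_mb[sp]) / qpcr_mb[sp] for sp in qpcr_mb}
for sp, diff in rel_diff.items():
    print(f"{sp}: k-mer vs qPCR {diff:+.1%}")
# -> L. pallidum: k-mer vs qPCR -8.3%
# -> L. scutellare: k-mer vs qPCR +9.3%
```

In this sketch the two methods differ by less than 10% for both species, which is one way to quantify the agreement described above.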