On February 5th, collaborative research involving the Shenzhen Institute of Genomics, Chinese Academy of Agricultural Sciences, was published online in Cell. The study revealed that the vertebrate transition from water to land proceeded in three evolutionary steps. The analysis showed that three types of genetic innovation contributed to the adaptation of vertebrates to terrestrial life: the origin of new genes, the emergence of new regulatory elements, and changes in gene-encoded amino acids. Researcher Ruan Jue of the Institute of Genomics is a co-first author and was mainly responsible for the genome assembly of the African lungfish.
The transition of vertebrates from water to land was a major leap in vertebrate evolution. It could only be achieved through a range of physiological and morphological innovations in the respiratory, locomotor, nervous, and other systems. The genetic basis of the evolutionary changes from the ancestors of bony fish, to the ancestors of lobe-finned fish, to terrestrial vertebrates has long been a major unresolved question in the scientific community. Long-term paleontological and vertebrate studies have established that the closest living fish relatives of tetrapods are the lungfish. Analyzing the genomes of "living fossil" fishes, such as lungfish and early ray-finned fish, is therefore the key to answering this question. At about 40 Gb, the lungfish genome is the largest known among vertebrates, more than 10 times the size of the 3 Gb human genome. It is also filled with a large number of repetitive sequences, which makes determining its complete sequence a great challenge for researchers.
Ruan Jue led the team that took part in developing and optimizing the wtdbg algorithm, and that produced a high-quality assembly of the African lungfish genome, the most difficult genome tackled to date. Building on the original assembler, wtdbg-1.2.8, the team greatly expanded the data capacity the algorithm supports, so that it can in theory assemble genomes of up to 1000 Gb, which meets the genome assembly requirements of all known organisms. The team also further improved the assembly speed, and the result was a new version of the assembler, wtdbg2. Using wtdbg2, the analysis of 1.5 TB of third-generation sequencing data was completed in only 3 days on 192 CPUs, yielding 39.1 Gb of genome sequence (contigs). A BUSCO evaluation based on vertebrate core genes showed that the completeness of the assembly was as high as 95%. This result demonstrates the great advantage of wtdbg2 in analyzing the largest vertebrate genome. The African lungfish genome in this study is not only the largest genome analyzed to date, but also the first complete, high-quality assembly of a super-large genome, marking that Chinese researchers have reached the top international level in the analysis of super-large genomes.
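
To put the reported figures in perspective, the short Python sketch below works through the arithmetic behind them. It is only a back-of-the-envelope illustration. The function names are hypothetical, and it assumes that "192 CPU" means 192 cores kept busy for the full three days, so the CPU-hour figure is an upper bound rather than a measured cost. The assembled fraction is taken relative to the approximate 40 Gb genome size quoted above and is not the same measure as the BUSCO completeness score.

```python
# Figures quoted in the article (genome sizes in gigabases, Gb).
LUNGFISH_GENOME_GB = 40.0     # approximate lungfish genome size
HUMAN_GENOME_GB = 3.0         # approximate human genome size
ASSEMBLED_CONTIGS_GB = 39.1   # total length of the assembled contigs
RUN_DAYS = 3                  # reported wall-clock time of the assembly
CPUS = 192                    # assumption: 192 cores, fully used throughout


def size_ratio(a_gb: float, b_gb: float) -> float:
    """How many times larger genome a is than genome b."""
    return a_gb / b_gb


def assembled_fraction(contigs_gb: float, genome_gb: float) -> float:
    """Share of the estimated genome size covered by the contigs."""
    return contigs_gb / genome_gb


def cpu_hours(days: int, cpus: int) -> int:
    """Upper bound on compute cost if every CPU ran the whole time."""
    return days * 24 * cpus


if __name__ == "__main__":
    ratio = size_ratio(LUNGFISH_GENOME_GB, HUMAN_GENOME_GB)
    fraction = assembled_fraction(ASSEMBLED_CONTIGS_GB, LUNGFISH_GENOME_GB)
    print(f"Lungfish vs. human genome size: {ratio:.1f}x")
    print(f"Contigs relative to estimated genome size: {fraction:.1%}")
    print(f"Compute budget (upper bound): {cpu_hours(RUN_DAYS, CPUS)} CPU-hours")
```

Under these assumptions, the lungfish genome is roughly 13 times the size of the human genome, the contigs account for just under 98% of the estimated genome size, and the run stays below 14,000 CPU-hours.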