Introduction to babelgene

Overview

Genomic analysis of model organisms frequently requires using databases built on human data or comparing results against patient-derived resources. Both tasks require harmonizing gene names into a common gene space. The babelgene R package simplifies this conversion. It provides gene orthologs/homologs:

  • for multiple frequently studied model organisms, such as mouse, rat, fly, and zebrafish
  • sourced from multiple databases
  • as gene symbols, NCBI Entrez, and Ensembl IDs
  • without accessing external resources or requiring an active internet connection
  • in an R-friendly “tidy” format with one gene pair per row

Usage

You can install the babelgene R package from CRAN.

install.packages("babelgene")

Load babelgene.

library(babelgene)

The main functionality is accessed via the orthologs() function, which takes one or more genes and returns a data frame of predicted ortholog/homolog pairs. The output data frame contains gene symbols and IDs for both human and the specified species. Several examples are provided below.

Get mouse equivalents for a set of human genes.

orthologs(genes = c("TP53", "EGFR", "IL6", "TGFB1", "CD4"), species = "mouse")
#>   human_symbol human_entrez   human_ensembl taxon_id symbol entrez
#> 1          CD4          920 ENSG00000010610    10090    Cd4  12504
#> 2         EGFR         1956 ENSG00000146648    10090   Egfr  13649
#> 3          IL6         3569 ENSG00000136244    10090    Il6  16193
#> 4        TGFB1         7040 ENSG00000105329    10090  Tgfb1  21803
#> 5         TP53         7157 ENSG00000141510    10090  Trp53  22059
#>              ensembl
#> 1 ENSMUSG00000023274
#> 2 ENSMUSG00000020122
#> 3 ENSMUSG00000025746
#> 4 ENSMUSG00000002603
#> 5 ENSMUSG00000059552
#>                                                                                         support
#> 1 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam
#> 2 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam
#> 3                Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoMCL|Panther|PhylomeDB|Treefam
#> 4 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam
#> 5 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam
#>   support_n
#> 1        12
#> 2        12
#> 3        10
#> 4        12
#> 5        12
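
Because the output has one gene pair per row, it can be reshaped with ordinary data frame operations. The following is a minimal sketch, not part of the package: it rebuilds three columns of the mouse result above by hand so that it runs without babelgene installed, then derives a human-to-mouse lookup vector and filters on support_n. The names mouse_pairs, human_to_mouse, and well_supported, and the cutoff of 11, are illustrative assumptions.

```r
# Rows copied from the mouse example output above (three columns only).
# In practice this data frame would come directly from orthologs().
mouse_pairs <- data.frame(
  human_symbol = c("CD4", "EGFR", "IL6", "TGFB1", "TP53"),
  symbol       = c("Cd4", "Egfr", "Il6", "Tgfb1", "Trp53"),
  support_n    = c(12, 12, 10, 12, 12),
  stringsAsFactors = FALSE
)

# Named character vector for quick human -> mouse symbol lookup.
human_to_mouse <- setNames(mouse_pairs$symbol, mouse_pairs$human_symbol)
human_to_mouse[c("TP53", "IL6")]

# Keep only pairs reported by at least 11 source databases
# (the cutoff is an arbitrary value chosen for illustration).
well_supported <- mouse_pairs[mouse_pairs$support_n >= 11, ]
well_supported$human_symbol
```

A named vector is convenient for renaming rows of an expression matrix, while the support_n filter is one way to trade coverage for confidence in the predicted pairs.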

Input genes are assumed to be human by default. If the input genes belong to the non-human species instead, set the human parameter to FALSE; the function then returns their human equivalents.

orthologs(genes = "Pu", species = "fruit fly", human = FALSE)
#>   human_symbol human_entrez   human_ensembl taxon_id symbol entrez     ensembl
#> 1         GCH1         2643 ENSG00000131979     7227     Pu  37415 FBgn0003162
#>                                                                               support
#> 1 EggNOG|Ensembl|HomoloGene|Inparanoid|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam
#>   support_n
#> 1        10

It is possible to search by NCBI Entrez or Ensembl IDs instead of gene symbols.

orthologs(genes = "ENSG00000111640", species = "mouse", human = TRUE)
#>   human_symbol human_entrez   human_ensembl taxon_id symbol entrez
#> 1        GAPDH         2597 ENSG00000111640    10090  Gapdh  14433
#>              ensembl
#> 1 ENSMUSG00000057666
#>                                                             support support_n
#> 1 Ensembl|HGNC|HomoloGene|NCBI|OMA|OrthoDB|OrthoMCL|Panther|Treefam         9

The orthologs() function requires the species parameter to be set (both scientific and common names are acceptable). You can list all species that can be queried with the species() function.

species()
#>    taxon_id                 scientific_name
#> 1     28377             Anolis carolinensis
#> 2      9913                      Bos taurus
#> 3      6239          Caenorhabditis elegans
#> 4      9615          Canis lupus familiaris
#> 5      7955                     Danio rerio
#> 6      7227         Drosophila melanogaster
#> 7      9796                  Equus caballus
#> 8      9685                     Felis catus
#> 9      9031                   Gallus gallus
#> 10     9544                  Macaca mulatta
#> 11    13616           Monodelphis domestica
#> 12    10090                    Mus musculus
#> 13     9258        Ornithorhynchus anatinus
#> 14     9598                 Pan troglodytes
#> 15    10116               Rattus norvegicus
#> 16     4932        Saccharomyces cerevisiae
#> 17   284812 Schizosaccharomyces pombe 972h-
#> 18     9823                      Sus scrofa
#> 19     8364              Xenopus tropicalis
#>                                                                common_name
#> 1                                              Carolina anole, green anole
#> 2  bovine, cattle, cow, dairy cow, domestic cattle, domestic cow, ox, oxen
#> 3                                                                     <NA>
#> 4                                                                dog, dogs
#> 5                        leopard danio, zebra danio, zebra fish, zebrafish
#> 6                                                                fruit fly
#> 7                                            domestic horse, equine, horse
#> 8                                                  cat, cats, domestic cat
#> 9                             bantam, chicken, chickens, Gallus domesticus
#> 10          rhesus macaque, rhesus macaques, Rhesus monkey, rhesus monkeys
#> 11                                               gray short-tailed opossum
#> 12                                                      house mouse, mouse
#> 13                       duck-billed platypus, duckbill platypus, platypus
#> 14                                                              chimpanzee
#> 15                                        brown rat, Norway rat, rat, rats
#> 16                            baker's yeast, brewer's yeast, S. cerevisiae
#> 17                                                                    <NA>
#> 18                                             pig, pigs, swine, wild boar
#> 19                               tropical clawed frog, western clawed frog
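
The common_name column stores several aliases per species as a single comma-separated string. As a rough sketch of how such a table can be searched, the code below copies four rows from the output above (so it runs without the package) and defines a hypothetical helper, find_species(), that returns the scientific name for an exact alias match. The helper is not part of babelgene, and it is not a description of how orthologs() resolves names internally.

```r
# A few rows copied from the species() output above.
# In practice the full table would come from species().
sp <- data.frame(
  taxon_id = c(7955, 7227, 10090, 10116),
  scientific_name = c("Danio rerio", "Drosophila melanogaster",
                      "Mus musculus", "Rattus norvegicus"),
  common_name = c("leopard danio, zebra danio, zebra fish, zebrafish",
                  "fruit fly",
                  "house mouse, mouse",
                  "brown rat, Norway rat, rat, rats"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the scientific name(s) whose alias list
# contains `name` as an exact, case-insensitive match.
find_species <- function(name, table = sp) {
  aliases <- strsplit(table$common_name, ", ", fixed = TRUE)
  hit <- vapply(
    aliases,
    function(a) tolower(name) %in% tolower(a),
    logical(1)
  )
  table$scientific_name[hit]
}

find_species("zebrafish")
find_species("rat")
```

Splitting on the separator before matching avoids partial hits, so "rat" matches the alias "rat" but not "brown rat".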

Details

The package is based on the data provided by the Human Genome Organization (HUGO) Gene Nomenclature Committee (HGNC) at the European Bioinformatics Institute. The HGNC Comparison of Orthology Predictions (HCOP) integrates the orthology assertions predicted for human genes by eggNOG, Ensembl Compara, HGNC, HomoloGene, Inparanoid, NCBI Gene Orthology, OMA, OrthoDB, OrthoMCL, Panther, PhylomeDB, TreeFam and ZFIN.

The name babelgene is derived from the Babel Fish, a fictional species of fish from The Hitchhiker's Guide to the Galaxy that translates any language for its host.