Genomic analysis of model organisms frequently requires using databases based on human data or making comparisons to patient-derived resources. This requires harmonization of gene names into the same gene space. The babelgene R package simplifies the conversion process by providing predicted gene orthologs/homologs between human and a set of commonly studied species.
You can install the babelgene R package from CRAN.
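A typical CRAN installation might look like the following. The `requireNamespace()` guard is an optional convenience added here so the package is only downloaded when it is missing; it is not something babelgene itself requires.

```r
# Install babelgene from CRAN only if it is not already available.
if (!requireNamespace("babelgene", quietly = TRUE)) {
  install.packages("babelgene")
}
```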
Load babelgene.
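Loading uses the standard `library()` call, assuming the package has already been installed:

```r
# Attach babelgene so orthologs() and species() can be called directly.
library(babelgene)
```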
The main functionality is accessed via the orthologs() function, which
takes one or more genes and outputs a data frame of predicted
ortholog/homolog pairs. The output data frame contains gene symbols and
IDs for human and the specified species. Several examples are provided
below.
Get mouse equivalents for a set of human genes.
orthologs(genes = c("TP53", "EGFR", "IL6", "TGFB1", "CD4"), species = "mouse")
#> human_symbol human_entrez human_ensembl taxon_id symbol entrez
#> 1 CD4 920 ENSG00000010610 10090 Cd4 12504
#> 2 EGFR 1956 ENSG00000146648 10090 Egfr 13649
#> 3 IL6 3569 ENSG00000136244 10090 Il6 16193
#> 4 TGFB1 7040 ENSG00000105329 10090 Tgfb1 21803
#> 5 TP53 7157 ENSG00000141510 10090 Trp53 22059
#> ensembl
#> 1 ENSMUSG00000023274
#> 2 ENSMUSG00000020122
#> 3 ENSMUSG00000025746
#> 4 ENSMUSG00000002603
#> 5 ENSMUSG00000059552
#> support
#> 1 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam
#> 2 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam
#> 3 Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoMCL|Panther|PhylomeDB|Treefam
#> 4 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam
#> 5 EggNOG|Ensembl|HGNC|HomoloGene|Inparanoid|NCBI|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam
#> support_n
#> 1 12
#> 2 12
#> 3 10
#> 4 12
#> 5 12
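The support_n column counts how many source databases agree on each pair, so it can serve as a simple confidence filter. The sketch below rebuilds a few columns of the mouse result by hand (values copied from the output above) so that it runs without the package; in practice the data frame would come straight from orthologs(). The cutoff of 12 and the variable names are arbitrary choices for illustration.

```r
# Values copied from the mouse example output; normally this data frame
# would be the return value of orthologs(..., species = "mouse").
res <- data.frame(
  human_symbol = c("CD4", "EGFR", "IL6", "TGFB1", "TP53"),
  symbol       = c("Cd4", "Egfr", "Il6", "Tgfb1", "Trp53"),
  support_n    = c(12, 12, 10, 12, 12),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff: keep pairs reported by at least this many sources.
min_sources <- 12
well_supported <- res[res$support_n >= min_sources, ]

# Named vector for translating human symbols to mouse symbols.
human_to_mouse <- setNames(well_supported$symbol, well_supported$human_symbol)
human_to_mouse[["TP53"]]
```

With this cutoff the IL6/Il6 pair, which has 10 supporting sources, is dropped; a lower threshold would keep it.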
Input genes are assumed to be human by default. Set the human
parameter to FALSE when the input genes belong to the other species, as
in this fruit fly query.
orthologs(genes = "Pu", species = "fruit fly", human = FALSE)
#> human_symbol human_entrez human_ensembl taxon_id symbol entrez ensembl
#> 1 GCH1 2643 ENSG00000131979 7227 Pu 37415 FBgn0003162
#> support
#> 1 EggNOG|Ensembl|HomoloGene|Inparanoid|OMA|OrthoDB|OrthoMCL|Panther|PhylomeDB|Treefam
#> support_n
#> 1 10
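The same pattern should work for any supported species. As a sketch, assuming babelgene is installed, mouse symbols can be mapped back to human by combining species = "mouse" with human = FALSE. The gene symbols are taken from the mouse example earlier; the variable names are illustrative.

```r
library(babelgene)

# Mouse gene symbols to translate into human equivalents.
mouse_genes <- c("Trp53", "Egfr")

# human = FALSE tells orthologs() that the input genes are from the
# species given in the species argument, not from human.
to_human <- orthologs(genes = mouse_genes, species = "mouse", human = FALSE)

# Show only the input symbol and its human counterpart.
to_human[, c("symbol", "human_symbol")]
```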
It is possible to search by NCBI Entrez or Ensembl IDs instead of gene symbols.
orthologs(genes = "ENSG00000111640", species = "mouse", human = TRUE)
#> human_symbol human_entrez human_ensembl taxon_id symbol entrez
#> 1 GAPDH 2597 ENSG00000111640 10090 Gapdh 14433
#> ensembl
#> 1 ENSMUSG00000057666
#> support support_n
#> 1 Ensembl|HGNC|HomoloGene|NCBI|OMA|OrthoDB|OrthoMCL|Panther|Treefam 9
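Several IDs can be passed at once, just as with symbols. The sketch below, assuming babelgene is installed, queries two human Ensembl IDs that appear in the mouse example output earlier (TP53 and EGFR); the variable names are illustrative.

```r
library(babelgene)

# Human Ensembl gene IDs for TP53 and EGFR.
ensembl_ids <- c("ENSG00000141510", "ENSG00000146648")

# Input is human by default, so the human argument can be omitted.
by_ensembl <- orthologs(genes = ensembl_ids, species = "mouse")

# Pair each human Ensembl ID with the mouse symbol and Ensembl ID.
by_ensembl[, c("human_ensembl", "symbol", "ensembl")]
```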
The orthologs() function requires the species parameter to be set (both
scientific and common names are acceptable). You can check all the
species that can be queried with the help of the species() function.
species()
#> taxon_id scientific_name
#> 1 28377 Anolis carolinensis
#> 2 9913 Bos taurus
#> 3 6239 Caenorhabditis elegans
#> 4 9615 Canis lupus familiaris
#> 5 7955 Danio rerio
#> 6 7227 Drosophila melanogaster
#> 7 9796 Equus caballus
#> 8 9685 Felis catus
#> 9 9031 Gallus gallus
#> 10 9544 Macaca mulatta
#> 11 13616 Monodelphis domestica
#> 12 10090 Mus musculus
#> 13 9258 Ornithorhynchus anatinus
#> 14 9598 Pan troglodytes
#> 15 10116 Rattus norvegicus
#> 16 4932 Saccharomyces cerevisiae
#> 17 284812 Schizosaccharomyces pombe 972h-
#> 18 9823 Sus scrofa
#> 19 8364 Xenopus tropicalis
#> common_name
#> 1 Carolina anole, green anole
#> 2 bovine, cattle, cow, dairy cow, domestic cattle, domestic cow, ox, oxen
#> 3 <NA>
#> 4 dog, dogs
#> 5 leopard danio, zebra danio, zebra fish, zebrafish
#> 6 fruit fly
#> 7 domestic horse, equine, horse
#> 8 cat, cats, domestic cat
#> 9 bantam, chicken, chickens, Gallus domesticus
#> 10 rhesus macaque, rhesus macaques, Rhesus monkey, rhesus monkeys
#> 11 gray short-tailed opossum
#> 12 house mouse, mouse
#> 13 duck-billed platypus, duckbill platypus, platypus
#> 14 chimpanzee
#> 15 brown rat, Norway rat, rat, rats
#> 16 baker's yeast, brewer's yeast, S. cerevisiae
#> 17 <NA>
#> 18 pig, pigs, swine, wild boar
#> 19 tropical clawed frog, western clawed frog
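Because species() returns an ordinary data frame, it can be searched with base R before running a query. The following is a minimal sketch, assuming babelgene is installed; the choice of zebrafish and the variable names are for illustration only.

```r
library(babelgene)

sp <- species()

# Look up a species by its scientific name.
sp[sp$scientific_name == "Danio rerio", c("taxon_id", "scientific_name")]

# Or search the comma-separated common names. Species without a common
# name have NA in that column, which grepl() treats as no match.
is_zebrafish <- grepl("zebrafish", sp$common_name, fixed = TRUE)
zebrafish_name <- sp$scientific_name[is_zebrafish]

# The scientific name found this way can be passed to orthologs().
orthologs(genes = "TP53", species = zebrafish_name)
```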
The package is based on the data provided by the Human Genome Organization (HUGO) Gene Nomenclature Committee (HGNC) at the European Bioinformatics Institute. The HGNC Comparison of Orthology Predictions (HCOP) integrates the orthology assertions predicted for human genes by eggNOG, Ensembl Compara, HGNC, HomoloGene, Inparanoid, NCBI Gene Orthology, OMA, OrthoDB, OrthoMCL, Panther, PhylomeDB, TreeFam and ZFIN.
The name babelgene is derived from the Babel Fish, a fictional species of fish that could translate for humans.