Thursday April 16th, 2015

  • Who: LC, PJ, JP, JE, CM, JD, XX
  • Not available: EA, EZ, BS, GG, SN

A. General Comments and Updates:


- New hire at CO: Marie Angelique Laporte has accepted the position as an Ontology Engineer (curating the trait dictionaries used by plant breeders) and will be joining in the first week of May

  • John Doonan, Director of the National Plant Phenomics Centre at Aberystwyth University, UK

- High throughput phenotyping - gene discovery and plant breeding. Integration of phenomic and genomic data

- Semi-automated phenotyping system, associated with Plant Breeding department

- Background in developmental genetics, but works closely with George Gkoutos, expert in ontologies and bioinformatics

- George's news: postdoc position has been advertised, and BBSRC has agreed to extend it as part of the collaboration with CGIAR on tropical grasses (Need to check?)

- Hands on facility for large scale phenotyping

- Goal is to integrate ontologies into the phenotyping workflow.

Chris Mungall, LBL - background in bioinformatics, involved with the Gene Ontology, Phenotype ontology, and PATO (with George Gkoutos), human and mammalian systems, created the Uberon anatomy ontology

- Assist with deployment of AmiGO2 browser.

Xu Xu - In Sinisa's group, works on Image annotation - From Xu Xu : "hello all, I am Xu Xu. Sorry my mic is not working. Just an introduction here: I am Sinisa's student working in image annotation group, with Justin. I will be working on integrating AISO to BisQue. Happy to work with all of you!"

Update on hires at OSU: PJ: Software Developer position has not had too many applicants, PJ met a potential candidate who is a spouse of a BPP department member, Biology Masters, recently from Yahoo.

Ontology curator position - will start evaluating applicants and will hopefully be able to make an offer to someone by mid-May.

No news on the position at NYBG.

B. Update from IT group- Data Store, AmiGO2

  • Justin Elser, Chris Mungall, with Seth Carbon

AmiGO2 install

  • A dev version of the AmiGO2 has been installed and loaded with a set of OWL files, includes some others that may not be needed.
  • Still in progress, need to do a fair bit of configuration and organization of the ontology project files.
  • Loading the ontologies takes about 25 minutes, but that does not include the NCBI taxon ontology, which takes too much memory and does not finish loading on palea.
  • Looking at loading a small subset (plants?) of the taxon

Ontologies loaded:

  • ChEBI, GO, PATO, Cell, PO, TO, ECO, GOREL, also Uberon, Uberon/phenoscape-anatomy, ncbi_taxonomy (114)
  • Need to include the Plant Experimental Conditions ontology (EO), and take out the CL and Uberon files
  • Several of the features, such as QuickGO, are not working. QuickGO is apparently hard-wired for GO, so we will not be able to adopt it.
  • 'Mappings' aka dbxrefs to SourceForge and other ontologies like PATO are not working either.
  • Filters are not being saved. Compared with the GO version: it does not work there either, so it was apparently designed this way.

CM: In the future, we will have one uber file- "planteome.owl" that will contain all the planteome core and partner (PATO, ChEBI, GO etc) ontologies.

  • Also could use only a slice of the GO, as there are a lot of non-plant relevant terms.

PJ: Could use the GR_tax-ontology.obo file that has many of the plant species, and should have all the pathogens- Note: GR_tax-ontology.obo has only green plants and red algae.

  • GR_tax-ontology.obo should have refs to NCBI as well, would have to be maintained here, has not been worked on for over two years

CM: We could extract a slice of the NCBI taxonomy, if we can define our criteria e.g. everything that we have an annotation to, should have all the plants and the pests and pathogens.
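The slicing criterion CM describes (keep every taxon we have an annotation to, together with its lineage) could be sketched roughly as follows. This is only an illustration under assumptions, not the project's tooling: the function name `taxonomy_slice`, the `parent_of` map, and the toy taxon names are all hypothetical. A real run would build the parent map from the NCBI taxonomy and collect the annotated taxon IDs from the association files.

```python
def taxonomy_slice(annotated_ids, parent_of):
    """Return the annotated taxa plus all of their ancestors.

    annotated_ids: iterable of taxon IDs that appear in annotations.
    parent_of: dict mapping each taxon ID to its parent ID
               (the root maps to itself or is missing from the dict).
    """
    keep = set()
    for taxon in annotated_ids:
        current = taxon
        # Walk up the lineage until we hit the root or a taxon already kept.
        while current is not None and current not in keep:
            keep.add(current)
            parent = parent_of.get(current)
            current = None if parent == current else parent
    return keep


# Toy, made-up hierarchy (names are placeholders, not real NCBI IDs).
toy_parents = {
    "root": "root",
    "plants": "root",
    "grasses": "plants",
    "rice": "grasses",
    "maize": "grasses",
    "fungi": "root",
    "rice_blast": "fungi",
    "animals": "root",
    "mouse": "animals",
}

# Suppose only rice and one of its pathogens have annotations.
slice_ids = taxonomy_slice({"rice", "rice_blast"}, toy_parents)
print(sorted(slice_ids))
```

In this toy case the slice keeps rice, its pathogen, and their ancestors, while unannotated branches such as the animal taxa drop out.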

Test set of annotations loaded

  • Loaded around 200,000 annotations from various ontologies, out of a total of ~ 4 million
  • Load times seem much better than the old AmiGO, at least for these annotations
  • Problems were found in some of the files, specifically in column 16 (annotation extensions) and in column 5 (ontology ID), where spaces are causing problems
  • JE corrected the issues he found in the files, but these revised files need to be committed back to the SVN- Done

Ontology         Species      Filename                              Issues?
PO-Anatomy       rice         po_anatomy_gene_oryza_gramene.assoc   no
PO-Anatomy       rice         po_anatomy_gene_oryza_poc.assoc       yes
PO-Anatomy       rice         po_anatomy_qtl_oryza_gramene.assoc    no
PO-Anatomy       maize        po_anatomy_gene_zea_MaizeGDB.assoc    no
PO-Anatomy       grape        po_anatomy_gene_vitis_poc.assoc       yes
PO-Anatomy       maize        po_anatomy_stock_zea_MaizeGDB.assoc   no
PO_Growth_Stage  rice         po_growth_gene_oryza_gramene.assoc    no
PO_Growth_Stage  rice         po_growth_qtl_oryza_gramene.assoc     no
PO_Growth_Stage  grape        po_growth_gene_vitis_poc.assoc        yes
Plant_Ontology   rice         po_ontology_IMP_gene_oryza_poc.assoc  yes
Trait_Ontology   Arabidopsis  to_diversity_arabidopsis.assoc        yes
Trait_Ontology   ?            to_Protein_association.assoc          no
Trait_Ontology   ?            to_QTL_association.assoc              no
Trait_Ontology   Rice         to_diversity_rice.assoc               no
Trait_Ontology   ?            to_Gene_association.assoc             no
Plant_EO         ?            eo_protein.assoc                      no
Plant_EO         rice         eo_diversity_rice.assoc               no
Plant_EO         ?            eo_qtl.assoc                          no
Plant_EO         Arabidopsis  eo_diversity_arabidopsis.assoc        no
Plant_EO         ?            eo_gene.assoc                         no
Gene_Ontology    rice         go_gramene_oryza.assoc                no

Need to standardize the names of the annotation files
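A check for the kind of problem JE found (stray spaces in column 5 and column 16) might look roughly like the sketch below. It assumes the .assoc files are tab-delimited in the GAF-style layout, with comment lines starting with "!". The function name `find_whitespace_problems`, the helper `make_row`, and the sample IDs are hypothetical placeholders, not taken from the real files.

```python
def find_whitespace_problems(lines, columns=(5, 16)):
    """Yield (line_number, column_number, value) for fields containing spaces.

    Column numbers are 1-based, as in the notes (5 = ontology ID,
    16 = annotation extension).
    """
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and "!" comment/header lines.
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        for col in columns:
            if col <= len(fields) and " " in fields[col - 1]:
                yield (line_number, col, fields[col - 1])


def make_row(term_id, extension=""):
    """Build a made-up 17-column row for demonstration only."""
    fields = ["DB", "ID1", "sym", "", term_id, "REF:1", "IMP", "", "A",
              "", "", "gene", "taxon:0", "20150416", "src", extension, ""]
    return "\t".join(fields)


sample = [
    "!gaf-version: 2.0",
    make_row("PO:0009005"),                           # clean row
    make_row("PO: 0009005"),                          # space inside the ID
    make_row("TO:0000207", "part_of(PO:0009005) "),   # trailing space in col 16
]

problems = list(find_whitespace_problems(sample))
for line_number, col, value in problems:
    print(f"line {line_number}, column {col}: {value!r}")
```

Running something like this over each file before loading would flag the offending lines without modifying them, so the corrections can still be made by hand and committed to the SVN.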

Browsing in AmiGO2

  • Type term into search box on front page, auto-fill feature will be useful
  • Column sorting can be customized
  • NCBI ids are not displayed correctly as the NCBI tree is not loaded
  • Need to check on "Source"- where is it pulling that from? It looks like it is using the namespaces from the terms.

  • "Panther Gene Families"- We can customize this and can use our own Inparanoid gene families, as Panther does not have too many plants represented
  • Inparanoid gene families- algorithm creates superclusters, based on ~70- 100 genomes, species by species comparison
  • Can input any standard tree format such as New Hampshire (Newick), phyloXML, etc.; could call it "Plant Gene Families" or similar
  • Can try with a subset- e.g. Arabidopsis, rice and maize and associate with the GO annotations

PJ: can include the other out groups, but do not serve the animal genes, but then we have to serve all the GO human/mouse etc annotations

  • Start with displaying only the plant data, but include all the species in the analysis. Can decide later if we want to include it as well
  • JE will do a baseline GO annotation for all the plant genomes, based on Interpro

PJ- Goal is a "Pan-Gene" database - centralized resource for nomenclature and management

  • Need an annotation platform- one idea is a "wiki" type platform
  • May look at the GO tool "Noctua"; CM can demo it on a future call. See link to [Noctua on GitHub]

C. Update from AISO/BisQue group- Justin Preece:

Please fill in notes here....


  • Yao Zhou has left Sinisa's lab
  • Xu Xu is currently a full time GRA on the project.

D. Update from Ontologies Working Group:

The OWG has been meeting roughly biweekly, around everyone's travel schedules- see the page for more details on the recent meetings.

Discussion about storing Ontologies and Associations at GitHub:

  • Working on plans for moving the ontologies to GitHub, still not completely decided how to do this. See meeting notes GitHub Mtg 4-3
  • Re: associations: PJ met a person (another Justin!) from GitHub on the plane; they are interested in hosting large datasets and offering them through APIs etc.
  • Question about whether or not we need to maintain the history. PJ: Ideally we would keep the history with the ontology files; it is not so important for the association files.
  • LC: Theoretically this is possible, but it is complicated. On the 4-3 GitHub call, JE said that he would reorganize the files on the SVN and try importing the history.
  • PJ: Suggest someone should send an email to GitHub and ask them how best to transfer the SVN history. Who is doing this?
  • One option would be to maintain the SVN and run a cron job to sync it to GitHub or vice versa, but this does not solve anything and adds a lot of extra maintenance work.
  • PJ can ask his contact "GitHub-Justin" if we have questions.

Note from after the call: see this link for [Announcing Git Large File Storage (LFS)]

Summary of Goals:

PJ: The goal is to have a version control system on GitHub for the ontology files (and possibly the annotations). Currently we are using the SVN on our local servers; it is publicly available for read access, and approved developers can get write access.

  • Each change generates a version # for tracking purposes.
  • To release the ontologies and data on the database and browser, the team JE/CM would take a snapshot or branch it

Updates on various collaborative projects:

Panzea dataset annotation in collaboration with MaizeGDB

  • large GWAS dataset 385,000 lines in MaizeGDB database
  • annotating with TO and PO terms
  • new interface at MaizeGDB includes some of the PO terms; adding additional ontology terms and adding TO
  • working with their developers so their users can browse the ontology hierarchy

Plant Disease Ontology

(will become part of Plant Stress Ontology)

  • new undergrad helper working on adding diseases
  • will also tie into the Panzea dataset as they also have diseases traits
  • Working with a new collaborating database group, PHI-base (Pathogen-Host Interaction database), an initiative from Rothamsted in England
  • Large set of manually curated literature
  • Covers wide range of plant species (and animals)
  • They are requesting terms and will make cross links to and from their database

E. Other Comments:

Leo has been working with JE and his local Sys Admin to get the SVN access working- maybe fixed by moving to GitHub

Next meeting Thursday May 21st