Mar 1st, 2016 Ontology Working Group Meeting
Latest revision as of 19:34, 7 March 2016
Planteome Ontology WG Zoom Meeting
- Date: Tuesday Mar. 1st, 2016
- Time: 8:15am PST (GMT-8)
- Connection details: Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/762470789
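- Links to recordings:
- Audio: File:Ontology WG Meeting 3-1-2016 audio.m4a
- Video: File:Ontology WG Meeting 3-1-2016 video.mp4
- Attendees: LC, AM, PJ, CM, JE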
Changes and updates in the IBP files
- Review of progress with building the Planteome version of AmiGO with the CO classes merged in
- Three files are currently integrated: rice, cassava, lentil
- Wheat: newly mapped and will be added soon, once the issues with the ID numbers are resolved
- Latest changes: added the crop common name to all terms, removed the categorical classes (e.g., Agronomical Traits), and made all CO class names lowercase
- The mappings to TO (is_a) are on the trait class, but some remain on the cassava variables, as seen in the browser. These were originally added because the variable was not connected, but they can now be removed.
- In the cassava file, the variable has an is_a relationship to the trait because the variable_of, method_of and scale_of relations are not being displayed. This is a workaround to make the variable show as a child; it should revert to the actual relationships once this is sorted out in AmiGO.
- Note: unusual synonyms in AmiGO cause odd behavior with autocomplete (not sure what this means).
- Questions and discussions:
- Have we decided on how and where the CO<->TO mappings will be stored? At the Corvallis meeting we hacked something into a pseudo-TD5 file.
- Currently the mappings are being done in a Google spreadsheet, which is then converted to an OBO file (see below).
- From CM: Marie, it looks like you are generating files such as lentil_withcropname.obo, which have the bridging axioms merged in. This looks great. Not sure if these are experimental or ready to go, but this is exactly what we need. We should just decide on a standard naming convention, update the READMEs, and maybe set up Travis for each CO repo.
From MAL:
- About the creation of the lentil file: I have a script in Java that takes a TD v5 (in Excel format) and creates an OBO file; I use Apache POI and obo2owl for that. Then I have a second script that adds the mappings to the OBO file. I get the mappings from a spreadsheet on Google Drive. Eventually the mappings will be stored directly in the TD v5.
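The second step MAL describes, adding spreadsheet mappings to the OBO file, can be sketched as follows. This is a minimal Python illustration and not MAL's Java/Apache POI script. It assumes that the Google Drive spreadsheet has been exported as CSV with hypothetical columns `co_id` and `to_id`, and that each mapping becomes an `is_a` line in the matching `[Term]` stanza. The CO and TO IDs below are placeholders.

```python
import csv
import io
import re

def load_mappings(csv_text):
    """Read CO->TO mappings from a CSV export (hypothetical columns: co_id, to_id)."""
    mappings = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        mappings.setdefault(row["co_id"].strip(), []).append(row["to_id"].strip())
    return mappings

def add_mappings(obo_text, mappings):
    """Append an is_a line for each mapped TO term to the matching [Term] stanza."""
    stanzas = re.split(r"\n\s*\n", obo_text.strip())  # OBO stanzas are separated by blank lines
    result = []
    for stanza in stanzas:
        lines = stanza.splitlines()
        if lines and lines[0].strip() == "[Term]":
            term_id = next((l[3:].strip() for l in lines if l.startswith("id:")), None)
            # Parents already present, ignoring trailing "! label" comments.
            parents = {l[5:].split("!")[0].strip() for l in lines if l.startswith("is_a:")}
            for to_id in mappings.get(term_id, []):
                if to_id not in parents:  # skip duplicates so reruns are harmless
                    lines.append(f"is_a: {to_id}")
        result.append("\n".join(lines))
    return "\n\n".join(result) + "\n"

# Placeholder IDs, for illustration only.
csv_text = "co_id,to_id\nCO_999:0000001,TO:0000001\n"
obo_text = "format-version: 1.2\n\n[Term]\nid: CO_999:0000001\nname: plant height\n"
print(add_mappings(obo_text, load_mappings(csv_text)))
```

Checking for existing parents before appending makes the step safe to rerun whenever the spreadsheet changes.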
PJ's Vision:
- Vision: TD --> CSV --> Chris' tool --> merged version --> read-only crop specific
- Wants crop-specific groups to be able to pull down a slice of the TO that includes all of their crop-specific terms and the associated TO hierarchy, but none of the traits that are not specific to their crop of choice. His goal is to bypass the CO OBO file and use just the trait dictionary (TD), avoiding CO.obo and reducing overhead for MAL.
- MAL doesn't edit CO.obo, so the overhead in the current workflow is already reduced.
- No explanation of what "Chris' tool" is, or how this would actually work.
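One way to read the "slice" idea is as an upward closure: start from a crop's terms and keep only those terms plus all of their is_a ancestors. The sketch below computes that closure over a toy hierarchy. The term IDs and the function name are hypothetical, and this does not describe "Chris' tool".

```python
from collections import deque

def ancestor_closure(seeds, parents):
    """Return the seed terms plus every ancestor reachable through is_a edges.

    `parents` maps a term ID to the list of its direct is_a parents.
    """
    keep = set(seeds)
    queue = deque(seeds)
    while queue:
        term = queue.popleft()
        for parent in parents.get(term, []):
            if parent not in keep:
                keep.add(parent)
                queue.append(parent)
    return keep

# Toy hierarchy with hypothetical IDs: two crop-specific terms under shared TO classes.
parents = {
    "cassava:root_weight": ["TO:root_trait"],
    "rice:grain_length": ["TO:grain_trait"],
    "TO:root_trait": ["TO:plant_trait"],
    "TO:grain_trait": ["TO:plant_trait"],
}
cassava_slice = ancestor_closure(["cassava:root_weight"], parents)
print(sorted(cassava_slice))
```

The traversal only walks upward, so traits specific to other crops (here the rice term) never enter the cassava slice.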
- Chris argued for having PURLs for each of the CO OBO files.
- There is a replacement of "prefixes" (/ibp-cassava-traits/) that puts in the true link to raw.githubusercontent.com/planteome, etc.
- We need to be clear on our naming for the PURLs.
- PURLs are stored here: https://github.com/OBOFoundry/purl.obolibrary.org/blob/master/config/to.yml
- From the chat: MAL should use ontology:to/ibp-FOO-traits/FOO.owl
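A PURL "prefix" rule rewrites the beginning of a requested path to a target location and keeps the rest of the path. The Python sketch below mimics that idea. The rule table, the placeholder GitHub repository, and the `/obo/to/` base path are assumptions, not the actual contents of to.yml. Picking the longest matching prefix is a choice made for this sketch, not necessarily how the OBO PURL server orders its rules.

```python
def resolve(path, rules):
    """Return the redirect target for a PURL path, or None if no prefix rule matches.

    When several prefixes match, this sketch picks the longest one.
    """
    matches = [prefix for prefix in rules if path.startswith(prefix)]
    if not matches:
        return None  # a real PURL server would answer 404 here
    prefix = max(matches, key=len)
    return rules[prefix] + path[len(prefix):]

# Hypothetical rule following the ibp-FOO-traits/FOO.owl naming, with FOO = cassava.
rules = {
    "/obo/to/ibp-cassava-traits/":
        "https://raw.githubusercontent.com/EXAMPLE-ORG/EXAMPLE-REPO/master/",
}
print(resolve("/obo/to/ibp-cassava-traits/cassava.owl", rules))
```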
Update on AmiGO APIs (Justin Elser)
- API code changes have been merged into our version of AmiGO (dev.planteome.org)
- These APIs will live on AmiGO; BisQue, AISO, and anyone else who wants to can use them to pull definitions and for autocomplete.
- Does not work on browser.planteome.org, since it will require a reload of all data because of a schema change to Solr.
Still fixing some bugs:
- Small issue with the main readme page not showing up, but the API itself is working:
- i.e., http://dev.planteome.org/api doesn't work
- http://dev.planteome.org/api/entity/term/GO:0022008 works and returns JSON
- Still testing speed and working on fixing the readme page
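A client such as BisQue or AISO would call the term endpoint quoted above. This minimal Python sketch builds the request URL and decodes the JSON body using only the standard library. It assumes nothing about the response's field names. Because dev.planteome.org was a 2016 development host, `fetch_term` may no longer get a response.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "http://dev.planteome.org/api"  # development host named in the notes (2016)

def term_url(term_id, base=API_BASE):
    """Build the /entity/term/<ID> URL, percent-encoding the ID but keeping the colon."""
    return f"{base}/entity/term/{urllib.parse.quote(term_id, safe=':')}"

def parse_term(body):
    """Decode a JSON response body (bytes or str) into Python objects."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return json.loads(body)

def fetch_term(term_id, timeout=10):
    """Fetch and decode one term; raises urllib.error.URLError if the host is unreachable."""
    with urllib.request.urlopen(term_url(term_id), timeout=timeout) as resp:
        return parse_term(resp.read())

print(term_url("GO:0022008"))
```

Calling `fetch_term("GO:0022008")` would return the decoded JSON if the endpoint is still live.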
Other AmiGO Issues:
- PURLs needed again (catalogs can fix this; they are in XML format)
- Do a redirect within the catalogs
- Put the catalog on GitHub that maps all logical definitions onto the dev versions; the catalog can also ensure that Protégé pulls the dev versions of all the imported ontologies
- More frequent releases will ensure that the TO doesn't reference updated PO terms before the PO is released; the idea is to eventually release PO, TO, etc. simultaneously.
- Ideally, releases would be monthly
- iPlant mirror working at: http://draco.cyverse.org/amigo
- Some issues, such as slow response; let the iPlant folks know.
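Protégé reads an XML catalog file (conventionally catalog-v001.xml, next to the ontology) that redirects import IRIs to other locations. That is how a catalog can make imports resolve to dev versions. The Python sketch below writes such a catalog. The two OBO PURLs are used only as example import IRIs, and the relative dev paths on the right are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"
ET.register_namespace("", CATALOG_NS)  # serialize catalog elements with a default xmlns

def build_catalog(redirects):
    """Return catalog XML that maps each import IRI (name) to another location (uri)."""
    root = ET.Element(f"{{{CATALOG_NS}}}catalog", {"prefer": "public"})
    for iri, location in sorted(redirects.items()):
        ET.SubElement(root, f"{{{CATALOG_NS}}}uri", {"name": iri, "uri": location})
    return ET.tostring(root, encoding="unicode")

def write_catalog(path, redirects):
    """Write the catalog to disk with an XML declaration."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        handle.write(build_catalog(redirects) + "\n")

# Hypothetical redirects: resolve PO and TO imports to local dev checkouts.
redirects = {
    "http://purl.obolibrary.org/obo/po.owl": "../../plant-ontology/po.owl",
    "http://purl.obolibrary.org/obo/to.owl": "../../plant-trait-ontology/to.owl",
}
print(build_catalog(redirects))
```

Keeping this file on GitHub next to the editor's ontology means every editor who opens it in Protégé resolves imports the same way.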
Visit to IRRI March 7th-11th
- Leo, Marie-Angelique and Austin will visit IRRI Monday March 7th to Friday March 11th
- Goals: TD revision and data annotation:
- MAL and Leo will be working on the trait dictionary
- AM: annotate 140K+ germplasm entries with PD and TO mapping data
- Discussion from email on the need for an annotation tool:
- EA: "We discussed the visit of Marie, Léo and Austin and the agenda we prepared. I indicated to Mau that the team would like to see how IRRI data annotations could go on Planteome. Mau asked where Planteome stands with the annotation tools, because this is what IRRI has been waiting for for a long time. Would you have any points you want the team to share with IRRI on that aspect?"
- MAL/LV: "We need a simple tool that scientists will be able to use to easily annotate their data. Ad hoc scripts are short-term, one-shot solutions that will serve Planteome but not necessarily IRRI, because the GAF2 format that AmiGO 2 requires is not known to the breeding community, and the annotated data will be published on Planteome and not in the IRRI information systems. We should take the opportunity of being at IRRI to provide some guidance on how to include the ontologies in their data model. Annotating the data at the source is beneficial for everyone: IRRI information systems will have interoperable and discoverable data, and Planteome can easily add future annotations from the IRRI systems."
Discussion:
- From PJ: MAL/LV have submitted forms for international travel; if approved, the travel will be covered. American carriers are required, so PJ will look into other routes.
Notes:
- Goal is to have the data annotated and the trait dictionary in its final version
- Add breeding traits to the TD; currently only genebank traits are in there.
- Data annotation: identify datasets for Planteome
- Genebank data, breeding database, "breeding for rice" (similar to data in BMS)
- http://www.irgcis.irri.org:81/grc/AboutIRGCIS.htm
- The 3K rice genomes are in another database (PJ: no trait data, only sequencing data)
- Unless the seed packets of the sequenced varieties have IRGC phenotypes in their collection. Add EVERYTHING to Planteome:
- environment, phenotypes, observed scales/variables, original location, location assay, treatment metadata
- Whatever they have, we will get it included; the very first pass will be similar to cassava and lentil
- For anything that doesn't fit in the first pass, get the data, and we can evaluate how to make the additional data fit the GAF2 format (they might not be comfortable giving ALL the data, but get what I can; access to images would be great)
- IRGC packets: http://www.irgcis.irri.org:81/grc/SearchData.htm
- IR64 mutant collection: another database, plus any other high-throughput phenotyping data
- 64K mutant lines with phenotype data
- C3-C4 mutant collection: any data on this would be great to get as well.
Genomic & Open-source Breeding Informatics Initiative (GOBII)
- What is GOBII (http://cbsugobii05.tc.cornell.edu/wordpress/)? Learn as much about it as possible
- Creates field books and templates in a standard way for phenotype collection; it is more about providing tools than collecting data. Push for them to use the TO/CO ontologies in GOBII.
Tentative Schedule
- Monday (Day 1), morning: Austin presents Planteome
- Tuesday (Day 2): show how the lentil and cassava data have been annotated
- GAF2 format: column 16 is not unstructured; it includes data relationships
- Visit the genebank and fields; see the phenotyping platform
- They are doing some automated phenotyping with UAVs (plant physiology, soil physiology)
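The point about column 16 refers to the GAF 2.x annotation extension field. That field holds relational expressions of the form relation(ID): commas join expressions that must all hold, and pipes separate alternative sets. Below is a minimal Python parser for one tab-separated GAF 2 line. Every value in the example row, including the IDs and the aspect letter, is a hypothetical placeholder.

```python
def parse_extension(field):
    """Split a GAF 2.x column-16 value into alternative sets of (relation, id) pairs."""
    if not field.strip():
        return []
    alternatives = []
    for group in field.split("|"):          # "|" separates alternatives (OR)
        pairs = []
        for expr in group.split(","):       # "," joins conditions (AND)
            relation, _, rest = expr.strip().partition("(")
            if not rest.endswith(")"):
                raise ValueError(f"malformed extension: {expr!r}")
            pairs.append((relation, rest[:-1]))
        alternatives.append(pairs)
    return alternatives

def annotation_extension(gaf_line):
    """Return the parsed column 16 (index 15) of a tab-separated GAF 2.x line."""
    columns = gaf_line.rstrip("\n").split("\t")
    return parse_extension(columns[15]) if len(columns) > 15 else []

# Hypothetical 17-column row; only column 16 matters here.
row = ["DB", "G1", "IR64", "", "TO:0000001", "REF:1", "IEP", "", "T", "", "",
       "germplasm", "taxon:4530", "20160301", "Planteome",
       "occurs_in(PO:0000001),has_participant(CHEBI:0000001)|occurs_in(PO:0000002)", ""]
print(annotation_extension("\t".join(row)))
```

Because the field is structured, it can carry relationships such as where a trait was observed, rather than free-text notes.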
Bottom line: look at all the data they are collecting, and see if we can get it integrated into our workflow.
We want to attempt to get a meeting with "homebase":
- The Philippines is 16 hours ahead of Corvallis (GMT+8 vs. PST, GMT-8): could meet Wednesday (0800) Philippines = Tuesday (1600) Corvallis
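The proposed slot can be double-checked with fixed UTC offsets. Manila is UTC+8, and during the visit week Corvallis is still on PST (UTC-8), because US daylight saving time began on March 13, 2016. The sketch hard-codes those two offsets instead of using a time-zone database, so it is valid only for that week.

```python
from datetime import datetime, timedelta, timezone

MANILA = timezone(timedelta(hours=8), "PHT")      # Philippine time, no DST
CORVALLIS = timezone(timedelta(hours=-8), "PST")  # valid for 7-11 March 2016 only

# Wednesday 9 March 2016, 08:00 in Manila
manila_meeting = datetime(2016, 3, 9, 8, 0, tzinfo=MANILA)
corvallis_meeting = manila_meeting.astimezone(CORVALLIS)
print(corvallis_meeting.strftime("%A %d %B %Y, %H:%M %Z"))
# prints: Tuesday 08 March 2016, 16:00 PST
```

The 16-hour gap is why an early-morning slot in Manila falls on the previous afternoon in Corvallis.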
Austin: work with their staff to get the phenotyping data format.
Find the exact IDs of their varieties: we need the EXACT seed packet IDs (publicly available IDs to link out to from Planteome); basically, we need to find out how they store data.
Phenotype RCN meeting, February 26-28, 2016 (http://www.phenotypercn.org/?page_id=2750)
- PJ attended
Status of second year funding
- PJ: first-year report accepted; will follow up with NSF to find out about second-year funding
- PJ update:
- Annual report approved
- Once the second-year money is released: amendments to the subcontracts (Elizabeth, Chris, etc.)
Following items tabled for next meeting:
Recent updates on the TO
- Revisions to equivalence axioms: occurs_in and composition
- Stem and culm
- Anther morphology traits, including anther number
- Biochemical branch
OBA development (Chris Mungall)
- If there is time I would like to walk through the procedure we use to develop OBA.
- Part of this was covered in the tutorial on template-based ontology development, but we had to rush this part due to lack of time:
https://github.com/Planteome/protege-tutorial/tree/master/template-examples
- For OBA, the source of the ontology is primarily in TSVs, found here:
https://github.com/obophenotype/bio-attribute-ontology/tree/master/src/ontology/modules
- The design patterns are specified here:
https://github.com/obophenotype/bio-attribute-ontology/tree/master/src/ontology/patterns
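- Together these are used to build the ontology with equivalence axioms, with the entire ontology hierarchy being inferred automatically, e.g.:
http://www.ebi.ac.uk/ols/beta/ontologies/oba/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FOBA_VT0000017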
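The template approach can be illustrated with a toy version. Each TSV row fills the variables of a design pattern to produce a label and an equivalence axiom, and a reasoner then infers the hierarchy from those axioms (the reasoning step is not shown here). The pattern text, the `characteristic_of` relation name, the column names, and the EX: IDs are hypothetical simplifications. They are not copied from the OBA patterns directory, and real builds use dedicated tooling rather than this script.

```python
import csv
import io

# Hypothetical design pattern: an attribute of an entity.
PATTERN = {
    "label": "{entity_label} {attribute_label}",
    "equivalent_to": "{attribute} and (characteristic_of some {entity})",
}

def apply_pattern(tsv_text, pattern=PATTERN):
    """Yield (defined_class, label, Manchester-style equivalence expression) per TSV row."""
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        yield (
            row["defined_class"],
            pattern["label"].format(**row),          # unused columns are ignored
            pattern["equivalent_to"].format(**row),
        )

tsv_text = (
    "defined_class\tentity\tentity_label\tattribute\tattribute_label\n"
    "EX:0000001\tEX:1000001\troot\tEX:2000001\tlength\n"
)
for defined_class, label, axiom in apply_pattern(tsv_text):
    print(defined_class, "|", label, "| EquivalentTo:", axiom)
```

Editors maintain only the TSV rows, and every generated class follows the same logical shape. This uniform shape is what lets the hierarchy be inferred rather than asserted by hand.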
Upcoming Meetings and Workshops
Biocuration 2016, April 10th-14th, 2016; Geneva, Switzerland
- MAL is going
GARNet/Egenis Workshop: Integrating Large Data into Plant Science, April 21st-22nd 2016, Dartington Hall, Totnes, Devon
- Elizabeth and George are going, EA will present Planteome as part of her talk
Meeting in Montpellier, 9-13 May 2016
Link to tentative agenda/website: https://sites.google.com/a/cgxchange.org/cropontologycommunity/home
BioOntologies SIG of the Intelligent Systems for Molecular Biology (ISMB); July 8-12, 2016, Orlando, Florida
- Dates: July 8th and 9th, with July 9th being the “Phenotype Day”, focused on the systematic description of phenotypes.
- Short papers, up to 4 pages (will be published in JBMS)
- Poster abstracts, up to 1 page
- Flash updates, up to 1 page
7th International Conference on Biological Ontology and BioCreative 2016, Aug 1st to 4th, Corvallis, OR
- Link to program: ICBO + BioCreative Program
- Link to Easy Chair site: https://easychair.org/conferences/conference_info.cgi?a=10776589