Data storage and AmiGO2 Working Group
Revision as of 23:26, 27 May 2015
Goals and Objectives:
Aim-3: Develop an online informatics portal and data warehouse for ontology-based, annotated plant genome data and plant genomes.
- Deliverables: A centralized portal for common reference ontologies for plants and the associated data sets. Novel data store and web user interface.
3.1 Planteome Web Portal Development
- Drupal portal will host the AmiGO browser, the ontology database (similar to the one developed by the PO and the GO), and a BioMart
- Transition to AmiGO 2.0 with new features
3.2 Planteome Data Warehouse Development
- Novel data warehouse for storing both the ontologies and annotation data based on NoSQL (e.g. MongoDB, http://www.mongodb.org, and Apache™ Hadoop®, http://hadoop.apache.org)
- Integrate the MapReduce programming model to increase scalability and performance
- Investigate using HDF (Hierarchical Data Format), as a storage format for any numerical or sequence-based data.
- Create an efficient way to add annotations incrementally to the database (not possible in the current AmiGO database)
- Implementation of OLAP (Online Analytical Processing) data cubes (http://en.wikipedia.org/wiki/OLAP_cube)
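A minimal sketch of how the MapReduce and incremental-loading ideas in 3.2 could fit together, written in plain Python rather than against MongoDB or Hadoop. The record fields (`gene`, `term_id`, `species`), the function names, and the term IDs are assumptions made for illustration and are not taken from the AmiGO schema. Each record is mapped to a count keyed by (term, species), partial counts are merged in a reduce step, and a later batch is folded into the existing totals without recomputing them.

```python
from collections import Counter
from functools import reduce


def map_annotation(record):
    """Map step: emit a single ((term_id, species), 1) pair for one record."""
    return Counter({(record["term_id"], record["species"]): 1})


def merge_counts(left, right):
    """Reduce step: merge two partial count tables by summing matching keys."""
    return left + right


def count_annotations(records, existing=None):
    """Fold a batch of records into an existing count table (or a new one)."""
    start = existing if existing is not None else Counter()
    return reduce(merge_counts, map(map_annotation, records), start)


# Illustrative records only; gene names and term IDs are placeholders.
first_batch = [
    {"gene": "gene_a", "term_id": "PO:0000001", "species": "Oryza sativa"},
    {"gene": "gene_b", "term_id": "PO:0000001", "species": "Oryza sativa"},
    {"gene": "gene_c", "term_id": "PO:0000002", "species": "Zea mays"},
]
later_batch = [
    {"gene": "gene_d", "term_id": "PO:0000001", "species": "Oryza sativa"},
]

counts = count_annotations(first_batch)
# Incremental load: only the new batch is processed, then merged.
counts = count_annotations(later_batch, existing=counts)
```

The (term, species) keys behave like the cells of a small two-dimensional data cube, so the same table could be rolled up along either dimension.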
3.3 Integration with the iPlant infrastructure
- Initial design and testing will happen locally at the Center for Genome Research and Biocomputing at Oregon State University
- Use of virtual machine (VM) images in the iPlant cloud computing environments
- Utilization of high performance computing resources, such as:
- The supercomputer 'Stampede' at Texas Advanced Computing Center (TACC)
- Use of iRODS at iPlant for data file storage and retrieval
- Image hosting via Bisque hosted on the iPlant infrastructure (See 3.4, below)
- Interaction with resources such as CoGE, Bisque, and the Integrated Breeding Platform (IBP)
3.4 Library of Publicly-Accessible, Annotated Digital Images
- Design a relational data schema to support the large-scale storage of annotated images (and their associated metadata)
- Image library main goal: A training set for a new auto-segmentation and annotation active-learning algorithm
- Support other visual analysis tools and the integration of image data with ontology data
- Will also function as a home for community-contributed image data
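One possible shape for the relational schema mentioned in 3.4, sketched with Python's built-in sqlite3 module. The table names, column names, file path, and term ID are hypothetical; the actual schema was not decided in these notes. The sketch keeps images and their ontology annotations in separate tables so one image can carry many annotated segments.

```python
import sqlite3

# Hypothetical two-table layout: one row per image, many annotations per image.
SCHEMA = """
CREATE TABLE image (
    image_id    INTEGER PRIMARY KEY,
    file_uri    TEXT NOT NULL,
    contributor TEXT
);
CREATE TABLE image_annotation (
    annotation_id INTEGER PRIMARY KEY,
    image_id      INTEGER NOT NULL REFERENCES image(image_id),
    term_id       TEXT NOT NULL,
    region        TEXT
);
"""


def build_library(conn):
    """Create the image and annotation tables."""
    conn.executescript(SCHEMA)


def add_image(conn, file_uri, contributor=None):
    """Insert an image row and return its generated ID."""
    cur = conn.execute(
        "INSERT INTO image (file_uri, contributor) VALUES (?, ?)",
        (file_uri, contributor),
    )
    return cur.lastrowid


def annotate(conn, image_id, term_id, region=None):
    """Attach an ontology term (and optional segment label) to an image."""
    conn.execute(
        "INSERT INTO image_annotation (image_id, term_id, region) VALUES (?, ?, ?)",
        (image_id, term_id, region),
    )


def images_for_term(conn, term_id):
    """Return the file URIs of all images annotated with the given term."""
    rows = conn.execute(
        "SELECT i.file_uri FROM image i "
        "JOIN image_annotation a ON a.image_id = i.image_id "
        "WHERE a.term_id = ? ORDER BY i.image_id",
        (term_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
build_library(conn)
example_id = add_image(conn, "images/example_leaf.png", "community")
annotate(conn, example_id, "PO:0000001", "segment-1")  # placeholder term ID
```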
3.5 Application Programming Interface (APIs)
- Development of publicly available APIs for both internal and external data access to ontology terms and annotations
- Extend the existing lightweight web services providing Plant Ontology terms, synonyms, and definitions to the Planteome APIs, including direct web service access to annotated data
- Potential Users:
- Gramene project- information about annotations and ontologies
- DOE KBase project (http://kbase.science.energy.gov/)
- iPlant tools and services.
- Integrate our data with other external APIs, for example:
- EBI (the Gene Expression Atlas, Ensembl Plants, IntAct)
- ERA-CAPS (genotype-to-phenotype data)
- DOE KBase
- GCP Integrated Breeding Platform
- Agave on iPlant which provides web-focused developer access to the iPlant data store and other integration services, providing a direct link to high-performance computing systems such as the TACC.
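To make the term-lookup part of 3.5 concrete, here is a small sketch of the response a lightweight term service might return. It is not the existing Plant Ontology web service: the function name `term_response`, the JSON field names, and the in-memory term store with its placeholder entry are all assumptions for illustration.

```python
import json

# Placeholder term store; a real service would query the ontology database.
TERMS = {
    "PO:0000001": {
        "name": "example term",
        "synonyms": ["example synonym"],
        "definition": "Placeholder definition text.",
    }
}


def term_response(term_id, terms=TERMS):
    """Build a (status_code, json_body) pair for a term lookup request."""
    term = terms.get(term_id)
    if term is None:
        return 404, json.dumps({"error": "term not found", "id": term_id})
    body = {"id": term_id, **term}
    return 200, json.dumps(body, sort_keys=True)


status, body = term_response("PO:0000001")
```

Returning the status code alongside the body keeps the lookup logic independent of whichever web framework ends up serving it.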
Participants
- Jaiswal Lab (OSU, BPP): Justin Elser
- Mungall Group (Lawrence Berkeley National Laboratory): Chris Mungall (Co-PI), Seth Carbon
- Zhang Lab (OSU, EECS): Eugene Zhang (Co-PI), Botong Qu (CS Ph.D. student)
Data storage and AmiGO 2 Working Group Meetings:
Data storage and AmiGO 2 call 1-30-15
- Who: PJ, CM, Seth, EZ, LC, JP, JE
- Discussion of the planned transition to the AmiGO 2.0 platform
- JE is working on installing the Solr database - View details and progress reports here: AmiGO2_install
Data Storage and AmiGO2 call 2-18-15
- Who: JE, JP, EZ
- Link to video recording: Media:Data_Storage_2-18-15.mp4
- Link to audio recording: File:Data Storage Meeting audio only 2-18-15.m4a
- Further discussion of AmiGO2 progress and overview of AmiGO2 interface
Planteome Interface meeting 3-2-15
Who: EZ, LC, JE, PJ ...??
- Link to video recording: File:Planteome Interface mtg video 3-2-15.mp4
- Link to audio recording: File:Planteome Interface mtg audio 3-2-15.m4a
- EZ: First thing: focus of the project, speed up queries, data visualization, data organization
- User Interface: it would be good to see how the users operate and what kinds of questions they would ask.
- AmiGO 2: could build an enhanced interface so users can inspect their data and build queries.
- what functionalities are needed?
- Questions:
- How will the users access the interface?
- How will they use the interface?
Involvement of other organisms? e.g. viruses, animals, etc
- Could the tools be expanded to other organisms
- Crops are interacting with the environment, other organisms
AmiGO2 introduction
- show filtered image
- Cytoscape SIF file, ontology network
- bring in other ontologies, Rice Xanthomonas example
- ontology formats
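For reference, a Cytoscape SIF (simple interaction format) file lists one edge per line as source node, relationship type, and target node. The sketch below renders ontology edges in that layout using tab delimiters. The function name and the example edges are illustrative placeholders, not data shown at the meeting.

```python
def to_sif(edges):
    """Render (source, relationship, target) triples as tab-delimited SIF lines."""
    return "\n".join(f"{src}\t{rel}\t{dst}" for src, rel, dst in edges)


# Placeholder ontology edges: child term, relationship, parent term.
example_edges = [
    ("PO:0000002", "is_a", "PO:0000001"),
    ("PO:0000003", "part_of", "PO:0000001"),
]

sif_text = to_sif(example_edges)
```

Writing `sif_text` to a file with a `.sif` extension should give something Cytoscape can import as a network.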
Data storage and AmiGO2 Working Group Meeting 4-16-15
Who: EZ, BQ, PJ, JP, LC, JE
- Link to video recording: File:Data storage and AmiGO2 video 4-16 15.mp4
- Link to audio recording: File:Data storage and AmiGO2 audio 4-16 15.m4a
- Introductions
- Storing code developed on Planteome project on Github
- posting notes on wiki
- data visualization aspect within AmiGO2- work with JE on
- Data annotation- wikipedia style?
- different levels of access
- To do: Send link to Eugene and Botong- circulate the developer position ad
Relevant links:
- https://github.com/geneontology/amigo
- Demo: http://amigo.geneontology.org/
- http://amigo2.berkeleybop.org/ - dev server
Data storage and AmiGO2 Working Group meeting 5-1-15
Who: PJ, BQ, EZ, CM, JE
- Link to video: File:Data Storage + AmiGO2 meeting 5-1-15-video.mp4
- Link to audio: File:Data Storage + AmiGO2 meeting 5-1-15-audio.m4a
Demo of GO enrichment tool
- At the GO website (www.geneontology.org)
- See how it works there compared to the one at AGRIGO (http://bioinfo.cau.edu.cn/agriGO/analysis.php). See if your NIH proposal work with Yanmin Di overlaps and can be adopted.
- Set of genes: a list of Arabidopsis thaliana genes is available in the attached file (sent by email).
- Gene pages at Gramene e.g. http://archive.gramene.org/db/genes/search_gene?acc=GR:0101175 or elsewhere
- Start thinking about building a gene database, with different access levels for admins and curators, limited in what fields are allowed to be edited by what level.
This is for building the community gene annotation tools. The annotated data from this database can be synced with the AmiGOdb version that Justin Elser and Chris are working on.
I hope this will give Bo enough breadth of work until the AmiGO build and datasets mature.
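A rough sketch of the field-level access idea for the gene database, where each access level may edit only certain fields. The role names beyond admin and curator, the field names, and the function `apply_edit` are hypothetical; the real set of levels and fields was still to be worked out.

```python
# Hypothetical mapping from access level to the fields it may edit.
EDITABLE_FIELDS = {
    "admin": {"symbol", "description", "ontology_terms", "status"},
    "curator": {"description", "ontology_terms"},
    "viewer": set(),
}


def apply_edit(record, role, changes):
    """Return an updated copy of record, or raise if the role lacks permission."""
    allowed = EDITABLE_FIELDS.get(role, set())
    denied = sorted(set(changes) - allowed)
    if denied:
        raise PermissionError(f"role {role!r} may not edit: {', '.join(denied)}")
    updated = dict(record)
    updated.update(changes)
    return updated


gene = {"symbol": "gene_a", "description": "", "ontology_terms": [], "status": "draft"}
curated = apply_edit(gene, "curator", {"description": "curated description"})
```

Rejecting the whole edit when any field is out of bounds keeps a record from ending up partially updated.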
BinGO http://bioinformatics.oxfordjournals.org/content/21/16/3448.full AgriGO http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2896167/
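As background for studying the enrichment tools, term enrichment analysis typically asks whether a term is annotated to more genes in a submitted gene set than would be expected by chance. A one-sided hypergeometric test is a common choice for this; the sketch below implements that test only and is not a description of how the GO website or agriGO compute their results. The function and parameter names are illustrative, and multiple-testing correction is left out.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes annotated with the term
    sample:     number of genes in the submitted gene set
    hits:       genes in the submitted set annotated with the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 3 annotated, all 3 sampled genes are hits.
p = enrichment_p_value(population=10, annotated=3, sample=3, hits=3)
```

With these toy numbers only one of the 120 possible three-gene samples contains all three annotated genes, so `p` equals 1/120.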
Data storage and AmiGO2 Working Group Meeting 5-15-15
Who: PJ, EZ, BQ, JE, CM
- Continued the demonstration of how the enrichment analysis tools work (agriGO).
- Discussed the language and character problem on the page.
- Agreed to start by designing the requirements specification.
- Share the old database schema with EZ and BQ.
- Showed the annotation database to EZ and BQ (unfinished, to be continued).
EZ and BQ will start by analyzing the old database schema and studying the enrichment analysis algorithms. The first goal of the group is to show the search results in tables; the next step will be to visualize them as a graph.
Data storage and visualization Working Group Meeting 5-27-15
Who: PJ, EZ, BQ, JE, LC, 2 REU students (Christian and Marquis)
Link to recordings (in 2 parts): Audio P.1: File:Planteome Data Vis Mtg p1 5-27-15-audio only.m4a
Video P.1: File:Planteome Data Vis Mtg p1 5-27-15-video.mp4
Video P.2: File:Data Visualization 5-27-15 part 2.mp4
- Explanation of ontologies to new students
- Overview of older AmiGO browser
- Show planteome.cgrb tool for visualizing homology cluster heatmaps and orthology http://planteome.cgrb.oregonstate.edu/heatmap/infoChoice.php
- Used gene set from above to show AgriGO usage