Data storage and AmiGO2 Working Group

From Planteome.org
Revision as of 10:25, 20 May 2015 by Cooperl (Talk | contribs) (May 1st meeting)


Goals and Objectives:

Aim-3: Develop an online informatics portal and data warehouse for ontology-based, annotated plant genome data and plant genomes.

  • Deliverables: A centralized portal for common reference ontologies for plants and the associated data sets. Novel data store and web user interface.

3.1 Planteome Web Portal Development

  • Drupal portal will host the AmiGO browser, the ontology database (similar to the one developed by the PO and the GO), and a BioMart
  • Transition to AmiGO 2.0 with new features

3.2 Planteome Data Warehouse Development

  • Novel data warehouse for storing both the ontologies and annotation data based on NoSQL (e.g. MongoDB, http://www.mongodb.org, and Apache™ Hadoop®, http://hadoop.apache.org)
  • Integrate the MapReduce algorithm to increase scalability and performance
  • Investigate using HDF (Hierarchical Data Format) as a storage format for any numerical or sequence-based data
  • Create an efficient way to add annotations incrementally to the database (not possible in the current AmiGO database)
  • Implementation of OLAP (Online Analytical Processing) data cubes (http://en.wikipedia.org/wiki/OLAP_cube)
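
To make the MapReduce item above concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, counting annotations per ontology term. The record layout, the identifiers (`gene_a`, `TERM:1`), and all function names are hypothetical and chosen only for illustration; a real deployment would run the same phases on a framework such as Hadoop rather than in plain Python.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical annotation records: (gene_id, ontology_term_id).
ANNOTATIONS = [
    ("gene_a", "TERM:1"),
    ("gene_b", "TERM:1"),
    ("gene_a", "TERM:2"),
]


def map_phase(record):
    """Emit one (term_id, 1) pair per annotation record."""
    _gene_id, term_id = record
    yield term_id, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one term."""
    return key, sum(values)


def count_annotations_per_term(records):
    """Run map, shuffle, and reduce in sequence over the records."""
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each record is mapped independently and each term is reduced independently, both phases can be split across machines, which is where the scalability gain would come from.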

3.3 Integration with the iPlant infrastructure

  • Initial design and testing will happen locally at the Center for Genome Research and Biocomputing at Oregon State University
  • Use of virtual machine (VM) images in the iPlant cloud computing environments
  • Utilization of high performance computing resources, such as:
    • The supercomputer 'Stampede' at Texas Advanced Computing Center (TACC)
    • Use of iRODS at iPlant for data file storage and retrieval
    • Image hosting via Bisque hosted on the iPlant infrastructure (See 3.4, below)
  • Interaction with resources such as CoGE, Bisque, and the Integrated Breeding Platform (IBP)

3.4 Library of Publicly-Accessible, Annotated Digital Images

  • Design a relational data schema to support the large-scale storage of annotated images (and their associated metadata)
  • Image library main goal: A training set for a new auto-segmentation and annotation active-learning algorithm
  • Support other visual analysis tools and the integration of image data with ontology data
  • Will also function as a home for community-contributed image data

3.5 Application Programming Interface (APIs)

  • Development of publicly available APIs for both internal and external data access to ontology terms and annotations
  • Extend the existing lightweight web services providing Plant Ontology terms, synonyms, and definitions to the Planteome APIs, including direct web service access to annotated data
  • Potential Users:
  • Integrate our data with other external APIs, for example:
    • EBI (the Gene Expression Atlas, Ensembl Plants, IntAct)
    • ERA-CAPS (genotype-to-phenotype data)
    • DOE KBase
    • GCP Integrated Breeding Platform
    • Agave on iPlant, which provides web-focused developer access to the iPlant data store and other integration services, with a direct link to high-performance computing systems such as those at TACC

Participants

  • Jaiswal Lab (OSU, BPP): Justin Elser
  • Mungall Group (Lawrence Berkeley National Laboratory): Chris Mungall (Co-PI), Seth Carbon
  • Zhang Lab (OSU, EECS): Eugene Zhang (Co-PI), Botong Qu (CS Ph.D. student)

Data storage and AmiGO 2 Working Group Meetings:

Data storage and AmiGO 2 call 1-30-15

  • Who: PJ, CM, Seth, EZ, LC, JP, JE
  • Discussion of the planned transition to the AmiGO 2.0 platform
  • JE is working on installing the Solr database. View details and progress reports here: AmiGO2_install

Data Storage and AmiGO2 call 2-18-15


  • Further discussion of AmiGO2 progress and overview of AmiGO2 interface

Planteome Interface meeting 3-2-15

Who: EZ, LC, JE, PJ ...??

  • EZ: The first thing is the focus of the project: speeding up queries, data visualization, data organization
  • User interface: it would be good to see how the users operate and what kinds of questions they would ask.
  • AmiGO 2: could build an enhanced interface so users could inspect their data and build queries.
  • What functionalities are needed?
  • Questions:
    • How will the users access the interface?
    • How will they use the interface?

Involvement of other organisms? e.g. viruses, animals, etc

  • Could the tools be expanded to other organisms?
  • Crops interact with the environment and with other organisms

AmiGO2 introduction

  • show filtered image
  • Cytoscape Sif file, ontology network
  • bring in other ontologies, Rice Xanthomonas example
  • ontology formats
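
As an illustration of the Cytoscape SIF item above: a SIF file lists one interaction per line as source node, relationship type, and target node. The sketch below writes ontology edges in the tab-delimited form. The term identifiers and the function name are hypothetical placeholders, not real ontology IDs or an existing AmiGO2 export routine.

```python
# Hypothetical ontology edges: (child_term, relationship, parent_term).
EDGES = [
    ("TERM:2", "is_a", "TERM:1"),
    ("TERM:3", "part_of", "TERM:1"),
]


def edges_to_sif(edges):
    """Return SIF text with one 'source<TAB>relation<TAB>target' line per edge."""
    return "".join(
        f"{source}\t{relation}\t{target}\n" for source, relation, target in edges
    )


if __name__ == "__main__":
    # Print the SIF text; redirect to a .sif file to load it in Cytoscape.
    print(edges_to_sif(EDGES), end="")
```

Tabs are used as the delimiter so that node names containing spaces would still parse as a single field.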

Data storage and AmiGO2 Working Group Meeting 4-16-15

Who: EZ, BQ, PJ, JP, LC, JE

  • Introductions
  • Storing code developed on the Planteome project on GitHub
  • Posting notes on the wiki
  • Data visualization aspect within AmiGO2: work with JE on this
  • Data annotation: Wikipedia style?
  • Different levels of access
  • To do: Send link to Eugene and Botong; circulate the developer position ad

Relevant links:


Data storage and AmiGO2 Working Group meeting 5-1-15

Who:

Demo of GO enrichment tool

  • Set of genes: a list of Arabidopsis thaliana genes is available in the attached file (sent by email).
  • Gene pages at Gramene e.g. http://archive.gramene.org/db/genes/search_gene?acc=GR:0101175 or elsewhere
  • Start thinking about building a gene database, with different access levels for admins and curators, limited in what fields are allowed to be edited by what level.

This is for building the community gene annotation tools. The annotated data from this database can be synced with the AmiGOdb version that Justin Elser and Chris are working on.

I hope this will give Bo enough to work on until the AmiGO build and datasets mature.
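
One possible starting point for the access-level idea in the gene database item above is a per-role whitelist of editable fields. This is only a sketch under assumed names: the roles, the field names, and `apply_edit` are hypothetical and not part of any existing Planteome or AmiGO schema.

```python
# Hypothetical roles and the gene-record fields each role may edit.
EDITABLE_FIELDS = {
    "curator": {"description", "ontology_terms"},
    "admin": {"description", "ontology_terms", "symbol", "species"},
}


def apply_edit(record, role, changes):
    """Return an updated copy of record; raise PermissionError on a disallowed field."""
    allowed = EDITABLE_FIELDS.get(role, set())  # unknown roles may edit nothing
    denied = sorted(set(changes) - allowed)
    if denied:
        raise PermissionError(f"role {role!r} may not edit: {', '.join(denied)}")
    return {**record, **changes}
```

Returning a copy rather than editing in place leaves the original record available for an audit trail or for syncing with another database.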

  • BiNGO: http://bioinformatics.oxfordjournals.org/content/21/16/3448.full
  • agriGO: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2896167/
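
For orientation, enrichment tools of this kind commonly score a term with a one-sided hypergeometric test: given a gene list, how likely is it to contain at least the observed number of genes annotated to the term by chance? The sketch below assumes that test. The function and parameter names are hypothetical, and it is not a description of the tool that was demoed.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Return P(X >= hits) for X ~ Hypergeometric(population, annotated, sample).

    population: genes in the background set
    annotated:  background genes annotated to the term
    sample:     genes in the submitted list
    hits:       submitted genes annotated to the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

In practice the p-values for all tested terms would also need a multiple-testing correction, which is left out here.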

May 15th meeting