Data storage and AmiGO2 Working Group

From Planteome.org

Revision as of 12:03, 11 April 2015

Goals and Objectives:

Aim-3: Develop an online informatics portal and data warehouse for ontology-based, annotated plant genome data and plant genomes.

  • Deliverables: A centralized portal for common reference ontologies for plants and the associated data sets. Novel data store and web user interface.

3.1 Planteome Web Portal Development

  • A Drupal portal will host the AmiGO browser, the ontology database developed by the PO and the GO consortium, and a BioMart
  • Transition to AmiGO 2.0 with new features

3.2 Planteome Data Warehouse Development

  • Novel data warehouse for storing both the ontologies and annotation data based on NoSQL (e.g. MongoDB, http://www.mongodb.org, and Apache™ Hadoop®, http://hadoop.apache.org)
  • Integrate the MapReduce programming model to increase scalability and performance
  • Investigate using HDF (Hierarchical Data Format) as a storage format for any numerical or sequence-based data
  • Create an efficient way to add annotations incrementally to the database (not possible in the current AmiGO database)
  • Implement OLAP (Online Analytical Processing) data cubes (http://en.wikipedia.org/wiki/OLAP_cube)
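
As a rough illustration of how the MapReduce, incremental-update, and data-cube goals above could fit together, here is a minimal in-memory Python sketch. It is not the planned implementation: the record layout, the function names, and the gene names are assumptions made up for this example, and the term IDs serve only as example keys. A real deployment would run on a store such as MongoDB or Hadoop rather than on a Python list.

```python
from collections import Counter

# Hypothetical annotation records: (ontology term ID, gene, species).
# The layout and the gene names are assumptions for illustration only.
ANNOTATIONS = [
    ("PO:0009005", "geneA", "rice"),
    ("PO:0009005", "geneB", "maize"),
    ("GO:0008150", "geneA", "rice"),
]


def map_phase(record):
    """Map step: emit a ((term, species), 1) pair for one annotation."""
    term, _gene, species = record
    return ((term, species), 1)


def reduce_phase(pairs):
    """Reduce step: sum the emitted values for each key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def count_annotations(records):
    """Run the map and reduce steps over a batch of records."""
    return reduce_phase(map(map_phase, records))


def add_batch(existing_counts, new_records):
    """Incremental update: count only the new batch, then merge."""
    return existing_counts + count_annotations(new_records)


def roll_up_by_term(counts):
    """OLAP-style roll-up: drop the species dimension, keep per-term totals."""
    totals = Counter()
    for (term, _species), n in counts.items():
        totals[term] += n
    return totals


if __name__ == "__main__":
    counts = count_annotations(ANNOTATIONS)
    counts = add_batch(counts, [("PO:0009005", "geneC", "rice")])
    print(roll_up_by_term(counts))
```

The point of the sketch is that counts kept per (term, species) cell can be merged with the counts of a new batch without reprocessing the older annotations, and the same cells can be rolled up along one dimension on demand.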

3.3 Integration with the iPlant infrastructure

  • Initial design and testing will happen locally at the Center for Genome Research and Biocomputing at Oregon State University
  • Utilization of high performance computing resources, such as:
    • The supercomputer 'Stampede' at the Texas Advanced Computing Center (TACC)
    • Use of iRODS at iPlant for data file storage and retrieval
    • Image hosting via Bisque hosted on the iPlant infrastructure (See 3.4, below)

3.4 Storage of Annotated Digital image collections

3.5 Application Programming Interface (APIs)

Participants

  • Jaiswal Lab (OSU, BPP): Justin Elser
  • Mungall Group (Lawrence Berkeley National Laboratory): Chris Mungall (Co-PI), Seth Carbon
  • Zhang Lab (OSU, EECS): Eugene Zhang (Co-PI), Botong Qu (CS Ph.D. student)

Data storage and AmiGO 2 Working Group Meetings:

  • Data storage and AmiGO 2 call 1-30-15
    • Who: PJ, CM, Seth, EZ, LC, JP, JE
    • Discussion of the planned transition to the AmiGO 2.0 platform
    • JE is working on installing the Solr database. Details and progress reports: AmiGO2_install
    • Further discussion of AmiGO2 progress and overview of the AmiGO2 interface


Relevant links:

https://github.com/geneontology/amigo

Demo: http://amigo.geneontology.org/

http://amigo2.berkeleybop.org/ - dev server