Revision as of 12:03, 11 April 2015

Goals and Objectives:

== Aim 3: Develop an online informatics portal and data warehouse for ontology-based annotated plant genome data and plant genomes ==

Deliverables: a centralized portal for common reference ontologies for plants and their associated data sets; a novel data store and web user interface.

A Drupal portal will host the AmiGO browser, the ontology database developed by the PO and GO consortia, and a BioMart instance.
Transition to AmiGO 2.0 with new features

A novel data warehouse, based on NoSQL technologies (e.g. MongoDB, http://www.mongodb.org, and Apache™ Hadoop®, http://hadoop.apache.org), for storing both the ontologies and the annotation data
Integrate the MapReduce programming model to improve scalability and performance
Investigate using HDF (Hierarchical Data Format) as a storage format for numerical or sequence-based data
Create an efficient way to add annotations to the database incrementally (not possible in the current AmiGO database)
Implement OLAP (Online Analytical Processing) data cubes (http://en.wikipedia.org/wiki/OLAP_cube)
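
To illustrate the MapReduce item above, here is a minimal single-process sketch of the map, shuffle and reduce phases, assuming a hypothetical annotation record of the form (gene_id, ontology_term_id, evidence_code) and counting annotations per ontology term. It does not use Hadoop or MongoDB; in a Hadoop deployment the shuffle step would be performed by the framework and the map and reduce functions would run in parallel across nodes.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical annotation record: (gene_id, ontology_term_id, evidence_code).
Annotation = Tuple[str, str, str]


def map_phase(record: Annotation) -> Iterator[Tuple[str, int]]:
    """Map: emit one (term, 1) pair for each annotation record."""
    _gene, term, _evidence = record
    yield term, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum the partial counts collected for one term."""
    return term, sum(counts)


def annotations_per_term(records: Iterable[Annotation]) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially in a single process."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(term, counts) for term, counts in shuffle(mapped).items())


if __name__ == "__main__":
    # Illustrative records only; identifiers are placeholders, not curated annotations.
    sample = [
        ("gene1", "PO:0009025", "IDA"),
        ("gene2", "PO:0009025", "IEP"),
        ("gene1", "GO:0009737", "IMP"),
    ]
    print(annotations_per_term(sample))  # {'PO:0009025': 2, 'GO:0009737': 1}
```

Because each map call sees one record and each reduce call sees one key, both functions can be distributed without shared state, which is where the scalability gain would come from.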
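
For the goal of adding annotations incrementally, one possible approach is sketched here: key every annotation by its full (gene, term, evidence) tuple, skip records already loaded, and update per-term aggregates in place so that a new batch never forces a full rebuild. The class name `AnnotationStore` and the record layout are hypothetical, and the store is in memory only; a real data warehouse would persist the keys and counts.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Hypothetical annotation record: (gene_id, ontology_term_id, evidence_code).
Annotation = Tuple[str, str, str]


class AnnotationStore:
    """Minimal in-memory sketch of incremental annotation loading.

    Each record is keyed by the full (gene, term, evidence) tuple. Adding a batch
    only touches the new records and updates the per-term counts in place, so no
    full rebuild is needed; re-adding a known record is a no-op (idempotent).
    """

    def __init__(self) -> None:
        self._seen: Set[Annotation] = set()
        self.term_counts: Counter = Counter()

    def add_batch(self, batch: Iterable[Annotation]) -> int:
        """Insert new records and return how many were actually added."""
        added = 0
        for record in batch:
            if record in self._seen:
                continue  # duplicate: already loaded, possibly in an earlier batch
            self._seen.add(record)
            self.term_counts[record[1]] += 1
            added += 1
        return added


if __name__ == "__main__":
    store = AnnotationStore()
    print(store.add_batch([("gene1", "PO:0009025", "IDA"), ("gene2", "PO:0009025", "IEP")]))  # 2
    print(store.add_batch([("gene1", "PO:0009025", "IDA"), ("gene3", "GO:0009737", "IMP")]))  # 1
```

Making the load idempotent means a failed or repeated batch can simply be re-submitted without corrupting the counts.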
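
The OLAP data cube item can be illustrated with a small sketch of the CUBE operator: a count measure is pre-aggregated for every subset of the dimensions, so any roll-up becomes a dictionary lookup instead of a scan. The dimension names (`species`, `ontology`, `evidence`) and the sample facts are assumptions chosen for illustration, not the project's actual schema.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Mapping, Tuple

# Hypothetical dimensions of an annotation fact table.
DIMENSIONS: Tuple[str, ...] = ("species", "ontology", "evidence")

Cube = Dict[Tuple[str, ...], Counter]


def build_cube(facts: List[Mapping[str, str]]) -> Cube:
    """Pre-compute a count measure for every subset of DIMENSIONS.

    This is the CUBE operator: 2**len(DIMENSIONS) group-bys, from the finest
    (all dimensions) down to the grand total (no dimensions).
    """
    cube: Cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            cube[dims] = Counter(tuple(fact[d] for d in dims) for fact in facts)
    return cube


def query(cube: Cube, **fixed: str) -> int:
    """Look up a pre-aggregated count; dimensions not given are rolled up."""
    unknown = set(fixed) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    dims = tuple(d for d in DIMENSIONS if d in fixed)  # same order as combinations()
    return cube[dims][tuple(fixed[d] for d in dims)]


if __name__ == "__main__":
    facts = [
        {"species": "Oryza sativa", "ontology": "PO", "evidence": "IDA"},
        {"species": "Oryza sativa", "ontology": "GO", "evidence": "IEA"},
        {"species": "Zea mays", "ontology": "GO", "evidence": "IEA"},
    ]
    cube = build_cube(facts)
    print(query(cube))                                # 3 (grand total)
    print(query(cube, species="Oryza sativa"))        # 2
    print(query(cube, ontology="GO", evidence="IEA"))  # 2
```

The trade-off is storage: the number of group-bys doubles with each added dimension, which is one reason cube construction is often paired with a distributed batch framework.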

Initial design and testing will happen locally at the Center for Genome Research and Biocomputing at Oregon State University
Utilization of high performance computing resources, such as:
- The supercomputer 'Stampede' at Texas Advanced Computing Center (TACC)
- Use of iRODS at iPlant for data file storage and retrieval
- Image hosting via Bisque, running on the iPlant infrastructure (see 3.4, below)

Jaiswal Lab (OSU, BPP): Justin Elser
Mungall Group (Lawrence Berkeley National Laboratory): Chris Mungall (Co-PI), Seth Carbon
Zhang Lab (OSU, EECS): Eugene Zhang (Co-PI), Botong Qu (CS Ph.D. student)

- Discussion of the planned transition to the AmiGO 2.0 platform

- JE is working on installing the Solr database. View details and progress reports here: AmiGO2_install

Data Storage and AmiGO2 call 2-18-15 Media:Data_Storage_2-18-15.mp4
- Who: JE, JP, EZ

- Further discussion of AmiGO2 progress and overview of AmiGO2 interface

Relevant links:
