Data storage and AmiGO2 Working Group
Revision as of 19:03, 11 April 2015
Goals and Objectives:
Aim-3: Develop an online informatics portal and data warehouse for ontology-based, annotated plant genome data and plant genomes.
- Deliverables: A centralized portal for common reference ontologies for plants and the associated data sets. Novel data store and web user interface.
3.1 Planteome Web Portal Development
- A Drupal portal will host the AmiGO browser, the ontology database developed by the PO and the GO consortium, and a BioMart instance
- Transition to AmiGO 2.0 with new features
3.2 Planteome Data Warehouse Development
- Novel data warehouse for storing both the ontologies and annotation data based on NoSQL (e.g. MongoDB, http://www.mongodb.org, and Apache™ Hadoop®, http://hadoop.apache.org)
- Integrate the MapReduce programming model to increase scalability and performance
- Investigate using HDF (Hierarchical Data Format) as a storage format for any numerical or sequence-based data
- Create an efficient way to add annotations incrementally to the database (not possible in the current AmiGO database)
- Implementation of OLAP (Online Analytical Processing) data cubes (http://en.wikipedia.org/wiki/OLAP_cube)
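As a rough illustration of the MapReduce and incremental-annotation items above, the following is a minimal sketch in plain Python, not the planned implementation. It counts annotations per ontology term with explicit map, shuffle and reduce steps, then folds a new batch of annotations into the existing counts without recomputing them. The record layout, the gene names and the helper names (map_phase, shuffle, reduce_phase, count_annotations, merge_counts) are all assumptions made for this example; a real deployment would run the same steps on Hadoop or a MongoDB aggregation rather than in memory.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical annotation records: (object id, ontology term id, taxon).
# The ids and names below are illustrative only.
ANNOTATIONS = [
    ("gene1", "PO:0009005", "Oryza sativa"),
    ("gene2", "PO:0009005", "Zea mays"),
    ("gene3", "GO:0008150", "Oryza sativa"),
    ("gene1", "GO:0008150", "Oryza sativa"),
]


def map_phase(record):
    """Map step: emit one (term, 1) pair per annotation record."""
    _obj, term, _taxon = record
    yield term, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the values collected for one key."""
    return key, sum(values)


def count_annotations(records):
    """Run map, shuffle and reduce over an iterable of records."""
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


def merge_counts(existing, new_batch):
    """Fold a new batch into existing counts; 'existing' is not modified."""
    merged = dict(existing)
    for term, n in count_annotations(new_batch).items():
        merged[term] = merged.get(term, 0) + n
    return merged


counts = count_annotations(ANNOTATIONS)
updated = merge_counts(counts, [("gene4", "PO:0009005", "Zea mays")])
```

Because the reduce step is a plain sum, counts for a new batch can be merged into earlier results, which is the property that makes incremental loading cheap. Keying the map step on (term, taxon) instead of the term alone would give a simple two-dimensional roll-up of the kind an OLAP cube provides.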
3.3 Integration with the iPlant infrastructure
- Initial design and testing will happen locally at the Center for Genome Research and Biocomputing at Oregon State University
- Utilization of high performance computing resources, such as:
  - The supercomputer 'Stampede' at Texas Advanced Computing Center (TACC)
  - Use of iRODS at iPlant for data file storage and retrieval
  - Image hosting via Bisque hosted on the iPlant infrastructure (See 3.4, below)
3.4 Storage of Annotated Digital image collections
3.5 Application Programming Interface (APIs)
Participants
- Jaiswal Lab (OSU, BPP): Justin Elser
- Mungall Group (Lawrence Berkeley National Laboratory): Chris Mungall (Co-PI), Seth Carbon
- Zhang Lab (OSU, EECS): Eugene Zhang (Co-PI), Botong Qu (CS Ph.D. student)
Data storage and AmiGO 2 Working Group Meetings:
- Data storage and AmiGO 2 call 1-30-15
  - Who: PJ, CM, Seth, EZ, LC, JP, JE
  - Discussion of the planned transition to the AmiGO 2.0 platform
  - JE is working on installing the Solr database. Details and progress reports: AmiGO2_install
- Data Storage and AmiGO2 call 2-18-15 (recording: Media:Data_Storage_2-18-15.mp4)
  - Who: JE, JP, EZ
  - Further discussion of AmiGO2 progress and overview of the AmiGO2 interface
Relevant links:
https://github.com/geneontology/amigo
Demo: http://amigo.geneontology.org/
http://amigo2.berkeleybop.org/ - dev server