Svn migration to git
Latest revision as of 04:18, 22 August 2015
Migration of the ontology repository from svn to git
Motivation
After much discussion with Chris Mungall, it was decided to migrate all relevant files from the POC svn repo to a new Planteome repository on GitHub (http://github.com).
Not all files need to be moved. Most notably, the association files are too large to be hosted on GitHub, so we are looking for a solution for them.
Also, ontology files other than plant_ontology.obo were stored in a collaborators_ontology folder. We decided to restructure the repo so that these files sit at the same level as plant_ontology.obo.
Dumping the svn repo
To keep the revision history of the files, we need to dump the whole svn repository and then filter the dump down to the specific files we want to move to git.
On palea:
cd /data/svnrepos/Poc
svnadmin dump . > /lemma/justin/Poc_svn_migration/Poc_svn_dump
This dumps all repository information, including revision info and commit messages, to a file named Poc_svn_dump. I stored it on lemma because the file is large (>25 GB).
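Before filtering a file this large, it can be worth checking that the dump is complete. Every revision record in a dump begins with a `Revision-number: N` header, so counting those headers gives a quick sanity check. The sketch below assumes a POSIX shell with grep, tail and cut; the function names and the sample data are made up for illustration.

```shell
#!/bin/sh
# Sanity checks on an svn dump file (sketch; function names are hypothetical).
# Every revision record in a dump starts with a "Revision-number: N" header line.

# Print the number of revision records in the dump.
# -a treats the file as text, since dumps can contain binary file contents.
dump_revision_count() {
    LC_ALL=C grep -a -c '^Revision-number: ' "$1"
}

# Print the number of the last revision record in the dump.
dump_last_revision() {
    LC_ALL=C grep -a '^Revision-number: ' "$1" | tail -n 1 | cut -d' ' -f2
}

# Demo on a tiny synthetic dump (not real data).
sample=$(mktemp)
printf '%s\n' \
    'SVN-fs-dump-format-version: 2' '' \
    'Revision-number: 0' '' \
    'Revision-number: 1' '' \
    'Revision-number: 2' '' > "$sample"

echo "revisions: $(dump_revision_count "$sample")"   # revisions: 3 (includes revision 0)
echo "last:      $(dump_last_revision "$sample")"    # last:      2
rm -f "$sample"
```

On the real file, `dump_last_revision /lemma/justin/Poc_svn_migration/Poc_svn_dump` should print the same number as `svnlook youngest /data/svnrepos/Poc` on palea. A full dump also contains a record for revision 0, so the count is one higher than that number.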
Filtering the dump
Note: the following was run on one of the cluster compute nodes because I didn't want to swamp palea with the work.
I decided to test this with trait.obo first as a proof of concept. Other files should work the same way, with a different path in the include argument.
cd /lemma/justin/Poc_svn_migration
cat Poc_svn_dump | svndumpfilter include /trunk/ontology/collaborators_ontology/gramene/traits/trait.obo --drop-empty-revs --renumber-revs > trait.obo_filtered_dump
The svndumpfilter command goes through the dump file and outputs revision info only for the included path (trait.obo in this case). The full path must be given in the include argument; otherwise the output is an empty repo. The other two arguments, --drop-empty-revs and --renumber-revs, drop revisions that no longer contain any changes and renumber the remaining ones so that they start at 1.
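A quick way to confirm that the include path matched, rather than producing the empty repo mentioned above, is to list the distinct paths left in the filtered dump. The sketch below uses a hypothetical helper name and synthetic sample data; it prints the unique `Node-path` and `Node-copyfrom-path` header lines. Any copy-source paths in that list are ones that a later path rewrite also has to handle.

```shell
#!/bin/sh
# List the distinct paths named in a (filtered) svn dump.
# Sketch only; dump_paths is a hypothetical helper name.
dump_paths() {
    # -a: dumps can contain binary file contents, so read them as text.
    # -E: match both "Node-path:" and "Node-copyfrom-path:" header lines.
    LC_ALL=C grep -a -E '^Node-(copyfrom-)?path: ' "$1" | LC_ALL=C sort -u
}

# Demo on a tiny synthetic dump (not real data).
sample=$(mktemp)
printf '%s\n' \
    'Revision-number: 1' \
    'Node-path: trunk/ontology/collaborators_ontology/gramene/traits/trait.obo' \
    'Node-kind: file' \
    'Node-action: add' '' \
    'Revision-number: 2' \
    'Node-path: trunk/ontology/collaborators_ontology/gramene/traits/trait.obo' \
    'Node-action: change' '' > "$sample"

paths=$(dump_paths "$sample")
if [ -z "$paths" ]; then
    echo "no nodes left in the dump: check the include path" >&2
else
    echo "$paths"
    # Node-path: trunk/ontology/collaborators_ontology/gramene/traits/trait.obo
fi
rm -f "$sample"
```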
If the file path needs to be changed, this is the point to do it. Here is a command that I used to change the filename and remove the full path:
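cat trait.obo_filtered_dump | sed 's/^Node-path: trunk\/ontology\/collaborators_ontology\/gramene\/traits\/trait\.obo/Node-path: plant-trait-ontology.obo/' > trait.obo_filtered_path_fixed_dump
The pattern has to include the trait.obo file name; if it stops at traits/, sed leaves the old name appended and the node becomes plant-trait-ontology.obotrait.obo.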
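Because the rename only has to touch header lines, the sed expression can be tried on a few fake header lines before running it over the whole dump. This sketch wraps the rewrite in a hypothetical `rename_trait_path` function. It anchors the pattern at the start and end of the line and includes the `trait.obo` file name, so the result is exactly `Node-path: plant-trait-ontology.obo`; `|` is used as the sed delimiter so the slashes need no escaping.

```shell
#!/bin/sh
# Try a dump path rewrite on synthetic header lines before running it on the real dump.
# rename_trait_path is a hypothetical helper name.
rename_trait_path() {
    # ^ and $ restrict the match to a complete "Node-path:" header line.
    # "|" is the sed delimiter, so the slashes in the path need no escaping.
    sed 's|^Node-path: trunk/ontology/collaborators_ontology/gramene/traits/trait\.obo$|Node-path: plant-trait-ontology.obo|'
}

printf '%s\n' \
    'Node-path: trunk/ontology/collaborators_ontology/gramene/traits/trait.obo' \
    'Node-kind: file' \
    'Node-action: change' | rename_trait_path
# Node-path: plant-trait-ontology.obo
# Node-kind: file
# Node-action: change
```

To apply it to the real file: `rename_trait_path < trait.obo_filtered_dump > trait.obo_filtered_path_fixed_dump`.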
For repos in which files had previously been moved with history, the command above was not enough: loading the result into the new repo in the next step failed. For a different repo I used the following, which also rewrites the Node-copyfrom-path headers:
cat associations_dump | sed 's/Node-path: trunk\/associations/Node-path: po-associations/' | sed 's/Node-copyfrom-path: trunk\/associations/Node-copyfrom-path: po-associations/' > associations_filtered_path_dump
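When a file was moved or copied with history, its dump record also carries a `Node-copyfrom-path` header naming the source path. If only `Node-path` is rewritten, that source still points at the old path, which does not exist in the new repo; this is a likely reason the first command failed on load. The two substitutions can also run in a single sed process with two `-e` expressions. This is a sketch with a hypothetical helper name and synthetic input.

```shell
#!/bin/sh
# Rewrite both Node-path and Node-copyfrom-path headers in one sed pass.
# Hypothetical helper; assumes everything under trunk/associations moves to po-associations.
rename_associations() {
    sed -e 's|^Node-path: trunk/associations|Node-path: po-associations|' \
        -e 's|^Node-copyfrom-path: trunk/associations|Node-copyfrom-path: po-associations|'
}

# Synthetic record for a file copied with history.
printf '%s\n' \
    'Node-path: trunk/associations/new/file.txt' \
    'Node-action: add' \
    'Node-copyfrom-rev: 5' \
    'Node-copyfrom-path: trunk/associations/old/file.txt' | rename_associations
# Node-path: po-associations/new/file.txt
# Node-action: add
# Node-copyfrom-rev: 5
# Node-copyfrom-path: po-associations/old/file.txt
```

A plain prefix like this would also match sibling paths such as trunk/associations_old, if any exist, so it is worth listing the paths in the dump first.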
Create a new svn repo to hold the filtered data
cd /data/svnrepos
sudo svnadmin create /data/svnrepos/svn2git
sudo chown -R apache: svn2git
Copy the filtered dump to the new repo
Then back on palea (svn server):
cd /data/svnrepos
sudo svnadmin load ./svn2git < /lemma/justin/Poc_svn_migration/trait.obo_filtered_path_fixed_dump
At this point, we should have a fresh, clean svn repo containing only the filtered files, with their full revision history.
Copy the svn repo to github
Method if the GitHub repo hasn't been created yet (simpler)
Go to https://import.github.com/new and follow the steps there. This uses the svn repo created above.
Method if the GitHub repo already has files in it
I couldn't find a way to do this directly from svn, but it works through an intermediate, temporary GitHub repo.
Create a new temporary GitHub repo from the svn repo using the "simpler" import method described above.
In this example, I created a temporary GitHub repo, https://github.com/elserj/dbxref, that contained just the PO_DBXref.txt file along with its history. I wanted to move it into the Planteome/common-files-for-ref-ontologies repo.
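mkdir git_temp
cd git_temp/
git clone https://github.com/elserj/dbxref.git dbxref
cd dbxref/
git remote rm origin
git filter-branch --subdirectory-filter dbxref -- --all
git commit
cd ../
git clone https://github.com/Planteome/common-files-for-ref-ontologies.git common-files-for-ref-ontologies
cd common-files-for-ref-ontologies/
git remote add origin-dbxref ../dbxref
# Git 2.9+ refuses to merge histories with no common commit unless --allow-unrelated-histories is given;
# --no-rebase asks for a merge explicitly (newer Git versions may otherwise stop and ask how to reconcile).
git pull --no-rebase --allow-unrelated-histories origin-dbxref master
git remote rm origin-dbxref
git push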
Note that not every line is required, notably the git filter-branch step, but those lines may be needed if the repo has multiple folders or other structure that this example didn't have.
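The merge step can be rehearsed locally before touching the real repos. This sketch builds two throwaway repos standing in for the temporary dbxref repo and common-files-for-ref-ontologies (repo names, file contents and commit messages are hypothetical), then pulls one into the other. Because the two histories share no commits, Git 2.9 and later refuse the merge unless `--allow-unrelated-histories` is passed; `--no-rebase` and `--no-edit` keep the pull a plain, non-interactive merge.

```shell
#!/bin/sh
# Rehearse "pull a temporary repo into an existing one" with throwaway local repos.
# Repo names, file names and commit messages are hypothetical; nothing touches GitHub.
set -e

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT    # remove the scratch directory when the script exits
cd "$work"

# make_repo DIR FILE MESSAGE: create a repo on branch master with one commit.
make_repo() {
    git init -q "$1"
    (
        cd "$1"
        git symbolic-ref HEAD refs/heads/master   # fixed name, whatever init.defaultBranch says
        git config user.name "Example User"
        git config user.email "user@example.com"
        echo "$2 contents" > "$2"
        git add "$2"
        git commit -q -m "$3"
    )
}

make_repo dbxref PO_DBXref.txt "Imported from svn"   # stand-in for the temporary repo
make_repo common-files README "Existing content"     # stand-in for the target repo

cd common-files
git remote add origin-dbxref ../dbxref
# The repos share no commits, so Git 2.9+ needs --allow-unrelated-histories.
# --no-rebase asks for a merge; --no-edit accepts the default merge message.
git pull -q --no-rebase --no-edit --allow-unrelated-histories origin-dbxref master
git remote rm origin-dbxref

git log --oneline --follow PO_DBXref.txt   # lists the commit imported from the temporary repo
```

The resulting history has three commits (one root commit from each repo plus the merge), and the imported file keeps the history it had in the temporary repo.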