
GitHub to Gist Service Migration

Simon Urbanek edited this page May 14, 2017 · 7 revisions

Since RCloud 1.8 we support three back-ends for notebooks: GitHub (and GitHub Enterprise), gitgist (local git repositories) and RCloud Gist Service (a centralized server on top of git repositories). This page describes the process of migrating from the GitHub back-end to the RCloud Gist Service.

The main difference is that GitHub uses its own user management and authentication mechanism, whereas the RCloud Gist Service plugs into RCloud's own authentication (SessionKeyServer, SKS). The benefit is that there is now a single authentication authority and a single token that governs both execution and notebook access.

A typical GitHub setup in an enterprise setting:

GitHub <=> RCloud compute instances <-> SessionKeyServer

after migration

RCloud compute instances <====> RCloud Gist Service
            |                          |
            +==> Session Key Server <--+

which means that SKS has to be accessible from both the compute nodes and the Gist Service. Moreover, if multiple RCloud instances use the same Gist Service, they have to be registered with the service so that it knows which SKS to authenticate against, e.g.:

RCloud Instance 1 <-->|  Gist   |<--> RCloud Instance 2
  SKS 1 <-------------| Service |----> SKS 2
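To illustrate the registration idea, here is a hypothetical sketch of the lookup a shared Gist Service conceptually performs: the github.client.id sent by an RCloud instance selects the matching keyservers: entry (configured in application.yml, see step 10 below), and unknown ids fall back to the default entry. This is not the Gist Service's actual implementation; the client ids and SKS URLs are placeholders.

```python
from typing import Dict

# Hypothetical keyservers mapping: client id -> SKS base URL.
# The client ids and host names are placeholders, not real defaults.
KEYSERVERS: Dict[str, str] = {
    "instance1": "http://sks1.example.com:4301",
    "instance2": "http://sks2.example.com:4301",
    "default": "http://127.0.0.1:4301",
}


def resolve_keyserver(client_id: str, keyservers: Dict[str, str] = KEYSERVERS) -> str:
    """Return the SKS URL registered for client_id.

    Falls back to the 'default' entry for unknown client ids and raises
    KeyError if neither the client id nor 'default' is registered.
    """
    if client_id in keyservers:
        return keyservers[client_id]
    if "default" in keyservers:
        return keyservers["default"]
    raise KeyError(f"no keyserver for client id {client_id!r} and no default entry")
```

For example, resolve_keyserver("instance2") selects SKS 2, while an unregistered id such as "unknown" is authenticated against the default SKS. This is why distinct RCloud instances need distinct client ids.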

Migration

When migrating an existing GitHub Enterprise installation, use the following process:

  1. upgrade to RCloud 1.8
  2. install version 0.9.9 of the github R package, e.g. with install.packages("github", repos="http://rforge.net")
  3. install Java on the machine/VM that will host the Gist Service. The service listens on port 13020 by default, so make sure that port is reachable from the RCloud compute nodes. Unlike a GitHub installation, it does not need to be visible to end-user clients, because it only provides API access. Finally, make sure the machine can reach all of your SessionKeyServer instances (typically on port 4301).
  4. put GitHub Enterprise into maintenance mode
  5. create a backup using the GitHub Enterprise (GHE) backup utilities, preferably from the Gist Service machine
  6. create the destination directory for the Gist Service. A typical choice is /data/rcloud/data/gist-service for standard RCloud installations, but any directory will do.
  7. use the migration script scripts/migrate-ghe2gists.pl to copy gists from the GHE backup into the Gist Service directory, e.g. perl scripts/migrate-ghe2gists.pl /shared/ghe/backup/current /data/rcloud/data/gist-service (this step can take quite a while depending on disk speed and the number of gists to migrate: ~30k notebooks totalling 4 GB take ~45 min on a fast RAID array).
  8. if any of your users have private keys, you will have to migrate them between authentication methods, because SKS stores keys separately for each method. This can be done at any point, as it is independent of the Gist back-end.
    1. update to the latest SKS sources (typically by running git pull in /data/rcloud/services/SessionKeyServer) and run make CopyKeys.jar in the SessionKeyServer directory to build the key migration tool
    2. stop SessionKeyServer. All of the following actions must be performed as the user that normally runs SKS.
    3. determine which RCloud execution authentication method is used in your installation, e.g. auth/pam if you use PAM
    4. run java -jar CopyKeys.jar -d key.db stored auth/pam, where the last argument is your RCloud execution authentication method
    5. start SessionKeyServer
  9. download the Gist Service (currently https://github.com/MangoTheCat/rcloud-gist-services). Although you can build it yourself, a pre-built binary JAR with configuration is available at https://github.com/att/rcloud/releases/download/rcloud-gist-service-0.3.1-rc/rcloud-gist-service-0.3.1.tar.gz
  10. edit application.yml to match your setup. The sample is set up to use /data/rcloud locations and a local SKS. Typically you will want to check the gists: section, in particular root:, which should match the destination directory created above, and the keyservers: section, which should list the SKS instances of all RCloud instances that will use this service. Each name in that list should match the github.client.id defined in the rcloud.conf of the corresponding instance, or be default, which is used for all unknown client ids. You can optionally enable SSL (it uses the same JKS format as SKS itself); also check the location of the log file.
  11. start the service: java -jar rcloud-gist-service-0.3.1.jar
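Before and after starting the service, it can help to confirm the network requirements from step 3: the Gist Service port (13020 by default) must be reachable from the compute nodes, and the SKS port (typically 4301) from the Gist Service machine. The Python sketch below only tests whether a TCP connection can be opened; it does not speak the Gist Service or SKS protocols, and the host names are placeholders.

```python
import socket
from typing import Dict, Iterable, Tuple


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def run_checks(checks: Iterable[Tuple[str, int]]) -> Dict[Tuple[str, int], bool]:
    """Check each (host, port) pair and return a reachability map."""
    return {(host, port): port_reachable(host, port) for host, port in checks}


# Placeholder hosts; replace them with your own machines.
FROM_COMPUTE_NODE = [("gist-service.example.com", 13020)]
FROM_GIST_SERVICE = [("sks.example.com", 4301)]
```

For example, run_checks(FROM_COMPUTE_NODE) executed on a compute node should map the Gist Service address to True once the service from step 11 is running; run_checks(FROM_GIST_SERVICE) belongs on the Gist Service machine itself.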

The final step is to configure RCloud to use the new service and then restart RCloud. Example entries for rcloud.conf:

github.client.id: default
github.client.secret: X
github.api.url: https://rcloud.research.att.com:13020/
github.auth.forward: https://rcloud.research.att.com/login_successful.R
github.auth: exec.token
rational.githubgist: true

In detail, github.auth.forward must point to the login_successful.R entry of your instance, the same one you used when registering the GitHub application. The github.client.id can be an arbitrary name, but if you use multiple RCloud instances against one service, they must have distinct names, and each name must have a corresponding entry in the keyservers: section. Note that github.client.secret must be set to some value, although it is not actually used. github.api.url: must be set to the Gist Service URL. Finally, github.auth: exec.token tells RCloud not to use OAuth but to use the execution token instead, and rational.githubgist: true disables work-arounds for GitHub-specific idiosyncrasies (race conditions in the API, inability to self-fork/bi-fork, etc.).

Then restart RCloud (the rcloud-qap and rcloud-script services).
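A quick way to catch a missing entry before restarting is to check rcloud.conf against the list above. The following is a hypothetical Python sketch that assumes one key: value pair per line, as in the example entries; it is a simplification, not RCloud's own configuration parser.

```python
from typing import Dict, List

# Entries the Gist Service setup above relies on.
REQUIRED_KEYS = [
    "github.client.id",
    "github.client.secret",
    "github.api.url",
    "github.auth.forward",
    "github.auth",
    "rational.githubgist",
]


def parse_conf(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines; only the first ':' separates key and value."""
    conf: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        conf[key.strip()] = value.strip()
    return conf


def check_gist_service_conf(conf: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = [f"missing or empty {k}:" for k in REQUIRED_KEYS if not conf.get(k)]
    if conf.get("github.auth") and conf["github.auth"] != "exec.token":
        problems.append("github.auth: should be exec.token")
    if conf.get("rational.githubgist") and conf["rational.githubgist"].lower() != "true":
        problems.append("rational.githubgist: should be true")
    return problems
```

Splitting only on the first colon keeps URLs with ports intact, so github.api.url parses as https://rcloud.research.att.com:13020/. For the example entries above, check_gist_service_conf(parse_conf(text)) returns an empty list.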

Troubleshooting

If writes fail with HTTP 415 errors (Unsupported Media Type, i.e. invalid content), you have not upgraded the github package to 0.9.9.

If RCloud complains on connect that it cannot use a read-only gist back-end in the main section, you are most likely missing at least one of the entries above. Don't forget to include both github.client.id: and github.client.secret:, even though the secret is not actually verified.
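To rule out the 415 cause, you can check the installed version of the github package. From R, packageVersion("github") reports it directly; the Python sketch below does the same by reading the Version: field of the package's DESCRIPTION file. The library path in the usage example is an assumption about your installation.

```python
import re
from pathlib import Path
from typing import Tuple

MIN_VERSION = (0, 9, 9)  # first github package version that works with the Gist Service


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the Version: field from DESCRIPTION text as a tuple of ints."""
    match = re.search(r"^Version:\s*([0-9][0-9.\-]*)", text, re.MULTILINE)
    if match is None:
        raise ValueError("no Version: field found")
    # R versions may use '.' or '-' as separators, e.g. 0.9.9 or 0.9-9.
    return tuple(int(part) for part in re.split(r"[.\-]", match.group(1)) if part)


def github_pkg_ok(description_file: Path) -> bool:
    """True if the package described by description_file is >= MIN_VERSION."""
    return parse_version(description_file.read_text()) >= MIN_VERSION
```

For example, github_pkg_ok(Path("/usr/local/lib/R/site-library/github/DESCRIPTION")) returns False for 0.9.8 and True for 0.9.9 or later; adjust the path to the R library your RCloud installation actually uses.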