
Selective repository import via "git filter-branch"

This article is about selectively importing parts of a repository while keeping the version history intact. It is not particularly about Java technology, but I'm tossing it in here anyway since I have blogged here before about my adventures with version control software. What has changed for me since 2013 is that the project has expanded and grown together with some other projects, all of which use Git and GitHub. It's no longer a one-man project. The core parts of my Java EE code are still managed by Fossil, but other parts are being migrated to GitHub in order to make life a bit easier for the team as a whole.

The first part of the migration is to export the entire Fossil repository to a file in Git's "fast-import" format. Just cd to your Fossil working directory and do:

    fossil export --git > ~/repo.data

The second part of the migration concerns importing this file. Nothing about this is Fossil-specific; it works the same way if the original export was made from Subversion, Mercurial, CVS, or whatever. First cd to wherever you want to store your new Git repo, then do:

    git init new-repo
    cd new-repo
    git fast-import < ~/repo.data
    git checkout trunk

The last command checks out the branch you want filtered. It is named "trunk" in this example since in my case the original repository lived in Subversion before it became a Fossil repository.

The third part concerns filtering the checked-out branch. This is the selective part of the procedure. Create a script "filter.pl" that, when given a pathname as input, prints it out if it should be removed from the branch (i.e. filtered out), and prints nothing otherwise:

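    #!/usr/bin/perl
    # Run with "perl -n": $_ holds one pathname per input line.
    # Print the pathname if it should be removed from the branch.
    if (m{/naughty-secrets\.txt$}) { print; }    # the secrets file, in any directory
    elsif (!m{/(subproject1|subproject2|subproject3)/}) { print; }    # outside the kept subprojects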

The above example script removes any file named "naughty-secrets.txt" in any directory, and everything that is not stored under directories named "subproject1", "subproject2", or "subproject3."
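Since this is a Java blog, here is the same rule as a small Java sketch, which can be handy for trying out the patterns against sample pathnames before starting a long filtering run. The class name PathFilter, the method shouldRemove, and the sample paths are made up for this illustration; the real filtering is still done by filter.pl.

```java
import java.util.regex.Pattern;

// Hypothetical Java rendering of the filter.pl rule, for illustration only.
final class PathFilter {
    // A file named naughty-secrets.txt below any directory.
    private static final Pattern SECRET =
            Pattern.compile("/naughty-secrets\\.txt$");
    // A path that passes through one of the subprojects to keep.
    private static final Pattern KEEP =
            Pattern.compile("/(subproject1|subproject2|subproject3)/");

    // Returns true if the path should be removed from the branch.
    static boolean shouldRemove(String path) {
        return SECRET.matcher(path).find() || !KEEP.matcher(path).find();
    }

    public static void main(String[] args) {
        // Made-up sample paths, in the form "git ls-files" prints them.
        String[] samples = {
            "trunk/subproject1/src/Main.java",
            "trunk/subproject1/naughty-secrets.txt",
            "trunk/other/README"
        };
        for (String path : samples) {
            System.out.println((shouldRemove(path) ? "remove " : "keep   ") + path);
        }
    }
}
```

Note that the pathnames printed by git ls-files have no leading slash, so these patterns only keep subprojects that live below some top-level directory such as trunk/.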

Now run the git filter-branch command:

    git filter-branch --index-filter \
        "git ls-files --cached | \
            perl -n /home/user/filter.pl | \
            xargs -r git rm --cached --ignore-unmatch -- >/dev/null" \
        --prune-empty -- --all

This may take a while. The filtering script is called once for each commit being rewritten. To speed it up a bit you can mount a tmpfs file system (or some other RAM disk implementation) and move the repository there. When the filtering is finished, inspect your tree to check that it contains all the files that should be in the import, and no others. Run git log on a few sample files to verify that their history is visible. Some history may be lost, however, where files have been moved or renamed so that earlier parts of their history lived under paths rejected by the filter.

When satisfied with the filtering part, you can now proceed with the optional step of setting the author and committer attributes:

    git filter-branch -f --env-filter "
         GIT_AUTHOR_NAME='Firstname Lastname'
         GIT_AUTHOR_EMAIL='github@contact.example.com'
         GIT_COMMITTER_NAME='Firstname Lastname'
         GIT_COMMITTER_EMAIL='github@contact.example.com'
      " -- --all

This is much faster than the file filtering part, and especially so if the filtered repo is a lot smaller than the original.

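Finally, you can merge the selective import into your target repo. This step is optional too. With Git 2.9 or later, git pull refuses to merge histories that share no common commit unless you add --allow-unrelated-histories. Here is an example: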

    cd ../target-repo
    git pull ../new-repo
    git mv trunk/somedir target-root
    git commit

Depending on your particular circumstances, there may be more moving and renaming needed to fit the imported stuff correctly into the target repository.

418 I'm a teapot

Problem: How to handle session timeouts for AJAX calls made from a plain HTML + jQuery page?

The webapp has a GUI where the user logs in via a form and gets an authentication cookie. A servlet filter handles authentication and accepts a valid cookie, or a valid Authorization (Basic) header. The latter is intended for web service calls made from other applications.

When a non-authenticated request arrives, the filter detects whether it is for a web service or for a GUI resource (HTML or whatever). If it's for a GUI resource the login page is presented. If it's for a web service the response is 401 Unauthorized since non-browser clients are expected to use Basic authentication with specially generated API keys and passwords.

This works fine, except in the case where the cookie has expired and the user clicks on something that generates an AJAX call to the web service API. Since this call is now unauthenticated, the response will be 401 which causes the browser to present the Basic username/password popup. The user's normal login credentials won't work here, and we wouldn't want Basic authentication for the browser anyway.

Changing the response to be the same as for GUI resources won't help, because the AJAX code can't distinguish it from a successful 200 response. Of course a special header or extra cookie could be added, but that's awkward to deal with.

But I found a trick: use an HTTP error status code that is not handled transparently by the browser. I picked 418 I'm a teapot, which was introduced in RFC 2324 as an April Fools' joke. Then I put this in my jQuery startup code:

    $.ajaxSetup({
        statusCode: {
            418: function() {
                window.location.reload();
            }
        }
    });

This ensures that all AJAX calls on all web pages will force a page reload whenever status 418 is returned, and this in turn will present the login page to the user.

Credit goes to this Stack Overflow answer for this elegant solution to a similar problem.

The 418 response is only returned for AJAX calls from the browser. Other clients still get the proper 401 response. In order to distinguish browsers from other clients, I now just check if the request has a User-Agent header that starts with "Mozilla/5.0", which apparently is the case for all modern browsers these days.
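The decision described above can be sketched in plain Java roughly as follows. This is not my actual servlet filter, just a minimal illustration; the class and method names are invented for this post, and in a real filter the header value would come from HttpServletRequest.getHeader("User-Agent") and the status would be set on the HttpServletResponse.

```java
// Hypothetical helper, for illustration only: chooses the status code
// for an unauthenticated request to the web service API.
final class UnauthenticatedStatus {
    static final int SC_UNAUTHORIZED = 401;
    static final int SC_IM_A_TEAPOT = 418;

    // Treats any User-Agent starting with "Mozilla/5.0" as a browser.
    static boolean looksLikeBrowser(String userAgent) {
        return userAgent != null && userAgent.startsWith("Mozilla/5.0");
    }

    // Browsers get 418 so the jQuery handler can reload the page;
    // all other clients get the proper 401.
    static int statusForWebService(String userAgent) {
        return looksLikeBrowser(userAgent) ? SC_IM_A_TEAPOT : SC_UNAUTHORIZED;
    }

    public static void main(String[] args) {
        // Made-up User-Agent values.
        System.out.println(statusForWebService("Mozilla/5.0 (X11; Linux x86_64)"));
        System.out.println(statusForWebService("curl/8.0"));
        System.out.println(statusForWebService(null));
    }
}
```

Keeping the decision in a small side-effect-free method like this makes it easy to unit test without a servlet container.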

Double.MIN_VALUE is positive

Note to self: Double.MIN_VALUE is positive. Repeat again: positive.

Its larger cousin Float.MIN_VALUE is also positive.

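It's not negative like the MIN_VALUE constants of the signed integer types Byte, Short, Integer, and Long. For the floating-point types, MIN_VALUE is the smallest positive nonzero value; the most negative finite double is -Double.MAX_VALUE.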
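Here is a small sketch of how this typically bites: seeding a "find the maximum" loop with Double.MIN_VALUE. The class and method names are made up for this example; the seed Double.NEGATIVE_INFINITY in the corrected version is one reasonable choice, and -Double.MAX_VALUE would work too for finite input.

```java
// Illustration only: why Double.MIN_VALUE is a bad seed for a maximum search.
final class MinValueDemo {
    // Buggy: the seed is a tiny positive number, so negative input never wins.
    static double maxSeededWithMinValue(double[] values) {
        double max = Double.MIN_VALUE;
        for (double v : values) {
            if (v > max) max = v;
        }
        return max;
    }

    // Corrected: every double except NaN is >= negative infinity.
    static double max(double[] values) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            if (v > max) max = v;
        }
        return max;
    }

    public static void main(String[] args) {
        double[] allNegative = { -3.0, -1.5 };
        System.out.println(Double.MIN_VALUE > 0);   // true
        System.out.println(Integer.MIN_VALUE < 0);  // true
        System.out.println(maxSeededWithMinValue(allNegative));
        System.out.println(max(allNegative));
    }
}
```

With the all-negative input, the buggy version returns the seed itself instead of any element of the array.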

Aaaaargh!!!!!