[nycbug-talk] Git Tangent: [Was: svnup(1) - worthy of promotion to base?]

Isaac (.ike) Levy ike at blackskyresearch.net
Mon Mar 11 12:58:34 EDT 2013

On Mar 11, 2013, at 11:31 AM, Fabian Keil wrote:
> Isaac (.ike) Levy <ike at blackskyresearch.net> wrote:
>> --
>> My observations so far:
>> svnup(1) speed "feel"
>> Without opening up any cans of worms on SCM tools, I believe the 'git effect' creates some unrealistic expectations for the utility.  (Git's impact is so huge on development, it's become so popular, it affects/warps perception of other tools.)  git(1) has spectacular indexing/hashing, its implementation is really thoughtful compared to svn(1).  So, remote fetching of deltas, on massive codebases, is extremely fast.
> But actually checking out the fetched deltas scales with
> the repository size.

Absolutely- files read remote, files over network, files local write/remove/resolve- doesn't change from SCM to SCM.

>> svn(1) itself- (and svnup(1)), is not quite as slick in this area.  To this end, svnup(1) *feels* surprisingly "slow" when fetching deltas.  Yet, for what it's doing, (comparing every file), it's pretty darned fast.
> Did you use git for base, ports or other projects with similar size?

Only toyed with it for base/ports.

I've used git in other codebases, much larger than FreeBSD.  This year I've been living in a git repo with 38M+ lines of various software packages in < 3GB, including gnarly bad-form binaries in there, and other general repo abuses.

This repo is *not* fast to work with, but using it, it's at least fast enough to be workable.  Anecdotally:

- Fetching the entire repo over fast network: 20-30 minutes. (expected)
- Running 'fetch' to check deltas against remote repo: typically < 2-8 seconds. (the awesomeness)

The fetch time is of course dependent on how many changes there have been upstream, how many branches added, etc…  But for a codebase this large, the 'index as it goes' approach of git yields pretty impressive results when the deltas themselves are minimal.

This is not the case for SVN, apparently, based on the design of the protocol and underlying data storage.  In essence, git is comparing hash strings of directory indexes- svn is comparing files (and metadata) individually.
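
To make the "comparing hash strings of directory indexes" idea concrete, here is a rough sketch in Python.  It is only loosely modeled on git's tree objects: the names build() and diff(), the nested-dict layout, and the hash input format are all made up for illustration, and git's real object format and fetch negotiation differ.  The point is just that when a directory's hash matches, nothing underneath it needs to be looked at.

```python
import hashlib

def _h(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def build(node):
    """Hash a nested dict (directory) or bytes (file) bottom-up.

    Returns (id, children); children is None for a file.
    Illustrative format only, not git's actual object encoding.
    """
    if isinstance(node, bytes):
        return (_h(b"blob " + node), None)
    children = {name: build(child) for name, child in node.items()}
    listing = "\n".join(f"{name} {children[name][0]}" for name in sorted(children))
    return (_h(b"tree " + listing.encode()), children)

def diff(old, new, path="", stats=None):
    """Yield changed file paths, descending only where ids differ."""
    if stats is not None:
        stats["compared"] += 1
    if old is not None and new is not None and old[0] == new[0]:
        return  # identical id: the whole subtree is skipped
    old_kids = old[1] if old is not None else None
    new_kids = new[1] if new is not None else None
    if old_kids is None and new_kids is None:
        yield path  # a file was added, removed, or modified
        return
    if (old is not None and old_kids is None) or (new is not None and new_kids is None):
        yield path  # a file was replaced by a directory, or the reverse
    for name in sorted(set(old_kids or {}) | set(new_kids or {})):
        child = f"{path}/{name}" if path else name
        yield from diff((old_kids or {}).get(name), (new_kids or {}).get(name), child, stats)

old = build({"docs": {"readme": b"hi"}, "src": {"a.c": b"int a;", "b.c": b"int b;"}})
new = build({"docs": {"readme": b"hi"}, "src": {"a.c": b"int a;", "b.c": b"int b = 1;"}})
stats = {"compared": 0}
print(list(diff(old, new, stats=stats)), stats)
```

In the toy trees above, docs/ is compared once at the directory level and never descended into, however many files it holds.  A per-file comparison has no such shortcut.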

On a different git backed codebase I work in: 2.8M lines (which grows an average of 3 branches a day- fast-forward-only workflow), it's still mere seconds to check/fetch deltas.

Etc…  Git is really quite impressive on all these fronts- too bad it's getting so obese in the feature implementation department (build it from source one day, you'll see what I mean :)

> I'm not claiming that git isn't faster than svnup (which I never used),
> but a lot of people seem to only use git for small projects and expect
> it to perform similarly well with larger projects.

Absolutely agree, in particular as github is the new internet black…  and most repos are (thankfully) quite small.

> Once upon a time I did that too, but then I started using git for both
> ports and base on my admittedly old laptop.
> Until the ARC is warm some operations are depressingly slow because git
> lstat()s every monitored file (and even when you know better you can't
> trivially tell git not to). To give you an example:
> fk at r500 /usr/ports $time git fetch
> remote: Counting objects: 1433, done.
> remote: Compressing objects: 100% (508/508), done.
> remote: Total 1014 (delta 599), reused 906 (delta 502)
> Receiving objects: 100% (1014/1014), 936.97 KiB | 168 KiB/s, done.
> Resolving deltas: 100% (599/599), completed with 230 local objects.
> From git://github.com/freebsd/freebsd-ports
>   a486bc5..e355038  master     -> origin/master
>   10e61e6..c4e0181  svn_head   -> origin/svn_head
> real	0m48.982s
> user	0m1.064s
> sys	0m0.699s
> fk at r500 /usr/ports $time git checkout master
> Switched to branch 'master'
> Your branch is behind 'origin/master' by 131 commits, and can be fast-forwarded.
>  (use "git pull" to update your local branch)
> real	4m51.360s
> user	0m2.090s
> sys	0m8.008s
> fk at r500 /usr/ports $time git rebase origin/master
> First, rewinding head to replay your work on top of it...
> Fast-forwarded master to origin/master.
> real	6m0.862s
> user	0m2.308s
> sys	0m12.356s
> Fast-forwarding 131 commits in 6 minutes probably isn't the
> kind of spectacular you were referring to.

Ha- no :)
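
For anyone following along, the lstat() cost mentioned above can be sketched roughly like this in Python.  This is an assumption-laden toy, not git's code: snapshot() and possibly_modified() are hypothetical names, and git's real index caches more fields than size and mtime.  What it shows is that the working-tree check makes one lstat() call per tracked path, so with a cold cache the time scales with the number of files, not with the number of changes.

```python
import os

def snapshot(paths):
    """Record (size, mtime_ns) per path, standing in for git's index."""
    index = {}
    for p in paths:
        st = os.lstat(p)
        index[p] = (st.st_size, st.st_mtime_ns)
    return index

def possibly_modified(index):
    """Return paths whose cached stat data no longer matches the filesystem.

    One lstat() per tracked path, regardless of how few files changed.
    """
    dirty = []
    for p, cached in index.items():
        try:
            st = os.lstat(p)
        except FileNotFoundError:
            dirty.append(p)  # tracked file was removed
            continue
        if (st.st_size, st.st_mtime_ns) != cached:
            dirty.append(p)
    return dirty
```

With a tree the size of ports, every one of those calls may hit the disk until the metadata is cached, which fits the cold vs. warm ARC timings in the quoted transcripts.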

> It's reasonable once the ARC is warm:
> fk at r500 /usr/ports $time git pull
> remote: Counting objects: 59, done.
> remote: Compressing objects: 100% (17/17), done.
> remote: Total 39 (delta 26), reused 35 (delta 22)
> Unpacking objects: 100% (39/39), done.
> From git://github.com/freebsd/freebsd-ports
>   e355038..dc0fb02  master     -> origin/master
>   c4e0181..532d005  svn_head   -> origin/svn_head
> Updating e355038..dc0fb02
> Fast-forward
> databases/rubygem-familia/pkg-descr | 1 +
> databases/rubygem-redis/Makefile    | 2 +-
> databases/rubygem-redis/distinfo    | 4 ++--
> devel/rubygem-stella/pkg-descr      | 1 +
> graphics/ruby-gdal/Makefile         | 5 ++---
> www/rubygem-em-websocket/Makefile   | 2 +-
> www/rubygem-em-websocket/distinfo   | 4 ++--
> www/wgetpaste/Makefile              | 8 ++------
> www/wgetpaste/distinfo              | 4 ++--
> 9 files changed, 14 insertions(+), 17 deletions(-)
> real	0m16.813s
> user	0m1.823s
> sys	0m3.183s
> But then again, I reboot the laptop at least once a day
> and the ARC isn't persistent (yet) …

And also since ARC is for ZFS, how much physical memory do you have on that box?

Just curious- (mostly because I'm amazed to see how much more feasible using ZFS has become with smaller memory footprints).

Could you perhaps share a dmesg, out of curiosity?

Realistically, my point is that the git internals help it blow away other open source SCM tools in delta sync- and for many new/future FreeBSD users, svnup(1) may be perceived as slow.
So, anything we can do in advance to identify and resolve any speed issues which are real- and within reach to resolve, will make this important tool stand up much better- IMHO.

I'm not meaning to compare- but I do want svnup(1) to stand solid.

