Search engine optimization with git web interfaces

I recently became frustrated with gitweb’s funky query strings and decided to give cgit a try. Although there are some patches that make gitweb more user- (and search-engine-) friendly, cgit is a much better web interface for git, both in terms of the code and the actual user experience. However, there was still room for SEO improvements.

I went through the HTML suggestions from Google Webmaster Tools and Google’s own SEO Starter Guide. I’ve pushed the search-engine-optimized cgit to my seo branch on GitHub, and you can see it in action at my git repositories. I’m testing all of this with an Apache ScriptAlias directive; I hope it also works with the other URL-processing schemes cgit supports. A short summary of the new SEO features so far:

  • Use HTML h1 and h2 heading tags instead of custom-styled divs
  • Much better title tags: commit pages use the commit subject, and the repo name has been added in many places to avoid duplicate titles
  • The breadcrumb has been integrated into the heading
  • A configurable option to set nofollow relationships on links to non-HEAD commits, to avoid duplicate content being indexed
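To make the title and nofollow bullets a little more concrete, here is a minimal sketch in C (the language cgit is written in). It is not the code from the seo branch: the function names build_commit_title and build_commit_link, the "subject - repo" title format, and the nofollow_enabled flag are hypothetical stand-ins for whatever the branch actually uses, and all inputs are assumed to be HTML-escaped already.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a commit page title from the commit subject
 * plus the repo name, e.g. "Fix typo - cgit.git", so that titles are
 * descriptive and not duplicated across repositories. */
void build_commit_title(char *buf, size_t len,
                        const char *subject, const char *repo)
{
    snprintf(buf, len, "%s - %s", subject, repo);
}

/* Hypothetical helper: build an anchor to a commit page. When the
 * (hypothetical) nofollow_enabled option is on and the commit is not
 * HEAD, add rel="nofollow" so crawlers skip duplicate historic content.
 * url and label are assumed to be HTML-escaped by the caller. */
void build_commit_link(char *buf, size_t len, const char *url,
                       const char *label, int is_head, int nofollow_enabled)
{
    const char *rel = (nofollow_enabled && !is_head) ? " rel=\"nofollow\"" : "";
    snprintf(buf, len, "<a href=\"%s\"%s>%s</a>", url, rel, label);
}
```

Keeping HEAD links followable while marking older commits nofollow lets a crawler still reach the current state of each repository.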

Of course, you could take the popular option of just using GitHub instead of self-hosting your own git web interface… but even they don’t do quite as good a job, IMO: they use the SHA1 in web page titles. Eww!