Source Server with Git Repository

Update [29.9.2016] – Due to numerous emails, I wanted to post the exact command I used in my Continuous Integration builds:


%BuildToolsPath%\GitLink.exe c:\git\myLocalGitProjectFolder -u "http://vm-hd87-ug84:8081/api/%var2%/%revision%" -b %branchname% -c debug -powershell -errorsaswarnings
– %BuildToolsPath% is the path to the build tools on my build machine.
– %var2% is a placeholder that is replaced by the file name at debug time. It is the only placeholder that the build system must not expand; it is for the debugger’s use only.
– %revision% is the Git revision (for example the commit hash) being built. The build system should expand it.
– %branchname% is the name of the branch being built. The build system should expand it as well.
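
To make the split between build-time and debug-time placeholders concrete, here is a minimal sketch (in Python, for brevity) of how a build script might assemble that command line. The function name, its parameters, and the sample values are hypothetical; only the GitLink flags and the URL layout come from the command above.

```python
def build_gitlink_args(build_tools_path, repo_dir, service_root, revision, branch):
    """Assemble the GitLink command line as an argument list.

    Hypothetical helper: only the flags and the URL layout come from the post.
    """
    # %var2% stays literal: the debugger, not the build system,
    # substitutes the file name at debug time.
    url = f"{service_root}/api/%var2%/{revision}"
    return [
        rf"{build_tools_path}\GitLink.exe",
        repo_dir,
        "-u", url,
        "-b", branch,
        "-c", "debug",
        "-powershell",
        "-errorsaswarnings",
    ]


args = build_gitlink_args(
    r"C:\BuildTools",                    # assumed location of the build tools
    r"c:\git\myLocalGitProjectFolder",
    "http://vm-hd87-ug84:8081",
    "0123abc",                           # made-up revision, for illustration
    "master",                            # made-up branch name
)
print(" ".join(args))
```

If the command lives in a Windows batch file, cmd.exe would normally expand an undefined %var2% to an empty string, so it probably has to be written as %%var2%% there. How a particular CI system treats percent signs will vary.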

 

TL;DR – How I enabled “Source Server” support for an internal Git repository, for any Git provider, even when the server requires authentication. I ended up setting up a tiny Web API service that queries a local clone of the repository on the server and returns the raw content of a file at a specific revision.


 

Let’s review the big picture and explain the motivation for what I tried to achieve here. If you know what ‘source server’ in Visual Studio is, feel free to skip to ‘The solution’ section below.

Source Server

How many times have you wished you could step into the source of a referenced assembly in your project that you don’t have the sources for? When you enable the Source Server option in Visual Studio, you tell the debugger to read the .pdb file and fetch the sources from the location encoded in it. Even after using this feature at least a hundred times, it still feels like magic to me when the actual source file suddenly appears in my debugger.

 

.pdb who?

So now we know that the debugger can fetch the relevant sources for you as long as you have the .pdb file that the debugger can look into to find the location of the sources on the internet. But what are those .pdb files and how can you get them?

Let’s say you have a C# library project loaded in Visual Studio. When you compile the source code with the ‘Debug’ configuration, your output contains your .dll file with the compiled code and a .pdb file with the information needed to debug your library. (more on .pdb files HERE)

If you try opening the .pdb file, you will see lots and lots of binary gibberish, but if you scroll to the end of the file, you will see textual content that looks something like this (taken from xunit.assert.pdb):

[Image: the source server text block at the end of xunit.assert.pdb]

 

What you see here is the info that tells Visual Studio where to get the actual sources for this specific revision in case you want to debug the library and step into its source. Of course you can’t access ‘C:\TeamCity\buildAgent’ on the machine the library was built on. If you look carefully, each source file line uses an ‘*’ to separate the original build path from a relative path. The debugging tools substitute that relative path into the template stored in the ‘SRCSRVTRG’ variable to fetch the correct source from a publicly accessible location.
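
The substitution described above can be sketched in a few lines of Python. The sample stream below is made up for illustration (the host, revision, and paths are hypothetical), and the sketch only handles plain %varN% placeholders; the real source server expansion supports more than this.

```python
# A made-up source server stream, modeled on the general layout of the
# text block found at the end of an indexed .pdb file.
SAMPLE_STREAM = r"""SRCSRV: ini ------------------------------------------------
VERSION=2
SRCSRV: variables ------------------------------------------
SRCSRVVERCTRL=http
SRCSRVTRG=http://example.invalid/raw/0123abc/%var2%
SRCSRV: source files ---------------------------------------
C:\TeamCity\buildAgent\work\src\Foo.cs*src/Foo.cs
SRCSRV: end ------------------------------------------------
"""


def parse_srcsrv(stream):
    """Return (variables, files) parsed from a source server text block."""
    variables, files, section = {}, {}, None
    for line in stream.splitlines():
        line = line.strip()
        if line.startswith("SRCSRV:"):
            # Section headers look like "SRCSRV: variables -------".
            section = line.split(":", 1)[1].strip(" -")
            continue
        if not line:
            continue
        if section == "variables":
            name, _, value = line.partition("=")
            variables[name] = value
        elif section == "source files":
            # Fields are separated by '*'; the first one is the build path.
            fields = line.split("*")
            files[fields[0].lower()] = fields
    return variables, files


def resolve_source_url(stream, build_path):
    """Expand SRCSRVTRG for the source file that was built at build_path."""
    variables, files = parse_srcsrv(stream)
    fields = files[build_path.lower()]
    url = variables["SRCSRVTRG"]
    # %var1% is the first field, %var2% the second, and so on.
    for index, value in enumerate(fields, start=1):
        url = url.replace(f"%var{index}%", value)
    return url


print(resolve_source_url(SAMPLE_STREAM, r"C:\TeamCity\buildAgent\work\src\Foo.cs"))
```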

Ok, so now we know that all we need to do is instrument our .pdb files at build time with the textual information that helps the debugger retrieve the right file from the exact revision. The good news is that Microsoft supports such instrumentation of your .pdb files and even provides the scripts to do it. I have no clue why, but those Microsoft scripts are written in Perl (oh God, do I hate Perl).

You can get all the relevant scripts when installing the Windows SDK. They should reside in a path that looks something like this:
C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\srcsrv

 

The problem

What we have so far, hopefully, is an understanding of what symbols are and how we can debug sources that we don’t actually own. The problem is that Microsoft supports only a couple of version control systems from which the debugger can fetch sources with the help of a .pdb file. In the company I work for, there is a desire to migrate to a Git source control system – Stash\Bitbucket.

One of the reasons we are struggling with the migration is the lack of Source Server support for Git repositories. If our CI system builds our sources from Stash, the debugger will not be able to fetch the relevant sources should we want to debug a specific .dll library that we deployed to production.

Even if we find a way to instrument the .pdb file to try and fetch sources from Stash, there are 2 issues that stop our debugging:

  • Our Stash\Bitbucket Git server requires SSO authentication to access any repository.
  • To query the Stash\Bitbucket Git server we need to build a URL with query parameters, but the debugger will not fetch anything whose URL contains a “?” character (this seems to be a known issue in srcsrv.dll).

The solution

I’ll get right to the point – I decided to write an ASP.NET Web API service that accepts a true REST call (without any query parameters) and uses a server-side clone of the relevant Git repository to fetch the exact revision of a file. By doing so, we solve both problems:

  • We don’t need any SSO authentication, because the Web Service uses its own local clone of the repository (which also makes the solution agnostic to the Git hosting provider we are using).
  • We don’t have to create a URL with query parameters anymore, because the Web API service accepts a clean RESTful request.
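
As an illustration of the second point, here is a minimal Python sketch of how such a route could be split into its parts. It assumes the /api/<file>/<revision> layout from the command at the top of the post; the function name is hypothetical, and the author’s actual service is an ASP.NET Web API project, not this code.

```python
from urllib.parse import unquote, urlsplit


def parse_source_request(url):
    """Split a request of the form /api/<file>/<revision> into its parts.

    Hypothetical helper: returns (file_token, revision), where file_token
    is the single path segment that identifies the requested file.
    """
    parts = urlsplit(url)
    if parts.query:
        # The route carries everything in the path, so that the URL
        # never needs a '?'.
        raise ValueError("query strings are not part of this route")
    segments = [unquote(segment) for segment in parts.path.split("/") if segment]
    if len(segments) != 3 or segments[0] != "api":
        raise ValueError(f"unexpected path: {parts.path}")
    _, file_token, revision = segments
    return file_token, revision


print(parse_source_request("http://vm-hd87-ug84:8081/api/Foo.cs/0123abc"))
```

Because the file has to fit into one path segment, slashes inside a relative path need special treatment; the post comes back to this under Change #3.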

Ok, right about now you should be asking “hey, but how is your service querying the local Git clone on the server? Did you write your own library?”. Of course not. I used a library called LibGit2Sharp that let me query my sources as simply as this:

[Image: the GetFileAtRevision method; the full class is posted in the comments below]
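
For readers who do not use LibGit2Sharp, the same lookup can be approximated by shelling out to the git command line. This is a sketch of an alternative technique, not the author’s code: it relies on git show with the <revision>:<path> syntax, and the function names and the example clone path are made up.

```python
import subprocess


def git_show_args(repo_dir, revision, file_path):
    """Build the argument list for reading one file as of one revision."""
    # Inside a commit, git addresses files with forward slashes.
    normalized = file_path.replace("\\", "/")
    return ["git", "-C", repo_dir, "show", f"{revision}:{normalized}"]


def get_file_at_revision(repo_dir, revision, file_path):
    """Return the UTF-8 text of file_path as it was at the given revision."""
    # Passing a list (no shell) keeps the values from being re-parsed by a shell;
    # check=True raises CalledProcessError if git cannot find the revision or file.
    result = subprocess.run(
        git_show_args(repo_dir, revision, file_path),
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8")


# Example argument list for a hypothetical server-side clone.
print(git_show_args("/srv/clones/myrepo", "0123abc", "src\\Foo.cs"))
```

Since the revision and the file path arrive in a URL, a real service should validate both before handing them to git.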

So right now, with some out-of-the-box thinking, we have solved the problem of getting the relevant revision of a specific file that is stored in a Git repository. That leaves another problem: how the hell are we going to instrument our .pdb files at build time so that the debugger talks to our Web API service and loads the correct sources in Visual Studio?

Enter GitLink! No more cryptic Perl scripts, just plain ol’ C# that instruments all the .pdb files created when building your projects. I won’t get into the nuts and bolts of how to use this tool; it is pretty straightforward, with good documentation on GitHub. But I will mention a couple of things I had to fix myself by recompiling the latest GitLink sources with my own changes.

(You can see all my changes in this revision HERE)

Change #1: The GitLink command line expects the target URL to use https. My service runs over plain http, so I removed that requirement.

Change #2: This is an important one. When instrumenting my .pdb files with GitLink, I opted in to using PowerShell for fetching the files from my Web Service. But the original code did not work: no matter how hard I tried, the Visual Studio debugger would not go and fetch my files. So I fixed the command in GitLink to use this line (you can find it in the source):

"SRCSRVCMD=powershell invoke-command -scriptblock {param($url='%RAWURL%', $output='%TRGFILE%'); (New-Object System.Net.WebClient).DownloadFile($url, $output)}"

Change #3: When instrumenting .pdb files, you can’t use relative file locations in your REST call because of the ‘/’ character. I changed the code to replace each slash with a placeholder like this: {__slash__} so when my Web Service gets a request, it first replaces the placeholders with actual slashes before querying the local Git clone of my repository.
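
The round trip can be sketched like this (in Python, for brevity). The {__slash__} placeholder is the one named above; the function names are hypothetical, and the author’s real changes live in the GitLink sources and in the C# Web Service.

```python
SLASH_PLACEHOLDER = "{__slash__}"


def encode_path(relative_path):
    """Build side: hide the slashes so the path fits in one URL segment."""
    return relative_path.replace("/", SLASH_PLACEHOLDER)


def decode_path(token):
    """Service side: restore the slashes before querying the local clone."""
    return token.replace(SLASH_PLACEHOLDER, "/")


token = encode_path("src/Lib/Foo.cs")
print(token)
print(decode_path(token))
```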

 

Conclusion

Let me go back to the 50,000-foot view and draw the full picture here.
When you build your project in Visual Studio in Debug mode, your output will contain your .dll library file and a .pdb file.

After you have your build artifacts, you will use GitLink to instrument your .pdb files so they know how to fetch the relevant sources when you want to debug the code.

After instrumenting your .pdb files, you deploy them to your Symbol Server and point your Visual Studio configuration to that Symbol Server.

Next, when you want to debug that exact version of your .dll, you step into any method in the library while debugging. The debugger gets the matching .pdb file from your Symbol Server and reads the source information you put in it.

The debugger then uses that information to call your Web Service and fetch the exact revision of your source files. It loads the file automatically, and you can debug the code.

 

Last thoughts

 

Only when you write your own blog posts do you understand how hard it is to explain something like this in a written document. I tried my best, but if you still have any questions, don’t hesitate to contact me and I will help out with the implementation.

 

Shonn Lyga.

 

 

 

About Shonn Lyga

Obsessed with anything and everything in Software Engineering, Technology and Science

7 Responses to Source Server with Git Repository

  1. Chirag says:

    Hi Shonn,

    We recently migrated from Perforce to Git at my company, and I am facing the same Source Server indexing issue for Git repositories that you describe. I searched on Google, but there is very little information about it. I have gone through your article; it is very useful, and it seems I will have to implement the same solution to solve my problem. The Stash repository I work on contains multiple solutions with C++ and C# source code. We rely heavily on source server when we analyze crash dumps collected from customer production environments. I have the following questions about the implementation:
    1) You mentioned that you implemented this for Stash\Bitbucket. Will it work for a private Stash repo?
    2) Can the solution you described (using libgit2sharp and GitLink) work for both C++ and C# code?
    3) Will other debuggers, like WinDbg, be able to fetch the code from the Git repo through the Web API service?
    4) I looked at the GitLink documentation. I ran the command below against my local repo, but it shows the following error:
    GitLink.exe c:\mylocalrepo
    No target url was specified, trying to determine the target url automatically
    Target url is missing

    Is it looking for the URL of the Web Service? I did not understand this part. How will GitLink instrument the Web Service URL for each source file in the pdb?
    5) It would be a great help if you could share a code snippet from your Web Service. How does it call the libgit2sharp functions?

    Thanks and regards,
    Chirag


    • Shonn Lyga says:

      Hi Chirag, sorry for the late reply, couldn’t get to it earlier.

      1) What do you mean by private Stash? Is it a Stash server? In any case, any Git server that can provide you with a ‘raw’ file content API is good enough.

      2) Sure, it is agnostic to the type of source code in your Git repository. The only thing that matters is that it is a Git repository.

      3) As far as I remember, Visual Studio and WinDbg share the same source server component (srcsrv.dll) to fetch the files, so yes, this should work for you. I know people who use it with WinDbg, without Visual Studio.

      4) You need to pass the URL of your remote server with the -u option, in addition to the local repository folder. GitLink will instrument your pdb files so that the URL of each file can be reconstructed exactly.

      5) Sure, no problem. I created a GitHub gist for you here:


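      namespace StashFileFetcher.Controllers
      {
          using System;                      // IDisposable, ArgumentException
          using System.Collections.Generic;  // IEnumerable<T>
          using System.Diagnostics;
          using System.IO;
          using System.Linq;                 // Any()
          using System.Text;
          using LibGit2Sharp;

          // Thin wrapper around a local clone that can read a file as of a given revision.
          class RepoDigger : IDisposable
          {
              // Returns null when the path is not a valid Git repository.
              public static RepoDigger GetRepositoryInformationForPath(string path)
              {
                  if (LibGit2Sharp.Repository.IsValid(path))
                  {
                      return new RepoDigger(path);
                  }
                  return null;
              }

              // Returns the UTF-8 text of filePath as it was at the given revision.
              public string GetFileAtRevision(string filePath, string revision)
              {
                  var commit = _repo.Lookup<Commit>(revision);
                  if (commit == null)
                  {
                      // Guard against an unknown revision instead of failing on a null reference.
                      throw new ArgumentException("Unknown revision: " + revision);
                  }
                  var treeEntry = commit[filePath];
                  if (treeEntry == null)
                  {
                      throw new FileNotFoundException("File not found at revision " + revision, filePath);
                  }
                  Debug.Assert(treeEntry.TargetType == TreeEntryTargetType.Blob);
                  var blob = (Blob)treeEntry.Target;
                  var contentStream = blob.GetContentStream();
                  string content;
                  // Disposing the reader also disposes the underlying content stream.
                  using (var tr = new StreamReader(contentStream, Encoding.UTF8))
                  {
                      content = tr.ReadToEnd();
                  }
                  return content;
              }

              // SHA of the commit that HEAD currently points to.
              public string CommitHash
              {
                  get
                  {
                      return _repo.Head.Tip.Sha;
                  }
              }

              // True when the current branch is ahead of the branch it tracks.
              public bool HasUnpushedCommits
              {
                  get
                  {
                      return _repo.Head.TrackingDetails.AheadBy > 0;
                  }
              }

              // True when the working directory has any non-ignored change.
              public bool HasUncommittedChanges
              {
                  get
                  {
                      return _repo.RetrieveStatus().Any(s => s.State != FileStatus.Ignored);
                  }
              }

              // Commits reachable from HEAD.
              public IEnumerable<Commit> Log
              {
                  get
                  {
                      return _repo.Head.Commits;
                  }
              }

              public void Dispose()
              {
                  if (!_disposed)
                  {
                      _disposed = true;
                      _repo.Dispose();
                  }
              }

              private RepoDigger(string path)
              {
                  _repo = new Repository(path);
              }

              private bool _disposed;
              private readonly Repository _repo;
          }
      }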

      Please let me know if this helps and if you need any more help.

      Shonn.


  2. Shonn Lyga says:

    @Chirag – sure, no problem. BTW, I updated the article with the exact command I used for my instrumentation. Hope this sheds more light…


  3. rvdginste says:

    Hi Shonn,

    Excellent idea, making such a proxy service! I think I will go down the same path, and I am wondering how you keep the local clone of the Git repo up to date. Are those repos pushed to, or is something running on the server that does a “git fetch” regularly?

    kind regards,
    Ruben


    • Shonn Lyga says:

      Hi, thanks for the warm feedback 🙂
      My service fetches the changes on the fly. Since sources in Git are amazingly compact, the latency this adds is minimal.

      Of course further optimizations can be made if needed.


