Monday, November 4, 2013

Version info and enforced commits

One of the prerequisites for successfully operating multiple software servers is a good production code inventory. If you have any doubts about that, read an excellent example of the impact of a failed software inventory. Below is some guidance for building software with inventory data compiled into the code. It gives running systems the data needed for a factual inventory. It also has the benefit of enforcing that only committed code runs in production.

In a rush to patch a hot issue, developers may deploy a dirty build. It is dirty in the sense that the build was made while the code had changes that were not committed to the repository, not even locally. If such a dirty build goes to production and the source code keeps changing before it is committed, no one can be certain why that production build behaves the way it does should a problem arise.

One way to combat this is to put version and repository status information into the build executable. The example below is a generic guideline for a Microsoft Visual Studio 2012 C++ (VC++ in this post) solution built from a Git repository. It also shows a possible integration with the GitHub service, if needed. This post gives you a workflow outline that can be applied to most environments I know of.

The general steps from build to execution are:

  1. The build process has a pre-build script, which gathers version and repository status information. In our case, VC++ runs the Windows PowerShell script version_info.ps1.

  2. The pre-build script generates a source code file that the rest of the codebase expects. The generated file has everything needed to allow or deny running the build. In our case it is the C++ header file version.h.

  3. The code contains a function which checks whether the build is legitimate to run, logs the result, and stops the process if needed. In the example, this is the versionLogAndVet function in the version.cpp file.

  4. At run time, the versionLogAndVet function allows only permitted combinations of repository status and build configuration to run. It also logs the version information.

In this specific environment there are three build configurations. DEBUG and RELEASE are the standard ones, while LOGGED turns on additional logging without the overhead of debug code. It is important that the code in version.cpp is invoked only from the server start-up routines, so that tests can still run on uncommitted code in production build configurations.
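Steps 3 and 4, together with the three configurations, boil down to a small policy. The following is a minimal C++ sketch of that policy as a pure function. The names BuildConfig, kCurrentConfig, mayRun and configName are hypothetical and not part of the post's code, and the sketch assumes, as the post does, that the LOGGED configuration defines a LOGGED preprocessor macro.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum mirroring the DEBUG, LOGGED and RELEASE configurations.
enum class BuildConfig { Debug, Logged, Release };

// Same macro tests as versionLogAndVet: _DEBUG first, then LOGGED,
// and everything else counts as RELEASE.
#if defined(_DEBUG)
constexpr BuildConfig kCurrentConfig = BuildConfig::Debug;
#elif defined(LOGGED)
constexpr BuildConfig kCurrentConfig = BuildConfig::Logged;
#else
constexpr BuildConfig kCurrentConfig = BuildConfig::Release;
#endif

// Only DEBUG builds may run from a dirty repository; LOGGED and RELEASE
// are production builds and require a clean repository.
bool mayRun(BuildConfig config, bool repoDirty) {
    return config == BuildConfig::Debug || !repoDirty;
}

// Name used in the "Build configuration:" log line.
std::string configName(BuildConfig config) {
    switch (config) {
        case BuildConfig::Debug:   return "DEBUG";
        case BuildConfig::Logged:  return "LOGGED";
        case BuildConfig::Release: return "RELEASE";
    }
    return "UNKNOWN";
}
```

In this setup a call such as mayRun(kCurrentConfig, server::VerDirty) would take the place of the preprocessor block in the start-up check. Keeping the decision apart from logging and exiting is one way to unit test the policy without stopping the test runner.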

Here is the source code and notes:

In most VC++ solutions it makes sense to configure these checks on a per-project basis, specifically for projects which generate executable files. In the command line below you will likely need to customize two things: the path to the version_info.ps1 script, and the word "server", which becomes the namespace holding the constants that describe the repository state and information.

powershell.exe -ExecutionPolicy RemoteSigned -File "$(SolutionDir)Util\Build\version_info.ps1" server "$(ProjectDir)\" "$(SolutionDir)\"

A word of caution: the interpreter expands special character sequences on that command line. It took me a long time to realize that the $(...) macros filled in by Visual Studio use backslashes as path separators and also end with a backslash. As a result, PowerShell takes that trailing backslash as an escape character for the following '"', and the command line breaks. That is why there is an extra backslash before each closing '"': after expansion the pair turns into a single literal backslash, and the quote closes the argument as intended. The unsolved problem, though, is paths containing directories whose names start with escapable characters; such directories may break things. Microsoft did not provide any help with this issue (see the footnote at the end of this post).
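To see why the doubled backslash helps, here is a simplified C++ model of the backslash and quote rules that Microsoft documents for the C runtime's command-line splitting: 2n backslashes followed by a quote yield n backslashes and the quote acts as a delimiter; 2n+1 backslashes followed by a quote yield n backslashes and a literal quote; backslashes not followed by a quote are kept as they are. It is only a sketch that assumes powershell.exe splits its arguments the same way, and it ignores corner cases such as doubled quotes inside a quoted argument. The function name splitCommandLine is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of the MSVC CRT command-line splitting rules.
// Ignores the "" (doubled quote) special case inside quoted arguments.
std::vector<std::string> splitCommandLine(const std::string& line) {
    std::vector<std::string> args;
    std::string current;
    bool inQuotes = false;
    bool haveArg = false;  // true once the current argument has started

    for (std::size_t i = 0; i < line.size();) {
        char c = line[i];
        if (c == '\\') {
            // Count the whole run of backslashes.
            std::size_t n = 0;
            while (i < line.size() && line[i] == '\\') { ++n; ++i; }
            if (i < line.size() && line[i] == '"') {
                // Before a quote, each pair becomes one backslash...
                current.append(n / 2, '\\');
                if (n % 2 == 1) {
                    // ...and an odd leftover backslash escapes the quote.
                    current.push_back('"');
                    ++i;
                }
                // With an even count the quote is handled as a delimiter next.
            } else {
                // Not followed by a quote: backslashes are literal.
                current.append(n, '\\');
            }
            haveArg = true;
        } else if (c == '"') {
            inQuotes = !inQuotes;  // delimiter, not part of the argument
            haveArg = true;
            ++i;
        } else if ((c == ' ' || c == '\t') && !inQuotes) {
            if (haveArg) {
                args.push_back(current);
                current.clear();
                haveArg = false;
            }
            ++i;
        } else {
            current.push_back(c);
            haveArg = true;
            ++i;
        }
    }
    if (haveArg) args.push_back(current);
    return args;
}
```

Under this model, with hypothetical expanded paths, `server "C:\sln\proj\\" "C:\sln\\"` splits into `server`, `C:\sln\proj\` and `C:\sln\`, whereas `"C:\proj\" server` without the extra backslash becomes the single argument `C:\proj" server`.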

VC++ does not allow specifying the PowerShell script directly as the pre-build command, so we need to invoke the PowerShell interpreter and specify the desired execution policy. This step should also be placed in an XML project property sheet (a .props file) so that it can be assigned to multiple projects and build configurations:

<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <ImportGroup Label="PropertySheets" />
  <PropertyGroup Label="UserMacros" />
  <PropertyGroup />
  <ItemDefinitionGroup>
    <PreBuildEvent>
      <Command>powershell.exe -ExecutionPolicy RemoteSigned -File "$(SolutionDir)Util\Build\version_info.ps1" server "$(ProjectDir)\" "$(SolutionDir)\"</Command>
    </PreBuildEvent>
  </ItemDefinitionGroup>
  <ItemGroup />
</Project>

In the project properties, the step shows up under Configuration Properties > Build Events > Pre-Build Event. Note that the command line value is not bold, meaning it is inherited from the .props file.

The pre-build step calls the version_info.ps1 PowerShell script. The script runs a few git commands which query the current repository information, and then generates a header file used later in the build. This is the script you want to edit to provide more or different repository information to your software:

Param (
 [String]$Namespace,
 [String]$Project,
 [String]$GitRoot,
 [String]$HeaderFile="version.h",
 [String]$VerPrefix="https://github.com/<USER>/<REPO>/commit/"
)

Push-Location -LiteralPath $GitRoot

$VerFileHead = "`#pragma once`n`#include <string>`n`nnamespace $Namespace {`n"
$VerFileTail = "}`n"

$VerBy   = (git log -n 1 --format=format:"  const std::string VerAuthor=`\`"%an `<%ae`>`\`";%n") | Out-String
$VerUrl  = (git log -n 1 --format=format:"  const std::string VerUrl=`\`"$VerPrefix%H`\`";%n") | Out-String
$VerDate = (git log -n 1 --format=format:"  const std::string VerDate=`\`"%ai`\`";%n") | Out-String
$VerSubj = (git log -n 1 --format=format:"  const std::string VerSubj=`\`"%f`\`";%n") | Out-String

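# 'git status --porcelain' lists staged, unstaged and untracked changes, one per line;
# 'git ls-files -m' alone would miss changes that are staged but not yet committed.
$VerChgs = ((git status --porcelain) | Measure-Object -Line).Lines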

if ($VerChgs -gt 0) {
  $VerDirty = "  const bool VerDirty=true;`n"
} else {
  $VerDirty = "  const bool VerDirty=false;`n"
}

"Written $Project\" + (
  New-Item -Force -Path "$Project" -Name "$HeaderFile" -ItemType "file" -Value "$VerFileHead$VerUrl$VerDate$VerSubj$VerBy$VerDirty$VerFileTail"
).Name + " as:"
""
Get-Content "$Project\$HeaderFile"
""

Pop-Location

A few notes regarding this script. Its location is not very important, but it is important to include it in the repository. A git executable must be on the PATH. The way this example is organized makes it possible for each project to choose independently whether to create the version header file. The header file name set on the fifth line can be changed either via an extra parameter when calling PowerShell or by editing the default value on that fifth line. It is important to add the header file name (version.h in this case) to the .gitignore file so that git ignores it; otherwise the generated header itself would show up as an untracked file and mark every subsequent build as dirty. Finally, I did not come up with any automation for generating the GitHub commit URL. You will need to replace the <USER> and <REPO> placeholders manually to point to your repository. I am not going over the script in greater detail for a simple reason: if you cannot make sense of it, you should not be using it, even though it is not destructive.
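Put in C++ terms, the dirty check treats every line git reports as one uncommitted change, and any change at all makes the build dirty. Here is a minimal sketch of that rule, assuming the git output has already been captured as text (capturing it is left out); countChangedEntries and isDirty are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Counts one change per non-empty line of git output.
std::size_t countChangedEntries(const std::string& gitOutput) {
    std::istringstream in(gitOutput);
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        // Skip blank lines, including a lone "\r" left by Windows line endings.
        if (!line.empty() && line != "\r") ++count;
    }
    return count;
}

// Mirrors the script's ($VerChgs -gt 0) test.
bool isDirty(const std::string& gitOutput) {
    return countChangedEntries(gitOutput) > 0;
}
```

For example, a single reported line such as `src/server.cpp` (a hypothetical path) is enough for isDirty to return true.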

As an example, the version_info.ps1 script generates a version.h header file in the project directory, which may look something like this:

#pragma once
#include <string>

namespace server {
  const std::string VerUrl="https://github.com/<USER>/<REPO>/commit/<SHA_WILL_BE_HERE>";
  const std::string VerDate="2013-10-22 15:00:00 -0700";
  const std::string VerSubj="properly-tested-commit-enforcement";
  const std::string VerAuthor="The Developer <developer@example.com>";
  const bool VerDirty=true;
}
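For reference, the following C++ sketch builds text in the same shape as the version.h above from values that have already been collected. It is hypothetical (the post generates the file in PowerShell), and it adds one thing the script does not do: it escapes backslashes and double quotes, so that an author name containing them would still produce a valid C++ string literal. VersionInfo, escapeForLiteral and renderVersionHeader are illustrative names.

```cpp
#include <cassert>
#include <string>

// Hypothetical container for the values the script collects from git.
struct VersionInfo {
    std::string url;
    std::string date;
    std::string subject;
    std::string author;
    bool dirty;
};

// Escapes characters that would otherwise end or corrupt a C++ string literal.
std::string escapeForLiteral(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '\\' || c == '"') out.push_back('\\');
        out.push_back(c);
    }
    return out;
}

// Produces the same layout as the example version.h above.
std::string renderVersionHeader(const std::string& ns, const VersionInfo& v) {
    std::string h = "#pragma once\n#include <string>\n\nnamespace " + ns + " {\n";
    h += "  const std::string VerUrl=\"" + escapeForLiteral(v.url) + "\";\n";
    h += "  const std::string VerDate=\"" + escapeForLiteral(v.date) + "\";\n";
    h += "  const std::string VerSubj=\"" + escapeForLiteral(v.subject) + "\";\n";
    h += "  const std::string VerAuthor=\"" + escapeForLiteral(v.author) + "\";\n";
    h += std::string("  const bool VerDirty=") + (v.dirty ? "true" : "false") + ";\n";
    h += "}\n";
    return h;
}
```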

Since each project generates its version information by running its own pre-build step, and the PowerShell script writes the data into that project's directory, I found no issues with parallel builds. I tried to provoke a problem by running the pre-build script in about twenty simple projects set to compile in parallel, and found no indication of one. That said, parallel execution often behaves unexpectedly, so check it as a possible culprit if you run into weird or inconsistent behavior.

Finally, let's pull this into the codebase and have it log the version information and enforce that production builds run only from clean repositories. Here is an example of what I am doing. I trust you will know how to adjust the namespace and variables to match your customization of the steps above:

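#include "version.h"

#include <cstdlib>   // exit
#include <iostream>  // std::cerr, std::cin

// Server and Log are this project's own classes; substitute your equivalents.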
void Server::versionLogAndVet() {

  Log::Info("Version Date: ",       server::VerDate);
  Log::Info("Version URL: ",        server::VerUrl);
  Log::Info("Version Info: ",       server::VerSubj);
  Log::Info("Version Author: ",     server::VerAuthor);
  Log::Info("Version Repo Clean: ", (server::VerDirty? "NO, it is DIRTY" : "yes" ));

#if defined(_DEBUG)
  Log::Info("Build configuration: DEBUG");

#elif defined(LOGGED)
  Log::Info("Build configuration: LOGGED");

#else
  Log::Info("Build configuration: RELEASE");

#endif

#if ! defined(_DEBUG)

  if (server::VerDirty)
  {
    Log::Error("Must NOT run production code build from a dirty repository. Server process STOPPED.");

    std::cerr << std::endl;
    std::cerr << "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" << std::endl;
    std::cerr << "Must NOT run production code build from a dirty repository." << std::endl;
    std::cerr << "STOPPED.    Press  Enter key  to exit  or close the window." << std::endl;
    std::cerr << "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" << std::endl;
    std::cerr << std::endl;

    std::cin.get();  // wait for Enter, as the message above asks
    exit(1);
  }

#endif

}

In that code, all build configurations except those defining the _DEBUG macro are considered production builds, and a server process built from a dirty repository is not allowed to run in them. Note that this is a safety guard, NOT a security feature.

You can download all code examples from this post as an archive from the gist page.


Microsoft apparently does not consider the undefined parameter expansion behavior a bug. They closed the corresponding ticket as "by design", meaning they are fine with the undefined behavior and will not fix it. This means that any developers who run into the escaping issue while using this solution will need to move or restructure the affected repository.
