My Background with SCM

While working my way through my undergraduate degree in Computer Science, I ended up doing some research work for a professor. Several other research assistants and I were developing a C compiler that could be used to inspect C code and collect complexity attributes from it. As a team, we quickly discovered that it was difficult to track the changes happening in the code from day to day. I read UNIX man pages obsessively looking for ways to keep track of changes, and eventually discovered the commands ci & co and the whole notion of Software Configuration Management (SCM). Not only did SCM help me and my fellow undergraduates with our project, but I was able to apply my knowledge of SCM to subsequent projects in both undergraduate and graduate school.

Who hasn’t had that sinking feeling of deleting the wrong file? Well, if you use SCM tools, you can mitigate some of those situations. The more interesting case for SCM is where you make several revisions to a system only to discover that the approach is not going to work. Using SCM to keep track of changes along the way lets you pursue those changes knowing that if you really don’t like the outcome, you can roll back to a previous commit without pain.
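For instance, a minimal sketch of what such a rollback can look like (the commit references are placeholders):

$ git log --oneline                 # find the last known-good commit
$ git revert <bad-commit>           # back out a single bad commit while keeping history
$ git checkout <good-commit> -- .   # or restore the whole working tree from a good commit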

My mantra became:

Commit early & commit often.

When I graduated and moved into the private sector, I was happy to discover that SCM techniques were being used on large-scale projects, although I will admit that the methods used were not always the most efficient. Long story short, over several years we have slowly evolved our corporate SCM infrastructure from SCCS => RCS => VCS => PVCS => SVN and now, finally, Git. I’m sure there will be future changes, but at least we are in a good spot now.

Personal SCM for configurations

SCM is not just for the big projects. I have found many cases in my personal computing environment where SCM methods can really save the day. It’s not just for code. There are many configuration files that can benefit from a good dose of SCM.

For example, if you’re configuring a new service that will run on a system, it’s very easy to do a git init, git add --all and git commit while you’re trying to perfect the configuration of the system. Getting configuration files just right can be a real pain, so why not save off those configuration changes for future reference? Say, for example, you have your configuration working just fine, and then a new release RPM is deployed and your configuration is overwritten by the boilerplate. Well, there’s at least a chance that your last commit is available, and you can diff the previous working configuration against the new boilerplate to see what changed.
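As a sketch, assuming a hypothetical service whose files live under /etc/myservice and that you have permission to write there, the flow might look like this:

$ cd /etc/myservice
$ git init
$ git add --all
$ git commit -m "known-good configuration"

{...a new release RPM later overwrites the files...}

$ git diff HEAD     # what did the boilerplate change relative to the last good commit?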

Personal SCM for code

I’m going to focus on my personal SCM for code in this article. I usually start development on something new by doing a git init and setting up the appropriate .gitignore for the language I will be using. As I build out the directory structure, I do git add and git commit along the way to save that structure for future reference. When I’ve gotten a day or so into a new project and feel that it is moving along nicely, I will establish a remote repository for the code and git push to it.
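As a rough sketch, for a hypothetical Python project that startup might look like this:

$ mkdir ~/NewProject && cd ~/NewProject
$ git init
$ printf '__pycache__/\n*.pyc\n.venv/\n' > .gitignore
$ git add .gitignore
$ git commit -m "project skeleton with Python .gitignore"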

Caution: Git and PII

I really want to have a remote repository for my code. I have enough code at this point that I would not want to lose any of it in a hard drive crash on a single system. Also, I like being able to work on any of a number of computers around the house depending on my situation. Why not just use an external git repository like GitHub or GitLab? Well, I guess I could now; the business model for both of those vendors now includes a free private repo option. When I started doing this, those options were not available.

Another concern I have is the remote possibility of exposing my own PII¹ in an external git repository. See link. I’m still personally developing my own best practices for segregating PII and credential information in a project. I want to do it in a way that ensures that none of my PII makes it to an external git repository.
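One possible sketch, with hypothetical file names, is to list the credentials file in .gitignore before it ever gets committed:

$ echo 'credentials.env' >> .gitignore
$ git add .gitignore
$ git commit -m "never track local credentials"
$ echo 'API_KEY=...' > credentials.env   # stays on the local machine only
$ git status --short                     # credentials.env is ignored, so it does not show up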

My in-house central git repo

Rather than using a real ‘remote’ location like GitHub or GitLab, I have chosen to set up an in-house central git repository and use the ssh protocol for communication with it. This is not much more than a personal Linux box with a directory set aside that contains all the consolidated git repositories that I work on.

ssh configuration

Note that the configuration described here depends on ssh working between the local development boxes and the server containing the Git remote repo. If you can’t ssh to that server, then the git ssh protocol will not work for you. Also, I found that setting up $HOME/.ssh/authorized_keys on all the machines eliminated a lot of the password prompts that tend to slow things down when pushing code. See link for details on how to set up authorized_keys.
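A minimal sketch of that key setup, reusing the username and somehost names from the examples below, might look like this:

$ ssh-keygen -t ed25519            # generate a key pair if you do not already have one
$ ssh-copy-id username@somehost    # appends your public key to ~/.ssh/authorized_keys on somehost
$ ssh username@somehost            # should now connect without a password prompt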

Using git during development

Consider the case where I have new development code, MyNewCode, on my Chromebook. I use the git init, git add and git commit commands to start tracking my development on the Chromebook. For demonstration purposes, these are the commands used to create the local repository.

$ mkdir /tmp/MyNewCode
$ touch /tmp/MyNewCode/testfile.py
$ cd /tmp/MyNewCode/

$ git init
Initialized empty Git repository in /tmp/MyNewCode/.git/
$ git add testfile.py 
$ git commit -m "initial commit"
[master (root-commit) f4cfe01] initial commit
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 testfile.py

{Many code, test, debug loops with git add & git commit}

Putting it into context, here is a diagram of my setup at this stage.

[Figure: GitInception]

At some point I will determine that I have enough code that losing it would be painful, and I need to establish a new remote repo.

To initialize a new remote repo

On the central Linux host I have a directory called GitSharedRepo that contains all my remote repositories.

To create a new empty repository that will eventually contain the development code from my Chromebook, I use the following command on the remote Linux box.

$ git init --bare $HOME/GitSharedRepo/MyNewCode.git
Initialized empty Git repository in /home/username/GitSharedRepo/MyNewCode.git/

This command creates a ‘bare’ repository that will eventually contain my local repository code after the next step.

[Figure: GitRemoteRepoCreated]

Define remote in local git

Now that I have allocated a bare directory on the server, from the Chromebook I add a new remote called origin and define how to reach it using an ssh path.

$ cd /tmp/MyNewCode
$ git remote add origin ssh://username@somehost:/home/username/GitSharedRepo/MyNewCode.git

[Figure: GitRemoteRepoAdd]

This creates a remote named origin on the Chromebook. The Chromebook code can now be pushed to the remote named origin using the ssh protocol.

$ git push --set-upstream origin master
Counting objects: 3, done.
Writing objects: 100% (3/3), 219 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To ssh://somehost:/home/username/GitSharedRepo/MyNewCode.git
 * [new branch]      master -> master
Branch master set up to track remote branch master from origin.

[Figure: GitRemoteRepoPush]

Now that the local git repo has a remote named origin, I can continue to develop code on the Chromebook with occasional pushes to origin to make sure that changes are stored on the remote and I will not lose them if the Chromebook fails.

$ vi testfile.py
...
$ git add testfile.py
$ git commit -m "made some changes"
[master 16ae62b] made some changes
 1 file changed, 1 insertion(+)
$ git push
Counting objects: 3, done.
Writing objects: 100% (3/3), 270 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To ssh://somehost:/home/username/GitSharedRepo/MyNewCode.git
   bc73e77..16ae62b  master -> master

Connecting from other systems

Now that the code resides in a remote repository, I can pull it to several other locations around the house. For example, I may initially create the system on my Chromebook, but once I’ve reached some milestone, I may want to load it up on one of several Raspberry Pis or on a system that I use for programming micro-controllers. In those cases, I just need to connect to the remote from those systems and git clone the code down.

$ ssh raspipi01

$ git clone ssh://username@somehost:/home/username/GitSharedRepo/MyNewCode.git
Cloning into 'MyNewCode'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3/3), done.

$ cd MyNewCode

[Figure: GitRepoCloneOtherPlaces]

Of course if I make changes in any of those locations, I will need to git add, git commit and git push the changes back up to the remote repository.
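For instance, after a change made on one of the Raspberry Pis, the round trip might look like this:

$ git add testfile.py
$ git commit -m "changes made on the Pi"
$ git push

{...then on any other machine that has a clone...}

$ git pull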

But what about a failure on my remote

This model has been working well for me for 2-3 years. I’m very comfortable with it and don’t want to change a thing. However, I have a creeping fear in the back of my mind… what happens if my remote repository machine dies? Well, at the moment I have been backing up the repository to cloud disk, so at least I have some backup.
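As a rough sketch, one way to snapshot the whole GitSharedRepo directory before copying it off to cloud storage (the archive name is arbitrary):

$ tar czf GitSharedRepo-$(date +%Y%m%d).tar.gz -C $HOME GitSharedRepo   # archive all the bare repos, then upload the tarball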

But I think the time has come to move my repository to a real remote cloud repository. After some research I’ve discovered that both Amazon and Google Cloud Platform (GCP) offer private repositories. I’m going to give them a try and see how they work.

References


  1. Just to be clear, I consider PII to include credentials and API keys.