| tags: [ development SCM git ] categories: [Development ]
A private git repo using SSH
My Background with SCM
While working my way through my undergraduate degree in Computer
Science, I ended up doing some research work for a professor. Several
other research assistants and I were developing a C compiler that
could be used to inspect C code and collect complexity attributes from
the code. As a team, we quickly discovered that it was difficult to
track changes happening in the code from day to day. I read UNIX man
pages obsessively, looking for methods to keep track of changes. I
eventually discovered the commands ci and co (the RCS check-in and
check-out commands) and the whole notion of Software Configuration
Management (SCM). Not only did SCM help me and my fellow
undergraduates with our project, but I was able to apply my knowledge
of SCM to subsequent projects in both undergraduate and graduate
school.
Who hasn’t had that sinking feeling of deleting the wrong file? Well, if you use SCM tools, you can recover from some of those situations. The more interesting case for SCM is where you make several revisions to a system only to discover that the approach is not going to work. Using SCM to keep track of changes along the way lets you pursue changes knowing that, if you really don’t like the outcome, you can roll back to a previous commit without pain.
My mantra became:
Commit early & commit often.
When I graduated and moved into the private sector, I was happy to discover that SCM techniques were being used on large-scale projects, although I will admit that the methods used were not always the most efficient. Long story short, over several years we have slowly evolved our corporate SCM infrastructure from SCCS => RCS => VCS => PVCS => SVN and now finally Git. I’m sure there will be future changes, but at least we are in a good spot now.
Personal SCM for configurations
SCM is not just for the big projects. I have found many cases in my personal computing environment where SCM methods can really save the day. It’s not just for code. There are many configuration files that can benefit from a good dose of SCM.
For example, if you’re configuring a new service that will run on a
system, it’s very easy to do a git init, git add --all and git commit
while you’re trying to perfect the configuration of the system.
Getting configuration files just right can be a real pain. Why not
save off those configuration changes for future reference? Say, for
example, you have your configuration working just fine, and a new
release RPM is deployed and your configuration is overwritten by the
boilerplate. Well, there’s at least a chance that your last commit is
available and you can diff between the previous working configuration
and the new boilerplate to see what was changed.
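The scenario above can be sketched as a short shell session. This is only an illustration under assumptions: the file name myservice.conf, its contents, and the scratch directory are made up, and the "package upgrade" is simulated by overwriting the file.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for a service's configuration directory.
confdir=$(mktemp -d)
trap 'rm -rf "$confdir"' EXIT
cd "$confdir"

# A made-up configuration file in its tuned, working state.
printf 'port = 8080\nworkers = 4\n' > myservice.conf

git init -q
# Identity is set per repository so the commit works on a fresh box.
git config user.name "Example User"
git config user.email "user@example.com"
git add --all
git commit -q -m "working myservice configuration"

# Simulate a package upgrade overwriting the file with boilerplate.
printf 'port = 80\nworkers = 1\n' > myservice.conf

# Show what the upgrade changed relative to the last commit.
config_diff=$(git diff -- myservice.conf)
printf '%s\n' "$config_diff"

# Restore the known-good version from the last commit.
git checkout -- myservice.conf
```

The diff shows exactly which settings the boilerplate replaced, and the final checkout puts the committed version back in place.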
Personal SCM for code
I’m going to focus on my personal SCM for code in this article. I
usually start development on something new by doing a git init and
setting up the appropriate .gitignore for the language I will be
using. As I build out the directory structure, I will do git add and
git commit along the way to save that structure for future reference.
When I’ve gotten a day or so into a new project and feel that it is
moving along nicely, I will establish a remote repository for the code
and git push to the remote repository.
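As a rough sketch of that starting routine, the session below sets up a Python project skeleton. The ignore patterns and directory names are illustrative assumptions, not a list taken from this article.

```shell
#!/bin/sh
set -eu

projdir=$(mktemp -d)
trap 'rm -rf "$projdir"' EXIT
cd "$projdir"

git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Ignore patterns for a Python project (an illustrative choice).
cat > .gitignore <<'EOF'
__pycache__/
*.pyc
.venv/
EOF

# Build out a skeleton directory structure, including a bytecode
# file that should never be committed.
mkdir -p src tests src/__pycache__
touch src/main.py tests/test_main.py src/__pycache__/main.cpython-311.pyc

git add --all
git commit -q -m "project skeleton"

# List what was actually committed; the bytecode file should be absent.
tracked=$(git ls-files)
printf '%s\n' "$tracked"
```

Because .gitignore is in place before the first git add --all, the generated files never enter the history.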
Caution: GIT and PII
I really want to have a remote repository for my code. I have enough code at this point that I would not want to lose any of it in a hard drive crash on a single system. Also, I like being able to work on any of a number of computers around the house depending on my situation. Why not just use an external git repository like GitHub or GitLab? Well, I guess I can now. The business model for both of those vendors now includes a free private repo option. When I started doing this, those options were not available.
Another concern I have is the remote possibility of exposing my own PII [1] in an external git repository. See "Over 100,000 GitHub repos have leaked API or cryptographic keys" in the References. I’m still developing my own best practices for segregating PII and credential information in a project. I want to do it in a way that ensures that none of my PII makes it to an external git repository.
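One possible way to approach that segregation, offered only as a sketch and not as a settled practice, is to keep credentials in a single ignored file and check the staging area before committing. The file names secrets.env and app.py and the API_KEY variable are hypothetical.

```shell
#!/bin/sh
set -eu

projdir=$(mktemp -d)
trap 'rm -rf "$projdir"' EXIT
cd "$projdir"
git init -q

# Hypothetical layout: credentials live in one file that is never tracked,
# and the code reads them from the environment.
printf 'secrets.env\n' > .gitignore
printf 'API_KEY=not-a-real-key\n' > secrets.env
printf 'import os\nkey = os.environ["API_KEY"]\n' > app.py

git add --all

# check-ignore exits 0 when the path is excluded by an ignore rule.
if git check-ignore -q secrets.env; then
    ignored=yes
else
    ignored=no
fi
echo "secrets.env ignored: $ignored"

# Count staged files whose contents contain an inline key assignment.
leaks=$(git grep --cached -l 'API_KEY=' | wc -l | tr -d ' ')
echo "staged files containing API_KEY=: $leaks"
```

A check like the last one could be run by hand before each push; it only catches the pattern it is told to look for, so it is a safety net rather than a guarantee.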
My in-house central git repo
Rather than using a real ‘remote’ location like GitHub or GitLab for my remote, I have chosen to set up an in-house central git repository and use the ssh protocol for communication with it. This is not much more than a personal Linux box with a directory set aside that contains all the consolidated git repositories that I work on.
ssh configuration
Note that the configuration described here depends on ssh working
between the local development boxes and the server containing the Git
remote repo. If you can’t ssh to that server, then the git ssh
protocol will not work for you. Also, I found that setting up
$HOME/.ssh/authorized_keys between all machines eliminated a lot of
the password prompts that tend to slow things down when pushing code.
See "How to set up ssh so you aren’t asked for a password" in the
References for details on how to set up authorized_keys.
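The key setup can be sketched as follows. To keep the example runnable on one machine, a scratch directory stands in for $HOME/.ssh on both the development box and the server; the key comment and paths are assumptions. On real machines, ssh-copy-id username@somehost performs the append step over the network.

```shell
#!/bin/sh
set -eu

# A scratch directory stands in for $HOME/.ssh on both machines.
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT

# On the development box: generate a key pair with an empty passphrase.
ssh-keygen -q -t ed25519 -N "" -C "devbox-example" -f "$scratch/id_ed25519"

# On the server: append the public key to authorized_keys and
# restrict permissions, which sshd normally insists on.
mkdir -p "$scratch/server_ssh"
chmod 700 "$scratch/server_ssh"
cat "$scratch/id_ed25519.pub" >> "$scratch/server_ssh/authorized_keys"
chmod 600 "$scratch/server_ssh/authorized_keys"

key_count=$(grep -c '^ssh-ed25519 ' "$scratch/server_ssh/authorized_keys")
echo "keys authorized: $key_count"
```

An empty passphrase trades some security for convenience; a passphrase-protected key loaded into ssh-agent avoids the prompts as well.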
Using git during development
Consider the case where I have new development code MyNewCode on my
Chromebook. I use the git init, git add and git commit commands to
start tracking my development on the Chromebook. For demonstration
purposes, these are the commands used to create a local repository.
$ mkdir /tmp/MyNewCode
$ touch /tmp/MyNewCode/testfile.py
$ cd /tmp/MyNewCode/
$ git init
Initialized empty Git repository in /tmp/MyNewCode/.git/
$ git add testfile.py
$ git commit -m "initial commit"
[master (root-commit) f4cfe01] initial commit
1 file changed, 0 insertions(+), 0 deletions(-)
create mode 100644 testfile.py
{Many code, test, debug loops with git add & git commit}
Putting it into context, here is a diagram of my setup at this stage.
At some point I will decide that I have enough code that losing it would be painful. I need to establish a new remote repo.
To initialize a new remote repo
On the central Linux host I have a directory called GitSharedRepo
that contains all my remote repositories.
To create a new empty repository that will eventually contain the development code from my Chromebook, I use the following command on the remote Linux box.
$ git init --bare $HOME/GitSharedRepo/MyNewCode.git
Initialized empty Git repository in /home/username/GitSharedRepo/MyNewCode.git/
This command creates a ‘bare’ repository that will eventually contain my local repository code after the next step.
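What "bare" means can be checked directly. In this sketch a scratch directory stands in for $HOME/GitSharedRepo, which is an assumption made so the commands run anywhere.

```shell
#!/bin/sh
set -eu

# A scratch directory stands in for $HOME/GitSharedRepo on the server.
shared=$(mktemp -d)
trap 'rm -rf "$shared"' EXIT

git init -q --bare "$shared/MyNewCode.git"

# A bare repository reports itself as such...
is_bare=$(git -C "$shared/MyNewCode.git" rev-parse --is-bare-repository)
echo "bare: $is_bare"

# ...and keeps HEAD, objects/ and refs/ at the top level
# instead of inside a .git/ subdirectory next to a working tree.
ls "$shared/MyNewCode.git"
```

With no working tree to get out of sync, a bare repository is a safe target for pushes from several machines.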
Define remote in local git
Now that I have allocated a bare repository on the server, I add a
new remote called origin from my Chromebook and define how the
Chromebook will reach it using an ssh URL.
$ cd /tmp/MyNewCode
$ git remote add origin ssh://username@somehost:/home/username/GitSharedRepo/MyNewCode.git
This creates a remote named origin on the Chromebook. The Chromebook
code can now be pushed to the remote named origin using the ssh
protocol.
$ git push --set-upstream origin master
Counting objects: 3, done.
Writing objects: 100% (3/3), 219 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To ssh://somehost:/home/username/GitSharedRepo/MyNewCode.git
* [new branch] master -> master
Branch master set up to track remote branch master from origin.
Now that the local git repo has a remote named origin, I can continue
to develop code on the Chromebook with occasional pushes to origin to
make sure that changes are being stored on the remote and I will not
lose them if my laptop fails.
$ vi testfile.py
...
$ git add testfile.py
$ git commit -m "made some changes"
[master 16ae62b] made some changes
1 file changed, 1 insertion(+)
$ git push
Counting objects: 3, done.
Writing objects: 100% (3/3), 270 bytes | 0 bytes/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To ssh://somehost:/home/username/GitSharedRepo/MyNewCode.git
f4cfe01..16ae62b master -> master
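To confirm that a push really landed, the local commit can be compared with what the remote advertises. The sketch below replays the steps above on one machine, with a local path standing in for the ssh:// URL; that substitution and the scratch directory are assumptions made so it runs without a server.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# The "server" side: a bare repository (local path instead of an ssh URL).
git init -q --bare "$work/MyNewCode.git"

# The "Chromebook" side: a working repository with one commit.
git init -q "$work/MyNewCode"
cd "$work/MyNewCode"
# Use the branch name from the article regardless of local defaults.
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
touch testfile.py
git add testfile.py
git commit -q -m "initial commit"

git remote add origin "$work/MyNewCode.git"
git push -q --set-upstream origin master

# Compare the local commit with what the remote now advertises.
local_head=$(git rev-parse HEAD)
remote_head=$(git ls-remote origin refs/heads/master | cut -f1)
echo "local:  $local_head"
echo "remote: $remote_head"
git remote -v
```

When the two hashes match, everything committed locally is also stored on the remote.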
Connecting from other systems
Now that the code resides in a remote repository, I can pull it to
several other locations around my house. For example, I may initially
create the system on my Chromebook, but once I’ve reached some
milestone, I may want to load the system onto one of several
Raspberry Pis or a system that I use for programming microcontrollers.
In these cases, I just need to connect to the remote from those
systems and git clone the code down.
$ ssh raspipi01
$ git clone ssh://username@somehost:/home/username/GitSharedRepo/MyNewCode.git
Cloning into 'MyNewCode'...
remote: Counting objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (3/3), done.
$ cd MyNewCode
Of course, if I make changes in any of those locations, I will need to
git add, git commit and git push the changes back up to the remote
repository, and then git pull on the other machines to pick them up.
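The round trip between machines can be sketched on a single box. Here two directories named chromebook and raspipi01 play the roles of the two machines and a local bare repository plays the remote; all three are stand-ins chosen for illustration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Local paths stand in for the ssh:// URL and for the separate machines.
git init -q --bare "$work/MyNewCode.git"
git -C "$work/MyNewCode.git" symbolic-ref HEAD refs/heads/master

# First machine: create the project and push it.
git init -q "$work/chromebook"
cd "$work/chromebook"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
printf '%s\n' 'print("v1")' > testfile.py
git add testfile.py
git commit -q -m "initial commit"
git remote add origin "$work/MyNewCode.git"
git push -q --set-upstream origin master

# Second machine: clone, change, commit and push back.
git clone -q "$work/MyNewCode.git" "$work/raspipi01"
cd "$work/raspipi01"
git config user.name "Example User"
git config user.email "user@example.com"
printf '%s\n' 'print("v2")' > testfile.py
git add testfile.py
git commit -q -m "change made on the Pi"
git push -q

# Back on the first machine: pick up the change.
cd "$work/chromebook"
git pull -q --ff-only
cat testfile.py
```

Using --ff-only makes the pull refuse to merge if both machines have diverged, which is a useful warning sign in a one-person workflow.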
But what about a failure on my remote?
This model has been working well for me for 2-3 years. I’m very comfortable with it and don’t want to change a thing. However, I have a creeping fear in the back of my mind… what happens if my remote repository machine dies? Well, at the moment, I have been backing up the repository to a cloud disk, so at least I have some backup.
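One way such a backup could be produced, shown here as a sketch and not necessarily the method used for the backups mentioned above, is git bundle, which packs a repository's history into a single file that is easy to copy to cloud storage. The paths are scratch stand-ins for a repository under GitSharedRepo.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Build a stand-in for one bare repository under GitSharedRepo.
git init -q "$work/src"
cd "$work/src"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
touch testfile.py
git add testfile.py
git commit -q -m "initial commit"
git clone -q --bare "$work/src" "$work/MyNewCode.git"

# Pack every branch and tag into one file suitable for copying off-box.
git -C "$work/MyNewCode.git" bundle create "$work/MyNewCode.bundle" --all

# Check the bundle, then prove it restores by cloning from it.
git -C "$work/MyNewCode.git" bundle verify "$work/MyNewCode.bundle"
git clone -q -b master "$work/MyNewCode.bundle" "$work/restored"
ls "$work/restored"
```

Restoring from the bundle as part of the routine is worthwhile, since a backup that has never been restored is only a hope.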
But I think the time has come to move my repository to a real remote cloud repository. After some research I’ve discovered that both Amazon and Google Cloud Platform (GCP) offer private repositories. I’m going to give them a try and see how they work.
References
- Over 100,000 GitHub repos have leaked API or cryptographic keys
- How to set up ssh so you aren’t asked for a password
- GitHub
- GitLab
- AWS CodeCommit
- GCP Source Repositories
[1] Just to be clear, I consider PII to include credentials and API keys.