Using GitHub

A question you may have, is where is the repository stored? Where are all of the versions of the directory that you have saved?

The can be found by typing;

ls -a

This will list all files, including hidden files in the directory.

You should see something that looks like this;

.git          README.md     background.md notes.md

Notice that there is an extra hidden file called .git in your directory. This is actually a hidden directory, which you can explore via

ls .git

On my computer, the .git directory contains

COMMIT_EDITMSG config         hooks          info           objects
HEAD           description    index          logs           refs

The .git directory is where git saves the entire repository, and thus all versions of all files that have been committed.

YOU MUST NOT EDIT OR CHANGE THE .git DIRECTORY IN ANY WAY, this will break git, and potentially lose all of the old versions of files that were committed to the repository.

Because the .git folder is saved in versioned_dir, this means that if you accidentally delete versioned_dir, then you will also have deleted the entire git repository. You will thus not be able to recover or restore the files. Equally, if your computer is lost or breaks.

Backing up your data

Git is NOT a backup tool. It only version controls your files, it DOES NOT back them up.

Git hosting services

An alternative to backing up manually (USB drive, Dropbox,…) is to use a git backup service. These are online services that store a backup of your .git folder. There are special commands in git that then integrate with these services, and which make backing up and restoring very easy and straightforward. Additionally, Git hosting services enable researchers to share their work publicly, making it accessible for others to review and reproduce.

The three most established git hosting services are GitHub, GitLab and BitBucket. Of the three, GitHub is the one most people know.

GitHub is free for public repositories (those that can be viewed by anyone, anywhere), and you can have a small number of private projects (those that only you have access to).

Installing a Git Credential Helper

Before you use GitHub, you need to install a “Git credential manager”. This is a program that will let your git command-line client manage the login credentials needed to connect to your GitHub account.

You can install a “Git credential manager” by following the instructions here. This credential manager is available for Linux, MacOS and Windows. Note that the “Git credential manager” is included with Git for Windows and would have been installed by default as long as you left “Git Credential Manager Core” selected.

Creating a GitHub repository

To use GitHub, register to create an account, and the log in. Once you have logged in, click the “New Repository” button to create a new repository (this button is in the menu of the “+” button at the top right of the page).

This will bring you to the page to create a new repository. The page will look something like this (possibly different colours or layout);

Choose a name for your repository. I’ve used versioned_dir, as this is what I called the directory on my computer. You can use any name you want, and it does not need to match the name on your computer.

I’ve left the repository type to “Public” (meaning everything is published openly), and am not initialising the repository with anything (as we will be uploading our existing versioned_dir in the next step).

Click “Create Repository” to create the new repository. It will start off being empty, and something like this page should be seen.

This page gives instructions for the three different ways that you can add files to this repository. In this case, we will use the “push an existing repository from the command line” option. The commands to do this are written on the page, so we will copy and paste

git remote add origin https://github.com/ab01234/versioned_dir.git
git push -u origin main

Alternatively, we could run

git clone https://github.com/ab01234/versioned_dir

If this is the first time you have used GitHub, then it is likely that you will be asked to log in when you ran the git push command. The method you use to log in will depend on your account and operating system. In most cases, the easiest is to use the option to log in via your web browser. This will open a tab in your browser and will ask you to log in there with your GitHub username and password (and possibly multi-factor authenticator too).

Assuming you could log in correctly, then you should see output that looks something like this;

Enumerating objects: 21, done.
Counting objects: 100% (21/21), done.
Delta compression using up to 8 threads
Compressing objects: 100% (16/16), done.
Writing objects: 100% (21/21), 2.23 KiB | 2.23 MiB/s, done.
Total 21 (delta 5), reused 0 (delta 0)
remote: Resolving deltas: 100% (5/5), done.
To https://github.com/ab01234/versioned_dir.git
 * [new branch]      main -> main
Branch 'main' set up to track remote branch 'main' from 'origin'.

This shows that our local versions in our .git folder have been pushed up to the .git folder on GitHub (in my case https://github.com/ab01234/versioned_dir.git). Git has also been set up so that the local main on our computer is set to track the remote main on GitHub.

If you refresh the GitHub page you should now see that your files have been uploaded, e.g.

You can use the GitHub interface to explore the files in your repository. For example, you can navigate to any version in the repository by clicking the “Commits” (the “9 commits” on the right). This will show all of the versions that have been saved, together with the log messages. Cool :-)

Backing up new changes

You have now backed up the .git directory of versioned_dir to GitHub. But what if we make new changes and save new versions?

Let’s change README.md, e.g. by changing the last line to read;

will say that the cat goes meow and kittens are cute.

Commit this change to the repository using

git commit -a

Now run git status. You should see output that looks something like this;

On branch main
Your branch is ahead of 'origin/main' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean

This shows that git knows that your working tree is clean, but that the main in your local repository now has one commit more than the main on GitHub. It is ahead of GitHub by 1 commit. We can send this new commit to GitHub using the command git push. Type this now;

git push

You should see output that looks something like this;

Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 396 bytes | 396.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To https://github.com/ab01234/versioned_dir.git
   20be956..38db1ae  main -> main

This shows that the new commit has been pushed (uploaded) from your local main in your local .git folder and copied to the main in the .git folder on GitHub.

If you type git status again, you should see output like this;

On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

This shows that your local main is up to date and level with the remote main on GitHub.

ATTENTION

PUBLIC REPOSITORIES ARE PUBLIC!

That means that anyone in the World can read everything within them, including all of your commit messages.

NEVER push passwords or sensitive data to the repository. Make sure that you never save a password in a version controlled directory, or else you risk accidentally uploading it to the cloud.

NEVER push private or unpublished research data. By pushing to a public repository you are making the file (and all its previous versions) public. Don’t push a file that you don’t have permission to publish. Don’t push sensitive or private research data. Don’t push grant proposals or research papers (at least, not before they have been awarded or published!).

BE CAREFUL of offensive commit messages. It is a bad idea to be abusive or condescending in your commit messages, particularly as they will become public when you push them into a public repository. Avoid commit messages like “Fixed this annoying piece of rubbish code written by Fred”, as “Fred” is likely to see that comment once it is published.