Introduction to Git

Git is a widely used system (both in academia and industry) for version controlling files and collaborating on code. It is used to track changes in (text) files, thereby establishing a history of all edits made to each file, together with short messages about each change and information about who made it. Git is mainly run from the command line, but there are several tools that have implemented a graphical user interface to run git commands.

Using version control for tracking your files, and edits to those, is an essential step in making your computational research reproducible. A typical git workflow consists of:

There are many benefits of using git in your research project:

The best way to get an idea about git is simply to start using it. The tutorial below will guide you through the essential steps, with a focus on what is needed for making a project reproducible. There are many additional features of both git and the web-based repository hosting services (like GitHub and Bitbucket) that are not included here. If you are interested in learning more, the web is filled with information!

Git commands

Get started with Git
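To get started, the basic local cycle is to create a repository, stage changed files with git add and record them with git commit. The sketch below runs that cycle in a throwaway directory; the project name my_project, the file analysis.R and the user name and e-mail are hypothetical placeholders, not part of the course setup:

```shell
set -e
# Throwaway working directory for the example
work=$(mktemp -d)
cd "$work"

git init -q my_project
cd my_project
# Hypothetical identity, set for this repository only
git config user.name "Example User"
git config user.email "user@example.com"

# Create a file and record it in the history
echo 'print("hello")' > analysis.R
git add analysis.R                       # stage the file
git commit -q -m "Add analysis script"   # record the staged snapshot

git status --short                       # prints nothing: the working tree is clean
git log --oneline                        # one commit: short hash and message
```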

Pushing changes to the remote repository

Normally, you sit at your computer and work on a project, committing changes as you go. At some point (usually when you feel that you have added some new features to the project), you can push these changes to the remote (online) repository.
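A minimal sketch of pushing, assuming a local bare repository (remote.git) stands in for the online repository on GitHub or Bitbucket; the file and user names are hypothetical. The first push uses -u to record origin/master as the upstream branch, and the later push then needs no arguments:

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Assumption: a local bare repository plays the role of the online repository
git init -q --bare remote.git

git init -q my_project
cd my_project
git config user.name "Example User"
git config user.email "user@example.com"
echo "# My project" > README.md
git add README.md
git commit -q -m "Initial commit"
git branch -M master                 # make sure the branch is called master

# Register the remote under the name "origin"
git remote add origin "$work/remote.git"
# First push: -u records origin/master as the upstream of master
git push -q -u origin master

# Later work only needs a plain "git push"
echo "More text" >> README.md
git commit -q -am "Extend README"
git push -q
```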

Note

From now on, you do not need to specify where to push to within this git project: git push is enough.

Note

As a best practice, especially when working in a group, it is highly recommended to pull the latest updates your colleagues have made before pushing your own changes. See the git pull command.
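The pull-before-push routine can be sketched with two clones of the same repository, again assuming a local bare repository in place of the online one; the colleague and all names are hypothetical. The pull uses --no-rebase so that recent git versions merge diverged histories instead of stopping with a message about how to reconcile them, and --no-edit keeps the default merge message:

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git        # stand-in for the shared online repository

# Seed the shared repository with one commit on master
git init -q seed
cd seed
git config user.name "Seed"
git config user.email "seed@example.com"
echo "data" > data.txt
git add data.txt
git commit -q -m "Initial commit"
git branch -M master
git push -q "$work/remote.git" master
cd "$work"

# Two clones: yours and a (hypothetical) colleague's
for who in me colleague; do
  git clone -q -b master "$work/remote.git" "$who"
  git -C "$who" config user.name "$who"
  git -C "$who" config user.email "$who@example.com"
done

# The colleague pushes a change first
echo "colleague's notes" > colleague/notes.txt
git -C colleague add notes.txt
git -C colleague commit -q -m "Add notes"
git -C colleague push -q

# You commit locally, pull the colleague's work, then push
cd me
echo "my results" > results.txt
git add results.txt
git commit -q -m "Add results"
git pull -q --no-rebase --no-edit    # merge the colleague's commit into your branch
git push -q
```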

Branching with Git

Note

Note that branching is particularly useful when you collaborate with others on a project: you can develop a new feature on its own branch while others keep pushing to the master branch.
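A minimal sketch of working on a feature branch; the branch name new_feature and the files are hypothetical. Commits made on new_feature do not appear on master until the branches are merged:

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q my_project
cd my_project
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > script.R
git add script.R
git commit -q -m "First version"
git branch -M master

# Create a branch for the new feature and switch to it
git checkout -q -b new_feature
echo "experimental code" > feature.R
git add feature.R
git commit -q -m "Start new feature"

# Back on master, the feature commit is not there
git checkout -q master
ls                                   # shows only script.R
git branch                           # lists master and new_feature
```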

Note

It's important to understand that branches are just pointers to commits. When you create a branch, all Git needs to do is create a new pointer; it doesn’t change the repository in any other way, and the history remains unchanged.
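One way to see that a branch is only a pointer: after git branch, both names resolve to the same commit hash and the number of commits in the repository is unchanged. A sketch in a throwaway repository, where the branch name experiment is hypothetical:

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example User"
git config user.email "user@example.com"
echo "a" > file.txt
git add file.txt
git commit -q -m "Only commit"

commits_before=$(git rev-list --all --count)
git branch experiment                # creates a new pointer, nothing else

# Both names resolve to the very same commit hash
git rev-parse HEAD experiment
# The branch itself is just a reference to that commit
git show-ref experiment
commits_after=$(git rev-list --all --count)
echo "$commits_before $commits_after"   # same number of commits
```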

Note

You may want to work with others on the new branch; this is beyond the scope of our course. However, you can still create a remote copy of the branch. As before, we first tell git about the remote repository, and then push a copy of the local branch to it:

    # Add the remote repository to the local repository's configuration
    $ git remote add new_branch-remote-repo <github url>
    # Push the local branch branch_name to new_branch-remote-repo
    $ git push new_branch-remote-repo branch_name
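The same two commands can be tried without a GitHub account. In this sketch, a local bare repository (new_branch-remote.git, a hypothetical path) takes the place of the GitHub URL, and git ls-remote lists the branches that exist on the remote afterwards:

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Assumption: a local bare repository stands in for the GitHub repository
git init -q --bare new_branch-remote.git

git init -q my_project
cd my_project
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > script.R
git add script.R
git commit -q -m "First version"
git checkout -q -b branch_name

# Register the remote (a local path instead of the GitHub URL) and push the branch
git remote add new_branch-remote-repo "$work/new_branch-remote.git"
git push -q new_branch-remote-repo branch_name

# List the branches that now exist on the remote
git ls-remote --heads new_branch-remote-repo
```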

Merging
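A minimal sketch of merging a feature branch back into master, with hypothetical branch and file names. Because master has not moved since new_feature was created, git performs a fast-forward merge, simply advancing the master pointer to the feature commit:

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q my_project
cd my_project
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > script.R
git add script.R
git commit -q -m "First version"
git branch -M master

# Develop a feature on its own branch
git checkout -q -b new_feature
echo "feature code" > feature.R
git add feature.R
git commit -q -m "Add feature"

# Switch back to master and merge the feature branch into it
git checkout -q master
git merge -q new_feature
git log --oneline                    # master now contains "Add feature"
# Optionally delete the merged branch pointer
git branch -d new_feature
```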

Conflicts

Tip

Note that you can skip the git fetch command and run git pull directly. The difference is that git fetch only downloads the latest state of the remote (updating remote-tracking branches such as origin/master) without touching your local files, whereas git pull fetches and then integrates those changes into your current local branch.
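The difference between fetch and pull can be sketched as follows, assuming a local bare repository in place of the online one and a hypothetical colleague's clone. After git fetch, the new commit is visible in origin/master but your files are unchanged; git pull then brings it into master:

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git        # stand-in for the online repository

git init -q me
cd me
git config user.name "Me"
git config user.email "me@example.com"
echo "v1" > data.txt
git add data.txt
git commit -q -m "First version"
git branch -M master
git remote add origin "$work/remote.git"
git push -q -u origin master

# A (hypothetical) colleague clones, commits and pushes
git clone -q -b master "$work/remote.git" "$work/colleague"
cd "$work/colleague"
git config user.name "Colleague"
git config user.email "colleague@example.com"
echo "v2" > data.txt
git commit -q -am "Second version"
git push -q

cd "$work/me"
git fetch -q                         # new commit lands in origin/master only
after_fetch=$(cat data.txt)          # still "v1": working files are untouched
git log --oneline master..origin/master   # commits not yet integrated locally

git pull -q                          # fetch + merge (here a fast-forward)
after_pull=$(cat data.txt)           # now "v2"
echo "$after_fetch -> $after_pull"
```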

Ignoring files
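Files you never want to track, such as logs or large generated outputs, can be listed in a .gitignore file at the top of the repository. A minimal sketch; the patterns *.log and results/ are hypothetical examples, not the course's actual ignore list:

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q my_project
cd my_project

# Hypothetical patterns: log files and a results/ directory
printf '%s\n' '*.log' 'results/' > .gitignore

mkdir results
echo "big output" > results/output.csv
echo "debug info" > run.log
echo 'print("hi")' > analysis.R

git status --short                   # lists only .gitignore and analysis.R
git check-ignore -v run.log          # shows which .gitignore rule matches
```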

Tagging

Git has the ability to tag specific points in a repository’s history as being important. Typically, people use this functionality to mark release points (v1.0, v2.0 and so on). This can be, for example, the version of the repository that was used for the manuscript submission, the version used during resubmission, and, most importantly, the version used for the final publication.

Let's say that you are happy with the clustering results so far and would like to include them in the submission. You can tag the current version of the scripts by running:

    git tag "first-submission"

To push your tag, run:

    git push --tags
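The tagging steps above can be sketched end to end, assuming a local bare repository instead of the GitHub one and a hypothetical one-line cluster_code file. Note that a plain git push does not send tags, which is why git push --tags is needed:

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git        # stand-in for the GitHub repository

git init -q my_project
cd my_project
git config user.name "Example User"
git config user.email "user@example.com"
echo "kmeans" > cluster_code
git add cluster_code
git commit -q -m "K-means clustering"
git branch -M master
git remote add origin "$work/remote.git"
git push -q -u origin master

git tag "first-submission"           # lightweight tag on the current commit
git push -q --tags                   # tags are not sent by a plain git push

git tag                              # list local tags
git ls-remote --tags origin          # tags now present on the remote
```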

Suppose that the journal referees comment that it would be better to use hierarchical clustering instead. You could then update the clustering code along these lines:

    # Assumes df holds the data used for the earlier clustering
    res.dist <- dist(df, method = "euclidean")
    # Display the first 6 rows and columns of the distance matrix
    as.matrix(res.dist)[1:6, 1:6]
    # Hierarchical clustering with Ward's linkage
    res.hc <- hclust(d = res.dist, method = "ward.D2")
    # For visualization, you can also add the following to the markdown_reports code
    # (fviz_dend() comes from the factoextra package)
    # fviz_dend(res.hc, cex = 0.5)

Then, in the command line, commit and push the change:

    git add cluster_code
    git commit -m "put your comment"
    git push

Suppose the journal referees are happy and your paper gets published. You can tag this version as well:

    git tag "publication-version"
    git push --tags

You may also make more updates later; if so, you can tag a third version. You can now check your GitHub page under the Releases section.

Git users who want to reproduce your analysis with the code used for the publication can clone the repository and then run git checkout publication-version.

You can try this in your local clone, run:

    git checkout publication-version

To go back to the latest version, run:

    git checkout master
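A sketch of going back to a tagged version and returning to the latest one, in a throwaway repository with a hypothetical cluster_code file. While a tag is checked out you are in a "detached HEAD" state, so new work should be committed on a branch rather than there:

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q my_project
cd my_project
git config user.name "Example User"
git config user.email "user@example.com"
echo "ward.D2" > cluster_code
git add cluster_code
git commit -q -m "Hierarchical clustering"
git branch -M master
git tag "publication-version"

# A later update made after publication
echo "ward.D2 + extra plots" > cluster_code
git commit -q -am "Later update"

git checkout -q publication-version  # files as they were at the tag (detached HEAD)
published=$(cat cluster_code)
git checkout -q master               # back to the latest version
latest=$(cat cluster_code)
echo "$published | $latest"
```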

Git with R and RStudio

Our project organization