1.4 GitHub

You can learn how to code in R without using GitHub, but GitHub is the standard platform for sharing your code with teammates and with the general public. If you aren’t already familiar with it, GitHub is basically like Google Drive, except that you have to actively “fetch/pull” to download updates from the cloud to your machine, and “commit/push” to send your changes up to overwrite what’s in the cloud. One user-friendly way to do this is with GitHub Desktop, which you should have installed; generally you’d have GitHub Desktop open in its own window while you’re working in RStudio.

GitHub Desktop doesn’t replace everything you’d do on GitHub.com, where you should have made a personal account. On GitHub.com you create repos, which are like Drive folders and which can have collaborators with view/edit permissions. GitHub keeps track of changes to all files in a repo, kind of like version history on Google Docs. This version control matters most when you have a lot of people working together on the same repo, which you won’t necessarily experience in this curriculum, but it’s useful to start practicing good habits on GitHub.

If you haven’t already, I’d recommend doing the tutorial on GitHub.com to understand the basic concepts and terminology, and then you can review the GitHub Desktop tutorial. The general workflow for starting a new coding project would be something like this:

  1. Create a repo on GitHub.com.
  2. Clone it on GitHub Desktop (there’s a button to initiate this from online, or within GitHub Desktop, if you’ve linked to your account, you can do File > Clone Repository and then search for it by name). This makes a copy of the repo on your personal machine, likely in an automatically created “GitHub” folder.
  3. In RStudio, set your working directory to this empty folder.
  4. Create a new .Rmd file and save it into the working directory. This will be the first observed “change” in your cloned repo.
  5. On GitHub Desktop, you’ll see those changes detected. To commit them, write a summary of the change in the bottom left of the window (this can just be the word “updates”, but be detailed if you think you might want to time travel back to this change), then click Commit to master (newer repos name the default branch main, so the button may read Commit to main). Finally, click the button at the top of the window whose label switches between Fetch origin, Pull origin, and Push origin depending on what’s going on. If this is your first time “pushing”, it will say Publish branch.
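
For the curious, step 5 can also be sketched from the R console. This is only an illustration of what the buttons in GitHub Desktop do, not a replacement for them. It assumes the git command-line tool is installed and on your PATH (GitHub Desktop alone may not provide this), and the helper names run_git and commit_and_push are made up for this example.

```r
# Hypothetical helper (not part of any package): run one git command
# inside the folder `repo_dir` and stop if git reports a failure.
# Assumes the git command-line tool is installed and on the PATH.
run_git <- function(repo_dir, args) {
  status <- system2("git", c("-C", shQuote(repo_dir), args))
  if (status != 0) {
    stop("git ", paste(args, collapse = " "), " failed")
  }
  invisible(status)
}

# Rough equivalent of step 5: stage every detected change, commit it
# with a summary, then push the current branch to GitHub ("origin").
commit_and_push <- function(repo_dir, summary) {
  run_git(repo_dir, c("add", "--all"))
  run_git(repo_dir, c("commit", "-m", shQuote(summary)))
  run_git(repo_dir, c("push", "--set-upstream", "origin", "HEAD"))
}
```

If your working directory is already the cloned repo, as in step 3, a call like commit_and_push(getwd(), "add first .Rmd file") would do roughly what Commit and Push origin do. The commit step stops with an error when there is nothing new to commit.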

If you’re working alone, you will just continue to make edits to the repo, and whenever you want to push, you repeat step 5. If you’re working in a team and others may be editing the same repo, always Fetch origin and Pull origin before you get started on anything in RStudio, to make sure you’re working off the latest version someone else might have edited. Then Commit to master and Push origin regularly, and especially when you’re done working on something, so others can access your updated content. You’re trying to avoid creating parallel universes of code. If you want to play it safe, you can always create your own something_yourname.Rmd copies of code, but your team would ultimately need to decide how to merge the various branches of code development back together. This becomes mainly a question of project management and team coordination on a messaging platform like Slack.
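
The “pull before you start” habit can be sketched from the R console as well. As before, this assumes the git command-line tool is installed and on your PATH, and pull_latest is a made-up name for this example. The --ff-only option asks git to refuse the update rather than merge automatically if your local copy has diverged from the one on GitHub.

```r
# Hypothetical helper: bring a local clone up to date before editing.
# Assumes the git command-line tool is installed and on the PATH.
pull_latest <- function(repo_dir) {
  status <- system2("git", c("-C", shQuote(repo_dir), "pull", "--ff-only"))
  if (status != 0) {
    stop("git pull failed; sort this out with your team before editing")
  }
  invisible(TRUE)
}
```

An error here is a sign that a “parallel universe” already exists, which is better to discover before you start editing than after.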

So far, all these GitHub practices probably feel like busywork, especially given there’s not much to show off or collaborate on yet in terms of actual code. My main reason for introducing this now is to explain a more advanced use case of GitHub, called GitHub Pages, which is great for helping you actually publish your knitted HTML files to the web. Generally, if you have an HTML file, there are a number of ways to host that content online, but GitHub Pages is a streamlined option if you’re already using GitHub. Basically, follow the simple instructions, and you’ll create a specialized repo called username.github.io that GitHub treats like the “host” for web content. If you push an HTML file to it, then within a minute or so you can see the content at the URL username.github.io/filename.html and, importantly, send this URL to others for them to see your content. So then the publishing workflow would be something like this:

  1. Make sure you’ve created the special username.github.io repo and have cloned it to your local machine.
  2. Work in RStudio on a .Rmd file that you ultimately want to showcase online.
  3. When ready, click Knit and generate an HTML file in your working directory. Preview it and make sure it looks the way you want.
  4. Copy or move the HTML file from the working directory (which would be a clone of a repo from your GitHub account) to the username.github.io folder on your machine (which is also a clone of a repo from your GitHub account, but is the “special” web hosting repo).
  5. In GitHub Desktop, you’ll see that changes have been detected in both the original project repo and the special web hosting repo. Commit changes to both, and push both.
  6. In a minute or so, you can try loading username.github.io/filename.html in your web browser, and you should see the same thing you had previewed.
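
Step 4 above can be done by dragging files around, but here is a minimal sketch of doing it from the R console with base R’s file.copy. The function name publish_html and all paths are hypothetical; the commit and push in step 5 would still happen in GitHub Desktop.

```r
# Hypothetical helper: copy a knitted HTML file into the local clone of
# the web hosting repo and return the URL where it should appear.
# `username` is your GitHub username; all paths here are placeholders.
publish_html <- function(html_file, pages_dir, username) {
  stopifnot(file.exists(html_file), dir.exists(pages_dir))
  # When the destination is an existing folder, file.copy() places the
  # file inside it; overwrite = TRUE replaces an older copy.
  copied <- file.copy(html_file, pages_dir, overwrite = TRUE)
  if (!copied) {
    stop("could not copy ", html_file, " into ", pages_dir)
  }
  paste0("https://", username, ".github.io/", basename(html_file))
}
```

For example, publish_html("analysis.html", "path/to/username.github.io", "username") would copy the file and return the address to share once the push has gone through. If you prefer to knit from the console instead of clicking Knit, rmarkdown::render("analysis.Rmd") should produce the same HTML file.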

Note that when you start working with more complex interactive maps and charts, you might start to have other support files/folders that get generated by the “knitting” function that you have to copy/paste over along with the HTML file. And that’s basically all there is to it. Practice this as much as you want to start to create your own URLs. You can have as many as you want accessible through this domain, but it’ll take extra web development knowledge to do fancier things like format username.github.io itself to be a “home page” with a directory to all the other web material. Your general use case will be just sending individual URLs like username.github.io/filename.html to people on Slack or email. Those completing the assignments at the end of each chapter will be asked to “submit” their work as web pages using this exact method.
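
When support folders do show up, they need to travel with the HTML file. The sketch below assumes the support folder sits next to the HTML file and is named after it with a _files suffix (for example map_files next to map.html); check your own working directory, since the actual name depends on how the file was knitted. The function name copy_with_support is made up for this example.

```r
# Hypothetical helper: copy an HTML file plus its support folder, if
# one exists, into `pages_dir`. Assumes the support folder is named
# "<filename>_files" and sits in the same folder as the HTML file.
copy_with_support <- function(html_file, pages_dir) {
  stopifnot(file.exists(html_file), dir.exists(pages_dir))
  stem <- tools::file_path_sans_ext(basename(html_file))
  support_dir <- file.path(dirname(html_file), paste0(stem, "_files"))
  to_copy <- html_file
  if (dir.exists(support_dir)) {
    to_copy <- c(to_copy, support_dir)
  }
  # recursive = TRUE copies a folder together with everything inside it.
  ok <- file.copy(to_copy, pages_dir, recursive = TRUE, overwrite = TRUE)
  invisible(all(ok))
}
```

If the page loads online but the map or chart is blank, a missing support folder is one of the first things to check.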

OK, now we’ll finally get into R coding. Before moving on, go ahead and set up a cloned GitHub repo with any name you’d like, save your current .Rmd example file in that folder on your local machine, and set that folder as your “working directory” in RStudio.
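
Setting the working directory can be done through the RStudio menus, or from the console with base R’s setwd. A minimal sketch follows; the path is a placeholder, so swap in wherever GitHub Desktop actually put your clone.

```r
# Hypothetical path to your cloned repo; replace it with the folder
# that GitHub Desktop created on your machine.
repo_dir <- file.path(path.expand("~"), "Documents", "GitHub", "my-first-repo")

# Only switch if the folder exists, so a typo does not raise an error.
if (dir.exists(repo_dir)) {
  setwd(repo_dir)
} else {
  message("Folder not found: ", repo_dir)
}

# Confirm which folder R is currently using.
getwd()
```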