Install R packages for the course “Getting and Cleaning Data” (Coursera)

This post explains how to install all (or at least most) of the R packages described in the MOOC “Getting and Cleaning Data”, offered by Johns Hopkins University on Coursera, if you’re using Ubuntu. It also gives practical advice for keeping your R installation up to date on this OS.

R installation: CRAN repository and RutteR PPA

First of all, the version of R included in the Ubuntu repositories may be a bit old. I advise using the official CRAN repository instead, by editing (as root) the file /etc/apt/sources.list and adding the following line at its end:

deb http://cran.univ-paris1.fr/bin/linux/ubuntu precise/

(adapt the previous line to your favorite CRAN mirror and your distribution’s name) and then run:

gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install r-base-core r-base-dev
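
The release name in the deb line above can be derived instead of typed by hand. Here is a minimal sketch, assuming a hypothetical helper named cran_deb_line (it is not part of any package) and that lsb_release is available on your system:

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): print the apt source line
# for a given CRAN mirror and Ubuntu release codename.
cran_deb_line() {
    mirror=$1      # e.g. http://cran.univ-paris1.fr
    codename=$2    # e.g. precise
    printf 'deb %s/bin/linux/ubuntu %s/\n' "$mirror" "$codename"
}

# Use the codename reported by lsb_release when it is available, falling
# back to "precise" (the release used in this post) otherwise.
release=$(lsb_release -cs 2>/dev/null || echo precise)
cran_deb_line http://cran.univ-paris1.fr "$release"
```

The printed line could then be appended to /etc/apt/sources.list, for instance by piping it to sudo tee -a /etc/apt/sources.list.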

Packages for the CRAN repository are built on a Launchpad PPA called RutteR. It is possible to use the PPA itself, which includes a few more packages than the CRAN repository. Installing the PPA is done using:

sudo add-apt-repository ppa:marutter/rrutter
sudo apt-get update

Curl

As explained in the first-week videos of the course, data available through an ‘https’ connection can be downloaded using the option method="curl" in some functions. However, on Ubuntu, you first need curl to be installed:

sudo apt-get install curl
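
To check whether curl is already present before installing it, something like the following sketch may help. The helper name have_cmd is hypothetical; it only wraps the standard command -v lookup:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the named command is found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd curl; then
    echo "curl is installed"
else
    echo "curl is missing: run 'sudo apt-get install curl'"
fi
```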

Packages included in the repositories

Some packages are included in the repositories and can be installed directly using the command line:

sudo apt-get install r-cran-plyr r-cran-xml r-cran-reshape r-cran-reshape2 r-cran-rmysql
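
The names used above follow the r-cran-<lowercase package name> convention. As an illustration, here is a small sketch with a hypothetical helper, deb_name_for, that derives the Ubuntu package name from the R package name (it does not check that the package actually exists in your repositories):

```shell
#!/bin/sh
# Hypothetical helper: map an R package name to the name its Ubuntu package
# would have, following the r-cran-<lowercase name> convention.
deb_name_for() {
    printf 'r-cran-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Print the Ubuntu package names for the R packages installed above.
for pkg in plyr XML reshape reshape2 RMySQL; do
    deb_name_for "$pkg"
done
```

Running apt-cache policy on a resulting name then tells you whether that package is available from your configured repositories.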

Packages easily installed from R

Some packages are not available in the RutteR PPA but are nevertheless easily installed from within R using the CRAN repositories:

install.packages(c("jpeg","jsonlite","data.table","httr"))

or from the Bioconductor project:

source("http://bioconductor.org/biocLite.R")
biocLite("rhdf5")
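
If you prefer to stay on the command line, the CRAN installation can also be run non-interactively. The sketch below assumes a hypothetical helper, r_install_expr, which only builds the R expression; the mirror URL https://cloud.r-project.org is my choice and can be replaced by your own mirror:

```shell
#!/bin/sh
# Hypothetical helper: build an install.packages() call for the package
# names given as arguments, with an explicit CRAN mirror (an assumption,
# adapt it) so that R does not prompt for one.
r_install_expr() {
    list=""
    for pkg in "$@"; do
        # Separate entries with commas once the list is non-empty.
        list="${list:+$list,}\"$pkg\""
    done
    printf 'install.packages(c(%s), repos="https://cloud.r-project.org")\n' "$list"
}

# Only prints the expression; nothing is installed at this point.
r_install_expr jpeg jsonlite data.table httr
```

The expression can then be passed to R with Rscript -e "$(r_install_expr jpeg jsonlite data.table httr)", run as a user who can write to the R library directory.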

The hard way: package xlsx

xlsx may be a bit tricky to install because it depends on rJava, which itself requires a proper JVM on your system. The following error has been reported when simply installing the package r-cran-rjava:

conftest.c:1:17: fatal error: jni.h: No such file or directory
compilation terminated.
make: *** [conftest.o] Error 1
Unable to compile a JNI program

This problem is solved by:

  • first installing openjdk version 7:
    sudo apt-get install openjdk-7-*

    To make sure the installation is properly registered by your system, run

    sudo update-alternatives --config java

    and choose openjdk-7 as the default JVM.

  • rJava can now be installed. Just update the Java configuration for R before installing the Ubuntu package:
    sudo R CMD javareconf
    sudo apt-get install r-cran-rjava
  • finally, in R, run:
    install.packages("xlsx")
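
Since the error above is about a missing jni.h header, it can be useful to check for that file before retrying. Here is a minimal sketch, assuming a hypothetical helper named has_jni_header and that your JVMs live under /usr/lib/jvm (the usual location on Ubuntu, but check your own system):

```shell
#!/bin/sh
# Hypothetical helper: succeed if a jni.h header exists somewhere under the
# given directory (default: /usr/lib/jvm, assumed to hold the installed JVMs).
has_jni_header() {
    dir=${1:-/usr/lib/jvm}
    [ -d "$dir" ] && [ -n "$(find "$dir" -name jni.h 2>/dev/null | head -n 1)" ]
}

if has_jni_header; then
    echo "jni.h found"
else
    echo "jni.h not found: a JDK (not only a JRE) is probably missing"
fi
```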

Now, you just have to learn how to use all these 😉