Data analysis with R

Context and overview of R and RStudio



Translated by

Statistical analysis software

SAS, SPAD, SPSS… and R


In the early 2000s, a new piece of software emerged and gradually established itself as an equal to the three major packages that dominate the statistical analysis market

Users of these packages may well be interested in R

Proprietary software is

not free (of charge)

SPSS license Base edition

Not cross-platform


Software/System   Windows   MacOS        Linux   BSD   Other Unix
SAS               yes       terminated   yes     no    yes
SPAD              yes       no           no      no    no
SPSS              yes       yes          no      no    no
Stata             yes       yes          yes     no    no


https://en.wikipedia.org/wiki/Comparison_of_statistical_packages

Specialized


    • SPSS: oriented towards social science
    • SPAD: oriented towards decision making
    • Stata: recommended for economists and epidemiologists
    • SAS: broad-based with some restrictions (e.g. graphics)


They offer little or no support for network analysis, sequence data analysis or lexicometry (except for SPAD), and few features dedicated to presenting and disseminating results.

Restricted

Centralized management limits the following:

    • sustainability
    • freedom of use
    • compatibility with other software (data formats)
    • compatibility of version updates
    • development of new functions
    • available languages (software and documentation)
    • available sources of information

Difficulties with…


    • training
    • usage
    • teamwork
    • interdisciplinary work
    • reproducibility


That is why we use R

Close enemies


Two languages often used in data management and data analysis
and compared against each other because of their similar features

Choosing R or Python depends on
who I am and what I want to do

Two divergent communities

    • accessible and inclusive community
    • rich and structured documentation
    • discipline: data analysis
    • jobs: research and development


Specific features

R shines in…

    • ease of learning (RStudio)
    • statistical analysis
    • graphics
    • presenting and disseminating results (markdown, applications…)

For users less experienced in programming
who specialize in data analysis

History of R


R is based on the S programming language, created at Bell Labs in 1976 and substantially revised in 1988

    • 1992: R. Gentleman and R. Ihaka begin developing R (as a research project)
    • 1993: publication of the first binary version of R on Statlib
    • 1995: R becomes open source software under the terms of the GPL2 license
    • 1997: Creation of the R Core group. Creation of CRAN (K. Hornik and F. Leisch)
    • 1999: Creation of the R website (r-project.org). First in-person meeting of the R Core team
    • 2000: R 1.0.0 is released. John Chambers (designer of the S language) joins the R Core team
    • 2001: Creation of R News (known today as the R Journal)
    • 2003: Creation of the R Foundation
    • 2004: First UseR! conference (Vienna)
    • 2004: R 2.0.0 is released
    • 2009: First edition of the R Journal
    • 2013: R 3.0.0 is released
    • 2015: Creation of the R Consortium (involving the R Foundation)
    • 2020: R 4.0.0 is released

https://blog.revolutionanalytics.com/2017/10/updated-history-of-r.html

Significant support


The result of 30 years of research and development


Major funders support the development of R: Microsoft, Google, Oracle, Esri


https://www.r-consortium.org/members

Free and cross-platform


    • R is free and open source software, and a programming language


    • part of the GNU project, distributed under the GNU GPL (version 2 or 3)


    • cross-platform


Software/System   Windows   MacOS   Linux   BSD   Other Unix
R                 yes       yes     yes     yes   yes

Unlimited development


The R core provides 2292 standard functions for statistical analysis and graphics
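
A count of this kind can be reproduced from within R. The following is a minimal sketch, assuming that the "core" means the function-bearing packages attached by default in a fresh session (an assumption, since the slide does not say which packages were counted). The helper name `count_functions` is hypothetical, and the total depends on the R version, so it will not necessarily match the figure above.

```r
# Packages assumed to make up the "core" (attached by default in a fresh session).
core_pkgs <- c("base", "stats", "graphics", "grDevices", "utils", "methods")

# Hypothetical helper: number of exported objects of a package that are functions.
count_functions <- function(pkg) {
  exported <- getNamespaceExports(pkg)
  is_fun <- vapply(
    exported,
    function(name) is.function(getExportedValue(pkg, name)),
    logical(1)
  )
  sum(is_fun)
}

# One count per package, then the overall total.
counts <- vapply(core_pkgs, count_functions, integer(1))
print(counts)
print(sum(counts))
```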

Many packages are available to extend this core; they are listed on the
Comprehensive R Archive Network (CRAN). Examples:

    • quanteda - textual analysis
    • igraph - network analysis
    • sf - spatial vector data handling
    • shiny - interactive web applications


R has a modular structure that offers a multitude of applications
Its development is only limited by contributions
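
In practice, extending the core means installing a package from CRAN once and then loading it in each session. Here is a minimal sketch using igraph from the list above. `install.packages()`, `requireNamespace()` and `library()` are standard R functions; the helper `is_installed` is a hypothetical name introduced for illustration. Installation requires an internet connection.

```r
# Hypothetical helper: TRUE if a package is installed and can be loaded.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

pkg <- "igraph"

# Install from CRAN only if needed. Guarded by interactive() so that the
# script never triggers a download when run unattended.
if (!is_installed(pkg) && interactive()) {
  install.packages(pkg)
}

# Attach the package for the current session once it is available.
if (is_installed(pkg)) {
  library(pkg, character.only = TRUE)
}
```

The same pattern applies to the other packages listed (quanteda, sf, shiny): only the value of `pkg` changes.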

Unlimited development


Number of packages available on CRAN

Versatile



Available packages cover a huge range of operations, from data collection to the presentation of final results (charts, graphics, documents, websites…)


Its versatility makes R a complement, and even a competitor, to many existing software packages

Versatile


A worldwide community of users

https://benubah.github.io/r-community-explorer/rugs.html

…and companies

https://data-flair.training/blogs/r-careers/

Reliable


    • financial support from investors (R Consortium)
    • an involved community
    • open source software whose code is open and therefore verifiable
    • some bugs do occur, however…


Information circulates quickly within open source communities

Reproducible work


    • a single piece of software for every processing step
    • work is easily archived and shared (code scripts)
    • it only takes a computer to replicate the work


Reproducibility means sharing and transparency!
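
As an illustration, a shared script can fix its random seed and record the session details so that anyone re-running it obtains the same results. This is a minimal sketch, not a complete workflow; the function `run_analysis` and the seed value are hypothetical, while `set.seed()` and `sessionInfo()` are standard R functions.

```r
# Hypothetical analysis step: fixing the seed makes the random draw repeatable.
run_analysis <- function(seed) {
  set.seed(seed)
  x <- rnorm(100)
  c(mean = mean(x), sd = sd(x))
}

first_run  <- run_analysis(seed = 2020)
second_run <- run_analysis(seed = 2020)

# Check whether the two runs produced exactly the same numbers.
print(identical(first_run, second_run))

# Record the R version and loaded packages alongside the results.
print(sessionInfo())
```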

Drawbacks


    • R is a programming language first


    • Documentation & resources are mainly in English


    • A very basic interface

What is RStudio?


RStudio is a company that develops and releases software and services based on R.
It is the main private-sector player in the R community

RStudio employees have developed several reference packages. Examples:

    • rmarkdown (document production)
    • shiny & flexdashboard (web application & dashboard)
    • ggplot2 (graphics)
    • dplyr & tidyr (data table handling)
    • stringr (character string handling)


RStudio also released an integrated development environment (IDE),
making R easier to work with

Basic R interface

R interface on Windows

RStudio IDE


Other strengths


    • project file organization
    • point-and-click features
    • autocompletion
    • keyboard shortcuts


It is simple, complete and constantly evolving

Use the RStudio environment!

Installation

Install R


Installing R and the RStudio IDE is as straightforward as installing any other software. Download R from CRAN


https://cran.r-project.org/

Install the RStudio IDE

Download the Desktop version from the RStudio website

https://rstudio.com/products/rstudio/download/

And we’re off!

Launch RStudio (not R) to begin
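
Once RStudio is open, a quick way to confirm that it has found the R installation is to type a couple of commands in the console. This is a minimal sketch; the 4.0.0 threshold is only an illustrative assumption taken from the latest release listed in the history above.

```r
# Print the version of R that the current session is running.
print(R.version.string)

# Compare against an illustrative threshold (assumption: 4.0.0).
is_recent <- getRversion() >= "4.0.0"
print(is_recent)
```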

Free and open presentation
CC BY 3.0 license


consultation:


source code:


Documentation


Numerous referenced documentary resources (EN, FR and SP) on…


rzine.fr

Thanks


Natacha Bohin (Barts Cancer Institute)

Timothée Giraud (CNRS)

Violaine Jurie (Université de Paris)

