R Tutorial

What is R Programming?

R is a programming language; an interpreted computer programming language. R is a software environment also.

  • As a software environment, R programming is utilised for analyzing statistical information, reporting, data modelling and graphical representation.
  • It was created at the University of Auckland, New Zealand. Ross Ihaka and Robert Gentleman first made this language and is now developed by the R Development Core Team.
  • S programming language is combined with lexical scoping semantics to execute R programming language.
  • Along with branching and looping, this language facilitates modular programming using functions.
  • The codes written in other programming languages, like C, C++, .Net, Python, and FORTRAN can also be integrated with R. This makes the work easier.
  • R is popular among researchers, data analysts, statisticians, and marketers. The reason is, R is an efficient tool for retrieving, manipulating, interpreting, visualizing, and presenting data.

History of R Programming:

R was created at the University of Auckland, New Zealand. In 1992, the work on this programming language started that led to the release of the initial version in 1995. Ross Ihaka and Robert Gentleman first made this language and is now developed by the R Development Core Team. The name of both developers led to the name of this programming language.

VERSION DESCRIPTION DATE
0.49
  • R’s source was released for the first time.
  • CRAN (Comprehensive R Archive Network) was started.
1997-04-23
0.6 Got the GNU General Public License. 1997-12-05
0.65.1 The update.packages and install.packages were included. 1999-10-07
1 First production-ready version. 2000-02-29
1.4 First version for Mac OS. 2001-12-19
2 Version for Mac OS. 2004-10-04
2.1 Version with added support for UTF-8encoding, internationalization, localization etc. 2005-04-18
2.11 Version with added support for Windows 64-bit systems. 2010-04-22
2.13 Version with an added function that rapidly converts code to byte code. 2011-04-14
2.14 Version with some new packages. 2011-10-31
2.15 Version with improved serialization speed for long vectors. 2012-03-30
3 Version with support for larger numeric values on 64-bit systems. 2013-04-03
3.4 Version with the enabled just-in-time compilation (JIT) by default. 2017-04-21
3.5 Version with added new features, like, the compact internal representation of integer sequences, serialization format etc. 2018-04-23

Features of R programming:

  1. R is a simple, effective and interpreted programming language.
  2. R is created as data analysis software.
  3. R is a domain-specific programming language.
  4. R supports user-defined, looping, conditional, and various I/O structures.
  5. R is known for its consistent and incorporated set of tools.
  6. Different suites of operators for different types of calculation on arrays, lists and vectors.
  7. Efficacious storage and data handling facility.
  8. The notation of vectors facilitates the execution of a complex operation on a set of values in a single command.
  9. R is open-source software.
  10. It facilitates highly extensible graphical techniques.
  11. Multiple calculations are possible with the use of vectors.

Importance of R Programming:

  • For data scientists, the best programming languages are R and Python.
  • R facilitates the best algorithm implementation for machine learning.
  • R facilitates the execution of Xgboost, which is the best tool for Kaggle competition.
  • R can be best utilised for the investigation and exploration of the data, along with clustering, correlation, and data reduction.
  • R facilitates communication and integration with Python, Java, C++, etc.
  • R has accessibility to the big data world.
  • R facilitates connection with various databases like Spark or Hadoop.

Comparison between R and Python:

Data scientists require efficient tools for the identification extraction and representation of the desired data from the data source. For this, they make use of software like R, Python, SAS, SQL, Tableau, MATLAB, etc. Being the most popular choices, R and Python can sometimes create confusion on which is the most suitable for the desired task.

R Python
Overview:

  • R is a programming language; an interpreted computer programming language.
  • R is a software environment also.
  • As a software environment, R programming is utilised for analyzing statistical information, reporting, data modelling and graphical representation.
  • The notation of vectors facilitates the execution of a complex operation on a set of values in a single command.
  • R is open-source software.
  • It facilitates highly extensible graphical techniques.
  • Multiple calculations are possible with the use of vectors.
Overview:

  • Python is an Interpreted high-level programming language.
  • Python is popular for general-purpose programming.
  • Python facilitates simple and easy readability and debugging.
  • Python is a programming language created by Guido van Rossum for web development, software development, mathematics operations and system scripting.
Special Features for Data Science:

  • R facilitates the best algorithm implementation for machine learning.
  • R facilitates the execution of Xgboost, which is the best tool for Kaggle competition.
  • R can be best utilised for the investigation and exploration of the data, along with clustering, correlation, and data reduction.
  • R facilitates communication and integration with Python, Java, C++, etc.
  • R has accessibility to the big data world.
  • R facilitates connection with various databases like Spark or Hadoop.
Special Features for Data Science:

When it comes to the development of a web service to facilitate the users to upload datasets and find outliers, Python is preferred.

Functionalities:

Inbuilt functionalities for data analysis.

Functionalities:

Many of the data analysis functionalities are supported through packages like Numpy and Pandas and are not in-built in Python programming language.

Key Domains of Application:

Data visualization.

Machine Learning.

Key Domains of Application:

Deep learning.

Machine Learning.

Availability of Packages:

Hundreds of packages.

Availability of Packages:

Few main packages.

Popular Real-life Applications based on R:

With an increase in its popularity, nowadays, various applications utilises the R programming language. Some of these are:

  • Facebook
  • Google
  • Twitter
  • RealClimate
  • NDAA
  • XBOX ONE
  • ANZ
  • FDA
  • HRDAG
  • Sunlight Foundation

Prerequisites:

To work with R programming efficiently, some prerequisites are suggested. The prerequisites may ary depending on how you want to utilise R programming language and for what purpose. While working with R, its better to have a prior knowledge of:

  • Statistical theory in mathematics – For Data analysis and Interpretation.
  • Types of graphs and their uses – For Data Visualization.
  • Programming languages like C, etc – For coding purpose.