What is the R language?

Table of contents

  1. What is the R language
  2. What can you do with the R language?
  3. The R language is also recommended for Excel users
  4. R language/Python
  5. How to get started with the R language
  6. Summary

What is the R language

The R language, developed by researchers at the University of Auckland in New Zealand in the 1990s, is a “programming language that specializes in statistical analysis, data analysis, and graphics.” It’s open source so you can use it for free.

The R language inherits the design of the statistical processing language “S”, which was developed at AT&T Bell Laboratories starting in the 1970s. It has evolved as a programming language for statistical analysis, rather than as a general-purpose language like C.

What can you do with the R language?

The R language specializes in statistical computing and graphics. For statistical analysis, there are many “packages” that perform complex statistical calculations with simple code. As of the end of 2020, just under 17,000 packages were available. R is mainly used for statistical analysis, but it is also used in fields such as time series analysis, machine learning, and bioinformatics.

Statistical analysis

In addition to tests and inference based on traditional statistics, R gives you access to statistical analysis packages created by researchers around the world. New packages are added almost daily and are free for anyone to use, so methods used in the latest research in various fields can be run easily.

Text mining

For Japanese text, morphological analysis can be performed with MeCab and RMeCab, which makes natural language processing and text mining possible in R. With R packages you can extract frequently occurring words from text data, create word clouds, analyze co-occurrence networks of words, and perform correspondence analysis. The text mining software KH Coder also uses R code for part of its analysis.
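
As a rough illustration of the word-frequency step only, the sketch below counts words in two made-up English sentences using base R. It is a simplification: it splits on non-letters rather than performing morphological analysis, and it does not use MeCab or RMeCab, which Japanese text would require.

```r
# Made-up sample text for illustration (not from the article)
text <- c("R is a language for statistics",
          "R is free and R is open source")

# Lower-case the text and split it into words on anything that is not a letter
words <- unlist(strsplit(tolower(text), "[^a-z]+"))
words <- words[words != ""]

# Count each word and sort from most to least frequent
freq <- sort(table(words), decreasing = TRUE)
print(head(freq, 5))
```

A frequency table like `freq` is the usual starting point for a word cloud or a co-occurrence analysis.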

Histogram, scatterplot

In the R language, which was developed as a statistical processing language, you can create a histogram with just the code “hist(data)”, for example. If you want to draw a scatterplot, you just need the code “plot(x,y)”.
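
The two calls mentioned above can be tried with a short sketch like the following. The data are simulated purely for illustration, and the variable names `data`, `x`, and `y` simply mirror the snippets in the text.

```r
# Simulated data for illustration only (not from the article)
set.seed(1)
data <- rnorm(200, mean = 50, sd = 10)
x <- runif(100, min = 0, max = 10)
y <- 2 * x + rnorm(100)

# Histogram of a single numeric vector; hist() also returns the bin counts
h <- hist(data, main = "Histogram of data", xlab = "Value")

# Scatterplot of y against x
plot(x, y, main = "Scatterplot of x and y")
```

The optional arguments `main` and `xlab` only set the title and axis label; `hist(data)` and `plot(x, y)` on their own produce the same charts with default labels.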

t-test

A t-test of the difference between the means of two groups can be done with a single line of code such as ‘t.test(group1, group2, var.equal=TRUE)’. It is also possible to read the numbers from the t-test result and create a graph with a 95% confidence interval.
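
A minimal sketch of that workflow, assuming two simulated groups of 30 observations each (the data and the plotting choices are illustrative, not taken from the article):

```r
# Simulated groups for illustration only
set.seed(42)
group1 <- rnorm(30, mean = 100, sd = 15)
group2 <- rnorm(30, mean = 110, sd = 15)

# Student's t-test assuming equal variances, as in the text
result <- t.test(group1, group2, var.equal = TRUE)
print(result)

# The result is a list-like object, so individual numbers can be extracted
diff_mean <- unname(result$estimate[1] - result$estimate[2])
ci <- result$conf.int       # 95% confidence interval for the difference
p_value <- result$p.value

# Plot the estimated difference with its 95% confidence interval
plot(1, diff_mean, ylim = range(c(ci, 0)), xaxt = "n", xlab = "",
     ylab = "group1 - group2", pch = 19)
arrows(1, ci[1], 1, ci[2], angle = 90, code = 3, length = 0.1)
abline(h = 0, lty = 2)      # reference line at "no difference"
```

If the interval drawn by `arrows()` does not cross the dashed line at zero, the difference is significant at the 5% level.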

Bayesian statistics

In addition to traditional statistics, the R language also has many packages for Bayesian statistics. Bayesian estimation often relies on MCMC (Markov chain Monte Carlo) methods, and these can also be run in R.
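
To give a feel for what MCMC does, here is a hand-written random-walk Metropolis sampler in base R. It is only a sketch under stated assumptions: the observations are simulated, the model (a normal likelihood with known standard deviation 1 and a Normal(0, 10) prior on the mean) is chosen for simplicity, and real analyses would normally use a dedicated package instead.

```r
set.seed(123)
obs <- rnorm(50, mean = 3, sd = 1)   # simulated observations

# Log posterior of mu (up to a constant): prior plus likelihood
log_post <- function(mu) {
  dnorm(mu, mean = 0, sd = 10, log = TRUE) +
    sum(dnorm(obs, mean = mu, sd = 1, log = TRUE))
}

n_iter <- 5000
chain <- numeric(n_iter)
chain[1] <- 0                        # arbitrary starting value

for (i in 2:n_iter) {
  proposal <- chain[i - 1] + rnorm(1, sd = 0.5)   # random-walk proposal
  log_ratio <- log_post(proposal) - log_post(chain[i - 1])
  # Accept the proposal with probability min(1, posterior ratio)
  if (log(runif(1)) < log_ratio) {
    chain[i] <- proposal
  } else {
    chain[i] <- chain[i - 1]
  }
}

posterior <- chain[-(1:1000)]        # discard the first 1000 draws as burn-in
print(mean(posterior))                        # posterior mean of mu
print(quantile(posterior, c(0.025, 0.975)))   # 95% credible interval
```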

A package called ‘bayesAB’ can be used to perform A/B testing with Bayesian statistical methods. Bayesian statistics is often recommended for A/B tests because the results are easier to interpret. There are also packages that provide Bayesian versions of t-tests and other familiar procedures, so you can run the Bayesian counterpart of familiar statistical processing in R.
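
The idea behind a Bayesian A/B test can be sketched in base R without any package. The example below does not use bayesAB; it assumes hypothetical conversion counts and a flat Beta(1, 1) prior, and samples directly from the resulting Beta posteriors.

```r
set.seed(2020)

# Hypothetical results: conversions out of visitors for each variant
conv_a <- 120; n_a <- 1000
conv_b <- 145; n_b <- 1000

# With a Beta(1, 1) prior and binomial data, the posterior is also a Beta
n_draws <- 100000
post_a <- rbeta(n_draws, 1 + conv_a, 1 + n_a - conv_a)
post_b <- rbeta(n_draws, 1 + conv_b, 1 + n_b - conv_b)

# Probability that B's conversion rate is higher than A's
prob_b_better <- mean(post_b > post_a)

# 95% credible interval for the difference in conversion rates
lift_ci <- quantile(post_b - post_a, c(0.025, 0.975))

print(prob_b_better)
print(lift_ci)
```

A statement such as "the probability that B is better than A" is the kind of directly interpretable result the text refers to.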

Learning statistical theory

The R language began as a tool for teaching statistics. Because it makes statistical analysis easy, it can also be used as a tool for studying statistics. Learning statistical theory while running the calculations of an actual analysis in R should lead to a more practical understanding.

Basic statistics such as mean/median, variance/standard deviation, and minimum/maximum can be calculated easily, and histograms, boxplots, scatterplots, and other charts can be generated directly from the data. Creating these plots lets you see details such as the distribution of the data at a glance.
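
Each of the statistics listed above corresponds to a base R function. The sketch below uses a small made-up vector named `scores` for illustration.

```r
# Made-up scores for illustration only
scores <- c(62, 75, 81, 58, 90, 77, 69, 85)

print(mean(scores))     # mean
print(median(scores))   # median
print(var(scores))      # sample variance
print(sd(scores))       # sample standard deviation
print(min(scores))      # minimum
print(max(scores))      # maximum

# summary() reports several of these at once, plus the quartiles
print(summary(scores))

# A boxplot shows the median, quartiles, and spread in one picture
boxplot(scores, main = "Distribution of scores")
```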

The R language is also recommended for Excel users

Much of the data you process in Excel can also be processed in R. In some cases, the processing performed by entering calculation formulas and functions in Excel cells can be processed more efficiently and accurately in the R language.

For example, to put the sum of the numbers in column A and column B into column C, in Excel you need to paste the formula into every cell in column C. R handles data column by column (that is, one variable at a time), so a single expression covers the whole column. This is more efficient and removes the risk of mistakes when pasting formulas.
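
The column example can be written as follows. The data frame `df` and its values are made up for illustration; the column names A, B, and C follow the text.

```r
# A small table with columns A and B, as in the Excel example
df <- data.frame(A = c(10, 20, 30, 40),
                 B = c(1, 2, 3, 4))

# One expression adds the two columns row by row and stores the result in C
df$C <- df$A + df$B

print(df)
```

The same line works unchanged whether the table has four rows or four million, which is where the efficiency over pasting cell formulas comes from.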

Data preprocessing and data cleansing in R language

Data preprocessing, which adjusts the content and shape of data before analysis, can be performed with Excel’s Power Query, but similar processing is also possible in R. R may make complex or fine-grained processing easier. R also has the advantage that information is easier to find online and in books than for Power Query’s M language.

If data preprocessing is performed in the R language, preprocessing programs can be reused, and the same results can be reproduced by anyone. If you’re doing preprocessing manually in Excel, we recommend you consider using Power Query or the R language. The actual analysis can be done in Excel, or the R language can be used to perform everything from preprocessing to analysis all at once.
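
One way to make preprocessing reusable is to wrap the steps in a function. The sketch below is illustrative only: the function name `clean_sales`, the column names, and the sample rows are all hypothetical.

```r
# Hypothetical cleaning function: the same steps run identically every time
clean_sales <- function(raw) {
  out <- raw
  out$name <- trimws(out$name)                         # strip stray spaces
  out$amount <- as.numeric(gsub(",", "", out$amount))  # "1,200" -> 1200
  out <- out[!is.na(out$amount), ]                     # drop rows with no amount
  out <- out[!duplicated(out), ]                       # drop exact duplicate rows
  rownames(out) <- NULL
  out
}

# Made-up raw data with typical problems
raw <- data.frame(
  name = c(" Sato", "Suzuki ", "Suzuki ", "Tanaka"),
  amount = c("1,200", "800", "800", NA),
  stringsAsFactors = FALSE
)

cleaned <- clean_sales(raw)
print(cleaned)
```

Because the steps live in a script rather than in manual edits, anyone who runs `clean_sales()` on the same input gets the same output.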

R language/Python

Like the R language, Python is an open-source language and can be used for free. Many of the processes that can be executed in R language can also be processed in Python. Python is also rich in statistical processing “libraries”. Furthermore, Python is used not only in the field of statistical calculation but also in a wide range of fields such as building web services and developing applications.

Python is more versatile

Python is a general-purpose programming language, like C. Google’s web services make heavy use of Python, and Python programs run behind the scenes of services such as YouTube and Gmail. While the R language is mainly used in the field of statistical analysis, Python is used in a very wide range of fields, and this breadth of application is one reason its popularity keeps growing.

Python dominates the machine learning and deep learning fields

The R language is used in fields such as statistical analysis and data visualization, but the majority of people use Python in the field of data science such as machine learning and deep learning.

If you look at the code published on Kaggle, a data science competition site, most of it is in Python.

How to get started with the R language

First, download the installer from the official R language website and run it. You can use R directly from its console, but the software called RStudio makes it much more convenient.

On a Windows PC, if the account name contains Japanese characters, the path to the user folder will also contain them, and R may not work properly. In that case, you need a workaround such as creating an account with an English user name and installing R there.

Use RStudio

With RStudio, you can easily load and save data, write and run programs, view results, and manage packages. You can install it on your own computer, but there is also a cloud service called RStudio Cloud (since renamed Posit Cloud) with a free plan, so you can try it from your browser.

Use GUI

A package called “R Commander” lets you operate the R language through a GUI (graphical user interface). EZR, free software developed at Jichi Medical University on top of R Commander, is convenient for medical statistical analysis.

EZR supports basic statistical analysis, so it can also be used widely outside medical statistics. There is an introductory book on EZR, and statistical processing is done by clicking menu items with the mouse, so it is easy to use. It is recommended for those who want to try statistical analysis with the R language.

Summary

One of the features of the R language is that there are many packages integrated with research activities in various fields. Statistical programs used in cutting-edge research are available.

Some people may be put off by “having to program”, but there are also accessible environments like EZR.

There are plenty of examples of R language programs on the net, and it is surprisingly easy to start. The R language is useful for learning statistical analysis and for jobs that require statistical analysis. Why not take this opportunity to experience the R language?
