Department of Genetics

Luiz de Queiroz College of Agriculture

University of São Paulo

R is a language and environment for statistical computing and
graphics. To download R, please visit the Comprehensive R Archive Network (CRAN).
You do not need to be an R expert to build your linkage
map using *OneMap*.

Although we prefer and recommend the Linux version, in this tutorial it is assumed that the user is running Windows. Users of R under Linux or Mac OS should have no difficulty following this tutorial.

We recommend that new users work with R through RStudio rather than plain R. With RStudio, there is no noticeable difference between operating systems.

As advertised on the website, *RStudio is an integrated
development environment (IDE) for R. It includes a console,
syntax-highlighting editor that supports direct code execution, as well
as tools for plotting, history, debugging, and workspace
management*. In other words, it offers a number of facilities for
your convenience that will make your life easier, especially if you have
never used R before.

So, go ahead and download and install R and RStudio. Once RStudio is open, the window on the left is where you type the R commands you want.

In the left window, you can see a *greater than* sign
(`>`), which means that R is waiting for a command. We call this a
*prompt*.

Let us start with a simple example of adding two numbers. Type
`2 + 3` at the prompt, then press the *Enter* key. You
will see the result directly on the screen.

```
2 + 3
#> [1] 5
```

You can store this result in a variable for future use by applying
the assignment operator `<-` (a *less than* sign followed by a *minus*
sign):

`x <- 2 + 3`

The result of the calculation was stored into the variable
*x*. You can access this result by typing *x* at the
prompt:

```
x
#> [1] 5
```

You can also use the variable *x* in other calculations. For
example:

```
x + 4
#> [1] 9
```

So, play a little just to start understanding what is going on.

Another fundamental aspect in R is the usage of *functions*. A
function is a predefined routine used to do specific calculations. For
example, to calculate the natural logarithm of \(6.7\), we can use the function
*log*:

```
log(6.7)
#> [1] 1.902108
```

The function *log* contains a group of internal procedures to
calculate the natural logarithm of a positive real number. The input
values of a function are called *arguments*.

In the previous example, we provided only one argument (\(6.7\)) to the function. Sometimes a function has more than one argument. For example, to obtain the logarithm of \(6.7\) to base \(4\), you can use:

```
log(6.7, base = 4)
#> [1] 1.372081
```
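As a quick check of how the *base* argument works, the short sketch below compares the direct call with the change-of-base formula, \(\log_4(6.7) = \ln(6.7) / \ln(4)\). The variable names *direct* and *by_hand* are ours, chosen only for this illustration.

```r
# Logarithm of 6.7 to base 4, computed in two equivalent ways
direct  <- log(6.7, base = 4)
by_hand <- log(6.7) / log(4)   # change-of-base formula

direct
by_hand
all.equal(direct, by_hand)     # compares the two values up to a small tolerance

# R also has shortcuts for two common bases
log10(1000)  # base 10
log2(8)      # base 2
```

Naming the argument (`base = 4`) is optional here, but it makes the call easier to read.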

It is possible to calculate the natural logarithm of a set of numbers
by defining a vector and using it as the first argument of the function
*log*. To do so, we use the function *c()*, which
*combines* a set of values into a vector. Thus, to calculate the
logarithm of the numbers 6.7, 3.2, 5.4, 8.1, 4.9, 9.7, and 2.5, one can
use:

```
y <- c(6.7, 3.2, 5.4, 8.1, 4.9, 9.7, 2.5)
log(y)
#> [1] 1.9021075 1.1631508 1.6863990 2.0918641 1.5892352 2.2721259 0.9162907
```

Notice that *y* is a vector, and that it is passed as the argument to the
function *log()*, which returns one logarithm per element.
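Many other R functions and operators work element by element in the same way. The lines below are a small sketch using the same vector; none of them is required for the rest of the tutorial.

```r
y <- c(6.7, 3.2, 5.4, 8.1, 4.9, 9.7, 2.5)

length(y)   # number of elements in the vector
y * 2       # arithmetic is applied to each element
sqrt(y)     # so are most mathematical functions
mean(y)     # some functions reduce the vector to a single value
y[2]        # square brackets select elements by position
```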

Every R function has a help page that can be accessed using a
question mark before the name of the function. For example, to get help
on function *log*, you would type:

`?log`

This command will open a help page in the default web browser of your system. The help page contains some important information about the function such as its syntax, its arguments, and some usage examples.

There are many other ways of getting help, of course. For example,
from RStudio, click *Help* on the menu. For doing searches on the
internet, it is better to first go to https://rseek.org/, since R is a very
common letter to include in searches.
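Help is also available directly from the console. The calls below are a few base R options that we find handy; they are shown only as a sketch, and the output depends on your installation.

```r
args(log)        # show the arguments of a function
help("log")      # same as ?log
example(log)     # run the examples from the help page
apropos("log")   # list objects whose names contain "log"
```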

Although R has a huge number of built-in functions, very
specific computations (like constructing genetic linkage maps) require
extra functionality. This can be added by installing
a *package*. A
package is a collection of related functions, help files, and example
data files that have been bundled together (Adler, 2010).

For example, let us assume that you need to convert a set of
recombination fractions into centimorgan distance using the Kosambi
mapping function. One possible way to do this is by using basic R to
write a function to calculate the distances. Another way is to use the
*OneMap* package. To install it you can type:

```
setRepositories(ind = 1:2)
install.packages("onemap")
```

You can also use the RStudio menus. In the bottom-right window,
select **Packages**, then
**Install**, and finally type *onemap* as the package name (select
CRAN as your repository). Yes, it is that easy!

Returning to the console, you need to load *OneMap* by
typing:

`library(onemap)`
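If you are not sure whether the installation succeeded, you can test it before loading the package. The sketch below uses only base R functions; the variable names *pkg* and *installed* are ours.

```r
pkg <- "onemap"

# TRUE if the package (and its dependencies) can be loaded, FALSE otherwise
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  library(pkg, character.only = TRUE)
  message(pkg, " version ", as.character(packageVersion(pkg)), " is loaded")
} else {
  message(pkg, " is not available yet; run install.packages(\"", pkg, "\") first")
}
```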

Some Linux users reported the error message below:

`ERROR: dependency ‘tkrplot’ is not available for package ‘onemap’`

To fix it, in a terminal (outside R), install
`r-cran-tkrplot`:

`sudo apt-get install r-cran-tkrplot`

To finish our example, let us enter some recombination fractions, for
example, 0.01, 0.12, 0.05, 0.11, 0.21, 0.07, and save them in a variable
named *rf*:

`rf <- c(0.01, 0.12, 0.05, 0.11, 0.21, 0.07)`

Now, let us use *OneMap*’s function *kosambi* to do the
calculation:

```
kosambi(rf)
#> [1] 1.000133 12.238706 5.016767 11.182805 22.384601 7.046279
```
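For comparison, the other route mentioned earlier, writing the function yourself in basic R, could look like the sketch below. It assumes the usual form of the Kosambi mapping function, \(d = 25 \ln\left(\frac{1 + 2r}{1 - 2r}\right)\) centimorgans for a recombination fraction \(r\). The name *kosambi_cm* is hypothetical and is not part of *OneMap*.

```r
# Hypothetical helper written in base R (not an OneMap function).
# Kosambi mapping function: d = 25 * log((1 + 2r) / (1 - 2r)), in centimorgans,
# defined for recombination fractions 0 <= r < 0.5.
kosambi_cm <- function(r) {
  stopifnot(is.numeric(r), all(r >= 0), all(r < 0.5))
  25 * log((1 + 2 * r) / (1 - 2 * r))
}

rf <- c(0.01, 0.12, 0.05, 0.11, 0.21, 0.07)
kosambi_cm(rf)
```

The values should agree with those returned by *kosambi* above, which is a simple way to convince yourself of what the packaged function computes.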

You can also obtain help on the function *kosambi* using the
question mark in the same way as done before:

`?kosambi`

So far, we have entered the variables in R by typing them directly
into the console. However, in real situations, we usually **read
these values from a file** or a database (including files on the
internet).

To learn this procedure, copy and paste the following table into a
text editor (for example, *Notepad*) and save it as a file called
*test.txt* in any directory on your computer (such as *My
Documents*).

```
x y
2.13 4.50
4.48 1.98
10.95 9.29
10.03 16.25
12.72 27.38
24.63 22.60
22.57 36.87
29.78 31.73
19.54 10.42
7.86 14.68
11.75 8.68
23.71 37.39
```

To read this data set into R, you first have to set the working
directory. Go to *Session*, then *Set Working Directory*,
and *Choose Directory*, pointing to where you saved the file
*test.txt*.
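The working directory can also be inspected and changed from the console with the base R functions *getwd* and *setwd*. In the sketch below, *tempdir()* is used only so that the example runs on any computer; in practice you would give the path of the folder where you saved *test.txt*.

```r
getwd()                     # print the current working directory

# setwd() changes the directory and invisibly returns the previous one.
# tempdir() is a stand-in here; replace it with your own folder.
old_dir <- setwd(tempdir())

list.files()                # files visible from the new working directory
file.exists("test.txt")     # TRUE only if test.txt is in this directory

setwd(old_dir)              # go back to where we started
```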

Now let us read the file *test.txt* into R and store it in a
variable named *dat*. To do this, we can use the R function
*read.table*. The first argument is the name of the file; the
second one indicates whether the file contains a header, that is, whether the
first line of the file contains the names of the variables (which is
true for our example):

```
dat <- read.table(file = "test.txt", header = TRUE)
dat
```

The second line, with *dat*, is necessary to ask R to print
the contents of the object *dat* (i.e., the data itself).
Inspecting the object *dat* you can see a table with 12 rows and
two columns. The names of the columns are *x* and *y*. We
can access the variables in columns using the dollar sign followed by
the column name:

```
dat$x
#> [1] 2.13 4.48 10.95 10.03 12.72 24.63 22.57 29.78 19.54 7.86 11.75 23.71
dat$y
#> [1] 4.50 1.98 9.29 16.25 27.38 22.60 36.87 31.73 10.42 14.68 8.68 37.39
```
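There are a few other common ways to inspect a table and to select parts of it. The sketch below rebuilds the same data inline through the *text* argument of *read.table*, so that it runs even without the file on disk; everything else is base R.

```r
# Same table as in test.txt, supplied inline instead of read from a file
dat <- read.table(header = TRUE, text = "
x y
2.13 4.50
4.48 1.98
10.95 9.29
10.03 16.25
12.72 27.38
24.63 22.60
22.57 36.87
29.78 31.73
19.54 10.42
7.86 14.68
11.75 8.68
23.71 37.39
")

dim(dat)            # number of rows and columns
str(dat)            # compact description of each column
head(dat, 3)        # first three rows

dat[["x"]]          # same result as dat$x
dat[1:3, ]          # rows 1 to 3, all columns
dat[dat$x > 20, ]   # only the rows where x is greater than 20
```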

It is also possible to use the function *summary* to extract
some information about the object *dat*, or about each one of the
columns separately:

```
summary(dat)
#>        x                y
#>  Min.   : 2.130   Min.   : 1.980
#>  1st Qu.: 9.488   1st Qu.: 9.137
#>  Median :12.235   Median :15.465
#>  Mean   :15.012   Mean   :18.481
#>  3rd Qu.:22.855   3rd Qu.:28.468
#>  Max.   :29.780   Max.   :37.390
summary(dat$x)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>   2.130   9.488  12.235  15.012  22.855  29.780
summary(dat$y)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>   1.980   9.137  15.465  18.481  28.468  37.390
```

The function *summary* provides some descriptive statistics
about the variables in the dataset. If you want to export this
information to a file you can use the function *write.table*:

`write.table(x = summary(dat), file = "test_sum.txt", quote = FALSE)`

The first argument is the output of the *summary* function.
Note that it is possible to use the result of one function as an argument of another.
The second argument is the name of the file to which the summary will be
written. Notice that the file is created in the *working directory*
previously set through the RStudio menus. The third argument removes
double quotes from the output file. After running the command, you can
look for the file *test_sum.txt* in the working directory you
defined before.
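The same function can export the data themselves, and the exported file can be read back with *read.table*. The sketch below uses a small data frame and a file in a temporary folder; the file name *test_copy.txt* and the object names are placeholders of ours.

```r
# First three rows of the example table, typed in directly
dat <- data.frame(x = c(2.13, 4.48, 10.95), y = c(4.50, 1.98, 9.29))

# Placeholder file name inside R's temporary folder
out_file <- file.path(tempdir(), "test_copy.txt")

# row.names = FALSE keeps the row numbers out of the file
write.table(dat, file = out_file, quote = FALSE, row.names = FALSE)

readLines(out_file)                         # raw contents of the file
dat2 <- read.table(out_file, header = TRUE) # read it back
all.equal(dat, dat2)                        # check that nothing was lost
```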

In R, every object belongs to a **class**. This
is a simple concept that you must remember. For example, the
class of the object *dat* can be checked with the function *class*:

```
class(dat)
#> [1] "data.frame"
```

When we used the function *summary*, it automatically
recognized the class of the object *dat* and applied a specific
procedure developed for class *data.frame*, which in this case
involves the computation of some descriptive statistics.

Such a class-specific procedure is called a *method*. Objects of other
classes can also be used as arguments to the function *summary*, and the
result will be different!

For example, let us fit a linear (regression) model using column
*y* as the response variable and column *x* as the
independent one. This can be done with the function *lm()*:

```
ft_mod <- lm(dat$y ~ dat$x)
ft_mod
#>
#> Call:
#> lm(formula = dat$y ~ dat$x)
#>
#> Coefficients:
#> (Intercept)        dat$x
#>       1.803        1.111
```

This function is used to fit linear models and, by default, prints
just a formula and the coefficients of the linear regression. Object
*ft_mod* is of class *lm*:

```
class(ft_mod)
#> [1] "lm"
```

So, if we use function *summary* to obtain more information
about the fitted model, the result will be:

```
summary(ft_mod)
#>
#> Call:
#> lm(formula = dat$y ~ dat$x)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -13.091  -5.144  -1.413   5.421  11.446
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)   1.8026     4.7689   0.378  0.71334
#> dat$x         1.1110     0.2771   4.009  0.00248 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 8.075 on 10 degrees of freedom
#> Multiple R-squared: 0.6164, Adjusted R-squared: 0.5781
#> F-statistic: 16.07 on 1 and 10 DF, p-value: 0.002482
```

In this case, the function *summary* recognizes *ft_mod* as
an object of class *lm* and applies a method that shows
information about the fitted model, such as the distribution of the
residuals, the regression coefficients, t-tests, and the coefficient of
determination (\(R^2\)).
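The printed summary is meant for reading, but individual quantities can also be extracted for later use. The sketch below refits the model with the *data* argument of *lm*, which gives the same estimates (the slope is then labelled *x* instead of *dat$x*); the object name *sm* is ours.

```r
dat <- data.frame(
  x = c(2.13, 4.48, 10.95, 10.03, 12.72, 24.63,
        22.57, 29.78, 19.54, 7.86, 11.75, 23.71),
  y = c(4.50, 1.98, 9.29, 16.25, 27.38, 22.60,
        36.87, 31.73, 10.42, 14.68, 8.68, 37.39)
)

# Same model as lm(dat$y ~ dat$x), written with the data argument
ft_mod <- lm(y ~ x, data = dat)

coef(ft_mod)         # intercept and slope as a named vector

sm <- summary(ft_mod)
class(sm)            # the summary is itself an object with its own class
sm$r.squared         # coefficient of determination
sm$coefficients      # table of estimates, standard errors, t and p values
```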

Thus, it is possible to use the same function on different classes of
objects to obtain different results. This concept is very important in
*OneMap* and you must remember it to use the package. For
example, in other vignettes, we will show that depending on the class of
the dataset, which can be *outcross*, *f2*,
*backcross*, *riself* and *risib*, a certain set of
procedures will be applied. Not by coincidence, these classes correspond
to all types of populations that can be analyzed. The advantage of this
approach is that you do not need to change the function to do a specific
analysis; it will recognize the object type and will adapt
accordingly.
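To make the idea of classes and methods concrete, the sketch below defines a toy class and a *summary* method for it. The class name *toy_cross* and its contents are invented for this illustration and have nothing to do with the classes actually used by *OneMap*.

```r
# A toy object: a list tagged with a made-up class name
cross <- structure(list(n_ind = 100, n_mar = 25), class = "toy_cross")

# A method is an ordinary function named <generic>.<class>
summary.toy_cross <- function(object, ...) {
  cat("Toy cross with", object$n_ind, "individuals and",
      object$n_mar, "markers\n")
  invisible(object)
}

class(cross)
summary(cross)        # R picks summary.toy_cross because of the class
summary(c(1, 2, 3))   # a numeric vector is handled by a different method
```

The call is always `summary(...)`; what changes is the class of the object, and with it the procedure that R runs.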

Finally, you may need to save your work to come back to it in another working session. But before we explain how to do that, let us explain a few other concepts.

You can save your **R Script**, which is the
file that has all R instructions you typed so far. You can later load
them and run all instructions again to get the same results. This is
easy: just click

A different thing is to save your **R Session**, with
all the objects you have created so far (called the *R Workspace*). This is
not the same, because once you load the workspace, all the
objects are already available and you do not need to do everything again, i.e.,
run your script. This will save you a lot of time, since some
of the analyses required to build linkage maps are time-consuming.

To do so, click *Session*, then *Save Workspace As* and
choose a directory and name. In your next session, open RStudio and then
go to *Session*, *Load Workspace*.

Alternatively, you can do that using the R function
*save.image*. For example, if you want to save your analysis in a
file named *myworkspace.RData*, you should use:

`save.image("myworkspace.RData")`

To load:

`load("myworkspace.RData")`
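When only one object is worth keeping, for instance the result of a long computation, it can be saved on its own with the base R functions *saveRDS* and *readRDS*. The sketch below writes to a temporary folder; the file name *rf.rds* is a placeholder.

```r
rf <- c(0.01, 0.12, 0.05, 0.11, 0.21, 0.07)

# Placeholder file name inside R's temporary folder
rds_file <- file.path(tempdir(), "rf.rds")
saveRDS(rf, file = rds_file)   # save a single object

rm(rf)                         # remove the object from the workspace
rf <- readRDS(rds_file)        # restore it (any name can be used here)
rf
```

Unlike *load*, *readRDS* returns the object, so you decide the name it gets in the new session.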

Matloff, N. 2011. The Art of R Programming. 1st ed. San Francisco, CA: No Starch Press, Inc. 404 pages.

Adler, J. R. 2009. R in a Nutshell. A Desktop Quick Reference. O’Reilly Media.