servosphereR: A Package for Analyzing SynTech Servosphere Data

Jacob T. Wittman (corresponding author), Brian H. Aukema

Department of Entomology, University of Minnesota
wittja01@gmail.com

2019-05-14

Introduction

A servosphere, or locomotory compensator, is a device that can be used to study the movement of insects or other small flightless animals. A servosphere is a sphere, on top of which an insect is placed. A camera is mounted directly over the top of the sphere with a downward viewing angle. The camera watches the insect as it moves and reports position changes to three motors positioned as a tripod at the base of the ball. As the insect moves, the camera relays information in real time to the motors, which coordinate rotation to keep the insect positioned at the top of the sphere. Locomotory compensators have been used in a number of studies of insect movement in response to different stimuli (Arnold, Stevenson, and Belmain 2016; Becher and Guerin 2009; Bell and Kramer 1980; Noldus, Spink, and Tegelenbosch 2002; Otálora-Luna and Dickens 2011a; Otálora-Luna and Dickens 2011b; Otálora-Luna, Lapointe, and Dickens 2013; Party et al. 2013; Sakuma 2002).

This package is written to aid analysis of data produced by a SynTech Servosphere product website. Data are output in a standardized raw x.csv form. The software provided to date by Syntech that is shipped with the servosphere can be used for some analyses. Other, more complicated analyses, however, require working directly with the raw x.csv files produced during instrument recording.

This package is the result of analyzing servosphere data in R for the Master’s thesis of Jacob T. Wittman in the Forest Insect Science laboratory in the Department of Entomology at the University of Minnesota, St. Paul, MN, USA. This short vignette describes the x.csv output available from the servosphere software and provide examples of potential work flows for these data.

Raw Servosphere Data

After each trial recorded by the servosphere, the user can (and should) save the recorded x.csv file. The first few rows of one of these x.csv files will look like this when first imported into R.

##   X.cState dT..ms. dX..cm. dY..cm. encoder1..puls. encoder2..puls.
## 1        0       8    0.00    0.00               0               0
## 2        0      10    0.00    0.00               0               0
## 3        0      10    0.00    0.00               0               0
## 4        0       9    0.00    0.00               0               0
##   encoder3..puls..
## 1                0
## 2                0
## 3                0
## 4                0

As you can see, the column names are a bit of a mess but they do provide useful information about what is recorded in these files. Each row in this x.csv is a record of where the insect has moved in the time elapsed since the last cycle of communication between the servosphere and the computer was completed. The time between communication cycles in recorded in the dT column and is in milliseconds. The data in the dX and dY columns record how far in the x and y directions the insect moved during the time period measured in the dT column. The encoder pulse columns are not vital to analysing the data and pertain to software reconstruction of the direction and distance moved between communication cycles.

The cState records which “stimulus” has been activated according to the software during the period of recording. The stimulus function available in the locomotion compensator software allows the user to record the time when a stimulus has been administered to the study organism. For example, a researcher may wish to measure the locomotory response of an insect to different concentrations of a sex pheromone. A single trial for this experiment could consist of allowing the insect to move about on top of the sphere for several minutes in the absence of the pheromone and then administering the pheromone for the same amount of time and recording the movement after the pheromone has been introduced. The researcher could use the stimulus function to identify the time period during which there was no pheromone and the time period during which there was pheromone present, facilitating later analyses.

Using servosphereR to work with servosphere data

The functions in the servosphereR package can be divided into three use categories: loading in and cleaning the data, deriving other movement variables of interest from the raw data, and summarizing the derived variables for the purpose of analysis. Generally speaking, the functions should also be used in the same order as the categories just described. That is, first clean the data, derive any variables of interest, and then summarize them for analysis. The descriptions of functions that follows will use this work flow.

It is important to note that these functions were designed to be used together in a specific work flow. They may not function correctly or at all if used outside the described workflow.

If you are comfortable working in R and learning how to use new functions, you may wish to skip towards the bottom of this vignette and just look at the “Workflow” section, which will simply detail the order in which the functions should be called. Additional documentation can be found in R by using the ? help function along with the function name (e.g. ?calcVelocity).

Loading and cleaning data

Each trial recorded on the servosphere produces one x.csv file. A single experiment may produce tens or hundreds of such x.csv files, which can be difficult to manage in R. The getFiles() function makes it easy to import these raw data files into R so the researcher can begin working with them. To use getFiles() , all of the raw, unedited x.csv files should be stored in a single folder, which is provided to the function in the path argument. Additionally, each file should contain some string of characters that distinguishes them from any other x.csv files that may be in the data folder. For example, one possible naming scheme would be “01_28052018_servosphere.csv”. The first number “01” is a unique identifier for the file, the following number is the date in DMY format and the last part, “_servosphere" serves as a pattern identifier that getFiles() will use to select only csv files with “_servosphere" in the title. This function will also drop the encoder columns from each data set, as they are not necessary for analyzing or working with the data. The following code chunk demonstrates how to use the getFiles() function.

The created object servosphere_data is a list. Each item in the list is one raw x.csv file from the “servosphere_data” folder that has a title that matches the pattern “_servosphere" somewhere. It is important that the pattern provided is only used in the x.csv data files, otherwise R will try to read in any file containing that pattern and this function and subsequent functions will produce errors.

Now that the data are loaded into R, cleaning can begin. The first step should be to clean up the messy column names shown earlier. This can be done with the cleanNames() function. This function takes two arguments. The first argument provides the function with the list object in which the data are stored. The second argument should be a character vector containing the cleaned names designated for all the columns. Specifically, the “dT”, “dx”, “and”dy" columns should all be named as such. Other functions will look for columns with those names and if they are not found, the functions will not work. Additionally, every column in the data frame must have an associated character string to be used as name. The names in the following example can be used for simplicity.

##   stimulus dT dx dy
## 1        0  8  0  0
## 2        0 10  0  0
## 3        0 10  0  0
## 4        0  9  0  0

After applying the cleanNames() function, the data contained in our list will have their heading names renamed, facilitating the use of other functions.

The next step in the work flow is to merge any additional experimental information that may be pertinent for future analysis with the raw data. For example, one may wish to determine how different types of food affect the movement behavior of the study organism. This is accomplished by maintaining a separate x.csv file containing data linking each trial with a unique ID and any relevant experimental information. There should be one row for each unique trial. These rows should be ordered in the same order as the data frames in the list of data frames generated by getFiles(). Here is a simple example of a file containing additional experimental information:

##   id treatment    date  time
## 1  1         a 2032018 13:30
## 2  2         b 2042018 07:30

The important information needed for analysis is the treatment variable for the two trials represented in this data frame, specifically in the ID and Treatment columns. An ID column at minimum must be merged with the raw data for future functions to work. We can merge this information with the raw output provided by the software using the mergeTrialInfo() function. This function has four required arguments that must be specified, as well as one optional argument that can be used to allow more flexible experimental designs making use of the stimulus functionality provided by the software. The first example will cover the more basic usage, while other examples will demonstrate the additional functionality. The mergeTrialInfo() function must be given the list object that holds the raw data, the name of the data frame that contains the experiment or trial information that should be merged with the raw data, the columns to extract from the trial information data frame, and which stimuli to keep in the raw data. That last argument will be further explained in the next example; for now we wish to keep all three of the stimuli specified in our raw data:

##   stimulus dT dx dy id treatment
## 1        0  8  0  0  1         a
## 2        0 10  0  0  1         a
## 3        0 10  0  0  1         a
## 4        0  9  0  0  1         a

If it is necessary to remove a certain section of the raw data because it was used as a “warm-up” for the insect, the mergeTrialInfo() function can do that as well. For example, the user may have used the stimulus recording to identify the end of the warm-up period by switching from stimulus 0 to stimulus 1. In this instance, they would want to remove information linked with stimulus 0. This is done by providing a numerical vector to the argument stimulus.keep, telling the function which stimulus observations to retain in the data.

##   stimulus dT dx dy id treatment
## 1        1  7  0  0  1         a
## 2        1 79  0  0  1         a
## 3        1 32  0  0  1         a
## 4        1  7  0  0  1         a

Lastly, some experimental designs may be interested in comparing behavior before and after a certain stimulus is administered. This function can be used to split your data by stimulus into multiple separate data frames to facilitate this analysis. If this effect is desired, the user must include an additional column in the experimental information x.csv titled “id_stim” that combines the unique ID value for each trial with each stimulus used during said trial. For example, in one experiment, stimulus 1 may be used to denote the start of the trial before a pheromone is administered and stimulus 2 is used to signify the time period during which the pheromone is present. Two rows would needed in the experimental information x.csv per unique trial. Such a x.csv file might look like this:

##   id stimulus treatment    date  time id_stim
## 1  1        1         a 2032018 13:30     1_1
## 2  1        2         a 2032018 13:30     1_2
## 3  2        1         b 2042018  7:30     2_1
## 4  2        2         b 2042018  7:30     2_2

There are now two entries for each unique trial, trial 1 and trial 2. The information is the same for each trial except for the “id_stim” column which combines the information in the “id” and “stimulus” columns, separated by an underscore. These rows must be ordered first by the “id” column then within id, by “stimulus”. This ordering is necessary to split raw data by stimulus. To split data by stimulus, the argument stimulus.split must be set equal to TRUE.

##   stimulus dT dx dy id id_stim treatment
## 1        1  7  0  0  1     1_1         a
## 2        1 79  0  0  1     1_1         a
## 3        1 32  0  0  1     1_1         a
## 4        1  7  0  0  1     1_1         a

Now that the data is loaded and the variable names cleaned and attached with additional experimental information, the servosphere data can be aggregated if desired. Data are recorded at the millisecond scale and spurious movements of the organism on the top of the sphere may be picked up as motion. Aggregating the data is one method to reduce noise caused by organisms moving appendages or shifting but not actually moving. The aggregateData() function is used to aggregate the data to a coarser time scale. Some of the literature cited previously in this vignette have recommended aggregating the data so each recording interval represents the amount of time it takes the organism to move half its body length, on average. This suggestion can be used as a useful starting place but observing the movement of the insect while it’s being recorded can help select a more useful aggregation level. Observant readers may have noticed that observations do not appear to be recorded in exactly even intervals based on entries in the dT column. Aggregating the data provides another benefit, as the discrepancies between the time intervals dissipates as intervals are aggregated to coarser scales.

To use the aggregateData() function, the user must provide two arguments: the first argument is the list of data on which to perform the aggregation, and the second argument tells the function how many consecutive rows to aggregate.

##   id dx dy stimulus treatment  dT length
## 1  1  0  0        1         a 592     60
## 2  1  0  0        1         a 521     60
## 3  1  0  0        1         a 516     60
## 4  1  0  0        1         a 521     60

Now the data have been aggregated, deriving movement variables can begin!

Deriving movement variables

This package provides a number of functions for calculating different variables that can be derived from the raw data. All of these functions start with the prefix calc. The derived variables obtainable include:

These functions add new variables to the existing list of data frames with which the above examples have been working. The list of data frames is the only argument needed to use these functions and it is very easy to use the %>% pipe operator from the magrittr package to chain these functions together. (This is different from the logical operator | which is also often referred to as a pipe.) The following example chains the above functions together to produce the range of derived variables available.

##   id dx dy stimulus treatment  dT length x y distance bearing turn_angle
## 1  1  0  0        1         a 592     60 0 0        0       0         NA
## 2  1  0  0        1         a 521     60 0 0        0       0          0
## 3  1  0  0        1         a 516     60 0 0        0       0          0
## 4  1  0  0        1         a 521     60 0 0        0       0          0
## 5  1  0  0        1         a 519     60 0 0        0       0          0
## 6  1  0  0        1         a 512     60 0 0        0       0          0
##   turn_velocity velocity
## 1            NA        0
## 2             0        0
## 3             0        0
## 4             0        0
## 5             0        0
## 6             0        0

The list of data frames now has a column for each new variable. The data is now ready to be summarised for analysis.

Summarising movement variable data

There are also a number of summary functions available in this package to provide averages or descriptive values for each unique trial data set contained in the list of data frames. These functions all begin with the prefix summary. The summary options available include:

These functions often require two arguments: the list containing the data frames with the derived variables and raw data and a data frame object in which to save the summary data.

Both total distance and net displacement are needed to calculate tortuosity, so this example will start by calculating total distance to initialize the summary data frame. No summary data frame currently exists, so the argument summary.df is set to NA.

##   id total_distance treatment stimulus
## 1  1       6.680208         a        1
## 2  2      78.446680         b        1

Now that a summary data frame object has been started, that object should be provided to the summary functions as an argument.

##   id net_displacement total_distance treatment stimulus
## 1  1         4.361617       6.680208         a        1
## 2  2        75.437553      78.446680         b        1

With both total distance and net displacement calculated, tortuosity can be found from the summary data frame. This function requires that the variable names for total distance and net displacement be specified. The inverse argument can be used to flip the ratio from \(\frac{net~displacement}{total~distance}\) to \(\frac{total~distance}{net~displacement}\) by setting inverse = TRUE.

##   id net_displacement total_distance treatment stimulus tortuosity
## 1  1         4.361617       6.680208         a        1  0.6529163
## 2  2        75.437553      78.446680         b        1  0.9616411

The summaryStops() function also requires an additional argument. When using summaryStops(), the user must specify the velocity to be used as the “stop threshold”. Aggregating the data earlier helped remove some spurious movement recordings, but it is possible there are very small movements registered that were not in fact movements. The stop threshold argument tells the function that if it encounters non-zero velocities below a certain amount, it should still treat them as zero measured velocity. For example, a researcher may observe that the software has recorded movements below 0.1 cm/s but based on their observations, they know that such a measured speed is just a remnant of spurious motion that was detected.

##   id number_stops avg_length_stops net_displacement total_distance
## 1  1           44        10.068182         4.361617       6.680208
## 2  2           39         1.487179        75.437553      78.446680
##   treatment stimulus tortuosity
## 1         a        1  0.6529163
## 2         b        1  0.9616411

The function has returned the number of stops and the average length of those stops in seconds for each of the movement paths.

The remaining summary functions do not require any additional arguments. Their use is presented below. The avgBearing() function returns two columns: circular_mean and circular_rho. The bearing variable lies between 0 and 360, so the typical arithmetic mean should not be used. Instead, average bearing is calculated as the circular mean, which also includes a measure of concentration, the circular rho, which is between 0 and 1. Circular rho is a measure of concentration of bearing; it gives an idea of how clustered the points are around the mean with a circular rho of 1 corresponding to identical bearing measurements and a circular rho of 0 corresponding to a lack of clustering.

##   id avg_velocity circular_mean circular_rho number_stops avg_length_stops
## 1  1    0.0262414     16.063781    0.6896376           44        10.068182
## 2  2    0.3078803      2.211555    0.9549934           39         1.487179
##   net_displacement total_distance treatment stimulus tortuosity
## 1         4.361617       6.680208         a        1  0.6529163
## 2        75.437553      78.446680         b        1  0.9616411

Now the summary data frame has all the available summary variables, which can be used along with the variables specifying treatment (or stimulus if stimulus was used to split data) for analysis.

Example work flow

Below is a simplified example work flow showing how the functions in this package can be used. Consult the documentation or this vignette for more information on the functions.

# Get files, clean file column names, and merge relevant experimental info
servosphere_data <- getFiles(path = ".", pattern = "_servosphere") %>%
   cleanNames(colnames = c("stimulus",
                           "dT",
                           "dx", 
                           "dy")) %>% 
   mergeTrialInfo(trial.data = experiment_info,
                  col.names = c("id", "treatment"),
                  stimulus.keep = c(1)) %>% 
   aggregateData(n = 100) %>% 
   # Calculate derived movement variables
   calcDistance() %>% 
   calcBearing() %>% 
   calcTurnAngle() %>% 
   calcTurnVelocity() %>% 
   calcVelocity()

# Summarize derived variables
summary_data_frame <- summaryTotalDistance(list = servosphere_data,
                                           summary.df = NA)

summary_data_frame <- summaryNetDisplacement(list = servosphere_data,
                                             summary.df = summary_data_frame)

summary_data_frame <- summaryTortuosity(summary.df = summary_data_frame,
                                        total.distance = total_distance,
                                        net.displacement = net_displacement,
                                        inverse = FALSE)

summary_data_frame <- summaryStops(list = servosphere_data,
                                   summary.df = summary_data_frame,
                                   stop.threshold = 0.1)

summary_data_frame <- summaryStops(list = servosphere_data,
                                   summary.df = summary_data_frame,
                                   stop.threshold = 0.1)

summary_data_frame <- summaryAvgBearing(list = servosphere_data,
                                        summary.df = summary_data_frame)

summary_data_frame <- summaryAvgVelocity(list = servosphere_data,
                                         summary.df = summary_data_frame)

Comments, Requests, Suggestions, Etc.

This first author will happily respond to any requests or suggestions to improve this package or implement additional functionality. He can be reached at wittja01 [at] gmail dot com or on Twitter @wittja01. This package is also available on GitHub at github.com/wittja01/servosphereR, so consider making pull requests or leaving comments there.

Citations

Arnold, Sarah E.J., Philip C. Stevenson, and Steven R. Belmain. 2016. “Shades of yellow: interactive effects of visual and odour cues in a pest beetle.” PeerJ 4 (July). PeerJ Inc.: e2219.

Becher, Paul G, and Patrick M Guerin. 2009. “Oriented responses of grapevine moth larvae Lobesia botrana to volatiles from host plants and an artificial diet on a locomotion compensator.” Journal of Insect Physiology 55 (4): 384–93.

Bell, William J., and Ernest Kramer. 1980. “Sex pheromone-stimulated orientation of the American cockroach on a servosphere apparatus.” Journal of Chemical Ecology 6 (2). Kluwer Academic Publishers-Plenum Publishers: 287–95.

Noldus, Lucas P J J, Andrew J Spink, and Ruud A J Tegelenbosch. 2002. “Computerised video tracking, movement analysis and behaviour recognition in insects.” Computers and Electronics in Agriculture 35 (2-3): 201–27.

Otálora-Luna, Fernando, and Joseph C. Dickens. 2011a. “Spectral preference and temporal modulation of photic orientation by Colorado potato beetle on a servosphere.” Entomologia Experimentalis et Applicata 138 (2): 93–103.

———. 2011b. “Multimodal stimulation of colorado potato beetle reveals modulation of pheromone response by yellow light.” Edited by Guy Smagghe. PLoS ONE 6 (6): e20990.

Otálora-Luna, Fernando, Stephen L. Lapointe, and Joseph C. Dickens. 2013. “Olfactory Cues Are Subordinate to Visual Stimuli in a Neotropical Generalist Weevil.” Edited by Frederic Marion-Poll. PLoS ONE 8 (1). Public Library of Science: e53120.

Party, Virginie, Christophe Hanot, Daniela Schmidt Busser, Didier Rochat, and Michel Renou. 2013. “Changes in odor background affect the locomotory response to pheromone in moths.” Edited by John I. Glendinning. PLoS ONE 8 (1). Public Library of Science: e52897.

Sakuma, Masayuki. 2002. “Virtual reality experiments on a digital servosphere: guiding male silkworm moths to a virtual odour source.” Computers and Electronics in Agriculture 35 (2): 243–54.