Part 1: Configuring Zoom to Capture Useful Data

Andrew P. Knight


The first part of this guide provides recommended practices and processes for collecting research data through Zoom. As with any aspect of research, careful and thoughtful upfront planning pays dividends when using Zoom. Because zoomGroupStats relies on Zoom Cloud recording features, this guide focuses specifically on practices to use when recording your data to the cloud. However, even if you are recording virtual meetings locally, the same basic principles will likely apply.

This guide will not provide step-by-step instructions for how to operate Zoom. For detailed guidance on using Zoom, you should consult the Zoom Help Center.

Hopefully you are reading this guide before you have started collecting data. It is before collecting data that you have the best chance to minimize undesirable variation and maximize your options for using the data that you collect. Collecting data through virtual meetings is complicated and requires a thoughtful process. To give yourself the best downstream outcomes, take time upfront, before running any meetings at all, to configure your Zoom subscription. In particular, consider the following recommendations:

Develop a standardized protocol

Before launching data collection, document a standard process for yourself and any collaborators to follow. This is especially important if you will be depending on others (e.g., collaborators, research assistants, participants themselves) to capture virtual meetings. A standardized protocol helps ensure consistency in your raw Zoom output across multiple meetings. As examples, consider:

  1. Sample of a guide given to those charged with recording meetings
  2. Sample video guide for how to set up Zoom recording features
  3. Sample video guide for recording the meeting itself

Maximize degrees of freedom

When configuring your Zoom subscription and preparing to record virtual meetings, I recommend providing yourself the most flexibility upfront. You can always subset and focus on some elements downstream. But, if you don’t capture something upfront, you’ll lose those options downstream. In particular:

  1. If using cloud-based recording, select all possible recording options (of different views). This gives you the ability to make selective decisions after you’ve run the meeting.
  2. Select options that enhance the recording for third-party video editing.
  3. Make sure to select the option to have Zoom produce an audio transcript.
  4. Make other option selections in a manner consistent with your research goals (e.g., having names on videos, having video time stamped).

Require users to be registered in Zoom

A major challenge when collecting large-scale data with Zoom recordings is the absence of a persistent individual identifier that is linked to the wide range of display names that people use. This can contribute to data integrity issues in a few ways. To illustrate some of these challenges, consider a few simple examples:

To properly study human behavior, we need to have a valid linkage between an individual’s behavior (e.g., face in video feed, spoken words, text chat messages) and their identity. When conducting research with Zoom, it is further critical to know which individual person logged into which virtual meeting. zoomGroupStats does provide functions for addressing this challenge after you have collected data. However, to save yourself considerable time, take steps before you collect data to actively minimize user identity confusion:

  1. If possible, require users to access meetings through an account registered with Zoom.
  2. If possible, require users to access Zoom using a known registered account (e.g., one with your institution).
  3. If neither of these is possible, add guidance to your standardized protocol for meeting participants to manually change their display names to some standardized format.
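
As a minimal sketch of the third option, the snippet below checks display names against a standardized format so that off-format names can be flagged for manual follow-up. The format itself (a participant code such as P017, a hyphen, then a first name) and the function names are assumptions made for illustration; they are not part of Zoom or zoomGroupStats.

```python
import re

# Hypothetical format: "P" + three digits, a hyphen, then a first name,
# e.g., "P017-Jordan". Adapt the pattern to your own protocol.
NAME_PATTERN = re.compile(r"^(P\d{3})-([A-Za-z]+)$")


def parse_display_name(display_name):
    """Return (participant_id, first_name), or None if the name is off-format."""
    match = NAME_PATTERN.match(display_name.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


def flag_nonconforming(display_names):
    """List the display names that do not follow the standardized format."""
    return [name for name in display_names if parse_display_name(name) is None]
```

Running a check like this on the participant list soon after each meeting makes it easier to resolve ambiguous names while memories are still fresh.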

Capture timestamps to sync up data streams

One significant strength of using virtual meetings for research is that you gain the ability to unobtrusively capture streams of human behavior over time. Collecting data streams over time, though, brings distinct challenges. One of the most important challenges to overcome is compiling precise information on when things happen.

Within Zoom, there are two important baseline events for which you must capture precise timing information:

  1. The start of the session
  2. The start of the recording

It is critical to capture this information because some Zoom outputs (e.g., chat) use the start of the session as the zero point, whereas others (e.g., transcript) use the start of the recording as the zero point. To properly sync up data streams, you need to convert Zoom’s data streams to true clock time.
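
To make the conversion concrete, here is a minimal sketch that assumes you have logged both start times and that each output reports elapsed seconds from its own zero point. The timestamps, offsets, and function name are hypothetical values chosen for illustration; this is not how zoomGroupStats itself performs the conversion.

```python
from datetime import datetime, timedelta


def to_clock_time(zero_point, offset_seconds):
    """Convert an offset from a known zero point into true clock time."""
    return zero_point + timedelta(seconds=offset_seconds)


# Hypothetical values that a researcher recorded for one meeting.
session_start = datetime(2021, 3, 1, 14, 0, 0)
recording_start = datetime(2021, 3, 1, 14, 2, 30)

# A chat message logged 200 seconds after the session started.
chat_time = to_clock_time(session_start, 200)

# A transcript utterance logged 50 seconds after the recording started.
utterance_time = to_clock_time(recording_start, 50)
```

In this example the two events have different offsets (200 and 50 seconds) but fall at the same clock time, 14:03:20. Comparing the raw offsets directly would wrongly place them two and a half minutes apart.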

Keep careful records about these events by using a spreadsheet like this template. It is, of course, inevitable that you will fail to capture some of this information. If you do not capture the timestamp for the start of the session, you can retrieve it from the participants information in Zoom’s Cloud recording system. If you did not capture the timestamp for the start of the recording, you might be able to extract it from the inset timestamp in video files associated with the session.
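
A quick audit of your records can show which meetings need this kind of recovery. The sketch below assumes a log saved as CSV with the hypothetical columns meeting_id, session_start, and recording_start; your own spreadsheet will likely use different names.

```python
import csv
import io

# Hypothetical log layout; the column names and rows are illustrative only.
LOG_CSV = """meeting_id,session_start,recording_start
team01,2021-03-01 14:00:00,2021-03-01 14:02:30
team02,2021-03-02 09:00:00,
"""


def missing_timestamps(log_text):
    """Return (meeting_id, column) pairs for every blank timestamp cell."""
    gaps = []
    for row in csv.DictReader(io.StringIO(log_text)):
        for column in ("session_start", "recording_start"):
            if not (row[column] or "").strip():
                gaps.append((row["meeting_id"], column))
    return gaps
```

Here, missing_timestamps(LOG_CSV) reports that team02 lacks a recording start, which you would then try to recover from the inset timestamp in the video file.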

Next Steps

In Part 2 of the guide, you will learn how to organize the files that you download from Zoom and use zoomGroupStats to turn your downloads into datasets.