FMAT

😷 The Fill-Mask Association Test (掩码填空联系测验).

The Fill-Mask Association Test (FMAT) is an integrative, versatile, and probability-based method that uses masked language models (e.g., BERT) to measure conceptual associations (e.g., attitudes, biases, stereotypes) as propositional representations in natural language.
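To make the idea of a propositional representation concrete, here is a minimal base-R sketch. It is not the package's API: the template sentence and the candidate words are hypothetical examples chosen for illustration. A masked language model would estimate how probable each candidate word is at the [MASK] position; the sketch only spells out the propositions being compared.

```r
# Hypothetical query template with one masked position (illustration only).
template <- "The [MASK] is a nurse."

# Hypothetical candidate words whose fit at [MASK] would be compared.
candidates <- c("man", "woman")

# Substitute each candidate into the template to show the propositions
# that a masked language model would be scoring.
filled <- vapply(
  candidates,
  function(w) sub("[MASK]", w, template, fixed = TRUE),
  character(1),
  USE.NAMES = FALSE
)

print(filled)
```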

A Python (conda) environment and the “transformers” module can be installed automatically using the FMAT_load() function, but you must then select that Python environment in RStudio:

RStudio → Tools → Global/Project Options
→ Python → Select → Conda Environments
→ Choose “…/textrpp_condaenv/python.exe”

A full list of BERT-family models is available at Hugging Face. Use the FMAT_load() function to download and load specific BERT models. All downloaded model files are saved in your local folder “C:/Users/[YourUserName]/.cache/”.
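A minimal sketch of loading models, assuming that FMAT_load() accepts a character vector of Hugging Face model names (check the function's help page for the exact arguments). The two names are taken from the model list in this README. The interactive() guard is an addition for illustration, so that the download only starts when you run the code yourself.

```r
# Model names as they appear on Hugging Face (from the list in this README).
model_names <- c("bert-base-uncased", "bert-base-cased")

# Assumption: FMAT_load() takes a character vector of model names.
# The first call downloads the model files, so run it only when the
# package is installed and the session is interactive.
if (requireNamespace("FMAT", quietly = TRUE) && interactive()) {
  models <- FMAT::FMAT_load(model_names)
}
```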

Several necessary pre-processing steps are built into the functions to make them easier and more direct to use (see FMAT_run() for details).

Improvements are still needed. If you find bugs or have problems using the functions, please report them at GitHub Issues or send me an email.


Author

Han-Wu-Shuang (Bruce) Bao 包寒吴霜

📬 baohws@foxmail.com

📋 psychbruce.github.io

Citation

Installation

## Method 1: Install from CRAN
install.packages("FMAT")

## Method 2: Install from GitHub
install.packages("devtools")
devtools::install_github("psychbruce/FMAT", force=TRUE)

Since this package uses the “reticulate” package as an R interface to the “transformers” Python module, you also need to install both Python (with Anaconda) and the “transformers” module (with the command pip install transformers) on your computer.
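Once Python is installed, one way to confirm that R can see the “transformers” module is to ask reticulate directly. This is a small sketch, not part of the FMAT package; it uses only reticulate::py_module_available().

```r
# Default to FALSE so the flag is defined even when reticulate is missing.
has_transformers <- FALSE

if (requireNamespace("reticulate", quietly = TRUE)) {
  # py_module_available() returns TRUE only if Python can import the module.
  has_transformers <- reticulate::py_module_available("transformers")
}

if (has_transformers) {
  message("The 'transformers' module is available to reticulate.")
} else {
  message("The 'transformers' module was not found; try: pip install transformers")
}
```

If the check reports that the module was not found even though it is installed, reticulate may be bound to a different Python environment than the one you installed into.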

BERT Models

The reliability and validity of the following 12 representative BERT models have been established in my research articles, but future work is needed to examine the performance of other models.

(model name on Hugging Face - downloaded file size)

  1. bert-base-uncased (420MB)
  2. bert-base-cased (416MB)
  3. bert-large-uncased (1.25GB)
  4. bert-large-cased (1.25GB)
  5. distilbert-base-uncased (256MB)
  6. distilbert-base-cased (251MB)
  7. albert-base-v1 (45.2MB)
  8. albert-base-v2 (45.2MB)
  9. roberta-base (478MB)
  10. distilroberta-base (316MB)
  11. vinai/bertweet-base (517MB)
  12. vinai/bertweet-large (1.32GB)

If you are new to BERT, please read:

While the FMAT is an innovative method for computationally intelligent analysis of psychology and society, you may also be looking for an integrative toolbox for other text-analytic methods. Another R package I developed, PsychWordVec, is one of the most useful and user-friendly packages for word embedding analysis (e.g., the Word Embedding Association Test, WEAT). Please refer to its documentation and feel free to use it.