The ampir package (short for antimicrobial peptide prediction in R) was designed as a fast and user-friendly way to predict antimicrobial peptides (AMPs) from protein datasets of any size. ampir uses a supervised statistical machine learning approach to predict AMPs: it incorporates a support vector machine classification model trained on publicly available antimicrobial peptide data.
Standard input to ampir is a `data.frame` with sequence names in the first column and protein sequences in the second column.
Read in a FASTA formatted file as a `data.frame` with `read_faa()`.
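A minimal sketch of this step, assuming `read_faa()` takes the path to a FASTA file as its first argument. The file name, sequence names, and sequences here are invented for illustration, and the ampir call is skipped if the package is not installed.

```r
# Write a tiny FASTA file to a temporary path.
# The names and sequences are made up for illustration only.
fasta_path <- tempfile(fileext = ".fasta")
writeLines(
  c(">example_1", "GLFDIVKKVVGALGSLGKRWK",
    ">example_2", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
  fasta_path
)

# Read it with ampir, if the package is installed.
if (requireNamespace("ampir", quietly = TRUE)) {
  my_protein_df <- ampir::read_faa(fasta_path)
  print(my_protein_df)  # two columns: seq_name and seq_aa
}
```

The resulting `data.frame` has the same two-column layout as the table that follows.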
seq_name | seq_aa |
---|---|
G1P6H5_MYOLU | MALTVRIQAACLLLLLLASLTSYSLLLSQTTQLADLQTQDTAGAT… |
Calculate the probability that each protein is an antimicrobial peptide with `predict_amps()`.
Note that amino acid sequences that are shorter than five amino acids and/or contain anything other than the 20 standard amino acids are not evaluated and receive `NA` as their `prob_AMP` value.
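The sketch below illustrates both points under stated assumptions. The input data is invented, the eligibility check is a base R approximation of the rule described above (not ampir's own code), and `predict_amps()` is called with its default settings and only if ampir is installed.

```r
# Hypothetical input in ampir's two-column layout (made-up sequences).
my_protein_df <- data.frame(
  seq_name = c("example_1", "example_2", "too_short", "nonstandard"),
  seq_aa   = c("GLFDIVKKVVGALGSLGKRWK",
               "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
               "MKT",
               "MKTXAYIAKQB"),
  stringsAsFactors = FALSE
)

# Base R approximation of the eligibility rule:
# at least five residues, all from the 20 standard amino acids (upper case).
is_evaluated <- nchar(my_protein_df$seq_aa) >= 5 &
  grepl("^[ACDEFGHIKLMNPQRSTVWY]+$", my_protein_df$seq_aa)
print(is_evaluated)

# Run the actual prediction only when ampir is available.
if (requireNamespace("ampir", quietly = TRUE)) {
  my_prediction <- ampir::predict_amps(my_protein_df)
  print(my_prediction)  # adds a prob_AMP column; ineligible rows hold NA
}
```

Rows flagged `FALSE` by the check correspond to the sequences that would carry `NA` in `prob_AMP`.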
seq_name | seq_aa | prob_AMP |
---|---|---|
G1P6H5_MYOLU | MALTVRIQAACLLLLLLASLTSYSLLLSQTTQLADLQTQDTAGAT… | 0.934 |
Proteins whose predicted probability meets a chosen threshold can then be extracted and written to a FASTA file.
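One way to do the extraction in base R is sketched here. The prediction table is invented, and the 0.8 cut-off is an assumed value chosen only for illustration; pick whatever threshold suits the analysis.

```r
# Hypothetical prediction output in the three-column layout shown above
# (names, sequences, and probabilities are made up).
my_prediction <- data.frame(
  seq_name = c("example_1", "example_2", "too_short"),
  seq_aa   = c("GLFDIVKKVVGALGSLGKRWK",
               "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
               "MKT"),
  prob_AMP = c(0.93, 0.12, NA),
  stringsAsFactors = FALSE
)

threshold <- 0.8  # assumed cut-off, not a recommended value

# which() drops rows where prob_AMP is NA; a bare logical index
# would return those rows as all-NA rows instead.
keep <- which(my_prediction$prob_AMP >= threshold)

# Keep only the two columns expected in ampir's input/output layout.
my_predicted_amps <- my_prediction[keep, c("seq_name", "seq_aa")]
print(my_predicted_amps)
```

Using `which()` matters here because unevaluated sequences have `NA` probabilities, and comparing `NA` to a number yields `NA` rather than `FALSE`.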
seq_name | seq_aa |
---|---|
G1P6H5_MYOLU | MALTVRIQAACLLLLLLASLTSYSLLLSQTTQLADLQTQDTAGAT… |
Write the `data.frame` with sequence names in the first column and protein sequences in the second column to a FASTA formatted file with `df_to_faa()`.
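A minimal sketch of the export step, assuming `df_to_faa()` takes the `data.frame` followed by the output file path. The data and the temporary output path are invented, and the base R branch is a stand-in used only when ampir is not installed; it is not ampir's implementation.

```r
# Hypothetical two-column data.frame of sequences to export.
my_predicted_amps <- data.frame(
  seq_name = "example_1",
  seq_aa   = "GLFDIVKKVVGALGSLGKRWK",
  stringsAsFactors = FALSE
)
out_path <- tempfile(fileext = ".fasta")

if (requireNamespace("ampir", quietly = TRUE)) {
  ampir::df_to_faa(my_predicted_amps, out_path)
} else {
  # Base R fallback: interleave one ">name" header line
  # with one sequence line per record.
  fasta_lines <- as.vector(rbind(
    paste0(">", my_predicted_amps$seq_name),
    my_predicted_amps$seq_aa
  ))
  writeLines(fasta_lines, out_path)
}

print(readLines(out_path))
```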