PEIMAN2-vignette

1 Introduction

Annotation enrichment analysis increases the chance of identifying relevant biological pathways in a list of genes or proteins. The Post-translational modification Enrichment, Integration and Matching ANalysis (PEIMAN v1) software was introduced to provide a systematic framework for identifying the most probable and enriched post-translational modification (PTM) terms in a list of proteins obtained from high-throughput technologies (Nickchi, Jafari, and Kalantari 2015). PEIMAN maps a large list of proteins to PTM pathways and tests their statistical significance using a hypergeometric test. It follows the most traditional form of enrichment analysis: it takes a list of proteins selected by the user and searches for enriched PTM terms one by one. This strategy is called Singular Enrichment Analysis (SEA). Although this is a promising approach for identifying biological pathways, the quality of the list selected by the researcher can affect the final results.

To avoid this problem, we extend our enrichment framework to a wider class of enrichment analysis called Gene Set Enrichment Analysis or GSEA (Subramanian et al. 2005). The underlying idea of GSEA is very similar to SEA. Instead of applying a cutoff (either p-value or fold-change in gene expression) to the input genes obtained from microarray experiments, a ‘no-cutoff’ strategy is used. The immediate benefit of this approach is that it reduces the bias of gene selection and lets genes with a small change in expression level take part in the final analysis. All genes are ranked, and for each enrichment category the maximum value of the running score profile over the ranked genes is calculated and compared with random scores obtained from permutation. See Subramanian et al. (2005) for details. This framework can be extended to enrichment analysis of proteins. Inspired by the GSEA idea, we introduce here an R package for Protein Set Enrichment Analysis (PSEA).
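The running score idea can be sketched in a few lines of base R. This is a minimal illustration of the weighted running-sum statistic described by Subramanian et al. (2005), not the code used inside the package; the function name `running_es` and the toy scores are hypothetical.

```r
# Minimal sketch of a GSEA-style running enrichment score (assumed helper,
# not part of PEIMAN2). `scores` are ranking scores, `in_set` flags the
# proteins that belong to the category, `p` is the weighting exponent.
running_es <- function(scores, in_set, p = 1) {
  stopifnot(length(scores) == length(in_set), any(in_set), any(!in_set))
  ord <- order(scores, decreasing = TRUE)   # rank from highest to lowest score
  w   <- abs(scores[ord])^p                 # weight of each ranked protein
  hit <- in_set[ord]
  p_hit   <- cumsum(ifelse(hit, w, 0)) / sum(w[hit])  # weighted fraction of hits so far
  p_miss  <- cumsum(!hit) / sum(!hit)                 # fraction of misses so far
  running <- p_hit - p_miss
  # The enrichment score is the largest deviation of the profile from zero
  list(running = running, es = running[which.max(abs(running))])
}

# Toy example: both members of the category sit at the top of the ranking
toy <- running_es(
  scores = c(5, 4, 3, 2, 1, 0.5),
  in_set = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
)
toy$es       # 1, because every hit is seen before any miss
toy$running  # the profile returns to 0 at the end of the list
```

With `p = 0` all hits are weighted equally; larger exponents give more weight to proteins with extreme scores.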

The database in the PEIMAN package is updated monthly to follow changes in UniProt. The package can be used to perform singular enrichment analysis (SEA) and visualize the results. PEIMAN can also match and integrate the results of two SEA analyses (for the same species) by visualizing their common pathways. To correct for biases in SEA, we implement protein set enrichment analysis (PSEA) as a new tool for the computational community. Researchers can use this package to run PSEA and visualize the results.

Figure 1: Our suggested workflow for PTM-centric proteomics using PEIMAN software v2.0

2 Example data

We consider two example datasets to demonstrate the features of our package.

exmplData1: We use the first example dataset for singular enrichment analysis. It contains two lists of human proteins randomly selected from UniProt. The first list contains 45 proteins and the second contains 97. Both lists belong to Homo sapiens (Human). Note: Only the first six proteins in each list are shown below.

P31946

P62258

Q04917

P61981

P31947

P27348

P17174

Q9NY61

P00505

Q96GS6

Q5VST6

Q6PCB6

exmplData2: We will use the second dataset to perform protein set enrichment analysis or PSEA. The dataset is described in (Gholizadeh et al. 2021).

beatAML dataset samples
UniProtAC	Score
P47819	579.6287
P20428	129.7175
P62982	2139.2700
P0CG51	2139.2700
P62986	2139.2700
Q63429	2139.2700
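To prepare your own data in the same shape, a two-column data frame like the one above is a reasonable starting point. The sketch below assumes that runPSEA accepts any data frame laid out like exmplData2 (accession codes first, scores second); the helper `prepare_psea_input` is hypothetical and only performs basic cleaning.

```r
# Rows copied from the table above
raw <- data.frame(
  UniProtAC = c("P47819", "P20428", "P62982", "P0CG51", "P62986", "Q63429"),
  Score     = c(579.6287, 129.7175, 2139.2700, 2139.2700, 2139.2700, 2139.2700),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the two expected columns, drop incomplete rows
# and duplicated accession codes, and sort by score for easier inspection.
prepare_psea_input <- function(df) {
  stopifnot(is.data.frame(df), all(c("UniProtAC", "Score") %in% names(df)))
  df <- df[!is.na(df$UniProtAC) & !is.na(df$Score), c("UniProtAC", "Score")]
  df$UniProtAC <- trimws(as.character(df$UniProtAC))
  df <- df[!duplicated(df$UniProtAC), ]
  df[order(df$Score, decreasing = TRUE), ]
}

psea_input <- prepare_psea_input(raw)
head(psea_input)
```

Sorting is done here for readability only; whether the package requires a particular row order is not stated in this vignette.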

3 Singular enrichment analysis (SEA)

In this section, we introduce the functions related to singular enrichment analysis (SEA) in the PEIMAN2 package. They fall into two groups: functions for enrichment and functions for plotting. We use exmplData1 in this part.

3.1 Enrichment

The runEnrichment() function runs singular enrichment analysis for one list of proteins. It takes the following inputs:

protein: a character vector of UniProt accession codes.
os.name: a character vector of length one with the exact taxonomy name of the species.
p.adj.method: the p-value adjustment method (optional). The default is ‘BH’. To see the list of possible values, type p.adjust.methods in the R console.
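The available adjustment methods come from base R, so their effect can be checked directly with p.adjust(). The p-values below are made up for illustration.

```r
# Methods that can be passed as p.adj.method
p.adjust.methods

# Effect of the default Benjamini-Hochberg ('BH') correction on toy p-values
raw_p <- c(0.01, 0.02, 0.03, 0.5)
adj_p <- p.adjust(raw_p, method = "BH")
adj_p
# 0.04 0.04 0.04 0.50
```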

As mentioned above, the taxonomy name of the species must be provided. For example, for a list of human proteins we pass os.name as ‘Homo sapiens (Human)’. The list of names is available on the UniProt website. We also include a helper function, getTaxonomyName, that returns the exact taxonomy name; it is described later in this section.

The following lines of code illustrate the steps to run SEA on exmplData1. We pass pl1 (a character vector of UniProt accession codes) to the runEnrichment function and save the results in enrich1.

# Load PEIMAN2 package
library(PEIMAN2)
#> Loading required package: tidyverse
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
#> ✔ ggplot2 3.4.0     ✔ purrr   0.3.4
#> ✔ tibble  3.1.7     ✔ dplyr   1.0.9
#> ✔ tidyr   1.2.0     ✔ stringr 1.4.0
#> ✔ readr   2.1.2     ✔ forcats 0.5.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()

# Extract dataset and assign a variable name to it
pl1 <- exmplData1$pl1

# Run SEA on the list
enrich1 <- runEnrichment(protein = pl1, os.name = 'Homo sapiens (Human)')

The function returns a dataframe with the following columns:

PTM: Post-translational modification (PTM).
Freq in UniProt: The total number of proteins with this PTM in UniProt.
Freq in List: The total number of proteins with this PTM in the list.
Sample: Number of proteins in the given list.
Population: Total number of proteins in the current version of the PEIMAN database.
pvalue: The p-value obtained from the hypergeometric test (enrichment analysis).
corrected pvalue: Adjusted p-value to correct for multiple testing.
AC: UniProt accession codes (AC) of the proteins with each PTM.

PTM	FreqinUniprot	FreqinList	Sample	Population	pvalue	corrected pvalue	AC
N6-(pyridoxal phosphate)lysine	53	5	97	14256	2e-06	6e-05	Q96QU6; Q4AC99; Q8N5Z0; Q8NHS2; P17174
Pyridoxal phosphate	60	5	97	14256	3e-06	6e-05	Q96QU6; Q4AC99; Q8N5Z0; Q8NHS2; P17174
Isoglutamyl cysteine thioester (Cys-Gln)	7	2	97	14256	1e-05	1e-04	P01023; A8K2U0
Thioester bond	11	2	97	14256	5e-05	5e-04	P01023; A8K2U0
S-cysteinyl cysteine	3	1	97	14256	1e-04	1e-03	P01009
Sulfation	57	3	97	14256	6e-04	4e-03	P05408; P08697; P05067
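The hypergeometric test behind the pvalue column can be reproduced in outline with base R. The sketch below uses the counts from the first row of the table above. It is an illustration only: the exact tail convention used by runEnrichment (whether the observed count itself is included) is not stated here, so both variants are computed and neither is guaranteed to match the table exactly.

```r
# Counts from the first row of the table (N6-(pyridoxal phosphate)lysine)
pop_size    <- 14256  # Population: proteins in the database
ptm_in_pop  <- 53     # Freq in UniProt: proteins carrying the PTM
list_size   <- 97     # Sample: proteins in the submitted list
ptm_in_list <- 5      # Freq in List: list proteins carrying the PTM

# P(X >= observed): probability of seeing at least this many PTM proteins
p_at_least <- phyper(ptm_in_list - 1, m = ptm_in_pop, n = pop_size - ptm_in_pop,
                     k = list_size, lower.tail = FALSE)

# P(X > observed): the stricter variant that excludes the observed count
p_more_than <- phyper(ptm_in_list, m = ptm_in_pop, n = pop_size - ptm_in_pop,
                      k = list_size, lower.tail = FALSE)

# Cross-check of the first variant by summing the probability mass function
p_check <- sum(dhyper(ptm_in_list:min(ptm_in_pop, list_size),
                      m = ptm_in_pop, n = pop_size - ptm_in_pop, k = list_size))

c(at_least = p_at_least, more_than = p_more_than)
```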

Note: As mentioned above, os.name is the exact taxonomy name of the species you are working with, and it must match the UniProt definition exactly. To make this name easier to find, you can pass your list of UniProt accession codes to the getTaxonomyName function. The result is the exact taxonomy name to pass to runEnrichment. In the following example, the exact taxonomy name is printed:

getTaxonomyName(x = exmplData1$pl1)
#> [1] "Please use os.name = `Homo sapiens (Human)`"

Similarly, we can run SEA for the second list of proteins:

# Extract dataset and assign a variable name to it
pl2 <- exmplData1$pl2

# Run SEA on the list
enrich2 <- runEnrichment(protein = pl2, os.name = 'Homo sapiens (Human)')

PTM	FreqinUniprot	FreqinList	Sample	Population	pvalue	corrected pvalue	AC
Nucleotide-binding	1800	33	45	14256	0e+00	0.000	O95477; Q9BZC7; Q99758; P78363; Q8WWZ7; Q8N139; Q8IZY2; O94911; Q8IUA7; Q8WWZ4; Q86UK0; Q86UQ4; Q2M3G0; Q9NP58; O75027; Q9NP78; Q9NRK6; O95342; Q09428; O60706; P33897; Q9UBJ2; P28288; O14678; P61221; Q8NE71; Q9UG63; Q9NUQ8; P45844; Q9UNQ0; Q9H172; Q9H222; Q96J66
Glutathionylation	11	1	45	14256	5e-04	0.003	Q9NRK6
Glycoprotein	4691	25	45	14256	5e-04	0.003	O95477; Q9BZC7; Q99758; P78363; Q8WWZ7; Q8N139; Q8IZY2; O94911; Q8IUA7; Q86UK0; Q2M3G0; Q9NP58; O95342; Q09428; O60706; P33897; Q9UBJ2; P28288; Q9UNQ0; Q9H172; Q9H222; Q9H221; Q8N2K0; Q0P651; Q96J66
N6-(pyridoxal phosphate)lysine	53	2	45	14256	6e-04	0.003	P17174; P00505
S-glutathionyl cysteine	8	1	45	14256	3e-04	0.003	Q9NRK6
Pyridoxal phosphate	60	2	45	14256	9e-04	0.004	P17174; P00505

3.2 Plotting SEA results

The plotEnrichment function visualizes singular enrichment analysis for one set of proteins, or matches and integrates the results for two sets of proteins. For details on matching and integration, see Nickchi, Jafari, and Kalantari (2015). We start by plotting the results for the first list.

plotEnrichment(x = enrich1, sig.level = 0.05)

The result is a lollipop plot that presents the “Relative frequency” of each “PTM keyword” along with its corrected p-value on a log scale. Note that only significant PTMs are shown. The default significance level is 5 percent. One can also visualize and match the results of two enrichment analyses. For example, we can see the integrated results of enrich1 and enrich2 with the following line of code:

plotEnrichment(x = enrich1, y = enrich2, sig.level = 0.05)

The plot presents the ‘Relative frequency’ of the PTM terms common to the two enriched lists (x and y). The color encodes the corrected p-value on a log scale. By default, a significance level of 5 percent is used to filter the results. This can be changed with the sig.level parameter.
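The matching step can also be inspected without plotting. The sketch below rebuilds small data frames from the rows printed earlier for enrich1 and enrich2 and lists the PTM terms that are significant in both. The column name `corrected pvalue` is an assumption based on the printed header, and the `<=` comparison is an illustrative choice rather than the rule used by plotEnrichment.

```r
# Rows copied from the two result tables shown in section 3.1
e1 <- data.frame(
  PTM = c("N6-(pyridoxal phosphate)lysine", "Pyridoxal phosphate",
          "Isoglutamyl cysteine thioester (Cys-Gln)", "Thioester bond",
          "S-cysteinyl cysteine", "Sulfation"),
  `corrected pvalue` = c(6e-05, 6e-05, 1e-04, 5e-04, 1e-03, 4e-03),
  check.names = FALSE, stringsAsFactors = FALSE
)
e2 <- data.frame(
  PTM = c("Nucleotide-binding", "Glutathionylation", "Glycoprotein",
          "N6-(pyridoxal phosphate)lysine", "S-glutathionyl cysteine",
          "Pyridoxal phosphate"),
  `corrected pvalue` = c(0.000, 0.003, 0.003, 0.003, 0.003, 0.004),
  check.names = FALSE, stringsAsFactors = FALSE
)

sig.level <- 0.05
sig1 <- e1$PTM[e1[["corrected pvalue"]] <= sig.level]
sig2 <- e2$PTM[e2[["corrected pvalue"]] <= sig.level]

# PTM terms significant in both lists, with their corrected p-values side by side
common <- intersect(sig1, sig2)
common_tbl <- data.frame(
  PTM    = common,
  padj_x = e1[["corrected pvalue"]][match(common, e1$PTM)],
  padj_y = e2[["corrected pvalue"]][match(common, e2$PTM)],
  stringsAsFactors = FALSE
)
common_tbl
```

Among the rows shown in this vignette, the two pyridoxal phosphate terms are the ones shared by both lists.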

4 Protein set enrichment analysis (PSEA)

In this section, we introduce the functions for protein set enrichment analysis (PSEA). The functions in this section are divided into two parts, functions for PSEA and functions for plotting the results. We use exmplData2 in this part.

4.1 PSEA

To run protein set enrichment analysis (PSEA), use the runPSEA function. This function takes the following inputs:

protein: A data frame with UniProt accession codes and their scores, laid out like exmplData2.
os.name: A character vector of length one with the exact name of the organism.
pexponent: Enrichment weighting exponent, p. The default value is 1. Values of p < 1 can help detect incoherent patterns in a set of proteins. If only a small number of proteins in a large set are expected to be coherent, then p > 1 is a good choice.
nperm: Number of permutations used to adjust for multiple testing across pathways. The default is 1000.
p.adj.method: The adjustment method to correct for multiple testing. Run p.adjust.methods to get a list of possible methods.
sig.level: The significance level used to filter pathways (applies to the adjusted p-value). The default is 0.05.
minSize: PTM pathways with fewer proteins than minSize are excluded. The default value is one.

psea_res <- runPSEA(protein = exmplData2, os.name = 'Rattus norvegicus (Rat)', nperm = 1000)

The result is a list with 6 elements. The first element of this list is important: A dataframe with protein set enrichment analysis (PSEA) results. Every row corresponds to a post-translational modification (PTM) pathway with the following columns:

pval: p-value for singular enrichment analysis.
pvaladj: Adjusted p-value
ES: Enrichment score
NES: Enrichment score normalized to the mean enrichment of random samples of the same size.
nMoreExtreme: Number of times the permuted sample resulted in profile with ES larger than abs(ES original)
size: Number of proteins in the pathway
Enrichment: Whether the proteins in the pathway have been enriched in the list.
leadingEdge: UniProt accession code of leading edge proteins that drive the enrichment.
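The relation between ES, NES, and nMoreExtreme can be illustrated with a small permutation experiment. This is a sketch under stated assumptions, not the package's implementation: the helper `enrichment_score`, the simulated scores, and the same-sign normalization (as in Subramanian et al. 2005) are all illustrative choices.

```r
set.seed(1)

# Assumed helper: weighted running-sum enrichment score for one protein set
enrichment_score <- function(scores, in_set, p = 1) {
  ord <- order(scores, decreasing = TRUE)
  w   <- abs(scores[ord])^p
  hit <- in_set[ord]
  running <- cumsum(ifelse(hit, w, 0)) / sum(w[hit]) - cumsum(!hit) / sum(!hit)
  running[which.max(abs(running))]
}

# Toy data: 50 ranked proteins, 5 of them in the set and ranked near the top
scores <- sort(rexp(50), decreasing = TRUE)
in_set <- seq_along(scores) %in% c(1, 2, 4, 7, 11)

nperm   <- 1000
es_obs  <- enrichment_score(scores, in_set)
# Null distribution: shuffle set membership, keeping the set size fixed
es_null <- replicate(nperm, enrichment_score(scores, sample(in_set)))

# Normalize by the mean of the null scores that share the sign of the observed ES
same_sign      <- es_null[sign(es_null) == sign(es_obs)]
nes            <- es_obs / mean(abs(same_sign))
n_more_extreme <- sum(abs(same_sign) >= abs(es_obs))
p_perm         <- (n_more_extreme + 1) / (length(same_sign) + 1)

c(ES = es_obs, NES = nes, nMoreExtreme = n_more_extreme, p = p_perm)
```

A set whose members are spread evenly over the ranking would give an ES close to the bulk of `es_null` and a large `n_more_extreme`.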

knitr::kable(psea_res[[1]], format = 'html')

PTM	pval	pvaladj	FreqinUniProt	FreqinList	ES	NES	nMoreExtreme	size	Enrichment	AC	leadingEdge
ADP-ribosylglycine	0e+00	0e+00	4	4	0.7707317	1.5787625	297	4	Significant	P62986; P62982; P0CG51; Q63429	P62982; P0CG51; P62986; Q63429
Acetylation	0e+00	0e+00	1762	123	0.7521522	1.1819493	4	123	Significant	P0C1X8; P11030; P60711; P63259; Q63028; Q62847; Q62848; Q9WUC4; P31399; P29419; P21571; P15999; D3ZAF6; Q9JJW3; O08839; P0DP29; P0DP30; P0DP31; P18418; P26772; P63039; B0K020; P08081; P08082; P45592; Q91ZN1; P11240; Q63768; P10715; P62898; Q9JHL4; Q7M0E3; P62628; Q07266; P84060; P62870; P15429; P07323; P60841; P56571; B0BN94; P55053; P55051; P07483; Q62658; Q32PX7; Q99PF5; Q5XI73; Q63228; P62994; P01946; P02091; P11517; P62959; P82995; P34058; P27321; Q5XI72; P50411; Q6AXU6; Q5BK20; P11980; Q99MZ8; Q792I0; Q66HF9; P15205; Q5M7W5; P02688; B0BN72; P30904; O35763; P62775; Q71UE8; Q9JJ19; P13084; Q01205; P08461; Q920Q0; O88767; P04785; P31044; O55012; P10111; Q6J4I0; Q9R063; Q9EPC6; P02625; Q63475; P51583; Q68A21; P02401; P62982; P62859; Q6RJR6; Q9JK11; Q63945; B0BN85; P07632; Q66HL2; P28042; O35814; P13668; P37377; Q62880; P19332; P68370; Q6P9V9; Q6AYZ1; Q68FR8; Q5XIF6; Q6PEC1; P11232; P62076; P62078; Q9WV97; P48500; P04692; P58775; Q63610; P09495; Q7M767; Q9Z1A5; P63045	P62628; P31044; P37377; P45592; P11030; P02625; P29419; P62775; P21571; O88767; P31399; P02688; P08082; P62898; P63045; P62076; P11232; O35814; Q9WUC4; Q62658; Q63228; P07632; Q5XI73; B0K020; P08081; P62959
Cysteine sulfinic acid (-SO2H)	0e+00	0e+00	1	1	0.9423077	65.9672131	49	1	Significant	O88767	O88767
N-acetylaspartate	0e+00	0e+00	1	1	-0.9615385	-183.8975297	38	1	Not significant	P60711	P31044
N-acetylglutamate	0e+00	0e+00	1	1	-0.9663462	-34.1391187	38	1	Not significant	P63259	P31044
N6-acetyllysine	0e+00	0e+00	992	73	0.7226249	1.1498783	40	73	Significant	P11030; Q62848; Q9WUC4; P31399; P29419; P21571; P15999; D3ZAF6; Q9JJW3; P0DP29; P0DP30; P0DP31; P18418; P26772; P63039; B0K020; P08081; P08082; P45592; P11240; P62898; Q9JHL4; Q7M0E3; P07323; P56571; Q62658; Q99PF5; Q5XI73; P62994; P01946; P62959; P82995; P34058; P27321; Q6AXU6; Q5BK20; P11980; Q99MZ8; P02688; B0BN72; P30904; O35763; P62775; Q71UE8; P13084; Q01205; P08461; O88767; P04785; P10111; Q9R063; Q63475; P51583; Q68A21; P02401; P62982; Q9JK11; Q63945; P07632; Q66HL2; P28042; O35814; P13668; P19332; P68370; Q6P9V9; Q6AYZ1; Q68FR8; Q5XIF6; P11232; P48500; P09495; Q9Z1A5	P45592; P11030; P29419; P62775; P21571; O88767; P31399; P02688; P08082; P62898; P11232; O35814; Q9WUC4; Q62658; P07632; Q5XI73; B0K020; P08081; P62959
Phosphoprotein	0e+00	0e+00	4088	171	0.5995932	0.9309700	754	171	Significant	P0C1X8; P11030; Q63028; Q62847; O08838; Q99068; Q05140; Q62848; Q9WUC4; P29419; P21571; P15999; D3ZAF6; Q05175; O08839; O88778; P0DP29; P0DP30; P0DP31; O35783; O35397; P26772; P63039; P08081; P08082; P10354; P45592; Q91ZN1; P11240; P84087; Q5U2U2; Q63768; Q6AY72; P11951; P10715; P62898; Q9JHL4; Q9QXU8; Q7M0E3; Q62950; P47942; Q07266; P84060; Q9WTP0; P62870; P15429; P07323; P60841; Q9Z1Z3; Q5RJL0; B0BN94; P55053; P07483; Q9JIX3; Q62658; Q32PX7; Q99PF5; Q920R4; Q5XI73; P47819; Q63228; P62994; P01946; P02091; P11517; P62959; Q9Z2X5; P82995; P34058; P27321; Q5XI72; Q68FR3; P50411; Q6AXU6; Q5BK20; P07335; P11980; Q99MZ8; Q66HF9; P34926; P15205; Q5M7W5; Q63560; P30009; P02688; B0BN72; Q5FVH7; Q4KM98; Q6XVN8; Q62625; O35763; Q9EPH2; P15146; P62775; P20428; Q05982; P69682; P97603; P07936; Q9JJ19; P13084; Q63083; Q9JI85; Q01205; P08461; Q4V8B0; Q5XIL2; Q9Z0W5; Q920Q0; O88767; P04785; Q5U318; P31044; O55012; Q99MC0; P10111; Q6J4I0; Q9R063; P02625; Q812D1; Q63475; P51583; P86252; Q68A21; P62986; P02401; P62982; P62859; Q64548; Q6RJR6; Q9JK11; O35314; P10362; Q63945; B0BN85; P60881; Q9Z2P6; P07632; Q66HL2; P28042; O35814; P13668; P21818; P09951; Q63537; O70441; Q58DZ9; P37377; Q63754; P21643; Q62880; P19332; P68370; Q6P9V9; Q6AYZ1; Q68FR8; Q5XIF6; Q66HC1; P62076; Q9WVA1; P48500; P04692; P58775; Q63610; P09495; P02767; P0CG51; Q63429; P63045; P20156; Q5BJU7	P31044; P37377; P45592; P11030; P02625; P29419; Q05175; P62775; P21571; O88767; P15146; Q63754; P02688; P08082; P62898; P63045; P62076; O35814; Q9WUC4; Q62658; P86252; Q63228; P07632; Q9WVA1; Q5XI73; P08081; P62959; P09951; P60881; P84087; P10362
Phosphothreonine	0e+00	0e+00	1511	92	0.5499037	0.8731984	904	92	Significant	P0C1X8; Q63028; O08838; Q05140; Q62848; P15999; Q05175; O08839; O88778; P0DP29; P0DP30; P0DP31; O35783; P26772; P08082; P45592; Q91ZN1; P11240; Q9JHL4; Q9QXU8; Q62950; P47942; Q07266; P84060; Q9WTP0; P62870; P15429; P07323; P60841; Q9Z1Z3; Q5RJL0; B0BN94; P07483; Q32PX7; Q99PF5; Q920R4; P47819; P62994; P01946; P02091; P11517; P82995; P34058; P27321; P50411; Q6AXU6; P07335; P11980; Q99MZ8; P34926; P15205; Q5M7W5; P30009; P02688; B0BN72; Q4KM98; O35763; Q9EPH2; P15146; P62775; P20428; P69682; P97603; P07936; Q9JJ19; P13084; Q63083; Q4V8B0; Q9Z0W5; Q920Q0; P31044; Q99MC0; P10111; Q6J4I0; Q812D1; Q63475; P51583; Q68A21; Q6RJR6; Q9JK11; B0BN85; P60881; Q66HL2; O35814; P09951; Q63537; Q62880; P19332; P48500; P58775; Q63610; P09495	P31044; P45592; Q05175; P62775; P15146; P02688; P08082; O35814
N-acetylalanine	0e+00	0e+00	435	42	0.7139681	1.1185408	178	42	Significant	P31399; D3ZAF6; O08839; P0DP29; P0DP30; P0DP31; P26772; P45592; Q63768; Q7M0E3; P62628; Q07266; P15429; B0BN94; P55053; P07483; Q32PX7; Q5XI73; P62959; Q5XI72; P50411; Q792I0; P15205; Q5M7W5; P02688; O88767; P31044; Q9EPC6; P51583; Q68A21; Q6RJR6; B0BN85; P07632; P13668; P19332; Q6PEC1; P62078; Q9WV97; Q63610; P09495; Q7M767; Q9Z1A5	P62628; P31044; P45592; O88767; P31399; P02688; P07632; Q5XI73; P62959; Q9WV97; Q6PEC1; P07483
Phosphoserine	0e+00	0e+00	3634	155	0.5378929	0.8429392	952	155	Significant	P0C1X8; Q63028; Q62847; O08838; Q99068; Q05140; Q62848; Q9WUC4; P29419; P21571; P15999; D3ZAF6; Q05175; O08839; O88778; P0DP29; P0DP30; P0DP31; O35783; O35397; P63039; P08081; P08082; P10354; P45592; Q91ZN1; P84087; Q63768; P11951; Q9JHL4; Q9QXU8; Q7M0E3; Q62950; P47942; Q07266; P84060; Q9WTP0; P62870; P15429; P07323; P60841; Q9Z1Z3; Q5RJL0; P55053; P07483; Q62658; Q32PX7; Q99PF5; Q920R4; Q5XI73; P47819; P01946; P02091; P11517; P62959; Q9Z2X5; P82995; P34058; P27321; Q5XI72; Q68FR3; P50411; Q6AXU6; Q5BK20; P07335; P11980; Q99MZ8; Q66HF9; P34926; P15205; Q5M7W5; Q63560; P30009; P02688; B0BN72; Q5FVH7; Q4KM98; Q6XVN8; O35763; Q9EPH2; P15146; P20428; Q05982; P97603; P07936; Q9JJ19; P13084; Q63083; Q9JI85; Q01205; P08461; Q4V8B0; Q5XIL2; Q9Z0W5; Q920Q0; P04785; Q5U318; P31044; O55012; Q99MC0; P10111; Q6J4I0; Q9R063; P02625; Q812D1; Q63475; P51583; P86252; Q68A21; P62986; P02401; P62982; P62859; Q64548; Q6RJR6; Q9JK11; O35314; P10362; Q63945; B0BN85; P60881; Q9Z2P6; P07632; Q66HL2; P28042; O35814; P13668; P21818; P09951; Q63537; O70441; Q58DZ9; P37377; Q63754; P21643; Q62880; P19332; P68370; Q6P9V9; Q6AYZ1; Q68FR8; Q5XIF6; Q66HC1; P62076; Q9WVA1; P48500; P04692; P58775; Q63610; P09495; P02767; P0CG51; Q63429; P20156; Q5BJU7	P31044; P37377; P45592; P02625; P29419; Q05175; P21571; P15146; Q63754; P02688; P08082; P62076; O35814; Q9WUC4; Q62658; P86252; P07632; Q9WVA1; Q5XI73; P08081; P62959; P09951; P60881; P84087; P10362; Q9Z0W5; Q63537; P07483; P15999; Q9JHL4; D3ZAF6; P62982; P0CG51; P62986; Q63429; O08838
Phosphotyrosine	0e+00	0e+00	655	48	0.7093412	1.1190650	135	48	Significant	P0C1X8; P11030; P0DP29; P0DP30; P0DP31; O35783; P63039; P45592; Q5U2U2; Q63768; Q6AY72; P62898; Q9JHL4; Q62950; P47942; Q9WTP0; P15429; P07323; P55053; P07483; Q9JIX3; P01946; P82995; P34058; P07335; P11980; P34926; P15205; Q63560; P02688; B0BN72; O35763; P15146; P13084; Q9Z0W5; O88767; P51583; Q63945; Q66HL2; O35814; P09951; P37377; P19332; Q6AYZ1; Q68FR8; Q5XIF6; P04692; P58775	P37377; P45592; P11030; O88767; P15146; P02688; P62898; O35814
N6-succinyllysine	0e+00	0e+00	327	31	0.7518702	1.1814730	71	31	Significant	P11030; P31399; P21571; P15999; P26772; P63039; P62898; P47942; P07323; P56571; Q62658; Q5XI73; P01946; P02091; P11517; P34058; P11980; Q99MZ8; P30904; O35763; P13084; P08461; O88767; P04785; Q9R063; P02401; P07632; P28042; P11232; P62076; P48500	P11030; P21571; O88767; P31399; P62898; P62076; P11232; Q62658; P07632; Q5XI73
Methylation	0e+00	0e+00	494	39	0.3911697	0.6132559	999	39	Significant	P0C1X8; P60711; P63259; Q05140; P15999; O88778; P0DP29; P0DP30; P0DP31; P47942; Q9Z1Z3; Q32PX7; Q99PF5; P47819; P02091; P11517; P34058; Q5XI72; P11980; Q99MZ8; P15205; P02688; Q920Q0; Q63475; Q68A21; P63033; P62986; Q63945; Q66HL2; P13668; P09951; P19332; P68370; Q6P9V9; Q6AYZ1; Q68FR8; Q5XIF6; P48500; Q5BJU7	P02688; P09951; P15999; P62986; Q6P9V9; Q6AYZ1; P68370; P48500; Q5XI72
3’-nitrotyrosine	2e-07	1e-06	31	8	0.5834036	0.9581443	667	8	Significant	Q62950; P07335; P68370; Q6P9V9; Q6AYZ1; Q68FR8; Q5XIF6; P48500	Q6P9V9; Q6AYZ1; P68370; P48500
Nitration	3e-07	1e-06	32	8	0.5834036	0.9621031	657	8	Significant	Q62950; P07335; P68370; Q6P9V9; Q6AYZ1; Q68FR8; Q5XIF6; P48500	Q6P9V9; Q6AYZ1; P68370; P48500
N6-methyllysine	6e-06	3e-05	57	9	-0.3200000	-0.5244605	30	9	Not significant	P60711; P63259; P0DP29; P0DP30; P0DP31; Q99MZ8; P13668; P19332; P48500	P31044; Q9WUC4; P0DN35; P62859; Q5PPG6; P10715; Q71UE8; O88778; Q6AXU6
Lipoyl	3e-05	1e-04	3	2	-0.7584541	-2.7737591	64	2	Not significant	Q01205; P08461	P31044; Q63754
N6-lipoyllysine	3e-05	1e-04	3	2	-0.7584541	-2.6576121	65	2	Not significant	Q01205; P08461	P31044; Q63754
N-acetylvaline	4e-05	1e-04	14	4	0.3804878	0.8251630	804	4	Significant	P55051; P02091; P11517; P10111	P10111; P55051; P11517; P02091
N6,N6,N6-trimethyllysine	5e-05	2e-04	33	6	0.5493333	0.9787996	683	6	Significant	P0DP29; P0DP30; P0DP31; P11980; P62986; Q6P9V9	P62986; Q6P9V9
N6-malonyllysine	8e-05	3e-04	16	4	0.8782475	1.7548040	110	4	Significant	P11030; P26772; P63039; P34058	P11030
Isopeptide bond	1e-04	4e-04	708	38	0.6661330	1.0420931	398	38	Significant	Q62847; Q05175; P0DP29; P0DP30; P0DP31; P63039; B0K020; P45592; P07323; Q99PF5; Q5XI73; P27321; Q68FR3; P11980; Q66HF9; Q5M7W5; Q05982; Q71UE8; P13084; O88767; O55012; P10111; Q812D1; P62986; P62982; Q63945; B0BN85; Q66HL2; O35814; P19332; P68370; Q6P9V9; Q66HC1; P48500; P0CG51; Q63429; Q5BJP3; P63025	P45592; Q05175; Q5BJP3; O88767; O35814; Q5XI73; B0K020
Omega-N-methylarginine	2e-04	7e-04	256	18	0.4700971	0.7378234	914	18	Significant	P0C1X8; Q05140; P15999; O88778; Q9Z1Z3; Q32PX7; Q99PF5; P47819; Q5XI72; P15205; P02688; Q63475; Q68A21; Q66HL2; P09951; P19332; Q6P9V9; Q5BJU7	P02688; P09951; P15999; Q6P9V9; Q5XI72
Phosphatidylethanolamine amidated glycine	3e-04	7e-04	5	2	0.6231884	2.1257031	464	2	Significant	Q6XVN8; Q62625	Q62625; Q6XVN8
Phosphatidylserine amidated glycine	3e-04	7e-04	5	2	0.6231884	2.1183246	479	2	Significant	Q6XVN8; Q62625	Q62625; Q6XVN8
S-nitrosocysteine	4e-04	1e-03	45	6	0.7352011	1.3056068	363	6	Significant	P47942; P82995; P34058; P15205; O35763; P11232	P11232
Methionine (R)-sulfoxide	5e-04	1e-03	6	2	-0.9661836	-3.2497440	2	2	Not significant	P60711; P63259	P31044; P37377
N-acetylmethionine	5e-04	1e-03	383	23	0.6967950	1.0902895	328	23	Significant	P0C1X8; P60711; P63259; Q63028; P84060; P62870; P62994; Q6AXU6; Q5BK20; Q99MZ8; P13084; Q920Q0; P10111; Q6J4I0; P02401; P62859; Q9JK11; O35814; P37377; Q62880; P62076; P04692; P58775	P37377; P62076; O35814
Oxidation	5e-04	1e-03	23	4	0.9077364	1.8695481	72	4	Significant	P60711; P63259; P10354; O88767	O88767
S-nitrosylation	6e-04	1e-03	49	6	0.7352011	1.2510038	365	6	Significant	P47942; P82995; P34058; P15205; O35763; P11232	P11232
5-glutamyl polyglutamate	9e-04	2e-03	7	2	0.7053140	2.4210168	373	2	Significant	P68370; Q6P9V9	Q6P9V9; P68370
Tele-methylhistidine	1e-03	3e-03	8	2	-0.9661836	-3.4628886	2	2	Not significant	P60711; P63259	P31044; P37377
ADP-ribosylation	2e-03	4e-03	43	5	0.7465420	1.3694143	325	5	Significant	P13084; P62986; P62982; P0CG51; Q63429	P62982; P0CG51; P62986; Q63429
Deamidated glutamine	3e-03	6e-03	3	1	0.9230769	-433.1865169	68	1	Not significant	P02688	P02688
Arginine amide	5e-03	1e-02	4	1	0.5144231	-22.8345354	466	1	Not significant	O35314	O35314
Glycine amide	5e-03	1e-02	4	1	-0.7644231	-30.2412945	242	1	Not significant	P10354	P31044
N6-(2-hydroxyisobutyryl)lysine	7e-03	1e-02	26	3	0.9429546	2.1759290	31	3	Significant	P11030; P18418; P07323	P11030
N,N,N-trimethylalanine	9e-03	2e-02	5	1	-0.7115385	-72.3475728	277	1	Not significant	Q63945	P31044
Methionine sulfoxide	1e-02	2e-02	6	1	-0.7644231	-72.2008141	245	1	Not significant	P10354	P31044
N6-methylated lysine	1e-02	2e-02	6	1	-0.8365385	-62.6976242	151	1	Not significant	P34058	P31044
Asymmetric dimethylarginine	1e-02	2e-02	105	7	0.5311510	0.8965908	726	7	Significant	Q05140; O88778; P47942; P02091; P11517; P09951; Q5BJU7	P09951
N-acetylglycine	1e-02	2e-02	17	2	0.8299358	2.7063825	224	2	Significant	P10715; P62898	P62898
Pyruvate	2e-02	4e-02	8	1	-0.7692308	-77.8671837	249	1	Not significant	P11980	P31044
N-acetylserine	2e-02	4e-02	210	11	0.8333308	1.3299214	73	11	Significant	P11030; Q62847; Q91ZN1; P07323; P60841; Q99PF5; Q63228; Q9JJ19; O55012; P02625; P63045	P11030; P02625; P63045; Q63228
4-carboxyglutamate	3e-02	4e-02	9	1	-0.7596154	76.0936455	242	1	Significant	P02767	P31044
Citrulline	3e-02	4e-02	39	3	0.8494968	2.0468695	167	3	Significant	P47819; P02688; Q812D1	P02688

4.2 Plotting

We now introduce the plotting features for protein set enrichment analysis. Two functions visualize the PSEA results returned by the runPSEA function. The first plot is generated by the plotPSEA function and shows the Normalized Enrichment Score (NES) for each PTM pathway. The number of pathways drawn can be restricted by adjusting the sig.level parameter (the default value is 0.05). The color indicates whether the pathway is enriched or not.

plotPSEA(x = psea_res)

The second plot is generated by the plotRunningScore function, which draws the running enrichment score for each PTM.

5 Translate PEIMAN results for mass spectrometry search tools

In addition to the features and extensions introduced above, PEIMAN results can also be used in mass spectrometry search tools. The enriched PTM terms generated by the runPSEA function in the previous step can be looked up in a subset of the protein modifications database. The psea2mass function takes the PSEA results and a significance level (the default value is 0.05) and returns the protein modifications of the statistically significant pathways for later searches in mass spectrometry tools. For example, continuing with the PSEA results for exmplData2, we call the psea2mass function as follows:

MS <- psea2mass(x = psea_res, sig.level = 0.05)
MS
#>      MOD_ID                     name
#> 1 MOD:00064       N6-acetyl-L-lysine
#> 2 MOD:00085       N6-methyl-L-lysine
#> 3 MOD:00322    1'-methyl-L-histidine
#> 4 MOD:00051 N-acetyl-L-aspartic acid
#> 5 MOD:00053 N-acetyl-L-glutamic acid
#>                                                                                                                                                                                                                                                           def
#> 1 "converts an L-lysine residue to N6-acetyl-L-lysine." [ChEBI:17752, DeltaMass:214, OMSSA:24, PubMed:11369851, PubMed:11857757, PubMed:11999733, PubMed:12175151, PubMed:14730666, PubMed:15350136, PubMed:1680872, PubMed:670159, RESID:AA0055, Unimod:1#K]
#> 2                                                                                                              "converts an L-lysine residue to N6-methyl-L-lysine." [ChEBI:17604, DeltaMass:165, PubMed:11875433, PubMed:3926756, RESID:AA0076, Unimod:34#K]
#> 3                                                                                "converts an L-histidine residue to tele-methyl-L-histidine." [PubMed:10601317, PubMed:11474090, PubMed:11875433, PubMed:6692818, PubMed:8076, PubMed:8645219, RESID:AA0317]
#> 4                                                                                                                              "converts an L-aspartic acid residue to N-acetyl-L-aspartic acid." [ChEBI:21547, PubMed:1560020, PubMed:2395459, RESID:AA0042]
#> 5                                                                                                                                              "converts an L-glutamic acid residue to N-acetyl-L-glutamic acid." [ChEBI:17533, PubMed:6725286, RESID:AA0044]
#>   FreqinList
#> 1         73
#> 2          9
#> 3          2
#> 4          1
#> 5          1

Note that the results generated by the runEnrichment function can be passed to the sea2mass function in the same way.
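The returned table can then be saved for use in an external search tool. The sketch below writes a tab-separated file with base R. The data frame is rebuilt by hand from the output printed above (without the long def column), and the choice of a TSV file is an assumption, since the format expected by a given mass spectrometry tool is not specified here.

```r
# Columns copied from the psea2mass output shown above (def column omitted)
ms_toy <- data.frame(
  MOD_ID = c("MOD:00064", "MOD:00085", "MOD:00322", "MOD:00051", "MOD:00053"),
  name   = c("N6-acetyl-L-lysine", "N6-methyl-L-lysine", "1'-methyl-L-histidine",
             "N-acetyl-L-aspartic acid", "N-acetyl-L-glutamic acid"),
  FreqinList = c(73, 9, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Write to a temporary tab-separated file and read it back as a check
out_file <- tempfile(fileext = ".tsv")
write.table(ms_toy, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
ms_back <- read.delim(out_file, stringsAsFactors = FALSE)
ms_back$MOD_ID
```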

References

Gholizadeh, Elham, Reza Karbalaei, Ali Khaleghian, Mona Salimi, Kambiz Gilany, Rabah Soliymani, Ziaurrehman Tanoli, et al. 2021. “Identification of Celecoxib-Targeted Proteins Using Label-Free Thermal Proteome Profiling on Rat Hippocampus.” Molecular Pharmacology 99 (5): 308–18. https://doi.org/10.1124/molpharm.120.000210.

Nickchi, Payman, Mohieddin Jafari, and Shiva Kalantari. 2015. “PEIMAN 1.0: Post-translational modification Enrichment, Integration and Matching ANalysis.” Database 2015 (April). https://doi.org/10.1093/database/bav037.

Subramanian, Aravind, Pablo Tamayo, Vamsi K. Mootha, Sayan Mukherjee, Benjamin L. Ebert, Michael A. Gillette, Amanda Paulovich, et al. 2005. “Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles.” Proceedings of the National Academy of Sciences 102 (43): 15545–50. https://doi.org/10.1073/pnas.0506580102.