Capturing Protein Domain Structure and Function Using Self-Supervision on Domain Architectures

1 L3S Research Center, Leibniz University Hannover, 30167 Hannover, Germany
2 Knowledge-Based Systems Laboratory, Leibniz University Hannover, 30167 Hannover, Germany
* Author to whom correspondence should be addressed.
Received: 28 December 2020 / Revised: 6 January 2021 / Accepted: 15 January 2021 / Published: 19 January 2021
Protein sequence embeddings have been shown to improve the prediction of biological properties of unseen proteins. However, biological metadata do not exist for each individual amino acid, so the quality of each learned embedding vector cannot be measured separately. Consequently, current sequence embeddings cannot be intrinsically evaluated, in a quantitative manner, on the degree of biological information they capture. We address this drawback with our approach, dom2vec, which learns vector representations for protein domains rather than for individual amino acids, because biological metadata do exist for each domain. To perform a reliable quantitative intrinsic evaluation in terms of biological knowledge, we selected the metadata related to the most distinctive biological characteristics of a domain: its structure, enzymatic function, and molecular function. Notably, dom2vec obtains an adequate level of performance in the intrinsic assessment; we can therefore draw an analogy between the local linguistic features in natural languages and the domain structure and function information in domain architectures. Moreover, we demonstrate the applicability of dom2vec to protein prediction tasks by comparing it with state-of-the-art sequence embeddings in three downstream tasks. We show that dom2vec outperforms sequence embeddings for toxin and enzymatic function prediction and is comparable with sequence embeddings in cellular location prediction.
Keywords: protein domain architectures; word embeddings; quantitative quality assessment; SCOPe secondary structure class; enzymatic commission class
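
The idea in the abstract of treating a domain architecture like a sentence, with each domain playing the role of a word, can be illustrated with a minimal sketch. This is not the dom2vec model itself: it swaps in plain co-occurrence counts and cosine similarity as a stand-in for a trained embedding, and the domain identifiers (`DOM_A`, `DOM_B`, and so on), the toy architectures, the window size, and the helper names are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy data: each protein is an ordered list of domain identifiers,
# i.e. one "sentence" whose "words" are domains.
architectures = [
    ["DOM_A", "DOM_B", "DOM_C"],
    ["DOM_A", "DOM_B", "DOM_D"],
    ["DOM_E", "DOM_B", "DOM_C"],
    ["DOM_F", "DOM_G"],
]


def cooccurrence_vectors(sentences, window=2):
    """For each domain, count the domains seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, target in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][sentence[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = cooccurrence_vectors(architectures)

# DOM_A and DOM_E share neighbouring domains; DOM_A and DOM_F share none.
sim_ae = cosine(vectors["DOM_A"], vectors["DOM_E"])
sim_af = cosine(vectors["DOM_A"], vectors["DOM_F"])
print(f"sim(DOM_A, DOM_E) = {sim_ae:.3f}")
print(f"sim(DOM_A, DOM_F) = {sim_af:.3f}")
```

In this toy setting, domains that occur in similar architectural contexts end up with similar vectors. An intrinsic evaluation in the spirit of the abstract would then check whether such neighbours also share per-domain metadata such as structure or function labels.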
MDPI and ACS Style

Melidis, D.P.; Nejdl, W. Capturing Protein Domain Structure and Function Using Self-Supervision on Domain Architectures. Algorithms 2021, 14, 28. https://0-doi-org.brum.beds.ac.uk/10.3390/a14010028

AMA Style

Melidis DP, Nejdl W. Capturing Protein Domain Structure and Function Using Self-Supervision on Domain Architectures. Algorithms. 2021; 14(1):28. https://0-doi-org.brum.beds.ac.uk/10.3390/a14010028

Chicago/Turabian Style

Melidis, Damianos P.; Nejdl, Wolfgang. 2021. "Capturing Protein Domain Structure and Function Using Self-Supervision on Domain Architectures" Algorithms 14, no. 1: 28. https://0-doi-org.brum.beds.ac.uk/10.3390/a14010028
