FastET

A web server for identifying electron transport proteins using word embeddings


Introduction

Living organisms obtain the energy they need directly from cellular respiration. Electron storage and transport are completed during cellular respiration with the aid of electron transport chains, so deciphering electron transport proteins is essential. When identifying proteins, classification performance depends strongly on the choice of feature extraction method and machine learning algorithm. In this study, protein sequences are treated as natural language sentences composed of words. The proposed word embedding-based feature sets, built on word embeddings and protein motif frequencies, proved useful for feature selection. Five word embedding types and a variety of combined feature sets were examined for this feature selection. A support vector machine was then employed to identify electron transport proteins.
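The idea of reading a protein sequence as a sentence of words can be illustrated with a short sketch. It assumes that "words" are overlapping k-mers with k = 3; the text above does not state the tokenization actually used, so both the window size and the helper names (`sequence_to_words`, `word_frequencies`) are hypothetical.

```python
from collections import Counter


def sequence_to_words(sequence, k=3):
    """Split a protein sequence into overlapping k-mer 'words'.

    Assumption: overlapping windows of length k (k=3 here) stand in for
    words; the study's real tokenization is not specified on this page.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.strip().upper()
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def word_frequencies(words):
    """Relative frequency of each word within one sequence."""
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


if __name__ == "__main__":
    words = sequence_to_words("MATTASPF")
    print(words)
    print(word_frequencies(words))
```

A sentence built this way could then be passed to any word embedding model; the frequencies are only a simple stand-in for the motif-frequency features mentioned above.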



Figure 1: The process of cellular respiration, in which ATP molecules, the energy source of cells, are created



Figure 2: The flowchart of this study

Result

In 5-fold cross-validation, the models achieved an average accuracy, specificity, sensitivity, and MCC of 98.46%, 99.36%, 95.26%, and 0.955, respectively. On the independent test set, the corresponding values are 96.82%, 97.16%, 95.76%, and 0.9. Compared to state-of-the-art predictors, the proposed method performs better across all metrics. These figures indicate that the proposed classification model is effective at identifying electron transport proteins. Furthermore, this study provides a basis for future research that extends the use of natural language processing techniques in bioinformatics.
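For readers who want to compute the same four metrics on their own predictions, here is a minimal sketch using the standard definitions from a binary confusion matrix. The function name `binary_metrics` and the counts in the usage example are hypothetical; they are not the counts behind the figures reported above.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts.

    tp/fn count electron transport proteins (positive class) that were
    predicted correctly/incorrectly; tn/fp do the same for the negative class.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=90, tn=180, fp=20, fn=10).items():
        print(f"{name}: {value:.4f}")
```

MCC is worth reporting alongside accuracy here because the two classes are imbalanced, and MCC takes all four confusion-matrix cells into account.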

Dataset

The dataset used by this server was retrieved from UniProt. Details of the dataset are listed in the table below.

Number of proteins per class

Class name | Original | After 30% similarity check and preprocessing | Train dataset | Test dataset
Electron transport | 12,832 | 1,324 | 1,091 | 208
General transport | 10,814 | 4,569 | 3,846 | 713

If you would like to build your own model or evaluate ours, the dataset is available at the link below.

Download dataset.zip

Submission

To avoid errors, please submit sequences in FASTA format (example FASTA files are provided). There are two ways to submit: paste the sequences into the text area, or upload a sequence file. You can submit a single FASTA file or multiple FASTA files. The result page shows, for each sequence, the probability that it is an electron transport protein.

Sample FASTA sequence(s)

>sp|A2XVZ1|NDHM_ORYSI NAD(P)H-quinone oxidoreductase subunit M, chloroplastic OS=Oryza sativa subsp. indica OX=39946 GN=ndhM PE=3 SV=1
MATTASPFLSPAKLSLERRLPRATWTARRSVRFPPVRAQDQQQQVKEEEEEAAVENLPPP
PQEEEQRRERKTRRQGPAQPLPVQPLAESKNMSREYGGQWLSCTTRHIRIYAAYINPETN
AFDQTQMDKLTLLLDPTDEFVWTDETCQKVYDEFQDLVDHYEGAELSEYTLRLIGSDLEH
FIRKLLYDGEIKYNMMSRVLNFSMGKPRIKFNSSQIPDVK
>sp|P31039|SDHA_BOVIN Succinate dehydrogenase [ubiquinone] flavoprotein subunit, mitochondrial OS=Bos taurus OX=9913 GN=SDHA PE=1 SV=3
MSGVAAVSRLWRARRLALTCTKWSAAWQTGTRSFHFTVDGNKRSSAKVSDAISAQYPVVD
HEFDAVVVGAGGAGLRAAFGLSEAGFNTACVTKLFPTRSHTVAAQGGINAALGNMEEDNW
RWHFYDTVKGSDWLGDQDAIHYMTEQAPASVVELENYGMPFSRTEDGKIYQRAFGGQSLK
FGKGGQAHRCCCVADRTGHSLLHTLYGRSLRYDTSYFVEYFALDLLMESGECRGVIALCI
EDGSIHRIRARNTVIATGGYGRTYFSCTSAHTSTGDGTAMVTRAGLPCQDLEFVQFHPTG
IYGAGCLITEGCRGEGGILINSQGERFMERYAPVAKDLASRDVVSRSMTLEIREGRGCGP
EKDHVYLQLHHLPPAQLAMRLPGISETAMIFAGVDVTKEPIPVLPTVHYNMGGIPTNYKG
QVLRHVNGQDQGVPGLYACGEAACASVHGANRLGANSLLDLVVFGRACALSIAESCRPGD
KVPSIKPNAGEESVMNLDKLRFANGSIRTSELRLNMQKSMQSHAAVFRVGSVLQEGCEKI
SSLYGDLRHLKTFDRGMVWNTDLVETLELQNLMLCALQTIYGAEARKESRGGPRREDFKE
RVDEYDYSKPIQGQQKKPFEQHWRKHTLSYVDIKTGKVTLEYRPVIDRTLNETDCATVPP
AIGSY
>sp|O31214|UCRI_ALLVD Ubiquinol-cytochrome c reductase iron-sulfur subunit OS=Allochromatium vinosum (strain ATCC 17899 / DSM 180 / NBRC 103801 / NCIMB 10441 / D) OX=572477 GN=petA PE=3 SV=2
MLASAGGYWPMSAQGVNKMRRRVLVAATSVVGAVGAGYALVPFVASMNPSARARAAGAPV
EADISKLEPGALLRVKWRGKPVWVVHRSPEMLAALSSNDPKLVDPTSEVPQQPDYCKNPT
RSIKPEYLVAIGICTHLGCSPTYRPEFGPDDLGSDWKGGFHCPCHGSRFDLAARVFKNVP
APTNLVIPKHVYLNDTTILIGEDRGSA
>sp|E0TW67|QOX2_BACPZ Quinol oxidase subunit 2 OS=Bacillus subtilis subsp. spizizenii (strain ATCC 23059 / NRRL B-14472 / W23) OX=655816 GN=qoxA PE=1 SV=2
MIFLFRALKPLLVLALLTVVFVLGGCSNASVLDPKGPVAEQQSDLILLSIGFMLFIVGVV
FVLFTIILVKYRDRKGKDNGSYNPKIHGNTFLEVVWTVIPILIVIALSVPTVQTIYSLEK
APEATKDKEPLVVHATSVDWKWVFSYPEQDIETVNYLNIPVDRPILFKISSADSMASLWI
PQLGGQKYAMAGMLMDQYLQADEVGTYQGRNANFTGEHFADQEFDVNAVTEKDFNSWVKK
TQNEAPKLTKEKYDQLMLPENVDELTFSSTHLKYVDHGQDAEYAMEARKRLGYQAVSPHS
KTDPFENVKENEFKKSDDTEE
>sp|A5GCQ9|VATE_GEOUR V-type ATP synthase subunit E OS=Geobacter uraniireducens (strain Rf4) OX=351605 GN=atpE PE=3 SV=1
MGYVELIAALRRDGEEQLEKIRSDAEREAERVKGDASARIERLRAEYAERLASLEAAQAR
AILADAESKASSIRLATESALAVRLFLLARSSLHHLRDEGYEQLFADLVRELPPGEWRRV
VVNPADMALAARHFPNAEIVSHPAIVGGLEVSEEGGSISVVNTLEKRMERAWPELLPEIL
RDIYREL  
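Before submitting, it can help to check that a file parses as FASTA like the sample records above. The following is a minimal sketch, not the server's own validation: the function names are hypothetical, and restricting residues to the 20 standard amino acid letters is an assumption, since this page does not say whether ambiguity codes such as X, B, or Z are accepted.

```python
# Assumption: only the 20 standard amino acid letters are treated as valid.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_residues(sequence):
    """Return the sorted characters that are not standard residues."""
    return sorted(set(sequence) - VALID_RESIDUES)


if __name__ == "__main__":
    sample = ">example_1\nMGYVELIAAL\nRRDGEEQLEK\n>example_2\nMLASAGGYWX\n"
    for header, seq in parse_fasta(sample):
        print(header, len(seq), invalid_residues(seq))
```

Multi-line sequences are joined into one string per record, so line wrapping at 60 characters, as in the samples, does not affect the result.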

Members

Yu-Yen Ou
Associate Professor

Department of Computer Science and Engineering
Yuan Ze University
135 Yuan-Tung Road, Chung-Li, Taiwan 32003, R.O.C.

Trung-Duong Nguyen-Trinh
Research Scholar

Department of Computer Science and Engineering
Yuan Ze University
135 Yuan-Tung Road, Chung-Li, Taiwan 32003, R.O.C.

Quang-Thai Ho
Research Scholar

Department of Computer Science and Engineering
Yuan Ze University
135 Yuan-Tung Road, Chung-Li, Taiwan 32003, R.O.C.

Nguyen-Quoc-Khanh Le
Assistant Professor

Professional Master Program in Artificial Intelligence in Medicine
Taipei Medical University
Taipei City 106, Taiwan

Dinh-Van Phan
Research Scholar

Department of Statistics – Informatics
University of Economics, University of Danang
71 Ngu Hanh Son St, Danang, Vietnam 550000

Contact us


Department of Computer Science and Engineering
Graduate Program in Biomedical Informatics
Bioinformatics Laboratory (R1607B)
Address: No. 135, Yuandong Road, Chungli City, Taoyuan County, Taiwan, R.O.C. 32003
Tel: (03) 463-8800

If you have any problems or suggestions regarding our website, feel free to contact us via email: [email protected]