A webserver for discriminating five complexes of electron transport proteins
Every cell uses its cellular respiration machinery to oxidize food molecules such as glucose to carbon dioxide and water, capturing the released energy in the form of adenosine triphosphate (ATP). This crucial process cannot occur without the electron transport chain, a series of five protein complexes embedded in the inner mitochondrial membrane. The functional loss of these complexes is involved in a variety of human diseases, such as Parkinson's disease, pulmonary hypertension, and Alzheimer's disease. Investigating these electron transport complexes is therefore an ongoing concern for biologists who want to better understand the molecular mechanisms of these diseases. In this research, we employed two representation learning methods, namely word embedding and transfer learning, to analyse electron transport complex sequences and create efficient feature sets. We then used a support vector machine (SVM) algorithm to classify them.
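The page does not give the k-mer size, embedding dimension, or training corpus used for the word-embedding features. The following is a minimal sketch under those unknowns. It assumes overlapping 3-mers act as protein "words" and substitutes a random, hypothetical embedding table for a trained model. The goal is only to show how a variable-length sequence becomes a fixed-length vector that an SVM can consume. The names `to_kmers`, `toy_embedding_table`, and `sequence_vector` are illustrative and not part of the server.

```python
import random
from typing import Dict, Iterable, List


def to_kmers(seq: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mer 'words'."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_embedding_table(words: Iterable[str], dim: int = 8,
                        seed: int = 0) -> Dict[str, List[float]]:
    """HYPOTHETICAL stand-in for a trained embedding model: gives every word
    a random vector. A real pipeline would learn these vectors from a large
    protein corpus (or reuse a pre-trained model, i.e. transfer learning)."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in words}


def sequence_vector(seq: str, table: Dict[str, List[float]],
                    k: int = 3, dim: int = 8) -> List[float]:
    """Average the vectors of all k-mers found in the table, giving one
    fixed-length feature vector per sequence (all zeros if none are known)."""
    vecs = [table[w] for w in to_kmers(seq, k) if w in table]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


# Usage: two short fragments of the example sequences shown on this page.
train_seqs = ["MATTASPFLSPAKLSLERR", "MSGVAAVSRLWRARRLALT"]
vocab = sorted({w for s in train_seqs for w in to_kmers(s)})
table = toy_embedding_table(vocab)
features = [sequence_vector(s, table) for s in train_seqs]
print(len(features), len(features[0]))  # 2 8
```

Each row of `features` has the same length regardless of sequence length. That fixed length is what makes the vectors usable as input to a standard SVM implementation, for example scikit-learn's `SVC`.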
On average, our final classification models achieved scores of 96%, 96.1%, 95.3%, and 0.86 on the four evaluation metrics on the cross-validation data, and 95.3%, 92.6%, 94%, and 0.87, respectively, on the independent test data. Using representation learning methods, we show that a simple machine learning method performs on par with an existing deep neural network method on the task of categorizing electron transport complexes, while generating features much faster.
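The page reports four scores without naming the metrics. The sketch below assumes they are sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC). The last score, 0.86, is on a correlation-like scale, which fits MCC, but this is an assumption. The sketch also assumes the per-class scores are averaged one-vs-rest over the five complexes. The function names are hypothetical, and the labels in the usage lines are made up rather than results from this server.

```python
import math
from typing import Sequence, Tuple


def _div(num: float, den: float) -> float:
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fp: int, tn: int,
                   fn: int) -> Tuple[float, float, float, float]:
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sens = _div(tp, tp + fn)
    spec = _div(tn, tn + fp)
    acc = _div(tp + tn, tp + fp + tn + fn)
    mcc = _div(tp * tn - fp * fn,
               math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return sens, spec, acc, mcc


def one_vs_rest_average(y_true: Sequence[str], y_pred: Sequence[str],
                        classes: Sequence[str]) -> Tuple[float, ...]:
    """Treat each class as 'positive' in turn, then average the four scores."""
    pairs = list(zip(y_true, y_pred))
    per_class = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = sum(t != c and p != c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        per_class.append(binary_metrics(tp, fp, tn, fn))
    return tuple(sum(m[i] for m in per_class) / len(per_class) for i in range(4))


# Usage with made-up labels for the five complexes.
labels = ["I", "II", "III", "IV", "V"]
y_true = ["I", "I", "II", "III", "IV", "V", "V"]
y_pred = ["I", "II", "II", "III", "IV", "V", "I"]
print(one_vs_rest_average(y_true, y_pred, labels))
```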
The datasets used by this server come from Le et al.'s work and were retrieved from UniProt and Gene Ontology. Details of the datasets are listed in the table below.
| Complex | Original | Similarity < 30% (BLASTCLUST) | Benchmark: training data | Benchmark: testing data |
|---|---|---|---|---|
| Complex I | 5,147 | 302 | 252 | 50 |
| Complex II | 154 | 30 | 25 | 5 |
| Complex III | 1,781 | 29 | 25 | 4 |
| Complex IV | 2,026 | 153 | 128 | 25 |
| Complex V | 2,026 | 190 | 38 | 152 |
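As a quick arithmetic check of the dataset table, the two benchmark columns of each row should add up to that row's similarity-filtered count. This assumes the benchmark subsets partition the filtered set. The values below are copied from the table. The order of the two subset columns does not affect the sums.

```python
# Rows of the dataset table: (similarity-filtered count, benchmark subset A, benchmark subset B).
counts = {
    "Complex I": (302, 252, 50),
    "Complex II": (30, 25, 5),
    "Complex III": (29, 25, 4),
    "Complex IV": (153, 128, 25),
    "Complex V": (190, 38, 152),
}

for name, (filtered, split_a, split_b) in counts.items():
    # Assumption: each filtered sequence lands in exactly one benchmark subset.
    assert split_a + split_b == filtered, f"{name}: {split_a} + {split_b} != {filtered}"

grand_total = sum(filtered for filtered, _, _ in counts.values())
print("Sequences after the <30% similarity filter:", grand_total)  # 704
```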
If you would like to build your own model or evaluate ours, the dataset can be downloaded from the link below.
Download dataset.zip
To avoid errors, please submit sequences in FASTA format (example FASTA input is shown below). Sequences can be submitted in two ways: by pasting them into the text area or by uploading a sequence file. You can submit either a single sequence or multiple sequences in FASTA format. The results page shows the prediction for each sequence separately.
>A2XVZ1
MATTASPFLSPAKLSLERRLPRATWTARRSVRFPPVRAQDQQQQVKEEEEEAAVENLPPP
PQEEEQRRERKTRRQGPAQPLPVQPLAESKNMSREYGGQWLSCTTRHIRIYAAYINPETN
AFDQTQMDKLTLLLDPTDEFVWTDETCQKVYDEFQDLVDHYEGAELSEYTLRLIGSDLEH
FIRKLLYDGEIKYNMMSRVLNFSMGKPRIKFNSSQIPDVK
>P31039
MSGVAAVSRLWRARRLALTCTKWSAAWQTGTRSFHFTVDGNKRSSAKVSDAISAQYPVVD
HEFDAVVVGAGGAGLRAAFGLSEAGFNTACVTKLFPTRSHTVAAQGGINAALGNMEEDNW
RWHFYDTVKGSDWLGDQDAIHYMTEQAPASVVELENYGMPFSRTEDGKIYQRAFGGQSLK
FGKGGQAHRCCCVADRTGHSLLHTLYGRSLRYDTSYFVEYFALDLLMESGECRGVIALCI
EDGSIHRIRARNTVIATGGYGRTYFSCTSAHTSTGDGTAMVTRAGLPCQDLEFVQFHPTG
IYGAGCLITEGCRGEGGILINSQGERFMERYAPVAKDLASRDVVSRSMTLEIREGRGCGP
EKDHVYLQLHHLPPAQLAMRLPGISETAMIFAGVDVTKEPIPVLPTVHYNMGGIPTNYKG
QVLRHVNGQDQGVPGLYACGEAACASVHGANRLGANSLLDLVVFGRACALSIAESCRPGD
KVPSIKPNAGEESVMNLDKLRFANGSIRTSELRLNMQKSMQSHAAVFRVGSVLQEGCEKI
SSLYGDLRHLKTFDRGMVWNTDLVETLELQNLMLCALQTIYGAEARKESRGGPRREDFKE
RVDEYDYSKPIQGQQKKPFEQHWRKHTLSYVDIKTGKVTLEYRPVIDRTLNETDCATVPP
AIGSY
>O31214
MLASAGGYWPMSAQGVNKMRRRVLVAATSVVGAVGAGYALVPFVASMNPSARARAAGAPV
EADISKLEPGALLRVKWRGKPVWVVHRSPEMLAALSSNDPKLVDPTSEVPQQPDYCKNPT
RSIKPEYLVAIGICTHLGCSPTYRPEFGPDDLGSDWKGGFHCPCHGSRFDLAARVFKNVP
APTNLVIPKHVYLNDTTILIGEDRGSA
>E0TW67
MIFLFRALKPLLVLALLTVVFVLGGCSNASVLDPKGPVAEQQSDLILLSIGFMLFIVGVV
FVLFTIILVKYRDRKGKDNGSYNPKIHGNTFLEVVWTVIPILIVIALSVPTVQTIYSLEK
APEATKDKEPLVVHATSVDWKWVFSYPEQDIETVNYLNIPVDRPILFKISSADSMASLWI
PQLGGQKYAMAGMLMDQYLQADEVGTYQGRNANFTGEHFADQEFDVNAVTEKDFNSWVKK
TQNEAPKLTKEKYDQLMLPENVDELTFSSTHLKYVDHGQDAEYAMEARKRLGYQAVSPHS
KTDPFENVKENEFKKSDDTEE
>A5GCQ9
MGYVELIAALRRDGEEQLEKIRSDAEREAERVKGDASARIERLRAEYAERLASLEAAQAR
AILADAESKASSIRLATESALAVRLFLLARSSLHHLRDEGYEQLFADLVRELPPGEWRRV
VVNPADMALAARHFPNAEIVSHPAIVGGLEVSEEGGSISVVNTLEKRMERAWPELLPEIL
RDIYREL
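The page does not describe how the server splits a multi-sequence submission. As a hypothetical sketch, the standard FASTA layout can be parsed into one record per sequence, which is what allows results to be reported per sequence. In that layout, each record is a line starting with `>` followed by one or more sequence lines. The function name `parse_fasta` is illustrative and not part of the server.

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (identifier, sequence) pairs, one per record."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            desc = line[1:].strip()
            # Use the first word after '>' as the identifier (e.g. A2XVZ1).
            header = desc.split()[0] if desc else ""
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Usage: shortened fragments of two example records from this page.
sample = """>A2XVZ1
MATTASPFLSPAKLSLERR
PQEEEQRRERKTRRQ
>O31214
MLASAGGYWPMSAQ
"""
for identifier, seq in parse_fasta(sample):
    print(identifier, len(seq))
```

Wrapped sequence lines are joined back into one string per record. Text that appears before any `>` header is rejected rather than silently dropped.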
Department of Computer Science and Engineering
Yuan Ze University
135 Yuan-Tung Road, Chung-Li, Taiwan 32003, R.O.C.
School of Humanities
Nanyang Technological University
48 Nanyang Ave, Singapore 6397983
Department of Statistics – Informatics
University of Economics, University of Danang
71 Ngu Hanh Son St, Danang, Vietnam 550000
If you encounter any problems or have suggestions for our website, feel free to contact us by email: [email protected]