Server use case

To illustrate the different outputs of SA-Mot, we present the extraction and characterization of structural motifs from the protein with PDB code 2RHM.

Input data

The input data is the three-dimensional structure of the protein. SA-Mot accepts either a PDB-formatted coordinate file (A) or the PDB code (B) of the target protein. Input PDB files can contain several chains.
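SA-Mot's own parsing code is not described here. To illustrate what "several chains" means in a PDB file, the following minimal Python sketch groups the ATOM records of a PDB-formatted text by chain identifier, which the fixed-width PDB format stores in column 22. The function name chains_from_pdb and the choice to read only the first model are assumptions made for this example.

```python
def chains_from_pdb(pdb_text):
    """Group the ATOM records of a PDB-formatted text by chain identifier.

    The chain ID is read from column 22 (index 21) of the fixed-width
    PDB format. Only the first model is read: parsing stops at the first
    ENDMDL record, if there is one. HETATM records are ignored.
    """
    chains = {}  # dicts keep insertion order, so chains stay in file order
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break
        if line.startswith("ATOM"):
            chain_id = line[21] if len(line) > 21 else " "
            chains.setdefault(chain_id, []).append(line)
    return chains


# Usage with a hypothetical two-chain fragment (C-alpha atoms only).
example = "\n".join(
    "ATOM  " + f"{i:5d}" + "  CA  GLY " + chain + f"{i:4d}"
    for i, chain in enumerate("AAB", start=1)
)
print({chain: len(lines) for chain, lines in chains_from_pdb(example).items()})
# {'A': 2, 'B': 1}
```

Each chain can then be processed on its own, which matches the per-chain presentation of the results.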

Encoding output

When a protein structure is uploaded, the SA-Mot server extracts all SA-Ws from every chain of the structure, and the results are presented separately for each chain. The output for each chain is organized in three parts. First, SA-Mot presents the protein chain (C) as three sequences: the primary sequence (AA), the secondary structure (SS) and the three-dimensional structure encoded as a structural-letter sequence (SL). These sequences allow users to easily identify the loop regions of the studied chain.
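The extraction step itself is not spelled out on this page. As a minimal sketch, assuming that an SA-W is a word of four consecutive structural letters (as the examples ZCDS and YUOD suggest) and that the SS sequence marks helices with H and strands with E, the following Python code lists every overlapping SA-W of an SL sequence and the loop regions of an SS sequence. The function names, the H/E convention and the example strings are hypothetical, not SA-Mot's code or data.

```python
WORD_LENGTH = 4  # assumption: SA-Ws such as ZCDS and YUOD are four letters long


def extract_words(sl_sequence, word_length=WORD_LENGTH):
    """Return (start, word) for every overlapping word of an SL sequence.

    Start positions are 1-based indices into the structural-letter sequence.
    """
    return [
        (i + 1, sl_sequence[i:i + word_length])
        for i in range(len(sl_sequence) - word_length + 1)
    ]


def loop_regions(ss_sequence, regular_states="HE"):
    """Return 1-based inclusive (start, end) runs outside helices and strands."""
    regions = []
    start = None
    for i, state in enumerate(ss_sequence, start=1):
        if state not in regular_states:
            if start is None:
                start = i  # a loop run begins here
        elif start is not None:
            regions.append((start, i - 1))  # the run ended at the previous position
            start = None
    if start is not None:
        regions.append((start, len(ss_sequence)))  # run reaches the chain end
    return regions


print(extract_words("ZCDSK"))        # [(1, 'ZCDS'), (2, 'CDSK')]
print(loop_regions("CCHHHHCCEEEC"))  # [(1, 2), (7, 8), (12, 12)]
```

Positions here are indices within each sequence; how SL positions map onto residue numbers depends on the encoding and is not assumed.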

Then, SA-Mot provides a table with the counts of the extracted SA-Ws of interest (D), giving an overview of the isolated SA-Ws of interest. For example, chain B of protein 2RHM contains 19 recurrent SA-Ws, one ubiquitous SA-W and four functional candidate words. This already indicates that the protein contains regions of interest.
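A count table of this kind can be built from per-word category labels. The sketch below is hypothetical: it assumes each SA-W is identified by a (word, position) pair and may carry several labels at once, and the words and labels in the example are placeholders, not SA-Mot results.

```python
from collections import Counter


def count_categories(labels_by_word):
    """Count the SA-Ws of interest in each category.

    labels_by_word maps an SA-W (here a (word, position) pair) to the set
    of categories it belongs to; one SA-W may carry several labels.
    """
    counts = Counter()
    for labels in labels_by_word.values():
        counts.update(labels)  # each label of a word adds 1 to its category
    return counts


# Hypothetical labels; the real ones come from SA-Mot's selection criteria.
labels = {
    ("AAAA", 11): {"recurrent", "functional candidate"},
    ("BBBB", 40): {"recurrent", "ubiquitous"},
    ("CCCC", 90): set(),
}
print(count_categories(labels)["recurrent"])  # 2
```

Because a Counter returns 0 for absent keys, categories with no word still read as zero in the table.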

Finally, SA-Mot provides a second, interactive table for identifying SA-Ws of interest (E) using statistical, geometric and sequence parameters. For each SA-W (column SW), this table gives its positions and amino-acid sequence (columns Pos and AA). The other columns contain the values of the parameters used to identify SA-Ws of interest, as presented on the "SA-Mot Method" page.

The columns Occ and OR give the occurrence and the over-representation score computed on the loop dataset. They allow users to identify recurrent, non-random SA-Ws, which correspond to structural motifs involved in the structural redundancy of loops. To see where an SA-W occurs, users can open the list of proteins (and positions) in which it has been identified (F) by clicking on the related icon.

The columns RMSd and AACons, corresponding to the RMSd and Z_max, allow users to identify SA-Ws with relevant structural or sequence conservation. Clicking on a value displays the corresponding figure: the superimposition of all fragments encoded by the SA-W (G), or the logo of the amino-acid sequences of these fragments (H).

The column ORsf gives the over-representation of the SA-W in SCOP superfamilies. It allows users to locate structural motifs that are likely involved in protein function (functional candidate words) or in protein structure (ubiquitous SA-Ws). To help users identify the role of these structural motifs (SMs), the SCOP id of each superfamily in which the SA-W is over-represented is provided (I, J).
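The AACons column and its Z_max parameter are defined on the "SA-Mot Method" page and are not reproduced here. As a related illustration of what a sequence logo such as (H) displays, the sketch below computes the per-position information content of the aligned amino-acid fragments of one SA-W, using the standard logo measure log2(20) - H, where H is the Shannon entropy at that position. This is not SA-Mot's Z_max, no small-sample correction is applied, and the fragments in the example are hypothetical.

```python
import math
from collections import Counter


def information_content(fragments):
    """Per-position information content (bits) of equal-length sequences.

    Uses the sequence-logo definition R = log2(20) - H, where H is the
    Shannon entropy of the amino-acid frequencies at that position.
    """
    if not fragments:
        return []
    length = len(fragments[0])
    if any(len(f) != length for f in fragments):
        raise ValueError("all fragments must have the same length")
    max_bits = math.log2(20)  # maximum for a 20-letter amino-acid alphabet
    profile = []
    for pos in range(length):
        counts = Counter(f[pos] for f in fragments)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        profile.append(max_bits - entropy)
    return profile


print([round(x, 2) for x in information_content(["AVL", "AVI", "GVL", "GVI"])])
# [3.32, 4.32, 3.32]
```

A fully conserved position (here the V) reaches the maximum of about 4.32 bits, while a position split evenly between two residues loses one bit.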

In the table, SA-Ws are ordered by their position in the studied chain. The table can be sorted by any column to facilitate the identification of SA-Ws of interest.
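The default positional order and the column sorting can be combined with a stable sort. In this hypothetical sketch, rows are dictionaries keyed by the column names SW, Pos and OR, with Pos assumed to be an integer start position; the rows themselves are placeholders.

```python
def sort_rows(rows, column=None, descending=False):
    """Order table rows by position, then optionally by another column.

    Python's sort is stable (also with reverse=True), so rows that tie on
    `column` keep their positional order.
    """
    by_position = sorted(rows, key=lambda row: row["Pos"])
    if column is None:
        return by_position
    return sorted(by_position, key=lambda row: row[column], reverse=descending)


# Hypothetical rows using the column names SW, Pos and OR.
rows = [
    {"SW": "BBBB", "Pos": 20, "OR": 2.0},
    {"SW": "AAAA", "Pos": 5, "OR": 1.0},
    {"SW": "CCCC", "Pos": 10, "OR": 2.0},
]
print([row["SW"] for row in sort_rows(rows)])              # ['AAAA', 'CCCC', 'BBBB']
print([row["SW"] for row in sort_rows(rows, "OR", True)])  # ['CCCC', 'BBBB', 'AAAA']
```

Sorting by position first means that, when two SA-Ws share the same score, the one earlier in the chain is listed first.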

Thus, SA-Mot shows that the region at positions 130-141 is composed of rare SA-Ws, suggesting that it is flexible. It also shows that 19 SA-Ws are recurrent and over-represented in the loop dataset. Most of these SA-Ws present low structural variability and amino-acid specificity, suggesting that they correspond to structural motifs located in crucial regions of the protein. We can therefore conclude that the regions at positions 11-18, 25-32, 45-51 and 128-135 seem to be crucial for the protein. Moreover, we obtain further information about the SA-Ws ZCDS, YUOD, UODO, CDSK and OZGB. The SA-W ZCDS is a ubiquitous word, suggesting that this region is involved in protein stability and folding. The four SA-Ws YUOD, UODO, CDSK and OZGB are assigned a putative functional role because they are strongly over-represented in a few SCOP superfamilies. YUOD and UODO are over-represented in the superfamily "P-loop containing nucleoside triphosphate hydrolases" (SCOP id 52540, as presented in J), which groups nucleotide-binding proteins. These results suggest that the structural motifs encoded by these SA-Ws contain residues involved in the nucleotide-binding site. CDSK is strongly over-represented in the superfamily "YVTN repeat-like/Quinoprotein amine dehydrogenase" (SCOP id 50969), and OZGB in the "Trypsin-like serine proteases" superfamily (SCOP id 50494). We can suppose that the structural motif encoded by OZGB is involved in the functional site of trypsin-like proteases. In the three other chains, we also find interesting SMs encoded by YUOD, UODO and OZGB.
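The page does not state how individual SA-W hits are grouped into the regions listed above. One plausible way, shown here only as a hypothetical sketch, is to merge the residue spans of the selected SA-Ws whenever they overlap or touch. The function name and the example spans are assumptions, not SA-Mot output.

```python
def merge_spans(spans):
    """Merge overlapping or adjacent 1-based inclusive (start, end) spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1] + 1:
            # overlaps or touches the previous region: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical residue spans of selected SA-Ws.
print(merge_spans([(4, 10), (3, 9), (20, 26), (11, 15)]))  # [(3, 15), (20, 26)]
```

Merging adjacent spans (start equal to the previous end plus one) is a design choice; dropping the "+ 1" would keep touching but non-overlapping motifs as separate regions.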

We can therefore conclude that SA-Mot allows easy and rapid identification of SMs of interest for protein function. From this analysis, we propose that each chain of protein 2RHM contains a nucleotide-binding site at positions 11-18 and a second potential functional site at positions 165-171. Thus, unlike sequence-based methods, SA-Mot allows the identification and location of potential functional sites in the uncharacterized protein 2RHM, which could help determine its function.

SA-Mot

Extract structural motifs from protein loop structures using HMM-SA
