Biological Sequence Motif Discovery Using motif‐x

Michael F. Chou1, Daniel Schwartz2

1 Department of Genetics, Harvard Medical School, Boston, Massachusetts, 2 Department of Physiology and Neurobiology, University of Connecticut, Storrs, Connecticut
Publication Name:  Current Protocols in Bioinformatics
Unit Number:  Unit 13.15
DOI:  10.1002/0471250953.bi1315s35
Online Posting Date:  September, 2011
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library


The Web‐based motif‐x program provides a simple interface for extracting statistically significant motifs from large data sets, such as MS/MS post‐translational modification data and groups of proteins that share a common biological function. Users upload data files and download results using common Web browsers on essentially any Web‐compatible computer. Once data are submitted, analyses run rapidly on an associated high‐speed computer cluster and produce both syntactic and image‐based motif results, together with statistics. The protocols presented demonstrate the use of motif‐x in three common user scenarios. Curr. Protoc. Bioinform. 35:13.15.1‐13.15.24. © 2011 by John Wiley & Sons, Inc.

Keywords: protein motif; phosphorylation; post‐translational modification (PTM); motif discovery; motif‐x; mass spectrometry; proteomics

Table of Contents

  • Introduction
  • Basic Protocol 1: Extracting Sequence Motifs from MS/MS Post‐Translational Modification Data
  • Alternate Protocol 1: Extracting Sequence Motifs from Pre‐Aligned Data
  • Alternate Protocol 2: Extracting Sequence Motifs from Whole Protein Sequence Data in FASTA Format
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Literature Cited

   Bonferroni, C.E. 1935. Il calcolo delle assicurazioni su gruppi di teste. In Studi in Onore del Professore Salvatore Ortu Carboni. 13‐60.
   Chou, M.F. and Schwartz, D. 2011. Using the scan‐X web site to predict protein post‐translational modifications. Curr. Protoc. Bioinform. Submitted.
   Dinkel, H., Chica, C., Via, A., Gould, C.M., Jensen, L.J., Gibson, T.J., and Diella, F. 2011. Phospho.ELM: A database of phosphorylation sites–update 2011. Nucleic Acids Res. 39:D261‐D267.
   Durek, P., Schmidt, R., Heazlewood, J.L., Jones, A., MacLean, D., Nagel, A., Kersten, B., and Schulze, W.X. 2010. PhosPhAt: The Arabidopsis thaliana phosphorylation site database. An update. Nucleic Acids Res. 38:D828‐D834.
   Edwards, R.J., Davey, N.E., and Shields, D.C. 2008. CompariMotif: Quick and easy comparisons of sequence motifs. Bioinformatics 24:1307‐1309.
   Eng, J.K., McCormack, A.L., and Yates, J.R. 1994. An approach to correlate tandem mass‐spectral data of peptides with amino‐acid‐sequences in a protein database. J. Am. Soc. Mass Spectrom. 5:976‐989.
   Gould, C.M., Diella, F., Via, A., Puntervoll, P., Gemund, C., Chabanis‐Davidson, S., Michael, S., Sayadi, A., Bryne, J.C., Chica, C., Seiler, M., Davey, N.E., Haslam, N., Weatheritt, R.J., Budd, A., Hughes, T., Pas, J., Rychlewski, L., Trave, G., Aasland, R., Helmer‐Citterich, M., Linding, R., and Gibson, T.J. 2010. ELM: The status of the 2010 eukaryotic linear motif resource. Nucleic Acids Res. 38:D167‐D180.
   Hornbeck, P.V., Chabra, I., Kornhauser, J.M., Skrzypek, E., and Zhang, B. 2004. PhosphoSite: A bioinformatics resource dedicated to physiological protein phosphorylation. Proteomics 4:1551‐1561.
   Howard, A.D., Kostura, M.J., Thornberry, N., Ding, G.J., Limjuco, G., Weidner, J., Salley, J.P., Hogquist, K.A., Chaplin, D.D., Mumford, R.A., Schmidt, J.A., and Tocci, M.J. 1991. IL‐1‐converting enzyme requires aspartic acid residues for processing of the IL‐1 beta precursor at two distinct sites and does not cleave 31‐kDa IL‐1 alpha. J. Immunol. 147:2964‐2969.
   Hutti, J.E., Jarrell, E.T., Chang, J.D., Abbott, D.W., Storz, P., Toker, A., Cantley, L.C., and Turk, B.E. 2004. A rapid method for determining protein kinase phosphorylation specificity. Nat. Methods 1:27‐29.
   Keshava Prasad, T.S., Goel, R., Kandasamy, K., Keerthikumar, S., Kumar, S., Mathivanan, S., Telikicherla, D., Raju, R., Shafreen, B., Venugopal, A., Balakrishnan, L., Marimuthu, A., Banerjee, S., Somanathan, D.S., Sebastian, A., Rani, S., Ray, S., Harrys Kishore, C.J., Kanth, S., Ahmed, M., Kashyap, M.K., Mohmood, R., Ramachandra, Y.L., Krishna, V., Rahiman, B.A., Mohan, S., Ranganathan, P., Ramabadran, S., Chaerkady, R., and Pandey, A. 2009. Human protein reference database–2009 update. Nucleic Acids Res. 37:D767‐D772.
   Kessels, H.W., Ward, A.C., and Schumacher, T.N. 2002. Specificity and affinity motifs for Grb2 SH2‐ligand interactions. Proc. Natl. Acad. Sci. U.S.A. 99:8524‐8529.
   Koyasu, S., Tse, A.G., Moingeon, P., Hussey, R.E., Mildonian, A., Hannisian, J., Clayton, L.K., and Reinherz, E.L. 1994. Delineation of a T‐cell activation motif required for binding of protein tyrosine kinases containing tandem SH2 domains. Proc. Natl. Acad. Sci. U.S.A. 91:6693‐6697.
   Matic, I., Schimmel, J., Hendriks, I.A., van Santen, M.A., van de Rijke, F., van Dam, H., Gnad, F., Mann, M., and Vertegaal, A.C. 2010. Site‐specific identification of SUMO‐2 targets in cells reveals an inverted SUMOylation motif and a hydrophobic cluster SUMOylation motif. Mol. Cell 39:641‐652.
   Perkins, D.N., Pappin, D.J., Creasy, D.M., and Cottrell, J.S. 1999. Probability‐based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551‐3567.
   Rigbolt, K.T., Prokhorova, T.A., Akimov, V., Henningsen, J., Johansen, P.T., Kratchmarova, I., Kassem, M., Mann, M., Olsen, J.V., and Blagoev, B. 2011. System‐wide temporal characterization of the proteome and phosphoproteome of human embryonic stem cell differentiation. Sci. Signal. 4:rs3.
   Schwartz, D. and Gygi, S.P. 2005. An iterative statistical approach to the identification of protein phosphorylation motifs from large‐scale data sets. Nat. Biotechnol. 23:1391‐1398.
   Shakin‐Eshleman, S.H., Spitalnik, S.L., and Kasturi, L. 1996. The amino acid at the X position of an Asn‐X‐Ser sequon is an important determinant of N‐linked core‐glycosylation efficiency. J. Biol. Chem. 271:6363‐6366.
   Songyang, Z. and Cantley, L.C. 1995. Recognition and specificity in protein tyrosine kinase‐mediated signalling. Trends Biochem. Sci. 20:470‐475.
   Sternsdorf, T., Jensen, K., Reich, B., and Will, H. 1999. The nuclear dot protein sp100, characterization of domains necessary for dimerization, subcellular localization, and modification by small ubiquitin‐like modifiers. J. Biol. Chem. 274:12555‐12566.
   Van Hoof, D., Munoz, J., Braam, S.R., Pinkse, M.W., Linding, R., Heck, A.J., Mummery, C.L., and Krijgsveld, J. 2009. Phosphorylation dynamics during early differentiation of human embryonic stem cells. Cell Stem Cell 5:214‐226.
Key Reference
  Schwartz and Gygi, 2005. See above.
  Original description of the motif‐x algorithm.
Internet Resources
  Home page for the motif‐x Web tool (NOTE: The protocols in this manuscript pertain to motif‐x version 1.2).
Supplemental Data Files
  File 1
  Sample human embryonic stem cell phosphorylation data obtained from the Van Hoof et al. (2009) study. This file exemplifies the proper MS/MS foreground format for motif‐x analyses, and may be used by readers to verify results obtained in conjunction with Basic Protocol 1.
  File 2
  Sample human sumoylation data. This file exemplifies the proper pre‐aligned foreground format for motif‐x analyses, and may be used by readers to verify results obtained in conjunction with Alternate Protocol 1.
  File 3
  Sample proteins from the Eukaryotic Linear Motif (ELM) database known to bind Grb2‐like Src Homology 2 (SH2) domains. This file exemplifies the proper FASTA foreground format for motif‐x analyses, and may be used by readers to verify results obtained in conjunction with Alternate Protocol 2.
  File 4
  This file contains a comprehensive list of potential motif‐x error messages, as well as suggested corrective actions.