Using CisGenome to Analyze ChIP‐chip and ChIP‐seq Data

Hongkai Ji (1), Hui Jiang (2), Wenxiu Ma (2), Wing Hung Wong (2)

(1) The Johns Hopkins University, Baltimore, Maryland; (2) Stanford University, Stanford, California
Publication Name: Current Protocols in Bioinformatics
Unit Number: Unit 2.13
DOI: 10.1002/0471250953.bi0213s33
Online Posting Date: March 2011
GO TO THE FULL TEXT: PDF or HTML at Wiley Online Library

Abstract

Chromatin immunoprecipitation (ChIP) coupled with genome tiling array hybridization (ChIP‐chip) and ChIP followed by massively parallel sequencing (ChIP‐seq) are high‐throughput approaches to profiling genome‐wide protein‐DNA interactions. Both technologies are increasingly used to study transcription‐factor binding sites and chromatin modifications. CisGenome is an integrated software system for analyzing ChIP‐chip and ChIP‐seq data. This unit describes the basic functions of CisGenome and how to use them to find genomic regions with protein‐DNA interactions, visualize binding signals, associate binding regions with nearby genes, search for novel transcription‐factor binding motifs, and map existing DNA sequence motifs to user‐supplied genomic regions to define their exact locations.

Curr. Protoc. Bioinform. 33:2.13.1‐2.13.45. © 2011 by John Wiley & Sons, Inc.
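Mapping a known motif to user‐supplied regions (Basic Protocol 6) amounts to scanning each sequence with a position‐specific scoring matrix and reporting windows whose score passes a cutoff. The Python sketch below illustrates that general idea under stated assumptions: it is a generic log‐odds (log‐likelihood ratio) scan, not necessarily the scoring CisGenome itself uses, and the function names, the A/C/G/T column order, the 0.5 pseudocount, the uniform background, and the threshold are all hypothetical choices made for illustration.

```python
import math

BASES = "ACGT"  # assumed column order of the count matrix (hypothetical)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def log_odds_matrix(counts, background=(0.25, 0.25, 0.25, 0.25), pseudocount=0.5):
    """Turn per-position base counts [A, C, G, T] into log2(p / background) scores."""
    matrix = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        matrix.append([
            math.log2(((c + pseudocount) / total) / bg)
            for c, bg in zip(row, background)
        ])
    return matrix


def score_window(matrix, window):
    """Sum per-position scores; return None if the window has a non-ACGT base."""
    score = 0.0
    for pos, base in enumerate(window):
        idx = BASES.find(base)
        if idx < 0:
            return None
        score += matrix[pos][idx]
    return score


def map_motif(matrix, sequence, threshold):
    """Scan both strands and return (offset, strand, score) for each hit."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # The reverse strand is scored by reverse-complementing the window.
        rev_window = window.translate(COMPLEMENT)[::-1]
        for strand, s in (("+", score_window(matrix, window)),
                          ("-", score_window(matrix, rev_window))):
            if s is not None and s >= threshold:
                hits.append((start, strand, round(s, 3)))
    return hits
```

Hit positions are 0‐based offsets within the input sequence; converting them to genomic coordinates would require adding each region's start coordinate. For a toy count matrix whose consensus is AGAC, scanning TTAGACTTGTCTTT with a threshold of 7.0 reports a forward‐strand hit at offset 2 and a reverse‐strand hit (GTCT) at offset 8.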

Keywords: transcription factor; chromatin immunoprecipitation; tiling array; next generation sequencing; motif; gene regulation

Table of Contents

  • Introduction
  • Basic Protocol 1: ChIP‐chip Peak Calling for Affymetrix Tiling Array Data
  • Basic Protocol 2: Visualization
  • Basic Protocol 3: Peak‐Gene Association
  • Basic Protocol 4: DNA Sequence Retrieval
  • Basic Protocol 5: De Novo Motif Discovery
  • Basic Protocol 6: Motif Mapping
  • Basic Protocol 7: ChIP‐chip Peak Calling for Other Tiling Array Platforms
  • Basic Protocol 8: ChIP‐seq Peak Calling (One‐Sample Analysis)
  • Basic Protocol 9: ChIP‐seq Peak Calling (Two‐Sample Analysis)
  • Support Protocol 1: Installing CisGenome
  • Support Protocol 2: Installing Genome Databases
  • Guidelines for Understanding Results
  • Commentary
  • Literature Cited
  • Figures
  • Tables

Figures

  • Figure 2.13.1 Overview of the CisGenome basic data analysis pipeline.
  • Figure 2.13.2 The CisGenome graphical user interface (GUI) and menu system. The menu for creating an Affymetrix tiling array data set is shown as an example.
  • Figure 2.13.3 The dialog for adding BPMAP files to an Affymetrix ChIP‐chip data set.
  • Figure 2.13.4 The dialog for adding CEL files to an Affymetrix ChIP‐chip data set.
  • Figure 2.13.5 The newly created tiling array data set shown in the CisGenome Project Explorer. Double‐clicking a CEL file opens a CisGenome Browser window displaying a heat map of the array image.
  • Figure 2.13.6 The dialog for normalizing an Affymetrix tiling array data set.
  • Figure 2.13.7 ChIP‐chip peak calling. Before peak detection, a normalized tiling array data set (circle 1.10) must be available in the Project Explorer, and several basic peak calling parameters must be provided in a dialog.
  • Figure 2.13.8 ChIP‐chip peak calling results. Peaks are summarized in a COD file, shown in the right window. Several BAR files are also created to store enrichment signals. Both the COD file and the BAR files are added to the Project Explorer on the left.
  • Figure 2.13.9 CisGenome Browser. (A) The shortcut icon for the browser. (B) The first page of the browser.
  • Figure 2.13.10 The browser page for choosing the browser session type.
  • Figure 2.13.11 A newly created, empty browser session.
  • Figure 2.13.12 The browser page for choosing the data track type.
  • Figure 2.13.13 The track configuration page in CisGenome Browser.
  • Figure 2.13.14 CisGenome Browser showing different types of data. Tools for adjusting display styles are highlighted.
  • Figure 2.13.15 Peak‐gene association. (A) The dialog for annotating peaks with nearby genes. (B) The annotation results returned in a COD file.
  • Figure 2.13.16 DNA sequence retrieval. (A) The parameter configuration dialog. (B) Returned files. DNA sequences are returned in FASTA format (top). If cross‐species conservation scores are requested, conservation scores for each sequence are returned as well, in a text format (bottom left), in BED format (bottom right), or in a binary CS format (not shown).
  • Figure 2.13.17 The parameter configuration dialog for de novo motif discovery.
  • Figure 2.13.18 An example of the summary file produced by de novo motif discovery.
  • Figure 2.13.19 Motif matrix files produced by de novo motif discovery. (A) Each motif matrix is stored in a MAT file. (B) The list of motifs is stored in a MATL file. (C) Double‐clicking the MATL file in Project Explorer opens CisGenome Browser to display sequence logos of the motifs. The last motif in this example matches the known Gli motif.
  • Figure 2.13.20 An example of the CONS file describing a motif consensus sequence.
  • Figure 2.13.21 Mapping a motif matrix to a list of genomic regions. (A) The parameter configuration dialog. (B) The mapped motif sites are saved to a COD file.
  • Figure 2.13.22 Input data format for calling peaks from ChIP‐chip experiments on non‐Affymetrix tiling array platforms.
  • Figure 2.13.23 The parameter configuration dialog for normalizing ChIP‐chip data from a text file.
  • Figure 2.13.24 Converting ChIP‐chip data in a text file to a tiling array data set consisting of BAR files. (A) The parameter configuration dialog. (B) The converted data set shown in Project Explorer.
  • Figure 2.13.25 A sample ALN file.
  • Figure 2.13.26 Loading aligned reads for ChIP‐seq peak calling. (A) The parameter configuration dialog for loading the ALN file. (B) Loaded data shown in Project Explorer.
  • Figure 2.13.27 False discovery rate (FDR) computation for a one‐sample ChIP‐seq experiment. (A) The parameter configuration dialog. (B) The results are returned in a table that summarizes statistical properties of the data.
  • Figure 2.13.28 Peak calling from one‐sample ChIP‐seq data. (A) The parameter configuration dialog. (B) The detected peaks are reported in a COD file.
  • Figure 2.13.29 Data for two‐sample ChIP‐seq analysis loaded into CisGenome.
  • Figure 2.13.30 FDR computation for a two‐sample ChIP‐seq experiment. (A) The parameter configuration dialog. (B) The results are returned in a table that summarizes statistical properties of the data.
  • Figure 2.13.31 The parameter configuration dialog for two‐sample ChIP‐seq peak calling.
  • Figure 2.13.32 An example of the CisGenome.ini file.
  • Figure 2.13.33 Loading a genome database into the CisGenome GUI. (A) In the file open dialog, choose the file named [species]_[assembly].cgw in the genome database folder. (B) The loaded database shown in Project Explorer.
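Peak‐gene association (Basic Protocol 3; Figure 2.13.15) annotates each peak with nearby genes. As a rough illustration of the idea only, the Python sketch below finds, for each peak, the gene whose transcription start site (TSS) lies closest to the peak center, using binary search over sorted TSS positions per chromosome. This is not CisGenome's implementation: the (gene, chromosome, TSS) input tuples, the function names, and the strand‐unaware signed distance (peak center minus TSS) are assumptions made for illustration.

```python
import bisect
from collections import defaultdict


def build_tss_index(genes):
    """Index TSS positions by chromosome.

    genes: iterable of (gene_name, chrom, tss) tuples (hypothetical format).
    Returns {chrom: (sorted_tss_positions, gene_names_in_same_order)}.
    """
    by_chrom = defaultdict(list)
    for name, chrom, tss in genes:
        by_chrom[chrom].append((tss, name))
    index = {}
    for chrom, entries in by_chrom.items():
        entries.sort()
        index[chrom] = ([t for t, _ in entries], [n for _, n in entries])
    return index


def nearest_gene(index, chrom, start, end):
    """Return (gene_name, center - tss) for the TSS closest to the peak center.

    Returns None when the chromosome has no annotated genes. The distance
    ignores gene strand.
    """
    if chrom not in index:
        return None
    positions, names = index[chrom]
    center = (start + end) // 2
    i = bisect.bisect_left(positions, center)
    # The closest TSS is either just left of (i - 1) or at/right of (i) the center.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
    best = min(candidates, key=lambda j: abs(positions[j] - center))
    return names[best], center - positions[best]
```

With GeneA at chr1:1000 and GeneB at chr1:5000, a peak at chr1:4000‐4200 is assigned to GeneB with distance −900, and a peak on a chromosome with no annotated genes returns None. A strand‐aware variant would flip the sign for minus‐strand genes so that negative values always mean upstream.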
