About the "G4P Calculator":
------------------------------------------------------------------------------
The "G4P Calculator" software is written in C# to run on Microsoft Windows XP. The program computes G4 DNA potential based on the density of runs of guanines in a sequence. The program evaluates runs of guanines in a sliding window and calculates the percentage of windows searched that meet the desired density criteria.

Using the default criteria, the algorithm assesses 100 nucleotides (nt) at a time, shifting by 20 nt to assess overlapping windows along the entire sequence. Each 100 nt window that contains 4 or more runs of 3 or more guanines is scored as a “hit”. The G4 DNA potential is the percentage of the total number of windows searched that were scored as a “hit”. The program output includes this percentage, which represents the relative structuring potential, called G4 DNA Potential (G4P). Each DNA strand is evaluated independently. The format of the output is described in more detail below.

Installing the "G4P Calculator" program:
------------------------------------------------------------------------------
1. Copy the file "G4P Calculator.exe" to a PC with the Windows XP operating system. (Other Microsoft operating systems may work, but have not been tested.)
2. Copy this file, "ReadMe.txt", to the same directory as the "G4P Calculator.exe" file. It can then be opened with the Help button.
3. If you get an error when you try to run "G4P Calculator", then you may need to install the Microsoft .NET Framework.
   a. Install the .NET Framework from the Microsoft Windows Update web site (http://update.microsoft.com)
   b. Click on the "Custom" button.
   c. Click on the "Software, Optional" link.
   d. Select Microsoft .NET Framework 2.0 (or most current version), and follow the directions on the screen to complete the installation.

Using the "G4P Calculator" program:
------------------------------------------------------------------------------
1. The sequence(s) to be analyzed should be in FASTA format. The DNA Sequence File can have multiple sequences, each beginning with an identifying line that starts with the character >.
2. The criteria used for calculating G4 Potential may be modified from the default values.
   a. Size of search window (nt): Enter the size of the sequence to be evaluated for G4 DNA potential, in number of nucleotides. This size should be small enough to adequately evaluate the density of G-runs. If you increase the window size, you should also increase the number of G-runs per window in order to predict potential to form G4 DNA. If this number is larger than the length of the entire sequence, then the window size will automatically be reduced to the size of the entire sequence.
   b. Size of window shift (nt): The search windows will overlap if this number is less than the size of the search window. If the shift is equal to the search window size, then there will be no overlap. The shift size should not be greater than the window size or there will be gaps in evaluation of the entire input sequence.
   c. Minimum size of G-run (nt): This is the minimum number of consecutive Gs necessary to predict a G4 DNA structure. For example, if this size is 3, it would count sequences such as GGG or GGGGG as a G-run.
   d. Minimum number of G-runs per window: Enter the minimum number of G-runs necessary to predict a G4 DNA structure. For example, if this number is 4, and the minimum size of G-run is 3, and the size of search window is at least 30, then it would count the sequence NNGGGNNNNGGGNNNNNGGGNNNGGGNNNN as a "hit".
3. The results are written to a text file in tab-delimited format to be opened in Excel.
   a. The first column will contain the text following the >. If this text contains the character |, it will be replaced by a tab in the output file, therefore appearing in a separate column in Excel.
   b. Following one or more columns with this header information, the next columns will contain:
      i. The number of G-runs that met your criteria.
      ii. The number of C-runs that met your criteria.
      iii. The total number of windows searched.
      iv. The percentage of windows containing G-runs that met your criteria.
      v. The percentage of windows containing C-runs that met your criteria.
      vi. The sum of the two percentages, for both G-runs and C-runs that met your criteria.
      vii. An indicator of whether the sequence contained a block of unknown sequence, determined by a run of Ns. The Ns are removed from the sequence before computing the G4 DNA potential.
           TRUE = more than 4 Ns in a row.
           FALSE = no significant runs of Ns.

------------------------------------------------------------------------------
Copyright 2006 Johanna Eddy
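
Illustrative sketch of the G4P calculation:
------------------------------------------------------------------------------
The sliding-window calculation described in the About section can be sketched in a few lines of Python. This is not the program's C# source; it is a minimal illustration under stated assumptions. The function names (count_runs, g4_potential) and parameter names are hypothetical. It assumes that a run is counted once however long it is (GGGGG is one G-run), that only full-length windows are scored, and that C-runs on the input strand stand in for G-runs on the complementary strand. The actual program may handle the final partial window or runs that cross window boundaries differently.

```python
import re


def count_runs(window, base, min_run):
    """Count maximal runs of `base` that are at least `min_run` long."""
    pattern = base + "{%d,}" % min_run  # e.g. "G{3,}"
    return len(re.findall(pattern, window))


def g4_potential(seq, window=100, shift=20, min_run=3, min_runs=4):
    """Return (windows searched, % G-hit windows, % C-hit windows, N flag)."""
    seq = seq.upper()
    # Flag a block of unknown sequence: more than 4 Ns in a row.
    has_n_block = re.search("N{5,}", seq) is not None
    # Ns are removed before the potential is computed.
    seq = seq.replace("N", "")
    if not seq:
        return 0, 0.0, 0.0, has_n_block
    # A window larger than the sequence shrinks to the sequence length.
    window = min(window, len(seq))
    # Assumption: only full-length windows are scored.
    starts = range(0, len(seq) - window + 1, shift)
    g_hits = c_hits = 0
    for start in starts:
        chunk = seq[start:start + window]
        if count_runs(chunk, "G", min_run) >= min_runs:
            g_hits += 1
        if count_runs(chunk, "C", min_run) >= min_runs:
            c_hits += 1
    total = len(starts)
    return total, 100.0 * g_hits / total, 100.0 * c_hits / total, has_n_block
```

For example, g4_potential("TTGGGTTTTGGGTTTTTGGGTTTGGGTTTT") scores the single 30 nt window as a G-run hit and returns (1, 100.0, 0.0, False). The sum reported in the output file would be the second and third values added together.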