| Supplemental data | |
|---|---|
| L2L Manuscript | Diabetic Nephropathy |
| Genomics of Ageing | Co-expression Relationships |
| Statistical Controls | |
A manuscript recently published in Genome Biology describes L2L in detail and provides examples
of how L2L produces novel biological insights from gene expression data.
L2L manuscript at Genome Biology
PDF of L2L manuscript (1.3 MB)
Note that supplemental data was collected using the 2005.1 release of L2L.
Repeating the analyses with the current release will produce slightly different results.
|
Baelde HJ, Eikmans M, Doran PP, Lappin DW, de Heer E, and Bruijn JA. "Gene expression profiling in glomeruli from human kidneys with diabetic nephropathy". Am J Kidney Dis. 2004 Apr;43(4):636-50. |
|||
|
Up-regulated |
Probe ID list |
Down-regulated |
Probe ID list |
|
Blalock EM, Geddes JW, Chen KC, Porter NM, Markesbery WR, and Landfield PW. "Incipient Alzheimer's disease: microarray correlation analyses reveal major transcriptional and tumor suppressor responses". Proc Natl Acad Sci U S A. 2004 Feb 17;101(7):2173-8. |
|||
|
Up-regulated |
Probe ID list |
Down-regulated |
Probe ID list |
|
Up-regulated |
Probe ID list |
Down-regulated |
Probe ID list |
|
Lu T, Pan Y, Kao SY, Li C, Kohane I, Chan J, and Yankner BA. "Gene regulation and DNA damage in the ageing human brain". Nature. 2004 Jun 24;429(6994):883-91. |
|||
|
Up-regulated |
Probe ID list |
Down-regulated |
Probe ID list |
|
34 genes are up-regulated in all three conditions above (ageing brain, incipient Alzheimer's, and overt Alzheimer's), and 10 genes are down-regulated in all three. The following are the results of analyzing these lists of commonly-regulated genes with L2L: |
|||
|
Up-regulated |
Probe ID list |
Down-regulated |
Probe ID list |
|
We also looked for commonly-regulated genes between these human brain studies and four experiments that examined various parts of the ageing mouse brain. The lists of differentially regulated genes were much smaller in the mouse studies, and only a handful of genes were found on more than one of the four lists. Only one of these genes was also found on the three human lists: ATPA2. The program used to search for commonalities between lists, findcommon.pl, is part of the L2L-Utilities collection. |
|||
|
| Common to human brain ageing | Common to mouse brain ageing | Common to all |
|---|---|---|
| Up-regulated | Up-regulated | Up-regulated |
| Down-regulated | Down-regulated | Down-regulated |
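
The search for genes common to several lists amounts to a set intersection. The following is a minimal Python sketch of that idea, not the actual findcommon.pl program; the function name `find_common` and the gene symbols are hypothetical placeholders rather than data from the studies above.

```python
def find_common(*gene_lists):
    """Return the sorted gene symbols that appear on every supplied list."""
    if not gene_lists:
        return []
    common = set(gene_lists[0])
    for genes in gene_lists[1:]:
        common &= set(genes)  # keep only symbols also present on this list
    return sorted(common)


# Placeholder symbols for illustration only (not real study data).
ageing_up = ["GENE_A", "GENE_B", "GENE_C"]
incipient_ad_up = ["GENE_B", "GENE_C", "GENE_D"]
overt_ad_up = ["GENE_C", "GENE_B", "GENE_E"]

common_up = find_common(ageing_up, incipient_ad_up, overt_ad_up)
```

Running the same function separately on the "up" and "down" lists gives the two sets of commonly-regulated genes for a group of conditions.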
|
Any two genes that appear on a list together have a potential positive co-expression relationship - that is, they were regulated in common by a particular stimulus. Two genes that appear on inverse lists (one on "up", one on "down" for the same condition) have a potential negative co-expression relationship. The Perl applications findposcoexp.pl and findnegcoexp.pl (part of the L2L-Utilities collection) mine the L2L MDB for such relationships. All relationships that appear at least twice in the database are contained in the files below:
Positive co-expression relationships (2,732,324 relationships, 13.9 MB GZip)
Negative co-expression relationships (591,931 relationships, 2.9 MB GZip)
The files are in the format:
#relationship GENE1 #gene1 GENE2 #gene2
#relationship: number of times the relationship appears in the database
GENE1: symbol of gene1
#gene1: number of times gene1 appears in the database
GENE2: symbol of gene2
#gene2: number of times gene2 appears in the database
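
As an illustration of the five-field format described above, here is a minimal Python sketch for reading the files. It assumes the fields are separated by whitespace and that the files contain no header line; neither detail is stated above, so adjust the parsing if the actual files differ. The names `CoexpRecord`, `parse_coexp_line`, and `read_coexp` are hypothetical, and the example line is made up.

```python
import gzip
from typing import Iterator, NamedTuple


class CoexpRecord(NamedTuple):
    relationship_count: int  # times the pair appears together in the database
    gene1: str               # symbol of gene1
    gene1_count: int         # times gene1 appears in the database
    gene2: str               # symbol of gene2
    gene2_count: int         # times gene2 appears in the database


def parse_coexp_line(line: str) -> CoexpRecord:
    """Parse one line, assuming five whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    rel, gene1, n1, gene2, n2 = fields
    return CoexpRecord(int(rel), gene1, int(n1), gene2, int(n2))


def read_coexp(path: str) -> Iterator[CoexpRecord]:
    """Yield records from a gzip-compressed relationship file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                yield parse_coexp_line(line)


# Made-up line for illustration only.
record = parse_coexp_line("3 GENE_A 12 GENE_B 7")
```

Because `read_coexp` yields one record at a time, the multi-million-line files can be filtered (for example, by a minimum `relationship_count`) without loading them fully into memory.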
L2L uses a relatively simple binomial approximation calculation to quantify the statistical significance of the overlaps between lists. We performed a variety of statistical controls to validate that this calculation serves as a reasonable metric for biological significance. As a sample data set, we used the list of probes down-regulated in diabetic nephropathy.
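
To make the idea of a binomial overlap p-value concrete, here is a minimal Python sketch. It is one plausible formulation, not necessarily the exact calculation L2L performs: it assumes each of the user's n probes independently lands on a database list with probability p (the fraction of the array's probes on that list), and reports the upper-tail probability of seeing at least k matches. The list size of 200 and the overlap of 20 are hypothetical values chosen for illustration.

```python
from math import exp, lgamma, log


def log_binom_pmf(i, n, p):
    """Log of the Binomial(n, p) probability mass at i, for 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p))


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    if p <= 0:
        return 0.0
    if p >= 1:
        return 1.0 if k <= n else 0.0
    return min(1.0, sum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1)))


# 513 user probes and an 11,877-probe universe are taken from the text;
# the database list size and the observed overlap are hypothetical.
n_user = 513
universe = 11877
list_size = 200
overlap = 20

p_value = binomial_upper_tail(overlap, n_user, list_size / universe)
```

Working with log probabilities avoids overflow in the binomial coefficients when the probe lists are large.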
We first tried using several of the potentially suitable statistical distributions, and found that they all produced similar results (Supplemental Table 1). We then applied one-step and step-down Bonferroni and Sidak adjustments to the binomial p-values (Supplemental Table 2), but the results from our subsequent simulation analysis suggested that these adjustments were too conservative.
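
For readers unfamiliar with these adjustments, the following Python sketch shows the standard textbook forms of the one-step and step-down Bonferroni and Sidak procedures (the step-down variants are often called Holm and Holm-Sidak). It is offered as an illustration of the general technique, not as the exact procedure behind Supplemental Table 2; the function names and the three input p-values are made up.

```python
def one_step_adjust(pvals):
    """One-step Bonferroni and Sidak adjustments for m tests."""
    m = len(pvals)
    bonferroni = [min(1.0, m * p) for p in pvals]
    sidak = [1.0 - (1.0 - p) ** m for p in pvals]
    return bonferroni, sidak


def step_down_adjust(pvals):
    """Step-down (Holm-style) Bonferroni and Sidak adjustments.

    P-values are visited from smallest to largest; the multiplier shrinks
    by one at each step, and a running maximum keeps the adjusted values
    monotone in the original ordering.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    bonferroni = [0.0] * m
    sidak = [0.0] * m
    running_b = running_s = 0.0
    for rank, idx in enumerate(order):
        remaining = m - rank  # number of hypotheses not yet stepped past
        running_b = max(running_b, min(1.0, remaining * pvals[idx]))
        running_s = max(running_s, 1.0 - (1.0 - pvals[idx]) ** remaining)
        bonferroni[idx] = running_b
        sidak[idx] = running_s
    return bonferroni, sidak


# Made-up p-values for illustration only.
raw = [0.01, 0.04, 0.03]
one_bonf, one_sidak = one_step_adjust(raw)
sd_bonf, sd_sidak = step_down_adjust(raw)
```

With hundreds of database lists tested at once, the multiplier m is large, which is why such adjustments can end up much stricter than a simulation-based estimate.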
We ran a 10,891-trial random-data simulation, using data sets the same size as our sample (513 probes) drawn randomly from the universe of the U95Av2 array (11,877 probes). We used the frequency of occurrence of p-values among these random data to generate adjusted p-values for the real data. We also used the random data to calculate false discovery rates. The results suggested that the original binomial p-value was a good approximation of such simulation-derived data (Supplemental Table 3). In fact, the binomial p-value may be a little conservative: a p-value of 0.05 never occurred with a frequency greater than 0.05 for any list among the random data, and for some lists the frequency was as low as 0.001 (Supplemental Table 4).
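
The random-data simulation can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the code used to produce the supplemental tables: probes are represented by integer indices, the database list is a hypothetical set of 200 probes, and the example runs 1,000 trials rather than 10,891 to keep it quick. For a single list, the binomial p-value falls as the overlap rises, so counting random trials whose overlap is at least the observed overlap is equivalent to counting trials whose p-value is at most the observed p-value.

```python
import random


def simulate_overlaps(universe_size, sample_size, list_members, trials, seed=0):
    """Overlap counts between a fixed list and random probe samples."""
    rng = random.Random(seed)
    members = set(list_members)
    universe = range(universe_size)  # probes stood in for by integer indices
    return [
        len(members.intersection(rng.sample(universe, sample_size)))
        for _ in range(trials)
    ]


def empirical_p(observed_overlap, null_overlaps):
    """Fraction of random trials with an overlap at least as large."""
    hits = sum(1 for o in null_overlaps if o >= observed_overlap)
    return hits / len(null_overlaps)


# 11,877 and 513 come from the text; the list and the overlap are hypothetical.
hypothetical_list = range(200)
null = simulate_overlaps(11877, 513, hypothetical_list, trials=1000)
adjusted = empirical_p(20, null)
```

Note that an empirical p-value from T trials cannot resolve values below 1/T, so the number of trials bounds how small an adjusted p-value the simulation can report.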
To quantify the robustness of conclusions generated by L2L, we ran a similar 10,891-trial permutation simulation using data sets that replaced 10% of the original probes with random ones (Supplemental Table 5). Finally, we tried running an analysis by translating the user's probes into gene symbols, and comparing these to the lists of genes in the database, instead of by translating the database lists into probe IDs, and comparing these to the user's list of probes. The results were broadly similar (Supplemental Table 6).
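
The 10% replacement step in the robustness analysis can be illustrated with a short Python sketch. It assumes that the probes in the original list are unique and that replacements are drawn from array probes not already on the original list; the text does not specify either detail, and the function name `perturb` is hypothetical.

```python
import random


def perturb(probes, universe, fraction=0.10, seed=0):
    """Replace a random fraction of probes with random probes from the universe.

    Assumes unique probes; replacements are drawn from probes that were
    not on the original list.
    """
    rng = random.Random(seed)
    probes = list(probes)
    n_swap = round(len(probes) * fraction)
    drop = set(rng.sample(range(len(probes)), n_swap))  # positions to discard
    kept = [p for i, p in enumerate(probes) if i not in drop]
    original = set(probes)
    candidates = [u for u in universe if u not in original]
    return kept + rng.sample(candidates, n_swap)


# Integer indices stand in for probe IDs: 513 probes on an 11,877-probe array.
original_list = list(range(513))
perturbed = perturb(original_list, range(11877))
```

Re-running the analysis on many such perturbed lists shows how often a given database list remains significant when part of the input is noise.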
Supplemental Table 1: Comparison of Binomial, Hypergeometric, and Poisson distributions in calculating p-values (124 KB, MS Excel)
Supplemental Table 2: P-value adjustment using Bonferroni and Sidak procedures (236 KB, MS Excel)
Supplemental Table 3: P-value adjustment using 10,891-trial random data simulation (208 KB, MS Excel)
Supplemental Table 4: Simulation-adjusted p-values for p = 0.05 (52 KB, MS Excel)
Supplemental Table 5: 10% data permutation analysis (164 KB, MS Excel)
Supplemental Table 6: Comparison by gene symbol instead of by probe ID translation (92 KB, MS Excel)


