L2L is a database of published microarray gene expression data, and a software tool for comparing that published data to a user's own microarray results. It is very simple to use: all you need is a web browser and a list of the probes that went up or down in your experiment. Yet it is also powerful and extensible for those who need it to be.
Gene expression microarrays have become perhaps the most popular contemporary tool for hypothesis generation. The more intractable the biological problem, the more tempting microarrays become. Yet interpreting the mountain of data that a microarray experiment produces can be a frustrating chore. The inevitable outcome of every such experiment is a list of genes, or many such lists: genes that are induced or repressed under one condition or another, at one time point or another, in one cluster or another. The daunting task is to extract real biological meaning from these lists.
L2L finds true biological patterns in gene expression data by systematically comparing your own list of genes to lists of genes that have been experimentally shown to be co-expressed in response to a particular stimulus; in other words, to published lists of microarray results. The patterns it finds can point to the underlying disease process or affected molecular function that actually generated the observed changes in gene expression. Its insights are far more systematic than "critical gene" analyses, and more biologically relevant than pure Gene Ontology-based analyses.
The development of L2L in 2003 was inspired by our efforts to extract meaning from our own microarray analysis of the progeroid Cockayne syndrome, so the publications initially included in the L2L MDB reflected topics thought to be related to Cockayne syndrome: ageing, cancer, and DNA damage. Since then, the scope of the included publications has expanded considerably to cover chromatin structure, immune and inflammatory mediators, the hypoxic response, adipogenesis, growth factors, hormones, cell cycle regulators, and others. Despite the parochial origins of the database, the wide range of topics now covered should make L2L of general interest to any investigator using microarrays to study human biology.
The database is annotated with a variety of keywords, primarily to make it easier to browse. The keywords also give some idea of the topics covered:
Keyword Number of lists
Each list in the database represents a group of genes, identified by HUGO symbols, that were either up-regulated or down-regulated under some experimental condition. All data are derived from published works, specifically from the lists of genes that the authors found to be "significantly" changed in expression according to their own criteria. To create the lists, we downloaded each publication in electronic form, along with any supplemental data. We extracted the complete list of significant expression changes for each condition studied, and converted whatever gene identifiers were used to HUGO symbols. Any genes that could not be uniquely mapped to a HUGO symbol were ignored. We then annotated each list with a short but meaningful file name, a longer description, and various other information. Our Genome Biology manuscript provides a more detailed description of the process, and the rationale behind it.
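The rule that only uniquely mappable identifiers are kept can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used to build the database. The function name, the identifier-to-symbols dictionary, and the example identifiers are hypothetical, and keeping each repeated symbol only once is our assumption about how a list is stored.

```python
def to_hugo_symbols(identifiers, id_to_symbols):
    """Convert a published list of gene identifiers to HUGO symbols.

    id_to_symbols maps each identifier (accession, probe ID, ...) to the
    set of HUGO symbols it is annotated with (hypothetical input form).
    Identifiers with no symbol, or with more than one, are ignored.
    Repeated symbols are kept once (an assumption).
    """
    symbols = []
    seen = set()
    for identifier in identifiers:
        candidates = id_to_symbols.get(identifier, set())
        if len(candidates) != 1:
            continue  # unmapped or ambiguous: ignore this identifier
        symbol = next(iter(candidates))
        if symbol not in seen:
            seen.add(symbol)
            symbols.append(symbol)
    return symbols


# Toy mapping; "id3" is ambiguous and "id4" is unmapped.
mapping = {"id1": {"TP53"}, "id2": {"TP53"}, "id3": {"GENE1", "GENE2"}}
print(to_hugo_symbols(["id1", "id2", "id3", "id4"], mapping))
# ['TP53']
```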
The lists of gene symbols that make up the L2L database require periodic updating to keep up with current gene annotations. Every six to twelve months we refresh the database from the original published data, whatever form that took: UniGene accessions, probe IDs, GenBank accessions, etc. We essentially re-create the database from scratch, using the latest gene annotations from NCBI Gene, UniGene, ENSEMBL, and microarray manufacturers. Similarly, we refresh the translator libraries, using the latest microarray annotation files from the manufacturers. Our first such update took place in January 2006.
Data from mouse studies are initially treated exactly like human data: we extracted lists of expression changes from published works and converted them to official mouse gene symbols, ignoring any data that could not be matched unambiguously to a gene symbol. The mouse gene symbols were then mapped to their human homologs. This is done by a Perl application we created for the purpose, which first looks for a human homolog in the NIH HomoloGene database and, if none is found, looks for a human gene with a symbol identical to the mouse symbol. The second step captures many real homologs that have not yet been entered into HomoloGene, but may occasionally produce false matches. Any mouse genes that could not be mapped to a unique human gene by either step were ignored. The resulting list of HUGO symbols is then used as an L2L database list.
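The two-step homolog lookup can be sketched in Python as below. This is not the Perl application itself. The function name and the dictionary standing in for a parsed HomoloGene table are hypothetical, and we assume that "identical symbol" means a case-insensitive match, since mouse symbols are conventionally capitalized only on the first letter (Cdkn1a) while human symbols are all upper case (CDKN1A).

```python
def mouse_to_human(mouse_symbols, homologene, human_symbols):
    """Map mouse gene symbols to human HUGO symbols in two steps.

    homologene:    mouse symbol -> set of human homolog symbols
                   (hypothetical pre-parsed HomoloGene table)
    human_symbols: set of all current human gene symbols
    Mouse genes without exactly one human match are ignored.
    """
    human = []
    for mouse in mouse_symbols:
        # Step 1: look for a homolog recorded in HomoloGene.
        hits = homologene.get(mouse, set())
        if not hits:
            # Step 2: fall back to a human gene with the same symbol,
            # compared case-insensitively (assumption).
            candidate = mouse.upper()
            hits = {candidate} if candidate in human_symbols else set()
        if len(hits) == 1:
            human.append(next(iter(hits)))
    return human


# Toy inputs for illustration only.
homologene = {"Trp53": {"TP53"}}
human_symbols = {"TP53", "CDKN1A", "MYC"}
print(mouse_to_human(["Trp53", "Cdkn1a", "Fakegene1"], homologene, human_symbols))
# ['TP53', 'CDKN1A']
```

Here Trp53 is resolved by the HomoloGene step, Cdkn1a by the symbol fallback, and the hypothetical Fakegene1 by neither, so it is dropped.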
The program we use to perform these interspecies conversions is now open to the public as a web-based tool: MammalHom.
The L2L application does the heavy lifting of data analysis in L2L. This program receives user input from the web interface and performs the actual data processing tasks, along with the creation of the output HTML pages. The program requires three inputs: the data to be analyzed, in the form of a list of microarray probe identifiers; a translator library that pairs each probe on the microarray with its corresponding HUGO gene name; and a folder of lists to which the data will be compared.
The program works sequentially through all the lists, first using the translator to map each gene name in the list to all the probes on the microarray that represent that gene (Figure 3a of the L2L manuscript). Thus, a given gene on a list may be represented by several microarray probes, or none at all. This name-to-probe translation - the reverse of the process by which the L2L MDB lists were originally generated - allows L2L to retain the greatest possible amount of the user's data, by performing comparisons based on the probe IDs of the user's microarray, rather than the gene names those probes represent. This helps to compensate for the loss of this probe ID information when the lists were compiled.
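The name-to-probe step can be sketched in Python as follows, assuming the translator library has already been read into a dictionary from probe ID to HUGO symbol. The function names and probe IDs are hypothetical. Inverting the translator once lets each gene on a list be looked up directly, and a gene may yield several probes or none.

```python
from collections import defaultdict


def invert_translator(probe_to_symbol):
    """Turn a probe -> symbol translator into a symbol -> probes index."""
    symbol_to_probes = defaultdict(set)
    for probe, symbol in probe_to_symbol.items():
        symbol_to_probes[symbol].add(probe)
    return dict(symbol_to_probes)


def list_to_probes(gene_list, symbol_to_probes):
    """Translate a list of gene symbols into the probes that represent them."""
    probes = set()
    for symbol in gene_list:
        # A gene may be represented by several probes, or by none at all.
        probes |= symbol_to_probes.get(symbol, set())
    return probes


translator = {"1001_at": "TP53", "1002_s_at": "TP53", "2001_at": "MYC"}
index = invert_translator(translator)
print(sorted(list_to_probes(["TP53", "BRCA1"], index)))
# ['1001_at', '1002_s_at']
```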
Each of these translated probe IDs is then queried against the data. The program records the number of probes from the list that match the data, as well as the total number of probes on the microarray that represent the gene names on the list (Figure 3b). It also determines the fraction of probes on the microarray that are found in the data. From these three numbers, the program first calculates the number of expected matches for that list, then the relative enrichment of actual matches, and finally a binomial probability for the relative enrichment. The results are logged, and written to a raw output file. In addition, for each list, the program records the IDs of all the probes from the data which matched to that list. Similarly, for each probe in the data, the program records the names of all the lists on which it was found. All of this information is then used to create the output HTML pages (Figure 3c).
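The path from those three numbers to an expected count, an enrichment, and a binomial probability can be sketched in Python (3.8+ for `math.comb`). We assume a one-sided (upper-tail) probability: the chance of seeing at least the observed number of matches if each of the list's n probes were independently present in the data with probability f, the data's fraction of the array. So expected = n·f, enrichment = k / expected, and p = Σ from i=k to n of C(n, i) f^i (1 − f)^(n−i). The function name and arguments are hypothetical, and L2L's own implementation may differ in details such as how it treats under-representation.

```python
from math import comb


def list_enrichment(n_list_probes, n_matches, n_data_probes, n_array_probes):
    """Return (expected matches, enrichment, upper-tail binomial p) for one list.

    n_list_probes:  probes on the array representing the list's genes (n)
    n_matches:      of those, how many appear in the user's data (k)
    n_data_probes:  probes in the user's data
    n_array_probes: all probes on the array
    """
    f = n_data_probes / n_array_probes  # fraction of the array found in the data
    expected = n_list_probes * f        # matches expected by chance
    enrichment = n_matches / expected if expected > 0 else float("nan")
    # P(X >= k) for X ~ Binomial(n, f)
    p = sum(
        comb(n_list_probes, i) * f**i * (1 - f) ** (n_list_probes - i)
        for i in range(n_matches, n_list_probes + 1)
    )
    return expected, enrichment, p


expected, enrichment, p = list_enrichment(10, 3, 100, 1000)
print(expected, enrichment, round(p, 4))
# 1.0 3.0 0.0702
```

In this toy case, 3 matches against 1 expected is a threefold enrichment, and a result that extreme or more would arise by chance about 7% of the time.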
Translator libraries are plain-text files that pair every probe ID on a microarray platform that represents a named gene with the HUGO symbol of that gene. The most basic annotation for each probe ID, usually a GenBank accession number or "representative public ID", is extracted from the manufacturer's annotation files. These accessions are then converted to the appropriate HUGO symbols using the latest release of the NCBI databases. A list of all the supported platforms, with links to the manufacturers' product sites, can be found on the file format page.
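Building a translator library amounts to joining two tables: probe ID to accession (from the manufacturer's annotation file) and accession to HUGO symbol (from NCBI). The Python sketch below assumes both tables are already loaded into dictionaries, and that the output is one tab-separated probe/symbol pair per line. That layout, the function name, and the example IDs are hypothetical. The real file layout may differ.

```python
def build_translator(probe_to_accession, accession_to_symbol):
    """Join probe -> accession with accession -> HUGO symbol.

    Returns the translator as text, one "probe<TAB>symbol" line per probe
    (an assumed layout). Probes whose accession maps to no symbol are
    omitted, because the library only covers probes for named genes.
    """
    lines = []
    for probe, accession in sorted(probe_to_accession.items()):
        symbol = accession_to_symbol.get(accession)
        if symbol:
            lines.append(f"{probe}\t{symbol}\n")
    return "".join(lines)


probes = {"p2": "ACC2", "p1": "ACC1", "p3": "ACC3"}
symbols = {"ACC1": "TP53", "ACC2": "MYC"}
print(build_translator(probes, symbols), end="")
```

Probe p3 is left out because its accession has no symbol in the toy NCBI table.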
Associations between human Gene IDs and GO terms were extracted from the current release of the NIH's NCBI Gene database. Annotations for each term (term name, term description, etc.) were extracted from Gene Ontology's current database. We used a custom Perl application to find all of the Gene IDs directly associated with each particular GO term, as well as the Gene IDs associated with any descendants of that term. The Gene IDs were translated to gene symbols, and an L2L list file was created for any GO term with at least five gene symbols. The list files were named with the GO terms' accession numbers, and sorted into the three major GO categories: Biological Process, Molecular Function, and Cellular Component. A more detailed description of the process can be found in the GO-L2L README file.
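Propagating annotations up from a GO term's descendants and then applying the five-symbol cutoff can be sketched in Python as below. This is not the custom Perl application. The function name, the representation of the GO graph as a term-to-children dictionary, and the toy terms are assumptions. Each term's gene set is memoized because GO is a directed acyclic graph in which a term can have several parents, so shared branches would otherwise be walked repeatedly.

```python
def go_lists(children, direct_genes, gene_symbol, min_size=5):
    """Build one gene-symbol list per GO term, including descendants' genes.

    children:     term -> iterable of child terms
    direct_genes: term -> set of Gene IDs annotated directly to it
    gene_symbol:  Gene ID -> gene symbol
    Only terms with at least min_size distinct symbols are returned.
    """
    memo = {}

    def genes_under(term):
        # Genes annotated to this term or to any of its descendants.
        if term not in memo:
            genes = set(direct_genes.get(term, ()))
            for child in children.get(term, ()):
                genes |= genes_under(child)
            memo[term] = genes
        return memo[term]

    lists = {}
    for term in set(children) | set(direct_genes):
        symbols = {gene_symbol[g] for g in genes_under(term) if g in gene_symbol}
        if len(symbols) >= min_size:
            lists[term] = sorted(symbols)
    return lists


# Toy ontology: "root" has children "a" and "b"; "a" has child "c".
children = {"root": ["a", "b"], "a": ["c"]}
direct = {"a": {1, 2}, "b": {3}, "c": {4, 5, 6}}
symbol = {i: f"G{i}" for i in range(1, 7)}
lists = go_lists(children, direct, symbol)
print(sorted(lists))
# ['a', 'root']
```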
The lists of Predicted Human microRNA Targets are derived from data supplied by microrna.org, a project of the Computational Biology Center at Memorial Sloan-Kettering Cancer Center. The computational tool used for site prediction, miRanda, and an initial compilation of target sites on human genes were published in PLoS Biology. L2L uses the most recent list of predicted targets on microrna.org, dated January 21, 2005.
There is one L2L list for each miRNA. The list contains all of the genes on which a binding site for that miRNA is predicted by miRanda. L2L can therefore mine your microarray data for significant over-representation of genes with binding sites for a particular miRNA. L2L links each list to the miRNA's web page at microrna.org, where you can view its target sequence. Overlaps of an miRNA's targets with your data might suggest that that particular miRNA is more (or less) active in the condition you are studying, and is responsible for some of the gene expression changes you see.
The lists of Cancer Gene Expression Modules are derived from Eran Segal's Module Map, currently hosted by the Stanford AI Lab. The group analyzed 1,975 microarrays spanning 22 tumor types and identified discrete gene expression modules: sets of genes that act in concert to carry out specific functions. The module map was published in Nature Genetics.
There is one L2L list for each module, containing all of the known genes in that module. The modules vary greatly in size and heterogeneity. Each module's list is linked to its web page at the Module Map, where you can view more detailed information about its composition and possible function.
The lists of protein-protein interactions are derived from the interaction database of the international Reactome consortium. There is one L2L list for each gene in the Reactome database; each list includes all of the genes that have a known interaction with that given gene. Reactome defines an interaction as two proteins that "occur in the same complex or occur in the same reaction".
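Because Reactome counts two proteins as interacting when they share a complex or a reaction, the per-gene lists can be built by pairing up the members of each complex or reaction. The Python sketch below assumes the Reactome data has already been reduced to one set of gene symbols per complex or reaction. That input form, the function name, and the placeholder genes A to D are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def interaction_lists(groups):
    """Build one partner list per gene from complex/reaction memberships.

    groups: iterable of gene-symbol collections, one per complex or
    reaction. Any two distinct genes in the same group are treated as
    interacting, and the relation is symmetric.
    """
    partners = defaultdict(set)
    for group in groups:
        # combinations() over distinct members never pairs a gene with itself
        for a, b in combinations(sorted(set(group)), 2):
            partners[a].add(b)
            partners[b].add(a)
    return {gene: sorted(others) for gene, others in partners.items()}


result = interaction_lists([{"A", "B"}, {"A", "C", "D"}])
print(result["A"], result["C"])
# ['B', 'C', 'D'] ['A', 'D']
```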
The colored boxes that appear on the List match and Gene match results pages help you see, at a glance, the general functions of a gene or group of genes. The functions are derived from Gene Ontology Biological Process categories, but are basically arbitrary: they represent our best effort at functions that are specific enough to be useful, yet broad enough that only a handful (nine) can represent the entire spectrum of cellular function. The nine functions, and the GO categories that make them up, are:
|Growth and Death|
|Stress and Immunity (X)|
GO:0019538 Protein metabolism
|Cell Adhesion and Signalling|
GO:0007165 Signal transduction
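Assigning a gene to the colored boxes amounts to checking whether any of its GO Biological Process annotations falls among the terms that make up each function. The Python sketch below assumes each gene's annotations have already been expanded to include ancestor terms. The function name, category names, and term IDs in the example are placeholders rather than L2L's actual categories.

```python
def function_boxes(gene_terms, category_terms):
    """Return, for each gene, the sorted list of functional categories it falls in.

    gene_terms:     gene symbol -> set of GO Biological Process term IDs,
                    assumed already expanded to include ancestor terms
    category_terms: category name -> set of GO term IDs that make it up
    """
    return {
        gene: sorted(name for name, ids in category_terms.items() if terms & ids)
        for gene, terms in gene_terms.items()
    }


categories = {"Category A": {"term1"}, "Category B": {"term2", "term3"}}
genes = {"G1": {"term1", "term3"}, "G2": {"term2"}, "G3": set()}
print(function_boxes(genes, categories))
# {'G1': ['Category A', 'Category B'], 'G2': ['Category B'], 'G3': []}
```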
Yes. The data files you upload for analysis, and any analysis results, are not downloaded or examined in any way by the administrators of L2L, unless required for system maintenance and troubleshooting. All files are deleted from the server after no more than 24 hours, and no archives or backups are kept. For this reason, though, you will usually want to download your results as an archive (link at the bottom of every Results Summary page) immediately after performing an analysis.
You don't need to download L2L to use it to analyze your microarray data. There is an easy-to-use web-based analysis tool, and you have the option of downloading your results so you can view them at any time on your own computer, using any web browser.
However, the entire L2L project, and all of its components, can be downloaded from the download page. Why might you want to download all or part of L2L?
- Run L2L on your own web server
- Use the database lists in another analysis program
- Do batch analysis with the command-line version of L2L
- Use your own set of lists (if so, please consider contributing your efforts!)
- Examine the source code of L2L and see exactly how it works
Please note that the entire L2L project, including the database lists, is released as Free Software under the GNU General Public License. If you wish to include the database lists for distribution in another GPL-compatible analysis program, we ask that you please acknowledge L2L in a place obviously visible to users, and cite us in any manuscript describing such a tool. Please contact us if you have any questions.
Using the web-based analysis tool requires only a modern web browser. If you have trouble viewing the website or the results pages in Internet Explorer, try updating to a recent version of Firefox (for Windows, Linux, and Mac OS X) or Safari (for Mac OS X and Windows).
Downloading the L2L Application and running it locally, or hosting the entire L2L website on your own web server, requires a computer with a UNIX-like command shell and Perl 5 installed. Macintosh OS X 10.2 and above come with both installed by default; L2L can be run with no modification. Any installation of Linux includes a shell, and if your distribution did not install Perl 5 by default, you can download it from CPAN. Cygwin provides a UNIX-like shell environment for Windows, and offers Perl as an installable package. A popular stand-alone Perl package is ActivePerl. Disclaimer: we cannot officially recommend or provide any support for any of these system modifications; if you choose to install any of them, you do so at your own risk. If any programmers are interested in helping to create GUI ports of L2L, we would be delighted to collaborate.
A manuscript published in Genome Biology in 2005 describes L2L in detail and provides examples of using L2L to extract new biological insights from existing microarray data. Please cite this paper when reporting L2L results in the literature. The data used in the sample analyses, and the complete results of those analyses, can be found on the Supplemental Data page.