L2L uses simple, tab-delimited formats for all of its files.
The three types of files it needs are data files, translator libraries, and list files.
Data files contain a user's own experimental data - the list of genes that were up- or down-regulated in a microarray experiment. Genes that were up-regulated and genes that were down-regulated should be put in separate files and analyzed separately. The file is simply a list of unique probe identifiers for the particular microarray system that was used, one identifier per line:
probeID1
probeID2
probeID3
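As a minimal sketch, a data file in this format could be read as follows. The function name `read_probe_ids` is hypothetical (it is not part of L2L), and skipping blank lines and dropping case-insensitive duplicates are assumptions about how you might want to clean your own data before uploading it.

```python
from typing import Iterable, List


def read_probe_ids(lines: Iterable[str]) -> List[str]:
    """Return the unique probe identifiers found in the lines of a data file."""
    seen = set()
    probe_ids = []
    for line in lines:
        probe_id = line.strip()
        if not probe_id:
            continue  # assumption: blank lines carry no data
        key = probe_id.lower()  # compare identifiers without regard to case
        if key not in seen:
            seen.add(key)
            probe_ids.append(probe_id)  # keep the first spelling encountered
    return probe_ids


# In-memory lines are used here; an open file object works the same way.
up_regulated = read_probe_ids(["probeID1\n", "probeID2\n", "PROBEID1\n", "\n", "probeID3\n"])
print(up_regulated)
```

Up-regulated and down-regulated genes would each go through this step separately, since they belong in separate files.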
Support for a variety of popular microarray systems is built into L2L. If your microarray system isn't among them, you can create your own translator library. The following table lists the supported microarray systems and a few sample probe identifiers. Note that probe identifiers for chip sets (U133 Set, U95 Set, etc.) include the chip ID. All identifiers are case-insensitive (e.g. 200007_at and 200007_AT are both fine).
Built-in microarray platforms and sample probe identifiers
| Human microarrays | | |
|---|---|---|
| Platform | Probe ID | Notes |
| Affy HG-U133 Plus 2.0 | 1552275_s_at 211759_x_at | Affy's current one-chip whole-genome expression array. |
| Affy HG-U133 Set | 202116_at_HG-U133A 244828_x_at_HG-U133B | Set of U133A and U133B; requires chip IDs. |
| Affy HG-U133A 2.0 | 203753_at 222209_s_at | Current, revised version of U133A. |
| Affy HG-U133A | 202116_at 219899_x_at | Chip from U133 Set that includes most known genes. |
| Affy HG-U95 Set | 36361_at_HG_U95Av2 63169_at_HG_U95C | Previous-generation comprehensive chip set, including U95Av2 and U95B-E; requires chip IDs. |
| Affy HG-U95Av2 | 152_f_at 32226_at | Chip from previous-generation set that includes most known genes. |
| Affy HG-Focus | 200018_at 205651_x_at | "Starter chip" that contains a subset of the known genes on U133A. |
| Affy Hu6800/HuGeneFL | AB002332_at M15465_s_at | Previous-generation "starter chip". |
| Affy Human Cancer G110 | 1531_at 2050_s_at | Custom array that includes 1700 genes implicated in cancer. |
| Affy Hu35K Set | RC_AA283044_AT_HU35KSUBA RC_N25923_F_AT_HU35KSUBD | Discontinued array set, including Hu35K subA, subB, subC and subD. Requires chip IDs. |
| Affy Hu35KA | AA019475_at RC_AA071075_at | Chip from Hu35K set that includes most known genes. |
| Agilent Whole Human Genome | A_24_P417162 A_23_P414312 | Agilent's current one-chip whole-genome expression array. |
| Agilent H1 Set | A_23_P123587_1Av2 A_32_P449722_1B | Previous-generation comprehensive chip set, including H1Av2 and H1B; requires chip IDs. |
| Agilent H1A | A_23_P12628 A_23_P216610 | Chip from previous-generation set that includes most known genes. |
| Illumina Human-6 BeadChip | GI_10092672-S GI_31415879-I | Whole-genome bead chips for analyzing 6 samples simultaneously. |
| Illumina HumanRef-8 BeadChip | GI_18765751-I GI_42794764-A | Bead chips that permit analysis of 8 samples for characterized RefSeq transcripts. |
| NIH/NIA 15k Human | 1 36382 | 15k cDNA array from the NIA's Gene Expression and Genomic Unit; probe ID is "14k index" column of annotation file. |
| JHU/NIH MGC1 Human | 1-R3C5 384-R4C2 | cDNA array from the NIA/JHU Microarray Facility; probe ID is grid coordinate of feature. |
| All HUGO names | ACTB LMAN1 | Default translator library if your array is not supported by L2L. |
| Mouse microarrays | | |
| Platform | Probe ID | Notes |
| Affy Mouse 430 2.0 | 1436006_at 1448142_x_at | Affy's current one-chip whole-genome expression array. |
| Affy MOE 430 Set | 1415691_at_MOE430A 1444190_at_MOE430B | Set of 430A and 430B; requires chip IDs. |
| Affy Mouse 430A 2.0 | 1422256_at 1439185_x_at | Current, revised version of 430A. |
| Affy MOE 430A | 1415689_s_at 1419249_at | Chip from 430 Set that includes most known genes. |
| Affy MG_U74v2 Set | 162478_r_at_MG_U74Av2 136558_at_MG_U74Cv2 | Previous-generation comprehensive chip set, including U74(A,B,C)v2; requires chip IDs. |
| Affy MG_U74Av2 | 100015_at 160660_r_at | Chip from U74 set that includes most known genes. |
| Affy Mu11k Set | l25913_f_at_Mu11ksubA Msa.3206.0_s_at_Mu11ksubB | Early chip set, including Mu11ksubA and Mu11ksubB; requires chip IDs. |
| Affy Mu11k SubA | aa000380_s_at U35142_f_at | Chip from Mu11k set that includes most known genes. |
| Agilent Whole Mouse Genome | A_51_P438924 A_51_P477917 | Agilent's current one-chip whole-genome expression array. |
| Agilent Mouse v2 | A_51_P371972 A_51_P220343 | Previous-generation Agilent array. |
| Agilent Mouse Development | A_66_P100091 A_66_P137744 | Agilent array derived from NIA Mouse Gene Index, optimized for stem cell and developmental studies. |
| Illumina Mouse-6 BeadChip | GI_38074788-S RI\|0610030G03\|R000004K23\|AK002703\|711-S | Whole-genome bead chips for analyzing 6 samples simultaneously. |
| Illumina MouseRef-8 BeadChip | GI_6671508-S SCL40201.4.1_211-S | Bead chips that permit analysis of 8 samples for characterized RefSeq transcripts. |
| NIH/NIA Mouse 15k | H3015B04 H3073F11 | |
| NIH/NIA Mouse 7.4k | H4001C04 H4048B10 | |
| JHU/NIA M17Kam Set | 2-R1C3-A 381-R1C1-B | Based on NIH/NIA mouse 15k clone set; probe ID is grid coordinate of feature. |
| Rat microarrays | | |
| Platform | Probe ID | Notes |
| Affy Rat 230 2.0 | 1367474_at 1387360_at | Affy's current one-chip whole-genome expression array. |
| Affy RAE 230 Set | 1367466_at_RAE230A 1392885_at_RAE230B | Set of 230A and 230B; requires chip IDs. |
| Affy RAE 230A | 1367468_at 1373932_at | Chip from 230 Set that includes most known genes. |
| Affy HT Rat Focus | 1390315_a_at 1369905_at | High-throughput array plate that contains probes for 16,000 of the best-characterized genes on the Rat 230 2.0 array. |
| Affy RG_U34 Set | AA685112_at_RG_U34A rc_AA819798_g_at_RG_U34C | Previous-generation comprehensive chip set, including U34(A,B,C); requires chip IDs. |
| Affy RG_U34A | L23148_at rc_AI234969_s_at | Chip from U34 set that includes most known genes. |
| Agilent Whole Rat Genome | A_44_P1002173 A_44_P308858 | Agilent's current one-chip whole-genome expression array. |
| Agilent Rat v2 | A_42_P745958 A_43_P11226 | Previous-generation Agilent array. |
| Primate microarrays | | |
| Platform | Probe ID | Notes |
| Affy Rhesus macaque | MmugDNA.3443.1.S1_at MmugDNA.16349.1.S1_at | |
| Agilent Rhesus macaque | A_23_P202540 KRM_1_04524 | |
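For chip sets, the sample identifiers in the table appear to be the ordinary probe identifier followed by the chip ID. The sketch below shows one way to build such identifiers when your data comes from the individual chips of a set. The helper `add_chip_id` is hypothetical, and the underscore separator is an assumption inferred from the Affymetrix and Agilent samples above; the JHU/NIA M17Kam samples use a hyphen instead, so check the samples for your platform.

```python
def add_chip_id(probe_id: str, chip_id: str, separator: str = "_") -> str:
    """Append a chip ID to a probe identifier, as chip-set platforms require."""
    return f"{probe_id.strip()}{separator}{chip_id.strip()}"


# Probe 202116_at measured on the U133A chip of the HG-U133 Set.
print(add_chip_id("202116_at", "HG-U133A"))  # 202116_at_HG-U133A

# A hyphen separator, matching the JHU/NIA M17Kam sample identifiers.
print(add_chip_id("2-R1C3", "A", separator="-"))  # 2-R1C3-A
```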
"All HUGO Names" is intended to be used for gene annotation - you can put a few genes you want to annotate in your data file, use this translator library, and see which L2L Microarray Database lists your genes of interest are found on. "All HUGO Names" can also be a used as a default "microarray system" if your microarray is not represented and you do not want to create a translator library for it. However, L2L's statistical analysis relies on knowing how many genes are actually on your microarray, and how many of those were changed in your experiment. Therefore, you should not put much faith in any P values or fold-enrichment numbers if you use "All HUGO Names". It is also very easy to create a new translator library (see below), so we highly recommend you do this if L2L doesn't include a translator library for your microarray system.
A translator library allows L2L to translate gene names to microarray probe identifiers and back. It is a tab-delimited file with a paired probe identifier and HUGO name on each line:
#L2L library
#NAME Affy-SomeArray_v1
#RELEASE 2007.1
probeID1 XYZ1
probeID2 ABCD1
probeID3 HUJA6
The annotations at the top of the file are optional; they are not used by L2L. The probe identifiers can be anything, as long as they match the probe identifiers you use for your data. Try to avoid special characters, however. The web interface of L2L will warn you if your uploaded translator library has improper characters in it (this is a security measure). Gene names must be official HUGO gene symbols in order for L2L's gene annotation functions to work (linking to EntrezGene, for example).
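The translator library format can be read with a few lines of code. This is only a sketch under stated assumptions, not L2L's own parser: the function name `parse_translator_library` is hypothetical, lines starting with "#" are treated as annotations and skipped, and probe identifiers are lower-cased for lookup because L2L treats them as case-insensitive.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def parse_translator_library(
    lines: Iterable[str],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Build probe -> gene and gene -> probes lookups from a translator library."""
    probe_to_gene: Dict[str, str] = {}
    gene_to_probes: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and optional annotations
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected 'probe<TAB>gene', got: {line!r}")
        probe_id, gene = fields[0].strip(), fields[1].strip()
        probe_to_gene[probe_id.lower()] = gene  # case-insensitive lookup key
        gene_to_probes[gene].append(probe_id)  # several probes may share a gene
    return probe_to_gene, dict(gene_to_probes)


library = [
    "#L2L library\n",
    "#NAME Affy-SomeArray_v1\n",
    "probeID1\tXYZ1\n",
    "probeID2\tABCD1\n",
]
probe_to_gene, gene_to_probes = parse_translator_library(library)
print(probe_to_gene["probeid1"])
print(gene_to_probes["ABCD1"])
```

Keeping both directions of the mapping mirrors the stated purpose of the library: translating gene names to probe identifiers and back.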
Each list in the database is a file with a few annotations at the top, followed by the HUGO gene symbols of all the genes on that list, one per line:
#L2L listfile
#NAME brca1_overexp_up
#REFERENCE 12032322
#DESCRIPTION Upregulated by induction of BRCA1 in EcR-293 cells
#KEYWORDS cancer
#PLATFORM HuGeneFL
#RELEASE 2007.1
#ORGANISM human
#REPOSITORY
FSTL1
GALNT3
SEC10L1
HTATIP...
The first line tells the L2L application that this is, indeed, a list. This line should be the same for all list files. The second line is a short, informative name for the list (the same as the file name). It should contain only alphanumeric characters and underscores. The third line is a reference to the source of the list. For L2L Microarray Database lists, this is the PubMed ID of the source publication. The fourth line is a description of the list. It can be as long as necessary, and can include any character except tabs.
The fifth line can contain one of a number of keywords for browsing the database and (in a future revision of L2L) restricting searches to particular topics. The sixth line describes the platform (microarray or otherwise) that was used to generate the data encompassed by the list. The seventh line is the release version of the list; this is for the user's reference only and is not used by the L2L program, so it can be omitted. The eighth line is the organism from which the data was generated. The ninth line contains a repository accession number, if this publication was linked to any data in Gene Expression Omnibus or ArrayExpress. All other lines in the file contain one of the genes on the list. Note that the order and position of the lines is not critical; L2L identifies annotations by the "#XXX" designators.
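Because annotations are identified by their "#XXX" designators rather than by position, a list file can be read in a single pass. The sketch below illustrates that idea; `parse_list_file` is a hypothetical name, and splitting the designator from its value on the first run of whitespace is an assumption about the delimiter.

```python
from typing import Dict, Iterable, List, Tuple


def parse_list_file(lines: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split a list file into its '#' annotations and its gene symbols."""
    annotations: Dict[str, str] = {}
    genes: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line[1:].split(None, 1)  # designator, then optional value
            if parts:
                annotations[parts[0]] = parts[1] if len(parts) > 1 else ""
        else:
            genes.append(line)  # every other line is one gene symbol
    return annotations, genes


example = [
    "#L2L listfile",
    "#NAME brca1_overexp_up",
    "#REFERENCE 12032322",
    "#ORGANISM human",
    "#REPOSITORY",
    "FSTL1",
    "GALNT3",
]
annotations, genes = parse_list_file(example)
print(annotations["NAME"], annotations["REFERENCE"], genes)
```

An annotation with no value, such as the empty "#REPOSITORY" line, is stored as an empty string.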
Beginning with release 2006.2 (L2L v1.1), the L2L command-line tool can use a single-file database format to speed high-throughput batch processing. This format is simply a concatenation of all list files into a single text file, with a line containing "##" at the end of each list of genes.
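The single-file format can be sketched in a few lines: concatenate the list files, writing a "##" line after each one, and split on those lines to recover the individual lists. The function names and the two tiny example lists (`list_a`, `list_b`) are hypothetical and for illustration only.

```python
from typing import Iterable, Iterator, List


def concatenate_lists(list_files: Iterable[str]) -> str:
    """Join the text of several list files, ending each with a '##' line."""
    return "".join(text.rstrip("\n") + "\n##\n" for text in list_files)


def split_database(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the lines of each list in a single-file database."""
    current: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.strip() == "##":
            if current:
                yield current  # '##' closes the list collected so far
            current = []
        elif line.strip():
            current.append(line)
    if current:
        yield current  # tolerate a final list without a closing '##'


list_a = "#L2L listfile\n#NAME list_a\nFSTL1\nGALNT3\n"
list_b = "#L2L listfile\n#NAME list_b\nACTB\n"
database = concatenate_lists([list_a, list_b])
records = list(split_database(database.splitlines()))
print(len(records))
```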
L2L provides a variety of annotations and hyperlinks to external references in its output HTML files. The data needed to generate these links are contained in a text file, "listsets.txt", with a format similar to the single-file databases:
#L2L listset
#NAME l2lmdb
#REFERENCE http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=XURLX
#REFDESC PubMed Abstract
#DESCRIPTION L2L MDB
##
"NAME" is the shorthand name to be used in file and folder names. "REFERENCE" is a template for hyperlinking to the data source for a list in this database; "XURLX" will be replaced with the "REFERENCE" annotation from the individual list. "REFDESC" is a description of the hyperlink destination for display in output web pages. "DESCRIPTION" is also for display in output web pages (a longhand version of "NAME"). "##" marks the end of one entry in the file.

