ResultsSharingFormat

Revision as of 18:08, 14 November 2012 by ZalokarNewton (talk | contribs)

Results Sharing

The variables listed under File Formats below should be included when sharing imputed results for meta-analysis. Large files can be shared among small groups via a secure file transfer site (as described in Results Sharing). Many working groups use ShareSpaces, a secure web-based file-sharing system implemented by the University of Washington's Catalyst computing group. The service has ample storage space for large files and limits access to a select group identified by UW NetIDs, ProtectNetwork IDs, or Google Account IDs.


ShareSpaces access is arranged via working groups. New members who expect to need access to these sites should either register online for a ProtectNetwork ID (step-by-step instructions are available) or create a new Google Account (more information is available).

File Formats

Results should be shared as plain text files, with the following variable names:

variable name     description
SNPID             SNP ID as rs number
chr               chromosome number. Use symbols X, XY, Y and mt for non-autosomal markers.
position          physical position on the reference sequence (indicate build 35/36 in the README file)
coded_all         coded allele, also called modeled allele (for example, for an A/G SNP coded AA=0, AG=1 and GG=2, the coded allele is G)
noncoded_all      the other allele
strand_genome     + or -, for the positive/forward or the negative/reverse strand of the human genome reference sequence; clarifies which strand coded_all and noncoded_all are on
beta              beta estimate from genotype-phenotype association, to at least 5 decimal places -- “NA” if not available
SE                standard error of beta estimate, to at least 5 decimal places -- “NA” if not available
pval              p-value of test statistic, included only as a double check -- “NA” if not available
AF_coded_all      allele frequency for the coded allele -- “NA” if not available
HWE_pval          exact-test Hardy-Weinberg equilibrium p-value -- directly typed SNPs only; NA for imputed SNPs
callrate          genotyping call rate after exclusions
n_total           total number of samples with phenotype and genotype for the SNP
imputed           1/0 coding; 1=imputed SNP, 0=directly typed SNP
used_for_imp      1/0 coding; 1=used for imputation, 0=not used for imputation
oevar_imp         observed divided by expected variance for imputed allele dosage
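
To make the oevar_imp column concrete: one common way to form an observed/expected variance ratio for an imputed SNP is to divide the sample variance of the allele dosages by the variance 2p(1-p) expected under Hardy-Weinberg equilibrium, where p is the coded allele frequency estimated from the dosages. The sketch below assumes that definition. Imputation packages differ in detail, so the value reported by your own software should take precedence; the function name is hypothetical.

```python
from statistics import mean, pvariance


def oevar_imp(dosages):
    """Observed / expected variance of coded-allele dosages (each in [0, 2])."""
    p = mean(dosages) / 2.0          # estimated coded allele frequency
    expected = 2.0 * p * (1.0 - p)   # dosage variance expected under HWE
    if expected == 0.0:
        return None                  # monomorphic SNP: ratio undefined, report "NA"
    return pvariance(dosages) / expected
```

A ratio near 1 suggests the dosages vary about as much as well-called genotypes would; a ratio near 0 means the imputed dosages are close to constant and carry little information.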

Please upload a README with a very brief description of the data, the date, the NCBI human genome reference sequence used for strand reference (e.g. NCBI 36.2), and the scale of the beta estimates. The README should also list the SNP HWE p-value, call rate and minor allele frequency filters that have been applied.
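
Before uploading, a results file can be checked against the column names listed above. The following is a minimal sketch, not an official consortium tool: it assumes a tab-delimited file with a header line (the page only says "plain text"), assumes autosomes are numbered 1 to 22, and uses hypothetical function and constant names.

```python
import csv
import io

REQUIRED = [
    "SNPID", "chr", "position", "coded_all", "noncoded_all", "strand_genome",
    "beta", "SE", "pval", "AF_coded_all", "HWE_pval", "callrate", "n_total",
    "imputed", "used_for_imp", "oevar_imp",
]
NUMERIC_OR_NA = ["beta", "SE", "pval", "AF_coded_all", "HWE_pval"]
# Assumption: autosomes are written 1..22; the other symbols come from the table.
CHROMS = {str(i) for i in range(1, 23)} | {"X", "XY", "Y", "mt"}


def _number_or_na(value):
    """True if value is the string NA or parses as a number."""
    if value == "NA":
        return True
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True


def check_results(handle, delimiter="\t"):
    """Return a list of problems found in a delimited results file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    for row in reader:
        lineno = reader.line_num  # physical line number of the current row
        if row["chr"] not in CHROMS:
            problems.append(f"line {lineno}: unexpected chr {row['chr']!r}")
        if row["strand_genome"] not in ("+", "-"):
            problems.append(f"line {lineno}: strand_genome must be + or -")
        for col in ("imputed", "used_for_imp"):
            if row[col] not in ("0", "1"):
                problems.append(f"line {lineno}: {col} must be 0 or 1")
        for col in NUMERIC_OR_NA:
            if not _number_or_na(row[col]):
                problems.append(f"line {lineno}: {col} is neither numeric nor NA")
    return problems


# Hypothetical one-SNP file, for illustration only.
example_row = ["rs123", "1", "1000", "G", "A", "+", "0.01234", "0.00567",
               "0.02950", "0.31000", "NA", "0.99", "1000", "1", "0", "0.95"]
example = "\t".join(REQUIRED) + "\n" + "\t".join(example_row) + "\n"
print(check_results(io.StringIO(example)))
```

Returning a list of messages rather than raising on the first problem lets an analyst fix a whole file in one pass.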


For gene-environment interaction analyses, the following variables should be included:

variable name     description
SNPID             SNP ID as rs number
chr               chromosome number. Use symbols X, XY, Y and mt for non-autosomal markers.
position          physical position on the reference sequence (indicate build 35/36 in the README file)
coded_all         coded allele, also called modeled allele (for example, for an A/G SNP coded AA=0, AG=1 and GG=2, the coded allele is G)
noncoded_all      the other allele
strand_genome     + or -, for the positive/forward or the negative/reverse strand of the human genome reference sequence; clarifies which strand coded_all and noncoded_all are on
beta              beta estimate from additive interaction term, to at least 5 decimal places -- “NA” if not available
SE                standard error of beta estimate, to at least 5 decimal places -- “NA” if not available
pval              p-value of interaction test statistic, included only as a double check -- “NA” if not available
df.t              degrees of freedom estimate for t reference distribution for interaction term -- “NA” if not available
pval.t            p-value of interaction test statistic, using t reference distribution, included only as a double check -- “NA” if not available
beta.main         beta estimate from genotype-phenotype association, to at least 5 decimal places -- “NA” if not available
SE.main           standard error of beta.main estimate, to at least 5 decimal places -- “NA” if not available
pval.main         p-value of main test statistic, included only as a double check -- “NA” if not available
covar.main.inter  covariance between beta and beta.main, to at least 5 decimal places -- “NA” if not available
AF_coded_all      allele frequency for the coded allele -- “NA” if not available
HWE_pval          exact-test Hardy-Weinberg equilibrium p-value -- directly typed SNPs only; NA for imputed SNPs
callrate          genotyping call rate after exclusions
n_total           total number of samples with phenotype and genotype for the SNP
n_exposed         (DICHOTOMOUS EXPOSURE ONLY) number in sample exposed to environmental variable of interest [in longitudinal data, estimated number of independent observations that are exposed]
imputed           1/0 coding; 1=imputed SNP, 0=directly typed SNP
used_for_imp      1/0 coding; 1=used for imputation, 0=not used for imputation
oevar_imp         observed divided by expected variance for imputed allele dosage
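
One reason to report covar.main.inter alongside the two betas: for a dichotomous exposure coded 0/1, in a model with a SNP main effect and a SNP-by-exposure product term, the SNP effect among the exposed is beta.main + beta, and its standard error requires the covariance between the two estimates. The sketch below assumes that model and coding, which this page does not spell out; the function name is hypothetical.

```python
import math


def effect_in_exposed(beta_main, se_main, beta_inter, se_inter, covar_main_inter):
    """SNP effect among the exposed (exposure coded 1) and its standard error."""
    estimate = beta_main + beta_inter
    # Var(a + b) = Var(a) + Var(b) + 2 * Cov(a, b)
    variance = se_main ** 2 + se_inter ** 2 + 2.0 * covar_main_inter
    if variance < 0:
        raise ValueError("inconsistent inputs: negative variance")
    return estimate, math.sqrt(variance)
```

Without the covariance column, a meta-analyst could only bound this standard error, which is why the covariance is requested to the same precision as the betas.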

Please upload a README with a very brief description of the data, the date, the NCBI human genome reference sequence used for strand reference (e.g. NCBI 36.2), and the scale of the beta estimates. The README should also list the SNP HWE p-value, call rate and minor allele frequency filters that have been applied.