Informal Seminars Presented In Spring 2003

April 7, 2003
Title: Meet and Greet
Speaker: Cong Han and Elizabeth Brown, Biostatistics Faculty
Abstract: Cong and Elizabeth recently joined our department's faculty. The purpose of this seminar is to meet them and to get a brief idea of the research they are working on. Cong received his Ph.D. in Biostatistics in 2002 from the University of Minnesota. He will describe some optimal experimental designs for a model where the response is a mixture of a normal random variable and a point mass, and can be modeled using a logit model combined with a linear model on the normal response. Elizabeth received her Sc.D. in Biostatistics from Harvard University in 2002 after completing her dissertation, "Bayesian Methods for Jointly Modeling Longitudinal and Survival Data".
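
As a rough illustration only, the mixture model described for Cong's talk might be written as below. The notation is assumed, not taken from the talk: D_i is a latent indicator of the normal component, c is the location of the point mass, x_i is the covariate vector, and alpha, beta, and sigma^2 are the parameters.

```latex
\begin{aligned}
D_i \mid x_i &\sim \mathrm{Bernoulli}(\pi_i), \qquad \operatorname{logit}(\pi_i) = x_i^{\top}\alpha, \\
Y_i \mid (D_i = 0,\ x_i) &= c, \\
Y_i \mid (D_i = 1,\ x_i) &\sim N\!\left(x_i^{\top}\beta,\ \sigma^2\right).
\end{aligned}
```

Under this sketch, a design would have to balance information about alpha (the logit part) against information about beta (the linear part).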

April 14, 2003
Title: Gene Mapping on a Pedigree of Inbred Lines
Speaker: Amy D. Anderson, doctoral student, Statistics Department, UW
Abstract: Methods have been developed for gene mapping using pedigrees of related individuals. In this talk, I will describe a method for modifying existing techniques for use on a different sort of pedigree -- a pedigree in which data is collected on related inbred lines instead of related individuals (alternatively, you could think of it as a pedigree of related individuals in which the data on most of the individuals is missing). In the process, I will speak a bit about inbred lines -- what they are and why they are used.

April 21, 2003
Title: Estimating the effects of time-varying treatments using G-estimation
Speaker: Mary Redman, doctoral student, Biostatistics Department, UW
Abstract: Robins and colleagues have developed two main models for estimating the effects of time-varying treatments: structural nested models and marginal structural models. The associated approaches to estimating the parameters in these models are G-estimation and inverse probability of treatment weighted (IPTW) estimation, respectively. In this talk I will first present an introduction to the causal modeling framework for time-varying treatments. Then I will discuss instrumental variables estimation and how G-estimation is a special case of it.
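
To give a feel for inverse probability of treatment weighting, here is a minimal sketch for a single time point, which is a much simpler setting than the time-varying treatments the talk covers. The data-generating model, the function names, and the true effect of 1.0 are all assumptions made for illustration.

```python
import random


def simulate(n, seed=0):
    """Simulate (confounder L, treatment A, outcome Y); the true effect of A is 1.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = 1 if rng.random() < 0.5 else 0
        # Treatment is more likely when L = 1, so L confounds the A-Y relation.
        a = 1 if rng.random() < 0.2 + 0.6 * l else 0
        y = 1.0 * a + 2.0 * l + rng.gauss(0.0, 1.0)
        rows.append((l, a, y))
    return rows


def naive_difference(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [y for _, a, y in rows if a == 1]
    control = [y for _, a, y in rows if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def iptw_difference(rows):
    """Weighted difference in means using estimated treatment probabilities."""
    # Estimate P(A = 1 | L) by the observed treatment frequency in each stratum.
    prop = {}
    for level in (0, 1):
        stratum = [a for l, a, _ in rows if l == level]
        prop[level] = sum(stratum) / len(stratum)
    num1 = den1 = num0 = den0 = 0.0
    for l, a, y in rows:
        if a == 1:
            w = 1.0 / prop[l]
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - prop[l])
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


rows = simulate(20000)
print("naive:", naive_difference(rows))
print("IPTW: ", iptw_difference(rows))
```

In this simulated setup the naive contrast is pulled away from 1.0 by the confounder, while the weighted contrast should land close to it.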

April 28, 2003
Title: Local Motif Discovery Using Reversible Jump Markov Chain Monte Carlo
Speaker: Sierra Li, doctoral student, Biostatistics Department, UW
Abstract: Genome projects are generating large sets of genomic sequence data to answer two essential questions: what is the function of each gene, and when are genes expressed? To answer the second question, biologists are trying to identify the regulatory regions for gene expression, which are usually short patterns (motifs) located upstream of the gene. Many methods have been applied to motif finding, such as the Smith-Waterman algorithm in computational biology, and the EM algorithm and hidden Markov models in statistics. I will briefly introduce the background of motif finding based on the site-specific probabilistic model and then focus on our model using Bayesian inference, which is flexible in modeling the existence of the motif in each sequence. Unlike most existing models, it does not require specifying the motif width. The posterior distribution of the motif width will be estimated; the posterior distributions of the motif starting positions and the motif pattern are conditional on the motif width. The dimension of the parameter space varies when the motif existence status and/or the motif width changes. Reversible jump Markov chain Monte Carlo (RJMCMC) is used to allow jumps between parameter spaces of different dimensions. Ergodic theory ensures convergence of the Markov chain, and the equilibrium distribution is the posterior distribution of the parameters.
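
As background for the site-specific probabilistic model, the sketch below scores each window of a sequence against a position-specific probability matrix and reports the best starting position. It assumes a fixed, known motif width and a uniform background, which is precisely the restriction the talk's model removes. The matrix values and function names are made up for illustration.

```python
import math

# Assumed uniform background distribution over the four bases.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Hypothetical width-3 motif matrix favoring the pattern "TAT".
EXAMPLE_PWM = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def log_odds(window, pwm, background=BACKGROUND):
    """Log-likelihood ratio of a window under the motif versus the background."""
    return sum(
        math.log(pwm[j][base] / background[base]) for j, base in enumerate(window)
    )


def best_start(sequence, pwm):
    """Return (start index, score) of the highest-scoring window."""
    width = len(pwm)
    scores = [
        log_odds(sequence[i:i + width], pwm)
        for i in range(len(sequence) - width + 1)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


print(best_start("GGCTATCG", EXAMPLE_PWM))
```

A Bayesian treatment would place priors on the matrix entries, the starting positions, and (in the talk's model) the width, rather than plugging in a known matrix as done here.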

May 5, 2003
Title: Causal inference in HIV vaccine trials: comparing biomarkers measured only in a subgroup chosen post-randomization
Speaker: Bryan Shepherd, graduate student, Biostatistics Department, UW
Abstract: In many trials, researchers want to perform treatment comparisons in subgroups selected after randomization. For example, in vaccine efficacy trials, it may be of interest to compare viral load between vaccine and placebo recipients who become infected with HIV during the trial. To account for potential selection bias we propose a sensitivity analysis following the principal stratification framework set forth by Frangakis and Rubin (2002). The average causal effect of treatment assignment at a given covariate level can be estimated in the always infected stratum (those individuals who would have been infected whether they had been assigned to vaccine or to placebo). Assignment to the always infected stratum is unknown, but can be modeled conditional on randomization arm, infection status, covariates, the observed viral load, and a specified sensitivity parameter. The potential viral load as a function of covariates and given treatment assignment is also modeled. Under the assumption that being randomized to the vaccine arm does not increase the risk of infection, the EM algorithm is applied and maximum likelihood estimates can be obtained.

May 12, 2003
Title: Joint Modeling of Spatially-Arranged Outcomes
Speaker: Alan Dabney, graduate student, Biostatistics Department, UW
Abstract: Spatial structure in the risks for non-infectious diseases is likely due to unmeasured risk factors which themselves have spatial structure. If risk patterns in space can be identified, these can be compared to maps of various exposures, perhaps generating new risk-exposure hypotheses. These spatially-arranged, unmeasured risk factors may also be common to two (or more) diseases. In such cases, spatial trends that are common to both diseases would perhaps provide greater evidence of clustering than trends observed for either disease alone. Knorr-Held and Best (2001) proposed a "shared-component" approach to jointly modeling two disease risks. Their research focused on disease "risks" as characterized by standardized mortality ratios (SMRs). However, the approach could be generalized to other applications (e.g., spatial survival data). I will review background material for this topic (Bayesian disease mapping). I will then discuss the shared-component model and explore some of its characteristics with simulation.
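
For readers unfamiliar with standardized mortality ratios, the sketch below computes one for a single area by indirect standardization: observed deaths divided by the deaths expected if reference rates applied to the area's population. The strata, rates, and counts are invented for illustration and the function names are hypothetical.

```python
def expected_deaths(population_by_stratum, reference_rates):
    """Deaths expected in an area if the reference rates applied to its population."""
    return sum(
        population_by_stratum[s] * reference_rates[s] for s in population_by_stratum
    )


def smr(observed, population_by_stratum, reference_rates):
    """Standardized mortality ratio: observed deaths over expected deaths."""
    return observed / expected_deaths(population_by_stratum, reference_rates)


# Invented example: two age strata in one area.
population = {"under_65": 10000, "65_plus": 2000}
rates = {"under_65": 0.001, "65_plus": 0.02}
print(smr(60, population, rates))
```

Raw ratios like this are unstable in areas with small expected counts, which is one motivation for the smoothing that Bayesian disease mapping provides.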

May 19, 2003
Title: Source Apportionment and Health Effect Model
Speaker: Hao Liu, graduate student, Biostatistics Department, UW
Abstract: Would the fine smoke particles from wood burning at home hurt you? Epidemiological studies seek evidence for the health effects of particulate matter (PM) emitted from various sources, and to do so they need to ascertain the exposures. Source apportionment models are used to estimate the PM mass concentrations from various sources. The estimated exposures suffer from complicated error structures, and when used directly in the health effect model, they induce bias. In this RA work with Dr. Thomas Lumley, we performed computer simulations to study the extent of the bias. An alternative method based on bias correction techniques from measurement error models is proposed and tested by computer simulation. Potential statistical challenges are identified and discussed.
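
The kind of bias at issue can be seen in a toy simulation. The sketch below assumes classical additive measurement error with known variance, which is far simpler than the error structure of source apportionment estimates, and applies a simple method-of-moments correction. It is not the method proposed in the talk, and all names and parameter values are illustrative.

```python
import random


def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slopes(n=20000, beta=0.5, error_sd=1.0, seed=1):
    """Return slopes using the true exposure, the noisy exposure, and a correction."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Classical error: the measured exposure is the truth plus independent noise.
    noisy_x = [x + rng.gauss(0.0, error_sd) for x in true_x]
    y = [beta * x + rng.gauss(0.0, 1.0) for x in true_x]
    naive = ols_slope(noisy_x, y)
    # Reliability ratio var(X) / (var(X) + var(U)), known here by construction.
    reliability = 1.0 / (1.0 + error_sd ** 2)
    corrected = naive / reliability
    return ols_slope(true_x, y), naive, corrected


print(simulate_slopes())
```

With these settings the naive slope is attenuated toward zero by roughly the reliability ratio, and dividing by that ratio recovers a value near the true coefficient.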

July 16, 2003
Title: Ecological studies using supplemental case-control data
Speaker: Sebastien Haneuse, graduate student, Biostatistics Department, UW
Abstract: Ecological studies examine outcome-exposure associations at the group level, rather than at the individual level. Such groups are often defined on the basis of geography. The continued use of ecological studies stems, in part, from the routine availability of group level data (for example, the census), and large between-area exposure contrasts (which can lead to increased power). Using ecological studies to draw conclusions regarding individuals is, in general, very difficult. Issues that require consideration include specification bias (where the same form of the model is assumed at the individual and group level), confounding at both the group and individual level and, perhaps most seriously, contextual effects.

To overcome these issues either: (a) substantive prior information must be incorporated, (b) assumptions regarding the model must be made, or (c) individual level data must be collected. Although many 'solutions' to the problem have been put forward, without the use of individual level data it is, in general, not possible to assess the validity of any assumptions made. Thus, the only 'solution' is to collect data on individuals. Previous approaches include aggregate data studies (Prentice and Sheppard, 1995), where a group level model is derived by aggregating an individual level model. Using survey data regarding within group exposure/confounder distributions one can recover parameters that correspond to individual level associations.

We consider a different approach which has similarities to two-phase designs. Our approach is based on having ecological data available across a series of areas and then collecting individual level data via a case-control scheme. In this talk I will outline the new scheme, and derive the likelihood, for a simplified scenario of a dichotomous exposure and outcome. I will also discuss various computational schemes, as well as future avenues for research, such as study design issues and incorporating spatial effects.


July 23, 2003
Title: Weighted Estimators for Proportional Hazards Model with Missing Covariates
Speaker: Lihong Qi, doctoral student, Biostatistics Department, UW
Abstract: Missing covariate data are common in epidemiologic studies and disease prevention trials. When certain covariates are observed for all subjects and other covariate data of interest are collected only for a subset, complications arise in applying the Cox regression model to the analysis of survival data. Inconsistent and inefficient estimates can be generated by naively discarding subjects with missing covariate data. This talk presents methods that use partially incomplete data nonparametrically in simple and fully augmented weighted estimating equations. Under the assumption of "missing at random", we employ nonparametric methods to estimate selection probabilities in a simple weighted estimating equation. We also propose to use nonparametric kernel smoothing techniques to estimate the conditional expectations in fully augmented weighted estimating equations. The kernel-assisted method directly estimates the parameter of interest without restrictions on the association between missing covariates and observed covariates. We show that the resulting simple and kernel-assisted fully augmented weighted estimators are consistent and asymptotically normal. When covariates are time-independent, certain simple weighted estimators are asymptotically equivalent to the kernel-assisted fully augmented estimators. Compared to the simple weighted estimator with true selection probabilities, important efficiency gains can be achieved by the proposed methods. They also correct the bias of estimates from complete-data analyses. The proposed methods can be used when missing covariates occur by happenstance or by design, e.g., various cohort sampling procedures, including case-cohort and nested case-control designs.
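
The idea behind a simple weighted estimating equation can be sketched in a setting much simpler than the Cox model: estimating the mean of a covariate X that is missing at random given a fully observed binary covariate Z. Selection probabilities are estimated nonparametrically as the observed fraction within each level of Z. The data-generating values and function names below are assumptions for illustration only.

```python
import random


def simulate_missing(n, seed=0):
    """Simulate (Z, X) pairs where X is set to None when unobserved; true mean of X is 2.0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        x = 1.0 + 2.0 * z + rng.gauss(0.0, 1.0)
        # Missing at random: the chance of observing X depends only on Z.
        p_observed = 0.3 if z == 1 else 0.9
        data.append((z, x if rng.random() < p_observed else None))
    return data


def complete_case_mean(data):
    """Mean of X among subjects with X observed, ignoring the selection."""
    xs = [x for _, x in data if x is not None]
    return sum(xs) / len(xs)


def weighted_mean(data):
    """Mean of X weighting each observed subject by 1 / estimated selection probability."""
    selection = {}
    for level in (0, 1):
        flags = [x is not None for z, x in data if z == level]
        selection[level] = sum(flags) / len(flags)
    num = den = 0.0
    for z, x in data:
        if x is not None:
            w = 1.0 / selection[z]
            num += w * x
            den += w
    return num / den


data = simulate_missing(20000)
print("complete case:", complete_case_mean(data))
print("weighted:     ", weighted_mean(data))
```

Here the complete-case mean is biased because subjects with Z = 1 are under-represented among those with X observed, while the weighted mean should fall near the true value of 2.0.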