Informal Seminars Presented In Winter 2002

 

January 14, 2002

 

Title: 

Markov Chain Monte Carlo for Bayesian Inference

Speakers:

Sebastien Haneuse, Graduate Student, Biostatistics

Abstract: 

We often hear of the use of Markov chain Monte Carlo (MCMC) in statistical analyses. Unfortunately, these powerful inferential methods are not presented very often. In this talk I will present two of the most widely used Markov chain simulation methods, the Metropolis-Hastings algorithm and the Gibbs sampler, and in particular their role in a Bayesian analysis.

Initially, I will go through the Bayesian paradigm (treating the unknown parameter as a random variable and then combining prior information and data to form a posterior distribution) and then go through the principle of the duality between a density and samples generated from that density. MCMC is a technique for simulating samples from posterior densities, even when the model is very complex. I will also sketch a proof of the convergence of MCMC algorithms and go through a method for monitoring convergence in practice.
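
As a rough illustration of the idea of drawing samples from a posterior, here is a minimal random-walk Metropolis sketch in Python. It is not taken from the talk: the binomial data (7 successes in 10 trials), the flat Beta(1, 1) prior, the step size, and all function names are assumptions chosen so that the exact posterior, Beta(8, 4), is known and the sample mean can be checked against it.

```python
import math
import random


def log_posterior(theta, successes, trials):
    """Unnormalised log posterior of a binomial proportion under a flat Beta(1, 1) prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # posterior density is zero outside (0, 1)
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def metropolis(successes, trials, n_draws=20000, step=0.2, seed=1):
    """Random-walk Metropolis with a symmetric Gaussian proposal.

    Because the proposal is symmetric, the Metropolis-Hastings acceptance
    ratio reduces to the ratio of posterior densities.
    """
    rng = random.Random(seed)
    theta = 0.5  # arbitrary starting value inside (0, 1)
    current = log_posterior(theta, successes, trials)
    draws = []
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        draws.append(theta)
    return draws


draws = metropolis(successes=7, trials=10)
kept = draws[2000:]  # discard an (assumed) burn-in period
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 3))  # should be close to the exact mean 8/12
```

The average of the retained draws approximates the posterior mean, which is the sense in which samples stand in for the density.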

 

Click here for a postscript copy of the slides for this talk

 

 


 

January 22, 2002

 

Title: 

Semiparametric Efficient and Useful Inefficient Estimation for the Auxiliary Outcome Problem with the Conditional Mean Model

Speakers:

Jinbo Chen, Graduate Student, Biostatistics

Abstract: 

The auxiliary outcome problem involves study of the regression relationship between an outcome Y and a set of covariates X, where Y is observed only for a subset of subjects but a correlate of Y, S, is observed for all the subjects. For example, a binary outcome may be subject to misclassification, but a validation subsample exists.  In this talk, I will present semiparametric efficient estimation and useful inefficient estimation for the auxiliary outcome problem when the relationship between Y and X is restricted by the conditional mean model.

The semiparametric efficient estimator we consider is a one-step estimator based on the efficient score function.  I will present an algebraic approach to calculating the efficient score, which builds on an insight gained by connecting semiparametric efficient estimation theory with Godambe's optimal estimating function theory.  We also consider two useful inefficient estimators. First, we extend the robust imputation method of Yi-hau Chen (Biometrika, 2000) to the situation in which the probability that an outcome is validated depends on the auxiliary outcome.  Second, we propose an estimator based on the conditional expectation of unbiased estimating functions, where we condition on the observed data.

Some simulation results for the finite sample performance of the estimators proposed will be presented.

 

 



 

January 28, 2002

 

Title: 

"My Dissertation"

Speakers:

Do Peterson, Graduate Student, Biostatistics

Abstract: 

I will be presenting a talk given through 13 original music recordings with lyrics and 13 corresponding supplemental slides detailing the distribution theory for the recurrence risk ratio that I developed as part of my dissertation work in statistical genetics.  This 45-minute work in progress, entitled "My Dissertation," is inspired by my interest in using multimedia and the arts to communicate deeper scientific material to more people.  My primary objective in this presentation is to get feedback, i.e., to find out what works and what needs work.  Please come if you are curious!

 

 




February 4, 2002

 

Title: 

Statistical Analysis of the Comet Assay Using a Mixture of Gamma Distributions

Speakers:

Bryan Shepherd, Graduate Student, Biostatistics

Abstract: 

The single cell gel electrophoresis (comet) assay is an increasingly popular method for detecting and comparing nuclear DNA damage and repair. In this technique, a measurement called the "tail moment" quantifies DNA damage for an individual cell.  The distribution of tail moments among a group of cells on a slide (experimental unit) is often skewed and bimodal, perhaps because cells are at different stages of the cell cycle when exposed to treatments.  To better examine DNA damage, the distribution of tail moments on a slide was modeled using a mixture of two gamma distributions.  Maximum likelihood, modified to accommodate left-censored data, can be used to estimate the 5 parameters of the gamma mixture distribution for each slide.  A weighted analysis of variance on the parameter estimates for the gamma mixtures can be performed to determine differences in DNA damage between treatments.  These methods were applied to an experiment on the effect of thymidine kinase in DNA damage and repair.  Analysis based on a mixture of gamma distributions was found to be more statistically valid, more powerful, and more scientifically informative than an analysis of the log-transformed tail moments.
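
For readers who want the likelihood written out, the following is one plausible form, stated under assumptions not given in the abstract: a shape-scale parameterization of each gamma component, a single known censoring limit c per slide, and an indicator \delta_i equal to 1 when the tail moment x_i is observed and 0 when it is left-censored at c. The five parameters are then the mixing proportion p and the two (shape, scale) pairs.

```latex
% Gamma density and distribution function (shape \alpha, scale \beta)
g(x;\alpha,\beta) = \frac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}},
\qquad
G(x;\alpha,\beta) = \int_0^x g(t;\alpha,\beta)\,dt

% Two-component mixture density and distribution function
f(x) = p\,g(x;\alpha_1,\beta_1) + (1-p)\,g(x;\alpha_2,\beta_2),
\qquad
F(x) = p\,G(x;\alpha_1,\beta_1) + (1-p)\,G(x;\alpha_2,\beta_2)

% Log-likelihood for one slide with left censoring at c
\ell(p,\alpha_1,\beta_1,\alpha_2,\beta_2)
  = \sum_{i:\,\delta_i = 1} \log f(x_i)
  + \sum_{i:\,\delta_i = 0} \log F(c)
```

Each censored cell contributes the probability of falling at or below the limit rather than a density value, which is the usual modification for left-censored observations.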

 

 




February 11, 2002

 

Title: 

CARTscans: A Tool for Visualizing Tree-Based Models

Speakers:

Martha Nason, Graduate Student, Biostatistics

Abstract: 

Tree-based models, including classification and regression trees (CART), provide a useful alternative to more standard regression techniques using linear predictors. These models are especially useful for forming diagnostic or prognostic rules.  However, the predictive models obtained from trees typically involve complex, high-order interactions between the modeled covariates, and it is therefore difficult to visualize the results of a tree model in a succinct manner. In particular, reports of the fitted tree may obscure similarity of response distribution among regions that are actually adjacent in Cartesian space. We present CARTscans, a graphical tool providing a view of the structure of a tree model. Predicted values are displayed across a four-dimensional subspace of the covariates with smoothing of effects due to other covariates. Using these graphs, a user is able to take advantage of the flexibility of tree-based models to find complex interactions and pick out interesting regions while still being able to visualize main effects.

 

 





February 25, 2002

 

Title: 

Bias-Reduced Variance Estimators for GEE?

Speakers:

Kristian Lynch, Graduate Student, Biostatistics

Abstract: 

Generalized Estimating Equations (GEE) methodology provides an alternative regression tool for analyzing correlated response data. However, several studies have shown the 'sandwich' empirical covariance estimator (the estimated covariance matrix of the regression coefficients in GEE) to be biased in certain settings.  For example, in small samples with binary response data, the empirical covariance estimator tends to underestimate the variance of the regression coefficients and thus provides liberal confidence intervals.
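
For reference, the standard form of the sandwich estimator can be written as follows. The notation is an assumption introduced here, not taken from the talk: K clusters, response vector Y_i with fitted mean \hat\mu_i, derivative matrix D_i = \partial\mu_i / \partial\beta^{T}, and working covariance matrix V_i.

```latex
\widehat{\mathrm{Var}}(\hat\beta) = B^{-1} M B^{-1},
\qquad
B = \sum_{i=1}^{K} D_i^{T} V_i^{-1} D_i,
\qquad
M = \sum_{i=1}^{K} D_i^{T} V_i^{-1}
      (Y_i - \hat\mu_i)(Y_i - \hat\mu_i)^{T}
      V_i^{-1} D_i
```

The middle term M is built from fitted residuals, which tend to be smaller than the true errors; one common explanation of the small-sample downward bias is that this effect is not negligible when K is small.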

In this talk, I will first review previous studies on the performance of GEE for continuous and binary response data. I will next mention alternative covariance estimators for GEE and compare their performance to the sandwich estimator, and finally I will present a small simulation study which examines the covariance estimators on clustered count data. The simulations include overdispersion, various covariate patterns, balanced or unbalanced cluster sizes and other misspecifications. The talk is part of an ongoing Master's Thesis with Dr Lumley and should be suitable for all levels.

 

 





March 4, 2002

 

Title: 

Short and Long Range Serial Correlation in a Panel Study with Binary Response Data: Detection and Impact on Regression Analysis

Speakers:

Jon Schildcrout, Graduate Student, Biostatistics

Abstract: 

Recent studies in the field of pediatric asthma have collected daily exposure and outcome data on cohorts of children with the aim of estimating the association between ambient pollution levels and daily indicators of symptoms.  Analysis using regression methods such as GEE requires specification of a correlation structure, with common choices for short longitudinal series admitting temporal autoregressive and/or exchangeable dependence structures. I will discuss a useful method to estimate the dependence structure for longitudinal, categorical data, and the impact dependence misspecification has on the efficiency of estimates. Specifically, I evaluate three working correlations (independence, exchangeable, and autoregressive), each a possible misspecification, under several true autoregressive/exchangeable data-generating structures when the predictor of interest is a time-varying covariate. Finally, I will discuss an analysis of a panel of 134 subjects participating in the Childhood Asthma Management Program (CAMP) during the pre-randomization phase of the study.
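
To make the three working correlation structures concrete, here is a small Python sketch that builds each as an n-by-n matrix for a subject with n repeated measurements. The function names are hypothetical, and taking "autoregressive" to mean first-order AR(1), with correlation rho raised to the lag, is an assumption; the abstract does not state the order.

```python
def independence(n):
    """Working independence: identity matrix, no within-subject correlation."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def exchangeable(n, rho):
    """Exchangeable: every pair of distinct time points has correlation rho."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def ar1(n, rho):
    """First-order autoregressive (assumed): correlation decays as rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


# Compare the structures for 4 daily measurements with rho = 0.5.
for name, matrix in [
    ("independence", independence(4)),
    ("exchangeable", exchangeable(4, 0.5)),
    ("AR(1)", ar1(4, 0.5)),
]:
    print(name)
    for row in matrix:
        print("  ", row)
```

The exchangeable structure keeps the same correlation at every lag (long-range dependence), while AR(1) lets it fade with the time separation (short-range dependence).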

 

 





March 11, 2002

 

Title: 

Estimating Lifetime Medical Costs Using a Joint Frailty Model of Survival Time and Cost as a Mark Variable

Speakers:

Kristin Berry, Graduate Student, Biostatistics

Abstract: 

The analysis of lifetime medical costs with censored data presents statistical challenges. The assumption of independent censoring may be valid on the time scale but is not reasonable on the cost scale, where the censoring pattern is typically induced to be dependent. Of more concern, the cost distribution is potentially nowhere identifiable in a nonparametric setting because of the censoring. Methods to date have avoided this problem by arbitrarily estimating costs only up to the time of the final failure. We propose a semi-parametric joint gamma frailty model for costs and survival, which assumes a common frailty for an individual's costs and survival time. We will develop maximum likelihood estimates (MLEs) for the baseline hazards and the gamma frailty parameter using the EM algorithm. These MLEs can be combined to obtain the marginal cost distribution and its mean. We will discuss the existence and consistency of these estimates. We will also present results of these methods as applied to both simulated and real data.

 

 





 



Last Modification: 04 March 2002