Informal Seminars Presented In Spring 2004

April 7, 2004
Title:Nonparametric Confidence Intervals for the One and Two-sample Highly Skewed Data
Speaker:Phil Dinh, graduate student UW Biostatistics
Abstract: Confidence intervals for the mean of one sample and the difference in means of two independent samples based on the ordinary-t statistic suffer deficiencies when the samples come from highly skewed families and the sample sizes are small to moderate. In this talk, I will present a simulation study evaluating several existing methods and propose new methods to improve coverage accuracy. The methods examined include the ordinary-t, the bootstrap-t, the bias-corrected and accelerated (BCa) interval, and three new intervals based on transformation of the t-statistic. Our study shows that the new transformation intervals and the bootstrap-t intervals give the best coverage accuracy for a variety of skewed distributions, and that our new transformation intervals have shorter interval lengths.
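
For readers unfamiliar with the bootstrap-t interval named in the abstract, the following is a minimal sketch of the standard one-sample version. It is not the speaker's code: the function name, the lognormal test data, the number of resamples, and the simple order-statistic quantile rule are all illustrative assumptions.

```python
import math
import random
import statistics


def bootstrap_t_interval(sample, level=0.95, n_boot=2000, seed=0):
    """Bootstrap-t confidence interval for the mean of one sample.

    Hypothetical helper for illustration; not taken from the talk.
    """
    rng = random.Random(seed)
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)

    # Studentize each resample mean with its own standard error.
    t_stars = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)
        se_star = statistics.stdev(resample) / math.sqrt(n)
        if se_star == 0:
            continue  # degenerate resample; skip to avoid dividing by zero
        t_stars.append((statistics.fmean(resample) - mean) / se_star)
    t_stars.sort()

    # Simple order-statistic quantiles of the bootstrap t distribution.
    alpha = 1 - level
    lo_idx = math.floor((alpha / 2) * (len(t_stars) - 1))
    hi_idx = math.ceil((1 - alpha / 2) * (len(t_stars) - 1))
    t_lo, t_hi = t_stars[lo_idx], t_stars[hi_idx]

    # The upper t quantile sets the lower limit, and vice versa.
    return mean - t_hi * se, mean - t_lo * se


# Assumed example data: a small, highly skewed (lognormal) sample.
data_rng = random.Random(1)
data = [data_rng.lognormvariate(0.0, 1.0) for _ in range(30)]
lower, upper = bootstrap_t_interval(data)
print(lower, upper)
```

Because the quantiles come from the resampled t-statistics rather than from Student's t table, the interval is generally asymmetric about the sample mean, which is how it can adapt to skewness.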


April 14, 2004
Title:Estimating Causal Effect Using Observational Data: The Intensity-score Approach to Adjusting for Confounding
Speaker:Mary Redman, graduate student UW Biostatistics
Abstract: Observational studies are characterized by non-random allocation of treatments to study participants. Measures of the impact of a treatment on an outcome can be subject to confounding by both measured and unmeasured factors. Recently, Brumback, Greenland, Redman, et al. introduced the intensity-score approach to adjusting for confounding to estimate the effects of both point-source and time-varying treatments. The intensity score is a contrast between treatment received and expected treatment conditional on confounders. To obtain unbiased parameter estimates, this approach relies on proper specification of a model for the expected treatment as a function of measured confounders and on specification of the form of the treatment effect (i.e. presence or lack of effect-measure modification by measured confounders).

In general, when deriving the intensity score, the true expected treatment is unknown and an estimate needs to be used in its place. In this talk I will give a brief overview of the intensity-score approaches and discuss the impact of using estimates of the expected treatment on the variance of the treatment effect parameter.


April 21, 2004
Title:The reversed Berk-Jones statistic
Speaker:Leah Jager, graduate student UW Statistics
Abstract: In nonparametric testing problems, we often use statistics based on the empirical distribution function to test whether or not the underlying distribution of the data is what we think it might be. One classical example is the Kolmogorov-Smirnov statistic. Berk and Jones introduced another such statistic which has certain optimality advantages over Kolmogorov-Smirnov. Here we'll look at a closely related statistic (the reversed Berk-Jones statistic) and the motivation behind it, as well as some finite sample characteristics and optimality properties. Consider yourself warned...
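
As a small illustration of a statistic based on the empirical distribution function, here is a sketch of the classical Kolmogorov-Smirnov statistic mentioned in the abstract. It is not from the talk, and it does not implement the Berk-Jones or reversed Berk-Jones statistics; the function names and the Uniform(0,1) null distribution are assumptions chosen for the example.

```python
def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and a null `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x,
        # so check the gap on both sides of the jump.
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d


def uniform_cdf(x):
    """CDF of the Uniform(0, 1) distribution (assumed null hypothesis)."""
    return min(1.0, max(0.0, x))


d_n = ks_statistic([0.1, 0.4, 0.7], uniform_cdf)
print(d_n)
```

For the three-point sample above, the largest gap occurs at x = 0.7, where the empirical distribution function reaches 1 while the uniform CDF is 0.7, so the statistic is about 0.3.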


April 28, 2004
Title:Image Analysis and Signal Extraction from cDNA Microarrays
Speaker:Tracy Bergemann, graduate student UW Biostatistics
Abstract: Open discussions about microarray technology invariably lead to concerns about data quality and objectivity in assessment. The focus of my dissertation has been to address some of these concerns in a simple and straightforward manner. The first half of the talk will discuss image analysis techniques for robust automated detection of microarray spots. These techniques are packaged into a MATLAB application called SignalViewer that is freely available at http://qge.fhcrc.org/signalviewer.

The second half of the talk will cover methods for describing spot quality measures. The goal here was to develop a metric that describes within-spot variability while accounting for spatial correlation of image pixels. This will include a discussion of estimating equations, weighted estimating equations and prediction error. Methods are evaluated on real and simulated data.


May 5, 2004
Title:Competing Risk Current Status Data
Speaker:Marloes Maathuis, graduate student UW Statistics
Abstract: We study competing risk data subject to current status censoring. I will discuss example data sets from cross-sectional studies in which this type of data arises. We are primarily interested in nonparametric estimation of the subdistribution functions, i.e. the cumulative probabilities of a certain failure type.

Our main focus is on the nonparametric maximum likelihood estimator of the subdistribution functions. However, for comparison, we also look at a very simple 'naive estimator'. I will talk about properties of both estimators, such as computational aspects, graph theoretic interpretations, uniqueness, consistency, rate of convergence, and (first steps towards) a limiting distribution.


May 12, 2004
Title:Regression Analysis of Longitudinal Data with Subject-specific Sampling Times
Speaker:Patra Miksova, graduate student UW Biostatistics
Abstract: Abstract available at: http://students.washington.edu/~miksova/research.pdf


May 19, 2004
Title:Surveillance of Geographical Cancer Incidence
Speaker:Alan Dabney, graduate student UW Biostatistics
Abstract: Cancer surveillance involves the systematic examination of incidence rates over a predefined time period and collection of geographical regions for localized increases in risk. State departments of health receive frequent calls from citizens concerned with perceived clusters of cancer. The vast majority of such alarms turn out to be unfounded. A surveillance method, in fact, is unlikely to turn up many substantive clusters. However, in light of the fact that incidence data for all cancers is collected periodically at local registries (CSS at the Hutch, for example), it is perhaps a duty of those registries to make an official effort. I will motivate the use of surveillance with maps of bladder and lung cancer incidence in Washington census tracts. I will then discuss existing surveillance methods and propose the use of a Bayesian hierarchical model.


May 26, 2004
Speaker:Bryan Comstock, graduate student UW Biostatistics
Abstract: Biomarkers have become an increasingly used tool in both the diagnosis and monitoring of various diseases. Increased levels of serum carbohydrate antigen 19-9 (CA19-9) have long been known to be associated with pancreatic cancer. In this commonly fatal form of cancer, CA19-9 levels decrease immediately following initial treatment but are then expected to rise again in the following months with the almost sure resurgence of the tumor. Previous studies have primarily focused on determining whether or not CA19-9 is a worthwhile screening tool for pancreatic and other forms of cancer. While several studies have also examined the prognostic ability of baseline CA19-9 on the time of survival, the focus of our methods and analyses here is to create a longitudinal model for serial CA19-9 measurements. By doing so, our goal is to take a first step towards examining the prognostic value of having longitudinal CA19-9 data on patient survival. Using serial post-surgical CA19-9 data on 262 pancreatic cancer patients, we use a reversible jump MCMC algorithm to average between longitudinal CA19-9 models in a Bayesian framework. Before we proceed with a fully Bayesian joint model for CA19-9 and time of survival, the resulting longitudinal model is used as the basis for a time-varying covariate in a Cox regression model to assess its potential predictive value.


June 6, 2004
Title:Estimation when the outcome of interest is subject to misclassification
Speaker:Pamela Shaw, graduate student UW Biostatistics
Abstract: In many settings, presence or absence of disease is measured with an imperfect test. These misclassified outcomes can lead to biased estimates of covariate effects and survival time. This talk will examine these issues and methods to address them for binary outcomes and time to event data. For logistic regression, the EM algorithm can be applied to obtain unbiased estimates of the parameters. For time to event data, different computational issues arise with discrete and continuous time. For discrete time, methods for both the product limit survival estimate and covariate effects for the proportional hazards model have been developed. Existing approaches and some open questions for continuous time data will be presented. Particular attention will be given to the situation of interval censored data; these data commonly arise in clinical settings where repeat testing is performed for the detection of asymptomatic disease.