Semester Schedule: Statistics - Fall 2008

Seminars are on Mondays
Time: 12:00 - 1:10 PM. Location: Room 903, 1255 Amsterdam Avenue. Tea and coffee will be served before the seminar at 11:30 AM in Room 1025.

September 8

Dr. Shaw-Hwa Lo, Department of Statistics, Columbia University

"Discovering Influential Variables: A Method of Partitions"

We shall introduce a general computer-intensive approach, based on a method we proposed earlier, for detecting which of many potential explanatory variables have an influence on a dependent variable Y. This approach is suited to detecting influential variables whose causal effects depend on the confluence of values of other variables. It has the advantage of avoiding a difficult direct analysis involving possibly thousands of variables, by dealing with many randomly selected small subsets. The main objective is to discover the influential variables, rather than to measure their effects. Once they are detected, the problem of dealing with a much smaller group of influential variables should be amenable to appropriate analysis. In a sense, we are confining our attention to locating a few needles in a haystack. If time permits, we shall include a real application, applying a variation of the proposed methods to data from a case-control study of sporadic breast cancer. Interactions of gene pairs associated with breast cancer are reported (PNAS, Aug. 2008).

September 15

Dr. Montse Fuentes, Statistics Department, North Carolina State University

"Neighborhood and Environmental Factors Associated with Physical Activity During Pregnancy"

Physical activity has well-documented health benefits for cardiovascular fitness and weight control. For pregnant women, the American College of Obstetricians and Gynecologists currently recommends 30 minutes of moderate exercise on most, if not all, days; however, very few pregnant women achieve this level of activity. Epidemiologists, policy makers, and city planners are interested in whether characteristics of the physical environment in which women live and work influence physical activity levels during pregnancy. In this paper we study the associations between physical activity and several factors, including personal characteristics, meteorological/air quality variables, and neighborhood characteristics, among pregnant women in four counties of North Carolina. We simultaneously analyze six types of physical activity and investigate cross-dependencies between these activity types. Exploratory analysis suggests that the associations differ across regions. Therefore we use a multivariate regression model with spatially-varying regression coefficients. This model includes a regression parameter for each covariate at each spatial location. For our data with many predictors, some form of dimension reduction is clearly needed. We introduce a Bayesian variable selection procedure to identify subsets of important variables. Our stochastic search algorithm determines the probabilities that each covariate's effect is null, non-null but constant across space, and spatially-varying.

*Jointly with Brian Reich (NCSU) and Amy Herring (Biostatistics, UNC, Chapel Hill)

 

September 22

 

Dr. Jingchen Liu, Department of Statistics, Columbia University

"Rare-event Simulation for Heavy-tailed Multi-server Queue"

In this talk, I will present the first provably efficient simulation algorithm, via state-dependent importance sampling, to compute the probability that a customer experiences a long delay in a positive recurrent two-server (G/G/2) queue with heavy-tailed service requirements. Such a delay is usually caused by one or two customers (depending on the traffic intensity) who have extremely large service requirements and occupy the servers for a long time. We propose a three-step program to design the algorithm and prove its efficiency. First, we adopt a mixture family of changes-of-measure; second, we propose an appropriate Lyapunov inequality to control the variance of our estimator; third, we construct a Lyapunov function (the solution to the Lyapunov inequality) and tune various parameters to verify the inequality. Because of the upper bound provided by the Lyapunov function, our method also suggests an asymptotic approximation of the rare-event probability. Therefore, rare-event simulation and large deviations analysis are connected naturally. Our strategy, including the mixture family, the construction of the Lyapunov function, and the proof techniques, can be applied to a large class of problems.

 

 

Joint work with Jose Blanchet and Peter Glynn

 

September 29

Dr. Anna Amirdjanova, University of Michigan

"Inference for stochastic evolution equations driven by Volterra processes"

Volterra processes represent a rich class of continuous Gaussian random fields capable of modelling both short and long memory behaviors and possessing varying degrees of roughness of paths. One interesting example of a one-parameter Volterra process is the celebrated fractional Brownian motion (fBm) with Hurst index $H$, where $H$ is the parameter in (0,1) that controls the self-similarity and memory structures of the process. In the past decade fBm and its multiscale and multiparameter generalizations have found applications in many diverse fields (most notably in biomechanics, turbulence, finance and internet traffic modelling) and fueled the ongoing efforts of researchers to develop various forms of fractional stochastic calculus. While fBm is so far the best-known representative of the class of Volterra processes, there are other interesting examples of Volterra processes which are very useful for applications. For example, there are some relatively unknown Volterra processes that display longer (or shorter) range dependence behavior than any fBm with any Hurst index.


The focus of this talk will be on the study of stochastic evolution equations driven by general Volterra processes and on the development of estimation theory for parameters of such equations. We will discuss in some detail the existence and uniqueness of solutions to such equations, construct maximum likelihood estimation (MLE) theory for coefficients of these evolution equations, and establish a number of statistical properties of the resulting estimators.
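As a small illustration of the fractional Brownian motion mentioned in this abstract, the sketch below simulates an fBm path on a grid from its standard covariance function R(s,t) = (s^{2H} + t^{2H} - |t-s|^{2H})/2 using a Cholesky factorization. It is not taken from the talk: the function names, the grid, and the choice H = 0.7 are assumptions made only for this example, and the Cholesky approach is just one simple way to sample a Gaussian vector.

```python
import numpy as np


def fbm_covariance(times, hurst):
    """Covariance matrix of fBm: 0.5 * (s^{2H} + t^{2H} - |t - s|^{2H})."""
    t = np.asarray(times, dtype=float)
    s, u = np.meshgrid(t, t, indexing="ij")
    h2 = 2.0 * hurst
    return 0.5 * (np.abs(s) ** h2 + np.abs(u) ** h2 - np.abs(s - u) ** h2)


def simulate_fbm(times, hurst, rng):
    """Sample one fBm path at strictly positive times via Cholesky."""
    cov = fbm_covariance(times, hurst)
    chol = np.linalg.cholesky(cov)  # requires a positive definite matrix
    return chol @ rng.standard_normal(len(times))


# Hypothetical grid and Hurst index, chosen only for illustration.
times = np.linspace(0.02, 1.0, 50)  # t = 0 is excluded: fBm is 0 there
path = simulate_fbm(times, hurst=0.7, rng=np.random.default_rng(0))
```

With H = 0.5 the covariance reduces to min(s, t), the covariance of ordinary Brownian motion, which gives a quick sanity check on the formula.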

October 6


Dr. Adam A. Szpiro, Department of Biostatistics, University of Washington

"Bayesian, frequentist, or both? Model-robust regression and the ‘sandwich’ estimator"

 

In this talk we present a new Bayesian approach to model-robust linear regression that leads to uncertainty estimates with the same robustness properties as the ‘sandwich’ estimator. The ‘sandwich’ estimator is known to provide asymptotically correct frequentist inference, even when standard modeling assumptions such as linearity and homoscedasticity in the data-generating mechanism are violated. Our derivation provides a compelling Bayesian justification for using this simple and popular tool, and it also clarifies what is being estimated when the data-generating mechanism is not linear. We demonstrate the applicability of our approach using a simulation study and health care cost data from an evaluation of the Washington State Basic Health Plan.
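For readers unfamiliar with the 'sandwich' estimator, here is a minimal sketch of its classical frequentist form (often called HC0) for ordinary least squares, compared with the usual model-based covariance. This is not the Bayesian derivation presented in the talk; the function name and the simulated heteroscedastic data are assumptions made purely for illustration.

```python
import numpy as np


def ols_with_sandwich(X, y):
    """Return OLS coefficients, model-based covariance, and HC0 sandwich covariance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    bread = np.linalg.inv(X.T @ X)           # (X'X)^{-1}
    beta = bread @ X.T @ y                   # least-squares estimate
    resid = y - X @ beta
    # Model-based covariance assumes constant error variance.
    cov_model = (resid @ resid / (n - p)) * bread
    # Sandwich: bread * meat * bread, with meat = sum_i e_i^2 x_i x_i'.
    meat = X.T @ (X * resid[:, None] ** 2)
    cov_sandwich = bread @ meat @ bread
    return beta, cov_model, cov_sandwich


# Simulated data (hypothetical): the error spread grows with x.
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(0.0, 3.0, n)
y = 1.0 + 2.0 * x + rng.standard_normal(n) * (0.2 + x)
X = np.column_stack([np.ones(n), x])
beta, cov_model, cov_sandwich = ols_with_sandwich(X, y)
print("slope SE (model-based):", np.sqrt(cov_model[1, 1]))
print("slope SE (sandwich):   ", np.sqrt(cov_sandwich[1, 1]))
```

When the constant-variance assumption fails, as in this simulated example, the two standard errors generally differ, and only the sandwich version remains asymptotically valid.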

 

October 13

Dr. Ingemar Nåsell, Royal Institute of Technology, Stockholm

"On Persistence of Endemic Infections"

Stochastic models in the form of Markov chains with absorbing states are studied. Persistence is measured by time to extinction, which shows qualitatively different behaviors in three different parameter regions. Two classical models for endemic infections are described, namely the univariate SIS model and the bivariate SIR model accounting for demographic changes. Explicit solutions do not exist; all results are approximations. They hold for finite population sizes, in contrast to the deterministic case, where only population proportions are studied. The concept of quasi-stationarity is important for the analysis. Dimensional analysis and scaling are used to simplify the parameter space. An extended approximation of the continuity correction defined by Cox (1970) is derived. Mathematically satisfactory results are given for the SIS model in the form of an approximation of the expected time to extinction from quasi-stationarity that is uniform across the three parameter regions. The bivariate SIR model is harder to treat, and still presents open problems.
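To make the notion of time to extinction concrete, the sketch below simulates a stochastic SIS model as a continuous-time Markov chain with an absorbing state at zero infectives. The abstract does not state the transition rates, so the textbook form used here (infection rate beta*I*(N-I)/N, recovery rate mu*I), the function name, and all parameter values are assumptions for illustration. The simulation is not the analytical approximation discussed in the talk.

```python
import random


def sis_time_to_extinction(n, beta, mu, i0, rng, t_max=1e6):
    """Simulate one SIS path; return (time, infectives) when it stops.

    The path stops at extinction (infectives == 0) or once t exceeds t_max.
    """
    t, i = 0.0, i0
    while i > 0 and t < t_max:
        infect = beta * i * (n - i) / n   # rate of I -> I + 1
        recover = mu * i                  # rate of I -> I - 1
        total = infect + recover
        t += rng.expovariate(total)       # exponential holding time
        if rng.random() < infect / total:
            i += 1
        else:
            i -= 1
    return t, i


# Hypothetical parameters with beta / mu < 1, where extinction is fast.
rng = random.Random(42)
times = [sis_time_to_extinction(50, 0.8, 1.0, 5, rng)[0] for _ in range(200)]
print("mean simulated time to extinction:", sum(times) / len(times))
```

Raising beta/mu above 1 in a large population makes extinction times grow very quickly, which is why the t_max guard is included.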

 

October 20

Dr. Hernando Ombao, Brown University

"Spectral Analysis of Brain Signals"

In many neuroscience experiments, one of the key goals is to investigate the oscillatory behavior of brain signals as quantified by spectral analysis. First, we review some basic ideas of Fourier analysis of stationary time series and highlight its connection to analysis of variance. Second, we give an overview of current models and methods for analyzing non-stationary processes (i.e., processes whose spectral decomposition changes over time). Stochastic representations using localized basis functions will be discussed. The talk will conclude with some current investigations including discrimination and classification of biological signals. These methods will be illustrated using electroencephalograms (EEGs) and magnetoencephalograms (MEGs).
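The connection between Fourier analysis and analysis of variance can be illustrated with the periodogram: by Parseval's identity, the total sum of squares about the mean splits exactly into contributions from the individual Fourier frequencies. The sketch below checks this on a simulated series; the signal, the noise level, and the function name are assumptions for illustration and are not drawn from the talk.

```python
import numpy as np


def periodogram(x):
    """Periodogram ordinates |DFT|^2 / n of the mean-centered series."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dft = np.fft.fft(x - x.mean())
    return np.abs(dft) ** 2 / n


# Simulated series (hypothetical): 10 cycles of a sinusoid plus noise.
rng = np.random.default_rng(0)
n = 256
t = np.arange(n)
x = np.sin(2 * np.pi * 10 * t / n) + 0.5 * rng.standard_normal(n)

ordinates = periodogram(x)
total_ss = np.sum((x - x.mean()) ** 2)

# ANOVA-style decomposition: the ordinates add up to the total sum of squares.
print("sum of periodogram ordinates:", ordinates.sum())
print("total sum of squares:        ", total_ss)

# The dominant frequency index among the positive frequencies.
peak = int(np.argmax(ordinates[1 : n // 2])) + 1
print("dominant frequency index:", peak)
```

Each ordinate plays the role of a sum of squares attributed to one frequency, so a large ordinate marks an oscillation that explains much of the variability in the series.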
 

October 27

 

Dr. David Brillinger, Statistics Department, University of California, Berkeley

"Dynamic Indeterminism In Science"

Jerzy Neyman's life history and some of his contributions to applied statistics are reviewed in this talk. In a 1960 article Neyman wrote:

"Currently in the period of dynamic indeterminism in science, there is hardly a serious piece of research which, if treated realistically, does not involve operations on stochastic processes. The time has arrived for the theory of stochastic processes to become an item of usual equipment of every applied statistician."

The emphasis in this talk is on stochastic processes and on stochastic process data analysis. A number of data sets and corresponding substantive questions are addressed. The data sets concern sardine depletion, blowfly dynamics, weather modification, elk movement, and seal journeying. Three of the examples are from Neyman's work and four from the speaker's joint work with collaborators.

The preceding is the Abstract of an article that will appear in Statistical Science shortly. That article is meant to introduce people, who don't know about it, to some of Neyman's work in applied statistics. Also some more recent work of the speaker will be presented.

 

November 3

 

November 10

Dr. Davy Paindaveine, Université Libre de Bruxelles

"Optimal Rank-Based Tests for Homogeneity of Scatter"

We propose a class of locally and asymptotically optimal tests, based on multivariate ranks and signs, for the homogeneity of scatter matrices in m elliptical populations. Contrary to the existing parametric procedures, these tests remain valid without any moment assumptions, and thus are perfectly robust against heavy-tailed distributions (validity robustness). Nevertheless, they reach semiparametric efficiency bounds at correctly specified densities (efficiency robustness). They are also affine-invariant. We compute local powers and asymptotic relative efficiencies of the proposed tests with respect to the Schott (2001) pseudo-Gaussian test, which actually is a robustified version of the traditional Gaussian likelihood ratio test. As we show, the normal-score version of our tests outperforms Schott's test in most cases.

(joint work with Marc Hallin)

 

November 17


November 24


 

December 1

 

 

Gang Zheng, Ph.D., Office of Biostatistics Research, National Heart, Lung and Blood Institute

"On robust tests for case-control genetic association studies"

 

 

When testing for association between a single marker and a disease using case-control samples, the data can be presented in a 2x3 table. Pearson’s chi-square test (2 df) and the trend test (1 df) are commonly used, but one usually does not know which to choose, because the better choice depends on the unknown genetic model underlying the data. One could therefore either take the maximum (MAX) of a family of trend tests over all possible genetic models (following Davies, 1977; 1987) or take the smaller of the p-values (MIN2) of Pearson’s test and the trend test (following the Wellcome Trust Case-Control Consortium, WTCCC, 2007).

We first show that Pearson’s test, the trend test and MAX are all trend tests with different types of scores: data-driven or prespecified, restricted or not restricted. The results provide insight into two properties: MAX is always more powerful than Pearson’s test when the genetic model is restricted, and Pearson’s test is more robust when the model is not restricted. Then, for the MIN2 of WTCCC (2007), we show that its asymptotic null distribution can be derived, so the p-value of MIN2 can be obtained. Simulation is used to compare some common test statistics. The results are applied to WTCCC (2007). In particular, MIN2 is applied to the SNPs obtained by The SEARCH Collaborative Group (NEJM, August 21, 2008), who used the minimum p-values to detect these SNPs in a genome-wide association study but also reported these minimum p-values as p-values.

This is based on joint work with Jungnam Joo, Minjung Kwak, and Yaning Yang.
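The two standard tests named in this abstract can be sketched for a single 2x3 genotype table as follows. The code computes Pearson's chi-square statistic (2 df), the Cochran-Armitage trend statistic (1 df) with additive scores (0, 0.5, 1), and the raw MIN2 value. The function name, the scores, and the example counts are assumptions for illustration; the asymptotic null distribution of MIN2 derived in the talk is not reproduced here, so the raw MIN2 below should not be read as a valid p-value.

```python
import math


def pearson_and_trend(cases, controls, scores=(0.0, 0.5, 1.0)):
    """Return (pearson_stat, trend_stat, p_pearson, p_trend, min2).

    Assumes every genotype column has at least one observation.
    """
    r_total, s_total = sum(cases), sum(controls)
    n_total = r_total + s_total
    cols = [r + s for r, s in zip(cases, controls)]

    # Pearson's chi-square: sum over the six cells of (O - E)^2 / E.
    pearson = 0.0
    for row, row_total in ((cases, r_total), (controls, s_total)):
        for observed, col_total in zip(row, cols):
            expected = row_total * col_total / n_total
            pearson += (observed - expected) ** 2 / expected

    # Cochran-Armitage trend statistic with the given genotype scores.
    a = sum(x * r for x, r in zip(scores, cases))
    b = sum(x * c for x, c in zip(scores, cols))
    c2 = sum(x * x * c for x, c in zip(scores, cols))
    u = n_total * a - r_total * b
    var_u = r_total * s_total * (n_total * c2 - b * b) / n_total
    trend = u * u / var_u

    # Exact chi-square tail probabilities for 2 df and 1 df.
    p_pearson = math.exp(-pearson / 2.0)
    p_trend = math.erfc(math.sqrt(trend / 2.0))
    return pearson, trend, p_pearson, p_trend, min(p_pearson, p_trend)


# Hypothetical genotype counts (aa, Aa, AA) for cases and controls.
result = pearson_and_trend(cases=(50, 100, 50), controls=(80, 90, 30))
print(result)
```

Because MIN2 picks the smaller of two correlated p-values, treating it directly as a p-value overstates significance, which is the issue the abstract raises about reporting minimum p-values.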

 

 

December 8

 



 

 

 

 