Semester Schedule: Statistics - Fall 2008
Seminars are on Mondays.
Time: 12:00 - 1:10 PM. Location: Room 903, 1255 Amsterdam Avenue.
Tea and coffee will be served before the seminar at 11:30 AM in Room 1025.
|
|
Dr. Shaw-Hwa Lo, Department of Statistics, Columbia University
"Discovering Influential Variables: A Method of Partitions"
We shall introduce a general computer-intensive approach, based on a method we proposed earlier, for detecting which of many potential explanatory variables have an influence on a dependent variable Y. This approach is suited to detecting influential variables whose causal effects depend on the confluence of their values with those of other variables. It avoids a difficult direct analysis of possibly thousands of variables by dealing with many randomly selected small subsets. The main objective is to discover the influential variables rather than to measure their effects. Once they are detected, the much smaller group of influential variables should be amenable to appropriate analysis. In a sense, we are confining our attention to locating a few needles in a haystack. If time permits, we shall include a real application of a variation of the proposed methods to data from a case-control study of sporadic breast cancer. Interactions of gene pairs associated with breast cancer are reported (PNAS, Aug. 2008).
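The idea of scoring small subsets of variables can be illustrated with a short Python sketch. It assumes a partition-based score of the form I = Σ_j n_j² (Ȳ_j − Ȳ)², where the cells j are the joint value combinations of the variables in a candidate subset, n_j is the cell size and Ȳ_j the cell mean of Y. This form, the function name `partition_score`, and the toy data are assumptions for illustration and are not necessarily the statistic used by the authors. In the toy data, Y depends on two variables only through their interaction, so neither variable shows any marginal effect on its own.

```python
from collections import defaultdict
from itertools import product


def partition_score(X, y, subset):
    """Hypothetical partition score for a subset of discrete explanatory variables.

    Observations are grouped into cells by their joint values on the columns in
    `subset`; the score sums n_j**2 * (cell mean - overall mean)**2 over cells.
    """
    ybar = sum(y) / len(y)
    cells = defaultdict(list)
    for row, yi in zip(X, y):
        cells[tuple(row[k] for k in subset)].append(yi)
    return sum(len(c) ** 2 * (sum(c) / len(c) - ybar) ** 2 for c in cells.values())


# Toy data (assumed): three binary variables, each combination repeated 5 times.
# Y = X0 XOR X1, so X0 and X1 matter only jointly; X2 is irrelevant.
X = [list(bits) for bits in product([0, 1], repeat=3)] * 5
y = [row[0] ^ row[1] for row in X]

for subset in [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]:
    print(subset, partition_score(X, y, subset))
```

Only the pair (0, 1) receives a nonzero score here (100.0). In the setting the abstract describes, a score like this would be evaluated on many randomly drawn small subsets rather than on every possible subset.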
|
|
Dr. Montse Fuentes, Statistics Department, North Carolina State University
"Neighborhood and Environmental Factors Associated with Physical Activity During Pregnancy"
Physical activity has well-documented health benefits for cardiovascular fitness and weight control. For pregnant women, the American College of Obstetricians and Gynecologists currently recommends 30 minutes of moderate exercise on most, if not all, days; however, very few pregnant women achieve this level of activity. Epidemiologists, policy makers, and city planners are interested in whether characteristics of the physical environment in which women live and work influence physical activity levels during pregnancy. In this paper we study the associations between physical activity and several factors, including personal characteristics, meteorological and air quality variables, and neighborhood characteristics, among pregnant women in four counties of North Carolina. We simultaneously analyze six types of physical activity and investigate cross-dependencies between these activity types. Exploratory analysis suggests that the associations differ across regions, so we use a multivariate regression model with spatially varying regression coefficients, which includes a regression parameter for each covariate at each spatial location. With this many predictors, some form of dimension reduction is clearly needed. We introduce a Bayesian variable selection procedure to identify subsets of important variables. Our stochastic search algorithm determines the probabilities that each covariate's effect is null, non-null but constant across space, or spatially varying.
Joint work with Brian Reich (NCSU) and Amy Herring (Biostatistics, UNC, Chapel Hill)
|
|
Dr. Jingchen Liu, Department of Statistics, Columbia University
"Rare-event Simulation for Heavy-tailed Multi-server Queue"
In this talk, I will present the first provably efficient simulation algorithm, based on state-dependent importance sampling, for computing the probability that a customer experiences a long delay in a positive recurrent two-server (G/G/2) queue with heavy-tailed service requirements. Such a delay is usually caused by one or two customers (depending on the traffic intensity) who have extremely large service requirements and occupy the servers for a long time. We propose a three-step program to design the algorithm and prove its efficiency: first, we adopt a mixture family of changes of measure; second, we propose an appropriate Lyapunov inequality to control the variance of our estimator; third, we construct a Lyapunov function (a solution to the Lyapunov inequality) and tune various parameters to verify the inequality. Because the Lyapunov function provides an upper bound, our method also suggests an asymptotic approximation of the rare-event probability, so rare-event simulation and large deviations analysis are connected naturally. Our strategy, including the mixture family, the construction of the Lyapunov function, and the proof techniques, can be applied to a large class of problems.
Joint work with Jose Blanchet and Peter Glynn
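The change-of-measure idea behind importance sampling can be seen on a much simpler problem than the G/G/2 delay probability: estimating the tail probability P(X > u) of a single Pareto random variable, whose exact value is u^(-alpha). The Python sketch below draws from a fixed two-component mixture (the original Pareto and a heavier-tailed Pareto) and reweights each draw by the likelihood ratio. The tail indices, the mixture weight, and the function name are assumptions for illustration; unlike the algorithm in the talk, this proposal is not state-dependent and is only loosely analogous to a mixture family of changes of measure.

```python
import random


def tail_prob_is(alpha, u, n, proposal_alpha=0.5, mix=0.5, seed=0):
    """Importance-sampling estimate of P(X > u) for X ~ Pareto(alpha) on [1, inf).

    Proposal: with probability `mix` draw from the target Pareto(alpha), otherwise
    from a heavier-tailed Pareto(proposal_alpha). Each draw that lands beyond u is
    weighted by the likelihood ratio f(x) / g(x), with g the mixture density.
    """
    rng = random.Random(seed)

    def pareto_pdf(x, a):
        return a * x ** (-a - 1.0)  # density of Pareto(a) with minimum 1

    total = 0.0
    for _ in range(n):
        a = alpha if rng.random() < mix else proposal_alpha
        x = rng.paretovariate(a)
        if x > u:
            f = pareto_pdf(x, alpha)
            g = mix * f + (1.0 - mix) * pareto_pdf(x, proposal_alpha)
            total += f / g
    return total / n


alpha, u, n = 2.0, 1000.0, 100_000
print("exact:", u ** -alpha)
print("importance sampling:", tail_prob_is(alpha, u, n))
```

With the same budget, plain Monte Carlo from the target would see the event only about n·u^(-alpha) = 0.1 times on average, so it would usually return 0.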
|
|
Dr. Anna Amirdjanova, University of Michigan
"Inference for stochastic evolution equations driven by Volterra processes"
Volterra processes represent a rich class of continuous Gaussian random fields capable of modelling both short- and long-memory behavior and possessing varying degrees of path roughness. One interesting example of a one-parameter Volterra process is the celebrated fractional Brownian motion (fBm) with Hurst index $H \in (0,1)$, the parameter that controls the self-similarity and memory structure of the process. In the past decade, fBm and its multiscale and multiparameter generalizations have found applications in many diverse fields (most notably biomechanics, turbulence, finance, and internet traffic modelling) and have fueled ongoing efforts by researchers to develop various forms of fractional stochastic calculus. While fBm is so far the best-known representative of the class of Volterra processes, other Volterra processes are also very useful in applications; for example, some relatively little-known Volterra processes display longer (or shorter) range dependence than fBm with any Hurst index.
The focus of this talk will be on stochastic evolution equations driven by general Volterra processes and on the development of an estimation theory for the parameters of such equations. We will discuss in some detail the existence and uniqueness of solutions to such equations, develop MLE theory for the coefficients of these evolution equations, and establish a number of statistical properties of the resulting estimators.
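For fBm, the Hurst index enters through the covariance E[B_H(s) B_H(t)] = ½(s^(2H) + t^(2H) − |t − s|^(2H)). The short Python sketch below evaluates this covariance and the autocovariance of the unit-lag increments (fractional Gaussian noise). For lags k ≥ 1 that autocovariance is positive when H > 1/2 (persistent, long memory), zero when H = 1/2 (ordinary Brownian motion), and negative when H < 1/2. The sketch covers only fBm, not the general Volterra processes or the evolution equations discussed in the talk; the function names are illustrative.

```python
def fbm_cov(s, t, H):
    """Covariance E[B_H(s) B_H(t)] of standard fBm with Hurst index H, for s, t >= 0."""
    return 0.5 * (s ** (2 * H) + t ** (2 * H) - abs(t - s) ** (2 * H))


def fgn_autocov(k, H):
    """Autocovariance at integer lag k of fractional Gaussian noise,
    i.e. of the increments B_H(j + 1) - B_H(j) of standard fBm."""
    k = abs(k)
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + abs(k - 1) ** (2 * H))


for H in (0.3, 0.5, 0.8):
    gammas = [fgn_autocov(k, H) for k in (1, 10, 100)]
    print(f"H={H}: gamma(1), gamma(10), gamma(100) =", [round(g, 5) for g in gammas])
```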
|
Dr. Adam A. Szpiro, Department of Biostatistics, University of Washington
"Bayesian, frequentist, or both? Model-robust regression and the ‘sandwich’ estimator"
In this talk we present a new Bayesian approach to model-robust linear regression that leads to uncertainty estimates with the same robustness properties as the ‘sandwich’ estimator. The ‘sandwich’ estimator is known to provide asymptotically correct frequentist inference even when standard modeling assumptions, such as linearity and homoscedasticity of the data-generating mechanism, are violated. Our derivation provides a compelling Bayesian justification for using this simple and popular tool, and it also clarifies what is being estimated when the data-generating mechanism is not linear. We demonstrate the applicability of our approach using a simulation study and health care cost data from an evaluation of the Washington State Basic Health Plan.
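For reference, the classical frequentist sandwich (HC0) estimator for ordinary least squares is (XᵀX)⁻¹ (Σ_i e_i² x_i x_iᵀ) (XᵀX)⁻¹, where e_i are the residuals and x_i the design rows. The pure-Python sketch below computes it for a simple linear regression with an intercept and compares it with the model-based covariance σ̂²(XᵀX)⁻¹, which assumes a constant error variance. It shows only the classical estimator, not the Bayesian derivation presented in the talk; the function names and the example data are hypothetical.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def ols_with_sandwich(x, y):
    """OLS fit of y = b0 + b1 * x.

    Returns (beta, model_based_cov, sandwich_cov), where the sandwich is the HC0 form
    (X'X)^{-1} (sum_i e_i^2 x_i x_i') (X'X)^{-1} with design rows x_i = (1, x_i).
    """
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]  # (X'X)^{-1}
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    beta = [inv[0][0] * sy + inv[0][1] * sxy, inv[1][0] * sy + inv[1][1] * sxy]
    resid = [yi - beta[0] - beta[1] * xi for xi, yi in zip(x, y)]

    # Model-based covariance sigma^2 (X'X)^{-1}: assumes a constant error variance
    sigma2 = sum(e * e for e in resid) / (n - 2)
    model_based = [[sigma2 * v for v in row] for row in inv]

    # "Meat" of the sandwich: sum_i e_i^2 x_i x_i'
    m00 = sum(e * e for e in resid)
    m01 = sum(e * e * xi for e, xi in zip(resid, x))
    m11 = sum(e * e * xi * xi for e, xi in zip(resid, x))
    sandwich = matmul2(matmul2(inv, [[m00, m01], [m01, m11]]), inv)
    return beta, model_based, sandwich


x = [float(i) for i in range(1, 11)]
errs = [0.1 * i * (-1) ** i for i in range(1, 11)]  # hypothetical errors, spread grows with x
y = [2.0 + 3.0 * xi + e for xi, e in zip(x, errs)]
beta, model_based, sandwich = ols_with_sandwich(x, y)
print("slope estimate:", round(beta[1], 4))
print("model-based SE:", round(model_based[1][1] ** 0.5, 4))
print("sandwich SE:   ", round(sandwich[1][1] ** 0.5, 4))
```

The model-based form uses a single pooled σ̂², while the sandwich weights each squared residual by its own leverage on the coefficient, which is why it stays valid when the error variance changes with x.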
|
|
Dr. Ingemar Nåsell, Royal Institute of Technology, Stockholm
"On Persistence of Endemic Infections"
Stochastic models in the form of Markov chains with absorbing states are studied. Persistence is measured by the time to extinction, which shows qualitatively different behavior in three different parameter regions. Two classical models for endemic infections are described, namely the univariate SIS model and the bivariate SIR model that accounts for demographic changes. Explicit solutions do not exist; all results are approximations. They hold for finite population sizes, in contrast to the deterministic case, where only population proportions are studied. The concept of quasi-stationarity is important for the analysis. Dimensional analysis and scaling are used to simplify the parameter space. An extended approximation of the continuity correction defined by Cox (1970) is derived. Mathematically satisfactory results are given for the SIS model in the form of an approximation of the expected time to extinction from quasi-stationarity that is uniform across the three parameter regions. The bivariate SIR model is harder to treat and still presents open problems.
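Although no explicit closed form exists, for a given finite population size N the expected time to extinction from each initial state can still be evaluated numerically. The Python sketch below does this for the SIS model viewed as a birth-death chain on {0, ..., N} with infection rate λ_n = R0·μ·n(1 − n/N) and recovery rate μ_n = μ·n, a common parameterization assumed here. It solves the first-step equations λ_n(τ_{n+1} − τ_n) + μ_n(τ_{n−1} − τ_n) = −1 with τ_0 = 0 by a backward recursion; it does not reproduce the quasi-stationary or uniform approximations described in the talk, and the function name is illustrative.

```python
def sis_extinction_times(N, R0, mu=1.0):
    """Expected times to extinction tau[n], n = 0..N, for the stochastic SIS model.

    Assumed rates: infection lambda_n = R0 * mu * n * (1 - n / N), recovery mu_n = mu * n.
    Solves lambda_n (tau[n+1] - tau[n]) + mu_n (tau[n-1] - tau[n]) = -1, tau[0] = 0,
    via the increments D_n = tau[n] - tau[n-1], computed backward from D_N = 1 / mu_N.
    """
    lam = [R0 * mu * n * (1 - n / N) for n in range(N + 1)]
    rec = [mu * n for n in range(N + 1)]
    D = [0.0] * (N + 2)  # D[N + 1] is a placeholder; it is multiplied by lam[N] = 0
    for n in range(N, 0, -1):
        D[n] = (1.0 + lam[n] * D[n + 1]) / rec[n]
    tau = [0.0] * (N + 1)
    for n in range(1, N + 1):
        tau[n] = tau[n - 1] + D[n]
    return tau


# Expected time to extinction (in units of 1/mu) from one infective, N = 100
for R0 in (0.5, 1.0, 1.5, 2.0):
    print(f"R0 = {R0}: tau_1 = {sis_extinction_times(100, R0)[1]:.4g}")
```

For R0 > 1 these times grow roughly exponentially with N, so for very large populations the floating-point recursion may overflow to `inf`; that growth is one reason the asymptotic approximations discussed in the talk are needed.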
|
|
Dr. Hernando Ombao, Brown University
"Spectral Analysis of Brain Signals"
In many neuroscience experiments, one of the key goals is to investigate the oscillatory behavior of brain signals as quantified by spectral analysis. First, we review some basic ideas of Fourier analysis of stationary time series and highlight its connection to analysis of variance. Second, we give an overview of current models and methods for analyzing non-stationary processes (i.e., processes whose spectral decomposition changes over time). Stochastic representations using localized basis functions will be discussed. The talk will conclude with some current investigations, including discrimination and classification of biological signals. These methods will be illustrated using electroencephalograms (EEGs) and magnetoencephalograms (MEGs).
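The connection between Fourier analysis and analysis of variance can be seen through Parseval's identity: for a mean-centered series, the periodogram ordinates at the Fourier frequencies sum to the total sum of squares, so each ordinate is the share of total variation attributed to one frequency. The Python sketch below computes a raw periodogram with a direct discrete Fourier transform and checks this decomposition on a synthetic signal. The signal is hypothetical, and the sketch covers only the stationary case, not the localized representations for non-stationary signals discussed in the talk.

```python
import cmath
import math


def periodogram(x):
    """Raw periodogram I_k = |sum_t (x_t - xbar) exp(-2*pi*i*k*t/n)|**2 / n
    at the Fourier frequencies k/n, k = 0, ..., n-1 (direct O(n^2) DFT)."""
    n = len(x)
    xbar = sum(x) / n
    return [
        abs(sum((x[t] - xbar) * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
        for k in range(n)
    ]


# Hypothetical test signal: two sinusoids at Fourier indices 5 and 12.
n = 64
x = [math.cos(2 * math.pi * 5 * t / n) + 0.5 * math.sin(2 * math.pi * 12 * t / n) for t in range(n)]
I = periodogram(x)

xbar = sum(x) / n
total_ss = sum((v - xbar) ** 2 for v in x)
peak = max(range(1, n // 2), key=lambda k: I[k])
print("peak index:", peak)  # 5, the frequency of the larger-amplitude component
# Parseval: the periodogram ordinates partition the total sum of squares
print("sum of I_k:", round(sum(I), 6), " total SS:", round(total_ss, 6))
```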
|
|
Dr. David Brillinger, Statistics Department, University of California, Berkeley
"Dynamic Indeterminism In Science"
Jerzy Neyman's life history and some of his contributions to applied statistics are reviewed in this talk. In a 1960 article Neyman wrote:
"Currently in the period of dynamic indeterminism in science, there is hardly a serious piece of research which, if treated realistically, does not involve operations on stochastic processes. The time has arrived for the theory of stochastic processes to become an item of usual equipment of every applied statistician."
The emphasis in this talk is on stochastic processes and on stochastic process data analysis. A number of data sets and corresponding substantive questions are addressed. The data sets concern sardine depletion, blowfly dynamics, weather modification, elk movement, and seal journeying. Three of the examples are from Neyman's work and four from the speaker's joint work with collaborators.
The preceding is the abstract of an article that will appear in Statistical Science shortly. That article is meant to introduce readers unfamiliar with it to some of Neyman's work in applied statistics. Some more recent work of the speaker will also be presented.
|
|
Dr. Davy Paindaveine, Université Libre de Bruxelles
"Optimal Rank-Based Tests for Homogeneity of Scatter"
We propose a class of locally and asymptotically optimal tests, based on multivariate ranks and signs, for the homogeneity of scatter matrices in m elliptical populations. Unlike the existing parametric procedures, these tests remain valid without any moment assumptions and are thus perfectly robust against heavy-tailed distributions (validity robustness). Nevertheless, they reach semiparametric efficiency bounds at correctly specified densities (efficiency robustness). They are also affine-invariant. We compute local powers and asymptotic relative efficiencies of the proposed tests with respect to the Schott (2001) pseudo-Gaussian test, which is itself a robustified version of the traditional Gaussian likelihood ratio test. As we show, the normal-score version of our tests outperforms Schott's test in most cases.
(joint work with Marc Hallin)
|
|
Gang Zheng, Ph.D., Office of Biostatistics Research, National Heart, Lung, and Blood Institute
"On robust tests for case-control genetic association studies"
When testing the association between a single marker and a disease using case-control samples, the data can be presented in a 2x3 table. Pearson’s chi-square test (2 df) and the trend test (1 df) are commonly used, but it is usually unclear which of them to choose, because the choice depends on the unknown genetic model underlying the data. One could therefore either take the maximum (MAX) of a family of trend tests over all possible genetic models (following Davies, 1977; 1987) or take the smaller of the p-values (MIN2) of Pearson’s test and the trend test (following the Wellcome Trust Case-Control Consortium, WTCCC, 2007).
We first show that Pearson’s test, the trend test, and MAX are all trend tests with different types of scores: data-driven or prespecified, restricted or not restricted. The results explain why MAX is always more powerful than Pearson’s test when the genetic model is restricted, and why Pearson’s test is more robust when the model is not restricted. Then, for the MIN2 of WTCCC (2007), we show that its asymptotic null distribution can be derived, so the p-value of MIN2 can be obtained. Simulation is used to compare some common test statistics. The results are applied to WTCCC (2007). In particular, MIN2 is applied to the SNPs obtained by The SEARCH Collaborative Group (NEJM, August 21, 2008), who used minimum p-values to detect these SNPs in a genome-wide association study but also reported these minimum p-values as if they were ordinary p-values.
This is based on joint work with Jungnam Joo, Minjung Kwak, and Yaning Yang.
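The trend test for a 2x3 genotype table is usually the Cochran-Armitage test with scores (0, x, 1). The Python sketch below computes its Z statistic for the recessive (x = 0), additive (x = 1/2), and dominant (x = 1) scores, the largest |Z| over those three (often called MAX3, which restricts the family of genetic models to these three scores), and Pearson's 2 df chi-square. The genotype counts and function names are hypothetical, and the sketch computes statistics only: p-values for MAX3 or MIN2 require the null distributions discussed in the talk and are not computed here.

```python
import math


def trend_z(cases, controls, scores):
    """Cochran-Armitage trend statistic Z for a 2x3 genotype table.

    cases, controls: genotype counts in a fixed order, e.g. (AA, Aa, aa);
    scores: one score per genotype column, e.g. (0, x, 1).
    """
    r, s = sum(cases), sum(controls)
    n = [c + d for c, d in zip(cases, controls)]  # column totals
    N = r + s
    t = sum(x * (c * s - d * r) for x, c, d in zip(scores, cases, controls))
    var = (r * s / N) * (
        sum(x * x * ni * (N - ni) for x, ni in zip(scores, n))
        - 2 * sum(scores[i] * scores[j] * n[i] * n[j]
                  for i in range(3) for j in range(i + 1, 3))
    )
    return t / math.sqrt(var)


def pearson_chi2(table):
    """Pearson chi-square statistic for an r x c table of counts (nonzero margins)."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    N = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / N
            stat += (obs - expected) ** 2 / expected
    return stat


# Scores (0, x, 1) for the recessive, additive and dominant models
MODELS = {"recessive": (0.0, 0.0, 1.0), "additive": (0.0, 0.5, 1.0), "dominant": (0.0, 1.0, 1.0)}


def max3(cases, controls):
    """Largest |Z| over the three trend tests (the statistic only, no p-value)."""
    return max(abs(trend_z(cases, controls, sc)) for sc in MODELS.values())


cases, controls = (200, 240, 60), (260, 200, 40)  # hypothetical genotype counts
for name, sc in MODELS.items():
    print(f"{name:9s} Z = {trend_z(cases, controls, sc):.3f}")
print(f"Pearson chi-square (2 df) = {pearson_chi2([cases, controls]):.3f}")
print(f"MAX3 = {max3(cases, controls):.3f}")
```

A useful check: with dominant scores (0, 1, 1), Z² equals Pearson's chi-square for the 2x2 table obtained by merging the last two genotype columns, which is one concrete instance of a standard test being a trend test with particular scores.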