|
Semester Schedule: Statistics - Fall 2009
Seminars are on Mondays
Time: 12:00 - 1:30 PM
Location: Room 903, 1255 Amsterdam Avenue
Tea and coffee will be served before the seminar at 11:30 AM, Room 1025
|
|
Dr. Annie Qu, University of Illinois at Urbana-Champaign
"Model selection of correlation structures for clustered data"
Model selection for correlation structure is a challenging problem since it involves higher-order moments. However, correct specification of the correlation matrix plays an important role in improving estimation efficiency for clustered data. In addition, the high-dimensional parameters involved in the correlation matrix can make parameter estimation unreliable. We intend to capture the correlation information from the clustered data based on a pool of candidate structures. The proposed method is computationally efficient and is not restricted by large cluster size. In theory, we show that the proposed method selects the true correlation structure consistently and that the estimator associated with the true correlation structure is asymptotically normal.
This is joint work with Jianhui Zhou.
|
|
Dr. Javier Rojo, Rice University
"Testing for long tailed distributions"
After presenting a review of some concepts of tail-ordering, tail-heaviness, and tail categorization of probability distributions, methodology for testing for medium-tailed distributions against either small- or long-tailed distributions is presented and its operating characteristics examined.
|
Dr. Ji Zhu, University of Michigan
"Partial Correlation Estimation by Joint Sparse Regression Models"
In this talk, we propose a computationally efficient approach for selecting non-zero partial correlations under the high-dimension-low-sample-size setting. This method assumes the overall sparsity of the partial correlation matrix and employs sparse regression techniques for model fitting. We illustrate the performance of our method by extensive simulation studies. It is shown that our method performs well in both non-zero partial correlation selection and the identification of hub variables, and also outperforms two existing methods. We then apply our method to a microarray breast cancer data set and identify a set of "hub genes" which may provide important insights on genetic regulatory networks. Finally, we prove that, under a set of suitable assumptions, the proposed procedure is asymptotically consistent in terms of model selection and parameter estimation.
This is joint work with Jie Peng, Pei Wang and Nengfeng Zhou.
|
Dr. Marc Hallin, ECARES (Joint seminar with Economics)
|
|
Dr. Wolfgang Jank, Robert H Smith School of Business, University of Maryland
"Forecasting Innovation Success via Shapes of Prediction Markets"
We propose a novel model for forecasting innovation success based on prediction markets. Prediction markets are market-like mechanisms that efficiently collect knowledge from a large number of participants for the sole purpose of making better forecasts. Prediction markets have been successfully used to forecast events ranging from presidential elections to homeland security, and they are being applied by major corporations such as HP and Google for internal forecasting. In this talk, we start by introducing and discussing prediction markets and the data associated with them.
Our study proposes an innovative approach for forecasting demand for innovations using prediction market data. In particular, we forecast the release weekend box office performance of Hollywood movies which serves as an important planning tool for allocating marketing resources, determining optimal release timing and advertising strategies. Our approach is based on ideas from functional data analysis and extracts shapes from the trading histories of prediction markets. We show that our shape model not only adds value compared to traditional forecasting models (such as those that only use the most recent trading value), it also allows us to make forecasts early (i.e. long before the movie is released) and it also enables us to incorporate newly arriving information "on the fly." In particular, our approach is especially well-suited to capture information that changes dynamically, such as movie buzz or hype originating from word-of-mouth and other sources that are not easily controlled by the marketer.
*jointly with Natasha Foutz, McIntire School of Commerce, University of Virginia
|
|
Dr. David Mason, University of Delaware
"On Proving Consistency of Non-standard kernel estimators"
I shall discuss a general method based on empirical process techniques to prove uniform-in-bandwidth consistency of a class of non-standard kernel-type function estimators. Examples include bias-corrected kernel density and Nadaraya-Watson function estimators, projection pursuit regression and conditional distribution estimation, and kernel estimation of the density of linear regression residuals. Our results are useful to establish uniform consistency of data-driven bandwidth kernel-type function estimators. My talk will be based upon joint work completed and in progress with Julia Dony, Uwe Einmahl and Jan Swanepoel.
|
*Friday, October 23, 2009 *2:30 PM *Room 903 SSW
Dr. Philip Protter, Cornell University
"Questions on filtration shrinkage and illusory arbitrage"
Abstract: The theory of the expansion of filtrations dates from the 1980s, but it is not well known today, although recently it has undergone a mild revival due to applications to insider trading and in credit risk theory. A closely related flip side to the expansion of filtrations is the shrinkage of filtrations. We will discuss some results from the late 1970s and connect them to recent results, and also show how it impacts some areas of mathematical finance theory. This talk is based on current work joint with Hans Föllmer of Berlin, as well as current work joint with Robert Jarrow of Cornell.
|
|
Dr. David Blei, Princeton University
"Supervised and relational topic models"
Abstract:
A surge of recent research in machine learning and statistics has developed new techniques for finding patterns of words in document collections using hierarchical probabilistic models. These models are called "topic models" because the discovered word patterns often reflect the underlying topics that permeate the documents. Topic models also naturally apply to data such as images and biological sequences.
In this talk I will review the basics of topic modeling, and discuss some recent extensions: supervised topic modeling and relational topic modeling. Supervised topic models allow us to use topics in a setting where we seek both exploratory and predictive power. Relational topic models---which are built on supervised topic models---consider documents interconnected in a graph. These models can be used to summarize a network of documents, predict links between them, and predict words within them.
Joint work with Jonathan Chang and Jon McAuliffe.
Bio:
David Blei is an assistant professor in the Computer Science department at Princeton University. He received his Ph.D. in 2004 from U.C. Berkeley and was a postdoctoral researcher in the Department of Machine Learning at Carnegie Mellon University. His research interests include graphical models, approximate posterior inference, and nonparametric Bayesian statistics. He focuses on applications to information retrieval and natural language processing.
|
Academic Holiday
|
Dr. Wei Pan, University of Minnesota
"Statistical Tests of Genetic Association with Multiple SNPs"
Genome-wide association studies have become popular in detecting genetic variants associated with complex diseases. Because association strengths are often weak, it is critical to use statistical tests with high power. For the typical case-control design, we consider testing disease association with multiple SNPs in a candidate gene or region. The statistical question can be formulated in a simple and familiar way: we are testing multiple parameters in one or more logistic (or other) regression models. The two most popular existing approaches are 1) to test individual or SNP-specific parameters separately in each marginal/univariate regression model (with multiple-testing adjustment), and 2) to test multiple parameters simultaneously in a joint regression model; the parameter estimates are all (approximately) normally distributed. Two alternative approaches are discussed: the first is a compromise between the above univariate and multivariate approaches, which works well in some situations but not in others; the second is a "fix" of the first. Both approaches are based on incorrect models. For example, in contrast to the use of the covariance matrix in the Wald test on multiple parameters, an alternative test that ignores the correlations among the parameter estimates may yield higher power.
|
Dr. Marina Vannucci, Rice University
"Mixture Priors for Bayesian Variable Selection"
In this talk I will review Bayesian methods for variable selection that use spike and slab priors. Particular attention will be given to high-dimensional data. Linear and nonlinear models will be considered, with continuous, categorical and survival responses. Applications will be to genomics data from DNA microarray studies. The analysis of the high-dimensional data generated by such studies often challenges standard statistical methods. Models and algorithms are quite flexible and allow us to incorporate additional information, such as data substructure and/or knowledge of gene functions and of relationships among genes.
|
|
Dr. Nozer Singpurwalla, George Washington University
"Network Routing in a Dynamic Environment"
Abstract: We propose a framework for route selection in an archetypal network which is required to function in a dynamic environment. A consideration of the dynamic environment is motivated by the adversarial scenario of obstacles placed by adversaries on one or more links of the network. The problem poses some challenging and novel statistical issues pertaining to inducing likelihoods based on sampling from posterior distributions, and modeling the socio-psychological behavior of adversaries. Our efforts here show how data analysis and statistical inference can be productively brought to bear on problems normally addressed by computer scientists, electrical engineers, and operations researchers.
|
|
Dr. Bala Rajaratnam, Stanford University
"Flexible Covariance Estimation in Gaussian Graphical models"
Covariance estimation is known to be a challenging problem, especially for high-dimensional data. In this context, graphical models can act as a tool for regularization and have proven to be excellent tools for the analysis of high-dimensional data. Graphical models are statistical models where dependencies between variables are represented by means of a graph. Both frequentist and Bayesian inferential procedures for graphical models have recently received much attention in the statistics literature. The hyper-inverse Wishart distribution is a commonly used prior for Bayesian inference on covariance matrices in Gaussian graphical models. This prior has the distinct advantage that it is a conjugate prior for this model but it suffers from lack of flexibility in high-dimensional problems due to its single shape parameter. In this talk, for posterior inference on covariance matrices in decomposable Gaussian graphical models, we use a flexible class of conjugate prior distributions defined on the cone of positive-definite matrices with fixed zeros according to a graph G. This class includes the hyper-inverse Wishart distribution and allows for up to k+1 shape parameters where k denotes the number of cliques in the graph. We first add to this class of priors, a reference prior, which can be viewed as an improper member of this class. We then derive the general form of the Bayes estimators under traditional loss functions adapted to graphical models and exploit the conjugacy relationship in these models to express these estimators in closed form. The closed form solutions allow us to avoid heavy computational costs that are usually incurred in these high-dimensional problems. We also investigate decision-theoretic properties of the standard frequentist estimator, which is the maximum likelihood estimator, in these problems.
Furthermore, we illustrate the performance of our estimators by exploring frequentist risk properties and the efficacy of graphs in the estimation of high-dimensional covariance structures. We demonstrate that our estimators yield substantial risk reductions over the maximum likelihood estimator in the graphical model.
|
|
Dr. Srikesh Arunajadai, Biostatistics Department, Columbia University
"Unwinding RNA: Application of Point Process and Change Point Models"
Helicases are a class of enzymes involved in Ribonucleic Acid (RNA) metabolism. The study of double-stranded RNA unwinding by helicases is a problem of basic scientific interest. One such example is provided by studies on the hepatitis C virus (HCV) NS3 helicase using single molecule mechanical experiments. HCV currently infects nearly 3% of the world population and NS3 is a protein essential for viral genome replication. In this work a statistical method is proposed to analyze the individual mechanistic cycle of these motor proteins, which is crucial to the understanding of their cellular functions. The RNA unwinding by NS3 helicase is hypothesized to occur in a series of discrete steps, with the steps themselves occurring in accordance with an underlying point process. A point process driven multiple change point model is proposed to model the RNA unwinding mechanism. Algorithms based on robust-resistant statistical procedures are proposed to detect the change points.
|
|
Dr. Parthanil Roy, Michigan State University
"Ergodic Properties of Stable Random Fields"
Abstract: We establish characterization results for the ergodicity of symmetric $\alpha$-stable (S$\alpha$S) stationary random fields. We first show that the result of Samorodnitsky (2005) remains valid in the multiparameter setting, i.e., a stationary S$\alpha$S ($0<\alpha<2$) random field is ergodic (or equivalently, weakly mixing) if and only if it is generated by a null group action. By establishing multiparameter versions of Stochastic and Birkhoff Ergodic Theorems, we give a criterion for ergodicity of these random fields which is valid for all dimensions and new even in the one-dimensional case. The similarity of the spectral representations for sum- and max-stable random fields yields parallel characterization results in the max-stable setting. (This talk is based on joint work with Yizao Wang and Stilian A. Stoev.)
|