Fall 2005 Biostatistics Brown Bag Seminar Abstracts

Partially Linear Models and Related Topics
In this brown-bag seminar I will present the state of the art of partially linear models, with a particular focus on several special topics such as error-prone covariates, missing observations, and checking of the nonlinear component. Extensions to more general models will be discussed. Applications of these projects in biology, economics, and nutrition will be mentioned. The talk covers a series of my publications in the Annals of Statistics, JASA, Statistica Sinica, Statistical Methods in Medical Research, and more recent submissions.

Similarity Searches in Genome-wide Numerical Data Sets
Many types of genomic data are naturally represented as multidimensional vectors. A frequent purpose of genome-scale data analysis is to uncover subsets of the data that are related by a similarity of some sort. One way to do this is by computing the distances between vectors. The major question is: how should the distance measure be chosen when several are available? First, we consider the problem of functional inference using phyletic patterns. Phyletic patterns denote the presence and absence of orthologous genes in completely sequenced genomes and are used to infer functional links, on the assumption that genes involved in the same pathway or functional system are co-inherited by the same set of genomes. I demonstrate that the use of an appropriate distance measure and clustering algorithm increases the sensitivity of the phyletic pattern method; however, the applicability of the method is limited by differential gains, losses, and displacements of orthologous genes. Second, we study the characteristic properties of various distance measures and their performance in several tasks of genome analysis. Most distance measures between binary vectors turn out to belong to a single parametric family, namely generalized average-based distances with different exponents. I show that descriptive statistics of the distance distribution, such as skewness and kurtosis, can guide the appropriate choice of the exponent. In contrast, the more familiar distance properties, such as metricity and additivity, appear to have much less effect on the performance of distances. Third, we discuss a new approach to local clustering based on iterative pattern matching and apply it to identify potential malaria vaccine candidates in the Plasmodium falciparum transcriptome.
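The similarity-search abstract above notes that most distance measures between binary vectors belong to a single parametric family of generalized average-based distances indexed by an exponent, and that the skewness and kurtosis of the resulting distance distribution can guide the choice of that exponent. The sketch below is a minimal illustration of the idea rather than code from the talk: it takes the family to be power means of the coordinate-wise mismatches (an assumed reading of "generalized average-based"), simulates phyletic-pattern-like binary vectors, and summarizes the pairwise distance distribution for a few exponents.

    import numpy as np
    from scipy.stats import skew, kurtosis

    def power_mean_distance(x, y, p):
        """Power mean (exponent p) of the coordinate-wise mismatches between binary vectors."""
        m = np.abs(x - y).astype(float)
        return np.mean(m ** p) ** (1.0 / p)

    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(50, 200))   # 50 phyletic-pattern-like binary vectors

    for p in (0.5, 1.0, 2.0):                # candidate exponents
        d = [power_mean_distance(X[i], X[j], p)
             for i in range(len(X)) for j in range(i + 1, len(X))]
        print(f"p={p}: skewness={skew(d):.3f}, kurtosis={kurtosis(d):.3f}")

For p = 1 this reduces to the familiar mismatch (Hamming) fraction; in this toy version other exponents are monotone transformations of it, so what changes is the shape of the distance distribution, which is exactly what the skewness and kurtosis summaries describe.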
Spring 2005 Biostatistics Brown Bag Seminar Abstracts

Estimating Incremental Cost-Effectiveness Ratios and Their Confidence Intervals with Differentially Censored Data
With medical costs escalating over recent years, cost analyses are being conducted more and more often to assess the economic impact of new treatment options. An incremental cost-effectiveness ratio is a measure that assesses the additional cost of a new treatment for saving one year of life. In this talk, we consider cost-effectiveness analysis for new treatments evaluated in a randomized clinical trial setting with staggered entries. In particular, the censoring times are different for cost and survival data. We propose a method for estimating the incremental cost-effectiveness ratio and obtaining its confidence interval when differential censoring exists. Simulation experiments are conducted to evaluate our proposed method. We also apply our methods to a clinical trial example comparing the cost-effectiveness of implanted defibrillators with conventional therapy for individuals with reduced left ventricular function after myocardial infarction.

Regression Analysis of ROC Curves and Surfaces
Receiver operating characteristic (ROC) curves are commonly used to describe the performance of a diagnostic test in terms of discriminating between healthy and diseased populations. A popular index of the discriminating ability or accuracy of the diagnostic test is the area under the ROC curve. When there are three or more populations, the concept of an ROC curve can be generalized to that of an ROC surface, with the volume under the ROC surface serving as an index of diagnostic accuracy. After introducing the basic concepts associated with ROC curves and surfaces, methods for assessing the effects of covariates on diagnostic test performance will be discussed. Examples from a recent study organized by the Agency for Toxic Substances and Disease Registry (and conducted here in Rochester) will be presented to illustrate these methods.

Constructing Prognostic Gene Signatures for Cancer Survival
Modern microarray technologies allow us to simultaneously measure the expressions of a huge number of genes, some of which are likely to be associated with cancer survival. While such gene expressions are unlikely to ever completely replace important clinical covariates, evidence is already beginning to mount that they can provide significant additional predictive information. The difficult task is to search among an enormous number of potential predictors and to correctly identify most of the important ones, without mistakenly identifying too many spurious associations. Many commonly used screening procedures unfortunately over-fit the training data, leading to subsets of selected genes that are unrelated to survival in the target population, despite appearing associated with the outcome in the particular sample of data used for subset selection. And some genes might only be useful when used in concert with certain other genes and/or with clinical covariates, yet most available screening methods are inherently univariate in nature, based only on the marginal associations between each predictor and the outcome. While it is impossible to simultaneously adjust for a huge number of predictors in an unconstrained way, we propose a method that offers a middle ground where some partial adjustments can be made in an adaptive way, regardless of the number of candidate predictors.
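The gene-signature abstract directly above contrasts its proposal with screening methods that are "inherently univariate," based only on marginal associations with the outcome. For concreteness, here is a minimal sketch of such a marginal screen for survival data, ranking each gene by a univariate Cox partial-likelihood score statistic evaluated at zero (a log-rank-type statistic). The expression matrix, survival times, and event indicators are fabricated for illustration; this is the kind of procedure the abstract warns can over-fit, not the adaptive method it proposes.

    import numpy as np

    def cox_score_statistic(x, time, event):
        """Univariate Cox score statistic at beta = 0 for a single covariate x:
        U sums (x_i - risk-set mean) over event times; V sums risk-set variances."""
        order = np.argsort(time)
        x, event = x[order], event[order]
        U, V = 0.0, 0.0
        for i in np.flatnonzero(event):
            risk = x[i:]                  # subjects still at risk at this event time
            U += x[i] - risk.mean()
            V += risk.var()               # population variance over the risk set
        return U * U / V if V > 0 else 0.0

    rng = np.random.default_rng(1)
    n, n_genes = 120, 500
    expr = rng.normal(size=(n, n_genes))                      # hypothetical expression matrix
    time = rng.exponential(scale=np.exp(-0.8 * expr[:, 0]))   # gene 0 made truly prognostic
    event = rng.uniform(size=n) < 0.7                         # ~30% marked censored, illustration only

    scores = [cox_score_statistic(expr[:, g], time, event) for g in range(n_genes)]
    print(np.argsort(scores)[::-1][:10])                      # marginally top-ranked genes

Ranking genes this way ignores both the joint behavior of genes and the clinical covariates, which is precisely the limitation the proposed middle-ground method is meant to address.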
A New Test Statistic for Testing Two-Sample Hypotheses in Microarray Data Analysis
We introduce a test statistic intended for use in nonparametric testing of the two-sample hypothesis with the aid of resampling techniques. This statistic is constructed as an empirical counterpart of a certain distance measure N between the distributions F and G from which the samples under study are drawn. The distance measure N can be shown to be a probability metric. In two-sample comparisons, the null hypothesis F = G is formulated as H0: N = 0. In a computer experiment, where gene expressions were generated from a log-normal distribution, while departures from the null hypothesis were modeled via scale transformations, the permutation test based on the distance N appeared to be more powerful than the one based on the commonly used t-statistic. The proposed statistic is not distribution free so that the two-sample hypothesis F = G is composite, i.e., it is formulated as H0: F(x) = H(x), G(x) = H(x) for all x and some H(x). The question of how the null distribution H should be modeled arises naturally in this situation. For the N-statistic, it can be shown that a specific resampling procedure (resampling analog of permutations) provides a rational way of modeling the null distribution. More specifically, this procedure mimics the sampling from a null distribution H which is, in some sense, the "least favorable" for rejection of the null hypothesis. No statement of such generality can be made for the t-statistic. The usefulness of the proposed statistic is illustrated with an application to experimental data generated to identify genes involved in the response of cultured cells to oncogenic mutations.

The Effects of Normalization on the Correlation Structure of Microarray Data
Stochastic dependence between gene expression levels in microarray data is of critical importance for the methods of statistical inference that resort to pooling test statistics across genes. It is frequently assumed that dependence between genes (or tests) is sufficiently weak to justify the proposed methods of testing for differentially expressed genes. A potential impact of between-gene correlations on the performance of such methods has yet to be explored. We present a systematic study of correlation between the t-statistics associated with different genes. We report the effects of four different normalization methods using a large set of microarray data on childhood leukemia in addition to several sets of simulated data. Our findings help decipher the correlation structure of microarray data before and after the application of normalization procedures. A long-range correlation in microarray data manifests itself in thousands of genes that are heavily correlated with a given gene in terms of the associated t-statistics. The application of normalization methods may significantly reduce correlation between the t-statistics computed for different genes. However, such procedures are unable to completely remove correlation between the test statistics. The long-range correlation structure also persists in normalized data.

Estimating Complexity in Bayesian Networks
Bayesian networks are commonly used to model complex genetic interaction graphs in which genes are represented by nodes and interactions by directed edges. Although a likelihood function is usually well defined, the maximum likelihood approach favors networks with high model complexity. To overcome this we propose a two-step algorithm to learn the network structure.
First, we estimate model complexity. This requires finding the MLE conditional on model complexity and then using Bayesian updating, resulting in an informative prior density on complexity. This is accomplished using simulated annealing to solve a constrained optimization problem on the graph space. In the second step we use an MCMC algorithm to construct a posterior density of gene graphs which incorporates the information obtained in the first step. Our approach is illustrated by an example.

A New Approach to Testing for Sufficient Follow-up in Cure-Rate Analysis
The problem of sufficient follow-up arises naturally in the context of cure rate estimation. This problem was brought to the fore by Maller and Zhou (1992, 1994) in an effort to develop nonparametric statistical inference based on a binary mixture model. The authors proposed a statistical test to help practitioners decide whether or not the period of observation has been long enough for this inference to be theoretically sound. The test is inextricably entwined with estimation of the cure probability by the Kaplan-Meier estimator at the point of last observation. While intuitively compelling, the test by Maller and Zhou does not provide a satisfactory solution to the problem because of its unstable and non-monotonic behavior as the duration of follow-up increases. The present paper introduces an alternative concept of sufficient follow-up allowing derivation of a lower bound for the expected proportion of immune subjects in a wide class of cure models. Building on the proposed bound, a new statistical test is designed to address the issue of the presence of immunes in the study population. The usefulness of the proposed approach is illustrated with an application to survival data on breast cancer patients identified through the NCI Surveillance, Epidemiology and End Results Database.

Assessment of Diagnostic Tests in the Presence of Verification Bias
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in a random sample from the intended population to which the test will be applied. In many studies, however, verification of the true disease status is performed only for a subset of the sample. This may be the case, for example, if ascertainment of the true disease status is invasive or costly. Often, verification of the true disease status depends on the result of the diagnostic test and possibly other characteristics of the subject (e.g., only subjects judged to be at higher risk of having the disease are verified). If sensitivity and specificity are estimated using only the information from the subset of subjects for whom both the test result and the true disease status have been ascertained, these estimates will typically be biased. This talk will review some methods for dealing with the problem of verification bias. Some new approaches to the problem will also be introduced.

Estimation of Causal Treatment Effects from Randomized Trials with Varying Levels of Non-Compliance
Data from randomized trials with non-compliance are often analyzed with an intention-to-treat (ITT) approach. However, while ITT estimates may be of interest to policy-makers, estimates of causal treatment effects may be of more interest to clinicians.
For the simple situation where treatment and compliance are binary (yes/no), instrumental variable (IV) methods can be used to estimate the average causal effect of treatment among those who would comply with treatment assignment. When there are more than two compliance levels (e.g., non-compliance, partial compliance, full compliance), however, these IV methods cannot identify the compliance-level causal effects without strong assumptions. We consider likelihood-based methods for dealing with this problem. The research was motivated by a study of the effectiveness of a disease self-management program in reducing health care utilization among older women with heart disease. This is work in progress.

Statistical Inference for Branching Processes
It is well known that branching processes have many applications in biology. In this talk the asymptotic behavior of branching populations having an increasing and random number of ancestors is investigated. An estimation theory will be developed for the mean, variance and offspring distributions of the process $\{Z_{t}(n)\}$ with a random number of ancestors $Z_{0}(n)$, as both $n$ (and thus $Z_{0}(n)$, in some sense) and $t$ approach infinity. Nonparametric estimators are proposed and shown to be consistent and asymptotically normal. Some censored estimators are also considered. It is shown that all results can be transferred to branching processes with immigration, under an appropriate sampling scheme. A system for simulation and estimation of branching processes will be demonstrated. No preliminary knowledge of this field is assumed.

Modeling of Stochastic Periodicity: Renewal, Regenerative and Branching Processes
In deterministic processes periodicity is usually well defined. In the stochastic case, however, there are many possible models. One way to study stochastic periodicity is proposed in this lecture. The models are based on alternating renewal and regenerative processes. The limiting behavior is investigated, with special attention given to the case of regeneration periods with infinite mean. Two applications to branching processes are considered: Bellman-Harris branching processes with state-dependent immigration and discrete-time branching processes with random migration. The main purpose of the talk is to describe stochastic models which can be applied in biology, especially epidemiology and biotechnology. No preliminary knowledge of this field is assumed.

Testing Approximate Statistical Hypotheses
Statistical hypotheses often take the form of statements about some properties of functionals of probability distributions. Usually, according to a hypothesis, the functionals in question have certain exact values. Many of the classical statistical hypotheses are of this form: the hypothesis about the mathematical expectation of a normal sample (one-dimensional or multidimensional); the hypothesis about the probabilities of outcomes in independent trials (to be tested on the basis of observed frequencies); the linear hypotheses in Gaussian linear models, etc. Stated as suppositions about exact values, such hypotheses do not accurately express the thinking of natural scientists. In practice, an applied scientist would be satisfied if those or similar suppositions were correct in some approximate sense (meaning their approximate agreement with statistical data).
The above-mentioned discrepancy between the applied-science viewpoint and its mathematical expression leads to the rejection of any statistical hypothesis given a sufficiently large amount of sample data, a well-known statistical phenomenon. This talk will show how hypotheses about exact values can be re-stated as rigorously formulated approximate hypotheses and how these can be tested against sample data, with special attention given to the hypotheses mentioned above.
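As a schematic illustration of such a re-statement (a sketch of the general idea, not necessarily the formulation used in the talk), an exact hypothesis about a functional $\theta(F)$ of a distribution $F$,

$H_0\colon\ \theta(F) = \theta_0,$

can be replaced by the approximate hypothesis

$H_0^{\delta}\colon\ |\theta(F) - \theta_0| \le \delta,$

where the tolerance $\delta > 0$ is chosen on subject-matter grounds. Unlike the exact version, the approximate hypothesis is not automatically rejected as the sample size grows whenever the true value of $\theta(F)$ lies within $\delta$ of $\theta_0$, which is exactly the large-sample phenomenon described above.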
Fall 2004 Biostatistics Brown Bag Seminar Abstracts

A Bayesian Analysis of Multiple Hypothesis Tests
A Bayesian methodology is proposed for the problem of multiple hypothesis tests for a given effect. The density of the test statistics is modelled as a mixture based on hypothesis status. A full posterior measure is constructed for the mixture conditional on the observable total density. Commonly used quantities such as false discovery rates and posterior probabilities of hypothesis status can be calculated directly from the mixture, and so full posterior measures for these quantities can be obtained directly (a schematic version of this mixture appears at the end of this group of abstracts). The posterior measure is computed by Markov chain Monte Carlo sampling. This approach proves to be very flexible, allowing a model for the magnitude of the effects, as well as for the dependence structure, to be developed and incorporated into the posterior measure. In addition, this approach is ideally suited to the situation in which the presence of large numbers of marginal, or weak, effects complicates any attempt to estimate the hypothesis mixture. In this case, a simple redefinition of the null hypothesis is proposed which makes the mixture estimation well defined and feasible.

Analysis of Variance, Coefficient of Determination, and Approximate F-tests for Local Polynomial Regression
In this paper, we develop analogous ANOVA inference tools for nonparametric local polynomial regression in the simple case of bivariate data. The results include: (i) a local exact ANOVA decomposition, (ii) a local R-squared, (iii) a global ANOVA decomposition, (iv) a global R-squared, (v) an asymptotically idempotent projection matrix, (vi) degrees of freedom, and (vii) approximate $F$-tests. We also provide some interesting geometric views of why a local exact ANOVA decomposition holds. The work here is different from earlier developments by other investigators. This is joint work with Jianwei Chen in the department.

On the Role of Copula Models in Survival Analysis
An important class of models in bivariate survival analysis consists of the so-called frailty models, which arise from the introduction of a common unobserved random proportionality factor into the hazard functions of the two related survival times. This assumption leads to a simple copula representation of the joint survivor function. The class of such models will be described and various characterization results presented. Some extensions will be discussed. Methods for parametric and semiparametric inference about the parameters governing the marginal distributions and the association structure will be surveyed briefly.

Cost-Effectiveness Studies Associated with Clinical Trials - Projecting Effects Beyond the Range of the Data
I start with a quick overview of the MADIT-II clinical trial and of the associated cost-effectiveness study, including a general overview of cost-effectiveness studies. I then review the need for projecting results beyond the limited (3.5-year) span of the available data and the associated difficulties [and fool-hardiness???]. Finally, I talk about the life-table method we developed for use in the MADIT-II cost-effectiveness study to project survival experience beyond the time span of the data. [Hongwei Zhao, Hongkun Wang and Hongyue Wang contributed to the MADIT-II cost-effectiveness study, as well as five people from the Community and Preventive Medicine Department and two in the Heart Research Group. A manuscript has just been submitted for publication (with these eleven authors).]
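The mixture model sketched in the first abstract of this group is commonly written, in a simplified two-group form (the notation here is generic, not taken from the talk), as

$f(t) = \pi_0 f_0(t) + (1 - \pi_0) f_1(t),$

where $\pi_0$ is the prior probability that a test corresponds to a true null and $f_0$, $f_1$ are the densities of the test statistic under the null and the alternative. The posterior probability that the hypothesis generating an observed statistic $t$ is null is then

$P(H_0 \mid t) = \pi_0 f_0(t) / f(t),$

and false discovery rates over any rejection region follow by averaging this quantity; once a posterior distribution is placed on the mixture itself, as the abstract describes, full posterior measures for these quantities come along automatically.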
Spring 2004 Biostatistics Brown Bag Seminar Abstracts

Quick and Easy Solutions for Dealing with Data: Part 1 2: Visit our Web site.

Conditional Inference Methods for Incomplete Poisson Data With Endogenous Time-Varying Covariates
We investigate the effect of protease inhibitors (PIs) on the rate of emergency room (ER) visits among HIV-infected women from a longitudinal cohort study. One strategy to account for serial correlation in longitudinal studies is to assume observations are independent, conditional on unit-specific nuisance parameters. It is possible to estimate these models using unconditional maximum likelihood, where the nuisance parameters are assigned a parametric distribution and integrated out of the likelihood. Alternately, we can proceed using conditional inference, where we eliminate the nuisance parameters from the likelihood by conditioning on a sufficient statistic for these parameters. An advantage of conditional inference methods over parametric random effects models is that all patient-level time-invariant factors (both measured and unmeasured) are accounted for in the analysis. A limitation is that standard conditional inference methods assume missing data are missing completely at random and do not allow endogenous time-varying covariates (i.e., ER visits in the past cannot predict future PI use). Both assumptions are unlikely to be met for these data, because one would expect 'sicker' patients to be more likely to receive treatment and/or drop out of the study. We develop new estimation strategies that allow endogenous time-varying covariates and missing-at-random dropouts. The analysis shows that PI use reduces the rate of ER visits among patients whose CD4 cell count was below a given threshold.

On the Density of the Solution to a Random System of Equations

Paradoxical Association of a Group of Atherosclerosis-related Genotypes with Reduced Rate of Coronary Events After Myocardial Infarction

Local Polynomial Density Estimation With Interval Censored Data
A survival time is interval censored if only its current status, an indicator of whether the event has occurred, is observed at a possibly random number of monitoring times. We provide estimators with pointwise confidence limits for all derivatives of the distribution of the time till event, assuming that the observed monitoring times are independent of the time of interest. Our estimator is a standard local polynomial regression smoother applied to the pooled sample of dependent current status observations. We show that the proposed estimator has a normal limiting distribution identical to that of a smoother applied to independent current status observations. Thus local bandwidth selection techniques and pointwise confidence limit procedures for standard nonparametric regression perform properly, despite the dependence in the pooled sample.

Pre-limit Theorems and Their Applications
Finitely many empirical observations can never justify any tail behavior; thus they cannot justify the applicability of classical limit theorems in probability theory. In this paper we attempt to show that instead of relying on limit theorems, one may use the so-called pre-limit theorems explained later.
The applicability of our pre-limit theorem relies not on the tail but on the 'central section' ('body') of the distributions, and as a result, instead of describing limiting behavior (as $n$, the number of i.i.d. observations, tends to infinity), the pre-limit theorem should provide an approximation for distribution functions when $n$ is 'large' but not too 'large'. Our pre-limiting approach seems to be more realistic for practical applications.

p-Values-Only-Based Stepwise Procedures for Multiple Testing and Their Optimality Properties

Modeling Cancer Screening: Further Thoughts and Results
Over the years, many large-scale randomized trials have been conducted to evaluate the effects of breast cancer screening. These trials have failed to provide conclusive evidence for significant survival benefits of mammographic screening because of certain pitfalls in their design and lack of statistical power. However, such studies represent a rich source of information on the natural history of breast cancer, thereby opening up the way to evaluate potential benefits of breast cancer screening through the use of realistic mathematical models of cancer development and detection. We propose a biologically motivated model of breast cancer development and detection allowing for arbitrary screening schedules and for the effects of clinical covariates recorded at the time of diagnosis on post-treatment survival. Biologically meaningful parameters of the model are estimated by the method of maximum likelihood from the data on age and tumor size at detection that resulted from two randomized trials known as the Canadian National Breast Screening Studies. When properly calibrated, the model provides a good description of the U.S. national trends in breast cancer incidence and mortality. The model was validated by predicting (without any further calibration or tuning) certain quantitative characteristics obtained from the SEER data. In particular, the model provides an excellent prediction of the size-specific age-adjusted incidence of invasive breast cancer as a function of calendar time for the period 1975-1999. Predictive properties of the model are also illustrated with an application to the dynamics of age-specific incidence and stage-specific age-adjusted incidence over the period 1975-1999.

Iterated Birth and Death Markov Process and its Biological Applications
We solve, under realistic biological assumptions, the following long-standing problem in radiation biology: to find the distribution of the number of clonogenic tumor cells surviving a given arbitrary schedule of fractionated radiation. Mathematically, this leads to the problem of computing the distribution of the state N(t) of an iterated birth and death Markov process at any time t counted from the end of exposure. We show that the distribution of the random variable N(t) belongs to the class of generalized negative binomial distributions, find an explicit computationally feasible formula for this distribution, and identify its limiting forms. In particular, for t = 0, the limiting distribution turns out to be Poisson, and an estimate of the rate of convergence in the total variation metric that generalizes the classical Law of Rare Events is obtained.

Statistical Methods of Translating Microarray Data into Clinically Relevant Diagnostic Information in Colorectal Cancer
The aim of the study is twofold. First, we identify a set of differentially expressed (DE) genes in colorectal cancer, compared with normal colorectal tissues, in order to rank genes for the development of biomarkers for population screening of colorectal cancer.
Second, we detect a set of DE genes for subtypes of colorectal cancer, which can be classified with respect to stage, location and carcino-embryonic antigen (CEA) level. The cancer and normal tissues were obtained from 87 colorectal cancer patients who underwent surgery at Severance Hospital, Yonsei Cancer Center, Yonsei University College of Medicine, from May to December of 2002. We originally attempted to extract total RNAs from tumor and normal tissues from 87 patients. From each of 36 patients we had RNA specimens for both tumor and normal tissues. However, from 19 patients only normal-tissue specimens were available, and from 32 patients only tumor specimens were available. Thus, we have a matched-pair sample of size 36 and two independent samples of sizes 19 and 32. We conducted a cDNA microarray experiment using a common reference design with 17K human cDNA microarrays. We pooled eleven cancer cell lines from various origins and used the pool as the common reference. We used M = log2(R/G) for the evaluation of relative intensity. As a means of utilizing the whole data set, we first use the matched-pair data set as a training set from which we detect a set of DE genes between the normal tissue and the tumor. Then we use the pool of the two independent data sets of "tumor only" and "normal only" as the test set for validation. We employ four procedures for detecting a set of DE genes from the matched-pair sample of size 36: the paired t test with Dudoit et al.'s maxT procedure; Tusher et al.'s SAM procedure; Lönnstedt and Speed's empirical Bayes procedure; and Hotelling's T2 statistic. We employ diagonal quadratic discriminant analysis for the classification of the test set. We modify standard methods for the data at hand and propose a t-based statistic, say t3, which combines the three data types for the detection of DE genes. We also extend Pepe et al.'s ROC approach to ranking genes for biomarker development to our mixed data type (Pepe et al., 2003, Biometrics). We note that only a few genes are required to achieve 0% test error in discriminating the normal tissue from the colorectal cancer. For the subtype analyses, various approaches failed to identify DE genes with respect to colon cancer versus rectum cancer and stage B versus stage C. We employed a regression approach to detect a few genes that correlated well with CEA.
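Of the four DE-detection procedures listed above, the paired t test combined with Dudoit et al.'s maxT adjustment is the easiest to sketch. The toy example below uses fabricated data and a plain sign-flipping permutation scheme; it is not the authors' code and omits the refinements of the SAM and empirical Bayes procedures.

    import numpy as np

    def paired_t(diff):
        """Per-gene paired t-statistics from a (pairs x genes) matrix of tumor - normal differences."""
        n = diff.shape[0]
        return diff.mean(0) / (diff.std(0, ddof=1) / np.sqrt(n))

    rng = np.random.default_rng(2)
    n_pairs, n_genes = 36, 1000
    diff = rng.normal(size=(n_pairs, n_genes))
    diff[:, :20] += 1.0                      # 20 genes given a true tumor/normal shift

    t_obs = np.abs(paired_t(diff))

    # max-T step: flip the sign of each pair's difference at random (tumor/normal label swap)
    B = 500
    max_t = np.empty(B)
    for b in range(B):
        signs = rng.choice([-1.0, 1.0], size=(n_pairs, 1))
        max_t[b] = np.abs(paired_t(signs * diff)).max()

    adj_p = np.array([(max_t >= t).mean() for t in t_obs])   # family-wise adjusted p-values
    print((adj_p < 0.05).sum(), "genes significant after max-T adjustment")

The adjusted p-value for a gene is the proportion of label-swapped data sets whose largest absolute t-statistic exceeds that gene's observed statistic, which controls the family-wise error rate while respecting the correlation among genes.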
Fall 2003 Biostatistics Brown Bag Seminar Abstracts

Biomarker Measurement Error: A Bayesian Approach with Application to Lung Cancer
Molecular biologists have identified specific cellular changes, called biomarkers, which enable them to better understand the pathway from chemical exposure to the initiation of some cancers. In lung cancer, one such biomarker is the number of DNA adducts in lung tissue. Adducts are formed from the binding of cigarette carcinogens to DNA, and this adduct formation plays a central role in lung cancer initiation from smoking. The goal of this work is to incorporate knowledge of such underlying biological mechanisms into a useful statistical framework to improve cancer risk estimates. The model considers adducts in the blood to be a surrogate measure of lung adducts. Lung adducts can never be measured in controls. The model is developed on a subset of the data, a small portion of which has biomarker measurements, and is used to predict cancer risk for the remaining data which do not have biomarker measurements. These predictions are compared to those from a traditional model, and to observed case/control status. Although the biomarker model compares favorably with the traditional approach, model diagnostics suggest that better predictions could be made from an expanded model which allows for measurement error in lung adducts.

Functional Response Models and their Applications
I will discuss a new class of semi-parametric (distribution-free) regression models with functional responses. This class of functional response models (FRM) generalizes traditional regression models by defining the response variable as a function of several responses from multiple subjects. By using such multiple-subjects-based responses, the FRM integrates many popular non- and semi-parametric approaches within a unified modeling framework. For example, under the proposed framework, we can derive regression models to perform inferences for two-way contingency tables and to estimate variance components by identifying them as model parameters. The FRM also provides a theoretical platform for developing new models that address limitations of existing non- and semi-parametric models. For example, we can develop FRMs to generalize ANOVA so that we can compare not only the means but also the variances of multiple groups, and to derive and extend the Mann-Whitney-Wilcoxon (MWW) rank-based tests to more than two groups (the classical two-sample MWW statistic is written out as a U-statistic in a short note below). For inference, we discuss a novel approach that integrates U-statistic theory with generalized estimating equations. The talk is illustrated with examples from biomedical and psychosocial research.

Biomedical Modeling, Prediction and Simulation
In this brown-bag seminar, I am going to give a brief introduction to several on-going projects in my research group. In summary, we are trying to combine the models and techniques from biomathematics, engineering, computer science and statistics to solve important biomedical problems. The multi-disciplinary feature of these projects will be further enhanced in the next several years. Currently the research faculty and postdoctoral fellows who are involved in these projects include Drs. Yangxin Huang, Jianwei Chen, Haihong Zhu and Dacheng Liu, as well as other external collaborators.
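As background for the Functional Response Models abstract above (a standard textbook fact rather than material specific to the talk): for two independent samples $X_1,\dots,X_m$ and $Y_1,\dots,Y_n$, the Mann-Whitney-Wilcoxon statistic is the two-sample U-statistic

$U = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} I(X_i < Y_j),$

an unbiased estimator of $\theta = \Pr(X < Y)$. The indicator $I(X_i < Y_j)$ is a "response" defined on a pair of subjects rather than on a single subject, which is the kind of multiple-subjects-based response the FRM framework is built around and what makes extensions beyond two groups natural.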
A Discussion on Intent-To-Treat Principle for Blood Transfusion Trials

Spring 2003 Biostatistics Brown Bag Seminar Abstracts

Statistical Analysis of Skewed Data
This talk is motivated by an example where the dependent variable has a lot of zero values and a very skewed distribution, and the interest is in finding a relationship between several covariates and this variable. We will examine briefly some of the current literature dealing with this problem. We will also discuss the interpretation of the parameters for some of the proposed models. In the end we will present the results of the data analysis of our example.

Inference on Multi-type Cell Systems Using Clonal Data and Application to Oligodendrocyte Development in Cell Culture

Fall 2002 Biostatistics Brown Bag Seminar Abstracts

Designing and Analyzing a Small Bernoulli-Trial Experiment, with Application to a Recent Cardiological Device Trial
In the recent 'WEARIT' trial, the success of a wearable defibrillator in preventing death from a heart attack in patients awaiting a heart transplant was evaluated. A trial design was called for that would meet certain requirements on error probabilities, that would make a decision -- for or against the device -- within a specified maximum number (n = 15) of heart attack incidents in a group of recruited patients, and that hopefully would terminate after many fewer incidents. We will use this setting to review single and double sampling plans, curtailed sampling, and various other sequential sampling plans that might be used for such a trial, along with the associated methodology for inference about the implicit Bernoulli parameter -- the success rate in resuscitating patients after a heart attack (a toy calculation of the error probabilities for a single-sampling plan of this size appears a few abstracts below). We present this in the context of the WEARIT trial. You may be surprised how many statistical issues arise in an inference problem associated with observation of a few Bernoulli trials!

Using Local Correlation in Kernel-Based Smoothers for Dependent Data
This is joint work with Hongwei Zhao and Sara Eapen.

Informative Prior Specification for Linear Regression Models using Parameter Decompositions
I will motivate this work by discussing a dataset for which the intended Bayesian analysis requires an informative prior, due to interactions for which the data likelihood has no direct information. I will then present a method of obtaining informative priors for a linear regression model, based on information elicited from a subject-matter expert. This method relies on a decomposition, novel in the multivariate case, of the regression coefficients, their covariance matrix, and the residual variance of the regression. The only quantities which the expert needs to specify are the population means, variances, and pairwise correlations. Finally, I will discuss how I used the information elicited from the expert to obtain a proper informative prior for this example. This is joint work with Joe Ibrahim and Susan Korrick.

Topology, DNA Topology and Some Probabilistic Models of Nucleic Acids
This is an informative talk based on already known results. The presentation was inspired by my effort to understand the book by A. D. Bates and A. Maxwell, "DNA Topology". First I will introduce a classical result about the Hopf map in order to convey my appreciation of the field of topology. Next I will give an example of a situation where topology can help to understand DNA recombination. At the end I will briefly introduce a probabilistic model that is used to distinguish whether two nucleic acid or protein sequences are related.
This part is based on the book by Durbin, Eddy, Krogh and Mitchison, "Biological Sequence Analysis".
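As a small illustration of the single-sampling plans reviewed in the WEARIT abstract earlier in this group (referenced there), the error probabilities of a fixed-n plan that declares the device a success when at least k of the n = 15 resuscitation attempts succeed are simple binomial tail probabilities. The decision threshold and the two benchmark success rates below are hypothetical, chosen only to show the calculation.

    from scipy.stats import binom

    n, k = 15, 12            # hypothetical rule: declare success if >= 12 of 15 resuscitations succeed
    p0, p1 = 0.60, 0.90      # hypothetical 'unacceptable' and 'acceptable' success rates

    alpha = binom.sf(k - 1, n, p0)     # P(declare success | true rate p0): type I error
    beta = binom.cdf(k - 1, n, p1)     # P(declare failure | true rate p1): type II error
    print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
    # Curtailed sampling stops as soon as the outcome is determined (here, once 12
    # successes or 4 failures have occurred); it reduces the expected number of
    # incidents but leaves these error probabilities unchanged.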
I will discuss the paper Doksum, K., and Samarov, A. (1995), "Nonparametric estimation of global functionals and a measure of the explanatory power of covariates in regression," Annals of Statistics, 23, 1443-1473, and propose new ideas for a nonparametric coefficient of determination.

An Overview of Multiple Imputation

Spring 2002 Biostatistics Brown Bag Seminar Abstracts

MADIT-II: A Recently Completed Sequential Clinical Trial
The Multicenter Automatic Defibrillator Implantation Trial #2, administered here at the UR Medical Center with 1232 heart-disease patients enrolled through 76 hospital centers, came to a favorable conclusion in November by reaching a pre-specified sequential stopping criterion for efficacy. The statistical work was, and continues to be, carried out here, including statistical design of the study, weekly analyses of the survivorship data, chairing of the Monitoring Committee, and final analyses of efficacy and side-effects data, with cost analyses still to come. This talk will give an overview, focusing on the statistical aspects of designing, monitoring and analyzing such trial data.

Combining Stratified and Unstratified Log-Rank Tests for Correlated Survival Data
The log-rank test is the most widely used nonparametric method for testing treatment differences in survival-analysis-based clinical trials due to its efficiency under proportional hazards. Most previous work on the log-rank test has assumed that the samples from the two treatment groups are independent. However, in multicenter clinical trials, survival times of patients in the same medical center may be correlated due to factors specific to each center; or studies may utilize pairing of patients or response units, resulting in dependence. For such data we can construct stratified and unstratified log-rank tests (call them SLRT and ULRT, respectively). These two tests address somewhat different features of the data. An appropriate linear combination of these two tests may give a more powerful test than either individual test. Under a matched-pair frailty model, we obtain closed-form asymptotic local alternative distributions and the correlation coefficient of the SLRT and ULRT. Based on these results we construct an optimal linear combination of the two test statistics. Simulation studies with the Hougaard model confirm our construction. Our approach is illustrated with data from the Diabetic Retinopathy Study (Huster et al., 1989). We extend our work to the cases of stratum size > 2 and of variable (but upper-bounded) stratum sizes.

Non-Sexual Household Transmission of HCV Infection
Objective: This study was designed to determine the prevalence and the incidence of HCV infection among non-sexual household contacts of HCV-infected women and to describe the association between HCV infection and potential household risk factors, in order to examine whether non-sexual household contact is a route of HCV transmission.
Methods: A baseline prevalence survey included 409 non-sexual household contacts of 241 HCV-infected index women in the Houston area from 1994 to 1997. A total of 470 non-sexual household contacts with no evidence of HCV infection at the baseline investigation were re-assessed approximately three years after baseline enrollment. Information on potential risk factors was collected through face-to-face interviews, and blood samples were tested for anti-HCV with ELISA-2 and Matrix/RIBA-2.
The relationships between HCV infection and potential risk factors were examined by using univariate and multivariate logistic regression analyses.
Results: The overall prevalence of anti-HCV positivity among 409 non-sexual household contacts was 4.4%. The highest prevalence of anti-HCV was found in parents (19.5%), followed by siblings (8.1%) and other relatives (5.6%); the children had the lowest prevalence of anti-HCV (1.2%). The univariate analysis showed that IDU, blood transfusion, tattoos, sexual contact with injecting drug users, more than 3 sexual partners in a lifetime, history of a STD, incarceration, previous hepatitis, and contact with hepatitis patients were significantly associated with HCV infection; however, sharing razors, nail clippers, toothbrushes, gum, food or beds with HCV-infected women, and history of dialysis, health care job, body piercing, and homosexual activities were not. Multivariate analysis found that IDU (OR = 221.7 with 95% CI of 22.8 to 2155.7) and history of a STD (OR = 11.7 with 95% CI of 1.2 to 113.1) were the only variables significantly associated with HCV infection. No such associations remained for other risk factors. The three-year cumulative incidence of anti-HCV among 352 non-sexual household contacts of HCV-infected women was zero.
Conclusion: This study has provided no evidence that non-sexual household contact is a likely route of transmission for HCV infection. The risk of sharing razors, nail clippers, toothbrushes, gum, food and/or beds with HCV-infected women is not evident and has not been shown to be the likely mode for HCV spread among family members. This study does suggest that IDU is the likely route of transmission for most HCV infection. Association also has been shown independently with a history of STD. The prevalence of anti-HCV among non-sexual household contacts was low. Exposure to common parenteral risk factors and sexual transmission between sexual partners may account for HCV spread among household members of HCV-infected persons.

Parameter Estimation in Bivariate Copula Models
Many models have been proposed for multivariate failure-time data $(T_{1}, T_{2})$ arising in reliability and other applications. A bivariate survivor function $S(t_{1}, t_{2})$ is said to be generated by an Archimedean copula if it can be expressed in the form $S(t_{1}, t_{2}) = p[q\{S_{1}(t_{1})\} + q\{S_{2}(t_{2})\}]$ for some convex, decreasing function $q$ defined on $(0,1]$. Here $p$ is the inverse function of $q$. Usually, $p$ is specified as some function of an unknown parameter. Given a sample from $S(t_{1}, t_{2})$, the distribution function of $V = S(T_{1}, T_{2})$, called the Kendall distribution, can be expressed simply in terms of $q$. We use the score function from the log-likelihood of the $V$'s to estimate the unknown parameter. Although the $V$'s are unknown, they can be estimated empirically. Interestingly, our estimates based on the empirical $V$'s are much more precise than the estimates based on the true and unknown $V$'s. We also investigate an alternative procedure based on iteratively estimating the $V$'s using the assumed copula structure. We discuss the asymptotic theory for both methods and present some illustrative examples. I will also cover the recent development of a new method to estimate the parameter for bivariate data subject to right random censoring briefly.

Microarray Analyses

Bayesian Inference of Phylogeny

A Few Remarks on Partial Correlation

A Generalization of ROC Curves

Fall 2001 Biostatistics Brown Bag Seminar Abstracts
Use of Placebo-Controls vs. Active-Controls in Clinical Trials Evaluating New Treatments

Using Measurement Error Models without and with Interactions to Assess Effects of Prenatal and Postnatal Methylmercury Exposure in the Seychelles Child Development Study at Age 66 Months

Overrunning in Sequential Clinical Trials
Most large-scale clinical trials these days have sequential stopping rules that permit early termination of the trial when clear superiority of a treatment is firmly established early in the trial. Once a stopping boundary has been reached, statistical methods allow computation of p-values and estimates of treatment effects which recognize the sequential stopping rule. Typically, however, additional 'lagged' data become available after the boundary has been reached. Earlier methods of accommodating such 'overrunning' have serious defects. Two new methods (one joint with Aiyi Liu, the other joint with Keyue Ding) will be described and illustrated with data from the MADIT trial of an implanted defibrillator (New England Journal of Medicine, 335:1933-40, 1996).

A Second Look at Some Statistical Ideas Via Geometric Projection
Geometric concepts have always been useful in statistics. Consider, for example, the number of situations in which the idea of orthogonal projection plays a crucial role. We will discuss a closely related geometric operation, to be called orthogonal cross projection, and point out some of its manifestations in statistics (e.g., covariance). PowerPoint technology will be used in the presentation, provided that all the equipment is functional and not too sophisticated for the presenter to operate.

Two-Period Designs: Part II

On Two Consistent Tests of Bivariate Independence and Some Applications
The use of the correlation coefficient for testing bivariate independence, although most common, has serious limitations. In this talk I will discuss Hoeffding's (1948) test of bivariate independence, and its asymptotic equivalent due to Blum, Kiefer and Rosenblatt (1961), which are well known to be consistent against all dependence alternatives. Specifically, I will describe the status of its null distribution and compare its power using a variety of copulas, including those due to Morgenstern, Gumbel, Plackett, Marshall and Olkin, Raftery, Clayton, and Frank. I will also show how the test of bivariate independence can be used for constructing simple goodness-of-fit tests.

Smoothing Longitudinal Data: A Work in Progress
We consider the general problem of smoothing longitudinal data to estimate the nonparametric marginal mean function, where a random but bounded number of measurements is available for each independent subject. In stark contrast to recent work in this area, we show not only that consistent estimators can use the correlation structure of the data, but that ignoring this correlation structure necessarily results in inefficiency, just as in the parametric setting. The class of local polynomial kernel-based estimating equations considered by Lin & Carroll (JASA 2000) is shown to be too small, so that it cannot properly make use of the correlation structure; this explains the problem with their general message that it is best to assume working independence, while also providing insight into why penalized likelihood-based correlated smoothing splines can be expected to be efficient.
We propose a class of simple, explicit ad hoc estimators which, although not efficient, can improve upon the working independence local polynomial modeling approach by making use of the local correlation structure to dramatically improve the precision, even for moderate sample sizes.

Spring 2001 Biostatistics Brown Bag Seminar Abstracts

30th Anniversary of the Biplot
A Review of Nonparametric Survival Estimation with Bivariate Right-Censored Data
The problem of nonparametric estimation of the survival function with censored data has an elegant and efficient solution in the one-dimensional case: the Kaplan-Meier estimator. In higher dimensions, with multiple, possibly correlated, survival times, however, the task is much more formidable. Several authors have proposed ad hoc estimators in this model, and in 1996 van der Laan proposed a theoretically efficient estimator, while also analyzing inefficient estimators previously proposed by Dabrowska, Prentice and Cai, and Pruitt. I will review these estimators and explain why the NPMLE is not, in general, consistent for the bivariate survival function. Unlike in the one-dimensional case, some sort of smoothing is required for efficient estimation. Bandwidth selection remains an open problem in this context, thus contributing to the slow uptake of van der Laan's estimator.

Confidence Intervals: Equal-Tail, Shortest or Unbiased?
Various criteria for choosing confidence intervals have been considered in the literature. We focus on the three named in the title. When based on a pivot with a symmetric distribution, the three coincide, but in 'small-sample' applications this covers little more than confidence intervals for normal population means, contrasts among such means, and rank procedures about a center of symmetry. Of course, from a large-sample perspective, a maximum likelihood estimate minus the parameter, standardized by a standard error estimate, is such a pivot, and this covers many applications. We review the pros and cons of the three competitors, largely in the context of confidence intervals for the variance when sampling from a normal population, and similarly for variance ratios in the analysis of variance (a small numerical comparison of the equal-tail and shortest intervals for a normal variance appears after the ozone abstract below). However, our motivation is for dealing with confidence intervals for the hazard ratio after a sequential clinical trial: what kind of interval should be preferred? Your opinions will be invited....

Analysis of Chicago Ozone Data 1981-1991
Ozone concentrations are affected by precursor emissions and by meteorological conditions. It is of interest to analyze trends in ozone after adjusting for meteorological influences. We will discuss four approaches to analyzing the Chicago ozone data for 1981-91.
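Here is the small numerical comparison referred to in the confidence-interval abstract above. The sample size, sample variance, and confidence level are made-up inputs; the calculation contrasts the equal-tail interval for a normal variance with the shortest interval obtainable from the same chi-square pivot.

    import numpy as np
    from scipy.stats import chi2

    n, s2, alpha = 20, 4.0, 0.05      # hypothetical sample size, sample variance, level
    df = n - 1

    # Equal-tail interval: alpha/2 probability in each tail of the chi-square pivot
    eq = ((df * s2) / chi2.ppf(1 - alpha / 2, df), (df * s2) / chi2.ppf(alpha / 2, df))

    # Shortest interval: split the tail probability t vs. alpha - t to minimize length
    t_grid = np.linspace(1e-4, alpha - 1e-4, 2000)
    lo_q = chi2.ppf(t_grid, df)                 # lower chi-square cutoffs
    hi_q = chi2.ppf(1 - alpha + t_grid, df)     # matching upper cutoffs, same coverage
    lengths = df * s2 * (1 / lo_q - 1 / hi_q)
    best = np.argmin(lengths)
    short = ((df * s2) / hi_q[best], (df * s2) / lo_q[best])

    print("equal-tail:", np.round(eq, 3), "length", round(eq[1] - eq[0], 3))
    print("shortest:  ", np.round(short, 3), "length", round(short[1] - short[0], 3))

Because the chi-square distribution is skewed, the equal-tail and shortest intervals differ noticeably for small degrees of freedom, which is one reason the choice among the criteria in the title is not merely cosmetic.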
A Test for Equality of Ordered Inverse Gaussian Means
The inverse Gaussian (IG) distribution, called the fraternal twin of the Gaussian distribution, has been widely used in applied fields due to the facts that it is ideally suited for modeling positively skewed data and that its inference theory is well known to be analogous to that of the Gaussian distribution in numerous ways. For example, Weiss (1982, 1983, 1984) demonstrated that the distribution of circulation times of drug molecules through the body can be approximated by the IG distribution. We propose a test procedure to assess trends in the IG response variable (e.g., in animal toxicity studies). This approach, based on combining independent tests using classical methods, can be easily extended to a spectrum of order constraints. It is also shown that this procedure is intriguingly analogous to that for the Gaussian distribution. The power properties are examined by simulation.

Correlation Between Variables When Each Is Subject to Sets of Exchangeable Measurements: An Approach Based on Group Invariance
An analytical procedure is developed for a type of data structure suitable for modelling the situation in which multiple measurements are made on each of a set of variables, and the measurements can be divided into exchangeable subsets. The procedure is based on the pattern in the covariance matrix corresponding to the group invariance inherent in the data structure, from which a closed-form expression of the Gaussian likelihood can be found. Sufficient statistics in the form of sums of squares and cross products and their distributions are obtained, leading to methods of statistical inference for a variety of practical purposes, from correction for attenuation to estimation of reliability coefficients. The closed-form expression of the likelihood function is also helpful for implementing likelihood-based computation, such as the EM algorithm for handling missing data, and for Bayesian inference. The latter can be a very effective tool in dealing with some inferential problems that do not have standard solutions in the traditional framework. Examples include guaranteeing the nonnegative definiteness of an estimated disattenuated correlation matrix and combining information on association parameters from a main study and a reliability, reproducibility, or repeatability study. No originality is claimed and nothing presented will be beyond what is intuitively obvious and/or what has already been in the literature, although the procedure is readily adaptable for variations on the basic structure. The main objective is to illustrate the application of group invariance in modelling and analysis, which is the topic of almost all my previous lunch presentations. The current presentation, however, involves a data structure that has not been discussed in the previous presentations.

On Kendall's Process and an Associated Estimation Procedure
If X is a continuous univariate random variable with distribution function F(x), then it is well known that F(X) is uniformly distributed on the unit interval (the probability integral transform). This talk will explore the use of a bivariate analog of the probability integral transform in estimating the parameters governing the dependence structure in a bivariate distribution. We will present and explain some simulation results that at first sight seemed somewhat surprising.
(This is joint work with Antai Wang and will form the basis for his upcoming qualifying paper.)

Bootstrap Variations: Random Weighting

A Review of Treatment Allocation Methods in Clinical Trials

Randomized-Withdrawal and Randomized-Start Designs
Randomized-withdrawal and randomized-start designs have recently been introduced in the neurological clinical trials literature as designs which facilitate detection of long-term ('neuroprotective') effects as distinguished from short-term ('symptomatic') effects of a treatment relative to a placebo. Models and analyses for such designs will be described, along with various advantages and limitations. Factorial versions will also be considered.

Fall 2000 Biostatistics Brown Bag Seminar Abstracts

A Roughness-Penalty View of Kernel Smoothing
It has been shown that a smoothing spline estimate is an equivalent kernel estimate. In this paper, we show that both the Nadaraya-Watson and local linear kernel estimators are equivalent penalized estimators.

Algebraic Rationales for Some Statistical Procedures: Possibilities for Unification and Generalization
Many common procedures in statistics have algebraic interpretations. We will discuss a series of examples beginning with the most basic ones. It will be shown how algebraic rules extracted from simple cases can be applied to tackle some non-trivial problems. Possibilities for a general framework will also be discussed.

A Simulation Study of Frailty Effects in Censored Bivariate Survival Data
Multivariate censored survival data typically have correlated failure times. The correlation can be a consequence of the observational design, for example with clustered sampling and matching, or it can be a focus of interest as in genetic studies, longitudinal studies of recurrent events and other studies involving multiple measurements. The correlation between failure times can be accounted for by fixed or random effects. A simulation study was designed to compare the performance of the mixture likelihood approach to estimating the model with these frailty effects in censored bivariate survival data. It is found that the mixture method is surprisingly robust to misspecification of the frailty distribution.

Profile Likelihood and the EM-algorithm

A Review of the Case-Crossover Design & Applications
The case-crossover design -- a case-control study in which the subject serves as his own control -- was formally introduced by the epidemiologist Malcolm Maclure in 1991. He described it as 'a method for studying transient effects on the risk of acute events'. The design will be described and discussed in the context of several published applications (including participation by Robert Tibshirani), evaluating the questions: Are MIs more likely following (i) sexual activity? (ii) coffee drinking? (iii) episodes of anger? Are auto accidents more likely while using a cell phone?

Exploring Multivariate Data with Density Trees
Classification trees are widely used as rules for assigning observations to classes based on their attributes or features. A classification tree is equivalent to a partition of the feature space into rectangular regions, with a constant estimate of the class probabilities in each region. Density trees are proposed as a variation on this idea, designed to examine the multivariate distribution of the features themselves.
A tree-structured approach is used to partition the feature space into low- and high-density regions; that is, regions with especially low or especially high numbers of observations relative to an arbitrary reference distribution. This results in a nonparametric, piecewise-constant estimate of the joint distribution of the features. Because the regions are defined by simple inequalities on individual features, density trees can provide a direct and interpretable description of multivariate structure. In addition, they may be useful for identifying regions where prediction models derived from the data are poorly supported by observations.

Nonparametric Regression for Longitudinal Data
My talk is motivated by an applied example where it is desirable to fit a nonparametric regression model to data that were obtained longitudinally. Even though the theory of nonparametric regression for independent data is well developed, there are still questions that need to be answered before applying nonparametric methods to longitudinal data. Simulations are conducted to compare some currently available methods as well as some new ones. These methods are also applied to a real example.

Generalized Nonlinear Regression
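To make the smoothers discussed in the nonparametric-regression abstracts above concrete, here is a minimal local linear kernel smoother applied to pooled longitudinal data under working independence. The simulated data, Gaussian kernel, and fixed bandwidth are illustrative choices, not the specific methods compared in the talks.

    import numpy as np

    def local_linear(x, y, x0, h):
        """Local linear kernel estimate of E[y | x = x0] with a Gaussian kernel and bandwidth h."""
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)           # kernel weights
        X = np.column_stack([np.ones_like(x), x - x0])   # local design: intercept + slope
        beta, *_ = np.linalg.lstsq(X * np.sqrt(w)[:, None], y * np.sqrt(w), rcond=None)
        return beta[0]                                    # intercept = fitted value at x0

    rng = np.random.default_rng(3)
    n_subj, n_obs = 40, 5
    subject = np.repeat(np.arange(n_subj), n_obs)
    t = rng.uniform(0, 1, size=n_subj * n_obs)           # pooled observation times
    b = rng.normal(scale=0.4, size=n_subj)               # subject-level random intercepts
    y = np.sin(2 * np.pi * t) + b[subject] + rng.normal(scale=0.2, size=t.size)

    grid = np.linspace(0.05, 0.95, 10)
    fit = [local_linear(t, y, x0, h=0.1) for x0 in grid]
    print(np.round(fit, 2))

Under working independence the smoother simply pools all (time, response) pairs and ignores which subject they came from; the abstracts above are concerned with when and how exploiting the within-subject correlation instead can improve on this.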