Title: | Functions for Likelihood Asymptotics |
---|---|
Description: | Functions for computing the r and r* statistics for inference on an arbitrary scalar function of model parameters, plus some code for the (modified) profile likelihood. |
Authors: | Ruggero Bellio and Donald Pierce |
Maintainer: | Ruggero Bellio <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.51 |
Built: | 2025-02-27 04:01:53 UTC |
Source: | https://github.com/cran/likelihoodAsy |
Some functions for likelihood asymptotics, based on the expository paper
Pierce, D.A. and Bellio, R. (2017). Modern likelihood-frequentist inference. International Statistical Review, 85, 519-541.
For a detailed introduction see the vignette (browseVignettes("likelihoodAsy")
), which includes an
R script that carries out the examples.
Author: Ruggero Bellio and Donald A Pierce
Maintainer: Ruggero Bellio [email protected]
Binomial data from a bioassay.
A data frame with 10 observations on the following 3 variables.
z
a numeric vector of doses.
den
a numeric vector of binomial denominators.
y
binomial responses.
Finney, D.J. (1947). Probit Analysis: A Statistical Treatment of the Sigmoid Response Curve. Cambridge University Press, London and New York.
This function evaluates the Modified Profile Likelihood (MPL) for a subset of the model parameter. The result is optionally returned with a minus sign, so the function can be used directly as input to a general-purpose optimizer.
logMPL(psival, data, mle, floglik, fscore=NULL, indpsi, datagen, R=500, seed=NULL, minus=FALSE, onestep=FALSE, jhat=NULL, trace=FALSE)
logMPL(psival, data, mle, floglik, fscore=NULL, indpsi, datagen, R=500, seed=NULL, minus=FALSE, onestep=FALSE, jhat=NULL, trace=FALSE)
psival |
A numerical vector containing the value of the parameter of interest. |
data |
The data as a list. All the elements required to compute the likelihood function
at a given parameter value should be included in this list. The required format of such list
will be determined by the user-provided function |
mle |
A numerical vector, containing the maximum likelihood estimate of the entire model parameter. |
floglik |
A function which returns the log likelihood function at a given parameter value.
In particular, for a certain parameter value contained in a numerical vector |
fscore |
An optional function which returns the score function at a given parameter value. It must return a numerical
vector of the same length of |
indpsi |
A vector of integers in the range |
datagen |
A function which simulates a data set. A call |
R |
The number of Monte Carlo replicates used for computing the modified profile likelihood. A positive integer, default
is |
seed |
Optional positive integer, the random seed for the Monte Carlo computation. Default is |
minus |
Logical. Should the modified profile likelihood be multiplied by -1? This may be useful for usage with
optimizers. Default is |
onestep |
Logical. If set to |
jhat |
A squared matrix with dimension equal to |
trace |
Logical. When set to |
The function implements the Modified Profile Likelihood employing the approximation to sample space derivatives proposed in Skovgaard (1996). The function is designed to be used with external functions, such as optimizers and evaluators over a grid of points.
A scalar value, minus the modified profile likelihood at psival
.
Severini, T.A. (2000). Likelihood Methods in Statistics. Oxford University Press.
Skovgaard, I.M. (1996) An explicit large-deviation approximation to one-parameter tests. Bernoulli, 2, 145–165.
# Approximating the conditional likelihood for logistic regression # Let us define the various functions # Log likelihood for logistic regression loglik.logit<- function(theta, data) { y <- data$y den <- data$den X <- data$X eta <- X %*% theta p <- plogis(eta) l <- sum(y * log(p) + (den - y) * log(1-p)) return(l) } # Score function grad.logit<- function(theta, data) { y <- data$y den <- data$den X <- data$X eta <- X %*% theta p <- plogis(eta) out <- t(y - p * den) %*% X return(drop(out)) } # Data generator gendat.logit<- function(theta, data) { X <- data$X eta <- X %*% theta p <- plogis(eta) out <- data out$y <- rbinom(length(data$y), size = data$den, prob = p) return(out) } # Famous crying babies data data(babies) mod.glm <- glm(formula = cbind(r1, r2) ~ day + lull - 1, family = binomial, data = babies) data.obj <- list(y = babies$r1, den = babies$r1 + babies$r2, X = model.matrix(mod.glm)) # Numerical optimization of profile and modified profile log likelihoods max.prof <- nlminb(0, logPL, data=data.obj, thetainit=coef(mod.glm), floglik=loglik.logit, fscore=grad.logit, indpsi=19, minus=TRUE, trace=FALSE) max.mpl <- nlminb(0, logMPL, data=data.obj, mle=coef(mod.glm), floglik=loglik.logit, fscore=grad.logit, datagen=gendat.logit, indpsi=19, R=50, seed=2020, minus=TRUE, trace=FALSE) c(max.prof$par, max.mpl$par) # We can plot the profile likelihood and the modified profile likelihood # R=50 suffices for the modified profile likelihood as the model is a full exp. family psi.vals <- seq(-0.3, 3.7, l=20) obj.prof <- sapply(psi.vals, logPL, data=data.obj, thetainit=coef(mod.glm), floglik=loglik.logit, fscore=grad.logit, indpsi=19, trace=FALSE) obj.mpl <- sapply(psi.vals, logMPL, data=data.obj, mle=coef(mod.glm), floglik=loglik.logit, fscore=grad.logit, datagen=gendat.logit, indpsi=19, trace=FALSE, R=50, seed=2020) par(pch="s") plot(psi.vals, obj.prof - max(obj.prof), type="l", xlab=expression(psi), ylab="log likelihood", lwd=2, las=1) lines(psi.vals, obj.mpl - max(obj.mpl), col="red", lwd=2) legend("topright", col=c(1, 2), lty=1, lwd=2, legend=c("Profile","MPL"), bty="n")
# Approximating the conditional likelihood for logistic regression # Let us define the various functions # Log likelihood for logistic regression loglik.logit<- function(theta, data) { y <- data$y den <- data$den X <- data$X eta <- X %*% theta p <- plogis(eta) l <- sum(y * log(p) + (den - y) * log(1-p)) return(l) } # Score function grad.logit<- function(theta, data) { y <- data$y den <- data$den X <- data$X eta <- X %*% theta p <- plogis(eta) out <- t(y - p * den) %*% X return(drop(out)) } # Data generator gendat.logit<- function(theta, data) { X <- data$X eta <- X %*% theta p <- plogis(eta) out <- data out$y <- rbinom(length(data$y), size = data$den, prob = p) return(out) } # Famous crying babies data data(babies) mod.glm <- glm(formula = cbind(r1, r2) ~ day + lull - 1, family = binomial, data = babies) data.obj <- list(y = babies$r1, den = babies$r1 + babies$r2, X = model.matrix(mod.glm)) # Numerical optimization of profile and modified profile log likelihoods max.prof <- nlminb(0, logPL, data=data.obj, thetainit=coef(mod.glm), floglik=loglik.logit, fscore=grad.logit, indpsi=19, minus=TRUE, trace=FALSE) max.mpl <- nlminb(0, logMPL, data=data.obj, mle=coef(mod.glm), floglik=loglik.logit, fscore=grad.logit, datagen=gendat.logit, indpsi=19, R=50, seed=2020, minus=TRUE, trace=FALSE) c(max.prof$par, max.mpl$par) # We can plot the profile likelihood and the modified profile likelihood # R=50 suffices for the modified profile likelihood as the model is a full exp. family psi.vals <- seq(-0.3, 3.7, l=20) obj.prof <- sapply(psi.vals, logPL, data=data.obj, thetainit=coef(mod.glm), floglik=loglik.logit, fscore=grad.logit, indpsi=19, trace=FALSE) obj.mpl <- sapply(psi.vals, logMPL, data=data.obj, mle=coef(mod.glm), floglik=loglik.logit, fscore=grad.logit, datagen=gendat.logit, indpsi=19, trace=FALSE, R=50, seed=2020) par(pch="s") plot(psi.vals, obj.prof - max(obj.prof), type="l", xlab=expression(psi), ylab="log likelihood", lwd=2, las=1) lines(psi.vals, obj.mpl - max(obj.mpl), col="red", lwd=2) legend("topright", col=c(1, 2), lty=1, lwd=2, legend=c("Profile","MPL"), bty="n")
This function evaluates the profile likelihood for a subset of the model parameter. The result is optionally returned with a minus sign, so the function can be used directly as input to a general-purpose optimizer.
logPL(psival, data, thetainit, floglik, fscore=NULL, indpsi, minus=FALSE, onestep=FALSE, jhat=NULL, trace=FALSE)
logPL(psival, data, thetainit, floglik, fscore=NULL, indpsi, minus=FALSE, onestep=FALSE, jhat=NULL, trace=FALSE)
psival |
A numerical vector containing the value of the parameter of interest. |
data |
The data as a list. All the elements required to compute the likelihood function
at a given parameter value should be included in this list. The required format of such list
will be determined by the user-provided function |
thetainit |
A numerical vector with the size of the entire model parameter, that will be used as starting
point in the constrained optimization performed to obtain the maximum likelihood estimate under
the null. The specific meaning of |
floglik |
A function which returns the log likelihood function at a given parameter value.
In particular, for a certain parameter value contained in a numerical vector |
fscore |
An optional function which returns the score function at a given parameter value. It must return a numerical
vector of the same length as |
indpsi |
A vector of integers in the range |
minus |
Logical. Should the profile likelihood be multiplied by -1? This may be useful for usage with optimizers.
Default is |
onestep |
Logical. If set to |
jhat |
A squared matrix with dimension equal to |
trace |
Logical. When set to |
This function is designed to be used with external functions, such as optimizers and evaluators over a grid of points.
A scalar value, minus the profile likelihood at psival
.
Severini, T.A. (2000). Likelihood Methods in Statistics. Oxford University Press.
# A negative binomial example, taken from Venables and Ripley (2002, MASS4 book) library(MASS) # The quine data are analysed in Section 7.4 data(quine) # We fit a model with just the main effects quine.nb1 <- glm.nb(Days ~ Eth + Sex + Age + Lrn, data = quine) # The data list includes the design matrix and the response vector quinedata<-list(X=model.matrix(quine.nb1), y=quine$Days) # Let us define the various functions # Log likelihood, log link logLikNbin <- function(theta,data) { y <- data$y X <- data$X eta <- X %*% theta[1:ncol(X)] mu <- exp(eta) alpha <- theta[ncol(X)+1] l <- sum(lgamma(y + alpha) + y * log(mu) - (alpha + y) * log(alpha + mu) - lgamma(alpha) + alpha * log(alpha)) return(l) } # Score function gradLikNbin <- function(theta,data) { y <- data$y X <- data$X eta <- X %*% theta[1:ncol(X)] mu <- exp(eta) alpha <- theta[ncol(X)+1] g <-rep(0,ncol(X)+1) g[1:ncol(X)] <- t(y - (alpha+y)*mu / (alpha+mu)) %*% X g[ncol(X)+1] <- sum(digamma( y + alpha) - log(alpha + mu) - (alpha + y) / (alpha + mu) - digamma(alpha) + 1 + log(alpha)) return(g) } # Data generator genDataNbin<- function(theta,data) { out <- data X <- data$X eta<- X %*% theta[1:ncol(X)] mu <- exp(eta) out$y <- rnegbin(length(data$y), mu=mu, theta=theta[ncol(X)+1]) return(out) } # First we refine the maximum likelihood estimates mleFull <- optim( c(coef(quine.nb1),quine.nb1$theta), logLikNbin, gr=gradLikNbin, method="BFGS", data=quinedata, control=list(fnscale=-1), hessian=TRUE) # Then we can plot the profile likelihood list.psi <- seq(0.90, 1.70, l=30) list.prof <- sapply(list.psi, logPL, data=quinedata, thetainit=mleFull$par, floglik=logLikNbin, fscore=gradLikNbin, indpsi=8, trace=FALSE) plot(list.psi, list.prof-max(list.prof), type="l", xlab=expression(psi), ylab="Log likelihood")
# A negative binomial example, taken from Venables and Ripley (2002, MASS4 book) library(MASS) # The quine data are analysed in Section 7.4 data(quine) # We fit a model with just the main effects quine.nb1 <- glm.nb(Days ~ Eth + Sex + Age + Lrn, data = quine) # The data list includes the design matrix and the response vector quinedata<-list(X=model.matrix(quine.nb1), y=quine$Days) # Let us define the various functions # Log likelihood, log link logLikNbin <- function(theta,data) { y <- data$y X <- data$X eta <- X %*% theta[1:ncol(X)] mu <- exp(eta) alpha <- theta[ncol(X)+1] l <- sum(lgamma(y + alpha) + y * log(mu) - (alpha + y) * log(alpha + mu) - lgamma(alpha) + alpha * log(alpha)) return(l) } # Score function gradLikNbin <- function(theta,data) { y <- data$y X <- data$X eta <- X %*% theta[1:ncol(X)] mu <- exp(eta) alpha <- theta[ncol(X)+1] g <-rep(0,ncol(X)+1) g[1:ncol(X)] <- t(y - (alpha+y)*mu / (alpha+mu)) %*% X g[ncol(X)+1] <- sum(digamma( y + alpha) - log(alpha + mu) - (alpha + y) / (alpha + mu) - digamma(alpha) + 1 + log(alpha)) return(g) } # Data generator genDataNbin<- function(theta,data) { out <- data X <- data$X eta<- X %*% theta[1:ncol(X)] mu <- exp(eta) out$y <- rnegbin(length(data$y), mu=mu, theta=theta[ncol(X)+1]) return(out) } # First we refine the maximum likelihood estimates mleFull <- optim( c(coef(quine.nb1),quine.nb1$theta), logLikNbin, gr=gradLikNbin, method="BFGS", data=quinedata, control=list(fnscale=-1), hessian=TRUE) # Then we can plot the profile likelihood list.psi <- seq(0.90, 1.70, l=30) list.prof <- sapply(list.psi, logPL, data=quinedata, thetainit=mleFull$par, floglik=logLikNbin, fscore=gradLikNbin, indpsi=8, trace=FALSE) plot(list.psi, list.prof-max(list.prof), type="l", xlab=expression(psi), ylab="Log likelihood")
This function evaluates the r* statistic for testing of a scalar function of interest.
rstar(data, thetainit, floglik, fscore=NULL, fpsi, psival, datagen, R=1000, seed=NULL, trace=TRUE, ronly=FALSE, psidesc=NULL, constr.opt="solnp")
rstar(data, thetainit, floglik, fscore=NULL, fpsi, psival, datagen, R=1000, seed=NULL, trace=TRUE, ronly=FALSE, psidesc=NULL, constr.opt="solnp")
data |
The data as a list. All the elements required to compute the log likelihood function at a given parameter value should be included in this list. |
thetainit |
A numerical vector containing the initial value for the parameter of the model. It will be used as starting point in the numerical optimization of the log likelihood function. |
floglik |
A function which returns the log likelihood function at a given parameter value.
In particular, for a certain parameter value contained in a numerical vector |
fscore |
An optional function which returns the score function at a given parameter value. It must return a numerical
vector of the same length of |
fpsi |
A function which specifies the parameter of interest. A call |
psival |
A numerical scalar value containing the value of the parameter of interest under testing. |
datagen |
A function which simulates a data set. A call |
R |
The number of Monte Carlo replicates used for computing the r* statistic. A positive integer, default is |
seed |
Optional positive integer, the random seed for the Monte Carlo computation. Default is |
trace |
Logical. When set to |
ronly |
Logical. If set to |
psidesc |
An optional character string describing the nature of the parameter of interest. Default is |
constr.opt |
Constrained optimizer used for maximizing the log likelihood function under the null hypothesis. Possible
values are |
The function computes the r* statistic proposed by Skovgaard (1996) for accurate computation of the asymptotic distribution of the signed likelihood ratio test for a scalar function of interest.
The function requires the user to provide three functions defining the log likelihood function, the scalar parametric function of interest, and a function for generating a data set from the assumed statistical model. A further function returning the gradient of the log likelihood is not required, but if provided it will speed up the computation.
When ronly = TRUE
the function returns the value of the signed likelihood ratio test
statistic r only.
The function handles also one-parameter models.
The returned value is an object of class "rstar"
, containing the following components:
r |
The observed value the signed likelihodo ratio test statistic r for testing |
NP , INF
|
The Nuisance Parameter adjustment (NP) and the Information adjustment (INF) from the decomposition of the r*-r adjustment. The former is not computed for one-parameter models. Neither one is computed when |
rs |
The observed value of the r* statistic. Not computed when |
theta.hat |
The maximum likelihood estimate of the parameter theta, the argument of the |
info.hat |
The observed information matrix evaluated at |
se.theta.hat |
The estimated standard error of |
psi.hat |
The parameter of interest evaluated at |
se.psi.hat |
The estimated standard error for the parameter of interest. Not computed when |
theta.hyp |
The constrained estimate of the parameter, under the null hypothesis |
psi.hyp |
The value under testing, equal to |
seed |
Random seed used for Monte Carlo trials. Not returned when |
psidesc |
A character string describing the nature of the parameter of interest. |
R |
Number of Monte Carlo replicates used for computing the r* statistic. Not returned when |
There are print
and summary
methods for this class.
The method implemented in this function was proposed in
Skovgaard, I.M. (1996) An explicit large-deviation approximation to one-parameter tests. Bernoulli, 2, 145–165.
For a general review
Severini, T.A. (2000). Likelihood Methods in Statistics. Oxford University Press.
# Autoregressive model of order 1 # We use the lh data from MASS library(MASS) data(lh) dat.y <- list(y=as.numeric(lh)) # First let us define the function returning the log likelihood function # We employ careful parameterizations for the correlation and variance to # avoid numerical problems likAR1 <- function(theta, data) { y <- data$y mu <- theta[1] phi <- theta[2] ### phi is log(sigma) sigma2 <- exp(phi*2) z <- theta[3] ### z is Fisher'z transform for rho rho <- (exp(2*z)-1) / (1 + exp(2*z)) n <- length(y) Gamma1 <- diag(1+c(0,rep(rho^2,n-2),0)) for(i in 2:n) Gamma1[i,i-1]<- Gamma1[i-1,i] <- -rho lik <- -n/2 * log(sigma2) + 0.5 * log(1-rho^2) -1/(2*sigma2) * mahalanobis(y, rep(mu,n), Gamma1, inverted = TRUE) return(lik) } # We need a function for simulating a data set genDataAR1 <- function(theta, data) { out <- data mu <- theta[1] sigma <- exp(theta[2]) z <- theta[3] rho <- (exp(2*z)-1) / (1 + exp(2*z)) n <- length(data$y) y <- rep(0,n) y[1] <- rnorm(1,mu,s=sigma*sqrt(1/(1-rho^2))) for(i in 2:n) y[i] <- mu + rho * (y[i-1]-mu) + rnorm(1) * sigma out$y <- y return(out) } # For inference on the mean parameter we need a function returning the first component of theta psifcn.mu <- function(theta) theta[1] # Now we can call the function rs.mu <- rstar(dat.y, c(0,0,0), likAR1, fpsi=psifcn.mu, psival=2, datagen=genDataAR1, R=1000, trace=TRUE, psidesc="mean parameter") summary(rs.mu)
# Autoregressive model of order 1 # We use the lh data from MASS library(MASS) data(lh) dat.y <- list(y=as.numeric(lh)) # First let us define the function returning the log likelihood function # We employ careful parameterizations for the correlation and variance to # avoid numerical problems likAR1 <- function(theta, data) { y <- data$y mu <- theta[1] phi <- theta[2] ### phi is log(sigma) sigma2 <- exp(phi*2) z <- theta[3] ### z is Fisher'z transform for rho rho <- (exp(2*z)-1) / (1 + exp(2*z)) n <- length(y) Gamma1 <- diag(1+c(0,rep(rho^2,n-2),0)) for(i in 2:n) Gamma1[i,i-1]<- Gamma1[i-1,i] <- -rho lik <- -n/2 * log(sigma2) + 0.5 * log(1-rho^2) -1/(2*sigma2) * mahalanobis(y, rep(mu,n), Gamma1, inverted = TRUE) return(lik) } # We need a function for simulating a data set genDataAR1 <- function(theta, data) { out <- data mu <- theta[1] sigma <- exp(theta[2]) z <- theta[3] rho <- (exp(2*z)-1) / (1 + exp(2*z)) n <- length(data$y) y <- rep(0,n) y[1] <- rnorm(1,mu,s=sigma*sqrt(1/(1-rho^2))) for(i in 2:n) y[i] <- mu + rho * (y[i-1]-mu) + rnorm(1) * sigma out$y <- y return(out) } # For inference on the mean parameter we need a function returning the first component of theta psifcn.mu <- function(theta) theta[1] # Now we can call the function rs.mu <- rstar(dat.y, c(0,0,0), likAR1, fpsi=psifcn.mu, psival=2, datagen=genDataAR1, R=1000, trace=TRUE, psidesc="mean parameter") summary(rs.mu)
This function obtains confidence intervals for a scalar function of interest, based on the r* statistic.
rstar.ci(data, thetainit, floglik, fscore=NULL, fpsi, datagen, R=1000, seed=NULL, ronly=FALSE, psidesc=NULL, constr.opt="solnp", lower=NULL, upper=NULL, control=list(...), ...)
rstar.ci(data, thetainit, floglik, fscore=NULL, fpsi, datagen, R=1000, seed=NULL, ronly=FALSE, psidesc=NULL, constr.opt="solnp", lower=NULL, upper=NULL, control=list(...), ...)
data |
The data as a list. All the elements required to compute the likelihood function at a given parameter value should be included in this list. |
thetainit |
A numerical vector containing the initial value for the parameter of the model. It will be used as starting point in the numerical optimization of the likelihood function. |
floglik |
A function which returns the log likelihood function at a given parameter value.
In particular, for a certain parameter value contained in a numerical vector |
fscore |
An optional function which returns the score function at a given parameter value. It must return a
numerical vector of the same length of |
fpsi |
A function which specifies the parameter of interest. A call |
datagen |
A function which simulates a data set. A call |
R |
The number of Monte Carlo replicates used for computing the r* statistic. A positive integer, default is
|
seed |
Optional positive integer, the random seed for the Monte Carlo computation. Default is |
ronly |
Logical. If set to |
psidesc |
An optional character string describing the nature of the parameter of interest. Default is |
constr.opt |
Constrained optimizer used for maximizing the log likelihood function under the null hypothesis. Possible
values are |
lower , upper
|
Optional numeric values defining the lower/upper limit of a grid of points for the parameter of interest,
where the r* statistic will be evaluated. Default is |
control |
A list of parameters for controlling the computation of confidence intervals. See rstar.ci.control. |
... |
Arguments to be used to form the default |
The function obtains 90%, 95% and 99% two-sided confidence intervals for the scalar function of interest based on the r* statistic.
The function requires the user to provide three functions defining the log likelihood function, the scalar parametric function of interest, and a function for generating a data set from the assumed statistical model. A further function returning the gradient of the log likelihood is not required, but if provided it will speed up the computation.
When ronly = TRUE
the function literally returns the value of the signed likelihood ratio test
statistic r only. The function handles also one-parameter models.
The function provides two different strategies to obtain the various confidence intervals. The default
strategy, invoked by leaving either lower
or upper
to NULL
, starts from the MLE
and moves away in a stepwise fashion, until the r* statistic crosses the standard normal quantiles
corresponding to the 99% two-sided confidence interval. It is crucial to start the search a bit away
from the MLE, where the r* is singular, and this is regulated by the away
argument of the
rstar.ci.control function. The first strategy may fail to cross the target normal quantiles when
the profile likelihood has an upper asymptote. For such cases, and for any other instances when the
output of the default strategy is deemed not satisfactory, it is possible to specify the
range of a grid of values where the r* statistic will be evaluated. The lower
and upper
argument specify the lower and upper limit of such grid, whereas the number of points is controlled by the
npoints
of the rstar.ci.control function.
The returned value is an object of class "rstarci"
, containing the following components:
psivals |
A list of values for the parameter of interest for which the r and r* statistics have been evaluated. |
rvals |
A numerical list containing the values of the r statistic evaluated at each element of |
NPvals , INFvals
|
Numerical lists containing the values of the Nuisance Parameter adjustment (NP) and the Information
adjustment (INF) from the decomposition of the r*-r adjustment, for each of the |
rsvals |
The observed value of the r* statistic at each element of |
CIr |
A 3 x 2 matrix containing the 90%, 95% and 99% confidence intervals for the parameter of interest (first, second and third row respectively) based on the first-order r statistic. |
CIrs |
A 3 x 2 matrix containing the 90%, 95% and 99% confidence intervals for the parameter of interest (first,
second and third row respectively) absed on the r* statistic. Not computed when |
seed |
Random seed used for Monte Carlo replicates used for computing the r* statistic. Not returned when |
psidesc |
A character string describing the nature of the parameter of interest. |
R |
Number of Monte Carlo replicates used for computing the r* statistic. Not returned when |
There are print
, summary
and plot
methods for this class.
The method implemented in this function was proposed in
Skovgaard, I.M. (1996). An explicit large-deviation approximation to one-parameter tests. Bernoulli, 2, 145–165.
For a general review
Severini, T.A. (2000). Likelihood Methods in Statistics. Oxford University Press.
# A negative binomial example, taken from Venables and Ripley (2002, MASS4 book) library(MASS) # The quine data are analysed in Section 7.4 data(quine) # We fit a model with just the main effects quine.nb1 <- glm.nb(Days ~ Eth + Sex + Age + Lrn, data = quine) # The data list includes the design matrix and the response vector quinedata <- list(X=model.matrix(quine.nb1), y=quine$Days) # Let us define the required functions # Log likelihood, log link logLikNbin <- function(theta,data) { y <- data$y X <- data$X eta <- X %*% theta[1:ncol(X)] mu <- exp(eta) alpha <- theta[ncol(X)+1] l <- sum(lgamma(y + alpha) + y * log(mu) - (alpha + y) * log(alpha + mu) - lgamma(alpha) + alpha * log(alpha)) return(l) } # Score function gradLikNbin <- function(theta,data) { y <- data$y X <- data$X eta <- X %*% theta[1:ncol(X)] mu <- exp(eta) alpha <- theta[ncol(X)+1] g <-rep(0,ncol(X)+1) g[1:ncol(X)] <- t(y - (alpha+y)*mu / (alpha+mu)) %*% X g[ncol(X)+1] <- sum(digamma( y + alpha) - log(alpha + mu) - (alpha + y) / (alpha + mu) - digamma(alpha) + 1 + log(alpha)) return(g) } # Data generator genDataNbin <- function(theta,data) { out <- data X <- data$X eta<- X %*% theta[1:ncol(X)] mu <- exp(eta) out$y <- rnegbin(length(data$y), mu=mu, theta=theta[ncol(X)+1]) return(out) } # Confidence intervals for the coefficient of EthN ## Not run: obj <- rstar.ci(quinedata, thetainit=c(coef(quine.nb1),quine.nb1$theta), floglik=logLikNbin, datagen=genDataNbin, fscore=gradLikNbin, fpsi=function(theta) theta[2], R=1000, psidesc="Coefficient of EthN") print(obj) summary(obj) plot(obj) # Confidence intervals for the overdispersion parameter obj <- rstar.ci(quinedata, thetainit=c(coef(quine.nb1),quine.nb1$theta), floglik=logLikNbin, datagen=genDataNbin, fscore=gradLikNbin, fpsi=function(theta) theta[8], R=1000, psidesc="Overdispersion parameter") summary(obj) plot(obj) ## End(Not run)
# A negative binomial example, taken from Venables and Ripley (2002, MASS4 book) library(MASS) # The quine data are analysed in Section 7.4 data(quine) # We fit a model with just the main effects quine.nb1 <- glm.nb(Days ~ Eth + Sex + Age + Lrn, data = quine) # The data list includes the design matrix and the response vector quinedata <- list(X=model.matrix(quine.nb1), y=quine$Days) # Let us define the required functions # Log likelihood, log link logLikNbin <- function(theta,data) { y <- data$y X <- data$X eta <- X %*% theta[1:ncol(X)] mu <- exp(eta) alpha <- theta[ncol(X)+1] l <- sum(lgamma(y + alpha) + y * log(mu) - (alpha + y) * log(alpha + mu) - lgamma(alpha) + alpha * log(alpha)) return(l) } # Score function gradLikNbin <- function(theta,data) { y <- data$y X <- data$X eta <- X %*% theta[1:ncol(X)] mu <- exp(eta) alpha <- theta[ncol(X)+1] g <-rep(0,ncol(X)+1) g[1:ncol(X)] <- t(y - (alpha+y)*mu / (alpha+mu)) %*% X g[ncol(X)+1] <- sum(digamma( y + alpha) - log(alpha + mu) - (alpha + y) / (alpha + mu) - digamma(alpha) + 1 + log(alpha)) return(g) } # Data generator genDataNbin <- function(theta,data) { out <- data X <- data$X eta<- X %*% theta[1:ncol(X)] mu <- exp(eta) out$y <- rnegbin(length(data$y), mu=mu, theta=theta[ncol(X)+1]) return(out) } # Confidence intervals for the coefficient of EthN ## Not run: obj <- rstar.ci(quinedata, thetainit=c(coef(quine.nb1),quine.nb1$theta), floglik=logLikNbin, datagen=genDataNbin, fscore=gradLikNbin, fpsi=function(theta) theta[2], R=1000, psidesc="Coefficient of EthN") print(obj) summary(obj) plot(obj) # Confidence intervals for the overdispersion parameter obj <- rstar.ci(quinedata, thetainit=c(coef(quine.nb1),quine.nb1$theta), floglik=logLikNbin, datagen=genDataNbin, fscore=gradLikNbin, fpsi=function(theta) theta[8], R=1000, psidesc="Overdispersion parameter") summary(obj) plot(obj) ## End(Not run)
Auxiliary function for rstar.ci
.
rstar.ci.control(npoints=10, away=0.3, stepsizefac=3, maxstep=50, trace=TRUE)
rstar.ci.control(npoints=10, away=0.3, stepsizefac=3, maxstep=50, trace=TRUE)
npoints |
Integer giving the number of points at which the r* and r statistics will be evaluated away from the MLE
in each direction when both |
away |
Positive value indicating how far from the MLE the computation of the r* and r statistics will be started, expressed
in units of standard error of the scalar function of interest. Default is |
stepsizefac |
Positive value used to determine the stepsize of the confidence interval algorithm when either |
maxstep |
Integer giving the maximum number of steps that will be taken for crossing the normal quantiles for a 99
confidence interval for the r* statistic. Default is |
trace |
Logical indicating if output should be produced during the computation. Default is |
A list with components named as the arguments.
# A variation on example(rstar.ci): ## Not run: obj <- rstar.ci(quinedata, thetainit=c(coef(quine.nb1),quine.nb1$theta), floglik=logLikNbin, datagen=genDataNbin, fscore=gradLikNbin, fpsi=function(theta) theta[2], R=1000, psidesc="Coefficient of EthN", npoints=5, away=0.1) plot(obj) ## End(Not run)
# A variation on example(rstar.ci): ## Not run: obj <- rstar.ci(quinedata, thetainit=c(coef(quine.nb1),quine.nb1$theta), floglik=logLikNbin, datagen=genDataNbin, fscore=gradLikNbin, fpsi=function(theta) theta[2], R=1000, psidesc="Coefficient of EthN", npoints=5, away=0.1) plot(obj) ## End(Not run)