There are a bunch of issues here. The primary one is that your likelihood expression is wrong: you can't take the log of each component separately and then add them, you have to add the components and then take the log. Your bounds are also funny: the mixture probability should be bounded to [0, 1] and the means to [0, Inf).
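To make the distinction concrete, here is a minimal sketch (assuming, as in your setup, exponential components parameterized by their means) of the correct and incorrect negative log-likelihoods:

nll_correct <- function(mu1, mu2, prob, X) {
    ## mix the component densities first, *then* take the log
    -sum(log(prob * dexp(X, rate = 1/mu1) +
             (1 - prob) * dexp(X, rate = 1/mu2)))
}
nll_wrong <- function(mu1, mu2, prob, X) {
    ## logging each component before adding: not a valid log-likelihood
    -sum(prob * dexp(X, rate = 1/mu1, log = TRUE) +
         (1 - prob) * dexp(X, rate = 1/mu2, log = TRUE))
}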
The other problem is that with the current simulation design (n=20, prob=0.01), you have a high probability of getting no points at all in the first mixture component: the probability of a point falling in the second component is 1-0.01=0.99, so the probability that all of the points land in the second component is 0.99^20 ≈ 82%. In this case the MLE will be degenerate (i.e., you're trying to fit a two-component mixture to a data set that essentially only has one component), and any of these solutions will give equivalent likelihoods:
- prob=0, mu2=mean of the data, mu1=anything
- prob=1, mu1=mean of the data, mu2=anything
- mu1=mu2=mean of the data, prob=anything
With all these solutions, where you end up will depend very sensitively on starting conditions and optimization algorithm.
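A quick way to convince yourself of this equivalence: with data drawn from a single exponential (hypothetical values here, mean 2), all three degenerate solutions reduce to the same single-exponential log-likelihood:

set.seed(101)
Xsim <- rexp(20, rate = 1/2)   ## hypothetical single-component data, mean 2
m <- mean(Xsim)                ## single-exponential MLE of the mean
ll <- function(mu1, mu2, prob) {
    sum(log(prob * dexp(Xsim, rate = 1/mu1) +
            (1 - prob) * dexp(Xsim, rate = 1/mu2)))
}
ll(mu1 = 17, mu2 = m,  prob = 0)     ## prob=0: mu1 is irrelevant
ll(mu1 = m,  mu2 = 17, prob = 1)     ## prob=1: mu2 is irrelevant
ll(mu1 = m,  mu2 = m,  prob = 0.3)   ## mu1=mu2: prob is irrelevant

All three calls return exactly the same value.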
For this problem I would encourage you to use the built-in dmixexp2 function from the Renext package (which correctly implements the log-likelihood as log(p*Prob(X|exp1) + (1-p)*Prob(X|exp2))) and the formula interface to mle2:
library(Renext)   ## dmixexp2: two-component exponential mixture density
library(bbmle)    ## mle2

fit_norm <- mle2(X ~ dmixexp2(rate1 = 1/mu1, rate2 = 1/mu2, prob1 = eps),
                 data = list(X = X),
                 start = list(mu1 = 1, mu2 = 2, eps = 0.4),
                 lower = c(mu1 = 0, mu2 = 0, eps = 0),
                 upper = c(mu1 = Inf, mu2 = Inf, eps = 1),
                 method = "L-BFGS-B")
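The usual accessors for mle2 fits then give you the results:

coef(fit_norm)      ## point estimates
summary(fit_norm)   ## adds standard errors (unreliable here; see the warning below)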
This gives me estimates of mu1=1.58, mu2=2.702, eps=0. In my case mean(X) equals the value of mu2, so this is the first case in the bulleted list above. You also get a warning:
some parameters are on the boundary: variance-covariance calculations based on Hessian may be unreliable
There are also a variety of more specialized algorithms for fitting mixture models (especially those based on the expectation-maximization algorithm); you can look for packages on CRAN (flexmix is one of them).
This problem is small enough that you can visualize the whole log-likelihood surface by brute force (code below): the colours represent deviations from the minimum negative log-likelihood (the colour gradient is log-scaled, so there's a small offset to avoid log(0)). Dark blue represents the parameters that fit the data best, yellow the worst.
dd <- expand.grid(mu1 = seq(0.1, 4, length = 51),
                  mu2 = seq(0.1, 4, length = 51),
                  eps = seq(0, 1, length = 9),
                  nll = NA)
## fill in the negative log-likelihood at every grid point
for (i in 1:nrow(dd)) {
    dd$nll[i] <- with(dd[i, ],
                      -sum(dmixexp2(X, rate1 = 1/mu1,
                                    rate2 = 1/mu2,
                                    prob1 = eps,
                                    log = TRUE)))
}
library(ggplot2)
ggplot(dd, aes(mu1, mu2, fill = nll - min(nll) + 1e-4)) +
    facet_wrap(~eps, labeller = label_both) +
    geom_raster() +
    scale_fill_viridis_c(trans = "log10") +
    scale_x_continuous(expand = c(0, 0)) +
    scale_y_continuous(expand = c(0, 0)) +
    theme(panel.spacing = grid::unit(0.1, "lines"))
ggsave("fit_norm.png", type = "cairo-png")