
Motivation
In statistical modeling, we need to calculate an estimator to determine the equation of our model. The problem is that the estimator itself can be difficult to calculate, especially when it involves distributions such as the Beta, Gamma, or even the Gompertz distribution.
Maximum Likelihood Estimation (MLE) is one of many methods for calculating such estimators. In this article, I will give you some examples of calculating the MLE with the Newton-Raphson method using R.
The Concept: MLE
First, we consider

X_1, X_2, \ldots, X_n

as independent and identically distributed (iid) random variables with probability density function (PDF)

f(x; \theta),

where the parameter θ is unknown. The basis of this method is the likelihood function given by

L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)

The log of this function – namely, the log-likelihood function – is denoted by

l(\theta) = \ln L(\theta) = \sum_{i=1}^{n} \ln f(x_i; \theta)

To determine the MLE, we determine the critical value of the log-likelihood function; that is, the MLE solves the equation

\frac{\partial l(\theta)}{\partial \theta} = 0
The Concept: Newton-Raphson Method
The Newton-Raphson method is an iterative procedure for calculating the roots of a function f. In this method, we approximate a root of the function by iterating

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}

where x_{n+1} is the (n+1)-th iterate. The goal of this method is to make the approximation as close as possible to the exact result (that is, a root of the function).
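To make the iteration concrete, here is a minimal sketch in R (the function name newton, the tolerance, and the test function f(x) = x^2 - 2 are my own illustrative choices, not from the original article):
#Minimal Newton-Raphson sketch for a scalar function f
newton=function(f,fprime,x0,iter=20,tol=1e-10)
{
x=x0
for(i in 1:iter)
{
x.new=x-f(x)/fprime(x) #Newton update
if(abs(x.new-x)<tol) break #stop when the step is negligible
x=x.new
}
x
}
newton(function(x) x^2-2, function(x) 2*x, x0=1) #converges to sqrt(2) = 1.414214...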
Putting it Together: Newton-Raphson Method for Calculating MLE
The Newton-Raphson method can be applied to generate a sequence that converges to the MLE. If we treat θ as a k×1 parameter vector, we can iterate

\theta_{n+1} = \theta_n - \left[ l''(\theta_n) \right]^{-1} l'(\theta_n)

where l'(θ) is the gradient vector of the log-likelihood function and l''(θ) is its Hessian matrix.
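For a single parameter (k = 1), this update reduces to the one-dimensional Newton-Raphson form applied to the score function l'(θ):

\theta_{n+1} = \theta_n - \frac{l'(\theta_n)}{l''(\theta_n)}

which is exactly the scheme we will implement for the Poisson example below.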
Implementation in R
For the implementation, suppose that we have

X_1, X_2, \ldots, X_n \sim \text{Poisson}(\mu)

and we want to estimate μ by using MLE. We know that the PDF of the Poisson distribution is

f(x; \mu) = \frac{e^{-\mu} \mu^x}{x!}, \quad x = 0, 1, 2, \ldots

The likelihood function can be written as follows.

L(\mu) = \prod_{i=1}^{n} \frac{e^{-\mu} \mu^{x_i}}{x_i!} = \frac{e^{-n\mu} \mu^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!}

From the likelihood function above, we can express the log-likelihood function as follows.

l(\mu) = -n\mu + \left( \sum_{i=1}^{n} x_i \right) \ln \mu - \sum_{i=1}^{n} \ln(x_i!)
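Before turning to the numerical method, it is worth noting that this particular log-likelihood can be maximized analytically, which gives an exact answer to compare against:

\frac{\partial l(\mu)}{\partial \mu} = -n + \frac{1}{\mu} \sum_{i=1}^{n} x_i = 0 \quad \Longrightarrow \quad \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}

This is why the code below treats the sample mean of the data as the "real" MLE value.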
In R, we can simply write the log-likelihood of a single observation by taking the logarithm of the PDF, and let D() differentiate it symbolically as follows.
#MLE Poisson
#PDF : f(x|mu) = (exp(-mu)*(mu^(x))/factorial(x))
#mu=t
loglik=expression(log((exp(-t)*(t^(x))/factorial(x))))
dbt=D(loglik,"t")
dbtt=D(dbt,"t")
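As a quick sanity check (my own addition, not part of the original workflow), we can evaluate the symbolic derivatives at a test point and compare them with the hand-derived forms for a single observation, l'(μ) = x/μ - 1 and l''(μ) = -x/μ²:
#Sanity check: evaluate the symbolic derivatives at x=3, t=2
x=3; t=2
eval(dbt) #0.5, matches x/t - 1 = 3/2 - 1
eval(dbtt) #-0.75, matches -x/t^2 = -3/4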
Then, we calculate the first and second partial derivatives of the log-likelihood function with respect to μ by running dbt=D(loglik,"t") and dbtt=D(dbt,"t"), respectively. The results are as follows.
dbt=(exp(-t) * (t^((x) - 1) * (x)) - exp(-t) * (t^(x)))/factorial(x)/(exp(-t) *
(t^(x))/factorial(x))
dbtt=(exp(-t) * (t^(((x) - 1) - 1) * ((x) - 1) * (x)) - exp(-t) *
(t^((x) - 1) * (x)) - (exp(-t) * (t^((x) - 1) * (x)) - exp(-t) *
(t^(x))))/factorial(x)/(exp(-t) * (t^(x))/factorial(x)) -
(exp(-t) * (t^((x) - 1) * (x)) - exp(-t) * (t^(x)))/factorial(x) *
((exp(-t) * (t^((x) - 1) * (x)) - exp(-t) * (t^(x)))/factorial(x))/(exp(-t) *
(t^(x))/factorial(x))^2
Then, we can start creating the Newton-Raphson function in R. First, we generate Poisson-distributed random numbers as the data used to calculate the MLE. The function needs the following parameters:
- n for the number of Poisson-distributed data points to generate,
- t for the μ value (used both to generate the data and as the starting point of the iteration), and
- iter for the number of Newton-Raphson iterations.
Since the exact MLE of the Poisson mean μ is the sample mean of the data, we compute it for comparison. We can write the first lines of code for the function as follows.
x=rpois(n,t) #generate n Poisson-distributed observations with mean t
x.mean=mean(x) #sample mean = the exact MLE of mu
par.hat=matrix(0,1,1)
estimate=numeric(iter+1) #estimate at each iteration
difference=numeric(iter+1) #gap between estimate and the exact MLE
estimate[1]=t
difference[1]=abs(t-x.mean)
Then, we create the loop that calculates the sums of the partial derivatives over all observations (this is why we only needed the logarithm of the PDF of a single observation: the log-likelihood is a sum over observations, so its derivatives are too), the gradient vector, the Hessian matrix, and the approximated MLE value as follows.
for(i in 1:iter)
{
#First partial derivative of log-likelihood function with respect to mu
dbt=(exp(-t) * (t^((x) - 1) * (x)) - exp(-t) * (t^(x)))/factorial(x)/(exp(-t) *
(t^(x))/factorial(x))
#Second partial derivative of log-likelihood function with respect to mu, then mu
dbtt=(exp(-t) * (t^(((x) - 1) - 1) * ((x) - 1) * (x)) - exp(-t) *
(t^((x) - 1) * (x)) - (exp(-t) * (t^((x) - 1) * (x)) - exp(-t) *
(t^(x))))/factorial(x)/(exp(-t) * (t^(x))/factorial(x)) -
(exp(-t) * (t^((x) - 1) * (x)) - exp(-t) * (t^(x)))/factorial(x) *
((exp(-t) * (t^((x) - 1) * (x)) - exp(-t) * (t^(x)))/factorial(x))/(exp(-t) *
(t^(x))/factorial(x))^2
sdbt=sum(dbt)
sdbtt=sum(dbtt)
#hessian matrix
h=matrix(sdbtt,1,1)
#gradient vector
g=matrix(sdbt,1,1)
#parameter
par=matrix(t,1,1)
par.hat=par-solve(h)%*%g
t=par.hat[1,]
estimate[i+1]=t
difference[i+1]=t-x.mean
}
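As a side observation (not from the original article), the Poisson update has a simple closed form. Substituting l'(μ) = Σ xᵢ/μ - n and l''(μ) = -Σ xᵢ/μ² into the Newton step gives

\mu_{n+1} = \mu_n - \frac{l'(\mu_n)}{l''(\mu_n)} = 2\mu_n - \frac{\mu_n^2}{\bar{x}} = \mu_n \left( 2 - \frac{\mu_n}{\bar{x}} \right)

which has the sample mean x̄ as its fixed point and converges to it quadratically, matching the rapid convergence visible in the output below.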
When the iteration reaches its limit, we calculate the difference between the exact and the approximated MLE at each iteration to evaluate the performance of the Newton-Raphson method. The rule is simple: the smaller the difference, the better the performance. We can write this as the last lines of code in our function as follows.
tabel=data.frame(estimate,difference)
rownames(tabel)=(c("Initiation",1:iter))
print(x)
print(tabel)
cat("The real MLE value for mu is :",x.mean,"n")
cat("The approximated MLE value for mu is",t,"n")
The complete function would be written as follows.
nr.poi=function(n,t,iter=100)
{
x=rpois(n,t) #generate n Poisson-distributed observations with mean t
x.mean=mean(x) #sample mean = the exact MLE of mu
par.hat=matrix(0,1,1)
estimate=numeric(iter+1) #estimate at each iteration
difference=numeric(iter+1) #gap between estimate and the exact MLE
estimate[1]=t
difference[1]=abs(t-x.mean)
for(i in 1:iter)
{
#First partial derivative of log-likelihood function with respect to mu
dbt=(exp(-t) * (t^((x) - 1) * (x)) - exp(-t) * (t^(x)))/factorial(x)/(exp(-t) *
(t^(x))/factorial(x))
#Second partial derivative of log-likelihood function with respect to mu, then mu
dbtt=(exp(-t) * (t^(((x) - 1) - 1) * ((x) - 1) * (x)) - exp(-t) *
(t^((x) - 1) * (x)) - (exp(-t) * (t^((x) - 1) * (x)) - exp(-t) *
(t^(x))))/factorial(x)/(exp(-t) * (t^(x))/factorial(x)) -
(exp(-t) * (t^((x) - 1) * (x)) - exp(-t) * (t^(x)))/factorial(x) *
((exp(-t) * (t^((x) - 1) * (x)) - exp(-t) * (t^(x)))/factorial(x))/(exp(-t) *
(t^(x))/factorial(x))^2
sdbt=sum(dbt)
sdbtt=sum(dbtt)
#hessian matrix
h=matrix(sdbtt,1,1)
#gradient vector
g=matrix(sdbt,1,1)
#parameter
par=matrix(t,1,1)
par.hat=par-solve(h)%*%g
t=par.hat[1,]
estimate[i+1]=t
difference[i+1]=t-x.mean
}
tabel=data.frame(estimate,difference)
rownames(tabel)=(c("Initiation",1:iter))
print(x)
print(tabel)
cat("The real MLE value for mu is :",x.mean,"n")
cat("The approximated MLE value for mu is",t,"n")
}
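As an independent cross-check (my own addition, not part of the original article), we could also maximize the same log-likelihood directly with R's built-in optimize() function; on a freshly generated sample it should land on the sample mean as well:
#Cross-check: maximize the Poisson log-likelihood with optimize()
x=rpois(100,5)
loglik.fn=function(t) sum(dpois(x,t,log=TRUE)) #log-likelihood of the whole sample
optimize(loglik.fn,interval=c(0.01,20),maximum=TRUE)$maximum
mean(x) #should agree closely with the value above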
As an example of this function in action, suppose that we want to calculate the MLE from 100 Poisson-distributed data points with a mean of 5. Using the Newton-Raphson function written above with 5 iterations, the result is as follows.
> nr.poi(100,5,5)
[1] 5 4 6 9 7 8 7 2 9 4 5 6 10 1 4 8 5 7 4 3 6 3 4 4 4 7 6 6 3 6 5 5 6 4 5 5 9 5
[39] 5 3 5 6 5 8 5 3 3 12 6 5 3 4 8 5 4 5 7 8 8 5 7 2 8 3 6 4 2 3 7 5 3 4 6 5 2 6
[77] 3 3 5 4 8 2 4 7 6 5 4 3 4 7 3 4 6 6 4 7 4 4 14 4
estimate difference
Initiation 5.000000 2.400000e-01
1 5.229008 -1.099237e-02
2 5.239977 -2.305956e-05
3 5.240000 -1.014779e-10
4 5.240000 0.000000e+00
5 5.240000 0.000000e+00
The real MLE value for mu is : 5.24
The approximated MLE value for mu is 5.24

From the result above, we can see that the Newton-Raphson method produces the same value as the exact MLE (the sample mean). Please note that this method uses randomly generated data, so the result may differ from run to run.
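If you want a reproducible result, you can fix the random seed before calling the function (the seed value below is arbitrary):
set.seed(42) #fix the seed so rpois() generates the same sample every run
nr.poi(100,5,5)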
Conclusion
MLE helps us calculate an estimator from its log-likelihood function. When that estimator is hard to obtain analytically, we can approximate it numerically by using the Newton-Raphson method.
And here we are: you can now calculate the MLE with the Newton-Raphson method by using R! For more discussions about this topic, feel free to contact me via LinkedIn here.