This tutorial will introduce Survival Analysis and its wide scope. Survival analysis is applicable to all fields that deal with time-dependent data: medicine, biology, engineering, marketing, finance, and many others. A data scientist in the role of a survival analyst is involved in questions concerning clinical trials, customer churn, time till violent death of Roman emperors, the failure rate of products or systems, and a practically infinite number of other situations that raise the question: "When will it happen, and what’s its risk of failure or its chance of success?" After the introduction in chapter 1, we will explore a number of examples in chapter 2 – and then complete the tutorial with bathtub-shaped lifetimes.
- What is Survival Analysis?
- Force of Mortality and Hazard Rate
- System Failure
- Parametric Survival Analysis by Example
- Example: Exponential Lifetime Model
- Example: Weibull Lifetime Model
- Bathtub-Shaped Lifetimes
1. What is Survival Analysis?
1.1 Time-to-Event Analysis – the Concept
To begin with: What is the force of mortality? It is the conditional probability of death at a particular instant, given survival up to that instant. In actuarial science, it is also called the intensity of mortality or the instantaneous death rate. In fields that are less concerned with predicting the transition to the afterlife, it is better known as the hazard rate.
This sounds as if the data science methods we are going to discuss refer to just one type of problem: time till biological exit, seen from an insurance company’s angle.
Far from it. The classes of events from which we want to extract analytical insights arise in fields as diverse as medical therapy and clinical trials, sociology, product development, life insurance, biology and ecology, engineering, criminal recidivism, credit risk, political science (for instance: survival time of dictatorships), and history (for instance, the mean time till violent death of Roman emperors was the subject of multiple studies: "a most dangerous occupation"; and Roman Empire: a survival analysis | Royal Society).

In other words, the methods of survival analysis apply to nearly every field that produces time-to-event data. Survival analysis is complementary to time series analysis: the latter models how a variable evolves over time, the former models the time that elapses until an event occurs.
The focus of survival analysis is on the time until an event of interest occurs – the failure time, survival time, or event time — and its associated risks.
It goes beyond the prediction of death or system failure and can also refer to event types with less finality. The event of interest can be any phenomenon along the time axis:
- time between first and second childbirth (ojs)
- time till divorce: survival analysis on marriage and divorce (researchgate)
- time to criminal recidivism: Who Returns to Prison
- time till failure: risk that equipment fails within its targeted operational life or within the warranty period
- medical therapy or clinical trials: metabolic absorption rate of a drug; the 5-year survival rate after a cancer treatment or after a transplant; time till recovery after any illness; time till the maximum antibody count is reached after a vaccination; time till the antibody count will fall below a threshold
- survival analysis of water pipelines (awa.asn.au)
- time to biological or ecological events: A unified survival-analysis approach to insect population development (nature.com); the hiding time of prey; the waiting time of predators
- customer churn or employee attrition rate: time when a certain percentage of customers or employees will leave a business or will not renew their subscription
- time till a company will default on its debt
- time till half of the Bachelorette candidates will have failed at receiving a rose and must go home
In engineering, survival analysis is typically called reliability analysis. In economics, duration modelling. In sociology, event history analysis.
A more comprehensive synonym that expresses the universal scope of survival analysis is Time-to-Event Analysis. The event of interest can be any time-dependent occurrence.
Methods of survival analysis come in three basic flavors:
- non-parametric: for instance, the Kaplan-Meier estimator – it makes no assumption about the shape of the hazard function, nor about the effects that covariates or exogenous regressors exert
- semi-parametric: for instance, the Cox model – it makes no assumption about the shape of the hazard function; but it does make assumptions about covariates and how they influence the hazard function
- parametric: assumptions about the hazard function are expressed by probability distributions such as the exponential, Weibull, log-logistic, Gompertz, extreme value, Rayleigh, or log-normal models
In today’s tutorial, we will set the focus on parametric survival analysis. Quite a few of its tools are already known to us from other applications of probability distributions.

1.2 The Elements of Parametric Survival Analysis
Before we compute concrete examples, we need to learn a few technical terms which will become our focal points: the analytical descriptors we want to extract from the data.
Some of the following terms were coined by reliability analysis and actuarial science. Remember that what these two fields call a failure can also represent any neutral event of interest in other fields. For actuaries and reliability engineers, the event of interest just happens to be the failure of a system: mechanical, electronic, or biological. You may have to nudge your thought process to translate this traditional vocabulary to words that are more adequate for the events you want to study. But the functional tools will be applicable to all types of time-dependent events.
- CDF(t): the cumulative distribution function = the lifetime distribution function = expresses the fraction of all units that will fail by time t; equivalently, the probability that a randomly drawn unit fails by time t. Instead of "failure", we can also read "event": units that will be subject to the event of interest by time t.
- pdf(t): the probability density function = the event density = rate of failure per time unit
- SF(t) or R(t): the survival, survivor, or reliability function = simply the complement of the CDF = 1 – CDF(t) = fraction of all units that will survive (remain in operation; or not be subject to the event of interest) for at least t hours = the probability that the event will occur later than time t
- Failures or events until time t:
  - If N is the number of samples, then CDF(t) * N is the number of failures (events) until time t;
  - while SF(t) * N is the number of survivors: units that will still be in operation at time t.
- AFR(t1,t2): the average failure rate in the time interval [t1,t2], simply the number of failures counted in the interval, divided by the time span measured in months or days
- h(t): the hazard function, instantaneous failure rate, or force of mortality. This is the core factor for the lifetime models we will study.
  - Also simply called the hazard rate or failure rate. We need to be aware that the hazard function is not necessarily a constant like the average failure rate AFR. The hazard function is a curve that can change from moment to moment, depending on the behavior of its underlying "hazard process" – hence its instantaneous failure rate moniker.
  - It measures the failure rate among the units that have survived until time t but will fail in the next instant.
  - The hazard rate expresses the propensity of a system to fail once it has reached a certain age. It is a conditional failure rate: failure at time t, given that a unit has survived until t.
  - Note that the hazard function cannot be interpreted as a probability – its integral over the interval [0; ∞) is infinite. Nor is h(t) the probability density function of the failure distribution; that role belongs to pdf(t).
  - It can be computed as the quotient of the probability density function pdf(t) and the survival function SF(t):
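h(t) = pdf(t) / SF(t) = pdf(t) / (1 – CDF(t))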

- H(t): the cumulative hazard function. The hazard rate h(t) can be integrated to obtain H(t), in analogy to the relationship between probability density function and cumulative probability function.
- In reliability analysis, failure rates are often expressed by specific technical terms:
  - 1.0%/K = % failure rate per "kilo hour" = 1% failure rate over 1000 hours
  - 1.0 PPM/K = parts (failures) per million per kilo hour = 1 failure among 1 million components over 1000 hours
  - 1.0 FIT = fails in time = 1 failure per 1 billion hours
- mean time to failure MTTF: the average of all the time spans till failure across the population
- mean residual life: the expected remaining life time of a unit after it has survived for t hours
2. Parametric Survival Analysis by Example
In this chapter, we will go through several examples, extract the descriptors we have discussed in chapter 1, and interpret them.
2.1 Shape of the Failure Rate
The core property by which one survival model differs from other variants is the shape of its failure rate. We distinguish:
- constant failure rate
- monotonically increasing rate
- monotonically decreasing rate
- inconstant failure rate, among them the so-called bathtub shape
We will compute examples for each of these cases.
2.2 System Failure: The Multiplication Rule

Let’s take the example of a system that is composed of N constituent elements, for instance 12 electronic circuits that control the rocket engines in the first space ship headed for Mars. Each of the circuits operates on a separate task, a different booster stage. But the propulsion system in its entirety will misfire should any of the 12 fail.
Assume that any circuit has a survival probability of 0.999 within its targeted time window of operational use. Their eventual failures are not correlated; they would fail independently. The survival function of an individual component returns SF(t) = 0.999. What is the probability that the propulsion system will suffer a disaster? The multiplication rule for independent events tells us that the survival probability of the entire system till time t is S(t) = (0.999)¹² = 98.8%. If the system depends on 100 instead of 12 components, its survival function value will decrease to (0.999)¹⁰⁰ = 90.5%.
A space ship relies on tens of thousands of mission-critical components. Therefore, redundancy becomes an inevitable guard rail: our simplified example implies that a system with just 100 critical components already carries a failure risk of roughly 9.5% that backups would need to offset – provided that the components are independent. But if the defect of the first one could trigger a cascade to neighboring components, the redundancy reserves would have to be higher.
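A minimal sketch of this calculation in Python – nothing beyond the survival probability of 0.999 stated above is assumed:

```python
# Multiplication rule for independent components: the system survives
# only if every single component survives.
sf_component = 0.999                    # survival probability of one circuit

for n in (12, 100):
    sf_system = sf_component ** n       # S(t) = SF(t) ** n
    print(f"{n} components: system survival probability = {sf_system:.1%}")

# 12 components: system survival probability = 98.8%
# 100 components: system survival probability = 90.5%
```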
2.3 Exponential Distribution
The exponential distribution is the most basic variant among the parametric failure time models. It assumes a constant failure rate, lambda > 0.
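For a constant failure rate lambda, the defining functions of the exponential model are:
- pdf(t) = lambda * exp(–lambda * t)
- CDF(t) = 1 – exp(–lambda * t)
- SF(t) = 1 – CDF(t) = exp(–lambda * t)
- h(t) = pdf(t) / SF(t) = lambda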

To instantiate the exponential distribution in SciPy, we need to define SciPy’s scale parameter as the inverse of the failure rate lambda. SciPy uses a scale parameter for all its distributions; it does not make an exception for the exponential distribution. Yet a more typical definition of the exponential distribution, as shown in Wikipedia, would parametrize it with a failure rate (aka the inverse scale).
- SciPy: expon(loc, scale), with scale = (1 / failure rate lambda)
The scale parameter represents the mean time between failures.
To obtain the hazard function h(t), we apply the definition we saw in chapter 1.2:
- h(t) = pdf(t) / (1 – CDF(t))
If you glance at the formulas above, you see that this quotient simplifies to h(t) = lambda.
The exponential distribution is the only case of a lifetime distribution with a time-invariant failure rate. This constancy is the definition of the exponential random process. At any time t, new defects turn up at the same level lambda. The exponential distribution is memory-less.
This is of course a bold assumption. It can be aligned with reality, though, if the analysis focuses on a time window that is narrow enough to exclude wear-out effects; and also excludes the so-called infant mortality phase – the time window immediately after a new process has been set in motion but before it has been stabilized by weeding out its early flaws. Thus, the exponential distribution can be applicable to the center section of the so-called bathtub curve model we will discuss further below.
Example: One type of circuit board for our space ship to Mars will be switched on some hours after the launch. The boards will be shielded from the mechanical stresses during lift-off. The trip, including the return to Earth, will take 36 months. The manufacturer knows that this period is too short to result in wear-out caused by chemical or mechanical aging. But during the voyage, cosmic radiation will randomly hit some of the circuits. Radiation tests in the laboratory lead us to expect that circuits will burn out at a rate of 0.06%/K. The rate will be constant because the radiation exposure will vary only by minimal amounts after day 1, when the space ship leaves the Van Allen radiation belt, 58,000 km from Earth.
Translation: 0.06% per "kilo hour" = 0.06 / (100 * 1000). This is the failure rate lambda we can insert into our model. In SciPy’s notation, we use its inverse, 1/lambda, as the scale parameter: 1,666,667. The same 1.67M hours also represent the mean time till failure for any circuit.
In row 6, we tabulate a range of hours we want to analyze.
Row 7 instantiates SciPy’s expon distribution. The location parameter is set to zero: circuits can frizzle soon as they are switched on, without a waiting period after time zero. The scale parameter is set to the inverse of the rate lambda.
In row 8, we evaluate the cumulative distribution function by inserting into it the range of hours T. We collect both the hours T and the cdf outcomes in a dictionary in row 9. Then the list comprehension in row 10 prints the dictionary.
Row 15 computes the mean time till failure, based on the exponential distributions, and confirms that it is equal to 1/lambda.
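The article shows this code as a screenshot; a minimal re-creation of the same steps, with variable names of my own choosing, might look like this:

```python
import numpy as np
from scipy import stats

lam = 0.06 / (100 * 1000)                    # 0.06%/K  ->  6e-7 failures per hour
expdist = stats.expon(loc=0, scale=1/lam)    # scale = 1/lambda = 1,666,667 hours

T = np.arange(0, 40_000, 5_000)              # range of operating hours to analyze
cdf = expdist.cdf(T)                         # fraction of circuits failed by time t
dict_cdf = dict(zip(T.tolist(), cdf.tolist()))
_ = [print(f"{t:>8,} h : {p:.4%}") for t, p in dict_cdf.items()]

print("mean time till failure:", expdist.mean())   # equals 1/lambda
```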

After 20,000 hours of operation, or 833 days into the voyage, 1.19% of the circuits will have suffered a knock-out blow by high-energy particles. This suggests that a reserve of less than 2% will be able to offset the radiation risk.

Unsurprisingly, we observe a near-linear ascent of the aggregated failures in the cdf curve – for small lambda * t, the exponential CDF 1 – exp(–lambda * t) is approximately lambda * t – and a constant hazard function.
This confirms that the exponential distribution operates with a constant failure rate.

Our next evaluation calculates the quantiles. After how many hours will 1.0% of the circuits have shown a blue-screen error? In lines 2 to 5, we prepare a range of probabilities which we will feed to the percent point function in row 6. Lines 6 to 8 collect the quantiles and the associated probabilities in a dictionary, which the list comprehension in line 9 prints row by row.
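Again, the original code is a screenshot; a sketch of the quantile step, with illustrative variable names, might read:

```python
import numpy as np
from scipy import stats

lam = 0.06 / (100 * 1000)
expdist = stats.expon(loc=0, scale=1/lam)

probs = np.arange(0.01, 0.11, 0.01)          # probabilities from 1% to 10%
quantiles = expdist.ppf(probs)               # hours by which that fraction has failed
dict_q = dict(zip(probs.tolist(), quantiles.tolist()))
_ = [print(f"{p:.0%} of failures by {q:>9,.0f} hours") for p, q in dict_q.items()]
```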
The table reports the 1% quantile at 16,751 hours.

System failure
In chapter 2.2, I had mentioned the multiplication rule: A system that would break down if just one of its N components fails has a survival probability equal to the product of the component-specific survival functions. If each of the N components is still functional by time T with probability s, then the system’s survival probability by time T will be S = s^N.

The failure rate of the system, in turn, is the sum of the individual failure rates. If these are the same for all N components, then the system (our space ship) will exhibit a failure rate of N * lambda, again following an exponential distribution. Its mean time till failure MTTF will be the inverse, 1 / (N * lambda). This is called the exponential closure property.
Let’s translate this to our example. If the space ship operates 100 independent, identical circuit boards, each with a lambda rate reported as 0.06%/K, and the system will break down when the first one expires, then what will be the risk that the ship will sort of capsize in space within T hours of operation?

The failure rate LL of the system is simply N times the individual rate L. After 5,000 hours of operation, 7 months into the flight, the risk of a shipwreck in space would already amount to almost 26% – unless backup circuits could be switched on as soon as the first sparks begin to fly.
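A sketch of that calculation, reusing the rate names L and LL from the narration above – everything else is my re-creation:

```python
from scipy import stats

N = 100
L = 0.06 / (100 * 1000)        # individual failure rate: 0.06%/K
LL = N * L                     # system failure rate = sum of the component rates

system = stats.expon(loc=0, scale=1/LL)
print(f"risk of system failure within 5,000 hours: {system.cdf(5_000):.1%}")
# about 25.9%
```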
Distribution fitting
At the beginning of the exercise, we reported that radiation tests in the lab had provided a failure rate. The exponential distribution, with its single parameter, is the easiest model to derive from observational data. We obtain the parameter estimate by dividing the number of observed failures by the total number of unit-hours under observation.
Censoring
The calculation of the failure rate becomes more complex when we consider censoring.
When the circuits have been tested for T hours, some of them will have been recorded with the exact time t at which they failed. The survivors will burn out at some later time. But we cannot pin down the moments when they will expire. This source data issue is called right-censoring of type I: The test time T is fixed, but some units will survive for an unknown time beyond T.
A different test – one that is open-ended in time, but will be wrapped up when the k-th defect has been observed, whenever that occurs – reports type II-censored results.
In both cases, we would have to adjust the estimated failure rate to obtain a more accurate estimate for the population of all circuits. Censoring is an aspect I will discuss in one of the next articles.
2.4 Weibull Distributions
The exponential model rests on the assumption that no wear-out mechanism will drive up the failure rate over time. That’s an assumption that should be carefully scrutinized. In our space ship example, we were justified in assuming that it would be a hit-or-miss game of chance: a circuit may or may not be knocked out by a collision with a single high-energy particle.
But the accumulation of smaller radiation damages could act as a wear-and-tear process that requires an increasing rate of expiry in our lifetime model.

After arrival on Mars, our equipment will be exposed to dust storms and steep temperature differences. These will certainly contribute to an upswing in defects.
In 1951, the Swedish engineer Waloddi Weibull presented the distribution that now bears his name. It answered a need, across industries, for a model that could depict real-world processes, which frequently do not exhibit constant failure rates. Weibull’s distribution can mirror
- processes with decreasing rates when the defective components are weeded out (shape parameter < 1);
- and also wear-out or aging processes with increasing rates (shape parameter > 1).
- If the Weibull shape parameter is set to 1, the distribution reduces to a special case: the exponential distribution.
- It also comprises the Rayleigh distribution as another special case if the shape is set to 2, which generates a linearly increasing hazard rate.

SciPy offers the Weibull model under the name weibull_min. Wikipedia defines it with a shape and a scale parameter. In SciPy, you can turn it into a 3-parameter Weibull by inserting a location parameter between shape and scale. Shape and scale must be positive numbers.
- The location sets the minimum time on the x-axis at which observations start.
- The scale stretches the distribution along the time axis beyond the location. The scale parameter is also called the characteristic life: the time span, measured from the location, after which 63.2% of all units will have failed (more on this below).
The Weibull hazard function cannot be read out from a single parameter, unlike the lambda in the exponential case. It is time-dependent and influenced by both shape and scale:
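For location 0, it reads: h(t) = (shape / scale) * (t / scale)^(shape – 1)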

Initially, we will assume that we have already established that the lifetime of the circuit boards we are going to use on the surface of Mars will follow a Weibull distribution. The lifetime shows an increasing failure rate. Later, we will look into distribution fitting to confirm the initial assumption.
The estimated characteristic life is 50,000 hours. The shape parameter is 1.5. We leave the location parameter at 0 – there is no waiting time: circuits can cease to function as soon as the device in which they are located is exposed to the hostile environment.
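A minimal sketch of this setup – the parameter values come from the text above, while the variable names and the evaluation grid are my own:

```python
import numpy as np
from scipy import stats

shape, loc, scale = 1.5, 0, 50_000           # characteristic life = 50,000 hours
weib = stats.weibull_min(shape, loc=loc, scale=scale)

t = np.linspace(1, 100_000, 5)
hazard = weib.pdf(t) / weib.sf(t)            # h(t) = pdf(t) / SF(t), rising for shape > 1
for ti, h_t in zip(t, hazard):
    print(f"t = {ti:>9,.0f} h : h(t) = {h_t:.2e} per hour")
```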


The hazard function’s slope becomes steeper during the wear-out phase.
Shape
The Weibull shape parameter controls the ascent or descent of the curves.
- 0 < shape < 1 : decreasing hazard rate
- shape = 1 : constant failure rate
- shape > 1: increasing hazard rate
- shape = 2: Rayleigh distribution: linearly increasing hazard rate
- 3 ≤ shape ≤ 4: the pdf curve begins to resemble a normal distribution’s bell curve, though with different skew and kurtosis
Characteristic Life
The scale parameter serves as a kind of anchor point:

We know that 68% of observations that follow a normal distribution lie within 1 standard deviation around its mean. In the case of any Weibull distribution, 63.2% of all observations fall below its characteristic life, irrespective of the shape parameter.
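This follows directly from the Weibull survival function: at t = location + scale, SF = exp(–((t – location) / scale)^shape) = exp(–1) ≈ 0.368, irrespective of the shape – hence CDF = 1 – exp(–1) ≈ 0.632.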
System Failure
If a system is composed of N independent elements that can be modeled with the same Weibull shape parameter, and the system will collapse when the first component breaks, then the lifetime of the system also follows a Weibull distribution with that shape parameter.
The system’s characteristic life is a more complicated expression:
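Because the hazard rates of independent components add up, the system’s characteristic life for a common shape k works out to: scale_system = ( scale_1^(–k) + scale_2^(–k) + … + scale_N^(–k) )^(–1/k), where scale_i are the component characteristic lives.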

Our task is as follows:

Our Mars Rover relies on four circuit boards that control its instruments. Their lifetime curves can be modeled with a Weibull shape parameter, which the manufacturer pinned down to 1.5 for all four boards during testing. The boards all have different characteristic lives, of 8,000, 10,000, 11,000, and 14,000 hours, though, because they comprise different numbers of processors. The higher the number, the earlier a board can reach its critical point.
Which feats of endurance can we expect from the rover?

The list comprehension Chars computes the characteristic life of the rover by implementing the formula shown above: 4,058 hours.
The Weibull model that describes the rover predicts a mean time till expiry of only 3,664 hours.
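A sketch of the rover calculation, assuming the formula above; the variable names are my own (the article’s own list comprehension is called Chars):

```python
from scipy import stats

shape = 1.5
char_lives = [8_000, 10_000, 11_000, 14_000]

# series system of Weibull components with a common shape: hazards add up
char_rover = sum(c ** (-shape) for c in char_lives) ** (-1 / shape)
print(f"characteristic life of the rover: {char_rover:,.0f} hours")   # article reports 4,058

rover = stats.weibull_min(shape, loc=0, scale=char_rover)
print(f"mean time till failure: {rover.mean():,.0f} hours")           # article reports 3,664
```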

Fitting a Weibull Distribution
At the beginning of the chapter we assumed that the manufacturer had already pinned down the shape and the characteristic life for us. Now let’s make a quick foray into distribution fitting. The engineering team of the manufacturer has provided us with the raw data of their tests, and it will be up to us to identify a lifetime distribution that matches their data.
For this exercise, I created a synthetic set of 100 observations: variates of a Weibull distribution, interspersed with some random jitter. Then I will demonstrate SciPy’s distribution fitter method.
An earlier article of mine (Probability Distributions and Distribution Fitting with Python’s SciPy | Towards Data Science) discussed an auto-fitter script that tested 60 different candidate distributions against the source data. Today, we limit the task to finding the parameters for one distribution, the Weibull model.
The array of observational data is called randWW in the script.
- Before I apply SciPy’s fit function, I compute the quantile at the probability 0.632. Remember that this quantile corresponds to a Weibull’s characteristic life. I will provide it as an initial guesstimate to the fitter.
- In the fit function, we set the location parameter to zero. The prefix f in floc tells the fitter to leave the user-defined value unchanged. Based on experience, I’d advise fixing the location value whenever you can. You may have prior knowledge that the failure process starts at time zero, as in our case. If not, set the minimum value of the observations as the location parameter. Otherwise, the fitter could struggle to come up with a plausible result.

The fit function will return the parameter tuple that minimizes the prediction error. In this case, 1.45 for the shape and 48,578 hours for the characteristic life.
We call up SciPy’s Kolmogorov-Smirnov test for goodness of fit.

Its result, a high p-value of 0.401, indicates that the observational data are consistent with the Weibull parametrization the fitter has found. The fit is of medium quality rather than perfect – even though the p-value far exceeds the usual significance level of 0.05 – because the sample size of 100 is small and therefore prone to outliers.
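The fitting and testing steps are shown as screenshots in the article; a self-contained sketch of the same workflow might look like the following. The stand-in data generation, the initial guesses, and all names except randWW are my assumptions:

```python
import numpy as np
from scipy import stats

# stand-in for the article's synthetic data: Weibull variates with a little jitter
rng = np.random.default_rng(42)
randWW = stats.weibull_min.rvs(1.5, loc=0, scale=50_000, size=100, random_state=rng)
randWW = randWW * rng.normal(1.0, 0.05, size=100)

scale_guess = np.quantile(randWW, 0.632)     # rough guess for the characteristic life

# fix the location at zero (floc=0); fit shape and scale
shape_fit, loc_fit, scale_fit = stats.weibull_min.fit(randWW, floc=0, scale=scale_guess)
print(f"shape = {shape_fit:.2f}, characteristic life = {scale_fit:,.0f} hours")

# Kolmogorov-Smirnov test of the data against the fitted parametrization
ks_stat, p_value = stats.kstest(randWW, "weibull_min", args=(shape_fit, loc_fit, scale_fit))
print(f"KS statistic = {ks_stat:.3f}, p-value = {p_value:.3f}")
```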

From Monotony to Bathtubs
The Weibull process models monotonically increasing, decreasing, or constant failure rates. It does not reflect, in its usual parametrization, the most typical failure process: the so-called bathtub curve. Let’s dip our toe into it.
2.5 Bathtub-Shaped Lifetimes

Many processes and systems exhibit lifetime curves that resemble the profile of a bathtub.
- Their early lifetime is characterized by a high but falling rate of failures and waste. A new product or a new production process runs through a burn-in or learning phase with high scrappage in its initial stages.
- Over time, the weak parts – or the failure-prone process steps – are weeded out by quality control measures. The more robust components will determine the lower failure rates during the next phase.
- Later, the end phase will exhibit an increasing rate of expiry that ascends the right wall of the bathtub shape. This is the wear-out period. Components or systems – mechanical, electronic, or biological – will continue to degrade at an increasing pace while they age.

- The center phase is characterized by relatively flat failure rates. Neither degradation by wear-out nor the initial structural problems that were solved during the system’s infancy exert much influence on this so-called stable failure period. Failures do occur, but randomly.
2.5.1 Exponentiated Weibull Distribution
The Exponentiated Weibull Distribution is one of the few distributions flexible enough to draw a bathtub-shaped hazard function. It adds a second shape parameter, the exponentiation factor, to the Weibull-Min distribution. The additional parameter expands the flexibility of its curve, which can simulate constant, increasing, decreasing, or bathtub-like shapes.
Conveniently, SciPy makes it available via its subclass exponweib. If its exponentiation parameter is set to 1, then the Exponentiated Weibull is reduced to its special case, the Weibull-Min distribution.
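A brief sketch of how a hazard curve can be evaluated with exponweib. The parameter order – exponentiation factor a, Weibull shape c, then loc and scale – is SciPy’s; the concrete values are illustrative choices of mine that should produce a bathtub-like profile:

```python
import numpy as np
from scipy import stats

a, c = 0.1, 4.0                      # a = exponentiation factor, c = Weibull shape
dist = stats.exponweib(a, c, loc=0, scale=100)
# with a = 1, exponweib collapses to weibull_min with shape c

t = np.linspace(0.5, 150, 7)
hazard = dist.pdf(t) / dist.sf(t)    # h(t) = pdf(t) / SF(t)
print(np.round(hazard, 4))
```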


For practical purposes, though, I find it difficult to identify a parameter combination that moves the sharp elbow of the blue hazard function farther to the right. The extremely steep descent of the failure rate at the beginning is hard to overcome by tweaking the parameters and does not quite fit many real-world scenarios in which the descent phase extends over a longer period.
2.5.2 The Hjorth Distribution
The Hjorth distribution offers a more adaptable bathtub curve.
Since it is not available in SciPy’s catalogue, we implement it as a new distribution subclass that inherits from SciPy’s rv_continuous class. We’ll need six lines of code to accomplish it.
The inheritance shown in line 1 provides a great convenience: We only need to define either the probability density function pdf or the cumulative distribution function cdf (lines 7 to 11). Then SciPy’s parent class can take over and will compute all the other properties, such as the moments, the percent point function, and the survival function, by numerical integration or differentiation.
The Hjorth distribution takes a location parameter m > 0, a scale s > 0, and the shape parameter f ≥ 0.
If the shape parameter is set to zero, the Hjorth distribution is reduced to the Rayleigh distribution. The Rayleigh distribution is a special case of the Weibull distribution if the Weibull shape is set to 2. Thus, the Weibull and Hjorth distributions are kind of cousins, twice removed.

Our focus is on the hazard function, for which we are seeking a setup that yields a bathtub shape. We begin with location m = 20, scale s = 1, and shape f = 0.1

In code line 11, we compute the hazard function as the quotient of probability density function and survival function, both of which SciPy derives from our new subclass hjorth_gen.
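The article’s six-line implementation is shown as a screenshot. As an illustration of the subclassing approach, here is a sketch that uses the textbook parametrization of the Hjorth survival function, S(t) = exp(–delta * t² / 2) / (1 + beta * t)^(theta / beta); how these three parameters map onto the article’s (m, s, f) tuple is not reproduced here, and the shape values below are purely illustrative:

```python
import numpy as np
from scipy.stats import rv_continuous

class hjorth_gen(rv_continuous):
    """Hjorth distribution, defined via its cumulative distribution function:
    CDF(t) = 1 - exp(-delta * t**2 / 2) / (1 + beta * t)**(theta / beta)"""
    def _cdf(self, t, delta, beta, theta):
        return 1.0 - np.exp(-0.5 * delta * t**2) / (1.0 + beta * t)**(theta / beta)

# support starts at 0; SciPy derives pdf, sf, ppf, moments etc. numerically
hjorth = hjorth_gen(a=0.0, name="hjorth", shapes="delta, beta, theta")

dist = hjorth(0.1, 20.0, 1.0)        # illustrative (delta, beta, theta) values
t = np.linspace(0.05, 10, 200)
hazard = dist.pdf(t) / dist.sf(t)    # h(t) = pdf(t) / SF(t)
print(hazard[:5])
```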

We’ve got our bathtub. The parameter tuple (m, s, f) = (20, 1, 0.1) provides us with the blue curve of the hazard rate.
Similarly, the parameter tuple (3, 1, 3) would result in a shape with a more moderate increase in its right tail (red on the left).

The tuple (2, 47, 0.33) would depict a steeper increase of the bathtub curve.
Now we can start to evaluate the Hjorth distribution in the same way as we did with the other distributions that were characterized by constant or monotonic failure rates.
3. Conclusions
We have learnt that our existing knowledge of probability distributions can be transferred to the tasks we encounter in parametric survival analysis.
We have not yet touched on complications that can affect our source data: censoring and truncation, two core issues in survival analysis, but something we face less frequently in other data science fields. This will be a topic in a future article.
The Jupyter notebook is available for download on GitHub: h3ik0th/paramsurvival: tutorial on parametric survival analysis with SciPy (github.com)