In this stage of the pandemic, demand for all kinds of semiconductor chips has exploded and semiconductor fabs are struggling to keep up. Traditional IC customers are now competing with the demands of new compute applications (such as AI and autonomous driving) to get their custom-design wafers fabricated. Unfortunately, increasing the wafer capacity of semiconductor fabs to react to new waves of demand is very expensive and challenging. Once a decision is made to increase a fab’s capacity, it can take months to purchase new tools, install them in the fab, and qualify them for production.
In cases like these, a better option is to improve the yield – that is, the ratio of "good" die to total die per wafer – to get more chips out of existing capacity. Increasing the yield also directly increases fab profit margins, with even small enhancements in yield driving massive profits – so, let’s just increase the die yield! Easier said than done.
Before the rise of Data Science in the semiconductor industry, device engineers would methodically hunt through available data searching for any clues or correlations to the yield. As humans, we like problems where a result can be attributed to a single factor. We’d love to have a strong correlation between yield and one specific process parameter – be that a film deposition rate, an etch rate, a lithography dimension tolerance, or any of the 1000s of other process parameters. Unfortunately, this is almost never the case. The processes used to make modern semiconductor chips compound on one another constantly.
A new breed of device and yield engineer is needed to work with ever-increasing process complexity and volumes of metrology data: a deep understanding of device physics and fab processes must be combined with a data science toolbelt to find the true yield triggers.
Overlapping Variables
To help understand the problem, consider an Aluminum routing layer. In this simple process we have an Aluminum deposition step, lithography for patterning, a plasma etch to define the features, followed by cleans. If we were to have a simple yield failure, such as neighboring lines electrically shorting, the two most likely candidates would be that the Aluminum was deposited thicker than expected (preventing the etch from fully separating the lines), or that the etch rate was slower than expected. However, in most process-controlled fabs, the most common answer is that both processes were within the bounds of the statistical process control (SPC) system, and are therefore operating within the acceptable range. So what gives? In this case, the confluence of the deposition being a little thicker than normal and the etch being a little slower than normal can be a yield killer. Extracting which combination of variables contributes to failure is a problem perfectly set up for machine learning. We’re actually pretty lucky if our problem is as simple as this Aluminum routing case – it usually gets much worse.
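As a toy sketch of that idea (every number and threshold below is made up for illustration, not taken from a real process), a shallow decision tree can recover a two-variable interaction even when each variable looks healthy on its own:

```python
# Toy illustration: two in-spec variables whose combination shorts neighboring lines.
# All values and thresholds are synthetic and purely illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 2000
al_thickness_nm = rng.normal(500, 10, n)   # Aluminum deposition thickness, within SPC
etch_rate_nm_min = rng.normal(100, 3, n)   # plasma etch rate, also within SPC

# Lines short only when the film is on the thick side AND the etch is on the slow side.
shorted = (al_thickness_nm > 510) & (etch_rate_nm_min < 97)

X = np.column_stack([al_thickness_nm, etch_rate_nm_min])
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, shorted)
print(export_text(tree, feature_names=["al_thickness_nm", "etch_rate_nm_min"]))
```

Neither variable alone separates the failures, but the tree’s nested splits expose the thick-film-plus-slow-etch pocket directly.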

Consider a new IC design running on an advanced-node (16nm or less) process. Physical failures like electrical shorts or electrical opens are the simple cases. However, devices can also fail to yield because a circuit timing path prevents the chip from performing its desired function. Maybe the turn-on voltage of the transistors is off target? Maybe the junction capacitance is high, increasing the RC product and introducing delay? In a semiconductor process flow containing hundreds of process steps, how would we go about finding the root cause?
Plan of attack
The benefit of a data science approach is the ability to evaluate the impacts of many variables simultaneously. In this case, we can incorporate four main types of variables:
- Inline metrology: physical measurements made on the wafers as they are processed, including things like film thicknesses and densities, lithographic size accuracy and alignment, contaminant particle defects, and many others.
- End-of-line electrical test: Data from the discrete scribeline devices (devices printed in the streets between die) tested at end-of-line, prior to dicing. The individual devices in the scribe are designed to tease out certain process dependencies and offer unique process insights (like Kelvin contact resistors).
- Tool process qualification runs: Tools regularly run a set process (identical to or very similar to the production process) on test wafers that can be directly measured. The results such as deposition rates, etch rates, or non-uniformity are tracked in a statistical process control (SPC) system and have associated date stamps.
- Product circuit performance: Depending on the chip and the types of test equipment needed to interrogate it, the product die are tested at end-of-line (before dicing), after dicing, or post-packaging. Regardless, this performance or yield is what we’re trying to optimize as our dependent variable. It can take the form of a pass/fail criterion (classification), performance binning (also classification), or raw performance (regression).
Using all four variable types together gives the engineer significant capability in taking the outcome (measured performance) and drilling down to the physical root cause, empowering engineers to directly impact yield through process tool adjustments.
Excursion versus Optimization
Where to go next really depends on what our main goals are. Did a given wafer or lot dramatically decrease in die yield? In that case, we know that whatever happened, happened on a wafer level. As we play detective, this drastically simplifies our data preparation process. Here, we’re looking for wafer-level shifts in variables, meaning we can probably average values from all four variable types by wafer and still find the smoking gun. A simple check using functions like `varImp()` in Caret (R) or `feature_importances_` in Scikit-learn (Python) can quickly give insight into which variables have direct impacts on yield.
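On the Python side, that check is only a few lines. The per-wafer summary file and column names below are illustrative assumptions, not a prescribed schema:

```python
# Sketch of a quick feature-importance check with scikit-learn.
# The per-wafer summary file and column names are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("wafer_summary.csv")                  # one row per wafer, variables averaged by wafer
X = df.drop(columns=["lot_id", "wafer_id", "yield_pct"])
y = df["yield_pct"]

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))   # likely suspects first
```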
Instead, if we have a fairly stable process, but on each wafer only 70% of the die are functional (or high enough performance to be sellable), we have an opportunity to optimize. At a high level, are the failed die randomly spread across the wafer, or are they all grouped in one physical region of the wafer? This alone can tell us whether the failures are defect-based (such as a particle landing on the wafer mid-process) or process non-uniformity based, allowing us to filter out many of the potential variables.
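One rough way to make that spatial check concrete is to compare failure rates by wafer zone. The die-level file, coordinate columns, and zone radii below are assumptions for illustration:

```python
# Rough sketch: is the failure rate flat across the wafer, or concentrated in one zone?
# The die-level results file, column names, and zone radii are illustrative assumptions.
import numpy as np
import pandas as pd

die = pd.read_csv("die_results.csv")            # per-die pass/fail with die center coordinates in mm
die["radius_mm"] = np.hypot(die["die_x_mm"], die["die_y_mm"])
die["zone"] = pd.cut(die["radius_mm"], bins=[0, 50, 100, 150],
                     labels=["center", "mid", "edge"], include_lowest=True)

# A flat failure rate across zones suggests random, defect-driven failures;
# a strong zone dependence points to process non-uniformity.
print(die.groupby("zone", observed=True)["fail"].mean())
```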
It’s important to note that wafer-level processes tend to have unique non-uniformity profiles. Many will be radial, but gradient profiles are common, and even sombrero-shaped profiles can occur. Matching these tool process profiles against our yield map is a powerful diagnostic. But the human eye can only pick out so much when multiple process profiles overlap, and again it’s data science to the rescue.

Before we get there, however, we have more data preparation to do for wafer-level yield engineering. While we may have end-of-line scribeline test data and metrology data that associate with each product die, finding a similar association with tool-qualification (a.k.a. "qual") data is tricky, as those are often measured on unpatterned wafers with evenly distributed 9-, 13-, or 39-point maps in radial coordinates. For our data we’ll need to map each die to the physically closest measurement, which usually includes the following steps (sketched in code after the list):
- In our qual data, translate (radius, theta) coordinates to (x, y) Cartesian coordinates and ensure the data collection date is in a proper `datetime` format.
- Populate our product data set with the processing date/time that the lot ran through each process that has associated qual data.
- For a given lot/wafer we’ll iterate over our qual data to filter down to the run with the closest possible date, and find the measurement nearest to the product die coordinates.
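A condensed sketch of that mapping might look like the following; the file names, column names, and the measured quantity are all illustrative assumptions:

```python
# Sketch of mapping each product die to its nearest-in-time, nearest-in-space qual measurement.
# File names, column names, and the measured quantity are illustrative assumptions.
import numpy as np
import pandas as pd

qual = pd.read_csv("dep_tool_qual.csv")   # qual runs: radius_mm, theta_deg, thickness_nm, run_datetime
die = pd.read_csv("product_die.csv")      # product die: die_x_mm, die_y_mm, process_datetime, ...

# Step 1: convert (radius, theta) qual coordinates to Cartesian and parse dates.
qual["x"] = qual["radius_mm"] * np.cos(np.radians(qual["theta_deg"]))
qual["y"] = qual["radius_mm"] * np.sin(np.radians(qual["theta_deg"]))
qual["run_datetime"] = pd.to_datetime(qual["run_datetime"])
die["process_datetime"] = pd.to_datetime(die["process_datetime"])

def closest_qual_value(die_row):
    # Step 2: keep only the qual run closest in time to when this lot was processed.
    time_gap = (qual["run_datetime"] - die_row["process_datetime"]).abs()
    run = qual[qual["run_datetime"] == qual.loc[time_gap.idxmin(), "run_datetime"]]
    # Step 3: take the measurement physically nearest to this die's coordinates.
    dist = np.hypot(run["x"] - die_row["die_x_mm"], run["y"] - die_row["die_y_mm"])
    return run.loc[dist.idxmin(), "thickness_nm"]

die["nearest_qual_thickness_nm"] = die.apply(closest_qual_value, axis=1)
```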
Once our data is combined in this fashion, we have a very content-rich variable data set, and we can automate the process of looking for simple 1:1 correlations between any variable and our dependent variable (product die performance). However, to really get down to the interlinking of our variables we’ll need to dive into creating machine learning models.
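That 1:1 correlation scan is nearly a one-liner once the combined table exists. The combined file and the "performance" column name below are assumptions for illustration:

```python
# Sketch of the automated 1:1 correlation scan against the dependent variable.
# The combined per-die file and the "performance" column name are illustrative assumptions.
import pandas as pd

data = pd.read_csv("combined_die_data.csv")
corrs = (
    data.select_dtypes("number")
        .corr()["performance"]
        .drop("performance")
        .sort_values(key=abs, ascending=False)
)
print(corrs.head(15))   # strongest single-variable correlations, sign preserved
```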
Modelling
In this step, the goal isn’t to create the highest accuracy model – the goal is to create a model that does a reasonable job of predicting yield but is also easily interpreted by the yield engineer. Here we’ll want to stick to algorithms like Decision trees, Linear/logistic regression, Lasso regression, or Naive Bayes. Once we find an appropriate algorithm, we can dive into the visualizations to learn how our model "thinks" and drive actions in the fab to improve yield.
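As a sketch of what that interpretable-first approach can look like (the data file and column names are assumptions), a shallow decision tree can be plotted and read directly by the yield engineer:

```python
# Sketch of an interpretable yield model: a shallow decision tree the engineer can read.
# The combined data file and column names are illustrative assumptions.
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, plot_tree

data = pd.read_csv("combined_die_data.csv")
X = data.drop(columns=["die_pass"])
y = data["die_pass"]

# max_depth keeps the splits few enough to translate into concrete fab actions.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
plot_tree(tree, feature_names=list(X.columns), class_names=["fail", "pass"], filled=True)
plt.show()
```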
Predictive Yield
The next big leap comes from the predictive power of data science. What if we could predict wafer yield before it even finishes processing? This kind of information could drive stakeholders to make important lot decisions such as:
- Triggering a rework process (recovering and repeating a certain step)
- Providing a milestone report to customers that a lot is in good health
- Scrapping a lot to cut losses and prevent future tool time on a "dead" lot
- Starting a "make-up" lot to replace a low-yielding lot to minimize the delivery delay to the customer.
Unlike in the yield improvement steps, here we’re looking to have the most accurate model possible. Interpretability is a distant "nice-to-have", so we are able to explore a much broader range of models. Once we have a model selected, we can also use tools like Variable Importance and Recursive Feature Elimination to see what data is really driving our yield predictions. With that data in hand we can go as far as eliminating some existing metrology data collection to further reduce costs and process cycle times.
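A hedged sketch of that feature-trimming step with scikit-learn's RFE follows; the wafer-level file, target column, and choice of estimator are assumptions:

```python
# Sketch of Recursive Feature Elimination to find which measurements actually drive prediction.
# The wafer-level file, target column, and estimator choice are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE

data = pd.read_csv("wafer_level_dataset.csv")
X = data.drop(columns=["wafer_yield_class"])
y = data["wafer_yield_class"]

rfe = RFE(GradientBoostingClassifier(random_state=0), n_features_to_select=10).fit(X, y)
kept = list(X.columns[rfe.support_])
print("Measurements worth keeping:", kept)
# Variables eliminated here are candidates for reduced metrology sampling.
```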
With a good enough model, we can also do virtual process windowing – we can alter the acceptable SPC range for a process to see how broadening or tightening control limits can impact yield. Done iteratively, the control limits for every process in the fab can be directly tied to wafer yield. This impacts how preventative maintenance is performed, how new tools are introduced to the line, and even the frequency of qual procedures – all of which impact fab costs.
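One hedged way to sketch that windowing experiment is to redraw a single process variable inside candidate control limits and watch how a trained model's predicted yield responds. The model, data set, and column name below are assumptions:

```python
# Sketch of virtual process windowing: vary one process variable within proposed SPC limits
# while holding everything else at observed values. All names are illustrative assumptions.
import numpy as np

def simulated_yield(model, X, variable, lower, upper, n_sim=5000, seed=0):
    rng = np.random.default_rng(seed)
    sim = X.sample(n_sim, replace=True, random_state=seed).copy()
    sim[variable] = rng.uniform(lower, upper, n_sim)   # redraw only the candidate process
    return model.predict_proba(sim)[:, 1].mean()       # mean predicted pass probability

# Example comparison for a hypothetical etch-rate qual variable:
# simulated_yield(model, X, "etch_rate_nm_min", lower=95, upper=105)   # current window
# simulated_yield(model, X, "etch_rate_nm_min", lower=98, upper=102)   # tightened window
```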
So there we have it! Using the power of data science we can reshape how we investigate yield excursions, improve overall line yield, and predict wafer yield before it even finishes processing. In the hands of semiconductor device engineers who understand the physics of fabrication, these predictive modelling capabilities have dramatic impacts on wafer process yield and fab profits. Thanks for reading!