The world’s leading publication for data science, AI, and ML professionals.

What you might not know about VLOOKUP

…and why you should perhaps start using pandas

Photo by Marten Newhall on Unsplash
Having a closer look at the data helps to make better predictions [image by Marten]

VLOOKUP is a basic Excel function that anyone required to match data across 2 tables is likely to use it. Perhaps even more amazing is the fact that some websites show resume samples of how to include the vlookup function in your resume. While there are many articles that write about how to use this function, some could be incomplete, outdated or shows a lack of depth. This article aims highlight the limitations in the vlookup function which are commonly overlooked, and then show some innovative hacks to overcome some of these limitations.

Vlookup is a built-in function in Excel that allows one to find the first occurrence of a specific value in the left-most column for a given array, then return a value in the same row corresponding to a specific column.

For most purposes, it is very useful. However, in specific situations, quite a bit of manual effort is required to shape the data before vlookup can be used effectively.

Helper/Duplicate columns

If Fruit & Qty columns are swapped, one cannot do a vlookup directly. A common workaround is to duplicate the column using a formula. (One can either duplicate the column containing the key to the left or move the column containing the value to the right.)

(e.g. when the left most column of array has duplicate values, or when two columns of data are required to be concatenate together to form the lookup key)

This is perfectly fine for simple tables, but when tables become somewhat complex, one will end up with many such duplicate columns. Typically, all the key columns will end up on the left. This is not sustainable.

Multi-Condition VLOOKUP

A highly respected Excel guru blogged about how to use array formula and the choose() function to create a vlookup that can look "left" without adding a helper column. Do note that you will need to press Ctrl+ Shift+ Enter instead of just Enter for array formula. This can give some unexpected results if not done.

This approach can also be used to perform a multi-condition vlookup.

We can evaluate the choose expression by highlighting the entire expression and pressing F9. Observe that it evaluates to an array with 5 rows, where the concatenation of Fruit & Column is the first entry and Qty is the second entry for each row:

Some articles may suggest converting the table to a grid with Fruit as columns and Color as rows and the intersection as value and then use INDEX() and MATCH(). This is fine for two criteria, anything more than 2 will be tricky to implement. Also such conversion is likely to be done manually. INDEX() & MATCH() are not that reader-friendly and can be hard to debug after a while. In short, I don’t recommend it; multi-condition vlookup is more scalable.

XLOOKUP

If you or your firm subscribe to Microsoft 365, the cleanest way to do is to use XLOOKUP.

We can highlight $I$4:$I$8&$H$4:$H$8 and press F9to evaluate the expression and observe that it concatenates the values pairwise.

Handling duplicates

The other imperfection of vlookup is that it can only return one value, hence if the key column contains duplicates, the returned value will correspond to the first row that matches. This is potentially dangerous, if the second value is the intended value, as the error will be passed silently. Therefore, one should always check that the key column does not contain duplicates before doing a vlookup. In the example below, there are 2 rows with the value Appleunder the Fruitcolumn. Vlookup returns only the first value under the Qty column.

Doing a left merge would return the bottom table with reference to the image below. And the additional row produced would provide an indication that there is a duplicate in the Fruit and Qty table.

Conclusion

  1. Check for duplicate values in the key column
  2. Think about whether using helper columns is more scalable, otherwise use multi-condition vlookup or xlookup
  3. vlookup is not equivalent to merge

Afterthoughts

The intent of the article is to bring about awareness to the limitations of vlookup and share innovative ways – which are more scalable, readable and hence maintainable – of using vlookup.

It is only after I’ve spent some time scripting in Python that I learnt "readability counts" and this made me more conscious of the way I write formula in Excel. There are people (well, I was one of them) who take a lot of pride in writing long nested-if-else statements in Excel. (Gosh! They are so hard/painful/mind-bending to amend, especially if you haven’t look at them in a while.) Hopefully, this will inspire the reader to think how best to prepare spreadsheets such that they can be easily read, maintained and debugged.

If you’ve enjoyed this story do check out the following as well:

Python tips for someone transiting from Excel

How to encrypt PDF and send as an email attachment using Python


Related Articles