
I learned something new recently – that it is possible to exclude hidden rows and/or columns when reading Excel files as Pandas DataFrames, all thanks to the openpyxl
package, and I’d like to share it with you.
It is not uncommon to have rows or columns of an Excel file hidden. Less-relevant columns may be hidden to keep a spreadsheet neater; or rows may be hidden to deliberately ignore duplicate data.
Whatever the reasons are, it may be of value to keep them hidden when loading a file as a Pandas DataFrame, as it may affect subsequent data wrangling and exploration.
Unfortunately, the traditional way of reading Excel files as Pandas DataFrames – using the pandas.read_excel()
method – does not facilitate that.
In this post, we will explore how we can identify hidden rows and columns of an Excel file using the Openpyxl
package and hence exclude them when loading it as a Pandas DataFrame.
The dataset
We will be using a small subset of the titanic training dataset publicly available on Kaggle. For simplicity, I have limited the number of rows of data to 20. I have also deliberately hidden Columns F ("Age"), I ("Ticket") and K ("Cabin"), as well as Rows 6, 11, 16 and 21.

Why pandas.read_excel()
does not work?
Typically, we use Pandas’ read_excel()
method to load a dataset in Excel as a Pandas DataFrame. Let’s see what happens when we do that for our titanic dataset containing hidden rows and columns.
>>> import pandas as pd
>>> df = pd.read_excel("data/titanic.xlsx")
>>> df

By default, the read_excel()
method reads in data from all rows and columns from the specified Excel file. In other words, it does not exclude the hidden rows and columns.
Installation of openpyxl
To load as Pandas DataFrames without hidden rows and columns, we can use the openpyxl package, a Python library to "read/write Excel 2010 xlsx/xlsm/xltx/xltm files".
Since openpyxl is not a standard Python built-in library, you will first need to install it. Open a command line window and type the following command:
>>> pip install openpyxl
Loading Excel file and worksheet using openpyxl
To open an existing Excel file using the openpyxl
package, we use the openpyxl.load_workbook()
method, specifying the name of the path where the Excel file is stored.
>>> import openpyxl
# Open an Excel workbook
>>> workbook = openpyxl.load_workbook("data/titanic.xlsx")
This creates a Workbook
object which, according to the documentation, is the "top-level container for all document information". This object contains many attributes pertaining to the input file, including the .sheetnames
attribute which returns the list of the names of all worksheets in the workbook.
# Create a list of names of all worksheets in `workbook`
>>> sheet_names = workbook.sheetnames
# Create a `Worksheet` object
>>> worksheet = workbook[sheet_names[0]]
In our titanic.xlsx
file, we only have one worksheet named "train", so we get the sheet name by taking the first element of the sheet_names
list. Next, we create a Worksheet
object from the Workbook
object.
Finding indices of hidden rows
Similarly, the Worksheet
object contains attributes pertaining to the specified worksheet. To find indices of all hidden rows, we make use of the .row_dimensions
attribute of the Worksheet
object, like this:
# List of indices corresponding to all hidden rows
>>> hidden_rows_idx = [
row - 2
for row, dimension in worksheet.row_dimensions.items()
if dimension.hidden
]
>>> print(hidden_rows_idx)
[4, 9, 14, 19]
Notice that we need to take row - 2
instead of just row
because we want to find indices corresponding to the Pandas DataFrame, not the Excel file.
Finding names of hidden columns
To find names of all hidden columns, we first use the .column_dimension
attribute of the Worksheet
object:
# List of indices corresponding to all hidden columns
>>> hidden_cols = [
col
for col, dimension in worksheet.column_dimensions.items()
if dimension.hidden
]
>>> print(hidden_cols)
['F', 'I', 'K']
This generates a list comprising uppercase alphabets that correspond to the hidden columns of an Excel worksheet. Thus, we need to somehow convert the hidden_cols
list into a list of names of the hidden columns. To do so, we use Python’s built-in library, string
, and its .ascii_uppercase
attribute:
# List of indices corresponding to all hidden columns
>>> hidden_cols_idx = [
string.ascii_uppercase.index(col_name)
for col_name in hidden_cols
]
# Find names of columns corresponding to hidden column indices
>>> hidden_cols_name = df.columns[hidden_cols_idx].tolist()
>>> print(hidden_cols_name)
['Age', 'Ticket', 'Cabin']
Disclaimer: Do note that using string.ascii_uppercase
assumes that there are at most 26 columns. If there are more than 26 columns, the code will need to be modified.
Finally, once we have the indices for the hidden rows and names for the hidden columns, the rest is simple. To exclude those hidden rows and columns, we simply use Pandas’ .drop()
method.
# Drop the hidden columns
>>> df.drop(hidden_cols_name, axis=1, inplace=True)
# Drop the hidden rows
>>> df.drop(hidden_rows_idx, axis=0, inplace=True)
# Reset the index
>>> df.reset_index(drop=True, inplace=True)
>>> df

Putting it all together
Here’s a code snippet that puts together the above codes with some simple refactoring:
Wrapping it up
So, there you have it – a Python code that allows you to read an Excel file containing hidden rows and/or columns as is, as Pandas DataFrames.
This particular use-case only scratches the surface of what the openpyxl
package offers. For more information about the openpyxl
package, check out its documentation here.
The codes shown in this post can also be found as a notebook at this GitHub repo.
Hello! I’m Zeya. Thanks for reading this post. If you’ve found it useful, do let me know in the comments. I welcome _ discussions, questions and constructive feedback too. Feel free to follow me on Medium or reach out to me via LinkedIn or [Twitter](https://twitter.com/zeyalt). Have a great day!