Sheldon Barry: Interacting with Excel using Python

Excel is one of the most common productivity tools used in any business or personal computing environment. It provides an easy to use interface that allows quick and easy analysis of data.

It's no wonder then, that there is a genuine use case for data generated in Python to be converted into some kind of format, for example CSV, that can be imported into Excel for further analysis.

Of course you could work with the data directly in Python, using a tool such as Pandas, but often this can overcomplicate the data analysis goals that can easily be achieved using Excel.

When streamlining or automating these workflows, it can be useful to be able to interact directly with an Excel workbook from within Python. Conversely, extracting data from Excel to be used in calculations or data analysis in Python, is also possible.

I have often used Python to create automation of workflows, such as exporting Python data to Excel, reading data from multiple Excel data sources in order to consolidate information, or reformatting unstructured Excel data sources to a more usable layout to perform further analysis.

With Python's easy to use syntax, many Excel tasks that would normally require complicated macros and Visual Basic for Applications (VBA) code, can be achieved with much less effort using an appropriate Python script.

There are a few Python packages available that can interact with Excel, but my favourite is the openpyxl library.

Before we can get started with some examples, make sure that Python is installed and then install openpyxl with the following command:

pip install openpyxl

Create a new file (for example, openpyxl_example.py), and import the openpyxl module on the first line:

import openpyxl

First, we create a new workbook instance and select the active sheet:

# Create a new workbook instance and select the active sheet
wb = openpyxl.Workbook()
sheet = wb.active

For our test data, we will create a Python dictionary of superheroes and their respective real names:

# dict of superhero data
superheroes = {
 'Captain America':'Steve Rogers', 
 'Iron Man':'Tony Stark', 
 'Spiderman':'Peter Parker', 
 'Hulk':'Bruce Banner', 
 'Superman':'Clark Kent', 
 'Batman':'Bruce Wayne', 
 'Wonder Woman':'Princess Diana'}

The data will be written in tabulated format to the active Excel workbook object contained in the variable sheet, but first we write the table headings to individual cells, using the cell reference, and set the font style to bold:

# Write 'Superhero' heading to cell A1 and set font style to bold
sheet['A1'].value = 'Superhero'
sheet['A1'].font = openpyxl.styles.Font(bold=True)

# Write 'Real Name' heading to cell A1 and set font style to bold
sheet['B1'].value = 'Real Name'
sheet['B1'].font = openpyxl.styles.Font(bold=True)

Next, we use a loop to iterate through the dictionary (starting at row 2) and write each key and value pair to the respective fields in a row in the workbook, ordered alphabetically by superhero name:

# Starting at row 2, write data to sheet (alphabetically by key)
row = 2
for key in sorted(superheroes.keys()):
 sheet.cell(column=1, row=row, value=key)
 sheet.cell(column=2, row=row, value=superheroes[key])
 row += 1

Notice how we now use the column and row numbering based on the absolute cell reference location (so 'A2' is in row 2, column 1) to identify the cells that are to be populated.

Finally, we save the workbook data to a file:

# Save the file
myFile = 'heroes.xlsx'
wb.save(filename = myFile)

If you open the newly created file 'heroes.xlsx', the workbook should contain the data and headers that we have written to Excel.

Data can also be read back from an existing workbook.

To open the workbook and create new workbook and sheet objects to read in the data, we use the following code:

# open active sheet in the workbook and create new workbook and sheet objects
wb1 = openpyxl.load_workbook(myFile)
sheet1 = wb1.active

To test the concept, we read back the third row of the workbook:

# Read back data from the worksheet by cell reference
print ('{}\t\t{}'.format(sheet1['A3'].value, sheet1['B3'].value))
print

The output should be as follows:

Captain America  Steve Rogers

We can also iterate through all the populated rows in the workbook and read in the data. First we create an iterator over sheet1 and place it in the variable sheet_rows. We then use a loop to read and print the data for each row, taking care to create fixed length spacing for columns to make the output easier to read:

# Read back all the rows in the worksheet
sheet_rows = sheet1.iter_rows()

# (warning - iter_rows starts counting at 0)
for row in sheet_rows:
 print('{}{}{}'.format(row[0].value, ' '*(20-len(row[0].value)), row[1].value))

The output of the above should be as follows:

Superhero           Real Name
Batman              Bruce Wayne
Captain America     Steve Rogers
Hulk                Bruce Banner
Iron Man            Tony Stark
Spiderman           Peter Parker
Superman            Clark Kent
Wonder Woman        Princess Diana

The above code only covers a few straightforward examples, which barely touch the surface of what can be achieved with openpyxl, but it is clearly a very powerful package that can be used in creative ways to interact with Excel.

The completed example code can be downloaded from my Github page.