Read Excel using Apache POI library

Overview

This blog post talks comprehensively about reading/writing excel contents from various excel formats using Apache POI Java library. I am going to cover 4 excel formats, XLS, XLSB, XSLM, and XLSX.

Why Apache POI Library?

Apache POI Java library helps any developer looking for programmatic solutions to deal with processing Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2). In a nutshell, it's your goto API library to process XLS, PPT, DOC, Outlook(limited support), and also text extraction based solutions.

What kind of problems are addressed?

I am going to share reusable code snippets to perform the extraction of different types of content(text, numbers, hyperlinks, and most importantly pictures).

Let's get started!

POI libraries can be made available to your project in many different ways. I am going to build a maven based java project which manages all required libraries via dependency management. Read more about maven here.

Apache POI maven dependencies for Excel processing

Apache POI - Java API To Access Microsoft Format Files

HSSF is the POI Project's pure Java implementation of the Excel '97(-2007) file format. XSSF is the POI Project's pure Java implementation of the Excel 2007 OOXML (.xlsx) file format. Reading excel document to process records can be done in 2 ways.

WorkbookFactory API

     If you like to read excel documents in different formats agnostic way, use WorkbookFactory API. It helps to simplify reading excel document first and then figure out what kind of excel format it is. WorkbookFactory API can accept excel content in many ways via static create API. Please refer to the below snapshot on how to provide the excel content in different ways.









Code snippet on WorkbookFactory Usage


I have created a class called GenericExcelReader which can take InputStream.

Format-specific APIs

If you want to specifically read XLS format, use HSSFWorkbook class as shown in the below code snippet.


For XLSX format, use XSSFWorkbook class as shown in below code snippet.

How to read xlsb and xlsm formats?

xlsb format

POI does not yet support opening .xlsb files. You can read files with XSSFBReader in org.apache.poi.xssf.eventusermodel packages. It is event-based processing and only allowed in read-only mode to process. APIs currently it has is no way near to rich support available for .xls or .xlsx formats.

Yet, I found that converting .xslb to .xlsx format via Microsoft excel application turned out to be really useful as a workaround. The overall size of xslb is relatively less compared to xslx format as its single compressed binary file.
Read more about how to convert here- quora article

xlsm format 

xlsm is a variation of xlsx format. Use XSSFWorkbook API to read the same.




Comments