Import Data from a pdf document

Who has to import data from a pdf document?

Many consulting jobs involve taking data from other systems or webpages as text format downloads in  csv or txt files easily ‘imported’ by Excel by just opening the file and then saving as an Excel file format.  This kind of data import is usually predictable and repeatable and so you can build an application or reliable process around it and even automate it with VBA or the awesome new built-in Power Query functionality in current Excel versions.  Even using Excel’s native and certainly not new Text to Columns functionality (been in Excel since I can remember) under Data and Data Tools on the 2016 Ribbon was a fairly painless manual process and could also be automated using VBA using the macro recorder and some tinkering.  

Actually maybe I should briefly mention Power Query in current Excel versions has been renamed Get & Transform and sits modestly under the Data tab in the ribbon.  That was a very straightforward and informative name change from the Microsoft Excel Team (kind of refreshing in this world of spin), but understates how awesome Power Query is and so probably why the original name has stuck with most people.

But sometimes systems only produce PDF reports.  That’s a sticky problem and you can try to use OCR software or specialist conversion software that does some of the job but it’s hard work, usually not repeatable and can’t be automated.

But Matt Allington’s fairly recent post Import Tabular Data from PDF using Power Query is really a clever innovative approach to this common problem centered around saving the document to Word and then saving as a web page and importing it Power Query and then manipulating and cleaning the data from there with Power Query.   A query within Power Query then allows download to your workbook in an Excel table of data.   Really good article as Matt’s always are.

Matt is a passionate instructor in Power Pivot, Power Query and Power BI and I attended his Power Pivot course a few years back which I can highly recommend.  He also kindly invited me to ‘test drive’ his really practical and reasonably priced book Learn to Write DAX: A practical guide to learning Power Pivot for Excel and Power BI”.  It get’s you up and running on Power Pivot and it’s DAX formulas in a very fast, hands on and practical way in a tutorial style format.