PDF Liberation Hackathon Aims to Make Public Data More Accessible

Join the fight to liberate data from PDFs. Your tools could help unlock insights for local governments and non-profits.

In January 2014, the PDF Liberation Hackathon brought participants together to develop open-source tools for extracting data from PDFs. Organised by the Open Knowledge Foundation (now Open Knowledge Denmark and Open Knowledge Foundation), the event aimed to make PDF documents more accessible, especially those of public interest such as parliamentary documents and reports.

PDFs, introduced in 1993, are widely used by organisations because they display consistently across devices and operating systems. However, data scientists often struggle to extract structured data from them, particularly from older documents that are simply scanned images with no machine-readable text. During the hackathon, participants tackled this problem with optical character recognition (OCR) tools, software for extracting data tables, and scripts for downloading documents in bulk.

One dataset participants worked on was USAID's Development Experience Clearinghouse, which contains around 170,000 documents, roughly 150,000 of which are available for download. Although the hackathon did not produce any analysis of this data, the tools it produced could later help local governments and non-profit organisations track trends and draw insights from information stored in PDF format.

The PDF Liberation Hackathon highlighted the need for better tools for working with PDFs, since many organisations still publish data in this format. By making that data easier to extract, such tools could enable more thorough analysis for the organisations that depend on it.