PDF Liberation Hackathon Aims to Make Public Data More Accessible
In January 2014, the PDF Liberation Hackathon brought participants together to develop open-source tools for working with PDFs and the data they contain. Organised by the Open Knowledge Foundation (now Open Knowledge Denmark and Open Knowledge Foundation), the event aimed to make PDF documents of public interest, such as parliamentary documents and reports, more accessible.
PDFs, introduced in 1993, are widely used across organisations because they display consistently across devices and operating systems. However, data scientists often struggle to extract structured data from them, particularly from older PDFs that are merely scanned images with no underlying text. During the hackathon, participants tackled this problem with optical character recognition (OCR), software for extracting data tables, and scripts for bulk downloads.
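Much of the table work comes down to turning text laid out in columns back into rows and cells. The sketch below is a minimal, hypothetical illustration, not a tool produced at the hackathon. It assumes the page text has already been extracted with its layout preserved (for example with Poppler's `pdftotext -layout`) and that columns are separated by two or more spaces. The function names and the sample table are our own, with made-up values.

```python
import csv
import io
import re


def split_layout_table(text: str, min_gap: int = 2) -> list[list[str]]:
    """Split layout-preserving text into rows of cells.

    Assumption: each table row sits on one line, and columns are separated
    by runs of at least `min_gap` spaces. Single spaces stay inside a cell,
    so multi-word values such as "Annual report" survive intact.
    """
    gap = re.compile(r" {%d,}" % min_gap)
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines between rows
        rows.append(gap.split(stripped))
    return rows


def rows_to_csv(rows: list[list[str]]) -> str:
    """Serialise rows as CSV text so they can be loaded into other tools."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Hypothetical text, as a layout-preserving extractor might emit it.
SAMPLE = """\
Title            Year   Pages
Annual report    2009      48
Field survey     2011     112
"""

if __name__ == "__main__":
    print(rows_to_csv(split_layout_table(SAMPLE)))
```

Whitespace splitting breaks down when a cell is empty or a long value runs into its neighbour, so dedicated table extractors such as Tabula typically work from the positions of characters and ruling lines on the page instead.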
One of the datasets participants worked on was USAID's Development Experience Clearinghouse, which contains around 170,000 documents, of which around 150,000 are available for download. While the hackathon did not itself produce any analysis of this data, the tools built there could later help local governments and non-profit organisations track trends and draw insights from information stored in PDF format.
The PDF Liberation Hackathon highlighted the need for better tools to work with PDFs, since many organisations still publish data in this format. By making the contents of PDFs easier to extract, such tools promise to support better data analysis by the organisations that depend on these documents.