Understanding Political Debate and Policy Decisions using 'Big Data'

Academic

Dr Gosia Mikolajczak
School of Social and Political Sciences

Intern

Philip Thierfelder
School of Languages and Linguistics

Project Description

The aim of the project is to empirically test a novel framework for analysing the relationship between political debates and policy decisions. Using digital sources and computational modelling approaches, it will investigate three specific issues to test this framework: public sector innovation, antimicrobial resistance, and same-sex marriage. The internship will use an automated scraping process to collect and pre-process documents from identified key digital sources. The pre-processing will focus on cleaning this data for analysis and may also include trialling digital tools for language analysis, such as named entity recognition.
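As a rough illustration of what the cleaning step might involve, the sketch below strips leftover HTML markup from a scraped passage and normalises its whitespace. It uses only the Python standard library. The function name clean_text and the specific cleaning rules are assumptions made for this example, not the project's actual pre-processing code.

```python
import html
import re
import unicodedata

# Hypothetical patterns for this sketch: any HTML tag, and any run of whitespace.
TAG_RE = re.compile(r"<[^>]+>")
WS_RE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Return a plain-text version of a scraped HTML fragment.

    Assumed cleaning rules (illustrative only):
    1. replace tags with a space so adjacent words do not run together,
    2. decode HTML entities such as &amp; and &nbsp;,
    3. normalise Unicode compatibility characters (NFKC),
    4. collapse whitespace runs into single spaces.
    """
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    text = unicodedata.normalize("NFKC", text)
    return WS_RE.sub(" ", text).strip()


if __name__ == "__main__":
    sample = "<p>Marriage&nbsp;equality   <b>bill</b></p>"
    print(clean_text(sample))
```

A regular expression is adequate for removing simple residual tags like these; heavily nested or malformed markup would be better handled by a proper HTML parser.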

Project Outcome

The Parliament of Australia website stores extensive records of parliamentary proceedings, making it a valuable resource for researchers interested in Australian government policy. For quantitative analysis of policy-related discourse, the sheer volume of data available makes manual collection highly inefficient. The goal of this internship project was to automate that collection by coding a web crawler that gathers speech transcripts and related information and compiles them into an analyzable dataset. The Scrapy library in Python was the key tool used for this project, and the final output was a spreadsheet containing data on 321 speeches from the Australian Senate and House of Representatives on the issue of marriage equality. The Python script produced for this project can be easily customized to collect data related to other bills for future research projects.
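The final step, compiling collected speeches into a spreadsheet, can be sketched roughly as follows. This is a minimal standard-library example, not the project's Scrapy script: the SpeechRecord fields (speaker, chamber, date, text) and the write_speeches helper are hypothetical names chosen for illustration, and the sample values are placeholders rather than real data.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable, TextIO


@dataclass
class SpeechRecord:
    """One row of the dataset. The column choice is an assumption for this sketch."""
    speaker: str
    chamber: str
    date: str
    text: str


def write_speeches(records: Iterable[SpeechRecord], handle: TextIO) -> int:
    """Write records as CSV to an open text handle and return the row count."""
    fieldnames = [f.name for f in fields(SpeechRecord)]
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(asdict(record))
        count += 1
    return count


if __name__ == "__main__":
    # Placeholder values only; a crawler would supply the real records.
    demo = [SpeechRecord("Example Speaker", "Senate", "2000-01-01", "Example text.")]
    buffer = io.StringIO()
    print(write_speeches(demo, buffer))
    print(buffer.getvalue())
```

Keeping the column list in one dataclass means that adapting the output for a different bill would mostly be a matter of changing what the crawler collects, while the export step stays the same.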