Purpose of this website

The project Observatory for Political Texts in European Democracies (OPTED; Horizon 2020 Grant agreement 951832) aims to design a European Research Infrastructure facilitating the large-scale computational analysis of political texts in Europe.

Work Package 5 focuses on national and supranational parliaments. We cover textual data on political speeches and debates as well as legislative texts produced in and by these key institutions of European democracy.

Among our key aims are easier access to existing text data collections and identifying gaps in extant data availability. We have therefore first assembled an inventory of available text data sources covering parliamentary activity. We identified the set of currently available sources (covering both primary archives and secondary data collections) by reviewing the relevant academic literature, by scoping extant linguistic infrastructures (such as CLARIN), and by surveying the computational social science community via social media.

This website guides prospective users and analysts through the inventory. First, it provides a bird’s-eye view of the coverage of existing primary archives and secondary data collections, which shows where additional investment in text data collection is most needed. Second, it provides interactive tables through which users can filter the inventory and jump to already available sources that match their specific research needs. If you use any of these sources, please cite them appropriately.

While we show all available sources we could uncover, we pay particular attention to what we call ‘ready-to-use’ data collections. We label existing data collections as ready-to-use if the respective source provides:

Users interested in more detail than is provided here may also consult the full inventory, the accompanying codebook, or the technical reports specifying the major primary and secondary sources per country or supranational institution.

If, in your view, a relevant data source or data collection is missing, please get in touch with the contributors listed below.

A bird’s-eye view of available parliamentary text data

Types of parliamentary texts

With regard to speeches by and debates among MPs (and members of the government), the most frequently available type of text data is full-text transcripts of plenary speeches. More specific debate types (such as parliamentary questions) are often nested within plenary debates (depending on how the respective parliament and/or its archive is organised) and must be extracted separately. Dedicated data sources and/or collections for such speech types have so far rarely been available.

With regard to legislative texts, mainly the laws as finally adopted are available (and many primary sources are limited to laws currently in force). Information on the intermediate stages of legislative decision-making, in particular bills and amendments, has so far been much less frequently available.

Geographical coverage

Comparing the left and right panels of the plot, one immediately sees that many primary sources of parliamentary text data have not been turned into ‘ready-to-use’ data sets that would be easily amenable to automated text analysis. Especially where comparatively well-structured primary sources exist, additional data collection efforts appear to be low-hanging fruit.

We also see that ‘ready-to-use’ data are particularly scarce for legislative texts. Given that negotiating and fixing collectively binding rules is a key purpose of parliaments in democratic states, we have so far hardly been able to exploit the power of text-as-data approaches for these processes.

We also see striking geographical imbalances: some countries, such as Germany and France, are already covered by various sources, whereas systematic data on Central and Eastern European countries are much scarcer.

In addition, ParlSpeech is the only ‘ready-to-use’ text data collection that combines multiple countries.

Temporal coverage of ‘ready-to-use’ sources

The temporal perspective reinforces the imbalances already seen above. Countries from Western (and partially Northern) Europe in particular are overrepresented in terms of data availability. For many Eastern European states, with the exception of Hungary, text data on parliamentary debates are mostly limited to the last decade. Legislative text data are lacking for the majority of European democracies; the supranational institutions are a positive exception.

Find your data source: Parliamentary speeches

Find your data source: Legislative texts

Contributors - the WP5 team

Institute for Political Science, Centre for Social Sciences, Budapest

University of Cologne

WZB Berlin Social Science Center