About PoliCorp

Pollux Political Corpora (PoliCorp) is an open resource for accessing and analysing processed political text data and an integral part of the Pollux project. This demonstrator gives researchers access to extensive textual datasets (currently the official protocols of plenary debates published by the German Bundestag, drawn in part from GermaParl), enabling in-depth analysis of parliamentary discourse over time. With PoliCorp, researchers can generate sub-corpora tailored to their own research.

Currently, the platform hosts a collection of the official protocols of plenary debates published by the German Bundestag, spanning 76 years of parliamentary discourse starting from September 7, 1949. Raw parliamentary speeches up to September 7, 2021, were sourced from the GermaParl corpus, a comprehensive linguistic dataset curated by the PolMine project. GermaParl covers transcripts of parliamentary debates from September 7, 1949, to September 7, 2021, and comprises 958,100 speech contributions. Raw parliamentary speeches published after September 7, 2021, were sourced from the Bundestag Open Data project. New speeches are added to the platform monthly.

News

The PoliCorp demonstrator was launched on December 10, 2024.

Data from the 20th legislative period were added to PoliCorp on March 24, 2025.

Search

With the Advanced Search functionality, researchers can apply Boolean operators such as AND, OR, and NOT to combine or exclude search criteria, making it easier to filter large volumes of parliamentary debate data. Searches can be customised by combining multiple fields with logical operators to uncover patterns and insights in the data. Visit the project's GitHub page to learn more about the search functionality. If you need help using the demo, click here to watch the tutorial video on YouTube.
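
The Boolean logic behind such queries can be illustrated with a small Python sketch. It is not PoliCorp's implementation or query syntax: the record fields (`speaker`, `party`, `text`), the helper names `contains`, `and_`, `or_`, `not_` and `search`, and the three toy records are all hypothetical, chosen only to show how AND, OR, and NOT compose into a single filter.

```python
from typing import Callable, Dict, List

# Hypothetical record layout; these field names are illustrative, not PoliCorp's schema.
Speech = Dict[str, str]
Predicate = Callable[[Speech], bool]


def contains(field: str, term: str) -> Predicate:
    """Match records whose `field` contains `term`, ignoring case."""
    needle = term.lower()
    return lambda speech: needle in speech.get(field, "").lower()


def and_(*preds: Predicate) -> Predicate:
    """AND: every sub-condition must hold."""
    return lambda speech: all(p(speech) for p in preds)


def or_(*preds: Predicate) -> Predicate:
    """OR: at least one sub-condition must hold."""
    return lambda speech: any(p(speech) for p in preds)


def not_(pred: Predicate) -> Predicate:
    """NOT: exclude records matching the sub-condition."""
    return lambda speech: not pred(speech)


def search(speeches: List[Speech], query: Predicate) -> List[Speech]:
    """Return the records that satisfy the composed query."""
    return [s for s in speeches if query(s)]


# Invented toy records, for illustration only.
speeches = [
    {"speaker": "A", "party": "SPD", "text": "Wir brauchen Klimaschutz und Energiewende."},
    {"speaker": "B", "party": "CDU/CSU", "text": "Die Energiewende muss bezahlbar bleiben."},
    {"speaker": "C", "party": "FDP", "text": "Klimaschutz durch Innovation."},
]

# (text contains "Klimaschutz" OR "Energiewende") AND NOT (party contains "CDU")
query = and_(
    or_(contains("text", "Klimaschutz"), contains("text", "Energiewende")),
    not_(contains("party", "CDU")),
)
for s in search(speeches, query):
    print(s["speaker"], s["party"])
```

Because each operator returns a new predicate, arbitrarily nested queries can be built from the same three building blocks, which mirrors conceptually how multiple search fields are combined with logical operators.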

Download

Selected datasets can be downloaded freely in JSON format, a convenient starting point for further analysis with computational tools. Visit the project's GitHub page to learn more about the downloaded data. If you need help using the demo, click here to watch the tutorial video on YouTube.
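
As a rough sketch of such further analysis, the Python snippet below parses a downloaded file and builds a sub-corpus by date. It assumes the file is a JSON array of speech objects with `date` (ISO `YYYY-MM-DD`) and `party` fields; these field names and the helpers `load_speeches` and `sub_corpus` are hypothetical, so check the project's GitHub page for the actual structure of the downloaded data.

```python
import json
from collections import Counter
from io import StringIO
from typing import IO, Dict, List

# Hypothetical record layout: a JSON array of objects with "date" and "party" keys.
Speech = Dict[str, str]


def load_speeches(fp: IO[str]) -> List[Speech]:
    """Parse a downloaded file, assumed to be a JSON array of speech records."""
    data = json.load(fp)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of speech records")
    return data


def sub_corpus(speeches: List[Speech], start: str, end: str) -> List[Speech]:
    """Keep speeches whose ISO date (YYYY-MM-DD) lies in [start, end], inclusive."""
    return [s for s in speeches if start <= s["date"] <= end]


# Toy in-memory stand-in for a downloaded file (invented records, not real data).
sample = StringIO(json.dumps([
    {"date": "1949-09-07", "party": "SPD", "text": "..."},
    {"date": "2021-10-26", "party": "SPD", "text": "..."},
    {"date": "2022-01-12", "party": "FDP", "text": "..."},
]))

speeches = load_speeches(sample)
# Speeches published after the GermaParl cut-off of September 7, 2021.
recent = sub_corpus(speeches, "2021-09-08", "2099-12-31")
print(Counter(s["party"] for s in recent))
```

Comparing ISO date strings directly works here because the `YYYY-MM-DD` format sorts chronologically. To analyse a real download, pass `open(path, encoding="utf-8")` in place of the in-memory sample.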

Public Demo

Try out the project's demo to explore its functionality. The demo URL may change or be removed after a period of time.

Experimental Features

Our website uses experimental tools for data processing. Currently, you can view the output of two Named Entity Recognition (NER) models: German NER and NER for German Legal Text. While these models are designed to provide useful automatic annotations, they are not flawless and may produce inaccurate or incomplete results. We recommend exercising caution when relying on these outputs and verifying any critical information independently.

GitHub

Learn more about PoliCorp by visiting the project's GitHub repository.

Developers

Nina Smirnova: ORCID, Hugging Face, GitHub

Muhammad Ahsan Shahid: ORCID, LinkedIn, GitHub

Philipp Mayr (Team Lead): ORCID

Cite PoliCorp

Smirnova, N., Shahid, M. A., & Mayr, P. (2025). Political Corpora (PoliCorp): An open resource for accessing and analysing processed political text data. https://demo-pollux.gesis.org/

Dissemination & Literature

  • Smirnova, Nina, Philipp Mayr, and Tobias Holtdirk. 2024. "Calls to order and interjections in Germaparl: a preliminary analysis of political agendas in the German Bundestag." 29. DVPW-Kongress, Georg-August-Universität Göttingen, 2024-09-27.
  • Smirnova, Nina. 2024. "Filtering metadata with AI methods - the use of BASE in the FID Political Science." 112th BiblioCon2024, 2024-06-06.
  • Smirnova, Nina. 2024. "Automatically detecting scientific political science texts from a large general document index." arXiv. https://arxiv.org/abs/2406.03067

Project Partners

  • GESIS
  • Pollux FID
  • SuUB Bremen
  • Qualiservice

Funding

This work was funded by the German Research Foundation (DFG) via grant MA 3964/7-3.

GESIS: Imprint | Data protection

Project's website: Pollux

For inquiries, contact us at nina.smirnova@gesis.org or ahsan.shahid@gesis.org.

PoliCorp is a service by Pollux.

The Pollux team received funding from the German Research Foundation (DFG) via grant: MA 3964/7‑3.
Disclaimers:

Raw parliamentary speeches up to September 7, 2021, presented on this website, were sourced from the following publication: Blaette, Andreas (2017): GermaParl. Corpus of Plenary Protocols of the German Bundestag. The data are available as TEI files at the GermaParlTEI GitHub repository. Raw parliamentary speeches published after September 7, 2021, were sourced from the Bundestag Open Data project. The data provided herein have been utilized in accordance with the terms of use specified by the original source.

Our website utilizes experimental tools for data processing, including but not limited to named entity recognition (NER) models. While these models are designed to provide insightful automatic annotations, they are not flawless and may produce inaccurate or incomplete results.

We recommend exercising caution when relying on these outputs and verifying any critical information independently. If you encounter any issues or have feedback about the annotations, please feel free to contact us.