
Frequently asked questions (FAQs)

Pollux Political Corpora (PoliCorp) is an open resource for accessing and analysing processed political text data. PoliCorp is an integral part of the Pollux project. This demonstrator offers researchers access to extensive textual datasets (currently the official protocols of plenary debates published by the German Bundestag, GermaParl), facilitating in-depth analysis of parliamentary discourse across time. Based on Pollux Political Corpora, researchers can easily generate sub-corpora for their own research.

Currently, the platform hosts a collection of the official protocols of plenary debates published by the German Bundestag, spanning 76 years of parliamentary discourse, starting September 7, 1949. Raw parliamentary speeches up to September 7, 2021, were sourced from the GermaParl corpus, a comprehensive linguistic dataset curated by the PolMine project. GermaParl covers transcripts of parliamentary debates from September 7, 1949, to September 7, 2021, and comprises 958,100 speech contributions. Raw parliamentary speeches published after September 7, 2021, were sourced from the Bundestag Open Data project. New speeches are added to the platform monthly.


With the Advanced Search functionality, researchers can apply boolean operations such as AND, OR, and NOT to combine or exclude search criteria, making it easier to filter through vast amounts of parliamentary debate data. The search can be customised by combining multiple fields and applying logical operators to uncover intricate patterns and insights. Visit the project's GitHub page to learn more about the search functionality. If you need help using the demo, click here to watch the tutorial video on YouTube.
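As a rough illustration of how AND and NOT combine search criteria, the following Python sketch applies the same logic to a few in-memory records. It does not use the PoliCorp search interface; the `Speech` class, its field names, and the role values are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Speech:
    # Hypothetical fields; the real PoliCorp attributes may differ.
    legislative_period: int
    speaker_role: str
    text: str


def matches(speech: Speech) -> bool:
    """period == 19 AND text contains the keyword AND NOT government speaker."""
    in_period = speech.legislative_period == 19
    has_keyword = "steuerpolitik" in speech.text.lower()
    is_government = speech.speaker_role == "government"
    return in_period and has_keyword and not is_government


# Made-up toy records, not real parliamentary data.
speeches = [
    Speech(19, "mp", "Die Steuerpolitik muss gerechter werden."),
    Speech(19, "government", "Unsere Steuerpolitik ist solide."),
    Speech(18, "mp", "Steuerpolitik war auch damals ein Thema."),
    Speech(19, "mp", "Wir sprechen heute über Bildung."),
]

hits = [s for s in speeches if matches(s)]
print(len(hits))
```

Each criterion is evaluated separately and then combined, which mirrors how individual search fields are joined by logical operators.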

Selected datasets can be downloaded freely in JSON format, providing a convenient option for further analysis using computational tools. Visit the project's GitHub page to learn more about the downloaded data. If you need help using the demo, click here to watch the tutorial video on YouTube.
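A minimal sketch of how a downloaded JSON file might be processed in Python is shown here. It assumes the export is a list of objects, one per speech; the keys `speaker`, `party`, and `text`, as well as the file name in the comment, are hypothetical, so check the attribute descriptions on GitHub for the actual structure.

```python
import json
from collections import Counter

# Inline stand-in for a downloaded file. With a real export you would
# instead read it from disk, for example (hypothetical file name):
#     with open("policorp_export.json", encoding="utf-8") as f:
#         records = json.load(f)
raw = """
[
  {"speaker": "Speaker 1", "party": "Party A", "text": "Erste Rede."},
  {"speaker": "Speaker 2", "party": "Party B", "text": "Zweite Rede."},
  {"speaker": "Speaker 3", "party": "Party A", "text": "Dritte Rede."}
]
"""

records = json.loads(raw)

# Count speeches per party; records without the key are grouped as "unknown".
per_party = Counter(record.get("party", "unknown") for record in records)

for party, count in per_party.most_common():
    print(party, count)
```

Using `record.get(...)` with a default keeps the loop from failing if some records lack the assumed key.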

Click here to watch the tutorial video on YouTube.

Click here to go to the project's GitHub page and learn more about the search functionality.

Click here to view details of the downloaded data and descriptions of its attributes on GitHub.

Our website utilizes experimental tools for data processing. Currently, you can see the output of two Named Entity Recognition (NER) models: German NER and NER for German Legal Text. While these models are designed to provide insightful automatic annotations, they are not flawless and may produce inaccurate or incomplete results. We recommend exercising caution when relying on these outputs and verifying any critical information independently.

Find all speeches from legislative period 19 that contain the keyword "Steuerpolitik", excluding speeches given by members of the government.

Run the search:

Search example 1 image

Once you have the results, you can sort them, explore detailed information about a specific speech, navigate between result pages, and choose to download either all results or only your selected ones:

Search results 1 image

Choose from various text representations on the specific result details page, where you can also download the corresponding result data:

Search results details 1 image

Smirnova, N., Shahid, M. A., & Mayr, P. (2025). Political Corpora (PoliCorp): An open resource for accessing and analysing processed political text data. https://demo-pollux.gesis.org/


Project's website: Pollux

For inquiries, contact us at nina.smirnova@gesis.org | ahsan.shahid@gesis.org

PoliCorp is a service by: Pollux


From: GESIS, SuUB, Qualiservice. Funded by: DFG.
The Pollux team received funding from the German Research Foundation (DFG) via grant: MA 3964/7‑3.
Disclaimers:

Raw parliamentary speeches up to September 7, 2021, presented on this website, were sourced from the following publication: Blaette, Andreas (2017): GermaParl. Corpus of Plenary Protocols of the German Bundestag. The data are available as TEI files at the GermaParlTEI GitHub repository. Raw parliamentary speeches published after September 7, 2021, were sourced from the Bundestag Open Data project. The data provided herein have been utilized in accordance with the terms of use specified by the original source.

Our website utilizes experimental tools for data processing, including but not limited to named entity recognition (NER) models. While these models are designed to provide insightful automatic annotations, they are not flawless and may produce inaccurate or incomplete results.

We recommend exercising caution when relying on these outputs and verifying any critical information independently. If you encounter any issues or have feedback about the annotations, please feel free to contact us.