
Frequently asked questions (FAQs)

Pollux Political Corpora (PoliCorp) is an open resource for accessing and analysing processed political text data, and an integral part of the Pollux project. This demonstrator offers researchers access to extensive textual datasets (currently the official protocols of plenary debates published by the German Bundestag, GermaParl), facilitating in-depth analysis of parliamentary discourse across time. Based on Pollux Political Corpora, researchers can easily generate sub-corpora for their own research.

Currently, the platform hosts a collection of the official protocols of plenary debates published by the German Bundestag, spanning 76 years of parliamentary discourse, starting September 7, 1949. Raw parliamentary speeches up to September 7, 2021, were sourced from the GermaParl corpus, a comprehensive linguistic dataset curated by the PolMine project. GermaParl covers transcripts of parliamentary debates from September 7, 1949, to September 7, 2021, and comprises 958,100 speech contributions. Raw parliamentary speeches published after September 7, 2021, were sourced from the Bundestag Open Data project. New speeches are added to the platform monthly.

FAQs

With the Advanced Search functionality, researchers can apply boolean operations such as AND, OR, and NOT to combine or exclude search criteria, making it easier to filter through vast amounts of parliamentary debate data. The search can be customised by combining multiple fields and applying logical operators to uncover intricate patterns and insights. Visit the project's GitHub page to learn more about the search functionality. If you need help using the demo, click here to watch the tutorial video on YouTube.
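To illustrate how AND, OR, and NOT combine search criteria, here is a minimal Python sketch that applies the same logic to a handful of in-memory records. It is not PoliCorp's implementation: the `Speech` record, its field names (`legislative_period`, `speaker_role`, `text`), and the helper functions are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Speech:
    # Hypothetical record layout; the real PoliCorp attributes may differ.
    legislative_period: int
    speaker_role: str
    text: str


Predicate = Callable[[Speech], bool]


def AND(*preds: Predicate) -> Predicate:
    # True only if every criterion matches.
    return lambda s: all(p(s) for p in preds)


def OR(*preds: Predicate) -> Predicate:
    # True if at least one criterion matches.
    return lambda s: any(p(s) for p in preds)


def NOT(pred: Predicate) -> Predicate:
    # Inverts a criterion, i.e. excludes what it matches.
    return lambda s: not pred(s)


def period_is(number: int) -> Predicate:
    return lambda s: s.legislative_period == number


def text_contains(keyword: str) -> Predicate:
    # Case-insensitive substring match on the speech text.
    return lambda s: keyword.lower() in s.text.lower()


def role_is(role: str) -> Predicate:
    return lambda s: s.speaker_role == role


# Made-up sample data, not taken from the corpus.
speeches = [
    Speech(19, "mp", "Die Steuerpolitik muss gerechter werden."),
    Speech(19, "government", "Unsere Steuerpolitik ist solide."),
    Speech(18, "mp", "Steuerpolitik bleibt ein zentrales Thema."),
    Speech(19, "mp", "Wir sprechen heute zur Klimapolitik."),
]

query = AND(
    period_is(19),
    text_contains("Steuerpolitik"),
    NOT(role_is("government")),
)
matches = [s for s in speeches if query(s)]
```

Each criterion is a small function, so nested combinations such as `AND(a, OR(b, c))` follow the same pattern.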

Selected datasets can be downloaded freely in JSON format, providing a convenient option for further analysis using computational tools. Visit the project's GitHub page to learn more about the downloaded data. If you need help using the demo, click here to watch the tutorial video on YouTube.
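As a rough sketch of what further analysis of a download could look like, the Python snippet below parses a JSON string and counts speeches per legislative period. The structure of the sample (a top-level list of objects) and the attribute names `id`, `legislative_period`, and `text` are assumptions made for illustration; the actual attributes are described on the project's GitHub page.

```python
import json
from collections import Counter

# Stand-in for the contents of a downloaded file. The layout and the
# attribute names are assumed here, not taken from the PoliCorp export.
sample_download = """
[
  {"id": "s1", "legislative_period": 19, "text": "Erste Rede."},
  {"id": "s2", "legislative_period": 19, "text": "Zweite Rede."},
  {"id": "s3", "legislative_period": 20, "text": "Dritte Rede."}
]
"""


def load_speeches(raw: str) -> list:
    """Parse a JSON document and return its records as a list of dicts."""
    data = json.loads(raw)
    if not isinstance(data, list):
        # This sketch only handles a top-level list of records.
        raise ValueError("expected a JSON list of speech records")
    return data


speeches = load_speeches(sample_download)

# Count speeches per legislative period; records without the key are
# grouped under None.
per_period = Counter(s.get("legislative_period") for s in speeches)
```

To read an actual file instead of the sample string, pass the result of `open(path, encoding="utf-8").read()` to `load_speeches`, where `path` points to the downloaded file.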

Click here to watch the tutorial video on YouTube.

Click here to go to the project's GitHub page and learn more about the search functionality.

Click here to view the details of the downloaded data and the attribute descriptions on GitHub.

Our website uses experimental tools for data processing. Currently, you can see the output of two Named Entity Recognition (NER) models: German NER and NER for German Legal Text. While these models are designed to provide insightful automatic annotations, they are not flawless and may produce inaccurate or incomplete results. We recommend exercising caution when relying on these outputs and verifying any critical information independently.
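One simple way to decide which annotations to verify first is to compare the two models' outputs and look at the entities that only one of them found. The Python sketch below assumes each model's output is available as a list of `(entity_text, label)` pairs; this representation, the sample entities, and the labels are hypothetical and not the platform's actual output format.

```python
def disagreements(entities_a, entities_b):
    """Return entity texts found by exactly one of the two models.

    Labels are ignored because the two models may use different
    label sets; only the entity text is compared.
    """
    texts_a = {text for text, _label in entities_a}
    texts_b = {text for text, _label in entities_b}
    # Symmetric difference: in one set but not in both.
    return sorted(texts_a ^ texts_b)


# Made-up annotations for a single speech.
german_ner = [("Angela Merkel", "PER"), ("Berlin", "LOC")]
legal_ner = [("Angela Merkel", "PER"), ("Grundgesetz", "GS")]

to_review = disagreements(german_ner, legal_ner)
```

Entities that both models agree on can still be wrong, so this only prioritises manual checks and does not replace them.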

Find all speeches from legislative period 19 that contain the keyword "Steuerpolitik", excluding speeches given by members of the government.

Run the search:

Search example 1 image

Once you have the results, you can sort them, explore detailed information about a specific speech, navigate between result pages, and choose to download either all results or only your selected ones:

Search results 1 image

Choose from various text representations on the specific result details page, where you can also download the corresponding result data:

Search results details 1 image

Smirnova, N., Shahid, M. A., & Mayr, P. (2025). Open Political Corpora: Structuring, Searching, and Analyzing Political Text Collections with PoliCorp. The 2025 Conference on Empirical Methods in Natural Language Processing. https://doi.org/10.48550/arXiv.2509.17465

GESIS: Imprint | Data protection

Project's website: Pollux

For inquiries, contact us at nina.smirnova@gesis.org | ahsan.shahid@gesis.org

PoliCorp is a service by Pollux.


From: GESIS, SuUB, Qualiservice. Funded by: DFG.
The Pollux team received funding from the German Research Foundation (DFG) via grant: MA 3964/7‑3.
Disclaimers and License: Legal Notice