15 min read

⚒️ Finetuning Parla

How we’re bringing the AI assistant up to date – a workshop report from the Prototyping Team

Written by Ingo Hinterding and Jonas Jaszkowic

This article was originally posted in german language as Tiefergelegt – Das Parla Tuning on the blog of the CityLAB Berlin.

Parla Under the Hood: What’s Inside?

Parla accesses more than 13,000 publicly available documents published in the parliamentary documentation system PARDOK during the current legislative period. When you make a request to Parla, the system formulates a detailed answer based on this extensive textual foundation. In doing so, it always references the documents used for the response, thus providing greater transparency. A special feature of Parla is its up-to-dateness: the system processes new documents on a daily basis, with an average of 17 new entries added. At the time of writing, Parla’s knowledge base already includes 13,813 documents with a total of 162,707 pages—truly a treasure trove of parliamentary information that is continuously growing and being updated.

Parla Parla on a smartphone

The First Test Drive: What Users Are Saying About Parla

Since its release, we’ve received almost daily feedback from people in politics, public administration, and civil society that Parla significantly simplifies research in written inquiries and main committee protocols, saving valuable time.

One year can be an eternity—at least in the realm of Artificial Intelligence. In the meantime, new, more powerful large language models have emerged compared to the one initially used, and there are also promising new methods for processing PDF documents. Thus, it was time to bring Parla up to the latest stage of development.

Test Bench: How We Test Quality

A challenge in working with large language models is objectively assessing answer quality, since it depends on many factors. For instance, even a slightly modified prompt can lead to significant differences in the generated response. Even with the same prompt, the answer can vary with each new query. So it’s not enough to ask just one or two test questions and, upon receiving a satisfactory answer, assume that the desired improvement has been reliably achieved.

To avoid relying on gut feeling, we selected four documents from among the more than 13,000 available, each with different characteristics:

  • Main committee protocol, 112 pages, text and tables
  • Written inquiry, 17 pages, mostly tables
  • Written inquiry, 8 pages, text only
  • Written inquiry, 4 pages, sparse text with heavy formatting

For each of these documents, we created a table containing relevant questions from the document and the answers provided by different LLM versions. Below is an example of one such question:

Since Parla is based on documents in german language, I decided to not translate the responses from the model.

Gibt es OZG Dienstleistungen zum Kirchenaustritt?

Model                     ResponseIs the document listet in response?Is the answer correct?
ChatGPT 4oJa, gemäß dem Onlinezugangsgesetz (OZG) gibt es Dienstleistungen, die den Kirchenaustritt betreffen. Im Leistungskatalog (LeiKa) sind die Verwaltungsleistungen im Zusammenhang mit Kirchen und Religion erfasst. Der Kirchenaustritt fällt dabei unter das Themenfeld „Engagement & Hobby“ und trägt die Nummer 99073001000000. Zudem gibt es unter dem Themenfeld „Steuern & Zoll“ die Leistungen zur Festsetzung der Kirchensteuer und weitere relevante Dienstleistungen im Zusammenhang mit dem Kircheneintritt und -austritt(S19-15643).–Yes
Parla pre updateJa, es gibt im Rahmen des Onlinezugangsgesetzes (OZG) Überlegungen zur Digitalisierung des Kirchenaustritts. Der Kirchenaustritt ist auf Bundesebene als Möglichkeit zur Vereinfachung für die Bürgerinnen und Bürger angedacht und daher in den OZG-Umsetzungskatalog aufgenommen worden.Allerdings ist eine digitale Umsetzung der Kirchenaustrittserklärung in Berlin derzeit rechtlich nicht möglich, da eine Gesetzesänderung des Kirchenaustrittsgesetzes erforderlich ist. Derzeit plant die Senatsverwaltung für Justiz, ausgewählte OZG-Leistungen zu digitalisieren, vorausgesetzt, die Zustimmung der Senatsverwaltung für Kultur und Gesellschaftlichen Zusammenhalt liegt vor.Die Federführung für die Digitalisierung der Kirchenaustritte hat das Land Nordrhein-Westfalen, welches jedoch keine zeitnahe Umsetzung plant. Bisher zeigt kein anderes Bundesland Interesse an der digitalen Umsetzung.YesYes
Parla isolated with 4 test documentsGemäß der vorliegenden Informationen gibt es derzeit keine OZG-Dienstleistungen zum Kirchenaustritt. Eine digitale Umsetzung des Kirchenaustritts in Berlin ist rechtlich nicht möglich, da eine Änderung des Kirchenaustrittsgesetzes erforderlich wäre. Nordrhein-Westfalen hat die Federführung für die Digitalisierung der Kirchenaustritte, plant jedoch keine zeitnahe Umsetzung. Bisher zeigt kein anderes Bundesland Interesse an der digitalen Umsetzung.YesResponse incomplete
Parla isolated with 4 test documents + LlamaParseJa, im Rahmen des Onlinezugangsgesetzes (OZG) sind Dienstleistungen zum Kirchenaustritt vorgesehen. Gemäß dem OZG-Umsetzungskatalog sind verschiedene Verwaltungsleistungen rund um das Thema Kirchen und Religion Teil des Vorhabens. Dazu gehören beispielsweise die Kirchensteuer Festsetzung, die Änderung beim Kircheneintritt und Kirchenaustritt sowie der Religionsunterricht und die Abmeldung vom Religionsunterricht. Allerdings ist eine digitale Umsetzung des Kirchenaustritts in Berlin derzeit rechtlich nicht möglich, da eine Gesetzesänderung erforderlich wäre. Die Federführung für die Digitalisierung der Kirchenaustritte liegt bei Nordrhein-Westfalen, das jedoch keinen Bedarf sieht. Berlin plant, ausgewählte OZG-Leistungen zu digitalisieren, wenn die Kulturverwaltung zustimmt. Bisher hat kein anderes Bundesland Interesse an der digitalen Umsetzung gezeigt.YesYes
Parla isolated with 4 test documents + LlamaParse + gpt-4o-mini based summaryJa, es gibt OZG-Dienstleistungen zum Kirchenaustritt. Im Rahmen des Onlinezugangsgesetzes (OZG) wird ein einheitlicher Leistungskatalog für Verwaltungsleistungen erstellt, der auch Dienstleistungen rund um Kirchen und Religion umfasst, wie z.B. Kirchenaustritte.Allerdings ist derzeit ein online Kirchenaustritt in Berlin rechtlich nicht möglich, da das Kirchenaustrittsgesetz eine persönliche Erklärung beim Amtsgericht oder eine beglaubigte Einsendung vorschreibt. Eine digitale Umsetzung würde eine Gesetzesänderung erfordern, die von der Senatsverwaltung für Kultur und Gesellschaftlichen Zusammenhalt (SenKultGZ) initiiert werden müsste.YesYes
Parla isolated with 4 test documents + LlamaParse + gpt-4o-mini based summary + new embedding modelJa, es gibt OZG-Dienstleistungen zum Kirchenaustritt. Im Rahmen des Onlinezugangsgesetzes (OZG) wird ein einheitlicher Leistungskatalog (LeiKa) erstellt, der auch Verwaltungsleistungen im Bereich Kirchen und Religion umfasst, wie z.B. Kirchenaustritte.Allerdings ist der digitale Kirchenaustritt in Berlin derzeit rechtlich nicht möglich, da das Kirchenaustrittsgesetz eine persönliche Erklärung oder eine öffentlich beglaubigte Form verlangt. Eine Gesetzesänderung wäre erforderlich, um dies zu ermöglichen.YesYes
Parla post update + LlamaParse + gpt-4o-mini based summary + context size increaseJa, es gibt OZG-Dienstleistungen zum Kirchenaustritt. Im Rahmen der Umsetzung des Onlinezugangsgesetzes (OZG) wird ein einheitlicher Leistungskatalog (LeiKa) erstellt, der auch Verwaltungsleistungen im Bereich Kirchen und Religion umfasst, wie z.B. Kirchenaustritte.Allerdings ist der Kirchenaustritt derzeit nicht online möglich, da das Kirchenaustrittsgesetz eine persönliche oder öffentlich beglaubigte Erklärung beim Amtsgericht vorschreibt. Eine digitale Umsetzung würde eine Gesetzesänderung erfordern, die von der Senatsverwaltung für Kultur und Gesellschaftlichen Zusammenhalt (SenKultGZ) initiiert werden müsste.Derzeit wurden keine Maßnahmen ergriffen, um einen Online-Kirchenaustritt zu realisieren, und es gibt auch keine weiteren Bundesländer, die an einer digitalen Umsetzung interessiert sind.YesYes

It is easy to see that a purely algorithmic assessment of the result would not be possible and that a manual review of the answer quality is therefore essential.

Parla Parla PostgreSQL database with embeddings

Parla at Full Throttle: More Power Through the PDF Data World

To improve the extraction of PDF content, we used the LlamaParse API (https://docs.llamaindex.ai/en/stable/llama_cloud/llama_parse/). Using generative AI and OCR, it optimizes the extraction of content (text, tables, images) from PDFs. A total of 13,676 documents, comprising 102,698 pages, were reprocessed. The cost of this came to USD 287.97. For cost reasons, only documents with a maximum of 128 pages were processed—this corresponds to more than 99% of all documents contained in Parla. Any PDF files with over 128 pages continue to be processed with the previously used free solution (pdf2md).

The improvement is particularly evident when looking at pages containing tables, such as in the Written Inquiry S19-16557 (https://pardok.parlament-berlin.de/starweb/adis/citat/VT/19/SchrAnfr/S19-16557.pdf):

Before: Excerpt from a page with a table whose structure could not be properly preserved.

5 15. Wie viele in anderen Bundesländern verbeamtete Lehrkräfte wurden seit Wiedereinführung der sogenannten Drehtürverbeamtung in den Berliner Schuldienst übernommen? (Bitte aufschlüsseln nach Einstellungsterminen.) Zu 15.: Anzahl der Versetzungen nach Wiedereinführung der Drehtürverbeamtung (Pressemitteilung vom 23.02.2022) 2022 01.06.2022 1 01.08.2022 119 10.08.2022 1 15.08.2022 1 01.09.2022 1 15.09.2022 2 01.10.2022 2 01.11.2022 2 20.11.2022 1 01.12.2022 2 **Gesamt 132** 2023 01.02.2023 9 01.03.2023 1 01.04.2023 2 01.06.2023 2 15.06.2023 1 01.08.2023 99 16.08.2023 1 **Gesamt 115**

After: Excerpt from a page with a table whose structure could now be correctly preserved, ideally in Markdown format.

15. Wie viele in anderen Bundesländern verbeamtete Lehrkräfte wurden seit Wiedereinführung der sogenannten Drehtürverbeamtung in den Berliner Schuldienst übernommen? (Bitte aufschlüsseln nach Einstellungsterminen.)

Zu 15.: Anzahl der Versetzungen nach WiedereinfĂĽhrung der DrehtĂĽrverbeamtung (Pressemitteilung vom 23.02.2022)

### 2022

| Date | Amount   |
| ---------- | --- |
| 01.06.2022 | 1   |
| 01.08.2022 | 119 |
| 10.08.2022 | 1   |
| 15.08.2022 | 1   |
| 01.09.2022 | 1   |
| 15.09.2022 | 2   |
| 01.10.2022 | 2   |
| 01.11.2022 | 2   |
| 20.11.2022 | 1   |
| 01.12.2022 | 2   |
| Gesamt     | 132 |

### 2023

| Date | Amount   |
| ---------- | --- |
| 01.02.2023 | 9   |
| 01.03.2023 | 1   |
| 01.04.2023 | 2   |
| 01.06.2023 | 2   |
| 15.06.2023 | 1   |
| 01.08.2023 | 99  |
| 16.08.2023 | 1   |
| Gesamt     | 115 |

Improved Summaries

Because the size and complexity of the documents collected vary significantly, it was necessary to enhance how summaries are generated. This was achieved by adjusting the prompt and removing the previous limitation of about 100 words. The length of each summary now correlates with the length of the document, ensuring that as much information as possible is retained.

New Prompt for Generating Summaries

You are a political document assistant that understands and summarizes content. The summary should cover all key points and main ideas from the original text while presenting the information in a concise and easily understandable format. Please ensure that the summary contains relevant details and examples that support the main ideas, and at the same time avoid unnecessary information or repetition. The length of the summary should be appropriate to the length and complexity of the original text, providing a clear and accurate overview without omitting important information. Never alter or invent facts, names, job titles, numbers, or dates.

Example: Written Inquiry S19-16557 (Link to the PDF)

Below are two versions of the summary—first with the gpt-3.5 model by OpenAI, then with the gpt-4o-mini model by OpenAI—demonstrating how the summaries have improved:

Previous Summary (gpt-3.5)

On August 29, 2023, Member of Parliament Franziska Brychcy (LINKE) submitted a written inquiry regarding the civil servant status (Verbeamtung) of teachers in Berlin. The Senate Department responded on September 13, 2023. A total of 417 graduates of the standard preparatory service received an offer of employment with civil servant status, and 155 teaching graduates from other federal states also received civil servant offers. In total, 9,816 incumbent teachers applied for civil servant status, with 533 already granted it. The reinstatement of civil servant status was seen as an important step in recruiting personnel for Berlin’s school system. An expansion of compensation regulations is not currently planned.

New Summary (gpt-4o-mini)

Document 19/16557 records a written inquiry by Member of Parliament Franziska Brychcy (Die Linke) on the topic of reintroducing the civil servant status for teachers in Berlin, including the results of the third round of such appointments and the civil servant status for incumbent teachers. The inquiry was submitted on August 29, 2023, and the Senate Department for Education, Youth, and Family responded on September 13, 2023.

Key points of the response include:
1. Employment Offers: For the 2023/24 school year, 417 graduates of the regular preparatory service were given an offer of civil servant status. The Senate also plans to offer civil servant appointments to all eligible graduates of the on-the-job preparatory service for the second school term of 2023/24.

2. Civil Servant Status from Other Federal States: 155 teaching graduates from other federal states were also given the opportunity to be appointed as civil servants.

3. Acceptance of Offers: Out of the 417 graduates in the regular preparatory service, 294 intended to accept the offer, and 264 were granted civil servant status at an average age of 33.

4. Refusals and Reasons: A total of 46 graduates in the regular preparatory service and 34 from other federal states did not receive an offer of civil servant status. Reasons for refusal included failure to meet criteria such as age limits or health requirements.

5. Incumbent Teachers: 9,816 incumbent teachers applied for civil servant status, of whom 533 were granted it at an average age of 51. Of the teachers who turned 52 in the 2022/23 school year, 415 applied. Some applications could not be processed in time due to deadlines.

6. Compensation for Non-Civil Servants: From September 18, 2023, teachers who either do not meet the requirements for civil servant status or decide against it can apply for compensation, expected to be paid out in the first half of 2024.

7. Future Compensation Regulations: The Senate does not plan to increase or extend the current compensation regulation.

8. Impact of Civil Servant Status on Recruitment: Reintroducing civil servant status is seen as crucial to Berlin’s competitiveness when recruiting teachers.

Overall, the response shows that the Senate is actively pursuing the civil servant track to enhance the attractiveness of teaching in Berlin.

Context Doubled

The gpt-4o-mini model used by OpenAI offers a maximum context length of 128,000 tokens. Until now, the context was limited to 15,000 tokens (due to constraints of the model in use). With the new update, the context length has been doubled to 30,000 tokens. As a result, more relevant information from the documents can be drawn upon to answer a query. However, the size of the context should be chosen carefully, since it has a direct impact on performance (i.e., the time it takes to generate a response).

Structure of the Prompt

Who are you?

You are an AI assistant of the Berlin administration that generates meaningful answers based on a data foundation.
Take into account the given data foundation, focus on relevant content, and NEVER alter facts, names, job titles, numbers, or dates.
Which language should you use?

Since you are a multilingual assistant, you will respond in German by default. However, if the user request is in English, you should respond in English, regardless of context.
Derive the language of your answers from the language of the user’s question: """Was ist das CityLAB?"""
ALWAYS answer in the language of the user’s question. You will be rewarded if you correctly identify the language of the user’s question and respond in that language.
Which formatting should you use?

IMPORTANT: ALWAYS provide the answer formatted in Markdown.
What is your data foundation?

The following is the data foundation, enclosed by triple quotes ("""):

"""
From Document A entitled ...:

Summary of the document
Relevant page 1
Relevant page 2

From Document B entitled "..." dated January 3, 2023:

Summary of the document
Relevant page 1
Relevant page 2

From Document C entitled "..." dated January 3, 2023:

Summary of the document
Relevant page 1
Relevant page 2
"""

Which additional facts should you keep in mind?

ALWAYS also keep in mind the following facts, presented as question-answer pairs:
Question: Who is the Governing Mayor of Berlin?
Answer: The Governing Mayor of Berlin is Kai Wegner (CDU).
...
Question: Who runs the Technologiestiftung Berlin?
Answer: The Technologiestiftung Berlin (TSB) is headed by Nicolas Zimmer.

Parla Parla showing tables in markdown format

Parla Inspection Completed

The measures described have significantly improved the quality of Parla’s answers. The AI assistant now responds more precisely and in greater detail to questions from politicians, administrative staff, and citizens alike, without any loss in speed. Especially for information presented in table form in the documents, Parla can now recognize contexts and reproduce them correctly thanks to the use of LlamaParse.

We hope this has given you a small insight into how Parla works and how the CityLAB’s Prototyping Team operates. Like all projects of the Technologiestiftung Berlin, Parla is open source and available on GitHub at https://github.com/technologiestiftung/parla-frontend.

Has the overhaul of Parla been worth it? Test it out for yourself:

https://www.parla.berlin

Further articles about Parla: