Methodology

How we analyze the campaign speeches of political leaders

Λογότυπος Datalab Τμήματος Πληροφορικής ΑΠΘ

The working methodology for analyzing campaign speeches in the project – from data collection to data analysis. The iMEdD Lab has partnered with Datalab and Antonis Galanopoulos, a political science PhD candidate at AUTH, to integrate ChatGPT into our workflow, resulting in an experimental collaboration between humans and AI.

What are the main topics covered by political leaders in their campaign speeches? Which of these topics do they focus on the most? And what are the qualitative characteristics of their political discourse? These are three key research questions our project on the analysis of campaign speeches aims to answer.

During the early planning stage of our research project, we recognized that relying solely on human resources would not be adequate. Artificial intelligence has made significant strides and holds even more promise for the future. Now, the most popular and innovative version of AI, the ChatGPT interactive model, is readily accessible for use by anyone. This led us to believe that incorporating ChatGPT as a component tool in the survey would greatly expedite the analysis process while simultaneously addressing our questions regarding the capabilities and limitations of this highly debated technology. Overall, the project was designed as an experimental collaboration between humans from journalism, political theory, and data science, and AI, with the dual purpose of analyzing campaign speeches delivered by party leaders and exploring the limits of modern technology.

The collection of speeches

As part of this project, our focus is on analyzing the campaign speeches of the political leaders of the six parties that held seats in the Greek parliament during the last parliamentary term (18th Parliamentary Term, from July 17th, 2019 to April 22nd, 2023).

These include:

  • Kyriakos Mitsotakis, Prime Minister and President of New Democracy
  • Alexis Tsipras, President of SYRIZA
  • Nikos Androulakis, President of PASOK-Movement for Change
  • Dimitris Koutsoumbas, Secretary General of the Communist Party (KKE)
  • Kyriakos Velopoulos, President of Greek Solution
  • Yanis Varoufakis, Secretary General of MeRA25

Our working group has defined a campaign speech as any public speech delivered by the aforementioned political leaders in the presence of an audience, from the official announcement of the parliamentary elections in Greece on April 22, 2023, onwards. Short statements or speeches lasting less than 15 minutes or political leaders’ interactions with citizens are excluded from the study. Additionally, only complete speeches are subject to analysis, and not fragments.

However, in the absence of campaign speeches as defined above, Kyriakos Velopoulos’ speeches from various press conferences are analyzed. It is important to note that, as these speeches were not delivered in a public place in the presence of an audience, the analyses cannot be directly compared to those of other political leaders.

The analysis of the speeches relies on the written text of each political leader’s speech, which is obtained and/or modified by the working group as follows:

  • If the party provides the written text of each speech, that text is used for analysis. If the text is already divided into paragraphs based on topic, the working group retains that paragraph structure. If the written text is not already divided into paragraphs, or if the paragraph structure is based on oral delivery, the working group edits the text and divides it into paragraphs based on topic.
  • If the speeches of political leaders are not provided in writing by the parties but are available in audiovisual files, they are transcribed using artificial intelligence – specifically, the Transkriptor tool. The working group then edits the transcript to ensure accuracy and to divide it into paragraphs based on topic of discussion.

In every instance, the primary sources for gathering the speeches are as follows:

  • The press departments of the six political parties
  • The digital communication channels of the six political parties and their leaders, such as their websites and social media profiles, including those related to the digital presence of the (relevant) Prime Minister
  • The Athens-Macedonian News Agency (ANA MPA)
  • The Hellenic Broadcasting Corporation (ERT)
  • Other media outlets – only in rare cases where a speech may not be fully available from the previously mentioned sources, but has been published by alternative media.

After collecting/formatting the written text of each speech, it is automatically translated into English using artificial intelligence. The online translation tool DeepL API (Application Programming Interface) is specifically utilized for this purpose.

ChatGPT is then tasked with analyzing this translated speech text, using the definitions and options provided by the working group.

Speech analysis and chatting with ChatGPT: our prompts and approach

The political leaders’ campaign speeches are analyzed at the paragraph level according to the following criteria:

  • The extent to which the leader’s words are focused on criticizing political opponents versus presenting the agenda of their party, including ideas, opinions, positions, and program proposals.
  • The main topic of each speech. To conduct the thematic analysis, the working group has created a list of topics which may be expanded if required during the election period. The current list includes the following topics: abstention, accountability, agricultural policy, civil protection, corruption, culture, debt, democracy, economy, education, elections, employment, energy, entrepreneurship, environment, Europe, external affairs, health, human rights, housing, infrastructure, justice, labor, media, migration, national security, pandemic, pensioners, privatization, public sector, social state, transparency, tourism. Paragraphs that do not fit any of these topics are categorized as “other.”
  • The dominant sentiment (sentiment analysis) expressed in each paragraph, categorized as positive, neutral, or negative.
  • The level of political polarization and populism detected in each speech.
  • The identification of named entities such as people, places, or organizations mentioned in each speech.

Using the Python programming language and the ChatGPT API (gpt-3.5-turbo), we engage in a programmatic chat with the interactive AI model. Specifically, we prompt ChatGPT to provide us with various linguistic features for each paragraph of the campaign speeches, including:

  • A value of either “criticism” or “political agenda” if it determines that the political leader is mainly criticizing opponents or mainly referring to party positions or agenda.
  • The most likely topic/theme to be discussed in the political speech excerpt under study based on a list of topics provided to ChatGPT and translated into English.
  • A sentiment value on a scale of -1 to 1 indicating the negativity, neutrality, or positivity of the speech.
  • A value on a scale of 0 to 1 indicating the level of political polarization in the speech.
  • A value on a scale of 0 to 1 indicating the level of populism in the speech.
  • A list of named entities in the passage, categorized as individuals, groups, organizations, political parties, locations, countries, or dates.

The working group provides prompts to ChatGPT accompanied by context, which the model considers in the following situations:

  • When ChatGPT is requested to determine whether the text is primarily focused on criticizing opponents or a political agenda, and to identify the topic of the speech, the working group provides context such as the name and status of the political leader being studied, and the meaning of “Tempi” when mentioned in the text. This context is taken into account by ChatGPT for all the political leaders under study. The need for this contextual information became apparent during our initial test conversations with the model, when it interpreted utterances related to the Tempi tragedy in Greece as being about economic issues.

To illustrate, the working group has provided ChatGPT with the following context for the utterances of Kyriakos Mitsotakis and Alexis Tsipras, respectivel
-“You are reading a passage from a pre-election speech [AS1] of Greek Prime Minister Kyriakos Mitsotakis, leader of New Democracy political party. When you read the term Tempi, this responds [AS2] to the tragic accident that[AS3]  occurred in northern Greece when two trains collided in the village of Tempi, leaving 57 people dead”.
-“You are reading a passage from a pre-election speech of Alexis Tsipras, a Greek politician leader of the opposition party SYRIZA (Coalition of the Radical Left). When you read the term Tempi, this responds to the tragic accident that occurred in northern Greece when two trains collided in the village of Tempi, leaving 57 people dead.”

  • When ChatGPT is asked to evaluate the level of polarization and populism in a speech, within the range of -1 to 1, it is given context that considers the definitions of polarization and populism as drafted by the political scientist in the working group. The definitions are provided as context to ChatGPT on a case-by-case basis and include the following:

-“Polarization is considered an important feature of political systems. Although usually seen as a negative trait, it is important to recognize that a certain degree of polarization is reasonable and perhaps necessary. Political polarization represents the intensity of binary, opposing political ideologies and their respective party identities. Below are some critical features of a polarizing discourse:
1) Polarization occurs when a discourse promotes strong partisan or ideological divisions. This discourse promotes a representation of politics in dichotomous and binary terms, where society is divided into two major camps. A multitude of differences and contradictions are reduced to a single division. The remaining differences are downplayed.
2) The two political and ideological positions that this discourse constructs are presented as incompatible, and the political views and attitudes of citizens tend to diverge and cluster around these two opposing ideological positions. It creates a powerful and irreconcilable opposition between two camps, each challenging or even denying the legitimacy of the other. The political opponent becomes an enemy. 
3) This discourse limits pluralism and fosters fanaticism. It results in the marginalization of intermediate or alternative views from the public sphere and, correspondingly, the squeezing and even the exclusion of smaller parties.
4) A discourse that increases polarization perceives and describes politics through the “us” vs. “them” distinction. There is no midpoint, everyone is asked to choose sides.
5) A discourse of polarization has a strong emotional dimension. 
6) Polarizing discourse, in order to gain depth, often invokes deeply rooted social identities or social divisions that last over time and emphasizes opposing pairs of concepts and values (for example, modernization-tradition, progress-conservatism, workers-capitalists, right-left)”.

As for the definition of populism:

-Populism is a type of political discourse that aims to construct the collective subject “the people”. In the context of the analysis it is understood as a neutral term.  A populist discourse claims to express popular interests or demands against elites or the status quo. As such, it promotes an antagonistic, dichotomous representation of the social field between the people and the elites, the many and the few, the bottom and the top. There are two basic criteria that must be met in order for a discourse to be recognized as populist:
(a) People-centrism, i.e. the focus on the people: The signifier ‘people’ must be central to the discourse in question. This discourse refers to ‘the people’, regularly invokes ‘the people’ and attempts to represent them by representing their demands and interests.
(b) Anti-elitism: It refers to the dichotomous representation of society between an “us” (the people, the underprivileged, the subordinate and excluded classes) and a “them” (the status quo, the elites, the system, the power). Society is divided into two antagonistic camps.
Both criteria must be present for a discourse to be classified as populist

Checking and correcting the results

For each speech under study, a dataset is created which contains rows equal to the number of paragraphs of each speech and columns equal to the variables under study. Subsequently, the ChatGPT results are checked and corrected as described below.

The critique, political agenda, and topic in each paragraph

The paragraphs under analysis are read by a minimum of two journalists who verify the ChatGPT results in terms of: a) determining whether each speech passage criticizes opponents or presents positions/agenda of the leader’s party, and b) identifying the topic being discussed. If the ChatGPT results are incorrect, the working group makes the necessary corrections. Paragraphs that contain an equal amount of the programmatic position and the opponents’ criticisms are categorized as “agenda” as political leaders often contrast the positions of their party with those of opposing parties/leaders during a campaign speech.

Regarding the topics being discussed, paragraphs that address more than one issue are classified according to the most prevalent one. The “elections” topic includes paragraphs that focus on the conduct of the upcoming elections, debates on proportional representation, exhortations by campaign leaders to citizens to vote in a specific manner, and any other passages of speech that create an atmosphere during the election period without referring to other specific issues.

For instance, paragraphs that fall under both “criticism” and “election” categories indicate negative remarks made by a political leader regarding the electoral process or post-election partnerships of their opponents.

In cases where paragraphs are dominated by various historical references, as in excerpts from speeches given by Alexis Tsipras in Menidi and Dimitris Koutsoumbas in Kaisariani (both on 30 April 2023), they are categorized as “agenda” under the topic of “culture”.

The topics in leaders’ speeches visualized in a treemap

Sentiment, polarization, and populism

With regards to sentiment, polarization, and populism, two members of the working group – the political scientist and a journalist – review the analyzed paragraphs and verify the ChatGPT’s results for the respective indicators. In case of incorrect results, the working group makes the necessary corrections as follows:

  • Each excerpt is classified into a category based on the value it receives from the sentiment analysis indicator and it is placed into one of three categories: a) negative (if the score is between -1 and -0.34), b) neutral (if the score is between -0.33 and 0.33), or c) positive (if the score is between 0.34 and 1). If ChatGPT assigns a sentiment score that places a paragraph into a different category than human judgment, the incorrect value is deleted from the dataset and not used for further analysis/visualizations. The working group decided to use deletion as a way to correct the error, as it is the optimal approach to avoid any ad hoc human intervention in the sentiment indicators’ average, which is utilized to classify the set of the given election speech as negative, neutral or positive.

The sentiment analysis in a speedometer

  • Each paragraph is evaluated using polarization and populism indicators, and then placed into one of three categories: a) zero or low level (if the score is between 0 and 0.5), b) medium level (if the score is between 0.51 and 0.8), or c) high level (if the score is between 0.81 and 1). If ChatGPT assigns a value that places a paragraph into a different category than human judgment, the incorrect value (either polarization or populism) is corrected by the working group. To make the correction, the political scientist member of the working group replaces the incorrect value as follows: 1) with 0 if the paragraph should be placed in the category “zero or low level” (either polarization or populism); 2) with 0.6 if it should be placed in the category “medium level”; or 3) with 0.9 if it should be placed in the category “high level”. The rationale behind setting the corrective value of paragraphs that belong to the “zero or low level” category to 0 is due to the distribution of values. It was noticed that most of the excerpts in this category have scores that are close to 0. Hence, it was determined that their corrective value should also be 0 to prevent human intervention from generating “extreme values” within the category.

The evolution of polarization and populism in a step chart

As of the time of writing, the working group has analysed and checked a total of 20 speeches, and estimates that approximately 30% of the thematic analyses and 10% of the other indicators (sentiment, polarization, populism) require correction based on the results obtained from ChatGPT.

Data visualization

The results of the analyses are visualized in Observable through various charts and graphs. These include:

  • A treemap that displays the distribution of topics covered by each political leader in their speeches. The size of each section reflects how much time was spent on the topic relative to the overall length of the speech. The percentage displayed represents the estimated proportion of words spoken on that topic compared to the total number of words in the speech.
  • A speedometer chart that plots the sentiment analysis of each speech. The needle’s position indicates the average sentiment value of a paragraph. Depending on this average, the speech is subsequently classified as negative, neutral or positive.
  • A step chart that illustrates the evolution of polarization or populism during each speech, with the horizontal axis representing the number of paragraphs.
  • A radial dendrogram that shows the entities named by the political leader in their speech, organized by type.

«Radial dendrogram» με τις οντότητες που κατονομάζονται στον λόγο των αρχηγών

The working methodology will be revised and updated as necessary to reflect any changes in the project during the election period.

The working team

Idea & Project Coordination: Thanasis Troboukis, Kelly Kiki (iMEdD)
Journalistic Research/Analysis: Nota Vafea, Katerina Voutsina, Stefania Ibrishimova, Athina Thanasi, Kelly Kiki, Chrysoula Marinou, Thanasis Troboukis, Georgios Schinas (iMEdD)
IT Support: Christos Nomikos, Nikos Sarantos (iMEdD)
Scientific Advisor on Political Theory: Antonis Galanopoulos, PhD Candidate at the School of Political Sciences, Aristotle University of Thessaloniki
Software Development/ Data Analysis: Pavlos Sermpezis, Stelios Karamanidis, Dimitrios-Panteleimon Giakatos, Ilias Dimitriadis (Datalab, School of Informatics, Aristotle University)
Datalab Director (School of Informatics, Aristotle University): Professor Athena Vakali
Translation: Anatoli Stavroulopoulou

Λογότυπο Άδειας Χρήσης Creative Commons Non Commercial International