Tools & Practices

How we analyzed nine months of conversation about the Greek wiretapping scandal on Twitter


This piece describes the working methodology, from data collection to data analysis, behind the research collaboration between iMEdD Lab and Datalab on the conversation about the wiretapping case on Twitter over the last nine months.

In November 2022, the hashtags #ypoklopes and #υποκλοπες were among the top so-called “trending topics” on Greek-language Twitter. One of our key research questions was when an intense and long-lasting conversation about the wiretapping and interception case first became observable. We knew empirically at the time that the case had already been discussed in the Greek public sphere for a year, but also that it had taken a long time to find its place in the daily news agenda. The other question was, of course, what characterizes that conversation: which categories of accounts are involved, whether they mainly produce or repost content, what their political interests are, and whether the digital imprint of the case on Twitter shows evident traits of a polarized dialogue.

The working methodology is summarized below.

Data collection

The data was collected by the Datalab group using the Twitter API (Application Programming Interface) v2 under a license for academic research purposes. With this license, the platform allows researchers to retrieve retrospectively all the tweets posted on a topic of interest, as long as they contain the keywords and/or hashtags defined by the researchers.

In this case, the following hashtags and keywords were used to collect data about the wiretapping conversation on Twitter: υποκλοπές, υποκλοπη, #υποκλοπες, #υποκλοπές, #παραρακολουθήσεις, επισύνδεση, επισυνδέσεις, #επισυνδέσεις, #δημητριαδης, #κοντολεων, #κουκακη, #ανδρουλακης, #ypoklopes, predator, #predator, #predatorgate, #pega, #spyware, #watergate, greekwatergate.

Therefore, the analysis presented by iMEdD Lab and Datalab is based on, and limited to, Greek-language posts made on Twitter after April 1, 2022 that include any of the aforementioned hashtags/keywords (either combined or not). The starting point for the investigation and data analysis was set on the knowledge that the publication of Thanasis Koukakis’s surveillance case and the related journalistic revelations had contributed to the wider visibility of the matter. Retrospective test searches going back to January 2022 appeared to confirm this: in the first quarter of 2022 there was no relevant conversation on Twitter, only a few tweets of limited relevance.

It should be noted that the selection of the aforementioned terms as data collection criteria was based on journalistic and technological criteria: after an initial thorough listing of terms relevant to the conversation about the wiretapping case on Twitter, the available data were quantified and subjected to quality control. For example, the acronym ΕΥΠ (NIS, the National Intelligence Service) was initially among our candidate search terms. During quantitative and qualitative checks, however, we found that searches for “ΕΥΠ” mainly returned data that was not relevant to the subject of the research, because they matched any tweet containing a word that included the string “ευπ”. The term was therefore not included in the final data collection terms.
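
The kind of relevance check described above can be illustrated with a minimal Python sketch. It is not the code the working group used; the function names and the two sample sentences are hypothetical, chosen only to show why a substring match on “ευπ” catches unrelated words while a whole-word match does not.

```python
import re


def contains_substring(text: str, term: str) -> bool:
    """Case-insensitive substring match (the behaviour that caused noise)."""
    return term.casefold() in text.casefold()


def contains_whole_word(text: str, term: str) -> bool:
    """Case-insensitive match that requires the term to stand alone as a word."""
    pattern = r"(?<!\w)" + re.escape(term.casefold()) + r"(?!\w)"
    return re.search(pattern, text.casefold()) is not None


# Hypothetical sample texts, for illustration only.
samples = [
    "Η ΕΥΠ παρακολουθούσε δημοσιογράφο",     # mentions the agency itself
    "Μια ευπρόσδεκτη αλλαγή στο πρόγραμμα",  # "ευπρόσδεκτη" merely contains "ευπ"
]

substring_hits = [contains_substring(s, "ΕΥΠ") for s in samples]
whole_word_hits = [contains_whole_word(s, "ΕΥΠ") for s in samples]
```

In this toy case the substring check flags both sentences, while the whole-word check flags only the first. Comparing the two hit rates on a sample of collected tweets is one way to quantify how noisy a candidate term is.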

Similarly, to keep the data relevant to the wiretapping case under study: a) the hashtag #ανδρουλακης (androulakis), which refers to the MEP and President of PASOK-KINAL, Nikos Androulakis, was added to the data collection criteria only for tweets posted from July 20, 2022 onwards, because earlier mentions of him on Twitter relate mainly to his activity as an MEP and not to the wiretapping case; b) this and the other hashtags corresponding to the surnames of individuals are included in the data collection criteria only for tweets posted until November 28, 2022.
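
The date-dependent criteria above can be sketched in Python as follows. This is an illustration under stated assumptions, not Datalab's actual collection code: the helper names `terms_for` and `build_query` are hypothetical, both date boundaries are treated as inclusive, and the query string only imitates the `OR` and `lang:` operators of Twitter's search syntax. The exact queries sent to the API are not given in the article, and a real query may need to be split to respect length limits.

```python
from datetime import date

# Terms that apply to the whole collection period (from the list above).
BASE_TERMS = [
    "υποκλοπές", "υποκλοπη", "#υποκλοπες", "#υποκλοπές",
    "#παραρακολουθήσεις", "επισύνδεση", "επισυνδέσεις", "#επισυνδέσεις",
    "#ypoklopes", "predator", "#predator", "#predatorgate", "#pega",
    "#spyware", "#watergate", "greekwatergate",
]
# Surname hashtags, used only until November 28, 2022 (assumed inclusive).
SURNAME_TAGS = ["#δημητριαδης", "#κοντολεων", "#κουκακη"]
ANDROULAKIS_TAG = "#ανδρουλακης"
ANDROULAKIS_FROM = date(2022, 7, 20)   # assumed inclusive
SURNAMES_UNTIL = date(2022, 11, 28)    # assumed inclusive


def terms_for(day: date) -> list[str]:
    """Return the search terms considered valid for tweets posted on `day`."""
    terms = list(BASE_TERMS)
    if day <= SURNAMES_UNTIL:
        terms += SURNAME_TAGS
        if day >= ANDROULAKIS_FROM:
            terms.append(ANDROULAKIS_TAG)
    return terms


def build_query(terms: list[str]) -> str:
    """Join terms with OR and restrict to Greek-language posts."""
    return "(" + " OR ".join(terms) + ") lang:el"


june_terms = terms_for(date(2022, 6, 1))
august_terms = terms_for(date(2022, 8, 1))
december_terms = terms_for(date(2022, 12, 15))
```

With these assumptions, a tweet from June 2022 is matched against the base terms and the three other surname hashtags, one from August 2022 also against #ανδρουλακης, and one from December 2022 against the base terms only.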

For the analyses included in the publication under the title “The Greek wiretapping scandal on Twitter: The course of the conversation over the last nine months, polarization and the role of the media”, the closing point of the data collection differs as follows:

  • Analyses related to the volume of posts and their progression over time pertain to tweets made up to January 14, 2023. The sample comprises 953,722 posts of all types (tweets, retweets, quotes, replies).
  • Analyses related to the progression of the number of engaged unique users over time and their activity by post type, the top accounts by number of published tweets and replies and by number of mentions by others, the most influential accounts, and the websites whose content is most circulated on Twitter pertain to tweets made up to January 1, 2023, which number more than 900,000.
  • Analyses related to the distribution of accounts involved by category (individuals, informative, political, other), to the political affiliation that users tend to follow most, and to polarization, pertain to tweets made up to December 1, 2022.

However, Datalab continues to collect data without interruption (and for an unspecified period of time) for the needs of the web application it has implemented to keep monitoring the case.


Regarding the analyses on the political affiliation that the participating users tend to follow most, the working group considered it necessary to compare the conversation on the wiretapping case with other current affairs topics –both topics directly related to political parties or persons and others that are not directly related to political persons or political affiliations. Given the fact that the working group had no knowledge of any similar previous study focusing on issues of increased local interest, it was decided to sample and analyze data on three much discussed topics of recent news in Greece. The topics covered are:

  • The involvement of MEP Eva Kaili, who was removed from her post as Vice President of the European Parliament on December 13, 2022, in the so-called Qatargate scandal. (Study period: December 7-18, 2022. Sample: 263,070 tweets from 25,541 unique accounts)
  • The case of the lifting of the immunity of MEP Maria Spyraki. (Study period: December 15-18, 2022. Sample: 32,074 tweets from 7,608 unique accounts)
  • The injury and death of the Roma teenager Kostas Fragoulis, who was shot by a police officer. (Study period: December 4-18, 2022. Sample: 159,506 tweets from 22,235 unique accounts)

The hashtags and keywords used as criteria for data collection were, per case, the following:

  • On the Eva Kaili case: Καϊλή, Καιλή, #Καϊλή, #Καιλή, #ευα_καιλη, ευρωκοινοβουλιο, #ευρωκοινοβουλιο, Κατάρ, #Κατάρ, #Καταρ_Gate, #Kaili, #EvaKaili, EvaKaili (either combined or not), as well as the combination of “αρση” (lifting) and “ασυλια” (immunity)
  • On the Maria Spyraki case: Σπυρακη, #σπυρακη, #μαρια_σπυρακη, ευρωκοινοβουλιο, #ευρωκοινοβουλιο, #Spyraki, #mariaspyraki, #maria_spyraki, MariaSpyraki (either combined or not), as well as the combination of “αρση” (lifting) and “ασυλια” (immunity)
  • On the injury and death of Kostas Fragoulis: Φραγκουλη, #Φραγκουλης, #ΚωσταςΦραγκουλης, #Κωστας_Φραγκουλης, Ρομα, #Ρομα, γυφτοι, #γυφτοι, #με_την_Αστυνομια, #16χρονος, #16χρονος_Θεσσαλονικη, 20 ευρώ, 20_ευρώ, 20ευρω, εικοσαευρω, #20_ευρώ, #20ευρω, #εικοσάευρω (either combined or not), as well as the combinations “αστυνομ” (police) and “Θεσσαλονικη” (Thessaloniki), “16χρονος” (16 year old) and “Θεσσαλονικη” (Thessaloniki), “ΔΙΑΣ” and “Θεσσαλονικη” (Thessaloniki)

Data analysis

After the collection of all tweets containing any of the above-mentioned terms per case study, the usernames, the number of posts per category (tweets, retweets, quotes, replies), other included words, urls, time of creation of each tweet, etc., were extracted from these posts, and secondary analyses followed, as appropriate.
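
As an illustration of this extraction step, the sketch below counts posts per user and per type. It assumes simplified records in which `referenced_tweets` mirrors the field of the same name in Twitter API v2 tweet objects (with types `retweeted`, `quoted` and `replied_to`), while the `username` key, the sample records and the precedence rule for posts that are both a quote and a reply are assumptions made for this example.

```python
from collections import Counter, defaultdict


def post_type(tweet: dict) -> str:
    """Classify a post as retweet, quote, reply or (original) tweet."""
    kinds = {ref.get("type") for ref in tweet.get("referenced_tweets", [])}
    if "retweeted" in kinds:
        return "retweet"
    if "quoted" in kinds:       # assumption: a quote that is also a reply counts as a quote
        return "quote"
    if "replied_to" in kinds:
        return "reply"
    return "tweet"


def count_by_user(tweets: list[dict]) -> dict[str, Counter]:
    """Count posts per username, broken down by post type."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for tweet in tweets:
        counts[tweet["username"]][post_type(tweet)] += 1
    return counts


# Hypothetical records, for illustration only.
sample = [
    {"username": "alice", "referenced_tweets": []},
    {"username": "alice", "referenced_tweets": [{"type": "retweeted", "id": "1"}]},
    {"username": "bob", "referenced_tweets": [{"type": "replied_to", "id": "2"}]},
    {"username": "bob", "referenced_tweets": [{"type": "quoted", "id": "3"},
                                              {"type": "replied_to", "id": "4"}]},
    {"username": "bob"},
]
counts = count_by_user(sample)
```

Per-user counters of this kind are the raw material for the rankings used later, such as the accounts with the most tweets or the most replies.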

To analyze the participating users by category (individuals, informative accounts, political accounts, others), we compiled a sample of 2,262 unique accounts. The sample combines the top 500 accounts in each of the following five rankings, with accounts that appear in more than one ranking counted only once: those who have posted the most tweets, those who have responded the most to third-party tweets, those who have posted the most quotes, those who have been quoted the most by others, and those who are most influential in the conversation about wiretapping. The latter are obtained algorithmically from a graph network analysis that takes into account the retweets, mentions and replies recorded among users. The algorithm used was the one described in Chen, C., Tong, H., Prakash, B. A., Tsourakakis, C. E., Eliassi-Rad, T., Faloutsos, C., & Chau, D. H. (2015). “Node immunization on large graphs: Theory and algorithms”, IEEE Transactions on Knowledge and Data Engineering, 28(1), 113-126.
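
The sampling arithmetic can be made concrete: five rankings of 500 accounts give 2,500 list positions, and since the final sample has 2,262 unique accounts, 238 positions were taken by accounts already present in another ranking. The minimal sketch below shows the same union-of-top-N logic on toy data. The scores, the account names and the tie-breaking rule are hypothetical, and the influence scores are taken as given rather than computed with the algorithm cited above.

```python
def top_n(scores: dict[str, float], n: int) -> set[str]:
    """Return the n highest-scoring accounts (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {account for account, _ in ranked[:n]}


def build_sample(rankings: dict[str, dict[str, float]], n: int = 500) -> set[str]:
    """Union of the top-n accounts of every ranking; overlaps are counted once."""
    sample: set[str] = set()
    for scores in rankings.values():
        sample |= top_n(scores, n)
    return sample


# Toy data: five rankings, top 2 of each instead of top 500.
rankings = {
    "tweets":           {"a": 10, "b": 8, "c": 1},
    "replies":          {"b": 7, "d": 5, "a": 1},
    "quotes":           {"e": 3, "a": 2, "b": 1},
    "quoted_by_others": {"a": 9, "f": 4, "b": 1},
    "influence":        {"f": 0.9, "b": 0.8, "c": 0.1},
}
sample = build_sample(rankings, n=2)
overlapping_positions = 5 * 2 - len(sample)
```

Here ten list positions collapse to five unique accounts, the toy equivalent of 2,500 positions collapsing to 2,262.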

Graph showing the polarization in the conversation about the wiretapping case on Twitter

For the purpose of analyzing the accounts involved in the conversation by category, initially we collected all the accounts of Greek parties, members of the Greek parliament and Greek MEPs on Twitter. The accounts of Greek MPs and MEPs were collected using the relevant data available on Vouliwatch, the independent, not-for-profit open governance initiative, and the European Parliament’s website respectively. The same sources were used to match MEPs with the parties to which they belong. Afterwards, the usernames that appeared in the database with the tweets that had been collected during the research were marked accordingly. Subsequently, the remaining accounts were studied and marked accordingly by the working group: as “informative accounts” (in the case of media, journalists and blogs), as “individuals” or as “other” (in the case of organizations, brands, etc.).

Note that the expertise behind Datalab’s Bot Detective tool was used to check for bot activity, but only a statistically negligible number of unique accounts were identified as bots.

To categorize users by the political affiliation they tend to “follow” most, we used the procedure outlined below. Each account identified as a content creator in the studied sample of tweets for each case in question was positioned programmatically on the traditional left-right political spectrum, based on the political affiliation of the accounts it follows, including political parties, members of the Greek Parliament and Greek MEPs. Specifically, users who follow mostly SYRIZA, the Communist Party of Greece (KKE), MeRA 25 and their party members were positioned on the Left. Accounts that mostly follow New Democracy, Greek Solution and their party members were positioned on the Right. Users who mostly follow PASOK-KINAL and its party members, as well as users who follow an equal number of Left and Right political accounts, were positioned in the Center. Users who do not follow any of the aforementioned political accounts were defined as neutral.
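
The positioning rule can be sketched in a few lines of Python. This is one possible reading of the procedure, not the working group's code: “mostly” is interpreted here as a strict plurality among the Left, Right and Center blocs, every tie (including the Left/Right tie the text mentions) falls to the Center, and the account handles in the example are hypothetical.

```python
from collections import Counter

# Party-to-bloc mapping as described in the text.
BLOC_OF_PARTY = {
    "SYRIZA": "Left", "KKE": "Left", "MeRA25": "Left",
    "New Democracy": "Right", "Greek Solution": "Right",
    "PASOK-KINAL": "Center",
}


def leaning(followed: list[str], party_of_account: dict[str, str]) -> str:
    """Position a user from the political accounts they follow."""
    counts: Counter = Counter()
    for account in followed:
        bloc = BLOC_OF_PARTY.get(party_of_account.get(account))
        if bloc is not None:
            counts[bloc] += 1
    if not counts:
        return "Neutral"        # follows none of the listed political accounts
    left, right, center = counts["Left"], counts["Right"], counts["Center"]
    if left > right and left > center:
        return "Left"
    if right > left and right > center:
        return "Right"
    return "Center"             # Center plurality, or any tie (assumption)


# Hypothetical handles mapped to parties, for illustration only.
party_of_account = {
    "@left1": "SYRIZA", "@left2": "KKE",
    "@right1": "New Democracy", "@center1": "PASOK-KINAL",
}
examples = {
    "mostly_left": leaning(["@left1", "@left2", "@right1"], party_of_account),
    "tie": leaning(["@left1", "@right1"], party_of_account),
    "no_politics": leaning(["@someone"], party_of_account),
    "mostly_right": leaning(["@right1", "@someone"], party_of_account),
}
```

Other readings of “mostly”, such as requiring an absolute majority of followed political accounts, would change only the comparison lines and could shift borderline users between categories.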

Programming languages and other tools

Data collection, processing and analysis were carried out using the Python programming language. Graph network visualization was carried out in Gephi. For the creation of the other visualizations included in the publication, the web-based tools Datawrapper and Flourish were used.

The development of the web application, which was implemented by Datalab for the ongoing monitoring of the topic, was carried out in JavaScript.


For more analysis and ongoing coverage of the topic, visit the Datalab web application


This is the result of a research collaboration between the iMEdD Lab and the Data & Web Science Lab (Datalab) of the School of Informatics of the Aristotle University of Thessaloniki.

Research/Data Analysis & Visualization:
Ilias Dimitriadis, Stelios Karamanidis, Pavlos Sermpezis (Datalab)
Kelly Kiki (iMEdD Lab)

Director of Datalab: Professor Athena Vakali

Additional research assistance:
Dimitrios-Panteleimon Giakatos, Vasileios Psomiadis (Datalab)
Phoebe Fronista (iMEdD)

Translation: Evita Lykou


Creative Commons Non Commercial International license logo