Social Media Listening

Social media listening refers to social listening conducted exclusively on social media platforms, omitting other internet sources.

Within the context of popAI, the Centre for Research & Technology Hellas (CERTH) has conducted a social media listening activity, complementing the wider internet listening carried out by ECAS colleagues.

In this light, the opinions and emotions of European citizens towards AI tools in law enforcement, and towards safeguarding privacy and fundamental rights, have been analysed. The most recent Twitter posts related to this topic were extracted, and sentiment analysis was applied to them using Natural Language Processing (NLP), that is, computational techniques that analyse natural language and speech.

What is Sentiment Analysis and why is it important?

Sentiment analysis is the automatic categorisation of text data into positive, negative and neutral categories. It uses machine learning to identify how people are talking about a given topic. Its most common use is detecting the polarity of text, that is, automatically identifying whether a tweet, product review or support ticket talks about something positively, negatively or neutrally.
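
To make the idea of polarity concrete, here is a toy Python sketch. It swaps in a simple lexicon-based scorer instead of the machine learning model described above, and the word lists are hypothetical (a few entries echo tokens reported later in this section); it only shows how a text ends up labelled positive, negative or neutral.

```python
import re

# Hypothetical toy lexicons. A production system would use a trained
# classifier, not hand-picked word lists.
POSITIVE = {"help", "better", "wisely", "worthy"}
NEGATIVE = {"scammed", "bad", "dangerous"}


def polarity(text: str) -> str:
    """Label a text positive, negative or neutral from lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Net score: +1 per positive word, -1 per negative word.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("AI can help police work better"))             # positive
print(polarity("I got scammed by an AI chatbot"))              # negative
print(polarity("Report on AI in law enforcement published"))   # neutral
```

Lexicon methods like this ignore negation and sarcasm ("not better" would be scored positive here), which is one reason trained models are usually preferred.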

Methodology

In the context of the popAI social media listening, the following procedure was followed:

  1. Compilation of a Twitter dataset using keywords related to citizens’ attitudes towards the controversies on the topic.
  2. Cleaning of the data to keep only tweets with European locations.
  3. Sentiment analysis for Europe as a whole, together with NLP results on the most frequently occurring words in the tweets, to add context on what people think and how they express themselves.
  4. Extension of the above with further NLP processing and sentiment analysis for four regions of Europe.

Data was obtained from social media via an API (Application Programming Interface), with Twitter being the most suitable platform for this collection.
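
A minimal Python sketch of what such an API request can look like, assuming the Twitter/X API v2 recent-search endpoint with a bearer token. The report does not say which endpoint, client or parameters popAI used, and the keywords and token below are placeholders. The code only builds the request, so it can be inspected without network access; `urllib.request.urlopen(req)` would send it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Assumption: Twitter/X API v2 "recent search" endpoint.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_query(keywords: list[str]) -> str:
    """OR-join keywords, quoting multi-word phrases so they match exactly."""
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    return "(" + " OR ".join(terms) + ")"


def build_search_request(keywords: list[str], bearer_token: str,
                         max_results: int = 100) -> Request:
    """Build, but do not send, a recent-search request."""
    params = {
        "query": build_query(keywords),
        "max_results": max_results,        # recent search accepts 10 to 100 per page
        "tweet.fields": "author_id,created_at,entities",
        "expansions": "author_id",         # include the author's user object
        "user.fields": "location",         # free-text profile location
    }
    return Request(f"{SEARCH_URL}?{urlencode(params)}",
                   headers={"Authorization": f"Bearer {bearer_token}"})


# Placeholder keywords and token, not the project's real list.
req = build_search_request(["facial recognition", "#AIAct", "predictive policing"],
                           "YOUR_BEARER_TOKEN")
print(parse_qs(urlsplit(req.full_url).query)["query"][0])
```

Requesting `user.fields=location` is what would make a self-declared location available per author, which the later location filtering depends on.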

Twitter is a free social networking site where users broadcast short posts known as tweets. These tweets can contain text, videos, photos or links. Users can also include hashtags in their tweets: a hashtag is a short string preceded by #, and it links the tweet to all other tweets that include the same hashtag. Hashtags were introduced on Twitter to add context and to let people conveniently follow the topics they are interested in.
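
The hashtag mechanism can be illustrated with a short Python sketch that extracts hashtags from tweet text and indexes tweets by tag. The regular expression is a simplification (Twitter's own rules are stricter, for example about all-digit tags), lowercasing tags so that #AI and #ai group together is a design choice of this sketch, and the sample tweets are made up.

```python
import re
from collections import defaultdict

# Simplified pattern: '#' followed by word characters (letters, digits, '_').
HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(text: str) -> list[str]:
    """Return the hashtags in a tweet, lowercased, without the '#'."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def index_by_hashtag(tweets: dict[int, str]) -> dict[str, list[int]]:
    """Map each hashtag to the IDs of all tweets that contain it."""
    index = defaultdict(list)
    for tweet_id, text in tweets.items():
        for tag in dict.fromkeys(extract_hashtags(text)):  # dedupe, keep order
            index[tag].append(tweet_id)
    return dict(index)


sample = {
    1: "Police trial #AI cameras in the city centre",
    2: "Who audits the #ai? #Privacy matters",
    3: "No hashtags here",
}
print(index_by_hashtag(sample))  # {'ai': [1, 2], 'privacy': [2]}
```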

Data from the most recent tweets related to the topic were gathered; these tweets had been posted by more than 80,000 users, and their interaction activity was recorded. The process involved programming in Python, applying Natural Language Processing to extract information from the tweets and, finally, visualising the results.

To gather data focused on the topic, a list of keywords presented in a previous task of the popAI project was used, under the heading “Hashtags containing keywords, phrases, names of companies and names of technologies”:

Results

The result was a dataset of 172,939 tweets (rows) with 17 features, posted by 86,568 unique users. The features included the author’s ID, the tweet’s full text, the number of hashtags and the number of retweets, as well as five features (URL, image URL, video URL, location and hashtags) that were filled in when the tweet contained such elements. There were also features for sentiment and reliability, both produced by two additional algorithms applied to the tweets.

For the sentiment analysis, the results relied mainly on the author ID, the full text of the tweets, and the sentiment and location features, because only 20% of the tweets included hashtags, only 5% were labelled as “reliable” and there were no retweets.
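
The feature-selection reasoning above comes down to measuring how often each optional field is actually filled in. A minimal Python sketch, using a handful of made-up rows with hypothetical field names rather than the real dataset:

```python
# Made-up rows with hypothetical field names; an empty value or a zero
# count means the tweet did not include that element.
tweets = [
    {"author_id": "a1", "hashtags": ["ai"], "location": "Athens, Greece", "retweets": 0},
    {"author_id": "a2", "hashtags": [], "location": None, "retweets": 0},
    {"author_id": "a1", "hashtags": [], "location": "", "retweets": 0},
    {"author_id": "a3", "hashtags": ["privacy", "ai"], "location": "London", "retweets": 0},
]


def coverage(rows: list[dict], field: str) -> float:
    """Share of rows in which `field` is present and truthy (non-empty, non-zero)."""
    return sum(bool(row.get(field)) for row in rows) / len(rows)


for field in ("hashtags", "location", "retweets"):
    print(f"{field}: {coverage(tweets, field):.0%}")
print("unique authors:", len({row["author_id"] for row in tweets}))
```

Fields that are empty for most rows carry little signal, which is why an analysis like this one falls back on the text, author, sentiment and location features.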

Next, Python NLP (Natural Language Processing) tools were applied to the dataset. Pre-trained spaCy models were used to tokenise the locations that users declared on Twitter and to identify genuine ones, since many users entered fake information in the location field; the results were then filtered to keep only European locations. These locations were then divided into four European regions, leaving 2,816 tweets.
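
The location-cleaning step can be sketched in Python as follows. This is a simplified stand-in, not the spaCy-based approach described above: free-text profile locations are split on commas and matched against a small, hypothetical gazetteer of country names and city aliases, and anything unmatched (including joke or fake locations) is dropped.

```python
from typing import Optional

# Hypothetical gazetteer: lowercase place name -> country.
# A real one would cover far more names and handle ambiguity (e.g. "Paris, TX").
GAZETTEER = {
    "greece": "Greece", "athens": "Greece", "thessaloniki": "Greece",
    "united kingdom": "United Kingdom", "uk": "United Kingdom", "london": "United Kingdom",
    "france": "France", "paris": "France",
    "germany": "Germany", "berlin": "Germany",
    "ireland": "Ireland", "dublin": "Ireland",
}


def normalise_location(raw: Optional[str]) -> Optional[str]:
    """Return a country if any comma-separated part of `raw` is known, else None."""
    if not raw:
        return None
    for part in raw.split(","):
        country = GAZETTEER.get(part.strip().lower())
        if country:
            return country
    return None


for raw in ["Athens, Greece", "london", "Somewhere over the rainbow", None]:
    print(repr(raw), "->", normalise_location(raw))
```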

Finally, the Python library Matplotlib was used to visualise the results as pie charts, bar plots and word clouds.
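
As an illustration of the visualisation step, the following Python sketch draws a pie chart with Matplotlib from the sentiment shares reported in this section for the European tweets. The label wording and styling are choices of this sketch, not the project's original figure. It renders off-screen into memory; `fig.savefig("sentiment_pie.png")` would write a file instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend; no display needed
import matplotlib.pyplot as plt

# Sentiment shares (%) for the European tweets, as reported in this section.
labels = ["Strongly against (-2)", "Against (-1)", "Neutral (0)",
          "In favour (1)", "Strongly in favour (2)"]
shares = [4.0, 17.7, 41.0, 32.8, 4.5]

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(shares, labels=labels, autopct="%1.1f%%", startangle=90)
ax.set_title("Sentiment of European tweets on AI in security")

buf = io.BytesIO()
fig.savefig(buf, format="png", bbox_inches="tight")
plt.close(fig)
```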

In the context of popAI, the public conversation on Twitter about the use of Artificial Intelligence tools in the sensitive field of law enforcement, and about safeguarding privacy and fundamental rights, has been investigated.

Particular attention has been paid to European citizens’ sentiment on the topic, in order to increase the accuracy and effectiveness of the project’s future results.

Sentiment analysis was implemented using Natural Language Processing, and the results indicated large differences between parts of Europe, both in the opinions held and in how they were expressed.

Sentiment analysis of Tweets

More than 2,800 tweets from Europe were examined for sentiment analysis. Sentiment was scored on a scale from -2 (extremely negative) to 2 (extremely positive). The largest share of tweets was neutral about the role of AI in security, while the remainder expressed either positive or negative views.

More specifically, 41% of the tweets were neutral regarding AI in security systems, compared with 32.8% in favour of it, 4.5% strongly in favour, 17.7% against it and 4% strongly against it.

This pie chart shows the results of the sentiment analysis of 2,816 tweets from Europe. It reveals a sizable share of opposing views, both positive and negative, which suggests that public opinion is not homogeneous. However, since the largest share of tweets was neutral towards AI in security systems, the results underline the need to raise public interest in the subject and to focus on the benefits of AI tools to illustrate their effectiveness.
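
The percentages above follow from simple counting over the -2 to 2 scores. A minimal Python sketch, assuming each tweet already carries an integer score (the label names are this sketch's own); the last lines collapse the reported five-way split into negative, neutral and positive shares.

```python
from collections import Counter

LABELS = {-2: "strongly against", -1: "against", 0: "neutral",
          1: "in favour", 2: "strongly in favour"}


def sentiment_distribution(scores: list[int]) -> dict[str, float]:
    """Percentage of tweets per score on the -2..2 scale."""
    counts = Counter(scores)
    return {LABELS[s]: 100 * counts[s] / len(scores) for s in sorted(LABELS)}


def grouped(dist: dict[str, float]) -> dict[str, float]:
    """Collapse the five classes into negative / neutral / positive."""
    return {
        "negative": dist["strongly against"] + dist["against"],
        "neutral": dist["neutral"],
        "positive": dist["in favour"] + dist["strongly in favour"],
    }


# Shares reported above for the 2,816 European tweets.
reported = {"strongly against": 4.0, "against": 17.7, "neutral": 41.0,
            "in favour": 32.8, "strongly in favour": 4.5}
print({k: round(v, 1) for k, v in grouped(reported).items()})
# {'negative': 21.7, 'neutral': 41.0, 'positive': 37.3}
```

On these figures, positive tweets outnumber negative ones by roughly 37% to 22%, with neutral tweets the largest single group.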

Tokenisation of the Tweets

To examine how people express themselves on Twitter, the text of the tweets was tokenised into word tokens, which were then classified by part of speech (for example as nouns, verbs, adjectives and adverbs) using a pre-trained English language model from the spaCy Python package.

Breaking the text into tokens allows text strings to be compared word by word, which makes it easier to identify distinct terms. In this case, the spaCy English language model identified the following among the most frequently used verbs, adverbs and adjectives in the tweets:

  • Verbs: help, revolutionizing, generated, find, evolves.
  • Adverbs: wisely, rapidly, here, together, better.
  • Adjectives: higher, right, new, artificial, worthy.

The most frequent tokens therefore carry a mostly positive tone, although tokens such as “scammed” reflect the negative part of the dataset.

Figure 2: Word cloud of tokens found in the tweets, grouped into verbs, adverbs and nouns.
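
Once tokens carry part-of-speech tags, finding the most frequent verbs, adverbs and adjectives is a counting exercise. In the report the tags came from a spaCy model; to keep this Python sketch runnable without downloading a model, a few (token, tag) pairs are hard-coded, using spaCy's Universal POS tag names and some of the tokens listed above. The commented lines show how the pairs could be produced with spaCy, assuming the `en_core_web_sm` model is installed; `tweet_texts` is a hypothetical name.

```python
from collections import Counter, defaultdict

# With spaCy (assumes the en_core_web_sm model is installed):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   tagged = [(t.text, t.pos_) for text in tweet_texts for t in nlp(text) if t.is_alpha]
tagged = [
    ("help", "VERB"), ("AI", "PROPN"), ("rapidly", "ADV"), ("help", "VERB"),
    ("new", "ADJ"), ("evolves", "VERB"), ("wisely", "ADV"), ("new", "ADJ"),
    ("scammed", "VERB"),
]


def top_by_pos(pairs, tags=("VERB", "ADV", "ADJ"), k=3):
    """Most frequent lowercased tokens for each requested POS tag."""
    counters = defaultdict(Counter)
    for text, pos in pairs:
        if pos in tags:
            counters[pos][text.lower()] += 1
    return {pos: [word for word, _ in counters[pos].most_common(k)] for pos in tags}


print(top_by_pos(tagged))
# {'VERB': ['help', 'evolves', 'scammed'], 'ADV': ['rapidly', 'wisely'], 'ADJ': ['new']}
```

The same per-tag counters are what a word cloud per part of speech would be drawn from.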

Geographical provenance

An interesting result emerges from the analysis of users’ locations. Of the 172,939 tweets in the dataset, 8% (13,776 tweets) included a location, and 1.6% of the total (2,816 tweets) had a European location.
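
The location funnel can be checked directly from these counts; the last ratio shows that roughly one in five tweets with a location pointed to a European country.

```latex
\frac{13\,776}{172\,939} \approx 7.97\%, \qquad
\frac{2\,816}{172\,939} \approx 1.63\%, \qquad
\frac{2\,816}{13\,776} \approx 20.4\%
```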

To focus on Europe, the tweets with European locations were divided into four regions:

  • Central and Eastern Europe: ‘Albania’, ‘Bosnia and Herzegovina’, ‘Bulgaria’, ‘Croatia’, ‘Kosovo’, ‘Czechia’, ‘Hungary’, ‘Poland’, ‘Romania’, ‘Slovakia’, ‘Slovenia’, ‘Serbia’, ‘Czech Republic’.
  • Northern Europe: ‘Denmark’, ‘Estonia’, ‘Finland’, ‘Iceland’, ‘Ireland’, ‘Latvia’, ‘Lithuania’, ‘Sweden’, ‘Norway’, ‘United Kingdom’.
  • Southern Europe: ‘Greece’, ‘Cyprus’, ‘Italy’, ‘Spain’, ‘Malta’, ‘Portugal’.
  • Western Europe: ‘Austria’, ‘Belgium’, ‘France’, ‘Germany’, ‘Luxembourg’, ‘Netherlands’, ‘Switzerland’, ‘Monaco’, ‘Liechtenstein’.

Figure 3: Word cloud of users’ locations.
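
The regional split above amounts to a lookup from country to region. A minimal Python sketch using the country lists given above (with ‘Czechia’ and ‘Czech Republic’ both kept as aliases), assuming each tweet's location has already been resolved to a country name; countries outside the four lists are skipped, and the sample input is made up.

```python
from collections import Counter

# Region lists as given in this section.
REGIONS = {
    "Central and Eastern Europe": [
        "Albania", "Bosnia and Herzegovina", "Bulgaria", "Croatia", "Kosovo",
        "Czechia", "Hungary", "Poland", "Romania", "Slovakia", "Slovenia",
        "Serbia", "Czech Republic"],
    "Northern Europe": [
        "Denmark", "Estonia", "Finland", "Iceland", "Ireland", "Latvia",
        "Lithuania", "Sweden", "Norway", "United Kingdom"],
    "Southern Europe": ["Greece", "Cyprus", "Italy", "Spain", "Malta", "Portugal"],
    "Western Europe": [
        "Austria", "Belgium", "France", "Germany", "Luxembourg", "Netherlands",
        "Switzerland", "Monaco", "Liechtenstein"],
}
# Invert to a country -> region lookup.
COUNTRY_TO_REGION = {c: r for r, countries in REGIONS.items() for c in countries}


def count_by_region(countries: list[str]) -> Counter:
    """Count tweets per region, skipping countries outside the four lists."""
    return Counter(COUNTRY_TO_REGION[c] for c in countries if c in COUNTRY_TO_REGION)


print(dict(count_by_region(["United Kingdom", "France", "Ireland", "Greece", "Brazil"])))
```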

Sentiment analysis of 4 European Regions

The analysis of the four regions showed that citizens’ attitudes differ across Europe.

More specifically, users from Northern and Western Europe fuelled more of the debate on the topic: they posted more tweets, and their positive and negative opinions were more evenly balanced.

These results point to an opportunity to focus on demonstrating the benefits of AI tools in these two regions and to maximise a positive-sum approach to the subject.

By contrast, citizens from Southern and Central-Eastern Europe contributed less to the debate, posting significantly fewer tweets, and their opinions differed: Southern Europeans showed a more positive attitude towards the topic, while Central-Eastern Europeans expressed more negative views. These two regions would require basic education on the subject so that citizens understand the existence and the benefits of AI tools in law enforcement and in safeguarding privacy and fundamental rights.
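
The comparison above rests on two quantities per region: the number of tweets and the balance between positive and negative opinions. A minimal Python sketch, assuming each tweet has already been assigned a region and an integer score on the -2 to 2 scale. The report does not state exactly how the gap was measured, so the net-positive percentage used here is an assumption, and the input rows are made up.

```python
from collections import defaultdict


def regional_balance(rows: list[tuple[str, int]]) -> dict[str, dict]:
    """Per region: tweet count and % positive minus % negative (scores -2..2)."""
    scores_by_region = defaultdict(list)
    for region, score in rows:
        scores_by_region[region].append(score)
    result = {}
    for region, scores in scores_by_region.items():
        n = len(scores)
        positive = sum(s > 0 for s in scores) / n
        negative = sum(s < 0 for s in scores) / n
        result[region] = {"tweets": n,
                          "net_positive_pct": round(100 * (positive - negative), 1)}
    return result


# Made-up rows: (region, sentiment score).
rows = [("Northern Europe", 1), ("Northern Europe", -1), ("Northern Europe", 0),
        ("Southern Europe", 2), ("Southern Europe", 0)]
print(regional_balance(rows))
```

A net value near zero, as for the made-up Northern rows, corresponds to a small gap between positive and negative opinions; a large positive value corresponds to a clearly favourable region.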

Figure 4: Bar plot of the number of tweets by location, divided into four European regions.

The diagram shows that the conversation on the topic was much more active in Northern and Western Europe, mainly represented by the United Kingdom, France, Ireland and Germany. More specifically, the graph shows 1,511 tweets from Northern Europe, 825 from Western Europe, 323 from Southern Europe and 157 from Central-Eastern Europe. This points to significant differences in either overall awareness of, or interest in, the topic between the four regions.
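
A quick Python check of these figures: the four regional counts add up to the 2,816 European tweets, and converting them to shares makes the imbalance explicit. Only the counts come from the report; the rest is arithmetic.

```python
# Tweet counts per region, as reported above.
counts = {
    "Northern Europe": 1511,
    "Western Europe": 825,
    "Southern Europe": 323,
    "Central-Eastern Europe": 157,
}
total = sum(counts.values())
shares = {region: round(100 * n / total, 1) for region, n in counts.items()}
north_west = round(100 * (counts["Northern Europe"] + counts["Western Europe"]) / total, 1)

print(total)       # 2816, matching the European subset
print(shares)      # {'Northern Europe': 53.7, 'Western Europe': 29.3, 'Southern Europe': 11.5, 'Central-Eastern Europe': 5.6}
print(north_west)  # 83.0
```

Northern and Western Europe together account for about 83% of the European tweets, while Central-Eastern Europe contributes under 6%.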