Citizen science: Twitter scraping for species occurrence data?

Twitter can answer most questions, but not all of them correctly. Also, there are ethical and privacy questions with using Twitter data in a larger context.

Many packages allow you to interact with the Twitter API and extract location, time, hashtag, content, image and video information. The more important parts are the links between hashtags, content and users. The API also allows publishing, following, retweeting and modifying profile data. Still, a strict policy is enforced that flags (and later blocks) accounts with robotic, mechanical or programmatic publishing habits.
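To make the idea of links between hashtags and users concrete, here is a minimal sketch in Python. It assumes tweets have already been downloaded and reduced to plain dictionaries with hypothetical `user` and `text` keys; it does not call the Twitter API, and the function name `hashtag_user_links` is invented for illustration.

```python
import re
from collections import defaultdict

# Matches a '#' followed by word characters; a simplification of real hashtag rules.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_user_links(tweets):
    """Map each lowercased hashtag to the set of users who posted it."""
    links = defaultdict(set)
    for tweet in tweets:
        for tag in HASHTAG_RE.findall(tweet["text"]):
            links[tag.lower()].add(tweet["user"])
    return dict(links)


# Made-up sample records, not real tweets.
sample = [
    {"user": "alice", "text": "Spotted a #Lynx near the trail #wildlife"},
    {"user": "bob", "text": "Another #lynx sighting today"},
]
links = hashtag_user_links(sample)
```

A mapping like this is the raw material for a hashtag and user network, which can then be handed to a graph library of your choice.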

Sentiment Analysis

Getting the emotional context around specific content or a hashtag, for purposes ranging from marketing to communication, is simple. It can be implemented using the API or purchased directly through consultancy packages.

NC State University developed the Tweet Sentiment Visualization tool, which helps with this, and many guides to automated machine learning sentiment analysis models are available online.

Twitter Sentiment Chart for “GBIF” search

Follow, Like, Retweet Bot

This is a widespread use of the Twitter API for marketing, promotions, and influencing public opinion with diverse intentions. Bots are common, and Twitter, unlike Facebook, seems built for this purpose. The net is inundated with use cases and tutorials for configuring bots and automating this.

Twitter scraping

Extracting location data is also powerful and well documented. However, while marketing use cases are common, applications in natural science seem rare.

While Twitter uses the location data to serve relevant content, everyone is free to build upon the API and use it in social science research or commercial applications. However, I have hardly seen a citizen science project tapping into georeferenced Twitter images as species occurrence data.

Such data will be biased by studies, surveys and the focus of public attention, as well as by proximity to roads and urban areas and by other accessibility constraints such as land ownership and topography.
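One simple way to start describing this kind of spatial bias is to count records per grid cell and see where observations pile up. The sketch below assumes a list of (latitude, longitude) pairs and an arbitrary cell size of 0.5 degrees. The coordinates are made up, and the helper names are hypothetical.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.5):
    """Return the (row, col) index of the grid cell that contains a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def records_per_cell(points, cell_deg=0.5):
    """Count how many (lat, lon) points fall into each grid cell."""
    return Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)


# Made-up points: three close together (e.g. near a city), one remote.
points = [(52.52, 13.40), (52.51, 13.42), (52.60, 13.30), (47.10, 9.90)]
counts = records_per_cell(points)
```

Cells with very high counts hint at accessibility or attention effects rather than true abundance. A correction step, such as keeping only one record per cell, could build on the same counts.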

I would like to see a few packages streamlining the process from species posts on Twitter, maybe even via image classification, to located and timestamped occurrence information, along with bias descriptions and correction proposals. This would also fall largely within GBIF’s data mobilisation scope, right? For example, a Taiwanese initiative extracted social media data for an endemic moth, contributing to GBIF’s occurrence data sets. More examples are still to come.
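As a rough sketch of the last step of such a pipeline, the Python function below turns one geotagged tweet into a record that uses Darwin Core term names. It assumes the tweet is a dictionary shaped like the v1.1 API JSON, where `coordinates` is a GeoJSON point in longitude, latitude order and `created_at` looks like `Wed Oct 10 20:19:24 +0000 2018`. The function name, the `twitter:` identifier prefix and the sample tweet are hypothetical, and the scientific name is assumed to come from a separate identification step.

```python
from datetime import datetime

# Timestamp layout assumed for the 'created_at' field.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_to_occurrence(tweet, scientific_name):
    """Build a Darwin Core style record, or return None if there is no point location."""
    coords = tweet.get("coordinates")
    if not coords or coords.get("type") != "Point":
        return None
    lon, lat = coords["coordinates"]  # GeoJSON order: longitude first
    when = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
    return {
        "occurrenceID": f"twitter:{tweet['id_str']}",  # hypothetical ID scheme
        "basisOfRecord": "HumanObservation",
        "scientificName": scientific_name,
        "eventDate": when.isoformat(),
        "decimalLatitude": lat,
        "decimalLongitude": lon,
        "geodeticDatum": "WGS84",
    }


# Made-up tweet for illustration only.
sample_tweet = {
    "id_str": "1234567890",
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "coordinates": {"type": "Point", "coordinates": [13.40, 52.52]},
}
record = tweet_to_occurrence(sample_tweet, "Lynx lynx")
```

Tweets without an exact point location are skipped here rather than guessed from place names, which keeps the location uncertainty of the resulting records easy to describe.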

Who can point me to existing packages or projects?

Some species seem to get more attention than others. For example, big cats and other attractive mammals rank higher than others in citizen science projects and social media.

Citizen reports currently account for the majority of alien species detections, arising from the massive observation effort that the physical and digital ‘eyes and ears’ of citizens provide, combined with crowd-sourced species identification. As reporting alien species sightings is generally not mandatory, there is interest in whether mining social media data via image recognition and/or natural language processing can improve existing passive citizen surveillance cost-effectively. Examples from Australia illustrate how citizen surveillance for most vertebrate groups is currently effective using existing voluntary reporting mechanisms. If we are to deploy data mining and analysis of social media data to help detect introductions of invasive alien species, we need to conduct it in a manner where it adds value and is trusted.

Source: Do we need to mine social media data to detect exotic vertebrate pest introductions?
