RRoCCET21: SMRF, a Cloud-Based Social Media Research Framework

Abstract:

Clemson University’s Watt Artificial Intelligence program has supported a wide range of research projects involving the collection and analysis of social media data. Despite their diverse application areas, including political science, brand management, and cybersecurity, each of these projects has involved a standard set of steps on the road from concept formation to the generation of research outputs. Though many tools are available for specific components of the social media analytics research process, integrating these tools remains a major challenge for many researchers in the social sciences. To date, the Watt AI program has developed bespoke solutions to meet the needs of specific researchers. This approach does not scale well beyond a small number of engagements, limits the potential for independent activity on the part of domain experts, and increases the burden of long-term project maintenance. We designed the Social Media Research Framework (SMRF) to be a reusable, low-maintenance toolset for automating many of the rote activities encountered in social media analytics research, such as data collection, data labeling, model deployment, and model monitoring. In this presentation, we discuss the challenges of software-intensive research in the social sciences, describe the thinking behind the design of SMRF, and share our researcher-friendly cloud-deployment model using IBM Cloud Code Engine. Finally, we present a case study applying SMRF to the analysis of all Tweets from members of the US House of Representatives.

Case Study Summary:

The scientific problem we tackled:
We would like to better understand the behavior patterns of US political elites on social media and the impact that those behaviors have on others using social media platforms. For example, we would like to know the extent to which the use of polarizing language by politicians teaches other social media users to resort to polarizing language.
The computational methods we used:
Large corpora of social media messages are collected and analyzed on an ongoing basis. The analysis includes topic and sentiment analysis of individual messages using natural language processing (NLP) methods, and graph analysis of the relationships between social media users and messages.
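
As a rough illustration of what analyzing the relationships between users and messages can look like, the sketch below builds a small "mention" graph from message text and counts how many distinct authors mention each account. It is not SMRF code: the function names, the `(author, text)` message format, and the sample accounts are all hypothetical, and it uses only the Python standard library.

```python
import re
from collections import Counter

# Matches @handles in message text (a simplifying assumption about the data).
MENTION = re.compile(r"@(\w+)")


def build_mention_edges(messages):
    """Return a Counter mapping (author, mentioned_user) to a mention count."""
    edges = Counter()
    for author, text in messages:
        source = author.lower()
        for target in MENTION.findall(text):
            target = target.lower()
            if target != source:  # ignore self-mentions
                edges[(source, target)] += 1
    return edges


def in_degree(edges):
    """Count how many distinct authors mention each user."""
    counts = Counter()
    for _source, target in edges:
        counts[target] += 1
    return counts


# Hypothetical sample messages, not real data.
sample = [
    ("rep_a", "Thanks @rep_b and @rep_c for co-sponsoring."),
    ("rep_b", "Glad to work with @rep_a on this bill."),
    ("rep_c", "I disagree with @rep_a here."),
    ("rep_c", "Again, @rep_a is wrong."),
]

edges = build_mention_edges(sample)
print(in_degree(edges).most_common(1))  # [('rep_a', 2)]
```

Edge weights (how often one account mentions another) and in-degree (how many accounts mention a given user) are only two of many possible measures; a real study would likely use a dedicated graph library and richer relationship types such as replies and retweets.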
The cloud resources we used:
We make use of generic cloud-based services like managed SQL databases, cloud object storage, and virtual servers for running applications that continuously collect and analyze data. We use IBM Cloud's serverless utility (IBM Cloud Functions) to run periodic data management jobs. We have also made use of IBM Watson's NLP capabilities as part of our content analysis of social media messages. We currently process around 2,000 messages per day with plans to scale this 10-fold in the near future.

Author Bio:

Dr. Hudson Smith has a Ph.D. in physics from The Ohio State University. His current research focuses on natural language processing techniques for the analysis of large language corpora in the social sciences.

For further information:

http://www.clemson.edu/cecas/departments/ece/faculty_staff/faculty/hsmith.html

RRoCCET21 was a conference held virtually by CloudBank from August 10th through 12th, 2021. Its aim was to inspire you to consider using the cloud in your research by sharing the success stories of others. We hope the proceedings, of which this case study is a part, give you an idea of what is possible and act as a “recipe book” for mapping powerful computational resources onto your own field of inquiry.