Social media data is now widely used by many academic researchers. However, long-term social media data collection projects, which most typically involve collecting data from public-use APIs, often encounter issues when relying on local-area servers (LANs) to collect high-volume streaming social media data over long periods of time. We present a cloud-based data collection, pre-processing, and archiving infrastructure, and argue that this system mitigates or resolves the problems most typically encountered when running social media data collection projects on LANs at minimal cloud-computing costs. We show how this approach works in different cloud computing architectures, and how to adapt the method to collect streaming data from other social media platforms.

For further information:

RRoCCET21 is a conference that was held virtually by CloudBank from August 10th through 12th, 2021. Its intention is to inspire you to consider utilizing the cloud in your research, by way of sharing the success stories of others. We hope the proceedings, of which this case study is a part, give you an idea of what is possible and act as a “recipe book” for mapping powerful computational resources onto your own field of inquiry.