What are (some) people saying about ChatGPT on Twitter?
I’m teaching a graduate-level course on digital research methods this upcoming fall semester, and I decided to update my social media analysis unit, taking into account the latest changes to the Twitter API. When these changes were announced in February, many people were understandably upset by the company’s decision to severely limit the number of calls that can be made with the free version of Twitter’s API. This is particularly hard on academics who rely on Twitter data for their research projects. For instance, researchers who study things like hate speech and misinformation will now have a much tougher time getting access to data for their research.
Although I don’t typically employ large-scale social media analytics in my own research, students are often interested in these methods, and they’re a great way to develop some basic coding skills and an understanding of different text analysis tools. After poking around a bit, I found that I could still get access to some Twitter data through an older Twitter-mining package, rtweet. However, I’m not sure how much longer this package is going to be functional because, according to this blog post from the creator, no new updates will be made to rtweet after July 1. Bummer.
But I didn’t know that when I began messing around with rtweet, so I figured I could at least get a blog post out of it!
My favorite thing about rtweet is its simple integration with the R programming language and the “tidy text” approach to text analysis. Basically, tidytext is an R package that structures large amounts of text data into individual “tokens” (words, sentences, phrases, etc.) that you can then analyze for things like word counts or n-gram correlations (a fancy way of saying “words that tend to appear together”). If you have any interest in computational text analysis, I highly recommend the book Text Mining with R (free through this link). It provides a ton of hands-on exercises with different kinds of texts (literature, scientific documents) to demonstrate the functionality of a tidy text approach.
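To make the idea of “tokens” concrete, here is a minimal sketch using tidytext on two made-up sentences. The data frame and its column names (id, text) are invented for illustration; only unnest_tokens() and count() come from the packages themselves.

```r
library(dplyr)
library(tidytext)

# Two made-up sentences standing in for real documents
docs <- tibble(
  id = 1:2,
  text = c("ChatGPT is writing my essay.",
           "Writing with ChatGPT is strange.")
)

# One row per word; unnest_tokens() lowercases text and strips punctuation by default
words <- docs %>% unnest_tokens(word, text)

# One row per two-word sequence (bigram), a starting point for "words that appear together"
bigrams <- docs %>% unnest_tokens(bigram, text, token = "ngrams", n = 2)

# Word frequencies across both sentences
word_freq <- count(words, word, sort = TRUE)
print(word_freq)
```

Each row of words is a single token tied back to the sentence it came from, which is what makes counting and joining against other tables straightforward.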
For the grad course I’m teaching, I wanted to develop a short workshop-style exercise we could do in class where students use rtweet to search Twitter for a particular term or topic, download the data, clean it up, and then visualize it. After perusing the rtweet documentation, and considering the limitations of the new Twitter API, I decided to employ the search_tweets function, which returns search results on Twitter from “the past 6-9 days.” The function supports Boolean logic (AND/OR): a space between search terms acts as AND, so the query only returns tweets that contain both words, while an explicit OR returns tweets that contain either one. Building on my recent curiosity about generative AI as a writing technology, I decided to find recent tweets that contained both “chatgpt” AND “writing.” Here is what the code looks like.
library(rtweet)
chatgpt_writing <- search_tweets("chatgpt writing", lang = "en", n = Inf, include_rts = FALSE)
Basically, this piece of code pulls English-language tweets containing my search terms (minus retweets) and puts them into a data table named “chatgpt_writing.” There’s additional code you will need to apply to clean up the tweets (remove common words, organize in tidy text format, etc.), but once you do all of that, you will generate a data table with two columns and one row for each word in the data set. Here’s a snapshot of what that data looked like once it was cleaned up in this way (P.S. If you're interested, the full code for this can be found at this GitHub repository):
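For readers who want a feel for the cleanup steps without API access, here is a rough sketch on three made-up tweets. It is not the code from the repository. The tweets data frame is hypothetical, and the text column name (full_text) is an assumption, since rtweet versions differ in how they name it. The two-column result here is word and count, which is one plausible reading of the table described above.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the search_tweets() result.
# Assumption: the tweet text lives in a column called `full_text`.
tweets <- tibble(
  id = 1:3,
  full_text = c(
    "People are writing college essays with ChatGPT! https://example.com/a",
    "My ChatGPT prompt for writing marketing content",
    "Is ChatGPT writing better content than people?"
  )
)

tidy_tweets <- tweets %>%
  mutate(full_text = gsub("https?://\\S+", "", full_text)) %>%  # drop URLs
  unnest_tokens(word, full_text) %>%                            # one word per row
  anti_join(stop_words, by = "word")                            # remove common words

# Two columns: the word and how often it appears
word_counts <- count(tidy_tweets, word, sort = TRUE)
print(word_counts)
```

The stop_words table ships with tidytext, so the anti_join() step needs no extra setup; swapping in a custom list would only require a data frame with a word column.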
Once the data is in this format, you can use some other R packages to visualize it. For this example, I just wanted to look at the top 15 words used across the dataset. Here’s what I found:
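A chart along these lines can be sketched with dplyr and ggplot2. The counts below are placeholder data, not the actual results, and the object names (word_counts, top_words) are assumptions chosen for illustration.

```r
library(dplyr)
library(ggplot2)

# Placeholder counts standing in for a real word-frequency table
word_counts <- tibble(
  word = paste0("word", 1:30),
  n = 30:1
)

# Keep the 15 most frequent words
top_words <- word_counts %>%
  slice_max(n, n = 15, with_ties = FALSE)

# Horizontal bar chart, most frequent word at the top
p <- ggplot(top_words, aes(x = n, y = reorder(word, n))) +
  geom_col() +
  labs(x = "Count", y = NULL, title = "Top 15 words (toy data)")

print(p)
```

Setting with_ties = FALSE keeps the chart at exactly 15 bars even when several words share the same count.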
Seeing words like “content” and “prompt” is not too surprising considering that the use of ChatGPT for things like content marketing and prompt engineering tends to dominate the online discourse about generative AI. “People” stands out as a somewhat interesting term here, considering it’s not really clear how it’s being used in relation to this conversation. Maybe some tweets are talking about AI replacing human writers? Or maybe this is just a stylistic fluke in the way users tweet about AI (“People are writing college essays with ChatGPT!”)?
Regardless, I like this as an in-class exercise because it’s 1) easy to understand, and 2) not particularly helpful. In other words, it’s a good reminder that doing social media analysis in writing and rhetoric fields, particularly with statistical tools, is probably going to leave you with more questions than answers. It can be tempting (even for me) to approach these tools as a kind of magic wand that will give me instant access to the rhetorical dimensions of an online conversation. But as is often the case with research, things are a lot messier in practice. Getting some initial results, especially in a “tidy” format, should be seen as the beginning of your inquiry, not the answer to your question.
Hopefully the rtweet package still works in the fall, but if not, then at least this blog post can serve as a reminder of the fragility of conducting digital research in an ever-evolving media ecosystem.