https://cdn.iflscience.com/images/413a8b3c-eb6e-5008-b02a-95b8a46dbc47/default-1590596918-cover-image.jpg
Neale Cousland/Shutterstock

Heyyyy Duuuuuude: Researchers Study 100 Billion Tweets To Characterize “Stretchable Words”

We’re all guilty of typing a "hahahaha" every now and then (even if we’re not actually laughing) or a "suuuuuuure" (when we’re definitely not sure of something). Now an analysis of about 10 percent of all tweets posted over an eight-year period, around 100 billion in all, has shown that Twitter users are not afraid to include an elongated word or two.

Thousands of these so-called “stretchable words” were identified in tweets by researchers from the University of Vermont, Burlington, in the most comprehensive study to date of their use on social media. Examples ranged from “playyyyyyed” to “booooooobies”, “chelseaaaaaa” to “everrrrrrrrrry.” A rather emphatic “pppppppplleeeaaaassseeeeeee” was also spotted when users were trying to stand out to celebrities they were asking for a follow.

From this sample, the authors of the study, published in PLOS ONE, defined two measures to characterize the words they found. The first, balance, refers to how evenly the repetition is spread across the different letters of a word. For example, “hahahaha” has a high degree of balance, as the “h” and the “a” are repeated equally, whereas “suuuuure” has a lower degree of balance, as the “u” is repeated far more than the other letters.

The second characteristic is stretch, meaning how far a word can be elongated. Shorter words or sounds (such as “ha”) were found to have a higher stretch, as they are easily repeated. Longer words, on the other hand, lack this stretch appeal and often just had their final letter repeated, as in “infinityyyy.”

“We were able to comprehensively collect and count stretched words like 'gooooooaaaalll' and 'hahahaha', and map them across the two dimensions of overall stretchiness and balance of stretch, while developing new tools that will also aid in their continued linguistic study, and in other areas, such as language processing, augmenting dictionaries, improving search engines, analyzing the construction of sequences, and more,” the authors of the study said in a statement.

One area where sequences are of particular importance is genetics. The researchers hope that future work in the field, potentially looking at patterns of mistypings and misspellings, could provide a method to further our grasp of the genetic code.

https://cdn.iflscience.com/images/960f9350-727e-5474-8a4c-0c4f2b281d50/content-1590596542-232640.jpg
"The tree of laughter." Researchers created spelling trees such as this one for "ha" to show the multitude of ways the phrase was stretched. A line to the right represents an "a" and a line to the left represents an "h". Gray et al., 2020