Jun 26, 2015

Making social media research more reliable and reproducible

Social media has become a rich data source for scientists interested in human behavior.

Juergen Pfeffer and Derek Ruths, in a paper on the shortcomings of social media studies, wrote, “Powerful computational resources combined with the availability of massive social media data sets has given rise to a growing body of work that [measures] population structure and human behavior at unprecedented scale.”

This huge amount of data offers detailed information about human attitudes, but according to Pfeffer and Ruths, “mounting evidence suggests that many of the forecasts and analyses being produced misrepresent the real world.” The problems behind this are numerous; most also affect other fields, and they are not easy to overcome.

Population bias is one of the main problems of social media research. In social science, large samples are generally assumed to get around sampling bias: the chance that the group selected for a study is not representative of the population it is meant to describe.

This is far more likely to happen with a group of 20 people than with a group of 20,000.

Because of that, sampling biases are rarely corrected for, and often are not even acknowledged, write Pfeffer and Ruths. However, the people who use each platform can have distinctive characteristics, so even a very large sample drawn from a single platform may not represent the general population.
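
The point can be illustrated with a small simulation. This is only a sketch under invented assumptions, not an analysis from the paper: the population size, the 30% share of people holding some opinion, and the join probabilities (0.6 and 0.2) are all hypothetical numbers chosen for illustration.

```python
import random

def share(sample):
    """Fraction of people in the sample who hold the opinion."""
    return sum(sample) / len(sample)

def run(seed=1):
    rng = random.Random(seed)
    # Hypothetical population: each person holds the opinion (True) with probability 0.30.
    population = [rng.random() < 0.30 for _ in range(200_000)]
    # Assumed platform bias: opinion holders join with probability 0.6, everyone else with 0.2.
    users = [p for p in population if rng.random() < (0.6 if p else 0.2)]
    return {
        "truth": share(population),
        "random_20": share(rng.sample(population, 20)),
        "random_20000": share(rng.sample(population, 20_000)),
        "platform_20000": share(rng.sample(users, 20_000)),
    }

results = run()
for name, value in results.items():
    print(f"{name:>15}: {value:.3f}")
```

Under these assumptions, the random sample of 20,000 lands close to the true share, while the equally large sample of platform users settles on a clearly different figure. A bigger sample reduces random noise, but it does not remove a bias built into who is on the platform.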

A more serious problem is the lack of unrestricted access to data. According to the paper, social media companies use algorithms to sample and filter the data they provide, without disclosing the details of the process. Moreover, replication is also restricted by privacy limitations.

Last but not least, the biggest problem in social media studies is publication bias. Many failed studies are never published, while studies with positive results almost always are. If negative results aren’t published, Pfeffer and Ruths think it’s impossible to tell how much chance is involved in the positive findings. Getting researchers to publish negative results, and getting publishers to accept them into journals, would be a huge step forward in solving this problem.
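
To see why unpublished negative results matter, consider a rough simulation in which no real effect exists at all. Everything here is a hypothetical setup for illustration (the number of studies, the group size of 50, and the rule that only results past a 1.96 threshold get published); it is not a method taken from the paper.

```python
import random
import statistics

def study_is_positive(rng, n=50):
    """One hypothetical study comparing two groups drawn from the SAME distribution."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    diff = statistics.mean(a) - statistics.mean(b)
    se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
    return abs(diff / se) > 1.96  # roughly a two-sided 5% significance threshold

rng = random.Random(0)
outcomes = [study_is_positive(rng) for _ in range(2000)]
published = [o for o in outcomes if o]  # assumed filter: only positive results get published
print(f"studies run: {len(outcomes)}")
print(f"positive by chance: {len(published)} ({len(published) / len(outcomes):.1%})")
```

Every "published" study in this sketch reports an effect, yet none of the effects is real. A reader who sees only the published studies has no way to know how many attempts came up empty, which is exactly the information that negative results would supply.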

All of these problems (publication bias, sampling problems, restricted access to data, and inaccurate self-reporting) affect many fields.

Ultimately, these problems are not impossible to fix. However, it’s essential to address these methodological limitations in order to obtain reliable and consistent results. In the meantime, we probably need to take analyses of social media data with a truckload of salt.

Science, 2014. DOI: 10.1126/science.1257756

Written by Dr. David Alcantara and Paula Ruíz for The All Results Journals.