How survey statistics may be predictable without asking people

Jan Ketil Arnulf

Statistics from surveys on organizational behaviour may be predictable before any real respondents are asked, a new study indicates.

KNOWLEDGE @ BI: The meaning of surveys

The study was conducted by an international team of behavioral and information scientists from BI Norwegian Business School, Leeds School of Business, University of Colorado at Boulder, and University of Malaysia at Sarawak.

The researchers applied computer algorithms to analyze the meaning of the questions asked in a range of commonly used organizational surveys. These covered topics such as leadership, motivation and job satisfaction, as well as personality tests.

The purely linguistic data were then compared with statistics computed from the answers of hundreds, and sometimes thousands, of real respondents to the same questions.

More than 80% explained

The statistics commonly used to describe the survey responses turned out to correspond largely to the information contained in the wording of the items.

In some cases, the linguistic properties of the items explained more than 80% of the variation in the data. These linguistic properties enabled the researchers to accurately predict the relationships among the questions, the so-called item correlation matrix.
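To make "explained more than 80% of the variation" concrete, one way to frame such a claim is this: for every pair of items, compute a similarity score from the wording alone and a correlation from respondents' answers, then ask how much of the variation in the observed correlations a straight-line fit on the similarity scores explains. In a one-predictor linear regression that share, R², equals the squared Pearson correlation between the two lists. The sketch below is a hypothetical illustration, not the study's method: it substitutes crude bag-of-words cosine similarity for the more sophisticated semantic algorithms used in research of this kind, the function names (`bag_of_words`, `semantic_vs_empirical` and so on) are invented for the example, and the items and answers are made-up toy data.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(text: str) -> Counter:
    """Lower-case word counts: a deliberately crude stand-in for a semantic model."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())  # missing words count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pearson(xs, ys) -> float:
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def semantic_vs_empirical(items, responses):
    """Pair each item pair's text similarity with its observed response correlation.

    Returns (pairs, r2): pairs holds (i, j, semantic, empirical) tuples, and r2 is
    the squared correlation between the semantic and empirical columns, i.e. the
    share of variance in item correlations explained by a linear fit on semantics.
    """
    vectors = [bag_of_words(text) for text in items]
    columns = list(zip(*responses))  # one tuple of answers per item
    pairs = [
        (i, j, cosine(vectors[i], vectors[j]), pearson(columns[i], columns[j]))
        for i, j in combinations(range(len(items)), 2)
    ]
    r2 = pearson([p[2] for p in pairs], [p[3] for p in pairs]) ** 2
    return pairs, r2


# Hypothetical survey items, not taken from any real instrument.
items = [
    "My manager inspires me to do my best work.",
    "My manager motivates me to do my best.",
    "I am satisfied with my salary.",
    "I am satisfied with my pay and benefits.",
]
# Invented 1-5 answers from six imaginary respondents (one row per person).
responses = [
    (5, 5, 2, 2),
    (4, 4, 3, 3),
    (2, 2, 4, 5),
    (3, 4, 1, 2),
    (1, 2, 5, 4),
    (4, 5, 2, 1),
]

pairs, r2 = semantic_vs_empirical(items, responses)
for i, j, sem, emp in pairs:
    print(f"items {i}-{j}: semantic={sem:.2f} empirical r={emp:.2f}")
print(f"R^2 = {r2:.2f}")
```

With real surveys one would use many more items and respondents and a far stronger semantic measure. Simple word overlap, for instance, misses that "pay" and "salary" mean nearly the same thing, which is precisely what dedicated semantic algorithms are designed to capture.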

The best-known surveys could all be explained in this way; only the personality test seemed to evade semantic analysis.

Digital language analysis

The study has several interesting implications.

First, it shows how far digital language analysis has advanced as a tool for social science. Technologies for indexing and storing information, used daily on the Internet by most people, can be used to explore and predict measures of behaviour such as survey responses.

More surprisingly, it also shows that relationships in survey data sometimes overlap so strongly with ordinary language use that they appear predictable before the study is even carried out.

Disturbing to researchers

While the findings may be encouraging to people working in linguistics and text analysis, they may be more disturbing to researchers in the field of organizational behaviour (OB).

Research on topics such as leadership and motivation relies heavily on advanced statistical analyses of survey data. If the structures that emerge from surveys can be computed from resources available before any data are collected, then the information collected through surveys may not be what it purports to be. This has previously been argued on theoretical grounds, but the current study is the first to demonstrate it empirically.

The authors are continuing to develop a semantic theory of survey responses (STSR) to explain the findings and their ramifications, and hope to offer a novel perspective on the debate about the nature of survey data.

This research benefited greatly from the support of the U.S. National Science Foundation (grant NSF 0965338) and the National Institutes of Health through the Colorado Clinical & Translational Sciences Institute (grant NIH/CTSI 5 UL1 RR025780).


Arnulf, J. K., Larsen, K. R., Martinsen, Ø. L., & Bong, C. H. (2014). Predicting Survey Responses: How and why semantics shape survey statistics in organizational behavior. PLOS ONE.


Published 4 September 2014
