In conversation with Professor Bertus van Rooy

This is the first in a series of interviews I will be conducting with people who have interesting and relevant things to share about linguistic data collection. I chose Bertus van Rooy, Professor of English Linguistics at the University of Amsterdam, and my colleague, for this inaugural interview, as he was instrumental in developing the initial idea for this project and provided invaluable feedback. The interview below is based on a conversation we had about the importance of addressing the data collection challenges and opportunities we are currently facing.

Before coming to the University of Amsterdam, Professor Van Rooy worked at the North-West University in South Africa as Professor in the School of Languages. He is a member of the editorial boards of, among others, World Englishes, English World-Wide and the International Journal of Learner Corpus Research, and also a former president of the International Association for World Englishes.

Tell me something about the kinds of data that you collected and worked with in the early days. What was that like?

I first worked with speech data collection in the early 90s, for phonetics projects. We had repurposed a lab which was originally designed for language lessons, from the era of the audiolingual method for language teaching, and equipped it for speech recording, which was done on magnetic tapes at the time. Shortly after this, I got into spoken language data collection for sociolinguistic purposes, mainly for student projects. My big ploy at the time was to use a very hefty tape recorder and a very big fixed microphone on a stand that I put in front of the speakers to create an unnatural context, which got people to speak in formal and very self-conscious ways. Then, I would end the interview, switch off the microphone, put away the equipment, and have a casual chat with the person after that, which was the point at which I also switched on the hidden microphone in my jacket pocket. That was an easy way to get really spontaneous speech. People just relaxed after the formal set-up. I don’t think I ever transcribed or did anything with the formal data. I just discarded that and only focused on the surreptitiously recorded parts. At the end of the interview I would ask the person permission to use that part of the recording. At the time, nobody thought that was particularly invasive, so I got away with it and it got me recordings of some really nice data.

How much data were you able to collect for research projects in those days?

The first ones were very small BA-thesis-like projects, and then my first early pieces of research after finishing my BA, while I was doing my Master’s in phonetics. I continued playing with sociolinguistic data a bit, and that carried on after my PhD. At that point, I was teaching third-year sociolinguistics classes, and I’d send the students home over weekends to their families in the rural areas where they’d record conversations around the dinner table. There weren’t mobile phones with recording devices, so just a small pocket recorder that they would put on the table and record dinner table conversations. We had good data from that, typically ten to fifteen conversations would be enough to do an article from, depending on your interest. If you wanted to do syntactic variation, that would typically not be enough, but for features of spoken language interaction it is sufficient. When I migrated from phonetics to syntax, my need for data increased. At the time, our interest was in recording things as naturally as possible and finding all kinds of ways to do that, such as hidden microphones or students having conversations with their own families around the dinner table – those were the things we thought might work.

You raise the issue of linguists’ search for creative methods of ‘tricking’ speakers into speaking naturally. However, post-Corona, a large part of our social conversations is conducted via video-call software. How do you think this would affect what we see as ‘naturally occurring speech’?

That is a difficult question and I think it depends on the kinds of conversations. Something that I do quite a lot these days is to video-call my parents, whereas twenty years ago, if I were not close to them, I would have written them letters. Now, I video-call them, so I keep much more regular contact with them and I don’t feel very strained there, compared to, for instance, when I’m lecturing via Zoom. It’s much more the case of ‘I’m looking at them and they are looking at me’, so I’m not very self-conscious when talking to my parents using video call. But those conversations are different from telephone conversations, because you have the extra visual support and you can move around the house, or I can put the phone next to me when I’m preparing dinner and continue talking to my mother. So, having a conversation for half an hour now becomes just something that I do. These types of conversations are also different from actually having an in-person visit with my parents, because in that case we would probably not talk as much. Now we’ve got half an hour set aside and that has to be filled by talking, as we are not just going to stare at each other. So, it creates a new type of conversation, a new type of social engagement that a sociolinguist should be interested in. And it could be an interest along comparative lines, looking at, for instance, how a video-call conversation differs from a telephone conversation, or from a face-to-face conversation. I think we are in a stage where we probably haven’t grappled with the sociolinguistics of these new types of conversations that are enabled by the new technologies.

Whereas even a couple of years ago, collecting data via Skype, for instance, would not have even been thought of as a viable option for recording naturally occurring speech, we are now in a situation where Skype conversations are naturally occurring speech, as we have been forced to conduct the majority of our conversations through these tools. How can we approach this from a linguistic perspective?

I remember my first appointments came without a computer. I got my first office computer in 1996. Computers were around but it wasn’t an assumed and necessary piece of office equipment. The switch from writing to typing and to typing on a computer and then to email coming in replacing letters were all very gradual steps. When the lockdowns happened, things that had been around for a long time, and perhaps had designated uses, suddenly became much more useful than they were before. Before the lockdown, I was used to Skype and not much else. Everything else I had looked at as a new task of technology acquisition, something that I looked at as an obstruction. And suddenly it got to a point where it took me five minutes to master a new piece of software. So, the software, or the fact that it was an electronic platform, became less of an issue, as this became the best possible way of maintaining contact for either personal or professional purposes.

It would have been wonderful if we had recordings of people hesitantly using these platforms three years ago and people using these platforms now. And it would still be interesting to see what will happen three years from now, if normal face-to-face contact becomes possible again, in principle, for work places. I suspect we will retain a lot of the online stuff because work places with a large number of employees may not require employees to travel to a central point to meet face to face anymore. So, these online media might become the new normal after Covid as well. I think new things are happening, and if we want to do good sociolinguistic research we should probably sample this as much as possible now, and continue to collect data almost like, in corpus-linguistic terms, building monitor corpora of these things, while making sure to keep accurate notes of the time and the conditions and the available choices at the point of data collection. On the other hand, while collecting data by recording conversations conducted via online media might make things easier, I think we need to ask very serious questions about the relationship between this new type of data and older types of data. I don’t think it’s simply a matter of replicating online what happens in face-to-face conversations.

What do you think would be some areas where linguistics can benefit from this type of data? What could we learn about how people use language in these contexts that we couldn’t have learned otherwise?

The first obvious thing that I would think of – but maybe it’s not the best thing, because it’s so obvious – is to look at the ways in which patterns of interaction change. For instance, overlapping speech that is non-problematic in face-to-face conversations becomes problematic in online events. You have distortions, problems in following somebody, and a whole new set of routines for dealing with that. Certain politeness routines needed to do the metalinguistic work of turning over the floor are made explicit all of a sudden in ways that they weren’t before. The way we use minimal responses, or backchannels to keep the conversation going, may also change or decrease in video-call contexts.

In general, I think the nature of interaction also potentially differs. Suddenly, you are communicating with the same person perhaps via telephone, via short messages and via longer emails and then it gets to a point where you realise that it’s taking too long to resolve the issue via text-based communication and you say ‘let’s quickly meet via Zoom or Skype’, so conversations suddenly continue from a different context that was established in a different medium. And these different media and contexts of information integrate seamlessly and become part of the same conversation. If these conversations happen with a lot of shared knowledge between participants, and that shared knowledge is not knowledge that got shared by means of a synchronous conversation that is part of the video or the audio call, new co-texts come into being. I don’t know if we are ready to deal with that yet, as we should be looking at the total set of communicative contact points, rather than isolating speech from text. That’s a new challenge that we haven’t thought of in the past, because we have been sort of medium-based in our data collection and, in the Labovian interview, working through stages of formality or styles; now, we have to cross multiple mediums to get to the same conversation.

This blurring of spaces becomes parallel with the blurring of social relations, perhaps. Our concept of ‘sites of data collection’ was much more discrete. Suddenly, we were forced to use the same tools to keep in touch for professional and private purposes. So, we probably need to think very clearly about what we do when conducting research, but at the same time avoid overthinking in a space where we don’t yet have a good enough understanding of what’s going on. My recommendation would almost be to be opportunistic for the next few months and collect as much data as possible, but be very careful in documenting the extratextual variables, so that, hopefully, over time the patterns will emerge. It’s almost like embarking on an ethnomethodological project.

When we think about early career researchers being confronted with this situation, do you think it would still be possible or viable to find a way to collect data through video-call software that would be ontologically and epistemologically (in terms of the knowledge we can gain from it) similar to data collected by the traditional face-to-face method?

I want to speculate that researchers with experience in the older methods would be more likely to try to tweak the new environment to replicate the old methods. And it’s a sign of old age almost, of being stuck and being married to your method. And if we think back, the traditional method was designed to elicit spontaneous language in a context where the fact of recording it introduced, as Labov called it, the Observer’s Paradox. And it might well be that in this new environment, recording introduces less of an Observer’s Paradox, so replicating the old methods might not be necessary to get to ontologically good data. The epistemological question of getting results that we can situate in the field of discourse that is established in the discipline, which includes, amongst others, comparability to previous work – that is a harder thing to achieve. I almost want to shout to younger researchers to take this opportunity, since it is likely that younger researchers are more skilled at online lives than older researchers. Use that to your advantage and reserve replication of the sociolinguistic interview for a second phase, after you have established a sort of baseline data, then go back to the same participants, and then use the pretence of the linguistic interview about their experiences as a starting point, and take them through the various styles of the Labovian interview. In other words, first get ‘uncontaminated’ data of speech in these new contexts, and then see what were the ‘contaminations’ caused by the method in the past.

How can we deal with data protection regulations at this point? Is there clarity on these issues?

Probably not. If we wait for ethics committees to get that kind of clarity, the coronavirus opportunity might be over, so if you want to go that route, you have to write something to the ethics committee to convince them that you take reasonable precautions, but you cannot anticipate everything at this stage. These reasonable precautions would be ensuring that you protect the privacy of speakers, and a very standard thing is to allow people to delete stuff from their own recordings. You can also guarantee a reasonable degree of anonymity. I think that one has to be bold here and write an application that concedes to the unknown and takes the idea of ethics in research at face value, and that is to enhance the quality of research as much as it protects the dignity of individuals. This is an opportunity for ethics review boards to show that they are also interested in enhancing research quality, and not obstructing research opportunity.