Monday, December 9, 2019

People trust happy-sounding artificial agents more, even in the face of behavioral evidence of untrustworthiness

If your device could smile: People trust happy-sounding artificial agents more. Ilaria Torre, Jeremy Goslin, Laurence White. Computers in Human Behavior, December 9, 2019. https://doi.org/10.1016/j.chb.2019.106215

Highlights
• Smiling can be heard in the voice without any visual cue.
• This ‘smiling voice’ elicits higher trusting behaviors than a neutral one.
• The higher trust persists even when the speaker is untrustworthy.
• This has implications for the design of voice-based artificial agents.

Abstract: While it is clear that artificial agents that are able to express emotions increase trust in Human-Machine Interaction, most studies looking at this effect have concentrated on the expression of emotions through the visual channel, e.g. facial expressions. However, emotions can also be expressed in the vocal channel, yet the relationship between trust and vocally expressive agents has not yet been investigated. We use a game theory paradigm to examine the influence of smiling in the voice on trusting behavior towards a virtual agent, who responds either trustworthily or untrustworthily in an investment game. We found that a smiling voice increases trust, and that this effect persists over time, despite the accumulation of clear evidence regarding the agent’s level of trustworthiness in a negotiated interaction. Smiling voices maintain this benefit even in the face of behavioral evidence of untrustworthiness.

Keywords: Trust; Smiling voice; Virtual agents


5. Discussion

Using an investment game paradigm, we found that positive vocal emotional expression – smiling voice – increases participants’ implicit trust attributions to virtual agents, compared with when agents speak with an emotionally neutral voice. As previously observed, the monetary returns of the agent also affected implicit trust, so that participants invested more money in the agent that behaved generously. Critically, however, there was no interaction between behavior and vocal emotional expression: smiling voice enhanced trust regardless of the explicit behavioral cues that the virtual agent provided to its trustworthiness. The effect of smiling voice in the game, supported by our questionnaire findings, adds to previous studies on emotional expression, showing that the display of a positive emotion increases trust and likeability even in the vocal channel (Scharlemann et al., 2001; Krumhuber et al., 2007; Penton-Voak et al., 2006).

Smiling was a consistent predictor of investments overall: while participants’ investments were primarily driven by the virtual player’s generosity or meanness, they nonetheless invested more money in the smiling agents. This contrasts with the predictions of the EASI model (Van Kleef et al., 2010), according to which the display of a positive emotion in an incongruent context (such as the mean behavior condition) should elicit uncooperative behaviors. While Van Kleef et al. (2010) listed social dilemma tasks based on the Prisoner’s Dilemma among possible competitive situations, it is possible that participants in an iterated investment game view it as an essentially cooperative task.
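For readers unfamiliar with the paradigm, the following is a minimal sketch of one round of an iterated investment game of the kind used here. All parameters (the endowment, the multiplier, and the return fractions for a "generous" and a "mean" agent) are illustrative assumptions for the sketch, not the values used in the experiment.

# Minimal sketch of one round of an iterated investment game (illustrative only).
# The endowment, multiplier, and return fractions below are assumptions for this
# example, not the parameters of the experiment reported above.

ENDOWMENT = 10.0    # investor's starting amount for the round
MULTIPLIER = 3.0    # invested money is multiplied before reaching the agent

# Hypothetical return policies: the "generous" agent returns a large share of the
# multiplied investment, the "mean" agent returns a small but non-zero share.
RETURN_FRACTION = {"generous": 0.5, "mean": 0.1}

def play_round(investment, agent_behavior):
    """Return (investor_payoff, agent_payoff) for one round of the game."""
    multiplied = investment * MULTIPLIER
    returned = multiplied * RETURN_FRACTION[agent_behavior]
    investor_payoff = (ENDOWMENT - investment) + returned
    agent_payoff = multiplied - returned
    return investor_payoff, agent_payoff

if __name__ == "__main__":
    for behavior in ("generous", "mean"):
        investor, agent = play_round(6.0, behavior)
        print(f"{behavior}: investor ends with {investor:.2f}, agent keeps {agent:.2f}")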
Specifically, while typical Prisoner’s Dilemma tasks involve a dichotomous choice (cooperate/defect), in our experiment even the mean agent still returned a (small) amount of money, which might have been seen as a partially cooperative signal by participants. If participants are reluctant to give up on cooperation – as shown by the fact that investments increased in the second half of the game in the mean condition (Fig. 3) – they might be even more reluctant to give up on partners whose positive emotional expression seems to encourage them to cooperate. In Krumhuber et al. (2007), people explicitly and implicitly trusted smiling faces more than neutral faces regardless of the sincerity of the smile, although genuine smiles were trusted more than fake ones. Similarly, Reed et al. (2012) found that people displaying either Duchenne or non-Duchenne smiles were more likely to cooperate in a one-shot investment game. Thus, displaying an emotion, even a feigned one, might be preferred to displaying no emotion at all, hence the increased investments in the mean smiling agents.

Additionally, participants might have felt more positive emotions themselves upon hearing a smiling agent. Emotional expressions can evoke affective reactions in observers (Geday et al., 2003), which may subsequently influence their behavior (Hatfield et al., 1994), and this ‘emotional contagion’ might be transmitted through the auditory channel as well. If this is the case, participants might have trusted the smiling agents more because feeling a positive emotion themselves might have prompted them to behave in a cooperative manner (Schug et al., 2010; Mieth et al., 2016). These results show similarities with Tsankova et al. (2015), who found that people rated trustworthy faces and voices as happier. Although they addressed the issue from the opposite direction – "Are trustworthy stimuli perceived as happier?" rather than "Are happy stimuli perceived as trustworthy?" – taken together, the studies suggest a bidirectionality in the perception of trustworthiness and cues to positive emotion, congruent with a ‘halo effect’ of positive traits (Lau, 1982).

The smiling-voice effect suggests that, in the absence of visual information, the audio equivalent of a Duchenne smile might act as a relatively ‘honest signal’ of cooperation. As mentioned before, Duchenne smiles are smiles expressing genuine happiness or amusement (Ekman & Friesen, 1982). Traditionally, in the visual domain they can be distinguished from other types of smiles because they involve the contraction of the Orbicularis Oculi muscle, a movement that is notoriously difficult to fake (Ekman & Friesen, 1982; Schug et al., 2010). Obviously, in the auditory channel it is not possible to detect a genuine smiling voice from this muscular movement; however, a smiling voice that sounds happy might be the auditory equivalent of a Duchenne smile. As participants indicated that the smiling voices used in this study did sound happy, it is possible that the expression of happiness and amusement in the speech signal led listeners to believe that the agent could be trusted. A limitation of this study is that no video recordings were taken during the audio recordings of the speakers used in this experiment.
This means that, while every effort was made to ensure consistency in smile production, it is possible that our speakers produced different kinds of smiles. As is well known in emotion theory, smiles can convey many different meanings, and several distinct facial expressions of smiling have been described (e.g. Rychlowska et al., 2017; Keltner, 1995). However, much of the research on the effect of different types of smiles on person perception and decision making has concentrated on the difference between polite (non-Duchenne) and genuine (Duchenne) smiles (e.g. Chu et al., 2019; Krumhuber et al., 2007; Reed et al., 2012). Traditionally, these two are characterised by different muscle activation, with non-Duchenne smiles only activating the Zygomaticus Major muscle, and Duchenne smiles also activating the Orbicularis Oculi muscle (Frank et al., 1993). However, recent studies have suggested that Orbicularis Oculi activation in Duchenne smiles might actually be a by-product of the Zygomaticus Major activation (Girard et al., 2019; Krumhuber & Manstead, 2009). Also, the acoustics of smiling are affected only by activation of the Zygomaticus Major, which contributes to vocal tract shape, and not by the Orbicularis Oculi. Thus, even following past research in which Orbicularis Oculi activation is the only feature distinguishing Duchenne from non-Duchenne smiles, we would expect both kinds of smile to sound the same, as the Zygomaticus Major activation would be the same. Still, research on the acoustic characteristics of different types of smiles is lacking.

Drahota et al. (2008) obtained three different smiling expressions – Duchenne smiles, non-Duchenne smiles, and suppressed smiles – as well as a neutral baseline from English speakers, and asked participants to identify these four expressions. Participants were only able to reliably distinguish Duchenne smiles from non-smiles, and the majority of the other smile types were classified as non-smiles. Furthermore, Drahota et al. only performed pairwise comparisons between each smile type and a non-smile; they did not compare identification between two different smile types. Even though the study had only 11 participants, which warrants a much-needed replication, this finding suggests that people might only be able to acoustically discriminate between two categories, smile and non-smile. Similar results were obtained in studies using different types of visual smiles in decision-making tasks: previous work using cooperative games with Duchenne and non-Duchenne (facial) smiles has shown that people made the same decisions regardless of the type of smile (Reed et al., 2012; Krumhuber et al., 2007). This suggests that people might react according to a broad, dichotomous smile category (smile vs. non-smile), even when the smiles in the experimental stimuli were of different qualities. This corroborates previous findings on nonconscious mimicry, whereby facial EMG recordings differed when viewing a face with a Duchenne smile versus a neutral expression, but not when viewing a face with a non-Duchenne smile versus a neutral expression (Surakka & Hietanen, 1998).
This contrasts with Chu et al. (2019), who found that, following a breach of trust, participants cooperated more with a confederate expressing a non-Duchenne smile than with a confederate expressing a Duchenne smile. However, in that study the confederate only showed the smiling expression after the cooperate/defect decision was made, whereas in Reed et al. (2012) and Krumhuber et al. (2007), as well as in the current study, the smiling expression was displayed before the decision was made. As Chu et al. (2019) point out, this factor might have influenced the decisions and could explain the different behaviors. For example, participants might interpret an emotional expression – such as a smile – shown after a decision as an appraisal of that decision. People might put more cognitive effort into understanding this appraisal, as it is essential for shaping future interactions, hence the more accurate discrimination of different smile types. As de Melo et al. (2015, 2013) suggest, a happy expression following the decision to cooperate conveys a different meaning than a happy expression following the decision to defect; this is also consistent with the EASI model (Van Kleef et al., 2010). On the other hand, a happy expression shown before the cooperate/defect decision might rather convey information about the emotional state of the person in question, and might be kept separate from that person’s actual behavior in the game. Also, counterparts’ smiles may lead people to anticipate positive social outcomes (Kringelbach & Rolls, 2003). Thus, it seems that the timing of an emotional expression in relation to the behavior of interest drastically changes the interpretation of that behavior and of future behaviors. It would be interesting to replicate the current experiment with smiling voices presented before and after the action is taken in the game. In such a replication, the speakers’ facial expressions could also be recorded, in order to determine whether different facial expressions correspond to different auditory smiles, both in terms of objective measures (acoustics) and in terms of perception and behavioral correlates in the game.

So far, we have compared our results with previous studies that used facial smiles. These comparisons are necessary because, at the time of writing, there are virtually no studies that have employed trust games with expressive voices. However, emotional expressions are naturally multimodal, and it is possible that an emotion expressed only in the voice might elicit different behaviors than the same emotion expressed only in the face, or in a combined voice and face display. In fact, previous research has suggested that an ‘Emotional McGurk Effect’ might be at play (Fagel, 2006; Mower et al., 2009; Pourtois et al., 2005). Thus, our current results can only inform the design of voice-based artificial agents, and should not be extended to the design of embodied agents.

The results from the questionnaires validate the behavioral measures obtained from the investment game. We found that people consistently gave higher ratings of trustworthiness and liking to the smiling agents, and to the agents that behaved generously in the game. Again, the lack of interaction between smiling and behavior suggests that the smiling voice mitigates negative reactions following untrustworthy behavior. We also found some evidence that individual differences among participants might play a role in trusting behavior, as shown by the 3-way interaction between behavior, game turn, and gender (Section 4.1).
The effect of gender on trusting and trustworthiness has been widely studied using game-theoretic paradigms, but so far there is no definite conclusion on whether women trust more, or are more trustworthy, than men, or vice versa (e.g. Chaudhuri et al., 2013; Bonein & Serra, 2009; Slonim & Guillen, 2010). Our results support previous findings that people tend to trust members of the opposite gender more (Slonim & Guillen, 2010), as men in our experiment invested more money than women in the virtual agents, which had female voices. They also support findings that men trust more than women in general (Chaudhuri & Gangadharan, 2007). However, these conclusions only hold for the generous behavior condition; in the mean condition, men actually trusted the virtual agent less than women did. A similar pattern was previously observed by Haselhuhn et al. (2015), who found that men showed less trust following a trust breach on the trustee’s part. Also, Torre et al. (2018) showed that people who formed a first impression of trustworthiness of a virtual agent punished it when it behaved in an untrustworthy manner, investing less money in it than in an agent that had made a lower first impression of trustworthiness. Thus, a ‘congruency effect’ might be at play here: our male participants might have formed a first impression of trustworthiness of the female agents (Slonim & Guillen, 2010); when this first impression was congruent with the observed behavior (the generous condition), the agent received more monetary investments from the male participants, and when it was incongruent with the observed behavior (the mean condition), it received less (cf. Torre et al., 2018).

Participants’ age did not have an effect on the behavioral results from the investment game, but it did influence participants’ explicit ratings of the artificial agents’ trustworthiness, with older people indicating lower trust. This is consistent with the idea that younger people trust technology more, perhaps due to greater familiarity (e.g. Scopelliti et al., 2005; Giuliani et al., 2005; Czaja & Sharit, 1998). However, we did not match participants’ age – or gender – systematically, so more research is needed on the role of individual differences in trust towards voice-based artificial agents.

Finally, speaker identity was varied randomly rather than wholly systematically in our experimental design, and so we included speaker identity as a random rather than a fixed effect in our analyses.
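As a purely illustrative sketch of this kind of analysis, the snippet below fits a mixed model with smiling and behavior as fixed effects, a random intercept per participant, and speaker identity as a variance component rather than a fixed effect. The column names, the file name, and the use of Python's statsmodels are assumptions for the example, not the authors' actual analysis pipeline; a fully crossed random-effects specification (e.g. in lme4) would be the more standard choice.

# Illustrative sketch: speaker identity as a random rather than fixed effect.
# Column and file names are hypothetical; this is not the authors' analysis code.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("investment_game_trials.csv")  # hypothetical: one row per participant x turn

model = smf.mixedlm(
    "investment ~ C(smiling) * C(behavior)",  # fixed effects and their interaction
    data=df,
    groups=df["participant"],                 # random intercept per participant
    # speaker modelled as a variance component; note that statsmodels nests this
    # within participants, whereas lme4 would allow fully crossed random effects
    vc_formula={"speaker": "0 + C(speaker)"},
)
result = model.fit()
print(result.summary())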
It is possible, indeed likely, that participants’ trust attributions were influenced by the virtual agents’ unique vocal profiles as well as by their behavior and smiling status. In fact, Fig. 7 shows that people invested more money with speaker B2, followed by speakers R1, R2, and B1 (mean overall investments = £5.46, £4.76, £4.11, £3.56, respectively). This is not unexpected: voices carry a wide variety of information about the speaker, such as gender, accent, age, emotional state, and socioeconomic background, and all of this information is implicitly used by listeners to form an initial impression of the speaker; a short exposure to someone’s voice is enough to determine whether that person can be trusted (McAleer et al., 2014). For example, in the free-text comments explaining the liking rating given to each voice, one participant remarked that smiling speaker B2 “varied in tone and was much more interesting to listen to” and that neutral speaker B2 was “calm and convincing”; on the other hand, smiling speaker R2 was “mellow and monotone” and neutral speaker R2 “sounded bored and insincere”. Smiling speaker B1 was “quite annoying”, and the neutral version “didn’t seem trustworthy or reassuring”, “sounded too neutral” and even “too fake”. Thus, when designing a voice for an artificial agent, it is important to keep in mind what effect its specific vocal imprint will have on the user (see also McGinn & Torre, 2019). Nevertheless, any potential between-speaker differences in the current experiment were nested within the effect of smiling voice, as all speakers were recorded in both smiling and neutral conditions.
