I know a dog when I see one: dogs (Canis familiaris) recognize dogs from videos. Paolo Mongillo, Carla Eatherington, Miina Lõoke & Lieta Marinelli. Animal Cognition, Mar 19 2021. https://rd.springer.com/article/10.1007/s10071-021-01470-y
Abstract: Several aspects of dogs’ visual and social cognition have been explored using bi-dimensional representations of other dogs. It remains unclear, however, whether dogs recognize the stimuli depicted in such representations as dogs, especially with regard to videos. To test this, 32 pet dogs took part in a cross-modal violation-of-expectancy experiment, in which they were shown a video of either a dog or of an unfamiliar animal, paired with either the sound of a dog barking or an unfamiliar vocalization. While the stimuli were being presented, dogs paid more attention to the exit region of the presentation area when the visual stimulus represented a dog than when it represented an unfamiliar species. After exposure to the stimuli, dogs’ attention to different parts of the presentation area depended on the specific combination of visual and auditory stimuli. Of relevance, dogs paid less attention to the central part of the presentation area and more to the entrance area after being exposed to the barking and dog video pair than when either stimulus was paired with an unfamiliar one. These results indicate that dogs were surprised by the mismatched pairings, not by the barking and dog pair, and were interested in where the barking and dog pair came from, implying recognition of the two stimuli as belonging to a conspecific. The study represents the first demonstration that dogs can recognize conspecifics in videos.
In this study, we employed a cross-modal, expectancy-violation paradigm to assess whether dogs can recognize conspecifics from videos. Dogs were presented with pairs of auditory and visual stimuli, which could be any combination of dog-related or non-dog-related vocalization and video. Dogs’ orientation towards the presentation area, as a function of the presented pair of stimuli, was analysed during two time intervals, in which different mechanisms were most likely at play.
The first interval spanned from the onset of the vocalization to the last frame in which the video of the animal crossing the screen was visible. Dogs’ orientation in this interval therefore reflected a proximate reaction to the presence of the stimuli, rather than an after-effect of the pairing.
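To make this analysis setup concrete, the sketch below shows one way that frame-by-frame gaze coding could be aggregated into looking times per region of the presentation area and per interval. It is a hypothetical illustration only, not the authors’ actual pipeline: the column names, the 30 fps frame rate, and the use of pandas are all assumptions.

```python
# Hypothetical illustration (not the authors' analysis pipeline): given frame-by-frame
# gaze coding, aggregate each dog's looking time to the entrance, central and exit
# regions, separately for the stimulus-presentation and post-stimulus intervals.
import pandas as pd

# Example coded data: one row per video frame, with assumed column names.
frames = pd.DataFrame({
    "dog_id":   ["A"] * 6,
    "interval": ["presentation"] * 3 + ["post_stimulus"] * 3,
    "region":   ["entrance", "central", "exit", "central", "central", "entrance"],
    "frame_s":  [1 / 30] * 6,  # duration of one frame at an assumed 30 fps
})

# Total looking time (s) per dog, per interval, per region of the presentation area.
looking_time = (
    frames.groupby(["dog_id", "interval", "region"], as_index=False)["frame_s"].sum()
)
print(looking_time)
```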
Dogs spent almost the entire interval oriented toward the projection area. Moreover, dogs’ attention to specific regions of the projection area roughly followed the regions occupied by the stimulus. This finding is most likely a direct result of the capacity of moving stimuli to elicit orientation responses, an effect that is particularly relevant for stimuli abruptly appearing within the visual field (Hillstrom and Yantis 1994) and for stimuli depicting animate entities (Pratt et al. 2010), two features that characterised the visual stimuli presented in this experiment.
A breakdown analysis of dogs’ orientation to the different parts of the projection area revealed that dogs spent more time looking at the exit area when the dog video was projected than when the unfamiliar-species video was projected. Dogs were therefore more likely to visually follow the dog video until it left the presentation area than the unfamiliar-species video. This finding is consistent with the notion that familiarity drives attentional responses to visual stimuli (Christie and Klein 1995). There is some direct evidence that this process also applies to dogs, in particular when they are presented with representations of dogs, such as face photographs (Racca et al. 2010) or biological movement (Eatherington et al. 2019). Overall, the findings support the idea that dogs at least perceived the dog video as a familiar stimulus.
Evidence that dogs did recognise the dog-related stimuli as belonging to a dog, however, comes from the analysis of attention patterns after the stimuli had disappeared. In this time interval, dogs spent less time oriented towards the central part of the presentation area when a bark was followed by the appearance of a dog video than when either of these stimuli was paired with an unfamiliar counterpart. In accordance with the violation-of-expectancy paradigm, longer looking at the main projection area reflected a surprised reaction to the pairing of an unfamiliar-species stimulus with a dog stimulus. Analogous interpretations of longer looking times have been made in studies of dogs (Adachi et al. 2007) and other species, including cats (Takagi et al. 2019), horses (Lampe and Andre 2012; Nakamura et al. 2018), crows (Kondo et al. 2012) and lions (Gilfillan et al. 2016). This result therefore clearly indicates that dogs perceived the appearance of the dog video as an expected consequence of the barking, implying they had appropriately recognized both stimuli as belonging to a dog. Following presentation of the dog stimuli, dogs also spent more time looking at the entrance region of the presentation area than when either dog stimulus was paired with an unfamiliar-species stimulus. No such effect was observed for attention to the exit region. Although the reason for this pattern of results is not immediately clear, we believe it is a further indication that dogs retained the pair of dog stimuli as coherently representing a dog; in this sense, dogs may have been interested in where the animal came from, especially since nothing indicated the presence of such an animal before its sudden appearance. The lack of differences in attention to the exit region, on the other hand, could reflect a relatively low need to monitor an animal that was moving away from the observer.
When both stimuli belonged to an unfamiliar species, the pattern of dogs’ attention to the presentation area was less clear-cut than that observed with dog stimuli. On the one hand, attention to the central part of the presentation area when non-dog stimuli were paired did not differ from that observed when dog stimuli were paired. This similarity in reaction may suggest that dogs considered the appearance of the unfamiliar individual a plausible consequence of the unfamiliar vocalization, much as they considered the appearance of the dog an unsurprising consequence of the bark. Unsurprised reactions to pairs of unfamiliar stimuli in an expectancy-violation test have been reported before (e.g. Adachi et al. 2007). As already discussed for the pair of dog stimuli, the high amount of attention paid to the entrance region could indicate interest in where an unknown (but plausible) type of animal came from. On the other hand, dogs’ attention to the central part of the presentation area after pairs of non-dog stimuli was also not lower than when a dog/non-dog stimulus pair was presented. A possible explanation is that dogs’ attention patterns after exposure to the two unfamiliar stimuli were driven by interest in such novel stimuli, rather than by a violated expectation. Indeed, several studies have reported neophilic reactions in dogs (e.g. Kaulfuß and Mills 2008; Racca et al. 2010). Of particular relevance, as it deals with visual preference, the study by Racca and collaborators (2010) showed that while dogs pay preferential attention to familiar rather than novel images of dogs, the opposite is true for other classes of stimuli, including images of objects or of human faces. Along this line of reasoning, hearing a novel auditory stimulus would have drawn attention to the entrance region, and seeing a novel visual stimulus would have drawn attention to both the entrance and the central region (the latter being predominantly occupied when the stimulus became fully visible).
One question arising from our results is whether dogs showed a different response to the pairing of the bark and dog video merely because they were familiar with both stimuli, without any classification of the stimuli as belonging to a dog. The literature provides some indications that this may not be the case. For instance, Gergely and collaborators (2019) showed that dogs exposed to a conspecific vocalization pay more attention to pictures of dogs than to pictures of humans, a species with which dogs are highly familiar. Moreover, a recent functional neuroimaging study revealed greater activation of visual cortical areas in dogs when exposed to videos of conspecific faces than when exposed to human faces, suggesting the existence of species-specific processing mechanisms (Bunford et al. 2020). Taken together, these findings suggest dogs do possess the ability to visually discriminate dogs from another familiar species. Whether such an ability results from exposure alone or is aided by a predisposition cannot be determined from the results of the present or of other studies in dogs. Findings in humans indicate that experience builds on top of predispositions in determining one’s ability to identify motion features as belonging to a conspecific (reviewed by Hirai and Senju 2020). A thorough understanding of whether and how the same factors affect dogs’ ability to recognize other animals would require further experiments, which are currently ongoing in our laboratory.
Few other studies have attempted to demonstrate dogs’ ability to recognize conspecifics in figurative representations, providing suggestive though not conclusive evidence (Autier-Dérian et al. 2013; Gergely et al. 2019). The present findings differ in important ways from all previous attempts. First, in all other studies the stimuli depicted animal heads, whereas our stimuli represented lateral views of the animal’s whole body. Our findings imply that a detailed frontal view of the head is not a necessary stimulus for dogs to recognize a conspecific, at least if motion information is available. Indeed, a crucial difference between the present and earlier studies is that we presented videos rather than still images, allowing us to incorporate information about movement. Our own laboratory showed that dogs are attracted by the motion of a laterally walking dog (Eatherington et al. 2019), and studies in other species highlight how motion cues alone can be used for the recognition of conspecifics (Jitsumori et al. 1999; Nunes et al. 2020). Thus, the presence of motion information in our experiment may have played a role in allowing dogs to appropriately identify the conspecific’s video. The abovementioned studies indicate that morphology, independently of motion, can also be sufficient on its own for recognition (Jitsumori et al. 1999; Nunes et al. 2020). However, these studies only depicted heads, a stimulus rich in features useful for recognition, even at the level of the individual. Our findings indicate that the more limited morphological detail provided by a lateral, whole-body view, paired with motion information, may be sufficient for dogs to recognize a conspecific.
Finally, cross-modal and expectancy-violation paradigms have been used before in research on dog visual cognition; for instance, similar paradigms have successfully demonstrated dogs’ recognition of humans’ identity or sex (Adachi et al. 2007; Ratcliffe et al. 2014) and their expectations about conspecifics’ body size (Taylor et al. 2011). However, to the best of our knowledge, this method had never been used in dogs with videos, and some methodological considerations seem useful at this stage. First, while videos were projected, dogs spent most of their time oriented towards the presentation area, indicating the stimuli were able to attract the dogs’ attention (at least from a behavioural standpoint), a crucial and often problematic aspect of research on visual cognition. Second, even after the stimulus disappeared, dogs remained oriented towards the presentation area for a substantial portion of the allowed 30 s, suggesting maintained interest in what had been projected. Third, the analysis of dogs’ orientation across subsequent presentations suggests limited habituation through the first two trials, but a significant decrement starting from the third trial. Overall, these results indicate the method is suitable for studying dogs’ spontaneous cross-modal processing of auditory and animated visual stimuli, and that dogs can be given up to two presentations before their attention starts to decline.