Thursday, January 3, 2019

High mutual cooperation rates in rats learning reciprocal altruism: This finding allows us to infer that the learning of reciprocal altruism appeared early in evolution

High mutual cooperation rates in rats learning reciprocal altruism: The role of payoff matrix. Guillermo E. Delmas, Sergio E. Lew, B. Silvano Zanutto. PLOS One, Jan 2 2019. https://doi.org/10.1371/journal.pone.0204837

Abstract: Cooperation is one of the most studied paradigms for the understanding of social interactions. Reciprocal altruism, a special type of cooperation that is taught by means of the iterated prisoner’s dilemma game (iPD), has been shown to emerge in different species with different success rates. When playing iPD against a reciprocal opponent, the largest theoretical long-term reward is delivered when both players cooperate mutually. In this work, we trained rats in iPD against an opponent playing a Tit for Tat strategy, using a payoff matrix with positive and negative reinforcements, that is, food and timeout respectively. We showed for the first time that experimental rats were able to learn reciprocal altruism with a high average cooperation rate, where the most probable state was mutual cooperation (85%). When subjects defected, the most probable behavior was to go back to mutual cooperation. When we modified the matrix by increasing temptation rewards (T) or by increasing cooperation rewards (R), the cooperation rate decreased. In conclusion, we observe that an iPD matrix with large positive rewards promotes cooperation less than one with small rewards, showing that satisfying the relationship among iPD reinforcements was not enough to achieve high mutual cooperation behavior. Therefore, using positive and negative reinforcements and an appropriate contrast between rewards, rats have the cognitive capacity to learn reciprocal altruism. This finding allows us to infer that the learning of reciprocal altruism appeared early in evolution.


---

High mutual cooperation rates in rats learning reciprocal altruism: The role of payoff matrix

  • Guillermo E. Delmas, 
  • Sergio E. Lew, 
  • B. Silvano Zanutto



Introduction


Altruism is a behavior by an individual that may be to its own disadvantage but benefits other individuals. At first sight, Darwin’s theory of natural selection does not explain altruistic behavior. Several theories have been proposed to account for it: kin selection [1], group selection and reciprocal altruism [2], among others. In the theory of reciprocal altruism, the loss an individual incurs by being altruistic is later repaid by the reciprocal partner. Thus, in the long term, being altruistic becomes the most useful strategy. In this regard, Trivers’ theory of reciprocal altruism explains how natural selection favors reciprocal altruism between non-related individuals. Perhaps the most insightful example of such behavior is the one observed among vampire bats, where individuals share blood with others who have previously shared their food [3].

Since 1971, the Iterated Prisoner’s Dilemma (iPD) has been a useful tool to study reciprocal altruism [4]. In the iPD, two players must choose between two possible behaviors: to cooperate or to defect. Rewards and punishments are defined in a 2x2 payoff matrix. When the game is played indefinitely (its iterated version), mutual cooperative behavior is favored. When it is played once, defecting is the best strategy [5]. However, when the game runs indefinitely, evolutionarily stable strategies (ESS) emerge [6,7] and, under certain constraints imposed on the payoff matrix, mutual cooperation appears as the best strategy whenever reciprocity is maintained (Pareto optimum). Among the huge number of reciprocal strategies, Tit for Tat is one of the simplest [8]. It is based on two simple rules: cooperate in the first trial and, in each following trial, do what the other player (the opponent) did in the previous trial.

Among the many reciprocal behaviors, reciprocity and reciprocal altruism have been well documented in several species. Although cooperation is needed to succeed in both, reciprocal altruism adds the possibility of obtaining a reward by defecting against an opponent. Some experiments have tested reciprocal altruism by means of the iPD paradigm in different ways, but the results either showed low levels of cooperation [9] or depended on a treatment that enhanced the preference for cooperation (mutualism matrix) [10–12]. Direct reciprocity, which is established between two individuals, has been observed in monkeys [13–15] and in rats [16–19]. While food quality seemed to have an impact on cooperative behavior, a key factor in obtaining reliable cooperation levels was the opponent’s behavior: individuals tended to be more cooperative with opponents that had cooperated in the past. However, when reciprocal altruism is studied, differences between species come to light. While reciprocal altruism has been demonstrated in monkeys, birds and rats failed to reach high levels of cooperation, even with complex combinations of rewards and punishments in the payoff matrix and treatments to induce preference [9,10,12,20–23]. The reasons why some species do not learn reciprocal altruism remain obscure. A possible explanation is that animals are not able to discriminate low-contrast reward contingencies. Indeed, it has been shown that rats fail to discriminate the amount of reward when the number of reward units is larger than three [24–26]. In this work, we designed an iPD setup to maximize the contrast among reinforcers. The amounts of pellets were chosen to minimize the positive reinforcement earned in each trial and to keep the rats motivated (hungry) [27]. To evaluate whether animals developed an always-cooperate (ALLC) strategy by place preference, they were trained on reversal after they had learned the iPD. We also evaluated reward maximization by studying how the components of the payoff matrix promote or disrupt altruistic behavior.

Materials and methods


Subjects


We used thirty male Long-Evans rats (300–330 g, two months old) provided by the IByME-CONICET, divided between two experiments: eighteen rats in the first (twelve experimental and six opponent) and twelve rats in the second (six experimental and six opponent). Experimental subjects were housed in pairs (to allow social interaction), and opponent rats were housed individually. All rats were food restricted and maintained at 90–95% (experimental subjects) or 80–85% (opponents) of free-feeding body weight, with tap water available ad libitum. The housing room was kept at 22°C ± 2°C on a 12/12 h light/dark cycle (lights on at 9 am). Pre-training was performed in a single standard operant chamber (MED Associates Inc., USA) equipped with two stimulus lights, retractable levers below the lights, and feeders. The chambers were placed inside an anechoic chamber with white noise (flat power spectral density). The iPD experiments were performed in an ad hoc dual chamber equipped with levers, lights and feeders (Fig 1A). The two chambers were connected by windows that allowed the rats to make olfactory and eye contact. The lever height was 80% of the maximum height of the forepaws while rearing [27]. The dual chamber is shown in the supplementary material (see S2 Fig). At the end of the light phase, supplementary food was provided to allow rats to maintain body weight.

Fig 1. High level of cooperation in iPD.
(A) Diagram of the dual operant box and the payoff matrix with positive (blue) and negative (red) reinforcement. The iPD game had four possible states: R (reward), mutual cooperation; P (punishment), mutual defection; T (temptation), in which the subject defected and the opponent cooperated; and S (sucker), in which the subject cooperated and the opponent defected. The opponent’s light was driven so that the opponent performed a Tit for Tat strategy. (B,C) Time course of cooperation and timeout rate along the last 23 game sessions. In the last 5 sessions, the mean ± sem of cooperation was 0.86 ± 0.05 and of timeout was 0.23 ± 0.08. (D) Total reward versus timeout for all animals (the color bar indicates mean cooperation). Each animal was compared with the regression line fit to a simulated population with cooperation level set to 60% (black continuous line). The higher the cooperation level, the larger the total reward and the lower the total timeout. (E) Markov chain diagram showing the probabilities of transition between states (p(c|T−1) = 0.76, p(c|R−1) = 0.85, p(c|S−1) = 0.93, p(c|P−1) = 0.87). Arrows represent transitions, driven by cooperation in blue and by defection in red (arrow thickness is proportional to transition probability). Below, bars show the occupancy ratio once cooperation reached stability. Probabilities were: p(R) = 0.76, p(T) = 0.1, p(P) = 0.04, p(S) = 0.1. Asterisks denote significant differences from multiple comparisons using one-way ANOVA and Bonferroni correction. (F) Evolution of the cooperation rate before and after reversal. Graphs show a moving average over windows of 3 sessions (the mean ± sem after reversal over the last five sessions was 0.87 ± 0.04).

Pre-experimental training


All rats underwent a shaping procedure to learn the response (pressing a lever) that yields reinforcement (pellets). To prevent animals from preferring one lever position over the other, they learned to obtain reward from both sides: the side of the conditioned stimulus was changed after eight trials. All rats learned to press the correct (lit) lever after four sessions. Each rat was trained in 2 sessions per day. Each trial began with an inter-trial interval (ITI) of 5 seconds, followed by the conditioned stimulus (light), which stayed on for 45 seconds or until a lever was pressed. The feeder was lit one second before food was delivered. In their training, opponents learned to press the lever when the light was on. In the task, the side of the active lever was chosen pseudo-randomly (allowing the same side no more than four times). The opponent had to complete a fixed-ratio schedule of up to FR = 5 to get rewards.

Experiment


To study reciprocal altruism in an iterated Prisoner’s Dilemma game (iPD), we used a payoff matrix with positive and negative reinforcements. Positive reinforcements were pellets (Bio-Serv 45 mg Dustless Precision Pellets) and the negative reinforcement was timeout (a fixed delay before the start of a new trial). The payoff of the experimental subject followed the matrix, and the opponent’s payoff was 1 pellet when the correct (lit) lever was pressed. For the opponent, pressing the incorrect lever had no contingency and no pellet was delivered. The trial finished after 45 seconds had elapsed or when the correct lever was pressed. The iPD game has four possible occupancy states, defined by the behaviors of the experimental subject and the opponent: both cooperate (mutual cooperation, R), neither cooperates (mutual defection, P), the experimental subject defects while the opponent cooperates (T), and the experimental subject cooperates while the opponent defects (S). Preference between pellet amounts was previously tested in a discrimination test, which showed that rats prefer 2 pellets to 1 pellet (data not shown). We performed two sessions per day, and each session had 30 trials. Each experimental subject was always trained with the same opponent. Training finished after five consecutive sessions with no change in the cooperation rate. We defined a cooperation (C) lever and a defection (D) lever in the iPD box. A single iPD trial proceeded as follows: (1) the ITI elapsed; (2) the light (CS) was turned on; (3) both rats made their responses, the light was turned off and the reinforcement was delivered according to the payoff matrix; (4) if positive reinforcement was assigned, the feeder’s light was turned on and a second later the reward was delivered. The opponent’s conditioned stimulus (light) was controlled so as to follow a Tit for Tat strategy. The opponent received a pellet after pressing the lever three times (FR = 3, so that it spent enough time in front of the window until the experimental subject chose a lever). If negative reinforcement (timeout) was assigned, the delay started and the opponent got a pellet reward. (5) After either the five seconds of eating time or the timeout had elapsed, a new trial started. In the first experiment the payoff matrix was: 1 pellet for mutual cooperation (PR = 1), 2 pellets when the experimental subject defected and the opponent cooperated (PT = 2), 4 seconds of timeout for mutual defection (PP = 4 seconds), and 8 seconds of timeout when the experimental subject cooperated and the opponent defected (PS = 8 seconds). At the end of this experiment, the four rats with the best cooperation performance were trained in a reversal treatment (see Fig 1F), in which the sides of the C and D levers were interchanged in both the subject and opponent chambers. If animals had developed a place preference, they would not learn the new side in order to maximize reward. In the second experiment we trained six naive experimental rats on a different payoff matrix with greater temptation (PR = 1, PT = 3, PP = 4, PS = 8). After training, we divided the rats into two groups according to their cooperation levels. The group with a high cooperation rate (Treat 2A) was then trained with a payoff matrix with greater temptation in the T state (PR = 1, PT = 5, PP = 4, PS = 8; Treat 3A). The group with a low cooperation rate (Treat 2B) was then trained with a matrix (PR = 2, PT = 3, PP = 4, PS = 8; Treat 3B) that enhances cooperative behavior in comparison with (PR = 1, PT = 3, PP = 4, PS = 8) but has low contrast between positive rewards (see Table 1). All experimental procedures were approved by the ethics committee of the IByME-CONICET and were conducted according to the NIH Guide for Care and Use of Laboratory Animals.
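The trial rules and the first-experiment payoff matrix described above can be sketched as a small simulation. This is an illustrative Python sketch, not the authors' software: the names `PAYOFF`, `state` and `play_session` are hypothetical, and the opponent is idealized as a perfect Tit for Tat player that cooperates on the first trial.

```python
# Subject payoffs as (pellets, timeout in seconds), first-experiment matrix:
# PR = 1 pellet, PT = 2 pellets, PP = 4 s, PS = 8 s.
PAYOFF = {
    "R": (1, 0),  # both cooperate
    "T": (2, 0),  # subject defects, opponent cooperates
    "P": (0, 4),  # both defect
    "S": (0, 8),  # subject cooperates, opponent defects
}


def state(subject, opponent):
    """Map the two choices ("C" or "D") to the state R, T, P or S."""
    if subject == "C":
        return "R" if opponent == "C" else "S"
    return "T" if opponent == "C" else "P"


def play_session(choices, payoff=PAYOFF):
    """Play the subject's choices against an idealized Tit for Tat opponent.

    Returns (list of states, total pellets, total timeout in seconds).
    """
    opponent = "C"  # assumption: Tit for Tat opens with cooperation
    states, pellets, timeout = [], 0, 0
    for choice in choices:
        s = state(choice, opponent)
        food, delay = payoff[s]
        states.append(s)
        pellets += food
        timeout += delay
        opponent = choice  # Tit for Tat copies the subject's last choice
    return states, pellets, timeout
```

For example, `play_session("CCCC")` stays in R for all four trials and returns 4 pellets with no timeout, while `play_session("CDCD")` visits R, T, S, T and returns 5 pellets with 8 seconds of timeout.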

Table 1. Data summary.
Treatment 1: test of high cooperation and reversal. Treatments 2 and 3: effect on cooperation of a change in the payoff matrix. The matrix was changed within the group labeled with the same letter (A or B).

Statistics


All statistical analyses were performed using the statistics libraries of Octave (open source) and MATLAB. We pooled the data from the last five sessions, in which the cooperation rate was stable (to calculate the cooperation rate we counted the number of times a rat chose the cooperation lever per session). We compared individuals’ mean cooperation across treatments using a two-sided Wilcoxon rank-sum test. To test whether the probability of cooperation after each outcome (T, R, P or S) differed from chance (0.5), we performed a chi-square goodness-of-fit test with a Bonferroni-corrected significance level of 0.05/n. To compare the mean rates of the different outcomes for each game, we performed a two-tailed ANOVA. When the result was significant at α = 0.05, post-hoc pairwise comparisons were performed with a Bonferroni-corrected α = 0.0125. An individual’s decision rule can be described by the components of its transition vector and by a Markov chain diagram. The transition vector is made up of the probabilities of cooperation given that the previous trial resulted in state R (reward), p(c|R−1); T (temptation), p(c|T−1); S (sucker), p(c|S−1); or P (punishment), p(c|P−1). If every component of this vector is 0.5, the agent’s decision rule is random. The Markov chain diagram gives a graphic representation of the complete decision-making rule of each rat.
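The transition vector and the state occupancy ratios can be estimated from a recorded sequence of trial outcomes. The sketch below is a minimal Python illustration under stated assumptions: the function names are hypothetical, the input is the sequence of states of a single session (so no transition is counted across a session boundary), and a previous state that never occurs is simply left out of the vector.

```python
from collections import Counter

# States in which the subject pressed the cooperation (C) lever.
COOPERATIVE_STATES = {"R", "S"}


def cooperation_vector(states):
    """Estimate p(c | previous state) for each previous state that occurs."""
    seen, cooperated = Counter(), Counter()
    for previous, current in zip(states, states[1:]):
        seen[previous] += 1
        if current in COOPERATIVE_STATES:
            cooperated[previous] += 1
    return {s: cooperated[s] / seen[s] for s in "TRPS" if seen[s]}


def occupancy(states):
    """Fraction of trials spent in each of the four states."""
    counts = Counter(states)
    return {s: counts[s] / len(states) for s in "TRPS"}
```

A vector whose components are all close to 1 corresponds to the ALLC-like rule described in the text, and one whose components are all close to 0.5 corresponds to random responding.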

Results


We trained twelve rats in iPD against an opponent that played a Tit for Tat strategy. Tit for Tat is based on two simple rules: cooperate in the first trial and, in each following trial, do what the other player did in the previous trial. Fig 1A shows a schema of the choices a subject can make in each trial. When the subject cooperates, it receives one pellet (PR) or an eight-second timeout (PS), depending on whether the opponent cooperated or defected. When the subject defects, it receives 2 pellets (PT) or a four-second timeout (PP), again depending on whether the opponent cooperated or defected. The criterion for cooperation was an established preference for pressing the C lever (cooperation) over the D lever (defection) in more than 60% of the trials for five or more consecutive sessions. Eight out of twelve animals learned to cooperate (cooperation rate 0.86 ± 0.05, mean ± sem), reaching criterion in 30 ± 4 sessions (mean ± sem). Fig 1B shows the mean cooperation levels for those animals during the last twenty-three sessions before reaching criterion. The inset in Fig 1B shows the mean cooperation level of each animal during the last five training sessions. As a consequence of the increase in cooperation levels, the average total timeout per session decreased as training progressed (0.23 ± 0.08, mean ± sem, see Fig 1C).

Because different sequences of lever presses can yield the same amount of reward and/or timeout independently of the cooperation level, we analyzed the relationship between total reward and timeout for each animal in comparison with a simulated population. A regression line was fit to a population of 100,000 simulated individuals with cooperation level set to 60% (see Fig 1D). Each simulated individual had a different strategy, and each strategy was a combination of thirty C and D choices (the session length). An individual that plays an iPD game with 60% of its choices in C will lie near the line, regardless of its strategy. As can be seen in the figure, for the cooperator group, the higher the cooperation level, the larger the total reward and the lower the total timeout. For the non-cooperator group, located on the opposite side of the figure, both cooperation and reward were low and timeout was high. The regression line at 60% cooperation separates the two groups (marked with a red circle in Fig 1D). This shows that no behavior with a low level of cooperation (subgroup in the blue range) can obtain both a high level of reward and a small amount of timeout, as the cooperative group does. The average strategy of each group can be represented by a Markov model diagram. We built one Markov model for the group of cooperative animals (see Fig 1E) by averaging the state occupancy rates and transition probabilities within the group. In the iPD there are four possible occupancy states, defined by the behaviors of the experimental subject and the opponent: R (both cooperate, or mutual cooperation), P (neither cooperates, or mutual defection), T (the experimental subject defects while the opponent cooperates), and S (the experimental subject cooperates while the opponent defects). In the cooperative group, permanence in the R state was high and, whenever an animal defected (states T and P), it returned to cooperation immediately. Indeed, all conditional probabilities of cooperating given a previous outcome were near 1. Moreover, the rate of the R state was the highest, and those of the other states were near zero. The probability of the R state was significantly different from that of the other states (p < 1e−8, two-way ANOVA, n = 8). In contrast, in the group of non-cooperative animals no state was significantly different from the others (p > 0.05, F = 0.353, two-way ANOVA, n = 4), and the probability of cooperating given a previous state showed no preference for any defined strategy (see Table 1, conditional probability to cooperate). The Markov model for the group of non-cooperative animals is shown in S1 Fig (supplementary materials).
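The simulated reference population can be approximated with a short script. The following Python sketch makes several assumptions that the text does not confirm: each simulated individual is a random ordering of exactly 18 C and 12 D choices (60% of 30 trials), the opponent is an idealized Tit for Tat player, the first-experiment matrix is used, and the line is an ordinary least-squares fit of total pellets on total timeout. All function names are hypothetical.

```python
import random

# Subject payoffs as (pellets, timeout in seconds), first-experiment matrix.
PAYOFF = {"R": (1, 0), "T": (2, 0), "P": (0, 4), "S": (0, 8)}


def session_totals(choices):
    """Total (pellets, timeout) for one session against Tit for Tat."""
    opponent, pellets, timeout = "C", 0, 0
    for choice in choices:
        if choice == "C":
            s = "R" if opponent == "C" else "S"
        else:
            s = "T" if opponent == "C" else "P"
        pellets += PAYOFF[s][0]
        timeout += PAYOFF[s][1]
        opponent = choice  # Tit for Tat copies the subject's last choice
    return pellets, timeout


def simulate_population(n=100_000, trials=30, coop_rate=0.6, seed=0):
    """Random orderings of a fixed number of C and D choices (assumed design)."""
    rng = random.Random(seed)
    n_coop = round(trials * coop_rate)
    choices = ["C"] * n_coop + ["D"] * (trials - n_coop)
    population = []
    for _ in range(n):
        rng.shuffle(choices)
        population.append(session_totals(choices))
    return population


def fit_line(population):
    """Least-squares line pellets = slope * timeout + intercept."""
    n = len(population)
    mean_t = sum(t for _, t in population) / n
    mean_p = sum(p for p, _ in population) / n
    sxx = sum((t - mean_t) ** 2 for _, t in population)
    sxy = sum((t - mean_t) * (p - mean_p) for p, t in population)
    slope = sxy / sxx
    return slope, mean_p - slope * mean_t
```

Each animal's observed (pellets, timeout) pair can then be compared with the fitted line, which is the kind of comparison Fig 1D is described as showing.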

To rule out the possibility that animals had a preference for one of the levers and that, in consequence, their behavior was biased independently of the training paradigm, we selected the four best cooperators and applied a reversal procedure immediately after the cooperation criterion was reached. All animals learned to cooperate after reversal (cooperation rate 0.87 ± 0.04, mean ± sem; see Fig 1F).

We then asked how the ratio between the amounts of positive reinforcement in the R and T states affects the learning and maintenance of cooperation. We defined a contrast index (CI) that measures the relationship between the amounts of reward in R and T (its formula is given in the original article). In the experiment shown in Fig 1 (PR = 1, PT = 2), the CI was at the maximum contrast level allowed by a payoff matrix that favors cooperation, that is, 2PR > PT + PS, assuming that S becomes a negative stimulus induced by timeout. We trained six animals with a payoff matrix (PR = 1, PT = 3, PP = 4, PS = 8) and found that three animals learned to cooperate (0.88 ± 0.01, mean ± sem, see Fig 2A), while the others did not (0.64 ± 0.13, mean ± sem, see Fig 2B). This last group was non-cooperative, since both its conditional probabilities of cooperating and its R-state occupancy ratios were near chance (for details see Table 1). We then changed the amount of reward in order to increase CI in the cooperative group and decrease it in the non-cooperative group. A high value of CI, corresponding to the payoff matrix (PR = 1, PT = 5, PP = 4, PS = 8), disrupted cooperation in the cooperative group (Fig 2A): cooperation was 0.604 ± 0.102 (mean ± sem), whereas before it was 0.88 ± 0.01. A lower value of CI, corresponding to the matrix (PR = 2, PT = 3, PP = 4, PS = 8), strengthened cooperation in two out of three animals of the non-cooperative group: the cooperation rate was 0.711 ± 0.04 (mean ± sem), whereas before it was 0.64 ± 0.13 (see Table 1).

Fig 2. Effect of changes in the amount of positive reinforcement of R and T.
(A) The rats were pre-trained with the payoff matrix [PR = 1, PT = 3, PP = 4, PS = 8] (filled dots). Cooperation was strongly affected by the change in temptation payoff, decreasing when the T payoff increased and the matrix was changed to [R = 1, T = 5, P = 4, S = 8] (open circles). There was a significant difference (red circle) in two animals, with p < 9.8e−06 (Wilcoxon rank-sum test), while the other did not modify its behavior in spite of the matrix change. (B) Cooperation was enhanced when the matrix was changed to [R = 2, T = 3, P = 4, S = 8] (open circles); the difference was statistically significant (p < 0.0062) in two of three subjects, and one showed no significant difference after the matrix change, p > 0.05 (cooperation: 0.7063). (C) 3D plots relating cooperation, reward and timeout. In the group of cooperative animals (filled dots), the change in T (3 pellets to 5 pellets) increased both timeout and reward while decreasing cooperation (open circles). The comparison between the mean cooperation of both groups was significantly different, p < 0.05. (D) In the group of non-cooperative animals (filled dots), they learned to cooperate (open circles) by receiving more reward without significant changes in total timeout. The cooperation was significantly different, p > 0.05. (E,F) Mean state occupancy rates (last five sessions) for the cooperative (left) and non-cooperative (right) groups (mean ± sem). Asterisks denote significant differences, after the matrix change, in T, R, P or S state occupancy, and the dashed line indicates the level of equal rate in each state (which corresponds to a strategy with a strongly random component). Before changes (filled dots) and after changes (open circles).

We analyzed how these changes in strategy affect the amounts of reward and timeout received. In the group of cooperative animals, the change in T (3 pellets to 5 pellets) increased timeout and, only slightly, reward, as expected when states T, P and S become more probable. The state occupancy ratios before and after the matrix change differed significantly for all states, p < 0.05 (Wilcoxon rank-sum test; see Fig 2C and 2E). It is worth noting, however, that the amount of reward received was not the maximum possible, which would be delivered to an animal that alternates between states T and S indefinitely. On the other hand, when we applied a matrix with a lower contrast to the group of non-cooperative animals, they significantly enhanced their cooperation level, receiving more reward without significant changes in total timeout (see Fig 2D). Fig 2F shows the state occupancy probabilities for this group before and after the change in the payoff matrix. The occupancy ratio of R increased significantly after the change, and there was a significant difference in the R and P states (pR < 0.008 and pP < 0.048, Wilcoxon rank-sum test). We showed that when the contrast index increased with a matrix that favors cooperation the animals learned to cooperate, but when the index increased and the matrix favored defection the animals stopped cooperating.

From the results shown in Figs 1 and 2, it is reasonable to ask whether fine tuning of the reward contrast encourages cooperative behavior. We have shown that eight out of twelve animals (66%) acquired cooperative behavior under the CI of the first matrix (PR = 1, PT = 2), while three out of six (50%) succeeded under the higher CI of the second matrix (PR = 1, PT = 3), as expected when the temptation payoff increases. Along the same line of reasoning, animals that had learned to cooperate under the second matrix disrupted their cooperative behavior when CI was increased (PR = 1, PT = 5), while those that had not learned acquired cooperative behavior when CI was decreased (PR = 2, PT = 3). Fig 3A exemplifies the occupancy and transition probabilities for an animal that disrupted its cooperative behavior when CI was increased. The opposite can be seen in the example of Fig 3B: a non-cooperative animal became cooperative when CI was decreased. Fig 3C and 3D show cooperation levels and normalized rewards. The normalized reward was calculated as the quotient between the total reward obtained in a session and the maximum reward achievable with the best strategy. If the opponent plays a Tit for Tat strategy, the best strategy depends on the payoff matrix values: if the matrix favors cooperation, ALLC is the best one; in contrast, if the payoff matrix does not favor cooperation, alternating between C and D is the best strategy. It can be seen that both variables follow an inverted-U profile as a function of the contrast index CI, as expected when a delicate balance between the rewards at R and T is mandatory.
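The normalization described above needs the maximum number of pellets obtainable in a session against Tit for Tat. One way to compute it without fixing the strategy in advance is backward induction over the opponent's next move; the Python sketch below does this. It is an illustration only: the function names are hypothetical, the opponent is an idealized Tit for Tat player, and the exact maximum includes a defection on the very last trial, so it may differ by a pellet or so from the ALLC or alternation totals the authors used as reference.

```python
def max_pellets(pr, pt, trials=30):
    """Maximum pellets obtainable in `trials` trials against Tit for Tat.

    best[o] holds the best total over the remaining trials when the
    opponent is about to play o. S and P give timeout only (0 pellets).
    """
    best = {"C": 0, "D": 0}
    for _ in range(trials):
        best = {
            # Opponent cooperates: cooperate (R, opponent stays C)
            # or defect (T, opponent switches to D).
            "C": max(pr + best["C"], pt + best["D"]),
            # Opponent defects: cooperate (S) or defect (P), no pellets.
            "D": max(best["C"], best["D"]),
        }
    return best["C"]  # Tit for Tat opens with cooperation


def normalized_reward(total_pellets, pr, pt, trials=30):
    """Session reward divided by the maximum obtainable reward."""
    return total_pellets / max_pellets(pr, pt, trials)
```

With 30 trials this gives 31 pellets for (PR = 1, PT = 2), 46 for (PR = 1, PT = 3) and 61 for (PR = 2, PT = 3). The 46 equals one R followed by T and S in alternation, and the 61 equals cooperating until a final defection, which matches the text's distinction between matrices that favor alternation and matrices that favor ALLC.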

Fig 3. Markov chain diagrams and contrast index.
Markov chain diagrams are shown (the size of each circle is proportional to the state occupancy rate and the width of each arrow is proportional to the probability of cooperating given the state). (A) Occupancy state and transition probabilities for an animal that disrupted its cooperative behavior when the contrast index was increased and the payoff matrix [PT, PR, PP, PS] was changed from [3p, 1p, 4s, 8s] to [5p, 1p, 4s, 8s] (p = pellets and s = seconds). The blue arrows (conditional probabilities of cooperation) become thinner after the change (for values see Table 1). (B) The opposite situation: a non-cooperative animal becomes more cooperative when the contrast index is decreased in a matrix that favors cooperation. The blue arrows become thicker after the change (for values see Table 1). (C,D) Cooperation and timeout levels as a function of CI. Both variables follow an inverted-U profile as the contrast index increases, depending on whether or not the payoff matrix favors cooperative behavior.

Discussion and conclusion


In this work, we studied the role of contrast between reinforcements in the learning of reciprocal altruism in rats. Traditionally, reciprocal altruism is elicited by playing the iterated prisoner’s dilemma game (iPD), in which an experimental subject is confronted with a reciprocal opponent. The payoff matrix we used has positive and negative reinforcements, with high contrast between the positive and negative pairs, and uses discriminable amounts of reinforcement [25,26]. In our experiment, pellets were used as positive reinforcement and timeout as negative reinforcement. In this way, the positive and negative reinforcements strengthened the likelihood of mutual cooperation [28]. Our results show for the first time high levels of cooperation (86.11%) and mutual cooperation (76.32%) in iPD (see Fig 1B). Previously published works that taught reciprocity using the iPD game showed that animals prefer short-term benefits or reach only a poor level of cooperation [4,9,20,29,30]. In other works, the authors employed a special treatment to enhance the preference for cooperation [10,23,31,32]. A possible explanation is that, with standard matrices (for example: PT = 6, PR = 4, PP = 1, PS = 0), animals were not able to discriminate between the amounts of reinforcement obtained in the long term and in the short term [24]. For example, if a rat played four trials as [C C C C] it would get 16 pellets, and if it played [C D D D] it would get 12 pellets. In our experiment, rats making the same choices earn 4 pellets and no timeout in the first case, and 3 pellets plus a 16 seconds timeout in the second case.

A dynamic system can be represented with a Markov diagram and its associated state transition vector. In this case, each state (T, R, P, S; see Results section) has two associated conditional probabilities: to cooperate or not to cooperate given that state. In an iPD game against an opponent using a Tit for Tat strategy, a rational player should maximize the positive reinforcement and cancel the negative reinforcement. Thus, as long as the opponent behaves reciprocally, the player follows an ALLC strategy with a conditional cooperation probability near 1, independent of the previous state (T, R, P or S). For a payoff matrix with additive values (for example PT = 6, PR = 4, PP = 1, PS = 0), it is possible to calculate the cooperative strategy through mathematical analysis [33,34], but in our experiment the positive and negative reinforcers have different units (pellets and time, respectively). For this reason, we did a single analysis using the Markov chain diagram. In the first experiment, we found that animals adopted two well-defined strategies: a group of 8 animals learned a cooperative strategy, while the other 4 animals responded at random (see S1B Fig, Supporting information). The strategy of the first group (see Fig 1E) shows that the conditional probabilities of cooperation given previous state T, R, S or P were near 1 (0.760, 0.845, 0.929 and 0.870, respectively), so that after defecting the animals immediately returned to the mutual cooperation state, R. In various works, results were presented with Markov diagrams and their associated transition vectors [10,11,23,32], showing that the conditional probabilities of cooperation were not high when facing a reciprocal opponent. In this protocol, with the matrix (PT = 2, PR = 1, PP = 4 s, PS = 8 s), there are two theoretical strategies that maximize appetitive reinforcement: one is the ALLC strategy and the other is alternating between cooperation (C) and defection (D). The latter also maximizes positive reinforcement, but it increases negative reinforcement (timeout) as well. ALLC is therefore the only strategy that both maximizes positive reinforcement and minimizes negative reinforcement (Pareto optimum). Since the negative reinforcement is timeout, the ALLC strategy gives more food per unit of time, which is where the role of the negative reinforcement becomes apparent.
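The claim that ALLC yields more food per unit of time than alternation can be checked with simple arithmetic. The Python sketch below is a rough illustration under assumptions not stated for this purpose in the text: every trial costs the 5-second ITI plus either the 5 seconds of eating time (rewarded trials) or the timeout (punished trials), and response latency is ignored. The function name and parameters are hypothetical.

```python
# Subject payoffs as (pellets, timeout in seconds): PT = 2, PR = 1, PP = 4 s, PS = 8 s.
PAYOFF = {"R": (1, 0), "T": (2, 0), "P": (0, 4), "S": (0, 8)}


def pellets_per_minute(cycle, payoff=PAYOFF, iti=5.0, eating=5.0):
    """Long-run food rate for a strategy that repeats the given state cycle.

    `cycle` is the repeating sequence of states, e.g. "R" for ALLC or
    "TS" for strict alternation between defection and cooperation.
    """
    pellets = sum(payoff[s][0] for s in cycle)
    seconds = sum(
        iti + (eating if payoff[s][0] > 0 else payoff[s][1]) for s in cycle
    )
    return 60.0 * pellets / seconds


allc_rate = pellets_per_minute("R")          # 1 pellet every 10 s
alternating_rate = pellets_per_minute("TS")  # 2 pellets every 23 s
```

Under these assumptions ALLC gives 6 pellets per minute and alternation about 5.2, although both strategies earn one pellet per trial on average. The gap comes entirely from the 8-second timeout in state S.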

To evaluate whether animals developed the ALLC strategy through place preference or through reward maximization, they were trained on reversal after they had learned the iPD (see Fig 1F). We observed that animals relearned reciprocal altruism when exposed to the new lever contingency.

Finally, after animals adopted a strategy, we evaluated if a change in the payoff matrix could modify their behavior. Therefore, we studied the effect of modifying positive reinforcements (see Fig 2A and 2B). Animals were pre-trained with a payoff matrix where alternating between C and D strategy gives more positive reinforcements than with an ALLC strategy, keeping the same negative reinforcement as in the first experiment. We observed that only half of the animals learned to cooperate although all of them obtained the same mean amount reward (pellet) (see Fig 2C and 2D). The cooperative group was trained with a matrix where the pay-off T was increased (Fig 2A), then we observed that cooperative behavior decreased. Animals reduced frequency of R state and increased frequency of P state, proving that they preferred a small-immediate option instead of a large-delayed option. This behavior is similar to the one observed in birds ([30]). In the second group, we applied a matrix that keeps the proportions of reinforcements in T and R similar to the most common matrix (PT = 3pPR = 2p equal proportion to PT = 6pPR = 4). It was observed that animals modified their behavior and became more cooperative (Fig 2B). These results show that rats that learned to cooperate with an appropriate matrix stop cooperating when a temptation payoff (T) is sufficiently increased (matrix with high contrast index). However, if non-cooperative animals are trained with a matrix that favors cooperation (matrix with low contrast index), they become cooperators. In the latter case, the achieved cooperation level was comparable to results shared in diverse bibliography. We observe that if an iPD matrix uses large positive reward, it improves less cooperation than one with small rewards, shown that satisfying the relationship among iPD reinforcement was not enough to achieve high mutual cooperation behavior. 
Reciprocal altruism in humans, monkeys and elephants has been studied in laboratories, showing high levels of cooperation [13, 15, 35, 37]; in rats and birds, however, those levels of cooperation were much lower. Our results show that, with positive and negative reinforcements and an appropriate contrast between rewards, rats have the cognitive capacity to learn reciprocal altruism. This finding allows us to infer that the learning of reciprocal altruism appeared early in evolution.
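
To make the payoff-matrix reasoning concrete, here is a minimal sketch of an iterated prisoner's dilemma played against a Tit for Tat opponent. It is not the authors' procedure: the numeric payoffs are hypothetical utilities in arbitrary units (the study used food pellets and timeouts, which are not directly commensurable), and the names `PAYOFF`, `is_ipd_matrix` and `mean_payoff_vs_tft` are invented for illustration. It checks the usual iPD conditions (T > R > P > S and 2R > T + S) and compares the long-run mean payoff of ALLC, ALLD and strict C/D alternation.

```python
from typing import Callable, Dict, Tuple

Matrix = Dict[Tuple[str, str], float]

# Hypothetical payoffs in arbitrary units; NOT the values used in the study.
# Keys are (subject_move, opponent_move); "C" = cooperate, "D" = defect.
PAYOFF: Matrix = {
    ("C", "C"): 3.0,  # R: reward for mutual cooperation
    ("D", "C"): 5.0,  # T: temptation to defect
    ("D", "D"): 1.0,  # P: punishment for mutual defection
    ("C", "D"): 0.0,  # S: sucker's payoff
}


def is_ipd_matrix(payoff: Matrix) -> bool:
    """Check the standard iPD conditions: T > R > P > S and 2R > T + S."""
    t = payoff[("D", "C")]
    r = payoff[("C", "C")]
    p = payoff[("D", "D")]
    s = payoff[("C", "D")]
    return t > r > p > s and 2 * r > t + s


def mean_payoff_vs_tft(strategy: Callable[[int], str],
                       payoff: Matrix,
                       trials: int = 1000) -> float:
    """Mean per-trial payoff of `strategy` against a Tit for Tat opponent."""
    opponent = "C"  # Tit for Tat opens by cooperating
    total = 0.0
    for trial in range(trials):
        move = strategy(trial)
        total += payoff[(move, opponent)]
        opponent = move  # Tit for Tat repeats the subject's previous move
    return total / trials


def allc(trial: int) -> str:
    return "C"


def alld(trial: int) -> str:
    return "D"


def alternate(trial: int) -> str:
    return "C" if trial % 2 == 0 else "D"


if __name__ == "__main__":
    for name, strategy in [("ALLC", allc), ("ALLD", alld), ("alternate", alternate)]:
        print(name, mean_payoff_vs_tft(strategy, PAYOFF))
```

With these assumed values, ALLC earns the highest long-run mean because 2R > T + S. Raising T far enough (for example to 8 in the same units) breaks that condition and makes alternation pay more than steady cooperation, which mirrors the kind of matrix the excerpt describes as giving more positive reinforcement for alternating between C and D.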

Supporting information in the original article

Acknowledgments

Supported by PICT 2012-1519 and PICT-2016-2145. We wish to thank Lic. Melanie Marino.

References in the original article


Wednesday, January 2, 2019

There is a mostly negative correlation between patient income & medical spending within almost all countries; medical spending in all countries is concentrated in a small share of the population

Medical Spending around the Developed World. Eric French, Elaine Kelly. Fiscal Studies, vol. 37, no. 3–4, pp. 327–344 (2016) 0143-5671, https://www.ifs.org.uk/publications/8751

Abstract: We bring together estimates of patterns of medical spending in all nine countries considered in this issue – Canada, Denmark, England, France, Germany, Japan, the Netherlands, Taiwan and the United States. Comparing estimates across countries reveals three principal findings. First, medical spending in the calendar year of death accounts for 5–10 per cent of aggregate medical spending for the whole population and 9–20 per cent for those aged 65 and over. Spending in Taiwan is a little higher, at 16 per cent for the whole population and 29 per cent for the over-65s. Second, there is a mostly negative correlation between patient income and medical spending within all countries, except Japan and Taiwan for the over-65s and Taiwan and the US for the under-25s. Third, medical spending in all countries is concentrated in a small share of the population and is persistent over time, although the degree of concentration and persistence varies across countries.

Learning and memory are thought to be supported by experience-dependent neuronal plasticity; found mechanism of postsynaptic localization of AMPA-type glutamate receptors & their regulation

Mechanisms of postsynaptic localization of AMPA-type glutamate receptors and their regulation during long-term potentiation. Olivia R. Buonarati et al. Sci. Signal. Jan 01 2019: Vol. 12, Issue 562, eaar6889. http://stke.sciencemag.org/content/12/562/eaar6889

Gloss: Learning and memory are thought to be supported by experience-dependent neuronal plasticity, which on a cellular level is expressed as long-term changes (such as potentiation or depression) of synaptic responses. Glutamate-gated ion channels known as AMPA receptors mediate basal neurotransmission. Their postsynaptic functional availability can be selectively modulated in correlation with a given stimulus. This review discusses the molecular basis of AMPA receptor trafficking to and anchoring at excitatory postsynaptic sites and their regulation by protein kinases.

Abstract: L-Glutamate is the main excitatory neurotransmitter in the brain, with postsynaptic responses to its release predominantly mediated by AMPA-type glutamate receptors (AMPARs). A critical component of synaptic plasticity involves changes in the number of responding postsynaptic receptors, which are dynamically recruited to and anchored at postsynaptic sites. Emerging findings continue to shed new light on molecular mechanisms that mediate AMPAR postsynaptic trafficking and localization. Accordingly, unconventional secretory trafficking of AMPARs occurs in dendrites, from the endoplasmic reticulum (ER) through the ER-Golgi intermediary compartment directly to recycling endosomes, independent of the Golgi apparatus. Upon exocytosis, AMPARs diffuse in the plasma membrane to reach the postsynaptic site, where they are trapped to contribute to transmission. This trapping occurs through a combination of both intracellular interactions, such as TARP (transmembrane AMPAR regulatory protein) binding to α-actinin–stabilized PSD-95, and extracellular interactions through the receptor amino-terminal domain. These anchoring mechanisms may facilitate precise receptor positioning with respect to glutamate release sites to enable efficient synaptic transmission.

Spread of Deposit Insurance Since the 1970s: Greater deposit insurance generosity produces greater lending & a greater proportion of mortgage loans, which are not offset by declines in banking system leverage

Calomiris, Charles W. and Chen, Sophia, The Spread of Deposit Insurance and the Global Rise in Bank Asset Risk Since the 1970s (December 6, 2018). https://ssrn.com/abstract=3297294

Abstract: We construct a new measure of deposit insurance generosity for many countries, empirically model the exogenous international influences on the adoption and generosity of deposit insurance and show the causal chain from the expansion of deposit insurance generosity to increased overall lending and mortgage loans, and more severe and frequent banking crises. Greater deposit insurance generosity produces greater lending and a greater proportion of mortgage loans, which are not offset by declines in banking system leverage. Increased overall lending and mortgage loans also produce a positive association between deposit insurance and the likelihood and severity of banking crises.

Keywords: deposit insurance, mortgage lending, banking crises, moral hazard
JEL Classification: G01, G18, G21, G28, F55, F65, E32

---
Same authors' summary in Cato Institute: https://www.cato.org/publications/research-briefs-economic-policy/spread-deposit-insurance-global-rise-bank-asset-risk

For the past three decades, a vast amount of literature has developed on the adoption and expansion of deposit insurance and its role in increasing the systemic insolvency risk of banking systems. This literature has shown that the installation of deposit insurance or an expansion of its generosity tends to be associated with higher asset risk, higher leverage, and a greater probability of a banking crisis, suggesting that the rise of deposit insurance may be one of the contributors to the pandemic of unprecedentedly frequent and severe banking crises around the world.

Lifestyle and neurocognition in older adults with cognitive impairments: Aerobic exercise promotes improved executive functioning in adults at risk for cognitive decline

Lifestyle and neurocognition in older adults with cognitive impairments: A randomized trial. James A. Blumenthal, Patrick J. Smith, Stephanie Mabe, Alan Hinderliter, Pao-Hwa Lin, Lawrence Liao, Kathleen A. Welsh-Bohmer, Jeffrey N. Browndyke, William E. Kraus, P. Murali Doraiswamy, James R. Burke, Andrew Sherwood. Neurology, https://doi.org/10.1212/WNL.0000000000006784

Abstract
Objective To determine the independent and additive effects of aerobic exercise (AE) and the Dietary Approaches to Stop Hypertension (DASH) diet on executive functioning in adults with cognitive impairments with no dementia (CIND) and risk factors for cardiovascular disease (CVD).

Methods A 2-by-2 factorial (exercise/no exercise and DASH diet/no DASH diet) randomized clinical trial was conducted in 160 sedentary men and women (age >55 years) with CIND and CVD risk factors. Participants were randomly assigned to 6 months of AE, DASH diet nutritional counseling, a combination of both AE and DASH, or health education (HE). The primary endpoint was a prespecified composite measure of executive function; secondary outcomes included measures of language/verbal fluency, memory, and ratings on the modified Clinical Dementia Rating Scale.

Results Participants who engaged in AE (d = 0.32, p = 0.046) but not those who consumed the DASH diet (d = 0.30, p = 0.059) demonstrated significant improvements in the executive function domain. The largest improvements were observed for participants randomized to the combined AE and DASH diet group (d = 0.40, p = 0.012) compared to those receiving HE. Greater aerobic fitness (b = 2.3, p = 0.049), reduced CVD risk (b = 2.6, p = 0.042), and reduced sodium intake (b = 0.18, p = 0.024) were associated with improvements in executive function. There were no significant improvements in the memory or language/verbal fluency domains.

Conclusions These preliminary findings show that AE promotes improved executive functioning in adults at risk for cognitive decline.

Dishonest behavior depends on both situational factors, such as reward magnitude and externalities, and personal factors, such as the participant’s gender and age

Gerlach, P., Teodorescu, K., & Hertwig, R. (2019). The truth about lies: A meta-analysis on dishonest behavior. Psychological Bulletin, 145(1), 1-44. http://dx.doi.org/10.1037/bul0000174

Abstract: Over the past decade, a large and growing body of experimental research has analyzed dishonest behavior. Yet the findings as to when people engage in (dis)honest behavior are to some extent unclear and even contradictory. A systematic analysis of the factors associated with dishonest behavior thus seems desirable. This meta-analysis reviews four of the most widely used experimental paradigms: sender–receiver games, die-roll tasks, coin-flip tasks, and matrix tasks. We integrate data from 565 experiments (totaling N = 44,050 choices) to address many of the ongoing debates on who behaves dishonestly and under what circumstances. Our findings show that dishonest behavior depends on both situational factors, such as reward magnitude and externalities, and personal factors, such as the participant’s gender and age. Further, laboratory studies are associated with more dishonesty than field studies, and the use of deception in experiments is associated with less dishonesty. To some extent, the different experimental paradigms come to different conclusions. For example, a comparable percentage of people lie in die-roll and matrix tasks, but in die-roll tasks liars lie to a considerably greater degree. We also find substantial evidence for publication bias in almost all measures of dishonest behavior. Future research on dishonesty would benefit from more representative participant pools and from clarifying why the different experimental paradigms yield different conclusions.

Barplots with a restricted y-axis led to a gross underestimation of similarities (i.e., a gross overestimation of the differences); the presentation of similarities achieves more balanced scientific communication

Hanel, P. H. P., Maio, G. R., & Manstead, A. S. R. (2018). A new way to look at the data: Similarities between groups of people are large and important. Journal of Personality and Social Psychology, http://dx.doi.org/10.1037/pspi0000154

Abstract: Most published research focuses on describing differences, while neglecting similarities that are arguably at least as interesting and important. In Study 1, we modified and extended prior procedures for describing similarities and demonstrate the importance of this exercise by examining similarities between groups on 22 social variables (e.g., moral attitudes, human values, and trust) within 6 commonly used social categories: gender, age, education, income, nation of residence, and religious denomination (N = 86,272). On average, the amount of similarity between 2 groups (e.g., high vs. low educated or different countries) was greater than 90%. Even large effect sizes revealed more similarities than differences between groups. Studies 2–5 demonstrated the importance of presenting information about similarity in research reports. Compared with the typical presentation of differences (e.g., barplots with confidence intervals), similarity information led to more accurate lay perceptions and to more positive attitudes toward an outgroup. Barplots with a restricted y-axis led to a gross underestimation of similarities (i.e., a gross overestimation of the differences), and information about similarities was rated as more comprehensible. Overall, the presentation of similarity information achieves more balanced scientific communication and may help address the file drawer problem.

The increase in attractiveness is not because an individual looks more friendly or likable in a group; the same happens even if the group is made up only of identical photographs of that person

Carragher, Daniel. Cheerleaders make fools of our first impressions [online]. Australasian Science, Vol. 39, No. 4, Jul/Aug 2018: 26-27. Availability: https://search.informit.com.au/documentSummary;dn=056612671498998;res=IELAPA

Abstract: The "cheerleader effect" - the observation that people appear more attractive when they are in a group - reveals some quirks about how the brain processes complicated visual information.

---
[the increase in attractiveness is not because an individual looks more friendly or likable in a group ... the same effect occurs even if the group is made up only of identical photographs of the same person]