Tuesday, May 17, 2022

From 2019... Teams outperformed individuals in making accurate geopolitical predictions

From 2019... What Makes Foreign Policy Teams Tick: Explaining Variation in Group Performance at Geopolitical Forecasting. Michael Horowitz, Brandon M. Stewart, Dustin Tingley, Michael Bishop, Laura Resnick Samotin, Margaret Roberts, Welton Chang, Barbara Mellers, and Philip Tetlock. The Journal of Politics, Vol. 81, No. 4, October 2019. https://www.journals.uchicago.edu/doi/abs/10.1086/704437

Abstract: When do groups—be they countries, administrations, or other organizations—more or less accurately understand the world around them and assess political choices? Some argue that group decision-making processes often fail due to biases induced by groupthink. Others argue that groups, by aggregating knowledge, are better at analyzing the foreign policy world. To advance knowledge about the intersection of politics and group decision making, this paper draws on evidence from a multiyear geopolitical forecasting tournament with thousands of participants sponsored by the US government. We find that teams outperformed individuals in making accurate geopolitical predictions, with regression discontinuity analysis demonstrating specific teamwork effects. Moreover, structural topic models show that more cooperative teams outperformed less cooperative teams. These results demonstrate that information sharing through groups, cultivating reasoning to hedge against cognitive biases, and ensuring all perspectives are heard can lead to greater success for groups at forecasting and understanding politics.

5 What Kinds of Teams Succeed? Modelling Team Communication

To test hypotheses 2 and 3, which concern what explains variation in the ability of groups to forecast, we focus on the content of forecast explanations. In particular, we examine explanations given by individuals in the team conditions. By understanding how different kinds of teams (trained teams, untrained teams, and top teams) use explanations, we can begin unpacking what makes teams more or less effective. We find several patterns in the content of explanations that help to explain top team success. When making their predictions, participants, whether in the individual or team condition, could also choose to provide an explanation for their forecast. A comment box appeared underneath the place where individuals entered their forecasts, and participants were encouraged to leave a comment that included an explanation for their forecast. For participants in an individual experimental condition, only the researchers would see those explanations. For participants in a team experimental condition, however, their teammates could see their explanations. These explanations therefore provide potentially useful information for identifying what leads to forecasting accuracy, giving us a way to test hypotheses 2 and 3.

5.1 The Conversational Norms of Successful Geopolitical Forecasting Groups

An obvious starting point is to ask whether, on average, individuals differ in how extensively they offered explanations (i.e., how many comments they made per IFP, or individual forecasting problem) and how intensively (i.e., how long those comments were). Both metrics give us a sense of forecaster engagement, since those who explain their predictions are likely more engaged than those who do not. We contrast behavior by whether a forecaster was on a team, whether that team received training, and whether it was a top team. Below, we move from the extent of engagement to its intensity, when it occurs.
To calculate the degree of extensive engagement, we first counted, for each individual, the number of explanations they made on each IFP for which they made at least one explanation. Then, for each individual, we calculated their average number of comments per IFP, averaging over all of the forecasting questions they answered. Thus, for any person we know the average number of explanations they give for a prediction task.
Figure 3 plots the resulting distribution of this value for each group (individuals, untrained teams, trained teams, and top teams). The x-axis is on a base-10 log scale because the distribution is heavily skewed; the log transformation reduces the visual influence of extreme outliers. Each group is presented as a separate density plot, with the height of the curve giving a relative estimate of how many observations fall at a particular value of the x-axis. We observe that both individuals and untrained teams have relatively low average numbers of responses per IFP. Trained teams, and particularly top teams, have considerably higher average responses per IFP.
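As a rough illustration of how this extensive-engagement measure could be computed, here is a minimal Python sketch. The table layout, column names, and file name are assumptions made for illustration, not the authors' actual pipeline, and the log-scale histogram at the end is only a stand-in for the density plots in Figure 3.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical layout: one row per explanation, with columns user_id, ifp_id,
# condition ("individual", "untrained team", "trained team", "top team"),
# text, and timestamp. File name and column names are illustrative only.
comments = pd.read_csv("explanations.csv")

# Explanations per forecaster on each IFP they commented on at least once.
per_ifp = (
    comments.groupby(["user_id", "condition", "ifp_id"])
    .size()
    .rename("n_explanations")
    .reset_index()
)

# Average over the IFPs each forecaster explained: the "extensive" measure.
extensive = (
    per_ifp.groupby(["user_id", "condition"])["n_explanations"]
    .mean()
    .rename("avg_explanations_per_ifp")
    .reset_index()
)

# Stand-in for Figure 3: step histograms on a base-10 log scale, which tempers
# the heavy right skew of the raw averages.
fig, ax = plt.subplots()
for cond, grp in extensive.groupby("condition"):
    ax.hist(np.log10(grp["avg_explanations_per_ifp"]),
            bins=30, density=True, histtype="step", label=cond)
ax.set_xlabel("log10(average explanations per IFP)")
ax.set_ylabel("relative density")
ax.legend()
plt.show()
```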
Next, we calculate how intensively individuals engage with explaining their predictions. For each individual, we calculated the median length of their first explanation on an IFP. We use the first explanation for two reasons. First, as seen in Figure 3, individuals who were not on a team, or who were on untrained teams, rarely made more than one explanation per IFP. Second, we are most interested in individuals providing information and analysis to others on their team, and someone's first explanation is an important first step in doing this. Figure 4 shows the distribution for the four conditions. Individuals on top teams are clearly engaging in more intensive explanation than individuals in other conditions.
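A companion sketch for the intensive-engagement measure, continuing from the hypothetical table above. It treats "length" as a word count and uses a timestamp to pick out the first explanation on each IFP; both choices are assumptions, since the excerpt does not specify them.

```python
# Continuing from the sketch above: the "intensive" measure.
comments["text"] = comments["text"].fillna("")
comments = comments.sort_values("timestamp")

# First explanation each forecaster left on each IFP.
first_per_ifp = comments.groupby(
    ["user_id", "condition", "ifp_id"], as_index=False
).first()

# Length of that first explanation (word count here), then the median across
# each forecaster's IFPs -- the quantity plotted in Figure 4.
first_per_ifp["first_len"] = first_per_ifp["text"].str.split().str.len()
intensive = (
    first_per_ifp.groupby(["user_id", "condition"])["first_len"]
    .median()
    .rename("median_first_explanation_len")
    .reset_index()
)
```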
Next, we combine Figures 3 and 4 and plot each individual's extensive engagement against their intensive engagement in Figure 5. Here we separate the plots by group and overlay a contour plot to give a sense of the distribution of the data in this space. As expected, top teams tend to have more individuals who engage both more extensively per IFP and more intensively. By contrast, while people not on teams would occasionally provide multiple explanations per IFP, most did not. Teams with and without training had individuals who provided lengthier explanations, but these teams lacked individuals who both supplied multiple responses to an IFP and began their engagement with a lengthy explanation (which could then be read by other participants on their team).
We also examined other metrics of intensive engagement. Figure 6 plots the fraction of total words in explanations that came after the first response. The plot shows that, for individuals, a low proportion of total words came after the very first explanation. Teams did better, with more intensive engagement after the first explanation by trained teams and top teams.
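The Figure 6 quantity could be approximated along the same lines. The sketch below computes, for each forecaster/IFP pair, the share of explanation words appearing after that forecaster's first explanation; the exact aggregation level used in the paper is not restated in this excerpt, so this is one plausible reading.

```python
# Continuing from the sketches above: share of words after the first explanation.
comments["n_words"] = comments["text"].str.split().str.len()

def frac_after_first(group):
    total = group["n_words"].sum()
    return (total - group["n_words"].iloc[0]) / total if total else 0.0

# One score per forecaster/IFP pair; rows are already sorted by timestamp,
# so .iloc[0] is the first explanation on that IFP.
after_first = (
    comments.groupby(["user_id", "condition", "ifp_id"])
    .apply(frac_after_first)
    .rename("frac_words_after_first")
    .reset_index()
)
```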
Figure 7 investigates the degree to which explanations were generated by a single member of a team versus a broader discussion among multiple participants. To measure this, for each IFP and each team we calculated the total number of explanations made by the most prolific responder and divided it by the average number of responses within the team on that IFP, generating a score for each team/IFP combination. Figure 7 plots the distribution of these scores by condition. It shows a distinct pattern for one particular type of team: top teams. Prolific posters on top teams posted about four times as much as the team average, but on non-top teams the relative contribution of the most prolific posters was significantly higher. Essentially, in non-top teams a single person often completely dominated the conversation, while top teams featured broader conversations among more team members.
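Finally, a sketch of the Figure 7 "prolific responder" score, again on the assumed table (here also requiring a hypothetical team_id column): the most prolific member's explanation count on an IFP divided by the team's average count on that IFP.

```python
# Continuing from the sketches above; also assumes a team_id column.
# Explanations each member left on each IFP.
per_member = (
    comments.groupby(["team_id", "condition", "ifp_id", "user_id"])
    .size()
    .rename("n")
    .reset_index()
)

# Most prolific member's count divided by the team's average count on that IFP.
# A ratio near 1 means contributions were spread evenly across the team; a
# large ratio means one member dominated the conversation on that IFP.
dominance = (
    per_member.groupby(["team_id", "condition", "ifp_id"])["n"]
    .agg(lambda s: s.max() / s.mean())
    .rename("prolific_ratio")
    .reset_index()
)

print(dominance.groupby("condition")["prolific_ratio"].describe())
```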
