What’s One More Study?

Skye Winters
Oct 5

This week, I researched how game researchers have investigated reputation systems, then designed an expansion of those systems that could be used for my gossip system. Finally, I discuss developing the study designs for Gaydar, the Believability Scale, and Project Companion.

Research Through Design

This week, Rae and I continued developing our prototype for Project: Companion. On my side, the following tasks were completed:

Creation of a choice selection system
Creation of a companion speaking system
Research into additional works related to Gossip / Social / Reputation systems
Ideation of a Relationship-Based Gossip Spreading System
Creation of a first-draft study design for evaluating our system

The choice selection system is a simple implementation of classic branching narrative. However, it also features an auto-play option, in which the valid branch with the most requirements is selected (with ties broken randomly). The reason for this system is to give the player more ways to exert control over the situation and to feel like they are actually experiencing the events of the game.
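The auto-play rule can be sketched roughly as below. This is a minimal illustration, not our actual implementation: the `Branch` class, the idea of requirements as functions over a game-state dictionary, and the `auto_play_choice` name are all assumptions made for the example.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Branch:
    """A hypothetical dialogue branch guarded by zero or more requirements."""
    label: str
    # Each requirement is a function that takes the game state and returns a bool.
    requirements: list = field(default_factory=list)


def valid_branches(branches, state):
    """Keep only the branches whose requirements are all met."""
    return [b for b in branches if all(req(state) for req in b.requirements)]


def auto_play_choice(branches, state, rng=random):
    """Pick the valid branch with the most requirements; break ties randomly."""
    candidates = valid_branches(branches, state)
    if not candidates:
        return None
    most = max(len(b.requirements) for b in candidates)
    tied = [b for b in candidates if len(b.requirements) == most]
    return rng.choice(tied)


# Made-up game state and branches for illustration.
state = {"met_sage": True, "trust": 3}
branches = [
    Branch("Say hello"),
    Branch("Ask about Sage", [lambda s: s["met_sage"]]),
    Branch("Share a secret", [lambda s: s["met_sage"], lambda s: s["trust"] >= 5]),
]
print(auto_play_choice(branches, state).label)  # Ask about Sage
```

Here "Share a secret" is invalid because trust is too low, so the most specific remaining branch wins. Counting requirements is a cheap stand-in for "most specific to the current situation".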

For the companion speaking system, we decided that a useful feature for our narrative would be letting the player speak with their companion any time they are not already in a conversation. So we created a system, based on our previous work in Soul Purpose, that lets the player begin a dialogue with their companion at any point by pressing a button.

Findings from Literature Review

This week, I read through four more papers related to our topic area of NPC Gossip Systems. My notes are as follows:

Brown, Lee, and Kraev (2021) discuss their creation of a methodology for tracking reputation systems in games. They describe how the relationships and information sharing between NPCs can be represented through social network graphs, and how those graphs can then be used to handle the spread of information. To determine what information is shared between NPCs, they propose using the following metrics:

How much an NPC likes / dislikes the same types of things as another NPC
What the current mood of the speaker NPC is

Each node of gossip would then contain information on when that knowledge was obtained, what topics the knowledge relates to, and how important the knowledge is to the reputation calculation. Additionally, some knowledge can be forgotten, while other knowledge is never forgotten.
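To make that node description concrete for myself, here is a rough sketch of what such a record might look like. The paper is theoretical, so everything here is my own assumption: the field names, the `prune_forgotten` helper, and the simple age cutoff used for forgetting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeNode:
    """Hypothetical record for one piece of knowledge an NPC holds."""
    description: str
    acquired_at: int          # game tick when the NPC learned this
    topics: frozenset         # related topics, e.g. {"spiders"}
    importance: float         # weight in the reputation calculation
    forgettable: bool = True  # some knowledge is never forgotten


def prune_forgotten(nodes, now, max_age):
    """Drop forgettable nodes older than max_age; keep everything else.

    The age-based cutoff is an assumption; the source only says that
    some knowledge can be forgotten and some cannot.
    """
    return [n for n in nodes if not n.forgettable or now - n.acquired_at <= max_age]


memory = [
    KnowledgeNode("player squashed a spider", 0, frozenset({"spiders"}), 0.2),
    KnowledgeNode("player saved the village", 0, frozenset({"village"}), 0.9,
                  forgettable=False),
]
remaining = prune_forgotten(memory, now=100, max_age=50)
print([n.description for n in remaining])  # ['player saved the village']
```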

Overall, while this is a nice and simple solution, it still leaves several gaps that our solution is trying to address. First, it lacks nuance about what a piece of info means (for example, their system treats all info related to spiders the same, regardless of whether it is pro- or anti-spider). Additionally, they have no system in place to limit the amount of information sharing that can occur, nor one that accounts for what the relationship is between characters. Finally, they don’t have an implementation of the system beyond theory.

Then, in Brown and Qu (2015), the authors provide a call to action for why these systems are beneficial, citing how systems like the morality system in BioShock can be detrimental to a game's enjoyment and can conflict with the narrative. Yet they also note that games face two unique challenges with such a system:

Potentially locking players out of critical information / game segments
Tackling how to account for fast travel and players trying to “out race” the spread of information

Finally, do Couto et al. (2016) and then Carneiro et al. (2019) present a reputation system for Catan that focuses on trust and reputation. In their system, as the number of positive interactions increases, an NPC's willingness to work with the player should also increase, and as the number of negative interactions increases, the NPC should become more hesitant to trust the player. Through their system, they aim to make NPCs more engaging to play against by adding a social strategy element to games beyond simply doing whatever it takes to win.

However, across all three of these solutions, very limited evidence is provided to demonstrate the effectiveness of these reputation systems in enhancing a player's experience. The only evaluation of such a system was by Carneiro et al. (2019), which was problematic due to its experimental design and limited sample (n=6).

Relationship-Based Gossip Spreading System

For our relationship-based gossip spreading system, the following brainstorming led us to our result:

Assumptions being made:

That an NPC will share any knowledge with any other NPC
- Solution: Create a method of tracking relationship levels
That an NPC will share all knowledge with other NPCs when given the chance
- Solution: Create a system that will detect gossip shared during a conversation
  - Potential problem: What gossip is shared if a player is not present during it?

What if we made a system in which each NPC has a set of different gossip values they know where:

Gossip Value:

Information
Source
Target
Juiciness
- This is how much the information is desired to be shared
- The level of friendship gives a boost to this value such that (enemy = 2, acquaintance = 1, friend = 0, best friend = -1)
Secretness
- This is how much the person giving the information doesn’t want to share the information
- The level of friendship with the target of the info gives this a boost such that (enemy = 2, acquaintance = 0, friend = -1, best friend = -2)

Then, to determine what information is shared, you find every piece of gossip with Juiciness > Secretness, take the X juiciest pieces from that set, and share them. Secretness is based on the lowest friendship score in the group. Each piece of gossip has an initial starting value for both scores.

For example, take the gossip information:

Gossip Info:

Information: HasACrushOnPeriwinkle
Type: Negative
Source: Sage
Target: Sage
Juiciness: 8
Secretness: 9

So if Sage is talking with their best friend, Forest, about this, then they will share the information, since the secretness would be only 7 (9 - 2) for that conversation. Whereas if Lavender, their enemy, were in the conversation, Sage would not share the info, since the secretness would be 11 (9 + 2). Additionally, if Lavender were to find out about the info, then the gossip info would look like

Gossip Info:

Information: HasACrushOnPeriwinkle
Type: Negative
Source: Forest
Target: Sage
Juiciness: 8 (+2)
Secretness: 9

This would make it very likely that Lavender would go around sharing it with nearly everyone.
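The sharing rule and the Sage example above can be sketched in code as follows. This is only a brainstorming sketch under a few assumptions of mine: the juiciness boost comes from the speaker's relationship with the gossip's Target (with no boost for gossip about oneself), the secretness boost comes from the least friendly listener in the conversation, and the `relationships` dictionary, the function names, and the `limit` parameter (the "X" above) are hypothetical. The Type field is ignored here.

```python
from dataclasses import dataclass

# Base modifiers from the brainstorm, keyed by friendship level.
JUICINESS_MOD = {"enemy": 2, "acquaintance": 1, "friend": 0, "best friend": -1}
SECRETNESS_MOD = {"enemy": 2, "acquaintance": 0, "friend": -1, "best friend": -2}
# Ordered from least to most friendly, used to find the "lowest" friendship.
FRIENDSHIP_ORDER = ["enemy", "acquaintance", "friend", "best friend"]


@dataclass
class Gossip:
    information: str
    source: str
    target: str
    juiciness: int
    secretness: int


def effective_juiciness(gossip, speaker, relationships):
    """Base juiciness plus a boost from how the speaker feels about the target."""
    if speaker == gossip.target:
        return gossip.juiciness  # assumption: no boost for gossip about yourself
    return gossip.juiciness + JUICINESS_MOD[relationships[(speaker, gossip.target)]]


def effective_secretness(gossip, speaker, listeners, relationships):
    """Base secretness plus a boost from the least friendly listener."""
    levels = [relationships[(speaker, name)] for name in listeners]
    lowest = min(levels, key=FRIENDSHIP_ORDER.index)
    return gossip.secretness + SECRETNESS_MOD[lowest]


def gossip_to_share(known, speaker, listeners, relationships, limit):
    """Return up to `limit` gossip items with juiciness > secretness, juiciest first."""
    if not listeners:
        return []
    scored = []
    for g in known:
        j = effective_juiciness(g, speaker, relationships)
        s = effective_secretness(g, speaker, listeners, relationships)
        if j > s:
            scored.append((j, g))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [g for _, g in scored[:limit]]


# (speaker, other) -> friendship level, as the speaker sees it.
relationships = {
    ("Sage", "Forest"): "best friend",
    ("Sage", "Lavender"): "enemy",
    ("Lavender", "Sage"): "enemy",
    ("Lavender", "Forest"): "acquaintance",
}
crush = Gossip("HasACrushOnPeriwinkle", "Sage", "Sage", juiciness=8, secretness=9)

# Sage alone with Forest: 8 > 9 - 2, so the crush is shared.
print(len(gossip_to_share([crush], "Sage", ["Forest"], relationships, limit=1)))  # 1
# Lavender joins: 8 > 9 + 2 is false, so nothing is shared.
print(len(gossip_to_share([crush], "Sage", ["Forest", "Lavender"], relationships, limit=1)))  # 0

# Lavender hears it from Forest: juiciness becomes 8 + 2 = 10, which beats 9 + 0.
retold = Gossip("HasACrushOnPeriwinkle", "Forest", "Sage", juiciness=8, secretness=9)
print(len(gossip_to_share([retold], "Lavender", ["Forest"], relationships, limit=1)))  # 1
```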

Additionally, you could give different gossip different categories (such as positive, neutral, negative, and personal), which may alter the scores like

Positive Info:

Juiciness: (enemy = -1, acquaintance = 0, friend = 1, BFF = 2)
Secretness: (enemy = 1, acquaintance = 0, friend = -1, BFF = -2)

Neutral Info:

Juiciness: (enemy = 0, acquaintance = 0, friend = 0, BFF = 0)
Secretness: (enemy = 0, acquaintance = 0, friend = 0, BFF = 0)

Negative Info:

Juiciness: (enemy = 2, acquaintance = 0, friend = -1, BFF = -2)
Secretness: (enemy = 3, acquaintance = 0, friend = -1, BFF = -2)

Personal Info:

Juiciness: (enemy = 0, acquaintance = 0, friend = -1, BFF = -2)
Secretness: (enemy = 1, acquaintance = 2, friend = 0, BFF = -1)

The reasoning is that you will likely want to tell people about your BFF's good news, but not their bad news or personal news. And you would be willing to tell your BFF nearly any news, but you would absolutely not want your enemy to know anything that is damning against you.
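One way to hold these per-category tables is a nested lookup, sketched below with the values listed above. Which relationship feeds which score is my assumption (juiciness uses the speaker's relationship with the person the gossip is about, secretness uses the relationship with the listener), and `adjusted_scores` is a hypothetical helper name.

```python
# Per-category modifiers, copied from the tables above.
# "BFF" is the key for best friend.
MODIFIERS = {
    "positive": {
        "juiciness": {"enemy": -1, "acquaintance": 0, "friend": 1, "BFF": 2},
        "secretness": {"enemy": 1, "acquaintance": 0, "friend": -1, "BFF": -2},
    },
    "neutral": {
        "juiciness": {"enemy": 0, "acquaintance": 0, "friend": 0, "BFF": 0},
        "secretness": {"enemy": 0, "acquaintance": 0, "friend": 0, "BFF": 0},
    },
    "negative": {
        "juiciness": {"enemy": 2, "acquaintance": 0, "friend": -1, "BFF": -2},
        "secretness": {"enemy": 3, "acquaintance": 0, "friend": -1, "BFF": -2},
    },
    "personal": {
        "juiciness": {"enemy": 0, "acquaintance": 0, "friend": -1, "BFF": -2},
        "secretness": {"enemy": 1, "acquaintance": 2, "friend": 0, "BFF": -1},
    },
}


def adjusted_scores(kind, base_juiciness, base_secretness, subject_rel, listener_rel):
    """Apply the category's modifiers to the base scores.

    subject_rel:  speaker's relationship with the person the gossip is about
    listener_rel: speaker's relationship with the listener
    (Both mappings are assumptions for this sketch.)
    """
    table = MODIFIERS[kind]
    juiciness = base_juiciness + table["juiciness"][subject_rel]
    secretness = base_secretness + table["secretness"][listener_rel]
    return juiciness, secretness


# Good news about your BFF, told to a friend: more tempting, less guarded.
print(adjusted_scores("positive", 5, 5, "BFF", "friend"))   # (7, 4)
# Bad news about your BFF, told to an enemy: less tempting, heavily guarded.
print(adjusted_scores("negative", 5, 5, "BFF", "enemy"))    # (3, 8)
```

Keeping the numbers in one table means they can be tuned during playtesting without touching the sharing logic.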

Finally, the benefit of this system is that situations with NPCs grouped together can simply be automated: each NPC has their own internal starting gossip information, and the rest can emerge based on when they have an opportunity to speak.

Experimental Methodology

To test and evaluate the effectiveness of this solution, we also created a user testing plan that we will use to conduct a pre-test of the current solution and then expand upon later in the semester. The methodology goes as follows:

Participants will be given a consent form
Participants will complete a simple tutorial walking them through how to use the game
Participants will be randomly assigned to either complete the Treatment Version or the Control Version
Participants will complete a post survey
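For the random assignment step, something like the following sketch would work. Note that keeping the two groups the same size (shuffle, then alternate) is my own assumption, not part of the plan above, and `assign_conditions` and the participant IDs are made up for the example.

```python
import random


def assign_conditions(participant_ids, conditions=("treatment", "control"), seed=None):
    """Shuffle participants, then deal them out to conditions in turn.

    Alternating after the shuffle keeps group sizes within one of each other.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {pid: conditions[i % len(conditions)] for i, pid in enumerate(shuffled)}


# Hypothetical participant IDs.
assignments = assign_conditions([f"P{n:02d}" for n in range(1, 11)], seed=42)
for pid in sorted(assignments):
    print(pid, assignments[pid])
```

Passing a seed makes the assignment reproducible for record keeping; leaving it as `None` gives a fresh assignment on every run.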

Then for the survey itself, participants will be asked the following 15 questions to evaluate their experience, and will then be given the option to leave any additional comments:

10-point Semantic Differential Scale with a Non-Believable-Believable pair

7-point Likert scale from Strongly Disagree to Strongly Agree for each of the following questions

Believability

The Party Members interacted with the other party members
The Party Members had relationships with the other party members
The Party Members’ relationships evolved over time
The Party Members’ relationships influenced their behaviours

Narrative Transportation

I felt like I was at the party shown in the game
I was mentally involved in the narrative while playing the game
I wanted to learn how the game ended
The game affected me emotionally
While playing the game, I had a vivid image of the party members
While playing the game, I had a vivid image of my companion

User Testing Experience

I felt like I understood what the questions in this survey were asking
I felt like I could easily use the application
I enjoyed the game’s story
I enjoyed playing the game

The reasoning behind these questions is as follows. The first question (the semantic differential scale) tests the overall believability of the party guests. The Believability category uses my in-development believability scale for social believability, the Narrative Transportation category uses an adaptation of the SF-NT scale to test for narrative transportation, and the final set evaluates whether the results and methodology are meaningful and sound.

After the test, the following correlations will be calculated in a correlation table:

Narrative Transport, Believability, Social Believability, Game Enjoyment, Narrative Enjoyment, Effectiveness of Evaluation
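As a rough sketch of how that correlation table could be computed once each participant has one score per construct: the code below uses the Pearson coefficient, which is an assumption on my part (a rank-based coefficient may suit Likert data better), and the scores shown are made-up placeholders, not results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a measure never varies
    return cov / sqrt(var_x * var_y)


def correlation_table(scores):
    """Build a nested dict: table[a][b] is the correlation between measures a and b."""
    names = list(scores)
    return {a: {b: pearson(scores[a], scores[b]) for b in names} for a in names}


# Placeholder per-participant scores (one entry per participant), NOT real data.
scores = {
    "Narrative Transport": [4.2, 5.0, 3.1, 6.3, 5.5],
    "Believability": [6, 7, 4, 9, 8],
    "Social Believability": [4.5, 5.3, 3.0, 6.0, 5.8],
    "Game Enjoyment": [5, 6, 3, 7, 6],
}

table = correlation_table(scores)
for name, row in table.items():
    print(name.ljust(22), "  ".join(f"{value:5.2f}" for value in row.values()))
```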

Finally, the remaining aspects still to be designed are the title screen and the tutorial screen, which will be developed in the following week.

Independent Studies

Gaydar

For Gaydar, I finished studying experimental methodology in “Experimental and Quasi-Experimental Designs for Research” by Campbell and Stanley (2011). Through my readings I learned more about external validity concerns and three different experimental design setups which they refer to as Design 4, Design 5, and Design 6.

Some of the key takeaways were as follows:

External validity can never be logically solved in a simple way due to questions regarding the generalizability of any study result
If a pretest is given, there is the concern that it will alter the participants' behavior in such a way that it's no longer representative, by predisposing them to focus on a specific aspect
If it becomes difficult to obtain participants for a study because they self-select their participation, it could call your results into question. For example, if you're trying to study views on trans folks, and after asking 100 people only 10 agree to answer, those results may be biased by the reasons why only those 10 answered
The results may be tied to how you conducted your study, such that they cannot be reproduced in a different testing setup. For example, if the study only works when the developer is there to answer questions about how to play
The results may be tied to the period in which you studied them. For example, if you're assessing views on trans individuals right after an anti-trans president is elected
If the participant knows they are being studied, they may alter their behavior

Additionally, they describe the following three setups:

Design 4: Pre-test post-test design with one treatment group and one control group
Design 5: Pre-test post-test design with one treatment group w/ pretest, one treatment group w/o pretest, one control group w/ pretest, and one control group w/o pretest
Design 6: Post-test only design with one treatment group and one control group

Given all of this, Rae and I are likely planning to conduct a Design 6 style experiment. The reason for this decision is that it will allow us to avoid the potential issue of the pretest jeopardizing our results. Additionally, we will likely use a cover story to further mask what the study is measuring. Furthermore, by doing Design 6 over Design 5, we need two groups instead of four, so we will be able to halve the number of participants we require.

GDC Talks

This week I watched three different GDC talks to continue my quest of learning from the captured knowledge of game design experts. My notes are as follows:

In my first talk, “Technical Tools for Authoring Branching Dialogue”, two developers from Obsidian gave a presentation on their in-house dialogue engine. Over the course of the talk, they walked the audience through the various elements of the dialogue engine and why those decisions were made. Some of the key features that the studio mentioned being helpful were:

The ability to collapse and expand nodes
The ability to link nodes through references rather than needing to draw lines
The ability for the nodes to auto build in an organized layout
The ability for debug tools to show the current game state and the current status of the criteria for linking nodes
The ability to modify global variables
The ability to modify save state
The ability to export choice history
The ability to view all invalid script paths or calls
The ability to view which nodes rely on which conditions and see how often a condition is used

Overall, while not directly relevant to my thesis, the talk offered several good concepts to incorporate into my own work with dialogue tools.

In my second talk, “Dialogue System Driven Dialogue in Mafia III”, the speakers describe how they went about creating a two-variable system for determining the emotional state / disposition of three crime bosses. Furthermore, they broke down the journey of how they fleshed it out more and more while avoiding added complexity that would lose the heart of the system. The most interesting part related to my thesis is that the talk serves as an example of how the Illusion of Life can help make NPCs more believable.

In my third talk, “Compiling Your Story: Using Techniques from Compiler Design to Check Your Narrative”, the speaker describes essentially what the title says. However, the approach requires the ability to compile your dialogue tree into bytecode, which is a bit beyond what I am looking to do, so it is not as useful to me, although it is conceptually interesting.

Believability

This week, I began writing my IRB application for my scale validation study. However, as I was doing so, the quantity of survey questions required for the experiment stood out to me as a potential flaw, since it risks burning out the test taker. As such, I greatly simplified the scale so that instead of around 70 questions, it's now closer to 27 questions, with each sub-construct containing about 2-6 questions. Additionally, instead of trying to do a convergent validity test with Narrative Transportation, which would have added an additional six questions, I decided to further model Barreto et al.'s (2017) paper and use a scale from 1 - 10 asking how believable the NPC is. Finally, I will be doing a series of video clips, with each one looking at a separate construct, so in the end it will include around 7 * 22 = 154 questions, which may be narrowed down even further as the study progresses.

On another note, this week I was introduced to a service called MTurk, which is often used by researchers to gather participants. Another service called Qualtrics can also help with gathering participants. So I may consider using those services to gain a better sample and a higher number of participants for my survey.

The Wrap Up

Overall, this week I was able to make a lot more progress in understanding my field's literature and have begun working on implementing it into my own research. In the coming weeks I am hoping to do more work in taking what I am reading and using it to write some papers to synthesize my findings.

Works Cited

Barreto, N., Craveirinha, R., & Roque, L. (2017). Designing a Creature Believability Scale for Videogames. In N. Munekata, I. Kunita, & J. Hoshino (Eds.), Entertainment Computing – ICEC 2017 (Vol. 10507, pp. 257–269). Springer International Publishing. https://doi.org/10.1007/978-3-319-66715-7_28

Brown, J. A., & Qu, Q. (2015). Systems for player reputation with NPC agents. 2015 IEEE Conference on Computational Intelligence and Games (CIG), 546–547. https://doi.org/10.1109/CIG.2015.7317670

Brown, J., Lee, J., & Kraev, N. (2021). Reputation Systems for Non-Player Character Interactions Based on Player Actions. Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 13(1), 151–157. https://doi.org/10.1609/aiide.v13i1.12950

Campbell, D. T., & Stanley, J. C. (2011). Experimental and quasi-experimental designs for research. Wadsworth.

Carneiro, L. R., Delgado, C. A. D. M., & da Silva, J. C. P. (2019). Social Analysis of Game Agents: How Trust and Reputation can Improve Player Experience. 2019 8th Brazilian Conference on Intelligent Systems (BRACIS), 485–490. https://doi.org/10.1109/BRACIS.2019.00091

do Couto, F. S., Delgado, C. A. D. M., & da Silva, J. C. P. (2016). A Trust and Reputation Framework for Game Agents: Providing a Social Bias to Computer Players. 2016 5th Brazilian Conference on Intelligent Systems (BRACIS), 193–198. https://doi.org/10.1109/BRACIS.2016.044

https://gdcvault.com/play/1025981/Compiling-Your-Story-Using-Techniques

https://youtu.be/4J0KVdzx52w?si=4UkMpZ-RevvrNLJO

https://gdcvault.com/play/1025962/Technical-Tools-for-Authoring-Branching