This checklist is used in combination with the guidelines for the Replicability Line of Assessment for the ISWC 2020 Reproducibility Track.
Hypothesis and Overall Design Evaluation
- Overall design of evaluation
- Methods to acquire evidence of support
- Independent & dependent variables
- Environment factors to control for
- In case of several hypotheses, which (parts of the) experiments refer to which hypothesis
Target and Study User Groups
- Target user group - demographics, (level of) domain expertise & technological experience, experience with Semantic Web technologies
- Study user group - demographics, (level of) domain expertise & technological experience, experience with Semantic Web technologies
- Differences & commonalities between the target and study user groups
- Recruitment channels & venues
- Compensation provided for participants
- For publicly available datasets - version, retrieval date and location, other metadata
- For publicly available datasets - preprocessing scripts & resources
- For private datasets - characteristics of the dataset and how they relate to the study and tasks
- For private datasets - example inputs for the different tasks; where possible, sample anonymized data should be provided
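When releasing sample data from a private dataset, direct identifiers are commonly replaced with stable pseudonyms. The sketch below is one minimal, hypothetical way to do this with a salted hash; the field names and the `pseudonymize` helper are illustrative, not part of the checklist.

```python
import hashlib

def pseudonymize(participant_id, salt):
    """Replace a direct identifier with a stable pseudonym.

    The same (id, salt) pair always yields the same pseudonym, so
    per-participant records stay linkable after anonymization. The
    salt must be kept private and never released with the dataset,
    otherwise identifiers could be recovered by brute force.
    """
    digest = hashlib.sha256((salt + participant_id).encode("utf-8")).hexdigest()
    return "P-" + digest[:8]  # short, human-readable pseudonym

# Hypothetical raw record and its releasable counterpart
row = {"participant": "alice@example.org", "task": "T1", "time_s": 41.0}
released = {**row, "participant": pseudonymize(row["participant"], salt="keep-this-secret")}
print(released)
```

Note that hashing alone is not full anonymization; quasi-identifiers (demographics, free-text answers) may still need to be generalized or removed before release.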
- Study conditions (within-subject vs between-subject studies) and number of participants
- Task assignment and balancing for ordering effects
- Tasks and input data per task for each condition (all task combinations if participants received different (sets of) tasks)
- Tasks related to users’ (level of) expertise
- Common tasks vs rare tasks
- Solution(s) to the tasks, highlighting where these differ significantly between users
- Success criteria - binary or continuous & how they are determined
- Unexpected results (interim and final) and how these contribute to findings and future work
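One common way to balance task assignment against ordering effects is a Latin square, in which each task appears in each position equally often across participants. The sketch below is a minimal, hypothetical illustration using a cyclic square; the function name and task labels are assumptions for the example.

```python
def latin_square_order(tasks):
    """Cyclic Latin square: row i is the task order for participant
    i mod n. Each task occupies each position exactly once per cycle
    of n participants, balancing position (ordering) effects.

    Note: a cyclic square does not balance immediate carry-over
    effects; a balanced (Williams) design would be needed for that.
    """
    n = len(tasks)
    return [[tasks[(i + j) % n] for j in range(n)] for i in range(n)]

# Hypothetical study with four tasks: participant 1 gets row 0,
# participant 2 gets row 1, and so on, cycling every 4 participants.
for row in latin_square_order(["T1", "T2", "T3", "T4"]):
    print(row)
```

Reporting which counterbalancing scheme was used (and the resulting task orders per participant group) makes the ordering-effect control replicable.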
- Hardware configuration and special purpose hardware (for instance, eye-tracking cameras)
- Software environment and special purpose software (for instance, screen recording applications)
- Surrounding environment & special environment conditions, if the system/method is intended to be used under such conditions
- Interaction context (for instance, touch interaction, joystick, large & high resolution displays, etc.)
- Presence of observers/members of the evaluation team and their role
- The level of expertise/experience of each member of the evaluation team should be documented (one of: novice, some experience, experienced, very experienced).
- Provide a timeline of all evaluation phases
- Motivate your choice in cases where several options are available (for instance, selection of questionnaires)
- Implementation details of think-aloud protocols or similar
Analysis of Collected Data
- Anonymized raw data
- Measurements of dependent variables
- Overall time & time per task for each participant, against expected times (average, minimum and maximum length of each session, whether one-off or longitudinal)
- Answers to standardized or custom questionnaires
- Observer notes if one has been present
- Results per group (in case the study participants have different backgrounds)
- Motivate data analysis method and statistical tests
- Relevant scripts, libraries & other resources for analysis and generating the figures
- Potential biases and threats to validity
- Data from pilot studies should also be submitted, together with the key changes made prior to the final study and explanations for these
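When motivating a statistical test, it helps to report the exact quantity computed. For a within-subject comparison of two conditions, a paired t-test on per-participant differences is a common choice; the sketch below shows the statistic with the Python standard library only (in practice one would typically use scipy.stats.ttest_rel). All measurement values here are hypothetical.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """Paired t-test statistic for a within-subject design.

    Each participant contributes one measurement per condition; the
    test operates on the per-participant differences. Returns the
    t statistic and degrees of freedom; the p-value comes from a
    t-distribution (table or scipy.stats).
    """
    assert len(a) == len(b), "paired design needs matched measurements"
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)       # sample standard deviation of differences
    t = mean / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical task-completion times (seconds) for the same six
# participants under a baseline and a proposed condition.
baseline = [41.0, 38.5, 52.0, 47.5, 44.0, 50.5]
proposed = [35.0, 36.0, 45.5, 44.0, 39.5, 46.0]

t, df = paired_t_statistic(baseline, proposed)
print(f"t = {t:.2f}, df = {df}")
```

Reporting the test name, the statistic, degrees of freedom, and checked assumptions (here, approximately normal differences) lets others re-run the analysis on the submitted raw data.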
Acknowledgements: These guidelines are inspired by Valentina's experience from participating in the organization of the VOILA! Workshop series, as well as from co-authoring the paper "A Framework to Conduct and Report on Empirical User Studies in Semantic Web Contexts". We thank Aba-Sah Dadzie, Catia Pesquita and Patrick Lambrix for sharing their experience, feedback and suggestions while refining this document.