We at MAYA have been interested for a while in the differences between usability tests where the tasks are well-defined beforehand and those that use a looser structure, where the user has greater autonomy to explore the interface or product. Observing users while they explore a system on their own has merit — after all, they won’t have a usability test moderator telling them what task to do next when they’re using the system in earnest. On the other hand, if the test has no structure, a participant may not encounter many areas of the user interface, or it may take more users to get complete coverage of a system. It’s also hard to make objective measurements (error rates, time-on-task) if the tasks are generated in an ad-hoc fashion: not only will the tasks be unknown to the moderator, but each participant will have a different task set.

Mark Hurst at Creative Good insists that what he calls Listening Labs are superior to usability tests with predefined tasks. Jared Spool suggests that we let users define their own tasks. Of course, one can’t go into a test unprepared. Even without predefined tasks, you have to be well-prepared, just as a proficient interviewer is before an interview. In fact, an ad-hoc user test may require more preparation than one with predefined tasks.

The truth is probably that there is a continuum from ad-hoc to well-defined, that neither extreme is well-suited to getting the best results, that the proper compromise between the extremes differs from product to product, and that the best test must be designed case by case, depending on the thing being tested, the goals of the test, and the users.
Here are a few links that help frame the debate:

- Mark Hurst’s “4 Words to Improve User Research” post
- Joshua Kaufman’s description of a canonical usability test, including the generation of task scenarios
- Jared Spool’s article in which he refers to task-based tests as “scavenger hunts”
- Rolf Molich’s (DialogDesign) Comparative Usability Evaluations, in which teams tested systems using different methods and then compared the results

Although the results of the five CUEs are interesting and instructive, none of them allows a comparison between ad-hoc and task-based usability testing methods. Some allow comparison between usability testing and expert evaluation, but most of the conclusions reached after the CUE exercises centered on the need for usability professionals to better communicate their recommendations to engineering teams.