Opinions are worthless: how we test a new visual design

You’ve explored several visual styles and now you’re left with a selection of UI-mockups. Which one is best? How do you make sure you’re not relying on ‘opinions’? We answered this question with a structured test-approach. 👨‍🚀

One of our clients is giving a major overhaul to their flagship product. Design, interaction, code: everything will be modernized to fit with the 21st century. We’re incredibly happy to help our innovative client along this journey.

The flagship product we’re talking about is the biggest knowledge bank used by the majority of courts, judges and lawyers in The Netherlands. The existing version had a decent UI, but it was not a good fit with the brand. We decided to overhaul the UI-design before kicking off the design system.

Phase 1: brand research and iterative design

Working closely with the product owner we researched the brand: their representation and reputation. We looked at brand values, style guides, other products, marketing pages, etcetera, and distilled the target brand traits. Based on this research, we iterated through several UI-designs, color palettes and typography.

We settled on four distinct UI-designs and five possible forms of typography (i.e.: font, weights, line-height, font-size, color).

Phase 2: setting up the test

You can trust Maurice to deliver on his visual-design skills: all four UI-designs are distinctly different and all of them are good. The designs support the most important user actions, the screens look pretty, text is readable, colors are accessibility-proof and the design fits with the brand.

So how do you evaluate which one is ‘the best’?

We wanted to avoid collecting mere opinions…

If you just ask people which design they prefer, the answer is based entirely on personal preference, and those opinions are worthless: every person is different, and what you like one moment you might not like the next (if we ask you right after your morning coffee we’ll get a very different answer than after a whole day of exhausting meetings).

With the help of the excellent resources of the Nielsen Norman Group we came up with the following structure for our 30-minute test-sessions:

  1. Five-second tests with unstructured association
  2. Readability testing
  3. Structured association and rating
  4. Open interview

We created a simple Keynote-slide deck with the four parts. In each part we randomized the slides to eliminate any order-effect (the first thing you see influences whatever you see afterwards, and you’ll always remember the first and last thing best). We show every participant all designs because of the variability of responses. As a baseline, we also included the current visual style.
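The randomization can also be scripted instead of shuffled by hand. The snippet below is a minimal sketch, not how the Keynote deck was actually built: the design labels, the `slide_order` helper, and the idea of a fresh order per participant and per part are assumptions made for illustration.

```python
import random

# Hypothetical labels: the current visual style (baseline) plus four new designs.
DESIGNS = ["current", "A", "B", "C", "D"]


def slide_order(participant_id: int, part: str, designs=DESIGNS) -> list[str]:
    """Return a shuffled copy of the designs for one participant and one test part."""
    # Seeding with participant and part keeps every order reproducible,
    # so you can look up afterwards which design was shown when.
    rng = random.Random(f"{participant_id}-{part}")
    order = list(designs)  # copy, so the original list stays untouched
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    for participant in range(1, 6):
        print(participant, slide_order(participant, "five-second"))
```

Because the seed is derived from the participant and the part, re-running the script gives the same orders, which helps when matching notes to slides later.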

Participants sit in front of a laptop, with a note-taker on the left and the experiment-leader on the right.

Want to use our slide-deck for your own UI-research? Download it here!

Data beats opinions

Phase 3: let’s test!

We invited a group of five participants and had them go through each part of the test:

The five-second test

We use the five-second tests because we want to know the first-impressions people have, their gut-feeling.

Each design is shown for 5 seconds, after which we ask four simple questions (“Is it trustworthy? Why?”, “What do you remember?”, “Which words do you associate with it?”, “Which rating from 1 to 10 do you give?”).

To help our participants we also show these questions on screen after each design. As this test is a bit counter-intuitive we start off with an example: a screenshot of a well-known shopping site.

Tip: we ask the participants for ‘free word association’, which is difficult because they don’t do that every day. I help them by giving an example during the first screenshot (“I associate it with a candy store, because of all the extreme colors”). This shows them no answer is wrong.

Readability testing

As our platform is used for reading very technical documentation, choosing the correct typography is incredibly important.

We picked paragraphs of existing legal text from the platform and created five slides, each showing three paragraphs in a different typography. We made sure the texts are of comparable difficulty: each one includes complex legal terms and has references to legal documents (long numbers).

We ask our participants to read these texts aloud and give a gut-feeling on readability (on a scale of 1–5). More importantly, we observe their behavior: are they leaning forward, are they straining their eyes, are they stuttering while reading or making mistakes?

Structured association and open interview

For the last part we give each participant a paper with a list of sixteen words. They are asked to select three to five words that describe the design. The list contains words related to the brand traits and contains positive and negative wordings (you need at least 40% of the words to be negative!), e.g.: ‘boring’, ‘formal’, ‘old-fashioned’, ‘trustworthy’.
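The 40% rule is easy to check before printing the list. Below is a minimal sketch: only ‘boring’, ‘formal’, ‘old-fashioned’ and ‘trustworthy’ come from the list described above. The other twelve words and every positive/negative label are assumptions for illustration.

```python
# Hypothetical 16-word list. True marks a word we treat as negative.
# Only four of these words appear in the article; the rest are made up.
WORDS = {
    "boring": True, "old-fashioned": True, "cluttered": True, "cold": True,
    "confusing": True, "cheap": True, "dull": True,
    "trustworthy": False, "formal": False, "modern": False, "clear": False,
    "professional": False, "calm": False, "friendly": False, "fresh": False,
    "reliable": False,
}


def negative_share(words: dict[str, bool]) -> float:
    """Fraction of words in the list that are labeled negative."""
    return sum(words.values()) / len(words)


def has_enough_negatives(words: dict[str, bool], minimum: float = 0.40) -> bool:
    """Check the rule of thumb that at least 40% of the words are negative."""
    return negative_share(words) >= minimum


if __name__ == "__main__":
    print(f"{negative_share(WORDS):.0%} negative, ok: {has_enough_negatives(WORDS)}")
```

With sixteen words, six negatives (37.5%) fall just short, so you need at least seven.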

They are then asked to rate the best and the worst design. For this they are allowed to use the keyboard and go through the designs.

Based on these answers we go through an open interview, asking about the motivations behind the chosen words and why they liked certain designs.

Phase 4: summarize, iterate, test again

Testing’s done! Now we analyze and summarize our results (I wrote about analyzing results before). Here’s a tip: we write our findings immediately in a Numbers document (Apple’s equivalent of Microsoft Excel), which allows for easy analysis. With Numbers we can group the findings by design and immediately see trends.
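If you prefer a script over a spreadsheet, the same grouping per design takes only a few lines. This is a minimal sketch, not the analysis we ran in Numbers: the observations are made-up placeholder values, and the tuple layout and `summarize` helper are assumptions for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

# Made-up placeholder data, NOT real test results:
# (participant, design, rating from 1 to 10, associated words)
observations = [
    (1, "A", 7, ["trustworthy", "formal"]),
    (1, "B", 5, ["boring"]),
    (2, "A", 8, ["trustworthy"]),
    (2, "B", 6, ["formal", "boring"]),
]


def summarize(observations):
    """Group ratings and associated words per design."""
    ratings = defaultdict(list)
    words = defaultdict(Counter)
    for _participant, design, rating, chosen in observations:
        ratings[design].append(rating)
        words[design].update(chosen)
    return {
        design: {
            "mean_rating": mean(ratings[design]),
            "top_words": words[design].most_common(3),
        }
        for design in ratings
    }


if __name__ == "__main__":
    for design, summary in summarize(observations).items():
        print(design, summary)
```

With only five participants per round, treat the averages as a pointer to trends rather than as statistics; the associated words and interview notes carry most of the insight.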

Based on the results we iterate on our designs and test again. By testing with two groups of five people we reduce the influence of personal bias and can give a well-founded direction for the future UI of the application.

Don’t forget, you can download our slide-deck for your own UI-research.

That’s a wrap on this article!

Published previously on Medium: Opinions are worthless: how we test a new visual design