Evaluating AI tools is a team sport

In my AI + UXR workshop, I teach attendees to check the subprocessors of AI tools to understand what models they’re using under the hood. Amazingly, we may have beaten even Simon Willison to the punch on this.

Most AI research features and tools available today are wrappers over existing LLMs. They make the LLMs marginally easier to use and may do all the prompting for you, but at the cost of your control over the input and output. Using third-party tools means you have to trust that:

  • The model and settings are appropriate for the task at hand

  • There haven’t been any unexpected changes to the model they’re using (see, e.g., the recent issue of GPT-4o becoming sycophantic after an update)

  • The prompts from the toolmaker are effective

  • The relevant parts of your research artifacts, such as transcripts, are being inserted into the prompt correctly (e.g., through retrieval-augmented generation (RAG) or a similar method)

So before deciding to use a third-party tool, it’s important to learn as much as you can about which models it uses and how the toolmaker controls them. Checking the subprocessor list is one way to do this.

I learned about checking subprocessors from Vitorio Miliano, who collaborated with me on all of my workshop content (and also built our custom AI tools).

Vitorio and I, for our part, have consistently relied on Simon Willison’s work to contextualize recent AI developments, limitations, and strategies for evaluation.

So it was delightful validation to see Simon Willison mention in a May 11 post that he’s “recently learned that checking an organization's list of documented subprocessors is a great way to get a feel for how everything works under the hood.”

This highlights the importance of experimenting and learning from each other while AI is still getting established. Simon Willison is an undisputed expert, and yet we’re all still figuring out different pieces of the puzzle at different times as we explore new AI use cases.

If you’re working with AI and user research, I’d love for you to share in our next group exploration. I’ll be running my next workshop through Rosenfeld Media in June. Consider joining us!

(And if you’re reading this after June, you can contact me about running a private workshop for your team.)