Original and redacted videos

Under biometric privacy laws, user research recordings containing users’ faces or voices can put your company at risk for lawsuits and fines.  AI offers new solutions for UX teams who want to keep research recordings longer without violating biometric privacy laws.  We used off-the-shelf tools ElevenLabs and Wonder Studio to intelligently redact users’ voices, faces, and bodies in research videos.

Here’s an example of an excerpt from a videotaped interview as it originally appeared, and with the voice and body redacted.


How much biometric redaction is enough?

When replacing someone in a recording with an avatar, how much redaction is enough?

An avatar has a new face and body, but it still moves in much the same way as the person it replaces.

A synthetic voice sounds different to the human ear, but it may have the same lilt and cadence as the original speech.

So is there some legally accepted definition of “voiceprint” or “facial recognition” that we can check our output against and know with certainty that we have done enough to satisfy the law?

As far as I can tell, there isn’t (any privacy lawyers in the audience are welcome to correct me).  New biometric privacy laws are still being created.  And even BIPA, one of the oldest biometric privacy laws, is only now starting to be interpreted by the courts through lawsuits.

Privacy policy decisions are ultimately made by lawyers based on their specific organization’s risk tolerance.  The goal of this work, then, is not to argue that any specific method offers a definitive answer, but to provide a variety of options to assess in partnership with your legal team.


So with that said, let’s talk about the tools we used to create the examples above.


AI “redaction” tools and their issues

For the redacted videos above, we used ElevenLabs Speech-to-Speech Generator to redact the voice, and Wonder Dynamics Wonder Studio to redact the face and body.

These tools automate audio and video editing in ways that seem like magic, and which would not have been possible a few years ago.

But like any new technology, they have their quirks.  The examples we showcase above offer a best-case scenario of what output can look like under ideal conditions.

For anyone considering using these tools in their own work, here is a summary of issues encountered in our test runs where output was less than ideal.

Note: Biometric redaction is off-label usage for these tools, and they may not have appropriate licensing terms in place for real-world application today. As always, check with your legal team.

 

ElevenLabs Speech-to-Speech Generator

ElevenLabs worked incredibly well as long as we were processing clean voice recordings with minimal background noise and few uncommon words.

Background noise, whether sustained machine hum or brief bangs and clangs, garbled the output, as ElevenLabs apparently attempted to incorporate the noise into the speech.  Uncommon words, such as names, were garbled into some kind of strange Simlish.

We were able to get much better output from ElevenLabs by cleaning background noise from our test recordings using audio editing software (Replay), but we did not find a good workaround for uncommon words. In research applications, this could present a problem for participants discussing product names or using technical jargon.
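Because noisy input caused so many problems, it can help to screen recordings automatically before uploading them. The sketch below is our own illustration, not part of any tool described above: it estimates a rough signal-to-noise ratio by treating the quietest audio frames as the noise floor and the loudest as speech peaks.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_snr_db(samples, frame_size=1024, percentile=0.1):
    """Rough SNR estimate in dB.

    Splits the recording into frames, then compares the average level of
    the loudest 10% of frames (speech peaks) against the quietest 10%
    (approximate noise floor). A crude heuristic, not a real measurement.
    """
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    levels = sorted(rms(f) for f in frames)
    k = max(1, int(len(levels) * percentile))
    noise = sum(levels[:k]) / k      # quietest frames ~ noise floor
    signal = sum(levels[-k:]) / k    # loudest frames ~ speech peaks
    if noise == 0:
        return float("inf")
    return 20 * math.log10(signal / noise)
```

A recording scoring below some threshold you choose (say, 20 dB) could be flagged for cleanup in an audio editor before spending processing credits on it.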

If you’re looking to apply tools like ElevenLabs for research redaction today, we recommend:

  • Only using them to process high-quality voice recordings with little to no background noise (although removing background noise also works, up to a point)

  • Considering using a lapel mic for participants to get cleaner voice capture

  • Reviewing output for intelligibility and errors before destroying original audio recordings

  • Keeping transcripts alongside synthetic speech output to help interpret poorly recognized words or noises
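One way to make the review step concrete before destroying originals: transcribe both the original and the synthetic audio with any speech-to-text tool, then compare the transcripts. A word error rate above a chosen threshold flags clips where the redaction garbled the speech. This is our own illustration, not a feature of ElevenLabs; the standard word-level edit-distance calculation looks like:

```python
def word_error_rate(reference, hypothesis):
    """Levenshtein distance over words, normalized by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / max(1, len(ref))
```

For example, a product name that came out as Simlish shows up as a substitution: `word_error_rate("launch the acme dashboard", "launch the acne dashboard")` returns 0.25 (one error in four words).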

 

Wonder Dynamics Wonder Studio

Note: Wonder Dynamics announced that they were acquired by Autodesk on May 21, 2024.  It is unclear what impact this will have, if any, on using Wonder Studio as outlined here.

Wonder Studio created jaw-dropping avatar animations for our videos with surprisingly little input required.  It was robust and held up well even in videos with wobbly camera work and blurry focus.

That said, there were a few major pain points that kept the experience from being effortless.

  1. Processing times and limits are restrictive. The elephant in the room is that Wonder Studio’s entry-level plan offers a mere 150 seconds of video processing time per month. Even their Pro plan is limited to 10 minutes. So videos tend to be very short, and they still take a long time to process (around an hour for the example above). This is not a tool–as of June 2024–that would allow you to dump in hours of research footage and comb through the output later on. With its current limitations, Wonder Studio would work better for creating a proof-of-concept video or maybe a research highlights reel (assuming you could get a plan with appropriate terms for sensitive data).

  2. Wonder Studio doesn’t understand occlusion. When Wonder Studio renders a 3D character, it places the character in front of everything else in the scene.  This means characters’ bodies appear to be “floating” in front of whatever furniture they’re standing behind, and objects they hold in their hands disappear or appear to hover behind them. For the purposes of research redaction, this is more of a quality issue than a showstopper. It may cause your video to look wonky, but you can usually still work out what your participant is supposed to be doing.  However, there could be cases where bad occlusion obscures important information.

  3. Custom characters are not beginner-friendly. While Wonder Studio’s basic flow is pretty plug-and-play, the same can’t be said of creating custom 3D characters. (And you’re going to want to, because Wonder Studio’s stock library currently contains only two human options.) Using custom characters requires you to create (or purchase) them yourself, which we found extremely challenging as laypeople, even using beginner-friendly tools.  There’s also no guarantee that your model, as it appears in Blender, will render exactly the same in Wonder Studio.  And there’s a high cost for failure, as you’re limited to six character uploads per month on the entry-level plan.  (We did eventually come up with a repeatable process that worked–more on this below.)

If you want to use tools like Wonder Studio for research redaction, we recommend:

  • Understanding the processing limits of the tool (150 seconds is not much!)

  • Avoiding occlusion when filming (use a second camera to capture details if you have to)

  • Avoiding extreme close-ups and views from the back

  • Making sure you’re comfortable using limited stock 3D characters (or building in time and/or budget for custom models)
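Caps like 150 seconds per month mean footage has to be triaged rather than processed wholesale. As a trivial illustration (our own sketch, not a Wonder Studio feature), a greedy planner can pick which clips fit a monthly budget, shortest first:

```python
def plan_uploads(clip_seconds, monthly_budget=150):
    """Greedily select the shortest clips that fit within the monthly cap.

    Returns (selected clip lengths, seconds of budget left over).
    Shortest-first maximizes the *number* of clips processed; other
    priorities (e.g., most important moments first) may fit your
    research goals better.
    """
    chosen, used = [], 0
    for length in sorted(clip_seconds):
        if used + length <= monthly_budget:
            chosen.append(length)
            used += length
    return chosen, monthly_budget - used
```

For example, given clips of 90, 40, 30, and 20 seconds, this selects the three shortest (90 seconds total) and leaves 60 seconds of budget; the 90-second clip waits for next month or a higher-tier plan.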

 

A general issue: Terms of service

One issue we haven’t addressed in the above discussion is the agreements you sign to use third-party platforms, especially concerning the rights they have to your data, as well as any content they may prohibit you from uploading.  Handing over sensitive user data to any third-party company or tool is always a risk.  We recommend having your own legal team review any tool’s agreements before uploading any user research recordings.  Since user research recordings do contain biometric and other personally identifying information, you may need to negotiate custom enterprise agreements rather than using pre-existing plans.


A repeatable custom character creation flow for Wonder Studio

If you’re creating avatars for research video redaction, there’s value in keeping that avatar consistent with how the participant appears in the real world. For the example video, we wanted to keep the participant’s skin color, gender, hair color, and even clothing as consistent as possible with her real-world appearance because those characteristics can have meaning for research.

The simplest approach would have been to hire a 3D artist to create a custom avatar, but this solution requires an additional investment for every new research participant.

Our alternative approach was to use Reallusion Character Creator 4 (CC4) software and a series of Blender add-ons to create a repeatable, novice-friendly character creation pipeline. This pipeline allowed us to customize a character’s face, body, and clothes in CC4 using a somewhat more beginner-friendly interface than creating a model from scratch (that said, there was still a considerable learning curve). We then used the CC2Wonder Blender add-on to convert it for Wonder Studio, and Wonder Dynamics’ own Blender add-on to assign bones and blendshapes and to validate that the model would work in Wonder Studio. (See tutorials and guides at the end of this page for details.)

This process created a working 3D model, but we still encountered various issues with the final texture and color rendering in Wonder Studio. Additional changes we made were to:

  • Delete models’ eyelashes (this made a massive improvement in character appearance)

  • Restrict which eyeballs we used

  • Set initial colors darker than expected and change shaders
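We can only speculate about why colors rendered lighter than expected, but one common culprit when moving models between 3D tools is color-space handling: if a renderer interprets sRGB color values as linear light (or vice versa), midtones wash out, and authors end up compensating by darkening. Whether this is what happens in Wonder Studio is an assumption on our part; for reference, the standard sRGB-to-linear conversion is:

```python
def srgb_to_linear(c):
    """Convert one sRGB channel value (0.0 to 1.0) to linear light,
    per the standard sRGB transfer function (IEC 61966-2-1)."""
    if c <= 0.04045:
        return c / 12.92               # linear segment near black
    return ((c + 0.055) / 1.055) ** 2.4  # gamma segment
```

An sRGB midtone of 0.5 corresponds to only about 0.21 in linear light, which gives a sense of how large a “darker than expected” adjustment a color-space mismatch can require.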


Alternative AI tools

You may want to consider alternatives to ElevenLabs and Wonder Studio for licensing or functionality reasons.  We have not evaluated other tools, but we have compiled a list of available options that you can use as a jumping-off point for your own investigation.

For voice replacement

As of April 2024, there are a number of tools for voice replacement that are relatively simple to use.  However, many of these tools by default offer voice models that are inappropriate for research (e.g., cartoon characters), or that are of questionable provenance (e.g., voice models trained from audio books by performers who did not opt in).

You have the option of using voice replacement tools as online platforms (which is probably higher risk for sensitive data), as self-hosted apps, or as code that you must download and compile yourself.

Note: We have not evaluated these tools and cannot make any guarantees about their quality or safety.

Online platforms for voice replacement

Voice replacement apps that run on your own computer or server

 

Voice replacement code

Multimodal models are also a big area of research right now, but in our testing these did not perform as well as dedicated speech models for research voice redaction.

 

For face and body replacement

When it comes to replacing the face and body of a person in a video, we were unable to find any existing tools that work exactly like Wonder Studio (although Adobe and Alibaba both recently showcased similar-looking demos).

The next-best alternatives to Wonder Studio work in one of two ways:

  1. Automating one step of the larger end-to-end process that Wonder Studio fully automates (e.g., motion capture alone, lighting estimation, 3D modeling, or rigging a 3D model); or

  2. AI “rotoscoping,” where AI redraws part of the scene in a video (which can be targeted at human faces and bodies through prompts)

As Option 1 is not a standalone solution, we focused on Option 2.

Note: We have not evaluated these tools and cannot make any guarantees about their quality or safety.


AI video “rotoscoping” code

Image: Rerender A Video single frame output. Copyright 2023 S-Lab. Use is with permission of S-Lab.


Acknowledgements

This work was created with extensive collaboration from technical consultant Vitorio Miliano at Tertile, LLC.  Vitorio contributed to this project end to end, including compiling the extensive lists of alternatives to ElevenLabs and Wonder Studio, as well as troubleshooting a majority of the character creation workflows and making the solutions accessible to laypeople.

I extend my appreciation to everyone at MakeATX for their participation in the redaction video demo, without which this project wouldn’t have been possible.  MakeATX is a laser cutting workspace that offers classes as well as custom projects, and everyone we worked with was wonderful.

I’m also indebted to Uday Gajendar, Lou Rosenfeld, Nathan Gold, and my fellow demo presenters at Rosenfeld’s Designing with AI conference: Jorge Arango, Bryce Benton, Yulya Besplemennova, Trisha Causley, and Fisayo Osilaja.  Thank you also to my colleagues Dr. Diego Castaneda, Kristin Johnson, Keira Phifer, Sara Telfer, and Leslie Waugh for helpful discussions and feedback.

Many thanks to Nikola Gaborov at Wonder Dynamics for assistance with character troubleshooting.

If you would like to use custom characters in Wonder Studio, you may wish to consider hiring a dedicated contractor.  We spoke briefly with Scott M. (Upwork), Olayemi S. (Upwork) and Chatamodel (Fiverr), although we did not end up engaging their services for this project.

Thanks also to Dr. Shuai Yang for use of a still frame generated by Rerender A Video.

Resources

Below are resources and tools used in the course of this project.

Biometric privacy laws

 

Audio tools

 

Video tools

 

Image tools

  • Face inpainting experiments: Fooocus

 

Tutorials and guides for Character Creator 4 and Wonder Studio conversion

 

S-Lab License 1.0 (for Rerender a Video ONLY)

Copyright 2023 S-Lab

Redistribution and use for non-commercial purpose in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

  4. In the event that redistribution and/or use for commercial purpose in source or binary forms, with or without modification is required, please contact the contributor(s) of the work.