Testing speech recognition with Playwright
How to automate testing in-browser speech recognition, so you can stop talking to the computer and go back to mumbling to yourself.
While working on my voice-controlled teleprompter app, which was my first experience with speech-related APIs, I discovered a new type of test that's worse than flaky tests, worse than manual tests, even worse than flaky manual tests.
I've discovered the out-loud flaky manual test.
The problem
As you're working, you need to be constantly speaking into the microphone, because that's what you're testing. This makes the people around you wonder whether you're talking to them this time or not; you're interrupting their work and drawing attention to how you pronounce the word "little".
The solution
Since we're already using Playwright, it turns out we can make Chrome listen to our prerecorded voice instead.
The key part is this set of Chrome flags:
- `--use-fake-ui-for-media-stream` skips the microphone permission prompt (permission is granted automatically)
- `--use-fake-device-for-media-stream` creates a dummy device
- `--use-file-for-fake-audio-capture=path/to/audio.wav` plays a pre-recorded file on said fake device
- `--disable-audio-input` disables any actual microphones you might have, so you know you're listening to the correct (fake) device
- use `channel: "chrome"` to refer to Chrome in the Playwright settings; this ensures the speech recognition API is enabled, and using `channel: "chromium"` (or nothing) will not work as of this writing
We can now run a test with Playwright in Chrome and, when we start voice recognition (or any microphone-related operation), it will play our WAV file.
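For context, the in-page half of this (roughly what a speech-driven app runs) can be as simple as this sketch of Chrome's Web Speech API:

// Chrome exposes the Web Speech API as webkitSpeechRecognition;
// with the flags above, it transcribes the fake device's WAV audio
const recognition = new webkitSpeechRecognition();
recognition.continuous = true;
recognition.interimResults = true;
recognition.onresult = (event) => {
  const latest = event.results[event.results.length - 1];
  console.log("heard:", latest[0].transcript);
};
recognition.start();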
The end.
The new problem
But wait, this will play the same WAV file every time, so how do we test different phrases?
What if I need to speak in English and then in Croatian?
The solution #2
What we do is we:
- create multiple WAV files
- for each, we create a separate Playwright project
- ensure each test is bound to its specific project and nothing else, since each test asserts against exactly one recording
This is very tedious to do manually, so let's engineer a small helper which does it automagically:
- we put all our WAV files into `tests/fixtures`, all of them named like `speech-<phrase said>.wav`
- for each file we find, we create the already mentioned Playwright project, making sure to tag the project with the file name:
return {
  name: name,
  use: {
    ...devices["Desktop Chrome"],
    channel: "chrome", // !important
    launchOptions: {
      args: [
        "--disable-audio-input",
        "--use-fake-device-for-media-stream",
        "--use-fake-ui-for-media-stream",
        `--use-file-for-fake-audio-capture=${filePath}`,
      ],
    },
  },
  // only target tests which have our name as their tag
  grep: new RegExp(`@${name}`),
};

[Example]: Playwright project for a specific speech WAV file

- in our test, we tag it with the file name so it only runs in the correct project:
test(
  "should progress prompter position as speech is recognized",
  // opt-in to a very specific file to use
  { tag: "@speech-hello-world-how-are-you" },
  async ({ page }) => {
    await page.fill(
      "#script-input",
      "hello world, how are you doing today?",
    );
    await page.locator('[role="speech-control"]').click();
    // wait for the length of the audio file
    await page.waitForTimeout(5000);
    await page.locator('[role="speech-control"]').click();
    const currentWord = await page
      .locator(".word.current")
      .textContent();
    expect(currentWord).toBe("doing");
  },
);

[Example]: Playwright testing speech recognition using a very specific file
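The fixed five-second wait works, but it ties the test to the clip's length. A sketch of an alternative, assuming the same selectors, is to poll until recognition reaches the expected word via Playwright's expect.poll:

// instead of waitForTimeout(5000): poll until the highlighted word
// matches, failing after a generous upper bound
await expect
  .poll(() => page.locator(".word.current").textContent(), {
    timeout: 10_000,
  })
  .toBe("doing");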
{ name: "Chrome", use: { ...devices["Desktop Chrome"], channel: "chrome" }, // anything tagged with speech* is skipped in the non-audio projects grepInvert: /@speech/, },[Example]: exclude speech tests from your regular Playwright projects
Full example of `playwright.config.js`
// imports the config needs; decodeFlacToWav is a project-specific
// FLAC-to-WAV helper whose implementation is not shown here
import { readdirSync } from "node:fs";
import { dirname, join, parse } from "node:path";
import { fileURLToPath } from "node:url";
import { defineConfig, devices } from "@playwright/test";

// __dirname is not available in ES modules, so derive it
const __dirname = dirname(fileURLToPath(import.meta.url));

async function generateSpeechProjects() {
const fixturesDir = join(__dirname, "tests/fixtures");
try {
const files = readdirSync(fixturesDir);
// Find all FLAC files and decode them to WAV if needed
const flacFiles = files.filter(
(file) => parse(file).ext.toLowerCase() === ".flac",
);
for (const flacFile of flacFiles) {
const { name } = parse(flacFile);
const flacPath = join(fixturesDir, flacFile);
const wavPath = join(fixturesDir, `${name}.wav`);
await decodeFlacToWav(flacPath, wavPath);
}
// Now find all WAV files (including newly decoded ones)
const updatedFiles = readdirSync(fixturesDir);
const wavFiles = updatedFiles.filter(
(file) => parse(file).ext.toLowerCase() === ".wav",
);
return wavFiles.map((file) => {
const { name } = parse(file);
const filePath = join(fixturesDir, file);
return {
name: name,
use: {
...devices["Desktop Chrome"],
channel: "chrome",
launchOptions: {
args: [
"--disable-audio-input",
"--use-fake-device-for-media-stream",
"--use-fake-ui-for-media-stream",
`--use-file-for-fake-audio-capture=${filePath}`,
],
},
},
grep: new RegExp(`@${name}`),
};
});
} catch (error) {
console.warn(
`Warning: Could not process fixtures: ${error instanceof Error ? error.message : String(error)}`,
);
return [];
}
}
export default defineConfig({
projects: [
// regular tests
{
name: "Chrome",
use: { ...devices["Desktop Chrome"], channel: "chrome" },
grepInvert: /@speech/,
},
// audio tests
...await generateSpeechProjects(),
],
});
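With this in place, a single speech project can also be run directly from the CLI; the project name below is illustrative, derived from a fixture's file name:

npx playwright test --project=speech-hello-world-how-are-you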
That's it. Now we can add new speech recognition tests simply by adding as many new fixture files as we need and asserting against them.
Bonjour.

