Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio

BENJ EDWARDS

28 September 2023

12 min

PME All levels

BLUF

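Microsoft's VALL-E text-to-speech model can synthesise a speaker's voice from a three-second audio sample, preserving the speaker's emotional tone and acoustic environment. Potential applications include text-to-speech and speech editing, but the risk of misuse raises concerns.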

Summary

KEY POINTS:

VALL-E replicates the emotional tone and acoustic environment of the sample audio.
Applications include text-to-speech and speech editing.
It generates discrete audio codec codes from text and acoustic prompts.
Microsoft has not released VALL-E's code because of the potential for misuse.
The researchers propose a detection model to identify VALL-E-synthesised audio and commit to following the Microsoft AI Principles.

See: VALL-E (X) (microsoft.com)

READ: Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio

References

Media Check: Ars Technica - Media Bias/Fact Check (HIGH CREDIBILITY)
Collections | The Runway (airforce.gov.au)
ADDITIONAL READING RAAF RUNWAY (PME)
RAAF RUNWAY: RATIONALE, GUIDELINES, LEARNING OUTCOMES, ETC
