GDPR Compliant Transcription with AI

Aerial top-down view of an island covered with trees and rocks in the middle of a large body of water with the sun to the left and the island casting a shadow as the water gets darker to the right.

TLDR;

Well, it would be silly not to read this article, since it is a how-to on setting up a GDPR compliant method of transcribing audio with the help of AI. 🙃

Links to references at the end of the article. 👇

Intro

When we want to process transcriptions in a GDPR compliant way, there are two great options for Mac users. Both leverage Whisper, the speech recognition model from OpenAI, the company famous for ChatGPT.

"Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing."

How is this GDPR compliant, you might ask? You process and transcribe locally on your machine, meaning none of the data is sent anywhere. However, it requires us to download the entire model onto our computers.

The first option is using MacWhisper. The free tier is pretty accurate, needs minimal setup and can record right within the app. Check the system requirements before installing.

The second option can be a bit more intimidating and contains a few more steps, but gives more accurate output. The reason is that MacWhisper only provides the small model with the free tier, while the second method gets us the large model for free. Let's focus on the second option.

If you stick with me to the end, I will show you how to make things easier with shortcuts that we can live with. Also note that for both options the translate feature requires data to be sent, so check whether that is compliant with your privacy requirements. As an alternative, you can run the transcription multiple times with different target languages.

Become a hacker 🥷

Just kidding, but you will feel like one after we get through these steps. First let's open up the Terminal application.

Setup

Xcode. First we need to make sure Apple's command line developer tools are set up; they may already be installed. Enter the following into the terminal command line.

xcode-select --install

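Homebrew. This is a package manager for various libraries. Run the installer without sudo: it refuses to run as root and asks for your password itself when it needs it.

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"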

You may find that you need to add Homebrew to your PATH. If so, refer to the documentation here.
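If brew is not found after the install, the lines below are a minimal sketch of the PATH fix. They assume Homebrew's default install locations (/opt/homebrew on Apple Silicon, /usr/local on Intel Macs) and are meant to be added to ~/.zprofile rather than typed at the prompt.

```shell
# Lines for ~/.zprofile: put Homebrew on the PATH if it sits in a default location.
# Assumed prefixes: /opt/homebrew (Apple Silicon) and /usr/local (Intel).
for prefix in /opt/homebrew /usr/local; do
  if [ -x "$prefix/bin/brew" ]; then
    # brew shellenv prints the export lines that make brew available.
    eval "$("$prefix/bin/brew" shellenv)"
    break
  fi
done
```

Open a new terminal window afterwards; brew --version should then print a version number.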

Whisper. Make a folder where you want to install Whisper, and navigate to it. For example, a folder named "Tools".

mkdir -p Tools && cd Tools

Then, using Git, clone the repository code. Git comes with the Xcode developer tools installed above.

git clone https://github.com/ggerganov/whisper.cpp.git

Navigate into the whisper repository directory.

cd whisper.cpp

Download the model

bash ./models/download-ggml-model.sh large

Build the whisper.cpp program (main) that runs the downloaded model

make large

ffmpeg. If the tool you use to record, such as QuickTime, saves files as .m4a or similar, we need to be able to convert them to .wav in order for Whisper to understand them.

brew install ffmpeg

Now you have everything you need to get started.
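Before moving on, it can be worth checking that nothing is missing. The sketch below is one way to do that; check_setup is a hypothetical helper name, and the file names are the ones produced by the commands above. Because of the # comments, keep it in a file (setup-check.sh is just an example name) and load it with source setup-check.sh instead of pasting it at the prompt.

```shell
# Hypothetical helper: run it from inside the whisper.cpp directory.
# It reports which tools and files from the setup steps are present.
check_setup() {
  missing=0
  for item in git ffmpeg; do
    if command -v "$item" >/dev/null 2>&1; then
      echo "ok: $item"
    else
      echo "missing: $item"
      missing=1
    fi
  done
  for item in ./main ./models/ggml-large.bin; do
    if [ -e "$item" ]; then
      echo "ok: $item"
    else
      echo "missing: $item"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_setup prints one line per item, so a "missing" line tells you which setup step to repeat.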

Let's do it!

Convert audio to .wav

We are going to first create a folder named "data" inside the whisper.cpp directory.

You can do this with the Finder or using the terminal with the following command:

mkdir data

*It will be useful to make a shortcut to this folder in Finder.

To convert all audio files to .wav, make sure you are in the /whisper.cpp directory in the terminal and run:

mkdir -p output
for i in ./data/*; do
  f="${i##*/}"
  filename="${f%.*}"
  ffmpeg -i "$i" -acodec pcm_s16le -ac 1 -ar 16000 "./output/${filename}.wav"
done

At this point you should see a new folder called "output". In that folder you will see the .wav version of your original audio file. I recommend adding this as a shortcut in Finder as well.
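A note on file names: a recording with more than one dot in its name deserves a second look, because cutting at the first dot and stripping only the last extension give different results. The sketch below illustrates the difference with a made-up file name. If you paste it straight into the terminal, leave out the # comments, since zsh does not accept them at the prompt by default.

```shell
# Hypothetical input path, used only to illustrate the name handling.
i="./data/interview.part1.m4a"

# Drop everything up to the last slash.
f="${i##*/}"
# Drop only the last extension.
name="${f%.*}"

echo "source file: $f"
echo "target file: ./output/${name}.wav"

# For comparison, cut keeps only the text before the FIRST dot.
echo "with cut: $(echo "$f" | cut -d'.' -f 1)"
```

Stripping only the last extension keeps interview.part1 intact, while cut shortens it to interview, so two recordings named interview.part1 and interview.part2 would end up with the same .wav name.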

This is a large piece of code to either remember or to keep somewhere to copy and paste from. At the end of this article I will show you a way to make a shortcut for this. Also note that the command converts ALL audio files in the data folder, so move or delete the originals in data once they have been converted.
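If moving files around gets tedious, the loop can also skip anything that already has a .wav in output. This is only a sketch: to_wav_new is a hypothetical name, and it assumes you run it from inside the whisper.cpp directory. Keep it in a file and load it with source rather than pasting it at the prompt, because of the # comments.

```shell
# Hypothetical helper: convert only the files in ./data that have
# no matching .wav in ./output yet.
to_wav_new() {
  mkdir -p output
  for i in ./data/*; do
    # Skip anything that is not a regular file.
    [ -f "$i" ] || continue
    f="${i##*/}"
    target="./output/${f%.*}.wav"
    if [ -e "$target" ]; then
      echo "skipping $f (already converted)"
      continue
    fi
    # -nostdin stops ffmpeg from reading the terminal while the loop runs.
    ffmpeg -nostdin -i "$i" -acodec pcm_s16le -ac 1 -ar 16000 "$target"
  done
}
```

The check is based on the file name only, so a re-recorded file with the same name is skipped until you delete its old .wav.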

Transcribe

We need to create a .vtt file. So, back in the terminal, make sure you are in the /whisper.cpp directory and run:

for i in output/*.wav; do ./main -m ./models/ggml-large.bin -l no --output-vtt -f "$i"; done

The -l no flag gives a Norwegian output. If you want it in English, just change no to en.

And now you should see both .wav and .vtt files in the /output folder.

This is also a hard-to-remember and clunky to copy-paste piece of code so we will also create a shortcut for this at the end. Note that similar to the convert command, this will transcribe ALL .wav files in output, so you want to move or delete audio files that already have transcriptions.
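The same skip-what-is-done idea works for transcription. In the sketch below, trx_new is a hypothetical name, and it assumes that main writes the subtitles next to the input file as <name>.wav.vtt; check your output folder to confirm that before relying on it. As with any snippet containing # comments, keep it in a file and load it with source.

```shell
# Hypothetical helper: run it from inside the whisper.cpp directory.
# Assumption: ./main saves the subtitles as <input file>.vtt, e.g. a.wav.vtt.
trx_new() {
  # Language code, English if none is given.
  lang="${1:-en}"
  for i in ./output/*.wav; do
    [ -f "$i" ] || continue
    if [ -e "$i.vtt" ]; then
      echo "skipping ${i##*/} (already transcribed)"
      continue
    fi
    ./main -m ./models/ggml-large.bin -l "$lang" --output-vtt -f "$i"
  done
}
```

For example, trx_new no transcribes only the Norwegian recordings that do not have a .vtt file yet.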

Mac Finder window opened to the Whisper.cpp folder. A red rectangle around the data and output folders with arrows pointing to the Favourites sidebar where data and output are also included as shortcuts.

Uploading to a UI for navigation

HappyScribe free subtitle editor graphical user interface with subtitle and timeline on the left and an audio player on the right with controls.

Now that we have an audio file and an accompanying subtitle (.vtt) file, we want to upload, but not really upload, to a graphical user interface to navigate, search and edit the files. HappyScribe has a great, free visual editor here. Best of all, the data only gets sent to your browser and nowhere else. Nerds, feel free to open the network tab of the browser and watch the traffic (hint: there isn't any except between your Mac and browser).

Upload the text

Click on New subtitles and upload the .vtt

Upload the audio

Then click on Select audio file and upload the .wav file.

HappyScribe free subtitle editor find and replace dialog to search through text from an uploaded .vtt file. Contains an input field for finding a text and options for case-sensitivity and word matching and a replace input field with an option to preserve the capitalisation of the first letter followed by a preview of the changes.

Use Cmd + F to find and replace text across the whole transcript 🤯

Let's make life easier 🧘‍♀️

We don't want to copy and paste those lines of code each time we want to transcribe, do we?! Nope. So we can actually create our own shortcuts (functions) to use in the command line.

To set up a function, we open up the configuration file called .zshrc. If you have VS Code installed, then use the code command, otherwise use nano.

code ~/.zshrc

or

nano ~/.zshrc

For many, this will be an empty screen. Otherwise if you have made aliases for other commands, you might see them here as well.

Copy and paste the following into the .zshrc file you just opened.

function toWav () {
	# Convert every file in ./data to a 16 kHz mono .wav in ./output.
	cwd=$(pwd)
	mkdir -p "${cwd}/output"
	for i in "${cwd}"/data/*; do
		f="${i##*/}"          # file name without the folder
		filename="${f%.*}"    # file name without the extension
		ffmpeg -i "$i" -acodec pcm_s16le -ac 1 -ar 16000 "${cwd}/output/${filename}.wav"
	done
}

function trx () {
	# Transcribe every .wav in ./output; the first argument is the language code (default: en).
	cwd=$(pwd)
	for i in "${cwd}"/output/*.wav; do
		"${cwd}/main" -m "${cwd}/models/ggml-large.bin" -l "${1:-en}" --output-vtt -f "$i"
	done
}

Screenshot of the Terminal app with Nano opened on .zshrc and two functions defined for conversion to wav file and transcription to vtt

Next, save your changes. If you are in nano, press control + x, then y, then Enter.

Now we need to update the terminal.

source ~/.zshrc
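To confirm that the shell picked up both functions, a quick check like the one below can help. It is only a sketch using the shell's built-in type command and the two function names defined above. If you paste it at the prompt, leave out the # comment line.

```shell
# Report whether each function is known to the current shell session.
for fn in toWav trx; do
  if type "$fn" >/dev/null 2>&1; then
    echo "$fn: loaded"
  else
    echo "$fn: not found, check ~/.zshrc and run source ~/.zshrc again"
  fi
done
```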

To test it out:

Navigate the terminal to the whisper.cpp directory, for example with cd Tools/whisper.cpp.

Make sure you have an audio file in whisper.cpp/data.

To convert to .wav format: toWav is all you need to enter in the terminal.

To see if it worked, you can check the /output folder for the converted audio.

Next, to transcribe, enter trx <lang>, where <lang> can be en for English or no for Norwegian. You will get the transcription in the target language in the same output directory. So for example, if I wanted to transcribe the audio to English, it would look like this: trx en.

Here's everything in a nutshell from beginning to end:

  1. Record audio, place it in Tools/whisper.cpp/data.

  2. Open the terminal and navigate to the whisper folder.

  3. Convert the audio file if it is not already a .wav file.

  4. Transcribe the file.

cd Tools/whisper.cpp
toWav
trx en
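If you want to go one step further, those three commands can be wrapped in a single function. A minimal sketch, assuming whisper.cpp lives in ~/Tools/whisper.cpp; the names transcribe and WHISPER_DIR are made up for this example. It belongs in ~/.zshrc, below toWav and trx.

```shell
# Assumed location of the repository; change it to wherever you cloned whisper.cpp.
WHISPER_DIR="${WHISPER_DIR:-$HOME/Tools/whisper.cpp}"

# Hypothetical wrapper around the toWav and trx functions.
transcribe() {
  # Run in a subshell so your terminal stays in its current folder.
  (
    cd "$WHISPER_DIR" || exit 1
    toWav && trx "${1:-en}"
  )
}
```

After source ~/.zshrc, transcribe no converts and transcribes in one go from any folder, and transcribe on its own falls back to English.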

How do I update the model?

To keep the model updated, just repeat the download and make steps from the Setup section.
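Spelled out, that could look like the sketch below. update_whisper and run are hypothetical helper names, the default folder is an assumption, and git pull is an assumed extra step for fetching the latest whisper.cpp code first. It can live in ~/.zshrc with the other functions.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: repeat the install steps inside the repository folder.
# The folder can be passed as the first argument; the default is an assumption.
update_whisper() {
  (
    cd "${1:-$HOME/Tools/whisper.cpp}" || exit 1
    # Fetch the latest code, repeat the download step, then rebuild as in the setup.
    run git pull \
      && run bash ./models/download-ggml-model.sh large \
      && run make large
  )
}
```

DRY_RUN=1 update_whisper shows what would be run without touching anything; update_whisper on its own performs the update.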

And voilà, you are ready to transcribe audio in a GDPR compliant way, using AI for more effective transcriptions. There is a lot more you can do with Whisper, so have a look at the documentation.

Thanks for reading! ✌️

References

Big thanks to my main source of information: Ida Aalen

WhisperAI

HappyScribe


© 2023 Stephen Chiang