Secure transcription with Whisper
Secure Transcription. What to do if you don't have Copilot or Teams Premium?
Knowledge is today's currency. Dozens, if not hundreds, of meetings are held every day in Our Company on the Microsoft Teams platform. There are other sources as well. Audio recordings. Phone recordings. Video and voice. Hours of recordings. Gigabytes of data. Words spoken, decisions made, key arrangements with clients. Internal arrangements. Often this knowledge is fleeting. It is stored only in a video file that no one returns to because there is not enough time to listen to it again. Even recorded training sessions are too long to revisit. A mature Organization knows that there is value hidden in these recordings that needs to be extracted. It can bring people up to speed. However, a challenge arises. What if we don't yet have Microsoft Copilot implemented? What if we would rather experiment with machine learning or an LLM ourselves instead of buying a license? What if security policies prohibit us from sending sensitive recordings to public language models available on the web? What if the recordings are too large, or there are too many of them?
The solution is an engineering approach. Local. Secure. Scalable. Simple, by the way. Free of charge.
The dilemma of data security in the AI era
Security-conscious Management and security departments are rightly concerned about Shadow IT. Employees, wanting to make their work easier, often copy meeting content or upload files to free tools on the Internet. That's a risk. It's a loss of control. Our customers' data, contract information, development strategies: all of this must be protected. Physically and logically.
The requirements are clear. We need a transcript. We need a summary. But we also need 100 percent assurance that the bytes of data will not leave Our computer. That they will be processed on Our hardware. Under Our control.
Technology in the Service of Business: OpenAI Whisper
The answer to this challenge is to use open models, such as OpenAI Whisper, running locally (Windows 11 or even 10 will do). This technology converts speech to text with high accuracy and supports many languages, including Polish. It is not a "boxed" solution, but the difficulty is not great if you follow this article. Once configured, it can also be easily reused.
The use of this model does not require a public cloud. It only requires adequate computing power. Processor. A graphics card (it doesn't even have to be NVIDIA; the model also runs on the processor alone, only more slowly). RAM. We run the environment in-house. On a laptop or on a dedicated server inside the Organization.
We will base the process on two pillars: Miniconda (for environment management) and OpenAI Whisper (the transcription engine). Both are reliable and well established. This approach gives us flexibility.
Below I present a proven installation path for Windows systems. So let's move from theory to practice.
Foundation. Miniconda installation
I often encounter the question: why not install Python directly? The answer is simple. Dependency management. There will be more and more AI projects. If we lump everything together, we will quickly run into compatibility problems (the so-called "dependency hell"). Miniconda allows us to create a separate, isolated environment for each project.
It's cleanliness. It's safety. It's professionalism. Installation instructions are in the Anaconda documentation (Installing Miniconda).
Installation steps:
- Download the Miniconda installer for Windows (64-bit version) from the official Anaconda documentation site.
- Run the installer. I recommend installing for "Just Me" (for the current user only) to avoid problems with administrator privileges when managing packages later.
- Once the installation is complete, open the command line (this may not appeal to those who prefer a graphical interface) and navigate to the miniconda3 folder.
- We do not work on the "living organism" of the main system. We create a dedicated environment. Let's give it the working name myenv2 (or whatever you like). That way, if anything goes wrong, we simply delete the environment and the system remains intact.
- Type the following command in the console to create an environment with a specific, stable version of Python (3.10 for broad compatibility):
conda create --name myenv2 python=3.10
Preparing the workspace
Confirm the installation by typing y. Once the process is complete, we need to enter this "virtual room". Activate the environment:
conda activate myenv2
You will notice a change: (myenv2), or whatever name you chose, now appears in front of the prompt. You are now in a separate, isolated area.
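If you prefer to confirm the activation from Python rather than from the prompt alone, a minimal sketch could look like the one below. It relies on the CONDA_DEFAULT_ENV variable that conda sets on activation. The function name describe_environment is only illustrative.

```python
import os
import sys


def describe_environment():
    """Collect basic facts about the Python interpreter that is currently running."""
    return {
        # conda sets CONDA_DEFAULT_ENV when an environment is activated;
        # the value is None when no conda environment is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "python_version": f"{sys.version_info.major}.{sys.version_info.minor}",
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Inside the activated environment you would expect to see the environment name, Python 3.10 and an interpreter path that points into the miniconda3 folder.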
Engine and tools. Installation of FFmpeg and Whisper
The AI model alone is not enough. To process audio/video files, we need codecs and a tool to handle media streams. The industry standard here is FFmpeg. Its manual installation in Windows can sometimes be cumbersome for less technical workers. With Conda, we'll do it with a single command that downloads and configures everything automatically.
Type:
conda install -c conda-forge ffmpeg
Once we have the multimedia foundation, we install the Whisper package itself from the Python package repository, using pip.
Type:
pip install -U openai-whisper
The system will download the necessary libraries. Torch. Tiktoken. Whisper. This may take a while depending on your Organization's Internet connection. You may be surprised at how many lines of output scroll by.
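Before moving on, it can be worth checking that both pillars are actually visible from the active environment. The sketch below uses only the Python standard library. The function name check_tools is an illustrative assumption, and it only checks that the tools can be found, not that they work correctly.

```python
import importlib.util
import shutil


def check_tools():
    """Report whether FFmpeg and the Whisper package are visible from this environment."""
    return {
        # shutil.which returns None when the executable is not on PATH
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        # the openai-whisper package installs a module named "whisper"
        "whisper_importable": importlib.util.find_spec("whisper") is not None,
        "torch_importable": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, found in check_tools().items():
        print(f"{name}: {'OK' if found else 'missing'}")
```

If anything is reported as missing, the most likely cause is that the command was run outside the activated environment.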
Verification and initial start-up
The infrastructure is ready. Now it's time for a test. Prepare an audio file, e.g. meeting.mp3, or a video file (in my case a mov file) and put it in an easily accessible folder, e.g. the Data directory on your C drive. I put mine in the miniconda directory, which may not be best practice, but it is the fastest when it comes to typing the path.
In the console, go to this directory and run the transcription. We will use the medium model, a good compromise between speed and accuracy for Polish (turbo is the default). The language parameter should match the language spoken in the recording: en in the example below, pl for Polish.
whisper test-video.mov --model medium --language en
What is happening now?
Your computer becomes a computing server. The processor and, if available, the graphics card analyze the audio spectrum. Not a byte of the recording is sent to the network. The only download is the model file itself, which is fetched once on first use.
After a few minutes or hours (depending on the length of the recording and your hardware), text files will appear in the folder: test-video.txt, test-video.srt and others (or only the format you select with the --output_format parameter).
There are many parameters, so if the quality is poor or the accuracy needs to be improved, you can add further parameters to the command.
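When there are many recordings, typing the command for each file quickly becomes tedious. The sketch below builds the same whisper command for every media file in a folder and runs it. The options --model, --language, --output_format and --output_dir are Whisper command-line parameters. The folder names, the list of file extensions and the function names are assumptions made for illustration.

```python
import subprocess
from pathlib import Path

# Assumed list of extensions; extend it to match your own recordings
MEDIA_SUFFIXES = {".mp3", ".wav", ".m4a", ".mp4", ".mov"}


def build_command(media_file, model="medium", language="en",
                  output_format="txt", output_dir="transcripts"):
    """Return the whisper command line for one file as a list of arguments."""
    return [
        "whisper", str(media_file),
        "--model", model,
        "--language", language,
        "--output_format", output_format,
        "--output_dir", str(output_dir),
    ]


def find_media(folder):
    """Return the media files found directly inside the given folder."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in MEDIA_SUFFIXES
    )


def transcribe_folder(folder, **options):
    """Run whisper on every media file in the folder, one after another."""
    for media_file in find_media(folder):
        # check=True stops the batch if whisper reports an error
        subprocess.run(build_command(media_file, **options), check=True)
```

A call such as transcribe_folder("recordings", language="pl") would then process a hypothetical recordings folder and write the text files to the transcripts folder.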
The data has been processed. Knowledge has been extracted. Security preserved.
The process is transparent:
- Download the recording: a meeting, a training session, a phone call, a recording from Teams or Zoom where no transcription is available, video from cameras.
- Extracting the soundtrack is not required: Whisper accepts audio such as mp3 as well as video such as mov.
- Process it through the Whisper model.
- Receive a text file with a full transcript of the conversation.
This is the foundation. Only once we have the text can we think about further processing. About analysis. About conclusions. Safely. Locally. In-house. Sometimes it is enough just to read the text, or to keep a few kilobytes on disk in place of bulky video data.
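The same process can also be driven from Python instead of the command line. The sketch below assumes the openai-whisper package installed earlier and uses its load_model and transcribe functions. The helper names and the timestamp layout are illustrative choices, not part of Whisper.

```python
def format_timestamp(seconds):
    """Convert a number of seconds into an HH:MM:SS string."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(result):
    """Turn a Whisper result into one timestamped line per spoken segment."""
    return [
        f"[{format_timestamp(segment['start'])}] {segment['text'].strip()}"
        for segment in result["segments"]
    ]


def transcribe(path, model_name="medium", language="en"):
    """Transcribe one file locally and return the Whisper result dictionary."""
    # Imported here so the helpers above can be used without the package installed
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(str(path), language=language)


if __name__ == "__main__":
    result = transcribe("test-video.mov")
    print("\n".join(segments_to_lines(result)))
```

Timestamped lines make it easier to jump back to the relevant moment of the recording when a passage needs to be checked.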
From text to knowledge. A summary without the cloud
Having a transcript, we face another challenge: how do we summarize it without sending it to ChatGPT? This is where local large language models (LLMs) come in, which we can host in our own infrastructure. Models such as Llama or Mistral, run with tools like Ollama or LM Studio.
These tools allow us to "feed" the model with our transcription and ask it to extract key conclusions, task lists or dates. All this happens inside Our network. Inside Our computer. No data leaves the machine. There is only pure calculation. An article on the local model is coming soon. If you have a transcript of a training session or other material, you can also use your Copilot license. Here the risks are lower than with public LLMs, so you can give it an additional source to work with. For what? Analysis. Synthesis. Inference. A report.
This gives the employee ready-made notes from the meeting. They can go back to key passages. They can search for exactly what was promised to the customer. The time savings are enormous. The risk of human error is minimized.
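For the local route, a request to Ollama could look roughly like the sketch below. It assumes that Ollama is running on the same machine on its default port (11434) and that a model has already been downloaded. The model name mistral, the prompt wording and the function names are illustrative assumptions.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumption: default port)
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(transcript, model="mistral"):
    """Prepare the request body asking the local model for a meeting summary."""
    prompt = (
        "Summarize the meeting transcript below. "
        "List key decisions, action items and dates.\n\n" + transcript
    )
    # stream=False asks Ollama for one complete JSON answer instead of chunks
    return {"model": model, "prompt": prompt, "stream": False}


def summarize(transcript, model="mistral"):
    """Send the transcript to the local Ollama server and return the summary text."""
    data = json.dumps(build_payload(transcript, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Because the request goes to localhost, the transcript stays on the machine. Very long transcripts may exceed the context window of the chosen model and would need to be split into parts first.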
Why think about hybrid architecture?
The solution described here is proof that it is possible to build innovation based on available resources, while maintaining full security. It is a solution that requires some technical knowledge when first deployed. It works. It is free.
That's why we at ISCG are taking a broader view. Local processing is a great step toward understanding the potential of AI. To learn how valuable our data is. To prepare for the implementation of enterprise-class solutions.
Microsoft Copilot is the next level of this evolution. It's a tool that does it all automatically, built into the ecosystem we work with every day. But implementing Copilot requires preparation. It requires "cleaning up" permissions. It requires data classification. It requires governance.
If your Company is not yet ready for a full cloud deployment, on-premises solutions are a bridge. They allow you to build a culture of working with AI. They allow you to accustom your Team to the new reality.
Partnership in the digital age
Implementing such solutions is not just installing software. It's changing business processes. It's a change in thinking about data. It requires experience. It requires a Partner who understands both the code and the business.
At ISCG, we help Companies walk this path. From simple local scripts to advanced Microsoft Copilot implementations with full security policies. We help you choose the right technology tools. Visualization. Data mining.
Don't leave your data fallow. Don't risk its security in public tools. Build an environment with us that will support your business.
The technology is available. The models are available. The knowledge is at your fingertips.
It's up to Us how we use it. Whether we let the meetings be just wasted time or turn them into a knowledge base.
Safely. Locally. Consciously. Profitably.
If you find the topic interesting and want to discuss it further, or if you are concerned about the availability of internal data, a lack of policies or an environment that needs to be audited, below you have a link to schedule a meeting. Feel free to contact us. Let's talk about your security in the age of AI. Let's talk about Copilot. Let's talk about the future of your Organization.
