GLUE - Cognitive Services Accelerator
Accelerator Description
GLUE is a lightweight, Python-based collection of scripts that helps you succeed with speech and text use cases based on Microsoft Azure Cognitive Services. It not only allows you to batch-process data, but also glues together the services of your choice in one place and provides an end-to-end view of the training and testing process.
Prerequisites
Before using the toolkit, make sure your local machine has the following frameworks and base packages installed:
- Python (required, version >=3.8 is recommended).
- VSCode (recommended), but you can also run the scripts using PowerShell, Bash etc.
- Stable internet connection for installing your environment and scoring the files.
- ffmpeg for audio file conversion (only for TTS use cases).
  - If you are using Windows, download it from here and see the description here.
  - If you are using Linux, you can install it from the command line using a package manager, for example apt-get install ffmpeg.
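The two hard requirements in this list (Python version and ffmpeg) can be checked up front. The following is a minimal sketch, not part of GLUE itself; the function name `check_prerequisites` and its parameters are assumptions made for illustration.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # recommended minimum version from the list above


def check_prerequisites(need_ffmpeg=True, python_version=None):
    """Return a list of problems found; an empty list means everything is in place.

    Hypothetical helper, not part of the GLUE code base.
    """
    if python_version is None:
        python_version = sys.version_info[:2]

    problems = []
    if tuple(python_version) < MIN_PYTHON:
        found = ".".join(str(part) for part in python_version)
        problems.append(f"Python {found} found, >=3.8 is recommended")

    # ffmpeg is only needed for the TTS use cases (audio file conversion).
    if need_ffmpeg and shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on the PATH")

    return problems
```

Calling `check_prerequisites()` before a run and printing the returned messages gives a quick indication of what is still missing; pass `need_ffmpeg=False` if you do not plan to use the TTS module.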
Modules
GLUE consists of multiple modules, which can either be executed separately or run as a central pipeline:
- Batch-transcribe audio files to text transcripts using Microsoft Speech to Text Service (STT).
- Batch-synthesize text data using Microsoft Text to Speech Service (TTS).
- Batch-evaluate reference transcriptions and recognitions.
- Batch-score text strings on an existing, pre-trained Microsoft LUIS-model.
Use Case
- Automated generation of synthetic speech-model training data.
- Batch-transcription of audio files and evaluation given an existing reference transcript.
- Scoring of STT transcriptions on an existing LUIS model.
Accelerator Components
glue.py
- Central application orchestrator of the toolkit.
- Glues together the individual modules in one place as needed.
- Reads input files and writes output files.
stt.py
- Batch-transcription of audio files using Microsoft Speech to Text API.
- Supports baseline models as well as custom endpoints.
- Functionality is limited to the languages and locales listed on the language support page.
tts.py
- Batch-synthesis of text strings using Microsoft Text to Speech API.
- Supports Speech Synthesis Markup Language (SSML) to fine-tune and customize the pronunciation, as described in the documentation.
- Retrieves a high-quality audio file from the API and converts it to the Microsoft speech format, as well as to a version overlaid with the noise of a telephone line.
- Functionality is limited to the languages and fonts listed on the language support page.
- Make sure the voice of your choice is available in the respective Azure region (see documentation).
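To illustrate the SSML support mentioned above, here is a minimal sketch that wraps a text string in an SSML document before it is sent to the Text to Speech API. The function name `build_ssml` is hypothetical, and the default voice `en-US-JennyNeural` is only an example; check that the voice you pick is available in your Azure region.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", locale="en-US"):
    """Wrap plain text in a minimal SSML document.

    Hypothetical helper, not part of the GLUE code base. The voice and
    locale defaults are example values only.
    """
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(locale)}>"
        # escape() protects characters such as & and < inside the text.
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )
```

Further SSML elements, for example for speaking rate or pronunciation, would go inside the `voice` element; they are left out here to keep the sketch short.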
luis.py
- Batch-scoring of intent-text combinations using an existing LUIS model.
- See the following quickstart documentation in case you need some inspiration for your first LUIS app.
- Configurable scoring threshold, in case predictions should only be accepted above a certain confidence score returned by the API.
- Writes the scoring report as a comma-separated file.
- Returns a classification report and confusion matrix based on scikit-learn.
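The effect of the scoring threshold can be sketched as follows. This is an illustration under stated assumptions, not the GLUE implementation: the function names, the report columns and the `"None"` fallback intent are hypothetical, and the confusion counts are a plain standard-library stand-in for the scikit-learn report mentioned above.

```python
import csv
import io
from collections import Counter

FALLBACK_INTENT = "None"  # assumption: label used when a prediction is rejected
FIELDS = ["text", "intent", "prediction", "score", "match"]


def score_batch(rows, threshold=0.8):
    """Apply the confidence threshold to already-scored utterances.

    rows: iterable of (text, reference_intent, predicted_intent, score).
    """
    report = []
    for text, reference, predicted, score in rows:
        # Accept the prediction only if the API's confidence is high enough.
        accepted = predicted if score >= threshold else FALLBACK_INTENT
        report.append({
            "text": text,
            "intent": reference,
            "prediction": accepted,
            "score": score,
            "match": accepted == reference,
        })
    return report


def to_csv(report):
    """Serialize the scoring report as comma-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(report)
    return buffer.getvalue()


def confusion_counts(report):
    """Count (reference intent, accepted prediction) pairs."""
    return Counter((row["intent"], row["prediction"]) for row in report)
```

A higher threshold trades coverage for precision: more utterances fall back to the rejection label, but the accepted predictions are those the model was more confident about.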
evaluate.py
- Evaluation of transcription results by comparing them with reference transcripts.
- Calculates metrics such as Word Error Rate (WER), Sentence Error Rate (SER), Word Recognition Rate (WRR).
- Implementation based on github.com/belambert/asr-evaluation.
- See some hints on how to improve your Custom Speech accuracy.
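As a rough illustration of two of these metrics, the sketch below computes WER and SER with a word-level edit distance. It is a simplified stand-in, not the asr-evaluation code the module is based on: the function names are hypothetical, no text normalization (casing, punctuation) is applied, and WRR is left out because its value depends on how the word alignment is chosen.

```python
def word_edit_distance(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # previous[j] holds the distance between the processed reference
    # prefix and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1], len(ref)


def evaluate(pairs):
    """Compute WER and SER over (reference, hypothesis) pairs."""
    errors = words = wrong_sentences = sentences = 0
    for reference, hypothesis in pairs:
        distance, ref_length = word_edit_distance(reference, hypothesis)
        errors += distance
        words += ref_length
        wrong_sentences += distance > 0
        sentences += 1
    return {
        # WER: word errors (substitutions, deletions, insertions) per reference word.
        "wer": errors / words if words else 0.0,
        # SER: share of sentences containing at least one error.
        "ser": wrong_sentences / sentences if sentences else 0.0,
    }
```

Note that WER is normalized by the number of reference words, so it can exceed 1.0 when the recognition contains many inserted words.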
params.py
- Collects API and configuration parameters from the command line (ArgumentParser) and the config.ini.
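Combining command-line arguments with values from a config.ini could look roughly like the sketch below. The flag names (`--input`, `--region`), the `[speech]` section and its keys are assumptions for illustration; the actual parameters of GLUE may differ.

```python
import argparse
import configparser


def load_params(argv, config_text):
    """Merge command-line arguments with settings from an INI document.

    Hypothetical helper: flag names and the [speech] section are
    illustrative assumptions, not GLUE's actual parameters.
    """
    parser = argparse.ArgumentParser(description="Example parameter collection")
    parser.add_argument("--input", required=True, help="path to the input file or folder")
    parser.add_argument("--region", default=None, help="overrides the region from the config")
    args = parser.parse_args(argv)

    config = configparser.ConfigParser()
    # A real script would typically call config.read("config.ini") instead.
    config.read_string(config_text)

    params = dict(config["speech"])
    params["input"] = args.input
    if args.region is not None:
        # Values given on the command line take precedence over the file.
        params["region"] = args.region
    return params
```

Keeping keys and endpoints in the config file and run-specific options on the command line means the same configuration can be reused across runs.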
helper.py
- Collection of helper functions which do not have a purpose on their own, but complement the orchestrator and keep the code neat and clean.
- In case there is a need for custom components, we recommend adding them to this module.
- Creates a folder for every run using the naming convention YYYYMMDD-[unique ID].
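The run-folder naming convention can be sketched as follows. The function name `create_run_folder` and the choice of an 8-character hexadecimal ID are assumptions; the text above only specifies the pattern YYYYMMDD-[unique ID].

```python
import uuid
from datetime import date
from pathlib import Path


def create_run_folder(base_dir, today=None):
    """Create and return a folder named YYYYMMDD-[unique ID] below base_dir.

    Hypothetical helper, not part of the GLUE code base.
    """
    if today is None:
        today = date.today()
    # Assumption: a short random hex string serves as the unique ID.
    run_id = uuid.uuid4().hex[:8]
    folder = Path(base_dir) / f"{today:%Y%m%d}-{run_id}"
    # exist_ok=False makes an (unlikely) ID collision fail loudly.
    folder.mkdir(parents=True, exist_ok=False)
    return folder
```

Because the date comes first, the run folders sort chronologically by day in a file browser, while the ID keeps several runs on the same day apart.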
The components described above can either be run on their own or, ideally, through the central orchestrator. In the order of the descriptions above, the component files are:
- glue.py
- stt.py
- tts.py
- luis.py
- evaluate.py
- params.py
- helper.py