22 Feb 2022

GLUE - Cognitive Services Accelerator


🔗GLUE Repository🔗

Accelerator Description

GLUE is a lightweight, Python-based collection of scripts that helps you succeed with speech and text use cases based on Microsoft Azure Cognitive Services. It lets you batch-process data, glues together the services of your choice in one place, and gives you an end-to-end view of the training and testing process.

Prerequisites

    Before working with the toolkit, make sure your local computer has the following frameworks and base packages:
  • Python (required; version 3.8 or later is recommended).
  • VS Code (recommended); you can also run the scripts using PowerShell, Bash, etc.
  • Stable internet connection for installing your environment and scoring the files.
  • ffmpeg for audio file conversion (only for TTS use cases).
    • If you are using Windows, download it from here and see the description here.
    • If you are using Linux, install it from the command line with a package manager, for example apt-get install ffmpeg.
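
    The prerequisites above can be verified with a few lines of Python before you start. The following is a minimal sketch and not part of GLUE itself; the function name check_environment is hypothetical, and it only checks the Python version and whether an ffmpeg executable is on the PATH.

```python
import shutil
import sys

# Minimum interpreter version recommended in the prerequisites above.
RECOMMENDED_PYTHON = (3, 8)


def check_environment(need_ffmpeg=True):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < RECOMMENDED_PYTHON:
        problems.append(
            "Python %d.%d found, 3.8 or later is recommended" % sys.version_info[:2]
        )
    # shutil.which returns None when the executable is not on the PATH.
    if need_ffmpeg and shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH (only needed for TTS use cases)")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

    If you do not plan to use the TTS module, call check_environment(need_ffmpeg=False) to skip the ffmpeg check.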

Modules

    GLUE consists of multiple modules, which can either be executed separately or run as a central pipeline:
  • Batch-transcribe audio files to text transcripts using Microsoft Speech to Text Service (STT).
  • Batch-synthesize text data using Microsoft Text to Speech Service (TTS).
  • Batch-evaluate reference transcriptions and recognitions.
  • Batch-score text strings on an existing, pre-trained Microsoft LUIS-model.

Use Case

  • Automated generation of synthetic speech-model training data.
  • Batch-transcription of audio files and evaluation given an existing reference transcript.
  • Scoring of STT-transcriptions on an existing LUIS-model.

Accelerator Components

    This section describes the individual components of GLUE, which can either be run on their own or, ideally, through the central orchestrator.

  • glue.py
  • Central application orchestrator of the toolkit.
  • Glues together the individual modules in one place as needed.
  • Reads input files and writes output files.

  • stt.py
  • Batch-transcription of audio files using Microsoft Speech to Text API.
  • Allows baseline models as well as custom endpoints.
  • Functionality is limited to the languages and locales listed on the language support page.

  • tts.py
  • Batch synthesis of text strings using Microsoft Text to Speech API.
  • Supports Speech Synthesis Markup Language (SSML) to fine-tune and customize the pronunciation, as described in the documentation.
  • Retrieves a high-quality audio file from the API and converts it to the Microsoft speech format, plus a second version overlaid with telephone-line noise.
  • Functionality is limited to the languages and fonts listed on the language support page.
  • Make sure the voice of your choice is available in the respective Azure region (see documentation).
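
    To illustrate the SSML support mentioned above, the following sketch wraps a text string in a basic SSML document. It is not taken from tts.py; the function name build_ssml is hypothetical, and the voice en-US-JennyNeural and locale en-US are example values that you would replace with a voice available in your Azure region.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", locale="en-US"):
    """Wrap plain text in a minimal SSML document for a single voice.

    The voice and locale defaults are example values, not GLUE defaults.
    """
    # escape() protects characters such as & and < inside the spoken text;
    # quoteattr() returns the attribute value including its surrounding quotes.
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        "xml:lang=" + quoteattr(locale) + ">"
        "<voice name=" + quoteattr(voice) + ">" + escape(text) + "</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Turn on the light & close the door."))
```

    Escaping the text matters for batch jobs, because a single unescaped ampersand in an input row would otherwise produce invalid markup.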

  • luis.py
  • Batch-scoring of intent-text combinations using an existing LUIS model.
  • Configurable scoring threshold, so that predictions are only accepted when the confidence score returned by the API reaches a given value.
  • Writes the scoring report as a comma-separated (CSV) file.
  • Returns classification report and confusion matrix based on scikit-learn.
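
    The thresholding and report steps described above can be sketched as follows. This is an illustration only and does not call the LUIS API: the rows are assumed to already contain the predicted intent and score, the function names and column names are hypothetical, and falling back to the intent "None" below the threshold is an assumption about how rejected predictions are handled.

```python
import csv
import io


def apply_threshold(predicted_intent, score, threshold=0.8, fallback="None"):
    """Keep the predicted intent only if its confidence reaches the threshold."""
    return predicted_intent if score >= threshold else fallback


def score_rows(rows, threshold=0.8):
    """rows: iterable of (text, expected_intent, predicted_intent, score)."""
    scored = []
    for text, expected, predicted, score in rows:
        accepted = apply_threshold(predicted, score, threshold)
        scored.append((text, expected, accepted, score))
    return scored


def write_report(scored_rows, handle):
    """Write the scored rows as comma-separated values to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["text", "expected", "predicted", "score"])
    writer.writerows(scored_rows)


if __name__ == "__main__":
    # Made-up example rows, not real API output.
    rows = [
        ("book a flight to Berlin", "BookFlight", "BookFlight", 0.91),
        ("what is the weather", "GetWeather", "BookFlight", 0.42),
    ]
    buffer = io.StringIO()
    write_report(score_rows(rows), buffer)
    print(buffer.getvalue())
```

    The expected and predicted columns of such a report are the inputs you would pass to scikit-learn to obtain the classification report and confusion matrix.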

  • evaluate.py
  • Evaluation of transcription results by comparing them with reference transcripts.
  • Calculates metrics such as Word Error Rate (WER), Sentence Error Rate (SER), Word Recognition Rate (WRR).
  • Implementation based on github.com/belambert/asr-evaluation.
  • See some hints on how to improve your Custom Speech accuracy.
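
    As a worked illustration of two of these metrics, the sketch below computes WER and SER from reference and recognized sentences. It is a simplified stand-in and not the asr-evaluation implementation: it assumes lowercasing and whitespace tokenization, defines WER as the word-level edit distance divided by the number of reference words, and defines SER as the share of sentences with at least one error.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of a hypothesis word
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def evaluate(references, hypotheses):
    """Return WER and SER over paired reference and hypothesis sentences."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_errors = total_words = wrong_sentences = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.lower().split(), hyp.lower().split()
        errors = edit_distance(ref_words, hyp_words)
        total_errors += errors
        total_words += len(ref_words)
        wrong_sentences += 1 if errors > 0 else 0
    if total_words == 0:
        raise ValueError("references contain no words")
    return {
        "wer": total_errors / total_words,
        "ser": wrong_sentences / len(references),
    }


if __name__ == "__main__":
    refs = ["turn on the light", "what time is it"]
    hyps = ["turn on light", "what time is it"]
    print(evaluate(refs, hyps))
```

    In this example one of eight reference words is missing from the recognition, so the WER is 0.125, and one of two sentences contains an error, so the SER is 0.5.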

  • params.py
  • Collects API and configuration parameters from the command line (ArgumentParser) and the config.ini.
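
    Combining command-line arguments with values from config.ini can be sketched as shown below. The argument names (--input, --do_transcribe, --config) and the [stt] section with its key and region options are hypothetical placeholders, not necessarily the names used by params.py.

```python
import argparse
import configparser


def parse_args(argv=None):
    """Parse command-line arguments; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Example parameter collection")
    parser.add_argument("--input", required=True, help="Path to the input file")
    parser.add_argument("--do_transcribe", action="store_true",
                        help="Run the speech-to-text step")
    parser.add_argument("--config", default="config.ini",
                        help="Path to the configuration file")
    return parser.parse_args(argv)


def read_config(path):
    """Read an INI file; a missing file yields an empty configuration."""
    config = configparser.ConfigParser()
    config.read(path)
    return config


def merge_params(args, config):
    """Combine parsed arguments and config values into one dictionary."""
    params = dict(vars(args))
    # fallback=None covers both a missing section and a missing option.
    params["stt_key"] = config.get("stt", "key", fallback=None)
    params["stt_region"] = config.get("stt", "region", fallback=None)
    return params


if __name__ == "__main__":
    arguments = parse_args()
    print(merge_params(arguments, read_config(arguments.config)))
```

    Keeping secrets such as API keys in config.ini and run-specific switches on the command line means the same configuration file can be reused across runs.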

  • helper.py
  • Collection of helper functions that have no purpose on their own but complement the orchestrator and keep the code clean.
  • If you need custom components, we recommend adding them to this module.
  • Creates folder for every run using the naming convention YYYYMMDD-[unique ID].
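
    The run-folder naming convention can be sketched as follows. The function names are hypothetical, and using the first eight hexadecimal characters of a random UUID as the unique ID is an assumption; helper.py may generate its ID differently.

```python
import uuid
from datetime import datetime
from pathlib import Path


def make_run_name(now=None):
    """Build a name of the form YYYYMMDD-[unique ID]."""
    now = now or datetime.now()
    unique_id = uuid.uuid4().hex[:8]  # assumption: short random ID
    return "%s-%s" % (now.strftime("%Y%m%d"), unique_id)


def create_run_folder(base_dir="."):
    """Create a fresh folder for one run and return its path."""
    folder = Path(base_dir) / make_run_name()
    # exist_ok=False makes a (very unlikely) name collision fail loudly.
    folder.mkdir(parents=True, exist_ok=False)
    return folder


if __name__ == "__main__":
    print(create_run_folder())
```

    Because the date comes first, run folders sort chronologically by day in a file browser, while the unique ID keeps runs from the same day apart.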

Access The Repository Here:

🔗GLUE Repository🔗