© 2025 Linguify. All rights reserved.

Main features

Our Speech Analysis API offers high-accuracy speech-to-text transcription, supporting multiple English accents and real-time processing. The pronunciation evaluation system provides phoneme-based scoring and detailed feedback on pronunciation accuracy. Vocabulary analysis detects complex words, analyzes part-of-speech, and scores vocabulary complexity. Test score estimation simulates IELTS and PTE speaking scores with a weighted scoring system for comprehensive feedback.

Overview

The Speech Analysis API enables speech-to-text transcription, pronunciation evaluation, vocabulary analysis, and test score estimation for IELTS and PTE. It helps users assess spoken English proficiency with high accuracy.

Authentication & Setup

To use the API, clone the repository and install dependencies:

git clone [repository-url]
cd speech-analysis-api
pip install -r requirements.txt

Core Features

Speech-to-Text Transcription

– Uses Google Speech API for real-time transcription.

Pronunciation Evaluation

– Analyzes phoneme similarity and word accuracy.

Vocabulary Analysis

– Detects rare words and assigns complexity scores.

Test Score Estimation

– Simulates IELTS (0-9) and PTE (0-100) speaking scores.
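As a sketch of how the weighted scoring system can combine component scores into test bands (the 60% pronunciation / 40% vocabulary split is stated in the FAQ; the function name, rounding, and scaling here are illustrative assumptions, not the shipped implementation):

```python
# Illustrative sketch of weighted test-score estimation (not the shipped code).
# Weights follow the documented split: 60% pronunciation, 40% vocabulary.

def estimate_scores(pronunciation, vocabulary):
    """Map two 0.0-1.0 component scores to IELTS (0-9) and PTE (0-100) bands."""
    overall = 0.6 * pronunciation + 0.4 * vocabulary  # weighted combination
    ielts = round(overall * 9 * 2) / 2   # IELTS is reported in half-band steps
    pte = round(overall * 100)
    return {"ielts_score": ielts, "pte_score": pte}

print(estimate_scores(0.8, 0.5))  # → {'ielts_score': 6.0, 'pte_score': 68}
```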

API Endpoints

/transcript – Converts speech to text.

/evaluate – Assesses pronunciation and vocabulary.

Response Format: JSON output with scores and feedback.
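The exact payload is not specified here; the shape below is an illustration inferred from the fields referenced in the usage examples later on this page (`transcription`, `ielts_score`, `pte_score`, a `results` array for multi-file evaluation), so the actual field names may differ:

```json
{
  "results": [
    {
      "transcription": "hello world",
      "pronunciation_score": 0.82,
      "vocabulary_score": 0.64,
      "ielts_score": 6.5,
      "pte_score": 74,
      "feedback": "Clear pronunciation; try using more varied vocabulary."
    }
  ]
}
```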

Usage & Implementation

import requests
files = {'audio': open('audio.wav', 'rb')}
response = requests.post("http://localhost:5000/transcript", files=files)
print(response.json())

Examples for Java, Node.js, and cURL are also available.

Pricing & Licensing

Basic: $49/month – 1000 API calls.

Pro: $149/month – 5000 API calls, priority support.

Enterprise: Custom pricing, unlimited calls.

Download source

1. Repository Access

Clone the repository and navigate to the project folder:


git clone [repository-url]
cd speech-analysis-api

2. Dependencies Installation

Install the required dependencies using pip:


pip install -r requirements.txt

3. Required Python Packages

Ensure the following Python packages are installed:


pip install flask==2.0.1 SpeechRecognition==3.8.1 pronouncing==0.2.0 nltk==3.6.3

4. NLTK Data Installation

Download the necessary NLTK data:


import nltk
nltk.download('words')
nltk.download('cmudict')
nltk.download('averaged_perceptron_tagger')
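The `cmudict` data downloaded above supplies ARPAbet phoneme sequences for English words, which is what phoneme-level pronunciation scoring operates on. A minimal sketch of such a comparison, using hardcoded phoneme lists in place of real `cmudict` lookups (the scoring approach is an assumption for illustration, not the shipped algorithm):

```python
# Minimal sketch of phoneme-level similarity scoring (an illustration, not the
# shipped algorithm). In the real pipeline the phoneme sequences would come
# from the cmudict data downloaded above; here they are hardcoded.
from difflib import SequenceMatcher

def phoneme_similarity(expected, actual):
    """Ratio of matching phonemes between two ARPAbet sequences (0.0-1.0)."""
    return SequenceMatcher(None, expected, actual).ratio()

# "water": W AO1 T ER0, versus a learner whose T came out closer to a D
expected = ["W", "AO1", "T", "ER0"]
actual   = ["W", "AO1", "D", "ER0"]
print(phoneme_similarity(expected, actual))  # 3 of 4 phonemes match -> 0.75
```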

File structure

Let's talk about what's inside the package.

Docs file hierarchy
  • speech-analysis-api/ -
    • app.py - # Main application file
    • requirements.txt - # Python dependencies
    • README.md - # Documentation
  • config/ -
    • default.py - # Default configuration
    • production.py - # Production configuration
  • tests/ -
    • test_transcription.py -
    • test_evaluation.py -
    • test_scoring.py -
  • static/ - # Static files
    • samples/ - # Sample audio files
  • docs/ - # Additional documentation
    • API.md -
  • DEPLOYMENT.md -

Application structure

A quick look at how the application is organized.

The core modules of the application:

  • Speech Recognition – handles audio files, integrates with Google Speech API, supports WAV format, and includes error handling.

  • Pronunciation Evaluation – analyzes phonemes, calculates pronunciation scores, and provides feedback.

  • Vocabulary Analysis – uses NLTK to assess word complexity, detect rare words, and score based on length and part-of-speech.

  • Scoring – estimates IELTS and PTE scores, aggregates results, and generates performance feedback.

The API includes two main endpoints: /transcript, which converts audio to text, and /evaluate, which evaluates pronunciation and vocabulary for one or more files.
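As a sketch of the per-word scoring the Vocabulary Analysis module performs, combining word rarity and word length as described in the FAQ below. The stub word list, weights, and function name are illustrative assumptions; the real module uses NLTK's word corpus and part-of-speech tagger:

```python
# Hedged sketch of per-word vocabulary scoring (illustrative only; the real
# module uses NLTK's word corpus and POS tagger rather than this stub list).
COMMON_WORDS = {"the", "a", "is", "good", "very", "nice", "thing"}

def word_complexity(word):
    """Score a word 0.0-1.0 from rarity and length, as described above."""
    w = word.lower()
    rarity = 0.0 if w in COMMON_WORDS else 0.6   # rare words score higher
    length_bonus = min(len(w), 10) / 10 * 0.4    # longer words score higher
    return round(rarity + length_bonus, 2)

print(word_complexity("good"))        # common, short word  -> 0.16
print(word_complexity("articulate"))  # rare, long word     -> 1.0
```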

Getting started

System Requirements

Python 3.6+
4GB RAM minimum
1GB free disk space
Internet connection
Compatible operating system (Windows/Linux/macOS)
# Create virtual environment
python -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows

# Install dependencies
pip install -r requirements.txt

# Download NLTK data
python -m nltk.downloader all

# Configure environment
cp config/default.py config/local.py
# Edit config/local.py with your settings

# Run application
python app.py

API Implementation Examples

Python Implementation

import requests

class SpeechAnalysisAPI:
    def __init__(self, base_url="http://localhost:5000"):
        self.base_url = base_url

    def transcribe(self, audio_file_path):
        """Transcribe a single audio file."""
        with open(audio_file_path, 'rb') as audio_file:
            files = {'audio': audio_file}
            response = requests.post(f"{self.base_url}/transcript", files=files)
        return response.json()

    def evaluate(self, audio_file_paths):
        """Evaluate one or multiple audio files."""
        files = [('audio', open(path, 'rb')) for path in audio_file_paths]
        try:
            response = requests.post(f"{self.base_url}/evaluate", files=files)
        finally:
            for _, f in files:
                f.close()
        return response.json()

# Usage example
api = SpeechAnalysisAPI()

# Transcribe single file
result = api.transcribe('path/to/audio.wav')
print("Transcription:", result['transcription'])

# Evaluate multiple files
results = api.evaluate(['audio1.wav', 'audio2.wav'])
for result in results['results']:
    print(f"IELTS Score: {result['ielts_score']}")
    print(f"PTE Score: {result['pte_score']}")
  
Java Implementation

import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class SpeechAnalysisAPI {
    private final String baseUrl;
    private final OkHttpClient client;

    public SpeechAnalysisAPI(String baseUrl) {
        this.baseUrl = baseUrl;
        this.client = new OkHttpClient();
    }

    public String transcribe(String audioFilePath) throws IOException {
        File audioFile = new File(audioFilePath);
        RequestBody requestBody = new MultipartBody.Builder()
            .setType(MultipartBody.FORM)
            .addFormDataPart("audio", audioFile.getName(),
                RequestBody.create(MediaType.parse("audio/wav"), audioFile))
            .build();

        Request request = new Request.Builder()
            .url(baseUrl + "/transcript")
            .post(requestBody)
            .build();

        try (Response response = client.newCall(request).execute()) {
            return response.body().string();
        }
    }

    public String evaluate(String[] audioFilePaths) throws IOException {
        MultipartBody.Builder builder = new MultipartBody.Builder()
            .setType(MultipartBody.FORM);
        for (String path : audioFilePaths) {
            File audioFile = new File(path);
            builder.addFormDataPart("audio", audioFile.getName(),
                RequestBody.create(MediaType.parse("audio/wav"), audioFile));
        }

        Request request = new Request.Builder()
            .url(baseUrl + "/evaluate")
            .post(builder.build())
            .build();

        try (Response response = client.newCall(request).execute()) {
            return response.body().string();
        }
    }

    // Usage example
    public static void main(String[] args) {
        SpeechAnalysisAPI api = new SpeechAnalysisAPI("http://localhost:5000");
        try {
            // Transcribe single file
            String transcription = api.transcribe("audio.wav");
            System.out.println("Transcription: " + transcription);

            // Evaluate multiple files
            String[] files = {"audio1.wav", "audio2.wav"};
            String evaluation = api.evaluate(files);
            System.out.println("Evaluation: " + evaluation);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Node.js Implementation

const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

class SpeechAnalysisAPI {
  constructor(baseUrl = 'http://localhost:5000') {
    this.baseUrl = baseUrl;
  }

  async transcribe(audioFilePath) {
    try {
      const formData = new FormData();
      formData.append('audio', fs.createReadStream(audioFilePath));

      const response = await axios.post(`${this.baseUrl}/transcript`, formData, {
        headers: formData.getHeaders(),
      });

      return response.data;
    } catch (error) {
      throw new Error(`Transcription failed: ${error.message}`);
    }
  }

  async evaluate(audioFilePaths) {
    try {
      const formData = new FormData();

      audioFilePaths.forEach((path) => {
        formData.append('audio', fs.createReadStream(path));
      });

      const response = await axios.post(`${this.baseUrl}/evaluate`, formData, {
        headers: formData.getHeaders(),
      });

      return response.data;
    } catch (error) {
      throw new Error(`Evaluation failed: ${error.message}`);
    }
  }
}

// Usage example
async function main() {
  const api = new SpeechAnalysisAPI();

  try {
    // Transcribe single file
    const transcription = await api.transcribe('audio.wav');
    console.log('Transcription:', transcription);

    // Evaluate multiple files
    const evaluation = await api.evaluate(['audio1.wav', 'audio2.wav']);
    console.log('Evaluation:', evaluation);
  } catch (error) {
    console.error('Error:', error.message);
  }
}
main();			
  
  

Browser support

We support the latest versions of the browsers and platforms listed below; on Windows, Internet Explorer 9 and above is supported.

  • Safari

  • Opera

  • Firefox

  • IE 9+

FAQ

Begin typing your question. If we don't have an answer for it in our FAQ, please leave us a message on our contact page.

  • Why only WAV file support?

    WAV files provide uncompressed audio data, ensuring the highest quality for speech recognition. Support for other formats (MP3, M4A) is planned for future releases.
  • How is the IELTS score calculated?

    The IELTS score is calculated using a weighted combination of pronunciation accuracy (60%) and vocabulary complexity (40%). The raw scores are normalized to the IELTS 9-band scale.
  • What affects the vocabulary score?

    The vocabulary score considers:
    - Word rarity (using the NLTK corpus)
    - Word length (longer words score higher)
    - Word complexity
    - Usage of academic/advanced vocabulary
  • How accurate is the speech recognition?

    The speech recognition accuracy depends on:
    - Audio quality (background noise, clarity)
    - Speaker's pronunciation
    - Speaking speed
    - Microphone quality
    Average accuracy is around 95% for clear audio.
  • Can I use this for languages other than English?

    Currently, the system is optimized for English only. Multi-language support is planned for future releases.