Main features
Our Speech Analysis API offers high-accuracy speech-to-text transcription, supporting multiple English accents and real-time processing. The pronunciation evaluation system provides phoneme-based scoring and detailed feedback on pronunciation accuracy. Vocabulary analysis detects complex words, analyzes part-of-speech, and scores vocabulary complexity. Test score estimation simulates IELTS and PTE speaking scores with a weighted scoring system for comprehensive feedback.
Download source
1. Repository Access
Clone the repository and navigate to the project folder:
git clone [repository-url]
cd speech-analysis-api
2. Dependencies Installation
Install the required dependencies using pip:
pip install -r requirements.txt
3. Required Python Packages
Ensure the following Python packages are installed:
pip install flask==2.0.1 SpeechRecognition==3.8.1 pronouncing==0.2.0 nltk==3.6.3
4. NLTK Data Installation
Download the necessary NLTK data:
import nltk
nltk.download('words')
nltk.download('cmudict')
nltk.download('averaged_perceptron_tagger')
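The cmudict corpus maps English words to ARPAbet phoneme sequences, which is what phoneme-based pronunciation scoring works on. As an illustration only (the API's internal algorithm is not shown here), a pronunciation score can be sketched as a normalized edit distance between the expected and observed phoneme sequences:

```python
def phoneme_similarity(expected, actual):
    """Rough pronunciation score in [0, 1]: 1 minus the normalized
    edit distance between two phoneme sequences (e.g. ARPAbet
    symbols such as those returned by nltk's cmudict)."""
    m, n = len(expected), len(actual)
    # dp[i][j] = edit distance between expected[:i] and actual[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if expected[i - 1] == actual[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return 1 - dp[m][n] / max(m, n, 1)

# "hello" pronounced with AH instead of EH: 3 of 4 phonemes match
print(phoneme_similarity(['HH', 'EH', 'L', 'OW'],
                         ['HH', 'AH', 'L', 'OW']))  # 0.75
```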
The repository is organized as follows:

speech-analysis-api/
├── app.py                      # Main application file
├── requirements.txt            # Python dependencies
├── README.md                   # Documentation
├── config/
│   ├── default.py              # Default configuration
│   └── production.py           # Production configuration
├── tests/
│   ├── test_transcription.py
│   ├── test_evaluation.py
│   └── test_scoring.py
├── static/                     # Static files
│   └── samples/                # Sample audio files
└── docs/                       # Additional documentation
    ├── API.md
    └── DEPLOYMENT.md
Application structure
The application follows a simple modular structure.
The core modules of the application include Speech Recognition, which handles audio files, integrates with Google Speech API, supports WAV format, and includes error handling. The Pronunciation Evaluation module analyzes phonemes, calculates pronunciation scores, and provides feedback. The Vocabulary Analysis module uses NLTK to assess word complexity, detect rare words, and score based on length and part-of-speech. The Scoring module estimates IELTS and PTE scores, aggregates results, and generates performance feedback. The API includes two main endpoints: /transcript, which converts audio to text, and /evaluate, which evaluates pronunciation and vocabulary for one or more files.
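The Scoring module's weighting is described in the FAQ: pronunciation accuracy contributes 60% and vocabulary complexity 40%, normalized to the IELTS 9-band scale. A minimal sketch of such a combination is shown below; the linear mapping to the IELTS and PTE (10-90) scales is an illustrative assumption, not the API's exact formula:

```python
def estimate_scores(pronunciation, vocabulary):
    """Combine sub-scores (each in 0.0-1.0) into estimated test scores.

    The 60/40 weighting matches the FAQ; the linear mapping onto the
    IELTS 9-band and PTE 10-90 scales is an assumption for illustration.
    """
    raw = 0.6 * pronunciation + 0.4 * vocabulary
    ielts = round(raw * 9 * 2) / 2   # IELTS reports half-band steps
    pte = round(10 + raw * 80)       # PTE Academic uses a 10-90 scale
    return {"ielts_score": ielts, "pte_score": pte}

print(estimate_scores(0.9, 0.7))  # e.g. strong pronunciation, good vocabulary
```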
Getting started
Environment Setup
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Download NLTK data
python -m nltk.downloader words cmudict averaged_perceptron_tagger
# Configure environment
cp config/default.py config/local.py
# Edit config/local.py with your settings
# Run application
python app.py
API Implementation Examples
Python Implementation
import requests

class SpeechAnalysisAPI:
    def __init__(self, base_url="http://localhost:5000"):
        self.base_url = base_url

    def transcribe(self, audio_file_path):
        """Transcribe a single audio file."""
        with open(audio_file_path, 'rb') as audio_file:
            files = {'audio': audio_file}
            response = requests.post(f"{self.base_url}/transcript", files=files)
        return response.json()

    def evaluate(self, audio_file_paths):
        """Evaluate one or multiple audio files."""
        handles = [open(path, 'rb') for path in audio_file_paths]
        try:
            files = [('audio', handle) for handle in handles]
            response = requests.post(f"{self.base_url}/evaluate", files=files)
        finally:
            for handle in handles:  # close files even if the request fails
                handle.close()
        return response.json()

# Usage example
api = SpeechAnalysisAPI()

# Transcribe single file
result = api.transcribe('path/to/audio.wav')
print("Transcription:", result['transcription'])

# Evaluate multiple files
results = api.evaluate(['audio1.wav', 'audio2.wav'])
for result in results['results']:
    print(f"IELTS Score: {result['ielts_score']}")
    print(f"PTE Score: {result['pte_score']}")
Java Implementation
import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class SpeechAnalysisAPI {
    private final String baseUrl;
    private final OkHttpClient client;

    public SpeechAnalysisAPI(String baseUrl) {
        this.baseUrl = baseUrl;
        this.client = new OkHttpClient();
    }

    public String transcribe(String audioFilePath) throws IOException {
        File audioFile = new File(audioFilePath);
        RequestBody requestBody = new MultipartBody.Builder()
                .setType(MultipartBody.FORM)
                .addFormDataPart("audio", audioFile.getName(),
                        RequestBody.create(MediaType.parse("audio/wav"), audioFile))
                .build();
        Request request = new Request.Builder()
                .url(baseUrl + "/transcript")
                .post(requestBody)
                .build();
        try (Response response = client.newCall(request).execute()) {
            return response.body().string();
        }
    }

    public String evaluate(String[] audioFilePaths) throws IOException {
        MultipartBody.Builder builder = new MultipartBody.Builder()
                .setType(MultipartBody.FORM);
        for (String path : audioFilePaths) {
            File audioFile = new File(path);
            builder.addFormDataPart("audio", audioFile.getName(),
                    RequestBody.create(MediaType.parse("audio/wav"), audioFile));
        }
        Request request = new Request.Builder()
                .url(baseUrl + "/evaluate")
                .post(builder.build())
                .build();
        try (Response response = client.newCall(request).execute()) {
            return response.body().string();
        }
    }

    // Usage example
    public static void main(String[] args) {
        SpeechAnalysisAPI api = new SpeechAnalysisAPI("http://localhost:5000");
        try {
            // Transcribe single file
            String transcription = api.transcribe("audio.wav");
            System.out.println("Transcription: " + transcription);

            // Evaluate multiple files
            String[] files = {"audio1.wav", "audio2.wav"};
            String evaluation = api.evaluate(files);
            System.out.println("Evaluation: " + evaluation);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Node.js Implementation
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');
class SpeechAnalysisAPI {
    constructor(baseUrl = 'http://localhost:5000') {
        this.baseUrl = baseUrl;
    }

    async transcribe(audioFilePath) {
        try {
            const formData = new FormData();
            formData.append('audio', fs.createReadStream(audioFilePath));
            const response = await axios.post(`${this.baseUrl}/transcript`, formData, {
                headers: formData.getHeaders(),
            });
            return response.data;
        } catch (error) {
            throw new Error(`Transcription failed: ${error.message}`);
        }
    }

    async evaluate(audioFilePaths) {
        try {
            const formData = new FormData();
            audioFilePaths.forEach((path) => {
                formData.append('audio', fs.createReadStream(path));
            });
            const response = await axios.post(`${this.baseUrl}/evaluate`, formData, {
                headers: formData.getHeaders(),
            });
            return response.data;
        } catch (error) {
            throw new Error(`Evaluation failed: ${error.message}`);
        }
    }
}

// Usage example
async function main() {
    const api = new SpeechAnalysisAPI();
    try {
        // Transcribe single file
        const transcription = await api.transcribe('audio.wav');
        console.log('Transcription:', transcription);

        // Evaluate multiple files
        const evaluation = await api.evaluate(['audio1.wav', 'audio2.wav']);
        console.log('Evaluation:', evaluation);
    } catch (error) {
        console.error('Error:', error.message);
    }
}

main();
Browser support
We support the latest versions of the following browsers and platforms. On Windows, Internet Explorer 9 and later is supported.

- Chrome
- Safari
- Opera
- Firefox
- Internet Explorer 9+
FAQ
If your question isn't answered below, please leave us a message on our contact page.

Why only WAV file support?
WAV files provide uncompressed audio data, ensuring the highest quality for speech recognition. Support for other formats (MP3, M4A) is planned for future releases.

How is the IELTS score calculated?
The IELTS score is calculated using a weighted combination of pronunciation accuracy (60%) and vocabulary complexity (40%). The raw scores are normalized to the IELTS 9-band scale.

What affects the vocabulary score?
The vocabulary score considers:
- Word rarity (using the NLTK corpus)
- Word length (longer words score higher)
- Word complexity
- Usage of academic/advanced vocabulary

How accurate is the speech recognition?
Recognition accuracy depends on:
- Audio quality (background noise, clarity)
- Speaker's pronunciation
- Speaking speed
- Microphone quality
Average accuracy is around 95% for clear audio.

Can I use this for languages other than English?
Currently, the system is optimized for English only. Multi-language support is planned for future releases.
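The vocabulary factors listed above can be sketched as a simple per-word score. This is an illustration only, not the API's exact formula: the rarity check here uses a hypothetical set of common words (in the real system this comes from the NLTK corpus), and the 50/50 mix of rarity and length is an assumption:

```python
def vocabulary_score(words, common_words):
    """Illustrative vocabulary score in [0, 1], mixing word rarity
    and word length as described in the FAQ (not the exact formula).

    common_words: a set of frequent words; anything outside it
    counts as "rare" for this sketch.
    """
    if not words:
        return 0.0
    total = 0.0
    for word in words:
        rarity = 0.0 if word.lower() in common_words else 1.0
        length = min(len(word) / 12, 1.0)  # longer words score higher
        total += 0.5 * rarity + 0.5 * length
    return total / len(words)

# Hypothetical common-word set for demonstration
common = {"the", "cat", "sat", "on", "mat"}
print(vocabulary_score(["serendipitous"], common))  # 1.0
print(vocabulary_score(["the", "cat"], common))     # 0.125
```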