Main features
Our Speech Analysis API offers high-accuracy speech-to-text transcription, supporting multiple English accents and real-time processing. The pronunciation evaluation system provides phoneme-based scoring and detailed feedback on pronunciation accuracy. Vocabulary analysis detects complex words, analyzes part-of-speech, and scores vocabulary complexity. Test score estimation simulates IELTS and PTE speaking scores with a weighted scoring system for comprehensive feedback.
Download source
1. Repository Access
Clone the repository and navigate to the project folder:
git clone [repository-url]
cd speech-analysis-api
2. Dependencies Installation
Install the required dependencies using pip:
pip install -r requirements.txt
3. Required Python Packages
Ensure the following Python packages are installed:
pip install flask==2.0.1 SpeechRecognition==3.8.1 pronouncing==0.2.0 nltk==3.6.3
4. NLTK Data Installation
Download the necessary NLTK data:
import nltk
nltk.download('words')
nltk.download('cmudict')
nltk.download('averaged_perceptron_tagger')
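The cmudict corpus maps English words to ARPAbet phoneme sequences, which is what phoneme-based pronunciation scoring works on. As an illustration only (the API's internal algorithm is not shown here), a pronunciation score can be sketched as a normalized edit distance between the expected and observed phoneme sequences:

```python
def phoneme_similarity(expected, actual):
    """Rough pronunciation score in [0, 1]: 1 minus the normalized
    edit distance between two phoneme sequences (e.g. ARPAbet
    symbols such as those returned by nltk's cmudict)."""
    m, n = len(expected), len(actual)
    # dp[i][j] = edit distance between expected[:i] and actual[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if expected[i - 1] == actual[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return 1 - dp[m][n] / max(m, n, 1)

# "hello" pronounced with AH instead of EH: 3 of 4 phonemes match
print(phoneme_similarity(['HH', 'EH', 'L', 'OW'],
                         ['HH', 'AH', 'L', 'OW']))  # 0.75
```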
The repository is organized as follows:

speech-analysis-api/
├── app.py                      # Main application file
├── requirements.txt            # Python dependencies
├── README.md                   # Documentation
├── config/
│   ├── default.py              # Default configuration
│   └── production.py           # Production configuration
├── tests/
│   ├── test_transcription.py
│   ├── test_evaluation.py
│   └── test_scoring.py
├── static/                     # Static files
│   └── samples/                # Sample audio files
└── docs/                       # Additional documentation
    ├── API.md
    └── DEPLOYMENT.md
Application structure
The application follows a simple modular structure.
The core modules of the application include Speech Recognition, which handles audio files, integrates with Google Speech API, supports WAV format, and includes error handling. The Pronunciation Evaluation module analyzes phonemes, calculates pronunciation scores, and provides feedback. The Vocabulary Analysis module uses NLTK to assess word complexity, detect rare words, and score based on length and part-of-speech. The Scoring module estimates IELTS and PTE scores, aggregates results, and generates performance feedback. The API includes two main endpoints: /transcript, which converts audio to text, and /evaluate, which evaluates pronunciation and vocabulary for one or more files.
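The Scoring module's weighting is described in the FAQ: pronunciation accuracy contributes 60% and vocabulary complexity 40%, normalized to the IELTS 9-band scale. A minimal sketch of such a combination is shown below; the linear mapping to the IELTS and PTE (10-90) scales is an illustrative assumption, not the API's exact formula:

```python
def estimate_scores(pronunciation, vocabulary):
    """Combine sub-scores (each in 0.0-1.0) into estimated test scores.

    The 60/40 weighting matches the FAQ; the linear mapping onto the
    IELTS 9-band and PTE 10-90 scales is an assumption for illustration.
    """
    raw = 0.6 * pronunciation + 0.4 * vocabulary
    ielts = round(raw * 9 * 2) / 2   # IELTS reports half-band steps
    pte = round(10 + raw * 80)       # PTE Academic uses a 10-90 scale
    return {"ielts_score": ielts, "pte_score": pte}

print(estimate_scores(0.9, 0.7))  # e.g. strong pronunciation, good vocabulary
```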
Getting started
Environment Setup
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Download NLTK data
python -m nltk.downloader words cmudict averaged_perceptron_tagger
# Configure environment
cp config/default.py config/local.py
# Edit config/local.py with your settings
# Run application
python app.py
API Implementation Examples
Python Implementation
import requests

class SpeechAnalysisAPI:
    def __init__(self, base_url="http://localhost:5000"):
        self.base_url = base_url

    def transcribe(self, audio_file_path):
        """Transcribe a single audio file."""
        with open(audio_file_path, 'rb') as audio_file:
            files = {'audio': audio_file}
            response = requests.post(f"{self.base_url}/transcript", files=files)
        return response.json()

    def evaluate(self, audio_file_paths):
        """Evaluate one or multiple audio files."""
        handles = [open(path, 'rb') for path in audio_file_paths]
        try:
            files = [('audio', handle) for handle in handles]
            response = requests.post(f"{self.base_url}/evaluate", files=files)
        finally:
            for handle in handles:  # close files even if the request fails
                handle.close()
        return response.json()

# Usage example
api = SpeechAnalysisAPI()

# Transcribe single file
result = api.transcribe('path/to/audio.wav')
print("Transcription:", result['transcription'])

# Evaluate multiple files
results = api.evaluate(['audio1.wav', 'audio2.wav'])
for result in results['results']:
    print(f"IELTS Score: {result['ielts_score']}")
    print(f"PTE Score: {result['pte_score']}")
Java Implementation
import java.io.File;
import java.io.IOException;
import okhttp3.*;

public class SpeechAnalysisAPI {
    private final String baseUrl;
    private final OkHttpClient client;

    public SpeechAnalysisAPI(String baseUrl) {
        this.baseUrl = baseUrl;
        this.client = new OkHttpClient();
    }

    public String transcribe(String audioFilePath) throws IOException {
        File audioFile = new File(audioFilePath);
        RequestBody requestBody = new MultipartBody.Builder()
                .setType(MultipartBody.FORM)
                .addFormDataPart("audio", audioFile.getName(),
                        RequestBody.create(MediaType.parse("audio/wav"), audioFile))
                .build();
        Request request = new Request.Builder()
                .url(baseUrl + "/transcript")
                .post(requestBody)
                .build();
        try (Response response = client.newCall(request).execute()) {
            return response.body().string();
        }
    }

    public String evaluate(String[] audioFilePaths) throws IOException {
        MultipartBody.Builder builder = new MultipartBody.Builder()
                .setType(MultipartBody.FORM);
        for (String path : audioFilePaths) {
            File audioFile = new File(path);
            builder.addFormDataPart("audio", audioFile.getName(),
                    RequestBody.create(MediaType.parse("audio/wav"), audioFile));
        }
        Request request = new Request.Builder()
                .url(baseUrl + "/evaluate")
                .post(builder.build())
                .build();
        try (Response response = client.newCall(request).execute()) {
            return response.body().string();
        }
    }

    // Usage example
    public static void main(String[] args) {
        SpeechAnalysisAPI api = new SpeechAnalysisAPI("http://localhost:5000");
        try {
            // Transcribe single file
            String transcription = api.transcribe("audio.wav");
            System.out.println("Transcription: " + transcription);

            // Evaluate multiple files
            String[] files = {"audio1.wav", "audio2.wav"};
            String evaluation = api.evaluate(files);
            System.out.println("Evaluation: " + evaluation);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Node.js Implementation
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');
class SpeechAnalysisAPI {
    constructor(baseUrl = 'http://localhost:5000') {
        this.baseUrl = baseUrl;
    }

    async transcribe(audioFilePath) {
        try {
            const formData = new FormData();
            formData.append('audio', fs.createReadStream(audioFilePath));
            const response = await axios.post(`${this.baseUrl}/transcript`, formData, {
                headers: formData.getHeaders(),
            });
            return response.data;
        } catch (error) {
            throw new Error(`Transcription failed: ${error.message}`);
        }
    }

    async evaluate(audioFilePaths) {
        try {
            const formData = new FormData();
            audioFilePaths.forEach((path) => {
                formData.append('audio', fs.createReadStream(path));
            });
            const response = await axios.post(`${this.baseUrl}/evaluate`, formData, {
                headers: formData.getHeaders(),
            });
            return response.data;
        } catch (error) {
            throw new Error(`Evaluation failed: ${error.message}`);
        }
    }
}

// Usage example
async function main() {
    const api = new SpeechAnalysisAPI();
    try {
        // Transcribe single file
        const transcription = await api.transcribe('audio.wav');
        console.log('Transcription:', transcription);

        // Evaluate multiple files
        const evaluation = await api.evaluate(['audio1.wav', 'audio2.wav']);
        console.log('Evaluation:', evaluation);
    } catch (error) {
        console.error('Error:', error.message);
    }
}

main();
Browser support
We support the latest versions of the following browsers and platforms. On Windows, Internet Explorer 9 and later is supported.

- Chrome
- Safari
- Opera
- Firefox
- Internet Explorer 9+
FAQ
If your question isn't answered below, please leave us a message on our contact page.

Why only WAV file support?
WAV files provide uncompressed audio data, ensuring the highest quality for speech recognition. Support for other formats (MP3, M4A) is planned for future releases.

How is the IELTS score calculated?
The IELTS score is calculated using a weighted combination of pronunciation accuracy (60%) and vocabulary complexity (40%). The raw scores are normalized to the IELTS 9-band scale.

What affects the vocabulary score?
The vocabulary score considers:
- Word rarity (using the NLTK corpus)
- Word length (longer words score higher)
- Word complexity
- Usage of academic/advanced vocabulary

How accurate is the speech recognition?
Recognition accuracy depends on:
- Audio quality (background noise, clarity)
- Speaker's pronunciation
- Speaking speed
- Microphone quality
Average accuracy is around 95% for clear audio.

Can I use this for languages other than English?
Currently, the system is optimized for English only. Multi-language support is planned for future releases.
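The vocabulary factors listed above can be sketched as a simple per-word score. This is an illustration only, not the API's exact formula: the rarity check here uses a hypothetical set of common words (in the real system this comes from the NLTK corpus), and the 50/50 mix of rarity and length is an assumption:

```python
def vocabulary_score(words, common_words):
    """Illustrative vocabulary score in [0, 1], mixing word rarity
    and word length as described in the FAQ (not the exact formula).

    common_words: a set of frequent words; anything outside it
    counts as "rare" for this sketch.
    """
    if not words:
        return 0.0
    total = 0.0
    for word in words:
        rarity = 0.0 if word.lower() in common_words else 1.0
        length = min(len(word) / 12, 1.0)  # longer words score higher
        total += 0.5 * rarity + 0.5 * length
    return total / len(words)

# Hypothetical common-word set for demonstration
common = {"the", "cat", "sat", "on", "mat"}
print(vocabulary_score(["serendipitous"], common))  # 1.0
print(vocabulary_score(["the", "cat"], common))     # 0.125
```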