Spectral Entropy and Zero Crossing Rate for Endpoint Detection of Speech Signals

tags: Speech signal processing  Endpoint detection  matlab  algorithm

The idea behind the double-threshold method:

First, the spectral entropy of a noise segment (high randomness, high disorder) is greater than that of a speech segment. This separates voiced speech from noise, and the voiced speech is retained. What remains at this point is unvoiced speech and noise. Next, the short-term zero-crossing rate of the unvoiced speech is lower than that of the noise segment, so the unvoiced speech can be separated from the noise and retained as well. The result is the complete speech segment, which implements endpoint detection and allows the noise segments to be removed automatically.
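To make the first step concrete, here is a toy sketch (not the author's data) comparing the entropy of a flat, noise-like distribution with a peaked, speech-like one. The 8-bin distributions and the helper `entropyOf` are assumptions chosen only for illustration.

```matlab
% Toy illustration: entropy of a flat distribution versus a peaked one (8 bins each).
pFlat   = ones(8, 1) / 8;                 % noise-like: energy spread evenly over all bins
pPeaked = [0.93; 0.01 * ones(7, 1)];      % speech-like: energy concentrated in one bin

entropyOf = @(p) -sum(p .* log2(p));      % Shannon entropy in bits (hypothetical helper)

hFlat   = entropyOf(pFlat);               % log2(8) = 3 bits, the maximum for 8 bins
hPeaked = entropyOf(pPeaked);             % well below 3 bits

fprintf('flat: %.3f bits, peaked: %.3f bits\n', hFlat, hPeaked);
```

The flatter the spectrum, the closer the entropy gets to its maximum, which is why a threshold on spectral entropy can separate noise frames from voiced frames.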

Spectral entropy definition:

$$H = -\sum_{k=1}^{N/2} p_k \log_2 p_k$$

where

$$p_k = \frac{Y_k}{\sum_{i=1}^{N/2} Y_i}$$

is the probability assigned to the k-th frequency bin of a frame, Y_k is the energy in that bin, and N is the FFT length.

The program follows a matrix-operation approach. First compute the FFT of the framed matrix and convert it to dB (in experiments, using dB as the unit gave a noticeably better result), then square it to obtain the matrix Y. Summing the first N/2 rows of each column of Y gives sumY, the denominator of the probability calculation. sumY is then expanded to N rows so that ./ can be applied element-wise to obtain the probability matrix P, which is substituted into the spectral entropy formula. The code is as follows:

freq = fft(frame_w, N);                 % convert the data from the time domain to the frequency domain
spect = real(10*log10(freq));           % convert the frequency-domain result to dB
Y = spect .* conj(spect);               % compute the energy
sumY = sum(Y(1:end/2, :));              % sum the first N/2 rows of each column
sumY = sumY(ones(1, N), :);             % expand to N rows for element-wise division
P = Y ./ sumY;                          % probability of each frequency bin
H = -sum(P(1:end/2, :) .* log2(P(1:end/2, :)));   % spectral entropy, by definition
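As a quick sanity check of the steps above, the following sketch runs the same computation on two small synthetic frames and confirms that each column of the probability matrix sums to 1 over the first N/2 rows. The test signal is made up for illustration, and `bsxfun` is shown only as an alternative to replicating sumY; it is not part of the original code.

```matlab
% Synthetic windowed frames: 16 samples per frame, 2 frames (illustrative data only).
n = (1:16)';
frame_w = [sin(0.7 * n) + 0.3, cos(1.9 * n) + 0.5 * sin(0.2 * n)];
N = 16;                                   % FFT length

freq  = fft(frame_w, N);                  % N-by-2 complex spectrum
spect = real(10 * log10(freq));           % equals 10*log10(abs(freq))
Y     = spect .* conj(spect);             % squared dB values, real and non-negative

% Approach from the post: replicate the 1-by-2 row of sums into N rows.
sumY = sum(Y(1:end/2, :));
P1   = Y ./ sumY(ones(1, N), :);

% Equivalent division without building the replicated matrix.
P2   = bsxfun(@rdivide, Y, sumY);

H = -sum(P1(1:end/2, :) .* log2(P1(1:end/2, :)));   % one entropy value per frame

colSums = sum(P1(1:end/2, :));            % each entry should be 1
```

One caveat under these assumptions: a bin whose magnitude is exactly 1 gives 0 dB, hence P = 0 and 0*log2(0) = NaN, so real data may need a small guard value before taking the logarithm.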

Following the double-threshold idea, the detection process traverses the frame matrix once. If the spectral entropy of a frame is below the high spectral entropy threshold, the speech segment may have been entered, and the traversal continues. If the spectral entropy of a frame is below the low spectral entropy threshold, the speech segment has definitely been entered: record the position of that frame and search from there toward the start of the signal. If the short-term zero-crossing rate of a frame is greater than the zero-crossing rate threshold, that frame is taken as the end of the leading noise: record its position as noiseEnd, break out of the inner search, and resume the main traversal. The trailing noise is handled in a similar way. If the spectral entropy of a frame is below the low spectral entropy threshold and the spectral entropy of the next frame is above it, the frame is the last one that is clearly speech. Search from there toward the end of the signal, and when a frame's zero-crossing rate exceeds the zero-crossing rate threshold, record its position as noiseBegin and break out of all loops.
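The search described above can be traced on a short synthetic example. The entropy and zero-crossing values below are invented for illustration (15 frames, with speech in the middle), and the thresholds are arbitrary; only the variable names follow the post.

```matlab
% Synthetic per-frame features: noise (high entropy, high ZCR) at both ends,
% unvoiced frames (high entropy, low ZCR) next to voiced frames (low entropy).
H          = [5 5 5 5 4.8 3 2 2 2 3 4.9 5 5 5 5];
zerocross  = [30 30 30 10 10 8 5 5 5 8 10 10 30 30 30];
EntropyLow = 3.5;                         % low spectral entropy threshold (arbitrary)
T          = 24;                          % zero-crossing rate threshold (arbitrary)
frame_number = numel(H);

noiseEnd   = 1;                           % defaults in case nothing is found
noiseBegin = frame_number;
for i = 2 : frame_number - 1
    % Entering speech: entropy drops below the low threshold.
    if H(i) < EntropyLow && H(i-1) > EntropyLow
        noiseEnd = i - 1;
        for j = noiseEnd : -1 : 1         % search toward the start of the signal
            if zerocross(j) > T
                noiseEnd = j;             % last frame of the leading noise
                break
            end
        end
    end
    % Leaving speech: entropy rises above the low threshold on the next frame.
    if H(i) < EntropyLow && H(i+1) > EntropyLow
        noiseBegin = i + 1;
        for j = noiseBegin : frame_number % search toward the end of the signal
            if zerocross(j) > T
                noiseBegin = j;           % first frame of the trailing noise
                break
            end
        end
        break
    end
end
% For this data: noiseEnd is 3 and noiseBegin is 13, so frames 4..12 are kept.
```

Frames 4, 5, 11 and 12 have high entropy but a low zero-crossing rate, so the entropy threshold alone would have discarded them; the zero-crossing search is what pulls them back into the speech segment.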

The idea is quite clear, and the code is not difficult to write.
The complete code is below (for reference only; the voice data used in the experiment is oh.mat, and you may need to adjust the threshold parameters if you use your own data).
% Ocross: short-term zero-crossing rate
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
% Input parameters:
%   frame_w    framed (windowed) matrix, one frame per column
% Output parameters:
%   zerocross  zero-crossing count sequence, one value per frame
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
function [zerocross]=Ocross(frame_w)
[frame_length,frame_number]=size(frame_w);
zerocross=zeros(1,frame_number);
for i=1 : frame_number
    u = frame_w(:, i);                        % take out one frame
    for j=1 : frame_length-1
        if u(j) * u(j+1) < 0                  % sign change between neighbours: zero crossing
            zerocross(i) = zerocross(i) + 1;  % count it
        end                                   % end zero-crossing check
    end                                       % end single-frame loop
end                                           % end frame loop
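The double loop in Ocross can also be written as a single vectorized expression. This is an alternative sketch, not the author's code; the 6-by-2 test matrix and the name `zerocross_vec` are made up for illustration.

```matlab
% Two synthetic frames (columns), 6 samples each; illustrative data only.
frame_w = [ 1  1;
           -1  2;
            2  3;
           -3 -1;
            4 -2;
            5  4];

% The product of neighbouring samples is negative exactly where the sign changes,
% so counting negative products per column gives the zero-crossing count per frame.
zerocross_vec = sum(frame_w(1:end-1, :) .* frame_w(2:end, :) < 0, 1);
% zerocross_vec is [4 2]
```

Like the loop version, this treats a sample that is exactly zero as no crossing, because the product with its neighbour is 0 rather than negative.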

% Experiment 4 endpointDetection
% Apr.18 2020
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
clear
close all
clc

%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
load('oh.mat','data');
fs=10000;
N=1024;

width = 3; % Width in inches
height = 3; % Height in inches
alw = 0.75; % AxesLineWidth
fsz = 13; % Fontsize
lw = 1.2; % LineWidth 1.5
msz = 7; % MarkerSize

frame_time=20e-3;
[frame_m,frame_w,frame_length,frame_shift,frame_number]=enframe(data,fs,20e-3,10e-3,'hamming');
% Call the framing and windowing function to obtain the framed matrix, the framed and windowed matrix, the frame length, the frame shift and the number of frames
timeAxis=(1:frame_number)*10e-3; % one point per frame; frames advance by the 10 ms frame shift
zerocross=Ocross(frame_w);
plot(data)
legend('Original speech waveform','Location','best')
xlabel('Time (samples)');
ylabel('Amplitude');
mag = sum(abs(frame_w));                 % short-term magnitude of each frame
f = (0:N/2-1)/N*fs;                      % frequency axis
freq = fft(frame_w, N);                  % convert the data from the time domain to the frequency domain
spect = real(10*log10(freq));            % convert the frequency-domain result to dB
Y = spect .* conj(spect);                % compute the energy
sumY = sum(Y(1:end/2,:));
sumY = sumY(ones(1,N),:);
P = Y ./ sumY;                           % probability of each frequency bin
H = -sum(P(1:end/2,:) .* log2(P(1:end/2,:)));   % substitute into the spectral entropy definition
figure();
plot(timeAxis,H)
legend('Spectral entropy over time','Location','best')
xlabel('Time / s');
ylabel('Spectral entropy');
figure()
subplot(211)
plot(timeAxis,mag)
subplot(212)
plot(timeAxis,H)
EntropyHigh = max(H) * 0.995;   % high spectral entropy threshold
EntropyLow = min(H) * 1.06;     % low spectral entropy threshold
hold on
plot([timeAxis(1), timeAxis(end)], [EntropyHigh, EntropyHigh], 'r', 'LineWidth',lw, 'MarkerSize', msz);
plot([timeAxis(1), timeAxis(end)], [EntropyLow, EntropyLow], 'g', 'LineWidth',lw, 'MarkerSize', msz);
legend('Spectral entropy','High spectral entropy threshold','Low spectral entropy threshold','Location','best')
xlabel('Time / s');
ylabel('Spectral entropy');
figure()
T = 24;                         % zero-crossing rate threshold
plot(timeAxis,zerocross)
hold on
plot([timeAxis(1), timeAxis(end)], [T, T], 'r', 'LineWidth',lw, 'MarkerSize', msz);
legend('Zero-crossing rate','Threshold','Location','best')
xlabel('Time / s');
ylabel('Zero crossings per frame');

figure()
plot(data)
hold on
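noiseEnd=1;                     % defaults in case no boundary is found
noiseBegin=frame_number;
for i=2:frame_number-1          % start at 2 and stop one early so H(i-1) and H(i+1) stay in range
    if(H(i)<EntropyLow && H(i-1)>EntropyLow )
        noiseEnd=i-1;
        for j=noiseEnd:-1:1     % search toward the start for the end of the leading noise
            if(zerocross(j)>T)
                noiseEnd=j;
                break
            end
        end
    end
    if(H(i)<EntropyLow && H(i+1)>EntropyLow )
        noiseBegin=i+1;
        for j=noiseBegin:frame_number   % search toward the end for the start of the trailing noise
            if(zerocross(j)>T)
                noiseBegin=j;
                break
            end
        end
        break
    end
end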
endIndex=round(noiseEnd*(frame_length+1)/2);     % frame position to sample index; round keeps it an integer
beginIndex=round(noiseBegin*(frame_length+1)/2);
noise1=data(1:endIndex);
noise2=data(beginIndex:length(data));
plot(noise1,'r')
hold on
plot(beginIndex:length(data),noise2,'g')
legend('Original speech','Leading noise segment','Trailing noise segment','Location','best')
xlabel('Time (samples)');
ylabel('Amplitude');
figure()
oh_clean=data(endIndex:beginIndex);
plot(oh_clean)
legend('cleanData','Location','best')
xlabel('Time (samples)');
ylabel('Amplitude');
save('oh_clean')   % saves the workspace, including oh_clean, to oh_clean.mat
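
The script calls enframe, which is not listed in this post. The sketch below shows what such a framing and windowing step might look like. It is written under stated assumptions, not taken from the author's function: the two time arguments are taken to be the frame length and frame shift in seconds, frame_length and frame_shift are taken to be in samples, and a synthetic tone stands in for the data in oh.mat.

```matlab
% Assumed inputs (a hypothetical test tone standing in for the data in oh.mat).
fs   = 10000;                                   % sampling rate in Hz, as in the script
data = sin(2 * pi * 200 * (0:999)' / fs);       % 1000-sample column vector

frame_length = round(20e-3 * fs);               % 20 ms -> 200 samples
frame_shift  = round(10e-3 * fs);               % 10 ms -> 100 samples
frame_number = floor((length(data) - frame_length) / frame_shift) + 1;

% Index matrix: column k holds the sample indices of frame k.
idx = bsxfun(@plus, (1:frame_length)', (0:frame_number-1) * frame_shift);
frame_m = data(idx);                            % framed matrix, frame_length-by-frame_number

% Hamming window written out explicitly so that no toolbox is required.
w = 0.54 - 0.46 * cos(2 * pi * (0:frame_length-1)' / (frame_length - 1));
frame_w = bsxfun(@times, frame_m, w);           % framed and windowed matrix
```

With this layout, frame k starts at sample (k-1)*frame_shift + 1, which is the relationship needed when converting a frame position back to a sample index.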
