Estimation of the glottal flow from the speech or singing voice

Use this identifier to cite or link to this record: https://hdl.handle.net/1822/47578

Title:	Estimation of the glottal flow from the speech or singing voice
Other title(s):	Estimação do impulso glótico do discurso ou do canto
Author(s):	Beleza, Hugo Miguel Ferreira
Supervisor(s):	Mendes, Rui; Ferreira, Aníbal
Keywords:	Glottal pulse; Estimation of the glottal pulse; Inverse filtering; Integration in the frequency domain; Frequency-domain glottal pulse estimation; Filter; Algorithm; Frequency domain glottal source estimation
Date:	3-Mar-2016
Abstract:	Human speech production is, briefly, the result of the convolution between the excitation signal (the glottal pulse) and the impulse response of the vocal tract transfer function. This model of voice production is often referred to in the literature as the source-filter model: the source represents the airflow leaving the lungs and passing through the glottis (the space between the vocal folds), and the filter represents the resonances of the vocal tract and the lip/nostril radiation. Estimating the shape of the glottal pulse from the speech signal is important in many fields and applications, since the speech features most closely related to voice quality, vocal effort and speech disorders, for example, are mainly due to the voice source. Unfortunately, the glottal flow waveform, which is at the origin of the glottal pulse, is very difficult to measure directly and non-invasively. Several methods for estimating the glottal flow have been proposed over the last decades, but an efficient, automatic algorithm that performs reliably is not yet available. Most existing methods are based on inverse filtering. Inverse filtering is a deconvolution process: it seeks to recover the source signal by applying the inverse of the vocal tract transfer function to the output speech signal. Despite the simplicity of the concept, inverse filtering is difficult in practice, because the output signal may contain noise and the characteristics of the vocal tract filter are hard to model accurately.

In this dissertation we discuss a new filtering method for voiced signals, with the goal of improving a robust frequency-domain glottal source estimation algorithm that uses a phase-related feature based on the Normalized Relative Delays (NRDs) of the harmonics. This model is applied to several speech signals (synthetic and real), and the resulting glottal pulse estimates are compared with those obtained using other state-of-the-art methods, with and without the proposed filtering method.
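The abstract describes speech as the convolution of a glottal excitation with the vocal tract impulse response, and inverse filtering as the deconvolution that undoes it. The minimal Python sketch below illustrates that idea with classic LPC (autocorrelation-method) inverse filtering; it is not the NRD-based frequency-domain method of the dissertation. Everything in it is an assumption for illustration: the helper names (`all_pole_filter`, `lpc_autocorrelation`, `inverse_filter`), the single-resonance "vocal tract", and the impulse-train excitation, which is an idealization since real glottal pulses are not flat-spectrum impulses.

```python
import numpy as np


def all_pole_filter(x, a):
    """Filter x through 1/A(z), with a = [1, a1, ..., ap] (toy 'vocal tract')."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y


def lpc_autocorrelation(s, order):
    """Estimate A(z) by the autocorrelation method: solve R [a1..ap] = -[r1..rp]."""
    r = np.array([np.dot(s[: len(s) - k], s[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.concatenate(([1.0], np.linalg.solve(R, -r[1:])))


def inverse_filter(s, a):
    """Apply the FIR inverse filter A(z) to s, i.e. deconvolve the vocal tract."""
    return np.convolve(s, a)[: len(s)]


# Hypothetical toy vocal tract: one resonance (pole radius 0.9, angle 0.3*pi).
radius, angle = 0.9, 0.3 * np.pi
a_true = np.array([1.0, -2 * radius * np.cos(angle), radius ** 2])

# Idealized excitation: an impulse train with a period of 100 samples.
excitation = np.zeros(2000)
excitation[::100] = 1.0

speech = all_pole_filter(excitation, a_true)   # source-filter synthesis
a_hat = lpc_autocorrelation(speech, order=2)   # estimate the vocal tract filter
source_hat = inverse_filter(speech, a_hat)     # recover the excitation

print(np.round(a_true, 3), np.round(a_hat, 3))
print(np.corrcoef(excitation, source_hat)[0, 1])
```

Because this toy excitation is an ideal impulse train and the signal is noise-free, the estimated coefficients match the true ones almost exactly and the inverse-filtered output closely matches the excitation. With real voiced speech, noise and the non-flat glottal spectrum bias such estimates, which is the difficulty the abstract points out.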
Type:	Master's dissertation
Description:	Master's dissertation in Bioinformatics (specialization in Engineering)
URI:	https://hdl.handle.net/1822/47578
Access:	Open access
Appears in collections:	BUM - Dissertações de Mestrado; DI - Dissertações de Mestrado; CEB - Dissertações de Mestrado / MSc Dissertations

Files in this record:

File	Description	Size	Format
Hugo Miguel Ferreira Beleza.pdf	Thesis	2.29 MB	Adobe PDF
