Esophageal speeches modified by the Speech Enhancer Program®



Sriwimon Manochiopinig1* and Panuthat Boonpramuk2

1Department of Rehabilitation Medicine, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand; 2Department of Control Systems and Instrumentation Engineering, Faculty of Engineering, King Mongkut’s University of Technology Thonburi, Thailand


Abstract

Esophageal speech appears to be the first choice of speech treatment after a laryngectomy. However, many people who have undergone laryngectomy are unable to speak well. The aim of this study was to evaluate the quality of Thai esophageal speech after modification by the Speech Enhancer Program®. Five speech–language pathologists assessed the speech accuracy and intelligibility of words and continuing speech produced by seven laryngectomized speakers. Original and enhanced samples were compared to review changes in speech quality. Originally, all esophageal speakers had various articulation disorders, with error patterns of substitution, addition, omission and distortion. These speech defects seemed to relate to the changes in structure and speech source after a laryngectomy. After conversion by the Speech Enhancer Program, a mean of 8.14±7.21 inaccurate phonemes became accurate, whereas a mean of 9.14±9.24 accurate phonemes became inaccurate. For speech intelligibility, there was no perceived difference between the original and the converted speech, with ratings of 4 and 2 for words and continuing speech, respectively. Subjectively, the converted speech was perceived to be subtly clearer and more intelligible than the original speech. Its pitch was perceived as either higher or lower than the original, and its voice as soft. Rough, strained, unpleasant and unnatural characteristics were also noted in the converted speech. All esophageal speech characteristics and disorders found in this study were consistent with the literature. This study concluded that the Speech Enhancer Program might have some benefit for esophageal speakers, as it could produce changes in their speech. The program nevertheless needs modification before future implementation. Further study is recommended in larger and more varied population groups to reveal the speech patterns and related factors. In addition, comparison studies among different speech remediation techniques and among different listeners’ perceptions should be used to reveal the efficacy of the program.

Keywords: esophageal speech; Thai speech intelligibility; Thai speech accuracy; Speech enhancer

*Correspondence to: Sriwimon Manochiopinig, Department of Rehabilitation Medicine, Faculty of Medicine Siriraj Hospital, 2 Prannok Road, Bangkoknoi, Bangkok 10700, Thailand, Email:

Received: 30 October 2013; Revised: 17 April 2014; Accepted: 15 May 2014; Published: 17 June 2014

Journal of Assistive, Rehabilitative & Therapeutic Technologies 2014. © 2014 Sriwimon Manochiopinig and Panuthat Boonpramuk. This is an Open Access article distributed under the terms of the Creative Commons Attribution-Noncommercial 3.0 Unported License, permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Citation: Journal of Assistive, Rehabilitative & Therapeutic Technologies 2014, 2: 23226 -


After a laryngectomy, the functional goals are to preserve swallowing (1), speech (2) and quality of life (3). Laryngectomy causes deterioration of acoustic and aerodynamic voice parameters (1). Losing the voice is a great tragedy and a heavy burden in daily life (4). Conventional speech rehabilitation helps people to communicate again by using esophageal speech or devices such as an artificial larynx (1). Esophageal speech appears to be the first-line speech treatment because of its many advantages: it is hands-free oral communication that needs no extra instrument and has no cost. However, there are some burdens and disadvantages. Mastering esophageal speech as the primary means of oral communication is time consuming and requires effort (3). Moreover, the characteristic sound of esophageal speech is unpleasant and poorly accepted socially (1, 5). The problematic characteristics are speech inaccuracy (6), speech unintelligibility (5), reduced loudness (7), a limited number of syllables per breath (8), and annoying noise from the stoma during phonation, with poor perceptual vocal quality, i.e., roughness, breathiness and strain (9, 10). Many people who have undergone laryngectomy are unable to speak well, either for a long time or in a noisy environment (10). They tend to speak in single words or short phrases/sentences. More than half of them end up as artificial larynx users, one third are esophageal speakers, and a small number remain speechless (1). Consequently, their abilities are limited and they risk being stigmatized as handicapped (11, 12). It is hoped that special interventions will improve speech and maintain socially acceptable voice quality.

Recently, many computer programs have been developed for speech assessment and training (13–15). However, none is available for Thai esophageal speakers. Once the Speech Enhancer Program (16) had been developed, we wanted to review its effectiveness. The purpose of this study was to evaluate the Speech Enhancer Program’s efficacy in modifying esophageal speech.

Methods and materials


Thai esophageal speakers

There were seven Thai male post-laryngectomy speakers with a mean age of 65.4 years (S.D.±6.31), ranging from 56 to 75 years. All were literate and had used esophageal speech in daily communication for 4–31 years, with a mean of 15 years (S.D.±9.42). Their speech abilities were at least at word level.

Thai speech–language pathologists

There were five Thai female speech–language pathologists (including the first author) with a mean age of 55.8 years (S.D.±4.39), ranging from 48 to 59 years. All had at least a master’s degree in Communication Disorders, with a mean of 27.2 years (S.D.±3.11) of clinical experience, ranging from 22 to 30 years.


The speech tests

The complete speech tests for eliciting samples had two components. The first was a standardized articulation test, the Thai Articulation Test (TAT) (13), which is a list of 68 Thai words representing 68 Thai phonemes. The second consisted of six passages for assessing continuing speech. Each passage was selected randomly from current newspapers.

The Speech Enhancer Program

The Speech Enhancer Program (16) was developed to improve esophageal speech. The program consists of two filters and one amplifier. The first filter is a four-band FIR band-pass filter, which enhances esophageal speech by reducing certain frequencies that appear after surgery. The second filter is a Kalman filter, which smooths the signal after it has passed through the first filter. Finally, an amplifier increases the level of the filtered signal. The cut-off frequency, filter order and gain of each band of the filter can be adjusted for each person. Figures 1 and 2 show examples of the Speech Enhancer Program pages.

Figure 1. Setting page shows setting cut-off frequency, filter order, and gain.

Figure 2. Monitor page shows the signal in time domain and frequency domain: input signal is in the upper half, and enhanced signal is in the lower half.
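
The processing chain described above (band-pass filter bank, Kalman smoothing, amplification) can be illustrated with a minimal sketch. This is not the authors' implementation: the windowed-sinc filter design, the scalar random-walk Kalman model, and every name and default value below (`bandpass_fir`, `kalman_smooth`, `enhance`, the noise variances, the output gain) are assumptions chosen only to show how per-band cut-off frequency, filter order and gain could be exposed as adjustable settings.

```python
import math

def bandpass_fir(low_hz, high_hz, fs, order):
    """Windowed-sinc (Hamming) band-pass FIR taps; `order` must be even."""
    mid = order // 2
    taps = []
    for n in range(order + 1):
        k = n - mid
        if k == 0:
            h = 2.0 * (high_hz - low_hz) / fs
        else:
            h = (math.sin(2 * math.pi * high_hz * k / fs)
                 - math.sin(2 * math.pi * low_hz * k / fs)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / order)
        taps.append(h * window)
    return taps

def fir_filter(taps, signal):
    """Direct-form FIR convolution; output has the same length as the input."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j >= 0:
                acc += t * signal[i - j]
        out.append(acc)
    return out

def kalman_smooth(signal, process_var=1e-3, measurement_var=1e-1):
    """Scalar Kalman filter with a random-walk state model (assumed)."""
    estimate, error = 0.0, 1.0
    out = []
    for z in signal:
        error += process_var                      # predict step
        gain = error / (error + measurement_var)  # Kalman gain
        estimate += gain * (z - estimate)         # update step
        error *= 1.0 - gain
        out.append(estimate)
    return out

def enhance(signal, fs, bands, order=64, output_gain=2.0):
    """Apply the three stages; `bands` is a list of (low_hz, high_hz, gain)."""
    mixed = [0.0] * len(signal)
    for low_hz, high_hz, band_gain in bands:
        filtered = fir_filter(bandpass_fir(low_hz, high_hz, fs, order), signal)
        mixed = [m + band_gain * y for m, y in zip(mixed, filtered)]
    smoothed = kalman_smooth(mixed)
    return [output_gain * s for s in smoothed]
```

In this sketch a band is attenuated by giving it a gain below 1, and the four band edges would have to be chosen per speaker; the paper does not report the actual values used.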


The institutional review board approved the study, and written informed consent was obtained from all participants. All esophageal speech samples were recorded twice by the second author: once as an original speech sample and once as a converted speech sample modified by the Speech Enhancer Program. Both were recorded with the same procedure in a soundproof room at the Department of Control Systems and Instrumentation Engineering, Faculty of Engineering, King Mongkut’s University of Technology Thonburi. To obtain optimal speech samples, no time limit was set for completing the test. Seven esophageal speakers completed the articulation test and five of them completed reading the passages. The speech samples were recorded onto CDs, which were distributed to the five speech–language pathologists.

The speech–language pathologists listened to the recordings individually. They transcribed what they heard, evaluated speech accuracy, rated intelligibility, and noted their perceptions. For speech accuracy, each phoneme was judged as normal or abnormal, with four types of abnormal pattern: addition, substitution, omission, and distortion. For speech intelligibility, a five-point rating scale was used: 1) normal speech intelligibility, 2) understandable without knowing the topics, 3) understandable by guessing the topics, 4) partially understandable, able to guess only some words, and 5) unintelligible, cannot understand at all. Assessors were blind to the identity of the speakers but knew which speech samples were original and which were converted.
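
The scoring scheme above can be written down as a small data structure, which may help anyone who wants to tabulate similar ratings. The following is only a sketch: the class, constant and function names are hypothetical, and the study itself does not describe any software used for scoring.

```python
from enum import IntEnum

class Intelligibility(IntEnum):
    """Five-point intelligibility scale as listed in the text (1 = best)."""
    NORMAL = 1
    UNDERSTANDABLE_WITHOUT_TOPIC = 2
    UNDERSTANDABLE_BY_GUESSING_TOPIC = 3
    PARTIALLY_UNDERSTANDABLE = 4
    UNINTELLIGIBLE = 5

# The four abnormal articulation patterns named in the text.
ERROR_PATTERNS = ("addition", "substitution", "omission", "distortion")

def judge_phoneme(error_pattern=None):
    """Return 'normal' when no error pattern is noted, else 'abnormal'."""
    if error_pattern is None:
        return "normal"
    if error_pattern not in ERROR_PATTERNS:
        raise ValueError(f"unknown error pattern: {error_pattern}")
    return "abnormal"

def same_rating(original, converted):
    """True when original and converted samples received the same rating."""
    return Intelligibility(original) == Intelligibility(converted)
```

For example, `same_rating(4, 4)` expresses a word-level sample that was rated "partially understandable" both before and after conversion.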

Data analysis

This was a descriptive study. The original speech samples were compared with the converted speech samples modified by the program using SPSS (version 11).


Results

Speech accuracy

Esophageal speakers had various articulation problems with different error patterns: substitution, addition, omission and distortion of phonemes. The inaccurate phonemes were /h/, /ʔ/, /s/, /r/, /th/, /ph/, /kh/, /m/, /d/, /k/, /f/, /n/, /tɕh/, /tɕ/, /khw/ and /phl/. At the word level, the mean number of inaccurate phonemes was higher than the mean number of accurate phonemes in both speech samples, as shown in Table 1. However, there was no significant difference between original and converted speech samples in the number of accurate or inaccurate phonemes. Specifically, the mean number of inaccurate phonemes was 40.14 (±17.74), ranging from 15 to 62, for original speech and 41.14 (±16.78), ranging from 20 to 60, for converted speech. Meanwhile, the mean number of accurate phonemes was 27.71 (±17.51), ranging from 6 to 52, for original speech and 26.71 (±16.75), ranging from 8 to 48, for converted speech.

Table 1. Articulation characteristics of the words
Articulation characteristic (number of phonemes) | Original speech, mean (S.D.) | Converted speech, mean (S.D.) | Change post-modified, mean (S.D.)
Accurate | 27.71 (17.51) | 26.71 (16.75) | To be inaccurate = 9.14 (9.24); remained accurate = 18.57 (14.5)
Inaccurate | 40.14 (17.74) | 41.14 (16.78) | To be accurate = 8.14 (7.21); remained inaccurate = 32 (5.66)

Further analysis was conducted to review the phonemic changes in the converted speech samples. As shown in Table 1, there were subtle differences between the number of phonemes that became accurate and the number that became inaccurate. More specifically, the mean number of phonemes that became accurate was 8.14 (±7.21), ranging from 5 to 23, whereas the mean number of phonemes that became inaccurate was 9.14 (±9.24), ranging from 2 to 28. Similar findings were obtained for the phonemes that did not change. The accurate phonemes that remained accurate ranged from 2 to 41 with a mean of 18.57 (±14.5), whereas the inaccurate phonemes that remained inaccurate ranged from 10 to 56 with a mean of 32 (±5.66).
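
The four change categories in Table 1 account for the reported means before and after conversion, which can be checked with simple arithmetic. The short sketch below uses only the mean values printed in Table 1; the variable names are illustrative.

```python
# Mean phoneme counts per speaker, taken from Table 1 (word level).
to_inaccurate = 9.14        # accurate in original, inaccurate after conversion
remained_accurate = 18.57   # accurate in both
to_accurate = 8.14          # inaccurate in original, accurate after conversion
remained_inaccurate = 32.0  # inaccurate in both

# Original-speech means are the sums over where each phoneme started.
original_accurate = round(to_inaccurate + remained_accurate, 2)    # reported: 27.71
original_inaccurate = round(to_accurate + remained_inaccurate, 2)  # reported: 40.14

# Converted-speech means are the sums over where each phoneme ended up.
converted_accurate = round(remained_accurate + to_accurate, 2)      # reported: 26.71
converted_inaccurate = round(remained_inaccurate + to_inaccurate, 2)  # reported: 41.14

# Net change in accurate phonemes after conversion.
net_change_accurate = round(to_accurate - to_inaccurate, 2)

print(original_accurate, original_inaccurate)
print(converted_accurate, converted_inaccurate)
print(net_change_accurate)
```

The net change of about one phoneme per speaker in the unfavourable direction is consistent with the small shift from 27.71 to 26.71 accurate phonemes.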

Similar error patterns were found in the continuing speech obtained from reading the passages: substitution, addition, omission and distortion of phonemes. In the original speech, the mean number of inaccurate phonemes was 4.43 (±1.89), ranging from 2 to 8, and the mean number of inaccurate occurrences per passage was 6.8 (±3.73), ranging from 2 to 17, as shown in Table 2. In the converted speech, the mean number of inaccurate phonemes was 3.4 (±3.61), ranging from 0 to 8, and the mean number of inaccurate occurrences per passage was 5.11 (±3.28), ranging from 0 to 11.

Table 2. Inaccurate phonemes and inaccurate occurrences in continuing speech
Characteristic (phoneme, occurrence) | Original speech, mean (S.D.) | Converted speech, mean (S.D.)
Inaccurate phoneme | 4.43 (1.89) | 3.4 (3.61)
Inaccurate occurrence (time/passage) | 6.8 (3.73) | 5.11 (3.28)
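
To put the Table 2 means on a common footing, the change can be expressed relative to the original value. The sketch below is purely descriptive arithmetic on the reported means; it says nothing about statistical significance, and the function name is illustrative.

```python
def relative_change(original, converted):
    """Signed change as a fraction of the original mean."""
    return (converted - original) / original

# Means reported in Table 2 for continuing speech.
inaccurate_phonemes = relative_change(4.43, 3.4)
inaccurate_occurrences = relative_change(6.8, 5.11)

print(f"inaccurate phonemes: {inaccurate_phonemes:.1%}")
print(f"inaccurate occurrences per passage: {inaccurate_occurrences:.1%}")
```

Both values are negative, that is, fewer errors on average in the converted speech, although the standard deviations in Table 2 are large relative to the means.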

Speech intelligibility

Comparison of speech intelligibility between the original and converted speech samples found no difference. The average intelligibility level was consistently rated four for words and two for continuing speech. This meant that the speech–language pathologists could partially understand the words, guessing only some of them, and could understand the continuing speech without knowing the topics. Furthermore, two of the five speech–language pathologists reported that the converted speech was subtly more intelligible than the original, although their ratings did not change.

Speech characteristic

Perceptual voice evaluation found that the voices in both speech samples were soft, low in tone, rough and strained, with different degrees of impairment. Noise through the stoma was audible. Moreover, noise was generated by forceful air intake, either for the next speech production or for respiration.

Regarding the length, rate and rhythmic pattern of the speech samples, the speakers read in short phrases/sentences, spoke slowly, and paused during phonation. The mean length of reading utterance was 5.7 syllables, ranging from 1 to 13 syllables per utterance.

Pitch of the converted speech was perceived as either raised or lowered. These inconsistent changes were found both within individuals and across the group. Furthermore, less or no noise was audible during speaking. The converted speech was perceived as clearer and more intelligible than the original speech. In contrast, roughness and strain seemed to be more perceptible in the converted speech, which was also noted to be unpleasant and unnatural.


Discussion

Speech disorders of esophageal speakers are reported to relate to structural defects and to the source of speech energy, both of which change after laryngectomy (5, 6). In agreement with other studies, our esophageal speakers exhibited various speech problems with different degrees of impairment (6, 17). Thus, the changes in converted speech might be unpredictable. The mean number of inaccurate phonemes was higher than the mean number of accurate phonemes in both original and converted speech (Table 1). Although the numbers of phonemes that became accurate and that became inaccurate after modification were nearly equal (Table 1), the inaccurate phonemes and inaccurate occurrences in continuing speech were, interestingly, found less often in the converted speech than in the original speech (Table 2). The program therefore appears able to change the speech and, in some respects, to enhance it. Such change may be an indicator of the Speech Enhancer Program’s effectiveness.

The poorer intelligibility of words (rated as ‘partially understandable, able to guess only some words’) than of continuing speech (rated as ‘understandable without knowing the topics’) is to be expected and may be due to the context provided by the passages. Speech–language pathologists may predict words in continuing speech more easily than words that stand alone. Another reason may be that less or no noise from the stoma was audible in the converted speech during speech production (2). With less noise disturbing the speech, the converted speech may be perceived as clearer than the original speech. Moreover, the accuracy of words in the converted speech both increased and decreased, which may also be a result of noise elimination. Although the change is on a small scale (Tables 1 and 2) and does not affect the intelligibility rating, the converted speech was perceived as subjectively clearer and more intelligible. These observations are worth noting, as they may also imply that the Speech Enhancer Program is effective in some respects.

Generally, the converted speech samples retained the characteristics of the original speech samples, except for articulation accuracy (17). Although there was no significant difference, some subtle changes in speech characteristics were observed. The converted speech was perceived as a soft voice (7) with either higher or lower pitch. Moreover, it was also criticized as rough, strained, unpleasant and unnatural (2, 9).

The soft voice problem may be easy to solve by applying a high-power amplifier in the Speech Enhancer Program. Likewise, the pitch quality may be improved by altering the filter to fit each speaker individually. Therefore, the Speech Enhancer Program needs to be adjusted for each use.

In contrast to normal speakers, who produce speech only during exhalation, esophageal speakers take in air during speech production (10). During speech production, esophageal speakers attempt to swallow air through the mouth quickly and forcefully in order to supply air for speech (10). Such speech breathing behaviors cause strained speech in addition to roughness (1, 10). As a result, esophageal speech is perceived as slow, with short phrases and/or sentences and pauses during phonation (8, 17).

The physiologically based sources of these problems and the forceful breathing behaviors used for speech production are hard for esophageal speakers to avoid (6, 17). Despite the clearer speech outcome, forceful and unpleasant characteristics were still perceived in the converted speech. In this situation, speech rehabilitation is the preferred solution, particularly instruction for esophageal speakers in how to manage air properly for speech production (4).

Gaining new information and knowledge will be fundamental for improving the Speech Enhancer Program. Future studies should incorporate an amplifier for the speech signal and develop a new word list for intelligibility testing. Speech–language pathologists are usually familiar with the articulation test and may therefore unconsciously predict the words instead of evaluating solely what they hear in the speech samples. Thus, a standardized articulation test should be avoided to eliminate assessor bias. A long passage is not recommended either, because the majority of esophageal speakers are unable to produce long or connected sentences. Requiring these persons to read a long passage would not meet construct validity.

This study has some limitations and potential bias. One obvious limitation was the recruitment of esophageal speakers: it was difficult to include speakers with similar impairment or severity of speech disorder, even though severity could affect verbal ability and hence the evaluation of effectiveness. Consequently, the results may include some error. Notably, this study was not randomized and was only partially blinded, so it may contain bias. Despite these potential biases and limitations, the use of universal phonetic transcription, qualified assessors and a considerable number of speech samples means that the results could be valuable. Because of the small number of speakers, the findings should be regarded as results from a pilot study, and further research is needed before fuller inferences can be drawn and implemented.


Conclusion

This study implied that the Speech Enhancer Program may have some benefit for esophageal speakers. Although the improvements fell short of significance, accuracy in the converted speech changed subtly. Thus, esophageal speech could be improved through the program. However, the program needs modification before future use. Future studies should comprehensively assess articulation patterns, speech characteristics and related factors in a large population group, and should compare laypersons’ perceptions of esophageal speech before and after enhancement with those of speech–language pathologists. In addition, studies should compare the gains in modified speech and voice samples of tonal versus non-tonal languages, as well as the gains in modified speech of neurological speech disorders such as dysarthria. Comparison studies between the remediation techniques that speech–language pathologists employ and enhancement using the Speech Enhancer Program should also be used to review the efficacy of the program. Its effectiveness in accent reduction programs may also be studied.


Acknowledgements

We would like to thank the Faculty of Medicine Siriraj Hospital, Mahidol University; the Faculty of Engineering, King Mongkut’s University of Technology Thonburi; and the National Electronics and Computer Technology Center (NECTEC), Thailand, for their support and grant.

Conflicts of interest and funding

The authors have not received any benefits from any private organization to conduct this study.


References

  1. Hillman RE, Walsh MJ, Wolf GT, Fisher SG, Hong WK. Research speech–language pathologists, speech–language pathologists at participating VA medical centers, and the department of veterans affairs laryngeal cancer study group. Functional outcomes following treatment for advanced laryngeal cancer part I # voice preservation in advanced laryngeal cancer part II # laryngectomy rehabilitation: the state of the art in the VA system. Annals of Otology, Rhinology and Laryngology. 1998;5: 1–27.
  2. Topaloğlu I, Koçak I, Saltürk Z. Multidimensional evaluation of vocal function after suprecricoid laryngectomy with cricohyoidopexy. Annals of Otology, Rhinology and Laryngology. 2012;6:407–12.
  3. Schuster M, Lohscheller J, Kummer P, Hoppe U, Eysholdt U, Rosanowski F. Quality of life in laryngectomy after prosthetic voice restoration. Folia Phoniatrica et Logopaedica. 2003;5:211–19.
  4. Culton GL, Gerwin JM. Current trends in laryngectomy rehabilitation: a survey of speech–language pathologists. Otolaryngology and Head and Neck Surgery. 1998;4:458–63.
  5. Finiza C, Bergman B. Health-related quality of life in patients with laryngeal cancer: a post-treatment comparison of different modes of communication. Laryngoscope. 2001;5:918–23.
  6. Searl J. Bilabial contact pressure and oral air pressure during tracheoesophageal speech. Annals of Otology, Rhinology and Laryngology. 2007;4:304–11.
  7. Max L, Steurs W, De Bruyn W. Vocal capacities in esophageal and tracheoesophageal speakers. Laryngoscope. 1966;1:93–6.
  8. Salmon SJ. Commonalities among alaryngeal speech methods. In: Doyl PC, Keitch RL, eds. Contemporary considerations in the treatment and rehabilitation of head and neck cancer: voice, speech, and swallowing. Austin, TX: PRO-ED; 2005, pp. 59–74.
  9. Clark JG, Stemple JC. Assessment of three modes of laryngeal speech with a synthetic sentence identification (SSI) task in varying message-to-competition ratios. Journal of Speech and Hearing Research. 1982;3:333–8.
  10. Bohnenkamp TA, Forrest KM, Klaben BK, Stager J. Lung volumes used during speech breathing in tracheoesophageal speakers. Annals of Otology, Rhinology and Laryngology. 2011;8:550–8.
  11. Schuster M, Lohscheller J, Kummer P, Hoppe U, Eysholdt U, Rosanowski F. Voice handicap of laryngectomy with tracheoesophageal speech. Folia Phoniatrica et Logopaedica. 2004;1:62–7.
  12. Attieh AY, Searl J, Shahaltough NH, Wreikat MM, Lundy D. Voice restoration following total laryngectomy by tracheoesophageal prosthesis: effect on patients’ quality of life and voice handicap in Jordan. Health Quality Life Outcomes. 2008;6:26–35. doi: 10.1186/1477-7525-6-26.
  13. Manochiopinig S, Pracharitpukdee N, Lerstsarunyapong S. The articulation characteristics of normal Thai children aged 3–10 years assessing by using the Thai Articulation Test (TAT). Siriraj Medical Journal. 1998;8:763–9.
  14. Manochiopinig S, Thubthong N, Kayasith P. Dysarthric speech characteristics speech of Thai stroke patients. Disability and Rehabilitation. Assistive Technology. 2008;6:332–8.
  15. Huang DZ. Dr. Speech. Seattle, WA: Tiger DRS; 1998.
  16. Tuangpermsub N, Boonpramuk P, Polwisate W, Kayasith P. Improvement of esophageal speech by adaptive line enhancement with bias model. The 2nd International Convention on Rehabilitation Engineering & Assistive Technology. Bangkok, Thailand: The Imperial Queen’s Park. 13–15 May 2008; 74–7.
  17. Diedrich WM. Anatomy and physiology of esophageal speech. In: Salmon SLJ, Mount KH, eds. Alaryngeal speech rehabilitation: for clinicians by clinicians. Austin, TX: PRO-ED; 1999, pp. 13–19.