Praat Tutorials

Download the tutorial "Introduction to Praat: the Basics"

Download the tutorial "Scripting in Praat: the Basics"

Download the Sound & TextGrid objects for these tutorials


Praat Scripts

f0 detection.

A note:

To avoid possible pitch tracking errors, the scripts below set pitch floor and pitch ceiling to q15*0.83 and q65*1.92 respectively (where ‘q’ stands for percentile). These formulae (as well as the pair q25*0.75 and q75*1.5) have been shown [1, 2] to give a better estimate of pitch extrema, i.e. to exclude more octave errors or microprosodic effects at the extremes of the f0 distribution, than setting pitch floor and ceiling to the default values (60 – 600 Hz) or to gender-dependent defaults (female: 100 – 500 Hz; male: 75 – 300 Hz).


[1] De Looze, C. and Hirst, D.J., Detecting changes in key and range for the automatic modelling and coding of intonation. In Speech Prosody 2008, Campinas, Brazil.

[2] De Looze, C., Analyse et interprétation de l'empan temporel des variations prosodiques en français et en anglais contemporain. Doctoral thesis, 2010, Université de Provence.
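As an illustration of the note above, here is a minimal Python sketch of how the floor and ceiling could be derived from a first pass of f0 values. It is not the code of the Praat scripts themselves. The function names, the convention that 0 marks an unvoiced frame, and the linear-interpolation percentile are all assumptions made for this example.

```python
def percentile(values, p):
    """Percentile (p in 0..100) of a non-empty list, by linear interpolation."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    rank = (p / 100) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def pitch_bounds(f0_hz, low_q=15, low_factor=0.83, high_q=65, high_factor=1.92):
    """Return (floor, ceiling) in Hz as q15*0.83 and q65*1.92 by default.

    f0_hz is assumed to come from a first pass with permissive settings;
    frames with a value of 0 are treated as unvoiced (an assumption).
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames to estimate pitch bounds from")
    floor = percentile(voiced, low_q) * low_factor
    ceiling = percentile(voiced, high_q) * high_factor
    return floor, ceiling
```

Passing `low_q=25, low_factor=0.75, high_q=75, high_factor=1.5` gives the alternative pair of formulae mentioned in the note.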

From a sound file, this script calculates f0 min, max, sd, key (median) and span (max-min).

This script creates a TextGrid from a Sound object, annotates the TextGrid in interpausal runs, and calculates f0 min, max, sd, key (median) and span (max-min) for each interpausal run.

From a TextGrid object, this script calculates f0 min, max, sd, key (median) and span (max-min) for each annotated phrase (syntactic, phonological, prosodic, etc.).
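The measures listed for these three scripts can be sketched in Python as follows. This is an illustration only, not a port of the scripts. It assumes f0 values in Hz with 0 marking unvoiced frames, uses the sample standard deviation, and reports span as a plain difference in Hz; the scripts may make different choices.

```python
from statistics import median, stdev


def f0_summary(f0_hz):
    """Return f0 min, max, sd, key (median) and span (max - min).

    Frames equal to 0 are treated as unvoiced and ignored (an assumption).
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    lowest, highest = min(voiced), max(voiced)
    return {
        "min": lowest,
        "max": highest,
        "sd": stdev(voiced),    # sample standard deviation
        "key": median(voiced),  # key is defined as the median
        "span": highest - lowest,
    }
```

To obtain the measures for one interpausal run or phrase, pass only the frames that fall inside that interval.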


From a TextGrid object, this script calculates a speaker's speech rate in phonemes and/or syllables per second, as well as the number and duration of pauses.

From a TextGrid object, this script calculates interval (e.g. word, syllable, phoneme) duration.

From a TextGrid object, this script calculates the duration and position of pauses.

This script gives the duration of TextGrid or Sound objects.
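For readers who want to see what these duration measures amount to, here is a small Python sketch. It assumes that a tier has been exported as a list of (start, end, label) tuples and that pauses carry one of the labels in `PAUSE_LABELS`. Both are assumptions for this example and not necessarily the conventions of the scripts. Speech rate is computed here over the total duration, pauses included.

```python
PAUSE_LABELS = frozenset({"", "_", "#"})  # hypothetical pause labels


def rate_and_pauses(intervals, pause_labels=PAUSE_LABELS):
    """Summarise a tier given as (start, end, label) tuples in seconds."""
    if not intervals:
        raise ValueError("empty tier")
    units = [(s, e) for s, e, lab in intervals if lab.strip() not in pause_labels]
    pauses = [(s, e) for s, e, lab in intervals if lab.strip() in pause_labels]
    total = intervals[-1][1] - intervals[0][0]
    if total <= 0:
        raise ValueError("tier has no duration")
    return {
        "speech_rate": len(units) / total,            # units per second
        "unit_durations": [e - s for s, e in units],  # one value per interval
        "pause_count": len(pauses),
        "pause_duration": sum(e - s for s, e in pauses),
        "pause_positions": [s for s, _ in pauses],    # start time of each pause
    }
```

With a syllable tier the rate is in syllables per second; with a phoneme tier it is in phonemes per second.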


This script calculates the first three formants (F1, F2 and F3) of vowels.

This script draws a vowel quadrilateral for the vowels labelled in a table.

This script draws a vowel quadrilateral for the vowels labelled in a table after calculating their mean.
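The averaging step that precedes the drawing can be sketched as below. The sketch assumes the table has been read into rows of (vowel, F1, F2) and that the mean is taken per vowel label. The row layout and the function name are hypothetical, and the drawing itself is left to Praat.

```python
from collections import defaultdict
from statistics import mean


def mean_formants(rows):
    """Map each vowel label to its mean (F1, F2) in Hz.

    rows: iterable of (vowel, f1, f2) tuples (assumed table layout).
    """
    grouped = defaultdict(list)
    for vowel, f1, f2 in rows:
        grouped[vowel].append((f1, f2))
    return {
        vowel: (mean(f1 for f1, _ in points), mean(f2 for _, f2 in points))
        for vowel, points in grouped.items()
    }
```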

TextGrid modification.

This script concatenates (rather than merges) two TextGrid objects.

From an annotation in phonemes, syllables or words, this script creates a tier annotated in interpausal runs.
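The idea behind the interpausal-run tier can be illustrated in Python as follows. The sketch assumes intervals given as (start, end, label) tuples, a hypothetical set of pause labels, and a minimum pause duration of 0.2 s below which a silence is absorbed into the surrounding run. None of these values is taken from the script.

```python
PAUSE_LABELS = frozenset({"", "_", "#"})  # hypothetical pause labels


def interpausal_runs(intervals, pause_labels=PAUSE_LABELS, min_pause=0.2):
    """Merge consecutive non-pause intervals into interpausal runs.

    Returns (start, end, label) tuples; pauses are kept with the label "_".
    """
    runs = []
    current = None  # [start, end, labels] of the run being built

    def close_run():
        nonlocal current
        if current is not None:
            runs.append((current[0], current[1], " ".join(current[2])))
            current = None

    for start, end, label in intervals:
        text = label.strip()
        is_pause = text in pause_labels and (end - start) >= min_pause
        if is_pause:
            close_run()
            runs.append((start, end, "_"))
        else:
            if current is None:
                current = [start, end, []]
            current[1] = end
            if text not in pause_labels:  # short silences add no label
                current[2].append(text)
    close_run()
    return runs
```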


Two Praat plugins for the Automatic Detection of Register and Tempo Variations

Download the

Download the Tutorial_ADoReVA.pdf

Download the plugin

Download the Tutorial_ADoTeVA.pdf


As Bolinger [1] pointed out many years ago, a major drawback of most scalar systems for representing intonation patterns is the difficulty of separating global pitch changes (determined by variations in register key and span) from local pitch characteristics (determined by changes in the phonological representation of intonation). How, for example, can a high fall in a narrow register be distinguished from a low fall in a wide register? To answer this objection, changes in the f0 domain are accounted for by positing two level tones, as in AM theory [3-5], or more, as in INTSINT [2]. However, while these models appear adequate for the analysis of short read sentences (as often used in laboratory speech), they implicitly assume that a speaker's key and span remain unchanged. This makes them fragile for the analysis of spontaneous speech, in which variations in register may convey, among other things, information about the speaker’s identity or about the discourse structure, i.e. the hierarchical dimension and relational organisation of the discourse. In addition, models have to cope with the overlap between short-term features such as segmental duration and longer-term ones such as tempo variations, which also serve extralinguistic and paralinguistic functions. This overlap also makes the temporal organisation of speech difficult to analyse and model. How, for example, can a short phoneme in a slow tempo be distinguished from a long phoneme in a fast tempo? These difficulties show how important it is to understand the temporal span of longer-term prosodic variations when describing short-term variations, in particular for the study of spontaneous speech and its functions.

The difficulty in defining the temporal span of register and tempo variations comes from the fact that these variations operate over many different domains. I propose two clustering algorithms, ADoReVA and ADoTeVA, which aim to detect these variations automatically. First, the algorithms calculate the difference between two consecutive units in terms of their register (ADoReVA) or tempo (ADoTeVA). Then, they generate a binary tree structure in the form of a layered icicle diagram, which gives a graphical representation of register (Figure 1) and tempo variations. This representation makes it possible to define the hierarchical structure and relational organisation of discourse units as reflected by register and tempo changes. Groups of units are thereby distinguished, and the distance between the leaf nodes provides a measure of boundary strength between them: the larger the distance, the stronger the boundary between two groups. Conversely, a short distance suggests that two consecutive units belong to the same group of units.

Figure 1. Extract of a layered icicle diagram as obtained with the ADoReVA algorithm. Units are grouped together according to their register level and span. The representation shows that the units “le premier” and “ministre” belong to the same group (according to the register level of each unit); by contrast, the group of units “le premier ministre ira-t-il à Beaulieu” is clearly distinguished from the group of units “le village de Beaulieu est en grand émoi”. Indeed, the distance between the leaf nodes “à Beaulieu” and “le village” indicates a strong boundary (corresponding to a break in the tree structure). The colour scale indicates the register level of each unit: the warmer the colour, the higher the key.
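To make the grouping idea concrete, here is a minimal Python sketch of one possible way to build such a binary tree: adjacency-constrained agglomerative clustering over one value per unit (for example its key, or its tempo). This is an illustration under my own simplifying assumptions, not the implementation of ADoReVA or ADoTeVA. The distance between neighbouring clusters (absolute difference of cluster means) and the function name are choices made for this example.

```python
def boundary_strengths(values):
    """Return one strength per boundary between consecutive units.

    strengths[i] is the distance at which unit i and unit i + 1 end up
    in the same cluster; larger values suggest a stronger boundary.
    """
    # Each cluster covers a contiguous run of units: (last index, sum, count).
    clusters = [(i, float(v), 1) for i, v in enumerate(values)]
    strengths = [0.0] * max(len(values) - 1, 0)
    while len(clusters) > 1:
        # Distance between each pair of neighbouring clusters.
        gaps = [
            abs(a[1] / a[2] - b[1] / b[2])
            for a, b in zip(clusters, clusters[1:])
        ]
        # Merge the closest neighbouring pair (one node of the binary tree).
        k = min(range(len(gaps)), key=gaps.__getitem__)
        left, right = clusters[k], clusters[k + 1]
        strengths[left[0]] = gaps[k]  # boundary after the left cluster's last unit
        clusters[k:k + 2] = [(right[0], left[1] + right[1], left[2] + right[2])]
    return strengths
```

For the values `[10, 11, 10.5, 20, 21]`, the largest strength falls between the third and fourth units, which is where a break in the tree would appear.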



[1] Bolinger, D. (1951) Intonation: Levels vs. Configurations, in Word, 7, 199-210.
[2] Hirst, D.J. (2007) A Praat Plugin for MOMEL and INTSINT with improved algorithms for modelling and coding of intonation, in Proc. Int. Conf. Phonetic Sci. XVI, Saarbrücken.
[3] Ladd, D.R. (1996) Intonational Phonology, Cambridge University Press, Cambridge, G.B.
[4] Pierrehumbert, J. (1980) The Phonology and Phonetics of English Intonation, PhD Dissertation, Cambridge, Mass., MIT.
[5] Silverman, K. et al. (1992) ToBI: A Standard for Labeling English Prosody, in Second International Conference on Spoken Language Processing. ISCA.