Spoken Language Processing: AuToBI v1.3

This release to AuToBI is a more traditional milestone release than v1.2 was. Trained models and a new .jar file will be available on the AuToBI site shortly.

There are improvements to performance that are thoroughly documented in a submission to IEEE SLT 2012. These improvements were achieved from two sources.

First, AuToBI uses importance weighting to improve classification performance on skewed distributions. I found this to be a more useful approach than the standard under- or over-sampling. This is discussed in a paper that will appear at Interspeech next month.

Second, inspired by features that Taniya Mishra, Vivek Sridhar and Aliaster Conkie developed at AT&T, I included some new features which had a big payoff. (They described these features in an upcoming Interspeech 2012 paper). One of the most significant was to calculate the area under a normalized intensity curve. This has a strong correlation with duration, but is more robust. You could make an argument that it approximates "loudness" by incorporating duration and intensity. This is a pretty poor psycholinguistic or perceptual argument so I wouldn't make it too strongly, but it could be part of the story.

Here is a recap of speaker-independent acoustic-only performance on the six ToBI classification tasks on BURNC speaker f2b.

Task	Version 1.2	Version 1.3
Pitch Accent Detection	81.01% F1:83.28	84.83% F1:86.58
Intermediate Phrase Detection	75.41% F1:43.15	77.97% F1:44.43
Intonational Phrase Detection	86.91% F1:74.50	90.36% F1:76.49
Pitch Accent Classification	18.46% Average Recall:18.97	16.33% Average Recall:21.06
Phrase Accent Classification	48.34% Average Recall:47.99	47.44% Average Recall:48.31
Phrase Accent/Boundary Tone Classification	73.18% Average Recall:25.92	74.47% Average Recall:26.02

There are also a number of improvements to AuToBI from a technical side and as a piece of code.

First of all, unit test coverage has increased from ~11% to ~73% between v1.2 and v1.3.

Second, there was a bug in the PitchExtractor code causing a pretty serious under prediction of unvoiced grames. (A big thanks to Victor Soto for finding this bug.)

Third, memory use is much lower by a more aggressive deletion of prediction attributes, and through a modification of how WavReader works.

I'd like to thank Victor Soto, Fabio Tesser, Samuel Sanchez, Jay Liang, Ian Kaplan, Erica Cooper and, as ever, Julia Hirschberg and anyone else who has been using AuToBI, for their patience and feedback.

I've been pretty lax about posting here. I'll try to get better about it in the coming academic year.

This fall is full of travel which will lead to a lot of ideas and not enough time to work on them.

Spoken Language Processing

Thursday, August 16, 2012

AuToBI v1.3

No comments:

Disqus for Spoken Language Processing Blog