Task #3855
Task #3680 (closed): RA4a - Automatic error prediction
Task #3698: Experiment with one-class classification for join cost enhancements
More data for artefact collection
0%
Description
We need more data for the listening tests; in particular, we need to increase the coverage of rare vowels. Currently we have:
| phone | total | OK  | artefact |
|-------|-------|-----|----------|
| a     | 78    | 60  | 18       |
| e     | 82    | 46  | 36       |
| i     | 49    | 30  | 19       |
| o     | 92    | 50  | 42       |
| u     | 23    | 22  | 1        |
| A     | 123   | 17  | 104      |
| E     | 4     | 4   | 0        |
| I     | 23    | 17  | 6        |
| O     | 0     | 0   | 0        |
| U     | 4     | 4   | 0        |
We can either try to find additional words in the corpus (though these will be shorter), or build "artificial" words by joining two halves of words (or word transitions) from the corpus, as sketched below.
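A minimal sketch of the joining idea, purely for illustration; the phone sequences, the cut points (mid-vowel), and the function name are assumptions, not the actual pipeline. The point is that the resulting artificial word contains a concatenation join inside the rare vowel of interest.

```python
# Hypothetical sketch: build an "artificial" word by taking the first donor
# word up to the target vowel and appending the second donor word after its
# occurrence of the same vowel. Phone sequences are space-separated strings
# (an assumption about the corpus format).

def join_halves(left_word, right_word, vowel):
    """Join the beginning of left_word (up to and including `vowel`) with
    the end of right_word (after its last occurrence of `vowel`)."""
    left, right = left_word.split(), right_word.split()
    cut_l = left.index(vowel) + 1                       # keep up to and including the vowel
    cut_r = len(right) - 1 - right[::-1].index(vowel)   # last occurrence in the second donor
    return left[:cut_l] + right[cut_r + 1:]

# e.g. combine the start of "p A t e k" with the end of "k l A d a" around "A"
print(join_halves("p A t e k", "k l A d a", "A"))   # -> ['p', 'A', 'd', 'a']
```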
Files
Updated by Tihelka Dan almost 9 years ago
- File prepare_words.py added
Adding the (rather messy) script prepare_words.py which was used to select the original list of words for the listening tests. From the full list, only words starting and ending with unvoiced consonants were used.
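For reference, a compact sketch of the selection criterion described above (the real prepare_words.py is attached to this task); the set of unvoiced consonants and the (word, transcription) input format used here are assumptions.

```python
# Sketch of the selection step: keep only words whose transcription starts
# and ends with an unvoiced consonant. The phone inventory below is an
# assumed SAMPA-like set, not necessarily the one used in the corpus.
UNVOICED = set("p t k s S c C x f T".split())

def starts_and_ends_unvoiced(transcription):
    """True if the phone sequence begins and ends with an unvoiced consonant."""
    phones = transcription.split()
    return bool(phones) and phones[0] in UNVOICED and phones[-1] in UNVOICED

words = [("potok", "p o t o k"), ("okno", "o k n o"), ("kosa", "k o s a")]
selected = [w for w, t in words if starts_and_ends_unvoiced(t)]
print(selected)   # -> ['potok']
```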
Updated by Matoušek Jindřich almost 9 years ago
- Target version changed from RA1: Analysis of artifacts in synthetic speech to RA4: Automatic error prediction and signal modification
Updated by Grůber Martin over 8 years ago
- Status changed from New to Assigned
- Assignee changed from Grůber Martin to Tihelka Dan
I would also need the script used for combining word parts (and for word synthesis), as it will probably be necessary to build "artificial" words.
Updated by Tihelka Dan over 8 years ago
- File asf2json_mix.py added
- Assignee changed from Tihelka Dan to Grůber Martin
The script asf2json_mix.py should take an ASF file with individual word instances and create a set of JSON definitions which can then be passed to the TTS scripting to create synthetic words for listening.
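For illustration only, a very rough sketch of that conversion step; the ASF line layout and the JSON keys shown here are assumptions (the actual format is defined by the attached asf2json_mix.py and the TTS scripting it feeds).

```python
# Hypothetical sketch: read word instances from an (assumed line-oriented)
# ASF file and emit one JSON definition per synthetic word.
import json

def asf_to_json(asf_path, json_path):
    definitions = []
    with open(asf_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # assumed layout: <word> <unit indices...>
            word, *units = line.split()
            definitions.append({"word": word, "units": units})
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(definitions, f, ensure_ascii=False, indent=2)

# asf_to_json("words.asf", "words.json")
```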
Updated by Grůber Martin over 8 years ago
- Status changed from Assigned to Postponed