A single document or a ZIP file containing documents to convert.
Basic (utterance detection and tokenisation)
Conversation Analysis notation (Jefferson system)
Corpus of Academic Spoken English (CASE)
Find more details on the home page
laugh: hah, heh, hih, hoh, ha, ho, hee, h, haha, hehe, hoho, ha-ha, he-he, ho-ho
filled-pause: ehm, er, erm, uh, uhm, um, unh
backchannel: hm, huh, m, mh, mhm, uh-huh, uh-uh, uhuh, hunm, nu-huh, nuhuh, uh-hum, uhum, hunh, unh-unh, unhunh
interjection: ah, aha, ah-ah, ahah, ugh, urgh, tsk
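The category/token pairs above can be read as a lookup table from a spoken token to its label. The following is a minimal sketch of that idea, not XTranscript's actual implementation: the names LEXICON, TOKEN_TO_CATEGORY and label_tokens are hypothetical, the table is abbreviated, and case-insensitive matching is an assumption.

```python
# Hypothetical lookup table built from a subset of the category/token
# pairs listed above; the remaining tokens would be added the same way.
LEXICON = {
    "laugh": ["hah", "heh", "haha", "ha-ha"],
    "filled-pause": ["er", "erm", "uh", "um"],
    "backchannel": ["hm", "mhm", "uh-huh"],
    "interjection": ["ah", "aha", "ugh", "tsk"],
}

# Invert to token -> category so each lookup is a single dict access.
TOKEN_TO_CATEGORY = {
    token: category
    for category, tokens in LEXICON.items()
    for token in tokens
}


def label_tokens(tokens):
    """Pair each token with its category, or None if it is not in the table.

    Matching is case-insensitive (an assumption made for this sketch).
    """
    return [(token, TOKEN_TO_CATEGORY.get(token.lower())) for token in tokens]


if __name__ == "__main__":
    for token, category in label_tokens(["Erm", "yes", "mhm"]):
        print(token, category)
```

Tokens that are not in the table come back paired with None, so ordinary words pass through unlabelled.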
matched repeated characters
English only; uses the Stanford CoreNLP tagger. Alternatively, the XML produced by XTranscript should be compatible with other XML-aware taggers, such as the