Abstract: This paper summarizes our recent efforts made to transcribe real-life Call Center conversations automatically with respect to non-verbal acoustic events, as well. Future Call Centers – as cognitive infocom systems – must respond automatically not only for well formed utterances but also for spontaneous and non-word speaker manifestations and must be robust against sudden noises. Conversational telephony speech transcription itself is a big challenge, primarily we address this issue on real-life (Bank and Insurance) tasks. In addition, we introduce several non-word acoustic modeling approaches and their integration to LVCSR (Large Vocabulary Continuous Speech Recognition). In the experiments, one and two…channel (client and agent speech merged into one or left in two separate audio stream) transcription results, cross-task results and the handling of transcription data insufficiency are investigated – in parallel with the non-verbal acoustic event modeling. On the agent side less than 15% word error rate could be achieved and the best error rate reduction is 20% (relative) due to the inclusion of various written corpora and due to acoustic event handling.
Show more