Dr Yoshi Gotoh
PhD
School of Computer Science
Lecturer
Student Projects Officer
Foundation Year Tutor
Member of the Speech and Hearing (SpandH) research group
y.gotoh@sheffield.ac.uk
+44 114 222 1908
Regent Court (DCS)
Full contact details
Dr Yoshi Gotoh
School of Computer Science
Regent Court (DCS)
211 Portobello
Sheffield
S1 4DP
- Profile
Yoshi is a lecturer in the Department of Computer Science. He has a first degree in Engineering from the University of Tokyo and a PhD from Brown University.
- Research interests
Yoshi has worked in speech and spoken language processing for many years. His current interests include audio-visual processing, in particular video analysis and video information retrieval.
- Publications
Journal articles
- Graph-based topic models for trajectory clustering in crowd videos. Machine Vision and Applications, 31.
- Generating natural language tags for video information management. Machine Vision and Applications, 28(3-4), 243-265.
- A statistical model for annotating videos with human actions. Pakistan Journal of Statistics, 32(2), 109-123.
- A framework for creating natural language descriptions of video streams. Information Sciences, 303, 61-82.
- A unified spatio-temporal human body region tracking approach to action recognition. Neurocomputing, 161, 56-64.
- Spoken document retrieval based on confusion network with syllable fragments. International Journal of Advanced Robotic Systems, 9.
- On the subjectivity of human-authored summaries. Natural Language Engineering, 15, 193-213.
- Glasgow University at TRECVID 2009. 2009 TREC Video Retrieval Evaluation Notebook Papers.
- A cascaded broadcast news highlighter. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 151-161.
- Information extraction from broadcast news. Philosophical Transactions of the Royal Society A, 358(1769), 1295-1309.
- Topic-based mixture language modelling. Natural Language Engineering, 5(4), 355-375.
- Efficient training algorithms for HMM's using incremental estimation. IEEE Transactions on Speech and Audio Processing, 6(6), 539-548.
- Taggers for parsers. Artificial Intelligence, 85(1-2), 45-57.
- Taggers for parsers. Artificial Intelligence, 84(1-2), 357-357.
- Analysis of LPC/DFT features for an HMM-based alphadigit recognizer. IEEE Signal Processing Letters, 3(4), 103-106.
Conference proceedings papers
- University of Engineering & Technology, Lahore and the University of Sheffield at TRECVID 2015: Instance search. 2015 TREC Video Retrieval Evaluation, TRECVID 2015
- The University of Sheffield and University of Engineering & Technology, Lahore at TRECVID 2014: Instance search task. 2014 TREC Video Retrieval Evaluation, TRECVID 2014
- 3D visual speech animation using 2D videos. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp 2367-2371). Brighton, 12 May 2019 - 17 May 2019.
- Graph-based correlated topic model for motion patterns analysis in crowded scenes from tracklets. British Machine Vision Conference 2018, BMVC 2018
- Graph-based correlated topic model for trajectory clustering in crowded videos. IEEE Winter Conference on Applications of Computer Vision (pp 1029-1037), 12 March 2018 - 14 March 2018.
- Medical image colorization for better visualization and segmentation. Medical Image Understanding and Analysis, Vol. 723 (pp 571-580)
- Natural language descriptions for human activities in video streams. INLG 2017 - 10th International Natural Language Generation Conference, Proceedings of the Conference (pp 85-94)
- Natural language descriptions of human activities scenes: corpus generation and analysis. 5th Workshop on Vision and Language. Berlin
- Analysis of visemes in the GRID corpus. Abstract of UKspeech
- Overlapped interest and the impact of visual and audio information in the human perception. Abstract of UKspeech
- The University of Sheffield and University of Engineering & Technology, Lahore at TRECVID 2016: Video to text description task. 2016 TREC Video Retrieval Evaluation, TRECVID 2016
- Corpus generation and analysis: incorporating audio data towards curbing missing information. Proceedings of KDWEB
- Describing spatio-temporal relations between object volumes in video streams. AAAI Workshop - Technical Report, Vol. WS-15-14 (pp 2-8)
- Manifold matching with application to instance search based on video queries. ICISP. Cherbourg, 30 June 2014.
- Alignment of nearly-repetitive contents in a video stream with manifold embedding. ICASSP. Firenze
- Video clip retrieval by graph matching. ECIR. Amsterdam
- Action recognition: spatio-temporal human body region tracking approach. CAIP - REACTS workshop. York
- Spatio-temporal manifold embedding for nearly-repetitive contents in a video stream. CAIP. York
- Spatio-temporal human body segmentation from video stream. CAIP. York
- The University of Sheffield, Harbin Engineering University and University of Engineering & Technology, Lahore at TRECVID 2013: Instance search & semantic indexing. 2013 TREC Video Retrieval Evaluation, TRECVID 2013
- The University of Sheffield and Harbin Engineering University at TRECVID 2012: Instance Search. TRECVID
- Human focused video description. ICCV - VECTaR workshop. Barcelona
- Video scene classification based on natural language description. ICCV - ARTEMIS workshop. Barcelona
- Towards coherent natural language description of video streams. ICCV - SIG workshop. Barcelona
- Nearly-repetitive video synchronisation using nonlinear manifold embedding. ICASSP. Dallas
- University of Sheffield at TRECVID 2008: Rushes Summarisation and Video Copy Detection. TRECVID
- Shot alignment in pre-production video. MLMI. Utrecht
- University of Sheffield at TRECVID 2007: Shot Boundary Detection and Rushes Summarisation. TRECVID
- Speaker Role Based Structural Classification of Broadcast News Stories. Interspeech 2007: 8th Annual Conference of the International Speech Communication Association, Vols 1-4 (pp 141-144)
- Relative Evaluation of Informativeness in Machine Generated Summaries. Interspeech 2007: 8th Annual Conference of the International Speech Communication Association, Vols 1-4 (pp 145-148)
- Multi-stage compaction approach to broadcast news summarisation. Interspeech. Lisbon
- On the subjectivity of human authored short summaries. ACL Workshop: Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Ann Arbor
- Maximum entropy segmentation of broadcast news. 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vols 1-5 (pp 1029-1032)
- Decremental feature-based compaction. DUC Workshop. Boston
- From text summarisation to style-specific summarisation for broadcast news. Advances in Information Retrieval, Proceedings, Vol. 2997 (pp 223-237)
- Are extractive text summarisation techniques portable to broadcast news? ASRU'03: 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (pp 489-494)
- Exploring the style-technique interaction in extractive summarization of broadcast news. ASRU'03: 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (pp 495-500)
- Statistical language modelling. Text- and Speech-Triggered Information Access, Vol. 2705 (pp 78-105)
- Punctuation Annotation Using Statistical Prosody Models. Proceedings of the ISCA Workshop on Prosody in Speech Recognition and Understanding (pp 35-40)
- Sentence boundary detection in broadcast speech transcripts. ISCA ASR Workshop. Paris
- Variable word rate n-grams. 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings, Vols I-VI (pp 1591-1594)
- Integrated transcription and identification of named entities in broadcast speech. Eurospeech. Budapest
- Statistical annotation of named entities in spoken audio. ESCA Workshop: Accessing Information in Spoken Audio. Cambridge
- Named entity tagged language models. ICASSP '99: 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings, Vols I-VI (pp 513-516)
- Document space models using latent semantic analysis. Eurospeech. Rhodes
- Microphone-array speech recognition via incremental MAP training. ICASSP. Atlanta
- Incremental ML estimation of HMM parameters for efficient training. ICASSP. Atlanta
- Incremental MAP estimation of HMMs for efficient training and improved performance. ICASSP. Detroit
- Using MAP estimated parameters to improve HMM speech recognition performance. ICASSP. Adelaide
- Improving audiovisual active speaker detection in egocentric recordings with the data-efficient image transformer. Proceedings of IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2023). Taipei, Taiwan, 16 December 2023.
- Exploration of verbal descriptions and dynamic indoors environments for people with sight loss. Proceedings of ACM CHI 2023
- Natural language descriptions for video streams. V&L Net Workshop. Sheffield, December 2012.
- Spatio-temporal SIFT and its application to human action classification. ECCV - VECTaR workshop. Firenze, October 2012.
- Spatio-temporal video representation with locality-constrained linear coding. ECCV - ARTEMIS workshop. Firenze, October 2012.
- Generating coherent natural language annotations for video streams. ICIP. Orlando, September 2012.
- Natural language descriptions of visual scenes: corpus generation and analysis. EACL workshop. Avignon, April 2012.
- Describing video contents in natural language. EACL workshop. Avignon, April 2012.
- Grants
Current Grants
- Multimedia Analysis for Unsupervised Dubbing In Entertainment (MAUDIE), InnovateUK, 04/2018 to 03/2021, £393,115, as Co-PI
Previous Grants
- S3L: Statistical Summarization of Spoken Language, EPSRC, 12/2001 to 09/2005, £284,248, as Co-PI
- Professional activities and memberships
Member of the Speech and Hearing research group