Dr Xingyi Song
School of Computer Science
Lecturer in Computational Media Analysis, Natural Language Processing
Outreach Support
Member of the Natural Language Processing research group


+44 114 222 1867
Full contact details
School of Computer Science
Regent Court (DCS)
211 Portobello
Sheffield
S1 4DP
- Profile
Dr Xingyi Song is a Lecturer in Computational Media Analysis at the Department of Computer Science, University of Sheffield. He is a member of the Natural Language Processing group and the GATE team (https://gate.ac.uk/).
Previously, he worked as a machine translation specialist at Iconic Translation Machine (2015-2016) and as a Research Associate on several EU-funded projects, such as Kconnect, Knowmak and Risis2 (2016-2021), at the University of Sheffield.
He completed his MSc and PhD in the Natural Language Processing group at the University of Sheffield. His research interests are in natural language processing, computational social science, sentiment analysis and biomedical text processing.
- Publications
Journal articles
- Cross-modal augmentation for few-shot multimodal fake news detection. Engineering Applications of Artificial Intelligence, 142, 109931.
- Comparison between parameter-efficient techniques and full fine-tuning: a case study on multilingual news article classification. PLoS ONE, 19(5).
- Similarity-Aware Multimodal Prompt Learning for fake news detection. Information Sciences, 647, 119446.
- Classifying COVID-19 Vaccine Narratives. International Conference Recent Advances in Natural Language Processing, RANLP, 648-657.
- Don’t waste a single annotation: improving single-label classifiers through soft labels. Findings of the Association for Computational Linguistics: EMNLP 2023.
- An exploratory study on utilising the web of linked data for product data mining. SN Computer Science, 4(1).
- Text mining occupations from the mental health electronic health record: A natural language processing approach using records from the Clinical Record Interactive Search (CRIS) platform in south London, UK. BMJ Open, 11(3).
- Using ontologies to map between research data and policymakers’ presumptions: the experience of the KNOWMAK project. Scientometrics.
- A Python script for adaptive layout optimization of trusses. Structural and Multidisciplinary Optimization.
- CogStack - experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust hospital. BMC Medical Informatics and Decision Making, 18.
- CogStack - Experiences Of Deploying Integrated Information Retrieval And Extraction Services In A Large National Health Service Foundation Trust Hospital.
- VaxxHesitancy: A Dataset for Studying Hesitancy towards COVID-19 Vaccination on Twitter. Proceedings of the International AAAI Conference on Web and Social Media, 17, 1052-1062.
- Similarity-Aware Multimodal Prompt Learning for Fake News Detection. SSRN Electronic Journal.
- Classification aware neural topic model for COVID-19 disinformation categorisation. PLOS ONE, 16(2), e0247086.
Chapters
- The CLEF-2024 CheckThat! Lab: Check-Worthiness, Subjectivity, Persuasion, Roles, Authorities, and Adversarial Robustness, Lecture Notes in Computer Science (pp. 449-458). Springer Nature Switzerland
- Overview of the CLEF-2024 CheckThat! Lab: Check-Worthiness, Subjectivity, Persuasion, Roles, Authorities, and Adversarial Robustness, Lecture Notes in Computer Science (pp. 28-52). Springer Nature Switzerland
Conference proceedings papers
- SheffieldVeraAI at SemEval-2024 Task 4: Prompting and fine-tuning a Large Vision-Language Model for Binary Classification of Persuasion Techniques in Memes. Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), June 2024.
- Optimising LLM-Driven Machine Translation with Context-Aware Sliding Windows. Proceedings of the Ninth Conference on Machine Translation (pp 1004-1010), November 2024.
- Enhancing Data Quality through Simple De-duplication: Navigating Responsible Computational Social Science Research. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (pp 12477-12492), November 2024.
- SheffieldVeraAI at SemEval-2023 Task 3: Mono and Multilingual Approaches for News Genre, Topic and Persuasion Technique Classification. Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), July 2023.
- Classification-Aware Neural Topic Model Combined With Interpretable Analysis - For Conflict Classification. International Conference Recent Advances in Natural Language Processing, RANLP (pp 666-672)
- Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of the COVID-19 Infodemic. International Conference Recent Advances in Natural Language Processing, RANLP (pp 556-567)
- Comparative Analysis of Engagement, Themes, and Causality of Ukraine-Related Debunks and Disinformation (pp 128-143)
- Comparing topic-aware neural networks for bias detection of news. Proceedings of 24th European Conference on Artificial Intelligence (ECAI 2020), Vol. 325 (pp 2054-2061). Santiago de Compostela, Spain, 29 August 2020 - 2 September 2020.
- Team Bertha von Suttner at SemEval-2019 Task 4: Hyperpartisan News Detection using ELMo Sentence Representation Convolutional Network. Proceedings of the 13th International Workshop on Semantic Evaluation, June 2019.
- A deep neural network sentence level classification method with context information. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp 900-904). Brussels, Belgium, 31 October 2018 - 4 November 2018.
- Comparing Attitudes to Climate Change in the Media using sentiment analysis based on Latent Dirichlet Allocation. Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism, September 2017.
Preprints
- Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling, arXiv.
- Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling, arXiv.
- Don't Waste a Single Annotation: Improving Single-Label Classifiers Through Soft Labels, arXiv.
- Suicide prediction with natural language processing of electronic health records, Cold Spring Harbor Laboratory.
- Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets, arXiv.
- Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification, arXiv.
- Bio-SIEVE: Exploring Instruction Tuning Large Language Models for Systematic Review Automation, arXiv.
- Finding Already Debunked Narratives via Multistage Retrieval: Enabling Cross-Lingual, Cross-Dataset and Zero-Shot Learning, arXiv.
- Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science, arXiv.
- A Large-Scale Comparative Study of Accurate COVID-19 Information versus Misinformation, arXiv.
- Examining Temporalities on Stance Detection towards COVID-19 Vaccination, arXiv.
- Similarity-Aware Multimodal Prompt Learning for Fake News Detection, arXiv.
- SheffieldVeraAI at SemEval-2023 Task 3: Mono and multilingual approaches for news genre, topic and persuasion technique classification, arXiv.
- VaxxHesitancy: A Dataset for Studying Hesitancy towards COVID-19 Vaccination on Twitter, arXiv.
- Comparative Analysis of Engagement, Themes, and Causality of Ukraine-Related Debunks and Disinformation, arXiv.
- Classifying COVID-19 vaccine narratives, arXiv.
- Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of the COVID-19 Infodemic, Research Square Platform LLC.
- Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of COVID-19 Infodemic, arXiv.
- Bio-YODIE: A Named Entity Linking System for Biomedical Text, arXiv.
- A Deep Neural Network Sentence Level Classification Method with Context Information, arXiv.
- Grants
ASIMOV: AI-as-a-service, Innovate UK, 01/2024 - 03/2025, £142,691, as PI.