Dr Xingyi Song
School of Computer Science
Lecturer in Computational Media Analysis, Natural Language Processing
Member of the Natural Language Processing research group
+44 114 222 1867
Full contact details
School of Computer Science
Regent Court (DCS)
211 Portobello
Sheffield
S1 4DP
Profile
Dr Xingyi Song is a Lecturer in Computational Media Analysis at the Department of Computer Science, University of Sheffield. He is a member of the Natural Language Processing group and the GATE team (https://gate.ac.uk/).
Previously, he worked as a machine translation specialist at Iconic Translation Machines (2015-2016). From 2016 to 2021 he was a Research Associate at the University of Sheffield on several EU-funded projects, including KConnect, KNOWMAK and RISIS2.
He completed his MSc and PhD in the Natural Language Processing group at the University of Sheffield. His research interests are Natural Language Processing, Computational Social Science, sentiment analysis and biomedical text processing.
Publications
Journal articles
- Comparison between parameter-efficient techniques and full fine-tuning: a case study on multilingual news article classification. PLoS ONE, 19(5).
- Examining Temporalities on Stance Detection Towards COVID-19 Vaccination. 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, 6732-6738.
- Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science. 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, 12074-12086.
- Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets. 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, 6739-6751.
- Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling. 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, 10160-10171.
- Similarity-Aware Multimodal Prompt Learning for fake news detection. Information Sciences, 647, 119446.
- Classifying COVID-19 Vaccine Narratives. International Conference Recent Advances in Natural Language Processing, RANLP, 648-657.
- Don’t waste a single annotation: improving single-label classifiers through soft labels. Findings of the Association for Computational Linguistics: EMNLP 2023.
- VaxxHesitancy: A Dataset for Studying Hesitancy Towards COVID-19 Vaccination on Twitter. CoRR, abs/2301.06660.
- Finding Already Debunked Narratives via Multistage Retrieval: Enabling Cross-Lingual, Cross-Dataset and Zero-Shot Learning. CoRR, abs/2308.05680.
- An exploratory study on utilising the web of linked data for product data mining. SN Computer Science, 4(1).
- Text mining occupations from the mental health electronic health record: A natural language processing approach using records from the Clinical Record Interactive Search (CRIS) platform in south London, UK. BMJ Open, 11(3).
- Using ontologies to map between research data and policymakers’ presumptions: the experience of the KNOWMAK project. Scientometrics.
- A Python script for adaptive layout optimization of trusses. Structural and Multidisciplinary Optimization.
- CogStack - experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust hospital. BMC Medical Informatics and Decision Making, 18.
- VaxxHesitancy: A Dataset for Studying Hesitancy towards COVID-19 Vaccination on Twitter. Proceedings of the International AAAI Conference on Web and Social Media, 17, 1052-1062.
- Similarity-Aware Multimodal Prompt Learning for Fake News Detection. SSRN Electronic Journal.
- Classification aware neural topic model for COVID-19 disinformation categorisation. PLOS ONE, 16(2), e0247086.
Chapters
- The CLEF-2024 CheckThat! Lab: Check-Worthiness, Subjectivity, Persuasion, Roles, Authorities, and Adversarial Robustness, Lecture Notes in Computer Science (pp. 449-458). Springer Nature Switzerland
- Overview of the CLEF-2024 CheckThat! Lab: Check-Worthiness, Subjectivity, Persuasion, Roles, Authorities, and Adversarial Robustness, Lecture Notes in Computer Science (pp. 28-52). Springer Nature Switzerland
Conference proceedings papers
- Identifying and Aligning Medical Claims Made on Social Media with Medical Evidence. 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings (pp 8580-8593)
- Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science. LREC/COLING (pp 12074-12086)
- Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling. LREC/COLING (pp 10160-10171)
- Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets. LREC/COLING (pp 6739-6751)
- Examining Temporalities on Stance Detection towards COVID-19 Vaccination. LREC/COLING (pp 6732-6738)
- SheffieldVeraAI at SemEval-2024 Task 4: Prompting and fine-tuning a Large Vision-Language Model for Binary Classification of Persuasion Techniques in Memes. Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), June 2024.
- Overview of the CLEF-2024 CheckThat! Lab Task 6 on Robustness of Credibility Assessment with Adversarial Examples (InCrediblAE). CEUR Workshop Proceedings, Vol. 3740 (pp 321-338)
- Optimising LLM-Driven Machine Translation with Context-Aware Sliding Windows. Proceedings of the Ninth Conference on Machine Translation (pp 1004-1010), November 2024.
- Enhancing Data Quality through Simple De-duplication: Navigating Responsible Computational Social Science Research. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (pp 12477-12492), November 2024.
- GATE Teamware 2: An open-source tool for collaborative document classification annotation. EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of System Demonstrations (pp 145-151)
- SheffieldVeraAI at SemEval-2023 Task 3: Mono and Multilingual Approaches for News Genre, Topic and Persuasion Technique Classification. Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), July 2023.
- Classification-Aware Neural Topic Model Combined With Interpretable Analysis - For Conflict Classification. International Conference Recent Advances in Natural Language Processing, RANLP (pp 666-672)
- Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of the COVID-19 Infodemic. International Conference Recent Advances in Natural Language Processing, RANLP (pp 556-567)
- A Large-Scale Comparative Study of Accurate COVID-19 Information versus Misinformation. CoRR, Vol. abs/2304.04811
- Classifying COVID-19 Vaccine Narratives. RANLP (pp 648-657)
- Don't waste a single annotation: improving single-label classifiers through soft labels. EMNLP (Findings) (pp 5347-5355)
- Comparative Analysis of Engagement, Themes, and Causality of Ukraine-Related Debunks and Disinformation. SocInfo, Vol. 13618 (pp 128-143)
- Comparing topic-aware neural networks for bias detection of news. Proceedings of 24th European Conference on Artificial Intelligence (ECAI 2020), Vol. 325 (pp 2054-2061). Santiago de Compostela, Spain, 29 August 2020 - 2 September 2020.
- Using deep neural networks with intra- and inter-sentence context to classify suicidal behaviour. LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings (pp 1303-1310)
- RP-DNN: A tweet level propagation context based deep neural networks for early rumor detection in social media. LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings (pp 6094-6105)
- Team Bertha von Suttner at SemEval-2019 Task 4: Hyperpartisan News Detection using ELMo Sentence Representation Convolutional Network. Proceedings of the 13th International Workshop on Semantic Evaluation. Minneapolis, Minnesota, USA, 6 June 2019 - 7 June 2019.
- A deep neural network sentence level classification method with context information. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (pp 900-904). Brussels, Belgium, 31 October 2018 - 4 November 2018.
- Comparing Attitudes to Climate Change in the Media using sentiment analysis based on Latent Dirichlet Allocation. Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism, September 2017.
- Sheffield Systems for the English-Romanian WMT Translation Task. Proceedings of the First Conference on Machine Translation
- Data selection for discriminative training in statistical machine translation. Proceedings of the 17th Annual Conference of the European Association for Machine Translation, EAMT 2014 (pp 45-52)
- BLEU deconstructed: Designing a Better MT Evaluation Metric. Proceedings of the 14th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING)
- Regression and Ranking based Optimisation for Sentence Level Machine Translation Evaluation. Proceedings of the Sixth Workshop on Statistical Machine Translation. Edinburgh, UK
Preprints
- Enhancing Data Quality through Simple De-duplication: Navigating Responsible Computational Social Science Research, arXiv.
- Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling, arXiv.
- Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling, arXiv.
- Don't Waste a Single Annotation: Improving Single-Label Classifiers Through Soft Labels, arXiv.
- Suicide prediction with natural language processing of electronic health records, Cold Spring Harbor Laboratory.
- Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets, arXiv.
- Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification, arXiv.
- Bio-SIEVE: Exploring Instruction Tuning Large Language Models for Systematic Review Automation, arXiv.
- Finding Already Debunked Narratives via Multistage Retrieval: Enabling Cross-Lingual, Cross-Dataset and Zero-Shot Learning, arXiv.
- Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science, arXiv.
- A Large-Scale Comparative Study of Accurate COVID-19 Information versus Misinformation, arXiv.
- Examining Temporalities on Stance Detection towards COVID-19 Vaccination, arXiv.
- SheffieldVeraAI at SemEval-2023 Task 3: Mono and multilingual approaches for news genre, topic and persuasion technique classification, arXiv.
- VaxxHesitancy: A Dataset for Studying Hesitancy towards COVID-19 Vaccination on Twitter, arXiv.
- Comparative Analysis of Engagement, Themes, and Causality of Ukraine-Related Debunks and Disinformation, arXiv.
- Classifying COVID-19 vaccine narratives, arXiv.
- Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of the COVID-19 Infodemic, Research Square Platform LLC.
- Categorising Fine-to-Coarse Grained Misinformation: An Empirical Study of COVID-19 Infodemic, arXiv.
- Bio-YODIE: A Named Entity Linking System for Biomedical Text, arXiv.
- A Deep Neural Network Sentence Level Classification Method with Context Information, arXiv.
Grants
ASIMOV: AI-as-a-service, Innovate UK, 01/2024 - 03/2025, £142,691, as PI.