publications
More to come :)
2026
- Position: Scale is a False Promise for Endangered Languages. Ivory Yang and Soroush Vosoughi. ICML (main conference), 2026.
As endangered languages disappear, Machine Learning (ML) increasingly frames their revitalization as a problem of scale, emphasizing more data, larger models, and broader coverage. We posit that scale is not the limiting constraint in endangered language revitalization, and that progress lies in methodological and evaluative reorientation. Evidence from Language Identification (LID), Optical Character Recognition (OCR), and synthetic data generation shows that benchmark-driven scaling produces brittle or culturally misaligned outcomes, as evaluation and modeling lack epistemic fit. Advancement in this domain lies in rethinking methodology, by grounding evaluation in cultural fidelity, community trust, and situated use rather than abstract accuracy. The revitalization of endangered languages is not about the universality of success, but the specificity of care afforded to each language and community.
- KinyaProp: Fine-Grained Propaganda Annotation in Kinyarwanda. Fabrice Manzi Niyigaba, Ivory Yang, and Soroush Vosoughi. ACL (main conference), 2026.
Propaganda is a widely used approach for shaping public opinion and disseminating misinformation in news media. While it has recently gained significant attention within the NLP community, research on fine-grained propaganda detection remains heavily concentrated in high-resource languages. To bridge this gap, we introduce KinyaProp, the first fine-grained propaganda dataset of its kind for Kinyarwanda and, to our knowledge, the first such resource created for a Bantu language. Using this dataset, we evaluate whether state-of-the-art LLMs can function as reliable annotators in a genuinely low-resource and culturally grounded setting. Our results show that current multilingual LLMs do not reliably approximate human annotation behavior. Instead, they behave as conservative annotators whose performance is largely limited to lexically explicit cues, substantially under-identifying propaganda and exhibiting extremely low and unstable performance on discourse-level techniques. Our findings highlight an important limitation of recent successes in LLM-based annotation reported for high-resource languages, demonstrating that such results do not readily transfer to low-resource settings, where scalable annotation would be most valuable. We release KinyaProp to support future research on fine-grained propaganda detection and to enable more robust evaluation of multilingual models in underrepresented languages.
- Detecting and Enhancing Intellectual Humility in Online Political Discourse. Samantha D’Alonzo, Rachel Chen, Weidong Zhang, Melody Yu, and 6 more authors. ICWSM, 2026.
tbd
2025
- Recontextualizing Revitalization: A Mixed Media Approach to Reviving the Nüshu Language. Ivory Yang, Xiaobo Guo, Yuxin Wang, Hefan Zhang, and 3 more authors. EMNLP (main conference), 2025.
Nüshu is an endangered language from Jiangyong County, China, and the world’s only known writing system created and used exclusively by women. Recent Natural Language Processing (NLP) work has digitized small Nüshu-Chinese corpora, but the script remains computationally inaccessible due to its handwritten, mixed-media form and dearth of multimodal resources. We address this gap with two novel datasets: NüshuVision, an image corpus of 500 rendered sentences in traditional vertical, right-to-left orthography, and NüshuStrokes, the first sequential handwriting recordings of all 397 Unicode Nüshu characters by an expert calligrapher. Evaluating five state-of-the-art Chinese Optical Character Recognition (OCR) systems on NüshuVision shows that all fail entirely, each yielding a Character Error Rate (CER) of 1.0. Fine-tuning Microsoft’s TrOCR on NüshuVision lowers CER to 0.67, a modest yet meaningful improvement. These contributions establish the first multimodal foundation for Nüshu revitalization and offer a culturally grounded framework for language preservation.
- Visibility as Survival: Generalizing NLP for Native Alaskan Language Identification. Ivory Yang, Chunhui Zhang, Yuxin Wang, Zhongyu Ouyang, and 1 more author. ACL (findings), 2025.
Indigenous languages remain largely invisible in commercial language identification (LID) systems, a stark reality exemplified by Google Translate’s LangID tool, which supports over 100 languages but excludes all 150 Indigenous languages of North America. This technological marginalization is particularly acute for Alaska’s 20 Native languages, all of which face endangerment despite their rich linguistic heritage. We present GenAlaskan, a framework demonstrating how both large language models and specialized classifiers can effectively identify these languages with minimal data. Working closely with Native Alaskan community members, we create Akutaq-2k, a carefully curated dataset of 2000 sentences spanning all 20 languages, named after the traditional Yup’ik dessert, symbolizing the blending of diverse elements. We design few-shot prompting on proprietary and open-source LLMs, achieving nearly perfect accuracy with just 40 examples per language. While initial zero-shot attempts show limited success, our systematic attention-head pruning reveals critical architectural components for accurate language differentiation, providing insights into model decision-making for low-resource languages. Our results challenge the notion that effective Indigenous language identification requires massive resources or corporate infrastructure, demonstrating that targeted technological interventions can drive meaningful progress in preserving endangered languages in the digital age.
- Is it Navajo? Accurate Language Detection in Endangered Athabaskan Languages. Ivory Yang, Weicheng Ma, Chunhui Zhang, and Soroush Vosoughi. NAACL (main conference, oral), 2025.
Endangered languages, such as Navajo—the most widely spoken Native American language—are significantly underrepresented in contemporary language technologies, exacerbating the challenges of their preservation and revitalization. This study evaluates Google’s Language Identification (LangID) tool, which does not currently support any Native American languages. To address this, we introduce a random forest classifier trained on Navajo and twenty languages erroneously suggested by LangID. Despite its simplicity, the classifier achieves near-perfect accuracy (97-100%). Additionally, the model demonstrates robustness across other Athabaskan languages—a family of Native American languages spoken primarily in Alaska, the Pacific Northwest, and parts of the Southwestern United States—suggesting its potential for broader application. Our findings underscore the pressing need for NLP systems that prioritize linguistic diversity and adaptability over centralized, one-size-fits-all solutions, especially in supporting underrepresented languages in a multicultural world. This work directly contributes to ongoing efforts to address cultural biases in language models and advocates for the development of culturally localized NLP tools that serve diverse linguistic communities.
- NüshuRescue: Revitalization of the Endangered Nüshu Language with AI. Ivory Yang, Weicheng Ma, and Soroush Vosoughi. COLING (main conference, oral), 2025.
The preservation and revitalization of endangered and extinct languages is a meaningful endeavor, conserving cultural heritage while enriching fields like linguistics and anthropology. However, these languages are typically low-resource, making their reconstruction labor-intensive and costly. This challenge is exemplified by Nüshu, a rare script historically used by Yao women in China for self-expression within a patriarchal society. To address this challenge, we introduce NüshuRescue, an AI-driven framework designed to train large language models (LLMs) on endangered languages with minimal data. NüshuRescue automates evaluation and expands target corpora to accelerate linguistic revitalization. As a foundational component, we developed NCGold, a 500-sentence Nüshu-Chinese parallel corpus, the first publicly available dataset of its kind. Leveraging GPT-4-Turbo, with no prior exposure to Nüshu and only 35 short examples from NCGold, NüshuRescue achieved 48.69% translation accuracy on 50 withheld sentences and generated NCSilver, a set of 98 newly translated modern Chinese sentences of varying lengths. A sample of both NCGold and NCSilver is included in the Supplementary Materials. Additionally, we developed FastText-based and Seq2Seq models to further support research on Nüshu. NüshuRescue provides a versatile and scalable tool for the revitalization of endangered languages, minimizing the need for extensive human input.
- Communication is All You Need: Persuasion Dataset Construction via Multi-LLM Communication. Weicheng Ma, Hefan Zhang, Ivory Yang, Shiyu Ji, and 7 more authors. NAACL (main conference, oral), 2025.
Large Language Models (LLMs) have shown proficiency in generating persuasive dialogue, yet concerns about the fluency and sophistication of their outputs persist. This paper presents a multi-LLM communication framework designed to enhance the generation of persuasive data automatically. This framework facilitates the efficient production of high-quality, diverse linguistic content with minimal human oversight. Through extensive evaluations, we demonstrate that the generated data excels in naturalness, linguistic diversity, and the strategic use of persuasion, even in complex scenarios involving social taboos. The framework also proves adept at generalizing across novel contexts. Our results highlight the framework’s potential to significantly advance research in both computational and social science domains concerning persuasive communication.
- Enhancing LLM-Based Persuasion Simulations with Cultural and Speaker-Specific Information. Weicheng Ma, Hefan Zhang, Shiyu Ji, Farnoosh Hashemi, and 7 more authors. EMNLP (findings), 2025.
Large language models (LLMs) have been used to synthesize persuasive dialogues for studying persuasive behavior. However, existing approaches often suffer from issues such as stance oscillation and low informativeness. To address these challenges, we propose reinforced instructional prompting, a method that ensures speaker characteristics consistently guide all stages of dialogue generation. We further introduce multilingual prompting, which aligns language use with speakers’ native languages to better capture cultural nuances. Our experiments involving speakers from eight countries show that continually reinforcing speaker profiles and cultural context improves argument diversity, enhances informativeness, and stabilizes speaker stances. Moreover, our analysis of inter-group versus intra-group persuasion reveals that speakers engaging within their own cultural groups employ more varied persuasive strategies than in cross-cultural interactions. These findings underscore the importance of speaker and cultural awareness in LLM-based persuasion modeling and suggest new directions for developing more personalized, ethically grounded, and culturally adaptive LLM-generated dialogues.
2024
- MentalManip: A Dataset For Fine-grained Analysis of Mental Manipulation in Conversations. Yuxin Wang, Ivory Yang, Saeed Hassanpour, and Soroush Vosoughi. ACL (main conference, oral), 2024.
Mental manipulation, a significant form of abuse in interpersonal conversations, presents a challenge to identify due to its context-dependent and often subtle nature. The detection of manipulative language is essential for protecting potential victims, yet the field of Natural Language Processing (NLP) currently faces a scarcity of resources and research on this topic. Our study addresses this gap by introducing a new dataset, named MentalManip, which consists of 4,000 annotated movie dialogues. This dataset enables a comprehensive analysis of mental manipulation, pinpointing both the techniques utilized for manipulation and the vulnerabilities targeted in victims. Our research further explores the effectiveness of leading-edge models in recognizing manipulative dialogue and its components through a series of experiments with various configurations. The results demonstrate that these models inadequately identify and categorize manipulative content. Attempts to improve their performance by fine-tuning with existing datasets on mental health and toxicity have not overcome these limitations. We anticipate that MentalManip will stimulate further research, leading to progress in both understanding and mitigating the impact of mental manipulation in conversations.
- Enhanced Detection of Conversational Mental Manipulation Through Advanced Prompting Techniques. Ivory Yang, Xiaobo Guo, Sean Xie, and Soroush Vosoughi. EMNLP (WiNLP), 2024.
This study presents a comprehensive, long-term project to explore the effectiveness of various prompting techniques in detecting dialogical mental manipulation. We implement Chain-of-Thought prompting with Zero-Shot and Few-Shot settings on a binary mental manipulation detection task, building upon existing work conducted with Zero-Shot and Few-Shot prompting. Our primary objective is to decipher why certain prompting techniques display superior performance, so as to craft a novel framework tailored for detection of mental manipulation. Preliminary findings suggest that advanced prompting techniques may not be suitable for more complex models if they are not trained through example-based learning.