diff --git a/data/xml/2023.ccl.xml b/data/xml/2023.ccl.xml
index 7ae56b94d6..9adc63fd61 100644
--- a/data/xml/2023.ccl.xml
+++ b/data/xml/2023.ccl.xml
@@ -660,7 +660,7 @@
QianHongjin
DouZhicheng
583–599
- “Conditional question answering (CQA) is an important task in natural language processing thatinvolves answering questions that depend on specific conditions. CQA is crucial for domainsthat require the provision of personalized advice or making context-dependent analyses, such aslegal consulting and medical diagnosis. However, existing CQA models struggle with generatingmultiple conditional answers due to two main challenges: (1) the lack of supervised training datawith diverse conditions and corresponding answers, and (2) the difficulty to output in a complexformat that involves multiple conditions and answers. To address the challenge of limited super-vision, we propose LSD (Learning on Structured Documents), a self-supervised learning methodon structured documents for CQA. LSD involves a conditional problem generation method anda contrastive learning objective. The model is trained with LSD on massive unlabeled structureddocuments and is fine-tuned on labeled CQA dataset afterwards. To overcome the limitation ofoutputting answers with complex formats in CQA, we propose a pipeline that enables the gen-eration of multiple answers and conditions. Experimental results on the ConditionalQA datasetdemonstrate that LSD outperforms previous CQA models in terms of accuracy both in providinganswers and conditions.”
+ “Conditional question answering (CQA) is an important task in natural language processing that involves answering questions that depend on specific conditions. CQA is crucial for domains that require the provision of personalized advice or making context-dependent analyses, such as legal consulting and medical diagnosis. However, existing CQA models struggle with generating multiple conditional answers due to two main challenges: (1) the lack of supervised training data with diverse conditions and corresponding answers, and (2) the difficulty to output in a complex format that involves multiple conditions and answers. To address the challenge of limited supervision, we propose LSD (Learning on Structured Documents), a self-supervised learning method on structured documents for CQA. LSD involves a conditional problem generation method and a contrastive learning objective. The model is trained with LSD on massive unlabeled structured documents and is fine-tuned on labeled CQA dataset afterwards. To overcome the limitation of outputting answers with complex formats in CQA, we propose a pipeline that enables the generation of multiple answers and conditions. Experimental results on the ConditionalQA dataset demonstrate that LSD outperforms previous CQA models in terms of accuracy both in providing answers and conditions.”
2023.ccl-1.51
eng
zihan-etal-2023-learning
diff --git a/data/xml/2024.ccl.xml b/data/xml/2024.ccl.xml
new file mode 100644
index 0000000000..2aa2653631
--- /dev/null
+++ b/data/xml/2024.ccl.xml
@@ -0,0 +1,2077 @@
+
+
+
+
+ Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
+ MaosongSun
+ JiyeLiang
+ XianpeiHan
+ ZhiyuanLiu
+ YulanHe
+ Chinese Information Processing Society of China
+ Taiyuan, China
+ July
+ 2024
+ 2024.ccl-1
+ ccl
+
+
+ MITF:基于图像映射文本特征的跨模态图文检索方法(MITF:Cross-modal Image-text Retrieval Method with Mapping Images to Text Features)
+ LouXinyue馨月娄
+ LiYou铀李
+ QiRui睿齐
+ ChenYufeng钰枫陈
+ XuJinan金安徐
+ 1–14
+ “减小图文信息间的语义鸿沟,促进跨模态信息的对齐与融合一直是解决跨模态图文检索问题的关键。但现有的双流模型因为训练时图像编码器与文本编码器是分开的,导致图文特征的对齐与融合较难。因此,本文提出图像映射文本特征(MITF)网络将不同模态(图像和文本)的信息映射到单一模态(文本),进一步增强跨模态语义的融合和对齐,提高图文检索的性能。具体地,在冻结预训练的中文视觉语言模型Chinese-CLIP参数的情况下,训练一个MITF网络将图像映射为伪语言标记,在此基础上引入提示词自动学习机制提升模型对于伪语言标记的理解能力。同时,在检索时构建Faiss索引提高检索速度。在三个开源数据集的实验结果表明所提方法相比原始Chinese-CLIP模型检索时的Mean Recall指标平均提高了3.7%,检索速度提高了约4倍。同时,图文特征可视化结果进一步表明所提方法提高了图像特征与文本特征的对齐程度。”
+ 2024.ccl-1.1
+ zho
+ xinyue-etal-2024-mitf
+
+
+ 基于ChatGPT查询改写的文档检索方法(Document Retrieval Method Based on ChatGPT Query Rewriting)
+ LiAo澳李
+ TuXinhui新辉涂
+ XiongYinghao英豪熊
+ 15–26
+ “查询改写是一种通过优化查询从而提高检索结果质量的技术。传统的基于伪相关反馈的方法受限于伪相关文档的质量。本文提出了一种基于ChatGPT查询改写的文档检索方法。这种方法不依赖伪相关文档,可以避免伪相关文档质量不高的问题。首先,利用BM25模型进行检索,获得初次检索结果集;同时借助ChatGPT生成新查询;然后分别将原始查询和新查询作为输入,利用重排模型对初次检索结果集进行重排,得到各自的文档相关性得分;最后,将两个查询的文档相关性得分进行融合,得到最终的文档得分。在多个检索测试集上的实验结果表明,相比于基准模型,基于ChatGPT查询改写的文档检索方法在nDCG@10指标上平均提升了约4.5个百分点。”
+ 2024.ccl-1.2
+ zho
+ ao-etal-2024-ji
+
+
+ 基于汉语字词资源的检索增强生成与应用评估(Chinese Character- and Word-Based Retrieval Augmented Generation and Application)
+ YinYaqi雅琦殷
+ LiuYang扬刘
+ WangYue悦王
+ LiangQiliang启亮梁
+ 27–45
+ “汉语遵循“由字组词,由词造句”的原则,字词相关信息是一类基础且关键的计算资源。在大语言模型时代,挖掘并评价该类资源的效用是增强模型语言能力的一个重要研究方面。作为有效促进资源与模型结合的一种方式,检索增强生成目前在该类资源上的应用大都关注模型未学习过的濒危语言,其在模型已学习过语言上的潜在价值有待挖掘。本文基于语言学的视角,构建具有良好例句覆盖率与丰富度的字词资源,并借助检索增强生成技术路线,探索这类资源与不同任务、模型的结合方法。评估实验表明,该方法在所有实验模型与任务中均带来了显著的准确率提升,平均达4.78%,其中,在语素义消歧、词义消歧与隐喻识别任务中分别提升了6.91%、4.24%和3.19%,这展示出字词资源对模型的语言准确理解能力的潜在价值。这些资源构造、方法探索和应用评估,为语言学资源与大语言模型的结合提供了新的思路与方法。”
+ 2024.ccl-1.3
+ zho
+ yaqi-etal-2024-ji
+
+
+ 面向CQL的语料库检索引擎的高效实现(Efficient Implementation of a CQL-oriented Corpus Retrieval Engine)
+ LiuTingchao廷超刘
+ LuLuming鹿鸣鲁
+ YangLiner麟儿杨
+ WangYu雨王
+ 46–56
+ “语料库检索工具在语言学研究领域具有举足轻重的地位,对于高效获取信息至关重要。然而,当前国内语料库检索工具在语料库检索语言上缺乏统一标准,尤其支持语料库查询语言(CQL)的中文语料库检索工具相对稀缺。在使用不同分词粒度的语料库工具进行中文语料库检索时,会遇到噪声或数据召回难问题。为应对这些挑战,我们研发了支持多粒度分词的CQL 解析器系统CAMELS:一款支持CQL 语句检索,且兼容多粒度分词,支持非词典词检索的语料库检索引擎。经过多种分词器的测试,该引擎展现出了优异的召回率,并在性能上超越了BlackLab的检索速度,为语言学工作者提供了更加易用、精准的检索工具。”
+ 2024.ccl-1.4
+ zho
+ tingchao-etal-2024-mian
+
+
+ NNP-TDGM: 基于最近邻提示表征的术语DEF生成模型(NNP-TDGM: Nearest Neighbor Prompt Term DEF Generation Model)
+ ShenSijia思嘉沈
+ WangPeiyan裴岩王
+ WangShengren胜任王
+ WangLibang立帮王
+ 57–70
+ “该文研究基于HowNet的知识库描述语言语法体系的术语DEF自动生成问题,提出基于最近邻提示表征的术语DEF生成模型(NNP-TDGM),将训练集中的术语DEF构造为外显记忆集,在解码器生成(首)义原或关系时,检索与待预测术语概念结构相同或相近的术语所蕴含的核心概念,重要属性和关系类型,辅助模型完成DEF的生成,解决解码器在低频样本上训练不充分的问题。另外,通过提示预训练语言模型获得术语及术语定义内蕴涵概念信息的语义表征向量,改善编码器表征能力不足的问题。经实验验证NNP-TDGM模型生成术语DEF的义原-关系-义原三元组F1值达到31.84%、关系F1值达到53.12%、义原F1值达到51.55%、首义原F1值达到68.53%,相对于基线方法分别提升了3.38%,1.45%,1.08%,0.48%。”
+ 2024.ccl-1.5
+ zho
+ sijia-etal-2024-nnp
+
+
+ SpanCS:面向跨语言代码生成的片段级语码转换(SpanCS: Span-Level Code-Switching for Cross-Lingual Code Generation)
+ ZhuQingfu庆福朱
+ ZhouShiqi士祺周
+ WangShuo硕王
+ ZhangZhiming致铭张
+ WangHaoyu昊钰王
+ ChenQiguang麒光陈
+ CheWanxiang万翔车
+ 71–83
+ “跨语言代码生成旨在将英语到代码的生成能力迁移至其他自然语言。翻译-训练(Translate-Train)和语码转换(Code-Switching)是实现跨语言迁移的两类经典数据增广方法,两者优势互补但尚未有效结合。为此,本文提出了一种面向跨语言代码生成的片段级语码转换(SpanCS)方法。首先,该方法利用语码转换框架关联源语言上下文与目标语言片段,以促进多种语言的交互和对齐。其次,该方法利用翻译-训练方法从完整的源语言翻译中提取目标语言片段,以保证增广数据与原始数据间的语义一致性。为了公平地评价多种自然语言之间代码生成的性能差异,本文通过人工翻译与校验,基于HumanEval构建了包含10种自然语言的多语言代码生成评测基准MHumanEval。该基准上的三个主干模型的实验结果表明,SpanCS在跨语言代码生成任务上一致优于前人的数据增广方法。”
+ 2024.ccl-1.6
+ zho
+ qingfu-etal-2024-spancs
+
+
+ 场景图增强的视觉语言常识推理生成(Scene Graph Enhanced Visual Language Commonsense Reasoning Generation)
+ YuanFan凡袁
+ LiPiji丕绩李
+ 84–97
+ “视觉语言常识推理是一类旨在理解视觉场景的任务,常用于评估人工智能系统的多模态常识推理能力。然而,可靠的常识推理需要细致的场景理解,而现有的基于预训练模型微调的方法却无法有效地利用具体场景中存在的物体关系信息,因此其推理的合理性存在较大的局限性。为解决上述问题,本研究提出了一种场景图增强的视觉语言常识推理生成框架SGEVL。该框架首先使用图像补丁序列提供视觉信息,并通过一种包含注意力模块的门控机制,赋予大型语言模型理解视觉信息的能力。基于该框架的视觉语言能力,进一步提出了一种无位置信息的场景图生成方法。生成的场景图能够显著提升模型对场景信息的理解,从而引导生成高质量的回答和推理。通过在VCR,VQA-X和e-SNLI-VE数据集上分别实验,实验结果表明本文提出的视觉语言常识推理框架性能优于基线模型。此外,通过消融实验和结果可视化,进一步证明了该框架中每个模块的有效性。”
+ 2024.ccl-1.7
+ zho
+ fan-piji-2024-chang
+
+
+ 基于逻辑推理和多任务融合的认知刺激对话生成方法(Cognitive stimulation dialogue generation method based on logical reasoning and multi-task integration)
+ JiangYuru玉茹蒋
+ LiMengyuan梦媛李
+ TaoYuyang宇阳陶
+ QuKeming可明区
+ SheZepeng泽鹏佘
+ ShiShuicai水才施
+ 98–109
+ “在全球老龄化背景下,带有认知刺激的对话系统是保持老年人认知健康的重要手段。中文认知刺激对话数据集(Chinese Cognitive Stimulation Conversation Dataset,CSConv)和模型构建的研究工作刚刚开始。本文将认知刺激对话生成视为一个多任务融合的逻辑思维推理过程,将情感分类任务、决策任务和对话回复生成任务间的逻辑关系,建模为一个推理过程,来引导大语言模型生成。针对决策任务,本文提出分层编码器结构的决策模型。决策实验结果表明,决策模型有效的提高了决策任务的准确率。针对多任务过程,本文提出多任务融合方法,将三个任务对应的模型结合在一起。生成实验结果表明,分类、决策及生成的多任务融合方法,显著提升了对话回复能力,证明了该方法的有效性和先进性。”
+ 2024.ccl-1.8
+ zho
+ yuru-etal-2024-ji
+
+
+ 基于思维链的跨语言多文档摘要生成技术研究(Cross-lingual Multi-document Summarization Based on Chain-of-Thought)
+ QiTian天祁
+ YangJianan建安杨
+ ZhaoTiejun铁军赵
+ YangMuyun沐昀杨
+ 110–133
+ “随着全球化的加速发展,跨语言信息的高效传递与理解变得尤为重要。传统的多文档摘要生成技术可以提升信息获取效率,然而往往忽视了跨语言场景下的特殊挑战。为了缓解这一问题,本文提出了跨语言多文档摘要生成任务。我们首先构建了一个全面的跨语言多文档摘要测试集作为评估基准,其次提出了一种基于思维链技术的跨语言多文档摘要生成方法,并对其进行了实验验证。在实验中,我们使用了几种典型的大语言模型,并通过人工评估和自动评估来验证我们的方法。结果表明,我们提出的基于思维链的方法在跨语言多文档摘要生成任务上取得了显著的性能提升,为解决语言障碍下的信息获取问题提供了有效的解决方案。”
+ 2024.ccl-1.9
+ zho
+ tian-etal-2024-ji
+
+
+ 面向语言学习者的跨语言反馈评语生成方法(Cross-Lingual Feedback Comment Generation for Language Learners)
+ AnJiyuan纪元安
+ ZhuLin琳朱
+ YangErhong尔弘杨
+ 134–149
+ “反馈评语生成任务旨在为语言学习者的产出提供纠偏及解释性的评价,促进学习者写作能力的发展。现有研究主要聚焦于单语的反馈评语生成,如为英语学习者提供英文反馈评语,但这忽略了非母语学习者可能面临的理解障碍问题,尤其当评语中存在陌生的语言知识时。因此,本文提出跨语言反馈评语生成任务(CLFCG),目的是为语言学习者生成母语的反馈评语。本研究构建了首个英-中跨语言反馈评语生成数据集,该数据集包含英语学习者产出的语句与相应的中文反馈评语,并探索了基于流水线的预训练语言模型引导增强生成方法,将修正编辑、线索词语和语法术语等作为输入的附加信息,引导和提示生成模型。实验结果表明,附加引导信息的预训练语言模型流水线方法在自动评估(BLEU:50.32)与人工评估(Precision:62.84)上表现良好。本文对实验结果进行了深入分析,以期为跨语言反馈评语生成任务提供更多见解。”
+ 2024.ccl-1.10
+ zho
+ jiyuan-etal-2024-mian
+
+
+ 文本样式和主题框架引导下的大模型辅助儿童新闻生成(Text Styles and Thematic Framework Guided Large Modeling to Aid Children’s News Generation)
+ DuXiaomeng晓蒙杜
+ YuDong东于
+ LiuPengyuan鹏远刘
+ 150–170
+ “主流新闻内容多针对成年人设计,不易于儿童理解,难以满足其阅读需求。对此,我们提出了一种基于主题的儿童新闻篇章结构框架(TNC-LLM)。该框架融合了文本样式定义(TSD)和主题类别定义(TCD)两大核心模块,TSD模块采用多种机器学习算法,从不同粒度分析文本样式风格和段落布局等特点,TCD模块针对不同主题进行了内容分析,以揭示儿童新闻的写作特点和内容的倾向性,确保内容的教育性和适宜性。本文实验主要评估了ChatGPT3.5等四个模型在将成年人新闻转换为面向儿童的新闻的性能。实验结果表明,TNC-LLM在儿童新闻内容生成任务中对内容的准确性、文本的趣味性以及教育性等关键维度有显著提升。此外,该框架具有普适性,能够应用于不同类型的大型语言模型。”
+ 2024.ccl-1.11
+ zho
+ xiaomeng-etal-2024-wen
+
+
+ 基于对比学习和排名一致性的古代汉语翻译质量评估模型(Ancient Chinese translation quality evaluation model based on contrastive learning and ranking consistency)
+ LiHuaiming怀明李
+ ShaoYanqiu艳秋邵
+ LiWei炜李
+ 171–182
+ “当前,虽然机器翻译的自动评估技术已展现出良好的性能,但将它们应用于古代汉语到现代汉语的翻译场景时效果并不理想。一方面,这些传统方法能较好地比较质量差异较大的译文的好坏,但是在评估质量相差不大的译文时往往难以区分优劣。另一方面,古代汉语的省略和复杂句式常导致翻译过程中出现漏译现象,而传统评估指标往往会给这类较差的译文偏高的分数。在本文中,我们提出了一种基于对比学习和排名一致性的古代汉语到现代汉语的翻译质量评估模型(CRATE)。该模型通过确保语义相似度和匹配度的排名一致性捕捉译文质量的细粒度排名信息。另外,我们在使用对比学习方法训练译文跟原文的匹配模型时,将原文自身作为负样本,有效解决了传统评估指标在译文出现漏译情况下仍给出高评分的问题。为了证明我们模型的有效性,我们构建了高质量的古代汉语到现代汉语翻译的人工评分测试集。实验结果表明,我们的模型优于强大的基线,与人类评分取得了更显著的相关性。”
+ 2024.ccl-1.12
+ zho
+ huaiming-etal-2024-ji
+
+
+ 基于两种新颖辅助任务的端到端语音翻译(End-to-End Speech Translation Enhanced by Two Novel Auxiliary Tasks)
+ DouHuaixia怀厦窦
+ LvuMengzhe孟哲吕
+ LiJunhui军辉李
+ 183–196
+ “端到端语音翻译具有跨模态和跨语言的特性,如何有效地利用这些特性是一个具有挑战性的问题。本文基于多任务学习框架,提出两种新颖辅助任务。语音增强的文本翻译任务通过在文本翻译任务中融入语音模态信息来缓解语音和文本的模态差异,最终提升语音翻译任务的性能。全局感知条件掩码语言建模任务能够同时建模转录文本和译文进而利用文本的全局上下文信息指导翻译模型的训练。在MuST-C数据集8个语向的实验结果表明,本文的方法显著优于基线系统,并且达到了与其它端到端语音翻译系统可竞争的性能水平。进一步的分析实验表明,本文的方法能够缓解语音和文本之间的模态差异并且在不损害文本翻译任务性能的情况下提升语音翻译任务的性能。”
+ 2024.ccl-1.13
+ zho
+ huaixia-etal-2024-ji
+
+
+ 基于隐性句逗号识别的汉语长句机器翻译(Machine translation of Chinese long sentences based on recognition of implicit period and comma)
+ ZhangWenjuan文娟张
+ LiManjia熳佳李
+ FengWenhe文贺冯
+ 197–205
+ “长句翻译一直是机器翻译的难题。本文根据汉语中相当数量的逗号(句内标点)和句号(句间标点)可相互转化的特点,提出“隐性句号”(可转化为句号的逗号)和“隐性逗号”(可转化为逗号的句号)概念,并实现其自动识别,以将汉语长句变为短句用于汉英机器翻译。为此,首先通过人工与半监督学习结合方法构建了一个隐性句逗数据集,实现了基于预训练模型的隐性句逗识别方法,其中性能最好的HierarchicalBERT作为后续应用模型。进而,实现了基于隐性句逗识别的汉英机器翻译方法。在WMT2018(新闻)和WMT2023(文学)测试语料上基于预训练机器翻译模型的实验表明,对于汉语长句的英译,本文方法相比基准翻译的BLEU值整体有所提高,而且在相对稳健机器翻译模型上,呈现为句子越长本文方法效果越明显。”
+ 2024.ccl-1.14
+ zho
+ wenjuan-etal-2024-ji
+
+
+ 基于知识蒸馏的低频词翻译优化策略(Knowledge Distillation-Based Optimization Strategy for Low-Frequency Word Translation in Neural Machine Translation)
+ GuoYifan逸帆郭
+ ZanHongying红英昝
+ YanZiyue子悦阎
+ XuHongfei鸿飞许
+ 206–216
+ “神经机器翻译通常需要大量的平行语料库才能达到良好的翻译效果。而在不同的平行语料库中,均存在词频分布不平衡的问题,这可能导致模型在学习过程中表现出不同的偏差。这些模型倾向于学习高频词汇,而忽略了低频词汇所携带的关键语义信息。这些被忽略的低频词汇也包含重要的翻译信息,可能会对翻译质量产生不利影响。目前的方法通常是训练一个双语模型,然后根据频率为词汇分配不同的权重,通过增加低频词的权重来提高低频词的翻译效果。在本文中,我们的目标是提高那些有意义但频率相对较低的词汇的翻译效果。本文提出使用知识蒸馏的方法来提高低频词的翻译效果,训练在低频词上翻译效果更好的模型,将其作为教师模型指导学生模型学习低频词翻译。进而提出一个更加稳定的双教师蒸馏模型,进一步保证高频词的性能,使得模型在多个任务上均获得了稳定的提升。本文的单教师蒸馏模型在英语→德语任务上相较于SOTA进一步取得了0.64的BLEU提升,双教师蒸馏模型在汉语→英语任务上相较于SOTA进一步取得了0.31的BLEU提升,在英语→德语、英语→捷克语和英语→法语的翻译任务上相较于基线低频词翻译效果,在保证高频词翻译效果不变化的前提下,分别取得了1.24、0.47、0.87的BLEU提升。”
+ 2024.ccl-1.15
+ zho
+ yifan-etal-2024-ji
+
+
+ 融合确定性因子及区域密度的k-最近邻机器翻译方法(A k-Nearest-Neighbor Machine Translation Method Combining Certainty Factor and Region Density)
+ QiRui睿齐
+ ShiXiangyu响宇石
+ ManZhibo志博满
+ XuJinan金安徐
+ ChenYufeng钰枫陈
+ 217–229
+ “k-最近邻机器翻译(kNN-MT)是近年来神经机器翻译领域的一个重要研究方向。此类方法可以在不更新机器翻译模型的情况下提高翻译质量,但训练数据中高低频单词的数量不均衡限制了模型效果,且固定的k值无法对处于不同密度分布的数据都产生良好的翻译结果。为此本文提出了一种创新的kNN-MT方法,引入确定性因子(CF)来降低数据不均衡对模型效果的影响,并根据测试点周边数据密度动态选择k值。在多领域德-英翻译数据集上,相比基线实验,本方法在四个领域上翻译效果均有提升,其中三个领域上提升超过1个BLEU,有效提高了神经机器翻译模型的翻译质量。”
+ 2024.ccl-1.16
+ zho
+ rui-etal-2024-rong
+
+
+ Ko-LLaMA:基于LLaMA的朝鲜语大语言模型(Ko-LLaMA: A Korean Large Language Model Based on LLaMA)
+ PangJie杰庞
+ YanXiaodong晓东闫
+ ZhaoXiaobing小兵赵
+ 230–241
+ “大语言模型在这两年受到了非常广泛的关注,像ChatGPT和GPT-4这样的大型语言模型(LLMs)极大地改变了自然语言处理研究,并在通向人工通用智能(AGI)的道路上迈出了令人兴奋的步伐。尽管已经开源了LLaMA等几个大型语言模型,但这些模型主要关注英文和中文语料库,对其他语言的适用性有限。而对于少数民族语言如朝鲜语来说,大语言模型的适用性更加有限。在本文中,我们通过扩展LLaMA现有的词表,增加了额外的20000个朝鲜语Token,从而提高了其对朝鲜语的编码和语义理解的能力;并且进一步使用朝鲜语数据进行继续预训练,使用朝鲜语指令微调数据集对模型进行SFT(Supervised Fine-Tuning),并分析了不同数据量对指令精调效果的影响,经过继续预训练和指令微调后的模型显著提高了理解和遵循朝鲜语指令的能力。通过上述训练,极大增强了LLaMA的理解和生成朝鲜语文本的能力,并增强了其遵循指令的能力。实验结果表明,新提出的模型Ko-LLaMA显著提高了原版LLaMA在理解和生成朝鲜语内容方面的能力。此外,在朝鲜语文本分类数据集YNAT上对Ko-LLaMA与擅长少数民族语言的CINO模型及CINO的多种模型组合以及原版LLaMA和GPT3.5进行了效果对比。结果表明,Ko-LLaMA的朝鲜语文本分类能力远超CINO和CINO的组合模型以及LLaMA和GPT3.5等未经过朝鲜语语料进行词表扩充和继续预训练的大语言模型。”
+ 2024.ccl-1.17
+ zho
+ jie-etal-2024-ko
+
+
+ TiComR:基于提示的藏文对话型阅读理解模型(TiComR: A Prompt-based Tibetan Conversational Reading Comprehension Model)
+ PengmaoCairang才让朋毛
+ SunYuan媛孙
+ 242–253
+ “现有的对话型阅读模型在中英文对话型阅读理解任务中表现出色,但由于藏文在语法结构、表达方式等方面同中英文有显著差异,导致这些模型在对藏文对话型阅读理解的对话历史进行建模时存在困难。鉴于此,本文利用当前大模型的优越能力,提出了一种基于提示的对话历史建模方法-TiComR,以解决藏文对话型阅读理解任务中模型性能受限的问题。该方法通过引入基于提示的学习机制,直接在段落文本中添加提示来突显对话历史,而非修改段落标记嵌入,从而在微调过程中实现对对话历史的精确建模,以增强模型对问题的理解能力。实验结果表明,TiComR模型在藏文对话型阅读理解任务上取得了显著的性能提升,并在英文数据集CoQA上也有较好的表现。本文将TiComR开放供研究使用,http://github.com/Tshor/TicomR。”
+ 2024.ccl-1.18
+ zho
+ cairang-yuan-2024-ticomr
+
+
+ TiLamb:基于增量预训练的藏文大语言模型(TiLamb: A Tibetan Large Language Model Based on Incremental Pre-training)
+ ZhuangWenhao文浩庄
+ SunYuan媛孙
+ ZhaoXiaobing小兵赵
+ 254–267
+ “基于“预训练+微调”范式的语言模型展现了卓越的性能,随着模型规模和训练数据量的扩增,其解决多种自然语言处理任务的能力得到了显著的提高。当前的大语言模型主要支持英汉等主流语言,这限制了藏语等低资源语言在该领域的研究。针对藏语数据稀缺、现有藏语预训练模型效果不够好、下游任务可扩展性差等问题,本文汇总清洗得到26.43GB藏文数据,以开源的LLaMA2-7B作为基座模型,扩充LLaMA2现有词表,增加了约30,000个藏文tokens,提高其藏文编码效率和对藏文的语义理解能力,通过增量预训练得到藏文大语言模型基座TiLamb。根据多种藏文下游任务分别制作数千到几万条不等的微调数据集,微调后的TiLamb在藏文新闻分类、藏文实体关系分类、藏文机器阅读理解、藏文分词、藏文摘要、藏文问题回答、藏文问题生成共七个下游任务中进行验证,多项指标结果相较传统方法和其他藏文预训练模型有大幅提升。本文将TiLamb和部分资源开放供研究使用,https://github.com/NLP-Learning/TiLamb。”
+ 2024.ccl-1.19
+ zho
+ wenhao-etal-2024-tilamb
+
+
+ 基于蒙古文文本语义辅助的噪声鲁棒蒙古语语音情感识别方法研究(Research on Noise-Robust Mongolian Speech Emotion Recognition Methods Based on Mongolian Text Semantics)
+ LiuHuan欢刘
+ LiangKailin凯麟梁
+ ZuoHaolin昊麟左
+ LiuRui瑞刘
+ 268–279
+ “噪声环境下语音情感识别(Speech Emotion Recognition,SER)旨在从带有背景噪声的语音信号中挖掘情感特征并自动预测说话人的情感状态。尽管这项技术在英语、汉语等语言方面取得了迅速的进展,但对于像蒙古语这样的小语种,在噪声环境下的语音情感识别研究仍处于起步阶段,缺乏相关数据集和方法的研究。为了推动蒙古语语音情感识别的发展,本研究首先构建了一个单说话人语音情感识别数据集。之后为了实现噪声环境下准确的蒙古语语音情感识别,我们提出了一种基于文本-语音双模态的带噪蒙古语语音情感识别基线模型 MonSER。文本信息为噪声语音信号提供额外的语义信息。具体来说,我们的模型首先对带噪语音信号进行频谱特征提取,之后使用多语种预训练模型 XLMBert 对语音信号对应的蒙古文文本信息进行编码。随后将上述提取的双模态信息进行融合,并输入分类器进行情感类别的预测。我们利用该数据集进行模型训练并测试模型的有效性。实验结果表明,我们的双模态模型在多种噪声环境下的蒙古语语音情感识别准确率明显优于只以语音为输入的单模态语音情感识别系统。同时,为了模拟实际场景中文本可能缺失的情况,我们提出了两种文本 mask 策略,该文本实验也进一步验证了文本语音双模态的有效性。”
+ 2024.ccl-1.20
+ zho
+ huan-etal-2024-ji
+
+
+ 基于神经编解码语言模型的老挝语韵律建模方法(A Method for Lao Prosody Modeling Based on Neural Codec Language Model)
+ YiNingjing宁静易
+ WangLinqin琳钦王
+ GaoShengxiang盛祥高
+ YuZhengtao正涛余
+ 280–289
+ “为了赋予合成语音类似人类语言的丰富韵律和节奏变化,现有方法普遍采用基于随机数的时长预测器。这些方法通过使用随机数初始化的潜在变量来模拟人类说话的多样节奏变化。然而,由于依赖于随机数噪声的局限性,这些方法合成的语音往往仍然缺乏真实语音的多样性和韵律变化的丰富性。与之前方法不同,本文提出了一种基于神经编解码语言模型(VALL-E)的韵律建模方法,本文利用先验速度和音调时序变化曲线建模韵律变化分布,有效融入神经编解码语言模型训练过程中,并且在推理阶段可通过控制先验时序曲线控制生成语音的韵律。实验证明,本文方法合成英语音频达到了4.05的MOS评分,合成老挝语音频达到了3.61的MOS评分。基于神经编解码语言模型的老挝语韵律建模方法,能很好的在速度和音调方面实现韵律的可控性。”
+ 2024.ccl-1.21
+ zho
+ ningjing-etal-2024-ji
+
+
+ 基于通用依存句法的锡伯语句法树库构建研究(A Dependency Treebank for Xibe based on Universal Dependencies)
+ ZhouHe贺周
+ 290–304
+ “我国是一个多民族、多语种的国家,拥有丰富的民族语言资源。然而,使用人口较少、文化影响力较小的语言普遍面临语言濒危的问题,记录和保存这些语言在语言学、民族学与人类学上都具有重要意义。在本研究中,我们以我国仍在活跃使用的满通古斯语——锡伯语为目标语言,从锡伯语语法书、锡伯语报纸《察布查尔报》以及锡伯语《语文》教材中收集了 1200个句子,以此为语料构建了一个包含词汇、形态以及依存句法信息的树库。本文详细描述了树库的构建过程,深入讨论了标注过程中遇到的难以解决的语言现象,并提出了我们的标注策略。通过标注,我们发现,随着汉语和锡伯语的深层接触,锡伯语不仅在词汇上接受了大量的汉语借词,锡伯语句子结构也受到一定程度的影响。基于所标注的锡伯语树库,我们进行了锡伯语自动句法分析实验,探讨了词、词性、字符特征以及中国少数民族语言预训练模型 CINO对句法分析性能产生的影响。”
+ 2024.ccl-1.22
+ zho
+ he-2024-ji
+
+
+ 基于字节对编码的端到端藏语语音识别研究(End-to-End Tibetan Speech Recognition Study Based on Byte Pair Coding)
+ CaiYuqing郁青蔡
+ WangChao超王
+ RenzengDuojie多杰仁增
+ ZhuYulei宇雷朱
+ ZhangJin瑾张
+ NyimaTashi扎西尼玛
+ 305–313
+ “针对藏语端到端语音识别研究中存在的建模单元不统一和识别效果不理想的问题,本文提出了一种BPE-Conformer-CTC/Attention端到端藏语语音识别方法。首先,该方法采用了字节对编码算法进行语音建模,通过反复合并出现频率最高的字符对,将文本分割成易于管理、有意义的单元,平衡建模单元的粒度,从而解决藏语语音识别中建模单元不统一的问题。其次,使用了Conformer编码器,有效地融合了音频序列的全局和局部依赖关系,从而增强了模型的表征能力。最后,通过CTC/Attention联合解码策略,加速了对齐和解码过程,进而提高了识别效果的准确性和效率。在开源数据集XBMU-AMDO31和TIBMD@MUCI上的实验结果表明,所提出的BPE-Conformer-CTC/Attention模型分别取得了9.0%和4.6%的词错误率,相较于基线模型Transformer-CTC/Attention,词错误率分别相对降低了14.2%和30.3%。该研究方法为藏语端到端语音识别任务提供了一种有效的解决方案。”
+ 2024.ccl-1.23
+ zho
+ yuqing-etal-2024-ji
+
+
+ 面向对话式阅读理解的高质量藏语数据集构建(Construction of high-quality Tibetan language dataset for conversational reading comprehension)
+ DawaCairen才仁达哇
+ PengmaoCairang才让朋毛
+ SunYuan媛孙
+ 314–325
+ “对话式阅读理解作为对话式人工智能领域的重要研究方向,旨在使机器能够理解自然语言文本,并能够进行多轮对话以解答与文本相关的问题。随着生成式大模型的发展,该任务也成为评测大模型性能的重要指标之一。在此过程中,高质量数据集的构建成为该领域的关键任务。目前,相关算法模型在许多英语数据集上取得了显著进展,甚至超过了人类表现。然而,对于低资源语言,尤其是缺乏相应数据集的藏语,对话式阅读理解研究尚处于起步阶段。本文采用了一种人工与半自动结合的方法策略,构建了藏语对话式阅读理解数据集TiconvQA(Tibetan Conversational Question Answering)。该数据集共包含了20,358个对话对,涵盖了人物、地理和新闻三个领域。每一轮对话包括对话依据文本以及根据文本生成的多轮连续问答对。本文从对话数据的多样性、相关性、语言现象等方面对TiconvQA进行了详尽的分析与质量评估。并对藏文对话式阅读理解任务中存在影响评价指标的五种因素进行了优化。最终,我们采用了三种经典的对话式阅读理解模型以及藏文大模型TiLamb对数据集进行实验评估,实验结果验证了数据集的质量,并表明TiconvQA可用于模型在对话式阅读理解任务中的性能评测。”
+ 2024.ccl-1.24
+ zho
+ cairen-etal-2024-mian
+
+
+ 面向心理健康咨询的藏语数据集及大语言模型构建(Construction of Tibetan Datasets and Large Language Models for Psychological Health Counseling)
+ ZhuMengxiao孟笑朱
+ ShaJiu九沙
+ FengChong冲冯
+ 326–339
+ “焦虑、抑郁已成为人们常见的心理障碍,适度的疏导对于缓解人们精神、心理压力具有重要意义。然而由于病耻感等原因,很多人得不到及时的疏导和治疗。随着人工智能的发展,大语言模型(LLMs)优越的知识融会贯通能力和思维链能力,使得其成为心理疏导的有效工具。然而,现有少量面向心理健康咨询的大语言模型通常针对英文、中文等资源丰富的语种,而对于低资源语言,LLMs在心理咨询领域的应用尚缺少研究。本文以藏语作为低资源语言的代表,研究藏语心理咨询数据集的构建和藏语心理健康大语言模型的构建方法。首先,通过收集现有高质量的中文心理咨询对话数据,并对数据进行处理,生成心理健康多轮对话数据集;其次,构建汉藏翻译工具将其翻译成藏语多轮对话数据,并结合多种机制对数据进行筛选、过滤生成高质量藏语心理健康多轮对话数据;基于构造的数据,采用现有通用大语言模型Baichuan2和LLaMA2模型进行指令调优训练,形成藏语心理健康大语言模型,并将开源用于科学研究。最后通过实验验证了本文发布的藏语心理健康多轮对话数据集以及藏语心理健康咨询大语言模型的有效性。”
+ 2024.ccl-1.25
+ zho
+ mengxiao-etal-2024-mian
+
+
+ 融合多元特征表示的藏文命名实体识别方法(Research on Tibetan Named Entity Recognition Using Multi-Feature Fusion Representation)
+ EjianCairang才让俄见
+ ZhouMaoke毛克周
+ ChenBo波陈
+ ZhaoXiaobing小兵赵
+ 340–351
+ “本文针对基于音节嵌入方式的藏文命名实体识别(TNER)中词汇信息和音节部件信息忽略的问题,提出了基于交叉Transformer架构的MECT-TL模型,融合了藏文音节信息、词汇信息和音节部件信息的多元数据特征。MECT-TL通过平面网络结构将藏文音节与词汇信息结合,并整合音节部件信息,有效提升了藏文实体识别的准确性。实验结果显示,相较于主流的TNER基准模型BiLSTM-CRF,本文模型在F1值上提高了5.14个百分点,与基于Transformer架构的TENER模型相比提高了4.18个百分点。这表明,融合藏文词汇和音节部件信息的方法可以显著提高TNER任务的性能。”
+ 2024.ccl-1.26
+ zho
+ cairang-etal-2024-rong
+
+
+ PGA-SciRE:基于大语言模型的数据增强框架进行科学领域的关系抽取(PGA-SciRE:Harnessing LLM on Data Augmentation for Enhancing Scientific Relation Extraction)
+ ZhouYang洋周
+ DanShimin世民单
+ WeiHongkui宏夔魏
+ ZhaoZhehuan哲焕赵
+ FengWenshuo文铄冯
+ 352–369
+ “关系提取旨在识别文本中提到的实体对之间的关系。大语言模型的进步对自然语言处理任务产生了巨大的影响。在这项工作中,我们针对科学领域的关系抽取任务,提出一个名为PGA的数据增强框架,用于提升模型在科学领域的关系抽取的性能。框架引入了两种数据增强的方式,利用大语言模型通过转述原训练集样本,得到句意相同但具备不同表述和形式的伪样本。以及指导大语言模型根据原训练集样本的关系和实体标签,生成暗含对应标签信息的句子。这两种伪样本分别与原数据集共同参与关系抽取模型的训练。实验中PGA框架提高了三个主流模型的科学领域内关系抽取的F1分数。同时,使用大语言模型获得样本也能有效减少人工标注数据的成本。”
+ 2024.ccl-1.27
+ zho
+ yang-etal-2024-pga
+
+
+ UFSC:基于统一特征空间构建的零样本关系抽取(UFSC: A Unified Feature Space Construction for Zero-Shot Relation Extraction)
+ LiuYuchen雨辰刘
+ DuanJianyong建勇段
+ SunKang康孙
+ ZhangQing晴张
+ HeLi丽何
+ WangHao昊王
+ LiuJie杰刘
+ 370–381
+ “零样本关系抽取(ZSRE)旨在从可见关系中学习提取不可见关系的能力。一些研究表明:将样本语句与关系描述匹配进而预测不可见关系的方法,可以有效完成零样本关系抽取任务。然而,现有的匹配框架方法很少统一样本语句与关系描述的特征空间,缺乏对二者特征进行对齐。因此,本文提出一种为匹配框架零样本关系抽取而设计的统一特征空间构建方法。统一样本语句与关系描述的编码模块,并在此基础上引入特征相似损失。同时,为了减轻特征在空间上的聚合现象,引入特征均匀化模块,旨在构建特征更加均匀化的特征空间。本文所提出的方法实现了性能上的提升。与之前最佳的结果相比,在FewRel和Wiki-ZSL数据集上F1值平均提高1.6%和3.4%,体现了统一特征空间构建以及特征均匀化模块的有效性。”
+ 2024.ccl-1.28
+ zho
+ yuchen-etal-2024-ufsc
+
+
+ 多机制整合的中文医疗命名实体识别(Infusing multi-schemes for Chinese Medical Named Entity Recognition)
+ WangShanshan珊珊王
+ ZhangKunyuan焜元张
+ YanRong蓉闫
+ 382–393
+ “在互联网在线医疗领域,由于大多数患者缺乏医学培训,以及不同学科病理特征的复杂性,医患对话文本中的医学命名实体呈现出长且多词的句法特点,给命名实体识别算法提出了新的挑战。为解决这一问题,本研究融合多个不同粒度的扩张卷积机制,构建了Flat-Lattice-CNN模型。该模型不仅考虑字符和词语的语义信息以及它们的绝对和相对位置信息,还提取跨越不同距离的多个字符/词语的共现依存关系特征,以此提高医学长命名实体的识别精度。实验结果表明,本文提出的模型在所评估数据集的命名实体识别任务上有普遍性的性能提升,尤其是在以长实体为主的中文医疗数据集CTDD上,该模型的F1值提升了约2%,具有更优的表现。”
+ 2024.ccl-1.29
+ zho
+ shanshan-etal-2024-duo
+
+
+ 基于两阶段提示学习的少样本命名实体识别(Two-Stage Prompt Learning for Few-Shot Named Entity Recognition)
+ ShaoJiaxing佳兴邵
+ HuangQi琪黄
+ XiaoCong聪肖
+ LiuJing璟刘
+ LuoWenbing文兵罗
+ WangMingwen明文王
+ 394–405
+ “少样本命名实体识别旨在用少量的标注数据来识别命名实体。近年来受提示学习在少样本场景中表现良好性能的启发,本文探索了基于提示的少样本命名实体识别的方法。已有的基于提示学习的方法是通过列举所有可能的跨度来进行实体识别,这导致了计算成本高以及对实体边界信息未充分利用的问题。本文提出一种基于提示学习的两阶段框架TSP-Few,在不使用源域数据的情况下,进行少样本命名实体识别。第一阶段对种子跨度进行增强、过滤和扩展,其中种子增强模块能够让种子跨度捕获到更丰富的语义信息,种子过滤器能够减少大量的无关跨度,种子扩展模块能够充分利用实体的边界信息,为实体类型分类提供高质量的候选实体跨度。第二阶段利用提示学习方法预测候选跨度的相应类别。此外,为了缓解跨度检测阶段的错误累积,在实体分类阶段引入了负采样策略。跨度检测和实体类型分类任务的独立训练更容易在少样本情况下取得优异的性能。在三个基准数据集上的实验表明,与先进的方法相比,本文提出的方法在性能上有了进一步的提升,并且实验结果也表明了该文模型各个模块的有效性。”
+ 2024.ccl-1.30
+ zho
+ jiaxing-etal-2024-ji
+
+
+ 面向工艺文本的实体与关系最近邻联合抽取模型(Nearest Neighbor Joint Extraction Model for Entity and Relationship in Process Text)
+ YangDanqingxin丹清忻杨
+ WangPeiyan裴岩王
+ XuLijun立军徐
+ 406–417
+ “该文研究工艺文本中实体关系联合抽取问题,提出了最近邻联合抽取模型(NNJE)。NNJE利用工艺文本中实体边界字间搭配规律建模外显记忆,通过最近邻方法在某种指定关系下为待预测组合检索出具有相似字间搭配的实例,为实体边界识别以及实体对组合提供更有力的限制条件,提升模型预测准确率,改善模型性能。实验设置了工艺文本关系数据集。实验结果表明,该文方法较基线模型准确率P值提高了3.53%,F1值提升了1.03%,优于PURE、CasRel、PRGC与TPlinker等方法,表明提出的方法能够有效地提升三元组抽取效果。”
+ 2024.ccl-1.31
+ zho
+ danqingxin-etal-2024-mian
+
+
+ 融合扩展语义和标签层次信息的文档级事件抽取(Document-Level Event Extraction with Integrating Extended Semantics and Label Hierarchy Information)
+ FuYujiao玉娇符
+ LiaoJian健廖
+ LiYang旸李
+ GuoZhangfeng张峰郭
+ WangSuge素格王
+ 418–430
+ “文档级事件抽取是自然语言处理中的一项重要任务,面临论元分散和多事件提及的挑战,现有研究通常从文档的所有句子中抽取论元,通过论元角色建模捕获实体间关系,忽略了文档中事件-句子间的关联差异性。本文提出了一种融合扩展语义和标签层次信息的文档级事件抽取方法。首先,利用大语言模型对文本和事件类型标签与论元角色标签进行语义扩展,以引入更丰富的背景语义信息;其次,基于关联差异性的事件类型检测模块,获取文档中与事件类型高度相关的句子,通过约束候选实体的抽取范围,来缓解论元分散问题;进一步,针对文档提及的多个事件类型,利用有向无环图从候选实体中抽取论元,获取所有事件要素。在ChFinAnn和DuEE-Fin两个数据集上的实验结果表明,本文提出的方法相比基线模型可以有针对性地缓解多个事件所属论元分散的问题,有效地提升事件抽取的性能。”
+ 2024.ccl-1.32
+ zho
+ yujiao-etal-2024-rong
+
+
+ 融合领域词汇扩充的低资源法律文书命名实体识别(Named Entity Recognition for Low-Resource Legal Documents Using Integrated Domain Vocabulary Expansion)
+ PaerhatiTulajiang吐拉江帕尔哈提
+ SunYuanyuan嫒媛孙
+ CaiAichen艾辰蔡
+ WangYanhua艳华王
+ LinHongfei鸿飞林
+ 431–441
+ “目前基于预训练语言模型的司法领域低资源法律文书命名实体识别研究主要面临两个问题:(1)在低资源语言中,如维吾尔语,法律文书相关的语料极其有限,这种语料资源稀缺限制了基于预训练语言模型的训练和性能。(2)法律文书中使用的专业术语不仅复杂且特定,新的法律术语和概念的出现使得现有的模型难以适应。针对上述问题,本文基于多语言预训练模型mBERT,通过领域词汇扩充及模型微调的方法,提升了模型在维吾尔语法律文书命名实体识别任务的性能。本文首先整理并构建维吾尔语司法领域专业词汇列表,并将其添加到mBERT模型的词汇表中。随后,在人工标注的维吾尔语法律文书命名实体数据集UgLaw-NERD上进行模型微调,验证了该方法的有效性。实验结果表明,相比于仅使用mBERT进行微调的基线模型,融合领域词汇扩充的模型在命名实体识别任务上F1得分提升至89.72%,较基线提高了7.39%。此外,本文还探讨了不同领域词汇扩充量对模型命名实体识别性能的影响,结果显示,领域词汇扩充增强了预训练模型在处理维吾尔语任务中的表现。这些结论为其他低资源语言在司法领域开展基于预训练模型的自然语言处理研究提供了有益的参考。”
+ 2024.ccl-1.33
+ zho
+ tulajiang-etal-2024-rong
+
+
+ 基于动态提示学习和依存关系的生成式结构化情感分析模型(Dynamic Prompt Learning and Dependency Relation based Generative Structured Sentiment Analysis Model)
+ JiaYintao银涛贾
+ CuiJiajia佳佳崔
+ MuLingling玲玲穆
+ ZanHongying红英昝
+ 442–453
+ “结构化情感分析旨在从文本中抽取所有由情感持有者、目标事物、观点表示和情感极性构成的情感元组,是较为全面的细粒度情感分析任务。针对目前结构化情感分析方法错误传递,提示模板适应性不足和情感要素构成复杂的问题,本文提出了基于动态提示学习和依存关系的生成式结构化情感分析模型,根据不同的情感元组构成情况分别设计提示模板,并用模板增强生成式预训练模型的输入,用依存关系增强生成效果。实验结果显示,本文提出的模型在SemEval2022数据集上的SF1值优于所对比的基线模型。”
+ 2024.ccl-1.34
+ zho
+ yintao-etal-2024-ji
+
+
+ 基于方面引导的图文渐进融合的多模态方面级情感分析方法(Aspect-Guided Progressive Fusion of Text and Image for Multimodal Aspect-Based Sentiment Analysis)
+ YanZida自达闫
+ GuoJunjun军军郭
+ YuZhengtao正涛余
+ 454–466
+ “多模态方面级情感分析旨在通过结合图像信息和文本信息来识别特定方面的情感极性。然而,图像和文本作为两种不同的模态,其在数据表现形式和语义表达上存在显著差异,缩小模态鸿沟和跨模态特征融合是多模态方面级情感分析任务中出现的两个关键问题。对此,本文提出了一种基于方面引导的图文渐进融合的多模态方面级情感分析方法,该方法采用图像和文本中重叠的方面信息作为枢轴,利用方面引导的图文对比学习和基于对比的跨模态语义交互来缩小模态差异、促进语义交互,然后在多模态特征空间中整合视觉和文本信息,通过方面引导的基于对比的多模态语义融合来促进跨模态特征融合,从而提升多模态情感分析的性能。在三个多模态方面级情感分析基准数据集上的实验结果证明了本文提出方法的有效性,并且优于其他大多数最先进的多模态方面级情感分析方法。”
+ 2024.ccl-1.35
+ zho
+ zida-etal-2024-ji
+
+
+ 基于联邦知识蒸馏的跨语言社交媒体事件检测(Cross-Lingual Social Event Detection Based on Federated Knowledge Distillation)
+ ZhouShuaishuai帅帅周
+ ZhuEnchang恩昌朱
+ GaoShengxiang盛祥高
+ YuZhengtao正涛余
+ XianYantuan岩团线
+ ZhaoZixiao子霄赵
+ ChenLin霖陈
+ 467–480
+ “社交媒体事件检测是指从各类社交媒体的内容中挖掘热点事件。在实际情况中,由于数据稀缺,社交媒体事件检测在低资源的情况下表现较差。现有的方法主要通过跨语言知识迁移等方式来缓解低资源问题,但忽略了数据隐私问题。因此,本文提出了基于联邦知识蒸馏的跨语言社交媒体事件检测框架(FedEvent),旨在将富资源客户端知识蒸馏到低资源客户端。该框架通过结合参数高效微调技术和三组对比损失,实现非英文语义空间到英文语义空间的有效映射,并采用联邦蒸馏策略,在保障数据隐私的前提下实现知识的迁移。此外,我们还设计了一套四阶段生命周期机制以适应增量场景。最后,我们在真实数据集上进行实验以证明该框架的有效性。”
+ 2024.ccl-1.36
+ zho
+ shuaishuai-etal-2024-ji
+
+
+ 基于生成式语言模型的立场检测探究(Research on Stance Detection with Generative Language Model)
+ ZhangYuanshuo袁硕张
+ LiAohua澳华李
+ YinZhaoning召宁尹
+ WangPanyi潘怡王
+ ChenBo波陈
+ ZhaoXiaobing小兵赵
+ 481–491
+ “近年来,立场检测任务受到越来越多的关注,但相关标注数据在范围和规模上都有限,不能有效支撑基于神经网络的立场检测。为此,本文探索在零样本/少样本场景下生成式语言模型在立场检测任务上的能力。首先,构建了一个全新的面向立场检测的数据集,包含5个主题,共2500个人工标注样例;然后,在此数据集上进行了一系列探索实验,实验结果表明:生成式语言模型在零样本设定下,采用结构化的提示学习表现良好;增加额外信息能够显著提升模型性能;在少样本设定下,提供相同目标的示例能够明显提升模型性能,而不同目标示例产生了负面作用;使用思维链可以显著提升模型性能;受提示学习的启发,微调预训练语言模型进一步论证提供额外信息对立场检测的增益显著。”
+ 2024.ccl-1.37
+ zho
+ yuanshuo-etal-2024-ji
+
+
+ 基于双图注意力网络的篇章级散文情绪变化分析方法(A Document-Level Emotion Change Analysis Method Based on DualGATs for Prose)
+ LiAilin爱琳李
+ LiYang旸李
+ WangSuge素格王
+ LiShuqi书琪李
+ 492–503
+ “在散文中,作者的情绪会伴随着文章的段落或者句子发生变化,比如从悲伤到快乐、从喜悦到愤怒。为此,本文构建散文情绪变化数据集,提出一种基于双图注意力网络的多种知识融合的情绪变化分析方法。首先,引入意象知识库,建立融合意象知识的句子表示;其次,构建上下文带权依赖图和语篇带权依赖图,通过融合上下文知识和语篇结构,建立了融合上下文知识、语篇结构的句子表示;同时设计愉悦效价识别层,获得融合愉悦效价信息的句子表示;在此基础上,将以上三者表示进行拼接,通过全连接网络得到最终的情绪变化结果。实验结果表明,本文提出的方法可以有效识别情绪变化,为散文阅读理解中的思想情绪变化类问题的解答提供帮助。”
+ 2024.ccl-1.38
+ zho
+ ailin-etal-2024-ji
+
+
+ 基于主题模型与图神经网络的突发公共卫生事件国际舆情演化分析研究(International Public Opinion Evolution Analysis on Sudden Public Health Events using Topic Model and Graph Neural Network)
+ GaoJingjian境健高
+ SangGuoming国明桑
+ LiuZhi智刘
+ ZhangYijia益嘉张
+ LinHongfei鸿飞林
+ 504–514
+ “研究突发公共卫生事件国际舆情演变规律,对国际舆情资源进行应急管理和舆论疏导有重要借鉴价值。本文使用谷歌新闻数据库以各国针对COVID-19的报道为对象,构建国际舆情数据集。采用主题模型、图神经网络模型,结合时间、空间维度与舆情生命周期探究全球舆论主题-情感的演化态势,模型准确率为0.7973,F1值为0.7826,性能优于其他基线模型。研究发现,各国舆情呈现放射传播状态。国际媒体舆论的情感倾向和讨论主题存在正相关且随时间进行转变。”
+ 2024.ccl-1.39
+ zho
+ jingjian-etal-2024-ji
+
+
+ 面向社交媒体多特征增强的药物不良反应检测(Adverse drug reaction detection with multi-feature enhancement for social media)
+ LiHao浩李
+ QiuYunzhi云志邱
+ LinHongfei鸿飞林
+ 515–525
+ “社交媒体是药物不良反应(ADR)检测的重要途径之一。本文提出一个基于社交媒体的药物不良反应检测模型DMFE,以全面捕捉患者对药物使用的反馈信息。与传统的文本检测相比,社交媒体数据中通常会有语法不规范与单词拼写错误的问题。本文提取出社交媒体数据的抽象语义表示(AMR),使用图注意力网络(GAT)学习抽象语义特征提高模型对语义信息的理解,使用字符级卷积神经网络(charCNN)捕获字符特征以减少单词拼写错误带来的影响。此外,本文使用提示学习的方法融入MedDRA药物不良反应领域关键词,进一步增强模型对领域知识的理解能力。经实验评估,本文模型DMFE在CADEC、TwiMed两个数据集上F1值与基线模型相比取得最优效果。”
+ 2024.ccl-1.40
+ zho
+ hao-etal-2024-mian
+
+
+ 面向中文文本的情绪持有者抽取研究(Research on Emotion Holder Extraction for Chinese Texts)
+ SunYawei亚伟孙
+ ShiYu宇石
+ HanXu旭韩
+ 526–538
+ “情绪持有者是文本中带有情绪的主体,对这些情绪持有者的分析对文本情绪理解至关重要。然而,现有研究未充分考虑情绪持有者的共指现象,且由于缺乏面向中文语料的情绪持有者抽取数据,这一研究的发展受到了进一步的限制。本文构建了一个针对中文文本的情绪持有者抽取数据集,有效解决了数据中的共指问题。同时,提出了一种融合语义、情绪和词性特征的模型,实现了高效的情绪持有者抽取与共指消解,且在各项性能指标上超越了基线模型。消融实验进一步证明了模型设计的有效性。”
+ 2024.ccl-1.41
+ zho
+ yawei-etal-2024-mian
+
+
+ AutoRG:一种大小模型协同的自动报告生成框架(AutoRG: An automatic report generation framework for Large and small model collaboration)
+ ZhangJing京张
+ ShuJiangming江明舒
+ ZhangYuxiang宇翔张
+ WuBin斌吴
+ WangWei巍王
+ YuJian剑于
+ SangJitao基韬桑
+ 539–552
+ “自动报告生成技术在提高工作效率和节约人力资源方面具有显著潜力。大语言模型的出现使得报告流畅度与可解释性得到提升。然而,现有工作仍依赖人工,缺乏灵活性和丰富度。同时,小模型错误或冗余的输出与大模型自身的随机性会导致报告质量不稳定。本文提出大小模型协同的自动报告生成框架AutoRG,通过大模型的工具理解与规划能力减少人工干预,提升报告丰富度,并通过信息修正与报告迭代机制提高报告的稳定性。本文以自动专利报告生成为场景,从多个维度对AutoRG进行全面测试。结果表明,该框架在提高报告生成的丰富度和质量稳定性方面具有显著优势。”
+ 2024.ccl-1.42
+ zho
+ jing-etal-2024-autorg
+
+
+ 基于本体信息增强的人类表型概念识别(Ontology Information-augmented Human Phenotype Concept Recognition)
+ QiJiewei杰蔚祁
+ LuoLing凌罗
+ YangZhihao志豪杨
+ WangJian健王
+ LinHongfei鸿飞林
+ 553–567
+ “从文本中自动识别人类表型概念对疾病分析具有重大意义。现存本体驱动的表型概念识别方法主要利用本体中概念名和同义词信息,并未充分考虑本体丰富信息。针对此问题,本文提出一种基于本体信息增强的人类表型概念识别方法,利用先进大语言模型进行数据增强,并设计本体向量增强的深度学习模型来提升概念识别性能。在GSC+和ID-68两个数据集上进行实验,结果表明本文提出方法能够利用本体丰富信息有效提升基线模型性能,取得了先进结果。”
+ 2024.ccl-1.43
+ zho
+ jiewei-etal-2024-ji
+
+
+ 基于机器学习的语音情感声学特征筛选(Acoustic Feature Selection for Speech Emotion Based on Machine Learning)
+ DongWenqi文琪董
+ WangHan涵王
+ ZhangJingwei璟玮张
+ 568–576
+ “筛选有效表达情感的声学特征对语音情感研究至关重要。对具有相同或相似声学特征的情感,声学研究中仅使用基频和时长无法有效区分。本研究扩大声学参数的种类和数量,使用三种机器学习方法,筛选出区分情感类型的多组有效声学参数,补充和完善语音情感声学研究的声学特征集。研究发现,区分不同情感所依赖的声学参数、参数数量、参数贡献都不相同,其中频谱和信噪参数发挥重要作用。本研究为语音情感声学分析的参数选择提供参考。”
+ 2024.ccl-1.44
+ zho
+ wenqi-etal-2024-ji
+
+
+ 基于交互行为语义模式增强的ID推荐方法(Enhanced ID Recommendation Method Utilizing Semantic Patterns of Interactive Behaviors)
+ WangYuanlai远来王
+ BaiYu宇白
+ LianPeng鹏廉
+ 577–587
+ “基于ID的推荐是一种依赖用户或物品的唯一标识符进行推荐的经典推荐方法,这种方法经常面临用户物品交互数据稀疏、符号ID缺失语义信息等问题。该文针对上述问题,假设不同领域的用户-物品交互行为之间存在潜在的模式关联,提出了一种基于交互行为语义模式增强的ID推荐方法。该方法在目标域推荐任务中引入辅助域信息,基于图神经网络对辅助域和目标域信息进行联合编码表示,通过引入交互行为语义模式,将辅助域的用户-物品交互信息以及物品描述信息迁移至目标域,从而实现目标域ID推荐中的交互行为语义增强。在8个公开数据集上的实验结果表明,相比目前的SOTA模型,本文方法表现出更好的推荐效果,其Recall@20与NDCG@20分别具有3% ∼ 30%、1% ∼ 40%的提升。”
+ 2024.ccl-1.45
+ zho
+ yuanlai-etal-2024-ji
+
+
+ 基于双层语义映射的大语言模型辅助古汉语事件抽取半自动标注框架(A Semi-automatic Annotation Framework for Event Extraction in Classical Chinese Assisted by Large Language Models Based on Two-Level Semantic Mapping)
+ WeiCongcong聪聪卫
+ LiWei炜李
+ FengZhenbing振冰冯
+ ShaoYanqiu艳秋邵
+ 588–599
+ “尽管自然语言处理技术(NLP)在现代语言事件抽取任务(EE)上已有较为成熟的解决方案,但针对古汉语事件抽取的研究却受限于标注数据匮乏和文本语义复杂等挑战。因而我们提出使用当前取得巨大成功的大语言模型(LLMs)来辅助人类标注员进行数据标注。为了应对LLMs在古汉语上存在的训练不足、语义理解能力欠缺的问题,我们提出了一种基于双层语义映射的LLMs辅助古汉语事件抽取半自动标注框架,利用古汉语的现代汉语译文,结合事件语义学理论及语义依存分析技术,为LLMs提供丰富的语义信息表示,从而进一步将语义依存关系逐步映射为具体的事件信息。经过人类标注员的审核反馈,有效克服了现有NLP工具和LLMs在古汉语事件抽取标注时的局限。实验结果表明,我们的方法不仅提高了古汉语事件抽取标注的准确性和效率,而且减少了对专业人员的依赖和人工标注工作量,为低资源语言标注实践提供了新的方法论,探索了大模型时代数据标注的新方向。”
+ 2024.ccl-1.46
+ zho
+ congcong-etal-2024-ji
+
+
+ 基于文本风格迁移的中文性别歧视文本去毒研究(Research on detoxification of Chinese sexist texts based on text style transfer)
+ PengJian健彭
+ ZuoJiali家莉左
+ TanJingxuan景璇谭
+ WanJianyi剑怡万
+ WangMingwen明文王
+ 600–612
+ “网络社交媒体平台存在一定程度的性别歧视言论,阻碍了互联网健康和社会文明发展。文本风格迁移技术可以减轻文本中的性别歧视,在英语等语言上已有不少研究。但在中文领域,由于缺乏数据集而导致相关研究较少。此外,由于中文语义信息丰富、语言表达多样而导致性别歧视言论毒性的表现形式多样,现有的方法多采用单一文本风格迁移模型因而效果不佳。因此,本文提出了一个基于文本风格迁移的中文性别歧视文本去毒框架,该框架首先根据毒性的表现形式对文本进行分类,进而根据文本毒性表现形式的不同采用不同的处理方式,我们还引入了大语言模型(LLM)构建歧视词词典。实验表明,本文提出的模型能有效地处理中文文本中的性别歧视问题。”
+ 2024.ccl-1.47
+ zho
+ jian-etal-2024-ji
+
+
+ 基于问题扩展的散文答案候选句抽取方法研究(Research on Answer Candidate Sentence Extraction for Prose Based on Question Expansion)
+ LeiYang洋雷
+ WangSuge素格王
+ LiShuqi书琪李
+ WangHao浩王
+ 613–624
+ “在散文阅读理解中,一方面问题的题干通常较为简洁、用词较为抽象,机器难以直接理解问题的含义和要求;另一方面,散文文章较长,答案候选句分散在文章的多个段落,给答案候选句的抽取任务带来巨大的挑战。因此,本文提出了一种基于问题扩展的散文答案候选句抽取方法。首先,利用大语言模型抽取文章中与问题题干相关的词,构建问题词扩展库,其次,利用大语言模型强大的生成能力对原问题的题干进行重写,进一步,利用问题词扩展库对其扩展,最后,通过对散文文章分块处理,建立基于全局上下文信息、历史信息的问题和文章句子的相关性判断模型,用于抽取答案候选句。通过在散文阅读理解数据集上进行实验,实验结果表明本文提出的方法提高了散文抽取答案候选句的准确率,为散文阅读理解的生成类问题的解答提供了技术支撑。”
+ 2024.ccl-1.48
+ zho
+ yang-etal-2024-ji
+
+
+ 基于预训练模型与序列建模的音素分割方法(Phoneme Segmentation Method Based on Pre-trained Model and Sequence Modeling)
+ YangShanglong尚龙杨
+ YuZhengtao正涛余
+ WangWenjun文君王
+ DongLing凌董
+ GaoShengxiang盛祥高
+ 625–636
+ “音素分割作为语音处理领域内的一个重要任务,对于关键词识别、自动语音识别等应用具有至关重要的意义。传统方法往往独立预测每一帧音频是否为音素边界,忽视了音素边界与整个音频序列以及相邻帧之间的内在联系,从而影响了分割的准确性和连贯性。本文提出一种基于预训练模型与序列建模的音素分割方法,在HuBERT模型提取声学特征的基础上,结合BiLSTM捕捉长期依赖,再用CRF优化序列,提升了音素边界检测的性能。在TIMIT和Buckeye数据集上的实验表明,本文方法优于现有技术,证明了序列建模在音素分割任务中的有效性。”
+ 2024.ccl-1.49
+ zho
+ shanglong-etal-2024-ji
+
+
+ 近三十年域外汉籍研究的现状与展望—基于文献计量分析和知识图谱绘制(Extraterritorial Chinese Texts in the Last Thirty Years: Research Advances and Future Perspectives Based on Bibliometric Analysis and Knowledge Graph Mapping)
+ TangRongjun榕骏唐
+ PengZhifeng志峰彭
+ 637–649
+ “域外汉籍研究对中华文化传承传播意义重大、成果丰富,但缺少文献计量分析。本研究的目的是通过CNKI优质期刊论文,分析中国近30年域外汉籍研究的演进趋势。通过知识图谱、普赖斯理论、Leydesdorff引文理论、斯皮尔曼等级相关系数与关键词共现、聚类、突现、时线分析研究学科演进趋势。研究发现,“汉籍整理与数字化研究”“文化交流与文化传播”为前沿热点,揭示出领域内研究工作已从回归普查转向应用转化。本研究为学科发展提供了参考数据。”
+ 2024.ccl-1.50
+ zho
+ rongjun-zhifeng-2024-jin
+
+
+ 面向中文多方对话的机器阅读理解研究(Research on Machine Reading Comprehension for Chinese Multi-party Dialogues)
+ JiangYuru玉茹蒋
+ LiYu宇李
+ NaTingting婷婷那
+ ZhangYangsen仰森张
+ 650–661
+ “在机器阅读理解领域,处理和分析多方对话一直是一项具有挑战性的研究任务。鉴于中文语境下相关数据资源的缺乏,本研究构建了DialogueMRC数据集,旨在促进该领域的研究进展。DialogueMRC数据集作为首个面向中文多方对话的机器阅读理解数据集,包含705个多方对话实例,涵盖24451个话语单元以及8305个问答对。区别于以往的MRC数据集,DialogueMRC数据集强调深入理解动态的对话过程,对模型应对多方对话中的复杂性及篇章解析能力提出了更高的要求。为应对中文多方对话机器阅读理解的挑战,本研究提出了融合篇章结构感知能力的中文多方对话问答模型(DiscourseStructure-aware QA Model for Chinese Multi-party Dialogue,DSQA-CMD),该模型融合了问答和篇章解析任务,以提升对话上下文的理解能力。实验结果表明,相较于典型的基于微调的预训练语言模型,DSQA-CMD模型表现出明显优势,对比基于Longformer的方法,DSQA-CMD模型在MRC任务的F1和EM评价指标上分别提升了5.4%和10.0%;与当前主流的大型语言模型相比,本模型也展现了更佳的性能,表明了本文所提出方法的有效性。”
+ 2024.ccl-1.51
+ zho
+ yuru-etal-2024-mian
+
+
+ 融合半监督学习与同义计算的传染病名称自动映射研究(A study on automatic mapping of infectious disease names by integrating semi-supervised learning and synonym computation)
+ SongPeiyan培彦宋
+ YangQingxiang青香杨
+ HuBoshen博深胡
+ DuBoya博雅杜
+ 662–672
+ “医学古籍蕴含着丰富的专业知识,然而由于古代疾病名称、术语与现代标准表述不一致等问题,严重影响了公共卫生知识组织和服务质量,现有研究主要采用专家手工映射、词义计算等方式解决,存在着工作效率和准确率偏低等问题,以古籍术语辞典作为语料进行挖掘、建立传统医学术语与现代医学术语的同义关系,并映射到国际规范,形成“古-今-外”三语互通的知识库是可行方法。为此,本文以知识组织和知识发现理论为基础,设计了古今疾病名称跨语言自动映射方法,并以传染性疾病名称为例进行验证。具体过程是:首先,利用snowball算法抽取古今疾病名称同义模式,获取了12个与传染性疾病相关的疾病名称关系模式和134个同义词对。其次,依据桑基图从关联性、成熟度和延展性3个角度分析疾病名称历时演变进行可视化关联分析。同时,结合sapbert向量和余弦相似度将传统医学疾病名称向ICD-11国际标准映射,经过人工验证映射结果达到0.23的hit@1、0.42的hit@5以及0.61的hit@10。本文发现,通过专业辞典语料,可以抽取疾病名称的语言变异情况,提高同义术语的发现效率,为构建专业知识库提供更多的入口词和语义关联性,缓解信息孤岛问题。研究还表明,以辞典中的现代医学术语名称作为映射起点,关联到ICD-11国际规范,为开展跨语言领域知识工程建设提供参考,对实现专业知识“古为今用”和国际传播也具有重要现实意义。”
+ 2024.ccl-1.52
+ zho
+ peiyan-etal-2024-rong
+
+
+ 中文语法纠错的多轮解码方法研究(Multi-Turn Decoding for Chinese Grammatical Error Correction)
+ WangXiaoying晓盈王
+ MuLingling玲玲穆
+ XuHongfei鸿飞许
+ 673–687
+ “在语法纠错(Grammatical Error Correction,GEC)任务上,序列到序列(Sequence-to sequence,seq2seq)模型与序列到编辑(Sequence-to-edit,seq2edit)模型相比可以取得相当或更好的性能。序列到编辑模型通常通过多次迭代解码,而序列到序列模型则以从左到右的方式一次性解码,不考虑后续的词语。通过在序列到序列模型中应用多轮解码(Multi-Turn Decoding,MTD)来迭代改进前一轮的修正结果,可能会进一步提升性能。然而,多轮解码会增加推理的计算成本,且前一轮修正中的删除或替换操作可能会导致原始输入中有用的源语句信息丢失。本文提出了一种早停机制来提高效率。同时,为解决源语句信息丢失问题,本文将原始输入与上一轮的修正结果合并为一个序列。在NLPCC2018测试集、FCGEC验证集和NaCGEC测试集的实验结果表明,本文方法可在BART基线上能带来一致且显著的性能提升,F0.5值分别提高了+2.06,+2.31和+3.45,分别取得了47.34,54.58和62.09的F0.5值。”
+ 2024.ccl-1.53
+ zho
+ xiaoying-etal-2024-zhong
+
+
+ 中西谚语多元价值观资源库建设及对比研究(The construction and comparative study of the resource library of Chinese and Western proverbs and multiple values)
+ DuXia霞杜
+ LiuPengyuan鹏远刘
+ YuDong东于
+ 688–699
+ “中西方谚语是中西方文化的结晶,分别蕴含着中西方文化中最基本的价值观。但目前缺乏中西方谚语价值观资源,难以对谚语所体现的中西方价值观进行全面的研究,特别是定量对比研究。因此本文设计了多元价值观体系,包含动机及需求、共同及特色价值观、价值判断和使用场景,根据这个体系构建了中西方谚语多元价值观资源库并进行了考察与对比分析。本文发现中西谚语在价值判断、使用场景及部分价值观上具有相似性,在具体内涵表达上各具独特性。”
+ 2024.ccl-1.54
+ zho
+ xia-etal-2024-zhong
+
+
+ 从句子图到篇章图——基于抽象语义表示的篇章级共指标注体系设计(Discourse-Level Anaphora Annotation System Based on Abstract Semantic Representation)
+ ZhangYixuan艺璇张
+ LiBin斌李
+ XuZhixing智星许
+ LuPengxiu芃秀卢
+ 700–711
+ “篇章共指体现篇章概念的动态转移,成为近年研究热点。本文在梳理共指理论研究的基础上,综述了相关语料库及解析方法,发现共指语料库仍存在以下两个问题:共指关系标注粗疏与基本不考虑整句语义表示的融合。本文以句子级语义标注体系(中文抽象语义表示)为基础构建篇章共指体系,构建了 100 篇共指语料库。本体系涵盖 52 种句内语义关系和 8 种篇章共指关系,二者相结合构建的篇章共指语义图,为篇章级语义分析提供新的框架和数据资源。”
+ 2024.ccl-1.55
+ zho
+ yixuan-etal-2024-cong
+
+
+ 汉语中介语词同现网络研究(A Study on Chinese Interlanguage Co-occurrence Networks)
+ QianLong隆钱
+ ZhaoHuizhou慧周赵
+ DingQian芊丁
+ WangZhimin治敏王
+ 712–725
+ “近年来,运用复杂网络方法进行语言学研究已成为数字人文研究的一条新路径。本文基于214篇日本汉语学习者的书面作文,构建了6个不同能力水平的汉语中介语词同现网络,并探讨了这些网络的结构特性及其动态演变过程。研究结果显示,所有的汉语中介语词同现网络均呈现出小世界属性、无标度属性、异配性和层级结构等复杂网络的特性。这些特性揭示了汉语学习者在词汇使用方面的特定模式:低水平学习者更倾向于将低频词汇与高频词汇进行连接,这可能与学习者减轻认知负荷的习得模式有关;学习者语言水平的提升,中介语网络参数会逐渐向母语者靠拢,但是无法达到母语者的水平;此外,本研究还观察到,语言错误会对中介语网络结构产生影响,引起网络结构的变异。”
+ 2024.ccl-1.56
+ zho
+ long-etal-2024-yi
+
+
+ 基于意合图语义理论的结构标注体系与资源建设(System and Resource Construction Based on the Semantic Theory of Chinese-Parataxis-Graph)
+ GuoMengxi梦溪郭
+ LiMeng梦李
+ XunEndong恩东荀
+ RaoGaoqi高琦饶
+ YuZhongyang钟洋于
+ 726–739
+ “意合图是一种以事件为中心的多层次语义表示方法,由事件结构与实体结构构成,通过多层次语义体系设计,实现对事件的多层次分析。本文细化并制定了意合图标注规范,采用分层分级的标注策略,在自主研发的在线标注系统中对新闻语料和国际中文教育阅读语料进行了意合图QNP标注工作。通过本次标注,验证了意合图体系的合理性和可标注性,并构建了意合图语义资源库。”
+ 2024.ccl-1.57
+ zho
+ mengxi-etal-2024-ji
+
+
+ 意合图:中文多层次语义表示方法(Parataxis Graph: Multi-level Semantic Representation Method for Chinese)
+ GuoMengxi梦溪郭
+ XunEndong恩东荀
+ LiMeng梦李
+ RaoGaoqi高琦饶
+ 740–749
+ “基于参数的语义表示虽取得成就,但符号化的语义表示仍具有不可忽视的意义。我们在语义学基础上,充分考虑符号化语义表示在NLP领域落地中的需求,提出了一种兼具通用性与扩展性的多层次语义表示方法——意合图。意合图以事件为核心,由事件结构与实体结构构成,通过多层次语义体系设计,提升与场景结合的能力,并力求对不同层级的语言单元作一贯式表示。在资源建设和相关分析实验中取得良好效果。本文将重点介绍意合图设计理念与多层次语义体系。”
+ 2024.ccl-1.58
+ zho
+ mengxi-etal-2024-yi
+
+
+ 由L1到L2的跨语言激活路径研究——基于词汇识别的ERP数据(The Impact of Second Language Experience on Native Language Processing Across Different Language Modes)
+ YangSiqin思琴杨
+ HuMei美胡
+ JiangMinghu铭虎江
+ 750–759
+ “本研究运用事件相关电位技术(event-related potentials, ERPs)在不同语言模式下探索二语学习者的二语经验是否会影响母语加工。实验招募了两组中国日语学习者作为被试,分别参与了接近双语模式的短版本实验和接近单语模式的长版本实验。统计结果显示,在短版本实验中,当汉-日同形异义词作为启动词时,语义相关性因素引发的N400波幅差异并不显著,但是引发的LPC波幅差异显著。在长版本实验中,语义相关性因素引发的N400和LPC的波幅都显著。据此,本研究推论,当被试在母语环境下加工母语语义时,二语语义在接近双语模式的短版本实验中被激活并影响母语加工,这种影响仅存在于N400的时间窗口期。但是,在接近单语模式的长版本实验中,二语语义在两个时间窗口里都没有影响母语加工。本研究从语言模式和时间窗口两个维度拓展了对二语经验影响母语加工的认识,对构建高质量的人类语言计算模型和系统具有重要的理论意义和应用价值。”
+ 2024.ccl-1.59
+ zho
+ siqin-etal-2024-l1dao
+
+
+ 大语言模型故事理解能力评价数据集(Benchmarking story comprehension ability of large language model)
+ YanGuohang国航闫
+ GuoYaxin亚鑫郭
+ TanHongye红叶谭
+ ZhangHu虎张
+ 760–773
+ “故事包含大量的社会、物理等常识,同时蕴含深刻的道理,是知识传播、文化传承、价值塑造的重要载体。故事理解是NLP中的一项重要任务。近几年,研究者对大语言模型(LLMs)的语言理解能力进行了很多评估与分析,但由于现有的故事理解数据集大多为答案出现在原文的实体类问题,因此对LLMs故事理解能力的评价与分析非常有限。为此,本文构建了一个寓言故事理解数据集CRMUS,并基于人类故事理解的认知过程:先进行常识推理,然后理解故事寓意,设计了两个任务来评价模型的相应能力。基于CRMUS数据集,我们对多个代表性的LLMs进行了评估,发现:LLMs已经可以较好地理解故事中的常识并进行推理,但在理解故事寓意方面还存在很大提升空间。此外,我们使用项目反应理论(IRT)对数据集进行了质量分析,表明该数据集是高质量的,可以有效地评估LLMs。”
+ 2024.ccl-1.60
+ zho
+ guohang-etal-2024-da
+
+
+ 大语言模型开放性生成文本中的职业性别偏见研究(Research on Occupational Gender Bias in Open-Ended Text Generated by Large Language Models)
+ ZhangXu旭张
+ GuoMengqing梦清郭
+ ZhuShucheng述承朱
+ YuDong东于
+ LiuYing颖刘
+ LiuPengyuan鹏远刘
+ 774–789
+ “大语言模型问世以来,在自然语言处理诸多任务上都取得了惊人的表现。但其中可能存在的安全性和公平性问题也引起了人们的重视,特别是模型生成文本可能含有对特定职业、性别等群体的偏见和歧视。本文通过两种性别表征形式,构造了显性和隐性的“性别+职业”提示语,提示大语言模型生成开放性文本,并从情感极性、词汇丰富度和冒犯性程度三个维度对生成文本的偏见进行分析,评估并比较了传统模型与以ChatGPT为代表的大语言模型中的职业显性性别和隐性性别交叉偏见。结果表明,比起单维度的职业、性别身份信息,更复杂的职业性别交叉身份信息会减少ChatGPT生成文本中的偏见,具体表现为情感极性趋于中性,词汇丰富度提高;ChatGPT对于不同类型的职业性别身份展现出差异的态度,对研究型、艺术型等创造类的职业情感极性更高,对事务型、经管型等与人打交道的职业情感极性偏低;另外,ChatGPT相比之前的GPT-2模型在生成能力和消除偏见上有所进步,在多种组合身份提示下的生成文本更加积极、多样,冒犯性内容显著减少。”
+ 2024.ccl-1.61
+ zho
+ xu-etal-2024-da
+
+
+ 大语言模型在中文文本纠错任务的评测(Evaluation of large language models for Chinese text error correction tasks)
+ MuLingling玲玲穆
+ WangXiaoying晓盈王
+ CuiJiajia佳佳崔
+ 790–806
+ “大语言模型(Large Language Models,LLMs)在信息抽取、机器翻译等自然语言处理任务上的能力已被广泛评估,但是在文本纠错方面还主要局限于评价GPT的英文语法纠错能力。中文文本纠错任务包括中文语法检测(Chinese Grammatical Error Detection,CGED)和中文语法纠错(Chinese Grammatical Error Correction,CGEC)两个子任务。本文使用提示的方法评估了国内外的主流大模型在中文语法检测和中文语法纠错任务上的能力。论文设计了不同的提示策略,对结果进行了整体和细粒度的分析。在NLPCC2018和CGED2018测试集上的实验结果表明,ERNIE-4和ChatGLM-4的中文文本纠错能力优于GPT-3.5-Turbo和LLaMa-2-7B-Chat,少样本思维链提示策略性能最优,对词序错误和拼写错误上纠正的准确率较高,说明大模型在低资源下具有较好的中文文本纠错能力。然而测试结果显示大模型的召回率比基线模型高至少14个百分点,说明大模型在中文文本纠错任务上存在过度校正的问题。”
+ 2024.ccl-1.62
+ zho
+ lingling-etal-2024-da
+
+
+ 面向“以A为B”构式语义场景的汉语框架识别数据集构建(Dataset for Recognizing Chinese Semantic Frames based on the Semantic Scenario of the “Yi A Wei B” Construction)
+ YangPeiyuan沛渊杨
+ SuXuefeng雪峰苏
+ LiJuncai俊材李
+ YanZhichao智超闫
+ ChaiQinghua清华柴
+ LiRu茹李
+ 807–818
+ “汉语中普遍存在一些语义场景,其语义核心不是以单个词语呈现,而是通过句子中的某个特定结构来表达。然而当前公开发表的数据集中,只有极少数的数据集将这种特定结构作为语义单元进行研究。汉语框架语义知识库是进行汉语深层语义分析与推理的优质资源,目前其激活框架的基本单位均为句中的一个词。本文以汉语框架语义知识库为基础,引入构式语法,使用2020《人民日报》语料库,以“以A为B”构式为例,建立了基于“以A为B”构式的汉语框架识别数据集,包含23849条例句,相应框架141个。本文使用多个汉语框架识别模型及大语言模型在该数据集上进行了实验,并针对传统框架识别模型在以构式为目标词的框架识别任务中由于目标词信息匮乏导致的识别困难问题,提出了基于目标词转化和数据增强的两种方法,使模型准确率达到了88.19%,有效提升了模型挖掘构式蕴含的深层语义信息的能力。”
+ 2024.ccl-1.63
+ zho
+ peiyuan-etal-2024-mian
+
+
+ 上古汉语分词和词性标注语料库的构建(Construction of Ancient Chinese Word Segmentation and Part-Of-Speech Corpus)
+ KeYonghong永红柯
+ 819–829
+ “针对国内尚无开放的大规模上古汉语分词及词性标注语料库可用的问题,提出以人工为主+机器辅助的标注模式,构建一个包括46部文献的上古汉语分词及词性标记语料库。描述了语料选择、文本分词、词性标注和质量控制等建库过程,分析了该语料库词长、词频、词用等分布,评估了标注质量。已经完成标注的语料库包括323万余字、217万余词。与EvaHan2022基测集和盲测集的分词、词性标注一致度分别为93.70%、89.49%和92.83%、89.86%。该语料库可用于古汉语研究、辞书编撰、语言教学、人工智能等多个领域。”
+ 2024.ccl-1.64
+ zho
+ yonghong-2024-shang
+
+
+ 图解句式结构体系及其树库构建(Diagrammatic Sentence Pattern Structure System and Its Treebank Construction)
+ PengWeiming炜明彭
+ ZhaoMin敏赵
+ SongYuchen昱辰宋
+ HuJiajia佳佳胡
+ SongTianbao天宝宋
+ SuiZhifang志方穗
+ SongJihua继华宋
+ 830–840
+ “句式结构是一种基于句本位语法的形式化句法结构,采用自定义的图解形式呈现句子结构。本文提出了涵盖小句结构、词法结构和句间结构三方面的句式结构体系,阐明了其设计理念以及句本位的析句原则,最后概述了基于该体系构建汉语树库的工程进展情况。”
+ 2024.ccl-1.65
+ zho
+ weiming-etal-2024-tu
+
+
+ 英语科技论文摘要语步结构语料库构建研究(Research on Construction of Corpus for Move Structures in Abstracts of English Scientific Research Articles)
+ LiHongzheng洪政李
+ WangRuojin若锦王
+ FengChong冲冯
+ LiuFang芳刘
+ 841–852
+ “语步结构是学术论文中的文本语篇单位,在语步分析、论文写作等方面具有重要价值。尽管关于学术论文的语步研究非常丰富,但语步标注数据资源仍然相对较少。本研究开发构建了一个英语科技论文摘要语步结构标注语料库,目前已标注近3.4万个语步结构,涵盖了自然语言处理、计算机视觉、通信工程、机械工程等学科领域,同时进行了标注数据统计和分析。语料库构建的第一阶段依靠人工标注形成高质量语料,在第二阶段也是主要阶段,采用了基于BERT的自动识别与标注模型,在保证标注质量的同时能够提升标注速度,扩大标注规模。本研究基于构建的语料库开展了不同学科领域摘要语步结构识别实验,对比了我们的模型与ChatGPT和Claude3等大语言模型的识别效果。结果显示我们的模型在各类语步识别上的F1指标均优于大语言模型,表明了模型的有效性。该语料库目前可公开获取使用,能够为科技论文信息抽取、英语写作智能批改等自然语言处理相关任务和学术用途英语等外语教学与研究等提供必要的数据资源,同时也能有效推动外语教育数字化转型。”
+ 2024.ccl-1.66
+ zho
+ hongzheng-etal-2024-ying
+
+
+ Self-Guide:一种基于自我规划的大语言模型推理增强方法(Self-Guide: Enhancing LLM Reasoning Ability via Self-Plan)
+ LiuYibin艺彬刘
+ LiuZhenghao正皓刘
+ YanYukun宇坤闫
+ YuShi是于
+ WangShuo硕王
+ YangLiner麟儿杨
+ ChenHuimin慧敏陈
+ GuYu峪谷
+ YuGe戈于
+ 853–869
+ “尽管大语言模型在自然语言处理任务中取得显著进展,但其在复杂问题推理等领域还面临着认知负荷问题,即大语言模型在推理过程需要记忆并处理大量信息。因此,如何有效地减少语言模型推理过程中的认知负荷,缓解推理过程中可能出现的认知过载是一个亟待解决的问题。对此本文提出了Self-Guide方法,用于增强语言模型的推理能力。该方法通过指引大语言模型生成常识知识和推理指导,让语言模型基于自我规划来增强其推理能力,并通过与推理链结合的方式对模型的推理过程进行校准。与现有方法不同的是,本文在不对大语言模型进行微调或使用外部工具的情况下,显著提升了语言模型的推理性能。实验结果表明,Self-Guide方法在四种常见推理任务上性能显著优于基线方法,同时相比传统的推理链模型,Self-Guide方法在推理能力较弱的模型上也具有良好的泛化性能。通过结合大语言模型的自我规划和推理能力,Self-Guide方法为提升语言模型的推理能力提供了一种新的有效途径。”
+ 2024.ccl-1.67
+ zho
+ yibin-etal-2024-self
+
+
+ 基于大模型的交互式谎言识别:数据和模型(Unveiling Lies: Enhancing Large Language Models for Real-World Lie Detection in Interactive Dialogues)
+ JiChengwei程炜纪
+ WangSiyuan思远王
+ LiTaishan太山李
+ MouXinyi馨忆牟
+ ZhaoLimin丽敏赵
+ XueLanqing兰青薛
+ YingZhenzhe缜哲应
+ WangWeiqiang维强王
+ HuangXuanjing萱菁黄
+ WeiZhongyu忠钰魏
+ 870–882
+ “面向对话交互过程的谎言识别技术在不同的应用场景有广泛的应用需求。现有的鉴谎技术往往在整体的对话级别上给出最终决策,而缺乏对细粒度谎言特征和线索的逻辑分析,难以满足场景中对于可解释性的需求。本文提出了谎言指征和语义不一致线索的概念,用于帮助识别对话中的谎言,提升鉴谎方法的可解释性。文章同时提出一个谎言识别框架,用于训练谎言识别大语言模型(LD-LLM)。它利用细粒度的谎言指征并且发现对话中是否存在语义不一致线索,以实现更可靠的谎言识别。文章在真实交互场景中构建了两个谎言识别数据集FinLIE和IDLIE,分别关注金融风控场景和身份识别场景。实验结果表明,基于这两个数据集创建的指令数据集微调得到的LD-LLM,在基于真实交互的谎言识别上达到了最先进的水平。”
+ 2024.ccl-1.68
+ zho
+ chengwei-etal-2024-ji
+
+
+ 基于动态聚类与标签空间映射的上下文学习模板构建方法(In-Context Learning Demonstration Construction Method based on Dynamic Clustering and Label Space Mapping)
+ ZhangQi琦张
+ JinXingnan醒男金
+ PeiYu誉裴
+ DuYongping永萍杜
+ 883–893
+ “面向大语言模型提供自然语言指令,可生成预期输出,体现了其上下文学习能力。上下文学习的性能与上下文模板质量密切相关,现有的工作通常使用单一的选择算法进行模板构建,无法充分激发上下文学习能力。本文提出基于动态聚类与标签空间映射的上下文学习模板构建方法,动态选择相关示例,进一步提出聚类筛选方法,实现不同语义簇中示例多样化的选择。设计基于损失函数的排序选择方法,评估模板学习正确标签空间映射分布的能力,排序形成最终模板。在自然语言推理等任务中的实验结果表明,本文提出的方法使两个不同的大语言模型准确率最高分别提升3.2%和8.9%。”
+ 2024.ccl-1.69
+ zho
+ qi-etal-2024-ji
+
+
+ 基于领域信息分解式学习的大语言模型修辞认知增强方法(Method for Enhancing Rhetorical Cognition of Large Language Models Based on Decomposed Learning of Field Information)
+ WangWen雯王
+ YuDong东于
+ LiuPengyuan鹏远刘
+ 894–909
+ “中文修辞手法多样且概念差异性大,大语言模型对部分修辞手法的认知存在缺陷。针对该问题,本文研究如何增强大语言模型的修辞认知能力,并探究其与修辞识别性能之间的关系。为此,本文提出了QAKAG框架,此框架首先引入信息分解式学习思想,通过问答形式检测大语言模型的修辞认知缺陷,然后以四种不同的知识组合方式探究最优信息补充机制,实现了大语言模型修辞认知能力的增强。本文构建了多类别中文修辞句数据集MCRSD和修辞知识库MCRKB,并在ChatGPT4等六个大语言模型上开展实验研究,验证了QAKAG框架对增强大语言模型修辞认知能力的有效性以及其各阶段的必要性。结果表明,在QAKAG框架的增强下,六个大语言模型在多类别修辞识别任务上的性能相较直接回答识别问题的平均F1值提高22.1%,优于Zero-shot-CoT、RAG-BaiKe、Few-Shot5提示策略。”
+ 2024.ccl-1.70
+ zho
+ wen-etal-2024-ji
+
+
+ 基于中间层对齐的异构师生模型知识蒸馏(Knowledge distillation of heterogeneous teacher-student model with intermediate layer loss)
+ ZhaiFeiyan飞燕翟
+ WangRenzhi任之王
+ LiPiji丕绩李
+ 910–928
+ “知识蒸馏技术作为大语言模型时代的一项前沿模型压缩策略,通过将复杂模型的知识有效迁移至简单模型,显著降低了模型的参数规模和计算成本。尽管如此,目前主流的生成式大语言模型蒸馏算法主要集中于优化师生模型间的最后输出层损失,而忽视了对模型中间层的探索。此外,针对中间层蒸馏的研究往往对师生模型的结构一致性有着严格的要求,无法处理异构模型间的蒸馏问题,从而存在明显的局限性。针对这些问题,我们提出了一种新的知识蒸馏算法:引入了中间层蒸馏损失的异构生成式师生大语言模型知识蒸馏算法。该算法首先提取师生模型的中间层信息作为蒸馏对象,随后通过专门设计的中间层映射规则和对齐模块,实现异构模型间基于中间层的知识对齐与损失计算。最后,联合优化各个蒸馏损失的比例。通过在五个相关数据集上进行实验验证,我们的方法在提高蒸馏效果方面展现出显著优势。”
+ 2024.ccl-1.71
+ zho
+ feiyan-etal-2024-ji
+
+
+ 面向小规模大语言模型推理优化的推理路径排序方法(A Reasoning Paths Ranking Method for Reasoning Optimization of Small-scale Large Language Models)
+ LiJun俊李
+ BaiYu宇白
+ LiuYuting雨婷刘
+ 929–940
+ “尽管大语言模型(LLM)在自然语言处理领域取得巨大成功,但是伴随其千亿级参数规模的训练也产生了巨大的计算成本。小规模大语言模型(SLLM)作为低资源场景下实现LLM部署的可替代方案,任务处理能力与LLM尚存在明显差距。尽管上下文学习(ICL)等提示方法在一定程度上提升了SLLM的问题处理能力,但基于人工构建的提示往往需要参与者具备特定的专业领域知识,这给LLM的普适推广带来了挑战。针对以上问题,本文提出了一个基于SLLM的问题推理框架,通过在推理路径生成和答案生成两个阶段之间引入基于逐步语义验证器(SSVRP)的推理路径排序选择机制,在无人干预情况下实现SLLM推理能力提升。实验结果表明,SSVRP有效地增强了SLLM的推理性能,在4个推理任务中的平均准确率分别达到了54.3%,90.6%,64.3%和63.7%,并在其中3个推理任务中都取得了最新的SOTA结果。”
+ 2024.ccl-1.72
+ zho
+ jun-etal-2024-mian
+
+
+ 面向中文实体识别的Transformers模型句子级非对抗鲁棒性研究(On Sentence-level Non-adversarial Robustness of Chinese Named Entity Recognition with Transformers Model)
+ WangLibang立帮王
+ WangPeiyan裴岩王
+ ShenSijia思嘉沈
+ 941–954
+ “基于Transformers的中文实体识别模型在标准实体识别基准测试中取得了卓越性能,其鲁棒性研究也受到了广泛关注。当前,中文实体识别模型在实际部署中所面临的句子级非对抗鲁棒性问题研究不足,该文针对该问题开展了研究。首先,该文从理论上分析并发现了Transformer中自注意力、相对位置嵌入及绝对位置嵌入对模型鲁棒性的负面影响。之后,提出了实体标签增强和滑动窗口约束的鲁棒性增强方法,并从理论上证明了提出方法能够提升Transformers模型的实体识别鲁棒性。最后,通过在3个中文数据集的实验,研究了4种基于Transformer的实体识别模型的脆弱性,所提出方法使模型的鲁棒性F1值提升最高可达4.95%。”
+ 2024.ccl-1.73
+ zho
+ libang-etal-2024-mian
+
+
+ 银瞳:基于自适应语义空间学习的中文金融多任务大模型(SilverSight: A Multi-Task Chinese Financial Large Language Model Based on Adaptive Semantic Space Learning)
+ ZhouYuhang宇航周
+ LiZeping泽平李
+ TianSiyu思雨田
+ NiYuchen雨琛倪
+ ZhangJian健张
+ LiuXiang响刘
+ YeGuangnan广楠叶
+ WuJie杰吴
+ ChaiHongfeng洪峰柴
+ 955–972
+ “大语言模型正逐渐被用于各种垂直领域,利用其广泛的知识储备来赋能领域中的多种场景。然而,各领域拥有多种待学习的特定任务,且多源异构的领域数据容易引发模型进行任务迁移时的冲突。基于此,本研究提出自适应语义空间学习框架,利用对语义空间内数据的自适应重分布,提升多专家模型的性能及选择效果,并基于此框架训练了一个金融多任务大模型“银瞳”。研究结果表明,我们的框架只需利用10%的数据就能达到接近全数据训练的效果,并拥有较强的泛化表现。”
+ 2024.ccl-1.74
+ zho
+ yuhang-etal-2024-yin
+
+
+ Enhancing Free-Form Table Question Answering Models by Distilling Relevant-Cell-Based Rationales
+ YangZhiyu
+ WangShuo
+ YanYukun
+ LiuPengyuan
+ YuDong
+ 973–985
+ “Free-form table question answering is a challenging task since tables contain structured contents compared to plain texts, which requires high-level reasoning abilities to effectively identify cells that are relevant to the question and produce a correct and faithful answer based on their relations. Large language models (LLMs) have exhibited remarkable reasoning capabilities in numerous NLP applications. However, in some specific tasks, specially-trained small models can still outperform LLMs. Furthermore, small models require extremely less computation costs compared to LLMs. To leverage the strengths of both types of models, we propose a Relevant-Cell-based Knowledge Distillation with inference-time Teacher Guidance (RCKD-TG) method. This approach aims to combine small free-form table question answering models’ abilities to learn from human annotations and large language models’ abilities to effectively reason from table contents, via applying Relevant-Cell-based rationales distilled from LLMs to small models’ training and inference stages. Our experiments demonstrate the superiority of our method over vanilla small models in correctness, faithfulness, adequacy and fluency, also over general LLMs in adhering to the style of human annotations. We achieve state-of-the-art performance on FeTaQA, a representative free-form table question answering benchmark. Our result of a 41.3 BLEU score demonstrates the feasibility of effectively using small models’ task-specific abilities and LLMs’ reasoning capabilities at the same time. Additionally, our method exhibits high computation efficiency and data efficiency. Compared to strong baselines, we achieve better performance with significantly less training data.”
+ 2024.ccl-1.75
+ eng
+ zhiyu-etal-2024-enhancing
+
+
+ Enhancing Sequence Representation for Personalized Search
+ WangShijun
+ ZhangHan
+ YuanZhe
+ 986–998
+ “The critical process of personalized search is to reorder candidate documents of the current query based on the user’s historical behavior sequence. There are many types of information contained in user historical information sequence, such as queries, documents, and clicks. Most existing personalized search approaches concatenate these types of information to get an overall user representation, but they ignore the associations among them. We believe the associations of different information mentioned above are significant to personalized search. Based on a hierarchical transformer as base architecture, we design three auxiliary tasks to capture the associations of different information in user behavior sequence. Under the guidance of mutual information, we adjust the training loss, enabling our PSMIM model to better enhance the information representation in personalized search. Experimental results demonstrate that our proposed method outperforms some personalized search methods.”
+ 2024.ccl-1.76
+ eng
+ shijun-etal-2024-enhancing
+
+
+ Joint Similarity Guidance Hash Coding Based on Adaptive Weight Mixing Strategy For Cross-Modal Retrieval
+ SunYaqi
+ YunJing
+ ZhuoqunMa
+ 999–1010
+ “There is a continuous and explosive growth of multimodal data. Efficient cross-modal hashing retrieval is of significant importance in conserving computational resources. To further enhance the attention to informative data within modalities and capture the semantic correlations in cross-modal data, we propose an enhanced deep Joint-Semantics Reconstructing Hashing algorithm, which is the Joint Similarity Guidance Hash Coding Based on Adaptive Weight Mixing Strategy (JSGHCA). The algorithm focuses on delving deeper into the correlations of the data in cross-modal. We introduce the adaptive weight mixing strategy to construct the semantic affinity matrix, so that the matrix can identify each modal data with specific weight in each batch. At the same time, in the process of the hash code generation, we introduce collaborative attention mechanism. It helps the model to pay more attention to the local information of each modality, thereby capturing the semantic features within each modality more accurately. Additionally, it enables the model to jointly process the attention across different modalities and extract shared semantic features more precisely. Experimental results show that the proposed model is significantly better than the deep joint semantic reconstruction hash algorithm on multiple benchmark datasets.”
+ 2024.ccl-1.77
+ eng
+ yaqi-etal-2024-joint
+
+
+ Generate-then-Revise: An Effective Synthetic Training Data Generation Framework For Event Detection Retrieval
+ DuHuidong
+ SunHao
+ LiuPengyuan
+ YuDong
+ 1011–1022
+ “Large language models (LLMs) struggle with event detection (ED) due to the structured and variable number of events in the output. Existing supervised approaches rely on a large amount of manually annotated corpora, facing challenges in practice when event types are diverse and the annotated data is scarce. We propose Generate-then-Revise (GtR), a framework that leverages LLMs in the opposite direction to address these challenges in ED. GtR utilizes an LLM to generate high-quality training data in three stages, including a novel data revision step to minimize noise in the synthetic data. The generated data is then used to train a smaller model for evaluation. Our approach demonstrates significant improvements on the low-resource ED. We further analyze the generated data, highlighting the potential of synthetic data generation for enhancing ED performance.”
+ 2024.ccl-1.78
+ eng
+ huidong-etal-2024-generate
+
+
+ E3: Optimizing Language Model Training for Translation via Enhancing Efficiency and Effectiveness
+ ChenLinqing
+ WangWeilei
+ HuDongyang
+ 1023–1034
+ “In the field of Natural Language Processing (NLP), Large-scale Language Models (LLMs) have demonstrated exceptional capabilities across a variety of tasks, including question answering, classification, and particularly, natural language understanding. The integration of neural machine translation with LLMs presents significant potential, transforming the paradigms of cross-lingual communication and information exchange. This study investigates the foundational aspects of LLMs’ translation abilities and identifies effective training methodologies to equip them with multilingual capacities. We specifically explore the optimal timing for introducing translation capabilities to LLMs via supervised tasks, considering the inherent bilingual nature of machine translation. Key questions explored include whether it is more beneficial to integrate multiple languages during the pre-training or supervised fine-tuning (SFT) stages, how variations in language ratios influence LLMs’ translation abilities, and whether longer or shorter texts are more effective for training these models. This research conducts a thorough investigation by training multiple LLMs from scratch with parameter scales in the billions and enhances the robustness of our findings by upgrading the language capabilities of pre-trained open-source models with parameter scales reaching tens of billions. The aim is to provide a detailed analysis that elucidates the complexities of augmenting machine translation capabilities within LLMs.”
+ 2024.ccl-1.79
+ eng
+ linqing-etal-2024-e3
+
+
+ Multi-features Enhanced Multi-task Learning for Vietnamese Treebank Conversion
+ ZhangZhenguo
+ LiuJianjian
+ YingLi
+ 1035–1046
+ “Pre-trained language representation-based dependency parsing models have achieved obvious improvements in rich-resource languages. However, these model performances depend on the quality and scale of training data significantly. Compared with Chinese and English, the scale of Vietnamese Dependency treebank is scarcity. Considering human annotation is labor-intensive and time-consuming, we propose a multi-features enhanced multi-task learning framework to convert all heterogeneous Vietnamese Treebanks to a unified one. On the one hand, we exploit Tree BiLSTM and pattern embedding to extract global and local dependency tree features from the source Treebank. On the other hand, we propose to integrate these features into a multi-task learning framework to use the source dependency parsing to assist the conversion processing. Experiments on the benchmark datasets show that our proposed model can effectively convert heterogeneous treebanks, thus further improving the Vietnamese dependency parsing accuracy by about 7.12 points in LAS.”
+ 2024.ccl-1.80
+ eng
+ zhenguo-etal-2024-multi
+
+
+ SimCLNMT: A Simple Contrastive Learning Method for Enhancing Neural Machine Translation Quality
+ XuMenglong
+ ZhangYanliang
+ 1047–1058
+ “Neural Machine Translation (NMT) models are typically trained using Maximum Likelihood Estimation (MLE). However, this approach has a limitation: while it might select the best word for the immediate context, it does not generally optimize for the entire sentence. To mitigate this issue, we propose a simple yet effective training method called SimCLNMT. This method is designed to select words that fit well in the immediate context and also enhance the overall translation quality over time. During training, SimCLNMT scores multiple system-generated (candidate) translations using the logarithm of conditional probabilities. It then employs a ranking loss function to learn and adjust these probabilities to align with the corresponding quality scores. Our experimental results demonstrate that SimCLNMT consistently outperforms traditional MLE training on both the NIST English-Chinese and WMT’14 English-German datasets. Further analysis also indicates that the translations generated by our model are more closely aligned with the corresponding quality scores. We release our code at https://github.com/chaos130/fairseq_SimCLNMT.”
+ 2024.ccl-1.81
+ eng
+ menglong-yanliang-2024-simclnmt
+
+
+ Translate-and-Revise: Boosting Large Language Models for Constrained Translation
+ HuangPengcheng
+ MuYongyu
+ WuYuzhang
+ LiBei
+ XiaoChunyang
+ XiaoTong
+ JingboZhu
+ 1059–1074
+ “Imposing constraints on machine translation systems presents a challenging issue because these systems are not trained to make use of constraints in generating adequate, fluent translations. In this paper, we leverage the capabilities of large language models (LLMs) for constrained translation, given that LLMs can easily adapt to this task by taking translation instructions and constraints as prompts. However, LLMs cannot always guarantee the adequacy of translation, and, in some cases, ignore the given constraints. This is in part because LLMs might be overly confident in their predictions, overriding the influence of the constraints. To overcome this overriding behaviour, we propose to add a revision process that encourages LLMs to correct the outputs by prompting them about the constraints that have not yet been met. We evaluate our approach on four constrained translation tasks, encompassing both lexical and structural constraints in multiple constraint domains. Experiments show 15% improvement in constraint-based translation accuracy over standard LLMs and the approach also significantly outperforms neural machine translation (NMT) state-of-the-art methods.”
+ 2024.ccl-1.82
+ eng
+ pengcheng-etal-2024-translate
+
+
+ A Multi-Task Biomedical Named Entity Recognition Method Based on Data Augmentation
+ ZhaoHui
+ ZhaoDi
+ MengJiana
+ LiuShuang
+ LinHongfei
+ 1075–1086
+ “The rapid development of artificial intelligence has led to an explosion of literature in the biomedical field, and Biomedical Named Entity Recognition (BioNER) can quickly and accurately identify key information from unstructured text. This task has become an important topic in promoting the rapid development of intelligence in the biomedical field. However, in Named Entity Recognition (NER) for the biomedical field, there are persistent problems of unclear boundary recognition, underutilization of hierarchical information in sentences, and scarcity of training data resources. Based on this, this paper proposes a multi-task BioNER model based on data augmentation, using four data augmentation methods to increase the training data: Mention Replacement (MR), Label-wise Token Replacement (LwTR), Shuffle Within Segments (SiS), and Synonym Replacement (SR). Syntactic information is extracted by feeding the input sentence into a Graph Convolutional Network (GCN), and the tag information encoded by BERT is then combined with it through a co-attention mechanism to obtain an interaction matrix. Subsequently, NER is performed through boundary detection and span classification tasks. Comparative experiments with other methods are conducted on the BC5CDR and JNLPBA datasets, as well as the CCKS2017 dataset. The experimental results demonstrate the effectiveness of the proposed model.”
+ 2024.ccl-1.83
+ eng
+ hui-etal-2024-multi
+
+
+ Biomedical Event Causal Relation Extraction by Reasoning Optimal Entity Relation Path
+ LiLishuang
+ MiLiteng
+ ZhangBeibei
+ XiangYi
+ FengYubo
+ QinXueyang
+ TangJingyao
+ 1087–1098
+ “Biomedical Event Causal Relation Extraction (BECRE) is an important task in biomedical information extraction. Existing methods usually use pre-trained language models to learn semantic representations and then predict the event causal relation. However, these methods struggle to capture sufficient cues in biomedical texts for predicting causal relations. In this paper, we propose a Path Reasoning-based Relation-aware Network (PRRN) to explore deeper cues for causal relations using reinforcement learning. Specifically, our model reasons over the relation paths between entity arguments of two events, namely entity relation paths, which connect the two biomedical events through multi-hop interactions between entities to provide richer cues for predicting event causal relations. In PRRN, we design a path reasoning module based on reinforcement learning and propose a novel reward function to encourage the model to focus on the length and contextual relevance of entity relation paths. The experimental results on two datasets suggest that PRRN brings considerable improvements over the state-of-the-art models.”
+ 2024.ccl-1.84
+ eng
+ lishuang-etal-2024-biomedical
+
+
+ Joint Entity and Relation Extraction Based on Bidirectional Update and Long-Term Memory Gate Mechanism
+ QianYili
+ RenEnlong
+ XuHaonan
+ 1099–1111
+ “Joint entity recognition and relation extraction are important tasks in natural language processing. While some previous work has recognized the importance of relation information in joint extraction, excessively focusing on relation information without utilizing entity information may lead to information loss and affect the identification of relation tuples. Additionally, ignoring the utilization of the original information may result in the loss of hierarchical and semantic information, further reducing the richness of information. To address these issues, we propose a bidirectional information updating mechanism that integrates entity and relation information, iteratively fusing fine-grained information about entities and relations. We introduce a long-term memory gate mechanism to update and utilize the original information using feature information, thereby enhancing the model’s ability for entity recognition and relation extraction. We evaluated our approach on two Chinese datasets and achieved state-of-the-art results.”
+ 2024.ccl-1.85
+ eng
+ yili-etal-2024-joint
+
+
+ MFE-NER: Multi-feature Fusion Embedding for Chinese Named Entity Recognition
+ LiJiatong
+ MengKui
+ 1112–1122
+ “In Chinese Named Entity Recognition, character substitution is a complicated linguistic phenomenon. Some Chinese characters are quite similar as they share the same components or have similar pronunciations. People replace characters in a named entity with similar characters to generate a new collocation that still refers to the same object. As a result, this always leads to unrecognizable or mislabeling errors in the NER task. In this paper, we propose a lightweight method, MFE-NER, which fuses glyph and phonetic features to help pre-trained language models handle the character substitution problem in the NER task with limited extra cost. Basically, in the glyph domain, we disassemble Chinese characters into Five-Stroke components to represent structure features. In the phonetic domain, an improved phonetic system is proposed in our work, making it reasonable to describe phonetic similarity among Chinese characters. Experiments demonstrate that our method performs especially well in detecting character substitutions while slightly improving the overall performance of Chinese NER.”
+ 2024.ccl-1.86
+ eng
+ jiatong-kui-2024-mfe
+
+
+ UDAA: An Unsupervised Domain Adaptation Adversarial Learning Framework for Zero-Resource Cross-Domain Named Entity Recognition
+ LiBaofeng
+ TangJianguo
+ QinYu
+ XuYuelou
+ LuYan
+ WangKai
+ LiLei
+ ZhouYanquan
+ 1123–1135
+ “The zero-resource cross-domain named entity recognition (NER) task aims to perform NER in a specific domain where labeled data is unavailable. Existing methods primarily focus on transferring NER knowledge from high-resource to zero-resource domains. However, the challenge lies in effectively transferring NER knowledge between domains due to the inherent differences in entity structures across domains. To tackle this challenge, we propose an Unsupervised Domain Adaptation Adversarial (UDAA) framework, which combines a masked language model auxiliary task with a domain adaptive adversarial network to mitigate inter-domain differences and efficiently facilitate knowledge transfer. Experimental results on three datasets, CBS, Twitter, and WNUT2016, demonstrate the effectiveness of our framework. Notably, we achieved new state-of-the-art performance on all three datasets. Our code will be released.”
+ 2024.ccl-1.87
+ eng
+ baofeng-etal-2024-udaa
+
+
+ Triple-view Event Hierarchy Model for Biomedical Event Representation
+ HuangJiayi
+ LiLishuang
+ QinXueyang
+ XiangYi
+ LiJiaqi
+ FengYubo
+ 1136–1147
+ “Biomedical event representation can be applied to various language tasks. A biomedical event often involves multiple biomedical entities and trigger words, and the event structure is complex. However, existing research on event representation mainly focuses on the general domain. If models from the general domain are directly transferred to biomedical event representation, the results may not be satisfactory. We argue that biomedical events can be divided into three hierarchies, each containing unique feature information. Therefore, we propose the Triple-view Event Hierarchy Model (TEHM) to enhance the quality of biomedical event representation. TEHM extracts feature information from three different views and integrates them. Specifically, due to the complexity of biomedical events, we propose a Trigger-aware Aggregator module to handle complex units within biomedical events. Additionally, we annotate two similarity task datasets in the biomedical domain using annotation standards from the general domain. Extensive experiments demonstrate that TEHM achieves state-of-the-art performance on biomedical similarity tasks and biomedical event causal relation extraction.”
+ 2024.ccl-1.88
+ eng
+ jiayi-etal-2024-triple
+
+
+ DialectMoE: An End-to-End Multi-Dialect Speech Recognition Model with Mixture-of-Experts
+ ZhouJie
+ GaoShengxiang
+ YuZhengtao
+ DongLing
+ WangWenjun
+ 1148–1159
+ “Dialect speech recognition has always been one of the challenges in Automatic Speech Recognition (ASR) systems. While many ASR systems perform well on Mandarin, their performance drops significantly when handling dialect speech. This is mainly due to the obvious differences between dialects and Mandarin in pronunciation and the data scarcity of dialect speech. In this paper, we propose DialectMoE, a Chinese multi-dialect speech recognition model based on Mixture-of-Experts (MoE) for low-resource conditions. Specifically, DialectMoE assigns input sequences to a set of experts using a dynamic routing algorithm, with each expert potentially trained for a specific dialect. Subsequently, the outputs of these experts are combined to derive the final output. Due to the similarities among dialects, distinct experts may offer assistance in recognizing other dialects as well. Experimental results on the Datatang dialect public dataset show that, compared with the baseline model, DialectMoE reduces the Character Error Rate (CER) for the Sichuan, Yunnan, Hubei, and Henan dialects by 23.6%, 32.6%, 39.2%, and 35.09%, respectively. The proposed DialectMoE model demonstrates outstanding performance in multi-dialect speech recognition.”
+ 2024.ccl-1.89
+ eng
+ jie-etal-2024-dialectmoe
+
+
+ Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms
+ ZhangChuYuan
+ YiJiangyan
+ TaoJianhua
+ WangChenglong
+ YanXinrui
+ 1160–1171
+ “Recent advancements in neural speech synthesis technologies have brought about widespread applications but have also raised concerns about potential misuse and abuse. Addressing these challenges is crucial, particularly in the realms of forensics and intellectual property protection. While previous research on source attribution of synthesized speech has its limitations, our study aims to fill these gaps by investigating the identification of sources in synthesized speech. We focus on analyzing speech synthesis model fingerprints in generated speech waveforms, emphasizing the roles of the acoustic model and vocoder. Our research, based on the multi-speaker LibriTTS dataset, reveals two key insights: (1) both vocoders and acoustic models leave distinct, model-specific fingerprints on generated waveforms, and (2) vocoder fingerprints, being more dominant, may obscure those from the acoustic model. These findings underscore the presence of model-specific fingerprints in both components, suggesting their potential significance in source identification applications.”
+ 2024.ccl-1.90
+ eng
+ chuyuan-etal-2024-distinguishing
+
+
+ Knowledge Graph-Enhanced Recommendation with Box Embeddings
+ LiangQiuyu
+ WangWeihua
+ LvLei
+ BaoFeilong
+ 1172–1182
+ “Knowledge graphs are used to alleviate the problems of data sparsity and cold starts in recommendation systems. However, most existing approaches ignore the hierarchical structure of the knowledge graph. In this paper, we propose a box embedding method for knowledge graph-enhanced recommendation systems. Specifically, the box embedding represents not only the interaction between the user and the item, but also the head entity, the tail entity, and the relation between them in the knowledge graph. The interaction between the item and the corresponding entity is then calculated by a multi-task attention unit. Experimental results show that our method provides a large improvement over previous models in terms of Area Under Curve (AUC) and accuracy on publicly available recommendation datasets from three different domains.”
+ 2024.ccl-1.91
+ eng
+ qiuyu-etal-2024-knowledge
+
+
+ Readability-guided Idiom-aware Sentence Simplification (RISS) for Chinese
+ ZhangJingshen
+ ChenXinglu
+ QiuXinying
+ WangZhimin
+ FengWenhe
+ 1183–1200
+ “Chinese sentence simplification faces challenges due to the lack of large-scale labeled parallel corpora and the prevalence of idioms. To address these challenges, we propose Readability-guided Idiom-aware Sentence Simplification (RISS), a novel framework that combines data augmentation techniques. RISS introduces two key components: (1) Readability-guided Paraphrase Selection (RPS), a method for mining high-quality sentence pairs, and (2) Idiom-aware Simplification (IAS), a model that enhances the comprehension and simplification of idiomatic expressions. By integrating RPS and IAS using multi-stage and multi-task learning strategies, RISS outperforms previous state-of-the-art methods on two Chinese sentence simplification datasets. Furthermore, RISS achieves additional improvements when fine-tuned on a small labeled dataset. Our approach demonstrates the potential for more effective and accessible Chinese text simplification.”
+ 2024.ccl-1.92
+ eng
+ jingshen-etal-2024-readability
+
+
+ A Tone-based Hierarchical Structure of Chinese Prosody
+ LiYa
+ 1201–1211
+ “In Chinese speech engineering, many projects use a conventional, syllable-based prosodic hierarchy as an underlying framework to process natural or synthesized speech. However, Chinese as a tone language has its own way of expressing prosody, that is, through tonal interaction, especially tone sandhi. By utilizing the capacity of tone as a dual unit of pitch and timing, the present study proposes a tone-based, three-layer, four-level structure for Chinese prosody. The three layers are tone, tone prosody, and intonation, respectively composed of one level of pitch units, two levels of tone prosody units (basic and derived), and one level of intonation units. These four levels of units are used to replace the syllable, prosodic word, phonological phrase, and intonational phrase of a conventional hierarchy. Tone prosody units are established based on sizes or types of tone sandhi domains, so when applied to the same clause uttered in Mandarin and Shanghai Wu Chinese, they are timed differently and branch toward different directions at different levels, hence capable of capturing the rhythmic and melodic patterns of the two distinctive types of Chinese. Overall, given its theory-friendly design, the proposed structure may be used as a unifying framework in Chinese speech engineering.”
+ 2024.ccl-1.93
+ eng
+ ya-2024-tone
+
+
+ Linguistic Guidance for Sequence-to-Sequence AMR Parsing
+ TangBinghao
+ LinBoda
+ LiSi
+ 1212–1222
+ “Abstract Meaning Representation (AMR) parsing aims at capturing the meaning of a sentence in the form of an AMR graph. Sequence-to-sequence (seq2seq)-based methods, utilizing powerful Encoder-Decoder pre-trained language models (PLMs), have shown promising performance. Subsequent works have further improved the utilization of AMR graph information for seq2seq models. However, seq2seq models generate the output sequence incrementally, so an inaccurate subsequence at the beginning can negatively impact the final output; moreover, the interconnection between other linguistic representation formats and AMR remains an underexplored domain in existing research. To mitigate the issue of error propagation and to investigate the guiding influence of other representation formats on PLMs, we propose a novel approach of Linguistic Guidance for Seq2seq AMR parsing (LGSA). Our proposed LGSA incorporates the very limited information of various linguistic representation formats as guidance on the Encoder side, which can effectively push PLMs further toward their potential and boost AMR parsing. The results on the well-known benchmarks AMR2.0 and AMR3.0 demonstrate the efficacy of LGSA, which can improve seq2seq AMR parsers without silver AMR data or alignment information. Moreover, we evaluate the generalization of LGSA by conducting experiments on out-of-domain datasets, and the results indicate that LGSA is effective even in such challenging scenarios.”
+ 2024.ccl-1.94
+ eng
+ binghao-etal-2024-linguistic
+
+
+ Automatic Construction of the English Sentence Pattern Structure Treebank for Chinese ESL learners
+ ZhuLin
+ XuMeng
+ GuoWenya
+ YuJingsi
+ YangLiner
+ CaoZehuang
+ HuangYuan
+ YangErhong
+ 1223–1238
+ “Analyzing long and complicated sentences has always been a priority and challenge in English learning. In order to parse these sentences for Chinese English as a Second Language (ESL) learners, we design the English Sentence Pattern Structure (ESPS) based on Sentence Diagramming theory. Then, we automatically construct the English Sentence Pattern Structure Treebank (ESPST) through a method of rule conversion based on constituency structure and evaluate the conversion results. In addition, we set up two comparative experiments, using a trained parser and large language models (LLMs). The results prove that the rule-based conversion approach is effective.”
+ 2024.ccl-1.95
+ eng
+ lin-etal-2024-automatic
+
+
+ Cost-efficient Crowdsourcing for Span-based Sequence Labeling: Worker Selection and Data Augmentation
+ WangYujie
+ HuangChao
+ YangLiner
+ FangZhixuan
+ HuangYaping
+ LiuYang
+ YuJingsi
+ YangErhong
+ 1239–1256
+ “This paper introduces a novel crowdsourcing worker selection algorithm, enhancing annotation quality and reducing costs. Unlike previous studies targeting simpler tasks, this study contends with the complexities of label interdependencies in sequence labeling. The proposed algorithm utilizes a Combinatorial Multi-Armed Bandit (CMAB) approach for worker selection, and a cost-effective human feedback mechanism. The challenge of dealing with imbalanced and small-scale datasets, which hinders offline simulation of worker selection, is tackled using an innovative data augmentation method termed shifting, expanding, and shrinking (SES). Rigorous testing on the CoNLL 2003 NER and Chinese OEI datasets showcased the algorithm’s efficiency, with an increase in F1 score up to 100.04% of the expert-only baseline, alongside cost savings up to 65.97%. The paper also encompasses a dataset-independent test emulating annotation evaluation through a Bernoulli distribution, which still led to an impressive 97.56% of the expert baseline F1 score and 59.88% cost savings. Furthermore, our approach can be seamlessly integrated into Reinforcement Learning from Human Feedback (RLHF) systems, offering a cost-effective solution for obtaining human feedback. All resources, including source code and datasets, are available to the broader research community at https://github.com/blcuicall/nlp-crowdsourcing.”
+ 2024.ccl-1.96
+ eng
+ yujie-etal-2024-cost
+
+
+ DLUE: Benchmarking Document Language Understanding
+ XuRuoxi
+ LinHongyu
+ GuanXinyan
+ SunYingfei
+ SunLe
+ 1257–1269
+ “Understanding documents is central to many real-world tasks but remains a challenging topic. Unfortunately, there is no well-established consensus on how to comprehensively evaluate document understanding abilities, which significantly hinders fair comparison and measurement of the field’s progress. To benchmark document understanding research, this paper summarizes four representative abilities, i.e., document classification, document structural analysis, document information extraction, and document transcription. Under the new evaluation framework, we propose the Document Language Understanding Evaluation (DLUE), a new task suite which covers a wide range of tasks in various forms, domains, and document genres. We also systematically evaluate six well-established transformer models and representative LLMs on DLUE, and find that due to lengthy content, complicated underlying structure, and dispersed knowledge, document understanding is still far from solved in complex real-world scenarios.”
+ 2024.ccl-1.97
+ eng
+ ruoxi-etal-2024-dlue
+
+
+ Do Large Language Models Understand Conversational Implicature - A case study with a Chinese sitcom
+ YueShisen
+ SongSiyuan
+ ChengXinyuan
+ HuHai
+ 1270–1285
+ “Understanding the non-literal meaning of an utterance is critical for large language models (LLMs) to become human-like social communicators. In this work, we introduce SwordsmanImp, the first Chinese multi-turn-dialogue-based dataset aimed at conversational implicature, sourced from dialogues in the Chinese sitcom My Own Swordsman. It includes 200 carefully handcrafted questions, all annotated for which Gricean maxims have been violated. We test eight closed-source and open-source LLMs on two tasks: a multiple-choice question task and an implicature explanation task. Our results show that GPT-4 attains human-level accuracy (94%) on the multiple-choice questions. CausalLM follows GPT-4 with an accuracy of 78.5%. Other models, including GPT-3.5 and several open-source models, demonstrate lower accuracy, ranging from 20% to 60%, on the multiple-choice questions. Human raters were asked to rate the explanations of the implicatures generated by the LLMs on their reasonability, logic, and fluency. While all models generate largely fluent and self-consistent text, their explanations score low on reasonability except for GPT-4, suggesting that most LLMs cannot produce satisfactory explanations of the implicatures in the conversation. Moreover, we find that LLMs’ performance does not vary significantly by Gricean maxim, suggesting that LLMs do not seem to process implicatures derived from different maxims differently. Our data and code are available at https://github.com/sjtu-compling/llm-pragmatics.”
+ 2024.ccl-1.98
+ eng
+ shisen-etal-2024-large
+
+
+ EmoFake: An Initial Dataset for Emotion Fake Audio Detection
+ ZhaoYan
+ YiJiangyan
+ TaoJianhua
+ WangChenglong
+ DongYongfeng
+ 1286–1297
+ “To enhance the effectiveness of fake audio detection techniques, researchers have developed multiple datasets such as those for the ASVspoof and ADD challenges. These datasets typically focus on capturing non-emotional characteristics in speech, such as the identity of the speaker and the authenticity of the content. However, they often overlook changes in the emotional state of the audio, which is another crucial dimension affecting the authenticity of speech. Therefore, this study reports our progress in developing EmoFake, an emotion fake audio detection dataset involving changes to the emotional state of the original audio. The audio samples in EmoFake are generated using open-source emotional voice conversion models, intended to simulate potential emotional tampering scenarios in real-world settings. We conducted a series of benchmark experiments on this dataset, and the results show that even advanced fake audio detection models trained on the ASVspoof 2019 LA dataset and the ADD 2022 track 3.2 dataset face challenges with EmoFake. EmoFake is publicly available now.”
+ 2024.ccl-1.99
+ eng
+ yan-etal-2024-emofake
+
+
+ Going Beyond Passages: Readability Assessment for Book-level Long Texts
+ LiWenbiao
+ SunRui
+ ZhangTianyi
+ WuYunfang
+ 1298–1309
+ “Readability assessment for book-level long texts is widely needed in real educational applications. However, most current research focuses on passage-level readability assessment, and little work has been done to process ultra-long texts. In order to better process the long sequences of book texts and to enhance pretrained models with difficulty knowledge, we propose a novel model DSDR, with difficulty-aware segment pre-training and difficulty multi-view representation. Specifically, we split all books into multiple fixed-length segments and employ unsupervised clustering to obtain difficulty-aware segments, which are used to re-train the pretrained model to learn difficulty knowledge. Accordingly, a long text is represented by averaging multiple vectors of segments with varying difficulty levels. We construct a new dataset of Graded Children’s Books to evaluate model performance. Our proposed model achieves promising results, outperforming both the traditional SVM classifier and several popular pretrained models. In addition, our work establishes a new prototype for book-level readability assessment, which provides an important benchmark for related research in future work.”
+ 2024.ccl-1.100
+ eng
+ wenbiao-etal-2024-going
+
+
+ Mitigating the Bias of Large Language Model Evaluation
+ ZhouHongli
+ HuangHui
+ LongYunfei
+ XuBing
+ ZhuConghui
+ CaoHailong
+ YangMuyun
+ ZhaoTiejun
+ 1310–1319
+ “Recently, there has been a trend of evaluating Large Language Model (LLM) quality in the flavor of LLM-as-a-Judge, namely leveraging another LLM to evaluate the current output quality. However, existing judges are proven to be biased: they favor answers which present better superficial quality (such as verbosity and fluency) while ignoring instruction-following ability. In this work, we present a systematic study of the bias of LLM-as-a-Judge. Specifically, for closed-source judge models, we apply calibration to mitigate the significance of superficial quality, at both the probability level and the prompt level. For open-source judge models, we propose to mitigate the bias by contrastive training, with curated negative samples that deviate from the instruction but present better superficial quality. We apply our methods on a bias evaluation benchmark, and experimental results show that our methods mitigate the bias by a large margin while maintaining satisfactory evaluation accuracy.”
+ 2024.ccl-1.101
+ eng
+ hongli-etal-2024-mitigating
+
+
+ PPDAC: A Plug-and-Play Data Augmentation Component for Few-shot Extractive Question Answering
+ HuangQi
+ FuHan
+ LuoWenbin
+ WangMingwen
+ LuoKaiwei
+ 1320–1333
+ “Extractive Question Answering (EQA) in the few-shot learning scenario is one of the most challenging tasks in Machine Reading Comprehension (MRC). Some previous works employ external knowledge for data augmentation to improve the performance of few-shot extractive question answering. However, external knowledge, and the language- and domain-specific NLP tools needed to process it, such as part-of-speech taggers, syntactic parsers, and named-entity recognizers, are not always available. In this paper, we present a novel Plug-and-Play Data Augmentation Component (PPDAC) for few-shot extractive question answering, which includes a paraphrase generator and a paraphrase selector. Specifically, we generate multiple paraphrases of the question in the (question, passage, answer) triples using the paraphrase generator and then obtain highly similar statements via the paraphrase selector to form more training data for fine-tuning. Extensive experiments on multiple EQA datasets show that our proposed plug-and-play data augmentation component significantly improves question-answering performance and consistently outperforms state-of-the-art approaches in few-shot settings by a large margin.”
+ 2024.ccl-1.102
+ eng
+ qi-etal-2024-ppdac
+
+
+ Sentence-Space Metrics (SSM) for the Evaluation of Sentence Comprehension
+ LinJieyu
+ ChenHonghua
+ DingNai
+ 1334–1350
+ “It is a fundamental challenge to evaluate whether a model can truly capture the meaning of sentences. Evaluating whether a model captures the meaning of individual words, however, can be effectively achieved by analyzing whether the model encodes words in a vector space where semantically similar words form clusters. Inspired by this approach, we propose the Sentence-Space Metrics (SSM) to evaluate model interpretation of sentences; the sentence space is constructed based on the pairwise entailment relationships between all sentence pairs within a sentence pool. We use three metrics to evaluate a sentence space, i.e., (1) sparsity, (2) clustering of related sentences, and (3) similarity with the sentence space measured from humans. The SSM is applied to evaluate 20 models, including ChatGPT, 18 BERT-family models fine-tuned for the Natural Language Inference (NLI) task, as well as SimCSE, a sentence representation model. The SSM reveals dramatic differences among models: although all models achieve high accuracy on standard NLI datasets such as MNLI, none of them mirrors the human behavior under the SSM. These results demonstrate that, compared with traditional accuracy measures, the SSM considers pairwise relationships between hundreds of sentences and therefore provides a more fine-grained evaluation of model interpretation of sentences.”
+ 2024.ccl-1.103
+ eng
+ jieyu-etal-2024-sentence
+
+
+ AuditWen: An Open-Source Large Language Model for Audit
+ HuangJiajia
+ ZhuHaoran
+ XuChao
+ ZhanTianming
+ XieQianqian
+ HuangJimin
+ 1351–1365
+ “Intelligent auditing represents a crucial advancement in modern audit practices, enhancing both the quality and efficiency of audits within the realm of artificial intelligence. With the rise of large language models (LLMs), there is enormous potential for intelligent models to contribute to the audit domain. However, general LLMs applied in the audit domain face the challenges of lacking specialized knowledge and the presence of data biases. To overcome these challenges, this study introduces AuditWen, an open-source audit LLM built by fine-tuning Qwen on instruction data constructed from the audit domain. We first outline the application scenarios for LLMs in auditing and extract the requirements that shape the development of LLMs tailored for audit purposes. We then build AuditWen by fine-tuning Qwen on a 30k-instruction dataset constructed from 15 audit tasks across 3 layers. For evaluation, we propose a benchmark of 5k instructions that covers a set of critical audit tasks derived from the application scenarios. With this benchmark, we compare AuditWen with other existing LLMs on information extraction, question answering, and document generation. The experimental results demonstrate the superior performance of AuditWen in both question understanding and answer generation, making it an immediately valuable tool for auditing.”
+ 2024.ccl-1.104
+ eng
+ jiajia-etal-2024-auditwen
+
+
+ Chinese Grammatical Error Correction via Large Language Model Guided Optimization Training
+ LiuXiao
+ LiYing
+ YuZhengtao
+ 1366–1380
+ “Pre-trained language model-based methods for Chinese Grammatical Error Correction (CGEC) are categorized into Seq2Seq and Seq2Edit types. However, both Seq2Seq and Seq2Edit models depend significantly on high-quality training data. Considering the strong generation and inference abilities of large language models (LLMs), we propose an LLM-guided optimization training method that exploits LLMs to extract error knowledge and optimize the traditional CGEC model training process. On the one hand, we use error types and confusion sets as extra knowledge to guide LLMs to generate diverse pseudo data, thus extending the error distribution of our training data. On the other hand, LLMs are utilized to infer the predicted results from our CGEC models and obtain the re-training data, thus iteratively optimizing our pre-trained CGEC models. Experiments on two benchmark datasets show that our LLM-guided optimization method with small-scale training data can achieve comparable results to baseline models trained with large-scale training data. Detailed comparison experiments demonstrate that both the early devised pseudo data and the later re-training data are extremely useful for traditional CGEC model optimization training, and can benefit from each other. We will release our code and prompts at https://github.com/SakuraAcedia/llm-cgec-got to facilitate future work.”
+ 2024.ccl-1.105
+ eng
+ xiao-etal-2024-chinese
+
+
+ Pattern Shifting or Knowledge Losing? A Forgetting Perspective for Understanding the Effect of Instruction Fine-Tuning
+ ZhangChunkang
+ CaoBoxi
+ LuYaojie
+ LinHongyu
+ CaoLiu
+ ZengKe
+ WanGuanglu
+ CaiXunliang
+ HanXianpei
+ SunLe
+ 1381–1394
+ “Instruction Fine-Tuning (IFT) emerges as an essential step in training large language models to robustly carry out tasks of interest. However, there is a lack of systematic investigation into the underlying mechanisms of instruction fine-tuning, particularly the forgetting phenomenon after IFT, known as alignment tax. Therefore, to understand the mechanism of IFT from the forgetting perspective, we investigate the alteration of text patterns and knowledge within models throughout the entire IFT process. Specifically, we restore fine-tuned models to their base version by training them on data sharing a similar distribution with the pre-training corpus, and compare their results. Our experiments indicate that there is a stage transition of forgetting during the IFT process: (1) Pseudo Forgetting: in this stage, models mainly shift their familiar text patterns away from the pre-training data format while the world knowledge is preserved. Consequently, models recover their original performance when they are restored to the base version. (2) Actual Forgetting: in this stage, models forget the acquired knowledge as well. Therefore, they fail to reach the original performance even if they are restored to the base version.”
+ 2024.ccl-1.106
+ eng
+ chunkang-etal-2024-pattern
+
+
+ Prior Constraints-based Reward Model Training for Aligning Large Language Models
+ ZhouHang
+ WangChenglong
+ HuYimin
+ XiaoTong
+ ZhangChunliang
+ ZhuJingbo
+ 1395–1407
+ “Reinforcement learning with human feedback for aligning large language models (LLMs) trains a reward model, typically using ranking loss with comparison pairs. However, the training procedure suffers from an inherent problem: the uncontrolled scaling of reward scores during reinforcement learning due to the lack of constraints while training the reward model. This paper proposes a Prior Constraints-based Reward Model (PCRM) training method to mitigate this problem. PCRM incorporates prior constraints (specifically, the length ratio and cosine similarity between outputs of each comparison pair) during reward model training to regulate optimization magnitude and control score margins. We comprehensively evaluate PCRM by examining its rank correlation with human preferences and its effectiveness in aligning LLMs via RL. Experimental results demonstrate that PCRM significantly improves alignment performance by effectively constraining reward score scaling. As another bonus, our method can be easily integrated into arbitrary rank-based alignment methods, such as direct preference optimization, and yields consistent improvements. The code is available at https://github.com/wangclnlp/DeepSpeed-Chat-Extension/tree/PCRM.”
+ 2024.ccl-1.107
+ eng
+ hang-etal-2024-prior
+
+
+ Prompt Engineering 101: Prompt Engineering Guidelines from a Linguistic Perspective
+ HanWenjuan
+ WeiXiang
+ CuiXingyu
+ ChengNing
+ JiangGuangyuan
+ QianWeinan
+ ZhangChi
+ 1408–1426
+ “Deploying tuning-free prompting is challenging in engineering practice: it not only requires users to engage in cumbersome trial and error but is also extremely time-consuming, as even a slight change in wording and phrasing can have a huge impact on the final performance. To further investigate the impact of different prompts, in this work we perform a systematic inspection of four linguistic factors involved in prompt engineering: syntax, semantics, lexicon, and pragmatics. The empirical results quantify the sensitivity of the output to small textual perturbations in these four linguistic factors of prompts. Based on the analysis of these four factors, we present a series of design guidelines to help human users write effective prompts. Human evaluation on amateurs shows that using the proposed guidelines helps humans produce prompts with significant gains in zero-shot performance with Pre-trained Language Models (PLMs), and hence validates the utility of the guidelines.”
+ 2024.ccl-1.108
+ eng
+ wenjuan-etal-2024-prompt
+
+
+
+
+ Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 2: Frontier Forum)
+ XinZhao
+ Chinese Information Processing Society of China
+ Taiyuan, China
+ July
+ 2024
+ 2024.ccl-2
+ ccl
+
+
+ 从多模态预训练到多模态大模型:架构、训练、评测、趋势概览(From Multi-Modal Pre-Training to Multi-Modal Large Language Models: An Overview of Architectures, Training, Evaluation, and Trends)
+ LiZejun泽君李
+ ZhangJiwen霁雯张
+ WangYe晔王
+ DuMengfei梦飞杜
+ LiuQingwen晴雯刘
+ WangDianyi殿仪王
+ WuBinhao斌浩吴
+ LuoRuipu瑞璞罗
+ HuangXuanjing萱菁黄
+ WeiZhongyu忠钰魏
+ 1–33
+ “多媒体信息在人类社会的发展历程中有着至关重要的作用,构建具有多模态信息处理能力的智能系统也是通往通用人工智能的必经之路。随着预训练技术的发展以及对于通用模型的需求,多模态的研究也从早期的任务特定的方法转移到了构建统一泛用的多模态基座模型上。初步的统一多模态模型探索受到BERT启发,从表征学习的角度出发构建能为不同下游任务提供有效初始化的多模态预训练模型,这类方法尽管有效但仍然在泛用性方面受限于预训练中微调范式,无法更广泛高效地应用。近年来随着大语言模型的发展,以大语言模型为基座的多模态大模型则展现出了巨大的潜力:此类模型有着强大的信息感知,交互,以及推理能力并且能有效泛化到多样的场景下,为新时代的通用人工智能系统提供了切实可行的思路。本文将从构建统一多模态模型的角度出发,介绍和梳理相关工作的发展,从多模态预训练到多模态大模型,介绍对应的架构,训练,评测方法以及发展趋势,为读者提供一个全面的概览。”
+ 2024.ccl-2.1
+ zho
+ zejun-etal-2024-cong
+
+
+ 大模型工具学习进展与挑战(Challenges and Advances in Tool Learning with Foundation Models)
+ LinYankai衍凯林
+ 34–47
+ “本论文综述了大模型工具学习的最新进展与挑战。工具作为人类智慧和能力的延伸,在提升生产力和解决问题方面至关重要。随着大语言模型(Large Language Models)的突破,工具学习得到了广泛关注,通过动态调用外部工具,显著增强了模型解决复杂问题的能力。本文介绍了一个通用的大模型工具学习框架,包括控制器、工具集、环境和感知器四个核心组件。我们详细探讨了四个关键问题:意图理解、规划、工具使用和记忆管理。在意图理解方面,模型需要准确解析用户的输入和隐含意图。规划能力使模型能够将复杂任务分解为可执行的子任务。工具使用方面,介绍了示范学习、教程学习和探索学习三种主要训练策略,通过观察人类示范、阅读工具手册和直接探索来提升模型能力。记忆管理方面,提出了动态记忆管理和信息优先级管理等方法,以提高模型处理复杂任务的效率和准确性。本文分析了当前大模型工具学习的研究进展和每个领域的挑战,为未来研究提供了有价值的见解。希望通过这篇综述,能帮助研究人员和开发者更好地理解和推进大模型工具学习领域的发展。”
+ 2024.ccl-2.2
+ zho
+ yankai-2024-da
+
+
+ 大模型逻辑推理研究综述(Survey on Logical Reasoning of Large Pre-trained Language Models)
+ LiuHanmeng汉蒙刘
+ ZhangYue岳张
+ 48–62
+ “理解自然语言的逻辑结构和关系是机器理解的核心任务,也是人工智能领域的关键研究议题。随着大数据和计算能力的提升,预训练语言模型在逻辑推理方面取得了显著进展,使得大规模模型的逻辑推理能力成为研究的新焦点。本综述旨在全面梳理大模型在逻辑推理领域的研究进展,探讨其对人工智能系统智能水平评估的重要性及其在推动人工智能发展中的作用。本文首先界定了大模型逻辑推理能力的研究范畴,系统性地讨论了逻辑推理的类型和特点,并回顾了相关理论的发展,为研究提供了清晰的框架。接着,从任务形式和数据基准的角度,详细介绍了逻辑推理研究的基础工作,为理解大模型的性能提供了基准。进一步,本文深入分析了大模型在逻辑推理能力上的现状,通过不同推理类型的案例研究,展示了大模型的能力表现。同时,本文还探讨了提升大模型逻辑推理能力的方法,包括预训练、指令微调、解码策略和神经符号混合方法,并对这些方法进行了比较分析。最后,本文提出了对未来研究方向的展望,旨在激发更多的学术讨论和探索,推动逻辑推理能力研究的进一步发展。”
+ 2024.ccl-2.3
+ zho
+ hanmeng-yue-2024-da
+
+
+ 大模型时代的多语言研究综述(A Survey of Multilingual Research in the Large Language Model Era)
+ GaoChangjiang长江高
+ ZhouHao昊周
+ SheShuaijie帅杰佘
+ ZhongHaoming昊鸣钟
+ LiuSizhe斯哲刘
+ LaiZhejian哲剑赖
+ WangZhijun志军王
+ HuangShujian书剑黄
+ 63–85
+ “进入大语言模型时代以来,传统的多语言研究模式发生了巨大变化。一些传统任务得到了突破性的解决,也出现了多种新任务,以及许多以多语言大模型为基础、面向大模型能力提升的多语言研究工作。本文针对研究领域中的这一新变化,整理归纳了进入大模型时代以来的多语言研究进展,包括多语言大模型、数据集、任务,以及相关的前沿研究方向、研究挑战等,希望能为大模型范式下的多语言研究的未来发展提供参考和帮助。”
+ 2024.ccl-2.4
+ zho
+ changjiang-etal-2024-da
+
+
+ 大语言模型合成数据方法简述(A Brief Introduction to Synthetic Data for Large Language Model)
+ LiPeiji培基李
+ MaYichuan逸川马
+ YanHang航颜
+ 86–97
+ “大语言模型在过去两年受到了极大的关注,并引起了对通用人工智能的广泛讨论。为了实现通用人工智能,合成数据被认为是其中非常关键的一环。本文将当前常见的数据合成方法归为三类,基于蒸馏的合成数据、基于模型自我进化、基于工具的合成数据。针对每一类合成数据方法,我们简要介绍了几种主流的做法,以期概览各类方法的基本思路以及异同。当前大部分合成数据方法都基于蒸馏,尽管这些方法取得了良好的效果,但其实质是将更强的大模型蒸馏到更小的大模型。这样的方法从降低大模型推理成本的角度具有实际意义,但对于进一步提升大模型能力上限作用有限。基于模型自我进化和基于工具的合成数据研究相对偏少,对于持续提升模型能力,这两个方向需要有更多探索。”
+ 2024.ccl-2.5
+ zho
+ peiji-etal-2024-da
+
+
+ 大语言模型时代的信息检索综述(A Review of Information Retrieval in the Era of Large Language Models)
+ PangLiang亮庞
+ DengJingcheng竞成邓
+ GuJia佳顾
+ ShenHuawei华伟沈
+ ChengXueqi学旗程
+ 98–119
+ “以大语言模型为代表的生成式人工智能迅猛发展,标志着人工智能从判别时代向生成时代的转变。这一进步极大地推动了信息检索技术的发展,本文对大语言模型对信息检索领域的影响进行了深入的综述。从性能改进到模式颠覆,逐步展开论述大语言模型对信息检索领域的影响。针对传统信息检索流程,大语言模型凭借强大的语义理解和建模能力,显著增强索引、检索和排序等信息检索模块的性能。同时,文章也探讨了大语言模型可能取代传统信息检索的趋势,并催生了新的信息获取方式,或将是新一次信息时代的寒武纪。此外,大语言模型对内容生态的深远影响也值得关注。”
+ 2024.ccl-2.6
+ zho
+ liang-etal-2024-da
+
+
+ 对齐的理论、技术与评估(Theories, Techniques, and Evaluation of AI Alignment)
+ JiJiaming嘉铭吉
+ QiuTianyi天异邱
+ ChenBoyuan博远陈
+ YangYaodong耀东杨
+ 120–140
+ “人工智能对齐(AI Alignment)旨在使人工智能系统的行为与人类的意图和价值观相一致。随着人工智能系统的能力日益增强,对齐失败带来的风险也在不断增加。数百位人工智能专家和公众人物已经表达了对人工智能风险的担忧,他们认为“减轻人工智能带来的灭绝风险应该成为全球优先考虑的问题,与其他社会规模的风险如大流行病和核战争并列”(CAIS,2023)。为了提供对齐领域的全面和最新概述,本文深入探讨了对齐的核心理论、技术和评估。首先,本文确定了人工智能对齐的四个关键目标:鲁棒性(Robustness)、可解释性(Interpretability)、可控性(Controllability)和道德性(Ethicality)(RICE)。在这四个目标原则的指导下,本文概述了当前人工智能对齐研究的全貌,并将其分解为两个关键组成部分:前向对齐和后向对齐。本文旨在为对齐研究提供全面且对初学者友好的调研。同时本文还发布并持续更新网站 www.alignmentsurvey.com,该网站提供了一系列教程、论文集和其他资源。更详尽的讨论与分析请见 https://arxiv.org/abs/2310.19852。”
+ 2024.ccl-2.7
+ zho
+ jiaming-etal-2024-dui
+
+
+ 基于大语言模型的自主智能体概述(A Survey on Large Language Model based Autonomous Agents)
+ ChenXu旭陈
+ 141–150
+ “近年来,基于大语言模型的自主智能体受到了学术界和工业界的广泛关注,其关键在于利用大语言模型作为核心控制器,并设计相应的辅助模块增强智能体在动态环境中的演化和适应能力,从而提升智能体自主解决任务的能力。本文通过总结过去工作,抽象出智能体设计的通用范式,并讨论了大模型时代自主智能体能力提升的途径。我们还从个体拓展到系统,深入探讨了多自主智能体系统常见的交互机制和面临的重要问题。”
+ 2024.ccl-2.8
+ zho
+ xu-2024-ji
+
+
+ 浅谈大模型时代下的检索增强:发展趋势、挑战与展望(Enhancing Large Language Models with Retrieval-Augmented Techniques: Trends, Challenges, and Prospects)
+ FengZhangyin掌印冯
+ ZhuKun坤朱
+ MaWeitao伟涛马
+ HuangLei磊黄
+ QinBing兵秦
+ LiuTing挺刘
+ FengXiaocheng骁骋冯
+ 151–168
+ “大型语言模型(LLM) 在各种自然语言任务上表现出了卓越的性能,但它们很容易受到过时数据和特定领域限制的影响。为了应对这些挑战,研究人员整合不同来源的外部信息来增强大语言模型,具体方法如检索增强等。在本文中,我们综合讨论了检索增强技术的发展趋势,包括检索时机规划、检索技术、以及检索结果的利用。此外,我们介绍了当前可用于检索增强任务的数据集和评价方法,并指出了应用和潜在研究方向。我们希望这项综述能够为社区提供对该研究领域的快速了解和全面概述,以启发未来的研究工作。”
+ 2024.ccl-2.9
+ zho
+ zhangyin-etal-2024-qian
+
+
+ 生成式文本质量的自动评估方法综述(A Survey of Automatic Evaluation on the Quality of Generated Text)
+ LanTian天兰
+ MaZiao梓奥马
+ ZhouYanghao杨浩周
+ XuChen晨徐
+ MaoXianling先领毛
+ 169–196
+ “人工评估,作为生成式文本质量评价的金标准,成本太高;自动评估,核心思想在于要使其评估结果与人工评估高度相关,从而实现对生成式文本质量的自动化分析和评价。随着自然语言处理领域相关技术的迭代进步,使得生成式文本质量的自动评估技术,已然经历了多次技术范式的迭代。然而,学界至今依然缺乏对生成式文本质量自动评估技术的系统化总结。因此,本文将首先系统地对已有的生成式文本自动评估方法进行归纳总结,然后分析了生成式文本自动评估方法的主要发展趋势,最后为了使读者更加宏观地了解自动评估整体,对自动评估领域整体的未来研究方向进行了探讨和展望。”
+ 2024.ccl-2.10
+ zho
+ tian-etal-2024-sheng
+
+
+
+
+ Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 3: Evaluations)
+ HongfeiLin
+ HongyeTan
+ BinLi
+ Chinese Information Processing Society of China
+ Taiyuan, China
+ July
+ 2024
+ 2024.ccl-3
+ ccl
+
+
+ Construction of CFSP Model Based on Non-Finetuning Large Language Model
+ HuangFugeng
+ GuoZhongbin
+ LiWenting
+ ChengHaibo
+ 1–9
+ “Chinese Frame Semantic Parsing (CFSP) is an important task in the field of Chinese Natural Language Processing (NLP). Its goal is to extract the frame semantic structure from a sentence and realize a deep understanding of the events or situations involved in the sentence. This paper mainly studies the application of Large Language Models (LLMs) for reasoning through prompt engineering without fine-tuning the model, and completes the three subtasks of the Chinese Frame Semantic Parsing task: frame identification, argument identification, and role identification. This paper proposes a Retrieval Augmented Generation (RAG) method for target words, and constructs a more refined sample-based few-shot method. We achieved second place on the B rankings in the open track of the “CCL2024-Eval The Second Chinese Frame Semantic Parsing” competition.”
+ 2024.ccl-3.1
+ eng
+ fugeng-etal-2024-construction
+
+
+ Application of Entity Classification Model Based on Different Position Embedding in Chinese Frame Semantic Parsing
+ ZhouHuirong
+ TianSujie
+ LiJunbo
+ YuanXiao
+ 10–20
+ “This paper addresses three subtasks of Chinese Frame Semantic Parsing based on the BERT and RoBERTa pre-trained models: Frame Identification, Argument Identification, and Role Identification. In the Frame Identification task, we utilize the BERT PLM with Rotary Positional Encoding for the semantic frame classification task. For the Argument Identification task, we employ the RoBERTa PLM with T5 position encoding for extraction tasks. In the Role Identification task, we use the RoBERTa PLM with ALiBi position encoding for the classification task. Ultimately, our approach achieved a score of 71.41 in the closed track of the B leaderboard, securing fourth place and validating the effectiveness of our method.”
+ 2024.ccl-3.2
+ eng
+ huirong-etal-2024-application
+
+
+ Leveraging LLMs for Chinese Frame Semantic Parsing
+ LiuYahui
+ GongChen
+ ZhangMin
+ 21–31
+ “We participate in the open track of the Chinese frame semantic parsing (CFSP) task, i.e., CCL24-Eval Task 1, and our submission ranks first. FSP is an important task in Natural Language Processing, aiming to extract frame semantic structures from sentences; it can be divided into three subtasks, i.e., Frame Identification (FI), Argument Identification (AI), and Role Identification (RI). In this paper, we use the LLM Gemini 1.0 to evaluate the three subtasks of CFSP, and present the techniques and strategies we employed to enhance subtask performance. For FI, we leverage mapping and similarity strategies to minimize the candidate frames for each target word, which reduces the complexity for the LLM in identifying the appropriate frame. For the AI and RI subtasks, we utilize the results from small models as auxiliary information and apply data augmentation, self-training, and model ensemble techniques to these small models to further enhance subtask performance.”
+ 2024.ccl-3.3
+ eng
+ yahui-etal-2024-leveraging
+
+
+ Chinese Frame Semantic Parsing Evaluation
+ YangPeiyuan
+ LiJuncai
+ YanZhichao
+ SuXuefeng
+ RuLi
+ 32–42
+ “Chinese Frame-semantic Parsing (CFSP) aims to extract fine-grained frame-semantic structures from texts, which can provide fine-grained semantic information for natural language understanding models to enhance their semantic representation abilities. Based on the CCL-23 CFSP evaluation task, we introduce construction grammar to expand the targets, as the basic units activating frames in texts, from word-style to construction-style, and publish a more challenging CFSP evaluation task at CCL-2024. The evaluation dataset consists of 22,000 annotated examples involving nearly 695 frames. The evaluation task is divided into three subtasks: frame identification, argument identification, and role identification, involving two tracks: a closed track and an open track. The evaluation task has attracted wide attention from both industry and academia, with a total of 1988 participating teams. As for the evaluation results, the team from China University of Petroleum won first place in the closed track with a final score of 71.34, while the team from Suzhou University won first place in the open track with a final score of 48.77. In this article, we report the key information about the evaluation task, including key concepts, the evaluation dataset, the top-3 results, and the corresponding methods. More information about this task can be found on the website of the CCL-2024 CFSP evaluation task.”
+ 2024.ccl-3.4
+ eng
+ peiyuan-etal-2024-chinese
+
+
+ 基于多个大语言模型微调的中文意合图语义解析
+ LiRang让李
+ 43–50
+ “中文意合图对句中成分间的关系进行层次化标注,能有效表示汉语的深层语义结构。传统方法难以对中文意合图中的特殊成分进行特征表示,而近期大语言模型性能的快速提高为复杂自然语言处理任务提供了一种全新思路。在本次任务中,我们尝试使用Prompt-Response方式对大模型进行LoRA微调,让大模型根据输入直接生成格式化的中文意合图三元组序列。我们广泛测试来自不同研发团队、拥有不同参数规模的七个主流大模型,评估基座模型、参数规模、量化训练等因素对微调后模型性能的影响。实验表明,我们的方法展现出远超依存模型的性能,在测试集和盲测集上的F1分别为0.6956和0.7206,获得了本次评测榜一的成绩。”
+ 2024.ccl-3.5
+ zho
+ rang-2024-ji
+
+
+ Chinese Parataxis Graph (CPG) Parsing Based on Large Language Models
+ SunYueYi
+ WangYuxuan
+ 51–61
+ “This paper presents the work submitted to the 23rd China National Conference on Computational Linguistics (Evaluation Workshop) (CCL24-Eval), focusing on the Chinese Parataxis Graph (CPG) Parsing task. CPG represents Chinese natural language hierarchically through relational triplets, providing a consistent representation for linguistic units of varying levels. Our approach uses large-scale language models with full fine-tuning, achieving an F1 value of 71.6% in the contest and 74.76% after the contest. Furthermore, after the contest our team proposed a combined model that integrates multiple LoRA fine-tuned medium-scale models. This approach minimizes time and space consumption while keeping the performance of the CPG construction task relatively high.”
+ 2024.ccl-3.6
+ eng
+ yueyi-yuxuan-2024-chinese
+
+
+ 基于关系抽取的中文意合图语义解析方法研究
+ HuoHongying虹颖霍
+ HuangShaoping少平黄
+ LiuPengyuan鹏远刘
+ 62–71
+ “意合图是以事件为中心的单根有向语义表征图,在语义计算与应用方面具有重要价值。在CCL2024中文意合图语义解析评测任务中,为克服意合图为单根有向图、意合图包含隐性事件词以及意合图的语义关系类型十分丰富,导致关系类型过多等诸多方面的难点,本文提出一种将该任务转换为关系抽取的方法。该方法首先对标签进行扩充,分为正向标签和反向标签;其次,对输入进行扩充,将隐性事件词添加到输入中,无须额外对隐性事件词进行预测;最后,细分为不带隐性事件词和带隐性事件词的关系抽取任务。实验结果表明,本文方法在官方盲测集上的F1值为64.44%,高出基线模型33.41%,证明了本文方法的有效性。”
+ 2024.ccl-3.7
+ zho
+ hongying-etal-2024-ji
+
+
+ 基于样本设计工程和大模型微调的中文意合图语义解析
+ SiHan函司
+ LuoZhiyong智勇罗
+ 72–79
+ “本文介绍了我们在第二十三届中国计算语言学大会中文意合图语义解析评测中提交的参赛系统。中文意合图(Chinese Parataxis Graph,CPG)是以事件为中心的语义表征图,可以对不同层级的语言单元作一贯式表示,是一种通用性与扩展性兼具的语义表征方法。鉴于大语言模型在语义解析任务中的优越性能,我们对Llama3-Chinese-8B-Instruct模型进行了LoRA微调,使其能够生成结构化的意合图表征三元组,并采用了样本设计工程(Sample Design Engineering,SDE)技巧进行微调样本的设计。此外,我们还对不同标签进行了分类微调,探究大模型在不同语义标签预测能力上的差异。最终,我们的参赛系统在任务发布的评测集上F1值达到0.6461,在本次评测任务中获得了第三名的成绩。”
+ 2024.ccl-3.8
+ zho
+ han-zhiyong-2024-ji
+
+
+ 中文意合图语义解析评测
+ GuoMengxi梦溪郭
+ LiMeng梦李
+ JinZeying泽莹靳
+ WuXiaojing晓靖吴
+ RaoGaoqi高琦饶
+ TangGongbo共波唐
+ XunEndong恩东荀
+ 80–86
+ “中文意合图是近年提出的中文语义表示方法。本次评测是首次基于意合图理论的语义分析评测,旨在探索面向意合图理论的语义计算方法,评估机器的语义分析能力。本次评测共有14支队伍报名,最终有7支队伍提交结果,其中有5支队伍提交技术报告与模型,均成功复现。在评测截止时间内,表现最好的队伍使用大语言模型LoRA微调方法获得了F1值为72.06%的成绩。在最终提交技术报告的5支队伍中,有4支队伍使用了大语言模型微调方法,在一定程度上表明了目前技术发展的趋势。”
+ 2024.ccl-3.9
+ zho
+ mengxi-etal-2024-zhong
+
+
+ 基于参数高效微调与半监督学习的空间语义理解
+ LiChenyang晨阳李
+ ZhangLong龙张
+ ZhengQiusheng秋生郑
+ 87–94
+ “本文介绍了我们在第二十三届中文计算语言大会的第四届中文空间语义理解评测任务中提交的参赛模型。该任务旨在测试机器的中文语义理解水平。现有研究显示,机器的中文语义理解水平与人类平均水平相比仍有较大差距。近年来,生成式大规模语言模型在自然语言处理任务中展现了出色的生成和泛化能力。在本次评测中,我们采用了对Qwen1.5-7b模型进行高效微调的方法,以端到端的形式实现空间语义的推理过程,并结合prompt优化和半监督学习提升推理表现。实验结果表明,我们的模型在该任务中取得了领先的效果。”
+ 2024.ccl-3.10
+ zho
+ chenyang-etal-2024-ji
+
+
+ 基于大型语言模型的中文空间语义评测
+ HuoShitu世图霍
+ WangYujun钰君王
+ WuTongjie童杰吴
+ 95–105
+ “本研究的任务旨在让大模型进行实体识别、角色识别、异常识别、信息推理、同义识别任务,综合评估大模型的空间语义理解能力。其中,我们使用普通提示词、工作流提示词和思维链三种提示词策略来探讨大模型的空间语义理解能力,最后发现ERNIE-4在1-shot的普通提示词上表现最佳。最终,我们的方法排名第六,总体准确率得分为56.20%。”
+ 2024.ccl-3.11
+ zho
+ shitu-etal-2024-ji
+
+
+ 基于上下文学习与思维链策略的中文空间语义理解
+ WangShiquan士权王
+ FuWeiwei薇薇付
+ FangRuiyu瑞玉方
+ LiMengxiang孟祥李
+ HeZhongjiang忠江何
+ LiYongxiang永翔李
+ SongShuangyong双永宋
+ 106–112
+ “本技术报告详细介绍了我们团队参加第四届中文空间语义理解评测(SpaCE2024)的方法和成果。SpaCE2024旨在全面测试机器对中文空间语义的理解能力,包括空间信息角色识别、空间信息实体识别、空间信息异常识别、空间方位信息推理和空间异形同义识别五个不同的任务。我们团队采用精心设计的prompt并结合微调的方式激发大语言模型的空间语义理解能力,构建了一个高效的空间语义理解系统。在最终的评估中,我们在空间信息角色识别题目中准确率为0.8947,在空间信息实体识别题目中准确率为0.9364,在空间信息异常识别题目中准确率为0.8480,在空间方位信息推理题目中准确率为0.3471,在空间异形同义识别题目中准确率为0.5631,测试集综合准确率为0.6024,排名第一。”
+ 2024.ccl-3.12
+ zho
+ shiquan-etal-2024-ji
+
+
+ 基于上下文学习的空间语义理解
+ WuHongyan洪艳武
+ LinNankai楠铠林
+ CengPeijian培健曾
+ ZhengWeixiong伟雄郑
+ JiangShengyi盛益蒋
+ YangAimin爱民阳
+ 113–121
+ “空间语义理解任务致力于使语言模型能够准确解析和理解文本中描述的物体间的空间方位关系,这一能力对于深入理解自然语言并支持复杂的空间推理至关重要。本文聚焦于探索大模型的上下文学习策略在空间语义理解任务上的有效性,提出了一种基于选项相似度与空间语义理解能力相似度的样本选择策略。本文将上下文学习与高效微调融合对开源模型进行微调,以提高大模型的空间语义理解能力。此外,本文尝试结合开源模型和闭源模型的能力处理不同类型的样本。实验结果显示,本文所采用的策略有效地提高了大模型在空间语义理解任务上的性能。”
+ 2024.ccl-3.13
+ zho
+ hongyan-etal-2024-ji
+
+
+ The Fourth Evaluation on Chinese Spatial Cognition
+ XiaoLiming
+ HuNan
+ ZhanWeidong
+ QinYuhang
+ DengSirui
+ SunChunhui
+ CaiQixu
+ LiNan
+ 122–134
+ “The Fourth Chinese Spatial Cognition Evaluation Task (SpaCE 2024) presents the first comprehensive Chinese benchmark to assess the spatial semantic understanding and reasoning capabilities of Large Language Models (LLMs). It comprises five subtasks in the form of multiple-choice questions: (1) identifying spatial semantic roles; (2) retrieving spatial referents; (3) detecting spatial semantic anomalies; (4) recognizing synonymous spatial expressions with different forms; (5) conducting spatial position reasoning. In addition to proposing new tasks, SpaCE 2024 applied a rule-based method to generate high-quality synthetic data with difficulty levels for the reasoning task. 12 teams submitted their models and results, and the top-performing team attained an accuracy of 60.24%, suggesting that there is still significant room for current LLMs to improve, especially in tasks requiring high spatial cognitive processing.”
+ 2024.ccl-3.14
+ eng
+ liming-etal-2024-fourth
+
+
+ 面向中文抽象语义表示解析的大模型评估与增强
+ ChenRongbo荣波陈
+ PeiZhenwu振武裴
+ BaiXuefeng雪峰白
+ ChenKehai科海陈
+ ZhangMin民张
+ 135–142
+ “本文介绍了我们在第二十三届中文计算语言学大会中文抽象语义表示解析评测任务中提交的参赛系统。中文抽象语义表示(Chinese Abstract Meaning Representation,CAMR)以一个单根可遍历的有向无环图表示中文句子的语义。本系统选择大语言模型作为解决方案。我们首先系统地评估了当下中文大语言模型在AMR解析任务上的性能,在此基础上基于图融合算法整合性能较高的大模型预测结果,最终得到预测的CAMR图。实验结果表明,1)现有大模型已经具备一定的少样本中文AMR解析能力;2)基于微调中文大模型的AMR解析系统能够取得相较以往最优系统更强的性能;3)图融合算法能够进一步增强基于大模型的CAMR解析系统的性能。”
+ 2024.ccl-3.15
+ zho
+ rongbo-etal-2024-mian
+
+
+ 混合 LoRA 专家的中文抽象语义表示解析框架
+ WuZihao梓浩吴
+ YinHua华尹
+ GaoZiqian子千高
+ ZhangJiajia佳佳张
+ JiYuelei跃蕾季
+ TangKuntian堃添唐
+ 143–153
+ “本文介绍了我们在第二十三届中国计算语言学大会中文抽象语义表示解析评测任务中提交的参赛系统。抽象语义表示 (Abstract Meaning Representation,AMR) 使用有向无环图对句子进行建模,以语义概念作为节点,关系标签作为边,表示一个句子的语义。我们受到结合语法信息的 AMR 解析研究的启发,提出混合 LoRA(Low-Rank Adaptation) 专家的 CAMR 解析框架,该框架包含一个由大型语言模型微调而来的基础 CAMR 解析器、4 个句类专家和 1 个古汉语 LoRA 专家模型。最终,本文所提出的框架在三个评测数据集中均取得了最好的成绩。”
+ 2024.ccl-3.16
+ zho
+ zihao-etal-2024-hun
+
+
+ A Two-stage Generative Chinese AMR Parsing Method Based on Large Language Models
+ ShenZizhuo
+ ShaoYanqiu
+ LiWei
+ 154–159
+ “The purpose of the CAMR task is to convert natural language into a formalized semantic representation in the form of a graph structure. Due to the complexity of the AMR graph structure, traditional AMR automatic parsing methods often require the design of complex models and strategies. Thanks to the powerful generative capabilities of LLMs, adopting an autoregressive generative approach for AMR parsing has many advantages, such as simple modeling and strong extensibility. To further explore generative AMR automatic parsing technology based on LLMs, we design a two-stage AMR automatic parsing method based on LLMs for this CAMR evaluation. Specifically, we design two pipeline subtasks, alignment-aware node generation and relationship-aware node generation, to reduce the difficulty of LLM understanding and generation. Additionally, to boost the system’s transferability, we incorporate a retrieval-augmented strategy during both the training and inference phases. The experimental results show that the method we proposed has achieved promising results in this evaluation.”
+ 2024.ccl-3.17
+ eng
+ zizhuo-etal-2024-two
+
+
+ The Fourth Chinese Abstract Meaning Representation Parsing Evaluation
+ XuZhixing
+ ZhangYixuan
+ LiBin
+ ZhouJunsheng
+ QuWeiguang
+ 160–171
+ “Abstract Meaning Representation has become a key research area in sentence-level semantic parsing within natural language processing. Substantial progress has been achieved in various NLP tasks using AMR. This paper presents the fourth Chinese Abstract Meaning Representation parsing evaluation, held during the technical evaluation task workshop at CCL 2024. The evaluation also introduced a new test set comprising Ancient Chinese sentences. Results indicated decent performance, with the top team achieving an F1 of 0.8382 in the open modality, surpassing the previous record at CoNLL 2020 by 3.30 percentage points under the MRP metric. However, current large language models perform poorly in AMR parsing of Ancient Chinese, highlighting the need for effective training strategies. The complex syntax and semantics of Ancient Chinese pose significant challenges. Additionally, optimizing transfer learning techniques to better apply knowledge from Chinese Mandarin to Ancient Chinese parsing is crucial. Only through continuous innovation and collaboration can significant advancements in both Ancient Chinese and Chinese Mandarin AMR parsing be achieved.”
+ 2024.ccl-3.18
+ eng
+ zhixing-etal-2024-fourth
+
+
+ 基于大小模型结合与半监督自训练方法的古文事件抽取
+ FuWeiwei薇薇付
+ WangShiquan士权王
+ FangRuiyu瑞玉方
+ LiMengxiang孟祥李
+ HeZhongjiang忠江何
+ LiYongxiang永翔李
+ SongShuangyong双永宋
+ 172–177
+ “本文描述了队伍“TeleAI”在CCL2024古文历史事件类型抽取评测任务(CHED2024)中提交的参赛系统。该任务旨在自动识别出古代文本中的事件触发词与事件类型,其中事件类型判别被分为粗粒度和细粒度的事件类型判别两部分。为了提高古文历史事件类型抽取的性能,我们结合了大模型和小模型,并采用了半监督自训练的方法。在最终的评估中,我们在触发词识别任务得分0.763,粗粒度事件类型判别任务得分0.842,细粒度事件类型判别任务得分0.779,综合得分0.791,在所有单项任务和综合评分上均排名第一。”
+ 2024.ccl-3.19
+ zho
+ weiwei-etal-2024-ji
+
+
+ Multi-Model Classical Chinese Event Trigger Word Recognition Driven by Incremental Pre-training
+ LinLitao
+ WuMengcheng
+ ShenXueying
+ ZhouJiaxin
+ OuShiyan
+ 178–190
+ “This paper addresses the task of identifying and classifying historical event trigger words in Classical Chinese, utilizing both small-scale and large-scale language models. Specifically, we selected the small-scale language model GujiBERT for intelligent processing of classical texts, and the large-scale language model Xunzi-Qwen-14b. Both models underwent continued pretraining and fine-tuning, resulting in GujiBERT-CHED-mlm and Xunzi-Qwen-14b-CHED. For the small-scale language model, we used a BiLSTM as the feature extraction module and a CRF as the decoding module, employing a sequence labeling paradigm to complete the evaluation experiments. For the large-scale language model, we optimized the prompt templates and used a sequence-to-sequence paradigm for evaluation experiments. Our experiments revealed that GujiBERT-BiLSTM-CRF achieved the best performance across all tasks, ranking fourth in overall performance among all participating teams. The large-scale language model demonstrated good semantic understanding abilities, reaching a preliminary usable level. Future research should focus on enhancing its ability to produce standardized outputs.”
+ 2024.ccl-3.20
+ eng
+ litao-etal-2024-multi
+
+
+ 基于增量预训练与外部知识的古文历史事件检测
+ KangWenjun文军康
+ ZuoJiali家莉左
+ HuYiyu益裕胡
+ WangMingwen明文王
+ 191–200
+ “古文历史事件检测任务旨在识别文本中的事件触发词和类型。为了解决传统pipeline方法容易产生级联错误传播,以及大多数事件检测方法仅依赖句子层面信息的问题,本文提出了一种结合外部信息和全局对应矩阵的联合抽取模型EIGC,以实现触发词和事件类型的精确抽取。此外,本文还整理了一个包含“二十四史”等古汉语文献的数据集,共计约97万条古汉语文本,并利用该文本对BERT-Ancient-Chinese进行增量预训练。最终,本文所提出的模型在三个任务上的总F1值达到了76.2%,验证了该方法的有效性。”
+ 2024.ccl-3.21
+ zho
+ wenjun-etal-2024-ji
+
+
+ Classical Chinese Historical Event Detection Evaluation
+ FengZhenbing
+ LiWei
+ ShaoYanqiu
+ 201–209
+ “Event detection involves identifying and extracting event information from natural language texts. The complex syntax and semantics of Classical Chinese, coupled with its limited usage, pose significant challenges for information extraction tasks on Classical Chinese texts. At the 23rd China National Conference on Computational Linguistics (CCL 2024), we launched an evaluation task focused on the extraction of historical events from Classical Chinese. We used our constructed Classical Chinese Historical Event Logical Schema to identify event triggers and classify event types. The evaluation utilized the Classical Chinese Historical Event Detection Dataset (CHED), annotated from The Twenty-Four Histories corpus, with the aim of enhancing event extraction technologies and advancing the digital study of Classical Chinese historical texts. The evaluation included two subtasks and attracted 28 teams, with 15 teams submitting valid results. In the subtask of trigger identification, the best-performing system achieved an exact match score of 63.6%. In the subtasks of coarse-grained and fine-grained event type classification, the top systems achieved F1-scores of 84.5% and 81.4%, respectively.”
+ 2024.ccl-3.22
+ eng
+ zhenbing-etal-2024-classical
+
+
+ A Unified Multi-Task Learning Model for Chinese Essay Rhetoric Recognition and Component Extraction
+ FangQin
+ ZhangZheng
+ WangYifan
+ PengXian
+ 210–216
+ “In this paper, we present our system at CCL24-Eval Task 6: Chinese Essay Rhetoric Recognition and Understanding (CERRU). The CERRU task aims to identify and understand the use of rhetoric in student writing. The evaluation set three tracks to examine the recognition of rhetorical form, the recognition of rhetorical content, and the extraction of rhetorical components. Considering the potential correlation among the track tasks, we employ a unified multi-task learning architecture to fully incorporate the inherent interactions among the related tasks, improving the overall performance and completing the above three track tasks with a single model. Specifically, the framework mainly consists of four sub-tasks: rhetorical device recognition, rhetorical form recognition, rhetorical content recognition, and rhetorical component extraction. The first three tasks are regarded as multi-label classification tasks, and the last task is regarded as an entity recognition task. The four tasks leverage potential information transfer to achieve fusion learning. Finally, the above four sub-tasks are integrated into a unified model through parameter sharing. In the final evaluation results, our system ranked fourth with a total score of 60.14, verifying the effectiveness of our approach.”
+ 2024.ccl-3.23
+ eng
+ qin-etal-2024-unified
+
+
+ 中小学作文修辞识别与理解
+ ZhaoLiang亮赵
+ WuWeixuan伟轩武
+ YuHao浩余
+ LuWenbin文斌鲁
+ 217–222
+ “本技术报告是对2024CCL评测任务(中小学作文修辞识别与理解评测)的一种解决方案。在中小学生的学习过程中,修辞手法不仅是阅读理解和写作技巧的核心组成部分,同时也是塑造优秀文学作品的不可或缺的元素。识别并理解学生作文中的修辞使用,可以帮助学生提高作文表达能力,指导学生更高质量的叙述和描写。对修辞的识别目前属于自然语言理解领域比较困难的任务,因为需要用到人类领域的大量先验知识,而且很多时候不同的修辞之间的边界还是模糊的。我们通过LoRA技术直接微调基于qwen-chat-7B的大语言预训练模型,来进行修辞类别的识别。我们的主要创新技术点为:基于相同的输入输出数据来构造多条训练数据提升算法表现;分级分层来进行修辞的判断,先进行大的修辞类别判断,再把大的修辞类别作为输入对修辞的子类别进行判断;针对修辞成分抽取的任务,直接输出对应的结果文本,再对应回原文本进行位置检索,而不是直接输出索引下标。”
+ 2024.ccl-3.24
+ zho
+ liang-etal-2024-zhong
+
+
+ Essay Rhetoric Recognition and Understanding Using Synthetic Data and Model Ensemble Enhanced Large Language Models
+ SongJinwang
+ ZanHongying
+ ZhangKunli
+ 223–231
+ “Natural language processing technology has been widely applied in the field of education. Essay writing serves as a crucial method for evaluating students’ language skills and logical thinking abilities. Rhetoric, an essential component of essays, is also a key reference for assessing writing quality. In the era of large language models (LLMs), applying LLMs to the tasks of automatic classification and extraction of rhetorical devices is of significant importance. In this paper, we fine-tune LLMs with specific instructions to adapt them for the tasks of recognizing and extracting rhetorical devices in essays. To further enhance the performance of LLMs, we experimented with multi-task fine-tuning and expanded the training dataset through synthetic data. Additionally, we explored a model ensemble approach based on label re-inference. Our method achieved a score of 66.29 in Task 6 of the CCL 2024 Eval, Chinese Essay Rhetoric Recognition and Understanding (CERRU), securing the first position.”
+ 2024.ccl-3.25
+ eng
+ jinwang-etal-2024-essay
+
+
+ 基于深度学习模型的中小学作文修辞识别与理解评测
+ LiChenyang晨阳李
+ ZhangLong龙张
+ ZhengQiusheng秋生郑
+ 232–239
+ “在中小学生的学习进程中,修辞手法是阅读和写作技巧的核心,也是优秀文学作品的关键元素。然而,识别与理解学生文章中的修辞使用需要大量的人工,为教师的作文评估和教学提出了挑战。最近的研究开始使用计算机技术来自动评审作文,其中修辞的使用是评估的重要部分。本文介绍了我们在第二十三届中国计算语言学大会中小学作文修辞识别与理解评测中所用的参赛方法。在本次评测中,我们针对不同任务,分别使用了传统分类模型和大模型,再利用伪标签、数据增强等方法提升模型性能。实验结果表明,我们的方法取得了较为先进的效果。”
+ 2024.ccl-3.26
+ zho
+ chenyang-etal-2024-ji-yu
+
+
+ 人类思维指导下大小模型协同决策的中文修辞识别与理解方法
+ WangWen雯王
+ TangSiyi思怡汤
+ YuDong东于
+ LiuPengyuan鹏远刘
+ 240–252
+ “CCL24-Eval任务6提出了一个多层次、细粒度中小学作文修辞识别与理解任务。针对任务特点,本文提出了人类思维指导下大小模型协同决策的中文修辞识别与理解方法。该方法根据人类在面对修辞识别和理解任务时的处理思路,将任务顺序重新定义,并分别选取大小语言模型,使每个步骤的实现效果均达到局部最优,以局部最优达到整体任务的最优效果。结果表明,本文提出的方法能够有效对修辞进行识别与理解,在三个赛道上相较于Baseline方法分别提升了13.54、4.03、57.11。”
+ 2024.ccl-3.27
+ zho
+ wen-etal-2024-ren
+
+
+ Chinese Essay Rhetoric Recognition and Understanding (CERRU)
+ LiuNuowei
+ ChenXinhao
+ RenYupei
+ LanMan
+ BaiXiaopeng
+ WuYuanbin
+ MaoShaoguang
+ XiaYan
+ 253–261
+ “Rhetoric is fundamental to the reading comprehension and writing skills of primary and middle school students. However, existing work independently recognizes single coarse-grained or fine-grained categories. In this paper, we propose the CCL24-Eval Task 6: Chinese Essay Rhetoric Recognition and Understanding (CERRU), consisting of 3 tracks: (1) Fine-grained Form-level Categories Recognition, (2) Fine-grained Content-level Categories Recognition and (3) Rhetorical Component Extraction. A total of 32 teams registered to participate in CERRU and 9 teams submitted evaluation results, with 7 of these teams achieving an overall score that surpassed the baseline.”
+ 2024.ccl-3.28
+ eng
+ nuowei-etal-2024-chinese
+
+
+ Assessing Essay Fluency with Large Language Models
+ WuHaihong
+ AoChang
+ NiShiwen
+ 262–268
+ “With the development of education and the widespread use of the internet, the scale of essay evaluation has increased, making the cost and efficiency of manual grading a significant challenge. To address this, the Twenty-third China National Conference on Computational Linguistics (CCL2024) established an evaluation contest for essay fluency. This competition has three tracks corresponding to three sub-tasks. This paper conducts a detailed analysis of the different tasks, employing the BERT model as well as Qwen, one of the latest popular large language models, to address these sub-tasks. As a result, our overall scores for the three tasks reached 37.26, 42.48, and 47.64.”
+ 2024.ccl-3.29
+ eng
+ haihong-etal-2024-assessing
+
+
+ Multi-Error Modeling and Fluency-Targeted Pre-training for Chinese Essay Evaluation
+ ZhangJingshen
+ YangXiangyu
+ SuXinkai
+ ChenXinglu
+ HuangTianyou
+ QiuXinying
+ 269–277
+ “This system report presents our approaches and results for the Chinese Essay Fluency Evaluation (CEFE) task at CCL-2024. For Track 1, we optimized predictions for challenging fine-grained error types using binary classification models and trained coarse-grained models on the Chinese Learner 4W corpus. In Track 2, we enhanced performance by constructing a pseudo-dataset with multiple error types per sentence. For Track 3, where we achieved first place, we generated fluency-rated pseudo-data via back-translation for pretraining and used an NSP-based strategy with Symmetric Cross Entropy loss to capture context and mitigate long dependencies. Our methods effectively address key challenges in Chinese Essay Fluency Evaluation.”
+ 2024.ccl-3.30
+ eng
+ jingshen-etal-2024-multi
+
+
+ 中小学作文语法错误检测、病句改写与流畅性评级的自动化方法研究
+ TianWei巍田
+ 278–284
+ “本研究旨在提高中小学生作文评改的质量和效率,通过引入先进的自然语言处理模型进行作文病句检测、纠正和流畅性评分,并分别针对三个具体的任务进行了模型构建。在任务一中,提出语法错误替换方法进行数据增强,接着基于UTC模型对语病类型进行识别。在任务二中,融合了预训练的BART模型和SynGEC策略进行文本纠错,充分利用了BART的生成能力和SynGEC的语法纠错特性。任务三中,基于TextRCNN-NEZHA模型进行作文流畅性的评级,构建了一个能够综合语义信息的分类器。经评测,本文提出的方法在任务一和任务二中均位列第一,任务三位列第二,即提出的方法可以有效地识别病句类型和纠正作文中的病句,并给出合理的作文流畅性评级。”
+ 2024.ccl-3.31
+ zho
+ wei-2024-zhong
+
+
+ Prompting GPT-4 for Chinese Essay Fluency Evaluation
+ ZhangDan
+ HoangThuong
+ ZhuYe
+ 285–293
+ “This report presents the methodology and results of utilizing GPT-4 for CCL24-Eval Task 7 of Chinese Essay Fluency Evaluation (CEFE). The task is divided into three tracks: Identification of Error Sentence Types, Rewriting Error Sentences, and Essay Fluency Rating. We employed few-shot prompt engineering to guide GPT-4 in performing this task. Our approach integrated fine-grained error analysis with advanced NLP techniques to provide detailed, actionable feedback for students and teachers. Despite some successes, particularly in generating semantically similar and syntactically relevant corrections, our analysis revealed significant challenges, especially in multi-label classification and the accurate identification of error types. The report discusses these findings and suggests areas for further improvement.”
+ 2024.ccl-3.32
+ eng
+ dan-etal-2024-prompting
+
+
+ 基于大模型数据增强的作文流畅性评价方法
+ PengQianwen倩雯彭
+ GaoYanzipeng延子鹏高
+ LiXiaoqing晓青李
+ MinFanke凡珂闵
+ LiMingrui明锐李
+ WangZhichun志春王
+ LiuTianyun天昀刘
+ 294–301
+ “CCL2024-Eval任务7为中小学生作文流畅性评价(Chinese Essay Fluency Evaluation, CEFE),该任务定义了三项重要且富有挑战性的问题,包括中小学作文病句类型识别、中小学作文病句改写、以及中小学作文流畅性评级。本队伍参加了评测任务7的三项子任务,分别获得了45.19、43.90和45.84的得分。本报告详细介绍本队伍在三个子任务上采用的技术方法,并对评测结果进行分析。”
+ 2024.ccl-3.33
+ zho
+ qianwen-etal-2024-ji
+
+
+ Chinese Essay Fluency Evaluation (CEFE) Task
+ ZhuangXinlin
+ ShenXinshu
+ WuHongyi
+ LanMan
+ BaiXiaopeng
+ WuYuanbin
+ ZhouAimin
+ MaoShaoguang
+ 302–310
+ “This paper presents a detailed review of Task 7 in the CCL24-Eval: the second Chinese Essay Fluency Evaluation (CEFE). The task aims to identify fine-grained grammatical errors that impair readability and coherence in essays authored by Chinese primary and secondary school students, evaluate the essays’ fluency levels, and recommend corrections to improve their written fluency. The evaluation comprises three tracks: (1) Coarse-grained and fine-grained error identification; (2) Error sentence rewriting; and (3) Essay Fluency Level Recognition. We garnered 29 completed registrations, resulting in 180 submissions from 10 dedicated teams. The paper discusses the submissions and analyzes the results from all participating teams.”
+ 2024.ccl-3.34
+ eng
+ xinlin-etal-2024-chinese
+
+
+ A Two-stage Prompt-Based Strategy for CRMUS Track 1
+ ChenMosha
+ 311–319
+ “Large Language Models (LLMs) have sparked a new trend in Natural Language Processing, and an increasing number of researchers have recognized the potential of using LLMs to unify diverse NLP tasks in a text-generative manner. To explore the potential of LLMs for the children’s stories domain, CCL2024 has released the Commonsense Reasoning and Moral Understanding in Children’s Stories (CRMUS) task. This paper presents a straightforward yet effective two-stage prompt-based strategy for CRMUS Track 1. In the initial stage, we use the same prompt to obtain responses from GPT-4, ERNIE-4, and Qwen-Max. In the subsequent stage, we implement a voting mechanism based on the results from the first stage. For records with inconsistent outcomes, we query GPT-4 for secondary confirmation to determine the final result. Experimental results indicate that our method achieved an average score of 79.27, securing first place in the closed domain among ten participating teams, thereby demonstrating the effectiveness of our approach.”
+ 2024.ccl-3.35
+ eng
+ mosha-2024-two
+
+
+ 基于指令微调与数据增强的儿童故事常识推理与寓意理解研究
+ YuBohan博涵于
+ LiYunlong云龙李
+ LiuTao涛刘
+ ZhengAoze傲泽郑
+ ZhangKunli坤丽张
+ ZanHongying红英昝
+ 320–326
+ “尽管现有语言模型在自然语言处理任务上表现出色,但在深层次语义理解和常识推理方面仍有提升空间。本研究通过测试模型在儿童故事常识推理与寓意理解数据集(CRMUS)上的性能,探究如何增强模型在复杂任务中的能力。在本次任务的赛道二中,本研究使用多个7B以内的开源大模型(如Qwen、InternLM等)进行零样本推理,并选择表现最优的模型基于LoRA进行指令微调来提高其表现。除此之外,本研究还对数据集进行了分析与增强。研究结果显示,通过设计有效的指令格式和调整LoRA微调参数,模型在常识推理和寓意理解上的准确率显著提高。最终在本次任务的赛道二中取得第一名的成绩,该任务的评价指标Acc值为74.38,达到了较为先进的水准。”
+ 2024.ccl-3.36
+ zho
+ bohan-etal-2024-ji
+
+
+ Exploring Faithful and Informative Commonsense Reasoning and Moral Understanding in Children’s Stories
+ WangZimu
+ YuqiWang
+ HanNijia
+ ChenQi
+ ZhangHaiyang
+ PanYushan
+ WangQiufeng
+ WangWei
+ 327–335
+ “Commonsense reasoning and moral understanding are crucial tasks in artificial intelligence (AI) and natural language processing (NLP). However, existing research often falls short in terms of faithfulness and informativeness during the reasoning process. We propose a novel framework for performing commonsense reasoning and moral understanding using large language models (LLMs), involving constructing guided prompts by incorporating relevant knowledge for commonsense reasoning and extracting facts from stories for moral understanding. We conduct extensive experiments on the Commonsense Reasoning and Moral Understanding in Children’s Stories (CRMUS) dataset with widely recognised LLMs under both zero-shot and fine-tuning settings, demonstrating the effectiveness of our proposed method. Furthermore, we analyse the adaptability of different LLMs in extracting facts for moral understanding performance.”
+ 2024.ccl-3.37
+ eng
+ zimu-etal-2024-exploring
+
+
+ 基于提示工程和思维链的提示词构造
+ LuoYun允罗
+ FengYi毅冯
+ JingLiping丽萍景
+ 336–345
+ “儿童故事常识推理与寓意理解评测任务旨在从常识推理和寓意理解两个任务多角度评价中文预训练语言模型和大型语言模型的常识推理和故事理解能力,这考察了模型的常识储备能力以及对文本内容的深入理解能力,因此极具挑战性。随着大语言模型的发展,其卓越的指令跟随能力显著提升了自然语言处理任务的效率和效果。然而,这也对提示词的设计提出了更高的要求,因为提示词的质量直接影响了大模型的表现和预测结果的准确性。因此,设计有效的提示词变得尤为重要,不仅需要理解任务的具体需求,还要具备对语言模型的深入认识和灵活运用能力。本文针对儿童故事常识推理与寓意理解评测赛道一的两个任务,提出了一种基于提示工程的提示词构造方法。首先,我们提出了一种基于融合提示工程、思维链的通用提示词构建框架;然后,我们针对具体的任务调整对应的提示词模板;最后,结合语言模型使用这些提示词进行结果预测。在本次评测中,我们的方法在赛道一的封闭数据条件下获得了第三名的成绩,这验证了我们方法的有效性,并展示了其在自然语言理解领域的应用潜力。”
+ 2024.ccl-3.38
+ zho
+ yun-etal-2024-ji
+
+
+ Evaluation of Commonsense Reasoning and Moral Understanding in Children’s Stories
+ YanGuohang
+ LiangFeihao
+ GuoYaxin
+ TanHongye
+ LiRu
+ ZhangHu
+ 346–352
+ “This paper provides a comprehensive review of the CCL24-Eval Task 8: Commonsense Reasoning and Moral Understanding in Children’s Stories (CRMUS). This task comprises two sub-tasks, which aim to assess the commonsense reasoning and implicit meaning comprehension capabilities of Large Language Models (LLMs). We received registration forms from 33 teams, 15 of which submitted final results that exceeded the baseline score. We present the results of the top 5 teams and our analysis of these results.”
+ 2024.ccl-3.39
+ eng
+ guohang-etal-2024-evaluation
+
+
+ Bridging the Gap between Authentic and Answer-Guided Images for Chinese Vision-Language Understanding Enhancement
+ WangFeiyu
+ GuoWenyu
+ YuDong
+ KangChen
+ LiuPengyuan
+ 353–362
+ “The objective of the Chinese Vision-Language Understanding Evaluation (CVLUE) is to comprehensively assess the performance of Chinese vision-language multimodal pre-trained models in multimodal modeling and understanding across four tasks: Image-Text Retrieval, Visual Question Answering, Visual Grounding, and Visual Dialog. To enhance the models’ performance across various multimodal tasks, this paper proposes a multimodal information understanding enhancement method based on answer-guided images. Firstly, we propose task-specific methods for answer-guided image generation. Secondly, the authentic and answer-guided images are fed into the model for multimodal fine-tuning, respectively. Finally, training objectives are set for different tasks to minimize the gap between the answer-guided images and authentic images, thereby using the answer-guided images to supervise the results produced from the authentic images. The experimental results demonstrate the effectiveness of the proposed method.”
+ 2024.ccl-3.40
+ eng
+ feiyu-etal-2024-bridging
+
+
+ Chinese Vision-Language Understanding Evaluation
+ WangJiangkuo
+ ZhengLinwei
+ ChenKehai
+ BaiXuefeng
+ ZhangMin
+ 363–373
+ “This paper introduces our systems submitted for the Chinese Vision-Language Understanding Evaluation task at the 23rd Chinese Computational Linguistics Conference. In this competition, we utilized the X2-VLM and CCLM models to participate in various subtasks such as image-text retrieval, visual grounding, visual dialogue, and visual question answering. Additionally, we employed other models to assess performance on certain subtasks. We optimized our models and successfully applied them to these different tasks.”
+ 2024.ccl-3.41
+ eng
+ jiangkuo-etal-2024-chinese
+
+
+ 中文图文多模态理解评测
+ WangYuxuan宇轩王
+ LiuYijun议骏刘
+ WanZhiguo志国万
+ CheWanxiang万翔车
+ 374–381
+ “中文图文多模态理解评测任务旨在从多角度评价中文图文多模态预训练模型的图文多模态建模和理解能力。本任务共包括五个子任务:图片检索、文本检索、视觉问答、视觉定位和视觉对话,最终成绩根据这五个任务的得分综合计算。本文首先介绍了任务的背景和动机,然后从任务介绍、评价指标、比赛结果、参赛方法等方面介绍并展示了本次评测任务的相关信息。本次任务共有11支队伍报名参赛,其中3支队伍提交了结果。”
+ 2024.ccl-3.42
+ zho
+ yuxuan-etal-2024-zhong
+
+
+ 维沃手语数字人翻译系统
+ HeJunyuan俊远何
+ LiuXin鑫刘
+ YangMurong牧融杨
+ LiXiaolong小龙李
+ HuangXuming旭铭黄
+ TengFei飞滕
+ ChenXiaoxin晓昕陈
+ FuFan凡付
+ 382–392
+ “本文介绍了我们在第二十三届中国计算语言学大会手语数字人翻译质量评测中提交的参赛系统。本次评测任务旨在评测手语数字人将汉语翻译成中国手语方面的自然性和准确性。本文介绍的手语数字人翻译系统首先通过手语翻译算法将汉语文本翻译成手语文本,然后将手语文本对应的手语动作单元运用动作融合算法合成为自然、完整的手语数字人动作,同时借助面部驱动算法将口型、表情等非语言元素自然地融入手语合成中,实现带微表情和唇形同步的手语数字人。最终,我们在官方手语数字人翻译质量的人工评测集上取得了3.513的综合评分,获得了该任务第一名的成绩。”
+ 2024.ccl-3.43
+ zho
+ junyuan-etal-2024-wei
+
+
+ 结合LLM与3D动画技术的手语数字人系统
+ YangYang阳杨
+ ZhangYing颖张
+ HuangKaiyu锴宇黄
+ XuJinan金安徐
+ 393–404
+ “手语翻译(Sign Language Translation, SLT)系统作为一种重要的辅助技术,为听障人士提供了与他人沟通的有效途径。然而,传统手语翻译系统在准确性、流畅性差等方面存在问题。本文提出了一种结合大语言模型(Large Language Model, LLM)和3D动画技术(3D Animation Technology)的手语翻译系统,旨在克服这些局限,提高翻译的准确性和流畅性。本文详细介绍了系统的设计与实现过程,包括提示词设计、数据处理方法以及手语数字人翻译系统的实现。实验结果表明,采用LLM方法在手语翻译中能够生成较为自然和准确的结果。在标准评估和人工评估的两种评估方法下,本系统在大多数情况下能够较好地完成手语翻译任务,性能优于传统方法。本文的研究为进一步改进手语翻译系统提供了有益的参考和启示。”
+ 2024.ccl-3.44
+ zho
+ yang-etal-2024-jie
+
+
+ Translation Quality Evaluation of Sign Language Avatar
+ ZhaoYuan
+ ZhangRuiquan
+ YaoDengfeng
+ ChenYidong
+ 405–415
+ “Sign Language Avatar technology aims to create virtual agents capable of communicating with deaf individuals through sign language, similar to the text dialogue agent ChatGPT but focusing on sign language communication. Challenges in sign language production include limited dataset sizes, information loss due to reliance on intermediate representations, and insufficient realism in generated actions. In this event, we particularly focus on the ability of the Sign Language Avatar to translate spoken language text into sign language that is easily understood by deaf individuals. As the first sign language avatar event held by the China National Conference on Computational Linguistics (CCL), this event attracted wide attention from both industry and academia, with 14 teams registering and 10 of them submitting their system interfaces on time. We provided a dataset consisting of 1074 text-video parallel sentence pairs for training, and the evaluation team comprised proficient Chinese sign language users and professional sign language translators. The scoring method employed a comprehensive evaluation based on multiple metrics, focusing primarily on sign language grammar accuracy, naturalness, readability, and cultural adaptability, with the final scores determined by performance across these four aspects. The results showed that four teams demonstrated good readability, with Vivo Mobile Communication Co., Ltd. ranking first with a score of 3.513 (out of a full score of 5), leading the baseline model by 1.394 points. According to the analysis of the results, most teams used the traditional method of converting text into Gloss sequences before generating sign language. Additionally, some teams experimented with emerging methods, including gloss-free end-to-end training and Large Language Model (LLM) prompt learning, which also achieved promising results. We anticipate that this event will promote the development of sign language avatar technology and provide higher-quality communication tools for the deaf community. For more information on this task, please visit the website of the CCL24-Eval: Translation Quality Evaluation of Sign Language Avatar Task.”
+ 2024.ccl-3.45
+ eng
+ yuan-etal-2024-translation
+
+
+