Papers about keyphrase generation and extraction
Di Wu and Kai-Wei Chang
Keyphrases are the phrases that identify the most salient concepts in a document. Keyphrase extraction and keyphrase generation are fundamental tasks connected with numerous NLP and IR applications. In this repo, we summarize influential keyphrase-related papers and resources. Papers are selected from *ACL, EMNLP, AAAI, SIGIR, and other related conferences. If you have any suggestions or would like to add papers, please submit an issue or a pull request. Your contribution is much appreciated! KeyphraseExtractionSurvey and Awesome-Keyphrase-Prediction are two other great repositories; check them out as well if you are interested.
- Deep Keyphrase Generation (Meng et al., ACL 2017)
- Keyphrase Generation with Correlation Constraints (Chen et al., EMNLP 2018)
- Semi-Supervised Learning for Neural Keyphrase Generation (Ye & Wang, EMNLP 2018)
- Title-Guided Encoding for Keyphrase Generation (Chen et al., AAAI 2019)
- An Integrated Approach for Keyphrase Generation via Exploring the Power of Retrieval and Extraction (Chen et al., NAACL 2019)
- Incorporating Linguistic Constraints into Keyphrase Generation (Zhao & Zhang, ACL 2019)
- Topic-Aware Neural Keyphrase Generation for Social Media Language (Wang et al., ACL 2019)
- Neural Keyphrase Generation via Reinforcement Learning with Adaptive Rewards (Chan et al., ACL 2019)
- Diverse Keyphrase Generation with Neural Unlikelihood Training (Bahuleyan & El Asri, COLING 2020)
- One Size Does Not Fit All: Generating and Evaluating Variable Number of Keyphrases (Yuan et al., ACL 2020)
- Exclusive Hierarchical Decoding for Deep Keyphrase Generation (Chen et al., ACL 2020)
- A Preliminary Exploration of GANs for Keyphrase Generation (Swaminathan et al., EMNLP 2020)
- Keyphrase Generation with GANs in Low-Resources Scenarios (Lancioni et al., SustaiNLP 2020)
- SGG: Learning to Select, Guide, and Generate for Keyphrase Generation (Zhao et al., NAACL 2021)
- UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction (Wu et al., ACL Findings 2021)
- One2Set: Generating Diverse Keyphrases as a Set (Ye et al., ACL-IJCNLP 2021)
- Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention (Ahmad et al., ACL-IJCNLP 2021)
- Keyphrase Generation with Fine-Grained Evaluation-Guided Reinforcement Learning (Luo et al., EMNLP Findings 2021)
- Heterogeneous Graph Neural Networks for Keyphrase Generation (Ye et al., EMNLP 2021)
- Structure-Augmented Keyphrase Generation (Kim et al., EMNLP 2021)
- HTKG: Deep Keyphrase Generation with Neural Hierarchical Topic Guidance (Zhang et al., SIGIR 2022)
- Fast and Constrained Absent Keyphrase Generation by Prompt-Based Learning (Wu et al., AAAI 2022)
- Unsupervised Deep Keyphrase Generation (Shen et al., AAAI 2022)
- Automatic Keyphrase Generation by Incorporating Dual Copy Mechanisms in Sequence-to-Sequence Learning (Wang et al., COLING 2022)
- Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training (Gao et al., NAACL Findings 2022)
- Learning Rich Representation of Keyphrases from Text (Kulkarni et al., NAACL Findings 2022)
- WR-One2Set: Towards Well-Calibrated Keyphrase Generation (Xie et al., EMNLP 2022)
- Keyphrase Generation via Soft and Hard Semantic Corrections (Zhao et al., EMNLP 2022)
- Representation Learning for Resource-Constrained Keyphrase Generation (Wu et al., EMNLP Findings 2022)
- KPDROP: Improving Absent Keyphrase Generation (Ray Chowdhury et al., EMNLP Findings 2022)
- Keyphrase Generation Beyond the Boundaries of Title and Abstract (Garg et al., EMNLP Findings 2022)
- Unsupervised Open-domain Keyphrase Generation (Do et al., ACL 2023)
- Data Augmentation for Low-Resource Keyphrase Generation (Garg et al., ACL Findings 2023)
- General-to-Specific Transfer Labeling for Domain Adaptable Keyphrase Generation (Meng et al., ACL Findings 2023)
- Rethinking Model Selection and Decoding for Keyphrase Generation with Pre-trained Sequence-to-Sequence Models (Wu et al., EMNLP 2023)
- SimCKP: Simple Contrastive Learning of Keyphrase Representations (Choi et al., EMNLP Findings 2023)
- Improving Low-Resource Keyphrase Generation through Unsupervised Title Phrase Generation (Kang & Shin, LREC-COLING 2024)
- Keyphrase Generation: Lessons from a Reproducibility Study (Thomas & Vajjala, LREC-COLING 2024)
- On Leveraging Encoder-only Pre-trained Language Models for Effective Keyphrase Generation (Wu et al., LREC-COLING 2024)
- Improving Absent Keyphrase Generation with Diversity Heads (Thomas & Vajjala, NAACL Findings 2024)
- One2Set + Large Language Model: Best Partners for Keyphrase Generation (Shao et al., EMNLP 2024)
- Unsupervised Domain Adaptation for Keyphrase Generation using Citation Contexts (Boudin & Aizawa, EMNLP Findings 2024)
- MetaKP: On-Demand Keyphrase Generation (Wu et al., EMNLP Findings 2024)
- A statistical learning approach to automatic indexing of controlled index terms (Leung & Kan, JASIS 1997)
- KEA: Practical Automatic Keyphrase Extraction (Witten et al., DL 1999)
- A Language Model Approach to Keyphrase Extraction (Tomokiyo & Hurst, MWE 2003)
- TextRank: Bringing Order into Text (Mihalcea & Tarau, EMNLP 2004)
- CollabRank: Towards a Collaborative Approach to Single-Document Keyphrase Extraction (Wan & Xiao, COLING 2008)
- A ranking approach to keyphrase extraction (Jiang et al., SIGIR 2009)
- Automatic Keyphrase Extraction via Topic Decomposition (Liu et al., EMNLP 2010)
- KP-Miner: Participation in SemEval-2 (El-Beltagy & Rafea, SemEval 2010)
- TopicRank: Graph-Based Topic Ranking for Keyphrase Extraction (Bougouin et al., IJCNLP 2013)
- Diverse Keyword Extraction from Conversations (Habibi & Popescu-Belis, ACL 2013)
- Single Document Keyphrase Extraction Using Label Information (Negi, COLING 2014)
- Extracting Discriminative Keyphrases with Learned Semantic Hierarchies (Wang et al., COLING 2016)
- Keyphrase Annotation with Graph Co-Ranking (Bougouin et al., COLING 2016)
- A Graph Degeneracy-based Approach to Keyword Extraction (Tixier et al., EMNLP 2016)
- Supervised Keyphrase Extraction as Positive Unlabeled Learning (Sterckx et al., EMNLP 2016)
- Keyphrase Extraction Using Deep Recurrent Neural Networks on Twitter (Zhang et al., EMNLP 2016)
- Incorporating Expert Knowledge into Keyphrase Extraction (Gollapalli et al., AAAI 2017)
- Salience Rank: Efficient Keyphrase Extraction with Topic Modeling (Teneva & Cheng, ACL 2017)
- Multi-Task Learning of Keyphrase Boundary Classification (Augenstein & Søgaard, ACL 2017)
- PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents (Florescu & Caragea, ACL 2017)
- Learning Feature Representations for Keyphrase Extraction (Florescu & Jin, AAAI 2018)
- Simple Unsupervised Keyphrase Extraction using Sentence Embeddings (Bennani-Smires et al., CoNLL 2018)
- Yake! collection-independent automatic keyword extractor (Campos et al., ECIR 2018)
- Unsupervised Keyphrase Extraction with Multipartite Graphs (Boudin, NAACL 2018)
- Key2Vec: Automatic Ranked Keyphrase Extraction from Scientific Articles using Phrase Embeddings (Mahata et al., NAACL 2018)
- DivGraphPointer: A Graph Pointer Network for Extracting Diverse Keyphrases (Zhang et al., SIGIR 2019)
- Glocal: Incorporating Global Information in Local Convolution for Keyphrase Extraction (Prasad & Kan, NAACL 2019)
- Open Domain Web Keyphrase Extraction Beyond Language Modeling (Xiong et al., EMNLP-IJCNLP 2019)
- Using Human Attention to Extract Keyphrase from Microblog Post (Zhang & Zhang, ACL 2019)
- SaSAKE: Syntax and Semantics Aware Keyphrase Extraction from Research Papers (Santosh et al., COLING 2020)
- KeyGames: A Game Theoretic Approach to Automatic Keyphrase Extraction (Saxena et al., COLING 2020)
- A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents (Lai et al., COLING 2020)
- SIFRank: a new baseline for unsupervised keyphrase extraction based on pre-trained language model (Sun et al., IEEE Access 2020)
- Web Document Encoding for Structure-Aware Keyphrase Extraction (Kim et al., SIGIR 2021)
- Keyphrase Extraction from Scientific Articles via Extractive Summarization (Kontoulis et al., SDP 2021)
- Word centrality constrained representation for keyphrase extraction (Gero & Ho, BioNLP 2021)
- Exploiting Position and Contextual Word Embeddings for Keyphrase Extraction from Scientific Papers (Patel & Caragea, EACL 2021)
- Keyphrase Extraction with Incomplete Annotated Training Data (Lei et al., WNUT 2021)
- Importance Estimation from Multiple Perspectives for Keyphrase Extraction (Song et al., EMNLP 2021)
- Unsupervised Keyphrase Extraction by Jointly Modeling Local and Global Context (Liang et al., EMNLP 2021)
- AttentionRank: Unsupervised Keyphrase Extraction using Self and Cross Attentions (Ding & Luo, EMNLP 2021)
- UCPhrase: Unsupervised Context-aware Quality Phrase Tagging (Gu et al., KDD 2021)
- AGRank: Augmented Graph-based Unsupervised Keyphrase Extraction (Ding & Luo, AACL-IJCNLP 2022)
- Hyperbolic Relevance Matching for Neural Keyphrase Extraction (Song et al., NAACL 2022)
- MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction (Zhang et al., ACL Findings 2022)
- Unsupervised Keyphrase Extraction via Interpretable Neural Networks (Joshi et al., EACL Findings 2023)
- PromptRank: Unsupervised Keyphrase Extraction Using Prompt (Kong et al., ACL 2023)
- Improving Embedding-based Unsupervised Keyphrase Extraction by Incorporating Structural Information (Song et al., ACL Findings 2023)
- Unsupervised Keyphrase Extraction by Learning Neural Keyphrase Set Function (Song et al., ACL Findings 2023)
- Multi-Task Knowledge Distillation with Embedding Constraints for Scholarly Keyphrase Boundary Classification (Park & Caragea, EMNLP 2023)
- SAMRank: Unsupervised Keyphrase Extraction using Self-Attention Map in BERT and GPT-2 (Kang & Shin, EMNLP 2023)
- HyperRank: Hyperbolic Ranking Model for Unsupervised Keyphrase Extraction (Song et al., EMNLP 2023)
- Mitigating Over-Generation for Unsupervised Keyphrase Extraction with Heterogeneous Centrality Detection (Song et al., EMNLP 2023)
- Clustering-based Sampling for Few-Shot Cross-Domain Keyphrase Extraction (Mishra et al., EACL Findings 2024)
- Enhancing Phrase Representation by Information Bottleneck Guided Text Diffusion Process for Keyphrase Extraction (Luo et al., LREC-COLING 2024)
- Match More, Extract Better! Hybrid Matching Model for Open Domain Web Keyphrase Extraction (Song et al., ACL Findings 2024)
- Attention-Seeker: Dynamic Self-Attention Scoring for Unsupervised Keyphrase Extraction (Lopez Zapata et al., COLING 2025)
- Empirical Study of Zero-shot Keyphrase Extraction with Large Language Models (Kang & Shin, COLING 2025)
- TermITH-Eval: a French Standard-Based Resource for Keyphrase Extraction Evaluation (Bougouin, LREC 2016)
- Human-competitive tagging using automatic keyphrase extraction (Medelyan et al., EMNLP 2009)
- Approximate Matching for Evaluating Keyphrase Extraction (Zesch & Gurevych, RANLP 2009)
- Evaluating N-gram based Evaluation Metrics for Automatic Keyphrase Extraction (Kim et al., COLING 2010)
- How Document Pre-processing affects Keyphrase Extraction Performance (Boudin et al., WNUT 2016)
- Evaluating anaphora and coreference resolution to improve automatic keyphrase extraction (Basaldella et al., COLING 2016)
- Creation and evaluation of large keyphrase extraction collections with multiple opinions (Sterckx et al., Language Resources and Evaluation 2018)
- Encoding Conversation Context for Neural Keyphrase Extraction from Microblog Posts (Zhang et al., NAACL 2018)
- Keyphrase Generation: A Text Summarization Struggle (Çano & Bojar, NAACL 2019)
- Understanding the Tradeoff between Cost and Quality of Expert Annotations for Keyphrase Extraction (Chau et al., LAW 2020)
- Large-Scale Evaluation of Keyphrase Extraction Models (Gallina et al., JCDL 2020)
- Scientific Keyphrase Identification and Classification by Pre-Trained Language Models Intermediate Task Transfer Learning (Park & Caragea, COLING 2020)
- An Empirical Study on Neural Keyphrase Generation (Meng et al., NAACL 2021)
- Redefining Absent Keyphrases and their Effect on Retrieval Effectiveness (Boudin & Gallina, NAACL 2021)
- KPEval: Towards Fine-Grained Semantic-Based Keyphrase Evaluation (Wu et al., ACL Findings 2024)
- Are Your Keywords Like My Queries? A Corpus-Wide Evaluation of Keyword Extractors with Real Searches (Galletti et al., COLING 2025)
- Automatic Keyphrase Extraction: A Survey of the State of the Art (Hasan & Ng, ACL 2014)
- Keyword and Keyphrase Extraction Techniques: A Literature Review (Siddiqi & Sharan, IJCA 2015)
- Keyphrase Generation: A Multi-Aspect Survey (Çano & Bojar, IEEE FRUCT 2019)
- Automatic keyphrase extraction: a survey and trends (Merrouni et al., JIIS 2020)
- A Survey on Recent Advances in Keyphrase Extraction from Pre-trained Language Models (Song et al., EACL Findings 2023)
- From statistical methods to deep learning, automatic keyphrase prediction: A survey (Xie et al., Information Processing & Management 2023)
- Automatic phrase indexing for document retrieval (Fagan, SIGIR 1987)
- A Just-In-Time Keyword Extraction from Meeting Transcripts (Song et al., NAACL 2013)
- Keyphrase Extraction for N-best Reranking in Multi-Sentence Compression (Boudin & Morin, NAACL 2013)
- Joint Learning of Chinese Words, Terms and Keywords (Cao et al., EMNLP 2014)
- Financial Keyword Expansion via Continuous Word Vector Representations (Tsai & Wang, EMNLP 2014)
- Citation-Enhanced Keyphrase Extraction from Research Papers: A Supervised Approach (Caragea et al., EMNLP 2014)
- Cross-Lingual Information to the Rescue in Keyword Extraction (Huang et al., ACL 2014)
- Automatic Keyword Extraction on Twitter (Marujo et al., ACL-IJCNLP 2015)
- Extraction of Keywords of Novelties From Patent Claims (Suzuki & Takatsuka, COLING 2016)
- Real-Time Keyword Extraction from Conversations (Meladianos et al., EACL 2017)
- Keyphrases Extraction from User-Generated Contents in Healthcare Domain Using Long Short-Term Memory Networks (Saputra et al., BioNLP 2018)
- Leveraging Just a Few Keywords for Fine-Grained Aspect Detection Through Weakly Supervised Co-Training (Karamanolakis et al., EMNLP-IJCNLP 2019)
- A Human-AI Loop Approach for Joint Keyword Discovery and Expectation Estimation in Micropost Event Detection (Bhardwaj et al., AAAI 2020)
- Understanding Medical Conversations with Scattered Keyword Attention and Weak Supervision from Responses (Shi et al., AAAI 2020)
- Keywords-Guided Abstractive Sentence Summarization (Li et al., AAAI 2020)
- Keyphrase Generation for Scientific Document Retrieval (Boudin et al., ACL 2020)
- Diverse, Controllable, and Keyphrase-Aware: A Corpus and Method for News Multi-Headline Generation (Liu et al., EMNLP 2020)
- Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings (Wang et al., EMNLP 2020)
- Incorporating Multimodal Information in Open-Domain Web Keyphrase Extraction (Wang et al., EMNLP 2020)
- Bayesian Critiquing with Keyphrase Activation Vectors for VAE-based Recommender Systems (Yang et al., NAACL 2021)
- KPQA: A Metric for Generative Question Answering Using Keyphrase Weights (Lee et al., NAACL 2021)
- Arabic Keyphrase Extraction: Enhancing Deep Learning Models with Pre-trained Contextual Embedding and External Features (Alharbi & Al-Muhtasab, WANLP 2022)
- Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering (ACM MM 2023)
- Please also check out KeyphraseExtractionSurvey for early keyphrase extraction datasets.
- KP20k, Inspec, Krapivin, NUS, SemEval (Meng et al., ACL 2017)
- KPTimes (Gallina et al., INLG 2019)
- OpenKP (Xiong et al., EMNLP 2019)
- OAGK (Çano et al., NAACL 2019)
- StackEx (Yuan et al., ACL 2020)
- KPBiomed (Houbre et al., Louhi 2022)
- Keyphrase Prediction from Video Transcripts: New Dataset and Directions (Veyseh et al., COLING 2022)
- EcommerceMKP (Gao et al., NAACL Findings 2022)
- AcademicMKP (Gao et al., NAACL Findings 2022)
- LipKey (Koto et al., COLING 2022)
- LDKP (Mahata et al., arXiv 2022)
- A new dataset for multilingual keyphrase generation (Piedboeuf & Langlais, NeurIPS 2022 Datasets and Benchmarks track)
- Few-TK: A Dataset for Few-shot Scientific Typed Keyphrase Recognition (Lahiri et al., NAACL Findings 2024)
- EUROPA: A Legal Multilingual Keyphrase Generation Dataset (Salaün et al., ACL 2024)
- DKPro Keyphrases: Flexible and Reusable Keyphrase Extraction Experiments (Erbs et al., ACL 2014)
- pke: an open source python-based keyphrase extraction toolkit (Boudin, COLING 2016)
- OpenNMT-kpg
- keyphrase-generation-rl
- ir_using_kg
- multilingual_keyphrase_generation
- dlkp
- DeepKPG
- KPEval
- Automatic Keyphrase Extraction from Text: A Walk-through, ECAI 2020
- A Tutorial on Keyphrasification, ECIR 2022