diff --git a/content/authors/yajing/_index.md b/content/authors/yajing/_index.md
new file mode 100644
index 0000000..156ebac
--- /dev/null
+++ b/content/authors/yajing/_index.md
@@ -0,0 +1,61 @@
+---
+# Display name
+title: Yajing Yang
+
+# Full Name (for SEO)
+first_name: Yajing
+last_name: Yang
+
+# Is this the primary user of the site?
+superuser: false
+
+# Role/position
+role: IPP Doctoral Student (Aug '20)
+
+# Organizations/Affiliations
+organizations:
+  - name: National University of Singapore, School of Computing
+    url: 'http://www.comp.nus.edu.sg'
+
+# Short bio (displayed in user profile at end of posts)
+bio: PhD Candidate August 2020 Intake
+
+interests:
+  - Data-to-Text Generation
+  - Table Reasoning
+  - Large Language Models
+
+
+social:
+  - icon: house
+    icon_pack: fas
+    link: https://yajingyang.github.io/
+  - icon: envelope
+    icon_pack: fas
+    link: 'yajing.yang@u.nus.edu'
+  - icon: google-scholar
+    icon_pack: ai
+    link: https://scholar.google.com/citations?user=eTNzO7oAAAAJ&hl=en
+  - icon: github
+    icon_pack: fab
+    link: https://github.com/yajingyang
+# Link to a PDF of your resume/CV from the About widget.
+# To enable, copy your resume/CV to `static/files/cv.pdf` and uncomment the lines below.
+# - icon: cv
+#   icon_pack: ai
+#   link: files/cv.pdf
+
+# Enter email to display Gravatar (if Gravatar enabled in Config)
+email: 'yajing.yang@u.nus.edu'
+
+# Highlight the author in author lists? (true/false)
+highlight_name: false
+
+# Organizational groups that you belong to (for People widget)
+# Set this to `[]` or comment out if you are not using People widget.
+user_groups:
+  - Graduate Students
+# - Researchers
+---
+
+Yajing is a fifth-year Ph.D. candidate in the Industrial PhD Programme of the School of Computing at the National University of Singapore and Rio Tinto. She is a member of the Web Information Retrieval / Natural Language Processing Group (WING) and is supervised by Associate Professor Min-Yen Kan.
Her primary research interest lies in Natural Language Processing, with a specific focus on data-to-text generation and data narration.
diff --git a/content/authors/yajing/avatar.jpg b/content/authors/yajing/avatar.jpg
new file mode 100644
index 0000000..827b963
Binary files /dev/null and b/content/authors/yajing/avatar.jpg differ
diff --git a/content/publication/Ramesh_Kashyap_Yang_Kan_2023/cite.bib b/content/publication/Ramesh_Kashyap_Yang_Kan_2023/cite.bib
new file mode 100644
index 0000000..ff541ed
--- /dev/null
+++ b/content/publication/Ramesh_Kashyap_Yang_Kan_2023/cite.bib
@@ -0,0 +1,15 @@
+@article{Ramesh_Kashyap_Yang_Kan_2023,
+  title={Scientific document processing: challenges for modern learning methods},
+  volume={24},
+  ISSN={1432-1300},
+  url={https://doi.org/10.1007/s00799-023-00352-7},
+  DOI={10.1007/s00799-023-00352-7},
+  abstractNote={Neural network models enjoy success on language tasks related to Web documents, including news and Wikipedia articles. However, the characteristics of scientific publications pose specific challenges that have yet to be satisfactorily addressed: the discourse structure of scientific documents crucial in scholarly document processing (SDP) tasks, the interconnected nature of scientific documents, and their multimodal nature. We survey modern neural network learning methods that tackle these challenges: those that can model discourse structure and their interconnectivity and use their multimodal nature. We also highlight efforts to collect large-scale datasets and tools developed to enable effective deep learning deployment for SDP.
We conclude with a discussion on upcoming trends and recommend future directions for pursuing neural natural language processing approaches for SDP.},
+  number={4},
+  journal={International Journal on Digital Libraries},
+  author={Ramesh Kashyap, Abhinav and Yang, Yajing and Kan, Min-Yen},
+  year={2023},
+  month=dec,
+  pages={283–309},
+  language={en}
+ }
diff --git a/content/publication/Ramesh_Kashyap_Yang_Kan_2023/index.md b/content/publication/Ramesh_Kashyap_Yang_Kan_2023/index.md
new file mode 100644
index 0000000..d7a7d15
--- /dev/null
+++ b/content/publication/Ramesh_Kashyap_Yang_Kan_2023/index.md
@@ -0,0 +1,17 @@
+---
+title: 'Scientific document processing: challenges for modern learning methods'
+authors:
+- abhinav
+- yajing
+- min
+date: '2023-03-24'
+publishDate: '2024-07-06T02:22:24.568376Z'
+publication_types:
+- paper-journal
+publication: '*International Journal on Digital Libraries*'
+doi: 10.1007/s00799-023-00352-7
+abstract: 'Neural network models enjoy success on language tasks related to Web documents, including news and Wikipedia articles. However, the characteristics of scientific publications pose specific challenges that have yet to be satisfactorily addressed: the discourse structure of scientific documents crucial in scholarly document processing (SDP) tasks, the interconnected nature of scientific documents, and their multimodal nature. We survey modern neural network learning methods that tackle these challenges: those that can model discourse structure and their interconnectivity and use their multimodal nature. We also highlight efforts to collect large-scale datasets and tools developed to enable effective deep learning deployment for SDP. We conclude with a discussion on upcoming trends and recommend future directions for pursuing neural natural language processing approaches for SDP.'
+links:
+- name: URL
+  url: https://doi.org/10.1007/s00799-023-00352-7
+---
diff --git a/content/publication/yang-etal-2024-datatales/cite.bib b/content/publication/yang-etal-2024-datatales/cite.bib
new file mode 100644
index 0000000..3c1da44
--- /dev/null
+++ b/content/publication/yang-etal-2024-datatales/cite.bib
@@ -0,0 +1,18 @@
+@inproceedings{yang-etal-2024-datatales,
+    title = "{D}ata{T}ales: A Benchmark for Real-World Intelligent Data Narration",
+    author = "Yang, Yajing and
+      Liu, Qian and
+      Kan, Min-Yen",
+    editor = "Al-Onaizan, Yaser and
+      Bansal, Mohit and
+      Chen, Yun-Nung",
+    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
+    month = nov,
+    year = "2024",
+    address = "Miami, Florida, USA",
+    publisher = "Association for Computational Linguistics",
+    url = "https://aclanthology.org/2024.emnlp-main.601/",
+    doi = "10.18653/v1/2024.emnlp-main.601",
+    pages = "10764--10788",
+    abstract = "We introduce DataTales, a novel benchmark designed to assess the proficiency of language models in data narration, a task crucial for transforming complex tabular data into accessible narratives. Existing benchmarks often fall short in capturing the requisite analytical complexity for practical applications. DataTales addresses this gap by offering 4.9k financial reports paired with corresponding market data, showcasing the demand for models to create clear narratives and analyze large datasets while understanding specialized terminology in the field. Our findings highlight the significant challenge that language models face in achieving the necessary precision and analytical depth for proficient data narration, suggesting promising avenues for future model development and evaluation methodologies."
+}
\ No newline at end of file
diff --git a/content/publication/yang-etal-2024-datatales/index.md b/content/publication/yang-etal-2024-datatales/index.md
new file mode 100644
index 0000000..4d705b6
--- /dev/null
+++ b/content/publication/yang-etal-2024-datatales/index.md
@@ -0,0 +1,18 @@
+---
+title: 'DataTales: A Benchmark for Real-World Intelligent Data Narration'
+authors:
+- Yajing Yang
+- Qian Liu
+- min
+date: '2024-11-12'
+publishDate: '2024-07-06T02:22:24.568376Z'
+publication_types:
+- paper-conference
+publication: '*Proceedings of the 2024 Conference on Empirical Methods in Natural
+  Language Processing*'
+doi: 10.18653/v1/2024.emnlp-main.601
+abstract: We introduce DataTales, a novel benchmark designed to assess the proficiency of language models in data narration, a task crucial for transforming complex tabular data into accessible narratives. Existing benchmarks often fall short in capturing the requisite analytical complexity for practical applications. DataTales addresses this gap by offering 4.9k financial reports paired with corresponding market data, showcasing the demand for models to create clear narratives and analyze large datasets while understanding specialized terminology in the field. Our findings highlight the significant challenge that language models face in achieving the necessary precision and analytical depth for proficient data narration, suggesting promising avenues for future model development and evaluation methodologies.
+links:
+- name: URL
+  url: https://aclanthology.org/2024.emnlp-main.601/
+---