This is a class data visualization project exploring a dataset of research papers from the AAAI 2014 conference. Tool link: https://interactiveaaai2014.herokuapp.com/
Most current conference proceedings are presented as a list of paper titles, each linking to the full text or further details on a new page. This makes exploring proceedings and doing literature reviews tedious for researchers. The idea behind this visualization tool is to create interactive conference proceedings that let researchers explore conference publications in new and deeper ways, faster and more easily.
A number of datasets were explored, and the AAAI 2014 conference dataset was found to be the most suitable for this task in terms of availability and attributes. It has important attributes such as keywords, topics and groups that can interest researchers looking for papers in the proceedings. For example, a researcher may want to read all the papers in the proceedings that contain a certain phrase like ‘machine learning’ in the title or topics. A Python script was written to clean the data by splitting the authors, groups, keywords and topics fields, so that the data can be processed further to compute the statistics shown in the charts.
The tool is built with HTML, CSS and JavaScript and is designed to be as simple and easy to use as possible. The title of the tool is displayed, together with a write-up button that opens the project write-up in a popup window. The visualization tool consists of 3 main components: the dynamic search query box, the statistics summary box and the papers display grid.
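The cleaning step mentioned above (splitting the authors, groups, keywords and topics fields) could look roughly like the following. This is only a minimal sketch, not the project's actual script: the separators (commas or the word "and" for authors, one entry per line for the other fields), the function names and the sample record are all assumptions made for illustration.

```python
import re


def split_field(value, pattern):
    """Split a raw multi-valued field into a list of trimmed, non-empty items."""
    if not value:
        return []
    return [item.strip() for item in re.split(pattern, value) if item.strip()]


def clean_paper(row):
    """Return a copy of one raw record with its multi-valued fields split into lists."""
    cleaned = dict(row)
    # Assumption: authors are separated by commas and/or the word "and".
    cleaned["authors"] = split_field(row.get("authors", ""), r",|\band\b")
    # Assumption: groups, keywords and topics hold one entry per line.
    for field in ("groups", "keywords", "topics"):
        cleaned[field] = split_field(row.get(field, ""), r"\n")
    return cleaned


# Hypothetical raw record, made up for this example.
raw = {
    "title": "An Example Paper",
    "authors": "A. Author, B. Writer and C. Scholar",
    "groups": "Group A\nGroup B",
    "keywords": "bayesian inference\n\nplanning",
    "topics": "Topic X",
}
paper = clean_paper(raw)
```

Once every multi-valued field is a list, counting papers per topic, group or author becomes a simple loop over those lists.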
The dynamic search query box is used to filter the data as the user wishes. The user can filter by any of the attributes available in the dataset: keyword, title, author, topic or group. After selecting a filtering option, the user can type a string or substring into the text box to get all papers whose chosen attribute (keyword, group, title, etc.) equals or contains it. The user can then click the filter button to display the filtered data. There is also a reset button that clears all filtering and displays all the data. A simple placeholder in the text box informs the user that they can type either the full search string or a substring of it.
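The substring filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the tool's actual code (the tool itself runs in JavaScript): the field names, the case-insensitive matching and the treatment of an empty query as a reset are all assumptions.

```python
def matches(paper, attribute, query):
    """True if any value of the chosen attribute equals or contains the query."""
    value = paper.get(attribute, "")
    values = value if isinstance(value, list) else [value]
    needle = query.strip().lower()  # assumption: matching ignores case
    return any(needle in str(v).lower() for v in values)


def filter_papers(papers, attribute, query):
    """Keep the papers matching the query; an empty query returns everything."""
    if not query.strip():
        return list(papers)
    return [p for p in papers if matches(p, attribute, query)]


# Hypothetical records, made up for this example.
papers = [
    {"title": "Bayesian Methods for Planning", "authors": ["A. Author"],
     "keywords": ["bayesian inference"], "topics": ["Topic X"]},
    {"title": "Learning to Search", "authors": ["B. Writer"],
     "keywords": ["heuristic search"], "topics": ["Topic Y"]},
]
by_keyword = filter_papers(papers, "keywords", "Bayes")
by_title = filter_papers(papers, "title", "search")
everything = filter_papers(papers, "authors", "")
```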
The statistics summary box displays summary statistics of the conference, computed on the filtered data if a filtering option is selected in the search query box, or on all the data otherwise. The statistics are calculated from the raw data (by transforming the data) by counting the number of papers for each topic, group and author. The box is composed of 3 tabs: topics, groups and authors. At first the graphs were all displayed together, each on a new line, but this was overwhelming and hard to read. Placing the graphs in a box with separate tabs saves space and avoids overwhelming the user with 3 different graphs at the same time. Each tab contains a bar chart displaying the percentage of papers in each category. For example, in the topics tab the bar chart displays the percentage of papers in each research topic (and similarly for authors and groups in the other tabs). The box contains a very short description of what the bar chart represents, with the axis labels highlighted in orange within the text. This description helps the user understand what the bar chart presents and how to make use of it. The labels are given in the text description rather than on the bar chart itself to keep the chart simple, and they are colored orange so that they catch the eye without the user having to read the full description. The title of the bar chart is shown above the chart. Under the title, a statistics summary gives the user a quick insight into the max, min and median topics, groups or authors. A user might want to know which topic has the most accepted papers, or who the most active author is, with the highest number of accepted papers. These summary labels are colored orange to attract the user's attention and deliver a quick sense of the statistics (hottest research topic, etc.).
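The per-category counting behind the bar chart and its max/min/median summary can be sketched as below. This is a minimal sketch under assumptions, not the tool's actual implementation: it assumes the percentage is taken relative to the number of papers currently shown, that ties are broken alphabetically, and that the "median" category is the middle entry of the list sorted by count. The function names and sample data are hypothetical.

```python
from collections import Counter


def category_stats(papers, field):
    """Count papers per category and express each count as a percentage of all papers."""
    counts = Counter()
    for paper in papers:
        # set() makes sure a paper is counted once per category.
        for category in set(paper.get(field, [])):
            counts[category] += 1
    total = len(papers)
    rows = [
        {"category": c, "count": n, "percent": 100.0 * n / total}
        for c, n in counts.items()
    ]
    # Largest count first; ties broken alphabetically (an assumption).
    rows.sort(key=lambda r: (-r["count"], r["category"]))
    return rows


def summary(rows):
    """Pick the max, median and min categories from rows sorted by count."""
    if not rows:
        return None
    return {
        "max": rows[0]["category"],
        "median": rows[len(rows) // 2]["category"],
        "min": rows[-1]["category"],
    }


# Hypothetical records, made up for this example.
papers = [
    {"topics": ["Topic X", "Topic Y"]},
    {"topics": ["Topic X"]},
    {"topics": ["Topic X", "Topic Z"]},
    {"topics": ["Topic Y"]},
]
rows = category_stats(papers, "topics")
stats = summary(rows)
```

Because a paper can belong to several topics or groups, percentages computed this way can add up to more than 100.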
The bars of the chart are sorted in descending order to make the chart more readable and to help the user analyze it and draw conclusions faster. The x-axis of the bar chart is not labeled with the category names, as the names in this dataset are too long to display in such a small space. Instead, a tooltip shows more information when the user hovers over a bar: the name of the category, the percentage of papers and the actual count. The selected bar is also colored brown to make it clear which bar is currently selected. A smaller gray version of the bar chart is included below the main bar chart to support brushing and zooming: the user draws a rectangle over the area of interest (click and drag). The area of interest can be redrawn or dragged along the x-axis. This feature is used to zoom in and explore the bar chart in more detail, and also to display only the papers of the brushed/selected area in the papers display grid. To clear the brush, the user can change the tab or double-click on the white area of the bar chart (outside the bars).
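The link between a brushed area and the papers shown in the grid can be illustrated with the sketch below. It is a simplified model under assumptions, not the tool's actual brushing code: it assumes the sorted bars share the chart width equally, that a bar is selected when the brushed range overlaps it, and that a paper is shown when it belongs to at least one selected category. All names and sample values are hypothetical.

```python
def brushed_categories(sorted_categories, x0, x1, chart_width):
    """Return the categories whose bars overlap the brushed pixel range [x0, x1]."""
    if not sorted_categories or chart_width <= 0:
        return []
    lo, hi = sorted((x0, x1))  # the brush may be dragged right to left
    band = chart_width / len(sorted_categories)  # assumption: equal-width bars
    selected = []
    for i, category in enumerate(sorted_categories):
        left, right = i * band, (i + 1) * band
        if left < hi and right > lo:
            selected.append(category)
    return selected


def papers_in_categories(papers, field, categories):
    """Keep the papers that belong to at least one of the selected categories."""
    wanted = set(categories)
    return [p for p in papers if wanted.intersection(p.get(field, []))]


# Hypothetical data: categories already sorted by descending paper count.
categories = ["Topic A", "Topic B", "Topic C", "Topic D"]
papers = [
    {"title": "P1", "topics": ["Topic A"]},
    {"title": "P2", "topics": ["Topic C"]},
    {"title": "P3", "topics": ["Topic B", "Topic D"]},
]
selected = brushed_categories(categories, 150, 50, chart_width=400)
shown = papers_in_categories(papers, "topics", selected)
```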
The papers display grid shows the filtered and/or brushed papers, with each paper displayed as a card with a large title and the authors' names. This layout is more informative than the plain list of titles found in most available conference proceedings. Each paper card has a quick view button that opens a popup with all the details about the paper (title, abstract, authors, keywords, topics and groups). The popup window fades in when it appears and fades out when it disappears. This quick view makes it easier to explore papers of interest without having to open a new page each time. To exit the quick view, the user can click outside the popup window. The number of displayed papers is shown in the title of the papers display grid, which is informative when the user applies filtering and/or brushing.
The combination of filtering in the search query box and brushing the bar chart lets the user narrow down the papers of interest according to combinations of features. For example, the user can filter the papers by ‘keyword’ for the string ‘bayesian’. The resulting statistics are displayed on the bar chart, which the user can then use to filter further by topic, group or author. The user can, for example, brush the graph in the topics tab by selecting the first 5 bars of the chart to get the papers in the top 5 topics that have the word ‘bayesian’ in any of their keywords. This tool can help speed up researchers' exploration and reading of previous literature, as well as their analysis of conference statistics, to form insights about the hottest topics, the most active authors, and so on.
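The combined workflow from the example above (filter by keyword, then brush the top bars of the topics chart) can be sketched end to end as follows. This is a compact illustration under assumptions rather than the tool's actual logic: it assumes case-insensitive keyword matching and that brushing the first n bars selects the n topics with the most filtered papers. Function names and data are hypothetical.

```python
from collections import Counter


def top_categories(papers, field, n):
    """Return the n categories with the most papers, largest first."""
    counts = Counter(c for p in papers for c in set(p.get(field, [])))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ranked[:n]]


def explore(papers, keyword, n=5):
    """Filter by keyword substring, then keep the papers in the top n topics."""
    needle = keyword.lower()  # assumption: matching ignores case
    filtered = [
        p for p in papers
        if any(needle in k.lower() for k in p.get("keywords", []))
    ]
    top = set(top_categories(filtered, "topics", n))
    return [p for p in filtered if top.intersection(p.get("topics", []))]


# Hypothetical records, made up for this example.
papers = [
    {"title": "P1", "keywords": ["bayesian networks"], "topics": ["Topic X"]},
    {"title": "P2", "keywords": ["Bayesian inference"], "topics": ["Topic X", "Topic Y"]},
    {"title": "P3", "keywords": ["bayesian optimization"], "topics": ["Topic Z"]},
    {"title": "P4", "keywords": ["planning"], "topics": ["Topic X"]},
]
top_one = explore(papers, "bayesian", n=1)
top_five = explore(papers, "bayesian", n=5)
```

With n=1 only the papers in the single most common topic among the 'bayesian' papers remain, while n=5 keeps every 'bayesian' paper here because the sample has fewer than 5 topics.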
I worked on this project alone, as I am taking this class as a TQE. The project took around 35 hours of work. At first, a lot of time was spent choosing which languages and technologies to use for developing the app. The aspect that took the most time was figuring out how to make the different parts of the visualization tool interact with each other, and fixing bugs while adding new features.