update
ericjeangirard committed Jan 15, 2025
1 parent 85a3927 commit edc4bdf
Showing 10 changed files with 65 additions and 12 deletions.
1 change: 1 addition & 0 deletions doc_network/bso.md
@@ -29,6 +29,7 @@ geometry: "left=3cm, right=3cm, top=3cm, bottom=3cm"

# Abstract


This study introduces a novel methodology for mapping scientific communities at scale, addressing challenges associated with network analysis in large bibliometric datasets. By leveraging enriched publication metadata from the French research portal scanR and applying advanced filtering techniques to prioritize the strongest interactions between entities, we construct detailed, scalable network maps. These maps are enhanced through systematic disambiguation of authors, affiliations, and topics using persistent identifiers and specialized algorithms. The proposed framework integrates Elasticsearch for efficient data aggregation, Graphology for network spatialization (ForceAtlas2) and community detection (Louvain algorithm), and VOSviewer for network visualization. A Large Language Model (Mistral Nemo) is used to label the detected communities, and OpenAlex data enriches the results with estimated citation counts to detect hot topics. This scalable approach enables insightful exploration of research collaborations and thematic structures, with potential applications for strategic decision-making in science policy and funding. These web tools are effective at the global (national) scale but are also available (and can be integrated via iframes) at the scale of any French research institution (from large research organizations to individual laboratories). All tools and methodologies are open-source on the repo [https://github.com/dataesr/scanr-ui](https://github.com/dataesr/scanr-ui).

# 1. Motivation
Binary file modified doc_network/mapping_at_scale.pdf
Binary file not shown.
34 changes: 30 additions & 4 deletions doc_network/mapping_at_scale.tex
@@ -37,8 +37,9 @@
\IfFileExists{bookmark.sty}{\usepackage{bookmark}}{\usepackage{hyperref}}
\hypersetup{
pdftitle={Mapping scientific communities at scale},
pdfkeywords={scanR, VOSviewer, scientific community, research
portal, Elasticsearch, network analysis},
pdfkeywords={scanR, VOSviewer, graphology, scientific
community, community detection, research portal, Elasticsearch, network
analysis},
hidelinks,
pdfcreator={LaTeX via pandoc}}
\urlstyle{same} % disable monospaced font for URLs
@@ -166,8 +167,33 @@

\begin{document}
\maketitle

\textbf{Keywords}: open access, open science, open data, open source
\begin{abstract}
This study introduces a novel methodology for mapping scientific
communities at scale, addressing challenges associated with network
analysis in large bibliometric datasets. By leveraging enriched
publication metadata from the French research portal scanR and applying
advanced filtering techniques to prioritize the strongest interactions
between entities, we construct detailed, scalable network maps. These
maps are enhanced through systematic disambiguation of authors,
affiliations, and topics using persistent identifiers and specialized
algorithms. The proposed framework integrates Elasticsearch for
efficient data aggregation, Graphology for network spatialization
(ForceAtlas2) and community detection (Louvain algorithm), and VOSviewer
for network visualization. A Large Language Model (Mistral Nemo) is used
to label the detected communities, and OpenAlex data enriches the results
with estimated citation counts to detect hot topics. This
scalable approach enables insightful exploration of research
collaborations and thematic structures, with potential applications for
strategic decision-making in science policy and funding. These web tools
are effective at the global (national) scale but are also available (and
can be integrated via iframes) at the scale of any French research
institution (from large research organizations to individual laboratories). All tools
and methodologies are open-source on the repo
\url{https://github.com/dataesr/scanr-ui}.
\end{abstract}

\textbf{Keywords}: scanR, VOSviewer, graphology, scientific community,
community detection, research portal, Elasticsearch, network analysis

\hypertarget{motivation}{%
\section{1. Motivation}\label{motivation}}
Binary file modified doc_network/out.docx
Binary file not shown.
2 changes: 1 addition & 1 deletion doc_network/out.enriched.json

Large diffs are not rendered by default.

Binary file modified doc_network/out.epub
Binary file not shown.
6 changes: 3 additions & 3 deletions doc_network/out.html
@@ -9,7 +9,7 @@
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no" />
<meta name="keywords" content="scanR, VOSviewer, scientific community, research portal, Elasticsearch, network analysis" />
<meta name="keywords" content="scanR, VOSviewer, graphology, scientific community, community detection, research portal, Elasticsearch, network analysis" />
<title>Mapping scientific communities at scale</title>
<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
@@ -503,12 +503,12 @@ <h1 property="headline">Mapping scientific communities at scale</h1>
</div>
<div class="author-info">
</div>

<p class="abstract" property="description"><p>This study introduces a novel methodology for mapping scientific communities at scale, addressing challenges associated with network analysis in large bibliometric datasets. By leveraging enriched publication metadata from the French research portal scanR and applying advanced filtering techniques to prioritize the strongest interactions between entities, we construct detailed, scalable network maps. These maps are enhanced through systematic disambiguation of authors, affiliations, and topics using persistent identifiers and specialized algorithms. The proposed framework integrates Elasticsearch for efficient data aggregation, Graphology for network spatialization (ForceAtlas2) and community detection (Louvain algorithm), and VOSviewer for network visualization. A Large Language Model (Mistral Nemo) is used to label the detected communities, and OpenAlex data enriches the results with estimated citation counts to detect hot topics. This scalable approach enables insightful exploration of research collaborations and thematic structures, with potential applications for strategic decision-making in science policy and funding. These web tools are effective at the global (national) scale but are also available (and can be integrated via iframes) at the scale of any French research institution (from large research organizations to individual laboratories). All tools and methodologies are open-source on the repo <a href="https://github.com/dataesr/scanr-ui">https://github.com/dataesr/scanr-ui</a>.</p></p>



<div property="articleBody" class="article-body">
<p><strong>Keywords</strong>: open access, open science, open data, open source</p>
<p><strong>Keywords</strong>: scanR, VOSviewer, graphology, scientific community, community detection, research portal, Elasticsearch, network analysis</p>
<h1 id="motivation">1. Motivation</h1>
<p>Analysing and mapping scientific communities provides an insight into the structure and evolution of academic disciplines. This involves providing an analytical and visual representation of the relationships between entities (e.g. researchers, research laboratories, research themes), with the aim, in particular, of understanding the networks and dynamics of scientific collaboration, and identifying collaborative groups and their influences. From the point of view of decision-makers, this type of tool is useful for strategic decision-making with a view to public policy and funding.</p>
<p>These maps are generally deduced from data in bibliographic databases (open or proprietary), based on co-publication or citation information. In the case of co-publications, two entities (authors, for example) will be linked if they have collaborated (co-published) on a piece of research. These links are then symmetrical. In the case of citation links, two authors will be linked if one cites the research work of another, in the list of references. This is a directed link, as one author may cite another without this being reciprocal. A lot of recent work uses this second approach, for example by trying to calculate composite indicators of novelty (or innovation) based on citation links.</p>
34 changes: 30 additions & 4 deletions doc_network/out.latex
@@ -37,8 +37,9 @@
\IfFileExists{bookmark.sty}{\usepackage{bookmark}}{\usepackage{hyperref}}
\hypersetup{
pdftitle={Mapping scientific communities at scale},
pdfkeywords={scanR, VOSviewer, scientific community, research
portal, Elasticsearch, network analysis},
pdfkeywords={scanR, VOSviewer, graphology, scientific
community, community detection, research portal, Elasticsearch, network
analysis},
hidelinks,
pdfcreator={LaTeX via pandoc}}
\urlstyle{same} % disable monospaced font for URLs
@@ -166,8 +167,33 @@ France}

\begin{document}
\maketitle

\textbf{Keywords}: open access, open science, open data, open source
\begin{abstract}
This study introduces a novel methodology for mapping scientific
communities at scale, addressing challenges associated with network
analysis in large bibliometric datasets. By leveraging enriched
publication metadata from the French research portal scanR and applying
advanced filtering techniques to prioritize the strongest interactions
between entities, we construct detailed, scalable network maps. These
maps are enhanced through systematic disambiguation of authors,
affiliations, and topics using persistent identifiers and specialized
algorithms. The proposed framework integrates Elasticsearch for
efficient data aggregation, Graphology for network spatialization
(ForceAtlas2) and community detection (Louvain algorithm), and VOSviewer
for network visualization. A Large Language Model (Mistral Nemo) is used
to label the detected communities, and OpenAlex data enriches the results
with estimated citation counts to detect hot topics. This
scalable approach enables insightful exploration of research
collaborations and thematic structures, with potential applications for
strategic decision-making in science policy and funding. These web tools
are effective at the global (national) scale but are also available (and
can be integrated via iframes) at the scale of any French research
institution (from large research organizations to individual laboratories). All tools
and methodologies are open-source on the repo
\url{https://github.com/dataesr/scanr-ui}.
\end{abstract}

\textbf{Keywords}: scanR, VOSviewer, graphology, scientific community,
community detection, research portal, Elasticsearch, network analysis

\hypertarget{motivation}{%
\section{1. Motivation}\label{motivation}}
Binary file modified doc_network/out.odt
Binary file not shown.
Binary file modified doc_network/out.pdf
Binary file not shown.
