Web preview of a Sankey diagram of emerging research topics in late 2022, generated by running BERTopic on six months of Semantic Scholar data. The topics extracted by BERTopic for each monthly run are linked across runs to reveal temporal patterns: topics that emerge, merge, diverge, or fade.
- Nodes of different colors represent the following runs:
- 2022-08.
- 2022-09.
- 2022-10.
- 2022-11.
- 2022-12.
- 2023-01.
-
Each link leaving a node has the same color as that node. The last run, 2023-01, is shown in aqua and has no onward links, so only its nodes are shown.
-
The width of a link represents the weight of the topic it leads to in the next run, whereas its color is that of the node it comes from in the previous run.
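The color and width rules can be sketched in a few lines of Python. This is only an illustration, not the code behind the diagram: the function name `build_sankey`, the input layout, and every color except aqua for 2023-01 are assumptions made for the example.

```python
# Illustrative colors per run. Only "aqua" for 2023-01 is stated in the text;
# the other values are placeholders chosen for this sketch.
RUN_COLORS = {
    "2022-08": "steelblue",
    "2022-09": "purple",
    "2022-10": "orange",
    "2022-11": "green",
    "2022-12": "crimson",
    "2023-01": "aqua",
}


def build_sankey(nodes, edges):
    """Build node and link fields for a Sankey diagram.

    nodes: list of (run, label, weight) tuples, one per topic node.
    edges: list of (source_index, target_index) pairs into `nodes`.
    """
    node = {
        "label": [label for _, label, _ in nodes],
        "color": [RUN_COLORS[run] for run, _, _ in nodes],
    }
    link = {
        "source": [s for s, _ in edges],
        "target": [t for _, t in edges],
        # Link width follows the weight of the topic in the next run.
        "value": [nodes[t][2] for _, t in edges],
        # Link color follows the node the link comes from.
        "color": [RUN_COLORS[nodes[s][0]] for s, _ in edges],
    }
    return node, link


# Hypothetical topics and weights, for illustration only.
example_nodes = [
    ("2022-12", "topic_a", 10.0),
    ("2023-01", "topic_a_next", 25.0),
]
example_node, example_link = build_sankey(example_nodes, [(0, 1)])
print(example_link)
```

The two dictionaries use the field names that Plotly's `Sankey` trace accepts for `node` and `link`, so they could be passed to it directly if Plotly is the rendering library.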
-
The position of a topic node along the horizontal axis does not represent a time period. If a topic is present in all runs, it starts at 2022-08 on the far left and extends right to 2023-01.
-
However, if a topic emerges in a later run, say 2022-09, then the leftmost node for that topic is purple, indicating that it starts in 2022-09.
-
Similarly, a topic that is linked across only two or three runs (colors) had no corresponding topic in the other runs.
-
At the bottom of the diagram there are some standalone topics without any links. They represent outliers. Our current algorithm doesn't link topics across a gap in the runs (i.e., it doesn't link 2022-08 to 2022-10 if the topic is missing in 2022-09).
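A minimal sketch of the consecutive-run rule is shown here. The text does not say how two topics are matched, so the Jaccard overlap of top words, the `threshold` value, and the names `link_consecutive` and `standalone` are all assumptions made for illustration.

```python
RUNS = ["2022-08", "2022-09", "2022-10", "2022-11", "2022-12", "2023-01"]


def jaccard(a, b):
    """Jaccard overlap of two word lists (assumed similarity measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def link_consecutive(topics_by_run, runs=RUNS, threshold=0.5):
    """Link topics only between adjacent runs; gaps are never bridged.

    topics_by_run: {run: {topic_name: [top words]}}
    """
    links = []
    for prev, nxt in zip(runs, runs[1:]):
        for p_name, p_words in topics_by_run.get(prev, {}).items():
            for n_name, n_words in topics_by_run.get(nxt, {}).items():
                if jaccard(p_words, n_words) >= threshold:
                    links.append(((prev, p_name), (nxt, n_name)))
    return links


def standalone(topics_by_run, links):
    """Topics that take part in no link at all (the outliers)."""
    linked = {end for link in links for end in link}
    return [
        (run, name)
        for run, names in topics_by_run.items()
        for name in names
        if (run, name) not in linked
    ]


# Hypothetical data: "coal" is missing in 2022-09, so its 2022-08 and
# 2022-10 nodes are never linked and both end up standalone.
example = {
    "2022-08": {"coal": ["coal", "combustion", "gas"], "mirna": ["mirna", "microrna"]},
    "2022-09": {"mirna": ["mirna", "microrna"]},
    "2022-10": {"coal": ["coal", "combustion", "gas"]},
}
example_links = link_consecutive(example)
print(standalone(example, example_links))
```

Because the loop only pairs each run with the one immediately after it, a topic that skips a month reappears as a new, unlinked node rather than as a continuation.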
-
The outlier topics can be used to debug issues with BERTopic. For example, the topic labelled 202301,313_coal_combustion_gas_gangue has no link to the previous month. Given the popularity of this topic, a genuine absence in the previous month is unlikely.
-
Some of the outlier topics for the last run (aqua color) could be newly emerging topics. For example: 202301,405_mirna_microrna_mir_21.
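One possible way to triage such unlinked topics is sketched below. It assumes the label layout seen in the examples above (run, a comma, a numeric topic id, then underscore-separated words). The helper names, the `min_shared` cutoff, and the previous-run label in the example are hypothetical and not taken from the data.

```python
def parse_label(label):
    """Split a label such as '202301,313_coal_combustion_gas_gangue'."""
    run, rest = label.split(",", 1)
    topic_id, *words = rest.split("_")
    return run, int(topic_id), words


def triage(unlinked_label, previous_run_labels, min_shared=2):
    """Guess whether an unlinked topic is a missed link or a new topic.

    A topic sharing at least `min_shared` words with some topic of the
    previous run is flagged for debugging; otherwise it is treated as a
    candidate emerging topic. This heuristic is an assumption of the sketch.
    """
    _, _, words = parse_label(unlinked_label)
    for prev in previous_run_labels:
        _, _, prev_words = parse_label(prev)
        if len(set(words) & set(prev_words)) >= min_shared:
            return "possible missed link", prev
    return "candidate emerging topic", None


# The 202212 label below is invented for illustration.
previous = ["202212,120_coal_gas_combustion_mine"]
print(triage("202301,313_coal_combustion_gas_gangue", previous))
print(triage("202301,405_mirna_microrna_mir_21", previous))
```

Either outcome is only a hint for manual review, since word overlap alone cannot confirm that two topics are the same.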
-
When a topic node is wider than its tributaries (the links flowing into it), the weight of that topic is increasing.
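Under one reading of this rule, a topic is growing when its weight exceeds the combined weight of the previous-run topics that link into it. The sketch below follows that reading; the function name `growing_topics` and the input layout are assumptions, and the numbers are made up.

```python
def growing_topics(weights, links):
    """Return {topic: gain} for topics heavier than their tributaries.

    weights: {node_name: weight}
    links:   list of (source_name, target_name) pairs between adjacent runs.
    """
    incoming = {}
    for src, dst in links:
        # Sum the weights of all previous-run topics feeding this topic.
        incoming[dst] = incoming.get(dst, 0.0) + weights[src]
    return {
        dst: weights[dst] - total
        for dst, total in incoming.items()
        if weights[dst] > total
    }


# Hypothetical weights: "c" merges "a" and "b" and outgrows them; "d" shrinks.
example_weights = {"a": 10, "b": 5, "c": 20, "d": 8}
example_links = [("a", "c"), ("b", "c"), ("a", "d")]
print(growing_topics(example_weights, example_links))
```

Topics with no incoming links are skipped, since they have no tributaries to compare against.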