Networks of keywords for bibliometric analysis

EUSN 2021

Carlos G. Figuerola
Modesto Escobar

University of Salamanca, 2021

Author’s keywords

  • authors use keywords to describe the content of their academic articles
  • keywords are free vocabulary
    • no constraints
    • not even normalization
    • a very wide range of terms (most of them used only once)

Author’s keywords

  • low utility for retrieval purposes
  • most author’s keywords also occur in the title and/or the abstract
  • journals often add other subject headings, topic classifications, etc.
  • current retrieval technology performs better than searching by author’s keywords

Author’s keywords

  • however
    • keywords give a good idea of how authors themselves see their own work
    • as keywords connect to (co-occur with) each other, they can show how authors relate to other disciplines, knowledge subfields, etc.
    • as scientific fields are not monolithic, and academic articles are stored in bibliographic databases, keyword analysis can reveal biases or different ways of viewing specific disciplines.

Data Source

  • bibliographic records from Web of Science
  • all records in the Library & Information Science category
  • from 1971 to 2020

Data overview

  • number of records: 114,020
  • records with author’s keywords: 42,838

Data overview

  • total keyword occurrences: 228,420
  • unique keywords: 62,550
  • average keywords per article: 5.34
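
The overview figures above can be derived from the per-article keyword lists. The following is a minimal sketch on toy data, not the authors' code: the `records` list and the `keyword_overview` helper are hypothetical names, and it assumes each keyword is counted once per article.

```python
from collections import Counter

# Toy records standing in for the Web of Science data (hypothetical values).
records = [
    ["internet", "social_media", "trust"],
    ["internet", "digital_library"],
    ["bibliometrics", "citation_analysis", "internet"],
]

def keyword_overview(records):
    """Return total occurrences, unique keywords, mean per article and singleton share."""
    # set(rec) counts a keyword at most once per article (an assumption).
    counts = Counter(kw for rec in records for kw in set(rec))
    total = sum(counts.values())
    unique = len(counts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return {
        "total": total,
        "unique": unique,
        "mean_per_article": total / len(records),
        "singleton_share": singletons / unique,
    }

overview = keyword_overview(records)
print(overview)
```

The singleton share is the fraction of distinct keywords used only once, which is one way to quantify how wide a free vocabulary is.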

Most frequent keywords

keyword                 count    keyword                                   count
internet                 1092    social_network                              483
social_media             1048    information_system                          482
knowledge_management     1034    knowledge_sharing                           481
bibliometrics             927    innovation                                  457
information_retrieval     896    china                                       457
academic_library          834    trust                                       429
library                   716    information_and_communication_technology    426
information_technology    571    collaboration                               425
digital_library           553    electronic_commerce                         412
citation_analysis         552    e_government                                403
information_literacy      535    research                                    390
qualitative               512    public_library                              368
case_study                489

Most frequent keywords over time
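
One way to obtain keyword frequencies over time is to group articles into periods and count keywords within each period. This is only a sketch under assumptions: the ten-year period length, the `(year, keywords)` record layout and the `top_keywords_by_period` helper are all hypothetical, not taken from the talk.

```python
from collections import Counter, defaultdict

# Hypothetical (year, keywords) pairs; the real data spans 1971 to 2020.
records = [
    (1995, ["cd_rom", "online_searching"]),
    (1998, ["internet", "cd_rom"]),
    (2012, ["social_media", "internet"]),
    (2015, ["social_media", "twitter"]),
]

def top_keywords_by_period(records, period_length=10, top=1):
    """Count keywords per period and return the most frequent ones for each period."""
    per_period = defaultdict(Counter)
    for year, keywords in records:
        start = year - year % period_length  # e.g. 1998 -> 1990
        per_period[start].update(set(keywords))  # once per article
    return {start: counter.most_common(top)
            for start, counter in sorted(per_period.items())}

trends = top_keywords_by_period(records)
print(trends)
```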

Network of keywords

  • keywords can be represented as a network in which:
    • every keyword is a node
    • nodes are connected if they co-occur in the same article
    • the weight of a connection (edge) is proportional to the co-occurrence frequency
    • edges are undirected
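
The construction described above can be sketched with the standard library alone. The `records` data and the `cooccurrence_edges` helper are hypothetical. The edge weight here is the raw number of articles in which two keywords co-occur, which is the simplest weight proportional to the frequency.

```python
from collections import Counter
from itertools import combinations

# Toy keyword lists, one per article (hypothetical values).
records = [
    ["internet", "social_media", "trust"],
    ["internet", "social_media"],
    ["bibliometrics", "citation_analysis"],
]

def cooccurrence_edges(records):
    """Map each undirected keyword pair to the number of articles containing both."""
    edges = Counter()
    for keywords in records:
        # Sorting makes (a, b) and (b, a) the same key, so edges are undirected.
        for pair in combinations(sorted(set(keywords)), 2):
            edges[pair] += 1
    return edges

edges = cooccurrence_edges(records)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

An article with k keywords contributes k(k-1)/2 pairs, so long keyword lists dominate the edge count.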

Network of keywords

  • the resulting network is large
    • 62,550 nodes
    • 260,788 edges
  • the network can be pruned by dropping nodes with degree <= n
  • with n=2
    • 2,453 nodes
    • 7,045 edges
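
A minimal sketch of the pruning step follows. It assumes that degree means the number of distinct neighbours and that nodes are dropped in a single pass. The talk does not say whether the pruning was weighted or repeated, so both choices are assumptions, and `prune_low_degree` is a hypothetical helper.

```python
from collections import defaultdict

def prune_low_degree(edges, n):
    """Drop nodes with degree <= n (single pass) together with their edges."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    keep = {node for node, nbrs in neighbours.items() if len(nbrs) > n}
    # An edge survives only if both of its endpoints survive.
    return {(a, b): w for (a, b), w in edges.items() if a in keep and b in keep}

# Toy weighted edge list (hypothetical values).
edges = {
    ("internet", "social_media"): 3,
    ("internet", "trust"): 2,
    ("social_media", "trust"): 1,
    ("blog", "internet"): 1,
    ("internet", "privacy"): 1,
}
pruned = prune_low_degree(edges, n=1)
print(sorted(pruned))
```

After a single pass, some surviving nodes may themselves fall to degree <= n. Repeating the pass until nothing changes would give a k-core style pruning instead.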

Visualizing the network

Community analysis

  • Community: a group of nodes linked strongly to each other and weakly to outside nodes
  • when nodes are keywords
    • a community is a set of keywords that are frequently used together
    • a community can show a thematic cluster of keywords
    • the community structure of such a network can show the thematic structure of the research field
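
The idea of "strong links inside, weak links outside" is commonly quantified with weighted modularity, Q = Σ_c [ W_c / W − (S_c / 2W)² ], where W is the total edge weight, W_c the weight inside community c, and S_c the summed weighted degree of its nodes. The sketch below computes Q for a given partition. It is an illustration of the concept, not necessarily the quality measure used in the talk, and the toy graph is hypothetical.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Weighted modularity of a partition of an undirected graph."""
    total = sum(edges.values())
    inside = defaultdict(float)    # edge weight inside each community
    strength = defaultdict(float)  # summed weighted degree per community
    for (a, b), w in edges.items():
        ca, cb = community_of[a], community_of[b]
        strength[ca] += w
        strength[cb] += w
        if ca == cb:
            inside[ca] += w
    return sum(inside[c] / total - (strength[c] / (2 * total)) ** 2
               for c in strength)

# Two keyword triangles joined by a single bridge edge.
edges = {
    ("library", "digital_library"): 1,
    ("library", "academic_library"): 1,
    ("digital_library", "academic_library"): 1,
    ("bibliometrics", "citation_analysis"): 1,
    ("bibliometrics", "h_index"): 1,
    ("citation_analysis", "h_index"): 1,
    ("library", "bibliometrics"): 1,
}
libraries = {"library", "digital_library", "academic_library"}
nodes = {node for pair in edges for node in pair}
split = {node: ("lib" if node in libraries else "bib") for node in nodes}
single = {node: "all" for node in nodes}

q_split = modularity(edges, split)
q_single = modularity(edges, single)
print(q_split, q_single)
```

Splitting the toy graph into its two triangles gives a higher Q than putting every keyword in one community, which matches the intuitive definition.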

Community detection techniques

  • several algorithms available
    • Infomap (Rosvall & Bergstrom …)
    • Leiden (Traag, Waltman & van Eck …)
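
Infomap and Leiden need dedicated libraries, so the sketch below uses a much simpler stand-in, weighted label propagation, only to show what a detection algorithm takes in and returns. It is neither Infomap nor Leiden and will not reproduce their results. The `label_propagation` helper and the toy graph are hypothetical.

```python
from collections import defaultdict

def label_propagation(edges, max_rounds=100):
    """Deterministic weighted label propagation (a stand-in, not Infomap or Leiden)."""
    neighbours = defaultdict(dict)
    for (a, b), w in edges.items():
        neighbours[a][b] = w
        neighbours[b][a] = w
    labels = {node: node for node in neighbours}  # every node starts alone
    for _ in range(max_rounds):
        changed = False
        for node in sorted(neighbours):  # fixed order keeps runs reproducible
            score = defaultdict(float)
            for nbr, w in neighbours[node].items():
                score[labels[nbr]] += w
            best_weight = max(score.values())
            best = sorted(l for l, s in score.items() if s == best_weight)
            # Ties: keep the current label if possible, else the smallest one.
            new = labels[node] if labels[node] in best else best[0]
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(sorted(group) for group in groups.values())

# Two tightly linked keyword triangles joined by one weak edge.
edges = {
    ("library", "digital_library"): 3,
    ("library", "academic_library"): 3,
    ("digital_library", "academic_library"): 3,
    ("bibliometrics", "citation_analysis"): 3,
    ("bibliometrics", "h_index"): 3,
    ("citation_analysis", "h_index"): 3,
    ("library", "bibliometrics"): 1,
}
communities = label_propagation(edges)
print(communities)
```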



Main Communities

    • library digital_library academic_library interlending worldwide_web document_delivery electronic_book
    • knowledge_management knowledge information_system knowledge_sharing case_study outsourcing knowledge_transfer
    • e_government information_technology information_and_communication_technology telecommunication digital_divide broadband
    • bibliometrics citation_analysis citation h_index open_access research impact_factor
    • electronic_commerce trust privacy technology_acceptance_model culture ethic security supply_chain
    • social_media web_2.0 social_network collaboration network blog facebook twitter
    • information_retrieval search_engine user_interface world_wide_web user design precision text_mining
    • internet cd_rom web_survey standard computer_mediated_communication expert_system online_searching
    • information_literacy e_learning education multimedia training learning information_seeking distance_learning
    • ontology metadata evaluation semantic_web classification quality preservation
    • information implementation decision_support_system enterprise_resource_planning_system information_quality
    • geographical_information_system open_source_software data software_development uncertainty digital_elevation_model map
    • intellectual_capital data_mining business_intelligence decision_support data_warehouse real_option

Tracking communities over time
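
One simple way to follow communities from one period to the next is to match them by Jaccard similarity of their keyword sets. The talk does not state which matching method was used, so this is only a sketch under that assumption. The `match_communities` helper, the 0.3 threshold and the toy communities are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def match_communities(earlier, later, threshold=0.3):
    """Map each earlier community index to its most similar later one, or None."""
    matches = {}
    for i, old in enumerate(earlier):
        scores = [jaccard(old, new) for new in later]
        best = max(range(len(later)), key=scores.__getitem__, default=None)
        # Below the threshold the community is treated as not continued.
        matches[i] = best if best is not None and scores[best] >= threshold else None
    return matches

# Toy communities for two consecutive periods (hypothetical values).
earlier = [
    {"internet", "cd_rom", "online_searching"},
    {"bibliometrics", "citation_analysis"},
]
later = [
    {"bibliometrics", "citation_analysis", "h_index"},
    {"internet", "social_media", "blog"},
]
matches = match_communities(earlier, later)
print(matches)
```

In the toy data, the bibliometrics community continues into the later period, while the early internet community shares too few keywords with any later one to count as continued.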


Conclusions

  • network analysis techniques can help in bibliometric studies
  • networks of author’s keywords can be useful for analyzing the thematic development of scientific disciplines
  • the community structure of a keyword network can show the topic structure of a scientific field
  • dynamic networks can be useful for analyzing how topics evolve over time within a scientific discipline

Thank you !!