Official Journal of the Japan Wood Research Society

  • Original Article
  • Open access
  • Published:

Evolving research themes in six selected wood science journals: insights from text mining and latent dirichlet allocation

Abstract

This study analyzes the status, trends, and future directions in wood science research using text-mining techniques. We applied these techniques to a textual dataset constructed from metadata of six major wood science journals, covering the period from 2002 to 2024. The research explores publication trends, international collaborations, keywords, and research networks, and it employs topic modeling using the Latent Dirichlet Allocation model. The descriptive analysis reveals a consistent increase in publication volume throughout the study period, unaffected by the COVID-19 pandemic. In contrast, international collaboration declined after 2020, likely due to the pandemic. In addition, a network analysis identified key research areas, including surface treatments, structural composites, and high-performance wood products, with lignin, mechanical properties, and moisture content emerging as central keywords. Topic modeling reveals a growing interest in wood modification technologies and an increased focus on studying wood as a sustainable material. The study confirms a shift of the field towards sustainable innovations while also highlighting the enduring relevance of traditional research areas. Future research should adapt to these evolving trends and address emerging challenges to maximize the potential of wood for carbon neutrality and sustainable development. This analysis provides a concise overview of current research trends and future directions in wood science.

Introduction

Wood science is a multidisciplinary field focused on understanding the properties, processing techniques, and applications of wood and wooden materials. Its significance spans various industries, including construction and manufacturing, due to the unique attributes of wood and its potential to support sustainable practices. Over the past few decades, the field has evolved significantly, driven by both fundamental research and technological advancements.

Modern wood science originated with the establishment of the Forest Products Laboratory in Madison, Wisconsin, USA, in 1910 [1]. The critical role of wood research was highlighted during World War I when wood became a vital material for military applications [2]. This period of intense research led to the establishment of numerous wood research institutes across industrialized nations by the 1950s [3]. Historically, wood science research focused on fundamental physical and chemical properties as well as anatomy, constrained by the technological limitations of the time [4, 5]. Today, the field benefits from sophisticated tools and methods that allow for a deeper exploration of these aspects. This progress, driven by technological innovation and environmental concerns, has significantly advanced our understanding of the complex mechanisms of wood and led to the development of innovative, sustainable wood products [6, 7].

Recent advancements in machine learning have propelled data-driven wood science research into new realms [8,9,10,11,12]. However, constructing comprehensive databases remains a significant hurdle for researchers in this field. Despite this, academic articles represent a vast, yet often overlooked, repository of information. Wood science has accumulated a wealth of high-quality data in the form of text from these articles, which offers a rich source of knowledge. By converting this text into a format for computational analysis, valuable latent information can be extracted effectively. Furthermore, this approach not only addresses the challenges of database construction but also maximizes the value of existing data.

Despite these advancements, the field has faced challenges in tracking and understanding the breadth of research activities and their impacts. Traditional methods of literature review and research analysis often fall short of capturing the full scope of research trends and emerging themes. In order to address these limitations, recent studies have employed text-mining techniques to analyze large volumes of academic literature, providing new insights into research trends and patterns. Text mining has proven to be a valuable tool for processing extensive datasets and uncovering these insights [13, 14]. By systematically extracting and analyzing text data from articles published in leading peer-reviewed journals, researchers can gain insights into prevalent trends, emerging interests, and shifts in research focus within the field of science [15]. Such analyses offer a valuable quantitative basis for understanding how the field is advancing and where future research efforts should be directed.

This study aims to leverage text mining methods to analyze trends in wood science research using a dataset compiled from leading wood science journals. The study investigates publication trends, assesses patterns of international collaboration, evaluates keyword centrality, and performs topic modeling to identify key research themes and their evolution over time. By examining these aspects, the study seeks to provide a comprehensive overview of the current state of wood science research and highlight potential future directions. Understanding these trends is crucial for recognizing current research priorities, technological innovations, and future challenges in wood science. This study enhances our understanding of the evolving research landscape in this field and provides insights into how wood science can be further leveraged for sustainable development.

Methodology

Dataset

A comprehensive text dataset was constructed using metadata from six representative journals in wood science: European Journal of Wood and Wood Products, Holzforschung, Wood Science and Technology, Journal of Wood Science, Maderas. Ciencia y Tecnologia, and Wood Material Science & Engineering. The dataset includes metadata fields such as journal name, volume, issue, year, DOI, authors, affiliations, countries, article titles, keywords, and abstracts, all sourced from the journals’ websites.

Data collection involved developing a custom Python script to automate the extraction process from these web pages. The extracted data was then manually cleaned to ensure accuracy and completeness. The final dataset was saved in a CSV file format with UTF-8-SIG encoding to accommodate special characters. Table 1 summarizes the dataset, detailing the key attributes of the journals and the bibliographic information collected.

Table 1 Summary of selected wood science publications and composition of the text dataset

Descriptive analytics

Descriptive analytics provides a comprehensive overview of the dataset by summarizing its principal characteristics and identifying patterns. This analytical approach is crucial for elucidating overarching trends, variability, and distribution within the dataset.

Publication trends and diversity

To assess publication trends and the geographic diversity of the research, the number of articles published each year and the number of unique contributing countries per year were counted. Aggregating the annual article counts made it possible to track fluctuations in publication activity over time. Concurrently, the distinct countries recorded in the dataset were extracted and enumerated for each year to understand the geographic distribution of research activities. The analysis highlights the trends in research publications and the diversity of contributing countries over time.
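The counting described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the `records` list and its (year, countries) layout are hypothetical stand-ins for rows of the CSV dataset.

```python
from collections import defaultdict

# Hypothetical records: (publication year, contributing countries of one article).
records = [
    (2002, ["Japan"]),
    (2002, ["Japan", "Germany"]),
    (2003, ["China", "USA"]),
    (2003, ["Sweden"]),
    (2003, ["China"]),
]

articles_per_year = defaultdict(int)   # year -> number of articles
countries_per_year = defaultdict(set)  # year -> set of distinct countries

for year, countries in records:
    articles_per_year[year] += 1
    countries_per_year[year].update(countries)

# Diversity measure: number of unique countries observed in each year.
unique_countries_per_year = {
    year: len(seen) for year, seen in countries_per_year.items()
}
```

Using a set per year means a country is counted once per year no matter how many articles it contributed to.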

Contribution score

To calculate the contribution score (S_c) of each country in the context of the selected journals, the following formula was applied:

$$S_{c}=\sum_{i=1}^{k}\frac{1}{N_{c,i}}, \tag{1}$$

where k is the total number of articles and N_{c,i} denotes the number of contributing countries for the i-th article. The formula aggregates the fractional scores assigned to each country across all publications and provides a measure of each country's overall contribution. Because the score is computed only from the selected journals, it reflects each country's relative contribution within these journals rather than across the field as a whole.
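A minimal sketch of Eq. (1) follows. It assumes that the sum for a given country runs over the articles on which that country appears, and that each article contributes a share of 1/N to every listed country. The function name `contribution_scores` and the sample data are hypothetical.

```python
from collections import defaultdict

def contribution_scores(articles):
    """Fractional counting: each article adds 1/N to each of its N countries."""
    scores = defaultdict(float)
    for countries in articles:
        unique = set(countries)  # count a country once per article
        if not unique:
            continue             # skip articles without country metadata
        share = 1.0 / len(unique)
        for country in unique:
            scores[country] += share
    return dict(scores)

# Hypothetical example: three articles with one, two, and four countries.
articles = [
    ["Japan"],
    ["Japan", "China"],
    ["Japan", "China", "Germany", "USA"],
]
scores = contribution_scores(articles)
```

Under this reading, the scores of all countries sum to the number of articles, and a country's score can never exceed its article count.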

Text network analysis

Keyword centrality

A keyword co-occurrence network was constructed, where nodes represent keywords and edges indicate co-occurrence within the same document. Edge weights correspond to the frequency of co-occurrence. Keywords appearing in multiple documents were connected, forming a weighted, undirected graph. This graph structure enables the analysis of keyword relationships and the identification of central concepts within the dataset.
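The construction of such a weighted, undirected graph can be sketched with the standard library alone. The keyword lists below are hypothetical, and storing edges as alphabetically sorted pairs is an implementation choice made here for illustration.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(documents):
    """Return a Counter mapping (keyword_a, keyword_b) -> co-occurrence count."""
    edges = Counter()
    for keywords in documents:
        # Sort so that each undirected pair is always stored in the same order.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical keyword lists of three articles.
docs = [
    ["lignin", "cellulose", "density"],
    ["lignin", "cellulose"],
    ["density", "moisture content"],
]
edges = build_cooccurrence(docs)
```

Here the edge between "cellulose" and "lignin" receives weight 2 because the two keywords appear together in two documents.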

Four centrality metrics were employed to analyze the keyword network: degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality. Together, these metrics provide a comprehensive assessment of the structural significance and influence of keywords within the network [16].

Degree centrality measures the number of direct connections a node has and reflects its immediate connectivity. The degree centrality C_D(v) for a node v is defined as:

$$C_{D}(v)=\deg(v), \tag{2}$$

where deg(v) represents the number of edges connected to node v.
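As a minimal illustration of Eq. (2), the degree of each node can be read directly from an adjacency mapping. The small keyword graph below is hypothetical, and edge weights are ignored because Eq. (2) counts edges only.

```python
# Hypothetical undirected keyword graph stored as node -> set of neighbors.
graph = {
    "lignin": {"cellulose", "density", "moisture content"},
    "cellulose": {"lignin"},
    "density": {"lignin", "moisture content"},
    "moisture content": {"lignin", "density"},
}

# Degree centrality: number of edges attached to each node.
degree = {node: len(neighbors) for node, neighbors in graph.items()}
```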

Closeness centrality measures how close a node is to all other nodes in the network. It is calculated as the inverse of the sum of the shortest path distances from the node to every other node. The closeness centrality C_C(v) for a node v is defined as:

$$C_{C}(v)=\frac{1}{\sum_{u\in V}d(v,u)}, \tag{3}$$

where d(v, u) denotes the shortest path distance between node v and u.
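Eq. (3) can be sketched with a breadth-first search, assuming a connected, unweighted graph in which distance is measured in hops. The helper names and the three-node path graph are hypothetical. Some libraries additionally rescale the result by the number of other nodes, which is not done here.

```python
from collections import deque

def shortest_path_lengths(graph, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def closeness(graph, node):
    """Inverse of the summed shortest-path distances (unnormalized, Eq. 3)."""
    total = sum(shortest_path_lengths(graph, node).values())
    return 1.0 / total if total > 0 else 0.0

# Hypothetical path graph a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
closeness_b = closeness(path, "b")  # distances 1 and 1
closeness_a = closeness(path, "a")  # distances 1 and 2
```

The middle node b is closer to the rest of the graph than the end node a, so it receives the higher score.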

Betweenness centrality quantifies how often a node lies on the shortest paths between other nodes, identifying nodes that act as bridges within the network. The betweenness centrality C_B(v) for a node v is defined as:

$$C_{B}(v)=\sum_{s\ne v\ne t}\frac{\sigma_{st}(v)}{\sigma_{st}} \tag{4}$$

Here, σ_st is the total number of shortest paths from node s to node t, and σ_st(v) is the number of those paths that pass through node v.
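One common way to evaluate Eq. (4) is Brandes' accumulation scheme, sketched below for an unweighted, undirected graph. It assumes that each unordered pair (s, t) is counted once, which is why the accumulated totals are halved. The function name and the diamond-shaped example graph are hypothetical.

```python
from collections import deque

def betweenness(graph):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    scores = {v: 0.0 for v in graph}
    for s in graph:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        sigma = {v: 0 for v in graph}    # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:          # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}  # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                scores[w] += delta[w]
    # Each unordered pair was visited from both endpoints, so halve the totals.
    return {v: score / 2.0 for v, score in scores.items()}

# Hypothetical diamond graph: a-b, a-c, b-d, c-d.
diamond = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
diamond_scores = betweenness(diamond)
```

In the diamond graph every node lies on one of the two shortest paths between its two non-adjacent neighbors, so each node receives a score of 0.5.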

Eigenvector centrality measures both the number of connections a node has and the importance of the nodes it is connected to. Nodes connected to other high-centrality nodes receive a higher score. The eigenvector centrality C_E(v) for a node v is defined as:

$$C_{E}(v)=\frac{1}{\lambda}\sum_{u\in N(v)}x_{u}, \tag{5}$$

where N(v) is the set of neighbors of node v, x_u is the centrality score of node u, and λ is the largest eigenvalue of the adjacency matrix A. The largest eigenvalue λ is determined by solving the matrix equation Ax = λx.
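The eigenvector in Eq. (5) can be approximated by power iteration, sketched below under the assumption of a connected, unweighted, undirected graph. Iterating with A + I instead of A is a choice made here to avoid oscillation on bipartite graphs. It leaves the eigenvectors unchanged. The function name, the parameters, and the star-shaped example are hypothetical.

```python
import math

def eigenvector_centrality(graph, iterations=200, tol=1e-10):
    """Power iteration on (A + I); returns a unit-length centrality vector."""
    x = {v: 1.0 for v in graph}
    for _ in range(iterations):
        # (A + I) x: own score plus the scores of all neighbors.
        new = {v: x[v] + sum(x[u] for u in graph[v]) for v in graph}
        norm = math.sqrt(sum(value * value for value in new.values()))
        new = {v: value / norm for v, value in new.items()}
        if max(abs(new[v] - x[v]) for v in graph) < tol:
            return new
        x = new
    return x

# Hypothetical star graph: one hub connected to three leaves.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
star_scores = eigenvector_centrality(star)
```

For the star graph the hub converges to 1/√2 and each leaf to 1/√6, so the hub scores higher than any leaf.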

These centrality metrics collectively reveal the role and influence of each node within the network, which improves the understanding of its structural properties.

Co-occurrence network

The network analysis explores relationships and collaboration patterns among countries, affiliations, and keywords through co-occurrence network analysis. The process begins with constructing a graph where each node represents an entity (e.g., a country, affiliation, or keyword), and each edge signifies a collaboration between two entities. The weight of each edge reflects the frequency of these collaborations, with higher weights indicating stronger connections.

The resulting network visualization reveals key patterns in the collaborative landscape and identifies both central entities and the strength of their connections. This analysis offers valuable insights into the structure and dynamics of collaborative relationships, highlighting how various entities interact and cooperate across different domains.

Topic modeling

Latent Dirichlet Allocation (LDA) model

The LDA model was used for topic modeling to identify latent topics within article abstracts. The analysis followed a structured approach – see Fig. 1:

Fig. 1 Schematic illustrating topic modeling via Latent Dirichlet Allocation

First, the abstracts were preprocessed to extract relevant terms. This step involved tokenizing the text, performing part-of-speech tagging, lemmatizing nouns, and removing stopwords to help focus the analysis on meaningful words. LDA is a generative probabilistic model that assumes each document is a mixture of topics and each topic is a mixture of words [17]. The model iteratively assigns words in documents to topics based on their co-occurrence patterns and aims to uncover hidden thematic structures in the data. The core idea is that documents are generated by selecting topics in certain proportions and then choosing words according to each topic’s word distribution.
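The generative assumption described above can be written compactly. The notation is not taken from the article and follows the standard formulation of LDA: θ_d is the topic mixture of document d, z_{d,n} is the topic assigned to the n-th word of that document, w_{d,n} is the word itself, α is the Dirichlet prior on topic mixtures, and β_k is the word distribution of topic k.

$$\theta_{d}\sim \mathrm{Dirichlet}(\alpha),\qquad z_{d,n}\mid \theta_{d}\sim \mathrm{Multinomial}(\theta_{d}),\qquad w_{d,n}\mid z_{d,n}\sim \mathrm{Multinomial}(\beta_{z_{d,n}})$$

Fitting the model amounts to inferring the topic mixtures and topic-word distributions that best explain the observed words.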

Furthermore, a Gensim Dictionary and Corpus were created from the processed text [18], which were used as inputs for the LDA model. The model was trained with a varying number of topics, ranging from 2 to 40, to explore different configurations. To find the optimal number of topics, both perplexity and coherence scores were calculated. Perplexity measures how well the model predicts new data, with lower values indicating better predictive performance [19]. Coherence measures the semantic similarity of words within a topic, with higher values suggesting that the terms are more closely related [20]. Subsequently, plots of perplexity and coherence scores were generated to find the optimal number of topics by balancing high coherence with low perplexity.
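For reference, perplexity is commonly defined on a set of M held-out documents as follows. The symbols are assumptions introduced here: w_d denotes the words of document d and N_d its length. The article does not state which coherence variant was used, so no formula is given for coherence.

$$\mathrm{perplexity}(D_{\mathrm{test}})=\exp\left(-\frac{\sum_{d=1}^{M}\log p(\mathbf{w}_{d})}{\sum_{d=1}^{M}N_{d}}\right)$$

A lower value means that the model assigns a higher average per-word likelihood to the held-out documents.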

LDA topic modeling results were visualized using pyLDAvis [21], a Python library that creates an intertopic distance map and highlights the most relevant terms for each topic. The top terms for each topic were analyzed to identify common themes, and documents associated with each topic were reviewed for context. Based on these terms and documents, the topics were interpreted subjectively using domain knowledge to determine what each topic represents. This review helped ensure that the labeled topics captured meaningful themes in the article abstracts and supported effective interpretation.

Dynamic topic modeling for trend analysis

In order to investigate changes in topics over time, dynamic topic modeling was performed using the LDA model [22] by following several key steps:

First, the dominant topic for each document was determined using the trained LDA model. Each document was converted into a bag-of-words representation [23], and the topic with the highest probability was assigned to it. This approach enabled the classification of documents into distinct topics.
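The assignment step can be sketched as follows. The topic distributions are hypothetical and are written as (topic id, probability) pairs, which is an assumed format. The bag-of-words step is reduced to a simple term count for illustration.

```python
from collections import Counter

# Bag-of-words: term frequencies of one (hypothetical) preprocessed abstract.
tokens = ["wood", "lignin", "wood", "density"]
bag_of_words = Counter(tokens)

def dominant_topic(topic_probs):
    """Return the topic id with the highest probability (first one on ties)."""
    return max(topic_probs, key=lambda pair: pair[1])[0]

# Hypothetical per-document topic distributions inferred by a trained model.
doc_topics = [
    [(0, 0.10), (1, 0.75), (2, 0.15)],
    [(0, 0.60), (2, 0.40)],
]
assigned = [dominant_topic(probs) for probs in doc_topics]
```

Assigning a single dominant topic discards the rest of each document's topic mixture, which keeps the later yearly counts simple at the cost of some detail.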

Next, the distribution of topics across different years was calculated. The number of documents assigned to each topic per year was counted and then normalized by the total number of documents for that year. This normalization provided proportions that reflect how each topic’s prominence changed over time.
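A minimal sketch of this normalization is given below. The (year, topic) pairs are hypothetical dominant-topic assignments, and the function name is introduced here for illustration.

```python
from collections import Counter, defaultdict

def topic_proportions_by_year(assignments):
    """Map year -> {topic: share of that year's documents}."""
    counts = defaultdict(Counter)
    for year, topic in assignments:
        counts[year][topic] += 1
    return {
        year: {topic: n / sum(c.values()) for topic, n in c.items()}
        for year, c in counts.items()
    }

# Hypothetical (year, dominant topic) pairs.
assignments = [
    (2002, 0), (2002, 1), (2002, 1), (2002, 1),
    (2003, 0), (2003, 0),
]
proportions = topic_proportions_by_year(assignments)
```

Dividing by the yearly total makes years with different publication volumes comparable.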

In order to identify hot and cold topics, trends were analyzed by calculating the average proportions of each topic up to 2023. Topics with increasing proportions were classified as hot topics, indicating a recent rise in their relevance and interest. Conversely, topics with decreasing proportions were labeled as cold topics, reflecting a recent decline in their prominence. Subsequently, trends for each topic were visualized by plotting their proportions over the years. These plots revealed how the significance of various topics has evolved, which offers valuable insights into emerging areas of interest as well as those becoming less relevant.
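One way to quantify such trends is the slope of a least-squares line fitted to a topic's yearly proportions. The fitting method is an assumption made here for illustration, and the function names and sample values are hypothetical.

```python
def trend_slope(years, proportions):
    """Ordinary least-squares slope of proportion against year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(proportions) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, proportions))
    return sxy / sxx

def classify(slope):
    """Label a topic by the sign of its trend slope."""
    if slope > 0:
        return "hot"
    if slope < 0:
        return "cold"
    return "stable"

# Hypothetical yearly proportions of one topic.
years = [2002, 2003, 2004, 2005]
shares = [0.02, 0.04, 0.06, 0.08]
slope = trend_slope(years, shares)
label = classify(slope)
```

A positive slope marks a topic whose share of the yearly output is growing, and a negative slope marks one whose share is shrinking.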

Results and discussion

Descriptive analytics: key insights

Publication trends

The descriptive analytics reveal significant patterns and trends within the dataset. As shown in Fig. 2, the number of articles published in the selected wood science journals has steadily increased since 2002, peaking at 571 articles in 2018. This trend reflects the growth in wood science research, which may be attributed to the rise of electronic publishing and open-access policies [24]. The international collaboration ratio also increased, peaking at 33.8% in 2020, but then declined sharply to 25.1% in 2021, likely because of the COVID-19 pandemic's impact on global research activities [25]. The decline continued through 2023, when the ratio dropped further to 24.1%. Interestingly, the pandemic may also have contributed to an increase in publications because researchers had more time to focus on writing [26].

Fig. 2 Article count and international collaboration ratio in selected journals vs publication time

These findings highlight significant shifts in publication and collaboration patterns, which are driven by technological advances and global events such as the pandemic. Moreover, the decline in international collaborations post-2020 suggests that, while researchers adapted to new publishing technologies, the pandemic hindered cross-border research efforts significantly. The ongoing increase in article counts reflects a growing emphasis on academic output and visibility. It is facilitated by digital tools and platforms that streamline submission and dissemination. Conversely, the decrease in international collaboration underscores the crucial role of physical presence and in-person interactions when it comes to fostering global research partnerships, which were disrupted during the pandemic. These observations highlight the wood science community’s adaptability to new publishing models and the challenges faced in maintaining international collaborations amid global disruptions.

International collaboration

The selected journals include contributions from 117 countries. Figure 3 shows a detailed analysis of publication output and collaborative research patterns by country. Japan leads with 1705 articles and a contribution score of 1537 (Fig. 3a). Following Japan, countries with substantial publication counts and scores include China, Germany, the USA, Sweden, France, and Canada. It is important to note that contribution scores are relative quantifications specific to these journals, which should be considered when interpreting the data. Countries with abundant forest resources often show higher research activity. Germany, the USA, China, France, and Japan lead in international collaborations (Fig. 3b) despite differing publication counts and scores. European countries and some African countries exhibit high international collaboration ratios despite lower overall publication volumes.

Fig. 3 Country-based frequency analysis of articles published in selected wood science journals (2002–2024): (a) number of publications and contribution scores; (b) number of single-country and international joint research studies

The analysis highlights regional patterns in wood science research influenced by both climate and infrastructure. Temperate countries show high publication volumes and extensive collaborations, which indicates a strong research infrastructure. Conversely, countries with fewer publications often have high international collaboration rates due to their need for global partnerships arising from limited local resources. The roles and impacts of these collaborations vary across regions.

Keyword trends in wood science: tracking evolving research interests

Analyzing the top keywords from 2002 to 2024 reveals key research interests in wood science (see Table 2). Research during this period has primarily focused on fundamental aspects such as mechanical properties, moisture content, and density, which reflects a continuing interest in optimizing wood performance. Additionally, a substantial portion of research examined the chemical components of wood (including lignin, cellulose, and hemicelluloses) to understand how these elements affect material properties and applications. Lignin has consistently been a primary area of interest. Significant attention has also been given to both dimensional stability and equilibrium moisture content (EMC), which underlines the importance of the wood's response to environmental conditions.

Moreover, research during this period frequently addressed wood processing areas like heat treatment and drying, as well as wood-based panels such as medium-density fiberboards (MDFs), oriented strand boards (OSBs), and particleboards. Studies on specific wood species, such as beech and poplar, aim to enhance knowledge of their unique characteristics and applications. These keywords highlight ongoing efforts to deepen understanding of fundamental wood properties while advancing practical applications and processing techniques.

The keyword analysis revealed several notable shifts: From 2002 to 2005, research focused on fundamental aspects like lignin, moisture content, and density. Between 2006 and 2010, the focus expanded to include EMC and hemicelluloses, which indicates a growing interest in detailed material characteristics. The 2011–2015 period emphasized functional characteristics (including mechanical properties and dimensional stability), which continued into the next period (2016–2020) with an increasing focus on wood durability and industrial applications. Since 2016, there has been growing interest in heat treatments and thermal modification technologies, a trend that continues into 2021–2024 (Table 2).

Table 2 Overview of top keywords in wood science publications (2002–2024) by time period

Technological advancements, such as near-infrared (NIR) spectroscopy and digital image correlation, have become significant and mark a trend towards more precise, rapid, non-destructive analysis. Key wood genera studied include Pinus, Picea, Cryptomeria, Fagus, Eucalyptus, and Populus, which reflect their unique properties and applications. Moreover, industrial applications have gained prominence, with research shifting from basic properties to specific products. Recent studies emphasize innovative materials like cross-laminated timber (CLT), which underscores a continuing interest in practical and technological advancement. These aspects illustrate a transition from basic studies to a broader focus on functional characteristics, sustainability, technological advancement, and industrial applications.

Network analysis

Keyword centrality: insights and analysis

The analysis of network centrality helps understand the relative importance and interconnections of nodes within a network [27]. Table 3 shows the significance and interrelationships of keywords in wood science research. Lignin consistently emerges as the most central keyword across all measures, which highlights its pivotal role in connecting various research topics in wood science.

Table 3 Centrality scores for primary keywords in wood science

Mechanical Properties, Moisture Content, and Density also achieve high centrality scores, which reflects their significance in the field. Hemicelluloses and Cellulose are crucial for understanding the chemical properties of wood, but they are less central in terms of information flow. Extractives score well on several metrics but rank lower in closeness centrality, which suggests that they are less efficiently connected to the rest of the network. Conversely, Heat Treatment stands out in both closeness and eigenvector centralities, highlighting its accessibility within the network and its connectivity with influential topics. This analysis underscores the broad scope of wood science, which encompasses physical, chemical, and technical aspects.

Co-occurrence networks: insights and analysis

A co-occurrence network illustrates the relationships between entities (e.g., keywords or countries) by depicting their simultaneous appearances. Edges represent the strength of these relationships, with their thickness indicating the frequency of co-occurrence. Node proximity shows the degree of relatedness, with closer nodes representing either more frequent or stronger connections.

Within the national research network (Fig. 4a), the USA and China have the strongest connections, and they are closely tied with Canada. European wood research is centered around Germany, with Sweden, Austria, Switzerland, France, and the Netherlands forming closely linked connections. France acts as a key connector between the European countries and USA–China networks. In Asia, Japan serves as a central hub that connects with South Korea, Indonesia, and Malaysia, and it is closely linked with China. This national network highlights the pivotal role of the USA and China in the global landscape of wood science, linking through networks in Canada, Europe, and Asia. Research in Europe is concentrated in Germany, highlighting its leading role in the field. On the other hand, France’s intermediary position emphasizes its strategic importance as a key link between various research regions. Japan’s prominent role in Asia reflects its central position in regional research and its strengthened connections with major research hubs, including the United States and China.

Fig. 4 Visualization of co-occurrence networks: (a) inter-country and (b) inter-keyword relationships

The keyword network shows that the main wood components (lignin, cellulose, and hemicelluloses) form the strongest connections (Fig. 4b). Lignin, in particular, plays a key role: it connects to physical properties such as moisture content, which is in turn linked to mechanical properties through density. These links position lignin as a core node that ties together essential wood attributes, consistent with its prominent position in the keyword centrality analysis shown in Table 3. Cellulose and hemicelluloses also exhibit substantial connectivity, which highlights their importance in understanding the structural and chemical properties of wood. Notably, the relationships among these key terms reveal complex interactions that are essential for advancing knowledge in wood science.

Topic modeling

LDA topics

LDA identifies underlying topics in a document collection, with perplexity and coherence as key metrics for evaluating model performance. The optimal number of topics was determined based on these metrics (Fig. 5). As the number of topics increased, the perplexity value decreased, which indicates an improved statistical fit. The coherence value increased initially, which indicates improved interpretability and semantic consistency. However, after reaching a peak, it showed minor fluctuations. After evaluating the trade-off between perplexity and coherence, the analysis determined that 24 topics were optimal.

Fig. 5 Perplexity and coherence values used to determine the optimal number of topics in the model

An intertopic distance map visualizes the LDA model results by showing the similarities and differences between topics (Fig. 6a). Topics are depicted as points on a two-dimensional plane, with each point representing a specific topic. The distance between points reflects their similarity, while the point size indicates the proportion of that topic within the document corpus. Additionally, each topic is provided with its top 30 most relevant terms (Fig. 6b). Figure 6a illustrates several clusters of topics identified by the LDA model. The topics on the map were manually labeled based on the top 30 most relevant terms for each topic – see Fig. 6b and Table S1.

Fig. 6 Interactive visualization of the LDA topic modeling results: (a) intertopic distance map with topics manually assigned based on (b) the top 30 most relevant terms for the first topic

The central area of the map is predominantly occupied by topics related to wood processing and quality and highlights fundamental aspects of wood utilization. The topics on the far left pertain to mechanical aspects and emphasize the structural purposes of wood. In the upper and lower right quadrants, the topics related to wood-based panels and chemical properties are clustered, respectively, reflecting specialized interests in these areas. A single topic related to wood anatomy is distinctly separated, which marks its unique significance in wood science.

This spatial arrangement illustrates the thematic structure of the study, separating areas such as practical processing and quality problems, mechanical aspects, panel products, chemical concerns, and wood anatomy. The diverse range of topics reflects the broad scope of wood science, encompassing both fundamental properties and advanced products. This variety highlights the interdisciplinary nature of the field and its relevance across various applications and industries. The intertopic distance map offers valuable insights into the thematic organization of wood science research, with distinct focus areas and the unique position of wood anatomy. This suggests potential for further exploration at the intersection of wood anatomy with other research areas.

Analyzing trends in wood science research

Dynamic topic modeling with the LDA model was used to analyze trends in wood science research by tracking changes in the focus on individual topics over time. The slopes of trend lines for the different topics were evaluated, which enabled the identification of current research trends and changes (Fig. 7).

Fig. 7 Trends and dynamics of topics identified via LDA: (a) trends of prominent hot and cold topics; (b) slope of trend lines for all topics from 2002 to 2023; (c) slope of trend lines for all topics from 2014 to 2023

The analysis, covering the period from 2002 to 2023, revealed a notable increase in interest in Surface Modification (Fig. 7a, b). This trend reflects the growing significance of surface treatment technologies, which have emerged as a key strategy for enhancing wood’s resistance throughout its lifecycle in both indoor and outdoor applications [28]. Additionally, topics such as Predictive Modeling for wood properties and CLT Processing have also shown increasing interest (Fig. 7b). Furthermore, the increasing emphasis on CLT and Structural Composite Lumber (SCL) underscores their potential as sustainable construction materials, which further fuels research in this field.

In contrast, research on Mechanical Performance, Wood-Plastic Composites (WPCs), bamboo utilization, and wood extractives underwent a decline (Fig. 7b). This reduction may be attributed to several factors, including the comprehensive nature of existing research in these areas or the emergence of new technologies and materials. The observed decline suggests that, while these topics hold significant sustainability potential, they may be overshadowed by more innovative or promising technologies. This shift indicates that research priorities are evolving, with emerging technologies and materials gaining increased prominence.

Recent trends from 2014 to 2023 (Fig. 7c) highlight a notable increase in the trend slope for SCL, which makes it the most rapidly growing topic during this period. This increase aligns with the higher interest in CLT, indicating a growing emphasis on these sustainable construction materials. Conversely, topics such as Fire Resistance, Pulping, Durability and Preservation, and Bamboo Utilization, which were in decline over the entire analysis period, have undergone a resurgence in the last decade. This may be due to advances in technology or shifting industry needs. On the other hand, Drying and Adhesives showed declining trends, with Drying showing a particularly sharp decrease. This recent downturn in Drying suggests a shift in focus away from this area, possibly in favor of other emerging topics. It may also reflect the technical maturity of the wood-drying field.

Research on Wood Processing, Wood Industry, and Chemical Properties remained relatively stable. These topics nevertheless continue to be vital areas of focus, reflecting their ongoing relevance to understanding the fundamental properties of wood and its applications in various industries.

Current trends suggest an increasing focus on advanced material modification technologies and high-performance products, marking the development of new, environmentally friendly, and functional solutions. Innovations such as CLT and SCL show how wood can advance sustainable development and support environmentally responsible practices. Text mining of recent research indicates that the primary mission of current wood science is to drive innovation in wood as a sustainable material for achieving carbon neutrality. This evolving goal aligns with the broader emphasis on emerging technologies and underscores a commitment to advancing sustainable solutions and tackling new challenges. While traditional research remains essential for foundational stability and understanding, integrating these insights with innovative approaches is crucial. Future research should balance the exploration of new technologies with a firm commitment to fundamental studies, so that both areas are adequately supported and emerging challenges are addressed proactively.

Limitations

This analysis is limited to papers published in six selected journals, whereas recent advancements in wood science, particularly in cellulose nanofibers, are being explored across a broader range of publications. These innovative studies appear not only in specialized wood science journals but also in journals from diverse fields. To ensure a more comprehensive understanding of current trends and developments, future efforts will focus on expanding our database to include additional relevant journals in the field of wood science.

In addition, this study examined only papers with English abstracts to ensure consistency in data collection. This choice excludes research published in other languages, such as German, Japanese, and Russian. Future efforts will therefore also extend the database to non-English publications.

Conclusions

Text mining techniques were used to analyze the current status, trends, and future directions in wood science. The findings reveal a decline in international collaboration after 2020. However, publication volume has steadily increased, which indicates a growing interest in the field despite disruptions caused by the COVID-19 pandemic. Major contributors such as Japan, China, Germany, and the USA play pivotal roles in wood science, with strong collaborative networks evident in Europe and Asia, which highlights significant global engagement in the field. The analysis revealed a growing interest in surface modification, CLT, and SCL, signaling advancements in wood science. In contrast, research on wood-plastic composites and wood drying is declining, suggesting a shift toward newer technologies. Despite these changes, the traditional research areas that focus on wood properties and processing remain crucial. Ongoing research is needed to fully realize the potential of wood for achieving carbon neutrality and advancing sustainable development, and future research should integrate emerging trends with foundational studies to tackle new challenges effectively. Additionally, the collected text data serve as a valuable foundation for future research beyond the scope of this study. Specifically, the database can be leveraged to develop a specialized language model for wood science, enabling enhanced literature analysis, automated research assistance, and more efficient knowledge retrieval. This approach would support future efforts to address both emerging topics and traditional challenges in the field.

Availability of data and materials

The datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request. Additionally, the code utilized in this study is publicly available at https://github.com/pywood21/text_mining_2024, allowing for further exploration and replication of the computational methods described herein.

Abbreviations

CLT: Cross-laminated timber

EMC: Equilibrium moisture content

LDA: Latent Dirichlet allocation

NIR: Near-infrared

MDF: Medium-density fiberboard

OSB: Oriented strand board

SCL: Structural composite lumber

WPCs: Wood-plastic composites

References

  1. Koning JW Jr (2011) Forest Products Laboratory, 1910–2010: celebrating a century of accomplishments. University of Wisconsin Press, Madison

  2. Kisser JG, Ylinen A, Freudenberg K, Kollmann FFP, Liese W, Thunell B, Winkelmann HG, Côté WA Jr, Koch P, Marian JE, Stamm AJ (1967) History of wood science. Wood Sci Technol 1:161–190. https://doi.org/10.1007/BF00350460

  3. Mai C, Schmitt U, Niemz P (2022) A brief overview on the development of wood research. Holzforschung 76:102–119. https://doi.org/10.1515/hf-2021-0155

  4. Kollmann FF, Cote WA (1968) Principles of wood science and technology, volume I: solid wood. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-87928-9

  5. Niemz P, Mai C, Schmitt U (2023) Introduction to wood science. In: Niemz P, Teischinger A, Sandberg D (eds) Springer handbook of wood science and technology. Springer, Cham. https://doi.org/10.1007/978-3-030-81315-4_2

  6. Berglund LA, Burgert I (2018) Bioinspired wood nanotechnology for functional materials. Adv Mater 30:1704285. https://doi.org/10.1002/adma.201704285

  7. Goldhahn C, Cabane E, Chanana M (2021) Sustainability in wood materials science: an opinion about current material development techniques and the end of lifetime perspectives. Philos Trans R Soc A 379:20200339. https://doi.org/10.1098/rsta.2020.0339

  8. Bianconi F, Filippucci M (2019) WOOD, CAD AND AI: digital modelling as place of convergence of natural and artificial intelligent to design timber architecture. In: Bianconi F, Filippucci M (eds) Digital wood design: innovative techniques of representation in architectural design. Springer, Cham. https://doi.org/10.1007/978-3-030-03676-8_1

  9. Hwang SW, Sugiyama J (2021) Computer vision-based wood identification and its expansion and contribution potentials in wood science: a review. Plant Methods 17:47. https://doi.org/10.1186/s13007-021-00746-1

  10. Hwang SW, Hwang UT, Jo K, Lee T, Park J, Kim JC, Kwak HW, Choi IG, Yeo H (2021) NIR-chemometric approaches for evaluating carbonization characteristics of hydrothermally carbonized lignin. Sci Rep 11:16979. https://doi.org/10.1038/s41598-021-96461-x

  11. Sharma A, Garg S, Sharma V (2024) ATR-FTIR spectroscopy and machine learning for sustainable wood sourcing and species identification: applications to wood forensics. Microchem J 200:110467. https://doi.org/10.1016/j.microc.2024.110467

  12. Feng Y, Mekhilef S, Hui D, Chow CL, Lau D (2024) Machine learning-assisted wood materials: applications and future prospects. Extrem Mech Lett 71:102209. https://doi.org/10.1016/j.eml.2024.102209

  13. Antons D, Grünwald E, Cichy P, Salge TO (2020) The application of text mining methods in innovation research: current state, evolution patterns, and development priorities. R&D Manag 50:329–351. https://doi.org/10.1111/radm.12408

  14. Kononova O, He T, Huo H, Trewartha A, Olivetti EA, Ceder G (2021) Opportunities and challenges of text mining in materials research. iScience 24:102155. https://doi.org/10.1016/j.isci.2021.102155

  15. Gupta T, Zaki M, Krishnan NA, Mausam (2022) MatSciBERT: a materials domain language model for text mining and information extraction. NPJ Comput Mater 8:102. https://doi.org/10.1038/s41524-022-00784-w

  16. Newman MEJ (2010) Networks: an introduction. Oxford University Press, Oxford. https://doi.org/10.1093/acprof:oso/9780199206650.001.0001

  17. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  18. Řehůřek R, Sojka P (2010) Software framework for topic modelling with large corpora. In: Proceedings of LREC 2010 workshop new challenges for NLP frameworks, Valletta

  19. Chang J, Gerrish S, Wang C, Boyd-Graber J, Blei D (2009) Reading tea leaves: how humans interpret topic models. In: Proceedings of the 22nd international conference on neural information processing systems, Vancouver

  20. Lau JH, Newman D, Baldwin T (2014) Machine reading tea leaves: automatically evaluating topic coherence and topic model quality. In: Proceedings of the 14th conference of the European chapter of the association for computational linguistics, Gothenburg

  21. Sievert C, Shirley K (2014) LDAvis: a method for visualizing and interpreting topics. In: Proceedings of the workshop on inter-active language learning, visualization, and interfaces, Baltimore

  22. Blei DM, Lafferty JD (2006) Dynamic topic models. In: Proceedings of the 23rd international conference on machine learning, Pittsburgh

  23. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge

  24. Piwowar H, Priem J, Larivière V, Alperin JP, Matthias L, Norlander B, Farley A, West J, Haustein S (2018) The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles. PeerJ 6:e4375. https://doi.org/10.7717/peerj.4375

  25. Cai X, Fry CV, Wagner CS (2021) International collaboration during the COVID-19 crisis: autumn 2020 developments. Scientometr 126:3683–3692. https://doi.org/10.1007/s11192-021-03873-7

  26. Harper L, Kalfa N, Beckers GM, Kaefer M, Nieuwhof-Leppink AJ, Fossum M, Herbst KM, Bagli D, The ESPU Research Committee (2020) The impact of COVID-19 on research. J Pediatr Urol 16:715–716. https://doi.org/10.1016/j.jpurol.2020.07.002

  27. Borgatti SP (2005) Centrality and network flow. Soc Netw 27:55–71. https://doi.org/10.1016/j.socnet.2004.11.008

  28. Teacă CA, Tanasă F (2020) Wood surface modification—classic and modern approaches in wood chemical treatment by esterification reactions. Coat 10:629. https://doi.org/10.3390/coatings10070629

Acknowledgements

The authors would like to thank the National Research Foundation of Korea for its financial support.

Funding

This study was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-RS-2023-00246356).

Author information

Contributions

SWH was the major contributor to this study and wrote the manuscript. WHL contributed to methodology, validation, and supervision. SWH and WHL conceived the original ideas. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Won-Hee Lee.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Hwang, SW., Lee, WH. Evolving research themes in six selected wood science journals: insights from text mining and latent dirichlet allocation. J Wood Sci 70, 56 (2024). https://doi.org/10.1186/s10086-024-02171-z

  • DOI: https://doi.org/10.1186/s10086-024-02171-z

Keywords