
Official Journal of the Japan Wood Research Society

Wood defect detection based on the CWB-YOLOv8 algorithm


As an important renewable resource, wood is widely used in various industries. Wood defects limit the amount of wood that can be used during processing, and manual inspection and other existing technologies are not suitable for automated production scenarios. In this paper, we first establish our own dataset, which covers multiple tree species and multiple defect types, to enhance the overall applicability of the proposed model. Second, deep learning-based target detection technology is used for defect detection. The conditionally parameterized convolution (CondConv), Wise-IoU, and BiFormer modules are used to improve upon the latest YOLOv8 algorithm. In the experiments, the proposed approach improves on the YOLOv8 algorithm by 3.5% and 5.8% in terms of the mAP@0.5 and mAP@0.5:0.95 indices, respectively. It also has advantages over other target detection algorithms. The proposed method can effectively improve wood utilization and automated wood processing technology.


As an important renewable resource, forestry has important ecological value [1] and economic value [2, 3]. However, due to its environment and other factors, wood develops many internal defects during the growth process. Therefore, the current comprehensive wood utilization rate is 50–70%, which results in a waste of resources [4]. Wood defect identification can significantly enhance both the caliber of timber goods and the board utilization rate, thus yielding economic and ecological benefits. Therefore, it is highly important to conduct defect detection and improve the related technology when processing wood.

The traditional defect detection method is manual detection, which is arbitrary and inefficient, so it is not suitable for automated production tasks [5]. Other defect detection technologies, such as ultrasonic methods that utilize the reflection and attenuation characteristics of ultrasonic waves in wood for defect detection [6,7,8,9], have subsequently emerged. By analyzing the observed propagation characteristics, we can determine whether internal defects are present. Although this approach has a certain effect, it usually requires a medium. X-ray-based defect detection methods [10] determine whether internal defects are present by analyzing the different attenuation modes of rays in wood. The inevitable problem with this technique is that rays can cause harm to the human body. Defect detection methods based on acoustic emission signals [11, 12] that detect the presence of defects by receiving transient elastic waves generated by stress in wood have also been proposed. However, this technology is limited to identifying elastic waves generated by wood deformation and cannot identify nonstructural defects. Other detection methods are also available [13,14,15]. Due to the above-mentioned shortcomings, these methods are difficult to widely apply.

Advances in deep learning technology have provided new solutions for wood defect detection. The development of target detection technology, especially single-stage [16,17,18,19] and two-stage algorithms [20,21,22], has not only improved the speed and accuracy of detection but also made automated detection possible. Compared with traditional methods, deep learning-based algorithms are able to identify multiple types of wood defects in complex backgrounds without direct contact with the material or potential risks to the operator. Numerous researchers have successfully applied target detection technology to the wood defect detection field. For example, in research based on single-stage target detection algorithms, Ding F et al. [23] improved the Single Shot Multibox Detector (SSD) algorithm by introducing DenseNet to detect knot defects, but they only detected three types of defects (live knots, dead knots, and cracks), and the detection accuracy for live knots was only 90%. Kurdthongmee W et al. [24] used the YOLO algorithm to locate the pith of wood stem cross-sections. Although the detection accuracy of this approach reached 76.3%, it only targeted the pith [23]. Wang R et al. [25] improved the YOLOv7 algorithm by introducing dynamic convolution and a full-dimensional dynamic coordinate attention mechanism to detect wood defects in a public dataset. However, these improvements made the algorithm unsuitable for deployment on edge devices [26]. Cui Y et al. [27] improved the YOLOv3 algorithm by introducing spatial pyramid pooling (SPP) to detect wood defects. However, the detection accuracy for wood crack defects decreased due to the insufficient sample size and the similarity between the crack and background colors. Wang B et al. [28] replaced the residual block of YOLOv3 with a ghost block structure and improved the loss function to conduct online wood defect monitoring. However, the disadvantage of this method is that it requires many parameters, making the model complex [29].
In addition, studies have been conducted on the use of two-stage target detection algorithms for wood defect detection. Although two-stage algorithms can achieve higher accuracy, such algorithm models are highly complex, and their processing times are long [30]. According to the scene requirements of industry manufacturing, the real-time performance and model complexity of single-stage target detection algorithms are better than those of two-stage target detection algorithms. Therefore, we mainly consider single-stage target detection algorithms, and the latest YOLOv8 algorithm has superior performance to that of other algorithms. For example, the YOLOv8 algorithm implements anchor-free detection and optimizes the loss function. In scenarios with complex wood defect characteristics, the YOLOv8 algorithm is undoubtedly a better choice than other methods. On this basis, to better address the problem of wood defects with colors similar to those of the background and complex defect characteristics, we enhanced and optimized the YOLOv8 algorithm by adjusting its head, neck, and backbone components. In addition, we constructed our own dataset to address the low sample size problem, which is a shortcoming of most studies.

The main work conducted in this research is summarized as follows. First, we formed a self-built wood defect dataset, including defects exhibited by many types of tree species, such as radiata pine, eucalyptus, and toon trees. Second, we introduced conditionally parameterized convolution (CondConv) [31] to replace the ConvModule in the YOLOv8 algorithm to address the complex characteristics of wood defects. Then, we integrated the BiFormer [32] module into the backbone part of the YOLOv8 algorithm to enhance the ability of the model to understand defects at multiple scales. Finally, we used the Wise-IoU [33] in the regression branch loss to improve the ability of the model to handle low-quality samples. We call the improved algorithm CWB-YOLOv8 (YOLOv8 + CondConv + Wise-IoU + BiFormer).

Materials and methods

Experiment-related information

The CPU used in the experiment was an AMD Ryzen 9 7950X, and the motherboard was an ASUS ProArt X670E-CREATOR. The graphics card used was a Colorful iGame RTX 4090 Advanced OC Silver Shark (24 GB). The system environment included AlmaLinux 8.6, and the Python version was 3.8.5. In addition, the employed image acquisition equipment was a Canon 5D Mark IV SLR camera, the lens model was an EF 24–105 mm f/4L IS II USM, and the camera parameters used automatic settings.

To improve the adaptability of the model, we visited multiple wood processing plants and collected a total of 3608 defect images of multiple tree species, including 1103 radiata pine images, 758 eucalyptus images, and 926 toon images. All wooden boards were cut from logs without any further processing. In addition, we collected 354 images of thin boards, which are the materials used to make plywood, obtained by peeling logs. To supplement these data, we selected 746 images from the public dataset of the Technical University of Ostrava [34]. Ultimately, our initial dataset included 4354 images. Since defects such as cracks and resin are not as common as knot defects, we applied rotation and cropping techniques to increase the number of crack, resin, and marrow defect samples. The enhanced dataset included a total of 6134 images. The information concerning the various types of defects contained in the final dataset is shown in Table 1.

Table 1 Defect distribution

Live_knot and dead_knot defects were the most common defects in the dataset, accounting for 30.9% and 26.4% of the total, respectively. After the other four types of defects were augmented, their quantities did not greatly differ, and each accounted for approximately 10% of the defects. We divided the dataset into a training set, a test set, and a validation set at a ratio of 8:1:1 for defect detection purposes. Figure 1 shows part of the dataset, in which (a) is a dead_knot defect, (b) is a knot_with_crack defect, (c) is a live_knot defect, (d) is a crack and live_knot defect, and (e) is a crack defect.
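The 8:1:1 division can be illustrated with a minimal sketch. The file names, the random seed, and the rule that the remainder goes to the validation set are assumptions made for illustration only; the source does not state how the split was implemented.

```python
import random


def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle the items and split them into train/test/validation subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_test = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]  # the remainder goes to validation
    return train, test, val


# Hypothetical file names standing in for the 6134 images of the enhanced dataset.
image_names = [f"img_{i:04d}.jpg" for i in range(6134)]
train, test, val = split_dataset(image_names)
print(len(train), len(test), len(val))
```

If augmented copies of an image are shuffled together with the originals, as in this sketch, they can land in different subsets; splitting before augmentation avoids that.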

Fig. 1 Partial defect dataset


Figure 2 shows the structural diagram of YOLOv8. The backbone network uses multiple convolution modules (ConvModules) to extract image features, and C2f modules are used to fuse feature maps with different sizes. The neck further processes these features through upsampling and concatenation operations. After feature fusion and processing, the head produces the output of the algorithm, including the classification and bounding box regression results. For each detected object, it outputs a category label and a bounding box, which are supervised by the classification loss (Cls loss) and the bounding box regression loss (Bbox loss), respectively.

Fig. 2 YOLOv8 structure

Conditionally parameterized convolutions

The structure of the ConvModule is shown in Fig. 3a. Conv2d is used to capture local features, BatchNorm2d is used to accelerate the model training process and improve the stability of the model, and SiLU is utilized to increase the expressive capacity of the model.

Fig. 3 a ConvModule structure and b examples of wood defects

In the wood defect detection task, the fixed-weight ConvModule faces great challenges. As shown in Fig. 3b, ordinary convolution cannot effectively handle the complex and changeable appearances of wood defects; moreover, the background texture is similar to that of certain defects, and there are many defect types. Therefore, we replace the ConvModule with CondConv [31]. Compared with conventional convolution techniques, CondConv can adaptively alter the weights of the convolution kernel based on the attributes of the input data. As a result, this approach enhances the ability of the model to extract features pertaining to different types of wood defects, leading to superior results.

Figure 4a illustrates the architecture of CondConv. The input of the current module is the output of the preceding layer (PREV LAYER OUTPUT), and \({W}_{1}\), \({W}_{2}\), and \({W}_{3}\) are three convolution kernels. ROUTE FN is a routing function that calculates the weights of the different convolution kernels based on the input. The routing function is shown in formula (1)

$$r\left(x\right)={\text{Sigmoid}}\left({\text{GlobalAveragePool}}\left(x\right)R\right),$$

where \(x\) is the input feature, \({\text{GlobalAveragePool}}(x)\) is the global average pooling operation applied to the input features, \(R\) is a learnable parameter matrix that converts the pooled vector into routing weights, and the sigmoid function is the activation function. The routing function generates customized weight coefficients for the input features, after which the COMBINE operation is performed; that is, the convolution kernels are dynamically combined based on the weights calculated by the routing function. The principle of this step is shown in formula (2)

$${\text{Output}}\left(x\right)=\sigma \left(\left({\alpha }_{1}{W}_{1}+\dots +{\alpha }_{n}{W}_{n}\right)*x\right),$$

where \({\alpha }_{1}\),…, \({\alpha }_{n}\) are the weight coefficients of the convolution kernels produced by the routing function; \({W}_{1}\),…, \({W}_{n}\) are the different convolution kernels; and \(\sigma \) is an activation function, for which a ReLU or sigmoid function is generally used. Since different weights are calculated for different inputs, different kernel combinations are produced for different types of inputs. The resultant feature map is generated by applying a convolution operation to the input data using the combined convolution kernel. Finally, normalization and activation function components are applied. CondConv uses the ReLU activation function, as shown in formula (3)

$$f\left(x\right)=\left\{\begin{array}{c}0, x<0\\ x, x\ge 0\end{array}.\right.$$

Fig. 4 a CondConv structure and b optimized CondConv structure

ConvModule uses the SiLU activation function, as shown in formula (4)

$$f\left(x\right)=x\cdot {\text{Sigmoid}}\left(x\right).$$
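The two activation functions can be compared numerically. The following minimal sketch uses the standard ReLU and SiLU definitions and their analytical derivatives; the sample inputs are arbitrary values chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return x if x >= 0 else 0.0


def relu_grad(x):
    # Derivative of ReLU: 0 for negative inputs, 1 otherwise.
    return 1.0 if x >= 0 else 0.0


def silu(x):
    return x * sigmoid(x)


def silu_grad(x):
    # d/dx [x * sigmoid(x)] = sigmoid(x) + x * sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s + x * s * (1.0 - s)


for x in (-2.0, -0.5, 0.0, 1.5):
    print(f"x={x:+.1f}  relu'={relu_grad(x):.1f}  silu'={silu_grad(x):+.4f}")
```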

Although the ReLU function alleviates the vanishing gradient problem, it produces a gradient of zero when the input is negative. Conversely, the SiLU function generally yields a nonzero gradient even when the input is negative, which benefits the backpropagation phase. Therefore, we choose to use the SiLU activation function in CondConv. The modified module is shown in Fig. 4b. We replace the ConvModule in YOLOv8 with this modified CondConv.
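The routing and combination steps of formulas (1) and (2), followed by the SiLU activation, can be sketched in plain Python. This is a simplified illustration rather than the implementation used in the paper: it assumes a single-channel input, so that \(R\) reduces to one scalar per kernel, it omits batch normalization, and all numerical values are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def silu(v):
    return v * sigmoid(v)


def global_average_pool(x):
    """Mean over all spatial positions of a single-channel H x W map."""
    return sum(sum(row) for row in x) / (len(x) * len(x[0]))


def routing_weights(x, R):
    """Formula (1): one sigmoid-activated weight per convolution kernel."""
    pooled = global_average_pool(x)
    return [sigmoid(pooled * r_i) for r_i in R]


def combine_kernels(alphas, kernels):
    """Weighted sum alpha_1 W_1 + ... + alpha_n W_n of equally sized kernels."""
    k = len(kernels[0])
    return [[sum(a * W[i][j] for a, W in zip(alphas, kernels))
             for j in range(k)] for i in range(k)]


def conv2d_valid(x, kernel):
    """2D cross-correlation without padding, as used in CNN layers."""
    k = len(kernel)
    out_h, out_w = len(x) - k + 1, len(x[0]) - k + 1
    return [[sum(x[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(out_w)] for i in range(out_h)]


def condconv(x, kernels, R):
    """Route, combine the kernels, convolve once, then apply SiLU."""
    alphas = routing_weights(x, R)
    combined = combine_kernels(alphas, kernels)
    return [[silu(v) for v in row] for row in conv2d_valid(x, combined)]


# Hypothetical 4 x 4 input, three 3 x 3 kernels, and routing parameters.
x = [[0.1, 0.5, 0.2, 0.0],
     [0.3, 0.9, 0.4, 0.1],
     [0.2, 0.6, 0.8, 0.3],
     [0.0, 0.1, 0.4, 0.7]]
kernels = [
    [[1, 0, -1], [1, 0, -1], [1, 0, -1]],   # vertical-edge kernel
    [[1, 1, 1], [0, 0, 0], [-1, -1, -1]],   # horizontal-edge kernel
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],      # identity kernel
]
R = [2.0, -1.0, 0.5]
out = condconv(x, kernels, R)
print(routing_weights(x, R))
print(out)
```

Because convolution is linear in the kernel, convolving once with the combined kernel gives the same result as mixing the outputs of the individual kernels, which is why only one convolution per input is needed.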

Efficient attention mechanism

Integrating an attention mechanism into the model has the potential to enhance its generalizability and enable its adaptation to diverse tasks. The multihead attention mechanism is better than the self-attention mechanism at capturing medium- and long-distance information and has better feature expression capabilities. However, the multihead attention mechanism has several shortcomings, such as requiring large amounts of computing resources and memory. To solve these problems, BiFormer [32] introduced a dynamic sparse attention mechanism to focus only on the most relevant key-value pairs, reducing the number of required calculations. Its structure is shown in Fig. 5.

Fig. 5 BiFormer structure

BiFormer attention has a four-layer pyramid structure, including one patch embedding layer and three patch merging layers. The input feature map is sliced using the patch embedding module, transforming it into a vector, and feature extraction is performed through linear transformation. The patch merging module is used to merge adjacent patches. During the merging process, the resolution of the input image is reduced, and the feature dimensionality is increased to reduce the number of required calculations and improve the feature expressions. The above operations depend on the BiFormer block, whose structure is shown in Fig. 6a.

Fig. 6 a BiFormer block and b bi-level routing attention

The block includes depthwise separable convolution (DWConv), bi-level routing attention (BRA), multilayer perceptron (MLP), and layer normalization (LN) modules. The structure of the core BRA module is shown in Fig. 6b. First, the input feature map is divided into \({S}^{2}\) nonoverlapping areas. The number of vectors in each area is \(\frac{HW}{{S}^{2}}\), where \(H\) and \(W\) represent the height and width of the feature map, respectively. After applying linear projection, the query (Q), key (K), and value (V) are obtained as follows:

$$Q={X}^{r}{W}^{Q}, K={X}^{r}{W}^{K},V={X}^{r}{W}^{V};$$

\({X}^{r}\) is the input 2D feature map after it is divided into regions, and the projection weights of Q, K, and V are denoted as \({W}^{Q}\), \({W}^{K}\), and \({W}^{V}\), respectively. Second, the regional-level similarity between the query and key is calculated, and a directed graph is constructed to establish attention relationships between regions. This graph is formed as follows:

$${A}^{r}={Q}^{r}{\left({K}^{r}\right)}^{T},$$

where \({A}^{r}\) is the adjacency matrix of the affinity graph with \({A}^{r}\in {R}^{{S}^{2}\times {S}^{2}}\); \({Q}^{r}\) and \({K}^{r}\) represent regional-level queries and keys, respectively, with \({Q}^{r},{K}^{r}\in {R}^{{S}^{2}\times C}\). Then, area screening is performed. Through the routing index matrix, the top-k connections are retained in each area; that is, the most relevant areas are selected, as shown in formula (7)

$${I}^{r}={\text{topkIndex}}\left({A}^{r}\right),$$

where \({I}^{r}\) contains the indices of the \(k\) most relevant regions for each region. According to the routing index matrix, for each query token, the attention weights between it and the key-value pairs gathered from the retained regions are calculated as shown in formula (8)

$$O={\text{Attention}}\left(Q,{K}^{g},{V}^{g}\right)+{\text{LCE}}\left(V\right);$$

\(O\) represents the output of the attention mechanism, \({K}^{g}\) and \({V}^{g}\) represent the gathered key and value tensors, and \({\text{LCE}}(V)\) represents the local context enhancement term. After processing by the BRA module, the attention focus of the model is improved; the LN module stabilizes the output, and the MLP module enhances the expressive ability of the model.
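The region-level routing step of the BRA module can be sketched as follows. The region-level queries and keys are hypothetical values for \({S}^{2}=4\) regions with \(C=2\) channels, and the feature map size and \(k\) are likewise assumptions chosen for illustration; the token-level attention and the LCE term are not reproduced here.

```python
def region_affinity(Qr, Kr):
    """Adjacency matrix A^r = Q^r (K^r)^T between regions."""
    return [[sum(q * k for q, k in zip(q_row, k_row)) for k_row in Kr]
            for q_row in Qr]


def topk_index(Ar, k):
    """Routing index matrix I^r: the k most relevant regions for each region."""
    return [sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
            for row in Ar]


# Hypothetical region-level queries and keys (S^2 = 4 regions, C = 2 channels).
Qr = [[1.0, 0.2], [0.1, 1.0], [0.9, 0.8], [-1.0, 0.3]]
Kr = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.6], [-0.5, -0.4]]

Ar = region_affinity(Qr, Kr)
Ir = topk_index(Ar, k=2)
print(Ir)

# Tokens attended per query for an assumed 8 x 8 feature map split into 2 x 2 regions.
H, W, S, k = 8, 8, 2, 2
tokens_per_region = (H * W) // (S * S)
attended_tokens = k * tokens_per_region
print(tokens_per_region, attended_tokens, H * W)
```

In this toy setting each query attends to 32 key-value pairs instead of all 64, which is the source of the computational saving described above.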

In wood defect detection scenarios, incorporating the BiFormer attention mechanism into the model can enhance its understanding of complex features. Furthermore, this strategy possesses the potential to enhance the capacity of the model for analyzing wood aberrations across various levels, regardless of whether these abnormalities manifest as minute fissures or extensive impairments.

Improved loss function

The head part of YOLOv8 uses the anchor-free approach to separate the classification and regression branches, as shown in Fig. 7.

Fig. 7 Head structure

The loss function for the classification branch is the binary cross-entropy loss (BCEL), while the regression branch utilizes distribution focal loss (DFL) and Complete-IoU (CIoU) loss functions. By incorporating the Distance-IoU (DIoU) and considering the aspect ratio between the predicted bounding box and the ground-truth bounding box, the CIoU loss function enhances the precision of bounding box regression. The CIoU is calculated according to the following formula:

$${\mathcal{L}}_{CIoU}=1-IoU+\frac{{d}^{2}}{{c}^{2}}+\alpha v,$$

where \(IoU\) represents the intersection-over-union ratio between the predicted bounding box and the ground-truth bounding box, \(d\) is the Euclidean distance between the center points of the two bounding boxes, and \(c\) is the diagonal length of the minimum enclosing box that encompasses both the predicted bounding box and the ground-truth bounding box. As a weight coefficient, \(\alpha \) is not included in the gradient calculation. \(\alpha \) is defined as shown in formula (10)

$$\alpha =\frac{v}{\left(1-IoU\right)+v}.$$

Formula (11) defines the parameter \(v\), which measures the consistency of the aspect ratios

$$v=\frac{4}{{\pi }^{2}}{\left({\text{arctan}}\frac{{w}_{gt}}{{h}_{gt}}-{\text{arctan}}\frac{w}{h}\right)}^{2}.$$

The variables \({w}_{{\text{gt}}}\) and \({h}_{{\text{gt}}}\) represent the dimensions of the actual bounding box, specifically its width and height, respectively, and \(w\) and \(h\) represent the dimensions of the estimated bounding box, again representing the width and height, respectively.
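The CIoU terms above can be evaluated for a pair of boxes with a short sketch. Boxes are assumed to be given as (x1, y1, x2, y2) corner coordinates with positive width and height, and the example boxes are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def ciou_loss(pred, gt):
    """1 - IoU + d^2 / c^2 + alpha * v for a predicted and a ground-truth box."""
    i = iou(pred, gt)
    # Squared distance between the two box centers (d^2).
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    d2 = (px - gx) ** 2 + (py - gy) ** 2
    # Squared diagonal of the minimum enclosing box (c^2).
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term v and its weight alpha.
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    v = 4 / math.pi ** 2 * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    alpha = v / ((1 - i) + v) if v > 0 else 0.0  # avoids 0 / 0 for identical boxes
    return 1 - i + d2 / c2 + alpha * v


print(ciou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # shifted square, same aspect ratio
print(ciou_loss((0, 0, 4, 2), (0, 0, 2, 2)))  # different aspect ratio
```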

The CIoU loss increases the penalty imposed on low-quality samples, which reduces the generalizability of the model. Given the variety of production environments and defects, wood defect images can easily produce low-quality samples. Therefore, we use Wise-IoU (WIoU) v3 [33] to replace the CIoU in the bounding box regression branch to solve this problem. The Wise-IoU algorithm can dynamically adjust the gradient distribution during the training process. Wise-IoU v1 integrates a distance-based attention mechanism into the IoU loss, as shown in the following formulas:

$${L}_{WIoUv1}={R}_{WIoU}{L}_{IoU},$$
$${R}_{WIoU}={\text{exp}}\left(\frac{{\left(x-{x}_{gt}\right)}^{2}+{\left(y-{y}_{gt}\right)}^{2}}{{\left({W}_{g}^{2}+{H}_{g}^{2}\right)}^{*}}\right),$$

where \({L}_{IoU}=1-IoU\), and \(IoU\) is the intersection-over-union ratio between the predicted box and the ground-truth box. \(x\) and \(y\) are the coordinates of the center point of the predicted box, whereas \({x}_{gt}\) and \({y}_{gt}\) are the center point coordinates of the ground-truth box. \({W}_{g}\) and \({H}_{g}\) are the width and height of the minimum enclosing box, and the superscript \(*\) indicates that the term is detached from the gradient computation. In this way, the IoU loss is scaled according to the distance between the center points of the anchor box and the target box. Wise-IoU v3 builds on Wise-IoU v1 by multiplying the loss by a gradient gain \(r\) that depends on the outlier degree \(\beta \) of the anchor box

$$\beta =\frac{{L}_{IoU}^{*}}{\overline{{L }_{IoU}}}\in \left[0,+\infty \right),$$
$$r=\frac{\beta }{\delta {\alpha }^{\beta -\delta }},$$

where \(\beta \) represents the outlier degree of an anchor box, \(r\) is the gradient gain, and \(\alpha \) and \(\delta \) are hyperparameters (this \(\alpha \) is unrelated to the weight coefficient in the CIoU). A higher outlier degree indicates poor anchor box quality, and a smaller gradient gain is allocated to reduce the attention given to low-quality anchor boxes. A lower outlier degree indicates a higher-quality anchor box, which is also allocated a smaller gradient gain. As a result, the bounding box regression focuses on anchor boxes of ordinary quality. By dynamically adjusting the gradient gain in this manner, the generalizability of the model improves.
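The behavior of the gradient gain can be checked with a small sketch. It follows the Wise-IoU formulation [33] as we understand it, in which the v3 loss is the v1 loss multiplied by \(r\); the hyperparameter values \(\alpha =1.9\) and \(\delta =3\), the running mean of the IoU loss, and the example box geometry are all assumptions for illustration, not settings confirmed by the source.

```python
import math


def r_wiou(center, center_gt, enclosing_wh):
    """Distance-based attention term of Wise-IoU v1."""
    (x, y), (x_gt, y_gt), (w_g, h_g) = center, center_gt, enclosing_wh
    return math.exp(((x - x_gt) ** 2 + (y - y_gt) ** 2) / (w_g ** 2 + h_g ** 2))


def gradient_gain(beta, alpha=1.9, delta=3.0):
    """Non-monotonic gain r = beta / (delta * alpha ** (beta - delta))."""
    return beta / (delta * alpha ** (beta - delta))


def wiou_v3_loss(iou, center, center_gt, enclosing_wh, mean_iou_loss,
                 alpha=1.9, delta=3.0):
    """Assumed form: r * R_WIoU * L_IoU, with beta = L_IoU / mean(L_IoU)."""
    l_iou = 1.0 - iou
    beta = l_iou / mean_iou_loss  # outlier degree of this anchor box
    return gradient_gain(beta, alpha, delta) * r_wiou(center, center_gt, enclosing_wh) * l_iou


# The gain is small for very low and very high outlier degrees and peaks in between.
for beta in (0.1, 1.5, 3.0, 6.0):
    print(f"beta={beta:.1f}  r={gradient_gain(beta):.4f}")

# Hypothetical anchor box: IoU 0.6, centers 2 px and 1 px apart, 20 x 16 enclosing box.
print(wiou_v3_loss(0.6, (10.0, 10.0), (12.0, 11.0), (20.0, 16.0), mean_iou_loss=0.5))
```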

Figure 8 shows the structure of our proposed CWB-YOLOv8 algorithm, and the key improvements made to the algorithm are highlighted in red. First, in the backbone part, we use the BiFormer attention mechanism to improve the ability of the model to understand the complex features of wood through bi-level routing attention. Second, in the head part, we use Wise-IoU v3 to replace the original CIoU loss function for performing wood defect detection. By addressing the problems caused by low- and medium-quality samples and dynamically adjusting the gradient distribution during the training process, the generalization ability of the model is improved. Finally, we use the CondConv module to replace the original convolution module. The advantage of this approach is that the improved model dynamically adjusts the weights of the convolution kernel according to the attributes of the input data, enhancing the ability of the algorithm to extract different types of wood defect features.

Fig. 8 CWB-YOLOv8 structure

Results and analysis

Evaluation indices

To conduct an impartial assessment of the ability of the proposed model to detect wood defects, various evaluation metrics were employed. These metrics encompassed the mAP@IoU, precision, recall, and inference time. The mAP@IoU represents the precision performance achieved under different recall rates and different degrees of bounding box overlap. To evaluate the accuracy of the tested models, we used two metrics, mAP@0.5 and mAP@0.5:0.95, where mAP@0.5 represents the mean average precision (mAP) evaluated at an IoU threshold of 0.5, and mAP@0.5:0.95 represents the mAP averaged over a series of IoU thresholds ranging from 0.5 to 0.95 that are gradually increased by some step size (usually 0.05). As shown in formula (16), precision is the ratio of the positive samples correctly identified by the model to all samples identified as positive. As shown in formula (17), recall is the ratio of the correctly recognized positive samples to all real positive samples. The inference time required by the model to process a single image was measured in milliseconds

$${\text{Precision}}=\frac{TP}{TP+FP},$$
$${\text{Recall}}=\frac{TP}{TP+FN},$$

where TP is the number of true positives, that is, defects that were correctly detected; FP is the number of false positives, that is, incorrect detections; and FN is the number of false negatives, that is, real defects that the model missed.
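The metrics can be illustrated with a minimal sketch. The counts and the per-class average precision values are hypothetical, and the averaging function only shows the final averaging step of the mAP, not the computation of the precision-recall curves.

```python
def precision(tp, fp):
    """Share of detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of real defects that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mean_average_precision(ap_values):
    """Average of AP values, e.g. over classes or over IoU thresholds."""
    return sum(ap_values) / len(ap_values)


# IoU thresholds used for mAP@0.5:0.95 (0.5 to 0.95 in steps of 0.05).
iou_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# Hypothetical counts: 90 correct detections, 10 false alarms, 30 missed defects.
tp, fp, fn = 90, 10, 30
print(precision(tp, fp), recall(tp, fn))
print(iou_thresholds)
print(mean_average_precision([0.8, 0.6]))
```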

Wood defect detection based on CWB-YOLOv8

In our study, we integrated the CondConv, Wise-IoU, and BiFormer components to enhance the performance of the YOLOv8 model in terms of detecting wood defects. The configuration of the algorithmic parameters can be found in Table 2.

Table 2 Algorithmic parameter settings

Figure 9 shows the training curves generated by the algorithm. The horizontal axis corresponds to the number of epochs, and the vertical axis is the value of each indicator. For example, the vertical axis of the train/box_loss plot gives the value of box_loss over epochs 1–200. In the figure, box_loss is the improved loss function. As the number of epochs increased, the values of box_loss, cls_loss, and dfl_loss in the training and validation stages tended to converge. The ability of the model to locate and classify defects gradually improved, and no overfitting occurred during the training process. The increasing mAP@0.5 and mAP@0.5:0.95 values signify a consistent enhancement in the performance of the model, demonstrating its resilience to different intersection-over-union thresholds.

Fig. 9 Training curve produced by CWB-YOLOv8

Figure 10 illustrates the label information and the confusion matrix of the prediction results. In Fig. 10a, the histogram in the upper-left corner depicts the frequencies of the different defects within the instances, and the plot in the upper-right corner shows the distribution of the bounding boxes. The scatter plot in the lower-left corner shows the concentration of the center points in the x and y dimensions, and the scatter plot in the lower-right corner shows that the number of small targets was relatively high in the utilized dataset. In Fig. 10b, the abscissa is the correct category, the ordinate is the detected category, and the value in each box represents the proportion. The diagonal represents the proportion of correct predictions for each category. Among them, the prediction and classification results for crack and resin defects were the best, with proportions of correct predictions reaching 0.96 and 0.93, respectively. In addition to the effectiveness of the algorithm, since the test set contained some data-enhanced images, the model accuracy may have been improved to a certain extent. Entries off the diagonal indicate false detections, and blank entries indicate the absence of false detections.
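The proportions in such a confusion matrix can be obtained by normalizing each column of raw counts, as the following sketch shows. The class subset and the counts are made up for illustration and do not reproduce the values in Fig. 10b; rows are assumed to be detected categories and columns correct categories.

```python
def normalize_columns(counts):
    """Divide every entry by its column total so that each true class sums to 1."""
    n = len(counts)
    col_totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    return [[counts[i][j] / col_totals[j] if col_totals[j] else 0.0
             for j in range(n)] for i in range(n)]


classes = ["crack", "resin", "live_knot"]
# counts[i][j]: number of samples of true class j that were detected as class i.
counts = [
    [45, 2, 3],
    [3, 27, 2],
    [2, 1, 45],
]
proportions = normalize_columns(counts)
for j, name in enumerate(classes):
    print(name, round(proportions[j][j], 2))
```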

Fig. 10 a Labels and b confusion matrix

Ablation experiment

To assess the efficacy of each module, the defect detection outcomes of the different modules (290 in total) were compared via ablation experiments. The comparison methods included YOLOv8, YOLOv8 + CondConv, YOLOv8 + CondConv + Wise-IoU, and YOLOv8 + CondConv + Wise-IoU + BiFormer (CWB-YOLOv8). Table 3 displays the outcomes of the ablation study. The evaluation metrics encompassed the mAP@0.5, mAP@0.5:0.95, precision, and recall.

Table 3 Results of an ablation experiment

As displayed in Table 3, when the different enhancement modules were incorporated into the YOLOv8 algorithm, the indicators exhibited varying degrees of improvement. Among them, the mAP@0.5 and other indicators of the CWB-YOLOv8 method were the highest. This is because CondConv improves the ability of the model to adapt to different inputs, the BiFormer attention mechanism can extract and integrate local and global contextual information, and the Wise-IoU allows the model to locate and identify defects more accurately.

Figure 11 shows the differences among the mAP@0.5, mAP@0.5:0.95, precision, and recall values produced by the four methods throughout their respective training processes. The red curve, which represents the CWB-YOLOv8 method, exhibited the best performance, indicating that YOLOv8 achieved improved detection accuracy and robustness in the wood defect detection tasks after the CondConv, Wise-IoU, and BiFormer modules were incorporated.

Fig. 11 Curve showing the changes exhibited by the ablation experiment indices

Figure 12 shows some defect detection results. The value in the figure represents the confidence level. The higher the value is, the higher the accuracy of the detection algorithm. Board (1) shows an image we collected, with a knot_with_crack defect; Board (2) presents an image from the public data set, including knot_with_crack, live_knot, and resin defects. (a), (b), (c), and (d) correspond to the four methods used in the ablation experiment. As shown in Fig. 12, method (a) displays the lowest detection accuracy. For example, the confidence of classification for the knot_with_crack defect in board (1) is only 0.82 and 0.81 for the base methods, values lower than those obtained with the other three methods. The other methods improved the defect detection capabilities of the model. Among them, method (d), the proposed CWB-YOLOv8 algorithm, displays the highest detection accuracy. For example, the confidence of classification for the knot_with_crack defect in board (1) reached 0.90 and 0.95.

Fig. 12 Comparison among the produced detection effects: a YOLOv8, b YOLOv8 + CondConv, c YOLOv8 + CondConv + Wise-IoU, and d CWB-YOLOv8

Figure 13 shows the heatmaps generated by the four algorithms via Grad-CAM technology when detecting live_knot defects in the original image. The spatial focus of each algorithm on wood defects is visually displayed. Warmer colors represent greater levels of attention. As shown in the figure, the YOLOv8 algorithm produced the smallest area of concern for defects, did not cover the entire defect and had the lowest confidence score. With the addition of the improvement modules, the area of interest gradually increased, and the CWB-YOLOv8 method yielded the largest area of interest, almost completely covering it. This approach detected the most defects and obtained the highest confidence score, proving that the CWB-YOLOv8 method is more effective than other methods for wood defect detection tasks.

Fig. 13 Original image and heatmaps: a YOLOv8, b YOLOv8 + CondConv, c YOLOv8 + CondConv + Wise-IoU, and d CWB-YOLOv8

Comparative experiment

We compared different algorithms to verify the effectiveness of the proposed method for use in wood defect detection scenarios. The algorithms included Faster RCNN, the SSD, YOLOv3, YOLOv5, YOLOv7, YOLOv8, and CWB-YOLOv8. The employed assessment criteria included mAP@0.5, mAP@0.5:0.95, and the single-image inference time. The higher the mAP@0.5 and mAP@0.5:0.95 indicators are, the better, whereas the lower the inference time is, the better. The findings from the experiment are presented in Table 4.

Table 4 Results of an algorithmic comparison experiment

According to Table 4, for the mAP@0.5 indicator, CWB-YOLOv8 produced the best result with a score of 0.892; for the mAP@0.5:0.95 indicator, CWB-YOLOv8 also ranked first with a score of 0.588. These two indicators indicate that the defect detection accuracy of CWB-YOLOv8 was better than that of the other detection algorithms. In terms of inference time, Faster RCNN took the longest amount of time to detect a single image, requiring 47.52 ms. YOLOv3 took the shortest amount of time at only 13.1 ms, and YOLOv8 required 13.9 ms. Since CWB-YOLOv8 added modules such as attention mechanisms on the basis of YOLOv8, the process required a slightly longer period of 17.3 ms. However, considering the detection accuracy and processing time, CWB-YOLOv8 still yielded better results than did the other methods.
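The per-image inference times quoted above can be converted into frames per second with a short calculation. The conversion assumes that images are processed one at a time with no batching or pipelining, which the source does not specify.

```python
# Single-image inference times in milliseconds, as quoted in the text.
inference_ms = {
    "Faster RCNN": 47.52,
    "YOLOv3": 13.1,
    "YOLOv8": 13.9,
    "CWB-YOLOv8": 17.3,
}


def frames_per_second(ms_per_image):
    """Throughput implied by a per-image latency, assuming sequential processing."""
    return 1000.0 / ms_per_image


for name, ms in inference_ms.items():
    print(f"{name}: {ms} ms per image, about {frames_per_second(ms):.1f} FPS")

# Extra time per image introduced by the added modules relative to YOLOv8.
extra_ms = inference_ms["CWB-YOLOv8"] - inference_ms["YOLOv8"]
print(f"Overhead: {extra_ms:.1f} ms")
```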

Figure 14 shows the visual results produced by the seven algorithms for wood defect detection. The image includes dead_knot and knot_with_crack defects. As shown in Fig. 14, Faster RCNN can detect both defects, but the confidence obtained for the dead_knot defect is only 0.57. The SSD algorithm can only detect the knot_with_crack defect, with a confidence level of 0.76, which is 0.01 higher than that of Faster RCNN. All YOLO series algorithms can detect the defects, but the YOLOv5 method yields false detections. For example, resin is mistakenly detected as a marrow defect. From this point of view, YOLOv5 is less effective than the other models. CWB-YOLOv8 displays the best overall detection effect, with the confidence level for the dead_knot defect reaching 0.86, and there are no false detections.

Fig. 14 Visualization results produced by seven wood defect detection algorithms: a Faster RCNN, b SSD, c YOLOv3, d YOLOv5, e YOLOv7, f YOLOv8, and g CWB-YOLOv8


In this paper, we constructed our own wood defect dataset covering multiple tree species so that the trained model generalizes better. We also improved the YOLOv8 algorithm. First, we replaced the convolution module in the original algorithm with a CondConv module, whose convolution kernel weights are adjusted dynamically for each input, to ease the burden that different defect types place on feature extraction. Second, the Wise-IoU function replaced the CIoU function to enhance the generalization performance of the model. Finally, we incorporated the BiFormer attention mechanism into the backbone, which enhanced the contextual understanding of the model and improved its ability to handle multiscale defects.
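The idea of dynamically adjusted kernel weights can be illustrated with a minimal sketch of conditionally parameterized convolution: several expert kernels are blended with input-dependent routing weights, and the blended kernel is applied as a single convolution. The sketch uses 1×1 kernels and plain Python lists for brevity; the function names, shapes, and values are hypothetical, and it is not the implementation used in CWB-YOLOv8.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def routing_weights(feature_map, routing_matrix):
    """One weight per expert: sigmoid(routing_matrix @ global_average_pool(x)).

    feature_map: list of C_in channels, each a flat list of pixel values.
    routing_matrix: one row of C_in coefficients per expert.
    """
    pooled = [sum(channel) / len(channel) for channel in feature_map]
    return [sigmoid(sum(w * p for w, p in zip(row, pooled))) for row in routing_matrix]


def mix_kernels(experts, alphas):
    """Weighted sum of expert kernels, each of shape C_out x C_in (1x1 conv)."""
    c_out, c_in = len(experts[0]), len(experts[0][0])
    return [
        [sum(a * kernel[o][i] for a, kernel in zip(alphas, experts)) for i in range(c_in)]
        for o in range(c_out)
    ]


def pointwise_conv(feature_map, kernel):
    """Apply a 1x1 convolution: a channel-mixing matrix at every pixel."""
    n_pixels = len(feature_map[0])
    return [
        [sum(row[i] * feature_map[i][p] for i in range(len(feature_map))) for p in range(n_pixels)]
        for row in kernel
    ]


def condconv_1x1(feature_map, experts, routing_matrix):
    """Blend the expert kernels for this input, then convolve once."""
    alphas = routing_weights(feature_map, routing_matrix)
    return pointwise_conv(feature_map, mix_kernels(experts, alphas))


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]    # 2 channels, 3 pixels
    experts = [[[1.0, 0.0]], [[0.0, 1.0]]]    # 2 experts, C_out = 1
    routing = [[0.0, 0.0], [0.0, 0.0]]        # zero routing -> both weights 0.5
    print(condconv_1x1(x, experts, routing))
```

Because convolution is linear in the kernel, blending the kernels first gives the same result as running every expert and blending the outputs, but it costs only one convolution per input.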

The experimental results showed that incorporating the CondConv, Wise-IoU, and BiFormer modules improved the wood defect detection performance of the proposed method. Compared with the YOLOv8 algorithm, the enhanced approach exhibited increases of 3.5% and 5.8% in terms of the mAP@0.5 and mAP@0.5:0.95 indices, respectively. Moreover, compared with the prevailing target detection methods Faster RCNN, the SSD, YOLOv3, YOLOv5, and YOLOv7, CWB-YOLOv8 exhibited mAP@0.5 improvements of 20.1%, 20.5%, 6.2%, 12%, and 5.2%, respectively, and mAP@0.5:0.95 improvements of 26.7%, 25.3%, 15.2%, 20.2%, and 11.7%, respectively. However, since we replaced the convolution module and added an attention mechanism, the single-image processing time of the CWB-YOLOv8 method was 3.4 ms greater than that of the original YOLOv8 algorithm.

In follow-up work, our priority will be to reduce the computational requirements of the model while preserving its accuracy, so that it can be deployed on embedded devices for wood defect detection on a production line. Additionally, we will expand our defect data collection to include additional tree species to improve the model's generalizability. Because the inclusion of enhanced images in the test set may introduce bias into the measured performance, future testing should use only original, unenhanced images to ensure a more accurate evaluation of model efficacy.

Availability of data and materials

Not applicable.



Conditionally parameterized convolutions
Single Shot Multibox Detector
Spatial pyramid pooling
Bounding box
Depthwise separable convolution
Bilevel routing attention
Multilayer perceptron
Layer normalization
Binary cross-entropy loss
Distribution focal loss
Wise Intersection over Union


  1. Soimakallio S, Saikku L, Valsta L, Pingoud K (2016) Climate change mitigation challenge for wood utilization-the case of Finland. Environ Sci Technol 50(10):5127–5134.

  2. Li X, Liu L, Sun S, Li Y, Jia L, Ye S, Yu Y, Dossa K, Luan Y (2022) Leaf-transcriptome profiles of phoebe bournei provide insights into temporal drought stress responses. Front Plant Sci 13:4170

  3. Wang JP, Matthews ML, Williams CM, Shi R, Yang C, Tunlaya-Anukit S, Chen H-C, Li Q, Liu J, Lin C-Y (2018) Improving wood properties for wood utilization through multi-omics integration in lignin biosynthesis. Nat Commun 9(1):1579

  4. Chen Y, Sun C, Ren Z, Na B (2023) Review of the current state of application of wood defect recognition technology. BioRes.

  5. Ling J, Xie Y (2022) Research on wood defects classification based on deep learning. Wood Res-Slovakia 67(1):147–156

  6. Espinosa L, Brancheriau L, Cortes Y, Prieto F, Lasaygues P (2020) Ultrasound computed tomography on standing trees: accounting for wood anisotropy permits a more accurate detection of defects. Ann Forest Sci 77(3):1–13

  7. Fang Y, Lin L, Feng H, Lu Z, Emms GW (2017) Review of the use of air-coupled ultrasonic technologies for nondestructive testing of wood and wood products. Comput Electron Agr 137:79–87

  8. Lin C-J, Huang Y-H, Huang G-S, Wu M-L (2015) Detection of decay damage in iron-wood living trees by nondestructive techniques. J Wood Sci 62(1):42–51.

  9. Hu C, Afzal MT (2006) A wavelet analysis-based approach for damage localization in wood beams. J Wood Sci 52(5):456–460.

  10. Longuetaud F, Mothe F, Kerautret B, Krähenbühl A, Hory L, Leban JM, Debled-Rennesson I (2012) Automatic knot detection and measurements from X-ray CT images of wood: a review and validation of an improved algorithm on softwood samples. Comput Electron Agr 85:77–89

  11. Xu N, Li M, Fang S, Huang C, Chen C, Zhao Y, Mao F, Deng T, Wang Y (2023) Research on the detection of the hole in wood based on acoustic emission frequency sweeping. Constr Build Mater 400:132761

  12. Tu J, Zhao D, Zhao J, Zhao Q (2021) Experimental study on crack initiation and propagation of wood with LT-type crack using digital image correlation (DIC) technique and acoustic emission (AE). Wood Sci Technol 55:1577–1591

  13. Hu C, Xiao M, Zhou H, Wen W, Yun H (2011) Damage detection of wood beams using the differences in local modal flexibility. J Wood Sci 57(6):479–483.

  14. Yang X, Ishimaru Y, Iida I, Urakami H (2002) Application of modal analysis by transfer function to nondestructive testing of wood I: determination of localized defects in wood by the shape of the flexural vibration wave. J Wood Sci 48(4):283–288.

  15. Hu C, Tanaka C, Ohtani T (2004) Locating and identifying sound knots and dead knots on Sugi by the rule-based color vision system. J Wood Sci 50(2):115–122.

  16. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) SSD: Single shot multibox detector. In: Lect Notes Comput Sci, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, Springer, pp 21-37

  17. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. Proc IEEE Conf Comput Vis Pattern Recognit 2017:7263–7271

  18. Bochkovskiy A, Wang C-Y, Liao H-YM (2020) YOLOv4: Optimal speed and accuracy of object detection. arXiv:2004.10934

  19. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proc IEEE Comput Soc Conf Comput Vision Pattern Recognit, December 9, 2016, Volume 2016-December, pp 779–788,

  20. Ren S, He K, Girshick R, Sun J (2016) Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149

  21. Girshick R, Donahue J, Darrell T, Malik J (2015) Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans Pattern Anal Mach Intell 38(1):142–158

  22. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. Proc IEEE Comput Soc Conf Comput Vision Pattern Recognit 2014:580–587

  23. Ding F, Zhuang Z, Liu Y, Jiang D, Yan X, Wang Z (2020) Detecting defects on solid wood panels based on an improved SSD algorithm. Sensors-Basel 20(18):5315

  24. Kurdthongmee W, Suwannarat K (2019) Locating wood pith in a wood stem cross sectional image using YOLO object detection. In: Proc Int Conf Technol Appl Artif Intell, TAAI, 2019. IEEE, pp 1–6

  25. Wang R, Liang F, Wang B, Mou X (2023) ODCA-YOLO: An omni-dynamic convolution coordinate attention-based YOLO for wood defect detection. Forests 14(9):1885

  26. Yu X, Yu Q, Mu Q, Hu Z, Xie J (2023) MCAW-YOLO: An efficient detection model for ceramic tile surface defects. Appl Sci Basel 13(21):12057

  27. Cui Y, Lu S, Liu S (2023) Real-time detection of wood defects based on SPP-improved YOLO algorithm. Multimed Tools Appl.

  28. Wang B, Yang C, Ding Y, Qin G (2021) Detection of wood surface defects based on improved YOLOv3 algorithm. BioResources 16(4):6766–6780

  29. Xu J, Yang H, Wan Z, Mu H, Qi D, Han S (2023) Wood surface defects detection based on the improved YOLOv5-C3Ghost with SimAm module. IEEE ACCESS 11:105281–105287

  30. Yang G, Wang J, Nie Z, Yang H, Yu S (2023) A lightweight YOLOv8 tomato detection algorithm combining feature enhancement and attention. Agronomy 13(7):1824

  31. Yang B, Bender G, Le QV, Ngiam J (2019) CondConv: Conditionally parameterized convolutions for efficient inference. Adv Neural Inf Proces Syst

  32. Zhu L, Wang X, Ke Z, Zhang W, Lau RW (2023) BiFormer: Vision transformer with bi-level routing attention. Proc IEEE Comput Soc Conf Comput Vision Pattern Recognit 2023:10323–10333

  33. Tong Z, Chen Y, Xu Z, Yu R (2023) Wise-IoU: Bounding box regression loss with dynamic focusing mechanism. arXiv:2301.10051

  34. Kodytek P, Bodzas A, Bilik P (2021) A large-scale image dataset of wood surface defects for automated vision-based quality control processes. F1000Res 10


Not applicable.


This research was funded by the China-Myanmar Cross-border Logistics and Trade Integration Service Platform Research and Development Project (No.202307AB110009-1).

Author information

Conceptualization, H.A. and M.Q.; software, Y.H.; writing (review and editing), H.A. and M.Q.; supervision, F.X.; project administration, G.Z.; funding acquisition, Z.L. Hao An and Mingming Qin contributed equally to this work and are co-first authors.

Corresponding author

Correspondence to Zhihong Liang.

Ethics declarations

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

An, H., Liang, Z., Qin, M. et al. Wood defect detection based on the CWB-YOLOv8 algorithm. J Wood Sci 70, 26 (2024).
