Wood defect detection based on the CWB-YOLOv8 algorithm

As an important renewable resource, wood is widely used in various industries. When addressing wood defects that limit the amount of wood used during processing, manual inspection and other technologies are not suitable for automated production scenarios. In this paper, we first establish our own dataset, which includes information about multiple tree species and multiple defect types, to enhance the overall applicability of the proposed model. Second, target detection technology involving deep learning is used for defect detection. The conditional parametric convolution (CondConv), Wise-IoU, and BiFormer modules are used to improve upon the latest YOLOv8 algorithm. Based on the experimental findings, the suggested approach exhibits notable improvements in terms of both the mAP@0.5 index and the mAP@0.5:0.95 index, surpassing the performance of the YOLOv8 algorithm by 3.5% and 5.8%, respectively. It also has advantages over other target detection algorithms. The proposed method can effectively improve


Introduction
As an important renewable resource, forestry has important ecological value [1] and economic value [2,3]. However, due to environmental and other factors, wood develops many internal defects during the growth process. As a result, the current comprehensive wood utilization rate is only 50-70%, which results in a waste of resources [4]. Wood defect identification can significantly enhance both the quality of timber products and the board utilization rate, thus yielding economic and ecological benefits. Therefore, it is highly important to conduct defect detection and improve the related technology when processing wood.
The traditional defect detection method is manual detection, which is arbitrary and inefficient, so it is not suitable for automated production tasks [5]. Other defect detection technologies have subsequently emerged, such as ultrasonic methods that utilize the reflection and attenuation characteristics of ultrasonic waves in wood [6][7][8][9]. By analyzing the observed propagation characteristics, it is possible to determine whether internal defects are present. Although this approach has a certain effect, it usually requires a coupling medium. X-ray-based defect detection methods [10] determine whether internal defects are present by analyzing the different attenuation modes of rays in wood. The inevitable problem with this technique is that the rays can harm the human body. Defect detection methods based on acoustic emission signals [11,12], which detect defects by receiving the transient elastic waves generated by stress in wood, have also been proposed. However, this technology is limited to identifying elastic waves generated by wood deformation and cannot identify nonstructural defects. Other detection methods are also available [13][14][15]. Due to the above-mentioned shortcomings, these methods are difficult to apply widely.
Advances in deep learning technology have provided new solutions for wood defect detection. The development of target detection technology, especially single-stage [16][17][18][19] and two-stage algorithms [20][21][22], has not only improved the speed and accuracy of detection but also made automated detection possible. Compared with traditional methods, deep learning-based algorithms are able to identify multiple types of wood defects in complex backgrounds without direct contact with the material or potential risks to operators. Numerous researchers have successfully applied target detection technology to the wood defect detection field. For example, in research based on single-stage target detection algorithms, Ding F et al. [23] improved the Single Shot Multibox Detector (SSD) algorithm by introducing DenseNet to detect knot defects, but they only detected three types of defects (live knots, dead knots, and cracks), and the detection accuracy for live knots was only 90%. Kurdthongmee W et al. [24] used the YOLO algorithm to locate the pith of wood stem cross-sections. Although the detection accuracy of this approach reached 76.3%, it only targeted the pith [23]. Wang R et al. [25] improved the YOLOv7 algorithm by introducing dynamic convolution and a full-dimensional dynamic coordinate attention mechanism to detect wood defects in a public dataset. However, these improvements made the algorithm unsuitable for deployment on edge devices [26]. Cui Y et al. [27] improved the YOLOv3 algorithm by introducing spatial pyramid pooling (SPP) to detect wood defects. However, the detection accuracy for wood crack defects decreased due to an insufficient sample size and the similar colors of cracks and the background. Wang B et al. [28] replaced the residual block of YOLOv3 with a ghost block structure and improved the loss function to conduct online wood defect monitoring. However, the disadvantage of this method is that it requires many parameters, making the model complex [29].
In addition, studies have been conducted on the use of two-stage target detection algorithms for wood defect detection. Although two-stage algorithms can achieve higher accuracy, such models are highly complex, and their processing times are long [30]. Given the requirements of industrial manufacturing scenes, the real-time performance and model complexity of single-stage target detection algorithms are better than those of two-stage target detection algorithms. Therefore, we mainly consider single-stage target detection algorithms, among which the latest YOLOv8 algorithm has superior performance. For example, the YOLOv8 algorithm implements anchor-free detection and optimizes the loss function. In scenarios with complex wood defect characteristics, the YOLOv8 algorithm is a better choice than other methods. On this basis, to better address the problem of wood defects with colors similar to those of the background and complex defect characteristics, we enhanced and optimized the YOLOv8 algorithm by adjusting its head, neck, and backbone components. In addition, we constructed our own dataset to address the low sample size problem, which is a shortcoming of most studies.
The main work conducted in this research is summarized as follows. First, we formed a self-built wood defect dataset, including defects exhibited by many types of tree species, such as radiata pine, eucalyptus, and toon trees. Second, we introduced conditional parametric convolution (CondConv) [31] to replace the ConvModule in the YOLOv8 algorithm to address the complex characteristics of wood defects. Then, we integrated the BiFormer [32] module into the backbone part of the YOLOv8 algorithm to enhance the ability of the model to understand defects at multiple scales. Finally, we used the Wise-IoU [33] in the regression branch loss to improve the ability of the model to handle low-quality samples. We call the improved algorithm CWB-YOLOv8 (YOLOv8 + CondConv + Wise-IoU + BiFormer).

Experiment-related information
The CPU used in the experiment was an AMD Ryzen 9 7950X, and the motherboard was an ASUS ProArt X670E-CREATOR. The graphics card used was a Colorful iGame RTX 4090 Advanced OC Silver Shark (24 GB). The system environment was AlmaLinux 8.6, and the Python version was 3.8.5. In addition, the employed image acquisition equipment was a Canon 5D Mark IV SLR camera with an EF 24-105 mm f/4L IS II USM lens, and the camera was operated with automatic settings.
To improve the adaptability of the model, we went to multiple wood processing plants and collected a total of 3608 defect images of multiple tree species, including 1103 radiata pine images, 758 eucalyptus images, and 926 toon images. All wooden boards were cut from logs without performing any processing. In addition, we collected 354 images of thin boards, which are the materials used to make plywood, obtained by peeling logs. In addition to the above data, we selected 746 images from the public dataset of the Technical University of Ostrava [34]. Ultimately, our initial dataset included 4354 images. Since defects such as cracks and resin are not as common as knot defects, we applied rotation and cropping techniques to increase the number of crack, resin, and marrow defect samples. The enhanced dataset included a total of 6134 images. The information concerning the various types of defects contained in the final dataset is shown in Table 1.
Live_knot and dead_knot defects were the most common defects in the dataset, accounting for 30.9% and 26.4% of the total, respectively. After the other four types of defects were augmented, their quantities did not greatly differ, and each accounted for approximately 10% of the total. We divided the dataset into a training set, a test set, and a validation set at a ratio of 8:1:1 for defect detection purposes. Figure 1 shows part of the dataset, in which (a) is a dead_knot defect, (b) is a knot_with_crack defect, (c) is a live_knot defect, (d) is a crack and live_knot defect, and (e) is a crack defect.
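When images are rotated for augmentation, the bounding box annotations must be transformed in the same way. The following is a minimal sketch of that bookkeeping, assuming YOLO-style normalized boxes (x_center, y_center, width, height) and a 90° counterclockwise rotation; the rotation angles and label format actually used are not stated in this paper, so both are assumptions, and the function names are hypothetical.

```python
def rotate_box_90_ccw(box):
    """Map a normalized (xc, yc, w, h) box to its position after the image
    is rotated 90 degrees counterclockwise (origin at the top-left corner)."""
    xc, yc, w, h = box
    # A point (x, y) moves to (y, 1 - x); width and height swap.
    return (yc, 1.0 - xc, h, w)


def rotate_labels(labels):
    """Apply the rotation to a list of (class_name, box) pairs."""
    return [(name, rotate_box_90_ccw(box)) for name, box in labels]


# Hypothetical annotation: one crack in the upper-right part of the image.
labels = [("crack", (0.75, 0.25, 0.1, 0.4))]
rotated = rotate_labels(labels)
```

After the rotation, the crack sits in the upper-left part of the image and its width and height are exchanged; applying the function four times returns the original box.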
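The 8:1:1 division can be sketched as follows. This is an illustrative example rather than the script used in this work: the function name, the fixed random seed, and the placeholder image identifiers are assumptions.

```python
import random


def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items and split them into training, test, and validation subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for a reproducible split
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_test = len(items) * ratios[1] // total
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]  # the remainder goes to the validation set
    return train, test, val


# Placeholder identifiers standing in for the 6134 images of the final dataset.
image_ids = [f"img_{i:04d}" for i in range(6134)]
train, test, val = split_dataset(image_ids)
print(len(train), len(test), len(val))  # 4907 613 614
```

Because augmented copies of an image are correlated with the original, splitting the original images first and augmenting only the training subset would keep augmented images out of the test set.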

CWB-YOLOv8
Figure 2 shows the structural diagram of YOLOv8. The backbone network uses multiple convolution modules (ConvModules) to extract image features, and C2F is used to fuse feature maps with different sizes. The neck further processes the features through upsampling and concatenation operations. After feature fusion and processing, the head produces the output of the algorithm, including the classification and bounding box regression results. For each detected object, it outputs a category label and a bounding box, which are supervised during training by the classification loss (Cls loss) and bounding box regression loss (Bbox loss), respectively.

Conditionally parameterized convolutions
The structure of the ConvModule is shown in Fig. 3a. Conv2d is used to capture local features, BatchNorm2d is used to accelerate the model training process, and the SiLU function serves as the activation. In the wood defect detection task, the fixed-weight ConvModule faces great challenges. As shown in Fig. 3b, ordinary convolution cannot effectively handle the complex and changeable appearances of wood defects; moreover, the background texture is similar to those of certain defects, and there are many defect types. Therefore, we replace the ConvModule with CondConv [31]. Compared with conventional convolution techniques, CondConv can adaptively alter the weights of the convolution kernel based on the attributes of the input data. As a result, this approach enhances the ability of the model to extract features pertaining to different types of wood imperfections, leading to superior results.
Figure 4a illustrates the architecture of CondConv. The input of the current module is derived from the output of the preceding layer, which is also referred to as PREV LAYER OUTPUT, and $W_1$, $W_2$, and $W_3$ are three convolution kernels. ROUTE FN is a routing function that is used to calculate the weights of the different convolution kernels based on the input. The routing function is shown in formula (1):

$$r(x) = \mathrm{Sigmoid}(\mathrm{GlobalAveragePool}(x)\,R) \tag{1}$$

where $x$ is the input feature, GlobalAveragePool(x) is the global average pooling operation applied to the input features, $R$ is a learnable parameter matrix that converts the pooled vector into routing weights, and the sigmoid function is the activation function. The routing function generates customized weight coefficients for the input features and then performs the COMBINE operation; that is, the convolution kernels are dynamically combined based on the weights calculated by the routing function. The principle of this step is shown in formula (2):

$$\mathrm{Output}(x) = \sigma\big((\alpha_1 W_1 + \cdots + \alpha_n W_n) * x\big) \tag{2}$$

where $\alpha_1, \ldots, \alpha_n$ are the weight coefficients of each convolution kernel; $W_1, \ldots, W_n$ are different convolution kernels; and $\sigma$ is an activation function, where a ReLU or sigmoid function is generally used. Since different weights are calculated for different inputs, different kernel combinations are produced for different types of inputs. The resultant feature map is generated by applying a convolution operation to the input data using the combined convolution kernel. Finally, normalization and activation function components are applied.

CondConv uses the ReLU activation function, as shown in formula (3):

$$\mathrm{ReLU}(x) = \max(0, x) \tag{3}$$

ConvModule uses the SiLU activation function, as shown in formula (4):

$$\mathrm{SiLU}(x) = x \cdot \mathrm{sigmoid}(x) = \frac{x}{1 + e^{-x}} \tag{4}$$

Although the ReLU function addresses the gradient disappearance issue, it produces a gradient of zero when the input is negative. Conversely, the SiLU function yields a nonzero gradient even when the input is negative, thereby enhancing the performance achieved during the backpropagation phase. Therefore, we choose to use the SiLU activation function. The changed module is shown in Fig. 4b. We replace the ConvModule in the modified version of YOLOv8 with CondConv.
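The routing and kernel combination steps can be illustrated with a small pure-Python sketch. It assumes the routing function r(x) = Sigmoid(GlobalAveragePool(x) R) and the combined kernel α_1 W_1 + … + α_n W_n described above, restricts itself to 1×1 kernels on a single input, and omits normalization; the function names and the toy values are hypothetical and do not reproduce the implementation used in this work.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def silu(z):
    # SiLU(z) = z * sigmoid(z); nonzero output and gradient for negative z
    return z * sigmoid(z)


def routing_weights(x, R):
    """alpha = sigmoid(GlobalAveragePool(x) @ R) for x of shape [C][H][W]."""
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    n_experts = len(R[0])
    return [sigmoid(sum(pooled[c] * R[c][e] for c in range(len(pooled))))
            for e in range(n_experts)]


def combine_kernels(alphas, experts):
    """W = sum_i alpha_i * W_i; each expert is a [C_out][C_in] 1x1 kernel."""
    c_out, c_in = len(experts[0]), len(experts[0][0])
    return [[sum(a * W[o][i] for a, W in zip(alphas, experts))
             for i in range(c_in)] for o in range(c_out)]


def cond_conv_1x1(x, experts, R, act=silu):
    """Convolve x with the input-dependent kernel, then apply the activation."""
    W = combine_kernels(routing_weights(x, R), experts)
    h, w = len(x[0]), len(x[0][0])
    return [[[act(sum(W[o][c] * x[c][i][j] for c in range(len(x))))
              for j in range(w)] for i in range(h)] for o in range(len(W))]


# Toy input: 2 channels of size 2x2, two expert kernels, one output channel.
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
experts = [[[1.0, 0.0]], [[3.0, 0.0]]]
R = [[0.2, -0.2], [0.0, 0.0]]
alphas = routing_weights(x, R)
y = cond_conv_1x1(x, experts, R)
```

Because the coefficients depend on the pooled input, two different images are convolved with two different effective kernels, whereas a fixed-weight ConvModule applies the same kernel to both.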

Efficient attention mechanism
Integrating an attention mechanism into the model has the potential to enhance its generalizability and enable its adaptation to diverse tasks. The multihead attention mechanism is better than the self-attention mechanism at capturing medium- and long-distance information and has better feature expression capabilities. However, the multihead attention mechanism has several shortcomings, such as requiring large amounts of computing resources and memory. To solve these problems, BiFormer [32] introduced a dynamic sparse attention mechanism to focus only on the most relevant key-value pairs, reducing the number of required calculations. Its structure is shown in Fig. 5.
BiFormer attention has a four-layer pyramid structure, including one patch embedding layer and three patch merging layers. The input feature map is sliced using the patch embedding module, transforming it into a vector, and feature extraction is performed through linear transformation. The patch merging module is used to merge adjacent patches. During the merging process, the resolution of the input image is reduced, and the feature dimensionality is increased to reduce the number of required calculations and improve the feature expressions. The above operations depend on the BiFormer block, whose structure is shown in Fig. 6a.
The block includes depthwise separable convolution (DWConv), bi-level routing attention (BRA), multilayer perceptron (MLP), and layer normalization (LN) modules. The structure of the core BRA module is shown in Fig. 6b. First, the input feature map is divided into $S^2$ nonoverlapping regions. The number of feature vectors in each region is $HW/S^2$, where $H$ and $W$ represent the height and width of the feature map, respectively. After applying linear projection, the query (Q), key (K), and value (V) are obtained as follows:

$$Q = XW^{Q}, \quad K = XW^{K}, \quad V = XW^{V} \tag{5}$$

where $X$ is the input 2D feature map, and the projection weights of Q, K, and V are denoted as $W^{Q}$, $W^{K}$, and $W^{V}$, respectively. Second, the regional-level similarity between the query and key is calculated, and a directed graph is constructed to establish attention relationships between regions. This graph is formed as follows:

$$A^{r} = Q^{r}(K^{r})^{T} \tag{6}$$

where $A^{r} \in \mathbb{R}^{S^2 \times S^2}$ is the adjacency matrix of the affinity graph, and $Q^{r}$ and $K^{r}$ represent the regional-level queries and keys, respectively, with $Q^{r}, K^{r} \in \mathbb{R}^{S^2 \times C}$. Then, region screening is performed. Through the routing index matrix, the top-k connections are retained for each region; that is, the most relevant regions are selected, as shown in formula (7):

$$I^{r} = \mathrm{topkIndex}(A^{r}) \tag{7}$$

where $I^{r}$ contains the indices of the k most relevant regions for each region. According to the routing index matrix, for each query token, the attention weights between it and the key-value pairs retained in the routed regions are calculated as shown in formula (8):

$$O = \mathrm{Attention}(Q, K^{g}, V^{g}) + \mathrm{LCE}(V) \tag{8}$$

where $O$ represents the output of the attention mechanism, $K^{g}$ and $V^{g}$ represent the aggregated key and value tensors, and $\mathrm{LCE}(V)$ represents the local context enhancement term. After processing via the BRA module, the attention focus of the model is improved; the LN module stabilizes the output, and the MLP module enhances the expression ability of the model.
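The region-to-region routing step of BRA can be sketched in a few lines of pure Python. The sketch assumes that the regional-level queries and keys are obtained by averaging the token vectors within each region (as in the BiFormer design) and covers only the affinity computation A^r = Q^r (K^r)^T and the top-k selection; the token-level attention and the LCE term are omitted, and the function names and toy values are hypothetical.

```python
def region_mean(tokens):
    """Average the token vectors of one region into a region-level vector."""
    n = len(tokens)
    return [sum(t[c] for t in tokens) / n for c in range(len(tokens[0]))]


def region_affinity(q_regions, k_regions):
    """A^r = Q^r (K^r)^T, with Q^r and K^r given as lists of C-dim vectors."""
    return [[sum(qc * kc for qc, kc in zip(q, k)) for k in k_regions]
            for q in q_regions]


def topk_index(affinity, k):
    """I^r: for each region, the indices of its k highest-affinity regions."""
    return [sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
            for row in affinity]


# Toy example: 2 query regions, 3 key regions, C = 2.
q_r = [[1.0, 0.0], [0.0, 1.0]]
k_r = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
A_r = region_affinity(q_r, k_r)   # [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0]]
I_r = topk_index(A_r, k=2)        # [[0, 2], [1, 2]]
```

Each query token then attends only to the tokens gathered from the regions listed in its row of I_r, which is what reduces the cost relative to full attention over all regions.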
In wood defect detection scenarios, incorporating the BiFormer attention mechanism into the model can enhance its understanding of complex features. Furthermore, this strategy has the potential to enhance the capacity of the model for analyzing wood defects at various scales, regardless of whether they manifest as minute fissures or extensive damage.

Improved loss function
The head part of YOLOv8 uses the anchor-free approach to separate the classification and regression branches, as shown in Fig. 7.
The loss function for the classification branch is the binary cross-entropy loss (BCEL), while the regression branch utilizes the distribution focal loss (DFL) and Complete-IoU (CIoU) loss functions. By building on the Distance-IoU (DIoU) and additionally considering the consistency of the aspect ratios of the predicted bounding box and the ground-truth bounding box, the CIoU loss function enhances the precision of bounding box regression. The CIoU loss is calculated according to the following formula:

$$L_{CIoU} = 1 - IoU + \frac{d^2}{c^2} + \alpha v \tag{9}$$

where IoU represents the intersection-over-union ratio between the predicted bounding box and the ground-truth bounding box, $d$ is the Euclidean distance between the center points of the two bounding boxes, and $c$ is the diagonal length of the minimum enclosing box that simultaneously encompasses both the predicted bounding box and the ground-truth bounding box. As a weight coefficient, $\alpha$ is not included in the gradient calculation. $\alpha$ is defined as shown in formula (10):

$$\alpha = \frac{v}{(1 - IoU) + v} \tag{10}$$

Formula (11) introduces a parameter $v$, which measures the consistency of the aspect ratios:

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 \tag{11}$$

where $w^{gt}$ and $h^{gt}$ represent the width and height of the ground-truth bounding box, respectively, and $w$ and $h$ represent the width and height of the predicted bounding box, respectively.
The CIoU loss increases the penalty imposed on low-quality samples, which reduces the generalizability of the model. With the variety of encountered production environments and defects, wood defect images can easily produce low-quality samples. Therefore, we use the Wise-IoU (WIoU) v3 [33] to replace the CIoU in the bounding box regression branch to solve this problem.
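As a worked illustration, the CIoU loss can be computed for a single pair of boxes as follows. The sketch assumes the standard CIoU definition, L_CIoU = 1 - IoU + d²/c² + αv with α = v / ((1 - IoU) + v), and boxes given as corner coordinates (x1, y1, x2, y2); the function names, the corner format, and the small eps term added for numerical safety are assumptions made for this example.

```python
import math


def iou(box_a, box_b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def ciou_loss(pred, gt, eps=1e-9):
    """L_CIoU = 1 - IoU + d^2 / c^2 + alpha * v for one box pair."""
    i = iou(pred, gt)
    # d^2: squared distance between the two box centers
    d2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2) ** 2 \
        + (((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2) ** 2
    # c^2: squared diagonal of the minimum enclosing box
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2
    # v: aspect ratio consistency term
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    alpha = v / ((1 - i) + v + eps)  # treated as a constant weight
    return 1 - i + d2 / (c2 + eps) + alpha * v


# Two 2x2 boxes offset by one unit: IoU = 1/7, d^2 = 2, c^2 = 18, v = 0.
loss = ciou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
```

For this pair, the loss equals 1 - 1/7 + 2/18, since the two boxes share the same aspect ratio and the v term vanishes.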
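The Wise-IoU algorithm can dynamically adjust the gradient distribution during the training process. Wise-IoU v1 integrates a distance-based attention mechanism into the IoU loss, as shown in the following formulas:

$$L_{WIoUv1} = R_{WIoU} L_{IoU} \tag{12}$$

$$R_{WIoU} = \exp\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{(W_g^2 + H_g^2)^{*}}\right) \tag{13}$$

where $L_{IoU} = 1 - IoU$, and IoU is the intersection-over-union ratio between the predicted box and the ground-truth box. $x$ and $y$ denote the center coordinates of the predicted box, $x_{gt}$ and $y_{gt}$ denote those of the ground-truth box, and $W_g$ and $H_g$ are the width and height of the smallest enclosing box; the superscript $*$ indicates that this term is excluded from the gradient calculation. Wise-IoU v3 [33] builds on v1 by applying a dynamic gradient gain $r$:

$$\beta = \frac{L_{IoU}^{*}}{\overline{L_{IoU}}} \in [0, +\infty) \tag{14}$$

$$L_{WIoUv3} = r L_{WIoUv1}, \quad r = \frac{\beta}{\delta \alpha^{\beta - \delta}} \tag{15}$$

where $\beta$ represents the outlier degree, $\overline{L_{IoU}}$ is the running mean of the IoU loss, and $\alpha$ and $\delta$ are hyperparameters ($\alpha$ here is unrelated to the CIoU weight coefficient). A higher outlier degree indicates poor anchor box quality, and a smaller gradient gain is allocated to reduce the attention given to low-quality anchor boxes. A lower outlier degree indicates a high-quality anchor box, which is also assigned a smaller gradient gain. The gradient gain thus concentrates the regression on anchor boxes of ordinary quality. By dynamically adjusting the gradient gain in this manner, the generalizability of the model improves.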
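The behavior of the gradient gain can be checked numerically. The sketch below follows the formulation of the Wise-IoU paper [33] as we understand it: L_WIoUv1 = R_WIoU · L_IoU with R_WIoU = exp(((x - x_gt)² + (y - y_gt)²) / (W_g² + H_g²)), outlier degree β = L_IoU / mean(L_IoU), and gain r = β / (δ α^(β - δ)). The hyperparameter values α = 1.9 and δ = 3 are assumptions taken from that paper's suggested settings, the function names are hypothetical, and gradient detachment is not modeled because the code works on plain scalars.

```python
import math


def wiou_v1(iou_value, center_pred, center_gt, enclosing_wh):
    """L_WIoUv1 = R_WIoU * L_IoU with a distance-based attention term."""
    dx = center_pred[0] - center_gt[0]
    dy = center_pred[1] - center_gt[1]
    wg, hg = enclosing_wh  # size of the smallest enclosing box
    r_wiou = math.exp((dx ** 2 + dy ** 2) / (wg ** 2 + hg ** 2))
    return r_wiou * (1.0 - iou_value)


def gradient_gain(beta, alpha=1.9, delta=3.0):
    """r = beta / (delta * alpha ** (beta - delta)); r equals 1 at beta == delta."""
    return beta / (delta * alpha ** (beta - delta))


def wiou_v3(iou_value, center_pred, center_gt, enclosing_wh, mean_iou_loss):
    """Scale the v1 loss by a gain that depends on the outlier degree beta."""
    beta = (1.0 - iou_value) / mean_iou_loss
    return gradient_gain(beta) * wiou_v1(
        iou_value, center_pred, center_gt, enclosing_wh)


# The gain is small for very low and very high outlier degrees.
gains = {beta: gradient_gain(beta) for beta in (0.1, 1.5, 3.0, 10.0)}
```

With these settings, the gain peaks for boxes of ordinary quality (β near 1.5) and falls off for both very good and very poor boxes, which is the nonmonotonic focusing behavior described above.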
Figure 8 shows the structure of our proposed CWB-YOLOv8 algorithm, and the key improvements made to the algorithm are highlighted in red. First, in the backbone part, we use the BiFormer attention mechanism to improve the ability of the model to understand the complex features of wood through bi-level routing attention. Second, in the head part, we use Wise-IoU v3 to replace the original CIoU loss function for performing wood defect detection. By addressing the problems caused by low- and medium-quality samples and dynamically adjusting the gradient distribution during the training process, the generalization ability of the model is improved. Finally, we use the CondConv module to replace the original convolution module. The advantage of this approach is that the improved model dynamically adjusts the weight of the convolution kernel according to the attributes of the input data, enhancing the ability of the algorithm to extract different types of wood defect features.

Evaluation indices
To conduct an impartial assessment of the ability of the proposed model to detect wood defects, various evaluation metrics were employed. These metrics encompassed the mAP@IoU, precision, recall, and inference time. The mAP@IoU represents the precision performance achieved under different recall rates and different degrees of bounding box overlap. To evaluate the accuracy of the tested models, we used two metrics, mAP@0.5 and mAP@0.5:0.95, where mAP@0.5 represents the mean average precision (mAP) evaluated at an IoU threshold of 0.5, and mAP@0.5:0.95 represents the mAP averaged over a series of IoU thresholds ranging from 0.5 to 0.95 that are gradually increased by some step size (usually 0.05). As formula (16) shows, precision is the ratio of positive samples correctly identified by the model to all samples identified as positive:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{16}$$

As formula (17) shows, recall is the ratio of correctly recognized positive samples to all real positive samples:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{17}$$

In formulas (16) and (17), TP is the number of true positives (correctly detected defects), FP is the number of false positives (incorrect detections), and FN is the number of false negatives (defects that were missed). The inference time required by the model to process a single image was measured in milliseconds.
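These definitions can be made concrete with a short sketch. It assumes the usual definitions Precision = TP / (TP + FP) and Recall = TP / (TP + FN) and the common 0.05 step for the mAP@0.5:0.95 thresholds; the counts and AP values used below are hypothetical and are not results from this paper.

```python
def precision(tp, fp):
    """Share of predicted positives that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Share of real positives that were detected."""
    return tp / (tp + fn) if tp + fn else 0.0


def iou_thresholds(start=0.5, stop=0.95, step=0.05):
    """IoU thresholds over which mAP@0.5:0.95 is averaged."""
    n = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 2) for i in range(n)]


def mean_over_thresholds(map_per_threshold):
    """mAP@0.5:0.95 is the mean of the mAP values at each threshold."""
    return sum(map_per_threshold) / len(map_per_threshold)


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 misses.
p = precision(tp=90, fp=10)   # 0.9
r = recall(tp=90, fn=30)      # 0.75
thresholds = iou_thresholds()  # 0.5, 0.55, ..., 0.95 (10 values)
```

A detector that raises many false alarms lowers precision, while one that misses defects lowers recall; mAP summarizes the trade-off between the two across confidence thresholds.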

Wood defect detection based on CWB-YOLOv8
In our study, we integrated the CondConv, Wise-IoU, and BiFormer components to enhance the performance of the YOLOv8 model in terms of detecting wood defects. The configuration of the algorithmic parameters can be found in Table 2.
Figure 9 shows the training curves generated by the algorithm. The horizontal axis corresponds to the number of epochs, and the vertical axis is the value of each indicator. For example, the vertical axis of the train/box_loss plot gives the value of box_loss over epochs 1-200. In the figure, box_loss is the improved loss function. As the number of epochs increased, the values of box_loss, cls_loss, and dfl_loss in the training and …

Table 3 Results of an ablation experiment
Columns: CondConv, Wise-IoU, BiFormer, mAP@0.5, mAP@0.5:0.95, …

The improved mAP@0.5 and mAP@0.5:0.95 values signify a consistent enhancement in the performance of the model, demonstrating its resilience to different intersection-over-union thresholds. Figure 10 illustrates the label information and the confusion matrix of the prediction results. In Fig. 10a, the histogram in the upper-left corner depicts the frequencies of the different defects among the instances, and the heatmap in the upper-right corner shows the distribution of the bounding boxes. The scatter plot in the lower-left corner shows the concentration of the center points in the x and y dimensions, and the scatter plot in the lower-right corner shows that the number of small targets was relatively high in the utilized dataset. In Fig. 10b, the abscissa is the correct category, the ordinate is the detected category, and the value in each box represents the proportion. The diagonal represents the proportion of correct predictions for each category. Among them, the prediction and classification results for crack and resin defects were the best, with proportions of correct predictions reaching 0.96 and 0.93, respectively. Besides the effectiveness of the algorithm, the test set contained some augmented images, which may have improved the measured accuracy to a certain extent. Entries off the diagonal indicate false detections, and blank entries indicate the absence of false detections.
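The proportions shown in a confusion matrix of this kind can be obtained by normalizing raw counts per true category. A minimal sketch follows, assuming that rows index the detected category and columns index the correct category, as described for Fig. 10b; the counts are hypothetical and the function name is an assumption.

```python
def normalize_by_true_class(counts):
    """counts[pred][true] -> proportions so each true-class column sums to 1."""
    n_true = len(counts[0])
    col_sums = [sum(row[t] for row in counts) for t in range(n_true)]
    return [[row[t] / col_sums[t] if col_sums[t] else 0.0
             for t in range(n_true)] for row in counts]


# Hypothetical counts for two classes (rows: detected, columns: correct).
counts = [[96, 5],
          [4, 95]]
proportions = normalize_by_true_class(counts)
# proportions[0][0] is the share of class-0 defects detected as class 0.
```

Under this convention, a diagonal entry of 0.96 means that 96% of the instances of that category were detected with the correct label.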
As displayed in Table 3, when diverse enhancement modules were incorporated into the YOLOv8 algorithm, the indicators exhibited varying degrees of improvement. Among them, the mAP@0.5 and other indicators of the CWB-YOLOv8 method were the highest. This is because CondConv improves the ability of the model to adapt to different inputs, the BiFormer attention mechanism …
Figure 11 shows the discrepancies among the mAP@0.5, mAP@0.5:0.95, precision, and recall values produced by the four methods throughout their …
Figure 12 shows some defect detection results. The value in the figure represents the confidence level; the higher the value is, the more confident the algorithm is in the detection. Board (1) shows an image we collected, with a knot_with_crack defect; Board (2) presents an image from the public dataset, including knot_with_crack, live_knot, and resin defects. (a), (b), (c), and (d) correspond to the four methods used in the ablation experiment. As shown in Fig. 12, method (a) displays the lowest detection accuracy. For example, its classification confidences for the knot_with_crack defect in board (1) are only 0.82 and 0.81, values lower than those obtained with the other three methods. The other methods improved the defect detection capabilities of the model. Among them, method (d), the proposed CWB-YOLOv8 algorithm, displays the highest detection accuracy. For example, its classification confidences for the knot_with_crack defect in board (1) reached 0.90 and 0.95.
Figure 13 shows the heatmaps generated by the four algorithms via Grad-CAM technology when detecting live_knot defects in the original image. The spatial focus of each algorithm on wood defects is visually displayed. Warmer colors represent greater levels of attention. As shown in the figure, the YOLOv8 algorithm produced the smallest area of attention, did not cover the entire defect, and had the lowest confidence score. With the addition of the improvement modules, the area of attention gradually increased, and the CWB-YOLOv8 method yielded the largest area, almost completely covering the defect. This approach detected the most defects and obtained the highest confidence score, indicating that the CWB-YOLOv8 method is more effective than the other methods for wood defect detection tasks.

Comparative experiment
We compared different algorithms to verify the effectiveness of the proposed method for use in wood defect detection scenarios. The algorithms included Faster RCNN, the SSD, YOLOv3, YOLOv5, YOLOv7, YOLOv8, and CWB-YOLOv8. The employed assessment criteria included mAP@0.5, mAP@0.5:0.95, and FPS, which is reported here as the inference time per image (ms). Among them, the higher the two indicators mAP@0.5 and mAP@0.5:0.95 are, the better, and the lower the inference time is, the better. The findings from the experiment are presented in Table 4.
According to Table 4, for the mAP@0.5 indicator, CWB-YOLOv8 produced the best result with a score of 0.892; for the mAP@0.5:0.95 indicator, CWB-YOLOv8 also ranked first with a score of 0.588. These two indicators show that the defect detection accuracy of CWB-YOLOv8 was better than that of the other detection algorithms. In terms of inference time, Faster RCNN took the longest amount of time to detect a single image, requiring 47.52 ms. YOLOv3 took the shortest amount of time at only 13.1 ms, and YOLOv8 required 13.9 ms. Since CWB-YOLOv8 added modules such as attention mechanisms on the basis of YOLOv8, the process required a slightly longer period of 17.3 ms. However, considering both the detection accuracy and the processing time, CWB-YOLOv8 still yielded better results than did the other methods.
Figure 14 shows the visual results produced by the seven algorithms for wood defect detection. The image includes dead_knot and knot_with_crack defects. As shown in Fig. 14, Faster RCNN can detect both defects, but the confidence obtained for the dead_knot defect is only 0.57. The SSD algorithm can only detect the knot_with_crack defect, with a confidence level of 0.76, which is 0.01 higher than that of Faster RCNN. All YOLO series algorithms can detect the defects, but the YOLOv5 method yields false detections. For example, resin is mistakenly detected as a marrow defect. From this point of view, YOLOv5 is less effective than the other models. CWB-YOLOv8 displays the best overall detection effect, with the confidence level for the dead_knot defect reaching 0.86, and there are no false detections.
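The per-image inference times can be converted into frames per second to make the speed comparison easier to read. The sketch below uses the four inference times quoted above and assumes that images are processed one at a time, so that FPS = 1000 / (time in ms); the function name is hypothetical, and the derived FPS values are not figures reported in this paper.

```python
def fps_from_latency(ms_per_image):
    """Frames per second for sequential single-image inference."""
    return 1000.0 / ms_per_image


# Inference times (ms per image) quoted in the text.
latency_ms = {
    "Faster RCNN": 47.52,
    "YOLOv3": 13.1,
    "YOLOv8": 13.9,
    "CWB-YOLOv8": 17.3,
}
fps = {name: round(fps_from_latency(ms), 1) for name, ms in latency_ms.items()}
extra_ms = round(latency_ms["CWB-YOLOv8"] - latency_ms["YOLOv8"], 1)  # 3.4
```

Under this assumption, CWB-YOLOv8 runs at roughly 58 frames per second compared with roughly 72 for YOLOv8, so the accuracy gain costs about 3.4 ms per image.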

Conclusions
In this paper, a self-built defect dataset was constructed for wood defect detection to achieve better generalization performance via training on defects in multiple tree species. The YOLOv8 algorithm was also improved in this paper. First, we replaced the convolution module in the original algorithm with a CondConv module. The convolution kernel weight was dynamically adjusted to address the pressure imposed by different defects on the feature extraction process. Afterward, the Wise-IoU function replaced the CIoU function to enhance the generalization performance of the model. Finally, we incorporated the BiFormer attention mechanism into the backbone, which enhanced the contextual understanding capabilities of the model and improved its ability to handle multiscale defects.
The experimental results showed that incorporating the CondConv, Wise-IoU, and BiFormer modules improved the wood defect detection effect of the proposed method. Compared with the YOLOv8 algorithm, the enhanced approach exhibited increases of 3.5% and 5.8% in terms of the mAP@0.5 and mAP@0.5:0.95 indices, respectively. Moreover, when contrasted with the prevailing target detection methodologies, Faster RCNN, the SSD, YOLOv3, YOLOv5, and YOLOv7, CWB-YOLOv8 exhibited mAP@0.5 value improvements of 20.1%, 20.5%, 6.2%, 12%, and 5.2%, respectively. The mAP@0.5:0.95 index increased by 26.7%, 25.3%, 15.2%, 20.2%, and 11.7%, respectively. However, since we replaced the convolution module and added an attention mechanism, the single-image processing time of the CWB-YOLOv8 method was 3.4 ms greater than that of the original YOLOv8 algorithm.
In follow-up work, our priority will be to reduce the computational requirements of the model while maintaining its accuracy, so that it can be adapted to embedded devices to perform wood defect detection on a production line. Additionally, we will expand our defect data collection to include additional tree species to improve the model's generalizability. Given that the inclusion of augmented images in the test set may introduce bias affecting the measured model performance, it is essential to utilize original, unaugmented images in future testing phases to ensure a more accurate evaluation of model efficacy.

Fig. 3 a ConvModule structure and b examples of wood defects

Fig. 11 Curve showing the changes exhibited by the ablation experiment indices

Table 1 Defect distribution

Table 4 Results of an algorithmic comparison experiment