Evaluation of image partitioning strategies for preserving spatial information of cross-sectional micrographs in automated wood recognition of Fagaceae

Although wood cross sections contain spatiotemporal information regarding tree growth, computer vision-based wood identification studies have traditionally favored disordered image representations that do not take such information into account. This paper describes image partitioning strategies that preserve the spatial information of wood cross-sectional images. Three partitioning strategies are designed, namely grid partitioning based on spatial pyramid matching (SPM) and two SPM variants, radial and tangential partitioning, and their recognition performance is evaluated on the Fagaceae micrograph dataset. The grid and radial partitioning strategies achieve better recognition performance than the bag-of-features model that constitutes their underlying framework. Radial partitioning, which preserves spatial information from pith to bark, further improves performance, especially for radial-porous species. The Pearson correlation and autocorrelation coefficients computed from radially partitioned sub-images have the potential to serve as auxiliary features in the construction of multi-feature datasets. The contribution of the image partitioning strategies is limited to species recognition and is negligible at the genus level.


Introduction
In the field of wood science, there is a growing interest in computer vision (CV)-based wood identification. Applications of CV are expanding from automated wood identification systems to wood anatomical approaches [1][2][3][4]. The recent emergence of open wood image databases, such as the forest species database of the Laboratory of Vision, Robotics and Imaging at the Federal University of Parana [5,6] and the Xylarium Digital Database for wood information science and education (XDD) [7], provides an opportunity for the further application of CV technologies in wood science.
Feature extraction from images is the most important process in determining the performance of CV-based wood identification systems. In the classification of micrograph datasets, several studies have shown that local feature techniques for extracting morphological information about wood cells, represented by the scale-invariant feature transform (SIFT), are superior to texture features such as the gray-level co-occurrence matrix and local binary patterns [4,8]. Moreover, local features encoded within the bag-of-features (BOF) framework [9] not only allow indirect quantification of anatomical elements, but can also improve classification performance [4,10]. However, the CV-based wood identification strategies using micrograph datasets reported to date have focused only on the morphological characterization of wood cells and their quantification, neglecting the spatial information among the features extracted from an image. The BOF model inherently loses the spatial relationships among features because it represents an image as a feature histogram. This characteristic may be an advantage for recognizing highly deformable objects such as animals, but it hinders the recognition of objects with a fixed shape, such as cars or buildings [11]. For cross-sectional micrographs of wood, the spatial information among the wood cells should also be considered, because the structure and arrangement of the cells within one annual ring exhibit species-specific patterns.
Spatial pyramid matching (SPM) is an image partitioning technique designed to compensate for the fact that BOF does not consider spatial information within images [11]. Various SPM models have demonstrated their effectiveness in general image classification [12,13], but this approach has not yet been used for wood recognition. Image patch extraction and blocking techniques for automated wood identification have been reported [14,15], but the main purpose of these strategies was to reduce computational cost, and they do not preserve spatial information. Applying image partitioning techniques to wood recognition may be helpful because they preserve geometric information such as cell arrangement and distribution within images.
This paper describes three image partitioning strategies based on SPM, namely conventional SPM, radial-SPM, and tangential-SPM, that preserve the spatial information of Fagaceae cross-sectional micrographs, and evaluates their recognition performance through comparison with other recognition strategies. Radial- and tangential-SPM are modified partitioning techniques that preserve spatial information in the radial and tangential directions of an image, respectively. In addition, the correlation coefficients among the image features in the partitioned sub-images and the autocorrelation of each feature are computed to evaluate their effectiveness as auxiliary features.

Dataset
The Fagaceae micrograph dataset from the Xylarium Digital Database for wood information science and education (XDD_005) was used to evaluate the image partitioning strategies [16]. The dataset contains 18 species from five genera of the Fagaceae family and comprises 2446 cross-sectional optical micrographs (Table 1). The micrographs were acquired at low magnification with an Olympus™ 2× PlanApo objective lens (0.08 NA), using a BX51 optical microscope equipped with a DP73 CCD (charge-coupled device) camera (Olympus, Japan). The Fagaceae dataset is well suited for evaluating spatial partitioning strategies because it contains species with various porosity types. All images in the dataset are 8-bit grayscale, 600 × 600 pixels in size, with a pixel resolution of 4.44 μm, so each image covers approximately 2.66 × 2.66 mm of the cross section. To preserve the anatomical diversity of the original images, no image processing, such as the exclusion of specific anatomical features or adjustment of the annual ring width, was applied.

Recognition procedure
The proposed models follow the recognition procedure presented in Fig. 1. The XDD_005 dataset was split into training and test sets at a ratio of 4:1 by individual specimen rather than by image, so that images from the same individual do not appear in both sets. SIFT keypoints were extracted from the training set, and codewords were then generated by mini-batch k-means clustering of the keypoint descriptors. The images were subsequently divided into several levels in a predefined way for each partitioning strategy. The correlation coefficients among the codewords in the partitioned sub-images and the autocorrelation of each codeword were calculated, and these correlation data were combined with the feature histogram of each SPM to generate multi-feature sets. The data produced at each stage (feature extraction, feature encoding, image partitioning, and multi-feature generation) were input to a support vector machine (SVM) classifier to build recognition models. The recognition performance of the models trained on the data from the feature extraction and encoding steps was compared with that of the image partitioning and multi-feature strategies. In addition, the recognition performance of the VGG16 model [17], a convolutional neural network, on the codeword data was used as a benchmark.
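Splitting by individual rather than by image keeps near-duplicate images of one specimen from leaking between training and test sets. The sketch below shows one way to do this, assuming each image record carries a species name and a hypothetical individual ID; stratifying the 4:1 ratio within each species is also an assumption, since the text does not say how individuals were allocated.

```python
import random
from collections import defaultdict

def split_by_individual(images, train_ratio=0.8, seed=0):
    """Split (image_id, species, individual_id) records 4:1 by individual.

    All images of one individual go to the same side. The ratio is applied to
    individuals within each species (an assumed stratification). A species with
    a single individual ends up entirely in the training set.
    """
    rng = random.Random(seed)
    # species -> individual -> list of image IDs
    tree = defaultdict(lambda: defaultdict(list))
    for image_id, species, individual in images:
        tree[species][individual].append(image_id)
    train, test = [], []
    for species in sorted(tree):
        individuals = sorted(tree[species])
        rng.shuffle(individuals)
        n_train = max(1, round(train_ratio * len(individuals)))
        for i, ind in enumerate(individuals):
            (train if i < n_train else test).extend(tree[species][ind])
    return train, test

# Hypothetical records: 2 species x 5 individuals x 3 images each
records = [(f"{s}-{i}-{j}", s, f"{s}-{i}") for s in ("A", "B") for i in range(5) for j in range(3)]
train, test = split_by_individual(records)
```

With five individuals per species, four go to training and one to testing, so 24 images are used for training and 6 for testing.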

Feature extraction and encoding
Local features were extracted from images using the SIFT algorithm. The superiority of SIFT has been demonstrated in comparative studies of major local feature extraction algorithms for general image classification [18]. For wood identification, SIFT, which actively detects cell corners, has been reported to be superior to other local feature algorithms [4]. The SIFT parameters were set as follows: three layers in each octave, contrast and edge thresholds of 0.06 and 10, respectively, and a sigma of 1.6 for the Gaussian applied to the image at octave 0. To encode the SIFT keypoints into codewords, mini-batch k-means clustering with a batch size of 100 was applied to all extracted features [19]. The optimal number of codewords (k) was determined by threefold cross-validation over various values of k.
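The parameter names in the Table 3 footnote match OpenCV's `cv2.SIFT_create(nOctaveLayers=3, contrastThreshold=0.06, edgeThreshold=10, sigma=1.6)`, and scikit-learn's `MiniBatchKMeans(n_clusters=k, batch_size=100)` is a common off-the-shelf encoder; whether the authors used these libraries is an assumption. To show what the encoding step does without those dependencies, the following NumPy sketch implements mini-batch k-means in the style of Sculley (2010) and the BOF histogram, with random vectors standing in for 128-dimensional SIFT descriptors. L1-normalizing the histogram is also an assumption.

```python
import numpy as np

def sq_dists(A, B):
    """Squared Euclidean distances between the rows of A and the rows of B."""
    return (A ** 2).sum(1)[:, None] - 2.0 * A @ B.T + (B ** 2).sum(1)[None, :]

def minibatch_kmeans(X, k, batch_size=100, n_iter=200, seed=0):
    """Mini-batch k-means with a per-center step size of 1/count (Sculley, 2010)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    counts = np.zeros(k)
    for _ in range(n_iter):
        batch = X[rng.choice(len(X), size=min(batch_size, len(X)), replace=False)]
        # assignments are computed with the centers as they were before this batch
        nearest = sq_dists(batch, centers).argmin(axis=1)
        for x, c in zip(batch, nearest):
            counts[c] += 1
            eta = 1.0 / counts[c]
            # running mean of all samples assigned to center c so far
            centers[c] = (1.0 - eta) * centers[c] + eta * x
    return centers

def bof_histogram(descriptors, centers):
    """Encode one image's descriptors as an L1-normalized codeword histogram."""
    labels = sq_dists(np.asarray(descriptors, dtype=float), centers).argmin(axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / max(hist.sum(), 1.0)

# Random stand-ins for SIFT descriptors; k = 8 only to keep the example fast
rng = np.random.default_rng(1)
train_desc = rng.random((1000, 128))
centers = minibatch_kmeans(train_desc, k=8)
hist = bof_histogram(train_desc[:200], centers)
```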

Image partitioning
Three different image partitioning strategies based on SPM, namely conventional SPM (SPM), radial-SPM (RSPM), and tangential-SPM (TSPM), were employed to preserve the spatial information in the cross-sectional micrographs.
SPM divides the image into a grid with pyramid kernels (Fig. 2a). SPM has two partitioning schemes, single level and pyramid level. Higher single levels further subdivide the image and apply higher weights to the codeword histogram of each sub-region. The process of producing the feature histogram is performed within the BOF framework, and the histogram at single-level 0 is exactly the same as that of the BOF. The pyramid level is a combination of all previous single levels (Fig. 2e). The computational efficiency of SPM can be poor because the feature dimension increases exponentially as the level increases. Therefore, we set the maximum pyramid level to 2.
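The following sketch shows how a pyramid-level SPM vector could be assembled from keypoint coordinates (for example OpenCV's `KeyPoint.pt`) and their codeword indices. The text does not give the level weights, so the sketch assumes the standard weighting of Lazebnik et al. (2006): level 0 gets $1/2^L$ and level $l \ge 1$ gets $1/2^{L-l+1}$, so finer levels are weighted more. Level $l$ contributes $4^l$ cells, which is why the dimension grows exponentially: with 300 codewords, pyramid level 2 already has 300 × (1 + 4 + 16) = 6300 dimensions.

```python
import numpy as np

def grid_histograms(xy, labels, k, level, size=600):
    """Concatenated codeword histograms of the 2^level x 2^level grid cells."""
    n = 2 ** level
    # cell index of each keypoint along x and y, clipped so that x == size stays in range
    cx = np.minimum((xy[:, 0] * n // size).astype(int), n - 1)
    cy = np.minimum((xy[:, 1] * n // size).astype(int), n - 1)
    cell = cy * n + cx
    hist = np.zeros((n * n, k))
    np.add.at(hist, (cell, labels), 1)  # count codeword occurrences per cell
    return hist.ravel()

def spm_pyramid(xy, labels, k, max_level, size=600):
    """Weighted concatenation of single levels 0..max_level (assumed Lazebnik weights)."""
    parts = []
    for l in range(max_level + 1):
        w = 1 / 2 ** max_level if l == 0 else 1 / 2 ** (max_level - l + 1)
        parts.append(w * grid_histograms(xy, labels, k, l, size))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
xy = rng.uniform(0, 600, size=(500, 2))   # keypoint positions in pixels
labels = rng.integers(0, 300, size=500)   # codeword index of each keypoint
feat = spm_pyramid(xy, labels, k=300, max_level=2)
```

Single level 0 reduces to the plain BOF count histogram, consistent with the statement that the two are identical.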
In this study, we designed RSPM, which partitions images along the radial direction of the wood, and TSPM, which partitions images along the tangential direction, taking the growth process of wood into account. For RSPM and TSPM, only single levels were considered and no pyramid kernel was applied, which resulted in a lower computational cost than that of SPM. The feature histograms of the sub-images at each partition level were concatenated to preserve the spatial information of all sub-regions.
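A minimal sketch of the strip partitioning, assuming that partition level $l$ yields $l + 1$ parallel strips (consistent with "partition level 2 (three sub-regions)" in the results) and that the caller knows which image axis runs from pith to bark; the `axis` argument is a hypothetical parameter, since the text does not state the image orientation. Strips ordered along the radial axis give RSPM; ordering them along the other axis gives TSPM.

```python
import numpy as np

def strip_histograms(xy, labels, k, level, axis, size=600):
    """Split the image into level + 1 parallel strips and concatenate their histograms.

    axis: image axis (0 = x, 1 = y) assumed to run in the radial direction for RSPM;
    passing the other axis gives the tangential variant (TSPM).
    """
    n = level + 1
    strip = np.minimum((xy[:, axis] * n // size).astype(int), n - 1)
    hist = np.zeros((n, k))
    np.add.at(hist, (strip, labels), 1)
    return hist.ravel()  # ordered from the first to the last strip along `axis`

rng = np.random.default_rng(0)
xy = rng.uniform(0, 600, size=(500, 2))
labels = rng.integers(0, 300, size=500)
r2spm = strip_histograms(xy, labels, k=300, level=2, axis=1)  # RSPM, partition level 2
t2spm = strip_histograms(xy, labels, k=300, level=2, axis=0)  # TSPM counterpart
```

The dimension grows only linearly, $k(l + 1)$, which is why RSPM and TSPM are cheaper than grid SPM.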

Feature correlations and multi-feature combination
The sub-images are represented by codewords, and each sub-image yields a codeword histogram with a different pattern. Therefore, the correlations among the codewords in the sub-images may provide useful features that capture spatial information from cross-sectional images of wood.
From the sub-images of the partitioning strategy that achieved the highest recognition performance, we calculated the Pearson correlation (PC) of the codewords and the autocorrelation (AC) of each codeword, and combined them with the codeword histogram of the selected strategy. However, if PC and AC were calculated using all the codewords chosen through k-fold cross-validation, the data dimension would increase sharply, because the number of codeword pairs grows quadratically with the number of codewords. Based on the study of Kobayashi et al. [3], which reported good recognition results on the XDD_005 dataset using a SIFT keypoint-based feature histogram with 18 codewords, we regenerated 18 codewords and used them for the PC and AC calculations. The PC coefficient takes values from −1 to 1, where −1 indicates a perfect negative linear correlation between two variables and 1 a perfect positive linear correlation. The PC coefficient ($r$) is given by:

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

where $n$ is the sample size, $x_i$ and $y_i$ are the $i$-th elements of the two feature vectors (codeword histograms), and $\bar{x}$ and $\bar{y}$ are the means of the vectors. That is, $r$ is the covariance divided by the product of the standard deviations.
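The sketch below applies the PC formula to every pair of codewords, assuming that codeword $j$ is represented by its counts across the ordered sub-images (so $n$ is the number of sub-images). With 18 codewords this gives 18 × 17 / 2 = 153 coefficients, whereas 300 codewords would give 44,850, which illustrates the dimension argument above. Setting the coefficient to 0 for a codeword with zero variance is an assumption, since $r$ is undefined there.

```python
import numpy as np

def pairwise_codeword_pc(sub_hists):
    """Pearson r between every pair of codewords, using their counts across sub-images.

    sub_hists: array (n_subimages, n_codewords); column j holds codeword j's counts.
    Returns the upper triangle (j < l) as a flat vector.
    """
    X = sub_hists - sub_hists.mean(axis=0)   # deviations x_i - x_bar for every codeword
    cov = X.T @ X                            # sums of cross-products (numerators)
    norm = np.sqrt(np.diag(cov))             # sqrt of sums of squared deviations
    denom = np.outer(norm, norm)
    # zero-variance codewords give 0/0; their coefficients are set to 0 (assumption)
    r = np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)
    return r[np.triu_indices(sub_hists.shape[1], k=1)]

rng = np.random.default_rng(0)
h = rng.integers(0, 20, size=(10, 18))   # 10 radial sub-images x 18 codewords
pc = pairwise_codeword_pc(h)             # 153 coefficients
```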
Just as PC measures the strength of the linear relationship between two variables, AC measures the linear relationship between time-lagged values in time series data. The AC at lag $k$ ($r_k$) is given by:

$$r_k = \frac{\sum_{t=k+1}^{T}(y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T}(y_t - \bar{y})^2}$$

where $T$ is the length of the time series and $\bar{y}$ is its mean. For example, $r_1$ measures the relationship between $y_t$ and $y_{t-1}$, and $r_2$ measures that between $y_t$ and $y_{t-2}$. The usefulness of PC and AC as auxiliary features was evaluated by comparing the recognition performance of models trained on multi-feature sets with that of models trained on single features.
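A minimal sketch of the AC formula, treating each codeword's counts across the radially ordered sub-images as the time series $y_1, \dots, y_T$. The text does not say which lags were used, so `max_lag` is an assumed parameter, and returning zeros for a constant series is also an assumption.

```python
import numpy as np

def autocorrelation(y, max_lag):
    """r_k for k = 1..max_lag: lagged cross-products over the total sum of squares."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    denom = (d ** 2).sum()
    if denom == 0:
        return np.zeros(max_lag)  # constant series: r_k undefined, set to 0 (assumption)
    # d[k:] holds y_t - y_bar for t = k+1..T; d[:-k] holds y_{t-k} - y_bar
    return np.array([(d[k:] * d[:-k]).sum() / denom for k in range(1, max_lag + 1)])

def codeword_autocorrelations(sub_hists, max_lag):
    """AC of each codeword's counts along the ordered sub-images, concatenated."""
    return np.concatenate([autocorrelation(col, max_lag) for col in sub_hists.T])

r = autocorrelation([1, 2, 3, 4], max_lag=2)   # r_1 = 0.25, r_2 = -0.3
```

For the series 1, 2, 3, 4 the deviations are −1.5, −0.5, 0.5, 1.5 with a sum of squares of 5, so $r_1 = 1.25/5$ and $r_2 = -1.5/5$.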

Data learning and performance metric
The SVM classifier with a radial basis function (RBF) kernel was used for wood recognition [20]. To optimize the parameters, we set up a grid search over a logarithmic grid ranging from $10^{-3}$ to $10^{3}$ for gamma (a Gaussian kernel parameter for nonlinear classification) and from $10^{-8}$ to $10^{-2}$ for cost (a parameter that controls the cost of misclassification of the training data).
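In scikit-learn this setup would typically be `GridSearchCV(SVC(kernel="rbf"), {"gamma": ..., "C": ...})`; whether the authors used it is an assumption. The NumPy sketch below only shows the two ingredients the paragraph names: the logarithmic grids (one value per decade is an assumed spacing) and the RBF kernel $K(a, b) = \exp(-\gamma \lVert a - b \rVert^2)$, in which a larger gamma makes similarity decay faster with distance and the decision boundary more local.

```python
import numpy as np

# Logarithmic grids over the stated ranges (one value per decade is an assumption)
gammas = np.logspace(-3, 3, num=7)   # 1e-3 ... 1e3
costs = np.logspace(-8, -2, num=7)   # 1e-8 ... 1e-2
grid = [(g, c) for g in gammas for c in costs]

def rbf_kernel(A, B, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(1)[:, None] - 2.0 * A @ B.T + (B ** 2).sum(1)[None, :]
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative rounding errors

X = np.random.default_rng(0).random((5, 3))
K = rbf_kernel(X, X, gamma=1.0)
```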
Because the classes in the XDD_005 dataset are highly imbalanced, the F1 score was used to evaluate the performance of the established models. The F1 score is the harmonic mean of precision and recall, and is more appropriate than accuracy for evaluating models trained on an imbalanced dataset [4].
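The text does not state how per-species F1 scores were averaged; the sketch below assumes an unweighted (macro) average, which corresponds to `f1_score(..., average="macro")` in scikit-learn. Per class, $F_1 = 2PR/(P + R) = 2\,TP/(2\,TP + FP + FN)$. The second example shows why accuracy can mislead on imbalanced classes: predicting the majority class everywhere gives 90% accuracy but a macro-F1 of only 9/19.

```python
import numpy as np

def macro_f1(y_true, y_pred):
    """Unweighted mean over classes of F1 = 2TP / (2TP + FP + FN)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

f1_small = macro_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 2])   # (2/3 + 4/5 + 1) / 3 = 37/45
f1_imbalanced = macro_f1([0] * 9 + [1], [0] * 10)       # accuracy 0.9, macro-F1 = 9/19
```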

Determination of the number of codewords
The cross-validation errors for various numbers of codewords were computed to determine the optimal number of codewords. The minimum error was achieved with 300 codewords, and larger numbers of codewords produced similar errors. This number is relatively small compared with that for the Lauraceae dataset (XDD_008) [21]: in a study on Lauraceae image recognition using the BOF framework, the optimal number of codewords was 500 [4]. The Fagaceae dataset may require fewer codewords because Fagaceae species have more distinctive structures than Lauraceae species, and/or because of the coarser pixel resolution of the dataset. Differences in pixel resolution affect the discriminative power of feature extraction algorithms. At the pixel resolution of 4.44 μm used in this study, some morphological structures of small wood fibers appeared blurred, which was not the case in the study using the Lauraceae dataset, whose pixel resolution is 2.94 μm [4]. However, in a recognition study of the XDD_005 dataset using SIFT keypoints and connected component analysis data as image features, there was no significant difference in recognition performance between pixel resolutions of 4.44 μm and 2.94 μm [3]. This may be because many of the species in the Fagaceae dataset, particularly Quercus, have almost completely closed fiber lumina with thick cell walls [22].

Image partitioning models
Conventional SPM
The recognition performance of the SPM model at various partition levels is listed in Table 2. For single levels, the F1 score decreases as the level increases. Interestingly, the highest F1 score of 0.722 is obtained at pyramid level 1, which combines the histograms of single levels 0 and 1. This result suggests that image partitioning that preserves spatial information within images is an effective strategy for wood recognition. Although there are subtle inter- and intra-species variations, the distribution patterns and types of wood cells are broadly stationary, and the patterns repeat from pith to bark in the xylem. Owing to these structural characteristics, preserving spatial information is a better strategy than disordered representations such as BOF. Barmpoutis et al. [23] also reported that horizontal and vertical image patch models combined with higher-order linear dynamical systems within the BOF framework successfully recognized the WOOD-AUTH macroscopic image dataset [24].
Partition level 2 was effective in terms of neither computational cost nor recognition performance.

Modified SPMs
The F1 scores of RSPM, a partitioning strategy that considers the direction of wood growth, and its counterpart TSPM are presented in Fig. 3. RSPM further improves on the recognition performance of SPM. At partition level 2 (three sub-regions), RSPM achieves an F1 score of 0.738, the best score among all the partitioning strategies tested. In contrast, the F1 scores of TSPM decrease steadily as the partition level increases. These results indicate that the performance improvement of the conventional SPM model results from the discriminative power of radial partitioning rather than tangential partitioning. The radial direction contains spatiotemporal information about wood growth, whereas the tangential direction does not. TSPM was expected to improve the recognition of species with broad rays or radial-porous species, but it did not. Because more than half of the species in the dataset (the Fagus and Quercus species) have broad rays, the features detected from the rays are genus-specific rather than species-specific. Figure 4 shows the confusion matrices for BOF, SPM, and RSPM. The confusion matrix of BOF (Fig. 4a) was used as a benchmark to evaluate the partitioning strategies. Figure 4b and c show the differences between the matrices of SPM and BOF and between those of RSPM and BOF, respectively. The former indicates both improvement and deterioration in recognition across species (Fig. 4b). In particular, the recognition of Fagus crenata, a diffuse-porous species, and Quercus serrata, a ring-porous species, improves relatively strongly, whereas that of Castanopsis cuspidata, another ring-porous species, deteriorates. Figure 4c, in contrast, shows a noticeable improvement in the recognition of radial-porous species: performance improved for all radial-porous species in the genus Quercus except Q. acuta. Lithocarpus glaber was the most difficult species to recognize.
In all the models tested, all images of this species were misrecognized as Lithocarpus edulis or Quercus salicina. Although recognition improved slightly with the SPM model, the improvement was not enough to raise overall performance. The difficulty in recognizing L. glaber may result from its anatomical similarity to L. edulis and Q. salicina, insufficient species-specific information or image resolution, and/or the model's limited exposure to the anatomical diversity of the species owing to the relatively small number of images. In fact, L. glaber is very similar to L. edulis and Q. salicina in anatomical composition, as well as in the frequency and maximum tangential diameter of vessels [25].
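Difference matrices such as those in Fig. 4b, c can be read as "model minus BOF". The sketch below assumes each confusion matrix is first row-normalized into per-species recognition rates (rows are true species, columns are predicted species); whether the figures were normalized this way is an assumption. A positive diagonal entry means the model recognizes that species more often than BOF, and a positive off-diagonal entry means a confusion became more frequent. The counts are illustrative, not results from the paper.

```python
import numpy as np

def recognition_rates(conf):
    """Row-normalize a confusion matrix (rows = true species, columns = predicted)."""
    conf = np.asarray(conf, dtype=float)
    totals = conf.sum(axis=1, keepdims=True)
    return np.divide(conf, totals, out=np.zeros_like(conf), where=totals > 0)

def difference_matrix(conf_model, conf_bof):
    """Rate differences relative to BOF; diagonal > 0 means better recognition."""
    return recognition_rates(conf_model) - recognition_rates(conf_bof)

# Illustrative two-species counts (not reported data)
bof = [[8, 2], [3, 7]]
rspm = [[9, 1], [3, 7]]
diff = difference_matrix(rspm, bof)   # [[0.1, -0.1], [0.0, 0.0]]
```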

Multi-feature schemes
To extract additional features from the sub-images of RSPM, the AC of each codeword and the PC of the 18 regenerated codewords were calculated. The AC and PC coefficients were combined, individually or together, with the codeword histogram of RSPM at partition level 2 (R2SPM), which achieved the highest F1 score among the models trained on single-feature sets. Three multi-feature sets were thus created, and each was used to build a separate recognition model.
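A minimal sketch of how the three multi-feature vectors could be assembled for one image, following the PC and AC definitions in the preceding section. It assumes 300-codeword R2SPM histograms (3 strips × 300 = 900 values), 18-codeword counts over 10 radial sub-images (partition level 9), and lag-1 AC only; the lag choice and the zero fill for constant codewords are assumptions, and the random inputs are placeholders. The resulting dimensions are 918, 1053, and 1071.

```python
import numpy as np

def multi_feature_sets(r2spm_hist, sub_hists):
    """Concatenate the R2SPM histogram with AC and/or PC coefficients for one image.

    sub_hists: (n_subimages, n_codewords) counts of the regenerated codewords,
               ordered along the radial direction.
    """
    m = sub_hists.shape[1]
    with np.errstate(divide="ignore", invalid="ignore"):
        # PC between codeword columns; NaN from constant columns becomes 0 (assumption)
        pc = np.nan_to_num(np.corrcoef(sub_hists, rowvar=False))[np.triu_indices(m, k=1)]
    # Lag-1 AC of each codeword along the sub-images (lag choice is an assumption)
    d = sub_hists - sub_hists.mean(axis=0)
    denom = (d ** 2).sum(axis=0)
    ac = np.divide((d[1:] * d[:-1]).sum(axis=0), denom, out=np.zeros(m), where=denom > 0)
    return {
        "R2SPM+AC": np.concatenate([r2spm_hist, ac]),
        "R2SPM+PC": np.concatenate([r2spm_hist, pc]),
        "R2SPM+AC+PC": np.concatenate([r2spm_hist, ac, pc]),
    }

rng = np.random.default_rng(0)
r2spm = rng.integers(0, 50, size=900).astype(float)   # placeholder R2SPM histogram
sub = rng.integers(0, 20, size=(10, 18))              # placeholder 18-codeword counts
sets = multi_feature_sets(r2spm, sub)
```

In practice the histogram counts and the correlation coefficients lie on very different scales, so some form of feature scaling before the SVM is usually worth considering.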
In models trained on the multi-feature sets that combined AC or PC coefficients with R2SPM data, recognition performance improved slightly over that of R2SPM. These models achieved their highest F1 scores, 0.742 and 0.746, at partition levels 7 and 9, respectively (Fig. 5). Interestingly, the recognition rates of Fagus crenata, Quercus acuta, and Quercus crispula, which are diffuse-, radial-, and ring-porous species, respectively, improved with the PC-combined feature set, whereas with the AC-combined set the recognition rates of the ring-porous species Quercus serrata and Castanopsis cuspidata improved in particular. Ring-porous species differ in cell composition ratio, arrangement, and size between earlywood and latewood, because large vessels are concentrated in the earlywood. As seen in Fig. 6a, b, differences in the positions of earlywood and latewood between images lead to marked differences in the feature data produced by the spatial partitioning strategies (Fig. 6c). Interestingly, this difference is flattened out by AC (Fig. 6d). AC is not only an indicator of changes in specific anatomical elements with wood growth, but may also contribute to the recognition of ring-porous species through this property of flattening the difference between earlywood and latewood positions. The model trained on the multi-feature set combining both AC and PC with the R2SPM data achieved the best F1 score of 0.750 among all the proposed strategies.

Performance comparison
In the process of implementing the SPM-based models, feature sets were necessarily produced by the SIFT and BOF models. To evaluate the performance of the image partitioning strategies, we therefore also built recognition models trained on the SIFT and BOF data, as well as VGG16, a convolutional neural network model, and compared their recognition performance on the XDD_005 dataset (Table 3).
From the F1 scores of the strategies presented in Table 3, it can be seen that the extraction of more sophisticated features from the wood images results in better recognition performance of the model. The codeword histogram of the BOF model, which is the quantification of features based on the morphological similarity of wood cells, is more discriminating in terms of wood recognition than the SIFT descriptor, which is the sum of the morphological information of the wood cells. Furthermore, the SPMs, which add spatial information of the codewords to the histogram of BOF, and especially RSPM, which preserves spatial information in the direction of wood growth, produce even more discriminative feature sets. Multi-feature strategies combined with correlation coefficients also help to further improve the recognition performance.
The image partitioning strategies proposed in this study achieve higher F1 scores than the VGG16 model, which has been reported to perform well in both wood recognition and general image recognition. Deep learning models generally require a larger database than conventional machine learning models based on feature engineering to achieve good recognition performance [26]. Given this, the size of XDD_005 may have been insufficient for the VGG16 model to perform well.
As shown in Table 4, genus-level recognition shows a different pattern from species-level recognition. All the established models achieve substantially better genus recognition than species recognition, with F1 scores above 0.9, suggesting that species within a genus share similar anatomical characteristics. The F1 scores of SPM and RSPM do not exceed that of BOF. In other words, the image partitioning strategies help preserve species-specific spatial information, but not genus-specific spatial information.
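The text does not say whether genus-level scores came from separate genus models or from mapping species predictions to genera. The sketch below assumes the latter and shows why genus-level scores are higher: a confusion between congeneric species, such as L. glaber predicted as L. edulis, counts as correct at the genus level, whereas a confusion with Q. salicina does not. The three predictions are illustrative, not reported results.

```python
def to_genus(labels):
    """Map binomial species names to their genus (the first word of the name)."""
    return [name.split()[0] for name in labels]

def fraction_correct(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative predictions mirroring the L. glaber confusions described earlier
y_true = ["Lithocarpus glaber"] * 3
y_pred = ["Lithocarpus edulis", "Lithocarpus edulis", "Quercus salicina"]
species_acc = fraction_correct(y_true, y_pred)                     # 0.0
genus_acc = fraction_correct(to_genus(y_true), to_genus(y_pred))   # 2/3
```

The same mapping can be applied before computing a macro-F1 score instead of the simple fraction correct used here.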

Conclusion
Wood recognition models based on spatial partitioning strategies were established to identify cross-sectional micrographs of Fagaceae species. While the SPM and RSPM models achieved better recognition performance than their underlying framework, BOF, the TSPM model did not. Radial partitioning, which captures spatiotemporal information on wood growth, was particularly effective for wood recognition, and the performance improvement of SPM was mainly attributable to radial partitioning. The AC and PC coefficients calculated from the sub-images of RSPM were found to provide good auxiliary features for creating multi-feature sets. However, an appropriate tradeoff between recognition performance and computational cost must be determined, because higher levels of image partitioning and multi-feature combination increase computational complexity. Rapidly evolving CV and machine learning techniques provide a variety of tools that enable a better understanding of wood; further efforts are therefore required to interpret and use these techniques from the perspective of wood science.

Table 3 Comparison of recognition performance
SIFT, scale-invariant feature transform; nOctaveLayers, the number of layers in each octave; contrastThreshold, the contrast threshold used to filter out weak features; edgeThreshold, the threshold used to filter out edge-like features; sigma, the sigma of Gaussian applied to the image at the octave number 0; BOF, bag-of-features; k, number of codewords; SPM, spatial pyramid matching; RSPM, radial-SPM; R2SPM, RSPM with partition level 2; TSPM, tangential-SPM; AC, autocorrelation; PC, Pearson correlation coefficient; SGD, stochastic gradient descent