Detection and visualization of encoded local features as anatomical predictors in cross-sectional images of Lauraceae

This paper describes computer vision-based quantitative microscopy and its application toward better understanding species specificity. An image dataset of the Lauraceae family that consists of nine species across six genera was investigated, and structural features were quantified using encoded local features implemented in a bag-of-features framework. Of the algorithms used for feature detection, the scale-invariant feature transform (SIFT) achieved the best performance in species discrimination. In the bag-of-features framework with the SIFT features, each image is represented by a histogram of codewords. The codewords were further analyzed by mapping them to each image to visualize the corresponding anatomical elements. From this analysis, we were able to classify and quantify the modes of aggregation of different combinations of cell elements based on clustered codewords. An analysis of the term frequency–inverse document frequency weights revealed that blob-based codewords are generally shared by all species, whereas corner-based codewords are more species specific.


Introduction
The primary task in both the utilization and the study of wood is correct species identification. A variety of methods have been established, ranging from visual inspection, where the expertise of wood anatomists is concentrated, to chemometrics, X-ray computed tomography, and DNA-based techniques. Although visual inspection based on the anatomical characteristics of wood is the most accurate and reliable identification method, it faces the difficulty of training a sufficient number of highly qualified technicians because the method relies entirely on specialized expertise. The other techniques also have limitations, such as procedural complexity, limited field applicability, data sensitivity, and high cost, making them better suited to special purposes than to general use.
With the advent of machine learning and its remarkable development, various problems can be handled by automated systems. The same is true for wood identification. Any kind of data generated from the wood identification methods mentioned above can be applied to machine learning. Image data representing morphological information of wood are more suitable for machine learning than other types of data in various aspects, including applicability, reproducibility, and scalability. Computer vision-based local feature detection is a representative method for extracting morphological information from an image. A local feature consists of information about neighboring pixels in an image and represents local structures or patterns.
The use of local features enables the distinction of different types of wood cells from cross-sectional images [1]. Wood anatomists predict taxa from anatomical features such as wood fibers, axial parenchyma cells, vessels, rays and their arrangements from cross-sectional images. In contrast, computer vision detects spatial and spectral distributions of features such as blobs, corners, and edges, and returns a class. Such local features can effectively capture morphological clues for wood anatomists [1]. Taking these facts into consideration opens up the possibility of interpreting computer vision in terms of wood anatomy. However, wood is not only structurally complex, but many species have similar characteristics. Furthermore, because of the huge number of species, large-scale databases are required, even if only major species are being covered. Local features are often implemented within a bag-of-features (BOF) framework [2] to represent such complicated datasets. The BOF approach is adapted from the bag-of-words (BOW) method [3], which is a method for retrieving information from documents. In BOW, the number of words in a document is counted and a frequency histogram is generated for all words. The histogram allows us to identify the keywords of the document and retrieve it from a number of documents. BOF borrows the same concept to classify images, and text words in BOW are replaced with local features detected in images.
The basic principle of the BOF model with local features for wood recognition is to learn the properties of the detected features and the difference in the number of features of each species. If such differences are discerned, it is also possible to quantify the anatomical elements indirectly. While the information provided by this approach may differ from that produced by the currently established anatomy, the concept of pursuing species specificity in terms of features is fundamentally the same in both approaches. In recent years, some studies have reported that wood species can be recognized using BOF-based models [4,5]. However, because they used macroscopic images and focused on species classification, these approaches are difficult to interpret from the perspective of the anatomical and morphological characteristics of wood.
All experiments in this study were implemented within the BOF framework. Local features were extracted from the cross-sectional images by the scale-invariant feature transform (SIFT), speeded-up robust features (SURF), oriented FAST (features from accelerated segment test) and rotated BRIEF (binary robust independent elementary features) (ORB), or accelerated-KAZE (AKAZE) algorithms. We then selected the best local feature extraction algorithm for the Lauraceae image dataset based on their recognition performances. The quantification and species specificity of structural elements were investigated by analyzing the local features and codewords of the selected algorithm. The codewords, which are analogous to the terms and glossary of anatomical descriptions, were extensively analyzed. This paper discusses the potential for computer vision to improve, or at least corroborate, the currently established anatomy of wood.

Dataset
The Lauraceae image dataset was employed to analyze the structural elements. The dataset contains nine species from six genera of the Lauraceae family and is composed of 1019 cross-sectional optical micrographs, with each species represented by more than 20 images (Table 1). The 8-bit grayscale images have a size of 900 × 900 pixels and a pixel resolution of 2.94 μm, which is close to the theoretical resolving power of our optical system. To select the best feature detection algorithm on the basis of recognition models, the dataset was divided into training and test sets at a two-to-one ratio, as presented in Table 1, and both sets consist of images collected from independent individuals.

Feature extraction
Local features were extracted from the cross-sectional optical micrographs using the well-known SIFT [6], SURF [7], ORB [8], and AKAZE [9] algorithms. Their features are basically invariant to scale, rotation, and limited affine changes. The algorithms were implemented by OpenCV [10], and all their parameters except for those of the ORB algorithm follow OpenCV's defaults. For ORB, the default maximum number of features that can be detected from an image is 500, which is extremely small for cross-section images. We therefore employed ORB with the maximum number of features set to 15,000 and 30,000 (referred to as ORB (15,000) and ORB (30,000), respectively).

Codebook generation
In this study, we implemented a BOF framework to analyze structural elements and recognize wood species (Fig. 1). The descriptors of all the extracted features were grouped into a specified number (k) of clusters, with the centroid of each cluster extracted as a codeword, also called a visual word (Fig. 1c). The k codewords formed the codebook, which is analogous to a dictionary in text analysis. The number of codewords in the codebook is arbitrary and is determined by k, the number of clusters into which the image features are grouped.
The k-means clustering algorithm is a popular method of generating a codebook, but it can lead to memory problems because of the high computational cost involved in processing massive datasets [11,12]. Therefore, we used the mini-batch k-means algorithm as an alternative [13]. This algorithm does not use all the data at once, but instead repeatedly draws a subset of fixed size, drastically reducing the computational cost. Mini-batch k-means clustering was initialized by the k-means++ algorithm [14] with a processing batch size of 100. Cluster numbers (k) of 500 and 1000 were used to generate codebooks.
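A minimal sketch of this step, assuming scikit-learn's `MiniBatchKMeans` as the mini-batch k-means implementation (the text does not name its library) and random vectors standing in for the pooled SIFT descriptors:

```python
# Codebook generation by mini-batch k-means with k-means++ initialization
# and a batch size of 100, as described above. The descriptors are synthetic;
# k is kept small here for speed (the study used k = 500 and 1000).
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(42)
descriptors = rng.random((5000, 128)).astype(np.float32)  # pooled SIFT stand-ins

k = 100
kmeans = MiniBatchKMeans(n_clusters=k, init="k-means++", batch_size=100,
                         n_init=3, random_state=0).fit(descriptors)
codebook = kmeans.cluster_centers_   # k codewords, one per cluster centroid
```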

Image representation
Once a codebook has been completed, it can be used to represent each image. In other words, the image is represented as a histogram of the occurrence frequency of codewords in the codebook (Fig. 1d). All images have individual histograms with k bins. This is called vector quantization. The key idea of our approach is that the histograms are then used for anatomical analysis and wood recognition.
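The vector quantization step above can be sketched in plain NumPy; the codebook and descriptors here are toy stand-ins, not values from the study:

```python
# Vector quantization: assign each descriptor to its nearest codeword
# (Euclidean distance) and represent the image as a k-bin histogram.
import numpy as np

rng = np.random.default_rng(1)
k = 8
codebook = rng.random((k, 128))        # toy codebook (k codewords)
descriptors = rng.random((300, 128))   # descriptors from one image

# Distance from every descriptor to every codeword, then nearest assignment.
dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
assignments = dists.argmin(axis=1)

# The k-bin occurrence histogram that represents the image.
histogram = np.bincount(assignments, minlength=k)
```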

Data learning and wood recognition
The support vector machine (SVM) classifier with a radial basis function (RBF) kernel was used to learn the data for wood recognition. For classification, an SVM finds the hyperplane with as wide a margin as possible between the input data of different classes. Whether the model is linear or nonlinear, an SVM always separates the data with a linear boundary; nonlinear models map the data into a high-dimensional feature space in which such a hyperplane can separate classes that are difficult to distinguish linearly. The RBF kernel SVM uses a Gaussian kernel to map data into high-dimensional spaces [15]. The training set histograms were used as feature vectors and input to the SVM classifier to learn the class boundaries. To optimize the parameters, we set up a grid search with a logarithmic grid from 10^-3 to 10^3 for gamma and from 10^-8 to 10^-2 for C. Gamma is a Gaussian kernel parameter for nonlinear classification, and C is a parameter that controls the cost of misclassification on the training data.
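A hedged sketch of this setup with scikit-learn (the text does not name its SVM implementation); the histograms and species labels below are synthetic stand-ins, and only the kernel and the gamma/C grid ranges come from the text:

```python
# RBF-kernel SVM with a logarithmic grid search over gamma (10^-3..10^3)
# and C (10^-8..10^-2), mirroring the ranges in the text.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = rng.random((60, 20))               # toy codeword histograms
y = rng.integers(0, 3, size=60)        # toy species labels (3 classes)

param_grid = {"gamma": np.logspace(-3, 3, 7),
              "C": np.logspace(-8, -2, 7)}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3).fit(X, y)
best = search.best_params_             # chosen (gamma, C) pair
```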
The F1 score, the harmonic mean of precision and recall, was used as a metric to evaluate the performance of the recognition models. It is given by

F1 = 2 × (precision × recall) / (precision + recall) = 2TP / (2TP + FP + FN),

where precision = TP / (TP + FP), recall = TP / (TP + FN), TP is a test result in which the model correctly predicts a positive class, FP an error predicted as a positive class, and FN an error predicted as a negative class. Because the Lauraceae dataset has quite imbalanced classes (Table 1), it is inappropriate to evaluate the model based on recognition accuracy alone. The F1 score tells us how precise and robust the model is. Based on the F1 score, we selected the best feature extraction algorithm for the Lauraceae images and used it to analyze the structural elements. In this study, the computational cost was not a consideration in algorithm selection.
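The definition can be checked on a toy prediction (the labels below are invented for illustration); the hand-computed counts and scikit-learn's `f1_score` give the same value:

```python
# Worked example of the F1 score from TP, FP, and FN counts, cross-checked
# against scikit-learn on the same toy binary predictions.
from sklearn.metrics import f1_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 3
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 1

precision = tp / (tp + fp)   # 0.75
recall = tp / (tp + fn)      # 0.75
f1 = 2 * precision * recall / (precision + recall)

assert abs(f1 - f1_score(y_true, y_pred)) < 1e-12
```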

Codeword clustering and assignment to anatomical elements
We clustered the codewords generated by the feature extraction algorithm that showed the best discrimination based on recognition performance. For the anatomical analysis, codewords with similar descriptor characteristics were grouped by agglomerative hierarchical clustering using the Euclidean distance and Ward's method [16]. The codewords were further analyzed by mapping them onto each image to visualize the corresponding anatomical structures. The number of local features (keypoints) of the codewords contained in each species was then evaluated to indirectly quantify anatomical elements.
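A minimal sketch of this clustering step using scikit-learn's Ward-linkage agglomerative clustering (the Euclidean metric is implied by Ward's method); the codeword vectors below are random stand-ins, and the cluster count of six matches the analysis in the results:

```python
# Agglomerative hierarchical clustering of codewords with Ward's method.
# In the study, the 1000 SIFT codewords were grouped this way; here 50
# random 128-dimensional vectors stand in for the codeword descriptors.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(3)
codewords = rng.random((50, 128))      # stand-in codeword descriptors

labels = AgglomerativeClustering(n_clusters=6,
                                 linkage="ward").fit_predict(codewords)
```

For the dendrogram itself, `scipy.cluster.hierarchy.linkage` and `dendrogram` would produce the kind of tree shown in Fig. 4a.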

Feature weights
In a codebook, some codewords may be common to many species, whereas others are more species specific. Such uncommon codewords can be identified using the term frequency–inverse document frequency (TFIDF) weighting [17], which is given by

w_{i,j} = tf_{i,j} × log(N / df_j),

where w_{i,j} is the TFIDF weight, tf_{i,j} the frequency of feature j in image i, df_j the number of images containing feature j, and N the number of images. The first and second factors in the formula represent the term frequency (TF) and inverse document frequency (IDF) weights, respectively. A codeword with a high TFIDF score indicates a rare and unique feature present in a small number of species, whereas a low score denotes a feature shared by many species. The score of common features detected in all images becomes zero. The IDF and TFIDF scores respectively provide us with the rarity and species specificity of each codeword. The program for all the experiments was written in Python 3.5.2 with various external packages.
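The weighting can be illustrated on a toy image-by-codeword count matrix (the numbers are invented for illustration); note how a codeword present in every image gets an IDF, and hence a TFIDF weight, of zero:

```python
# TFIDF weighting on a toy image-by-codeword count matrix:
# w[i, j] = tf[i, j] * log(N / df[j]).
import numpy as np

tf = np.array([[4.0, 0.0, 1.0],   # rows: images
               [2.0, 3.0, 0.0],   # cols: codewords (keypoint counts)
               [5.0, 1.0, 0.0]])
N = tf.shape[0]                   # number of images
df = (tf > 0).sum(axis=0)         # images containing each codeword: [3, 2, 1]
idf = np.log(N / df)              # codeword 0 occurs everywhere -> idf = 0
w = tf * idf                      # TFIDF weight matrix
```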

Comparison of local features
SIFT detected the largest number of keypoints in various types of wood cells, including vessels, axial parenchyma cells, rays, and wood fibers (Fig. 2b). Because the SIFT algorithm is designed to detect corners and blobs as local features, cell corners and cell lumina are primarily detected as keypoints in cross-sectional images, exposing the aggregation of numerous wood cells with lumina. In fact, SIFT detected them more aggressively than the other algorithms did. A cell corner can be detected at the pixel resolution of 2.94 μm and was found to be a key feature that determines the discriminative power of SIFT [1]. SURF also detected various cells, most densely in vessel lumina and rays, but its final number of keypoints is slightly over half that of SIFT (Fig. 2c), meaning that SURF misses a significant number of wood cells that SIFT captures. ORB revealed a different aspect of the wood images than SIFT and SURF did, in that most keypoints were detected in blobs (Fig. 2d). This trend was common to both ORB (15,000) and ORB (30,000). Both ORB configurations combine the FAST keypoint detector [18] with the BRIEF descriptor [19] to improve performance. Although ORB uses FAST, an algorithm that detects corners as features, it did not detect any corners in the cross-sectional images, which are made up of numerous corners. This is because FAST certainly detects corners, but its keypoints, unlike those of the other algorithms, represent a pixel surrounded by corners rather than a pixel corresponding to a corner. AKAZE, which detected the fewest keypoints, found various types of wood cells (Fig. 2e). It detected features in both cell corners and lumina, but those in the lumina were significantly more numerous. In contrast to SIFT and SURF, which employ Gaussian filtering for feature detection, AKAZE adopts nonlinear diffusion filtering to preserve image details and remove noise. Nevertheless, the algorithm works similarly to SIFT in detecting most wood fibers, and similarly to SURF in placing many keypoints in vessel lumina.

Performance of local feature extractors on wood recognition
The F1 scores of each recognition model using the local features detected by the SIFT, SURF, ORBs, and AKAZE algorithms are presented in Table 2. For both codebook sizes, SIFT achieved the best F1 scores. The ORBs were ranked next, but their performance gaps with SIFT are quite large. Although ORB (30,000) detects twice as many keypoints as ORB (15,000), its F1 score did not surpass that of ORB (15,000). SURF produced such poor recognition performance that it would not be possible to perform further analyses based on its results.
Hwang et al. [1] reported that cell corners and cell lumina contain important information for wood classification, given that a reduction in the number of keypoints detected in the cell corners and lumina results in a decrease in classification accuracy. In a cross-section of wood, cell lumina represent the characteristics of single cell types, whereas cell corners are a more important factor for wood recognition because they contain information not only about single cells but also about aggregates of various types of cells. Therefore, the high discriminative power of SIFT seems to be closely related to its cell corner detection capability, which is superior to that of the other algorithms. In a study by Tareen and Saleem [20] comparing the image matching performance of the well-known local feature extraction algorithms SIFT, SURF, KAZE, AKAZE, ORB, and BRISK (binary robust invariant scalable keypoints), SIFT achieved the best performance when computational cost was not taken into account. The same result was reported by Karami et al. [21]. Given these results, we selected SIFT as the optimal algorithm for further wood anatomy analysis.

Recognition error
A normalized confusion matrix of the recognition model using the SIFT features is presented in Fig. 3. The confusion matrix is a table that compares the actual and predicted classes to measure the performance of the established model. As seen in Fig. 3, Laurus nobilis, Lindera glauca, Lindera umbellata, and Sassafras tzumu were recognized perfectly even though the number of images used to train the model was small. For Machilus japonica, in contrast, 91% of its test images were misrecognized. Table 3 shows the International Association of Wood Anatomists (IAWA) list of microscopic features for hardwood identification [22]. It also presents the anatomical features in cross-sections of the misrecognized species; in fact, their cross-section anatomical features are very similar. M. japonica shares most of its anatomical features with C. japonicum and M. thunbergii and even covers all the features of M. thunbergii. In comparison with C. japonicum, M. japonica has vessels with a larger tangential diameter, but because the SIFT algorithm detects features that are invariant to scale, the difference in lumen diameters or cell size is not considered an important factor for recognition. For M. thunbergii, a quarter of its test images were misrecognized as C. camphora, whereas C. camphora images were never recognized as M. thunbergii. This means that although they have similar anatomical features, C. camphora has sufficient species-specific features to distinguish it from others, but M. thunbergii does not.

Codewords decomposed into anatomical elements
The number of codewords influences the discriminative power of the local features. To determine the codebook size (k) that maximizes the discriminative power of the SIFT features, we performed threefold cross-validation on the training set with various numbers of codewords (from 100 to 1900). The minimum cross-validation error was produced at a codebook size of 1000, and similar levels were obtained at larger sizes. When the codebook size is excessively large, there is a high possibility of introducing artifacts into the histogram or overfitting, and computation becomes inefficient. In consideration of these potential problems, we therefore used a codebook with 1000 codewords for further analyses. Figure 4 shows a dendrogram of the 1000 codewords forming six clusters and a visualization of some codewords belonging to each cluster. The codewords are mainly classified into two groups, corner-based (as in clusters 1, 3, and k in Fig. 1c) and blob-based (as in clusters 2 and 4 in Fig. 1c), which are further divided into four and two sub-clusters, respectively (Fig. 4a).
Together with the visual assignment of codewords, approximately 60% of the codebook refers to corners and blobs generally present in various types of wood cells, and these form their own clusters (clusters III and VI in Fig. 4a). In the corner-based group, codewords associated with vessels, axial parenchyma cells and rays, wood fibers, and vessel boundaries fall into different clusters. The corner-based codewords in vessels are included in two clusters (clusters I and IV in Fig. 4a). The first cluster includes corner-based codewords present in rather long, smooth, and thick cell walls, which are independent of cell type but more often detected in rays, whereas the codewords in the fourth cluster are always present in vessel cell walls that connect with other types of adjacent cells. In the blob-based group, the codewords are included in two clusters, divided by the location of the SIFT keypoints in the cell lumen, namely, the center or portions near the edges, regardless of cell type. These results show that the corner-based codewords have different characteristics for anatomical elements compared with the blob-based ones, and they vary depending on the type of adjacent wood cells.

Rarity and species specificity of codewords
The 20 codewords with the highest IDF values and their anatomical elements as well as the average number of SIFT keypoints for each codeword by species are presented in Table 4. If some artifacts introduced into an image by chance or the noise generated during image processing are detected as local features, they are likely to be converted into codewords with high IDF values even though they are not informative. To avoid such problems, we excluded codewords with less than 10 keypoints in all species.
The IDF reflects how common or rare a codeword is. Of the 1000 codewords, 452 have IDF values of 0, indicating that about 45% of the codewords are common features in the Lauraceae dataset. Hence, the remaining 55% of codewords, with values greater than 0, contain species-specific information. In addition, although the number of codewords and keypoints representing corners in axial parenchyma cells, vessels, and rays is relatively small, they are ranked in the top class in terms of IDF values. The IDF reveals that blob-based codewords are generally shared by many species, whereas corner-based codewords are more species specific. The TFIDF score extends the IDF by taking the number of keypoints into consideration when predicting feature importance. Although codewords 751 and 820, which have the two highest IDF values, both represent corners in axial parenchyma cells, the former is abundant in C. camphora and the latter in S. tzumu (Table 4). In addition, codeword 923, which represents the same element, is dominant in L. glauca. For vessels, codewords 313 and 850 are abundant in L. glauca and S. tzumu, respectively. Rays are also species dependent (codewords 682, 225, and 890), and L. nobilis in particular has many keypoints in the ray-related codewords. L. glauca rarely has keypoints in codeword 635, which is abundant in most species. These results indicate that even within the same type of wood cells, codewords represent different local features and some of them are species dependent. Thus, the TFIDF score, which considers both the IDF and the number of features simultaneously, is a useful tool for determining differences in the anatomical elements of wood.

Species-specific features
The codewords with the top five TFIDF scores of the four species that were correctly recognized among all the species investigated are mapped and visualized on the original optical micrographs of the corresponding species (Fig. 5). In the image of L. glauca (Fig. 5a), the keypoints are distributed in the rays, axial parenchyma cells, and vessels, especially in their cell lumina. In contrast, in the images of the other three species, the keypoints are mainly distributed in corners in vessels (Fig. 5b, d) and rays (Fig. 5c). Such anatomical elements can be considered species-specific features, which are not shared by all species. In contrast, M. japonica, the most misrecognized species, is the only species that does not have a species-specific wood cell. This species has relatively high TFIDF scores in the corner- and blob-based features that are common to various wood cells.
For S. tzumu, which is the only ring-porous species, large vessels in earlywood have high scores (Fig. 5d). Although the SIFT feature is scale invariant, the large vessels obtain high TFIDF scores because their formation leads to morphological deformation of the adjacent cells, resulting in local features that are differentiated from the others (Fig. 6a, b). SIFT ignores individual cell type and size but detects the various modes of aggregation of different cell elements.
Because the SIFT descriptors represent the gradient orientations of a local area in the image, the shape and/or arrangement of wood cells is an important factor in determining species-specific features. Figure 6c, d illustrates different types of wood fibers in C. camphora. In the Lauraceae dataset, rounded wood cells are a feature common to all species, whereas polygonal cells are likely to be species specific, with high TFIDF scores. Figure 6e, f presents the solitary vessels of M. thunbergii and C. camphora, respectively. According to the TFIDF score, C. camphora has angular corners in vessels and axial parenchyma cells as species-specific features. Recalling the relationship between this species and M. thunbergii in wood recognition, M. thunbergii had a high probability of being misrecognized as C. camphora, whereas the reverse did not occur, even though the two species have very similar anatomical features. The angular outline of the solitary vessels of C. camphora is the only anatomical difference in the cross-section, based on the IAWA feature codes, that distinguishes C. camphora from M. thunbergii (Table 3), and the proposed method suggests that it is species specific. Similar to the large vessels of S. tzumu, the solitary vessels with angular outlines in C. camphora influence the formation of vasicentric axial parenchyma cells, resulting in unique morphological characteristics.
In the BOF framework, codewords as anatomical predictors allowed us to quantify the aggregation of different combinations of wood cells, and the TFIDF score provided us with information on species-specific features. With a codebook and local features, we have moved one step closer toward understanding how the features used by computer vision for recognition relate to wood anatomy.

Conclusion
A BOF model based on local features was established to investigate the Lauraceae image dataset. Of the well-known local feature extraction algorithms implemented, the SIFT algorithm was selected as the best for the dataset and was used for distinguishing anatomical elements. The resulting codebook allowed us to approach computer vision from the perspective of wood anatomy. Within the same anatomical elements, the local features varied with the type of adjacent cells, and according to the TFIDF weights, these are important features for recognition in some species. The local features encoded as codewords are promising anatomical predictors that were able to indirectly assess the anatomical characteristics of Lauraceae. Further efforts to remove the gap between human and computer vision, as well as to reduce the gap between informatics and wood anatomy, will be the focus of subsequent studies.