Technical Note

North American Hardwoods Identification Using Machine-Learning

by Dercilio Junior Verly Lopes 1,*, Greg W. Burgreen 2 and Edward D. Entsminger 1

1 Department of Sustainable Bioproducts/Forest and Wildlife Research Center (FWRC), Mississippi State University, Starkville, MS 39762-9820, USA
2 CAVS, Mississippi State University, Starkville, MS 39759, USA
* Author to whom correspondence should be addressed.
Submission received: 11 February 2020 / Revised: 4 March 2020 / Accepted: 5 March 2020 / Published: 7 March 2020
(This article belongs to the Section Wood Science and Forest Products)

Abstract

This technical note determines the feasibility of using an InceptionV4_ResNetV2 convolutional neural network (CNN) to correctly identify hardwood species from macroscopic images. The method uses a commodity smartphone fitted with a 14× macro lens for photography. The end-grains of ten different North American hardwood species were photographed to create a dataset of 1869 images. The stratified 5-fold cross-validation machine-learning method was used, in which the number of testing samples varied from 341 to 342 per fold. Data augmentation was performed on-the-fly for each training set by rotating, zooming, and flipping images. It was found that the CNN could correctly identify hardwood species based on macroscopic images of their end-grain with an adjusted accuracy of 92.60%. Given the rapid growth of the machine-learning field, this model can be readily deployed in a mobile application for field wood identification.

1. Introduction

Currently, wood identification is performed by human lumber graders, enthusiasts, professionals, and experts in the field using hand lenses, keys, and wood species atlases or field guides and manuals. Thus, the accuracy of such wood identification critically depends on the observer's expertise and experience in correctly recognizing esoteric species features, which depend on wood soundness (in the case of decayed specimens, for example), and in covering the full range of anatomical variability within a species [1]. Being able to correctly identify wood is an issue of major international importance due to illegal logging, misrepresentation, and mislabeling, and because the public needs to know which species they are buying for their own applications, since each unique species has its own performance characteristics [2,3].
One application area for which fast and accurate wood identification is important is illegal logging. Wiedenhoeft et al. [3] indicated that more than 60% of the 73 types of wood products tested, including furniture, musical instruments, sporting equipment, kitchen implements, etc., had some evidence of fraud or mislabeling. In more than half of the samples, the wood was labeled with the wrong species name, whereas about 20% of the samples had the wrong product type label. They also surveyed respondents regarding laboratory capacity for wood identification and found that 15 of the 23 respondents reported having limited capacity to conduct wood identification, and 13 reported in detail about their identification capacity. These 13 laboratories identify over 830 specimens per year at an average identification fee of $65.00/sample (United States currency), for a total investment in wood identification of approximately $54,000.00 per year. According to [4], considering only sawn wood, the United States exported $3.8 billion in 2018, which implies an identification capacity covering only approximately 0.001% of the exported wood. For domestic wood identification, the accuracy was near 90% for the laboratories surveyed. Such high accuracy, if made available to law enforcement officers screening products at the border, should help significantly minimize illegal activity.
In a broader example, wood maintains a 93% share of the market for crossties installed in North America. This corresponds to 3249 wood crossties per mile, with a total of 650,000,000 crossties over 200,000 miles. The softwood and hardwood species used for crossties are catalpas (Catalpa spp. Scop.), cherries (Prunus spp. L.), cottonwoods (Populus spp. L.), Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco), hackberries (Celtis spp. L.), hemlock (Tsuga spp. Carrière), larches (Larix spp. Mill.), mulberries (Morus spp. L.), oaks (Quercus spp. L.), pines (Pinus spp. L.), redwoods (Sequoia sempervirens (Lamb. ex D. Don) Endl.), sassafras (Sassafras albidum (Nutt.) Nees), spruces (Picea spp. A. Dietr.), true firs (Abies spp. Mill.), walnuts (Juglans spp. L.), yellow poplar (Liriodendron tulipifera L.), and many others [5]. Each species has its own heartwood treatability characteristics that critically influence performance [5]. This type of industry will benefit from accurate wood identification for economic and durability reasons, as it is important to extend the service life of crossties.
In another application area, wood identification using fast and user-friendly approaches can help to develop a stronger and more competitive wood market. A prominent market is engineered wood mats. Mats protect environmentally sensitive soils and provide access and transport of heavy machinery in the field for construction of powerlines, bridges, roads, and drilling platforms. They are generally constructed with a wide variety of commercial hardwood timbers such as ash (Fraxinus spp. L.), beech (Fagus spp. L.), hickory (Carya spp. Nutt.), magnolia (Magnolia spp. L.), oak (Quercus spp. L.), pecan (Carya illinoinensis (Wangenh.) K. Koch), and sweetgum (Liquidambar styraciflua L.). Since the mechanical properties of wood change from species to species, correct wood identification is crucial for adequate decision-making.
Over the last few years, artificial intelligence has provided technological breakthroughs in several fields by classifying, detecting, and segmenting images and videos [6,7]. For instance, [8,9] developed a system composed of a macro lens fitted to a commodity smartphone to magnify the end-grain of Malaysian wood species. The images were processed and trained using convolutional neural networks (CNN), a complex numerical analysis used to assign predictive parameters to various aspects (shapes and patterns) of an image. Their CNN classified 100 trade wood species with top-1 and top-2 accuracies (i.e., the species is correctly identified in the top one or two predictions) of 77.52% and 87.29%, respectively [8]. However, deep learning models are specific, i.e., classifications can be performed only for species and attributes on which the model was trained. Consequently, their CNN has minimal applicability to North American hardwood or even softwood species. In addition, CNNs are not able to recognize non-visual characteristics such as species odor, unless such information can be digitized for training purposes.
The main goal of this work was to demonstrate how a smartphone and macro lens can be used to develop powerful wood identification methods. This technical note, to the authors' knowledge, represents the first documented approach to macroscopically classify a wide variety of North American hardwood species using a CNN methodology. This computer vision system can be enhanced and developed as a mobile application to ameliorate illegal logging and/or misrepresentation, increase wood identification capacity, and advance the forest products industry in general.

2. Materials and Methods

2.1. Generation of Wood Sample Database

Hardwood species were chosen from the Department of Sustainable Bioproducts/Forest and Wildlife Research Center (FWRC) xylarium collection at Mississippi State University, namely, American elm (Ulmus americana L.), black locust (Robinia pseudoacacia L.), hackberry (Celtis occidentalis L.), honey locust (Gleditsia triacanthos L.), Osage orange (Maclura pomifera (Raf.) C. K. Schneid.), red mulberry (Morus rubra L.), red oak (Quercus rubra L.), sassafras (Sassafras albidum (Nutt.) Nees), white ash (Fraxinus americana L.), and white oak (Quercus alba L.). These species were chosen due to their availability and the large number of available pieces. Samples were identified by three experts, one of whom has more than 25 years of experience in the field.
We generated an initial dataset, which consisted of 1869 unique images from the 10 hardwood species. Figure 1 shows examples of the macroscopic end-grain samples used in the image dataset. As discussed later, this small dataset was augmented since a training set of approximately 150 unique images per class is too small to develop reliable CNN models.

2.2. Image Acquisition Setup and Dataset Processing

Preparation of the wood surface was needed to properly locate and identify cell types that were useful for identification. To this end, several thin and clean cuts with a razor blade were made across the transverse end surface to expose a sufficiently visible area for photography. End-grain sanding was not applied in order to better simulate field applications. Image acquisition was facilitated by an inexpensive 14× macro lens attached to a commodity smartphone with a 12-megapixel rear-facing camera with an aperture of f/1.8. The semi-transparent protective cap of the macro lens positioned the camera approximately 3 cm from the samples, which improved photo-taking stability (Figure 2). As the pieces of the available species varied in size, positioning of the lens was carefully considered to avoid any possible overlap. Photos were taken under natural illumination; a camera flash was not used. Images were 3024 pixels × 3024 pixels with a resolution of 72 dpi and 24-bit depth at ISO-40.
For effective training and validation of deep learning models, a large number of images is necessary [10]. The initial image dataset was augmented to generate additional synthesized images produced by zooming, rotation, and flipping. Only the training set was augmented. To improve the trustworthiness of our results, we applied stratified k-fold cross-validation: the image dataset was randomly split into 5 (k = 5) folds of mutually exclusive and shuffled subsets of proportional size. The model was then trained and tested on each of the k folds of data. To avoid overfitting, data augmentation was performed on-the-fly after splitting the entire dataset so that redundancy was minimized.
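The exact split-and-augment pipeline is not listed in this note; the sketch below shows one way the stratified 5-fold split with on-the-fly training-set augmentation could be assembled with scikit-learn and the Keras preprocessing API. The arrays X and y (images and integer labels), the random seed, and the specific augmentation ranges are illustrative assumptions rather than the exact configuration used here.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.utils import to_categorical

# X: (n_images, 299, 299, 3) array of resized end-grain photos; y: integer species labels 0-9
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
y_onehot = to_categorical(y, num_classes=10)

# Augmentation (rotation, zoom, flips) is generated on-the-fly and applied only to training images
train_gen = ImageDataGenerator(rotation_range=90, zoom_range=0.2,
                               horizontal_flip=True, vertical_flip=True,
                               rescale=1.0 / 255)
val_gen = ImageDataGenerator(rescale=1.0 / 255)  # validation images are left unmodified

folds = []
for train_idx, val_idx in skf.split(X, y):
    train_flow = train_gen.flow(X[train_idx], y_onehot[train_idx], batch_size=8)
    val_flow = val_gen.flow(X[val_idx], y_onehot[val_idx], batch_size=8, shuffle=False)
    folds.append((train_flow, val_flow))  # each fold is trained and evaluated separately (Section 2.3)
```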

2.3. Convolutional Neural Network Architecture

The InceptionV4_ResNetV2 convolutional neural network (CNN) model was used in this study to classify 10 North American hardwood species and was based on the work of [11]. This CNN model was chosen for its image classification performance on the ImageNet dataset (0.953 top-5 accuracy over 1000 classes) and for its smaller number of trainable parameters (3 times fewer) compared to prior research that used the VGG16 architecture [12]. High definition images of the wood samples were resized to 299 × 299 × 3 (width × height × color channels), which is the default input size for this CNN. The CNN was implemented using TensorFlow 1.14 [13] and Keras [14]. We performed training from scratch only, with a final 10-way softmax function corresponding to the 10 possible wood species in the dataset. The RMSprop stochastic gradient descent optimizer was used with an initial learning rate of 0.001 and the decay rate developed by [11]. The CNN was trained using a categorical cross-entropy loss function. We balanced the dataset classes through the class_weight functionality of Keras. The CNN was run on an Nvidia RTX 2070 graphics processing unit with a batch size of 8. We evaluated CNN performance with several metrics: the mean weighted F1-score, precision, recall, adjusted accuracy, averaged confusion matrices, and precision–recall curves with F1 isometric curves for the validation set.
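As a hedged illustration of the setup described above, the sketch below assembles the network with the tf.keras API (TF 2.x naming), in which the combined architecture ships as InceptionResNetV2. The epoch budget and the class-weight computation via scikit-learn are assumptions standing in for the Keras class_weight functionality, and train_flow/val_flow are the per-fold generators from the previous sketch.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight
from tensorflow.keras.applications import InceptionResNetV2
from tensorflow.keras.optimizers import RMSprop

NUM_SPECIES = 10

# Train from scratch (no pre-trained weights) with a 10-way softmax head
model = InceptionResNetV2(weights=None, input_shape=(299, 299, 3), classes=NUM_SPECIES)

model.compile(optimizer=RMSprop(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Balance the imbalanced species classes (mirrors the class_weight mechanism of Keras)
weights = compute_class_weight(class_weight="balanced", classes=np.unique(y), y=y)
class_weights = dict(enumerate(weights))

model.fit(train_flow, validation_data=val_flow,
          epochs=90,                     # assumed budget; the longest fold here trained for 89 epochs
          class_weight=class_weights)
```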

3. Results and Discussion

The modern InceptionV4_ResNetV2 CNN architecture was used to macroscopically classify ten hardwood species. The validation set varied from 341 to 342 images. The stratified cross-validation split the dataset such that each species was represented in the same proportion in every validation fold. The longest fold took 89 epochs to train. The same decay rate was used for all folds; the learning rate decay is displayed in Figure 3.
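The numerical decay schedule is not spelled out in this note beyond following [11]; the sketch below assumes the exponential schedule described in [11] (a factor of 0.94 every two epochs) applied to the 0.001 initial rate of Section 2.3, implemented as a Keras callback so that every fold follows the same decay curve.

```python
import tensorflow as tf

def lr_schedule(epoch, lr):
    # Assumed constants: exponential decay by 0.94 every 2 epochs, as described in [11]
    initial_lr, decay_rate, decay_epochs = 0.001, 0.94, 2
    return initial_lr * decay_rate ** (epoch // decay_epochs)

# Reusing the same callback for all folds keeps the decay identical across folds (Figure 3)
lr_callback = tf.keras.callbacks.LearningRateScheduler(lr_schedule, verbose=1)
# model.fit(..., callbacks=[lr_callback])
```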
When developing deep learning models, overfitting can be a crippling problem. Overfitting manifests as a validation loss that follows a U-shape (first decreasing, then increasing) as training progresses, such that the difference between training loss and validation loss grows. Figure 4 shows the averaged training and validation accuracies and the training and validation losses for the five folds.
Even though spikes in the validation loss were observed as training progressed, the variance decreased as the learning rate decreased. No overfitting was observed, for two reasons: (1) overfitting follows a U-shape, which was not the case here, and (2) the difference between training and validation loss decreased over the epochs. These behaviors demonstrated the stability of the model. The CNN training stage was considered converged when the accuracy approached 1.0. This finding supports that InceptionV4_ResNetV2 can correctly classify ten North American hardwood species from macroscopic images. In that case, this model could be considered for use in a mobile application.
Table 1 shows the averaged model evaluation metrics per species. The overall adjusted accuracy for the entire model was 92.60% on the imbalanced, unseen validation set. This result translates into 318 correct wood species identifications out of a possible 343, which validates the capability of this model to identify truly unseen data with high accuracy.
Precision and recall metrics address different questions about model performance. These metrics are crucial to confirm that imbalanced datasets have been properly modeled. For our study, there were twice as many observations of some species as of others (e.g., Red Mulberry and Sassafras). Precision is the ratio of true positives to the sum of true positives and false positives; high precision indicates a low false positive rate. Recall is the ratio of true positives to the sum of true positives and false negatives; it indicates the percentage of ground truths that were correctly predicted. In the case of imbalanced datasets, the F1-score is another useful metric, defined as the harmonic mean of precision and recall. Figure 5 shows the precision–recall curves (PRC) with isometric F1 curves for this study.
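For reference, per-species precision, recall, and F1-scores of the kind reported in Table 1 can be reproduced with scikit-learn; the arrays y_true (ground-truth labels) and y_pred (argmax of the CNN softmax output) for a validation fold are assumed to be available.

```python
from sklearn.metrics import classification_report, f1_score

species = ["American elm", "Black locust", "Hackberry", "Honey locust",
           "Osage orange", "Red mulberry", "Red oak", "Sassafras",
           "White ash", "White oak"]

# Per-species precision, recall, F1-score, and support (as in Table 1)
print(classification_report(y_true, y_pred, target_names=species, digits=2))

# Weighted averaging accounts for the class imbalance discussed above
print("Weighted F1-score:", f1_score(y_true, y_pred, average="weighted"))
```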
The PRC is important for characterizing model performance when moderate class skewness is present in the dataset. We avoided using the receiver operating characteristic (ROC) curve, as recommended by [15], because it does not always translate into a realistic PRC when the classes are imbalanced. The PRC provides an accurate prediction of future classification performance because only the fraction of true positives among all positive predictions is evaluated, i.e., the PRC avoids true negatives. In our case, all folds showed robust performance, with a desirable clustering of curves toward the maximum precision and recall values. The overall area under the curve for all species was 0.98 (in comparison, pure random guessing would result in an area of 0.50). The averaged confusion matrix is displayed in Figure 6.
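A one-vs-rest sketch of how per-species PRCs and their areas could be computed is shown below. It reuses the species list and validation arrays from the previous sketch, uses average precision as the area estimate, and treats the micro-average over all species as the "overall" area; these are assumptions about the exact averaging used here.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score
from sklearn.preprocessing import label_binarize

# y_score: (n_samples, 10) softmax probabilities for a validation fold; y_true: integer labels
y_true_bin = label_binarize(y_true, classes=np.arange(10))

# One precision-recall curve per species (one-vs-rest)
for k, name in enumerate(species):
    precision, recall, _ = precision_recall_curve(y_true_bin[:, k], y_score[:, k])
    ap = average_precision_score(y_true_bin[:, k], y_score[:, k])
    print(f"{name}: area under PRC = {ap:.2f}")

# Micro-averaged area across all species
print("Overall area under PRC:", average_precision_score(y_true_bin, y_score, average="micro"))
```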
The quality of our predictions can be verified by examining the confusion matrix, which shows model estimation quality. An ideal classifier would produce values of 1.0 on the diagonal of a confusion matrix, meaning that all species had been correctly predicted by the model. Figure 6 shows a very low degree of confusion in our model. This is likely due to using stratified k-fold cross-validation on this limited dataset, which helps train an unbiased model by ensuring that every fold tested different randomized data. However, our model did struggle to classify Hackberry: it misclassified 6 out of 35 images, confusing them with Sassafras. We believe that as more data are gathered and processed, this particular confusion can be ameliorated. In fact, the next step for this research is to collect more wood sample data, which is needed to increase the reliability of our model. As we increase the portfolio of species, we foresee future challenges related to image and model accuracy. To tackle these issues, we plan to deploy wood identification on mobile devices, not relying on remote data centers, and to continuously update our increasingly comprehensive machine-learning models to sustain high accuracy.
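A minimal sketch of how an averaged confusion matrix like that of Figure 6 could be assembled is given below, assuming the per-fold ground-truth and predicted labels have been collected; row-normalizing each fold's matrix before averaging is a design choice so the diagonal reads as per-species recall.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# fold_predictions: list of (y_true, y_pred) pairs, one per cross-validation fold (assumed collected)
matrices = []
for y_true_fold, y_pred_fold in fold_predictions:
    cm = confusion_matrix(y_true_fold, y_pred_fold, labels=np.arange(10)).astype(float)
    cm /= cm.sum(axis=1, keepdims=True)  # rows sum to 1, so the diagonal is per-species recall
    matrices.append(cm)

avg_cm = np.mean(matrices, axis=0)  # averaged confusion matrix across the five folds
```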

4. Conclusions

In this study, we showed the feasibility of the InceptionV4_ResNetV2 convolutional neural network for classifying ten North American hardwood species with 92.60% accuracy and an area under the precision–recall curve of 0.98. We envision our highly accurate model being utilized to combat illegal logging and/or misrepresentation.
With a proven deep learning model, our future efforts will focus on developing a mobile application. Artificial intelligence applications are typically trained on workstations and powerful laptops that are not friendly to harsh environmental or weather conditions. More recently, training and inference have been performed in cloud data centers. However, online processing of images via data centers requires internet availability, and data privacy is not guaranteed. Our vision of having wood identification applications on mobile devices avoids these issues. Mobile-first AI applications are a rapidly growing field that allows machine-learning apps to see, hear, sense, and think in real time. We plan to directly process images on mobile devices using TensorFlow Lite. End-users will need to follow only a few simple steps to become productive: loading the application on a smartphone, attaching an inexpensive macro lens, preparing the wood surface with clean cuts from a razor blade, and finally snapping an image for wood identification using our app. This wood identification workflow is fast and user-friendly, such that any person with minimal training could perform highly accurate wood identification. Mobile device-based machine-learning will help put wood identification capabilities directly into the hands of field technicians, sawmill operators, the matting industry, wood anatomists, border control officers, and the forest products industry as a whole.
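As a sketch of the planned on-device path (not part of the present study), a trained Keras model can be converted to the TensorFlow Lite format for mobile inference using the TF 2.x API; the output file name and the optional quantization flag below are illustrative assumptions.

```python
import tensorflow as tf

# "model" is the trained Keras network from Section 2.3
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional post-training quantization for a smaller file
tflite_model = converter.convert()

# Write the flat buffer to disk; the .tflite file is then bundled with the mobile app
with open("wood_id.tflite", "wb") as f:
    f.write(tflite_model)
```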

Author Contributions

Conceptualization, D.J.V.L., G.W.B., and E.D.E.; methodology, D.J.V.L., G.W.B., and E.D.E.; validation, writing of the draft, and review editing, D.J.V.L., G.W.B., and E.D.E.; formal analysis, D.J.V.L. and G.W.B.; investigation, D.J.V.L.; data curation, D.J.V.L. and G.W.B.; visualization, D.J.V.L., G.W.B., and E.D.E. All authors have read and agreed to the published version of the manuscript.

Funding

The authors wish to acknowledge the support of U.S. Department of Agriculture (USDA), Research, Education, and Economics (REE), Agriculture Research Service (ARS), Administrative and Financial Management (AFM), Financial Management and Accounting Division (FMAD), Grants and Agreements Management Branch (GAMB), under Agreement No. 58-0204-9-164. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the U.S. Department of Agriculture. This publication is a contribution of the Forest and Wildlife Research Center (FWRC) at Mississippi State University. This manuscript is under SBP 975 designation.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wheeler, E.A.; Baas, P. Wood identification—A review. IAWA J. 1998, 19, 241–264. [Google Scholar] [CrossRef]
  2. Shmulsky, R.; Jones, P.D. Forest Products and Wood Science: An Introduction, 7th ed.; John Wiley & Sons: Hoboken, NJ, USA, 2019; pp. 447–470. [Google Scholar]
  3. Wiedenhoeft, A.C.; Simeone, J.; Smith, A.; Parker-Forney, M.; Soares, R.; Fishman, A. Fraud and misrepresentation in retail forest products exceed U.S. forensic wood science capacity. PLoS ONE 2019, 14, e0219917. [Google Scholar] [CrossRef] [PubMed]
  4. Food and Agriculture Organization of the United Nations—FAO. Forestry Production and Trade. Available online: http://www.fao.org/faostat/en/#data/FO (accessed on 20 January 2020).
  5. Webb, D.A.; Webb, G.V. The Tie Guide: Handbook for Commercial Timbers Used by the Railroad Industry; The Railway Tie Association: Fayetteville, GA, USA, 2016; pp. 1–100. Available online: https://www.rta.org/assets/docs/TieGuide/2016_tie%20guide%20for%20web.pdf (accessed on 4 March 2020).
  6. Seetha, J.; Raja, S.S. Brain tumor classification using convolutional neural networks. Biomed. Pharm. J. 2018, 11, 1457–1461. [Google Scholar] [CrossRef]
  7. Wallelign, S.; Polceanu, M.; Buche, C. Soybean plant disease identification using convolutional neural network. In Proceedings of the Thirty-First International Florida Artificial Intelligence Research Society Conference (FLAIRS-31), Sarasota, FL, USA, 19–22 May 2019. [Google Scholar]
  8. Tang, X.J.; Tay, Y.H.; Siam, N.A.; Lim, S.C. A rapid and robust automated macroscopic wood identification system using smartphone with macro-lens. arXiv 2017, arXiv:1709.08154. [Google Scholar]
  9. Tang, X.J.; Tay, Y.H.; Siam, N.A.; Lim, S.C. My Wood-ID: Automated Macroscopic Wood Identification System Using Smartphone and Macro-Lens. In Proceedings of the 2018 International Conference on Computational Intelligence and Intelligent Systems, Phuket Island, Thailand, 17–18 November 2018; pp. 37–43. [Google Scholar] [CrossRef]
  10. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS’12), Lake Tahoe, NV, USA, 3–8 December 2012; pp. 1097–1105. Available online: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf. (accessed on 4 March 2020).
  11. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A.A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv 2016, arXiv:1602.07261. [Google Scholar]
  12. Ravindran, P.; Costa, A.; Soares, R.; Wiedenhoeft, A.C. Classification of CITES-listed and other neotropical Meliaceae wood images using convolutional neural networks. Plant Methods 2018, 14, 1–10. [Google Scholar]
  13. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv 2016, arXiv:1603.04467. [Google Scholar]
  14. Chollet, F.; Keras. GitHub. 2015. Available online: https://github.com/fchollet/keras (accessed on 11 January 2020).
  15. Saito, T.; Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 2015, 10, e0118432. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. Examples of cross sections of American elm (a), black locust (b), hackberry (c), and honey locust (d) that were used in the image dataset for the machine-learning model.
Figure 2. Image acquisition setup.
Figure 3. Learning rate decay during training.
Figure 4. Averaged training and validation parameters for five folds. Val Loss: Validation Loss; Acc.: Accuracy; Val Acc.: Validation Accuracy.
Figure 5. Precision–recall curves (PRC) for different isometric F1 curves for (a) Fold 1; (b) Fold 2; (c) Fold 3; (d) Fold 4; and (e) Fold 5.
Figure 6. Averaged confusion matrix of the ten North American hardwood species.
Table 1. Averaged model evaluation metrics per species.
Species         Precision   Recall   F1-Score   Support
American elm    0.93        1.00     0.96       29
Black locust    0.94        0.97     0.95       27
Hackberry       0.99        0.82     0.89       35
Honey locust    0.99        0.88     0.92       27
Osage orange    0.97        0.94     0.96       36
Red mulberry    0.98        0.94     0.96       50
Red oak         0.86        0.93     0.90       31
Sassafras       0.77        0.99     0.86       25
White ash       0.99        0.87     0.92       46
White oak       0.92        0.97     0.94       37
