 Original Article
 Open Access
Sensitivity of censored data analysis to determine the characteristic value of structural timber
Journal of Wood Science volume 66, Article number: 39 (2020)
Abstract
In structural timber tests, unintended failure mechanisms frequently occur in specimens, and the results from such specimens are called censored data. Two methods are available for censored data analysis: censored maximum likelihood estimation (CMLE) and the Kaplan–Meier (KM) method. In this study, the precision of censored data analysis was investigated for determining the characteristic value (5th percentile value) of structural timber. The results show that (1) the 5th percentile value was underestimated by the ordinary data analysis methods, maximum likelihood estimation (MLE) and Order statistics; (2) CMLE with 30% lower tail censored data and the KM method provided much more precise 5th percentile values; (3) the amount of undermeasurement (5 MPa, 10 MPa, and 15 MPa in this simulation study) did not show a significant effect on the 5th percentile determination by CMLE and the KM method, but the proportion of censored data (percentage of unintended failure specimens; 10%, 20%, 30%, and 40%) affected the determination of the 5th percentile value; and (4) CMLE with 30% lower tail censored data and the KM method showed good agreement when the data included up to 20% unintended failure data.
Introduction
During full-scale timber tests, a particular failure mode is intended. In many cases, however, some specimens fail in a mode different from the intended one. For example, in tension tests of structural timber, grip failures (unintended failures) have been observed [1]. The tension grip can damage the timber during the test, and this damage can lead to grip failure. In particular, when a specimen has knots around the grip, failure in the tension test often occurs around the grip. In this case, the measured strength may be lower than the actual strength that the test is meant to obtain. As a result, grip-failure specimens may lead to underestimation of the 5th percentile value. Grip failure means that the tension test was censored before the intended failure was reached.
The tensile strength of a specimen that failed at the grip cannot simply be removed from the statistical analysis, because removal changes the sample size and the cumulative probability. If such data are removed, a bias can be introduced into the 5th percentile value determination. Alternatively, the censored data from grip failures can be regarded as correct test results. This may lead to underestimation of the characteristic value. Such underestimation is acceptable in terms of conservatism, but it does not give the precise characteristic value.
There are two statistical methods for analyzing samples with censored data. First, several researchers [2,3,4,5] have considered censored maximum likelihood estimation (CMLE) in wood research. CMLE is a parametric estimation method in which the parameters are derived by fitting a certain type of distribution. Yeh and Williamson [2] derived the glulam shear strength from shear failure data and from all data (including censored data). The CMLE provided higher characteristic values than maximum likelihood estimation (MLE) with shear-failure-only data. Pang et al. [3] evaluated the rolling shear resistance of hybrid cross-laminated timber (CLT) using a four-point bending test setup. They reported that some specimens showed bending failure, which was not the intended failure mode. Li [4] also reported that either rolling shear or tensile failure occurred in the same CLT bending test. Köhler and Faber [5] applied the CMLE to quality control procedures for timber grading. They showed that the CMLE refined the strength distribution in the lower tail. Steiger and Köhler [6] applied the CMLE to derive the characteristic value of axially loaded steel rods in glulam (glued-in rods). They investigated the effects of sample size and sample variation by analyzing Monte Carlo-generated samples. They reported that for small samples or samples with a coefficient of variation (COV) higher than 15%, the CMLE method can be a useful tool, because it uses all the information derived from the experimental test.
The second statistical method for analyzing samples with censored data is the Kaplan–Meier (KM) method. Kaplan and Meier [7] introduced it as a nonparametric estimation method for censored data analysis. It is the most frequently used method for estimating the survival function from lifetime data in medical research [8, 9]. The KM method is also used in nonmedical research, such as calculating the time-to-failure of machines or measuring the unemployment period of people [10]. Chastain et al. [11] applied the KM method to characterize the probability of strand thickness in oriented strand board (OSB). Link and DeGroot [12] used the KM method to investigate the lifetimes of wood stakes and the effectiveness of wood preservatives.
As mentioned above, several researchers have used the CMLE and KM methods. However, before censored data analysis is used in structural wood evaluation, the accuracy of the CMLE and KM methods should be investigated. In particular, detailed information on how to apply the approach is lacking: in which cases censored data analysis can be used in structural tests of wood, and how much censored data can be included. Therefore, in this study, the sensitivity of censored data analysis was investigated based on datasets simulating tensile strength. Specifically, the aims of this study were (1) to find the most precise analysis method for determining the 5th percentile value of structural timber, and (2) to analyze the characteristics of each method depending on the proportion of unintended failure specimens or the amount of undermeasured strength caused by unintended failure mechanisms in timber strength tests.
Materials and methods
Dataset preparation and simulating tensile strength
To investigate the precision of censored data analysis, an ideal dataset of 100 tensile strength data was generated by Monte Carlo simulation based on the previous longitudinal tensile strength test results of Zhu et al. [13], and it was assumed that there were no unintended failure specimens. The distribution was derived by testing Japanese larch (Larix kaempferi) specimens of 5 mm thickness (tangential direction) × 25 mm width (radial direction) × 120 mm length (longitudinal direction) and was fitted by a two-parameter (2P) Weibull distribution. The shape and scale parameters of the 2P Weibull distribution were 4.22 and 114.0, respectively [14]. The 100 tensile strength data were generated by the inverse transform method [15]. Equation (1) shows how the tensile strength was generated; it was evaluated 100 times to generate 100 tensile strengths, and the whole procedure was repeated to simulate 10 ideal datasets. The probability (p) was randomly selected from 0 to 1 using the random function in Excel [16]:
where \(x\) is tensile strength (MPa); \(\alpha\) is shape parameter; \(\beta\) is scale parameter; \(p\) is a random variable ranging from 0 to 1.
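The inverse transform step can be sketched in a few lines of Python. This is a minimal sketch that assumes the standard 2P Weibull cumulative distribution \(F(x) = 1 - \exp [ - (x/\beta )^{\alpha } ]\), whose inverse is \(x = \beta [ - \ln (1 - p)]^{1/\alpha }\). The study itself used Excel; the function names `weibull_inverse_cdf` and `generate_ideal_dataset` and the `seed` argument are hypothetical and only serve the illustration.

```python
import math
import random


def weibull_inverse_cdf(p, shape, scale):
    """Inverse CDF of the 2P Weibull: x = scale * (-ln(1 - p)) ** (1 / shape)."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def generate_ideal_dataset(n=100, shape=4.22, scale=114.0, seed=None):
    """Draw n tensile strengths (MPa) by inverse transform sampling.

    The default shape and scale are the values quoted in the text; the
    seed is only there to make the sketch reproducible.
    """
    rng = random.Random(seed)
    # rng.random() returns p in [0, 1), so 1 - p is never zero.
    return [weibull_inverse_cdf(rng.random(), shape, scale) for _ in range(n)]


# Ten ideal datasets of 100 strengths each, as in the simulation design.
ideal_datasets = [generate_ideal_dataset(seed=k) for k in range(10)]
```

Because \(p\) and \(1 - p\) are both uniform on the unit interval, replacing \(1 - p\) with \(p\) in the formula would give a statistically equivalent sampler.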
Figure 1 shows the overall process for generating the ideal datasets and the censored datasets. The censored datasets were generated from each ideal dataset: first, a certain proportion of the data (10%, 20%, 30% and 40%) was randomly selected from an ideal dataset, and then a certain amount (5, 10 and 15 MPa) was subtracted from the selected strength data. In a tensile test, a failure at the tension grip causes the specimen's strength to be undermeasured, because the specimen may break due to damage by the grip before reaching its actual tensile capacity. Thus, if a specimen fails at the grip, its actual tensile strength may be higher than the measured strength. In this study, the subtraction therefore represents the undermeasured strength caused by grip failure in a full-size timber tension test.
For a specimen that failed at the tension grip, it is impossible to quantify how much lower the measured tensile capacity is than the actual tensile capacity. Thus, the undermeasurement cannot be generated stochastically. In this study, three levels of undermeasured strength (5, 10 and 15 MPa) were assumed and subtracted from the randomly selected data (10%, 20%, 30% and 40%) in the ideal dataset. The data after subtraction were regarded as censored data. These procedures were repeated for each ideal dataset.
In total, 10 ideal datasets and 120 censored datasets (10 ideal datasets × 4 proportions of censored data × 3 strength subtractions) were prepared. The sensitivity of the precision to the censoring proportion was investigated by comparing the 5th percentile value of each ideal dataset with the 5th percentile values of the censored datasets determined by three different censored data analysis methods.
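The construction of the 12 censored datasets per ideal dataset can be sketched as follows. This is an illustration under stated assumptions, not the authors' spreadsheet: the names `make_censored_dataset` and `censored_grid` are hypothetical, and the sketch draws a fresh random selection for every combination of proportion and subtraction, which the text does not specify.

```python
import random

PROPORTIONS = (0.10, 0.20, 0.30, 0.40)  # share of specimens treated as grip failures
SUBTRACTIONS_MPA = (5.0, 10.0, 15.0)    # assumed undermeasurement levels


def make_censored_dataset(ideal, proportion, subtraction, rng):
    """Subtract a constant from a randomly selected share of the data.

    Returns (strengths, censored), where censored[i] is True for the
    specimens whose strength was reduced.
    """
    n_censored = round(len(ideal) * proportion)
    chosen = set(rng.sample(range(len(ideal)), n_censored))
    strengths = [x - subtraction if i in chosen else x
                 for i, x in enumerate(ideal)]
    censored = [i in chosen for i in range(len(ideal))]
    return strengths, censored


def censored_grid(ideal, seed=None):
    """Build the 4 x 3 = 12 censored datasets derived from one ideal dataset."""
    rng = random.Random(seed)
    return {(p, s): make_censored_dataset(ideal, p, s, rng)
            for p in PROPORTIONS for s in SUBTRACTIONS_MPA}
```

Applying `censored_grid` to each of the 10 ideal datasets yields the 120 censored datasets counted above. Keeping the `censored` flags next to the strengths matters, because both CMLE and the KM method need to know which observations came from unintended failures.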
Determination of characteristic values
For the ideal datasets, the 5th percentile value was determined for each dataset by MLE, and it can be regarded as the exact 5th percentile value. These 5th percentile values were used to evaluate the precision of the censored data analysis.
For the censored datasets (grip failure data), ordinary data analysis methods were applied first: maximum likelihood estimation (MLE) and Order statistics. The MLE requires a parametric distribution, and the Weibull distribution has been used for the tensile strength distribution of structural lumber [17,18,19]. It is also known that most strength properties of structural lumber fit the Weibull distribution well, and many theories, such as the size effect and the load configuration effect, have been developed on the assumption that strength follows the Weibull distribution. Thus, the censored dataset was fitted to the Weibull distribution by MLE (Eq. 2), and the 5th percentile value (5%_{CD,MLE}) was determined by Eq. 3. In Eqs. 2b and 2c, the parameters were determined using the Excel solver [20,21,22]. Order statistics determined the 5th percentile value (5%_{CD,Order}) by choosing the 5th lowest strength. The 5%_{CD,MLE} and 5%_{CD,Order} values were intended to simulate the case in which grip failure specimens (unintended failure) are regarded as normal test results (tension failure), and they might be lower than 5%_{ideal,MLE} or 5%_{ideal,Order}.
Maximum likelihood estimation (MLE)
where \(f(x_{i} \mid \alpha ,\beta )\) is the probability density function of the Weibull distribution with parameters \(\alpha\) and \(\beta\); \(x_{i}\) is the tensile strength of the ith specimen; and \(F^{-1}(\alpha ,\beta ,0.05)\) is the inverse Weibull distribution function with parameters \(\alpha\) and \(\beta\), evaluated at the lower 5th percentile of the cumulative distribution.
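The two ordinary estimates can be sketched in plain Python. Instead of the Excel solver used in the study, this sketch solves the standard profile-likelihood equation for the Weibull shape by bisection, which is a swapped-in numerical technique that reaches the same maximum of the likelihood. The function names are hypothetical, and all strengths are assumed to be positive.

```python
import math


def fit_weibull_mle(x):
    """Return (shape, scale) maximizing the 2P Weibull likelihood of x."""
    n = len(x)
    x_max = max(x)
    mean_log = sum(math.log(v) for v in x) / n

    def score(k):
        # Profile-likelihood equation for the shape k; strengths are divided
        # by x_max so that the powers cannot overflow.
        w = [(v / x_max) ** k for v in x]
        weighted = sum(wi * math.log(v) for wi, v in zip(w, x)) / sum(w)
        return weighted - 1.0 / k - mean_log

    lo, hi = 0.01, 1000.0  # bracket assumed wide enough for strength data
    for _ in range(100):   # score(k) increases with k, so bisection applies
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = x_max * (sum((v / x_max) ** shape for v in x) / n) ** (1.0 / shape)
    return shape, scale


def weibull_percentile(shape, scale, p=0.05):
    """Inverse Weibull distribution function evaluated at probability p."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def order_statistic_percentile(x, rank=5):
    """Return the rank-th lowest value (the 5th lowest of 100 by default)."""
    return sorted(x)[rank - 1]
```

For a dataset `x` of 100 strengths, `weibull_percentile(*fit_weibull_mle(x))` corresponds to the MLE-based 5th percentile and `order_statistic_percentile(x)` to the Order statistics value.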
For more precise analysis, censored maximum likelihood estimation (Eq. 4) and the Kaplan–Meier method (Eq. 5) were applied to the censored datasets, and the 5th percentile values were determined by each method. In this kind of test, it is known which data were obtained from normal failure (intended failure, uncensored data) and which from unintended failure (censored data). Therefore, the censored and uncensored data in a censored dataset were grouped. The likelihood function for normal failure data is the same as in the MLE analysis, namely the probability density function of the Weibull distribution. The likelihood function for censored data, however, is the probability that the strength is larger than the observation, as shown in Eq. 4b. This is called the censored MLE method [6]. The parameters in Eqs. 4c and 4d were determined using the Excel solver in the same way as the parameters in Eqs. 2b and 2c. By this process the Weibull distribution was fitted, and the 5th percentile value was determined (5%_{CD,CMLE_all}).
In the determination of the 5th percentile value, the lower tail is much more important than the upper tail. Faber et al. [23] investigated the effect of lower tail data by means of the CMLE to estimate the bending strength distribution of graded timber. Since a small number of observations in the censored data causes statistical uncertainty, a threshold is required, and they concluded that the use of the lower 30% of the data in the censored data was reasonable as the threshold. In this study, the 30% lower tail in the censored data was also used in CMLE and compared with the other analysis methods, and the 5th percentile value (5%_{CD,CMLE_30}) was determined in the same manner.
Censored MLE (CMLE)
where \(f(x_{i} \mid \alpha ,\beta )\) is the probability density function of the Weibull distribution with parameters \(\alpha\) and \(\beta\); \(F(X < x_{i} \mid \alpha ,\beta )\) is the cumulative distribution function of the Weibull distribution with parameters \(\alpha\) and \(\beta\); \(u_{i}\) is the tensile strength of the exactly observed ith specimen (uncensored data); and \(c_{i}\) is the tensile strength of censored data (unintended failure specimen, e.g., grip failure in a tension test).
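A minimal sketch of the censored fit with all censored data follows. It assumes the usual right-censored Weibull log-likelihood, in which exactly observed strengths contribute \(\ln f\) and censored strengths contribute \(\ln (1 - F)\), and it again replaces the Excel solver with bisection on the profile equation for the shape. The function names are hypothetical, strengths are assumed positive, and at least one observation must be uncensored. The 30% lower tail variant is not sketched here.

```python
import math


def censored_log_likelihood(strengths, censored, shape, scale):
    """Weibull log-likelihood with right-censored observations."""
    total = 0.0
    for x, is_censored in zip(strengths, censored):
        z = (x / scale) ** shape
        if is_censored:
            total += -z  # ln(1 - F(x)): strength exceeds the observation
        else:
            total += (math.log(shape) - math.log(scale)
                      + (shape - 1.0) * math.log(x / scale) - z)  # ln f(x)
    return total


def fit_weibull_cmle(strengths, censored):
    """Return (shape, scale) maximizing the censored log-likelihood."""
    x_max = max(strengths)
    exact_logs = [math.log(x) for x, c in zip(strengths, censored) if not c]
    r = len(exact_logs)  # number of exactly observed specimens
    mean_log_exact = sum(exact_logs) / r

    def score(k):
        # Profile equation: weights run over all data, while the mean log
        # is taken over the uncensored data only.
        w = [(x / x_max) ** k for x in strengths]
        weighted = sum(wi * math.log(x) for wi, x in zip(w, strengths)) / sum(w)
        return weighted - 1.0 / k - mean_log_exact

    lo, hi = 0.01, 1000.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = x_max * (sum((x / x_max) ** shape for x in strengths) / r) ** (1.0 / shape)
    return shape, scale
```

The 5th percentile then follows from the fitted parameters as `scale * (-math.log(0.95)) ** (1 / shape)`. When no observation is flagged as censored, the fit reduces to the ordinary MLE.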
In the Kaplan–Meier method, the ratio of the number of events to the total number of observations (incidence rate) is calculated in ascending order of time, and the probability that life is longer than a certain time can be calculated by the empirical survivor function (Eq. 5) [7]. In this study, the domain of time was replaced by tensile strength, and the 5th percentile value (\(5\%_{\text{CD,KM}}\)) was selected in ascending order of strength.
Kaplan–Meier method (KM)
where \(t_{i}\) is the tensile strength of the ith specimen in ascending order; \(d_{i}\) is the number of specimens that failed by the correct failure mode (tensile failure) at strength \(t_{i}\); and \(n_{i}\) is the number of specimens that had higher strength than the ith specimen.
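The product-limit calculation can be sketched as below, with strength taking the place of time. Two points are assumptions of this sketch rather than statements from the study: the number at risk is counted in the standard way (all specimens whose observed strength is at least \(t_{i}\)), and the 5th percentile is read off as the lowest strength at which the estimated cumulative probability reaches 5%. The function names are hypothetical.

```python
def kaplan_meier(strengths, censored):
    """Return [(t, S(t))] at each strength where an intended failure occurred."""
    order = sorted(zip(strengths, censored))
    n_at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        failures = 0  # intended (uncensored) failures at strength t
        tied = 0      # all specimens observed at strength t
        while i < len(order) and order[i][0] == t:
            if not order[i][1]:
                failures += 1
            tied += 1
            i += 1
        if failures > 0:
            survival *= 1.0 - failures / n_at_risk
            curve.append((t, survival))
        # Censored specimens leave the risk set without lowering S(t).
        n_at_risk -= tied
    return curve


def km_percentile(strengths, censored, p=0.05):
    """Lowest strength whose estimated cumulative probability is at least p."""
    for t, s in kaplan_meier(strengths, censored):
        if 1.0 - s >= p:
            return t
    return None  # not reached, e.g. when too many low values are censored
```

As a small check, five specimens with strengths 1 to 5, of which the second is censored, give survival values of 0.8, 0.533, 0.267 and 0 at strengths 1, 3, 4 and 5: the censored specimen shrinks the risk set from four to three but does not count as a failure.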
Precision according to analysis methods
To determine the precision of the analysis methods, the differences between the real 5th percentile values of the ideal datasets (5%_{ideal}) and the 5th percentile values of the censored datasets (5%_{CD}) were calculated by Eq. 6. In this study, 10 ideal datasets that did not include any censored data were prepared. Twelve censored datasets were developed from each ideal dataset by changing the proportion and the strength subtraction of randomly selected data in the ideal dataset. Five data analysis methods were applied to derive the 5th percentile value of the censored datasets (Table 1). As a result, six hundred 5%_{CD} values (120 censored datasets × five 5%_{CD} values) were compared with the 5%_{ideal} values of the ideal datasets from which the censored datasets were generated.
where \(5{\text{\% }}_{{{\text{CD}},{\text{method}}}}\) is the 5th percentile value of the censored dataset, and \(5{\text{\% }}_{{{\text{ideal}},{\text{method}}}}\) is the 5th percentile value of the ideal dataset before censored data were generated.
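Since the differences are reported in percent, the comparison can be sketched as a relative difference with respect to the ideal value. Both the exact form and the sign convention are assumptions of this sketch: here a negative result means that the censored dataset gave a lower 5th percentile value than the ideal dataset. The function name is hypothetical.

```python
def relative_difference_percent(p5_censored, p5_ideal):
    """Relative difference (%) of a censored-dataset 5th percentile.

    Assumed form: 100 * (5%_CD - 5%_ideal) / 5%_ideal, so that
    underestimation shows up as a negative number.
    """
    return 100.0 * (p5_censored - p5_ideal) / p5_ideal
```

With this convention, an estimate of 45 MPa against an ideal value of 50 MPa gives a difference of -10%.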
Results and discussion
Ideal datasets
Figure 2 shows the cumulative strength distributions of the ideal datasets generated from the tensile strength distribution. Each ideal dataset has 100 strength data. Two kinds of 5th percentile value were derived from the ideal datasets (Table 2). The average 5th percentile value from MLE (5%_{ideal,MLE}) was 48.6 MPa, and the average 5th percentile value from Order statistics (5%_{ideal,Order}) was 47.6 MPa. The minimum and maximum 5th percentile values were 42.0 and 55.3 MPa from MLE, and 42.0 and 53.2 MPa from Order statistics, respectively. Thus, the two methods showed a small difference of 0 ~ 2 MPa.
Censored datasets
Figure 3 shows the differences in 5th percentile value between the ideal datasets and the censored datasets. When grip failure occurs in a tension test, the strength measured by the testing machine must be lower than the tensile resistance that the specimen would have shown if it had failed by tension fracture. The censored data were simulated to represent such grip failures. If the grip failure data are regarded as normal test results in the 5th percentile value estimation, the 5th percentile value must be underestimated. This is acceptable in terms of conservatism when the allowable strength is determined, and this approach is therefore used in many cases. In this section, the extent of the underestimation was investigated. The correct 5th percentile value can be taken as 5%_{ideal,MLE} or 5%_{ideal,Order}, and the conservative estimate as 5%_{CD,MLE} or 5%_{CD,Order}. The difference between the correct and the conservative 5th percentile values was analyzed.
As Fig. 3 shows, the 5th percentile values decreased as the proportion of censored data increased. Also, as the strength subtraction increased from 5 MPa to 15 MPa, the 5th percentile value (5%_{CD,MLE}) decreased. When the proportion of censored data was 10%, the decrease in the 5th percentile value (5%_{CD,MLE}) ranged from 1.0% (5 MPa strength subtraction for censored data) to 3.9% (15 MPa strength subtraction for censored data).
The same trend was found in the nonparametric approach. The 5th percentile values for the censored datasets by Order statistics (5%_{CD,Order}) were approximately 0 ~ 3% lower than the 5th percentile values for the ideal datasets (5%_{ideal,MLE}). When the proportion of censored data was 10%, the decrease in 5%_{CD,Order} ranged from 3.2% (5 MPa strength subtraction for censored data) to 6.6% (15 MPa strength subtraction for censored data).
This comparison shows that if the amount of undermeasurement caused by grip failure is large, the 5th percentile value can be seriously underestimated. Likewise, if the proportion of grip failures is large, a similarly large underestimation will occur.
Sensitivity in 5th percentile value determination according to censored data analysis
To determine the 5th percentile value more precisely, three censored data analysis approaches were applied to the censored datasets: CMLE with all censored data, CMLE with 30% lower tail censored data, and the KM method. The 5th percentile values derived by the three analyses were compared with the ideal 5th percentile values, which simulate the case in which there was no grip failure. The ideal 5th percentile value was determined by MLE with the ideal datasets. Figure 4 shows the differences between the ideal 5th percentile values and the 5th percentile values for the censored datasets.
In a full-size timber test that includes unintended failure specimens, the number of specimens that failed in an unintended mode is known, but how much lower the measured strength is than the ideal strength, which would have been measured if the specimen had failed in the intended mode, cannot be known. Therefore, the 5th percentile value determination should not be sensitive to the amount of undermeasurement in the censored data. In this study, this was investigated by applying three different subtractions (5 MPa, 10 MPa, and 15 MPa) to the censored data. In addition, four levels of the proportion of censored data (10%, 20%, 30%, and 40%) were reflected in the censored datasets, and the sensitivity to the amount of undermeasurement and to the proportion of censored data was investigated. As Fig. 4 shows, the proportion of censored data made large differences in precision. This means that the precision depends on how many specimens failed in an unintended mode. The amount of subtraction, however, did not make large differences. This means that the undermeasurement did not have a significant effect on the 5th percentile value even though the data included unintended failure specimens (e.g., grip failure).
Precision according to censored data analysis
Figure 5 compares the statistical analysis methods for the censored datasets. Each point is the average of thirty 5th percentile values for the same proportion of censored data (10 ideal datasets × 3 strength subtractions). MLE and Order statistics were calculated on the assumption that censored data (grip failure data) are regarded as normal data (tension failure data). As expected, MLE and Order statistics underestimated the 5th percentile value increasingly as the proportion of censored data increased. In contrast, CMLE and the KM method did not underestimate the 5th percentile value. Of the three censored data analyses, CMLE with all censored data showed overestimation rather than underestimation. However, CMLE with 30% lower tail censored data showed good agreement for proportions smaller than 30%. From this comparison, it was concluded that CMLE with 30% lower tail censored data is appropriate as a parametric approach for structural lumber tests. The nonparametric approach, the KM method, showed good agreement up to a 20% proportion of censored data.
This comparison means that the 5th percentile value can be determined with more precision by CMLE with 30% lower tail censored data or by the KM method than by the ordinary analyses (MLE or Order statistics). The CMLE and KM methods are not recommended when more than 30% of the specimens show a failure mode different from the intended one.
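The averaging behind each point in Fig. 5 (thirty values per proportion of censored data for a given analysis method) can be sketched as a simple grouping step. The dictionary layout and the function name below are hypothetical and only illustrate the bookkeeping.

```python
from collections import defaultdict


def mean_difference_by_proportion(differences):
    """Average percent differences over datasets and subtractions.

    differences maps (dataset_id, proportion, subtraction_mpa) to the
    percent difference obtained with one analysis method.
    """
    grouped = defaultdict(list)
    for (_dataset_id, proportion, _subtraction), value in differences.items():
        grouped[proportion].append(value)
    return {p: sum(values) / len(values) for p, values in sorted(grouped.items())}
```

With 10 ideal datasets and 3 subtraction levels, each proportion collects 30 values, so calling the function once per analysis method gives one curve of the comparison.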
Conclusions
In tests of full-size structural timber, unintended failure modes are found very often, such as grip failure in tension tests, bending failure in rolling shear tests of cross-laminated timber, and lumber failure in finger joint tests. This study was intended to find the conditions under which censored data analysis can be applied for more precise 5th percentile determination. From the ideal tension test data, the censored data (unintended failure) were simulated by subtracting constant strengths (5 MPa, 10 MPa, 15 MPa) from randomly selected data. To reach the aim of this study, the proportion of censored data and the amount of undermeasurement (subtraction amount) were investigated by comparing the precision of the censored data analyses. The following conclusions were drawn under the assumption that the strength was undermeasured by a constant amount (5 MPa, 10 MPa, 15 MPa) when a specimen failed at the grip.
 1. If the 5th percentile value is determined by MLE or Order statistics without consideration of the censored data (unintended failure, e.g., grip failure), it can be underestimated.
 2. CMLE with 30% lower tail censored data and the KM method provided much more precise 5th percentile values.
 3. The amount of undermeasurement (5 MPa, 10 MPa, 15 MPa in this simulation study) did not show a significant effect on the 5th percentile determination by CMLE and the KM method, but the proportion of censored data (percentage of unintended failure specimens) had a large effect on the determination of the 5th percentile value. Fortunately, although the amount of undermeasurement cannot be known, the proportion of censored data is known in a real test.
 4. CMLE with 30% lower tail censored data and the KM method showed good agreement when the data included up to 20% unintended failure data.
Availability of data and materials
Not applicable.
Abbreviations
 CLT: Cross-laminated timber
 CMLE: Censored maximum likelihood estimation
 COV: Coefficient of variation
 KM: Kaplan–Meier
 MLE: Maximum likelihood estimation
 OSB: Oriented strand board
References
1. Pang SJ, Oh JK, Hong JP, Lee SJ, Lee JJ (2018) Stochastic model for predicting the bending strength of glued-laminated timber based on the knot area ratio and localized MOE in lamina. J Wood Sci 64:126–137
2. Yeh B, Williamson TG (2001) Evaluation of glulam shear strength using a full-size four-point test method. In: Proceedings of CIB W. 18, International Council for Research and Innovation in Building and Construction, Venice, Italy, 22–24 August 2001
3. Pang SJ, Lee HJ, Yang SM, Kang SG, Oh JK (2019) Moment and shear capacity of Plylam composed with plywood and structural timber under out-of-plane bending. J Wood Sci. https://doi.org/10.1186/s1008601918478
4. Li M (2017) Evaluating rolling shear strength properties of cross-laminated timber by short-span bending tests and modified planar shear tests. J Wood Sci 63:331–337
5. Köhler J, Faber MH (2003) A probabilistic approach to cost optimal timber grading. In: Proceedings of CIB W. 18, International Council for Research and Innovation in Building and Construction, Colorado, USA, 11–14 August 2003
6. Steiger R, Köhler J (2005) Analysis of censored data - examples in timber engineering research. In: Proceedings of CIB W. 18, International Council for Research and Innovation in Building and Construction, Karlsruhe, Germany, 29–31 August 2005
7. Kaplan EL, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat Assoc 53(282):457–481
8. Rich JT, Neely JG, Paniello RC, Voelker CC, Nussenbaum B, Wang EW (2010) A practical guide to understanding Kaplan–Meier curves. Otolaryngology Head Neck Surg 143(3):331–336
9. Colosimo E, Ferreira FV, Oliveira M, Sousa C (2002) Empirical comparisons between Kaplan–Meier and Nelson–Aalen survival function estimators. J Stat Comput Simul 72(4):299–308
10. Meyer BD (1990) Unemployment insurance and unemployment spells. Econometrica 58(4):757–782
11. Chastain JS, Young TM, Guess FM, León RV (2009) Using reliability tools to characterize wood strand thickness of oriented strand board panels. Int J Reliab Appl 10(2):89–99
12. Link CL, DeGroot RC (2007) Predicting effectiveness of wood preservatives from small sample field trials. Wood Fiber Sci 22(1):92–108
13. Zhu J, Kudo A, Takeda T, Tokumoto M (2001) Methods to estimate the length effect on tensile strength parallel to the grain in Japanese larch. J Wood Sci 47(4):269–274
14. Moshtaghin AF (2016) Stochastic analysis of clear timber as a structural material. Doctoral dissertation, Ecole Polytechnique Fédérale de Lausanne, Switzerland, p 43
15. Lee JJ, Park JS, Kim KM, Oh JK (2005) Prediction of bending properties for structural glulam using optimized distributions of knot characteristics and laminar MOE. J Wood Sci 51:640–647
16. Quirk TJ (2016) Excel 2016 for educational and psychological statistics: a guide to solving practical problems. Springer, Berlin, p 27
17. Takeda T, Hashizume T (1999) Differences of tensile strength distribution between mechanically high grade and low grade Japanese larch lumber. 1. Effect of length on the strength of lumber. J Wood Sci 45:200–206
18. Takeda T, Hashizume T (1999) Differences of tensile strength distribution between mechanically high grade and low grade Japanese larch lumber 2: effect of knots on tensile strength distribution. J Wood Sci 45:207–212
19. Takeda T, Hashizume T (2000) Differences of tensile strength distributions between mechanically high-grade and low-grade Japanese larch lumber 3: effect of knot restriction on the strength of lumber. J Wood Sci 46:95–101
20. Barati R (2013) Application of excel solver for parameter estimation of the nonlinear Muskingum models. KSCE J Civil Eng 17:1139–1148
21. Chandrakantha L (2011) Using Excel Solver in optimization problems. In: Proceedings of the Twenty-third Annual International Conference on Technology in Collegiate Mathematics, Denver, Colorado, USA, 17–20 March 2011
22. Häcker J, Ernst D (2017) Financial Modeling: An Introductory Guide to Excel and VBA Applications in Finance. Palgrave Macmillan, London
23. Faber MH, Köhler J, Sorensen JD (2004) Probabilistic modeling of graded timber material properties. J Struct Saf 26:295–309
Acknowledgements
(1) This work was supported by Research Resettlement Fund for the new faculty of Seoul National University. (2) This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2017R1A2B1010748).
Funding
This project was funded by Seoul National University and the National Research Foundation of Korea (NRF).
Author information
Contributions
SJP analyzed the data and wrote this manuscript. HJL contributed to developing the data analysis tools. KSA contributed to developing the structure of the manuscript and reviewed this manuscript. JKO designed this research project, managed this research and approved the final manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare they have no competing interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Pang, SJ., Lee, HJ., Ahn, KS. et al. Sensitivity of censored data analysis to determine the characteristic value of structural timber. J Wood Sci 66, 39 (2020). https://doi.org/10.1186/s10086020018850
DOI: https://doi.org/10.1186/s10086020018850
Keywords
 Characteristic value
 Structural timber
 Tensile strength
 Censored data
 Maximum likelihood estimation
 Kaplan–Meier method