Regression algorithms-driven mechanical properties prediction of angle bracket connection on cross-laminated timber structures

The construction of structures using cross-laminated timber (CLT) has grown in popularity because of the material's environmental friendliness and high strength. Angle bracket connections primarily resist the horizontal forces acting on CLT structures, which is essential to ensuring the seismic resilience and ductility of these structures. This study introduces a regression algorithm-driven method for predicting the mechanical performance of angle bracket connections. The geometric dimensions of the angle bracket connector, the method of connecting the connector to the wall and floor slabs, and the properties of the CLT panel are used as input parameters to predict the yield load, the maximum load, the initial stiffness, and the ductility ratio of the angle bracket connection. Prediction models were developed using the collected data from 110 angle bracket experiments.


Introduction
Timber structures are frequently cited as a sustainable alternative to concrete and steel construction because of their renewable resources, low self-weight, and carbon storage. Mass timber projects with high construction efficiency, such as cross-laminated timber (CLT) structures, are increasing globally [1]. CLT structures are prefabricated wood solutions with excellent seismic response, owing to the lightness of engineered CLT panels and the dissipative capacity of the connections. Over the last decade, application examples of CLT have appeared worldwide, such as the 18-story building Mjøstårnet, completed in Brumunddal, Norway [2]. As for codification, the European Committee for Standardization (CEN) drafted the second generation of Eurocode in 2012 [3], in which much work has gone into implementing the design rules for CLT structures. CLT structures therefore have positive development prospects, and it is worthwhile to improve the understanding of the working properties of CLT structures and their components.
Connections are crucial in providing timber structures with strength, rigidity, stability, and ductility. Extensive research has shown that deformation in CLT structures arises mainly from the bending and slippage of metal connections [4]. The angle bracket connection is a type of CLT structural connection that is typically arranged evenly along the wall to provide stiffness and strength in the shear direction. It is therefore of utmost significance to identify a method for predicting the mechanical performance of angle bracket connections in order to optimize the design of CLT structures and ensure their seismic safety. However, angle bracket connections exhibit multiple damage mechanisms during operation, including tearing of the wood, deformation of steel members, and loss of nail bearing capacity, which makes evaluating their mechanical properties more complicated.
Previously, a considerable amount of research has been performed on the mechanical behavior of angle bracket connections. Gavric et al. [5] carried out monotonic and low-cycle reversed loading tests on angle bracket connections of different sizes and with varying numbers of fasteners to evaluate and discuss their mechanical characteristics, such as energy dissipation, loss of strength, rigidity, stiffness, and ductility. The study suggested that the resistance of the fasteners and the characteristics of the CLT walls must be taken into consideration when predicting the shear strength of the angle bracket connection. Mahdavifar et al. [6] investigated the influence of various wood densities on the properties of angle bracket joints by conducting shear and uplift experiments on angle bracket connections with two sets of conventional CLT panels and eight sets of hybrid CLT panels. The test findings revealed that if the damage of the screws or bolts penetrated the low-density core layer of the CLT panel, there was a substantial difference in connection efficacy between hybrid and conventional CLT panels. Rezvani et al. [7] built a three-dimensional (3D) numerical model of the angle bracket connection using the commercial finite element software ABAQUS to simulate its mechanical properties under different loading combinations. They also introduced a 3D model of the fasteners to conduct a preliminary numerical simulation study of the angle bracket connection. The numerical modeling analysis indicated that replacing nails with screws and adding larger-sized screws did not noticeably improve the shear resistance of the connection. Pošta et al. [8] performed shear experiments on three types of angle bracket connections and compared the experimental results with Eurocode 5 (EC 5) [9]. The results showed that the maximum loads obtained by EC 5 calculations were much higher than those obtained experimentally. The difference was even more remarkable for the angle bracket without a rib, which could be dangerous in practical applications. These studies show that there is still much room for improving the prediction of the mechanical properties of angle bracket connections. Mechanical property tests and realistic numerical simulations are time-consuming and costly, so finding more efficient and accurate prediction methods is important.
Machine learning (ML), a data-driven analytical approach, has become widely used in building construction design and performance evaluation in recent decades [10]. Zhang et al. [11] used nine ML algorithms to build a reinforced concrete (RC) wall seismic performance prediction model based on 429 sets of RC wall test data, including the classification prediction of wall damage modes and the regression prediction of wall lateral stiffness and lateral displacement. Suzuki et al. [12] successfully classified wood damage locations using vibration waveforms combined with ML methods. The specimen waveforms were obtained by piezoelectric sensors, and a classification model was built using a neural network (NN). The results showed that the NN could effectively improve the applicability of the wood health monitoring system, with an accuracy of 83.3% for the classification of damaged or undamaged locations. Luo et al. [13] proposed a local ML model named locally weighted least squares support vector regression machine (LWLS-SVMR) to enhance and generalize the estimation of the drift capacity of RC columns, and the effectiveness of the LWLS-SVMR method was verified by comparing it with traditional empirical formulas. These studies show that using ML in civil engineering has significant advantages. However, mechanical property prediction of CLT metal connections using ML algorithms has yet to be reported.
Based on the above research problems of angle bracket connections and the advantages of ML methods, this paper selects input variables and uses 110 collected sets of angle bracket connection test and numerical simulation data to estimate the mechanical properties of angle bracket connections using four ML regression algorithms: random forest, support vector regression, gradient boosting, and extreme gradient boosting. Furthermore, the prediction performance of ML for the yielding load, maximum load, initial stiffness, and ductility ratio of angle bracket connections is evaluated. Lastly, this paper provides an importance analysis of the input parameters, and an interpretability analysis of the prediction models is performed to validate their reliability. The method presented in this paper can automatically and efficiently predict the performance of the angle bracket connection, taking into consideration various factors that may affect it; the parameter importance analysis and the interpretability analysis of the model serve as a guide for optimizing and designing this connection in practical engineering.
The selection of input and output parameters is presented in the next section, and the statistical distribution of each parameter in the database used in this study is described. The process of the proposed ML regression algorithm-driven prediction method is then detailed in the following section, which also describes the four regression algorithms and the evaluation coefficients used in this study. Subsequently, the outcome of each algorithm is discussed and evaluated by means of these coefficients. Finally, an interpretability analysis of the prediction model proposed in this paper is provided.

Selection of input and output parameters
Numerous studies have revealed that, as the external load increases, the angle bracket connection typically passes through three phases of mechanical behavior: the elastic, elastoplastic, and failure stages [14]. Accordingly, four mechanical properties, the yielding load ($F_y$), maximum load ($F_m$), initial stiffness ($K_e$), and ductility ratio ($D$), are selected as the output variables for the angle bracket connection. The yielding displacement ($v_y$) and maximum displacement ($v_m$) can be obtained from these output variables, and the simplified bilinear constitutive relationship of the angle bracket connection can be derived (Fig. 1), which is fundamental to both the design and research of the angle bracket connection.
To comprehensively quantify the features of the angle bracket connection, ten variables are selected as inputs in this paper. The input feature variables fall into four categories. The first group corresponds to the geometric features of the angle bracket, including the width ($B$), length ($P$), height ($H$), and thickness ($t$) of the connector (Fig. 2); the second group is the thickness of the CLT wall panel ($T$); the third group is related to the fasteners connecting the bracket to the wall, including the self-tapping screw diameter ($S_r$), the self-tapping screw length ($S_l$), and the number of self-tapping screws ($S_n$); the last group is related to the fasteners connecting the bracket to the floor, including the diameter of the ground connection bolts (or screws) ($B_r$) and the number of bolts ($B_n$). The variables $B$, $P$, $H$, $t$, $T$, $S_r$, $S_l$, and $B_r$ are expressed in mm.
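The way the two displacements follow from the four outputs can be illustrated with a short sketch. It assumes that the yielding displacement lies on the initial elastic branch ($v_y = F_y / K_e$) and that the ductility ratio is defined as $D = v_m / v_y$; neither relation is spelled out in the text above, so both are assumptions, and the function name and numbers are purely illustrative.

```python
def bilinear_curve(f_y, f_m, k_e, d):
    """Return the (displacement, load) points of a simplified bilinear curve.

    Assumptions (not stated in the paper):
      v_y = F_y / K_e   yielding point lies on the initial elastic branch
      v_m = D * v_y     ductility ratio defined as v_m / v_y
    """
    v_y = f_y / k_e
    v_m = d * v_y
    return [(0.0, 0.0), (v_y, f_y), (v_m, f_m)]


# Illustrative numbers only (loads in kN, stiffness in kN/mm).
points = bilinear_curve(f_y=20.0, f_m=25.0, k_e=4.0, d=5.0)
for displacement, load in points:
    print(f"v = {displacement:5.1f} mm, F = {load:5.1f} kN")
```

Under these assumptions, the four predicted quantities are enough to draw the two-segment curve without any further test data.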

Description of experimental database
This study collected 110 sets of shear tests [5,6, for angle bracket connections, including 107 sets of experimental data and 3 sets of numerical simulation data (Additional file 1). The distribution of the dataset is shown in Fig. 3. The experimental data within the dataset stem from shear loading tests conducted along the direction of the angle brackets. Monotonic or cyclic loading procedures were executed to derive a comprehensive set of mechanical performance parameters for angle brackets. This approach ensures that the data within the database can be combined for the subsequent regression analyses. The minimum error between the numerical simulation results in the dataset and the experimental results is 0.1%, while the maximum error is 35.5%, with an average error of 19.0%. These data were used to create ML models that forecast the mechanical characteristics of angle bracket connections.
Figure 4 illustrates the statistical distribution of the input and output variables. The number of data points within each interval is shown on the left y-axis, while the x-axis displays the range of values for the chosen variable. The corresponding cumulative probability is shown on the right y-axis.
Data preprocessing is required before training the prediction model with Scikit-learn in order to increase the model's accuracy and stability. The collected data were normalized to the range [0, 1] to ensure comparability between features and to eliminate the influence of differing magnitudes. Missing values in the collected data were filled with the mean value of the corresponding variable, based on the central tendency of the sample. Because the input parameters describe the angle bracket connection in detail, a series of measures were taken to account for the influence of non-independent variables. Apart from feature engineering, such as parameter selection and data normalization, emphasis was placed on selecting ensemble algorithms that are robust to correlated inputs. Model performance with respect to correlation was further improved and validated through hyperparameter optimization and cross-validation.
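The two preprocessing steps described above can be sketched in plain Python. This is a minimal illustration rather than the authors' pipeline: the helper names and the sample column are hypothetical, and missing entries are assumed to be marked with `None`.

```python
def fill_missing_with_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def min_max_scale(column):
    """Linearly rescale a column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: map everything to 0 to avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical bracket widths in mm, with one missing entry.
widths = [50.0, None, 100.0, 150.0]
filled = fill_missing_with_mean(widths)
scaled = min_max_scale(filled)
print(filled)
print(scaled)
```

A common design choice is to compute the mean, minimum, and maximum from the training set only and reuse them for the test set; the paper does not state which variant was applied.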

Regression algorithms-driven methodology for mechanical estimation of angle bracket
At present, experimental or numerical modeling analysis methods are mainly used to study the mechanical characteristics of angle brackets. In shear tests of angle bracket connections, the range of parameters that can be varied is limited by the cost and time required. The mechanical characteristics of angle bracket connections can be effectively simulated using finite element analysis. However, detailed numerical simulations require considerable modeling and computation time, and the parameters of the metal fasteners and the connected wood units are usually omitted from simulations to improve efficiency.
In contrast, the prediction model of mechanical characteristics of angle bracket connection under shear established by ML can better analyze the interrelationship between parameters and accurately and quickly predict the mechanical properties of angle bracket connection under various conditions.

Framework
Figure 5 gives the proposed framework for predicting the mechanical properties of the angle bracket connection using ML. The collected data are split at random into two parts: the training set, which makes up 70% of the total, is used to develop the prediction model, and the test set, which makes up the remaining 30%, is used to measure the performance of the prediction model. The same training and test datasets are used for all algorithms to ensure a fair comparison among the ML algorithms. During model training, hyperparameter optimization is performed by a grid search to obtain the best performance from each algorithm. Finally, the test set, which is not involved in model training, is used to evaluate the performance of the prediction model, and the 10 input features are analyzed using permutation importance and SHAP values.
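The random 70/30 split, shared by all algorithms, can be sketched as follows. The function name and the seed are illustrative assumptions; fixing the seed is simply one way to guarantee that every algorithm sees the same partition.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.3, seed=0):
    """Shuffle sample indices once and cut them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(110)
print(len(train_idx), len(test_idx))
```

With 110 samples, this yields 77 training samples and 33 test samples.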

Random forest
Random forest (RF) is a bagging algorithm based on decision trees [37], in which each iteration randomly selects a subset of the data with replacement and a subset of the features as inputs (Fig. 6a). In regression, the "ensemble predictor" is created by averaging the outputs of the individual decision trees $h_1(x), h_2(x), \ldots, h_K(x)$ (Eq. 1). In each decision tree, the root node determines, in accordance with predefined criteria and conditions, which branch to follow, leading to the internal nodes. Based on the available features, these internal nodes perform assessments to create homogeneous subsets, which are denoted by leaf nodes (or terminal nodes). Because each decision tree is built from a random subset of samples and features, a random forest is less prone to overfitting than a single decision tree and generalizes better:

$$h(x) = \frac{1}{K} \sum_{k=1}^{K} h_k(x) \qquad (1)$$

Support vector regression
Support vector regression (SVR) is a variant of the support vector machine (SVM) and has been extensively used in regression problems [38]. The SVR model seeks a suitable hyperplane in a high-dimensional space that minimizes the total deviation of all samples from the hyperplane:

$$f(x) = \langle w, x \rangle + b$$

where $\langle \cdot, \cdot \rangle$ denotes the dot product, $w$ is the normal vector of the hyperplane, and $b$ is the bias term.
In the SVR model, a tolerance deviation $\varepsilon$ is specified. When the absolute difference between $f(x)$ and $y$ is within $\varepsilon$, no loss is incurred, which is equivalent to creating a "margin strip" on both sides of the hyperplane (Fig. 6b); only the samples falling outside the margin strip contribute to the loss.
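The margin-strip idea corresponds to the usual epsilon-insensitive loss, which can be written in a few lines. The sketch below assumes a linear model and uses made-up weights and samples; it illustrates the loss only, not the full SVR training procedure.

```python
def predict(w, x, b):
    """Linear model f(x) = <w, x> + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the margin strip, linear in the excess deviation outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Illustrative values only.
w, b = [0.5, 0.25], 0.1
f_x = predict(w, [1.0, 2.0], b)                     # 0.5 + 0.5 + 0.1
inside = epsilon_insensitive_loss(0.5, 0.55, 0.1)   # deviation 0.05 is within the strip
outside = epsilon_insensitive_loss(0.5, 0.8, 0.1)   # deviation 0.3 exceeds the strip by 0.2
print(f_x, inside, outside)
```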

Gradient boosting
Gradient boosting (GB) is a supervised ML algorithm that trains new weak learners using the negative gradient of the loss function of the current model [39]. The trained weak learners are then added to the existing model (Fig. 6c). For a given training set $\{(x_i, y_i)\}_{i=1}^{N}$, the GB algorithm uses $K$ weak learners to fit the model $f_k(x)$:

$$f_k(x) = f_{k-1}(x) + \rho_k \, h(x, \theta_k), \quad k = 1, \ldots, K$$

where $h(x, \theta_k)$ is a simple parameterized function of the input variables $x$, defined by the parameters $\theta_k$, and the optimal step size $\rho_k$ is determined at each iteration.
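The additive update described above can be illustrated with a from-scratch sketch for the squared-error loss, where the negative gradient is simply the residual. Several simplifications are assumed here and are not taken from the paper: the weak learner is a one-split regression stump on a single feature, the step size is a fixed constant rather than an optimized value at each iteration, and the data are made up.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # every split that leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_learners=50, step=0.1):
    """Additive model f_k = f_{k-1} + step * h_k, with a fixed step (simplification)."""
    f0 = sum(ys) / len(ys)                               # initial constant model
    learners = []
    preds = [f0] * len(xs)
    for _ in range(n_learners):
        residuals = [y - p for y, p in zip(ys, preds)]   # negative gradient of squared loss
        h = fit_stump(xs, residuals)
        learners.append(h)
        preds = [p + step * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + step * sum(h(x) for h in learners)


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up one-dimensional data with a jump between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 3.2]
mean_y = sum(ys) / len(ys)
mse_constant = mean_squared_error(lambda x: mean_y, xs, ys)
mse_boosted = mean_squared_error(gradient_boost(xs, ys), xs, ys)
print(mse_constant, mse_boosted)
```

Each round fits the current residuals, so the training error of the boosted model falls below that of the constant starting model.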

Extreme gradient boosting
Extreme gradient boosting (XGB) is a scalable ML system for tree boosting. It differs from the GB algorithm mainly in its objective function [40]:

$$\mathrm{Obj} = \sum_{i=1}^{N} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$

where $l$ is the loss function used to measure the difference between the true value $y_i$ of the $i$-th sample and its predicted value $\hat{y}_i$. The model complexity function is represented by $\Omega$. Compared with the GB algorithm, the XGB method adds a regularization term to address generalization problems and reduce model complexity (Fig. 6d).
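For reference, the model complexity function is commonly written as follows in the standard XGBoost formulation. The symbols $T$ (number of leaves of a tree), $w_j$ (weight of leaf $j$), and the penalty coefficients $\gamma$ and $\lambda$ are not defined in this paper and are introduced here as assumptions:

```latex
\Omega(f_k) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2}
```

The first term penalizes trees with many leaves and the second penalizes large leaf weights, which is how the regularization limits model complexity.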

Hyperparameter optimization
In the ML model-building process, besides the model parameters estimated from the given data, there are parameters that cannot be estimated from the data; these are called hyperparameters. The choice of hyperparameters, which control the ML procedure, can affect the algorithm's robustness, stability, and generalization. Hyperparameter optimization is the task of finding the combination of hyperparameter values that achieves the best model performance in a reasonable time. The grid search approach is applied for hyperparameter optimization in this paper. In a grid search, a grid of candidate values is created for the hyperparameters, and every combination of values is tried in turn. The performance of the model trained with each combination is recorded, and the model with the best hyperparameters is returned at the end. Table 1 lists the results of hyperparameter optimization for each prediction model in this paper.
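The exhaustive search over the grid can be sketched generically. In the sketch below, `toy_score` is a hypothetical stand-in for "train the model with these hyperparameters and return its validation score", and the hyperparameter names and candidate values are illustrative rather than those listed in Table 1.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in the grid and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)       # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical score that peaks at max_depth=5 and learning_rate=0.1."""
    return -((params["max_depth"] - 5) ** 2) - (params["learning_rate"] - 0.1) ** 2


grid = {"max_depth": [3, 5, 7], "learning_rate": [0.01, 0.1, 0.3]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The cost grows with the product of the grid sizes (nine evaluations here), which is why the candidate values per hyperparameter are usually kept few.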

Evaluation metrics and model interpretability
In this work, the predictive capability of the models was assessed using an independent testing set. As shown in Table 2, four evaluation metrics were used to evaluate the accuracy of each regression model: mean absolute error (MAE), mean squared error (MSE), root mean square error (RMSE), and coefficient of determination (R squared, $R^2$). However, ML models generally boost their accuracy by increasing model complexity [41], which makes their inner workings opaque. Therefore, in addition to verifying the accuracy of the prediction results, it is equally important to understand why the model makes such predictions and to prevent model bias. Permutation importance and SHAP value analysis are techniques used to determine which features affect the model fit most. Permutation importance randomly shuffles the values of each feature and measures the resulting change in model performance, which overcomes the drawback of the default feature importance computed from the mean impurity decrease. The SHAP explanation method, inspired by cooperative game theory [42], builds an additive explanation model that reflects each feature's influence, including its positive or negative effect, in each sample.
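Permutation importance as described above can be sketched in a few lines: shuffle one feature column at a time and record how much the error grows. The model, data, and number of repeats below are illustrative assumptions; the toy model deliberately ignores its second feature so that the expected ranking is obvious.

```python
import random


def mse(model, rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)                      # break the link between feature j and target
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model that depends on feature 0 only."""
    return 2.0 * row[0]


rows = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.5], [0.9, 0.3]]
targets = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, targets)
print(importances)
```

Shuffling the unused feature leaves the error unchanged, so its importance is zero, while shuffling the used feature increases the error.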

Prediction results for yielding load
Figure 7 compares the tested normalized yielding loads ($F_{yt}$) and predicted normalized yielding loads ($F_{yp}$) for both the training and testing sets. The closer the data points lie to the black line, the more accurate the model's predictions. The figure also includes lines for relative errors of ±15% and ±30% in the testing set. Among the 110 data points, the RF, SVR, GB, and XGB algorithms have 96, 56, 101, and 105 samples within the ±30% limit, respectively. Figure 8 presents the evaluation coefficients MAE, MSE, RMSE, and $R^2$ for these models on the test dataset.
Based on the prediction results for the angle bracket yielding load, the $R^2$ values of the three ensemble models (RF, GB, and XGB) are all at least 0.746, indicating that the ensemble models outperform the single model (SVR) (Fig. 8). However, relying solely on $R^2$ to determine prediction accuracy may not be appropriate, and additional evaluation metrics need to be considered. In terms of overall accuracy and performance, XGB is an effective machine learning model for predicting the angle bracket yielding load.
It is noteworthy that the RF model has MAE, MSE, RMSE, and $R^2$ values of 0.0207, 0.0008, 0.0297, and 0.956 on the training set, respectively, whereas the corresponding values on the testing set are 0.0514, 0.0096, 0.0983, and 0.746. This indicates that the RF model generalizes poorly when predicting the yielding load of angle bracket connections. Higher accuracy on the training set therefore does not always translate into higher accuracy on the test set, and evaluating the prediction model on an independent testing set helps to detect and limit model overfitting.
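Counting the samples inside an error band is a one-line computation. The sketch below assumes that the relative error is measured against the tested value, which the text does not state explicitly; the sample values are made up.

```python
def count_within(tested, predicted, tolerance):
    """Number of predictions whose relative error (w.r.t. the tested value) is within tolerance."""
    return sum(1 for t, p in zip(tested, predicted)
               if abs(p - t) <= tolerance * abs(t))


# Made-up normalized loads.
tested = [0.2, 0.5, 0.8, 1.0]
predicted = [0.25, 0.45, 0.5, 1.2]
within_30 = count_within(tested, predicted, 0.30)
within_15 = count_within(tested, predicted, 0.15)
print(within_30, within_15)
```

Here the relative errors are 25%, 10%, 37.5%, and 20%, so three samples fall within ±30% and one within ±15%.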
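The four metrics can be computed with their standard definitions, sketched below on made-up values. The function name is illustrative, and the definitions are the conventional ones rather than a transcription of Table 2.

```python
import math


def regression_metrics(y_true, y_pred):
    """Standard MAE, MSE, RMSE, and R^2 for paired targets and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Made-up normalized values.
metrics = regression_metrics([0.0, 0.5, 1.0], [0.1, 0.5, 0.8])
print(metrics)
```

Because RMSE is the square root of MSE, reported pairs can be cross-checked: for the RF testing set, the square root of 0.0096 is about 0.098, which agrees with the reported RMSE of 0.0983 up to rounding.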

Prediction results for maximum load
The tested normalized maximum load ($F_{mt}$) and predicted normalized maximum load ($F_{mp}$) for the training and testing sets under the various ML methods are displayed in Fig. 9. The figure also presents lines representing relative errors of ±15% and ±30% in the testing set. Among the 110 data points, the RF, SVR, GB, and XGB algorithms have 97, 58, 89, and 103 data points within the ±30% limit, respectively. The distribution of samples indicates that the predictions of RF, SVR, and GB have considerable errors when the actual values are low, while the overall prediction accuracy of the XGB algorithm is high.
In addition, Fig. 10 shows the evaluation metrics, including MAE, MSE, RMSE, and $R^2$, of these models on the testing set. The $R^2$ values of the trained regression models (i.e., RF, SVR, GB, and XGB) are 0.872, 0.839, 0.846, and 0.939, indicating good prediction results [43]. It should be noted that the MAE of the XGB model (0.0362) is significantly better than that of the SVR model (0.0677) because the maximum load values have been normalized to a small range. Based on the predicted outputs in Fig. 9 and the generalization metrics in Fig. 10, the XGB model predicts the maximum load of angle bracket connections more accurately than the other three methods, with MSE and RMSE values of 0.0024 and 0.0491.

Table 2 Evaluation metrics for the regression model
Evaluation metrics Definition
$y$ and $\hat{y}$ are the target value and the predicted value for the normalized test dataset, respectively; $n$ is the number of samples in the test dataset; and $y_i$ and $\hat{y}_i$ refer to the $i$-th target value and the $i$-th predicted value for the normalized test dataset, respectively

Prediction results for initial stiffness
The actual normalized initial stiffness ($K_{et}$) and the predicted normalized initial stiffness ($K_{ep}$) of the training and testing sets under the various ML methods are displayed in Fig. 11. The figure also presents straight lines indicating relative errors of ±15% and ±30% in the testing set. Among the 110 data points, the RF, SVR, GB, and XGB algorithms have 85, 57, 94, and 96 data points within the ±30% limit, respectively. The distribution of data points reveals that the XGB method has a more robust overall prediction accuracy, while the SVR predictions have larger errors when the actual values are low. Moreover, Fig. 12 shows the performance metrics MAE, MSE, RMSE, and $R^2$ of these models on the test dataset. When predicting the initial stiffness of the connection, the MSE and RMSE of the RF algorithm are 0.0164 and 0.1283, and those of the GB algorithm are 0.0148 and 0.1217, which are significantly inferior to those of the SVR and XGB algorithms. The $R^2$ values of the four regression models (i.e., RF, SVR, GB, and XGB) are 0.642, 0.602, 0.678, and 0.809, respectively. The XGB algorithm has far better generalization performance than the other three algorithms. Therefore, among these four regression models, the XGB algorithm performs best for predicting the initial stiffness of the angle bracket connection.

Prediction results for ductility ratio
Figure 13 depicts the predicted normalized ductility ratio ($D_p$) and the tested normalized ductility ratio ($D_t$) of the training and testing sets. The ±15% and ±30% relative error lines of the testing set are also shown in the figure, with the RF, SVR, GB, and XGB algorithms having 81, 78, 81, and 87 data points that fall within the ±30% limit, respectively. The distribution of data points shows that the SVR and GB models have larger prediction errors when the actual values are low, while the overall prediction accuracy of the XGB algorithm is higher.
Figure 14 shows the evaluation metrics MAE, MSE, RMSE, and $R^2$ of these models on the testing set. When predicting the ductility ratio of the angle bracket connection, the $R^2$ value of the single model (SVR) is 0.623, which is better than that of the ensemble models (RF, GB, XGB) in terms of generalization performance. Based on the overall performance and precision, SVR is a valuable ML model for estimating the ductility ratio of angle bracket connections.

Interpretability of prediction model
The XGB model was used to conduct permutation importance and SHAP analyses of the predicted mechanical shear properties of angle bracket connections, based on the prediction models in the previous section. The analysis results are shown in Figs. 15 and 16. The permutation importance results show that the width of the angle bracket connector and the number of screws connecting it to the wall panel have the greatest impact on the maximum load and initial stiffness of the connection, with importance coefficients of 44.5% and 24.5%, respectively. Furthermore, the number of bottom anchoring devices and the thickness of the angle bracket have the most significant impact on the yielding load and ductility ratio. Based on the SHAP analysis results (Fig. 16), the number of fasteners used to connect the angle bracket to the wall panel has a very large influence on the yielding load of the connection. The width of the angle bracket is found to have the highest sensitivity with regard to the maximum load and initial stiffness. Additionally, the thickness of the angle bracket connector is observed to have the greatest effect on the ductility ratio. Comparing the SHAP values of the various parameters shows that, for the maximum load and initial stiffness, the number of self-tapping screws, the width of the angle bracket, and the number of bolts have a more significant influence than the other parameters. Regarding the maximum load of the connector, the width of the angle bracket has a greater influence than its length and thickness. However, for the ductility ratio, the length and thickness of the angle bracket have a more substantial effect than its width.
It is crucial to note that permutation importance evaluates the impact of a feature on model performance by randomly shuffling the feature values, while the core idea of SHAP is to calculate the marginal contribution of a feature to the model output. When predicting the yielding load of the angle bracket connection, a comparison of the two methods reveals a considerable difference in the impact attributed to the thickness of the angle bracket and to the number of self-tapping screws connecting it to the wall panel. Because permutation feature importance determines the importance of a feature mainly by measuring the model prediction error under a perturbation of that single feature, it cannot account for the correlation between factors. In addition, the thickness of the angle bracket connection in the database is relatively fixed, so the change in model error is smaller when this feature is perturbed. As a post hoc explanation method for models, SHAP analysis provides local and global explanations for the "black box". According to the SHAP analysis, the thickness of the angle bracket connection affects the yielding load less than the number of screws used to attach it to the wall panel. Self-tapping screws carry shear forces when the angle bracket connection is subjected to shear, so the number of self-tapping screws influences the yielding load of the angle bracket connection more strongly than its thickness does.

Fig. 11 Comparison of tested initial stiffness and predicted initial stiffness of different algorithms
Fig. 12 Evaluation metrics of various ML models for predicting initial stiffness
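The marginal-contribution idea behind SHAP can be made concrete with an exact Shapley value computation on a tiny model. This is a sketch under explicit assumptions, not the SHAP library's algorithm: features that are "absent" from a coalition are replaced by a fixed baseline value, all coalitions are enumerated (feasible only for a handful of features), and the model and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; absent features take their baseline value (assumption)."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi += weight * (value(coalition | {i}) - value(coalition))
        phis.append(phi)
    return phis


def toy_model(f):
    """Hypothetical model with an interaction between features 0 and 2."""
    return 3.0 * f[0] + 1.0 * f[1] + 2.0 * f[0] * f[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The interaction term is shared equally between features 0 and 2, and the contributions add up to the difference between the prediction for `x` and the prediction for the baseline, which is the additive property that SHAP explanations rely on.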

Discussion
In previous studies on the mechanical properties of angle bracket connections, experimental or numerical modeling analysis methods were typically used. However, the impact of each feature on the mechanical properties was difficult to quantify, and these methods incurred high computational time and cost. This work demonstrated that it is feasible to develop a prediction model for the mechanical shear properties of angle bracket connectors using regression algorithms. When performing regression predictions of the yielding load, maximum load, and initial stiffness, ML showed good generalization performance. However, when predicting the ductility ratio, the best model achieved an $R^2$ of 0.623, indicating significant room for improvement, likely due to limited sample diversity in the current dataset. In addition, owing to the limitations of the dataset, the type of steel used for the angle brackets, anchors, and bolts and the type of wood used for the CLT panels were not considered; nevertheless, this study confirms the feasibility of the prediction method by analyzing the available data. Expanding the dataset in future studies can therefore be expected to improve accuracy.
In the context of this study, the principal objective was predicting the mechanical performance of angle brackets, and the algorithms employed were unsuitable

Conclusions
This study was based on a database containing 110 sets of angle bracket shear test data and used ML methods to establish predictive models for the yielding load, maximum load, initial stiffness, and ductility ratio of angle bracket connections under shear. The generalization performance and prediction accuracy of different ML methods were analyzed and compared, and the interpretability of the ML methods was studied. The results of this study show that:
1. The XGB algorithm has the highest accuracy in predicting the yielding load and initial stiffness of angle bracket connections, with $R^2$ values of 0.969 and 0.809. In addition, higher accuracy on the training dataset does not automatically imply higher accuracy on the test dataset. Evaluating the predictive models with an independent test set helps to detect and limit model overfitting.
2. The RF, SVR, GB, and XGB algorithms perform well in predicting the maximum load of angle bracket connections, with evaluation coefficients MSE smaller than 0.068 and $R^2$ greater than 0.830.
3. A single model (SVR) has better generalization performance than the ensemble models (RF, GB, XGB) in predicting the ductility ratio of angle bracket connections and is an effective machine learning model for predicting the ductility ratio.

Fig. 4 Statistical distribution of input and output variables, depicting minimum (Min), maximum (Max), mean (Mean), and standard deviation (St.Dev) values for a comprehensive overview of the dataset's characteristics

Fig. 6 Diagram of ML algorithm. In a, c, and d, the blue dots represent the root nodes of the decision tree, initiating branching based on specific conditions. These branches, termed 'directed edges' in decision tree terminology, are visualized as one-way arrows. Branches from root nodes lead to internal nodes, represented by yellow dots, and subsequently to the next level of internal nodes (green dots in d) until a stopping condition is met. The final level of nodes, denoted by red dots, represents terminal nodes

Fig. 7 Comparison of tested yielding load and predicted yielding load of different algorithms

Fig. 9 Comparison of tested maximum load and predicted maximum load of different algorithms

Fig. 13 Comparison of tested ductility ratio and predicted ductility ratio of different algorithms

Fig. 15 Relative importance of input features

Table 1 Hyperparameter optimization for prediction models