### Grey relational analysis

GRA first brings the data into a comparable range and then calculates the grey relational grade between parameters from the change trend of each parameter's data [25]. Because the parameters have different dimensions (units), their raw values differ greatly in absolute magnitude, so the data must be normalized before GRA. Min–max normalization and mean normalization are the methods usually used in GRA [26]. This study used mean normalization, which divides each value by the mean of its series, as shown in the following equation:

$$f(x(k)) = \frac{x(k)}{{\overline{x} }} = y(k),\quad \overline{x} = \frac{1}{n}\sum\limits_{k = 1}^{n} {x(k)}$$

(1)

where *x*(*k*) is the *k*^{th} value of a parameter series, \(\overline{x}\) is the mean of that series over its *n* values, and *y*(*k*) is the normalized value. The grey relational coefficient is then calculated as shown in the following equation:

$$\zeta_{i} (k) = \frac{{\min_{i} \min_{k} \left| {x_{0} (k) - x_{i} (k)} \right| + \rho \cdot \max_{i} \max_{k} \left| {x_{0} (k) - x_{i} (k)} \right|}}{{\left| {x_{0} (k) - x_{i} (k)} \right| + \rho \cdot \max_{i} \max_{k} \left| {x_{0} (k) - x_{i} (k)} \right|}}$$

(2)

where \(x_{0} (k)\) is the normalized reference (target) series, \(x_{i} (k)\) is the *i*^{th} normalized comparison series, and \(\rho \in (0, 1)\) is the distinguishing (resolution) coefficient. The smaller *ρ* is, the stronger the discrimination between the coefficients; *ρ* = 0.5 is typically used [27].
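
The following sketch, written in Python with NumPy, applies Eqs. 1 and 2 to one target series and several candidate series. It assumes that the grey relational grade of each candidate is the mean of its coefficients over all *k*, \(r_{i} = \frac{1}{n}\sum_{k=1}^{n}\zeta_{i}(k)\), which is the usual convention but is not written out above; the function names are illustrative only.

```python
import numpy as np


def mean_normalize(series):
    """Eq. (1): divide every value by the mean of its own series."""
    series = np.asarray(series, dtype=float)
    return series / series.mean()


def grey_relational_grades(target, candidates, rho=0.5):
    """Grey relational grade of each candidate series with respect to the target.

    target:     1-D sequence, the reference series x0(k)
    candidates: 2-D sequence, one comparison series xi(k) per row
    rho:        distinguishing coefficient, 0.5 by convention
    """
    y0 = mean_normalize(target)
    yi = np.array([mean_normalize(row) for row in candidates])
    delta = np.abs(yi - y0)                  # |x0(k) - xi(k)| for every i and k
    d_min, d_max = delta.min(), delta.max()  # two-level min and max over i and k
    # Eq. (2); assumes at least one nonzero difference, otherwise d_max is 0.
    zeta = (d_min + rho * d_max) / (delta + rho * d_max)
    # Assumed convention: the grade is the mean coefficient over k.
    return zeta.mean(axis=1)
```

A candidate whose normalized series follows the target exactly receives a grade of 1; the further its trend departs from the target's, the lower its grade.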

### Support vector regression algorithm

SVR performs well on small-sample, nonlinear and high-dimensional pattern-recognition problems [28], which makes it well suited to predicting the IB of PB from particle gluing production parameters.

Given a training sample set \(T = \{ (\mathbf{x}_{1}, y_{1}), (\mathbf{x}_{2}, y_{2}), \ldots, (\mathbf{x}_{n}, y_{n}) \}\), where \(\mathbf{x}_{i} \in R^{m}\) is the *i*^{th} input sample vector with *m* features, \(y_{i} \in R\) is the corresponding output, and *n* is the number of training samples, the regression function is given by the following equation:

$$f(x) = \omega^{T} \varphi (x) + b$$

(3)

where \(\omega\) is the weight vector, \(\varphi (x)\) is the nonlinear mapping into the feature space, and *b* is the bias; \(\omega\) and *b* are chosen to minimize the structural risk of the regression function. Introducing the slack variables \(\xi_{i} \ge 0\) and \(\xi_{i}^{*} \ge 0\) [29], the SVR problem is formulated as Eqs. 4–5:

$$\min_{\omega ,b,\xi ,\xi^{*}} \frac{1}{2}\left\| \omega \right\|^{2} + C\sum\limits_{i = 1}^{n} (\xi_{i} + \xi_{i}^{*})$$

(4)

$$s.t.\left\{ \begin{gathered} y_{i} - \omega^{T} \varphi (x_{i}) - b \le \varepsilon + \xi_{i} \hfill \\ \omega^{T} \varphi (x_{i}) + b - y_{i} \le \varepsilon + \xi_{i}^{*} \hfill \\ \xi_{i} ,\xi_{i}^{*} \ge 0,\quad i = 1,2, \ldots ,n \hfill \\ \end{gathered} \right.$$

(5)

where *C* > 0 is the penalty factor, which sets the trade-off between the flatness (smoothness) of the model and its empirical error, and *ε* is a prescribed parameter that sets the width of the insensitive zone, so that deviations smaller than *ε* are not penalized. This constrained optimization problem is solved through its Lagrangian dual. By introducing the Lagrange multipliers \(a_{i}\) and \(a_{i}^{*}\) [30], the regression function is obtained from the dual solution as shown by Eq. 6,

$$f(x) = \omega^{T} \varphi (x) + b = \sum\limits_{i = 1}^{n} {(a_{i} - a_{i}^{ * } )} [\varphi (x_{i} )^{T} \varphi (x)] + b$$

(6)

Replacing the inner product \(\varphi (x_{i})^{T} \varphi (x)\) with the kernel function \(K(x_{i}, x)\) gives Eq. 7,

$$f(x) = \sum\limits_{i = 1}^{n} {(a_{i} - a_{i}^{ * } )} K(x_{i} ,x) + b$$

(7)

Because the Gaussian radial basis function (RBF) kernel generalizes well, captures nonlinear relationships and has few parameters to tune [31], it was selected as the kernel function in this paper. The RBF kernel is given by Eq. 8,

$$K(x_{i} ,x_{j} ) = \exp \left( {\frac{{ - \left\| {x_{i} - x_{j} } \right\|^{2} }}{{2\sigma^{2} }}} \right)$$

(8)
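
To make Eqs. 7 and 8 concrete, the sketch below evaluates the dual-form regression function \(f(x) = \sum_{i}(a_{i} - a_{i}^{*})K(x_{i}, x) + b\) with the Gaussian RBF kernel in its standard form, which uses the squared Euclidean distance. In practice the support vectors, the differences \(a_{i} - a_{i}^{*}\), *b* and *σ* come from a trained model; the values in the usage line are hypothetical.

```python
import numpy as np


def rbf_kernel(xi, xj, sigma):
    """Gaussian RBF kernel: exp(-||xi - xj||^2 / (2 sigma^2))."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, dual_coefs, b, sigma):
    """Dual-form SVR output: sum_i (a_i - a_i*) K(x_i, x) + b.

    dual_coefs holds the differences (a_i - a_i*) of a trained model.
    """
    return sum(c * rbf_kernel(sv, x, sigma)
               for sv, c in zip(support_vectors, dual_coefs)) + b


# Hypothetical support vectors and coefficients, for illustration only.
f0 = svr_predict([0.0, 0.0], [[0.0, 0.0], [1.0, 1.0]], [0.5, -0.25], b=0.1, sigma=1.0)
```

Only samples with nonzero \(a_{i} - a_{i}^{*}\) (the support vectors) contribute to the sum, which is why a trained SVR stores just those samples.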

To improve the prediction accuracy of the model, the penalty coefficient *C* and the width *σ* of the Gaussian RBF kernel must be optimized [32]. In this paper, a grid search with 5-fold cross-validation was performed on the training-set samples.

### The GRA–SVR prediction model for IB

#### Step 1: GRA correlation analysis

GRA was used to calculate the grey relational grade between IB and each of \(f_{core}\), \(f_{surface}\), \(v_{core}\), \(v_{surface}\), \(p_{core}\), \(p_{surface}\), \(I_{core}\) and \(I_{surface}\), and the variables with low grey relational grades were screened out.
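
The screening in Step 1 amounts to ranking the candidate variables by grade and discarding those below a cut-off. The text does not state the cut-off, so `threshold` in this sketch is a hypothetical parameter, and the grades are assumed to come from the GRA calculation of Eq. 2.

```python
def screen_parameters(grades, threshold):
    """Split parameters into kept and dropped lists by grey relational grade.

    grades:    dict mapping each parameter name to its grade with respect to IB
    threshold: hypothetical cut-off; parameters below it are screened out
    """
    # Rank from the strongest to the weakest relation with IB.
    ranked = sorted(grades.items(), key=lambda kv: kv[1], reverse=True)
    kept = [name for name, g in ranked if g >= threshold]
    dropped = [name for name, g in ranked if g < threshold]
    return kept, dropped
```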

#### Step 2: Normalization of sample data

Min–max normalization scales each variable to the interval [0, 1], removing its units and turning it into a dimensionless value, so that variables measured in different units remain on a comparable scale during training of the SVR model. Each value *x* is rescaled with the minimum \(x_{\min}\) and maximum \(x_{\max}\) of its variable:

$$X_{changed} = \frac{{x - x_{\min } }}{{x_{\max } - x_{\min } }}$$

(9)
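
A minimal NumPy sketch of Eq. 9 follows. It assumes, as is common practice though not stated in the text, that \(x_{\min}\) and \(x_{\max}\) are taken from the training set and reused unchanged for the testing set, so scaled testing values may fall slightly outside [0, 1].

```python
import numpy as np


def fit_min_max(X_train):
    """Per-column minima and maxima, computed on the training data only."""
    X_train = np.asarray(X_train, dtype=float)
    return X_train.min(axis=0), X_train.max(axis=0)


def min_max_scale(X, x_min, x_max):
    """Eq. (9) applied column by column: (x - x_min) / (x_max - x_min)."""
    X = np.asarray(X, dtype=float)
    # Guard against constant columns, whose range would otherwise be zero.
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span
```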

#### Step 3: Grid-search parameter optimization with K-fold cross-validation

Grid search exhaustively evaluates every candidate combination of the penalty coefficient *C* and the Gaussian RBF kernel width *σ*. For each combination, K-fold cross-validation divides the training set into *K* subsets; each subset serves once as the validation set while the remaining *K* − 1 subsets are used for training, so the model is trained and validated *K* times. The combination of *C* and *σ* that gives the lowest mean squared error averaged over the *K* folds is selected as the optimal parameters.
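
The procedure can be sketched with scikit-learn's `SVR` and `KFold`, writing the search loop out by hand so that each step described above is visible. Two assumptions should be noted: scikit-learn writes the RBF kernel as \(\exp(-\gamma\|x_{i} - x_{j}\|^{2})\), so *σ* is converted with \(\gamma = 1/(2\sigma^{2})\), and *ε* is left at the library default because the text does not report it. The function name and the candidate grids are placeholders.

```python
import itertools

import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVR


def grid_search_cv(X, y, C_grid, sigma_grid, k=5, seed=0):
    """Return (best C, best sigma, mean validation MSE) over a (C, sigma) grid."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    folds = KFold(n_splits=k, shuffle=True, random_state=seed)
    best = (None, None, np.inf)
    for C, sigma in itertools.product(C_grid, sigma_grid):
        # scikit-learn's RBF uses gamma; gamma = 1 / (2 sigma^2) matches the sigma form.
        gamma = 1.0 / (2.0 * sigma ** 2)
        fold_mse = []
        for train_idx, val_idx in folds.split(X):
            # Each fold is the validation set once; the other k - 1 folds train the model.
            model = SVR(kernel="rbf", C=C, gamma=gamma)
            model.fit(X[train_idx], y[train_idx])
            residual = y[val_idx] - model.predict(X[val_idx])
            fold_mse.append(np.mean(residual ** 2))
        mean_mse = float(np.mean(fold_mse))
        if mean_mse < best[2]:
            best = (C, sigma, mean_mse)
    return best
```

`sklearn.model_selection.GridSearchCV` with `scoring="neg_mean_squared_error"` performs the same kind of search in fewer lines; the explicit loop is shown only so that the fold-by-fold logic is easy to follow.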

#### Step 4: Constructing SVR nonlinear prediction model

The SVR prediction model for IB was built with the optimal parameters determined in Step 3. The inputs of the training set and the testing set were then fed to the model, the predicted values were compared with the experimental values, and the deviation of the model predictions was analyzed. The structure of the GRA–SVR prediction model is shown in Fig. 4.

#### Step 5: Evaluation of GRA–SVR model

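The mean absolute error (MAE), mean relative error (MRE), root mean square error (RMSE) and Theil inequality coefficient (TIC) were used to quantify how closely the predicted values of the GRA–SVR model agree with the experimental values, and thus to evaluate the prediction performance of the model. In Eqs. 10–13, \(y_{i}\) is the experimental value, \(\hat{y}_{i}\) is the predicted value and *n* is the number of samples; for all four metrics, smaller values indicate better agreement.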

$${\text{MAE}} = \frac{{\sum\limits_{i = 1}^{n} {\left| {y_{i} - \hat{y}_{i} } \right|} }}{n}$$

(10)

$${\text{MRE}} = \frac{{\sum\limits_{i = 1}^{n} {\left| {\frac{{y_{i} - \hat{y}_{i} }}{{y_{i} }}} \right|} }}{n}$$

(11)

$${\text{RMSE}} = \sqrt {\frac{{\sum\limits_{i = 1}^{n} {(y_{i} - \hat{y}_{i} )^{2} } }}{n}}$$

(12)

$${\text{TIC}} = \frac{{\sqrt {\sum\limits_{i = 1}^{n} {(y_{i} - \hat{y}_{i} )^{2} } } }}{{\sqrt {\sum\limits_{i = 1}^{n} {(y_{i} )^{2} } } + \sqrt {\sum\limits_{i = 1}^{n} {(\hat{y}_{i} )^{2} } } }}$$

(13)
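
Eqs. 10–13 translate directly into NumPy, as in the sketch below, where `y_true` holds the experimental values \(y_{i}\) and `y_pred` the predicted values \(\hat{y}_{i}\). Because MRE divides by \(y_{i}\), the sketch assumes that no experimental value is zero.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """MAE, MRE, RMSE and TIC as defined in Eqs. (10)-(13)."""
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    err = y - y_hat
    mae = np.mean(np.abs(err))
    mre = np.mean(np.abs(err / y))          # requires every y_i != 0
    rmse = np.sqrt(np.mean(err ** 2))
    tic = np.sqrt(np.sum(err ** 2)) / (np.sqrt(np.sum(y ** 2)) + np.sqrt(np.sum(y_hat ** 2)))
    return {"MAE": mae, "MRE": mre, "RMSE": rmse, "TIC": tic}
```

TIC lies between 0 and 1, with 0 meaning the predictions match the experimental values exactly.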