Federated Transfer Learning-Based Paper Breakage Fault Diagnosis

Yu, Xiaoru; Chen, Guojian; Zeng, Xianyi; He, Zhenglei

doi:10.70322/amsm.2024.10009

Part of Special Issue: AI-based Sustainable Smart Industrial Systems

Article Open Access

Xiaoru Yu ¹, Guojian Chen ¹, Xianyi Zeng ², Zhenglei He ¹,*

¹ State Key Laboratory of Pulp and Paper Engineering, South China University of Technology, Guangzhou 510640, China

² GEMTEX–Laboratoire de Génie et Matériaux Textiles, ENSAIT, University of Lille, F-59000 Lille, France

* Authors to whom correspondence should be addressed.

Received: 28 August 2024 Accepted: 16 October 2024 Published: 23 October 2024

Adv. Mat. Sustain. Manuf. 2024, 1(2), 10009; DOI: 10.70322/amsm.2024.10009

ABSTRACT: The diagnosis of paper breakage faults during the papermaking process is of great significance for improving product quality and maintaining stability in the production process. This study proposes a cross-condition fault diagnosis method based on transfer learning to address the poor performance of single-condition diagnostic models when they are applied to different conditions. The method uses both parameter transfer and feature transfer to diagnose faults across different conditions. In addition, to address the shortage of data under small-sample operating conditions, we introduce federated learning and explore the impact of the model compression rate on the diagnostic accuracy of the federated global model during training. The results indicate that, compared to single operating condition models, fault diagnosis performance based on transfer learning across different operating conditions has improved. The diagnostic model based on feature transfer performs even better, achieving accuracy rates of 98.31%, 94.64%, and 96.43% under different transfer tasks, allowing for accurate classification of the majority of samples. Additionally, the federated learning method provides an effective solution for fault diagnosis in small sample operating conditions, and an appropriate model compression rate can ensure diagnostic accuracy while protecting data privacy.

Keywords: Paper industry; Fault diagnosis; Deep learning; Transfer learning

1. Introduction

In recent years, paper has continued to play an irreplaceable role across various industries, with paper products widely used in daily life, education, and technological development [1]. As papermaking technology and equipment advance, the efficiency of paper production has improved [2]. However, issues such as surface defects and sheet breaks can arise during paper production due to factors like pulp properties, chemical use, process conditions, and equipment status [3]. These faults can severely impact the yield and economic benefits of paper production. This paper establishes a fault diagnosis model based on transfer learning. It uses a systematic approach to quickly locate and analyze faults when they occur, thereby reducing their impact on production and enhancing stability. Household paper must be lightweight, soft, and fluffy, so its overall basis weight is relatively low [4]. Additionally, the creping process in the dryer increases the risk of sheet breaks. Such breaks can severely disrupt the papermaking process, leading to unplanned downtime, decreased efficiency, and potential additional damage to key equipment, which shortens its lifespan. Furthermore, paper breaks can affect the continuity of paper rolls and the uniformity and strength of the paper, reducing the quality of the final paper product. Traditionally, diagnosing paper break faults has depended on the experience and expertise of engineers. While this method is effective, its reliability and applicability are limited in the face of complex and variable production environments [5]. With technological advancements, fault diagnosis has shifted from traditional experience-based methods to intelligent diagnostics that rely on advanced data analysis and machine learning technologies [6].
The core tasks of fault diagnosis are detecting, identifying, and locating faults in the system, determining whether a fault has occurred, pinpointing its exact location, and analyzing its causes [7]. Currently, the main methods for fault diagnosis are grouped into three categories: model-based methods, knowledge-based methods, and data-based methods [8]. Model-based diagnostic methods require a thorough understanding of the intrinsic mechanisms of the system under study and this understanding can be used to construct a mechanistic model of the relevant processes [9]. Common model-based methods include state estimation, parameter estimation, and equivalent space methods. The effectiveness of model-based methods for fault diagnosis depends on understanding and analyzing the actual industrial processes’ mechanisms. When the system’s mechanisms are complex, building accurate models becomes challenging, which limits the application of model-based methods in industrial processes [10]. Knowledge-based methods include expert systems, fuzzy logic, and graph theory. Expert systems leverage expert knowledge to provide good explanations for faults in production and equipment operation. Still, the accuracy of fault diagnosis is directly affected by the level and experience of different experts [11]. Fuzzy logic methods are relatively simple, robust to parameter variations, and can quickly respond to changes in system states [12]. However, the performance of fuzzy logic systems is heavily influenced by the fuzzy rules, which can impact the system construction process. Directed graphs offer advantages in fault diagnosis and detection by allowing for early inference of faults, explanation of fault paths, and simultaneous analysis of fault causes [13]. Cao et al. [14] combined binary decision diagrams with fault tree analysis to avoid the combinatorial explosion problem that can occur during fault diagnosis. 
The fault tree quantitative analysis method can provide probabilities of occurrence at different times. By optimizing the intelligent diagnostic process using expert systems, better fault diagnosis for hot forging presses can be achieved. Graph theory methods use logical causal relationships to determine system faults, resulting in outcomes that are easy to understand and have broad applicability. However, for more complex systems, ensuring accuracy can be challenging. With the continuous advancement of information technology and data science, industrial systems and production processes are becoming increasingly complex [15]. Traditional manual experience and mechanistic analysis are no longer sufficient to extract and analyze the implicit information in data. Data-driven methods primarily rely on data analysis for fault diagnosis, including statistical methods, signal processing, and machine learning methods [16]. Machine learning theory applied to machine fault diagnosis mainly involves adaptively learning diagnostic knowledge from collected data rather than relying on manual expertise. It builds diagnostic models to automatically establish relationships between the collected data and machine states [17]. Unsupervised feature learning automatically extracts features from raw data. Due to its adaptive learning nature, it can more effectively capture data characteristics and achieve higher diagnostic accuracy. Autoencoders (AE), due to their structural features, have been widely used in feature extraction and fault diagnosis [18]. As an effective machine learning technique, transfer learning enhances learning performance in the target domain by leveraging knowledge from the source domain. This approach offers new solutions for fault diagnosis, including applications like paper break detection [19].
In industrial processes, especially in fault diagnosis, data imbalance is common, with fault occurrences being much rarer than normal operations, leading to scarce fault data [20]. Transfer learning can enhance model generalization and diagnostic accuracy by effectively utilizing source domain data when target domain data is limited or poorly labeled [21]. This approach does not rely on large amounts of labeled data for training, making fault diagnosis methods more applicable in real-world industrial scenarios [22]. Transfer learning can effectively utilize existing data and models in the fault diagnosis process, enhancing the accuracy and efficiency of fault detection. Compared to traditional model-driven methods, transfer learning can operate in the target domain with relatively fewer labeled data by leveraging data from the source domain for knowledge transfer, thereby reducing reliance on labeled data. Fine-tuning pre-trained models enables rapid adaptation to new environments and fault patterns, enhancing flexibility. Currently, several feasible online detection solutions for paper breakage faults have been proposed and applied. For example, infrared photoelectric sensors are used to continuously monitor the paper web, where real-time signals from the sensors help determine if a paper breakage fault has occurred at various positions [23]. Some studies have also combined video monitoring systems with information technology to accurately monitor the operational status of paper machines and detect and locate faults on the production site. The application of digital measurement technology allows online weight measurement during paper breakage events, capturing signals such as sizing flow, sizing concentration, and paper machine speed. By measuring the input weight of the product, this method compensates for the inability of weight sensors in quality control systems to measure during paper breakage, assisting paper mills in decision-making. 
These methods mainly focus on detecting paper breakage faults, but there is a lack of analysis regarding the causes of the faults. This study focuses on the paper machines used for household paper production, analyzing paper breakage faults based on historical production data from paper mills. The actual production process is subdivided into different operating conditions according to key process parameters. By combining mechanism analysis with data analysis, a diagnostic model is constructed. Further generalization of this diagnostic model enables its application to a wider range of production processes and practical scenarios. The paper aims to investigate a fault diagnosis method for paper breakage based on transfer learning. First, it introduces the background of paper breakage fault diagnosis and the existing research challenges. Next, it provides a detailed explanation of the basic principles and current applications of transfer learning in the machine learning field. Finally, it proposes a transfer learning-based framework for paper breakage fault diagnosis and verifies its potential to improve diagnostic performance through experimental results. This research aims to provide new insights and methods for fault diagnosis in the paper manufacturing industry, thereby enhancing production efficiency and equipment utilization and reducing losses caused by faults.

2. Experiments

2.1. Data

In papermaking, parameters such as paper product basis weight, sizing flow rate, and machine speed vary. This paper analyzes key process parameters of the papermaking machine, classifies different production processes based on basis weight set values, and merges similar operating conditions according to other key process variables to facilitate subsequent modeling analysis. For the papermaking process, the main difference between operating conditions lies in the production process, and different production processes correspond to different process parameters. This study segments the collected dataset based on variations in basis weight set values, resulting in multiple time series data subsets, and analyzes the relationships between basis weight and other variables that strongly influence the production process. After segmenting the operating condition data, the study classifies the conditions based on three parameters: basis weight, machine speed, and long fiber ratio. Seven distinct production conditions are identified by splitting and reorganizing the existing dataset. The three conditions with the largest amount of data are designated as Condition A, Condition B, and Condition C, respectively, for data preparation in subsequent modeling.

2.2. Modeling System

2.2.1. Model Building

In the paper production process, changes in papermaking machine parameters can significantly impact the production process. To explore the relationship between paper breakage faults and operational parameters and their variations, and to quickly locate the fault, actual production data is obtained and processed. Based on fault analysis mechanisms and experience, unsupervised clustering methods are used to categorize faults and identify different types of paper breakage issues. Since process parameters vary across different production conditions, different algorithms are used to build and train fault detection models for each production state.
Various evaluation metrics are employed to assess the effectiveness of the models. For high-dimensional data, different clustering methods can be chosen based on the scenario, including partition clustering, density clustering, distribution clustering, and hierarchical clustering [24]. K-Means clustering is simple, easy to implement, and converges quickly, but it is sensitive to outliers, which can affect the clustering results. It aims to partition the dataset into K non-overlapping subsets, minimizing the sum of distances between data points and their corresponding cluster centers [25]. Based on probability density, the Gaussian Mixture Model (GMM) performs better with multidimensional data. Using both K-Means and GMM for clustering allows the results to be cross-checked, improving the reliability of the clustering outcome. To determine the optimal number of clusters, this study employs both methods and compares their results for a comprehensive evaluation. In analyzing the papermaking process, it is crucial to initially determine the operating status of the papermaking machine based on paper break signals. Given the diversity of paper break fault types, fault diagnosis requires classifying these various fault types. Commonly employed fault classification models include Logistic Regression (LR) [26], Support Vector Machines (SVM) [27], Random Forest (RF) [28], and the Softmax classifier. Logistic regression fits model parameters through maximum likelihood estimation. Given the training data, optimization algorithms like gradient descent iteratively adjust the model parameters to minimize the discrepancy between the model’s predicted values and the actual observed values [29]. Support Vector Machines (SVMs) perform exceptionally well in classification tasks. The objective is to find an optimal hyperplane that separates data from different classes while maximizing the margin between the classes.
Additionally, it seeks to minimize the number of misclassified points [30]. Random Forest is an ensemble method that enhances model performance and robustness by constructing multiple decision trees [31]. A decision tree is a machine learning model based on a tree structure used for classification and regression analysis. It builds a tree-like decision diagram by recursively applying binary splits to the dataset. The construction process involves selecting the optimal features for splitting until a predefined stopping criterion is met. Its advantages include minimal preprocessing requirements, ease of understanding and interpretation, and the ability to handle both numerical and categorical data [32]. An autoencoder is an unsupervised learning model that compresses data into a lower-dimensional representation and reconstructs the input from this compressed form [33]. In the training process of an autoencoder, the objective is to minimize the reconstruction error and derive a lower-dimensional representation of the data, thereby facilitating feature extraction from high-dimensional inputs. For multi-class classification problems, the Softmax classifier is frequently employed. This classifier produces probabilistic outputs by applying the Softmax function, which exponentiates and normalizes the raw scores for each class. This normalization maps the class scores to a probability distribution in which each value lies in the range [0, 1] and the values sum to 1, reflecting the likelihood of each class. The computation of these probabilities is expressed in Equation (1) [34].

```latex
f(x_i)=\frac{e^{x_i}}{\sum_{n=1}^{N}e^{x_n}},\quad i=1,2,\cdots,N
```

(1)
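As a rough numerical illustration of Equation (1), the following minimal Python sketch computes Softmax probabilities for one sample. The raw scores are hypothetical values chosen for illustration, and the maximum-score shift is a standard numerical safeguard that is not part of the paper's formula; it does not change the result.

```python
import math

def softmax(scores):
    """Map raw class scores to probabilities, following Equation (1)."""
    # Subtracting the maximum score leaves the probabilities unchanged
    # but keeps exp() from overflowing on large scores.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for five fault classes (illustrative only).
probs = softmax([2.0, 1.0, 0.1, -1.0, 0.5])
# The predicted class is the one with the highest probability.
predicted_class = probs.index(max(probs))
```

The largest raw score always receives the largest probability, so the predicted class here is the first one.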

In the equation, x_i represents the raw score for the i-th class in the input, while f(x_i) denotes the probability that the sample belongs to the i-th class. By employing both traditional machine learning methods and deep learning algorithms, effective classification and identification of various types of paper breakage faults have been achieved. Results indicate that deep learning models outperform traditional machine learning methods across multiple evaluation metrics. However, there is still room for improvement in the performance of deep learning models on certain operating condition datasets. Among the traditional machine learning methods, the random forest classifier performs better than logistic regression and support vector machines. Experimental results show that a fault diagnosis model trained on data from a specific operating condition performs poorly on faults under different operating conditions. This indicates that models trained under single operating conditions have limited generalization capabilities and cannot be directly applied to other production processes. To address this issue, new methods need to be introduced to enhance the accuracy of fault diagnosis, which is of significant importance for practical applications in paper breakage fault detection. Figure 1 shows the main research framework of the modeling section proposed in this study.

Figure 1. Main Research Framework of the Modeling Section.

2.2.2. Transfer Learning

In the production process of household paper, different specific paper products and varying production quantities involve different process parameters. Typically, research on industrial process fault diagnosis is based on having sufficient data from the papermaking process, and fault diagnosis is achieved through a comprehensive analysis of data and empirical mechanisms for different production processes [35]. However, in real industrial processes, faults are low-probability events and process conditions vary under different operating conditions, which can result in situations where sufficient data for fault diagnosis research cannot be collected under certain conditions. In cases with limited data, transfer learning offers a new approach to address diagnosis challenges. In many tasks within machine learning and deep learning, it is generally assumed that data follows the same distribution and comes from the same feature space. In reality, however, these conditions are often not fully met. Issues such as limited labeled training samples or changes in data distribution may arise. Transfer learning methods are introduced to address these problems, where deep learning can be used for feature extraction, and transfer learning can be used for knowledge transfer. Combining these two approaches, deep transfer learning methods can apply features learned in the source domain to learning tasks in the target domain. This paper combines fault diagnosis models with transfer learning theory to achieve cross-condition fault diagnosis. By extracting features and transferring model parameters trained under data-rich conditions to new conditions, the generalizability and applicability of fault diagnosis models can be enhanced. This study employs two different transfer methods: parameter transfer and feature transfer.
For fault diagnosis based on parameter transfer, the model training process includes several parts: source model training, model transfer, model fine-tuning, and model application. The fault diagnosis model based on feature transfer consists of two parts: model training on source domain data and model application testing on target domain data. Different evaluation metrics are introduced to assess the performance of the target domain test samples under various transfer tasks. The main modeling process is shown in Figure 2.

Figure 2. Research Content of Transfer Learning.

Parameter Transfer Model

To study the fault diagnosis of paper machine paper breakages under different operating conditions, this section establishes a parameter transfer model. The overall framework of the model includes a feature extraction layer and a fault classification layer [36]. The parameters of the feature extraction module are trained using source domain data, and these parameters and weights are frozen. At the same time, the classification layer’s structure parameters are adjusted using samples from the target domain. The fine-tuned model is then used for fault diagnosis under the target conditions. The general process involves training the model on the source domain, applying the model to some labeled data from the target domain for fine-tuning, and obtaining the transfer diagnosis model. In the established parameter transfer learning model, we set several key parameters to construct and train the model. The number of neurons in the model’s hidden layer is set to 16, which helps the model learn complex features. The num_classes is set to 5, as this study addresses a multi-classification problem with five different categories. We set the number of training epochs, num_epochs, to 100 to ensure the model has sufficient learning time. Finally, the batch_size is set to 4, meaning that 4 samples will be used to update the model parameters each time, which helps improve training efficiency and reduce memory consumption. The model structure is shown in Figure 3.

Figure 3. Schematic Diagram of Parameter Transfer Process.

Parameter transfer involves applying model parameters trained on a source task to another task to accelerate learning and improve performance. During the transfer, the feature extraction layers or parts of a pre-trained model are used for feature extraction in the target task, with these parameters frozen and not updated during training. Fine-tuning involves further training the model on the target task, usually with a small number of samples, to enhance performance. Despite differences in process parameters under various conditions, data features and parameters from the same machine still share similarities. By extracting common knowledge through the feature extraction network and fine-tuning the model, parameter transfer can be effectively achieved. Source domain data and a small number of target domain samples are fed into the classification network, which keeps the same network structure and parameters as previously constructed. The diagnostic framework is illustrated in Figure 4.

Figure 4. Flowchart of Parameter Transfer Fault Diagnosis.

In the source model training phase, operational data and paper break fault labels from a specific condition are utilized as source domain data for model training. In contrast, data from the target condition serves as the target domain dataset. A subset of this data is reserved for fine-tuning, with the remainder used to evaluate model performance. During the model transfer phase, parameters from the feature extraction module of the source model are transferred to the target network model to capture target domain features. The classification layer of the initialized target model is then employed for fault classification, thus constructing the parameter transfer fault diagnosis model. In the model fine-tuning phase, the feature extraction layers of the target model are frozen, and the classification layer parameters are refined using the fine-tuning samples. Finally, the fine-tuned model is applied to test samples from the target domain to assess diagnostic efficacy.
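The freeze-and-fine-tune step described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the feature extractor is reduced to a single tanh layer, the frozen weights and target samples are hypothetical stand-ins, and the network is shrunk to 3 hidden units and 2 classes (the paper uses 16 and 5) to keep the example readable.

```python
import math

def extract_features(x, frozen_weights):
    """Frozen feature extractor: one tanh layer whose weights are never updated."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in frozen_weights]

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def head_loss(head, frozen_weights, samples):
    """Mean cross-entropy of the classification layer on labeled samples."""
    loss = 0.0
    for x, label in samples:
        h = extract_features(x, frozen_weights) + [1.0]  # trailing 1.0 is the bias input
        probs = softmax([sum(w * v for w, v in zip(row, h)) for row in head])
        loss -= math.log(probs[label])
    return loss / len(samples)

def fine_tune_head(head, frozen_weights, samples, lr=0.1, num_epochs=100):
    """Update only the classification layer using per-sample gradient steps."""
    for _ in range(num_epochs):
        for x, label in samples:
            h = extract_features(x, frozen_weights) + [1.0]
            probs = softmax([sum(w * v for w, v in zip(row, h)) for row in head])
            for c, row in enumerate(head):
                # Gradient of cross-entropy w.r.t. the class-c score.
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j in range(len(row)):
                    row[j] -= lr * grad * h[j]
    return head

# Hypothetical stand-in for weights learned on the source condition.
frozen = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
# A few labeled target-condition samples: (input, fault class), illustrative only.
target = [([2.0, 0.0], 0), ([1.5, 0.2], 0), ([0.0, 2.0], 1), ([0.3, 1.8], 1)]
head = [[0.0] * 4 for _ in range(2)]  # 3 features + bias, 2 classes

before = head_loss(head, frozen, target)
fine_tune_head(head, frozen, target)
after = head_loss(head, frozen, target)
```

Only `head` changes during fine-tuning, which is why a small number of target samples can be enough. `frozen` is read but never written.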

Feature Transfer Model

The feature-based transfer method minimizes the distribution difference between the source and target domains while updating network parameters to learn domain-invariant features. To validate the feasibility of domain adaptation methods in cross-condition paper breakage fault diagnosis, the feature transfer method is integrated with the fault diagnosis model mentioned earlier to construct a feature transfer-based diagnostic model. In this study, the feature transfer model updates its parameters using 4 samples at a time during training, with the batch_size set to 4. Next, we use the resampled feature and label data, applying the train_test_split function to divide the dataset into training and testing sets, with test_size = 0.3 indicating that 30% of the data is designated for testing and the remaining 70% is used for training. This division makes it possible to evaluate the model’s performance on unseen data and to detect overfitting. During training, num_epochs is set to 100, and in each epoch, the model enters training mode, iterating through the entire target dataset in batches. For each batch, features and labels are extracted from the source and target datasets. The optimizer’s gradients are then cleared to prepare for calculating new gradients. Subsequently, the model propagates forward on the source and target data to generate outputs. The Maximum Mean Discrepancy (MMD) loss is calculated to compare the distributions of the source and target outputs, while the cross-entropy loss function is used to assess the classification performance of the target output. The final overall loss is the sum of these two loss components. After updating the model parameters through backpropagation, the loss value for each batch is accumulated. Every 50 epochs, the program prints the current training epoch and loss value to monitor the model’s training progress. The process is shown in Figure 5.

Figure 5. Schematic of the Feature Transfer Process.
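The overall loss described above, the sum of a classification term and a distribution-discrepancy term, can be sketched for a single batch as follows. This is a simplified illustration under assumptions: the discrepancy term uses the plain distance between mean outputs (the identity feature map) rather than whatever kernel the authors used, the two terms are added with equal weight, and the batch values are hypothetical.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a batch of output vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def linear_mmd(source_out, target_out):
    """Distance between mean source and mean target outputs (identity feature map)."""
    ms, mt = mean_vector(source_out), mean_vector(target_out)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ms, mt)))

def cross_entropy(prob_rows, labels):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(row[y]) for row, y in zip(prob_rows, labels)) / len(labels)

def total_loss(source_out, target_out, target_labels):
    """Classification loss on the target output plus the discrepancy term."""
    return cross_entropy(target_out, target_labels) + linear_mmd(source_out, target_out)

# Hypothetical class-probability outputs for one batch (illustrative only).
source_out = [[0.8, 0.2], [0.6, 0.4]]
target_out = [[0.5, 0.5], [0.5, 0.5]]
target_labels = [0, 1]
loss = total_loss(source_out, target_out, target_labels)
```

Minimizing this sum pushes the network both to classify target samples correctly and to make source and target outputs statistically similar.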

Domain adaptation is one of the primary methods in feature transfer learning. It applies to situations where the sample feature spaces of the source and target domains are the same, but their probability distributions differ. During the evaluation phase, the model uses a context manager to disable gradient calculations, which reduces memory usage and speeds up computation. The test data from the target domain is preprocessed and input into the model as tensors for prediction. The model outputs the predicted probabilities for each category, and by selecting the category corresponding to the maximum probability for each sample, prediction labels are generated. Next, the accuracy is calculated on both the source and target domains. Additionally, the macro recall and macro precision are computed, providing a more comprehensive reflection of the model’s performance across different categories. Source domain data is typically used to train the model, while target domain data is used to validate the model’s generalization ability in a new environment. These evaluation metrics help reveal the performance differences of the model between the source and target domains. The core idea of domain adaptation is to map the data features from different domains to a common feature space, thereby enabling the use of data from other domains to enhance the training of the target domain. A schematic of domain adaptation is shown in Figure 6 [37].

Figure 6. Schematic of Domain Adaptation.
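The evaluation step described above (taking the most probable class per sample, then computing accuracy, macro precision, and macro recall) can be illustrated with a small standard-library sketch. The probabilities and labels are hypothetical, and assigning 0.0 to a class with no predicted or no true samples is an assumed convention rather than one stated in the paper.

```python
def predict_labels(prob_rows):
    """Pick the class with the highest predicted probability for each sample."""
    return [row.index(max(row)) for row in prob_rows]

def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def macro_precision_recall(y_true, y_pred, num_classes):
    """Unweighted mean of per-class precision and recall."""
    precisions, recalls = [], []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Assumed convention: an undefined ratio counts as 0.0.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / num_classes, sum(recalls) / num_classes

# Hypothetical model outputs for four target-domain samples, three classes.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.5, 0.3], [0.3, 0.6, 0.1]]
y_true = [0, 1, 2, 1]
y_pred = predict_labels(probs)
acc = accuracy(y_true, y_pred)
macro_p, macro_r = macro_precision_recall(y_true, y_pred, num_classes=3)
```

In this example three of four samples are classified correctly, but the macro scores are lower than the accuracy because the one class that is never predicted contributes zero to both averages. This is the sense in which macro metrics reflect performance across categories.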

To effectively perform domain adaptation, it is essential to accurately measure the differences between the source and target domains [38]. Adaptive layers in deep learning methods can achieve adaptive data matching from the source and target domains. This adaptation process brings the data distributions of the source and target domains closer together, which can improve the network’s classification performance. The choice of adaptation method has a decisive impact on the model’s generalization ability. Distance functions such as KL divergence, Mahalanobis distance, MMD, and Multiple Kernel Maximum Mean Discrepancy (MK-MMD) can be used for this measurement [39]. As a commonly used metric, MMD maps vectors to a Reproducing Kernel Hilbert Space (RKHS) and calculates the distribution distance between two probability distributions P and Q, in that space. For sample sets X and Y generated from P and Q, respectively, the MMD distance reflects the expected difference between X and Y when mapped to the RKHS H. The computation formula for MMD is given by Equation (2) [40]:

```latex
\mathrm{MMD}[P,Q]=\left\|\frac{1}{m}\sum_{i=1}^{m}\phi(x_i)-\frac{1}{n}\sum_{j=1}^{n}\phi(y_j)\right\|_{\mathrm{H}}
```

(2)
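In practice the feature map in Equation (2) is usually not computed explicitly. With a kernel k(a, b) equal to the inner product of the mapped points, the squared norm expands into three averages of kernel values. The sketch below applies this expansion with a Gaussian (RBF) kernel. The kernel choice, its bandwidth `sigma`, and the sample sets are assumptions made for illustration; the paper does not specify them here.

```python
import math

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel; sigma is an assumed bandwidth."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mmd(xs, ys, kernel=rbf_kernel):
    """Empirical MMD between sample sets xs and ys via the kernel expansion."""
    m, n = len(xs), len(ys)
    k_xx = sum(kernel(a, b) for a in xs for b in xs) / (m * m)
    k_yy = sum(kernel(a, b) for a in ys for b in ys) / (n * n)
    k_xy = sum(kernel(a, b) for a in xs for b in ys) / (m * n)
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(k_xx + k_yy - 2.0 * k_xy, 0.0))

# Hypothetical feature vectors from a source condition and two target conditions.
source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
near_target = [[0.1, 0.0], [1.1, 0.0], [0.1, 1.0]]
far_target = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]

d_same = mmd(source, source)
d_near = mmd(source, near_target)
d_far = mmd(source, far_target)
```

Identical sample sets give a distance of zero, and the distance grows as the target distribution moves away from the source. This is the quantity that domain adaptation tries to shrink.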

In Equation (2), x_i denotes the i-th sample in the sample set X, and y_j denotes the j-th sample in the sample set Y, with m and n representing the number of samples in sets X and Y, respectively. ϕ(·) denotes the feature mapping function. Based on deep feature transfer theory, this section introduces different distance metric methods to map datasets from different working conditions to a common feature space, achieving similar distributions of source and target domain data within this feature space. The research focuses on cross-condition paper breakage fault diagnosis, with the specific algorithmic process shown in Figure 7, which includes two modules: model training and model application.

Figure 7. Flowchart of Feature Transfer Fault Diagnosis.

2.2.3. Federated Learning

Previous studies addressed the issue of having partially labeled data in large sample conditions. In practical papermaking processes, some operating conditions have too few data samples, which makes it challenging for data-driven methods to meet fault classification requirements in small sample cases. To address this, centralized processing of datasets from different operating conditions is used to enhance the generalization ability of the network model, aiming to provide a feasible solution for fault diagnosis with small samples. Federated Learning aggregates knowledge from various clients to a central cloud server, where clients jointly train to improve model classification accuracy [41]. Federated Learning features local computation and model transmission. To enhance the efficiency of model training and improve the security of data transmission, this section employs model compression on top of Federated Learning. Specifically, only a subset of parameters is uploaded during parameter transmission to increase model transfer efficiency and protect data privacy [42]. Considering the variations in production processes across different locations, the method aims to better extract parameter information from various devices and conditions and to compare the effectiveness of different models. Locally, both fully connected neural networks and convolutional neural networks are used to extract features from each subset of data, and the trained model parameters are uploaded to the cloud [43]. At the cloud server, parameters are aggregated according to various aggregation strategies, resulting in a Federated Learning fault diagnosis model through multiple iterations of parameter uploads and updates [44]. The final federated global fault diagnosis model is applied to small sample fault diagnosis. Based on Federated Learning theory, a Federated Learning-based method for diagnosing paper breakage faults is designed, with the detailed process shown in Figure 8.

Figure 8. Federated Learning Diagnostic Flowchart.
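The idea of uploading only a subset of parameters can be sketched as follows. The text does not say how the subset is chosen, so this example assumes a common heuristic: keep the entries with the largest magnitude. It also treats the compression rate as the fraction of parameters withheld. Both are assumptions, and `compress_update` is a hypothetical helper name.

```python
def compress_update(params, compression_rate):
    """Select the parameters a client would upload.

    compression_rate is taken here to be the fraction of entries withheld
    (0.0 uploads everything). Returns {index: value} for the kept entries.
    """
    keep = max(1, round(len(params) * (1.0 - compression_rate)))
    # Rank indices by absolute value, largest first (assumed selection rule).
    ranked = sorted(range(len(params)), key=lambda i: abs(params[i]), reverse=True)
    return {i: params[i] for i in sorted(ranked[:keep])}

# Hypothetical flattened local model parameters.
local_params = [0.05, -0.9, 0.3, 0.0, 0.7, -0.02, 0.1, -0.4]
uploaded = compress_update(local_params, compression_rate=0.75)
uploaded_all = compress_update(local_params, compression_rate=0.0)
```

A higher compression rate reduces what leaves the client, which lowers communication cost and exposure of local information but gives the server less to aggregate. That trade-off is what the compression-rate experiments examine.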

The procedure for constructing a global fault diagnosis model using Federated Learning involves several steps. Initially, datasets from paper machine operations under various conditions are designated as the local datasets of each participant [45]. Subsequently, each local dataset is used to train a local network model, creating multiple local fault diagnosis models. The parameters of these local models are then uploaded to a central server, which aggregates the received model updates using an aggregation algorithm to produce a new global model. This updated global model is redistributed to all clients for further local training, and the cycle continues until the maximum iteration limit is reached. To investigate the effectiveness of Federated Learning across different scenarios and assess model performance under various deep learning frameworks, this study constructs fault diagnosis models with diverse neural network architectures at each local site [46]. Data under conditions A, B, and C are partitioned into training and testing sets for local training. The training set is used for local model training, while the testing set, together with small sample condition data, is used to evaluate the performance of the Federated global model. For each local training dataset, classification models with fully connected neural network and convolutional neural network architectures are developed and trained locally [47]. The model parameters are then uploaded to a central server to complete the training of the federated global model. This approach aims to explore the fault diagnosis accuracy of the federated global model under various local model configurations and parameter aggregation strategies.
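The round structure described above (local training, upload, aggregation, redistribution) can be illustrated with a minimal sketch. It assumes a sample-size-weighted average as the aggregation rule, which is the usual FedAvg formulation, and it replaces the local neural networks with a hypothetical least-squares model so that the example stays short. The names `fedavg`, `local_train`, and `run_federated` are illustrative and do not come from the paper.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Sample-size-weighted average of per-client parameter dictionaries."""
    total = float(sum(client_sizes))
    return {
        key: sum(p[key] * (n / total) for p, n in zip(client_params, client_sizes))
        for key in client_params[0]
    }

def local_train(global_params, X, y, lr=0.1, steps=5):
    """Hypothetical local update: a few gradient steps on a least-squares loss."""
    w = global_params["w"].copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return {"w": w}

def run_federated(clients, dim, rounds=20):
    """clients is a list of (X, y) pairs that never leave their owner."""
    global_params = {"w": np.zeros(dim)}
    sizes = [len(y) for _, y in clients]
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        updates = [local_train(global_params, X, y) for X, y in clients]
        # The server only sees parameters, not raw data.
        global_params = fedavg(updates, sizes)
    return global_params
```

Only the parameter dictionaries cross the client boundary in this sketch, which is the property the federated framework relies on for data privacy.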
To ensure the validity of comparative experiments, parameters such as activation functions, optimization algorithms, learning rates, local iteration counts, and federated iteration counts are kept consistent [48]. The Federated Averaging (FedAvg) algorithm and the Federated Proximal Gradient Descent (FedProx) algorithm are used for aggregation. The FedAvg algorithm aggregates model parameters through weighted averaging. The core idea is to upload local model parameters to the central server, where the server computes the average of all model parameters and returns this average to all local models [49]. In this way, the global model aggregates the parameters of the local models and draws on the local data of all devices, which can enhance the model’s accuracy and generalization performance.

2.3. Evaluation Metrics

In the field of deep learning fault diagnosis [50], the evaluation metrics commonly used to assess model performance on classification problems include Accuracy, Precision, Recall, and the harmonic mean of Precision and Recall (F1-score). The relevant calculations are shown in Equations (3)–(6).

```latexAccuracy=\frac{TP+TN}{TP+TN+FP+FN}\times100\%```

(3)

```latexPrecision=\frac{TP}{TP+FP}\times100\%```

(4)

```latexRecall=\frac{TP}{TP+FN}\times100\%```

(5)

```latexF1-score=\frac{2\times Precision\times Recall}{Precision+Recall}```

(6)
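As a quick numerical check of Equations (3)–(6), the following sketch computes the four metrics from confusion-matrix counts. It uses the standard accuracy denominator TP + TN + FP + FN, and the counts in the usage note are hypothetical values chosen for illustration, not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, f1), all expressed in percent."""
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    precision = tp / (tp + fp) * 100
    recall = tp / (tp + fn) * 100
    # Harmonic mean of precision and recall, as in Equation (6).
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```

For example, with hypothetical counts `tp=40, fp=10, tn=45, fn=5`, accuracy is 85% and precision is 80%, while recall is 40/45 and the F1-score is 80/95 when expressed as fractions.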

In the equations, TP (true positive) is the number of positive samples correctly identified as positive; FP (false positive) is the number of negative samples incorrectly identified as positive; TN (true negative) is the number of negative samples correctly identified as negative; FN (false negative) is the number of positive samples incorrectly identified as negative.

2.4. Advantages Compared to Other Literature

Because some operating conditions in the data have too few samples to meet modeling needs, this paper explores the feasibility of federated learning for papermaking industry process fault diagnosis with small sample data. Compared to other single-condition fault diagnosis models, this study has the following advantages:

(1) Local models were established using fully connected neural networks and convolutional neural networks to extract relevant features under different conditions. Training and testing were performed using data from various conditions. FedAvg and FedProx aggregation strategies were employed and compared in the parameter aggregation process. Results showed that CNNs performed better in feature learning than fully connected neural networks. FedProx, an improved parameter aggregation strategy based on FedAvg, achieved better diagnostic results. Overall, federated learning methods can learn feature parameters from each model by aggregating different local models. Although the federated global model’s performance on local test samples is slightly inferior to that of the isolated data model, its performance on small sample data is comparable to its performance on the condition data used in modeling. This indicates that federated learning can offer a new diagnostic method for small sample conditions.

(2) Under the federated learning framework, model compression can improve data protection and transmission efficiency during transfer. Experimental results under different model compression rates indicate that an appropriate compression rate can effectively reduce communication overhead and enhance data privacy protection while maintaining a certain level of diagnostic accuracy.

3. Results and Discussion

3.1. Model Establishment

The silhouette coefficient and the CH index were evaluated for different numbers of clusters. After both metrics were normalized separately, the combined results showed that the clustering effect was optimal when the number of clusters was 5. Subsequently, an oversampling technique was used to generate new synthetic samples through interpolation between minority-class samples, increasing the number of minority-class samples. Balancing the data across classes improved the model’s prediction accuracy for the original minority class. In the minority class set S_min, the K-nearest neighbor (KNN) algorithm was applied to each sample x_i∈S_min, where K is a specified integer. The K-nearest neighbors are the K sample points in S_min closest to x_i in the feature space. Then, a sample point $$\widehat{x}_{i}$$ is randomly selected from these K nearest sample points, and the synthetic new sample point is obtained as:

```latexx_{\mathrm{new}}=x_i+(\widehat{x}_i-x_i)\times\delta ```

(7)

In the formula, x_new represents the synthetic new sample, and δ is a random number within the range [0, 1]. The sample synthesis process is repeated to achieve a balanced distribution of labeled data within each operating condition. A schematic of the synthesis process is shown in Figure 9.
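The interpolation in Equation (7) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation used in the study: the function name `smote_samples`, the brute-force neighbor search, and the choice to draw the base sample x_i at random for each synthetic point are assumptions made here.

```python
import numpy as np

def smote_samples(S_min, k=5, n_new=10, rng=None):
    """Generate n_new synthetic points from the minority set S_min via Equation (7)."""
    rng = np.random.default_rng() if rng is None else rng
    S_min = np.asarray(S_min, dtype=float)
    n = len(S_min)
    k = min(k, n - 1)  # a point cannot be its own neighbor
    # Pairwise Euclidean distances within the minority class.
    dist = np.linalg.norm(S_min[:, None, :] - S_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]  # indices of the K nearest neighbors
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)                # base sample x_i
        j = neighbors[i, rng.integers(k)]  # randomly chosen neighbor x_hat_i
        delta = rng.random()               # random number in [0, 1)
        synthetic.append(S_min[i] + (S_min[j] - S_min[i]) * delta)
    return np.array(synthetic)
```

Because every synthetic point lies on the segment between two existing minority samples, the new points stay inside the region already occupied by that class.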

Figure 9. New Samples Synthesized Based on the SMOTE Algorithm.

After dividing the operating conditions, there are significant differences in data distribution between different conditions. Taking Condition A as an example, the SMOTE algorithm was used to process the dataset. The data distribution before and after oversampling is shown in Figure 10.

Figure 10. Data Distribution Before and After Balancing for Condition A. (a) Distribution Before Data Balancing; (b) Distribution After Data Balancing.

From the figure, it can be observed that before data balancing, there were significant differences in the amount of data for each class label. This noticeable disparity could impact the subsequent data modeling process. After data balancing, the distribution of quantities across class labels is relatively uniform, eliminating the adverse effects of class imbalance on the modeling process. Dimensionality reduction visualizations show that post-balancing, the distribution differences among various classes are more pronounced, which can enhance the accuracy of subsequent modeling. The same SMOTE data balancing method was applied to the data from Conditions B and C, with the results in Figure 11 after balancing.

Figure 11. Results of the distribution after data balancing. (a) Distribution of Condition B Data After Balancing; (b) Distribution of Condition C Data After Balancing.

Subsequently, classification models were developed using logistic regression, support vector machine, and random forest techniques across different operating condition datasets to evaluate the efficacy of machine learning methods in identifying paper machine breakage faults. The performance of each model under various conditions is detailed in Table 1.

Table 1. Fault Diagnosis Accuracy Based on Machine Learning Methods.

Based on Table 1, the Random Forest (RF) model exhibits the highest overall accuracy across different machine learning methods, followed by Support Vector Machine (SVM), with Logistic Regression (LR) showing the lowest overall diagnostic rate. The diagnostic accuracy varies significantly across different conditions, with Condition A yielding notably higher accuracy compared to Conditions B and C. In the paper breakage fault diagnosis process, deep learning methods prove effective in handling datasets and identifying features. By employing the SAE-Softmax deep learning network architecture, different fault types are recognized. Model training and evaluation are conducted concurrently: the training process involves inputting data into the classification network with the goal of minimizing the loss function while simultaneously assessing model performance on a test set to verify fault recognition accuracy. When the loss function stabilizes during training, it indicates that the network training is nearly complete, and the model’s accuracy on the test data set reflects its final performance. The diagnostic performance of the SAE-Softmax model under different conditions is summarized in Table 2.

Table 2. Fault Diagnosis Results Based on SAE-Softmax Model.

From the table, it can be seen that the performance of the deep learning model varies under different conditions. The model performs better than traditional machine learning methods in all three conditions, with the best performance on the Condition A dataset and the worst on the Condition C dataset. The differences between datasets are quite noticeable. The output results of the SAE-Softmax classification model under Conditions B and C are shown in Figure 12.

Figure 12. Output Results of SAE-Softmax Classification Model under Condition B and Condition C. (a) Condition B; (b) Condition C.

From the figure, it can be observed that under Conditions B and C, the model shows some misclassification for faults of Category 3, while it identifies other fault categories more accurately. Traditional machine learning methods require less data and offer faster model training and prediction with lower computational resource demands, but they have limited generalization capabilities. In contrast, deep learning methods based on neural networks can learn features from input data, handle large datasets, and generally outperform traditional machine learning methods. However, deep learning models often suffer from poor interpretability and require a significant amount of data. Overall, the deep learning model that combines autoencoder feature extraction with a Softmax classifier demonstrates better performance across multiple evaluation metrics compared to traditional machine learning methods, although its performance on some datasets still needs improvement.

Based on the experimental results, the SAE-Softmax-based paper break fault diagnosis model outperforms traditional machine learning methods, and models built and tested on the same dataset can accurately identify different types of paper break faults. However, the models established so far are based on paper machine operation data from a single condition. In practical paper break fault diagnosis, the production process parameters may be adjusted according to the working conditions, which can affect the model’s performance across different conditions. To examine this, the study investigates a cross-condition scenario by moving from Condition A to Condition B. The diagnosis results of different models for paper break faults under cross-condition scenarios are shown in Table 3.

Table 3. Diagnostic Accuracy of Different Models under Cross-Condition Scenarios.

From the table, it can be observed that, without adjusting model parameters, the overall performance of the models across conditions is poorer because of the differences between conditions. The SAE-Softmax classification model can, to a certain extent, extract and leverage the feature information shared between different conditions, and it performs better than the traditional machine learning models.

3.2. Transfer Learning

3.2.1. Parameter Transfer

In the previous condition classification phase, production data were divided into different conditions based on quantitative parameters. Separate models were built for data from each condition, with Condition A having the largest data volume, Condition B the next largest, and Condition C the smallest. Using the condition with more data as the source domain and the condition with less data as the target domain, cross-condition paper break fault diagnosis experiments were conducted. Three transfer learning tasks were set up: A→B, A→C, and B→C, named Task 1, Task 2, and Task 3, respectively. The experiments verified the feasibility of the transfer learning models based on parameter transfer and feature transfer methods. For the source and target domain models, the SAE-Softmax fault diagnosis network structure from the previous section was used, combining pre-training and fine-tuning. The feature extraction layer parameters were frozen, and the classification layer was fine-tuned using samples from the target domain. After training, the target domain test data were used for evaluation. The performance of the parameter transfer diagnostic models under different transfer tasks is shown in Table 4.

Table 4. Parameter Transfer Fault Diagnosis Performance.

As shown in the table, the diagnostic performance of the model varies under different transfer paths. It maintains a high accuracy in Task 1, whereas the accuracy for Task 2 and Task 3 is below 90%. This may be because the fault influencing factors in Condition C are more complex than those in Conditions A and B, which increases the difficulty of fault diagnosis. Additionally, during the training process, the amount of data for Condition A is larger than that for Conditions B and C, resulting in Task 1 having a higher accuracy than Tasks 2 and 3. In particular, the diagnostic accuracy when transferring from Condition A to Condition C is the lowest. This may be because the differences between Condition A and Condition C are greater than those between Condition B and Condition C, which affects the model’s transfer performance. Taking Task 1 as an example, the model is trained by transferring the feature extraction layer parameters of the fault diagnosis model from Condition A to Condition B and then updating the classification layer weights using fine-tuning sample data from Condition B. After training, the fine-tuned model is used for state recognition of the test samples in Condition B, yielding the fault diagnosis accuracy for Condition B. The variation of model accuracy with the number of iterations is shown in Figure 13.

Figure 13. Training Process of the Parameter Transfer Fault Diagnosis Model.
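The parameter transfer step described for Task 1 (reuse the source-domain feature extraction layer, update only the classification layer on a few target-domain samples) can be sketched as follows. This is a simplified NumPy illustration under stated assumptions: a single tanh layer stands in for the pretrained SAE encoder, plain gradient descent replaces the optimizer used in the study, and all names (`encode`, `fine_tune_classifier`, `W_enc`, `W_cls`) are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def encode(X, W_enc, b_enc):
    """Frozen feature extractor (stand-in for the source-domain SAE encoder)."""
    return np.tanh(X @ W_enc + b_enc)

def fine_tune_classifier(X_t, y_t, W_enc, b_enc, W_cls, b_cls, lr=0.1, epochs=50):
    """Update only the Softmax layer on target-domain fine-tuning samples."""
    W_cls, b_cls = W_cls.copy(), b_cls.copy()
    H = encode(X_t, W_enc, b_enc)    # encoder weights are read, never updated
    Y = np.eye(W_cls.shape[1])[y_t]  # one-hot labels
    losses = []
    for _ in range(epochs):
        P = softmax(H @ W_cls + b_cls)
        losses.append(-np.log(P[np.arange(len(y_t)), y_t] + 1e-12).mean())
        grad = (P - Y) / len(y_t)    # gradient of cross-entropy w.r.t. the logits
        W_cls -= lr * H.T @ grad
        b_cls -= lr * grad.sum(axis=0)
    return W_cls, b_cls, losses
```

Because the encoder output `H` is computed once and held fixed, each fine-tuning epoch touches only the small classification layer, which is one reason a few dozen iterations and a handful of labeled target samples can be sufficient.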

As illustrated in the figure, the model’s diagnostic accuracy improves rapidly during the training iterations, reaching a plateau around the 20th iteration and maintaining this level thereafter. This demonstrates that transfer learning enables effective fault diagnosis across different conditions with relatively fewer training iterations and labeled samples. In contrast to traditional deep learning approaches, transfer learning necessitates fewer iterations—often only a few dozen epochs—to achieve robust performance, thus significantly enhancing training efficiency while preserving model accuracy. Building on the previous research, comparative experiments were introduced to investigate the impact of the number of fine-tuning samples on diagnostic results. The goal was to analyze the performance of the target model trained with varying numbers of fine-tuning samples. Since the quantity of fine-tuning samples can influence the number of iterations required to achieve optimal performance, and overfitting may occur during training, this experiment focused on examining the maximum accuracy achieved during the iterative process to explore how the quantity of labeled samples affects model performance. Fine-tuning was conducted with 10, 15, 20, and 25 samples, and the effects of fine-tuning with different numbers of labeled samples were studied. Fine-tuning samples were used in the training process, while test samples were used to evaluate the model’s performance. Table 5 presents the model performance under different transfer tasks.

Table 5. Accuracy of the Parameter Transfer Fault Diagnosis Model under Different Tasks and Numbers of Fine-Tuning Samples.

It can be observed that as the number of fine-tuning samples increases, the model’s learning of knowledge in the target domain becomes more accurate. The accuracy trend across different tasks improves with a higher number of fine-tuning samples. Task 1 exhibits the best performance among the three transfer tasks, while Task 2 shows the poorest performance. The output results with the highest accuracy for each transfer task are illustrated in Figure 14.

Figure 14. Diagnostic Results for Different Transfer Tasks. (a) Task 1; (b) Task 2; (c) Task 3.

The figure shows that in the classification processes for Task 1 and Task 3, misclassification occurs only in the label recognition of a single fault category. This suggests that the model performs relatively well in distinguishing between different fault categories in these two transfer tasks. However, in Task 2, the model demonstrates a certain level of misclassification when identifying Categories 3 and 4, indicating inferior performance compared to the models in the other transfer tasks.

3.2.2. Feature Transfer

Before performing transfer learning, the distance between different operating conditions was measured using the MMD formula; the results are shown in Table 6.

Table 6. MMD Results for Different Transfer Paths.
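For readers unfamiliar with the metric, the following sketch shows one common way to estimate the squared MMD between two sets of feature vectors with a Gaussian kernel, and a multi-kernel variant (MK-MMD) formed as a weighted combination over several bandwidths. The bandwidth values, the equal kernel weights, and the use of the biased estimator are assumptions made here for illustration; the paper does not state which settings were used.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma**2))

def mmd2(Xs, Xt, sigma=1.0):
    """Biased estimate of the squared MMD between source and target samples."""
    return (rbf_kernel(Xs, Xs, sigma).mean()
            + rbf_kernel(Xt, Xt, sigma).mean()
            - 2 * rbf_kernel(Xs, Xt, sigma).mean())

def mk_mmd2(Xs, Xt, sigmas=(0.5, 1.0, 2.0, 4.0), weights=None):
    """Multi-kernel MMD: convex combination of single-kernel estimates."""
    if weights is None:
        weights = np.full(len(sigmas), 1.0 / len(sigmas))  # equal weights (assumed)
    return float(sum(w * mmd2(Xs, Xt, s) for w, s in zip(weights, sigmas)))
```

A larger value indicates a larger discrepancy between the two operating conditions in feature space. In feature transfer, a term of this kind is typically added to the training loss so that source and target features are pulled closer together.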

From Figure 15 and Table 7, it can be observed that under the three transfer tasks, the fault diagnosis method based on the MK-MMD metric consistently performs well. Specifically, Task 1 shows the best performance, while Task 2 performs relatively worse, which is similar to the results from the parameter transfer method in the previous section. Overall, the MK-MMD-based model outperforms the parameter transfer model.

Table 7. Fault Diagnosis Performance of Feature Transfer Based on MK-MMD Metric.

Figure 15. Fault Diagnosis Performance of Feature Transfer Based on MK-MMD Metric.

In the deep feature transfer learning approach, different metric methods, including KL divergence and MMD, are introduced and compared with the MK-MMD method used in this study. These three metrics are applied to the feature transfer diagnostic models established in this section to evaluate the diagnostic performance on target domain test samples under various transfer tasks and distance metrics. The specific results are compared in Figure 16.

Figure 16. Fault Diagnosis Accuracy with Different Distance Metrics.

Figure 16 shows that in the diagnostic models established in this section, the MK-MMD metric performs better than the other two distance metrics. The results from the MK-MMD and MMD metrics are relatively close. Among the three distance metrics, KL divergence shows the poorest performance, possibly because it is less effective on high-dimensional data and less suited to complex data structures than MMD. Across the three transfer tasks, Task 1 has the highest overall accuracy, while Task 2 has the lowest, which is similar to the results obtained from the parameter transfer method.

Figure 17 shows that when the model is transferred from A to B, the accuracy of the feature transfer model is higher than that of the single-condition deep learning model based on the SAE-Softmax classifier. Therefore, the transfer learning model outperforms the single-condition model. Through paired t-tests, we examined the significance of performance differences among the models, calculating the mean and standard deviation of the accuracy rates for the three models. The results indicate that the feature transfer model has the highest mean accuracy, 96.46%, suggesting better diagnostic performance. It also has the smallest standard deviation, 1.83, indicating greater stability.

Because the feature transfer model is more accurate than the parameter transfer model, this study further examines its sensitivity to the number of training epochs. The model is trained for different numbers of epochs (such as 10, 30, 60, and 120), and the accuracy at each stage is recorded (Figure 18). The results show that the model’s accuracy increases rapidly during the first 30 epochs as it learns basic features and patterns. As training continues, the accuracy may improve further, but the rate of increase slows. When num_epochs exceeds 120, the model may overfit, which can affect prediction results. Setting num_epochs between 60 and 120 yields higher prediction accuracy.

3.3. Federated Learning

After applying the Federated Averaging algorithm and the Federated Proximal Gradient Descent method for aggregation separately, the F1-score and accuracy results of the model are shown in Table 8 and Figure 19, respectively.

Table 8. Diagnostic F1-score of the Federated Global Model under the FedAvg Aggregation Algorithm.

Table 8 and Figure 19 show that the model based on Convolutional Neural Networks (CNN) performs better overall than the Fully Connected Neural Network (FCNN). Under Condition A, the performance of both local models is optimal. Within the same local model, the maximum difference in recognition accuracy between different conditions is within 10%, indicating that the federated global model has effectively learned data features and fault information from different condition datasets. Overall, the results suggest that performance on the local test samples is slightly inferior to that of models trained under a single condition, and diagnostic accuracy on small sample condition data is slightly lower than accuracy under specific conditions. This indicates that federated learning methods offer a viable new approach for modeling and analysis of small sample data.

Figure 17. Comparison of Accuracy Between Single Condition Model and Transfer Learning Model.

The FedProx algorithm was used to explore model performance under different aggregation strategies. FedProx is an improved version of FedAvg, introducing the concept of proximal gradient descent in the parameter aggregation process. This makes global model updates smoother and more stable, while the added regularization term helps prevent overfitting during federated learning [51]. The performance of the federated model under this aggregation strategy is shown in Table 9 and Figure 20.
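In the standard formulation of FedProx, the regularization mentioned above is a proximal term (μ/2)‖w − w_global‖² added to each client's local objective, which penalizes local models for drifting away from the current global model. The sketch below illustrates one client's local update under that formulation. The least-squares loss, the function name `fedprox_local_update`, and the values of `mu`, `lr`, and `steps` are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def fedprox_local_update(w_global, X, y, mu=0.1, lr=0.1, steps=10):
    """Local gradient descent on loss(w) + (mu / 2) * ||w - w_global||^2."""
    w = w_global.copy()
    for _ in range(steps):
        grad_loss = X.T @ (X @ w - y) / len(y)  # gradient of a least-squares loss
        grad_prox = mu * (w - w_global)          # gradient of the proximal term
        w -= lr * (grad_loss + grad_prox)
    return w
```

With `mu=0` the update reduces to plain local training as in FedAvg. Larger values keep the client model closer to the global model, which is what smooths the global updates when local data distributions differ across operating conditions.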

Table 9. Diagnostic F1-Score of the Federated Global Model under the FedProx Aggregation Algorithm.

Figure 18. Accuracy of the Feature Transfer Model Under Different num_epochs.

Table 9 and Figure 20 show that compared to the FedAvg aggregation strategy, the FedProx algorithm slightly improves overall diagnostic performance. However, the improvement is smaller than that obtained by changing the local models, indicating that the effectiveness of federated learning methods largely depends on the choice of local models. While the federated models perform reasonably, there is still considerable room for improvement.

Although local endpoints do not need to upload data within the federated learning framework, they must continually upload and download model parameters during training. Transmission efficiency may decrease if there are many local endpoints or large parameter sets. Therefore, methods are needed to improve data protection and transmission efficiency during this process. Deep neural network models often have redundant weight parameters, with only a subset being crucial for model performance. Thus, it could be beneficial to improve the parameter transfer strategy in federated learning frameworks by compressing the model before transmission and transferring only the essential parameters.

To evaluate the impact of model compression on the diagnostic accuracy of federated models, experiments were conducted with varying compression rates on CNN-based local models. The compression rate indicates the extent of compression, with a value of 1 meaning no compression. Both FedAvg and FedProx aggregation strategies were used to construct federated global models from local CNN models at different compression rates. All other parameters were kept constant during the experiments. The diagnostic accuracy results for different compression rates are shown in Figure 21 and Figure 22.

Figure 19. Diagnostic Accuracy of the Federated Global Model under the FedAvg Aggregation Algorithm.

Figure 20. Diagnostic Accuracy of the Federated Global Model under the FedProx Aggregation Algorithm.

As evident from Figure 21 and Figure 22, when the model compression rate is maintained at 0.85 or higher, the variation in diagnostic performance between the compressed and uncompressed models remains relatively minimal. Conversely, as the compression rate decreases to 0.8 or below, a notable decline in the model’s diagnostic accuracy becomes apparent. This observation suggests that insufficient transmission of model parameters during the parameter transmission phase hinders the model’s ability to adequately learn and capture the distinct features from each local client during the parameter weight updates, ultimately compromising the overall performance of the model.

Figure 21. Experimental results of federated models based on the FedAvg strategy at different compression rates.

Figure 22. Experimental results of the federated model based on the FedProx strategy at different compression rates.
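The idea of uploading only a subset of parameters can be illustrated with a short sketch. It interprets the compression rate as the fraction of parameters that is transmitted, so that a rate of 1 means no compression, and it assumes that the "essential" parameters are those with the largest magnitude. That selection rule and the function name `compress_update` are assumptions made for illustration, since the paper does not specify how the transmitted subset is chosen.

```python
import numpy as np

def compress_update(params, rate):
    """Keep the fraction `rate` of entries with the largest magnitude; zero the rest."""
    flat = np.concatenate([p.ravel() for p in params.values()])
    k = int(round(rate * flat.size))  # number of entries to transmit
    if k >= flat.size:
        return {name: p.copy() for name, p in params.items()}
    if k <= 0:
        return {name: np.zeros_like(p) for name, p in params.items()}
    # Magnitude of the k-th largest entry across all layers.
    threshold = np.partition(np.abs(flat), flat.size - k)[flat.size - k]
    return {name: np.where(np.abs(p) >= threshold, p, 0.0) for name, p in params.items()}
```

In a real system the zeroed entries would simply not be sent (for example, by transmitting index and value pairs), which is where the communication saving comes from. The server then aggregates only the entries it receives.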

4. Conclusions

To meet the requirements of cross-condition fault diagnosis, this paper develops fault diagnosis models based on two approaches: parameter transfer and feature transfer. The parameter transfer model freezes the parameter weights of the feature extraction layer and fine-tunes the weights of the classification layer to achieve cross-condition fault diagnosis. The feature transfer model introduces a spatial distance measurement method to enhance the performance of the diagnosis model in cross-condition scenarios. Compared to the model without transfer learning, the two transfer learning models established in this paper show improved diagnostic performance. This indicates that combining deep learning models with transfer learning methods can effectively enhance the performance of cross-condition fault diagnosis. When comparing the parameter transfer-based model with the feature transfer-based model, the results demonstrate that the diagnostic performance of the feature transfer-based model is superior, achieving accurate classification for most samples with diagnostic accuracies of 98.31%, 94.64% and 96.43% under different transfer tasks. This may be because, when there are certain differences between the source task and the target task, the feature transfer method focuses on transferring the general feature representations extracted from the source task rather than directly transferring model parameters. This approach allows the model to generalize better to the target task, resulting in better performance than parameter transfer. Under both transfer methods, the diagnostic results for Task 3 are better than those for Task 2. Additionally, the measured distance between Condition A and Condition C is greater than that between Condition B and Condition C. This indicates that the performance of the transfer learning model is influenced by the degree of difference between the source domain and the target domain, and the dataset itself also has a certain impact on the diagnostic effectiveness of the model.

Fully connected neural network and convolutional neural network models were established locally to extract relevant features under different conditions. Data from various conditions were used for training and testing. Two aggregation strategies, FedAvg and FedProx, were employed during parameter aggregation, and their performances were compared. The results indicate that CNN outperforms the fully connected neural network in feature learning. FedProx, as an improved parameter aggregation strategy based on FedAvg, achieves better diagnostic performance. Overall, by aggregating different local models, the federated learning method can learn the feature parameters of each model. The federated global model exhibits higher versatility than the isolated data model. Although its performance on local test samples is slightly inferior to that of the isolated data model, its performance on small sample data is similar to its performance on the condition data involved in modeling. This suggests that federated learning can provide a new diagnostic method for fault diagnosis in small sample conditions. Under the framework of federated learning, model compression can enhance data protection and data transmission efficiency during the transmission process. Experimental results under different model compression rates show that an appropriate model compression rate can effectively reduce communication overhead and improve data privacy protection while ensuring a certain level of model diagnostic accuracy.

This study focuses exclusively on the paper breakage faults occurring in household paper machines and does not provide a detailed analysis of the production processes for other paper products. Future work could explore other paper types, such as industrial and specialty paper. By combining transfer learning and federated learning, the aim is to conduct fault diagnosis research on paper breakage across different paper types in the entire paper industry, enhancing the accuracy of fault prediction while ensuring privacy protection among different enterprises and improving production efficiency.

Author Contributions

Conceptualization, Methodology, Supervision, Project Administration, Funding Acquisition, Z.H.; Software, Validation, and Formal Analysis, G.C.; Investigation, Visualization, Writing–Original Draft Preparation, X.Y.; Writing–Review & Editing, X.Z.

Ethics Statement

Not applicable.

Informed Consent Statement

Funding

This research was funded by the Science and Technology Program of Guangzhou, China (2023A04J1367) and the National Local Joint Engineering Laboratory for Advanced Textile Processing and Clean Production (FX20230016).

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Hilda L, Mutlaq MS, Waleed I, Althomali RH, Mahdi MH, Abdullaev SS, et al. Genosensor on-chip paper for point of care detection: A review of biomedical analysis and food safety application. Talanta 2024, 268, 125274.

2. Zhang Y, Hong M, Li J, Ren J, Man Y. Energy system optimization model for tissue papermaking process. Comput. Chem. Eng. 2021, 146, 107220.

3. Di F, Han D, Wan J, Wang G, Zhu B, Wang Y, et al. New insights into toxicity reduction and pollutants removal during typical treatment of papermaking wastewater. Sci. Total Environ. 2024, 915, 169937.

4. Hyppönen H, Lumme S, Reponen J, Vänskä J, Kaipio J, Heponiemi T, et al. Health information exchange in Finland: Usage of different access types and predictors of paper use. Int. J. Med. Inform. 2019, 122, 1–6.

5. Niu G, Liu Y, Zhou J, Fan X, Chen Z, Corriou J-P, et al. SBR-Extended Kalman Filter model-based fault diagnosis and signal reconstruction for the papermaking wastewater treatment process. J. Water Process Eng. 2023, 56, 104420.

6. Niu G, Liu Y, Zhou J, Fan X, Chen Z, Corriou J-P, et al. An information fusion-based meta transfer learning method for few-shot fault diagnosis under varying operating conditions. Mech. Syst. Signal Process. 2024, 220, 111652.

7. Ba-Alawi AH, Al-Masni MA, Yoo C. Simultaneous sensor fault diagnosis and reconstruction for intelligent monitoring in wastewater treatment plants: An explainable deep multi-task learning model. J. Water Process Eng. 2023, 55, 104119.

8. Jinhua W, Xuehua M, Jie C, Yunqiang L, Li C. A novel fault diagnosis method for Bayesian networks fusing models and data. Nucl. Eng. Des. 2024, 426, 113370.

9. Thomas MC, Zhu W, Romagnoli JA. Data mining and clustering in chemical process databases for monitoring and knowledge discovery. J. Process Control 2018, 67, 160–175.

10. Xu J, Mo S, Jiang Z, Chen Z, Gui W, Wang H. A novel positive–negative graph convolutional network-based fault diagnosis method with application to complex systems. Neurocomputing 2024, 600, 128145.

11. Cao Y, Tang S, Yao R, Chang L, Yin X. Interpretable hierarchical belief rule base expert system for complex system modeling. Measurement 2024, 226, 114033.

12.

Reyes-Malanche J A, Villalobos-Pina F J, Ramırez-Velasco E, Cabal-Yepez E, Hernandez-Gomez G, Lopez-Ramirez M. Short-Circuit Fault Diagnosis on Induction Motors through Electric Current Phasor Analysis and Fuzzy Logic [J/OL]. Energies 2023, 16, 516. doi:10.3390/en16010516.[Google Scholar]

13.

Ma Z, Deng S, Zhou Z, Ai X, Zhang J, Liu Y, et al. Expert knowledge modelling software design based on Signed Directed Graph with the application for PWR fault diagnosis. Ann. Nucl. Energy 2024, 196, 110206. [Google Scholar]

14.

Cao C, Li M, Li Y, Sun Y. Intelligent fault diagnosis of hot die forging press based on binary decision diagram and fault tree analysis. Procedia Manuf. 2018, 15, 459–466. [Google Scholar]

15.

Lu Q, Xie X, Parlikad A K, Schooling J M. Digital twin-enabled anomaly detection for built asset monitoring in operation and maintenance. Autom. Constr. 2020, 118, 103277. [Google Scholar]

16.

Zhang B, Wang P, Liu G, Ma Z, Zhao T. AHU sensor fault diagnosis in various operating conditions based on a hybrid data-driven model combined energy consumption. J. Build. Eng. 2024, 87, 109028. [Google Scholar]

17.

Lei Y, Yang B, Jiang X, Jia F, Li N, Nandi AK. Applications of machine learning to machine fault diagnosis: A review and roadmap. Mech. Syst. Signal Process. 2020, 138, 106587. [Google Scholar]

18.

Zhang M, Zhong J, Zhou C, Jia X, Zhu X, Huang B. Deep learning-driven pavement crack analysis: Autoencoder-enhanced crack feature extraction and structure classification. Eng. Appl. Artif. Intell. 2024, 132, 107949. [Google Scholar]

19.

Poonia RC, Al-Alshaikh HA. Ensemble approach of transfer learning and vision transformer leveraging explainable AI for disease diagnosis: An advancement towards smart healthcare 5.0. Comput. Biol. Med. 2024, 179, 108874. [Google Scholar]

20.

Cheng C, Liu W, Di L, Wang S. Generalized autoencoder-based fault detection method for traction systems with performance degradation. High-Speed Railw. 2024, 2, 180–186. [Google Scholar]

21.

Matin Malakouti S, Bagher Menhaj M, Abolfazl Suratgar A. Machine learning and transfer learning techniques for accurate brain tumor classification. Clin. Ehealth 2024, 7, 106–119. [Google Scholar]

22.

Xiao Y, Zhou X, Zhou H, Wang J. Multi-label deep transfer learning method for coupling fault diagnosis. Mech. Syst. Signal Process. 2024, 212, 111327. [Google Scholar]

23.

Economou A, Kokkinos C, Bousiakou L, Hianik T. Paper-Based Aptasensors: Working Principles, Detection Modes, and Applications. Sensors 2023, 23, 7786. [Google Scholar]

24.

Wiroonsri N. Clustering performance analysis using a new correlation-based cluster validity index. Pattern Recognit. 2024, 145, 109910. [Google Scholar]

25.

Parnes D, Gormus A. Prescreening bank failures with K-means clustering: Pros and cons. Int. Rev. Financ. Anal. 2024, 93, 103222. [Google Scholar]

26.

Tian M, Liu J, Chen Z, Wang S. Privacy-preserving logistic regression with improved efficiency. J. Inf. Secur. Appl. 2024, 85, 103848. [Google Scholar]

27.

Wang Y, Liao W, Shen H, Jiang Z, Zhou J. Some notes on the basic concepts of support vector machines. J. Comput. Sci. 2024, 82, 102390. [Google Scholar]

28.

Shi Y, Sun J, Li Z, Yang F, Yang X, Luo Q. Predicting and analyzing the cementing quality of oil well reservoirs based on Bayesian-random forest model. Geoenergy Sci. Eng. 2024, 241, 213077. [Google Scholar]

29.

Wang J, Wang H, Nie F, Li X. Feature selection with multi-class logistic regression. Neurocomputing 2023, 543, 126268. [Google Scholar]

30.

Keerthana D, Venugopal V, Nath M K, Mishra M. Hybrid convolutional neural networks with SVM classifier for classification of skin cancer. Biomed. Eng. Adv. 2023, 5, 100069. [Google Scholar]

31.

Yang P, Wang D, Zhao W-B, Fu L-H, Du J-L, Su H. Ensemble of kernel extreme learning machine based random forest classifiers for automatic heartbeat classification. Biomed. Signal Process. Control 2021, 63, 102138. [Google Scholar]

32.

Sun Z, Wang G, Li P, Wang H, Zhang M, Liang X. An improved random forest based on the classification accuracy and correlation measurement of decision trees. Expert Syst. Appl. 2024, 237, 121549. [Google Scholar]

33.

Ling Y, Nie F, Yu W, Ling Y, Li X. Robust autoencoder feature selector for unsupervised feature selection. Inf. Sci. 2024, 660, 120121. [Google Scholar]

34.

Gao F, Li B, Chen L, Shang Z, Wei X, He C. A softmax classifier for high-precision classification of ultrasonic similar signals. Ultrasonics 2021, 112, 106344. [Google Scholar]

35.

Chen B, Li Q, Ma R, Qian X, Wang X, Li X. Towards the generalization of time series classification: A feature-level style transfer and multi-source transfer learning perspective. Knowl. -Based Syst. 2024, 299, 112057. [Google Scholar]

36.

Yan Z, Zhong S, Lin L, Cui Z, Zhao M. A step parameters prediction model based on transfer process neural network for exhaust gas temperature estimation after washing aero-engines. Chin. J. Aeronaut. 2022, 35, 98–111. [Google Scholar]

37.

Jiang F, Lin W, Wu Z, Zhang S, Chen Z, Li W. Fault diagnosis of gearbox driven by vibration response mechanism and enhanced unsupervised domain adaptation. Adv. Eng. Inform. 2024, 61, 102460. [Google Scholar]

38.

Cui L, Jiang Z, Liu D, Wang H. A novel adaptive generalized domain data fusion-driven kernel sparse representation classification method for intelligent bearing fault diagnosis. Expert Syst. Appl. 2024, 247, 123225. [Google Scholar]

39.

Li J, Ye Z, Gao J, Meng Z, Tong K, Yu S. Fault transfer diagnosis of rolling bearings across different devices via multi-domain information fusion and multi-kernel maximum mean discrepancy. Appl. Soft Comput. 2024, 159, 111620. [Google Scholar]

40.

Li J, Lin M, Li Y, Wang X. Transfer learning network for nuclear power plant fault diagnosis with unlabeled data under varying operating conditions. Energy 2022, 254, 124358. [Google Scholar]

41.

Sabah F, Chen Y, Yang Z, Azam M, Ahmad N, Sarwar R. Model optimization techniques in personalized federated learning: A survey. Expert Syst. Appl. 2024, 243, 122874. [Google Scholar]

42.

Li Z, Li Z, Gu F. Intelligent diagnosis method for machine faults based on federated transfer learning. Appl. Soft Comput. 2024, 163, 111922. [Google Scholar]

43.

Zhou F, Liu S, Fujita H, Hu X, Zhang Y, Wang B, et al. Fault diagnosis based on federated learning driven by dynamic expansion for model layers of imbalanced client. Expert Syst. Appl. 2024, 238, 121982. [Google Scholar]

44.

Wang R, Yan F, Yu L, Shen C, Hu X, Chen J. A federated transfer learning method with low-quality knowledge filtering and dynamic model aggregation for rolling bearing fault diagnosis. Mech. Syst. Signal Process. 2023, 198, 110413. [Google Scholar]

45.

Liu B, Lv N, Guo Y, Li Y. Recent advances on federated learning: A systematic survey. Neurocomputing 2024, 597, 128019. [Google Scholar]

46.

Diba BS, Plabon JD, Mahmudur Rahman MD, Mistry D, Saha AK, Mridha MF. Explainable federated learning for privacy-preserving bangla sign language detection. Eng. Appl. Artif. Intell. 2024, 134, 108657. [Google Scholar]

47.

Liu M, Joseph Raj A N, Rajangam V, Ma K, Zhuang Z, Zhuang S. Multiscale-multichannel feature extraction and classification through one-dimensional convolutional neural network for Speech emotion recognition. Speech Commun. 2024, 156, 103010. [Google Scholar]

48.

Almodóvar A, Parras J, Zazo S. Propensity Weighted federated learning for treatment effect estimation in distributed imbalanced environments. Comput. Biol. Med. 2024, 178, 108779. [Google Scholar]

49.

Mora A, Bujari A, Bellavista P. Enhancing generalization in Federated Learning with heterogeneous data: A comparative literature review. Future Gener. Comput. Syst. 2024, 157, 1–15. [Google Scholar]

50.

Wang Q, Chen S, Zeng J, Du W, Wei L. A deep learning fault diagnosis method for metro on-board detection on rail corrugation. Eng. Fail. Anal. 2024, 164, 108662. [Google Scholar]

51.

Idrissi MJ, Alami H, El Mahdaouy A, El Mekki A, Oualil S, Yartaoui Z, et al. Fed-ANIDS: Federated learning for anomaly-based network intrusion detection systems. Expert Syst. Appl. 2023, 234, 121000. [Google Scholar]