Mao Minghui ,Liu Jiansong ,Tian Mengke ,Lin Pengrong ,Long Xu
1.School of Civil Engineering,Liaoning Petrochemical University,Fushun 113001,China;2.Beijing Microelectronics Technology Institute,Beijing 100076,China;3.School of Mechanics,Civil Engineering and Architecture,Northwestern Polytechnical University,Xi’an 710072,China
Abstract Particle swarm algorithm (PSO) and genetic algorithm (GA) were used to optimize the back propagation (BP) artificial neural network for predicting the dynamic responses of the through silicon via (TSV) based three-dimensional packaging structures.A finite element model of the TSV packaging structure with a strain-rate dependent constitutive model for solder joints was created to simulate the drop impact due to a free fall of 0.8 m to the rigid ground to investigate the structural dynamic responses during the whole impact process.The spatial coordinates of the solder joints were used as the input parameters of the hybrid neural network model for the drop impact responses,while the maximum Von Mises stress and PEEQ (plastic strain) values are identified the output parameters.The correlation coefficient (R),the mean absolute percentage error (MAPE) and the training time were used as the measures to validate and compare the proposed PSO-BP and GA-BP neural networks.The results show that both the PSO-BP model and the GA-BP model can achieve high accuracy predictions with strong generalization capability.Apparently,both optimized algorithms outperform the original BP model,but the PSO-BP model is slightly more superior than the GA-BP model.It is also demonstrated that the proposed optimized algorithms efficiently predict the drop impact responses of TSV packaging structures by greatly saving the computational and experimental cost of drop impact tests.
Key words drop impact,through silicon via,particle swarm algorithm,genetic algorithm,dynamic response
Through silicon via (TSV) based packaging structure enables the vertical interconnection between the chip and the substrate through three-dimensional (3D) integration technology,which essentially minimizes the alignment distance and area between the chips and reduces the package size compared to the traditional two-dimensional packaging structures[1].As the requirements for miniaturization and integration of chips increase in the future,the density and complexity of interconnections within integrated chips gradually increase,and TSV packaging technology is promising to solve the problems and improve chip performance for a wide range of aerospace and aviation applications.
However,due to the small size of the internal interconnect structure of the TSV and the significant 3D superposition effect between all components;it is necessary to investigate the dynamic responses under extreme service conditions.Recently,Tanaka et al.[2]investigated the mechanical effects caused by copper vias in TSV structures through a series of relevant experiments and numerical simulation models.Banijamali et al.[3]investigated the effect of spacing between solders joints in TSV structures with high aspect ratios on the mechanical properties of the chip,and optimized the TIM material and designs of under fill,microbump and passive heat sink.Thakur et al.[4]studied the size and shape of TSV structures,the thickness of the silicon film,and the thickness and material properties of the filler material to develop a 3D finite element model of the stacked packaging structure and evaluate the overall reliability.Qian et al.[5]analyzed the influence of TSV size and location on the information transmission and signal integrity between structures.Lee et al.[6]analyzed the regular and dense layouts of TSV compositions to address the performance degradation due to the increasing number of thin chips in the stack.
The above mentioned reliability studies on TSV packaging structures have mainly focused on static mechanical reliability,but no research has been conducted on the reliability of TSVs in drop impact conditions to the best knowledge of the authors.Jenq et al.[7]established a high G drop test condition to investigate the impact failure of an array of tin spheres located on a printed circuit board,and also analyzed the dynamic response during the entire drop process.Huang et al.[8]tested the drop failure mode of wafer level chip scale packaging (WLCSPs) with Sn-3.0Ag-0.5Cu(SAC305) solder joints according to the JEDEC standard for board-level drops.It was found that for internal solder joints,short FR-4 cracks have limited absorption of impact energy,leading to other failure modes on the chip side.However,traditional experiments or finite element analysis takes much time and efforts.In fact,with the continuous development of artificial intelligence methods,quite a number of engineering problems have been solved efficiently[9–10].Su et al.[11]used BP artificial neural network techniques to extract images of solder joint defects,which were identified by a geometric classification and significantly advantageous for microelectronic packaging technology.Liu et al.[12]used the Levenberg-Marquardt (LM)-BP artificial neural network technology to achieve defect monitoring of tiny solder joints,and realized nondestructive inspections of packaging structures.Zhang et al.[13]used the correlation driven neural network (CDNN) model to predict the dynamic response of a ball grid array (BGA) packaging structure due to drop impacts,and a good fit with the finite element simulations was achieved.Nevertheless,the neural networks have not been applied to predict the drop impact process of TSV structures,so it is urgently needed to predict the drop impact response of TSV packaging structures using the artificial neural network (ANN) technology.
This paper adopts the commercial finite element software ABAQUS to simulate the dynamic responses of a typical TSV packaging structure subjected to drop impacts.The obtained data sets are used to train the prediction model and establish ANN methods to predict and evaluate the mechanical reliability of packaging structures.Two hybrid algorithmic ANN models,i.e.,particle swarm algorithm(PSO)-BP and genetic algorithm (GA)-BP,are proposed to predict the dynamic responses of TSV packaging structures.The correlation coefficient(R),the mean absolute percentage error (MAPE) and the training time are used to validate and compare the two proposed models.In the future works,the proposed ANN method can be applied directly to simulate and predict the reliability of solder joints when solving similar multi-field coupling problems,rather than performing time-consuming finite element simulations.
As a typical packaging structure,the TSV packaging chip is shown in Fig.1,where the 3D finite element model is created,with the main parts consisting of the chip,the TSV structure,the silicon adapter,the epoxy mold compound (EMC),the solder joints and the ceramic shell housing.The chip size is 7.7 mm × 7.7 mm × 0.375 mm,the upper solder joint labeled as 1 is 0.06 mm in diameter and 0.05 mm in height,and the lower solder joint labeled as 2 is 0.11 mm in diameter and 0.95 mm in height.The size of the silicon adapter plate is 11 mm × 10 mm × 0.1 mm,and the size of the ceramic shell housing is 11 mm × 10 mm ×0.6 mm.There are 11 rows and 11 columns of solder joints,so a total of 121 solder joints are per layers.Solder joint 1 is connected to the chip at the top and to the silicon adapter plate at the bottom.The upper part of the TSV package chip solder joint 2 is connected to the silicon adapter plate and the lower part to the HTCC ceramic shell housing.The solder joints are surrounded by EMC for protection purposes.The solder joints are also linked by copper sheets for additional mechanical stability.The mechanical parameters of materials are given in Table 1.All input data must be given in consistent units in ABAQUS,thus,the SI system of units is used throughout the present study,that is,millimeter(m) for length,Newton(N) for force,Pa (N/m2) for stress,and kg/m3for density.The solder material used in this paper is SAC305,and the constitutive curves are shown in Fig.2.For different strain rates,the material constitutive model corresponds to different stress-strain curves.As the focus of this paper is on the training of machine learning algorithms and the prediction of stresses and deformations in packaging structures,the strain rate effect of the material is considered for more realistic simulation[14].

Fig.1 Schematic diagram of the TSV packaging structure

Table 1 Material parameters of the finite element model

Fig.2 Constitutive curves of SAC305 materials
Due to the different orders of magnitude of components in the meshing process,the mesh is divided in a mixed mesh type.The entire TSV structure has 405 916 elements and 70 394 nodes.Solder joint 2 has a total of 5 813 elements and 2 833 nodes.The overall meshing of the internal structure of the solder joints in the TSV package is shown in Fig.3.

Fig.3 Meshing scheme of a typical TSV structures
The package is simulated as a vertical free fall from the height of 0.8 m above the rigid ground with the acceleration of gravity of 10 m/s2,so the impact velocity to a rigid ground is 4 m/s.The initial velocity is set and the entire fall is simulated from a distance of 0.5 mm from the ground to reduce the calculation time,and the whole impact process is 2 × 10–4s,with 1 × 10–5s as a time interval to set the finite element parameters output.Thus,the whole simulation result has 21 time points for prediction output.The overall Von Mises stress nephogram of the TSV packaging structure after the drop impact process is shown in Figs.4 a–c.It is found that the strain rate has a significant effect on the stress-strain evolution of the solder joints.The PEEQ and Von Mises stress values for the ceramic shell are much larger than those for the solder joints and the TSV structure,so the shell structure provides better protections for the chip.The TSV structure first reaches the ground att= 0 s.The numerical simulation results show that the Von Mises stress in the solder joints reaches a maximum att= 5.5 × 10–5s.Because there are a total of 121 solder joints 2 in the whole process,it is not convenient to express all of them on the way,so this paper shows the maximum Mises stress and PEEQ values of the solder joints during the impact process.The maximum Mises value is 76.81 MPa at node 70 257 for 2.5 × 10–5s.The maximum PEEQ is 5.555 × 10–2at node 70 317 for 7.5 × 10–5s.It is worth noting that the node numbers here are for the entire packaging structure and not simply for solder joint 2.

Fig.4 TSV packaging structure Von Mises stress or PEEQ nephogram (a)–(c) are the Von Mises stress for whole structure,chip and EMC1 when t=5.5 × 10–5 s (d) Maximum Von Mises stress for solder joints 2 when t=2.5 × 10–5 s (e) Maximum PEEQ for solder joints 2 when t=7.5 × 10–5 s
The BP neural network,a classical artificial intelligence network,is based on the idea of errors backward adjustment which is the basis of most neural networks.This algorithm was first proposed by Rumelhart and McClelland[15]and consists of three main parts: an input layer,a hidden layer and an output layer.In this paper,in order to compare the effectiveness of the optimization algorithm for the drop impact simulation,the basic three layers neural network approach is chosen.The whole training process can be divided into two steps,i.e.,feed-forward of the input training pattern and the adjustment of the weights by error.
(a) Data propagation forward from the input to the output.The prediction values of each layer are calculated by


wherexjis the parameter of input layer at thejthneurons,wijandajare weight and threshold of thejthneurons between the input layer and hidden layer,andyiis the output value in hidden layer at theithneurons.Besides,wj1andbjare weight and threshold of thejthneurons between the hidden layer and output layer.Yis the output value of the BP neural network.The activation functionfused in this paper is hyperbolic tagent sigmoid transfer function tansig(x),which can be written as

According to the input-output sequence (x,H) of the network,the node numberlin the hidden layer is determined by following Eqs.(6)–(7) with the node numbernin the input layer and the node numbermin the output layer.In addition,the constantcis within the range from 1.0 to 10.

(b) Calculate the error and adjust the weight and threshold.All the values of each layer are calculated by

whereeis the error between the calculate value and the target value,ehis the error between the calculate value and the target value in hidden layer.Besides,thewj1new,bjnew,wijnew,ajneware the adjusted weights and thresholds in each layer.When the error meets the accuracy requirement,the learning process is terminated and the corresponding neural network is output.If accuracy requirement is not satisfied,one should iterate in steps (a) and (b) until the erroreis within the allowable tolerance.The flowchart of BP algorithm was shown in Fig.5 and the detailed learning parameters are described in Section 3.

Fig.5 Flowchart of the BP algorithm
GA is a kind of uncertain algorithm to reflect the biological mechanisms of organisms in nature and is superior to deterministic algorithms in solving certain specific problems.The algorithm generates paternal parent and maternal parent by generating a series of individuals,and selects the optimal individuals by continuously inheriting and changing individual characteristics.At the same time,the fitness function is used as the target direction to guide the GA optimization.Thus,GAs can optimize existing artificial intelligence algorithms.In essence,GAs usually consist of three parts,namely selection operations,crossover operations and mutation operations.After these operations,the optimal individual can be tracked by means of a fitness function.In order to getting the maximize the optimization capability,the mean square error (MSE) of the BP network is chosen as the fitness value,which can be expressed by

whereyiis the value calculated by FE simulations,ypreiis the value predicted by ANN algorithms,andnis the data size.As the MSE value means the square error between the two sets of data.The less MSE value represents greater consistency between the two groups.
GA can be divided into three main operations,i.e.,selection operation,crossover operation and mutation operation.All of above operations can be referred to the previous works[16].The weights and thresholds for the training process of the BP algorithm are optimized in conjunction with the genetic algorithm as shown in Fig.6.The weights and thresholds are arranged as an individual in step (a) of Section 3.2.In this paper,the entire optimization process is implemented through the above operations,while the optimized values are updated with the initial weights and thresholds as the initial parameters of the BP neural network.More discussions on the parameters are made in Section 3.

Fig.6 Flowchart of the proposed GA-BP algorithm
PSO is an evolutionary computation technique proposed in 1995 by James and Russell[17],based on the behavior of bird flocks in predation.The algorithm works by setting up random particles in a certain space,each of which individually searches for a target in a multidimensional space.Besides,each particle has a search speed and a position,and then the distance is calculated between all of them and the target.The speed and position of the best particle are found,and the information is shared among all particles so that other particles are constantly moving towards this particle.As each particle is searching for the optimal solution,the speed of the search can be greatly increased for iterating and updating the speed and position.The final optimal solution satisfies the termination condition.The PSO algorithm flow is as follows.
(a) Initialization of PSO parameters.The values are set for the following parameters: the maximum number of evolution generations,the number of independent variables of the target,the maximum velocity of the particle,the position information for the entire search space.The values are randomly initialized for the following parameters: the velocity and position on the velocity interval and search space,the particle swarm size,and the flying velocity of each particle.
(b) Definition of the individual extremes and global optimal solution.Eq.(13) defines the fitness function.Individual extremes are the optimal solutions found for each particle,and a global value is found from these optimal solutions,which is termed as the current global optimal solution.The comparison is made with the historical global optimum for updating.
(c) Update of velocity and position.The first part of Eq.(14) is called the memory term,which represents the effect of the last velocity magnitude and direction;the second part of Eq.(14) is called the self-cognition term,which is a vector pointing from the current point to the particle's own best point,indicating that the particle's action comes from the part of its own experience;the third part of Eq.(14) is called the population cognition term,which is a vector pointing from the current point to the best point of the population,and reflects the collaborative cooperation and knowledge sharing among particles.It is through one's own experience and the best of one's peers that a particle decides on its next movement.Eqs.(14)–(15) are used as a basis for the standard form of PSO.

wherec1andc2represent the learning factors,andr1andr2are random numbers between [0,1].viis the velocity of theithparticle,xiis the best position of the best particle,pbestiandgbestiare the best velocity of each particle and all particles.
During drop impact process,the failure criterion for solder joints is mainly determined by the PEEQ and Von Mises stress values.Therefore,there is an urgent need for a simple,accurate and efficient drop impact prediction method.To address this problem,the optimized BP neural networks are proposed to predict the dynamic response of a solder joint during the drop impact process.As shown in Fig.7,the input values of the neural network are the coordinates of the nodes of solder joint 2,which are values along theXaxis,Yaxis andZaxis,respectively.The number of nodes in the hidden layer is 10 and the number of parameters in the output layer is 1.The value is the maximum PEEQ or Mises value during the vertical drop condition.For this condition,there are a total of 2 833 cases,and this paper increases the robustness of the algorithm by setting the validation set during the training process.The 2 430 solder joint 2 cases is selected as the learning sample and 270 values as the validation sample randomly,which adopts the ratio of training set to validation set 9:1.The 133 remaining cases are selected as the test set,and the ratio of this sample is a composite of the recommended division ratio for artificial neural networks in general.Since the coordinate values of the solder points are small,all the coordinate values are normalized to within [0,1] to eliminate the prediction error caused by the order of magnitude difference between the input and output values.After the calculation,the value is subjected to an inverse normalization operation to obtain the true value.The maximum number of epochs of the BP algorithm in this paper is 3 000,with a target MSE value of 0.000 1,and the MSE value of the validation set increases continuously by 500 epochs will cause the algorithm to stop training.Also considering that the BP algorithm is a local search optimization method,the network weights are gradually adjusted by the direction of local improvement,which makes the algorithm into local extremes and the weights converge to the local minima,thus leading to network training failure.Therefore,this paper proposes two optimization algorithms using GA and PSO,based on the original initial values of the BP algorithm to improve the prediction accuracy of BP neural networks.According to the calculation of the initialized population size of the optimization algorithm,the GA generates an individual of length 41 as the initial individual of the optimization algorithm.The number of iterations of generation is 15,the initial population size is 5,and the crossover and variance probabilities are 0.2.For the PSO algorithm,the particle generates a 41-dimensional space for the particle to search for the optimal solution.The speed of search is 2,the learning factor is 1,the maximum number of evolution generation is 100,and 5 particles are used for the optimization.

Fig.7 Flow chart of the proposed GA optimized BP algorithm
The whole optimization process is shown in Fig.8.Figs.8 a–b show the Mises training error curve and the PEEQ training error curve,respectively.It is seen that for the same number of training sessions,the PSO algorithm has the smallest MSE error,the GA has the second largest MSE error,and the original BP neural network algorithm has the largest MSE error.This shows that the optimization algorithm is able to significantly improve the prediction of the maximum PEEQ and Von Mises stress values in the drop impact condition.To better describe the optimization process of the GA and PSO algorithms for the original neural network,the MSE error plots for Von Mises stress and PEEQ with the number of evolution generations are depicted in Fig.8 c–Fig.8 d.After 15 generations,the MSE of Mises stress and PEEQ are reduced from 0.045 96 and 0.036 51 to 0.429 5 and 0.032 05 respectively.The PSO algorithm can perform more evolution generations due to its simple structure,and its MSE with the number of searches is shown in Fig.8 e.After 100 evolution generations,the MSE values of Mises stress and PEEQ were reduced to 0.117 33 and 0.220 54,respectively.

Fig.8 Performance of the training process (a) and (b) MSE values along with the epochs for Von Mises stress and PEEQ (c)and (d) performance of GA optimization for Von Mises stress and PEEQ (e) performance of PSO optimization for Von Mises stress and PEEQ
Compare the FE model,the advantages and disadvantages between the three ANN algorithms are addressed.The simulation time for this TSV drop model is 6 861.9 s on a desktop workstation.The use of the artificial neural network method has a significant advantage in terms of computational speed compared to the time required for the finite element predictions of stress evolution during the drop impact.In order to objectively evaluate the prediction reliability,the correlation coefficient (R) and mean absolute percentage error (MAPE) are expressed as evaluation criteria and the specific expressions are shown in Eqs.(16)–(17).And theXi,Yi,Nare prediction by finite element simulation,the prediction by ANN models and the number of the cases.

Table 2 shows the comparison of training time,MAPE values andRvalues of the 133 cases in the test set by ANN models.It is observed that the PSO-BP reduces the MAPE of the Von Mises stress and PEEQ values by 5% and increases the time only by 7 s.The GA-BP is able to reduce the MAPE by 4% but increases the time by 40 times.The PSO algorithm has an advantage over the GA in terms of time and accuracy.It is worth noting that the magnitude of theRvalue,which in some ways indicates the degree of linear correlation between the two sets of data,has no absolute significance,but in general when theRvalue is over 0.8,the two data sets are strongly correlated.TheRvalue of the PSO optimized algorithm is significantly better than theGA-BP and the original BP algorithm.

Table 2 Comparison of time cost and accuracy with three ANN models
Figs.9 a–b depict the predicted values and the finite element simulation comparison values of Mises stress and PEEQ values for test set.It is clearly shown that the values predicted by the PSO-BP are concentrated on the 45° line,while the accuracy of GA-BP predictions are lower and the original BP predictions are the most dispersed.Fig.9 c–Fig.9 d depict the error probability distributions of the Mises stress and PEEQ values for the 133 test cases.It can be seen from the figures that the PSO algorithm prediction error probability is mainly concentrated around 0,indicating a higher accuracy of prediction.At the same time,these unfavorable factors greatly increase the prediction difficulty due to the small spatial distance between the weld joints and the strain-rate related intrinsic structure chosen for the material of the solder joints.It is also found that all three algorithms deviated significantly in predicting larger or smaller stress and strain values,due to less extreme working conditions,resulting in slower learning of their algorithms.Overall,the correlation between the predicted values of the ANN models the values calculated by the FE methods is above 75%,and the PSO-BP model is able to achieve the accuracy of over 80%.Therefore,the predicted values of the ANN algorithms are relatively reliable.In summary,the prediction results show that the PSO algorithm optimized model guarantees the prediction accuracy and computation time in a sufficient range and produce sufficiently accurate predictions.

Fig.9 Comparisons of dynamic responses predicted by FE,original BP,GA-BP and PSO-BP (a) and (b) The correlation between FE and ANN models for Von Mises stress and PEEQ (c) and (d) Error probability distribution for Von Mises stress and PEEQ
(1) The GA optimized neural network reduces MAPE by 4% but increases the time by a factor of 40.The PSO algorithm is able to accomplish 5% MAPE reduction in 1/40 of the time by using the GA algorithm.This means that the PSO-BP network provides outstanding advantages for both time cost and accuracy.
(2) By employing ANN models to investigate the dynamic response of TSV package at drop impact conditions,the computational methodology is generalizable.Applications of optimized ANN methods can lead to simplified modeling and increased computational speed.
(3) The PSO-BP and GA-BP ANN methods trained by finite element predictions have good applicability and great potential for further development in predicting the mechanical reliability of TSV-based 3D packaging structures under complex operating conditions.