Transparent computational intelligence models for pharmaceutical tableting process

Abstract
Purpose: The pharmaceutical industry is tightly regulated owing to health concerns. Over the years, the use of computational intelligence (CI) tools has increased in pharmaceutical research and development, manufacturing, and quality control. Quality characteristics of tablets, such as tensile strength, are important indicators of expected tablet performance. This work seeks predictive, yet transparent, CI models which can be analysed for insights into the formulation and development process.
Methods: This work uses data from a galenical tableting study and computational intelligence methods such as decision trees, random forests, fuzzy systems, artificial neural networks, and symbolic regression to establish models for the outcome of tensile strength. Data was divided into training and test folds according to a tenfold cross-validation scheme, and RMSE was used as the evaluation metric. Tree based ensembles and symbolic regression methods are presented as transparent models, with extracted rules and a mathematical formula, respectively, explaining the CI models in greater detail.
Results: CI models for tensile strength of tablets based on the formulation design and process parameters have been established. The best models exhibit a normalized RMSE of 7 %. Rules from fuzzy systems and random forests are shown to increase the transparency of CI models. A mathematical formula generated by symbolic regression is presented as a transparent model.
Conclusions: CI models explain the variation of tensile strength according to formulation and manufacturing process characteristics. CI models can be further analyzed to extract actionable knowledge, making the artificial learning process more transparent and acceptable for use in pharmaceutical quality and safety domains.

Solid dosage forms are dominant on the pharmaceutical market. It is estimated that tablets, as the most common and popular oral dosage forms, constitute more than two-thirds of the global market. They are usually prepared by compressing uniform volumes of powder mixtures consisting of an active pharmaceutical ingredient (API) with suitable excipients such as diluents, binders, disintegrating agents, glidants, lubricants, taste maskers, etc. Therefore, understanding the physicochemical properties of ingredients and the mechanical behavior of powders during the tableting process is very important for the quality of tablets, with mechanical strength as one of the most important parameters.
It has been observed that the upstream process of formulation design and manufacturing has an intrinsic effect on the physical and mechanical properties of the tablet, an important one being tensile strength. Tensile strength of tablets is an indicator of how strongly the ingredients are compacted, and it gives an indirect measure of how the tablet will perform once consumed. Development of formulations and optimization of tableting conditions are intrinsically complex, leading to reliance on empirical methods in practice (Sun 2009). Variations while manufacturing the tablet could lead to an undesirably slow rate of disintegration if the tablet is too hard, or to failure during packaging and shipping if the tablet is too weak. Disintegration and dissolution are equally important considerations within the scope of Quality by Design (QbD) (ICH 2009).
It is imperative to have an in-depth understanding of the process and its parameters and their response to different formulations and manufacturing conditions. The complexity of the problem requires the use of advanced empirical approaches. Constant supervision of the process is needed since the intermediate points at which variation might be introduced are numerous. For example, unwarranted changes in moisture conditions (Gupta et al. 2005), particle sizes and pores (Nicklasson and Podczeck 2007), crystalline forms of molecules (Maghsoodi 2012), effects of roller compaction (Sun and Himmelspach 2006), batch sizes, etc., can cause significant variation in the quality characteristics of the product. It is expensive and time-consuming to run experimental tests for all possible upstream process combinations in order to optimize the endpoint; it is much cheaper and faster to develop predictive models using computational intelligence (CI), which can be used as guidance tools in a competitive and rapidly changing environment.
The use of CI in pharmaceutical manufacture has been demonstrated by previous works, all aiming at increasing the understanding of systems and using CI models as a stepping stone towards implementation of the QbD approach (Ibrić et al. 2012). Neural networks, fuzzy systems, and other techniques have been used for various applications in pharmaceutical environments (Bourquin et al. 1998; Shao et al. 2007; Landin et al. 2012), including but not limited to assessment of tensile strength and dissolution profiles.
A change in one component of a complex system can have a profound effect which cannot be explained by the sum of all changes within the system components. Complex adaptive systems absorb the changes upstream of the process and evolve as they progress (Chaffee and McNeill 2007). In the case of pharmaceutical tableting, complexities are abundant. Powder physical and chemical properties, powder mixtures, the response of powder mixtures to the pharmaceutical processes (mixing, roll compaction, milling, tableting, coating, etc.), and the processes themselves can add to the complexity within the whole system. The combined effect of variations added at different stages of the process is non-linear, sometimes unintentional, and adds to the difficulties in prediction. Paley and Eva (2010) argue that the use of complex systems can capture the unintentional behaviors of entities, as has been observed in the case of powder segregation in the feeder of the tableting machine (Ortiza et al. 2014).
Although CI techniques have rapidly gained pace within the pharmaceutical technology sphere, their black-box mode of work remains a reason for skepticism within regulatory authorities. The demand for CI models to be transparent is meant to ensure the efficacy and safety of a drug by fulfilling modern requirements for control and understanding of every element of the process, including modeling procedures. This work attempts to develop models for the quality characteristic of tensile strength using various CI methods and to dissect the best tree based models to extract rules describing the model. Finally, we present symbolic regression, the output of which can be represented in the form of an equation clearly showing the relationship of the input variables to the outcome.
This paper is an extended version of Khalid et al. (2015).

Data
Data was collected from a galenical study conducted on tableting using an undisclosed API in fixed quantity and four excipients (Silica Aerogel, MicroCrystalline Cellulose, Magnesium stearate, Sodium CarboxyMethyl Cellulose) in varying quantities. The study followed a vertex centroid experiment design generating 17 unique mixtures. For the 17 mixtures, two die compaction machines with three compaction pressures and two compaction speeds were used. One additional mixture from the preliminary trials was added. Details of the excipients used and the experimental conditions are explained in the source publication for the data (Bourquin et al. 1998).

Data transformation and organization
The main data set was divided into ten training and test files following the tenfold cross-validation (10cv) procedure using the cvTools library from CRAN. According to the 10cv procedure, the data set was divided into ten pairs of training and test folds of 90 and 10 %, respectively. Models were built on nine folds and tested on the tenth, over ten iterations.
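The tenfold split described above can be illustrated with a minimal sketch. It is written in plain Python and is not the R tooling used in the study; the function name `tenfold_indices`, the seed, and the record count of 200 are assumptions made only for illustration.

```python
import random

def tenfold_indices(n_records, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx

# Hypothetical data set size; the actual number of records is not stated here.
splits = list(tenfold_indices(n_records=200))
for train_idx, test_idx in splits:
    # A model would be fitted on train_idx and evaluated on test_idx.
    pass
```

Each record appears in exactly one test fold, so every observation contributes once to the test error.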

Methods
Different methods were used to create a large number of models (~7000). The models were selected based on the RMSE values. The RMSE values were calculated using Eq. 1 on the test folds of the data set.

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(pred_i - obs_i\right)^2} \tag{1} \]

where n is the number of records, and pred and obs are predicted and observed values, respectively. All RMSE values were normalized using Eq. 2. Hence, they are represented in percentages and are comparable over all different methods.

\[ \mathrm{NRMSE} = \frac{\mathrm{RMSE}}{X_{max} - X_{min}} \times 100\,\% \tag{2} \]

where X_max and X_min are the maximum and minimum observed values of tensile strength in the original data set.
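As a worked illustration of the evaluation metric, the sketch below assumes the usual definition of RMSE (the square root of the mean squared difference between predicted and observed values) and a normalization by the observed range, expressed in percent. The function names and the sample values are hypothetical and are not taken from the study data.

```python
import math

def rmse(pred, obs):
    """Root mean squared error between predicted and observed values."""
    if len(pred) != len(obs) or not obs:
        raise ValueError("pred and obs must be non-empty and of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def normalized_rmse(pred, obs, x_min, x_max):
    """RMSE as a percentage of the observed range (x_max - x_min)."""
    return 100.0 * rmse(pred, obs) / (x_max - x_min)

# Hypothetical test-fold values, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 2.5, 4.0]
error = rmse(pred, obs)
# The range is taken from the hypothetical observations themselves here;
# the text uses the range of the original data set.
error_pct = normalized_rmse(pred, obs, x_min=min(obs), x_max=max(obs))
```

Normalizing by the range makes errors comparable across methods and reportable as a percentage.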

Tree based methods
The following tree based methods were used.

Cubist
Cubist is an implementation of a tree based modeling approach in R where the resulting tree is a set of linear models at each node, from the root to the last node. A tree is generated on the complete training data set and the best node of the tree is converted into a rule. Linear models are fitted at the terminal node, and their results are smoothed with the predictions of linear models from earlier nodes within the tree. This process is continued recursively until all the variables have been covered by a single rule or a set of rules. This is also known as the separate-and-conquer approach (Fürnkranz 1999). Furthermore, boosting-like mechanisms are applied, where response adjustment is carried out for successive models based on the predictions of the previous models (Quinlan 1992). Cubist exhibits speed and an impressive generalization ability with regression problems. The Cubist algorithm has been used in pharmaceutical research for ADME/ADMETox prediction models (Gupta et al. 2010).

Random forest and interactive trees
A random forest is a tree based model where many tree predictors are combined to form one model. Each tree is created on an independent and random sample taken from the training data set. In one forest, the sample distribution is kept the same for all the trees. The generalization error of a forest depends on the errors of the individual trees and the correlation between the trees (Breiman 2001). Random forests are known to be good for classification problems, but they work well with regression and feature selection problems too. To extract rules from randomForest models, the CRAN package inTrees was used. inTrees extracts, measures, prunes, and selects rules from tree based ensembles (Houtao 2014; Pacławski et al. 2015).

Artificial neural networks
MON-MLPs are generalized feed-forward multilayer perceptron neural networks which work in a monotone fashion using NLM as their training algorithm. They allow two hidden layers with a choice of two activation and transfer functions (tanh and linear). MON-MLPs are known to be robust with regression problems (Cannon 2005).

Fuzzy systems
This is an evolutionary algorithm for fuzzy systems: a genetic algorithm is used to construct a fuzzy system able to fit the given training data. This fuzzy system can then be used as a prediction model, composed of fuzzy logic rules that can be further analyzed to provide a plausible linguistic representation. One implementation of genetic algorithm based fuzzy systems in R is fugeR (Bujard 2012).
In this experiment, a maximum of 500 generations and a population of 1000 were allowed. Out of every generation, 20 % of the population was set to be elitist. The rule sets generated from these experiments were set to sizes 10, 20, and 50, with 1-10 maximum variables per rule allowed. Each input and output variable is assigned membership functions describing the range the variable has. A collection of such rules guides the input variable values to the predicted output.
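To make the idea of membership functions and rules concrete, the following is a generic sketch of a two-rule fuzzy system with triangular membership functions and a weighted-average output. It is not fugeR's internal representation; the variable names, membership breakpoints, and rule outputs are all hypothetical.

```python
def triangular(x, left, peak, right):
    """Degree (0..1) to which x belongs to a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def predict(mg, dwell):
    """Evaluate two illustrative rules and combine them by weighted average.

    Rule 1: IF mg is high                   THEN tensile strength = 1.0
    Rule 2: IF mg is low AND dwell is long  THEN tensile strength = 2.5
    All breakpoints and outputs are made-up values.
    """
    mg_high = triangular(mg, 0.5, 2.0, 3.5)
    mg_low = triangular(mg, -1.0, 0.5, 2.0)
    dwell_long = triangular(dwell, 0.0, 0.1, 0.2)

    activations = [mg_high, min(mg_low, dwell_long)]  # min models fuzzy AND
    outputs = [1.0, 2.5]

    total = sum(activations)
    if total == 0.0:
        return None  # no rule fires for this input
    return sum(a * o for a, o in zip(activations, outputs)) / total

example = predict(mg=1.25, dwell=0.1)
```

Because every rule is a readable condition with a stated output, a set like this can be inspected directly, which is the property the text relies on for transparency.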

Evolutionary computation
This work makes use of fuzzy systems with co-evolution and of symbolic regression methods. The aim is to create models and to use the models to extract rules and, in the case of symbolic regression, a mathematical formula.

Symbolic regression by RGP
Genetic programming (GP) involves the automatic generation of computer programs to perform a user defined task. GP is a bio-inspired algorithm based on evolution principles to solve complex problems (Poli et al. 2008). Although RGP computations are expensive in time and computational power, their results are simple representations of the problem obtained without a priori information about the problem. RGP offers various options for the initialization, variation, and selection procedures inherent in GP.
The population size was set to 1000 and the modeling process was set to 5 million evolution steps divided into ten stages. After each stage, the models were tested according to the tenfold cross-validation method. An RMSE of 0.12 was used as an additional algorithm stop condition, based on the guidance of previous results generated by the tree based packages. The equations were created on the whole data set initially, and then selected ones were optimized using the SANN algorithm followed by the BFGS method (Nash and Varadhan 2011; Nash 2014).

Results and discussion
All CI methods generalize on the outcome of tensile strength (Table 1). ANNs and fuzzy systems learn models with low error. ANNs are generally known to be a very powerful tool to create predictive models. Although ANNs are robust and powerful, a great disadvantage is that the exact calculations by which the resulting model was created cannot be supervised, owing to their black-box nature. While they can be explained to some extent by looking at the number of layers and neurons, the transfer functions used, or the optimized weights, the behavior of ANNs cannot be demystified with the certainty one would have about regular statistical methods. As explained in detail by Rodvold et al. (2001), weight adjustment is not exhaustive, and the efficiency of the resulting neural network partially relies on the training algorithm adjusting randomly initialized weights. This mode of work feeds into the intrinsic instability that ANNs come with. There exist methods with which rigorous testing of ANNs can be carried out, but most are inconclusive on the elusive nature of initialization and training of neural networks.
Owing to concerns about safety of use and reliability, more rule and formula based methods are explored, where model behavior can be explicated.
Results from the fugeR package can be represented in the form of linguistic rules, although there exist no automatic methods which can defuzzify the system to a human readable form. Manual efforts are needed to extract rules and membership functions from fugeR results and to map the membership functions back to the fuzzy rules. A set of rules (Additional file 1) is generated where each rule contains information about two input variables interacting with each other. The rules guide the input variable values to the predicted output. FugeR models, however accurate in their predictions, generate rules that are sometimes redundant and contradictory to each other within the same model, raising doubts about the validity and safety of use of the models in a pharmaceutical environment.
Random forests show comparable results to monmlp and fugeR. The advantage of using random forests is that they are rule based techniques and that the output can be generated in a linguistic manner for further analysis using the inTrees package. Such rules, once simplified, can be used as guidance towards understanding and informed manipulation of the system. However, there are a few impediments: the rules created are so large in number that generalizing through them can be daunting, and they might represent the problem in a wide manner, leading to variability in the results. Variability in how the system processes inputs to compute results is to be avoided owing to consistency and quality considerations within the pharmaceutical industry and with the regulatory authorities. With symbolic regression, such variability can be avoided as the solution can be represented in the form of an equation.
Additionally, for the purpose of transparency of CI models, randomForest models were further analyzed to extract rules explaining the models. The rules lend transparency to black-box modeling techniques. Table 2 shows rules generated from randomForest models by using the inTrees package. Table 3 shows the output ranges for the predictions (pred) in Table 2.
The algorithm of inTrees discretizes all the input and output values in the data set before dividing them into three quantiles based on the outcome value (Table 3). The rules are then extracted and pruned to define the outcome as low, medium, or high, corresponding to the initially defined three quantiles of the outcome values. In Table 2, the rules are presented as conditions in a simple linguistic manner which can be interpreted and used as guidance to create a product with tensile strength within a certain known range. 'Freq' and 'err' are the number of occurrences of a particular condition and the number of cases in the data set that deviate from it, respectively.
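The bookkeeping behind such rules can be sketched as follows. This is not the inTrees implementation: it only illustrates, under simplifying assumptions, how an outcome can be binned into three rank-based classes and how the coverage and error of one rule can be counted. Here both quantities are expressed as fractions. The records, tensile strength values, and rule thresholds are hypothetical.

```python
def tertile_labels(values):
    """Label each value low/medium/high by rank, giving three near-equal bins."""
    names = ("low", "medium", "high")
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [None] * len(values)
    for rank, i in enumerate(order):
        labels[i] = names[rank * 3 // len(values)]
    return labels

def rule_stats(records, labels, condition, pred):
    """Return (freq, err) for one rule.

    freq: fraction of records that satisfy the condition.
    err:  fraction of those records whose class differs from pred.
    """
    covered = [lab for rec, lab in zip(records, labels) if condition(rec)]
    freq = len(covered) / len(records)
    err = sum(lab != pred for lab in covered) / len(covered) if covered else None
    return freq, err

# Hypothetical records and outcomes, for illustration only.
records = [
    {"Mg": 0.5, "Dwell": 0.10}, {"Mg": 0.5, "Dwell": 0.02},
    {"Mg": 1.0, "Dwell": 0.10}, {"Mg": 1.0, "Dwell": 0.02},
    {"Mg": 2.0, "Dwell": 0.10}, {"Mg": 2.0, "Dwell": 0.02},
]
tensile = [2.4, 2.1, 1.6, 1.7, 0.9, 0.7]
labels = tertile_labels(tensile)

# Two candidate rules predicting the "low" class.
broad_rule = rule_stats(records, labels, lambda r: r["Mg"] > 0.75, "low")
narrow_rule = rule_stats(records, labels, lambda r: r["Mg"] > 1.5, "low")
```

In this toy data the broader rule covers more records but is wrong for half of them, while the narrower rule covers fewer records without error, which is the trade-off that rule pruning and selection have to balance.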
The set of rules generated from randomForest models can be flexible and represent the model in a wide design space. However, the flexibility of rules may lead to faltering accuracy of the models under various bordering scenarios. In certain cases where the data sets are not designed accurately, the randomForest rules may lead to conflicting outcomes. To tackle such drawbacks, symbolic regression can be used, which represents the solution to a problem in the form of a mathematical formula (Eq. 3). Figure 1 shows the scatter plot for Eq. 3. For Eq. 3, the RMSE is 0.22722, with C1 = −1.523866, X3 the amount of magnesium stearate, and X5 the dwell time (speed of the die compactor).
Equation 3 is simple and represents the problem in a concrete manner. The original data set contains six input features, while the equation represents the two most important ones. This is an example of feature selection behaviour by rgp, which has been observed in other instances as well (Mendyk et al. 2015). Feature selection concentrates the effect of crucial inputs in the system and discards the trivial ones, in an attempt to capture more information in the model while making it simpler. Out of all the inputs, magnesium stearate and dwell time were selected as critical features. Although the rgp prediction error for tensile strength was the highest in the ranking (Table 1), it is the most transparent model of all the methods tried. The choice of this equation was a tradeoff between simplicity and predictive performance, because the complexity of the rgp models found closer to the best generalization error increased exponentially and the resulting models were overfitted.
Magnesium stearate is an excipient exhibiting a profound effect on the tensile strength of the tablets within these experimental conditions, which is supported by previous experimental studies (Hentzschel et al. 2012). As can be seen in Fig. 2, increasing magnesium stearate concentration decreases the tensile strength of the tablets, as previous research has shown (Fukui et al. 2001). This decrease in tensile strength can be explained by a reduction of inter-particulate bonding caused by the formation of a lubricant layer around carrier particles. The presence of lubricant around carrier particles therefore causes weaker bonds between them and consequently lower tensile strength (Bolhuis et al. 1975; Duberg and Nyström 1982; Vromans and Lerk 1988). Tensile strength is also dependent on the dwell time, which represents the interval during which maximum compaction pressure (defined as >90 % of peak pressure) is maintained by the punches during the compaction cycle (Vezin et al. 2008). A longer dwell time allows more time for inter-particulate bond formation under compaction, resulting in stronger tablets (Xu et al. 2015).

A complex systems perspective
Prediction of tensile strength benefits the drug discovery and production chain by preventing failures beforehand. This can be extended to designing a strategy that takes into account the design problems and their solutions at an early stage in the drug discovery and manufacture life cycle (Thomke and Fujimoto 2000), also conforming with the QbD principles of the FDA (ICH 2009). The developed CI models allow testing several approaches within the boundaries without the necessity of performing an assay or conducting experiments in the laboratory. Increased understanding of the components of the system and how they interact leads to higher success rates of delivering the drug to market in less time and at lower cost. Variables critical for (data-driven) predictive ability are discussed here, as opposed to variables already known to be typical for product quality. Our results focus on highlighting variables which are important in increasing the predictive ability of the system for tensile strength. In this case, the amount of magnesium stearate and the speed of the tableting machine (dwell time) were found to be the most important variables to predict tensile strength. Lesko et al. (2000) argue that the need for predictability in pharmaceutical drug manufacture is of utmost importance, as it can only be achieved by truly understanding the drug, the underlying interactions, and the prevailing conditions, knowledge of which will directly influence the design of the production process (Van Dyck and Peter 2006).

Fig. 1 Predicted vs observed graph for all the modeling methods

Conclusions
CI models represent the problem of tensile strength satisfactorily. Furthermore, the models have been analyzed in an attempt to make them more transparent. Rules were extracted from randomForest models and represented in a simple and understandable manner which can be used by the pharmaceutical industry for research and regulatory purposes. A mathematical formula was created using symbolic regression which defines the problem of tensile strength for the particular data set used. Symbolic regression results exhibit feature selection behavior, taking into account only the input variables which contribute most to the output. The latter is a starting point for further considerations about possible mechanisms governing the analyzed problem. Tensile strength is a factor describing the mechanical strength of a tablet. As the addition of magnesium stearate was found to be responsible for tablets being less durable, it might be hypothesized that the hydrophobic character of this excipient disrupts some hydrophilic interactions between particles in the tablet mass (Hersen-Delesalle et al. 2007). This conforms very well with one of the theories of tablet formation, where residual and/or crystalline water present in the bulk material of the tablet mass during compression is relocated and causes re-crystallization of the material in between particles, thus creating inter-particle bonds contributing to the strength of the resulting tablets (Crouter and Briens 2014).

Fig. 2 Effect of Dwell time and Mg on tensile strength of tablets

Table 2 Rules generated from randomForest models using inTrees
Mg, MC, SA, and NaCMC represent excipients; Dwell and Compr. represent process conditions. Len: length of the rule; freq: frequency of occurrence of the rule; err: error indicating the occurrence of a different prediction; condition: the rule itself; pred: prediction if the condition is true; impRRF: importance of the rule according to randomForest