A scheme to analyze agent-based social simulations using exploratory data mining techniques
 M. Hammad Patel^{1},
 Mujtaba Ahmed Abbasi^{1},
 M. Saeed^{1} and
 Shah Jamal Alam^{2}
https://doi.org/10.1186/s40294-018-0052-8
© The Author(s) 2018
Received: 14 October 2017
Accepted: 30 December 2017
Published: 22 January 2018
Abstract
Keywords
Background
Agent-based models simulating social reality generate outputs that result from a complex interplay of processes related to agents’ rules of interaction and the model’s parameters. As such agent-based models become more descriptive and evidence-driven, they become a useful tool for simulating and understanding social reality. However, the number of parameters and agents’ rules of interaction also grows rapidly. Such models often have unvalidated parameters that the modeler must introduce in order for the model to be fully functional. These unvalidated parameters are often informed only by the modeler’s intuition and may represent gaps in existing knowledge about the underlying case study. Hence, a rather long list of model parameters is not a limitation but an inherent feature of descriptive, evidence-driven models that simulate social complexity.
Theoretical exploration of a model’s behavior with respect to its parameters, in particular those that are not constrained by validation, is important but has, until recently, been limited by the lack of computational resources and analysis tools to explore the vast parameter space. An agent-based model of moderate complexity, when run across different parameter values (i.e., the total number of configurations times the number of simulation runs per configuration), generates output data that can easily reach gigabytes or more. With high performance computing (HPC), it has become possible for agent-based modelers to explore their models’ (vast) parameter spaces; however, while generating this simulated ‘big data’ is becoming (computationally) cheaper, analyzing agent-based models’ outputs over a (relatively) large parameter space remains a major challenge for researchers.
In this paper, we present a selection of practical exploratory and data mining techniques that may be useful for understanding outputs generated by agent-based models. We propose a simple schema and demonstrate its application on an evidence-driven agent-based model of inter-ethnic partnerships (dating and marriages), called ‘DITCH’. The model is available on OpenABM^{1} and is reported by Meyer et al. (2014). In the analysis reported in this paper, we focus on the dynamics and interplay of the key model parameters and their effect on the model output(s). We do not consider the model’s validation against the case studies on which it is based.
The next section (“Analyzing agent-based models: a brief survey”) reviews selected papers that have previously addressed the issue of analyzing agent-based models. The section “A proposed schema combining exploratory, sensitivity analysis and data mining techniques” presents a general schema for analyzing outputs generated by agent-based models and gives an overview of the exploratory and data mining techniques used in this paper. In the section “Illustration: implementing the proposed schema on the ‘DITCH’ agent-based model”, we present an overview of the DITCH agent-based model and discuss its parameters, with the default values reported by Meyer et al. (2014). That section also describes the experimental setup and results. Finally, the “Conclusions and outlook” section concludes with next steps in this direction.
Analyzing agent-based models: a brief survey
Agent-based models tend to generate large volumes of simulated data that are dynamic and high-dimensional, making them (sometimes extremely) difficult to analyze. Various exploratory data analysis (EDA) and data mining (DM) techniques have been reported for exploring and understanding a model’s outcome against different input configurations (e.g., Villa-Vialaneix et al. 2014). These techniques include heat maps, box and whisker plots, sensitivity analysis, classification trees, the K-means clustering algorithm, and ranking of model parameters by their influence on the model’s outcomes.
Several papers have proposed and explored data mining techniques to analyze agent-based simulations. One such paper is by Remondino and Correndo (2006), in which the authors applied ‘parameter tuning by repeated execution’, i.e., a technique in which multiple runs are performed for different parameter values at discrete intervals to find the parameters that turn out to be most influential. The authors suggested different data mining techniques for this purpose, such as regression, cluster analysis, analysis of variance (ANOVA), and association rules. For illustration, Remondino and Correndo (2006) presented a case study in which a biological phenomenon involving some species of cicadas was analyzed by performing multiple simulation runs and aggregating the results. In another work, Arroyo et al. (2010) proposed a methodological approach involving a data mining step to validate and improve the results of an agent-based model. They presented a case study in which cluster analysis was applied to validate simulation results of the ‘MENTAT’ model, whose aim was to study the factors influencing the evolution of Spanish society from 1998 to 2000. The clustering results were found to be consistent with the survey data that had initially been used to construct the model.
Edmonds et al. (2014) used clustering and classification techniques to explore the parameter space of a voter behavior model. The goal of this study was to understand the social factors influencing voter turnout. The authors used machine learning algorithms such as K-means clustering, hierarchical clustering, and decision trees to evaluate data generated from the simulations. More recently, Broeke et al. (2016) used sensitivity analysis to study the behavior of agent-based models. The authors applied OFAT (‘One Factor at a Time’), global, and regression-based sensitivity analysis to an agent-based model in which agents harvest a diffusing renewable resource. These methods were used to evaluate the model’s robustness and outcome uncertainty, and to understand the emergence of patterns in the model.
The references cited above are by no means exhaustive but provide some interesting examples of the use of data mining techniques in analyzing agent-based models. In the next section, we give an overview of some of the EDA, sensitivity analysis (SA) and DM techniques used in this paper. The section “Illustration: implementing the proposed schema on the ‘DITCH’ agent-based model” further discusses the EDA, SA and DM techniques vis-à-vis the analysis of simulated outputs of an agent-based model.
A proposed schema combining exploratory, sensitivity analysis and data mining techniques
We propose a schematic approach as a step towards combining different analysis techniques that are typically used in the analysis of agent-based models. We present a methodological approach that uses exploratory, statistical and data mining techniques to analyze the relationships between the input parameters and the outputs of an agent-based model. Applying the appropriate technique (or set of techniques) to analyze a model’s behavior and parameter sensitivity is key to validating an agent-based model and to using it to predict real-world phenomena. In the section “Illustration: implementing the proposed schema on the ‘DITCH’ agent-based model”, we demonstrate the application of various exploratory data analysis, sensitivity analysis, and data mining techniques to understand the impact of various input parameters on the model output.
Next, we present an overview of some of the techniques that may be applied for each step in the schema, as shown in Fig. 1.
Exploratory data analysis
Exploratory data analysis (EDA) is typically visual. EDA techniques help highlight important characteristics of a given dataset (Tukey 1977). Choosing EDA as the starting point of our proposed schema provides a simple yet effective way to analyze the relationship between a model’s input parameters and its outputs. Graphical EDA techniques such as box and whisker plots, scatter plots, and heat maps (Seltman 2012) are often used to report data generated by (agent-based) simulations. Heat maps are often good visual indicators of patterns in the simulated output as parameter values change, whereas scatter plots are often good at highlighting the association between two independent variables (model parameters) for a particular dependent variable (model output). Box and whisker plots, on the other hand, summarize a data distribution by showing the median, the interquartile range, skewness, and the presence of outliers (if any). Other techniques, such as histograms and violin plots, describe the full distribution of an output variable for a given input parameter configuration(s) and are more descriptive than box and whisker plots (Lee et al. 2015).
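To make concrete what a single box in a box and whisker plot encodes, the following Python sketch computes the statistics behind it. It assumes the common Tukey convention (also used by ggplot2) that whiskers extend to the most extreme observations within 1.5 × IQR of the quartiles, with quartiles interpolated linearly via `statistics.quantiles(..., method="inclusive")`, the same interpolation as R’s default quantile type. The function name `box_summary` and the sample values are hypothetical.

```python
import statistics


def box_summary(values, coef=1.5):
    """Statistics drawn by a box and whisker plot for one group of output values.

    Assumes Tukey's convention: whiskers reach the most extreme points lying
    within `coef` * IQR of the quartiles; anything beyond is an outlier.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - coef * iqr, q3 + coef * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical end-of-run outputs (% cross-ethnic marriages) for one configuration.
summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary["median"], summary["iqr"], summary["outliers"])  # 5.5 4.5 [100]
```

Calling `box_summary` once per parameter configuration yields the numbers that a grouped box plot draws side by side.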
In this paper, we used the ggplot2 package in R to generate heat maps and box and whisker plots of output variables against the most influential of the varied parameters. The results, shown in the section “Illustration: implementing the proposed schema on the ‘DITCH’ agent-based model”, highlight tipping points in the heat maps where the percentage of the dependent variable changes significantly. To explore the variation in output across the varied parameters, we drew box plots for different parameter configurations. The resulting box plots can then be used to identify the subsets of the data (i.e., parameter configurations) that contribute most to increasing the proportion of a target variable.
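The aggregation behind such a heat map, one mean output per pair of parameter values, can be sketched in Python as follows. The `largest_jump` helper is a crude, hypothetical way of flagging a candidate tipping point (the biggest change between neighboring cells); it is our illustrative assumption, not how tipping points were identified in this paper, and the sample rows are invented.

```python
from collections import defaultdict
from statistics import mean


def heatmap_grid(rows, row_param, col_param, output):
    """Mean of `output` for every (row_param, col_param) cell, as a heat map would show."""
    cells = defaultdict(list)
    for r in rows:
        cells[(r[row_param], r[col_param])].append(r[output])
    row_vals = sorted({key[0] for key in cells})
    col_vals = sorted({key[1] for key in cells})
    grid = [[mean(cells[(a, b)]) if (a, b) in cells else None for b in col_vals]
            for a in row_vals]
    return row_vals, col_vals, grid


def largest_jump(row_vals, col_vals, grid):
    """Largest absolute change between horizontally or vertically adjacent cells."""
    best = (0.0, None, None)
    for i in range(len(row_vals)):
        for j in range(len(col_vals)):
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < len(row_vals) and nj < len(col_vals):
                    x, y = grid[i][j], grid[ni][nj]
                    if x is not None and y is not None and abs(y - x) > best[0]:
                        best = (abs(y - x), (row_vals[i], col_vals[j]),
                                (row_vals[ni], col_vals[nj]))
    return best


# Hypothetical runs: two replications per (numagents, loveradar) cell.
base = {1: 2.0, 2: 3.0, 3: 10.0}
rows = [{"numagents": n, "loveradar": lr,
         "crossethnic": base[lr] + (0.5 if n == 2500 else 0.0)}
        for n in (1000, 2500) for lr in (1, 2, 3) for _ in range(2)]
row_vals, col_vals, grid = heatmap_grid(rows, "numagents", "loveradar", "crossethnic")
print(largest_jump(row_vals, col_vals, grid))  # (7.0, (1000, 2), (1000, 3))
```

The `grid` returned here is exactly what a plotting library would color; the jump search only automates a first look at where the colors change abruptly.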
Sensitivity analysis
The purpose of sensitivity analysis is to study how sensitive the output variables of our ABM are to its input parameters, thus providing more focused insight than exploratory analysis techniques. Several techniques may be used to perform sensitivity analysis. For the results reported in the section “Illustration: implementing the proposed schema on the ‘DITCH’ agent-based model”, for instance, we applied multiple sensitivity analysis techniques, such as the variable importance method, the recursive feature elimination method, and PRCC (partial rank correlation coefficient).
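PRCC is commonly computed by rank-transforming the inputs and the output, removing (by linear regression) the part of each ranked input and of the ranked output that is explained by the other ranked inputs, and correlating the two sets of residuals (see, e.g., Blower and Dowlatabadi 1994). The dependency-free Python sketch below follows that standard recipe; it is not the code used for the results in this paper, and the toy inputs `a`, `b`, `c` are hypothetical.

```python
import random


def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residuals(y, predictors):
    """Residuals of an OLS fit of y on the predictors plus an intercept."""
    X = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    norm_u = sum((a - mu) ** 2 for a in u) ** 0.5
    norm_v = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (norm_u * norm_v)


def prcc(inputs, output):
    """Partial rank correlation of each input with the output, controlling for the others."""
    ranked = {name: average_ranks(vals) for name, vals in inputs.items()}
    ranked_output = average_ranks(output)
    result = {}
    for name, x in ranked.items():
        others = [r for other, r in ranked.items() if other != name]
        result[name] = pearson(residuals(x, others), residuals(ranked_output, others))
    return result


# Hypothetical full-factorial design: output rises with a, falls with b, ignores c.
rng = random.Random(42)
design = [(va, vb, vc) for va in (1, 2, 3) for vb in (1, 2, 3) for vc in (1, 2, 3)
          for _ in range(10)]
inputs = {name: [row[i] for row in design] for i, name in enumerate(("a", "b", "c"))}
output = [2 * va - vb + rng.gauss(0, 0.5) for va, vb, _ in design]
coefficients = prcc(inputs, output)
print({name: round(value, 2) for name, value in coefficients.items()})
```

The sign of each coefficient gives the direction of a parameter's monotone effect and its magnitude the strength, which is how a table of top-three PRCC values per output can be read.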
Following step 2 of the proposed schema (Fig. 1), we identify two useful methods that are used in the analysis in the section “Illustration: implementing the proposed schema on the ‘DITCH’ agent-based model”: variable importance and recursive feature elimination.
Variable importance
For a given output variable, each input variable (model parameter) can be ranked by its importance, estimated from a model fitted to the simulation data (the training dataset). Variable importance thus quantifies the contribution of each input variable (parameter) to a given output variable. In our case, the method relies on a linear model: we used the caret package in R, which fits a linear model of a dependent attribute on the input attributes and uses the absolute value of the t-statistic of each model coefficient as the importance of the corresponding input variable; the input variables are then ranked by this estimated importance.
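The linear-model ranking can be sketched without R: fit ordinary least squares of the output on all inputs and rank the inputs by the absolute t-statistics of their coefficients, which is what caret’s `varImp` uses for linear models (caret additionally rescales the scores to 0–100 by default). This Python sketch is a stand-in, not caret itself, and the variables `x1`, `x2`, `x3` are hypothetical.

```python
import random


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def linear_importance(inputs, y):
    """Rank inputs by |t| of their OLS coefficients (intercept fitted but not ranked)."""
    names = list(inputs)
    n = len(y)
    X = [[1.0] + [inputs[name][i] for name in names] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    rss = sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    sigma2 = rss / (n - k)  # residual variance estimate
    t = {name: beta[j + 1] / (sigma2 * inv[j + 1][j + 1]) ** 0.5
         for j, name in enumerate(names)}
    return sorted(((abs(v), name) for name, v in t.items()), reverse=True)


# Hypothetical data: y depends strongly on x1, weakly on x2 and not at all on x3.
rng = random.Random(1)
data = {name: [rng.gauss(0, 1) for _ in range(500)] for name in ("x1", "x2", "x3")}
y = [3 * a + 0.5 * b + rng.gauss(0, 1) for a, b in zip(data["x1"], data["x2"])]
ranking = linear_importance(data, y)
print([name for _, name in ranking])  # ['x1', 'x2', 'x3']
```

Using |t| rather than the raw coefficient makes the ranking insensitive to the units of each parameter, since each coefficient is divided by its own standard error.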
Recursive feature elimination
The recursive feature elimination (RFE) method, which we applied using the caret package in R, builds many models based on different subsets of attributes. Starting from the full set, it repeatedly fits a model, ranks the attributes by importance, and discards the least important ones; the predictive accuracy of the resulting subsets of different sizes is then compared, so that the smallest subset giving comparable accuracy can be identified.
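A simplified version of this procedure can be sketched in Python: fit a linear model on a training split, record its error on a held-out split, drop the attribute with the smallest |t|, and repeat until no attributes remain. This is an illustrative stand-in for caret’s `rfe` (which relies on resampling and user-chosen subset sizes), and the variables `x1` to `x4` are hypothetical.

```python
import random


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(X, y):
    """OLS coefficients and t-statistics; X rows already include a leading 1.0."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    rss = sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    sigma2 = rss / (n - k)
    return beta, [beta[j] / (sigma2 * inv[j][j]) ** 0.5 for j in range(k)]


def design_matrix(data, names, rows):
    """Rows of [1.0, x_name1, x_name2, ...] for the selected observations."""
    return [[1.0] + [data[name][i] for name in names] for i in rows]


def rfe(data, y, train, test):
    """Backward elimination; returns (attribute subset, held-out RMSE) for every size."""
    names = list(data)
    history = []
    while names:
        beta, t = ols(design_matrix(data, names, train), [y[i] for i in train])
        errors = [y[i] - sum(b * x for b, x in zip(beta, row))
                  for i, row in zip(test, design_matrix(data, names, test))]
        rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
        history.append((tuple(names), rmse))
        weakest = min(range(len(names)), key=lambda j: abs(t[j + 1]))  # t[0]: intercept
        names.pop(weakest)
    return history


# Hypothetical data: only x1 and x2 influence y.
rng = random.Random(7)
data = {name: [rng.gauss(0, 1) for _ in range(300)] for name in ("x1", "x2", "x3", "x4")}
y = [2 * a + b + rng.gauss(0, 0.5) for a, b in zip(data["x1"], data["x2"])]
history = rfe(data, y, train=range(200), test=range(200, 300))
for subset, rmse in history:
    print(len(subset), subset, round(rmse, 3))
```

Reading `history` from the full set downwards shows where the held-out error starts to rise; the smallest subset before that rise is the one RFE would favor.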
Using data mining to analyze ABM outputs
There is growing interest in the social simulation community in applying data mining techniques to analyze the multidimensional outputs generated by agent-based simulations across a vast parameter space. In this section, we present an overview of some of the common data mining techniques that have been used to analyze agent-based models’ outputs.
Classification and regression trees
A classification/regression tree is built by a supervised learning algorithm and provides a visual representation of the classification or regression of a dataset (Russell and Norvig 2009). It provides an effective way to generalize from, and predict output variables for, a given dataset. In such trees, internal nodes represent tests on the input attributes, edges represent the attributes’ values, and leaves represent output values. One way to construct such a decision tree is a divide-and-conquer approach: a test is performed on an attribute at a node, and the node is split on each of that attribute’s possible values. The process is repeated recursively on each branch, each time selecting a different attribute to split on, until there are no more attributes left to split on or a single output value is obtained.
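The divide-and-conquer construction described above can be sketched as an ID3-style learner that, at each node, splits on the attribute with the highest information gain. This is a simplified illustration, not the Weka implementation used in this paper (Weka’s J48 classifier, for example, implements C4.5 with pruning and numeric splits); the toy rows are hypothetical and are not DITCH output.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - remainder


def build_tree(rows, attributes, target):
    """Recursively split on the most informative attribute until leaves are pure."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority  # leaf node: a single output value
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {
        value: build_tree([r for r in rows if r[best] == value], remaining, target)
        for value in sorted({r[best] for r in rows})
    }
    return {"attribute": best, "branches": branches, "default": majority}


def predict(tree, row):
    """Follow the edges matching the row's attribute values down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attribute"]], tree["default"])
    return tree


# Hypothetical toy rule used to label a small full-factorial table.
def outcome(diversity, loveradar, newlinkchance):
    if diversity == "none" or loveradar == 1:
        return "low"
    return "high" if loveradar == 3 or newlinkchance == 0.75 else "medium"


rows = [{"diversity": d, "loveradar": lr, "newlinkchance": nl, "cross": outcome(d, lr, nl)}
        for d in ("none", "some") for lr in (1, 2, 3) for nl in (0.25, 0.75)]
tree = build_tree(rows, ["diversity", "loveradar", "newlinkchance"], "cross")
print(tree["attribute"])  # diversity
```

The attribute chosen at the root is the one that best separates the outputs on its own, which is why the root of a fitted tree is often read as the strongest determinant.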
K-means clustering
K-means clustering is one of the most widely implemented clustering algorithms and has been used to analyze agent-based models, e.g., by Edmonds et al. (2014). It is often used in situations where the input variables are quantitative, with the squared Euclidean distance as the dissimilarity measure for finding clusters in a given dataset (Friedman et al. 2009). The quality of a K-means clustering depends on the number of clusters, k, specified at initialization; moreover, depending on the choice of the initial centers, the clustering results can vary significantly.
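Lloyd’s algorithm, the simplest way of computing K-means, alternates between assigning each point to its nearest center (by squared Euclidean distance) and moving each center to the mean of its points. The Python sketch below adds k-means++ seeding and several restarts to reduce the sensitivity to initial centers noted above, and reports the between_SS/total_SS ratio that R’s `kmeans` prints. It is an illustrative assumption, not R’s implementation (which defaults to the Hartigan-Wong algorithm), and the three point clouds are synthetic.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: later centers are drawn with probability proportional to D^2."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(list(rng.choices(points, weights=d2)[0]))
    return centers


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with restarts; returns (labels, centers, within-cluster SS)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centers = kmeans_pp_init(points, k, rng)
        for _ in range(max_iter):
            labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
            new_centers = []
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                new_centers.append(centroid(members) if members else centers[c])
            if new_centers == centers:
                break  # assignments are stable
            centers = new_centers
        within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or within < best[2]:
            best = (labels, centers, within)
    return best


def between_over_total(points, within):
    """between_SS / total_SS: share of the total variation explained by the clusters."""
    grand = centroid(points)
    total = sum(sq_dist(p, grand) for p in points)
    return (total - within) / total


# Synthetic data: three well-separated clouds of 30 points each.
rng = random.Random(3)
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in ((0, 0), (10, 0), (0, 10)) for _ in range(30)]
labels, centers, within = kmeans(points, k=3)
print(round(between_over_total(points, within), 3))
```

Keeping the restart with the lowest within-cluster sum of squares is a simple guard against a poor local optimum from an unlucky initialization.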
Illustration: implementing the proposed schema on the ‘DITCH’ agent-based model
In this section, we present an overview of the ‘DITCH’ agentbased model followed by a description of the experimental setup through which the data was generated. We then report analysis of the generated output using the techniques introduced in the previous section.
An overview of the DITCH agent-based model (Meyer et al. 2014)
We used the DITCH (“Diversity and Inter-ethnic marriage: Trust, Culture and Homophily”) agent-based model by Meyer et al. (2014, 2016) for our analysis. Written in NetLogo,^{2} the model is evidence-driven and simulates inter-ethnic partnerships leading to the cross-ethnic marriages reported in different cities of the UK.
Agents in the DITCH model are characterized by traits that influence their preferences for choosing suitable partner(s) over the course of a simulation run. The model assumes heterosexual partnerships/marriages within and across different ethnicities.

Gender {Male, Female}: Agents choose partners of opposite gender.

Age {18–35}: Preference based on a range with (default) mean 1.3 years and (default) standard deviation of 6.34 years.

Ethnicity (w, x, y, z): Agents have a preference for selecting partners of their own ethnicity or a different ethnicity.

Compatibility (score: 0–1): Agents prefer partners with a compatibility score that is closer to their own.

Education (levels: 0–4): Agents are assigned different levels of education, which influences their partner selection.
Environment: Agents in the DITCH model are situated in a social space where they interact with each other and use their pre-existing social connections to search for potential partners. The choice of a potential partner depends upon an agent’s aforementioned traits as well as other model parameters which we will discuss later on. Once a partnership is formed, agents then date each other to determine if the partner’s education and ethnicity satisfy their requirements. They continue dating for a specified period, after which they reveal their compatibility scores to each other; if the scores are within their preferred range, they become marriage partners. Once a marriage link is formed, agents remain in the network without searching for any more potential partners. There is no divorce or breakup of marriages in the model. The model runs on a monthly scale, i.e., a time step/tick corresponds to 1 month in the model.

ethproportion: Proportions of different ethnicities in the agent population.

numagents: Total number of agents. The population remains constant during simulation.

loveradar (values: 1, 2, 3): The search range, measured as a ‘social distance’ in the agent’s network, within which an agent looks for a potential partner.

newlinkchance: Probability that two unconnected agents will form a new link during a simulation run.

meandating/sddating: Mean and standard deviation of an agent’s dating period (in years).

sdeducationpref: An agent’s tolerance for the difference in education level vis-à-vis its romantic partner.
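To make the idea of a search radius concrete, the following Python sketch collects the agents within a given number of network steps of an agent using breadth-first search and keeps those of the opposite gender. It is a hypothetical illustration based on our reading of loveradar as a hop distance, not the DITCH NetLogo code; the functions `within_social_distance` and `potential_partners` and the toy network are ours.

```python
from collections import deque


def within_social_distance(graph, ego, radius):
    """Agents reachable from `ego` in 1..radius steps, found by breadth-first search."""
    distance = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the search radius
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {n for n, d in distance.items() if d >= 1}


def potential_partners(graph, gender, ego, love_radar):
    """Hypothetical candidate filter: opposite gender within `love_radar` steps."""
    return {n for n in within_social_distance(graph, ego, love_radar)
            if gender[n] != gender[ego]}


# Toy chain network a - b - c - d - e.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
gender = {"a": "F", "b": "M", "c": "F", "d": "M", "e": "F"}
for radius in (1, 2, 3):
    print(radius, sorted(potential_partners(graph, gender, "a", radius)))
# 1 ['b']
# 2 ['b']
# 3 ['b', 'd']
```

Under this reading, each increment of the radius can enlarge the candidate pool considerably in a well-connected network, which is one plausible reason why outputs respond nonlinearly to this parameter.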
Experimental setup
I. Newham, London (Superdiverse^{3}): White: British (WB): 49.8%; Asian/Asian British: Indian (A/ABI): 17.9%; Asian/Asian British: Bangladeshi (A/ABB): 13.0%; Black/Black British: African (B/BBA): 19.3%.
II. Birmingham, W. Midlands (Cosmopolitan): WB: 75.53%; Asian/Asian British: Pakistani (A/ABP): 12.26%; A/ABI: 6.57%; Black/Black British: Caribbean (B/BBC): 5.64%.
III. Bradford, W. Yorkshire (Bifurcated): WB: 80.3%; White: Other (WO): 1.6%; A/ABI: 2.8%; A/ABP: 15.3%.
IV. Dover, Kent: WB: 98.17%; WO: 1.83%.
We conducted experiments using the BehaviorSpace tool in NetLogo, which allows a model’s parameter space to be explored. The approach we used is also called “parameter tuning by repeated execution” (Remondino and Correndo 2006), i.e., varying one input parameter at a time while keeping the remaining parameters (such as updatethreshold and secondchanceinterval) unchanged.
The DITCH model generates several outputs, a complete description of which is reported by its developers in Meyer et al. (2014, 2016). In the analyses reported in this paper, we focused on one output variable as the primary output: crossethnic, the percentage of cross-ethnic marriages in the system. The values of this variable were taken at the end of each simulation run (120 time steps, i.e., 10 years) and averaged over 10 replications per parameter configuration.
Given our resource constraints, we performed the experiments in two phases. In the first phase, we examined the model’s sensitivity to scale (in terms of the number of agents) and to the extent to which agents search for potential partners in the network (i.e., loveradar). In the second phase, we explored the model parameters specific to expanding agents’ social networks and those related to agents’ compatibility with their potential partners.
Phase I: We first explored the model by varying two parameters, with 10 repetitions, for a total of 600 runs. All other parameters remained unchanged. Each simulation ran for 120 ticks (10 years).
Model parameters and their ranges of values explored in Phase I of the simulation experiments
Parameter  Values explored  Description 

numagents  1000, 2500, 5000, 7500, 10,000  The number of agents in the model 
loveradar  1, 2, 3  The diameter of an agent’s ego network through which potential partners are sought 
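The Phase I run count can be checked by enumerating the design. A minimal Python sketch, assuming (our inference, not stated explicitly in the text) that the two varied parameters were crossed with each other and with the four ethnic-composition scenarios (cities I to IV), with each configuration repeated 10 times:

```python
from itertools import product

NUM_AGENTS = [1000, 2500, 5000, 7500, 10_000]
LOVE_RADAR = [1, 2, 3]
# Assumption: the four ethnic-composition scenarios listed in the experimental setup.
SCENARIOS = ["Newham", "Birmingham", "Bradford", "Dover"]
REPETITIONS = 10

runs = [
    {"scenario": s, "numagents": n, "loveradar": r, "replication": rep}
    for s, n, r, rep in product(SCENARIOS, NUM_AGENTS, LOVE_RADAR, range(REPETITIONS))
]
configurations = {(run["scenario"], run["numagents"], run["loveradar"]) for run in runs}
print(len(configurations), len(runs))  # 60 600
```

Without the scenario factor the same grid would give only 150 runs, which is why we read the reported total of 600 as including the four scenarios.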
Simulation results and analyses
Here we present the results of the simulation experiments. For box plots and heat maps, we used R^{4} and its ggplot2 package. For regression/parameter importance analyses and for cluster analyses, we used R’s caret and cluster packages, respectively. For classification trees, we used the Weka 3 software.^{5}
Results from simulation experiments (Phase I)
Model parameters and their ranges of values explored in Phase II of the simulation experiments
Parameter  Values explored  Description 

loveradar  {1, 2, 3}  Represents the social distance with respect to an agent’s ego network through which potential partners are sought 
newlinkchance  {0.25, 0.5, 0.75}  The probability for an agent to form a new link at each time step (month) 
sdeducationpref  {1, 2, 3}  “Standard deviation of the normal distribution governing the agents’ preference for difference in education level (mean is always 0).”—Meyer et al. 
meandating  {1, 2, 3}  “Mean and standard deviation (in years) of the normal distribution governing the duration of the agents’ dating period.”—Meyer et al. 
sddating  {1, 2, 3} 
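Under the same assumption (all parameter values crossed with the four scenarios, 10 replications each), the Phase II design in the table above contains 3^5 = 243 parameter configurations per scenario. The Python sketch below enumerates it; its run total equals the number of instances in the four K-means clusters reported for Phase II (2431 + 2430 + 2429 + 2430 = 9720), which would be consistent with the clustering having been applied to individual runs.

```python
from itertools import product

PHASE_II = {
    "loveradar": [1, 2, 3],
    "newlinkchance": [0.25, 0.5, 0.75],
    "sdeducationpref": [1, 2, 3],
    "meandating": [1, 2, 3],
    "sddating": [1, 2, 3],
}
SCENARIOS = 4      # assumption: the four ethnic-composition scenarios
REPETITIONS = 10   # numagents is fixed at 3000 in this phase

configurations = list(product(*PHASE_II.values()))
total_runs = len(configurations) * SCENARIOS * REPETITIONS
cluster_sizes = [2431, 2430, 2429, 2430]  # instances per K-means cluster (Phase II)
print(len(configurations), total_runs, sum(cluster_sizes))  # 243 9720 9720
```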
Exploratory analysis of the Phase I simulations shows that the DITCH model is sensitive to the number of agents in the system. Because this effect dampens as the agent population increases further, we fixed the number of agents at 3000 for the Phase II simulation experiments. In the case of loveradar, the observed nonlinear relationship indicates that other model parameters, which were kept fixed in Phase I, also contribute to the output. Thus, a further exploration and deeper analysis of loveradar and four other model parameters is presented next.
Results from simulation experiments (Phase II)
In Phase II, we fixed the agent population at 3000 and ran simulations across different values of the five other model parameters, as described in the previous section. Here we demonstrate the use of several predictive and data mining techniques that may be useful for exploring and analyzing outputs generated by agent-based models.
As Fig. 5 shows, increasing the value of the loveradar parameter increases the average percentage of cross-ethnic marriages in the DITCH model. Increasing the chance of new link formation (newlinkchance) also contributes, albeit less strongly. The variation observed in the box and whisker plots also suggests a role for the other three parameters, which seem to matter when the values of loveradar and newlinkchance are increased (see the heat map in Figure S2 in Additional file 1: Appendix).
PRCC values of the top three contributing input parameters against each output variable
#  Output  Input variable 1  Input variable 2  Input variable 3  PRCC of input 1  PRCC of input 2  PRCC of input 3

1  Cross-ethnic marriages overall  loveradar  newlinkchance  sddating  0.17161477  −0.04572220  0.03247614
2  Number of agents per ethnicity  loveradar  newlinkchance  sddating  −0.00556323  0.00274108  0.00262957
3  Married agents per ethnicity  loveradar  newlinkchance  sddating  0.28927404  0.11014048  −0.47564036
4  Cross-ethnic marriages per ethnicity  loveradar  newlinkchance  sddating  0.26181551  −0.05297639  0.0114786
As the constructed tree shows (Fig. 6), ethnic diversity (or the lack thereof) in the agent population was the strongest determinant of cross-ethnic marriages. Once again, loveradar was found to be the second most important determinant, especially in situations where some ethnic diversity existed. When loveradar was set to 1 (i.e., only immediate neighbors in the social network were sought), it alone determined the percentage of cross-ethnic marriages; for higher values of loveradar (i.e., 2 and 3), however, the output was further influenced by newlinkchance and, in other instances, by the parameters related to agents’ dating in the simulation.
The silhouette analysis^{8} shown in Fig. 7 (right) also shows that the optimal value of k is around 3 or 4 in this case. The plot displays, for each instance, a measure of how similar it is to the other instances in its own cluster compared with the instances in the nearest other cluster, and thus provides a way to assess parameters such as the optimal number of clusters (Rousseeuw 1987). The results of this analysis suggest that the optimal number of clusters is around 4. Hence, we ran the K-means clustering algorithm on all thirteen outputs; the centroids of the four K-means clusters are given in Table 3. Partitioning the data into four clusters gives a good split across the parameters explored.
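For reference, the silhouette width of an instance i is s(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is its mean distance to the other members of its own cluster and b(i) its mean distance to the members of the nearest other cluster (Rousseeuw 1987); averaging s(i) over all instances gives a score that can be compared across values of k. The Python sketch below uses Euclidean distances and invented points; it is an illustration of the definition, not the implementation behind Fig. 7.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette width over all points (Euclidean distance, k >= 2)."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            others = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if others:
                mean_dist[c] = sum(math.dist(p, q) for q in others) / len(others)
        own = labels[i]
        if own not in mean_dist:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Invented data: three tight groups of four points each.
centers = [(0, 0), (10, 0), (0, 10)]
offsets = [(0.5, 0), (-0.5, 0), (0, 0.5), (0, -0.5)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
three = [0] * 4 + [1] * 4 + [2] * 4   # the "true" grouping
two = [0] * 4 + [1] * 8               # merges the second and third groups
s3, s2 = mean_silhouette(points, three), mean_silhouette(points, two)
print(round(s3, 2), round(s2, 2))
```

Computing this score for the clusterings obtained with k = 2, 3, 4, ... and picking the largest value is the usual way such a plot is turned into a choice of k.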
Centroids of the four K-means clusters for all thirteen output variables in the DITCH model, based on the data generated through simulations in Phase II (between_SS/total_SS = 87.1%)
Output variables in DITCH  Cluster 1 (2431 instances)  Cluster 2 (2430 instances)  Cluster 3 (2429 instances)  Cluster 4 (2430 instances)

numturtlesw  2265.52  1494.93  2410  2945.4 
numturtlesx  367.94  536.23  47.77  54.58 
numturtlesy  196.92  389.81  84  0 
numturtlesz  169.6  579  458.19  0 
marriedturtlesw  1043.43  672  1113.87  1377 
marriedturtlesx  150.3  225.5  17.29  18.93 
marriedturtlesy  77.22  160.33  30.58  0 
marriedturtlesz  65.88  245.82  189  0 
outpercentw (%)  6.63  13.13  5  0.6 
outpercentx (%)  29.19  26.73  57.37  47.54 
outpercenty (%)  38.68  29.8  50.95  0 
outpercentz (%)  42.45  25.28  24.68  0 
crossethnic (%)  12.75  19.78  9.49  1.25 
Within-cluster sum of squares  3278.95  3981.93  5186.85  3899.76
Conclusions and outlook
As agent-based models of social phenomena become more complex, with many model parameters and endogenous processes, exploring and analyzing the generated data becomes even more difficult. We need a whole suite of analyses to examine the data that such agent-based models generate, incorporating traditional or dynamic social network analysis, spatiotemporal analysis, and machine learning, including more recent approaches such as deep learning algorithms. A growing number of social simulation researchers are employing different data mining and machine learning techniques to explore agent-based simulations.
The techniques discussed in this paper are by no means exhaustive, and the exploration of useful analysis techniques for complex agent-based simulations is an active area of research. Lee et al. (2015), for example, examined multiple approaches to understanding ABM outputs, including both statistical and visualization techniques. The authors proposed methods to determine a minimum sample size, followed by an exploration of model parameters using sensitivity analysis. Finally, the authors focused on transient dynamics, using spatiotemporal methods to gain insights into how the model evolves over time.
In this paper, we propose a simple step-by-step approach that combines three different kinds of analysis techniques. For illustration, we selected an existing evidence-driven agent-based model by Meyer et al. (2014, 2016), called the ‘DITCH’ model. As a starting point, we recommend the use of exploratory data analysis (EDA) techniques for analyzing agent-based models. EDA provides a simple yet effective set of techniques for analyzing the relationship between a model’s input and output variables. These techniques are useful for spotting patterns and trends in a model’s output across varying input parameter(s) and for gaining insight into the distribution of the generated data. Sensitivity analysis (SA) techniques follow the exploratory stage and are useful, e.g., for ranking input parameters in terms of their contribution towards a particular model output. SA techniques not only identify those parameters but also quantify the variability of the effect these input parameters may have on different model output variables. The application of data mining (DM) techniques to analyze agent-based social simulations is relatively new. While traditional techniques such as EDA or SA (or other statistical techniques) are useful, they may fail to fully capture the complex, multidimensional output that may result from agent-based simulations. DM can be useful in providing a better, more holistic understanding of the role of parameters and processes in generating such output.
Declarations
Authors’ contributions
HP, MA and SJA drafted the manuscript; SJA and MS designed the study; HP and MA generated the data; HP, MA, SJA and MS analyzed and interpreted the data. All authors read and approved the final manuscript.
Acknowledgements
We are thankful to the anonymous reviewers for their useful feedback and also to the reviewers of the Social Simulation 2017 conference where an earlier version of this paper was presented.
Consent for publication
Not applicable.
Ethical approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Funding
No funding received.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
 Arroyo J, Hassan S, Gutiérrez C, Pavón J (2010) Rethinking simulation: a methodological approach for the application of data mining in agent-based modelling. Comput Math Org Theory 16:416–435
 Blower SM, Dowlatabadi H (1994) Sensitivity and uncertainty analysis of complex models of disease transmission: an HIV model as an example. Int Stat Rev Revue 62:229–243
 Broeke GT, van Voorn G, Ligtenberg A (2016) Which sensitivity analysis method should I use for my agent-based model? J Artif Soc Soc Simul 19:1. http://jasss.soc.surrey.ac.uk/19/1/5.html
 Edmonds B, Little C, Lessard-Phillips L, Fieldhouse E (2014) Analysing a complex agent-based model using data-mining techniques. In: Social Simulation Conference 2014. http://ddd.uab.cat/record/125597
 Friedman J, Hastie T, Tibshirani R (2009) The elements of statistical learning, 2nd edn. Springer, New York
 Hall M, Witten I, Frank E (2011) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, Burlington
 Janert PK (2010) Data analysis with open source tools: a hands-on guide for programmers and data scientists. Newton, O’Reilly
 Brownlee J (2016) Master Machine Learning Algorithms: discover how they work and implement them from scratch. https://machinelearningmastery.com/mastermachinelearningalgorithms/. Accessed 1 Dec 2017
 Lee JS, Filatova T, Ligmann-Zielinska A, Hassani-Mahmooei B, Stonedahl F, Lorscheid I, Voinov A, Polhill JG, Sun Z, Parker DC (2015) The complexities of agent-based modeling output analysis. J Artif Soc Soc Simul 18:4. http://jasss.soc.surrey.ac.uk/18/4/4.html
 Meyer R, Lessard-Phillips L, Vasey H (2014) DITCH: a model of inter-ethnic partnership formation. In: Social simulation conference 2014. http://fawlty.uab.cat/SSC2014/ESSA/socialsimulation2014_037.pdf
 Meyer R, Lessard-Phillips L, Vasey H (2016) DITCH – a model of inter-ethnic partnership formation (version 2). CoMSES computational model library. https://www.openabm.org/model/4411/version/
 Marino S, Hogue IB, Ray CJ, Kirschner DE (2008) A methodology for performing global uncertainty and sensitivity analysis in systems biology. J Theor Biol 254(1):178–196
 Remondino M, Correndo G (2006) MABS validation through repeated execution and data mining analysis. Int J Simul 7:6
 Rousseeuw PJ (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20:53–65
 Russell S, Norvig P (2009) Artificial intelligence: a modern approach. Pearson, Upper Saddle River
 Seltman HJ (2012) Experimental design and analysis. Carnegie Mellon University, Pittsburgh, p 428
 Tukey JW (1977) Exploratory data analysis. Pearson, Upper Saddle River
 Villa-Vialaneix N, Sibertin-Blanc C, Roggero P (2014) Statistical exploratory analysis of agent-based simulations in a social context. Case Stud Bus Ind Govern Stat 5:132–149