Pearson standard normal table

Table 2 displays a significant dispersion between the expected performances (ExpProfit) for the reduced set of scenarios and the full space (no reduction), since the scenarios in the reduced set may be different for each cluster and strategy.

Such behaviour confirms the undesirable properties of the reduced set of scenarios at the extreme threshold points and stresses the need for a metric that balances the number of clusters against network density. In particular, larger threshold values produce more isolated scenario clusters, whereas small values yield a very densely connected network, which compromises the definition of cluster centroids.

Given that each variable has a correlation with every other variable, the values are repeated around the diagonal. Therefore, the values on one side of the diagonal can be omitted. Note that all the values are equal to 1 on the diagonal, because these are the correlations of the variables with themselves.

Using the Pearson correlation and three threshold values (0.91, 0.92 and 0.93), the adjacency matrices and the associated networks were constructed as described in section 2. Then, the Louvain algorithm was used to detect the communities within each network. Essentially, Louvain is a two-step algorithm that maximises the modularity metric: for a given network, the first step assigns nodes to clusters only if that increases the modularity value, whereas the second step creates a new network in which each node represents a cluster from the previous step. These two steps are iterated until no further modularity improvement is possible (Blondel et al., 2008). After applying the Louvain algorithm to this problem, a total of 9, 13 and 31 communities were identified for the three thresholds, respectively. Analysing the obtained networks (Figure 1), it is evident that the density decreases as the threshold value increases.
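The link between threshold and network density can be illustrated with a minimal sketch. It assumes that two scenarios are connected when their Pearson correlation is greater than or equal to the threshold; the source does not spell out this rule, and the scenario values, the function names and the example thresholds (0.90, 0.95, 0.99) are all hypothetical, chosen so that the toy data shows the effect. Community detection with Louvain is not implemented here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def adjacency_from_threshold(series, threshold):
    """Link two scenarios when their correlation reaches the threshold.

    The ">=" rule is an assumption made for this sketch.
    """
    n = len(series)
    adj = [[0] * n for _ in range(n)]
    # The correlation matrix is symmetric, so only pairs i < j are computed.
    for i, j in combinations(range(n), 2):
        if pearson(series[i], series[j]) >= threshold:
            adj[i][j] = adj[j][i] = 1
    return adj


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    edges = sum(adj[i][j] for i, j in combinations(range(n), 2))
    return edges / (n * (n - 1) / 2)


# Hypothetical scenario profiles (illustrative values only).
scenarios = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [1.0, 2.5, 2.5, 4.0],
    [4.0, 3.0, 2.0, 1.0],
]

# Illustrative thresholds, not the 0.91, 0.92 and 0.93 used in the study.
thresholds = (0.90, 0.95, 0.99)
densities = [density(adjacency_from_threshold(scenarios, t)) for t in thresholds]
```

On this toy data each higher threshold removes edges, so the density falls as the threshold rises, which is the pattern described for the networks in Figure 1.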

Correlations between two numerical variables

A correlation can be calculated between two numerical values (e.g., age and salary) or between two category values (e.g., type of product and profession). However, a company may also want to calculate correlations between variables of different types. One method to calculate the correlation of a numerical variable with a categorical one is to convert the numerical variable into categories. For example, age would be categorized into ranges (or buckets) such as 18 to 30, 31 to 40, and so on.

As well as the correlation, the covariance of two variables is often calculated. In contrast with the correlation value, which must be between −1 and 1, the covariance may assume any numerical value. The covariance indicates the degree of synchronization of the variance (or volatility) of the two variables.

Boslaugh, Sarah and Paul Andrew Watters. Statistics in a Nutshell: A Desktop Quick Reference, ch.

Table 6.2 shows correlations between four business variables taken from Table 6.1. The pairs of variables with the highest correlations are profession with income (US $), with a correlation of 0.85, and age with income (US $), with a correlation of 0.81. The lowest correlations are cell phone usage with income (0.25) and cell phone usage with profession (0.28). Hence the initial conclusion is that cell phone usage doesn’t have a high correlation with any other variable, so it could be considered for exclusion from the input variable set. Table 6.1 also shows that cell phone usage has a significantly lower reliability (0.3) than the other variables, and this could have repercussions on its correlation value with the remaining variables. Also, profession only has a high correlation with income; however, it will be seen that this correlation pair (income, profession) is important to the type of business.
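As a rough illustration of the difference between covariance and correlation, and of converting a numerical variable into categories, the following sketch uses hypothetical customer records. The function names, the data values and the final "41 and over" bucket are assumptions made for the example; the text only names the 18 to 30 and 31 to 40 ranges.

```python
from math import sqrt


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


def age_bucket(age):
    """Map a numerical age to a category label (bucket edges are assumptions)."""
    if age <= 30:
        return "18 to 30"
    if age <= 40:
        return "31 to 40"
    return "41 and over"


# Hypothetical customer records (illustrative values only).
ages = [22, 35, 41, 58, 29]
incomes = [21000, 38000, 45000, 72000, 30000]

cov = covariance(ages, incomes)    # depends on the units of both variables
corr = correlation(ages, incomes)  # always between -1 and 1
buckets = [age_bucket(a) for a in ages]

# Expressing income in thousands changes the covariance but not the correlation.
incomes_k = [i / 1000 for i in incomes]
cov_k = covariance(ages, incomes_k)
corr_k = correlation(ages, incomes_k)
```

Rescaling one variable changes the covariance by the same factor while the correlation stays the same, which is why the correlation is the easier of the two to compare across variable pairs.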