Shannon Entropy

Group Consensus Cluster Analysis

Since April 2022, a new AHP-OS feature, Group Consensus Cluster Analysis, is available. It can be reached from the AHP-OS main page.

The idea of the program is to cluster a group of decision makers into smaller subgroups with higher consensus. For each pair of decision makers the similarity of priorities is calculated, using Shannon alpha and beta entropy. The result is arranged in a similarity matrix and sorted into clusters of higher similarity based on a consensus threshold.

In order to use the program, you first need to load a priority json file, exported from the AHP-OS Group result menu, containing the priorities of all participants:

Group Result Menu – Export priorities using *Priorities (json).*

Once downloaded to your computer, you can import this file via the Group Consensus Menu:

Click on Browse… to select the file, then click Analyze. The result is structured in four parts:

Input data
Threshold table
Result for selected node
Similarity Matrix

Input Data

Project session code, selected node (default: pTot), number of categories, number of participants and scale are shown. pTot stands for the global priorities of a hierarchy.

Threshold Table

The program calculates the number of clusters and number of unclustered participants based on a similarity threshold in the range between 70% and 97.5% in steps of 2.5%. For each step the values are displayed in the threshold table.
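The threshold sweep can be illustrated with a small sketch. The text does not describe the clustering rule AHP-OS actually uses, so the greedy rule below (a member joins a cluster only if its similarity to every current member reaches the threshold) and the function names `greedy_clusters` and `threshold_table` are assumptions made for illustration only.

```python
def greedy_clusters(sim, threshold):
    """Group members so that every pair inside a cluster has similarity >= threshold.

    Hypothetical greedy rule for illustration, not the AHP-OS algorithm.
    Returns (clusters, unclustered) as lists of member indices.
    """
    unassigned = list(range(len(sim)))
    clusters, unclustered = [], []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed]
        for cand in list(unassigned):
            if all(sim[cand][m] >= threshold for m in members):
                members.append(cand)
                unassigned.remove(cand)
        if len(members) > 1:
            clusters.append(members)
        else:
            unclustered.append(seed)
    return clusters, unclustered


def threshold_table(sim):
    """Rows of (threshold, number of clusters, number of unclustered members)."""
    rows = []
    for k in range(12):  # thresholds 0.700, 0.725, ..., 0.975
        t = round(0.70 + 0.025 * k, 3)
        clusters, unclustered = greedy_clusters(sim, t)
        rows.append((t, len(clusters), len(unclustered)))
    return rows
```

Here `sim` is a symmetric matrix of pairwise similarities given as fractions between 0 and 1. A higher threshold typically splits the group into smaller clusters and leaves more members unclustered.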

The optimal threshold is determined automatically, in this case 0.85 with 2 clusters and no unclustered members. If you want a different result, for example 3 clusters, you can manually enter 0.9 as a new threshold in the AHP Group Consensus Menu.

Manual Threshold input field in the Group Consensus Menu

In the menu you also find a drop-down selection list for all nodes of the project. With Load new data another json file can be loaded.

Result for selected Node

First the AHP group consensus S* or relative homogeneity S for the whole group is shown, followed by the number of clusters. Next, for each cluster (subgroup), S* or S of the subgroup and the number of members in this cluster are displayed. Individual members are shown with a number and their name. The participant's number corresponds to the number displayed on the project result page (Project Participants), so it is easy to select or deselect participants by their number on the AHP-OS result page based on the result of the cluster analysis.

Similarity Matrix

The similarity matrix is a visualization of the clusters. Each cell (i,j) contains the AHP consensus S* or relative homogeneity S for the pair of decision makers i and j in percent. Darker green means higher values, as shown in the scale above the matrix. Clusters are always rectangles along the diagonal of the matrix, and are framed by borders.
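As a rough illustration of how one cell of such a matrix could be computed, the sketch below derives a pairwise similarity from Shannon alpha and beta entropy. It assumes the relative homogeneity S = (1/D_β – 1/K) / (1 – 1/K) with K = 2 and D_β = exp(H_β); the exact formulas used by AHP-OS, in particular for S*, are not given in this text, and the function names are hypothetical.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (natural logarithm) of a priority vector summing to 1."""
    return sum(-x * math.log(x) for x in p if x > 0)


def pair_similarity(p, q):
    """Relative homogeneity of two priority vectors, between 0 and 1 (assumed formula)."""
    h_alpha = 0.5 * (shannon_entropy(p) + shannon_entropy(q))  # mean within-person entropy
    pooled = [(a + b) / 2 for a, b in zip(p, q)]
    h_gamma = shannon_entropy(pooled)                          # entropy of the pooled priorities
    d_beta = math.exp(h_gamma - h_alpha)                       # between 1 and 2 for a pair
    return (1 / d_beta - 0.5) / (1 - 0.5)


def similarity_matrix(priorities):
    """Pairwise similarities for a list of priority vectors."""
    n = len(priorities)
    return [[pair_similarity(priorities[i], priorities[j]) for j in range(n)]
            for i in range(n)]
```

Identical priority vectors give a similarity of 1, and vectors without any overlap give 0. Multiply by 100 to obtain percent values as in the matrix.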

As you can see in the figure above, the program found two clusters, one with members 1, 3, 6, 7, 10, 11, 12 and one with members 2, 4, 5, 8, 9, and one unclustered member 13. In this example the group consensus without clustering is 52.4% (low), the consensus for subgroup 1 is 80.5% (high) and for subgroup 2 80.7% (high). This means that within the group there are two separate parties, each with higher internal agreement. You can easily go back to the project’s group result page to analyze the consolidated priorities for each subgroup by selecting the individual participants.

Once the number of participants exceeds 40, the similarity matrix is shown without values in order to better fit on the output page.

Example of the similarity matrix with 72 participants. You can clearly identify three clusters.

References

Goepel, K.D. (2022). Group Consensus Cluster Analysis using Shannon Alpha- and Beta Entropy. Submitted for publication. Preprint

Goepel, K.D. (2018). Implementation of an Online Software Tool for the Analytic Hierarchy Process (AHP-OS). International Journal of the Analytic Hierarchy Process, Vol. 10 Issue 3 2018, pp 469-487, https://doi.org/10.13033/ijahp.v10i3.590

AHP Group Consensus Indicator – how to understand and interpret?

BPMSG’s AHP Excel template and AHP online software AHP-OS can be used for group decision making by asking several participants to give their inputs to a project in the form of pairwise comparisons. Aggregation of individual judgments (AIJ) is done by calculating the geometric mean of the elements of all decision matrices, and then using this consolidated decision matrix to derive the group priorities.
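The aggregation step can be sketched in a few lines. The example below takes the element-wise geometric mean of the participants' pairwise comparison matrices and then derives priorities from the consolidated matrix by power iteration towards the principal eigenvector. The choice of power iteration and the function names are assumptions for illustration, not a description of the template or of AHP-OS internals.

```python
import math


def aggregate_judgments(matrices):
    """Element-wise geometric mean of pairwise comparison matrices (AIJ)."""
    k = len(matrices)
    n = len(matrices[0])
    return [[math.exp(sum(math.log(m[i][j]) for m in matrices) / k)
             for j in range(n)] for i in range(n)]


def priorities(matrix, iterations=100):
    """Priority vector by power iteration, normalized to sum 1."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w


# Two hypothetical participants comparing two criteria.
a1 = [[1, 2], [1 / 2, 1]]
a2 = [[1, 8], [1 / 8, 1]]
group = aggregate_judgments([a1, a2])   # consolidated judgment a_12 = sqrt(2 * 8) = 4
print(priorities(group))
```

For this example the consolidated judgment is 4 to 1, so the group priorities come out at about 0.8 and 0.2.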

AHP consensus indicator

In [1] I proposed an AHP group consensus indicator to quantify the consensus of the group, i.e. to have an estimate of the agreement between participants on the resulting priorities. This indicator ranges from 0% to 100%. Zero percent corresponds to no consensus at all, 100% to full consensus. This indicator is derived from the concept of diversity based on Shannon alpha and beta entropy, as described in [2]. It is a measure of homogeneity of priorities between the participants and can also be interpreted as a measure of overlap between priorities of the group members.


BPMSG Diversity Online Calculator

If you need a quick calculation of diversity indices from your sample data, you might use my online diversity calculator here. Select the number of categories/classes (2 to 20) and input your sample data (positive integer or decimal numbers). As a result the following parameters and diversity indices will be calculated:

Richness
Berger-Parker Index
Shannon Entropy (nat)
Shannon number equivalent (true diversity of order 1)
Shannon Equitability
Simpson Dominance
Simpson Dominance (finite sample size)
True diversity of order 2
Gini-Simpson Index
Gini-Simpson Equitability
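For readers who want to reproduce these values offline, here is a minimal sketch using the common textbook definitions of each index. The exact definitions used by the online calculator are not stated in this text, so treat the formulas, the function name and the dictionary keys as assumptions. The sketch expects at least two categories with positive values and a total larger than 1.

```python
import math


def diversity_indices(counts):
    """Common diversity indices from sample counts (zero categories are ignored)."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    p = [c / n for c in counts]
    richness = len(p)
    shannon = sum(-x * math.log(x) for x in p)
    simpson = sum(x * x for x in p)
    return {
        "richness": richness,
        "berger_parker": max(p),
        "shannon_entropy": shannon,                       # in nat
        "shannon_number_equivalent": math.exp(shannon),   # true diversity of order 1
        "shannon_equitability": shannon / math.log(richness),
        "simpson_dominance": simpson,
        "simpson_dominance_finite": sum(c * (c - 1) for c in counts) / (n * (n - 1)),
        "true_diversity_order_2": 1 / simpson,
        "gini_simpson": 1 - simpson,
        "gini_simpson_equitability": (1 - simpson) / (1 - 1 / richness),
    }


result = diversity_indices([10, 10, 10, 10])
```

For four equally common categories, both true diversities equal 4 and both equitability values equal 1.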

Welcome to BPMSG – May 2013

Concepts, Methods and Tools to manage Business Performance

Dear Friends, dear Visitors,

time for an update on my BPMSG welcome page! Having been quite busy over the last half year, I didn’t work much on major articles or videos, but at least I tried to keep my site current with some regular updates.

Related to the analytic hierarchy process (AHP), you might find information about the consistency ratio (CR). CR is one of the most critical issues in the practical application of AHP, as it seems to be difficult for many decision makers to fulfill Saaty’s “ten-percent rule of thumb”. The way out: either you accept higher ratios (up to 0.15 or even 0.2), modify the judgments in the pairwise comparisons, or use the balanced scale instead of the standard AHP 1 to 9 scale. All three can be done in my updated AHP template from February 2013.

As I received many requests to extend the number of participants to more than 10, here is the detailed procedure for doing it yourself. Extending the number of criteria beyond 10 is more complex, and I do not recommend it. If you actually have more than 10 criteria, please try to group them in sub-groups. At the moment I don’t have any plans to extend the number of criteria to more than ten.

I also started a new topic: Diversity. Triggered by some business related questions, I found out that the concept of diversity – as applied in ecology – is very universal, and can be applied in many business areas. You can watch my introduction as video:

Diversity as Business KPI – The Concept of Diversity, and
Diversity as Business KPI – Alpha and Beta Diversity.

I already applied the concept in several areas, and even developed a new consensus indicator for group decision making based on the partitioning of the Shannon entropy. A paper has been submitted for the ISAHP conference in June, and after the event I will place a copy of the paper on my site for download.

For those of you interested in the topic of diversity and its partitioning into alpha (within group) and beta (in-between group) components, my free BPMSG Diversity Calculator could be a useful tool.

Now please enjoy your visit on the site and feel free to give me feedback –
it’s always appreciated.

Klaus D. Goepel,
Singapore, May 2013

Diversity as Business KPI – Alpha and Beta Diversity – Video

The video explains partitioning of Shannon diversity into two independent components: alpha (within group) and beta (in between groups) diversity. It helps to understand beta diversity as a measure of variation between different samples of data distributions. Some practical applications in the field of business analysis are shown.

Enjoy watching!

Hoover Index, Theil Index and Shannon Entropy

The Hoover index is one of the simplest inequality indices to measure the deviation from an ideal equal distribution. It can be interpreted as the maximum vertical deviation of the Lorenz curve from the 45 degree line.

The Theil index is an inequality measure related to the Shannon entropy. It is often used to measure economic inequality.

Like the Shannon entropy, the Theil index can be decomposed into two independent components, for example to describe inequality “within” and “in between” subgroups. A low Theil or Hoover index means low inequality; high values stand for a high deviation from an equal distribution.

With
E_i – Effect in group i, i = 1 to N
E_t – Total sum of effects in all N groups
A_i – Number of items in group i
A_t – Total number of items in all N groups

Theil Index:

Eq. 1a T_T = ln (A_t/E_t) – ∑[ E_i/E_t ln (A_i/E_i)]
Eq. 1b T_L = ln (E_t/A_t) – ∑[ A_i/A_t ln (E_i/A_i)]

Taking relative (proportional) variables
p_i = E_i/E_t
w_i = A_i/A_t we get

Eq. 2a T_T = ∑[ p_i ln (p_i/w_i)]
Eq. 2b T_L = ∑[ w_i ln (w_i/p_i)]

The symmetric Theil index T_s = ½ ( T_T + T_L) can be expressed as:

Eq. 3 T_s = ½ ∑[ (p_i –w_i) ln (p_i/w_i)]

Comparing the symmetric Theil index with the

Hoover index

Eq. 4 Hv = ½ ∑ |p_i – w_i|

we see that for the symmetric Theil index the difference (p_i – w_i) is weighted with the logarithm of p_i/w_i.

The normalized Theil index ranges from 0 to 1:

Eq. 5 T_norm = 1 – e^–T
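Equations 2 to 5 translate directly into code. The following sketch computes both Theil indices, the symmetric Theil index, the Hoover index and the normalized values from raw effects E_i and item counts A_i. The function name and the sample numbers are hypothetical, and all E_i and A_i are assumed to be positive.

```python
import math


def theil_hoover(effects, items):
    """Theil T, Theil L, symmetric Theil and Hoover index (Eq. 2a, 2b, 3, 4, 5)."""
    e_t, a_t = sum(effects), sum(items)
    p = [e / e_t for e in effects]   # p_i = E_i/E_t
    w = [a / a_t for a in items]     # w_i = A_i/A_t
    t_t = sum(pi * math.log(pi / wi) for pi, wi in zip(p, w))
    t_l = sum(wi * math.log(wi / pi) for pi, wi in zip(p, w))
    t_s = 0.5 * sum((pi - wi) * math.log(pi / wi) for pi, wi in zip(p, w))
    hoover = 0.5 * sum(abs(pi - wi) for pi, wi in zip(p, w))
    return {
        "theil_t": t_t,
        "theil_l": t_l,
        "theil_s": t_s,
        "hoover": hoover,
        "theil_t_norm": 1 - math.exp(-t_t),
        "theil_l_norm": 1 - math.exp(-t_l),
    }


# Two groups of equal size, the second holding three times the effect of the first.
result = theil_hoover(effects=[1, 3], items=[1, 1])
```

In this example p = (0.25, 0.75) and w = (0.5, 0.5), so the Hoover index is 0.25. When every group holds effects in proportion to its number of items, all indices are zero.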

How does the Theil index relate to Shannon entropy?

For w_i = 1/N (same number of items in all groups) we get with Shannon entropy
H = – ∑ p_i ln p_i and true diversity D = exp (H):

Eq. 6a T_T = ln (N) – H
Eq. 6b T_Tnorm = 1 – D/N

and with
MLD = (1/N) ∑ ln (1/p_i)
(MLD = mean logarithmic deviation)

Eq. 7 T_L = MLD – ln (N)

For the symmetric Theil index:

Eq. 8 T_s = ½ (MLD – H)

The symmetric Theil index is simply half of the difference between mean log deviation and Shannon entropy.
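A quick numerical check of Eq. 6a to Eq. 8 for the case w_i = 1/N, using hypothetical shares p_i:

```python
import math

p = [0.5, 0.3, 0.2]   # hypothetical shares p_i; equal group sizes, so w_i = 1/N
n = len(p)
w = 1 / n

h = sum(-x * math.log(x) for x in p)           # Shannon entropy H
mld = sum(math.log(1 / x) for x in p) / n      # MLD as defined above

t_t = sum(x * math.log(x / w) for x in p)      # Eq. 2a
t_l = sum(w * math.log(w / x) for x in p)      # Eq. 2b
t_s = 0.5 * (t_t + t_l)

assert math.isclose(t_t, math.log(n) - h)                      # Eq. 6a
assert math.isclose(1 - math.exp(-t_t), 1 - math.exp(h) / n)   # Eq. 6b
assert math.isclose(t_l, mld - math.log(n))                    # Eq. 7
assert math.isclose(t_s, 0.5 * (mld - h))                      # Eq. 8
```

All four assertions pass, because ln (N) cancels when T_T and T_L are added.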

Decomposition

The Theil index can be decomposed to find “within group” (w) and “between group” (b) components:

Eq. 9 T = T_w + T_b

For K subgroups (j = 1 to K), each with its individual Theil index T_j:

Eq. 10a T_T = ∑ s_j T_Tj + ∑ s_j ln (s_j/w_j)
Eq. 10b T_L = ∑ w_j T_Lj + ∑ w_j ln (w_j/s_j)

s_j is the share of E in subgroup j (E_j/E_t); w_j is the relative number of items in subgroup j (A_j/A_t). The first term in Eq. 10a and 10b gives the “within group” component, the second the “between group” component.
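The decomposition can be verified numerically. The sketch below assumes the simplest setting, in which every individual value counts as one item, so that w_j is the share of individuals in subgroup j. The function names and the sample values are hypothetical.

```python
import math


def theil_t(values):
    """Theil T of individual values, each counted as one item."""
    total, n = sum(values), len(values)
    return sum((v / total) * math.log((v / total) * n) for v in values)


def theil_l(values):
    """Theil L of individual values, each counted as one item."""
    total, n = sum(values), len(values)
    return sum((1 / n) * math.log((1 / n) / (v / total)) for v in values)


def decompose(groups):
    """Within and between components for Theil T and Theil L (Eq. 10a, 10b)."""
    all_values = [v for g in groups for v in g]
    e_t, a_t = sum(all_values), len(all_values)
    s = [sum(g) / e_t for g in groups]   # s_j: share of E in subgroup j
    w = [len(g) / a_t for g in groups]   # w_j: share of items in subgroup j
    t_within = sum(sj * theil_t(g) for sj, g in zip(s, groups))
    t_between = sum(sj * math.log(sj / wj) for sj, wj in zip(s, w))
    l_within = sum(wj * theil_l(g) for wj, g in zip(w, groups))
    l_between = sum(wj * math.log(wj / sj) for sj, wj in zip(s, w))
    return (t_within, t_between), (l_within, l_between)


groups = [[1, 2, 3], [10, 20]]
(t_w, t_b), (l_w, l_b) = decompose(groups)
```

Adding the within and between components reproduces the index calculated over all values at once, i.e. `theil_t([1, 2, 3, 10, 20])` and `theil_l([1, 2, 3, 10, 20])`.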

Diversity as Business KPI – Alpha and Beta Diversity

[Figure: Similarity analysis using beta diversity]

The concept of diversity is well established in ecology, economics and information theory. The underlying mathematical theory relates to statistics (probabilities), multivariate analysis, cluster analysis etc. Diversity can be partitioned into two independent components: alpha and beta diversity. In the following, the concept of alpha and beta diversity is explained using a simple example of selling drinks in different sales areas. It helps to understand beta diversity as a measure of variation (similarity and overlap) between different samples of data distributions, and gives some practical applications in the field of business analysis.

Introduction

To understand the basic concept of diversity, you might watch my video here; it explains how diversity can be characterized using diversity indices – like the Simpson index – taking into account richness and evenness.

In general the concept of diversity can be formulated using the power mean. The Simpson index is based on the arithmetic mean; in the general concept of diversity it corresponds to a “true” diversity of order two.

Shannon Entropy

In the following we will use the Shannon diversity index H, in other applications also named Shannon entropy, which is based on the geometric mean and corresponds to the “true” diversity of order one. It uses the logarithm, and we will write it here with the natural logarithm

H = – ∑ p_i ln p_i.

For an equal distribution, where all types in the data set are equally common, the Shannon entropy has the value of the natural logarithm of the richness: H = ln(R). The more unequal the proportional abundances, the smaller the Shannon entropy. For only one type in the data set, Shannon entropy equals zero. Therefore high Shannon entropy stands for high diversity, low Shannon entropy for low diversity.

Let us go back to our example of selling different drinks in a restaurant.

With seven types of drinks, each with a sales share of 1/7 or about 14%, the Shannon entropy equals ln (7) = 1.95.

Selling only one type of drink, the Shannon entropy takes a value of zero, the natural logarithm of 1.
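Both values are easy to reproduce. A minimal sketch, with a hypothetical helper name:

```python
import math


def shannon_entropy(shares):
    """Shannon entropy H = -sum(p_i ln p_i); zero shares contribute nothing."""
    return sum(-p * math.log(p) for p in shares if p > 0)


seven_drinks = [1 / 7] * 7
one_drink = [1.0]

print(round(shannon_entropy(seven_drinks), 2))   # 1.95, i.e. ln(7)
print(shannon_entropy(one_drink) == 0)           # True
```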

Now let us assume we manage a couple of restaurants in different locations, and we get a monthly summary report of total sales of the different types of drinks, showing that overall all drinks sell equally.

Comparison of samples

Does it mean we are selling all drinks evenly in all locations?

There are actually two possibilities.

1. The first one: yes, at each location we sell evenly all types of drinks.

High diversity – a Shannon entropy of 1.95 – in Boston, NY, Denver, Austin, etc., resulting in a high diversity of sales for the total sales area.

low beta diversity

2. What is the second possibility?

In Boston we are selling coffee only: low diversity with a Shannon entropy of zero. It is similar in NY, where we are selling tea only: again low diversity with a Shannon entropy of zero, but with a different type of drink, tea instead of coffee. The same holds in Denver with milk, in Austin with coke, and so on.

high beta diversity

Looking at our total sales – it looks the same as in the first case – the total diversity is high, as overall we are selling all drinks equally.

Partitioning Diversity – Introducing Alpha- and Beta-Diversity

Diversity in the individual location is called alpha diversity. Our total sales report, the consolidation of all sales locations, gives us the gamma diversity, and the difference, gamma minus alpha diversity, reflects the beta diversity.

Now I can also explain why we selected the Shannon entropy instead of the Simpson index: only for the Shannon entropy as a measure of diversity does the partitioning of the overall (gamma) diversity into two independent alpha and beta components follow the simple relation: H_α + H_β = H_γ

Beta Diversity – How to interpret?

As we have seen in our simple example:

In case one we find a high alpha diversity in each location, resulting in the same high consolidated gamma diversity taking all locations together. So the difference between alpha and gamma, i.e. the beta diversity, is zero – we have the same sales distribution and a total overlap in all locations.

In case two we find a low alpha diversity in each location, but a high consolidated gamma diversity taking all locations together. In this case the difference between alpha and gamma diversity, i.e. the beta diversity, is high: each location sells only one type of drink, and a different one in each location, so the sales distributions are totally different and do not overlap.

Beta diversity is a measure of similarity and overlap between samples of distributions. Partitioning diversity into alpha and beta diversity allows us to gain insight into the variation of distributions (relative abundances) across samples.
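The two cases of the drinks example can be checked with a short calculation. The sketch assumes seven locations with equal total sales, so that every location gets the same weight. Alpha is then the mean of the location entropies and gamma the entropy of the pooled sales. The function names are hypothetical.

```python
import math


def shannon_entropy(shares):
    """Shannon entropy with natural logarithm; zero shares contribute nothing."""
    return sum(-p * math.log(p) for p in shares if p > 0)


def partition(samples):
    """Return (H_alpha, H_beta, H_gamma) for equally weighted samples."""
    k = len(samples)
    h_alpha = sum(shannon_entropy(s) for s in samples) / k
    pooled = [sum(column) / k for column in zip(*samples)]
    h_gamma = shannon_entropy(pooled)
    return h_alpha, h_gamma - h_alpha, h_gamma


# Case one: every location sells all seven drinks evenly.
case_one = [[1 / 7] * 7 for _ in range(7)]
# Case two: every location sells exactly one drink, a different one each.
case_two = [[1.0 if i == j else 0.0 for i in range(7)] for j in range(7)]

print(partition(case_one))   # alpha and gamma about ln(7), beta about 0
print(partition(case_two))   # alpha 0, beta and gamma about ln(7)
```

Both cases produce the same gamma entropy of about 1.95; only the split between alpha and beta differs.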

Diversity Calculation in Excel

Alpha, beta and gamma diversity can be calculated in a spreadsheet program. Read my post about my Excel template for diversity calculation.

Diversity Calculation in Excel – Diversity Indices and True Diversity

In my video “Diversity Index as Business KPI – The Concept of Diversity” I explain the mathematical concept of diversity introducing the Simpson Index λ and its complement (1-λ) as a measure of product diversification in markets.

Besides the Simpson Index there are many other indices used to describe diversity. I have developed a simple Diversity Excel template to calculate a couple of diversity indices for up to 20 categories. The following diversity indices are calculated:

Richness
Shannon entropy
Shannon equitability
Simpson dominance
Gini-Simpson Index
Berger-Parker Index
Hill numbers (“true diversity”) and Renyi entropy of order one to four
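Hill numbers and Renyi entropy follow from a single formula. The sketch below uses the standard definitions, D_q = (∑ p_i^q)^(1/(1–q)) for q ≠ 1 with the limit D_1 = exp(H), and Renyi entropy as ln (D_q). How the Excel template implements them is not shown in this text, and the function names are hypothetical.

```python
import math


def hill_number(p, q):
    """True diversity (Hill number) of order q for proportions p."""
    p = [x for x in p if x > 0]
    if q == 1:
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(sum(-x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1 / (1 - q))


def renyi_entropy(p, q):
    """Renyi entropy of order q, the logarithm of the Hill number."""
    return math.log(hill_number(p, q))


shares = [0.5, 0.25, 0.25]
for order in (1, 2, 3, 4):
    print(order, hill_number(shares, order), renyi_entropy(shares, order))
```

For an even distribution all orders give the richness. For uneven distributions the values decrease with increasing order, because higher orders put more weight on the dominant categories.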

For a quick calculation of diversity indices you might also use my online calculator.

For calculation of Shannon entropy and its partitioning into independent alpha and beta components see here.

Any feedback is welcome!