Hoover Index, Theil Index and Shannon Entropy

The Hoover index is one of the simplest inequality indices; it measures the deviation from an ideal equal distribution. It can be interpreted as the maximum vertical deviation of the Lorenz curve from the 45 degree line.

The Theil index is an inequality measure related to the Shannon entropy. It is often used to measure economic inequality.

Like the Shannon entropy, the Theil index can be decomposed into two independent components, for example to describe inequality “within” and “in between” subgroups. A low Theil or Hoover index means low inequality; high values stand for a high deviation from an equal distribution.

With
Ei – Effect in group i, i = 1 to N
Et – Total sum of effects in all N groups
Ai – Number of items in group i
At – Total number of items in all N groups

Theil Index:

Eq. 1a     TT = ln (At/Et) – ∑[ Ei/Et ln (Ai/Ei)]
Eq. 1b     TL = ln (Et/At) – ∑[ Ai/At ln (Ei/Ai)]

Taking relative (proportional) variables
pi = Ei/Et
wi = Ai/At
we get

Eq. 2a      TT = ∑[ pi ln (pi/wi)]
Eq. 2b      TL = ∑[ wi ln (wi/pi)]

The symmetric Theil index Ts = ½ ( TT + TL) can be expressed as:

Eq. 3      Ts = ½ ∑[ (pi – wi) ln (pi/wi)]

Comparing the symmetric Theil index with the Hoover index

Eq. 4      Hv = ½ ∑ |pi – wi|

we see that for the symmetric Theil index the difference (pi – wi) is weighted with the logarithm of pi/wi.

The normalized Theil index ranges from 0 to 1:

Eq. 5     Tnorm = 1 – e^(–T)
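
The formulas in Eq. 2a to Eq. 5 can be sketched in a few lines of Python. This is only an illustration: the function name theil_hoover and the sample numbers are made up for this example, and the sketch assumes that all effects and item counts are positive.

```python
import math

def theil_hoover(effects, items):
    """Return TT, TL, Ts, the Hoover index and the normalized TT (Eq. 2a to 5)."""
    e_total = sum(effects)
    a_total = sum(items)
    p = [e / e_total for e in effects]  # pi = Ei/Et
    w = [a / a_total for a in items]    # wi = Ai/At
    # Assumption: every pi and wi is positive, otherwise the logarithm is undefined.
    tt = sum(pi * math.log(pi / wi) for pi, wi in zip(p, w))       # Eq. 2a
    tl = sum(wi * math.log(wi / pi) for pi, wi in zip(p, w))       # Eq. 2b
    ts = 0.5 * sum((pi - wi) * math.log(pi / wi) for pi, wi in zip(p, w))  # Eq. 3
    hoover = 0.5 * sum(abs(pi - wi) for pi, wi in zip(p, w))       # Eq. 4
    tt_norm = 1.0 - math.exp(-tt)                                  # Eq. 5
    return {"TT": tt, "TL": tl, "Ts": ts, "Hoover": hoover, "TTnorm": tt_norm}

# Hypothetical data: effects (e.g. income) in three groups and group sizes.
result = theil_hoover(effects=[10.0, 30.0, 60.0], items=[50, 30, 20])
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

For an equal distribution (pi = wi in every group) all returned values are zero.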

How does the Theil index relate to Shannon entropy?

For wi = 1/N (same number of items in all groups) we get with Shannon entropy
H = – ∑ pi ln pi and true diversity D = exp (H):

Eq. 6a      TT = ln (N) – H
Eq. 6b      TTnorm = 1 – D/N

and with
MLD = (1/N) ∑ ln (1/pi)
(MLD = mean logarithmic deviation)

Eq. 7      TL = MLD – ln (N)

For the symmetric Theil index:

Eq. 8     Ts = ½ (MLD – H)

The symmetric Theil index is simply half of the difference between mean log deviation and Shannon entropy.
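
The relations in Eq. 6a to Eq. 8 can be checked numerically. The following sketch uses a made-up distribution p with equal group sizes (wi = 1/N); the variable names are chosen for this example only.

```python
import math

p = [0.5, 0.3, 0.2]  # hypothetical shares pi, summing to 1
n = len(p)
w = [1.0 / n] * n    # same number of items in all groups

h = -sum(pi * math.log(pi) for pi in p)        # Shannon entropy H
d = math.exp(h)                                # true diversity D
mld = sum(math.log(1.0 / pi) for pi in p) / n  # mean logarithmic deviation

tt = sum(pi * math.log(pi / wi) for pi, wi in zip(p, w))
tl = sum(wi * math.log(wi / pi) for pi, wi in zip(p, w))
ts = 0.5 * (tt + tl)

print(math.isclose(tt, math.log(n) - h))        # Eq. 6a
print(math.isclose(1 - math.exp(-tt), 1 - d / n))  # Eq. 6b
print(math.isclose(tl, mld - math.log(n)))      # Eq. 7
print(math.isclose(ts, 0.5 * (mld - h)))        # Eq. 8
```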

Decomposition

The Theil index can be decomposed to find “within group” (w) and “between group” (b) components:

Eq. 9      T = Tw + Tb

For K subgroups (j = 1 to K) with individual Theil indices Tj

Eq. 10a   TT = ∑ sj TTj +  ∑ sj ln (sj/wj)
Eq. 10b   TL = ∑ wj TLj + ∑ wj ln (wj/sj)

sj is the share of E in group j (Ej/Etot); wj the relative number of items in subgroup j (Nj/Ntot). The first term in (10) gives the “within group” component, the second the “between group” component.
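
A small numerical check of Eq. 10a and 10b, as a sketch: it assumes that every item carries its own effect value and that the items are split into two hypothetical subgroups "A" and "B". The helper names theil_t and theil_l are chosen for this example.

```python
import math

def theil_t(values):
    """Theil-T over individual items (each item has weight 1/n)."""
    total, n = sum(values), len(values)
    return sum((v / total) * math.log((v / total) * n) for v in values)

def theil_l(values):
    """Theil-L over individual items (each item has weight 1/n)."""
    total, n = sum(values), len(values)
    return sum((1.0 / n) * math.log((1.0 / n) / (v / total)) for v in values)

# Hypothetical effects per item, split into two subgroups.
groups = {"A": [10.0, 20.0, 30.0], "B": [5.0, 5.0, 40.0, 50.0]}
all_values = [v for values in groups.values() for v in values]
e_tot, n_tot = sum(all_values), len(all_values)

within_t = between_t = within_l = between_l = 0.0
for values in groups.values():
    s = sum(values) / e_tot   # sj = Ej/Etot
    w = len(values) / n_tot   # wj = Nj/Ntot
    within_t += s * theil_t(values)
    between_t += s * math.log(s / w)
    within_l += w * theil_l(values)
    between_l += w * math.log(w / s)

print(math.isclose(theil_t(all_values), within_t + between_t))  # Eq. 10a
print(math.isclose(theil_l(all_values), within_l + between_l))  # Eq. 10b
```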


Diversity Calculator Excel – BPMSG

The diversity calculator is an Excel template that allows you to calculate alpha, beta and gamma diversity for a set of samples (input data), and to analyze similarities between the samples based on partitioning diversity into alpha and beta diversity.

The template works under Windows OS and Excel 2010 (xlsx extension). No macros or links to external workbooks are necessary. The workbook consists of an input worksheet for a set of data samples, a calculation worksheet, where all necessary calculations are done, and a result worksheet “beta” displaying the results.

Applications

The template may be used to partition data distributions into alpha and beta diversity. It can be applied in many areas, for example

  • Bio diversity – local (alpha) and regional (beta) diversity
  • AHP group consensus – identify sub-groups of decision makers with similar priorities
  • Marketing – cluster analysis of similarities in markets
  • Business diversification over time periods
  • and many more.

Let me know your application! If you just need to calculate a set of diversity indices, you can use my online diversity calculator.

Calculations and results

The following data are calculated and displayed:


  • Shannon Entropy H (natural logarithm) alpha-, beta- and gamma, and corresponding Hill numbers (true diversity of order one) for all samples
  • Homogeneity measure
    1. MacArthur homogeneity indicator M
    2. Relative homogeneity S
    3. AHP group consensus S* (for AHP priority distributions)


  • Table 1: Shannon alpha-entropy, Equitability, Simpson Dominance, Gini-Simpson index and Hill numbers for each data sample


  • Table 2: Top 24 pairs of most similar samples
  • Page 2: Matrix of pairs of data samples
  • Diagram 1: Gini-Simpson index and Shannon Equitability
  • Diagram 2: Average proportional distribution for all classes/categories
  • Diagram 3: Proportional distribution sorted from largest to smallest proportion (relative abundance)

Limitations:

  • Maximum number of classes/categories: 20
  • Maximum number of samples: 24

Description of the template:  BPMSG-Diversity-Calc-v14-09-08.pdf

Other posts explaining the concept of diversity

Downloads

PLEASE READ before DOWNLOAD
The template is free, but I appreciate any donation helping me to maintain the website. Thank you!

BPMSG Diversity Calculator Excel Template Version 2020-07-05 (zip)

The work is licensed under the Creative Commons Attribution-Noncommercial 3.0 Singapore License. For terms of use please see our user agreement and privacy policy.

As this version is the first release, please report any bugs or problems you might encounter.


Updated AHP Excel Template Version 08.02.13

An updated version of my AHP Excel template for multiple inputs is now available as version 08.02.13. Besides the extension from 8 to 10 criteria and from 7 to 20 participants, some new features have been added. In the past it was sometimes difficult for participants to achieve a low consistency ratio. Now inconsistent comparisons in the input sheet are highlighted if the required consistency level is exceeded. The level of consistency needed (“alpha” in the summary sheet) can also be changed from 0.1 (the standard rule of thumb from Saaty) to higher values, for example 0.15 or 0.2. In addition, another scale for the judgment can be chosen. In my projects I have had good experience with the balanced scale.

A new feature is the consensus index. If you have more than one participant and do the group aggregation (select participant “0”), the consensus index indicates how homogeneous the judgments within the group are. Zero percent means no consensus, with all participants putting their preference on different criteria; 100% means full consensus. Here are the changes in detail:

Summary sheet

  • Number of criteria increased from 8 to 10
  • Number of participants increased from 7 to 20
  • Different scales added:
  1. Linear standard scale
  2. Log
  3. Sqrt
  4. InvLin
  5. Balanced
  6. Power
  7. Geom.
  • Alpha – allows you to adjust the consistency threshold (0.1 default)
  • Consensus indicator for group aggregation added
  • Geometric Consistency Index GCI added

Input sheets

  • Consistency ratio is calculated on each input sheet.
  • Priorities are calculated and shown based on RGMM (row geometric mean method)
  • Top three inconsistent pairwise comparisons highlighted (if CR>alpha)
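
As an illustration of the two calculations named above, here is a minimal Python sketch of RGMM priorities and a consistency ratio. It is not the template's implementation: the function names are invented, the consistency ratio uses the textbook form CR = CI/RI with CI = (λmax – n)/(n – 1), λmax is estimated from the RGMM priorities, and the random index values are the commonly quoted Saaty figures. The template itself may compute CR differently.

```python
import math

# Commonly quoted random index (RI) values for n = 3 to 10 (assumption:
# the template may use a different reference or formula).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
                7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def rgmm_priorities(matrix):
    """Row geometric mean method: normalized geometric means of the rows."""
    n = len(matrix)
    gm = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]

def consistency_ratio(matrix):
    """CR = CI/RI, with lambda_max estimated from the RGMM priorities."""
    n = len(matrix)
    w = rgmm_priorities(matrix)
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]

# Hypothetical 3x3 pairwise comparison matrices.
consistent = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
inconsistent = [[1, 3, 1 / 5], [1 / 3, 1, 3], [5, 1 / 3, 1]]

print(rgmm_priorities(consistent))
print(consistency_ratio(consistent), consistency_ratio(inconsistent))
```

A perfectly consistent matrix gives a CR of (numerically) zero; the second matrix contains a circular preference and ends up far above the 0.1 threshold.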

Known Issues

Thanks to feedback from Rick: sometimes there seems to be a problem with the correct display of weights beside the criteria in the summary sheet. If you face this problem, unprotect the summary sheet. Select the weights (O18:O27). Click “conditional formatting”, “clear rules”, “clear rules from selected cells”. Then the values will be displayed correctly, and you can format them in the way you want. It is a strange effect; it only appears on one of my PCs, on the other it works fine. I uploaded a modified version, but I am not sure whether it works for everyone.

I appreciate any feedback! Please download the latest version from my AHP template download page.

AHP – High Consistency Ratio

Question: I know how AHP is working, but what I’m struggling with is, how to resolve the inconsistency (CR>0.1), when participants are done with their pairwise comparisons. It is time consuming if they go through the matrix and re-evaluate all their inputs. Do you have any suggestions?

Answer: Yes, CR is often a problem. My projects also show that, when making the pairwise comparisons, many participants end up with a CR higher than 0.1. Based on a sample of nearly 100 respondents in different AHP projects, the median value of CR is 16%, i.e. only half of the participants achieve a CR below 16% in my projects; the 80th percentile is 36%. There also seems to be a tendency of increasing CR with the number of criteria, i.e. the median value increases significantly for more than 5 criteria.

From my experience, CR > 0.1 is not critical per se. I get reasonable weights for a CR of 0.15 or even higher (up to 0.3), depending on the number of criteria. The acceptance of a higher CR also depends on the kind of project (the specific decision problem), the resulting priorities and the required accuracy (what is the actual impact on the result due to minor changes of criteria weights?).

In my latest AHP excel template and AHP online software AHP-OS the three most inconsistent judgments will be highlighted. The ideal judgment (resulting in lowest inconsistency) is shown. This will help participants to adjust their judgments on the scale to make the answers more consistent.

The first measure to keep inconsistencies low is to stick to the Magical Number Seven, Plus or Minus Two, i.e. keep the number of criteria in a range between 5 and 9 max. It has to do with the human limits on our capacity for processing information, originally published by George A. Miller in 1956 and taken up by Saaty and Ozdemir in a publication in 2003. Review your criteria selection, and try to cluster them in groups of 5 to 9 if you really need more.

Another possibility to improve consistency is to select the balanced-n scale instead of the standard AHP scale. In my sample, changing from the standard AHP scale to the balanced scale decreases the median CR from 16% to 6%. You can select different scales in my template.

Conclusion

  • Try to keep the number of criteria between 5 and 7; never use more than 9.
  • Ask decision makers to adjust their judgments in the direction of the most consistent input for the three highlighted most inconsistent comparisons. A slight adjustment of the intensity by 1 or 2 up or down can sometimes help.
  • Accept answers with CR > 10%, practically up to 20%, depending on the nature and objective of your project.
  • Do the eigenvector calculation with the balanced scale instead of the AHP scale, and compare the resulting priorities and consistency. This does not require redoing the pairwise comparisons.

References

George A. Miller, The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information, The Psychological Review, 1956, vol. 63, pp. 81-97

Saaty, T.L. and Ozdemir, M.S. Why the Magic Number Seven Plus or Minus Two, Mathematical and Computer Modelling, 2003, vol. 38, pp. 233-244

Goepel, K.D., Comparison of Judgment Scales of the Analytical Hierarchy Process - A New Approach, Preprint of an article submitted for consideration in International Journal of Information Technology and Decision Making © 2017 World Scientific Publishing Company http://www.worldscientific.com/worldscinet/ijitdm (2017)


Diversity as Business KPI – Alpha and Beta Diversity

Similarity analysis using beta diversity

The concept of diversity is well established in ecology, economics and information theory. The underlying mathematical theory relates to statistics (probabilities), multivariate analysis, cluster analysis etc. Diversity can be partitioned into two independent components: alpha and beta diversity. In the following, the concept of alpha and beta diversity is explained using a simple example of selling drinks in different sales areas. It helps to understand beta diversity as a measure of variation (similarity and overlap) between different samples of data distributions, and gives some practical applications in the field of business analysis.

Introduction

To understand the basic concept of diversity, you might watch my video here; it explains how diversity can be characterized using diversity indices – like the Simpson index – taking into account richness and evenness.

In general the concept of diversity can be formulated using the power mean. The Simpson index is based on the arithmetic mean; in the general concept of diversity it corresponds to a “true” diversity of order two.

Shannon Entropy

In the following we will use the Shannon diversity index H – in other applications also named Shannon entropy – which is based on the geometric mean, and the “true” diversity of order one. It uses the logarithm, and we will write it here with the natural logarithm

H = – ∑ pi ln pi.

For an equal distribution, where all types in the data set are equally common, the Shannon entropy has the value of the natural logarithm of the richness R: H = ln(R). The more unequal the proportional abundances, the smaller the Shannon entropy. For only one type in the data set, the Shannon entropy equals zero. Therefore a high Shannon entropy stands for high diversity, a low Shannon entropy for low diversity.

Let us go back to our example of selling different drinks in a restaurant.

With seven types of drinks – each selling with 1/7 or 14% – the Shannon entropy equals ln (7) = 1.95

Selling only one type of drink, the Shannon entropy takes a value of zero, the natural logarithm of 1.
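
Both values can be reproduced with a short Python sketch; the function name shannon_entropy is chosen for this example.

```python
import math

def shannon_entropy(proportions):
    """H = sum of pi * ln(1/pi), which equals -sum of pi * ln(pi)."""
    return sum(p * math.log(1.0 / p) for p in proportions if p > 0)

seven_drinks = [1 / 7] * 7   # each drink has a share of 1/7
one_drink = [1.0]            # only one type of drink is sold

print(round(shannon_entropy(seven_drinks), 2))  # 1.95, i.e. ln(7)
print(shannon_entropy(one_drink))               # 0.0, i.e. ln(1)
```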

Now let us assume we manage a couple of restaurants in different locations, and we get a monthly summary report of the total sales of the different types of drinks.

Comparison of samples

Suppose the summary report shows that, in total, all drinks sell equally. Does it mean we are selling all drinks evenly in all locations?

There are actually two possibilities.

1. The first one: yes, at each location we sell evenly all types of drinks.

High diversity – a Shannon entropy of 1.95 – in Boston, NY, Denver, Austin, etc., resulting in a high diversity of sales for the total sales area.


2. What is the second possibility?

In Boston we are selling coffee only: low diversity with a Shannon entropy of zero. Similarly in NY: here we are selling tea only, again low diversity with a Shannon entropy of zero, but with a different type of drink: tea instead of coffee! The same holds in Denver with milk, in Austin with coke, and so on.


Looking at our total sales – it looks the same as in the first case – the total diversity is high, as overall we are selling all drinks equally.

Partitioning Diversity – Introducing Alpha- and Beta-Diversity

The diversity in an individual location is called alpha diversity. Our total sales report, the consolidation of all sales locations, gives us the gamma diversity, and the difference, gamma minus alpha diversity, reflects the beta diversity.

Now I can also explain the reason, why we selected the Shannon entropy instead of the Simpson index: only for the Shannon entropy as a measure of diversity, the partitioning of the overall (gamma) diversity into two independent alpha and beta components follows the simple relation: Hα + Hβ = Hγ

Beta Diversity – How to interpret?

As we have seen in our simple example:

In case one we find a high alpha diversity in each location, resulting in the same high consolidated gamma diversity taking all locations together. So the difference between alpha and gamma, i.e. the beta diversity, is zero – we have the same sales distribution and a total overlap in all locations.

In case two we find a low alpha diversity in each location, but a high consolidated gamma diversity taking all locations together. In this case the difference between alpha and gamma diversity, i.e. the beta diversity, is high: we have totally different sales distributions among the locations, selling only one type of drink in each location, and a different one every time. The distributions are totally different, without overlap.

Beta diversity is a measure for similarity and overlap between samples of distributions. Partitioning diversity in alpha and beta diversity allows us to gain insight in the variation of distributions – relative abundances – across samples.
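
The two cases can be put into numbers with a small Python sketch of the relation Hα + Hβ = Hγ. It assumes that all locations carry the same weight, so that alpha is the plain average of the location entropies and gamma is the entropy of the averaged distribution; the function names and the number of locations are chosen for this example.

```python
import math

def shannon_entropy(proportions):
    return sum(p * math.log(1.0 / p) for p in proportions if p > 0)

def partition_diversity(samples):
    """Return (H_alpha, H_beta, H_gamma) for equally weighted samples."""
    n = len(samples)
    h_alpha = sum(shannon_entropy(s) for s in samples) / n
    pooled = [sum(column) / n for column in zip(*samples)]  # consolidated sales
    h_gamma = shannon_entropy(pooled)
    return h_alpha, h_gamma - h_alpha, h_gamma

drinks = 7
# Case one: four locations, each selling all seven drinks evenly.
case_one = [[1 / drinks] * drinks for _ in range(4)]
# Case two: seven locations, each selling one drink only, a different one each.
case_two = [[1.0 if i == j else 0.0 for j in range(drinks)] for i in range(drinks)]

for name, samples in (("case one", case_one), ("case two", case_two)):
    h_alpha, h_beta, h_gamma = partition_diversity(samples)
    print(f"{name}: alpha={h_alpha:.2f} beta={h_beta:.2f} gamma={h_gamma:.2f}")
```

In both cases gamma is ln(7); in case one all of it is alpha diversity, in case two all of it is beta diversity.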

Diversity Calculation in Excel

Alpha, beta and gamma diversity can be calculated in a spreadsheet program. Read my post about my Excel template for diversity calculation.


Diversity Calculation in Excel – Diversity Indices and True Diversity

In my video “Diversity Index as Business KPI – The Concept of Diversity” I explain the mathematical concept of diversity introducing the Simpson Index λ and its complement (1-λ) as a measure of product diversification in markets.

Besides the Simpson Index there are many other indices used to describe diversity. I have developed a simple Diversity Excel template to calculate a couple of diversity indices for up to 20 categories. The following diversity indices are calculated:

  • Richness
  • Shannon entropy
  • Shannon equitability
  • Simpson dominance
  • Gini-Simpson Index
  • Berger-Parker Index
  • Hill numbers (“true diversity”) and Renyi entropy of order one to four
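
For readers who prefer code to a spreadsheet, the listed indices can be sketched in Python as follows. This is not the template's implementation; the function name and the sample counts are invented, and the standard textbook definitions are assumed (Hill number of order q as (∑ pi^q)^(1/(1–q)), with exp(H) for order one, and Renyi entropy as its logarithm).

```python
import math

def diversity_indices(counts):
    """Common diversity indices for a list of category counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    p = [c / total for c in counts]
    richness = len(p)
    shannon = sum(pi * math.log(1.0 / pi) for pi in p)
    simpson = sum(pi ** 2 for pi in p)  # Simpson dominance
    hill = {1: math.exp(shannon)}       # order one: exp(H)
    for q in (2, 3, 4):
        hill[q] = sum(pi ** q for pi in p) ** (1.0 / (1 - q))
    return {
        "richness": richness,
        "shannon": shannon,
        # Equitability is undefined for a single category.
        "equitability": shannon / math.log(richness) if richness > 1 else None,
        "simpson": simpson,
        "gini_simpson": 1.0 - simpson,
        "berger_parker": max(p),
        "hill": hill,
        "renyi": {q: math.log(d) for q, d in hill.items()},
    }

# Hypothetical sales counts for five products.
print(diversity_indices([50, 25, 15, 5, 5]))
```

For a perfectly even distribution over R categories, all Hill numbers equal R and the equitability is 1.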

For a quick calculation of diversity indices you might also use my online calculator

For calculation of Shannon entropy and its partitioning into independent alpha and beta components  see here.

Any feedback is welcome!


Welcome to BPMSG – Dec 2012

Dear Friends, dear Visitors,

Yesterday my website reached 10,000 visits since April 2012, when I implemented the Piwik web statistics. Over the last couple of months the daily number of visitors has been increasing, doubling within the last 3 months. On my YouTube channel http://www.youtube.com/bpmsg I am now slowly approaching 100,000 video views.

So first, a big thank you to all of you for showing interest in the topics of bpmsg.com, and especially to those of you who give me feedback, as I can learn and progress from it. For me it also means staying committed and keeping the content interesting and updated.

The topic with the highest interest is AHP, the analytic hierarchy process, and many of you have downloaded my AHP Excel template. Here I would really like even more feedback about your applications, just to get an idea of the other areas in which my template is used. Some of the applications reported to me are:

  • Asset management prioritisation
  • BPMSG AHP template as a teaching tool
  • Weights of textual elements that affect difficulty of a given text
  • Environmental quality
  • Threats to biodiversity
  • Green supply chain

In my last update of the template I improved the accuracy of the calculation significantly, so please always use the latest version, and revisit the site from time to time to get the latest update. Alternatively you might subscribe to the bpmsg newsfeed; the link is given in the footer of the page.

My latest topic “Diversity index as business KPI – the concept of diversity” also seems to be gaining some interest. My video on YouTube got more viewers in a short time than the previous video about operational and strategic business performance. For me it was interesting to apply the diversity concept to business performance, as I haven’t seen this before, while the mathematical concept to measure the diversity of species in a habitat (biodiversity) is quite well established. I am thinking of publishing a second video, showing more practical applications of the diversity concept in a business context.

After starting my YouTube channel in 2009, I gained more and more experience in making videos. You can clearly see the difference when comparing one of my older videos with the latest ones. Now my camcorder, a Canon XA10, is with me most of the time on my business trips or vacations. Therefore you will also find some video travel impressions on this web site under the topic “others”. My last trip was to the Philippines, showing the nice island of Bohol as well as Lake Taal, which contains the world’s largest lake on an island in a lake on an island.

Klaus Goepel,
Singapore, Dec 2012

BPMSG stands for Business Performance Management Singapore. As of now, it is a non-commercial website, and information is shared for educational purposes. Please see licensing conditions and terms of use. Please give credit or a link to my site, if you use parts in your website or blog.


Updated AHP Excel Template Version 11.12.12

Due to feedback from several users, I revised the implementation of the power method for the calculation of the Eigenvector and Eigenvalue to improve the accuracy of my AHP Excel template. The calculation sheet ‘8×8’ in the workbook was completely reworked. My tests show a significant increase in accuracy. As an example see my updated post AHP template – numerical accuracy.

By default the number of iterations is now set to 12. The check value in sheet ‘8×8’ cell B33 shows the sum of all matrix elements solving the Eigenvalue equation (A – I*λ) x = 0 with A the decision matrix, λ = estimated principal Eigenvalue and x = estimated Eigenvector. The ideal check value is zero. With the example numbers given in the template the result is 5E-08.
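
The idea of the power method and of such a check value can be sketched in Python. This is an illustration, not the workbook's calculation: the function name, the example matrix and the choice to sum the absolute elements of (A – I*λ) x are assumptions made for this sketch.

```python
def power_method(matrix, iterations=12):
    """Estimate the principal eigenvector x and eigenvalue lam of a positive matrix."""
    n = len(matrix)
    x = [1.0 / n] * n  # start vector, normalized to sum 1
    lam = float(n)
    for _ in range(iterations):
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = sum(y)              # since x sums to 1, sum(A x) estimates lambda
        x = [v / lam for v in y]  # renormalize to sum 1
    # Check value (assumption): sum of absolute elements of (A - I*lam) x.
    residual = [sum(matrix[i][j] * x[j] for j in range(n)) - lam * x[i]
                for i in range(n)]
    check = sum(abs(r) for r in residual)
    return x, lam, check

# Hypothetical 3x3 pairwise comparison matrix.
a = [[1, 3, 5], [1 / 3, 1, 2], [1 / 5, 1 / 2, 1]]
x, lam, check = power_method(a)
print(x, lam, check)
```

The check value shrinks with every iteration; for a perfectly consistent matrix it is already zero after the first step.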

Please let me know, if  you find any problems in the new version.

For the download of the latest version please go to the AHP template download page.

Practical Experience with Canon XA-10


For a few months now I have been using the Canon XA-10 camcorder to take the videos on this website, and when traveling. The video about the diversity index as business KPI and the video of the trip to the Philippines, visiting Bohol Island and Lake Taal, were shot with the XA-10.

Overall I am quite o.k. with it, but a few details – like for example the tiny custom assignable buttons – are less satisfying. Now I am working on a review to share my experience. Stay tuned – and thanks for visiting.

Lake Taal – Philippines

Taal Lake is a freshwater lake in the province of Batangas, on the island of Luzon in the Philippines. The lake fills a large volcanic caldera formed by eruptions between 500,000 and 100,000 years ago. Volcano Island, in the middle of the lake, has its own crater lake. That crater lake is the world’s largest lake on an island in a lake on an island, and it in turn contains its own small island, Vulcan Point.

Enjoy Watching! If you like the video, leave a comment below.