Information gain calculation for decision tree when choosing root node

I want to know whether my calculation is correct, because I got a different result from an online calculator.

Here is the dataset:

A;B;C;Class
y;y;y;group1
n;y;y;group2
y;n;n;group3
y;y;n;group3
n;y;n;group2
y;n;y;group1
n;n;n;group1

I am using ID3 to pick a root node. The difference appears when I calculate the information gain for attribute C as the root node. My calculations for attributes A and B agree with the online calculator.

E(S) = -3/7 log2(3/7) - 2/7 log2(2/7) - 2/7 log2(2/7) = 1.557 (rounded to 3 decimal places)
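
The entropy of the whole set can be cross-checked with a short script. This is only a minimal sketch: the class column is copied by hand from the dataset above, and the helper name `entropy` is an illustrative choice, not part of any library.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    # Counter only holds labels that occur, so log2(0) never comes up.
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


# Class column of the dataset, in row order.
classes = ["group1", "group2", "group3", "group3", "group2", "group1", "group1"]

print(round(entropy(classes), 3))  # 1.557
```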

When using attribute C as the root:

P(Y) = 4/7
P(N) = 3/7
P(Group1|Y) = 3/4
P(Group2|Y) = 1/4
P(Group3|Y) = 0/4
Entropy(Y) = -3/4 log2 3/4 - 1/4 log2 1/4 = 0.811
P(Group1|N) = 0/3
P(Group2|N) = 1/3
P(Group3|N) = 2/3
Entropy(N) = -1/3 log2 1/3 - 2/3 log2 2/3 = 0.918

G(C) = 1.557 - 4/7 x 0.811 - 3/7 x 0.918 = 0.700

However, when I use an online calculator to check my calculation, the gain G(C) is 0.306. I have double-checked my calculation and am not sure what's wrong. Could someone help point out the mistake?
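
As a cross-check, here is a minimal sketch that recomputes the gain of each attribute directly from the dataset above instead of from hand-tallied probabilities. The names `ROWS`, `entropy` and `information_gain` are illustrative assumptions, not taken from any library or from the online calculator. Note that grouping the rows by C this way puts 3 rows under C = y and 4 rows under C = n, which differs from the P(Y) = 4/7 and P(N) = 3/7 used in the hand calculation.

```python
from collections import Counter, defaultdict
from math import log2

# Dataset from the question: columns A, B, C, Class.
ROWS = [
    ("y", "y", "y", "group1"),
    ("n", "y", "y", "group2"),
    ("y", "n", "n", "group3"),
    ("y", "y", "n", "group3"),
    ("n", "y", "n", "group2"),
    ("y", "n", "y", "group1"),
    ("n", "n", "n", "group1"),
]
ATTRS = ["A", "B", "C"]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr_index):
    """Entropy of the full set minus the weighted entropy after splitting."""
    groups = defaultdict(list)
    for row in rows:
        # Collect the class label of every row under its attribute value.
        groups[row[attr_index]].append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder


for i, name in enumerate(ATTRS):
    print(name, round(information_gain(ROWS, i), 3))
```

For C this prints 0.306, the same value the online calculator reports.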

Thanks
