{"id":342,"date":"2023-08-13T03:35:14","date_gmt":"2023-08-13T02:35:14","guid":{"rendered":"https:\/\/datascihubs.com\/?p=342"},"modified":"2023-08-13T03:35:16","modified_gmt":"2023-08-13T02:35:16","slug":"decision-tree-a-complete-guide-with-example","status":"publish","type":"post","link":"https:\/\/datascihubs.com\/index.php\/2023\/08\/13\/decision-tree-a-complete-guide-with-example\/","title":{"rendered":"Decision Tree: A Complete Guide (with Example)"},"content":{"rendered":"\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1498\" height=\"992\" src=\"https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-6044629.jpeg?resize=1498%2C992&#038;ssl=1\" alt=\" \" class=\"wp-image-360\" srcset=\"https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-6044629.jpeg?w=1880&amp;ssl=1 1880w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-6044629.jpeg?resize=300%2C199&amp;ssl=1 300w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-6044629.jpeg?resize=1024%2C678&amp;ssl=1 1024w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-6044629.jpeg?resize=768%2C509&amp;ssl=1 768w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-6044629.jpeg?resize=1536%2C1017&amp;ssl=1 1536w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-6044629.jpeg?resize=1320%2C874&amp;ssl=1 1320w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-6044629.jpeg?resize=600%2C397&amp;ssl=1 600w\" sizes=\"auto, (max-width: 1498px) 100vw, 1498px\" \/><figcaption class=\"wp-element-caption\">Photo by Skylar Kang on <a href=\"https:\/\/www.pexels.com\/photo\/roots-of-plant-with-thin-twigs-6044629\/\" rel=\"nofollow\">Pexels.com<\/a><\/figcaption><\/figure>\n\n\n\n<p 
class=\"has-text-align-justify\">The decision tree is a versatile and widely used machine learning algorithm for classification and regression tasks. It is a tree-like structure in which every possible choice splits into a different branch. Decision trees are easy to understand and to explain, which makes them valuable for data-driven decision making. In this article, we will explain the underlying logic and see how a decision tree makes its decisions. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Table of Contents<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>What is a Decision Tree<\/li>\n\n\n\n<li>The Structure of a Decision Tree<\/li>\n\n\n\n<li>Entropy<\/li>\n\n\n\n<li>Information Gain<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">What is a Decision Tree<\/h2>\n\n\n\n<p class=\"has-text-align-justify\">A decision tree is a supervised model that employs a tree-like structure to make predictions based on input data. It starts from a single point called the &#8216;root&#8217; and branches out for every possible outcome, with each branching action controlled by a conditional test on a feature. The technique serves both classification and regression tasks, offering easily understandable models, which is why decision trees find utility across many domains. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Structure of a Decision Tree<\/h2>\n\n\n\n<p class=\"has-text-align-justify\">A decision tree&#8217;s structure includes a root node, branches, internal nodes, and terminal nodes, creating a hierarchical, tree-shaped arrangement. 
Here is a table and diagram for a quick review: <\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/Decision-Tree-Structure-min-1-1024x576.png?resize=1024%2C576&#038;ssl=1\" alt=\"decision tree structure\" class=\"wp-image-372\" srcset=\"https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/Decision-Tree-Structure-min-1.png?resize=1024%2C576&amp;ssl=1 1024w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/Decision-Tree-Structure-min-1.png?resize=300%2C169&amp;ssl=1 300w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/Decision-Tree-Structure-min-1.png?resize=768%2C432&amp;ssl=1 768w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/Decision-Tree-Structure-min-1.png?resize=600%2C338&amp;ssl=1 600w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/Decision-Tree-Structure-min-1.png?w=1280&amp;ssl=1 1280w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-justify\"><strong>Root Node<\/strong>: The root node is the initial point of the decision tree. At this node, a feature is selected to make the first decision.<\/p>\n\n\n\n<p class=\"has-text-align-justify\"><strong>Internal Node<\/strong>: An internal node (also called a decision node) is a decision point that further splits the data based on specific feature conditions. <\/p>\n\n\n\n<p class=\"has-text-align-justify\"><strong>Terminal Node<\/strong>: A terminal node (also called a leaf node) represents one of the decision tree&#8217;s endpoints, which are the possible outcomes. 
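The three node types can be pictured with a tiny hand-written tree (an illustrative Python sketch, not from the original post; the "Income > 70000" test under Graduate is an assumed split, chosen only because it is consistent with the loan table below):

```python
# A tiny hand-written tree for the loan example: the root and internal nodes
# store the feature they test; terminal (leaf) nodes store an outcome.
# The "Income > 70000" split under Graduate is an illustrative assumption.
tree = {
    "feature": "Education Level",                # root node
    "High School": "Denied",                     # terminal node
    "Postgraduate": "Approved",                  # terminal node
    "Graduate": {                                # internal node
        "feature": "Income > 70000",
        "yes": "Approved",
        "no": "Denied",
    },
}

def predict(node, example):
    """Follow branches from the root until a terminal node is reached."""
    while isinstance(node, dict):
        node = node[example[node["feature"]]]
    return node

print(predict(tree, {"Education Level": "High School"}))  # Denied
print(predict(tree, {"Education Level": "Graduate", "Income > 70000": "no"}))  # Denied
```

A prediction is simply a walk from the root, through zero or more internal nodes, to a terminal node.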
<\/p>\n\n\n\n<p>Let&#8217;s say we have a table:<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes\"><table class=\"has-fixed-layout\"><thead><tr><th>Person<\/th><th>Education Level<\/th><th>Income<\/th><th>Marital Status<\/th><th>Loan Approval<\/th><\/tr><\/thead><tbody><tr><td>A<\/td><td>Graduate <\/td><td>60000<\/td><td>Married<\/td><td>Denied<\/td><\/tr><tr><td>B<\/td><td>High School <\/td><td>30000<\/td><td>Single<\/td><td>Denied<\/td><\/tr><tr><td>C<\/td><td>Graduate <\/td><td>75000<\/td><td>Single<\/td><td>Approved<\/td><\/tr><tr><td>D<\/td><td>Graduate<\/td><td>70000<\/td><td>Married<\/td><td>Denied<\/td><\/tr><tr><td>E<\/td><td>Graduate<\/td><td>45000<\/td><td>Single<\/td><td>Denied<\/td><\/tr><tr><td>F<\/td><td>Postgraduate <\/td><td>80000<\/td><td>Married<\/td><td>Approved<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"has-text-align-justify\">In this table, we have data about individuals&#8217; income, education level, and marital status. The target variable is &#8220;Loan Approval,&#8221; which indicates whether a loan application was approved or denied.<\/p>\n\n\n\n<p>This is how it looks in the decision tree:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/Decision-Tree-1.png?resize=1024%2C576\" alt=\"Decision Tree\" class=\"wp-image-368\" srcset=\"https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/Decision-Tree-1.png?resize=1024%2C576&amp;ssl=1 1024w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/Decision-Tree-1.png?resize=300%2C169&amp;ssl=1 300w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/Decision-Tree-1.png?resize=768%2C432&amp;ssl=1 768w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/Decision-Tree-1.png?resize=600%2C338&amp;ssl=1 600w, 
https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/Decision-Tree-1.png?w=1280&amp;ssl=1 1280w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"has-text-align-justify\">As you might notice, the decision tree automatically denies those with &#8220;High School&#8221; education and approves those with &#8220;Postgraduate&#8221; education. This is because all records\/rows with &#8220;High School&#8221; or &#8220;Postgraduate&#8221; are consistently assigned to the same outcome, so there is no further splitting along those branches of the decision tree. The decision tree recognizes that no additional information can be gained from that feature value, since it uniformly leads to a single outcome. <\/p>\n\n\n\n<p>But here comes a question:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>How does a decision tree decide which feature becomes the root node? <\/li>\n<\/ol>\n\n\n\n<p>To answer it, we need to know two important metrics: <strong>Entropy <\/strong>and <strong>Information Gain<\/strong>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Entropy<\/h2>\n\n\n\n<p class=\"has-text-align-justify\">Entropy is a metric that represents the uncertainty in a subset after splitting. In other words, <strong>entropy evaluates the effectiveness of a split in a decision tree.<\/strong> For a two-class problem it returns a number between 0 and 1; the lower the value, the less the uncertainty. 
Here is the formula to calculate the entropy:<\/p>\n\n\n\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\">\n<div class=\"wp-block-columns are-vertically-aligned-center is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-28f84493 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<p class=\"has-text-align-center\">\\(E(X) = -\\sum_{i=1}^{n} p_i \\log_2(p_i)\\) <\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\\(n\\) = number of outcomes<\/li>\n\n\n\n<li>\\(p_i\\) = probability of outcome \\(i\\)<\/li>\n<\/ul>\n\n\n\n<p>Consider the loan approval diagram above: splitting by education level produces 3 subsets. We want to know how many &#8220;Approved&#8221; and &#8220;Denied&#8221; records are in each subset. Here is the overview:<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes\"><table class=\"has-fixed-layout\"><thead><tr><th>Subset<\/th><th>Approved<\/th><th>Denied<\/th><\/tr><\/thead><tbody><tr><td>High School<\/td><td>0<\/td><td>1<\/td><\/tr><tr><td>Graduate<\/td><td>1<\/td><td>3<\/td><\/tr><tr><td>Postgraduate<\/td><td>1<\/td><td>0<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Let&#8217;s calculate the entropy of these subsets. 
Since there are two outcomes, the entropy formula becomes: <\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(E(X) = -\\sum_{i=1}^{2} p_i \\log_2(p_i) = -p_1 \\cdot \\log_2(p_1) - p_2 \\cdot \\log_2(p_2)\\)<\/p>\n\n\n\n<div class=\"wp-block-columns has-border-color is-layout-flex wp-container-core-columns-is-layout-b653aba5 wp-block-columns-is-layout-flex\" style=\"border-color:var(--theme-palette-color-4, #192a3d);border-radius:18px;padding-top:var(--wp--preset--spacing--50);padding-right:var(--wp--preset--spacing--50);padding-bottom:var(--wp--preset--spacing--50);padding-left:var(--wp--preset--spacing--50)\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center\">\\(E(\\text{High School}) = \\)<\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(-(\\frac{0}{1}) \\cdot \\log_2(\\frac{0}{1}) - (\\frac{1}{1}) \\cdot \\log_2(\\frac{1}{1})\\)<\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(= 0\\)<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center\">\\(E(\\text{Graduate}) = \\)<\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(-(\\frac{1}{4}) \\cdot \\log_2(\\frac{1}{4}) - (\\frac{3}{4}) \\cdot \\log_2(\\frac{3}{4})\\)<\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(= 0.8113\\)<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-text-align-center\">\\(E(\\text{Postgraduate}) = \\)<\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(-(\\frac{1}{1}) \\cdot \\log_2(\\frac{1}{1}) - (\\frac{0}{1}) \\cdot \\log_2(\\frac{0}{1})\\)<\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(= 0\\)<\/p>\n<\/div>\n<\/div>\n\n\n\n<p class=\"has-text-align-justify\">Notice how the entropy represents the uncertainty? Since the &#8220;High School&#8221; and &#8220;Postgraduate&#8221; subsets show only one outcome, there is no uncertainty, so the result is 0 (taking \\(0 \\cdot \\log_2(0) = 0\\) by convention). 
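These subset entropies can be reproduced in a few lines of Python (a minimal sketch; the `entropy` helper is illustrative and builds in the 0 · log2(0) = 0 convention by skipping empty classes):

```python
from math import log2

def entropy(counts):
    """Entropy of a subset given its class counts; empty classes are
    skipped, which implements the 0 * log2(0) = 0 convention."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c)

# (Approved, Denied) counts per education-level subset, as in the table above
for name, counts in [("High School", (0, 1)),
                     ("Graduate", (1, 3)),
                     ("Postgraduate", (1, 0))]:
    print(name, round(entropy(counts), 4))
# High School 0.0, Graduate 0.8113, Postgraduate 0.0
```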
On the other hand, &#8220;Graduate&#8221; contains a mix of both possible outcomes, so uncertainty exists. <\/p>\n\n\n\n<p class=\"has-text-align-justify\">Remember that the objective of entropy is to evaluate the effectiveness of a decision point. We can do that by creating a weighted average of the entropy values above, weighting each subset by its share of the records. <\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(\\text{Weighted Average} = (\\frac{1}{6}\\cdot 0) + (\\frac{4}{6} \\cdot 0.8113) + (\\frac{1}{6} \\cdot 0) = 0.541\\)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Information Gain<\/h2>\n\n\n\n<p class=\"has-text-align-justify\">Information gain simply means the reduction of uncertainty. It is a metric that determines how much the entropy would be reduced if we selected a certain feature as the decision point. <\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(\\text{Information Gain} = E(S) - E(S|X)\\)<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>\\(E(S)\\) = the entropy of the entire dataset<\/li>\n\n\n\n<li>\\(E(S|X)\\) = the weighted-average entropy of the subsets after splitting on feature \\(X\\)<\/li>\n<\/ul>\n\n\n\n<p>The entire dataset contains 2 &#8220;Approved&#8221; and 4 &#8220;Denied&#8221; records, so \\(E(S) = -(\\frac{2}{6}) \\cdot \\log_2(\\frac{2}{6}) - (\\frac{4}{6}) \\cdot \\log_2(\\frac{4}{6}) = 0.918\\). So the information gain when we split by education level is:<\/p>\n\n\n\n<p class=\"has-text-align-center\">\\(\\text{Information Gain} = E(S) - E(S|X) = 0.918 - 0.541 = 0.377\\)<\/p>\n\n\n\n<p class=\"has-text-align-justify\">This means that splitting the dataset by education level reduces the entropy by 0.377. Therefore, to select the best feature for a decision node, the decision tree calculates the information gain for every feature and selects the one with the highest value. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary<\/h2>\n\n\n\n<p class=\"has-text-align-justify\">A decision tree is an algorithm frequently used for classification and regression problems. It is a tree-structured algorithm with three types of nodes: the root node, internal nodes, and terminal nodes. 
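The information-gain calculation just described can be sketched in Python and used to compare candidate features (an illustrative sketch of the loan table; the column names are shortened, and the numeric Income column is left out since it would require threshold splits):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, feature, target="Loan Approval"):
    """E(S) minus the size-weighted average entropy of the subsets
    produced by splitting on `feature`."""
    parent = entropy([r[target] for r in rows])
    weighted = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        weighted += len(subset) / len(rows) * entropy(subset)
    return parent - weighted

# The loan table, restricted to its two categorical features
rows = [
    {"Education": "Graduate",     "Marital": "Married", "Loan Approval": "Denied"},
    {"Education": "High School",  "Marital": "Single",  "Loan Approval": "Denied"},
    {"Education": "Graduate",     "Marital": "Single",  "Loan Approval": "Approved"},
    {"Education": "Graduate",     "Marital": "Married", "Loan Approval": "Denied"},
    {"Education": "Graduate",     "Marital": "Single",  "Loan Approval": "Denied"},
    {"Education": "Postgraduate", "Marital": "Married", "Loan Approval": "Approved"},
]
gains = {f: round(information_gain(rows, f), 3) for f in ("Education", "Marital")}
print(gains)                      # Education (0.377) clearly beats Marital
print(max(gains, key=gains.get))  # Education
```

Splitting on marital status gains nothing here (each subset keeps the same 1-to-2 mix as the whole dataset), so education level would be chosen as the root.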
To decide which feature to use as the root, a decision tree does the following:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Select a feature<\/li>\n\n\n\n<li>Split the data on the selected feature and calculate the entropy of each subset<\/li>\n\n\n\n<li>Calculate the information gain from the entropy in step 2<\/li>\n\n\n\n<li>Repeat steps 1&#8211;3 for the other features<\/li>\n\n\n\n<li>Select the feature with the highest information gain<\/li>\n<\/ol>\n\n\n\n<p class=\"has-text-align-justify\">Decision trees stand as an invaluable asset in the realm of data-driven decision-making. Their ability to unravel complexities, offer clear insights, and guide strategic choices makes them an essential tool for those seeking to navigate the ever-evolving landscape of information with confidence and precision.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A decision tree is a versatile and widely used machine learning algorithm for classification and regression tasks. It is a tree-like structure in which every possible choice splits into a different branch. Decision trees are easy to understand and explain, making them valuable for data-driven decision making. 
In this article, we will try to explain the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":522,"comment_status":"open","ping_status":"open","sticky":false,"template":"brizy-blank-template.php","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1],"tags":[49,20],"class_list":["post-342","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-information","tag-decision-tree","tag-machine-learning"],"blocksy_meta":[],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/Feature-Image-ML-min.png?fit=1280%2C720&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/posts\/342","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/comments?post=342"}],"version-history":[{"count":0,"href":"https:\/\/da
tascihubs.com\/index.php\/wp-json\/wp\/v2\/posts\/342\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/media\/522"}],"wp:attachment":[{"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/media?parent=342"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/categories?post=342"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/tags?post=342"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}