{"id":695,"date":"2023-08-23T05:11:35","date_gmt":"2023-08-22T21:11:35","guid":{"rendered":"https:\/\/datascihubs.com\/?p=695"},"modified":"2023-08-23T05:11:38","modified_gmt":"2023-08-22T21:11:38","slug":"how-to-handle-categorical-data-with-encoding-python","status":"publish","type":"post","link":"https:\/\/datascihubs.com\/index.php\/2023\/08\/23\/how-to-handle-categorical-data-with-encoding-python\/","title":{"rendered":"How to Handle Categorical Data with Encoding"},"content":{"rendered":"\n<figure class=\"wp-block-image aligncenter size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"1498\" height=\"998\" src=\"https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-325185.jpeg?resize=1498%2C998&#038;ssl=1\" alt=\"Handle Categorical Data\" class=\"wp-image-743\" srcset=\"https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-325185.jpeg?w=1880&amp;ssl=1 1880w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-325185.jpeg?resize=300%2C200&amp;ssl=1 300w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-325185.jpeg?resize=1024%2C682&amp;ssl=1 1024w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-325185.jpeg?resize=768%2C512&amp;ssl=1 768w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-325185.jpeg?resize=1536%2C1024&amp;ssl=1 1536w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-325185.jpeg?resize=1320%2C880&amp;ssl=1 1320w, https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/pexels-photo-325185.jpeg?resize=600%2C400&amp;ssl=1 600w\" sizes=\"auto, (max-width: 1498px) 100vw, 1498px\" \/><figcaption class=\"wp-element-caption\">Photo by Aleksandar Pasaric on <a href=\"https:\/\/www.pexels.com\/photo\/view-of-cityscape-325185\/\" rel=\"nofollow\">Pexels.com<\/a><\/figcaption><\/figure>\n\n\n\n<h3 
class=\"wp-block-heading\">Introduction<\/h3>\n\n\n\n<p class=\"has-text-align-justify\">When we train a model, computers can struggle with categorical data because they prefer numbers. This is where encoding methods come into play. Think of encoding as assigning a special code to each group, which helps computers work better with and analyze the data. These methods act as translators, turning words into numbers that computers can handle effortlessly. Through encoding, we connect human-readable labels with the numeric input that models expect, so categorical columns can be used in training instead of being dropped. In this article, we will discuss how to handle categorical data with encoding techniques.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Table of Contents<\/h4>\n\n\n\n<div class=\"wp-block-aioseo-table-of-contents\"><ul><li><a href=\"#aioseo-label-encoding\">Label Encoding<\/a><\/li><li><a href=\"#aioseo-one-hot-encoding\">One-Hot Encoding<\/a><\/li><\/ul><\/div>\n\n\n\n<p>We will use the following table as a running example.<\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes\"><table class=\"has-fixed-layout\"><thead><tr><th>Fruit<\/th><th>Color<\/th><\/tr><\/thead><tbody><tr><td>Orange<\/td><td>Orange<\/td><\/tr><tr><td>Apple<\/td><td>Red<\/td><\/tr><tr><td>Banana<\/td><td>Yellow<\/td><\/tr><tr><td>Strawberry<\/td><td>Red<\/td><\/tr><tr><td>Blueberry<\/td><td>Blue<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"aioseo-label-encoding\">Label Encoding<\/h3>\n\n\n\n<p class=\"has-text-align-justify\">Let&#8217;s start with the first encoding technique: Label Encoding. This method assigns an integer to each distinct category in a column of data. For the table above, scikit-learn&#8217;s LabelEncoder sorts the categories alphabetically before numbering them, so the colors are mapped as <strong>{Blue: 0, Orange: 1, Red: 2, Yellow: 3}<\/strong>. 
Turning the labels into numbers gives our model an input it can work with. Here is an example in Python.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd\nfrom sklearn.preprocessing import LabelEncoder\n\n# Creating the dataset\ndata = {'Fruit': &#91;'Orange', 'Apple', 'Banana', 'Strawberry', 'Blueberry'],\n        'Color': &#91;'Orange', 'Red', 'Yellow', 'Red', 'Blue']}\n\ndf = pd.DataFrame(data)\n\n# Using Label Encoding\n# LabelEncoder numbers the classes in sorted (alphabetical) order:\n# Blue=0, Orange=1, Red=2, Yellow=3\nlabel_encoder = LabelEncoder()\ndf&#91;'Color_LabelEncoded'] = label_encoder.fit_transform(df&#91;'Color'])\n\nprint(\"Label Encoding:\")\nprint(df&#91;&#91;'Color', 'Color_LabelEncoded']])<\/code><\/pre>\n\n\n\n<p><strong>Output<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes\"><table class=\"has-fixed-layout\"><thead><tr><th>Color<\/th><th>Color_LabelEncoded<\/th><\/tr><\/thead><tbody><tr><td>Orange<\/td><td>1<\/td><\/tr><tr><td>Red<\/td><td>2<\/td><\/tr><tr><td>Yellow<\/td><td>3<\/td><\/tr><tr><td>Red<\/td><td>2<\/td><\/tr><tr><td>Blue<\/td><td>0<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"aioseo-one-hot-encoding\">One-Hot Encoding<\/h3>\n\n\n\n<p class=\"has-text-align-justify\">One-Hot Encoding turns each distinct category of a categorical variable into its own binary (0 or 1) column. 
This technique creates a separate &#8220;flag&#8221; for each category: a row gets a 1 in the column for its own category and a 0 in all the others, so no category appears larger or smaller than another.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import pandas as pd\nfrom sklearn.preprocessing import OneHotEncoder\n\n# Creating the dataset\ndata = {'Fruit': &#91;'Orange', 'Apple', 'Banana', 'Strawberry', 'Blueberry'],\n        'Color': &#91;'Orange', 'Red', 'Yellow', 'Red', 'Blue']}\n\ndf = pd.DataFrame(data)\n\n# Using One-Hot Encoding\nonehot_encoder = OneHotEncoder()\nonehot_encoded = onehot_encoder.fit_transform(df&#91;'Color'].values.reshape(-1, 1)).toarray()\n\n# The output columns follow onehot_encoder.categories_ (sorted order),\n# so take the column names from there rather than from df&#91;'Color'].unique()\ndf_onehot = pd.DataFrame(onehot_encoded, columns=&#91;f'Color_{color}' for color in onehot_encoder.categories_&#91;0]])\n\nprint(\"One-Hot Encoding:\")\nprint(df_onehot)\n<\/code><\/pre>\n\n\n\n<p><strong>Output<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table is-style-stripes\"><table class=\"has-fixed-layout\"><thead><tr><th>Color_Blue<\/th><th>Color_Orange<\/th><th>Color_Red<\/th><th>Color_Yellow<\/th><\/tr><\/thead><tbody><tr><td>0<\/td><td>1<\/td><td>0<\/td><td>0<\/td><\/tr><tr><td>0<\/td><td>0<\/td><td>1<\/td><td>0<\/td><\/tr><tr><td>0<\/td><td>0<\/td><td>0<\/td><td>1<\/td><\/tr><tr><td>0<\/td><td>0<\/td><td>1<\/td><td>0<\/td><\/tr><tr><td>1<\/td><td>0<\/td><td>0<\/td><td>0<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"has-large-font-size\" style=\"margin-top:var(--wp--preset--spacing--30);margin-bottom:var(--wp--preset--spacing--30)\"><strong>Summary<\/strong><\/p>\n\n\n\n<p class=\"has-text-align-justify\">Label Encoding and One-Hot Encoding are methods to convert categorical data into a numerical format for machine learning. Label Encoding assigns a unique integer to each category, which can suggest an order that does not exist in the data, while One-Hot Encoding creates a binary column for each category and treats all categories as equally distinct. 
Integer codes suit ordinal data, provided the numbers follow the real order of the categories, while One-Hot Encoding is usually the safer choice for nominal data, where no order exists. The choice depends on the data&#8217;s nature and the algorithm&#8217;s requirements.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction When we train a model, computers can struggle with categorical data because they prefer numbers. This is where encoding methods come into play. Think of encoding as assigning a special code to each group, which helps computers work better with and analyze the data. These methods act as translators, turning words into numbers that [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":746,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_uf_show_specific_survey":0,"_uf_disable_surveys":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[1],"tags":[28,57,66,47],"class_list":["post-695","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-information","tag-data-analysis","tag-data-preprocessing","tag-preprocess","tag-python"],"blocksy_meta":[],"aioseo_notices":[],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/datascihubs.com\/wp-content\/uploads\/2023\/08\/Untitled-1.png?fit=1280%2C720&ssl=1","jetpack_sharing_enabled":true,"_links":{"self":[
{"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/posts\/695","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/comments?post=695"}],"version-history":[{"count":10,"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/posts\/695\/revisions"}],"predecessor-version":[{"id":747,"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/posts\/695\/revisions\/747"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/media\/746"}],"wp:attachment":[{"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/media?parent=695"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/categories?post=695"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datascihubs.com\/index.php\/wp-json\/wp\/v2\/tags?post=695"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}