Exploratory Data Analysis


EDA

A Basic NLP Walkthrough

To be clear, this was my first project with NLP and also my first with deep learning, but as I learned about these concepts, they seemed to go hand in hand. The first thing I needed to do was find a good dataset to work with. I ended up going with a News Category Dataset (https://rishabhmisra.github.io/publications/) that I found on Kaggle, which is becoming a bit of a go-to for me. I love that I can type the kind of work I want to do into its search bar and get dataset recommendations from there, but moving on. The dataset contains around 200k news headlines from 2012 to 2018, obtained from HuffPost. The idea was to train a model that could identify tags for untracked news articles, or identify the type of language used in different news articles. To put it simply, I wanted to build a model that could take unlabeled news articles and identify which category each one belonged to. Each news headline has a corresponding category. Categories and corresponding article counts are as follows:
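Per-category counts like these can be tallied with a few lines of standard-library Python. This is a minimal sketch, not the code from the original project: it assumes the dataset is stored as JSON Lines (one article object per line) with a `category` field, and the three sample records below are made up purely for illustration.

```python
import json
from collections import Counter

# Hypothetical sample records for illustration only. The real dataset is
# assumed to be JSON Lines, one article per line, with a "category" field.
SAMPLE_LINES = [
    '{"category": "POLITICS", "headline": "Senate passes budget bill"}',
    '{"category": "WELLNESS", "headline": "Five tips for better sleep"}',
    '{"category": "POLITICS", "headline": "Governor signs new law"}',
]


def count_categories(lines):
    """Tally how many headlines fall into each category."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines, e.g. a trailing newline at end of file
            continue
        record = json.loads(line)
        counts[record["category"]] += 1
    return counts


if __name__ == "__main__":
    # Print categories from most to least common.
    for category, n in count_categories(SAMPLE_LINES).most_common():
        print(f"{category}: {n}")
```

To run it against the real data, you could pass an open file handle instead, e.g. `with open(path) as f: count_categories(f)`, where `path` points at your local copy of the dataset (the file name depends on the version you download). Counting before modeling is also a quick way to spot class imbalance across categories.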


**Credit Card Fraud Detection with Decision Tree and Random Forest**

Context


Module 2: Final Project - Northwind Database

June 13, 2019


Exploratory Data Analysis

There is so much more data in our world today. Pretty much every business collects data in some form or another. Collecting data is not a new practice by any means; however, modern data analysis and visualization tools let us understand that data far better than before. Data analytics allows businesses to better understand their efficiency and performance, and ultimately helps them make more informed decisions. One example could be analyzing consumer attributes in order to create targeted ads. Data analysis can be applied to nearly every aspect of a business, as long as one understands the tools available. Exploratory data analysis (EDA) is crucial in understanding the what, why, and how of any problem. Here is the proper definition: exploratory data analysis is an approach to analyzing data sets by summarizing their main characteristics, often with visualizations. EDA is a crucial step before building a model, because it uncovers insights that later become important in developing a robust model. Before you can really start exploring your data, you have to scrub (clean) it first. These two processes go hand in hand, and you will likely jump back and forth between them as you work through your data set. For my Mod 1 project, the first thing I did was import the libraries that would help me load, open, and work with the data. I actually imported a lot of libraries because I am building up to regressions, but the following should get you started: