Many of you who work in data science have probably used, or at least heard of, Tableau. It is a powerful and fast-growing Business Intelligence (BI) tool. In short, it uses visualization to make the data collected by engineers easier for business teams to consume and understand. These visualizations, usually dashboards and worksheets, speed up data analysis, and non-technical users and stakeholders at any level can understand what this BI software produces. However, you need data sources before you can create any visualizations, and if you can’t decide where to look for data, the sources below can help. They range from paid to free, and their uses vary from experimentation and research to market research and more. We will discuss them under a few different categories.
Extracting data via Web Scraping
You have the entire World Wide Web at your disposal. You can extract data from many types of websites and make it available to yourself in different forms. If your requirement is research-based, a one-time need, or something that recurs about once a month, this option is worth considering. You need a working knowledge of at least one language commonly used for web scraping, which is usually Python. You should also know the external packages you will need, because writing all of the fetching and parsing logic from scratch is impractical in any language. You will also need practice on different types of websites: ones that require an ID and password, ones that redirect you to a separate page when you click certain elements, ones with scaffolding, and more. This matters because the techniques for scraping data cleanly can differ greatly from one type of website to another.
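As a minimal sketch of the parsing half of a scraper, the example below pulls headline text out of HTML that has already been fetched. It deliberately uses only Python’s standard-library `html.parser` so it runs without installs. In practice you would likely add third-party packages such as Requests and Beautiful Soup. The markup, the `h2` tag, and the `title` class are hypothetical stand-ins for whatever the target site actually uses. Login pages, redirects, and similar cases need extra handling that this sketch does not attempt.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect the text of <h2 class="title"> elements (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; "class" may hold several names
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "title" in classes:
            self._in_title = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) is appended to the current headline
        if self._in_title:
            self.headlines[-1] += data


def extract_headlines(html):
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headlines]


if __name__ == "__main__":
    sample = """
    <html><body>
      <h2 class="title">Retail sales rise</h2>
      <h2 class="subtitle">Not a headline</h2>
      <h2 class="title big">Travel demand <b>recovers</b></h2>
    </body></html>
    """
    print(extract_headlines(sample))
    # prints: ['Retail sales rise', 'Travel demand recovers']
```

Keeping the selection rule in one method makes it easy to adapt when a site changes its markup. That kind of change is a common reason scrapers break.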
Use datasets available for free on the web
If your sole objective is to try out building visualizations, or you are looking for research data that an institute has already shared, you can check out one of the many websites that host data collected and used for various purposes in the past. Some popular open datasets are:
- Kaggle datasets
- UCI Machine Learning Repository
- Human Genome Diversity Project
- Labeled Faces in the Wild
These are just a few of the datasets freely available to researchers. Millions of datasets are now spread across the internet. Links to them are much easier to find thanks to GitHub users who gather different sources and present them together in curated lists.
Usage of data generated by yourself or volunteers
If you like analyzing real-time or fresh data generated by actual people, this option is for you. All you need to do is collect data from the devices you use. You can also collect the same data from people you know (with their permission, of course). Examples of such data you can visualize include:
- Mobile screen-on time
- Daily step count
- Top websites visited
- Calories burnt daily
- Sleep patterns (this data can be collected if you are using smart bands or smartwatches)
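Raw device exports usually need some reshaping before Tableau can use them. The sketch below sums step-count readings into daily totals and writes them as a tidy two-column CSV. Tableau can read a CSV through its text-file connection. The export format assumed here, `(ISO timestamp, steps)` pairs, is hypothetical; real devices and apps export data in their own formats.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def daily_totals(readings):
    """Sum (ISO timestamp, steps) readings into per-day totals, sorted by date."""
    totals = defaultdict(int)
    for timestamp, steps in readings:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        totals[day] += steps
    return sorted(totals.items())


def to_csv(rows, out):
    """Write a (date, steps) table with a header row to a text stream."""
    writer = csv.writer(out)
    writer.writerow(["date", "steps"])
    writer.writerows(rows)


if __name__ == "__main__":
    raw = [  # hypothetical readings exported from a fitness band
        ("2023-05-01T08:15:00", 1200),
        ("2023-05-01T18:40:00", 5300),
        ("2023-05-02T07:05:00", 800),
    ]
    rows = daily_totals(raw)
    print(rows)
    # prints: [('2023-05-01', 6500), ('2023-05-02', 800)]

    buffer = io.StringIO()
    to_csv(rows, buffer)
    print(buffer.getvalue())
```

To produce a file instead, pass `open("steps.csv", "w", newline="")` to `to_csv`. The `newline=""` argument is what the `csv` module documentation recommends for file output.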
Social media or news-related data
Social media and news media are among the biggest sources of data today. Social media websites such as Twitter offer developer access, through which you can retrieve certain data points via APIs exposed specifically for developers. However, these APIs usually come with restrictions:
- You can usually call the developer APIs only a certain number of times within a given period. Depending on the API, the limit can range from a few requests per second to a few hundred per day.
- There are also restrictions on how you use the data. Most social media websites allow you to use it only for research purposes, not for any kind of commercial activity.
- You might need to explain how you intend to use their data before you are even granted access to their APIs.
If you do not want to go down this path, you can instead write your own code to scrape data from social media websites.
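Whether you call an API or scrape pages yourself, it helps to throttle requests on your side so you stay within the rate limits described above. The sketch below is a simple rolling-window limiter. It allows at most `max_calls` calls in any `period` seconds and sleeps when the window is full. The class and its parameters are illustrative and not part of any platform’s SDK. The actual limits must come from each API’s own documentation.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` calls within any rolling `period` seconds.

    Client-side sketch only: real limits are defined by each API provider.
    `clock` and `sleep` are injectable so the logic can be tested quickly.
    """

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self._calls = deque()  # timestamps of calls still inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have left the rolling window
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Window is full: sleep until the oldest call expires, then recheck
            self.sleep(self.period - (now - self._calls[0]))


if __name__ == "__main__":
    limiter = RateLimiter(max_calls=3, period=1.0)
    for i in range(5):
        limiter.wait()
        # A real API request would go here (hypothetical placeholder)
        print(f"request {i} sent at {time.monotonic():.2f}")
```

A rolling window is stricter than a fixed per-minute counter. It never lets a burst straddle a window boundary, which makes it a cautious default when you do not know exactly how the provider counts requests.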
When it comes to news data, few sites may offer you an API, but that isn’t the main problem. News websites are mostly easy to scrape because their content is divided into separate boxes, one per news story. The problem is that each story contains large amounts of text along with images and videos, which means a lot of unstructured data. These websites also usually carry many advertisements, which you need to strip out before you can start making sense of the rest of the data.
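One way to strip advertisements is to skip whole HTML subtrees that look like ads while collecting the article’s visible text. The sketch below does this with Python’s standard-library `html.parser`. The ad class names in `AD_MARKERS` are hypothetical, since every site labels its ad slots differently. The nesting count also assumes reasonably well-formed HTML in which non-void tags are closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
SKIP_TAGS = {"script", "style", "iframe"}
AD_MARKERS = {"ad", "ads", "advert", "sponsored"}  # hypothetical class names


class ArticleTextParser(HTMLParser):
    """Collect visible text, skipping ad blocks and script/style/iframe content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def _is_ad(self, attrs):
        classes = (dict(attrs).get("class") or "").lower().split()
        return any(c in AD_MARKERS for c in classes)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth += 1  # nested tag inside a skipped subtree
        elif tag in SKIP_TAGS or self._is_ad(attrs):
            self._skip_depth = 1  # start skipping this whole subtree

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def clean_article_text(html):
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = ('<div class="story"><p>Markets rallied.</p>'
            '<div class="ad banner"><p>Buy now!</p><img src="x.png"></div>'
            '<script>track();</script><p>Oil fell.</p></div>')
    print(clean_article_text(page))
    # prints: Markets rallied. Oil fell.
```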
Purchase data from DaaS Providers
If you are part of a company that needs certain data, such as market research or competitor data, to feed into Tableau, one option is to purchase it from Data-as-a-Service (DaaS) providers. This route makes sense if you do not have a tech team, or if your tech team isn’t large or mature enough to handle your web-scraping needs. Our team at PromptCloud understands the issues that companies both big and small face, and so we built DataStock, where you can download clean, ready-to-use datasets. It is essentially a web store of structured datasets containing data from websites across different industries, including retail, eCommerce, healthcare, travel, and more. These datasets are scraped using automated code and then cleaned so that those consuming them face no issues. Getting the datasets you need from DataStock is simple:
- Sign up on the DataStock website.
- Browse the different datasets and check details such as crawl date, price, number of rows, and format. You can also download sample data to get a bird’s-eye view of a dataset.
- Select the datasets you require and make the payment.
Tableau is an excellent BI tool. However, unless your data is clean and of good quality, and you have selected the right data points, your visualizations might show a wrong trend or give you only half the picture. This, in turn, can lead you to make the wrong business decisions. This is why you should take the time to understand the data your business workflow needs and talk to the right team to get hold of it. If you belong to an organization, DataStock can be a good choice and can make your data-collection efforts significantly simpler.
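Whatever the source, a quick profile of a dataset before loading it into Tableau can catch gaps that would otherwise distort a trend. The sketch below counts rows, empty cells per column, and exact duplicate rows in CSV text. The column names and sample values are hypothetical. Real checks would usually go further, for example with type, range, and freshness checks.

```python
import csv
import io
from collections import Counter


def profile_csv(text):
    """Return the row count, empty cells per column, and number of duplicate rows."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    fields = reader.fieldnames or []

    missing = Counter()
    for row in rows:
        for column in fields:
            value = row.get(column)
            # Short rows yield None; blank cells yield empty or whitespace strings
            if value is None or not value.strip():
                missing[column] += 1

    # Count every repeat of an identical row beyond its first occurrence
    seen = Counter(tuple(row.get(c) for c in fields) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())
    return {"rows": len(rows), "missing": dict(missing), "duplicates": duplicates}


if __name__ == "__main__":
    sample = (
        "date,region,sales\n"
        "2023-05-01,North,100\n"
        "2023-05-01,North,100\n"
        "2023-05-02,,250\n"
    )
    print(profile_csv(sample))
    # prints: {'rows': 3, 'missing': {'region': 1}, 'duplicates': 1}
```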