Earlier we discussed Supervised Learning, where we try to find the “right answer” to a problem based on labeled data sets (training examples). In Unsupervised Learning, however, the data sets come without labels describing their characteristics. We are not told what to do with our training sets.
So our problem looks like this: “Here is a data set; can you find some structure or pattern in it?” Unlike in Supervised Learning, you are not asked to find an “exact answer” per se; you need to describe the patterns present in the given data sets. That is what Unsupervised Learning is all about.
We are given data in which every example has no label, or the same label. We need to deduce a pattern among the data without knowing its actual properties. One common way to do this is to divide the data into a small number of simpler clusters (two, for example). So what are we basically doing with Unsupervised Learning? We take a chunk of data that we know nothing about and look for a pattern in it, so we subdivide the data into smaller clusters. An algorithm that does this is called a clustering algorithm, and clustering is one kind of Unsupervised Learning.
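To make the idea of dividing unlabeled data into clusters concrete, here is a minimal sketch of k-means, one widely used clustering algorithm. The post does not name a specific algorithm, so k-means is an assumption made for illustration, as are the made-up points, the function name kmeans and the choice to start from the first k points.

```python
import math

def kmeans(points, k, iterations=20):
    """Group 2-D points into k clusters; no labels are ever supplied."""
    # Assumption: use the first k points as starting centroids (simple, deterministic).
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignments, centroids

# Made-up, unlabeled data: two loose groups of points.
data = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (8.0, 8.1), (7.9, 8.3), (8.2, 7.8)]
labels, centers = kmeans(data, k=2)
print(labels)   # cluster index for each point
print(centers)  # one centroid per discovered cluster
```

Notice that the function only ever sees the points themselves. The cluster numbers it returns are arbitrary names for the groups it found, not “right answers” that anyone supplied.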
Let’s look at an example to make this clearer. The best example is Google News. If you haven’t seen it already, do visit Google News.
You will find the latest news updates from various publishing platforms sorted and arranged in one place. So, you see something like this on that page.
Notice how neatly the news from almost all publishing houses is laid out here. As you can see, the URLs of the different e-news portals covering the same story are clustered in one place. Let’s take the very first news item here, which reads “Canada’s air assault daunting mission against ISIS”.
Now, if you click the news item, you will be directed to the URL of that story; in this case it is this –
So, by clicking one URL you are directed to the main story as covered by that e-publishing platform. If you look at the thumbnails below the main featured story on Google News, you will find that each one represents a different e-news publishing platform and links to that portal’s URL. So what is Google basically doing? It takes one story (or related stories) and clusters the URLs from all the e-publishing platforms in one place, so that people can read the same story at one or more related URLs.
As you can see above, Google News shows a cluster of URLs from different e-publishing websites for the same (or related) news update. That, my friend, is a perfect example of an Unsupervised Learning (clustering) algorithm. Every single story on the same topic from different sources is clustered together to make it accessible. This is a very efficient way to read stories, isn’t it?
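As a rough illustration of grouping stories on the same topic, the sketch below clusters a few headlines by how many words they share. This is not how Google News actually works (the post does not say, and we do not know); the headlines, the 0.3 threshold and the helper names words, jaccard and group_headlines are all hypothetical.

```python
def words(headline):
    """Lowercase word set, ignoring very short words such as 'by' or 'for'."""
    return {w for w in headline.lower().split() if len(w) > 3}

def jaccard(a, b):
    """Overlap between two word sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def group_headlines(headlines, threshold=0.3):
    """Greedy single pass: a headline joins the first cluster whose
    first (seed) headline is similar enough, otherwise it starts a new one."""
    clusters = []
    for h in headlines:
        for cluster in clusters:
            if jaccard(words(h), words(cluster[0])) >= threshold:
                cluster.append(h)
                break
        else:
            clusters.append([h])  # no similar cluster found
    return clusters

# Hypothetical headlines, as if collected from different news portals.
headlines = [
    "Canada launches air mission against ISIS",
    "Air mission against ISIS begins for Canada",
    "New exoplanet discovered by space telescope",
    "Space telescope finds new exoplanet",
]
stories = group_headlines(headlines)
for story in stories:
    print(story)
```

Nobody tells the function which headlines belong together or how many stories there are; the grouping falls out of the similarity between the headlines themselves.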
Let’s take another example of Unsupervised Learning. This time we will look at an example from genetics.
Here we have a Genes vs. Individuals graph (DNA microarray data) depicting the kinds of genes each individual has. The colors (green, grey, red and so on) show how much of a particular gene an individual has (or does not have). So, our task is to find how much of each gene a particular individual possesses.
What we can do is group (or cluster) the individuals into categories according to the genes they have. This is an example of Unsupervised Learning, because we are not telling the algorithm in advance who the type-1 people are, who the type-2 people are, and so on. We have just clustered them into different types.
We are just saying: we have a bunch of data, we don’t know what type of data it is, we don’t know its structure, and we certainly don’t know what types of individuals are in it. So, can you automatically find a pattern for us, or divide the data into simpler clusters so that we can recognize a pattern in it? Since we are not giving the algorithm the “right answer” for the given data sets, this falls under Unsupervised Learning.
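The grouping of individuals by their genes can be sketched with a small agglomerative (bottom-up) clustering routine. This is only one possible approach, assumed here for illustration; the expression values are made up and the function name single_linkage is hypothetical.

```python
import math

def single_linkage(vectors, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(vectors))]  # start with one cluster per individual

    def gap(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(vectors[i], vectors[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge the second into the first.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: gap(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Made-up expression levels: one row per individual, one column per gene.
expression = [
    [0.9, 0.1, 0.8],  # individual 0
    [0.2, 0.9, 0.1],  # individual 1
    [0.8, 0.2, 0.9],  # individual 2
    [0.1, 0.8, 0.2],  # individual 3
]
groups = single_linkage(expression, n_clusters=2)
print(groups)  # lists of individual indices that ended up together
```

The routine is never told which individuals are “type 1” or “type 2”. It only merges the individuals whose gene profiles look most alike, and the types emerge from that.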
Unsupervised Learning finds applications in a wide variety of industries. Some further applications:
Organize Computing Clusters – This is used with very large chunks of data, to organize the data into clusters and decide which cluster belongs on which machine.
Social Network Analysis – Given knowledge of all your Facebook friends, an Unsupervised Learning algorithm can identify the cohesive groups of friends, or the people in general you may want to follow. Remember, we don’t know the “right answer” for whom you should follow; we work with the structure we can observe in your friend circle on Facebook (or any other social networking site, for that matter).
Market Segmentation – Many large companies have big databases of their customers. Using Unsupervised Learning, we can segment those customers into different groups (clusters), or market segments, in order to identify what each group cares about. With that you can better understand the demographics of your customers and market your business to them more efficiently. Again, one might wonder how this is Unsupervised Learning. Here we know the data set (the customers), but we don’t know in advance how the customers behave or which market segments exist. We don’t know who is in market segment 1, who is in market segment 2, and so on. So we let the algorithm discover the segments that these customers fall into.
Astronomical Data Analysis – This one came as a bit of a surprise to us. Unsupervised Learning can provide good analysis of astronomical data. Clustering algorithms give interesting results on astronomical and galactic data in the study of stars, asteroids and planets.
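To give a feel for the social network item in the list above, here is a deliberately crude sketch that finds friend groups as connected components of a friendship graph. Real social network analysis uses far more refined community detection; connected components is a simplification chosen for brevity, and the names and friendships are invented.

```python
from collections import defaultdict

def friend_groups(friendships):
    """Return groups of people connected to each other through friendships."""
    graph = defaultdict(set)
    for a, b in friendships:
        graph[a].add(b)
        graph[b].add(a)

    seen, groups = set(), []
    for person in sorted(graph):
        if person in seen:
            continue
        # Depth-first walk collecting everyone reachable from this person.
        stack, group = [person], []
        seen.add(person)
        while stack:
            current = stack.pop()
            group.append(current)
            for friend in graph[current]:
                if friend not in seen:
                    seen.add(friend)
                    stack.append(friend)
        groups.append(sorted(group))
    return groups

# Hypothetical friendships; nobody has labeled which circle anyone belongs to.
friendships = [("Asha", "Ben"), ("Ben", "Chen"), ("Dara", "Eli")]
circles = friend_groups(friendships)
print(circles)
```

As with the other applications, the input carries no information about which group a person belongs to; the groups are read off the structure of the data.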
So, all these applications are built on clustering algorithms, which are a subset of Unsupervised Learning. You can imagine how vast Unsupervised Learning is. Since we have dealt with clustering algorithms here, we will look at more ground-breaking algorithms for Unsupervised Learning in the next post.
So, now the answer to the important question here: What is Unsupervised Learning?
Unsupervised Learning is the branch of Machine Learning where we try to find structure or patterns in a data set that has not been labeled. Since we don’t know what labels the data should carry, we look for patterns rather than a right or wrong answer to the problem.
In Unsupervised Learning there is no error signal telling us whether a solution is correct. There is only the structure that we observe in the unlabeled data sets.
We hope you found this post interesting. If so, do like it and share it with others to spread the word. Comment below if you have any queries, suggestions or feedback. Wingshore is growing toward its mission of making machines grow hand in hand with human civilization. We look forward to any new ideas you have related to Machine Learning and its algorithms.