Machine Learning (ML) is about working out what can reliably be inferred from data when datasets are very complex. Its main use is with datasets that are larger and more complex than conventional techniques can handle.
What is Machine Learning?
Machine Learning is the field that designs algorithms able to make sense of large and complex datasets. These algorithms are executed by a computer.
The general difference between Machine Learning and plain regression, for instance, is that ML has a set of algorithms, each suited to specific applications and specific types of data.
For example, if the records in a dataset are all related in the same way and every feature affects the outcome the same way for all of them, regression works well for obtaining results.
Machine learning, on the other hand, deals with datasets structured so that the correlation between the variables differs from segment to segment.
That means Machine Learning can capture the different characteristics of each segment, whether the segments are orders, products, classes, or other kinds of information.
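As a rough illustration of why one regression can miss segment-level behaviour, the sketch below fits a least-squares line to two made-up segments whose relationship between feature and outcome runs in opposite directions. The data, the segment names, and the `fit_line` helper are all assumptions invented for this example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data: two segments with different feature/outcome relationships.
segment_a = [(x, 2.0 * x) for x in range(1, 6)]          # outcome rises with x
segment_b = [(x, 10.0 - 2.0 * x) for x in range(1, 6)]   # outcome falls with x

# One line fitted to everything at once.
pooled = segment_a + segment_b
pooled_slope, pooled_intercept = fit_line(
    [x for x, _ in pooled], [y for _, y in pooled]
)

# One line fitted per segment.
slope_a, _ = fit_line(*zip(*segment_a))
slope_b, _ = fit_line(*zip(*segment_b))

print(pooled_slope, slope_a, slope_b)
```

In this toy case the pooled fit reports almost no relationship at all, while the per-segment fits recover the two opposite trends.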
The classes of Machine Learning algorithms
There are four fundamental classes of machine learning algorithms. They differ in their overall application and in the kind of data available. They are:
- Classification: assigning records to pre-defined discrete groups.
- Clustering: splitting records into discrete groups based on similarity, without previous knowledge of the groups.
- Regression: predicting numeric values, typically of continuous variables.
- Association learning: observing which values frequently appear together.
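To make the last class in the list more concrete, here is a minimal sketch that counts which pairs of items show up together across records. The baskets, the `frequent_pairs` helper, and the `min_support` threshold are hypothetical and only meant to illustrate the idea; real association-rule algorithms do considerably more.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support records."""
    counts = Counter()
    for items in transactions:
        # sorted(set(...)) drops duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical shopping baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]

pairs = frequent_pairs(baskets, min_support=3)
print(pairs)
```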
Within these classes there are two main types, supervised learning and unsupervised learning, which we’ll talk about next.
Supervised vs Unsupervised
The major difference between the two is whether we know the outcome data ahead of time.
In supervised learning, the training data contains a correct answer, or label, for the specific result you’re searching for.
For instance, the label may be applied by a person who knows the expected outcome, or it may already exist in the dataset. This labeled data is then used to train an algorithm using feedback.
Generally, supervised learning looks at the outcome labels and tries to figure out the underlying characteristics of the input data and how they lead to a given outcome, such as a successful order.
After that process, the model is tested on new data to see how well it predicts the label.
Unsupervised learning, on the other hand, seeks previously unknown patterns in data without any label or guidance. The specific outcome isn’t predefined; instead, the algorithm looks for similarities and groups the data into clusters.
In this case, the model can’t be trained, tested and validated against known answers, precisely because the correct answer is unknown.
Supervised Machine Learning workflow
The normal workflow for a supervised algorithm starts by taking raw data and attaching labels to it. Those labels can be characteristics of the operation, or successes and failures.
Then the data and the labels are used together to train the algorithm, which produces the model. Once training is complete, the model can proceed to evaluation.
The evaluation phase consists of giving new data to the trained model to see how well it does in practice. The results show whether the model is accurate enough to use.
Finally, the model makes predictions on unlabeled data, deciding whether each record corresponds to the expected result.
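The workflow above can be sketched in a few lines. This is only an illustration under stated assumptions: the records, the "success"/"failure" labels, and the nearest-centroid classifier (`train`, `predict`, `accuracy`) are invented for the example and stand in for whatever algorithm would actually be used.

```python
from statistics import mean

def train(records):
    """Learn one centroid (mean feature vector) per label."""
    by_label = {}
    for features, label in records:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(col) for col in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

def accuracy(model, records):
    """Fraction of records whose predicted label matches the known label."""
    hits = sum(predict(model, f) == label for f, label in records)
    return hits / len(records)

# Step 1: raw data with labels attached (hypothetical records).
labeled = [
    ((1.0, 1.0), "failure"), ((1.5, 2.0), "failure"), ((2.0, 1.5), "failure"),
    ((8.0, 9.0), "success"), ((9.0, 8.5), "success"), ((8.5, 9.5), "success"),
    ((1.2, 1.8), "failure"), ((9.2, 8.8), "success"),
]
train_set, test_set = labeled[:6], labeled[6:]

# Step 2: train the model on data and labels together.
model = train(train_set)

# Step 3: evaluate on labeled records the model has not seen.
score = accuracy(model, test_set)

# Step 4: predict the label of a new, unlabeled record.
new_prediction = predict(model, (7.5, 8.0))
print(score, new_prediction)
```

Holding back part of the labeled data for evaluation is what lets the score say something about records the model never saw during training.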
Unsupervised Machine Learning workflow
Here the procedure is a little different. It takes the raw data without labels or specific parameters, then looks at how the records, or clusters of records, relate to each other.
From the algorithm we then get a model. In this case, the model itself defines boundaries around specific types of records to find successes and failures.
The model then tries to label the different clusters of data so that they can be used in production to predict possible outcomes.
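A small sketch of this workflow, using k-means as one possible clustering algorithm (the text does not name a specific one). The points, the choice of two clusters, and the starting centroids are assumptions made for the example.

```python
from statistics import mean

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def k_means(points, centroids, max_rounds=20):
    """Plain k-means: assign points to the nearest centroid, then recompute."""
    for _ in range(max_rounds):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty group keeps its previous centroid.
        updated = [
            tuple(mean(col) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids

# Raw, unlabeled records (hypothetical).
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 9.0), (9.0, 8.5), (8.5, 9.5)]

# The algorithm produces a model: here, the cluster centroids.
centroids = k_means(points, centroids=[points[0], points[-1]])

# The model assigns each record, and any new record, to a cluster.
cluster_ids = [nearest(p, centroids) for p in points]
new_cluster = nearest((7.5, 8.0), centroids)
print(centroids, cluster_ids, new_cluster)
```

The cluster numbers carry no meaning by themselves; deciding that one cluster corresponds to, say, successes is an interpretation applied afterwards.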
Machine Learning is a technique for extracting valuable information from data, mostly when the datasets are larger and more complex than the usual models can handle.
Supervised learning uses outcome variables, known as labels, to identify patterns and features related to that variable. In unsupervised learning the outcome value is unknown, so it looks for relationships among the input variables.
In conclusion, machine learning is about making predictions on, or learning about, new, unlabeled data.