Summary of C4.5 Algorithm: Data Mining

The C4.5 algorithm builds decision trees in which a node may have any number of branches. It works only with a discrete dependent attribute, which is why it can solve only classification tasks. C4.5 is one of the best-known and most widely used algorithms for generating decision trees. The following requirements must be met to work with the C4.5 algorithm:

  1. Each record in the data set must be associated with one of the predefined classes; that is, one of the attributes must serve as the class label. Every sample must carry such a label, otherwise errors are inevitable.
  2. The classes must be discrete. Each sample must belong to exactly one class.
  3. The number of classes must be much smaller than the number of samples in the data set under consideration.
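
The requirements above can be checked mechanically before a tree is built. The Python sketch below is only an illustration: the function name check_dataset, the record layout, and the min_samples_per_class threshold (one possible reading of "much smaller") are assumptions, not part of C4.5 itself.

```python
def check_dataset(records, classes, min_samples_per_class=10):
    """Return a list of problems found; an empty list means the checks passed.

    records: list of (attribute_values, class_label) pairs (assumed layout).
    classes: set of declared, discrete class labels.
    min_samples_per_class: hypothetical threshold for "far more samples than classes".
    """
    problems = []
    # Requirements 1 and 2: every record carries exactly one declared class label.
    for i, (attributes, label) in enumerate(records):
        if label not in classes:
            problems.append(f"record {i}: label {label!r} is not a declared class")
    # Requirement 3: the number of classes is small relative to the number of samples.
    if len(records) < min_samples_per_class * len(classes):
        problems.append("too few samples for the number of classes")
    return problems


records = [(("sunny",), "play"), (("rain",), "stay"), (("rain",), "maybe")]
problems = check_dataset(records, {"play", "stay"}, min_samples_per_class=1)
```

Here the third record is reported because its label is not one of the declared classes.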

Note that the C4.5 algorithm runs slowly on very large data sets.

Like the ID3 algorithm, C4.5 builds decision trees from a data set using the concept of information entropy. The files that C4.5 reads and writes have names of the form filestem.ext, where filestem is the file name and ext is an extension that identifies the file type. Working with the program requires at least two files: the first (filestem.names) holds the names and class definitions, and the second (filestem.data) holds the data, a set of objects described by the values of their attributes. As for structure, a decision tree built by C4.5 is either a leaf, which identifies a class, or a decision node that specifies a test, with one branch and subtree for each possible outcome of that test (Quinlan 5).
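
The entropy-based choice of a test can be illustrated with a short Python sketch. It computes Shannon entropy and the gain ratio, the criterion usually associated with C4.5, for discrete attributes only. The function names and the toy weather data are assumptions made for illustration and are not taken from Quinlan's program.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def gain_ratio(rows, labels, attr_index):
    """Information gain of splitting on one discrete attribute, divided by its split information."""
    total = len(rows)
    # Group the class labels by the value the attribute takes in each row.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy([row[attr_index] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): columns are outlook and humidity.
rows = [("sunny", "high"), ("sunny", "normal"), ("rain", "high"), ("rain", "normal")]
labels = ["no", "yes", "no", "yes"]
best_attr = max(range(2), key=lambda a: gain_ratio(rows, labels, a))
```

In this toy data humidity separates the classes perfectly while outlook carries no information, so humidity (index 1) is selected.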

The algorithm can generate decision trees in two modes: batch mode and iterative mode. Batch mode (the default) generates a single decision tree from all the available data. Iterative mode instead starts from a randomly selected subset of the data. A decision tree is generated from this subset, and then some of the objects that the tree misclassifies are added to it.

These steps are repeated until the tree classifies the remaining objects correctly or no further progress is made. Because iterative mode starts from a randomly selected subset, several trials can produce different decision trees from the same data. For this reason the file filestem.unpruned is needed: it collects the decision trees generated along the way. If the same data is used to generate decision trees again, the latest version of the tree is used. The program saves the best generated decision tree in the file filestem.tree.
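
The iterative mode can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Quinlan's implementation: build_tree is a tiny stand-in learner that tests attributes in a fixed order rather than by entropy, class labels are assumed to be strings, and the names windowed_tree, window_size, and max_added are hypothetical. A leaf is represented by a class label and a decision node by a tuple.

```python
import random
from collections import Counter


def build_tree(rows, labels, attrs):
    """Stand-in learner: a leaf is a class label, a node is (attr, {value: subtree}, default)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    attr = attrs[0]  # simplification: no entropy-based attribute selection
    branches = {}
    for value in set(row[attr] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        branches[value] = build_tree([r for r, _ in subset], [l for _, l in subset], attrs[1:])
    return (attr, branches, majority)


def classify(tree, row):
    """Follow branches until a leaf (a plain class label) is reached."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        tree = branches.get(row[attr], default)
    return tree


def windowed_tree(rows, labels, attrs, window_size, max_added, rng):
    """Grow a tree from a random subset, adding misclassified objects until none remain."""
    indices = list(range(len(rows)))
    window = set(rng.sample(indices, window_size))
    while True:
        chosen = sorted(window)
        tree = build_tree([rows[i] for i in chosen], [labels[i] for i in chosen], attrs)
        wrong = [i for i in indices if i not in window and classify(tree, rows[i]) != labels[i]]
        if not wrong:  # every object outside the subset is classified correctly
            return tree
        window.update(wrong[:max_added])  # add some of the misclassified objects


# Toy data (hypothetical): columns are outlook and windy.
rows = [(o, w) for o in ("sunny", "overcast", "rain") for w in ("yes", "no")]
labels = ["play" if o == "overcast" or w == "no" else "stay" for o, w in rows]
tree = windowed_tree(rows, labels, attrs=[0, 1], window_size=2, max_added=2, rng=random.Random(0))
```

The loop always ends because each pass either stops or enlarges the subset, which cannot grow beyond the full data set. A different seed for rng may yield a different tree from the same data, which is why several trials can be worth running.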

Works Cited

Quinlan, John Ross. C4.5: Programs for Machine Learning. Burlington: Morgan Kaufmann, 1993.
