Machine Learning for Improved Management

Overview of Project

The topic of machine learning and its role in managers’ work is essential today due to rapid technological development. Factories, manufacturers, and other businesses implement various innovations in their working processes to make them more accurate, fast, and productive (Choi, Jung & Noh 2015, p. 50). Nowadays, the profession of a manager is not as popular as it was several decades ago because machines and robots demonstrate better working results. The innovation outlined in this proposal will be examined at the meso-organizational level, as the significant changes in managers’ occupations might be adopted by global companies that currently employ thousands of managers. Machine learning techniques are beneficial for many businesses that cannot afford to train their employees because of the risk of a significant profit loss.

The problem addressed by the proposed research is that some companies do not know whether it is better to invest in information technologies or to spend the same amount of money on their employees’ education. The purpose of this study is to identify the role of information technologies and applications in modern production processes.

Theory and Hypothesis

The theoretical perspectives used in the proposed study address the impact of information technologies on the management of professional activities and on the world in general. The scholarly literature necessary for developing appropriate statements for this research will be taken only from reliable journals and books written by authors who have studied the topic for an extended period. Unfortunately, few studies have been done in the area of machine learning in management, as this approach to production processes is new and still needs improvement (Jaiswal 2015, p. 74). One of the most important research questions is whether managers might be deprived of their professional activities because of new technologies. The hypothesis is that new technologies and applications can replace people in managerial positions.

Research Methodology

The research sample will include approximately one thousand participants employed by an organization whose working process requires particular management activities. The sample will be divided into two groups: five hundred people will work with applications, while the other employees will receive recommendations from a professional manager. The participants will work under these conditions for three months. At the end, their productivity rates will be analyzed and compared with the help of such research instruments as questionnaires and observation.

Analysis of Results and Limitations

As mentioned above, the descriptive statistics gathered during the research will be analyzed to support or refute the hypothesis. This method is appropriate because it is used by the majority of scholars and yields accurate study results (Perles-Ribes et al. 2016, p. 693). However, some limitations might emerge due to the participants’ age differences: some people might be more comfortable working with a manager because they are used to such a work organization.

Contributions and Outlets

Scholars who conduct further research in the sphere of machine learning might be interested in the results of this project. Managers’ conferences would be the most appropriate outlet for sharing the study results. Professional journals that specialize in information technology might also use the research data in one of their issues.

Citations/References and Exhibits

As mentioned above, only credible, scholarly sources will be cited in the study. All of the references will be structured according to a single citation style. Exhibits such as graphs and tables will help present particular statistical data.

Anticipated Challenges

Gathering results and interviewing the study participants might be challenging due to the large sample size. This issue will be addressed by a questionnaire that every participant will be asked to complete. According to the timeline, every employee will work regular shifts during the first four weeks. The progress of both groups will be observed during the next four weeks. During the final month of the research, the participants will be interviewed to record precise results.

Reference List

Choi, S, Jung, K & Noh, S 2015, ‘Virtual reality applications in manufacturing industries: past research, present findings, and future directions’, Concurrent Engineering, vol. 23, no. 1, pp. 40–63.

Jaiswal, S 2015, ‘Book review: common sense talent management: using strategic HR to improve company performance’, Vision: The Journal of Business Perspective, vol. 19, no. 1, pp. 73–75.

Perles-Ribes, J, Ramon-Rodriguez, A, Moreno-Izquierdo, L & Sevilla-Jeminez, M 2016, ‘Economic crises and market performance—a machine learning approach’, Tourism Economics, vol. 23, no. 3, pp. 692–696.

Concept Drifts and Machine Learning

Introduction

The purpose of this research paper is to review concept drift with reference to machine learning. A concept is defined as a quantity to be predicted that is unstable and changes over a certain period of time. Common examples of concepts are weather patterns, customer preferences, temperature, and behavioral changes. The underlying data distribution used to explain a concept is also subject to change as a result of the unstable nature of concepts. Such changes in the underlying data distribution cause models built on old data to become inconsistent with the new concept’s data, which makes it necessary to update the model. This creates a problem known as concept drift, which complicates the task of learning the new model and the new data that makes up the concept (Tsymbal 1).

Machine learning under concept drift involves learning a target that is shifting, or data streams that change over time. It is also the learning of non-stationary environments with unstable concepts, so that the approaches used to deal with concept drift problems can capture the final concept. Since the 1990s, various learning approaches have been developed and implemented to deal with the problem of concept drift, as it has become common in many domains. Such approaches include the AQ algorithm and the STAGGER system, which were developed in the 1990s (Koronacki 23).

The discussion in this research paper will therefore focus on concept drifts and machine learning by examining these two concepts.

Machine Learning

Machine learning is a branch of artificial intelligence that draws on cognitive science, probability theory, behavioral science, and adaptive control to determine the changing behaviors of certain concepts. The major focus of machine learning is to identify and learn the complex behavioral patterns that precede concept changes so as to make intelligent, data-based decisions. Machine learning involves the use of human cognitive processes when performing data analysis, as well as collaborative approaches between the machine and the user (Bishop 2).

There are various types of machine learning that are used to achieve the desired outcomes of algorithms: supervised learning, unsupervised learning, semi-supervised learning, transduction, and reinforcement learning. In supervised learning, the algorithm learns to convert inputs into outputs from labeled examples; in unsupervised learning, the inputs are clustered without labels; in reinforcement learning, the algorithm learns from observations of the consequences of its actions; and in semi-supervised learning, labeled and unlabeled examples of the target concept are combined to generate an appropriate function. The remaining type, transduction, is where the learning algorithm tries to predict new outputs based on the training inputs and outputs as well as the testing inputs (Bishop 3).
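The contrast between supervised and unsupervised learning described above can be sketched in a few lines of code. The following toy example, with invented one-dimensional data, uses a 1-nearest-neighbour rule for the supervised case and a tiny two-centroid clustering loop for the unsupervised case; both are deliberately minimal illustrations, not production algorithms.

```python
# Illustrative sketch: supervised learning maps labeled inputs to outputs,
# while unsupervised learning groups unlabeled inputs. All data are invented.

def nearest_neighbor_predict(train, query):
    """Supervised: predict the label of the closest training example (1-NN)."""
    closest = min(train, key=lambda pair: abs(pair[0] - query))
    return closest[1]

def two_means_cluster(points, iters=10):
    """Unsupervised: a tiny 1-D k-means with k=2; returns the two centroids."""
    a, b = min(points), max(points)          # initial centroids
    for _ in range(iters):
        ca = [p for p in points if abs(p - a) <= abs(p - b)]
        cb = [p for p in points if abs(p - a) > abs(p - b)]
        a = sum(ca) / len(ca)
        b = sum(cb) / len(cb)
    return a, b

labeled = [(1.0, "low"), (1.2, "low"), (8.0, "high"), (8.5, "high")]
print(nearest_neighbor_predict(labeled, 7.5))   # supervised prediction

unlabeled = [1.0, 1.2, 1.1, 8.0, 8.5, 7.9]
print(two_means_cluster(unlabeled))             # unsupervised grouping
```

The supervised learner needs the labels attached to each point, while the clustering routine discovers the two groups from the raw values alone.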

The main theory used to explain machine learning is computational learning theory, which focuses on probabilistic performance bounds of learning algorithms because training sets are finite and uncertain in nature. This means that computational learning theory does not provide absolute guarantees about learning algorithms. Apart from performance bounds, computational learning theory studies the time complexity and feasibility of machine learning given the unstable nature of concepts. A computation is usually considered feasible under computational learning theory if it can be performed in polynomial time (Yue et al 257).

There are various types of machine learning algorithms used in machine learning activities. Among the most common is decision tree learning, in which decision trees act as predictive models: the tree organizes observations into a general conclusion about the target item under consideration. Another commonly used algorithm is association rule learning, which involves discovering relationships and links between variables in large databases. The neural network algorithm, a computational model inspired by biological networks, processes information using a connectionist approach to computation and simulation (Bishop 225).
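The simplest possible decision tree, a single-split "stump", shows how tree learning turns observations into a predictive rule. The data below (temperature readings mapped to invented "play"/"stay" labels) and the exhaustive threshold search are purely illustrative assumptions.

```python
# A decision stump: the simplest decision tree, with one split on one feature.
# The toy data (temperature -> "play"/"stay") are invented for illustration.

def fit_stump(examples):
    """Pick the threshold that misclassifies the fewest training examples."""
    best = None
    for threshold, _ in examples:
        for low_label, high_label in [("play", "stay"), ("stay", "play")]:
            errors = sum(
                1 for x, y in examples
                if (low_label if x <= threshold else high_label) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, low_label, high_label)
    _, threshold, low_label, high_label = best
    return lambda x: low_label if x <= threshold else high_label

data = [(15, "play"), (18, "play"), (28, "stay"), (31, "stay")]
stump = fit_stump(data)
print(stump(20))
```

A full decision tree learner would apply this split search recursively to each resulting subset of the data.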

Genetic programming is another learning algorithm used in machine learning. It deals with the evolution of computer programs that can perform user-defined tasks, based on the mechanisms of biological evolution. Genetic programming is a specialization of genetic algorithms in which each individual in the evolving population is itself a computer program. Genetic programs are mostly used to optimize machine or computer programs by evaluating a program’s ability to perform a user-defined task. Bayesian networks are other commonly used machine learning models; they are graphical models that use probabilities to represent dependencies between random variables. A Bayesian network is commonly used to capture the connection between the symptoms or signs of an illness and the illness itself: once the symptoms have been observed, the network can be used to determine the probable presence of various diseases (Bishop 21).
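The symptom-to-disease inference mentioned above reduces, in the simplest two-node case, to an application of Bayes’ rule. The probabilities below are invented for illustration and are not real medical data.

```python
# Minimal sketch of symptom -> disease inference with Bayes' rule.
# All numbers are invented: 1% prevalence, the symptom appears in 90%
# of sick individuals and in 10% of healthy ones.

def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | symptom) via Bayes' rule."""
    p_symptom = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_symptom

print(round(posterior(0.01, 0.90, 0.10), 4))
```

Even with a fairly reliable symptom, the low prior keeps the posterior probability of disease small, which is exactly the kind of reasoning a larger Bayesian network automates across many variables.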

Machine learning has a variety of uses in the modern, technological world. The most common applications include natural language processing, the detection of credit card fraud, syntactic pattern recognition, and the medical diagnosis of various illnesses through the analysis of symptoms. Machine learning is also used in the analysis of financial markets, in the creation of brain and machine interfaces for radiographic equipment, in the classification of DNA sequences and their properties, in software engineering processes, and in the development of robot locomotion. It is further used in structural health monitoring, bio-surveillance, and speech and handwriting recognition (Mitchell 2).

Concept Drifts

Concept drifts, as described in the introduction of this research paper, are the problems caused by a change in the model used to examine the underlying data distribution of a concept. Concept drift is also described as a phenomenon in which examples have legitimate labels at one time and illegitimate labels at another. To explain this, Koronacki (26) uses the example of a cloud as a target concept: concept drift occurs when the cloud changes its position, shape, and size in the sky over a certain period of time. With regard to Bayesian decision theory, the transformations of the cloud equate to changes in the form of the prior target cloud (Koronacki 26). Concept drifts have become common occurrences in the real world, especially when it comes to people’s changing preferences for products and services.

Concepts are subject to change over time, which means that they are unstable in nature. Such changes in the underlying data distribution make the task of learning, especially machine learning, more complicated. Learning also becomes difficult if there are changes in the hidden context of the target concept, which leads to concept drift. A particular difficulty in handling concept drift is distinguishing between true concept change and noise. Some machine learning algorithms overreact to noise, misinterpreting it as concept drift, while others react to genuine changes by adjusting very slowly (Perner 236).

Most of the research conducted on concept drift has been theoretical in nature, with assumptions drawn to determine the kinds of concept drift that allow performance bounds to be established. Researchers such as Helmbold and Long (Stanley 2) established bounds on the extent of drift that can be tolerated, assuming a more permanent drift; the extent of a drift is defined as the probability that two successive concepts disagree on a randomly drawn example. Other researchers, such as Freund and Mansour, and Barve, Long, and Bartlett, established the bounds necessary to determine the rate of concept drift by analyzing the sample complexity of an algorithm that learns the structure of a repeating sequence of concept changes (Stanley 2).

Various algorithms are used to detect concept drifts, and they fall into two categories: single-learner-based trackers, which aim to select the data relevant to learning the target concept (the data combining approach), and ensemble approaches, which formulate and restructure a set of base learners. The data combining approach is a conventional way of dealing with concept drift problems through the use of fixed-size time windows over data streams. The time window uses the most recent data streams or batches to construct the predictive model. The problem with this approach is that a large time window is unable to adapt quickly to concept drift, while a small time window is unable to track a target concept that is stable or recurrent (Yeon et al 3).

The optimal size of the window in the data combining approach therefore cannot be set unless the type and degree of the concept drift have been determined in advance. Widmer and Kubat, in their 1996 study of concept drift, used the Window Adjustment Heuristic (WAH) to adaptively determine the size of the time window. Other researchers, Klinkenberg and Joachims, proposed an algorithm in 2000 for tracking concept drift with a support vector machine (SVM) while the target concept was continuously changing. Such methods made it possible to determine an appropriate window size (Dries and Ruckert 235).
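The fixed-size time-window idea described above can be sketched as follows. The "model" here is just a running mean over an invented one-dimensional stream, standing in for any learner refit on the most recent observations; the window size and data are illustrative assumptions.

```python
# Sketch of the data combining (fixed time-window) approach: only the
# most recent `window` observations are used by the model, so data from
# a pre-drift concept gradually fall out of the window.

from collections import deque

def windowed_mean_predictor(stream, window):
    """Yield a prediction (mean of the current window) before each new value."""
    buf = deque(maxlen=window)
    for value in stream:
        if buf:
            yield sum(buf) / len(buf)
        buf.append(value)

# Invented stream with a sudden drift from level 0 to level 10 at t = 5.
stream = [0, 0, 0, 0, 0, 10, 10, 10, 10, 10]
predictions = list(windowed_mean_predictor(stream, window=2))
print(predictions)
```

With a small window the predictor reaches the new level within two steps of the drift, but the same small window would also track noise on a stable concept, which is precisely the trade-off the text describes.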

While the data combining approach is able to select a subset of past data related to the new information, it is unable to define which data streams relate to the new information. This method is also unable to retain all or parts of the previous data sets, making it an inefficient approach to managing concept drift problems, especially in machine learning. The ensemble approach, on the other hand, uses an ensemble strategy to learn changing environments. Ensemble techniques such as boosting, bagging, and stacking have been known to produce more stable prediction models in static environments than the data combining approaches, which rely on single models (Yeon 4).

Ensemble approaches maintain a set of data descriptions whose predictions are combined through weighted voting so as to obtain the most relevant description of the new data. Methods used to conduct the weighted voting include STAGGER, which maintains a set of concept descriptions weighted according to their relevance to the new data. Another method used in weighted voting is conceptual clustering, where stable hidden contexts are identified by clustering instances of the new concept that are similar under the same hidden context. Compared with data combining approaches, ensemble techniques have been more effective at handling concept drift problems, and they are more suitable for data streams and batches because they do not need to retain any previous data sets (Tsymbal 3).
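The weighted-voting step at the heart of the ensemble approach can be sketched in a few lines. The base learners, their labels, and the accuracy-derived weights below are all invented for illustration.

```python
# Sketch of ensemble weighted voting: each base learner votes for a label,
# and votes are weighted by that learner's relevance to the new data
# (e.g. its recent accuracy). All learners and weights are invented.

def weighted_vote(predictions, weights):
    """Combine per-learner labels; return the label with the largest total weight."""
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)

# Two outdated models (low weight) vote "old"; one model fit to recent
# data (high weight) votes "new", so the ensemble follows the drift.
print(weighted_vote(["old", "old", "new"], [0.2, 0.3, 0.9]))
```

A drift-tracking ensemble would re-estimate the weights on each new batch, so learners describing an obsolete concept lose influence without any past data being stored.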

Types of Concept Drift

The two most common types of concept drift in the real world are sudden (instantaneous) and gradual drift. An example of sudden drift is an individual graduating from an institution of higher learning and finding himself or herself in a different environment full of monetary concerns and problems; another is the changing preferences of consumers as they demand products or services that meet their constantly changing needs. Gradual concept drift occurs when a certain aspect changes over a long period of time, such as the wear of car tires and factory equipment, which might cause a gradual change in production outputs. Both sudden and gradual concept drifts are referred to as real concept drifts (Tsymbal 2).
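The difference between the two drift types can be made concrete with toy concept generators: a sudden drift switches the target at a single time step, while a gradual drift interpolates between the old and new concept over many steps. The time points and levels below are invented for illustration.

```python
# Toy generators for the two real drift types described above.

def sudden_drift(t, change_at=50):
    """The concept jumps from level 0 to level 1 at a single time step."""
    return 0.0 if t < change_at else 1.0

def gradual_drift(t, start=25, end=75):
    """The concept moves from level 0 to level 1 over the interval [start, end)."""
    if t < start:
        return 0.0
    if t >= end:
        return 1.0
    return (t - start) / (end - start)

print(sudden_drift(49), sudden_drift(50))   # instantaneous change
print(gradual_drift(50))                    # partway through the transition
```

Streams sampled from generators like these are a common way to benchmark how quickly a drift-handling learner adapts to each drift type.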

Other types of concept drifts include the virtual concept drift which is defined as the need to change the current model due to a change in the data distribution. The hidden changes that exist in a certain context might cause a change in the target concept which might in turn cause a change in the underlying data distribution of the concept. If the target concept was to remain the same, the underlying data distribution might change to reflect changes to the concept which might create a need to revise the current model that is used in explaining the concept. This creates a virtual concept drift that necessitates a change in the current model (Tsymbal 2).

The major difference between a virtual concept drift and a real concept drift is that virtual drifts might occur in cases such as spam categorization, where the data distribution shifts even though the target concept does not, while real concept drifts are not caused by such sampling changes. Virtual concept drifts ensure that shifts in the data have been properly represented in the current model used to explain the underlying distribution. Virtual concept drifts, which are also known as sampling shifts, help in determining the types of unwanted messages that remain the same over a long period of time (Tsymbal 2).

Detecting Changes in Concepts

To deal effectively with the problem of concept drift, the changes that take place in concepts have to be suitably detected. The most common method used to detect concept changes is information filtering, where data streams are classified according to whether they are relevant or irrelevant to the target concept. The main purpose of information filtering is to reduce the information load presented to a user to the items likely to be of interest. Information filters are supposed to remove irrelevant information from the data streams so that only relevant information is presented to the user. Because concepts are unstable and constantly changing, information filters used in unstable environments have to consider classification accuracy to ensure that concept changes have been properly tracked (Lanquillon and Renz 538).

Information filtering is an important approach to detecting changes in a drifting concept’s data stream because it casts the drift as a classification problem that can be solved with learning techniques such as supervised machine learning. These techniques learn from a given set of examples, and once learned, the resulting classifier can be used to categorize new data streams. Supervised learning algorithms have proved important in detecting changes to data streams, although they rest on the assumption that the old data distribution is similar to the new one. The hidden context of the data streams changes over time, and it also changes as new data on the concept continues to emerge. Supervised machine learning ensures that changes in data streams are suitably detected and that the model adapts to the new data (Lanquillon and Renz 538).

Another method that can be used to detect changes in a concept’s data stream is the Shewhart control chart, which tests whether an observation signals a change in the data stream. This approach assumes that the data stream has been divided into batches arranged in chronological order. A statistic is computed for each batch separately to determine whether any change has taken place in the stream, and the Shewhart control chart detects changes by flagging deviations among the data batches (Lanquillon and Renz 539).
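The Shewhart-chart idea can be sketched as follows: the stream is split into chronological batches, control limits are estimated from a few reference batches, and any later batch whose mean falls outside those limits raises an alarm. The batch values, the number of reference batches, and the 3-sigma limit below are all invented assumptions for illustration.

```python
# Sketch of Shewhart-chart change detection on batched stream data.
# Control limits: center +/- k * (std. dev. of the reference batch means).

from statistics import mean, stdev

def shewhart_alarms(batches, n_reference=3, k=3.0):
    """Return indices of batches whose mean breaks the control limits."""
    reference = [mean(b) for b in batches[:n_reference]]
    center = mean(reference)
    spread = stdev(reference) or 1e-9   # guard against a zero spread
    return [
        i for i, b in enumerate(batches[n_reference:], start=n_reference)
        if abs(mean(b) - center) > k * spread
    ]

# Four in-control batches around level 5, then a shifted batch near 9.
batches = [[5.0, 5.1], [4.9, 5.0], [5.1, 5.0], [5.0, 5.1], [9.0, 9.2]]
print(shewhart_alarms(batches))
```

Only the final, shifted batch is flagged; the in-control batch at index 3 stays inside the limits, illustrating how the chart separates ordinary variation from a genuine change.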

Conclusion

This research paper has focused on machine learning and concept drifts. Machine learning has been discussed with regard to its various types, the existing theoretical work, and its applications in various processes. Machine learning is commonly used in artificial intelligence activities as well as in the development of various types of technology used in the real world, such as the diagnosis of diseases. The discussion has also covered concept drifts by defining the term and identifying the various methods that can be used in dealing with concept drifts.

References

Bishop, Christopher M. Pattern recognition and machine learning. New York: Springer Science, 2006. Print.

Dries, Anton and Ulrick Ruckert. Adaptive concept drift detection. n.d. Web.

Koronacki, Jacek. Advances in machine learning. Berlin, Germany: Springer Verlag, 2010. Print.

Lanquillon, Carsten and Ingrid Renz. Adaptive information filtering: detecting changes in text streams. Kansas, US: ACM Press, 2000. Print.

Mitchell, Tom. 2006. Web.

Perner, Petra. Machine learning and data mining in pattern recognition. Berlin, Germany: Springer Verlag, 2009. Print.

Stanley, Kenneth. Learning concept drift with a committee of decision trees. Austin, Texas: Department of Computer Sciences, 2010. Print.

Tsymbal, Alexey. 2004. Web.

Yeon, Kyupil, Moon Sup Song, Yongdai Kim, Hosik Choi and Cheolwoo Park. Model averaging via penalized regression for tracking concept drifts. 2010. Web.

Yue, Sun, Mao Guojun, Liu Xu, and Liu Chunnian. “Mining concept drifts from data streams based on multi-classifiers.” Advanced Information Networking and Applications, vol. 2, 2007, pp. 257–263.

Machine Learning and Bagging Predictors

The bagging method improves the accuracy of prediction by using an aggregate predictor constructed from repeated bootstrap samples. According to Breiman, the aggregate predictor is therefore a better predictor than a single-set predictor (123). To obtain the aggregate predictor, φA(x), the replicate data sets, {L(B)}, are drawn by bootstrap sampling from the learning set, L. The aggregate takes the average of the single predictors, ψ(x, L), to improve the accuracy of prediction, especially for unstable procedures such as neural nets, regression trees, and classification trees. However, bagging reduces the efficiency of stable procedures such as the k-nearest neighbor method.
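The bootstrap-and-average procedure above can be sketched in a few lines. As an assumption for illustration, the unstable base predictor ψ(x, L) here is a 1-nearest-neighbour regressor (highly variable, like the trees Breiman discusses), and the data set is invented.

```python
# Hedged sketch of bagging: draw bootstrap replicates L_B of the learning
# set L, fit the base predictor psi(x, L_B) on each, and average them.

import random

def one_nn(train):
    """psi(x, L): predict the y-value of the nearest training x."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def bagged(train, n_boot=50, seed=0):
    """phi_A(x): average of psi over n_boot bootstrap replicates of L."""
    rng = random.Random(seed)
    replicates = [[rng.choice(train) for _ in train] for _ in range(n_boot)]
    predictors = [one_nn(rep) for rep in replicates]
    return lambda x: sum(p(x) for p in predictors) / n_boot

L = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1)]
phi_A = bagged(L)
print(phi_A(2.5))
```

Because each bootstrap replicate omits some points and duplicates others, the individual 1-NN predictors vary considerably at a query like x = 2.5, and averaging them smooths out that variability, which is the mechanism behind bagging's error reduction.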

Bagging improves accuracy when used with classification trees on moderate data sets such as the heart and breast cancer data. In constructing the classification tree, the data set is randomly divided into a test set, T, and a learning set, L, from which the classification tree is built; a bootstrap sample, LB, is then drawn from the original set, L, and used for pruning. This procedure is repeated fifty times to give fifty tree classifiers, and the misclassification errors are averaged to improve accuracy. For the larger data sets of the Statlog project, which groups classifiers by their average rank, bagging increases the accuracy of prediction by greatly decreasing the misclassification errors. Bagging can also be used to improve the prediction accuracy of regression trees, where a similar procedure is used to construct the regression trees, followed by averaging the errors generated by each repetition.

Bagging is effective in reducing prediction errors when the single predictor, ψ(x, L), is highly variable. In numerical prediction, the mean squared error of the aggregated predictor, φA(x), is much lower than the mean squared error averaged over the learning set, L. This means that bagging is effective in reducing prediction errors; however, this holds only for unstable procedures. Another way to test the effectiveness of bagging in improving prediction accuracy is classification. Classification predictors like the Bayes predictor give near-optimal correct-order prediction, and aggregation improves prediction toward that optimal level. The learning set can also be used as a test set to determine the effectiveness of bagging, with the test set randomly sampled from the same distribution as the original set, L. The optimal point of early stopping in neural nets is determined using the test set.

Bagging has some limitations when dealing with stable procedures, as shown by linear regression with variable selection. The linear regression predictor is generated through forward entry of variables or through backward variable selection. In this case, small changes in the data cause significant changes in the selected predictor, so it is not a good subset predictor. Using simulated data, the most accurate predictor is found to be the one that predicts subset data most accurately. Bagging shows no substantial improvement when the subset predictor is near optimal. Linear regression is a stable procedure; however, its stability decreases as the number of predictor variables used is reduced, making the bagged predictors produce a larger prediction error than the un-bagged predictors. This indicates an obvious limitation of bagging: for a stable procedure, bagging is not as accurate as with an unstable procedure. As the residual sum of squares (m), which represents the prediction error, decreases, instability increases to a point where the un-bagged predictor tends to be more accurate than the bagged predictor.

Works Cited

Breiman, Leo. “Bagging Predictors.” Machine Learning 24 (1996): 123-140.

Developments in the Field of Machine Learning

From the period of our ancestors, man has been consistent in trying to improve the quality of his life (Burges 28). This direction led to the development of basic tool forms, which our ancestors used for performing basic tasks such as digging and cutting (Burges 28). At this early stage, man was still expending a great deal of physical energy while operating the early forms of machines he had developed. With the discovery of other forms of energy, such as electricity and fuel, the stage was set for the development of a new generation of machines that required very limited input of human energy (Burges 28). However, at this stage, there was still wide-scale monitoring of machine processes by man. To reduce the role of man in machine processes, a generation of automated machines was born. The main purpose of automation has been to reduce the role of man (his mental and physical participation) in machine processes. This limitation of human participation has been implemented through the incorporation of self-monitoring mechanisms in machine systems. In this direction, there has been a need to develop machines with human-like aspects for the purposes of self-learning and control. The field of “learning machines” has therefore grown considerably over the past two decades. Such machines are capable of utilizing circumstances that they have encountered in the past to improve their future efficiency (Burges 28). As will be seen in this paper, such a system has numerous benefits that cannot be overlooked by machine designers. An important direction that has emerged in the design of learning machines is multi-view learning, in which it is possible for a machine to view (understand) an input in a multi-dimensional manner (Burges 28).
With the development of system management frameworks that utilize relational databases, some scientists have suggested combining multi-view learning with relational databases for the purpose of increasing the capabilities of machine learning. In this literature review, a range of developments in the field of machine learning and multi-view learning in relational databases is considered.

Machine learning can be understood as the process by which machines improve their capacity to function more effectively and efficiently in the future (Nisson 4). Such machines are able to adjust their software programs and their general structure for the purposes of improving their future performance (Winder 74). These changes in the program and structure of machines are catalyzed by the environment in which the machines operate (Nisson 5). Machine learning is therefore an imitation of human intelligence, in which machines acquire some form of learning from their environment (Nisson 5). The environment of learning consists of machine inputs, or pieces of information that a machine can respond to (Winder 74). Among the forms of learning that a machine can undergo is the process of updating its database information depending on the kind of inputs that it gets from its environment (Kroegel 16). The form of learning just described has inspired less interest from professionals in the machine learning field (Nisson 6). Of more interest to these professionals are impressive learning processes, such as when a machine capable of recognizing someone’s voice performs better after processing repeated samples of that person’s speech. Therefore, we can think of machine learning as the process in which adjustments are implemented in the mechanism of machine actuators (implementers of given instructions) that perform duties (Kroegel 16). Such a mechanism of a learning machine is usually embedded with a form of intelligence (Nisson 7). Examples of duties normally performed by intelligent machines include the recognition of voices, the sensing of parameters in the physical environment, and predictive capacities, among many others (Nisson 8).

Many benefits can be accrued from the process of machine learning (Blum 92). An obvious benefit originating in machine learning is a possible capacity for humans to comprehend how learning occurs in man, which finds application among psychologists and educationalists, among others (Blum 92). When it comes to the field of machine design and manufacture, very important benefits can be accrued from machine learning (Blum 93). Any engineer who has specialized in machine design is aware of the challenge that he or she may face while trying to develop a concise relationship that maps inputs to predetermined outputs (Blum 94). Although we may know the outputs that we should get from a given sample of inputs, we may be unable to specify the function that will generate the outputs for our system (Blum 94). One of the best ways of solving the problem of specifying machine functions is to allow a versatile system of “machine learning” to operate (Kroegel 20). By adopting the machine learning approach, we are able to design a machine with an inherent system that can approximate a function over the inputs for the purposes of giving us forms of outputs that are useful to us (Blum 95). Moreover, we may not be able to understand, and therefore design for, the complex web of interrelationships that generates machine outputs (Blum 96). Adopting machine learning helps resolve the challenge of understanding complex interrelationships between inputs and outputs while still generating the expected outputs for us (Nisson 7). It is also true that the projected environment in which a machine will operate cannot be fully understood by a machine designer at the design stage. Indeed, the environment in which computer-embedded machines operate is bound to change considerably with time (Nisson 8).
Since it is not possible to design for each and every change that will occur, developing learning machines is obviously an excellent approach to undertake (Nisson 8).

In the process of developing and improving machine learning, engineers have adopted several approaches to obtaining sources of information on machine learning (Widrow 273). Among the important information sources applicable to machine learning is statistics (Widrow 273). Among the challenges encountered in the statistical approach is the difficulty of determining which data samples should be adopted, owing to non-uniformity in the probability distributions of the samples (ReRaed 630). This problem extends to make it impossible to determine an output governed by an unknown function, and therefore impossible to map some points to their new positions (ReRaed 630). Machine learning itself has been adopted as an approach to resolving the problems encountered while dealing with statistical sources (Nisson 7).

Another approach that has been adopted in machine learning is the use of what are commonly referred to as brain models (Nisson 8). Here, use is made of elements that have complex, non-linear relationships (Dzeroski 8). The non-linear elements employed in machine learning reside in networks that approximate the real ones inherent in the human brain: neural networks (Dzeroski 8). In the adaptive control approach, on the other hand, an attempt is made to implement a process that has no clearly known elements, necessitating the estimation of these unknown elements for the process to complete (Nisson 12). In adaptive control, an attempt is made to determine how a system will behave despite the presence of unknown elements in the system (Dzeroski 8). The presence of unknown elements in a system mostly stems from unpredictable parameters that keep changing their values (Bollinger and Duffie, 1988). Other approaches used in the study of machine learning include psychological models, evolutionary models, and what is commonly known as artificial intelligence (Nisson 12).

There are two kinds of environments in which the process of machine learning can occur (Dzeroski 6). The first environmental setting of machine learning is commonly referred to as supervised learning (Dzeroski 6). Here, there is at least some form of knowledge about the kind of outputs we expect from a given source of inputs (Dzeroski 6). Such knowledge is obtained from an understanding of a function that governs a sample of values in the set containing the data we wish to train on (Nisson 13). We therefore estimate that we can obtain a relationship governing the training sample, consequently making the outputs of a given function true to the training set (Nisson 14). A simplified example of supervised learning is a process such as curve-fitting (Nisson 14).
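The curve-fitting example of supervised learning mentioned above can be sketched in a few lines: the learner sees input-output pairs and recovers a function that is true to the training set. The sample data and the linear model are illustrative assumptions, not drawn from the cited sources.

```python
# Minimal sketch of supervised learning as curve-fitting: we observe (x, y)
# pairs and fit a line that makes the learned function true to the training set.

def fit_line(samples):
    """Least-squares fit of y = a*x + b to the observed samples."""
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Hypothetical training samples from an unknown process (here secretly y = 2x + 1).
samples = [(0, 1), (1, 3), (2, 5), (3, 7)]
a, b = fit_line(samples)
predict = lambda x: a * x + b
print(round(predict(4), 3))  # the learned function extrapolates to unseen input
```

The point of the sketch is that the generating function is never given to the learner; only the labelled samples are.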

Another kind of environmental setting in which machine learning can occur is unsupervised learning (Muggleton 52). Here, unlike in supervised learning, we only have a set of data samples that we wish to train on, but we do not have a function that will map the inputs in the available set to specific outputs in a way that we can determine (Muggleton 52). A challenge commonly encountered while handling this kind of training set is the difficulty of subdividing the set into smaller sets so that we can understand the outputs (Muggleton 52). Interestingly, this kind of challenge forms part of the machine learning process (Muggleton 52). The value obtained from a given function is therefore related to a specific subset that takes in certain inputs (an input vector) (Muggleton 52). Unsupervised learning has found considerable applicability in forming classification systems whereby classified data is understood in a more useful way (Muggleton 52). As is normal, there are many instances where supervised and unsupervised learning systems exist in parallel within machine learning systems (Muggleton 52). In designing a learning system, it is often appropriate to try to improve an existing function (Muggleton 52). This type of learning is normally referred to as speed-up learning (Nisson 14).
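The subdivision of an unlabelled set into smaller groups, as described above, can be sketched with a tiny one-dimensional k-means clustering routine; the data values and the choice of two clusters are hypothetical illustrations.

```python
# Sketch of unsupervised learning: no labels are given, so the learner must
# subdivide the data into groups by itself (here: 1-D k-means with k = 2).

def kmeans_1d(data, iters=10):
    c1, c2 = min(data), max(data)            # crude initial cluster centres
    for _ in range(iters):
        g1 = [x for x in data if abs(x - c1) <= abs(x - c2)]
        g2 = [x for x in data if abs(x - c1) > abs(x - c2)]
        c1 = sum(g1) / len(g1)               # move each centre to its group mean
        c2 = sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

# Hypothetical unlabelled samples drawn from two unknown groups.
data = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
print(kmeans_1d(data))
```

No target outputs appear anywhere: the grouping itself is the learned concept.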

It will be useful to consider a number of important parameters commonly used in machine learning (Nisson 14). Among these parameters are input vectors (input sets) (Nisson 14). An input vector may contain input elements of different natures (Nisson 14). The types of inputs that may be found in an input vector include real numbers, discrete values, and categorical values (Nisson 14). An example of a categorical type of input is information on the sex of a given person (Nisson 14). Such information can be represented as either male or female. A given individual can thus have a representative input vector of the following format: (Male, Tall, History) (Nisson 14).
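A mixed input vector of the kind described above can be represented numerically with a simple one-hot encoding; the attribute names and values below are hypothetical illustrations, not taken from the cited source.

```python
# Sketch: representing a mixed input vector (categorical + real-valued
# attributes) numerically via one-hot encoding of the categorical parts.

def encode(person, categories):
    vec = []
    for attr, value in person.items():
        if attr in categories:                   # categorical attribute
            vec.extend(1.0 if value == c else 0.0 for c in categories[attr])
        else:                                    # already numeric
            vec.append(float(value))
    return vec

# Hypothetical attribute schema and individual.
categories = {"sex": ["Male", "Female"], "subject": ["History", "Maths"]}
person = {"sex": "Male", "height_cm": 183, "subject": "History"}
print(encode(person, categories))
```

The resulting flat vector of numbers is what a typical learner would consume in place of the symbolic tuple (Male, Tall, History).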

Another important parameter in the study of machine learning is the output (Nisson 15). In some instances, an output can take the form of real numbers (Nisson 15). In other cases, the output of a learning machine may take the form of categorical values (Nisson 15). Here, the resultant output from a learning machine is used to classify its value into a given category (Nisson 15). Such an output is known as a categorizer; consequently, it may represent a label, a decision, a category, or a class (Nisson 15). An output in vector format may include both categorical values and numbers (Nisson 15).

Another parameter that one needs to understand in machine learning is the training regime (Nisson 15). Normally, learning machines contain a trainable set of data (Nisson 15). The batch method is one among several possible approaches to training on the data set (Nisson 15). Here, all the elements in the set are applied in implementing a given function at the same time (Nisson 15). The incremental approach, on the other hand, allows a given function to operate on each member of the set separately. As a result, all the elements contained in the trainable set are iterated through a given function in a one-at-a-time arrangement (Nisson 15). The incremental learning process can occur in a predetermined sequence or randomly (Nisson 16). In a common arrangement known as an online process, operations are performed on elements depending on their availability (Nisson 16). That is, operations are performed on the elements that have updated their availability (Nisson 16). Such a system of operation is especially applicable when a preceding process feeds an oncoming process (Nisson 16). As in any other machine process, a machine learning process can be influenced by noise. One type of noise impacts the function that operates on the trainable set (Nisson 17). Another type of noise impacts the elements contained in the input vector. For an efficient system of machine learning, it is important to evaluate the effectiveness of an implemented learning process (Nisson 17). A common approach to evaluating supervised learning is to use a special comparison set generated for the purpose of comparison (Nisson 17). Here, the outputs on the comparison set are compared with the outputs on the learning set in order to evaluate how effective the learning process has been (Nisson 17).
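The contrast between the batch and incremental regimes can be sketched with a trivial model y = w * x trained by gradient steps; the samples, learning rate, and iteration counts are illustrative assumptions.

```python
# Sketch of two training regimes for a trivial model y = w * x:
# the batch regime uses all samples per update, while the incremental
# regime updates the weight one sample at a time.

samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # hypothetical training set

def batch_step(w, samples, lr=0.02):
    # one update computed from the whole trainable set at once
    grad = sum((w * x - y) * x for x, y in samples) / len(samples)
    return w - lr * grad

def incremental_pass(w, samples, lr=0.02):
    # one update per element, iterated one at a time
    for x, y in samples:
        w -= lr * (w * x - y) * x
    return w

w_batch = w_inc = 0.0
for _ in range(200):
    w_batch = batch_step(w_batch, samples)
    w_inc = incremental_pass(w_inc, samples)
print(round(w_batch, 2), round(w_inc, 2))  # both approach the true weight 2.0
```

An online regime would look like the incremental pass, except that samples would arrive as they become available rather than from a fixed set.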
Moreover, it is important to appreciate that for any learning activity to occur, an element of some form of bias is necessary (Nisson 18). For example, in machine learning, we may decide to restrict our functions to a small set of values (Nisson 18). We may also decide to restrict our functions to quadratic functions for the purpose of achieving the results that we desire (Nisson 18).

Thomas Dietterich has described machine learning as the study of diverse approaches employed in computer programming for the purpose of learning (Thomas 7). The purpose of machine learning is thus to solve special tasks that cannot be solved by normal computer software (Thomas 7). There are several examples of such complex tasks (Thomas 7). For example, there is a need to detect impending machine breakdowns in factories through systems that scan sensor outputs (Thomas 7). A learning machine is able to learn how recorded sensor inputs have related to machine breakdowns, thereby creating an accurate system that can predict machine breakdowns before they occur (Thomas 7).

In another respect, we know that as much as human beings have some inherent skills, such as the ability to recognize unique voices, they cannot really articulate the steps they follow in employing those skills (Thomas 8). This reality has limited the capability of humans to apply their skills to some unique situations in a consistent manner (Thomas 8). By giving a learning machine some examples of sample inputs and corresponding outputs, the machine can take over and give us a set of consistent results in unique circumstances (Thomas 8). Moreover, some parameters in a machine's environment keep changing in an unpredictable manner, such that it is only wise to employ machine learning in such environments (Thomas 8). Still, it has been desirable to tailor computer applications to the specific needs of an individual for effective functioning, which again draws in a need for machine learning (Thomas 8). Areas with the characteristics described above, where machine learning has found an array of applications, include statistical analysis, data mining, and psychology, among others (Thomas 8). For example, when performing data mining, what we normally try to do with the help of learning machines is to collect the important sets of data that are useful to us (Thomas 8).

The process of learning can be grouped into two categories: empirical and analytical learning. The distinct difference between them is that while empirical learning relies on some form of input from the external environment, analytical learning is not reliant on the external environment (Thomas 9). At times it may not be easy to distinguish between the two (Thomas 9). Take something like file compression, for example (Thomas 9). As can be seen, such a process involves both empirical and analytical learning (Thomas 9). Normally, the process of compression involves the removal of data that is repetitive or irrelevant in a file (Thomas 9). Such information can be retrieved from a kind of dictionary when it is required again (Thomas 9). This can only occur by studying how the file's contents are organized; hence, a kind of empirical learning (Thomas 9). On the other hand, the process of compressing and decompressing files is inbuilt and does not require information from the external environment; hence, a type of analytical learning (Thomas 9).

Multi-View Learning with Relational Databases

Today, most systems used in data storage employ relational databases (Guo 5). Here, it is possible to store interrelated information through the use of foreign keys (Guo 5). A challenge that has been encountered is the difficulty of storing mining information in the format of relational databases (Guo 5). This challenge has mostly arisen from the nature of mining approaches that employ single-dimension data (Guo 5). Examples of such approaches include the use of neural networks (Guo 5). A difficult task presented by this kind of arrangement is the tedious effort of converting the multi-dimensional relations inherent in mining data into a one-dimensional format (Guo 5). To overcome this challenge, a number of applications such as Polka [14] have been developed to map mining data into a single dimension (Guo 6). One setback arising from these converting applications is the loss of relational information in the mining data (Guo 6). A considerable amount of information is therefore lost, even as data baggage is created (Guo 6).

An important approach emerging in the resolution of mining data problems is multi-view learning (Perlich 167). The approach has been useful in tackling a range of issues (Guo 6). Consider multi-view data such as [4, 11, 14, 21] (Guo 6). We may retrieve information from such data as follows: the retrieval of data information [4], the recognition of voice [11], and signature identification [21] (Guo 6). It is therefore possible to ingrain the idea of relational databases in multi-view learning. In multi-view learning, it is possible to obtain a specific desired view depending on a set of unique features present in a training set, say [14] (Perlich 168). It is therefore possible to learn diverse concepts from each of the views present in multi-view data (Perlich 168). Following this process, all the concepts that have been learned are combined to form the learning process (Perlich 168). To understand multi-view learning, consider a system that may be employed to group emails depending on their content and subject (Perlich 168). While one system will learn to classify emails depending on their subject, another will learn to classify emails depending on their content (Thomas 5). Finally, the learned concepts of the content learner and the subject learner are combined to perform the final classification of the emails (Thomas 5). Therefore, for a multi-view system with n views, it is possible to obtain n related relationships that can be employed in multi-relational learning (Perlich 168). By applying multi-relational learning to mining, we are able to obtain patterns from multi-dimensional relationships (Perlich 168). For each of the relationships, there is some specific information that is learned (Perlich 168).
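The email example above can be sketched with two toy view learners and a simple vote-based combination; the word lists, thresholds, and decision rules are illustrative assumptions, not the method of the cited authors.

```python
# Sketch of multi-view learning: one toy learner votes on the email's subject
# view, another on its content view, and their concepts are then combined.

SPAM_WORDS = {"winner", "prize", "free"}          # illustrative word list

def subject_view(email):
    # first view: classify by subject line only
    return "spam" if SPAM_WORDS & set(email["subject"].lower().split()) else "ham"

def content_view(email):
    # second view: classify by the fraction of spam words in the body
    words = email["content"].lower().split()
    ratio = len(SPAM_WORDS & set(words)) / max(len(words), 1)
    return "spam" if ratio > 0.1 else "ham"

def combined(email):
    # combine the concepts learned from each view into the final classification
    votes = [subject_view(email), content_view(email)]
    return "spam" if votes.count("spam") >= 1 else "ham"

email = {"subject": "You are a winner", "content": "claim your free prize now"}
print(combined(email))
```

With n views, the same pattern generalizes: n per-view classifiers followed by one combining step.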

Let us consider another kind of problem, where we need to identify whether a banking customer is good or not (Guo 10). From the bank's database, we can obtain relational data about the customer (Guo 10). We can, for example, obtain the name of the customer from the client relation, credit card details from the account relation, and so on, and thus determine whether the customer is good or not (Guo 10). An important thing to note here is that each of the database relations offers a diverse view on whether the customer is good or not, thereby contributing to the final concept of information that will be learned about the customer (Guo 10).

In multi-dimensional learning, a set of instructions in the form of multi-view classification (MVC) is employed (Guo 10). The purpose of multi-view classification is to use the framework of multi-view learning to carry out processes on the data of a multi-relational database in data mining (Guo 10). The multi-relational process can be understood as follows. In the relational database, there are identifiers for each of the characteristics found therein (Guo 10). These characteristics are linked to other dependent characteristics by the use of foreign keys (Guo 10). Once the above process has completed, the second stage involves attributing specific functions to each characteristic that has been linked to a specific identifier through a foreign-key linkage (Guo 10). Such a direction is helpful in handling each of the many interdependent characteristics present in the concerned data (Guo 10). The next stage involves using each of the foreign-assigned characteristics as an input to a unique multi-view learner (Guo 10). In the next process, normal data mining approaches are applied so that they obtain each of the intended concepts available from each of the present data views (Guo 10). This precedes the final stage, where the learners are used in the development of a useful model that contains the needed information (Guo 10). Therefore, an MVC method that works in a framework of multi-view learning can be used to incorporate normal mining data in a relational database (Guo 10).

Having described the above process in brief, let us consider the important concepts that have been employed in it (Pierce 1). It is useful to start by understanding relational databases (Pierce 1). In a relational database arrangement, there are sets of various tables, represented as [T1, T2, T3……] (Pierce 1). There is also a set that represents the interrelations between the tables (Pierce 1). As in a normal database management system (Pierce 1), each table has at least one unique key called the primary key (Guo 16). This primary key represents a unique attribute common to all elements in a given table (Pierce 3). Other attributes that can be found in a table, apart from those underlying the primary key, include descriptive attributes and foreign attributes (Pierce 3). Foreign attributes are used to link table elements to attributes present in other tables (Nisson 23). Tables in a relational database are therefore linked with the aid of foreign keys (Pierce 3).

Having understood relational databases, let us now move on to understand the process of relational classification (Pierce 4). An important approach employed in machine learning is the classification of activities for the purpose of effecting targeted learning (Pierce 4). For example, consider a situation in which we intend to obtain a unique relation (U) in a given database (Pierce 4). Let us also say our unique relation (U) has a unique variable (Y) (Pierce 4). Here, the purpose of implementing relational classification would be to obtain a function F that would give an output for each of the elements in the given table (Pierce 4). The relationship described above can be represented by the function below:

F (Ptarget.key, Akey(P1), …, Akey(Pk)) = Y ……………………………………(i)

where Akey(Pk) denotes the key elements of table Pk.

We can now go ahead and analyze the process of relational classification as it has been described above (Pierce 4). The figures below (in Figure 1) represent table interrelationships and can help us understand the process of relational classification (Guo 17). Looking at the target table, called the loan table, the attributes therein include account-id and status, among others. The important row in this table that will be targeted is loan-id (Guo 17). The intended concept for learning is the status. We can see that the target table has been linked to other foreign tables, including the order table (Guo 17). It is from the order table that we wish to create training views (Guo 18).

Figure 1: Table Relationships

Looking at how the arrow is directed between the target table and the order table below, we can see that the account-id element has been linked through a foreign key (Quinlan 19). Each of the elements linked to the target table through the account-id will therefore consist of the loan-id (a primary key, and therefore inherent in all fields of the table) and the status element (the intended concept of learning) (Quinlan 19). In addition to the above fields, the training view would also consist of all the other fields (account-id, to-bank, to-account, amount, and type) present in the order table, with the exception of the order-id field (Quinlan 19). In SQL, performing the operations mentioned above would consist of the following (Quinlan 19). One would be required to create a table object with the mentioned parameters from the loan table and the order table (Quinlan 19). The determining condition for the creation of the objects would be limited to the situation where the account-id from the order table is equivalent to the corresponding account-id from the loan table (Quinlan 19).

Figure 2: Multi-View Learning: Single Direction

Let us now describe the process of multi-view learning again for a better understanding. Each learner present in a multi-view learning environment is given a group of data for training purposes. To understand how this arrangement applies to multi-relational classification, we need to consider an intended concept for learning, which is contained in the target table (Quinlan 19). The first thing that occurs in a multi-view environment is the relay of the intended learning concept to all other relations linked to the target table through foreign keys (Quinlan 19). All the elements required by the implementing function from the target table will also be transferred to the other tables linked to the target table with the aid of foreign keys (Quinlan 19).

Figure 3: Multi-View Learning: Bi-directional

Let us now consider another kind of situation in the table above (Pierce 10). The target table remains the loan table (Pierce 10). However, the intended training data is obtained in a different way from the previous example (Pierce 10). The client table has no direct foreign key linking it with the target table (Pierce 10). What the client table has done is link with the disposition table through the client-id element (Pierce 10). The disposition table, in turn, links to the target table with the aid of the account-id element (Pierce 10). With the multi-view relationship in mind, this arrangement can be described as follows (Pierce 10). Basically, elements with client-id from the client table are connected to elements with client-id in the disposition table (Pierce 10). In the other direction, elements with account-id in the disposition table are linked with their counterparts with the same id in the target table (Pierce 10). The intended training data will therefore consist of two elements (birthday and gender) from the client table in the first place (Pierce 10). The other two elements (loan-id and status) are obtained from the target table (Pierce 10). In an SQL algorithm, the above process can be implemented in the following way. First, we create an object of four elements from the client, disposition, and loan tables (Pierce 10). We then set two preconditions that act as a threshold in the formation of the table object (Pierce 10). First, the account-id from the disposition table needs to be equivalent to its corresponding account-id in the target table (Pierce 10). Likewise, the client-id from the client table needs to correspond to its counterpart in the disposition table (Pierce 10).
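The indirect, two-precondition join described above can also be sketched with sqlite3; the schemas and rows are hypothetical, while the two join conditions follow the text.

```python
# Sketch of the bi-directional link: the client table reaches the loan (target)
# table only through the disposition table. Sample rows are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE loan        (loan_id INTEGER, account_id INTEGER, status TEXT);
CREATE TABLE disposition (disp_id INTEGER, client_id INTEGER, account_id INTEGER);
CREATE TABLE client      (client_id INTEGER, birthday TEXT, gender TEXT);
INSERT INTO loan        VALUES (1, 100, 'good');
INSERT INTO disposition VALUES (7, 50, 100);
INSERT INTO client      VALUES (50, '1970-01-01', 'F');
""")

# The two preconditions from the text: client.client_id = disposition.client_id
# and disposition.account_id = loan.account_id.
rows = con.execute("""
SELECT c.birthday, c.gender, l.loan_id, l.status
FROM client c
JOIN disposition d ON c.client_id = d.client_id
JOIN loan l        ON d.account_id = l.account_id
""").fetchall()
print(rows)  # the four-element training object: birthday, gender, loan-id, status
```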

In more difficult data applications, such as those encountered in data mining, there is a complex web of interrelationships between tables (Rijsbergen 42). Some of these relationships can be broken down to one table with many arrows linking to it, while others break down into many interconnecting links (Rijsbergen 42). What we have examined above is a simple case of many connections linking to one table with the aid of the primary key (Rijsbergen 42). It is possible to obtain a range of outputs from this kind of interconnection (Rijsbergen 42). A difficulty is therefore presented in identifying the correct output (Rijsbergen 42). An approach undertaken to resolve this challenge is to employ aggregation functions in an MVC setting (Rijsbergen 42). What an aggregation function does is unify all related outputs into a single output. An aggregation function therefore acts like a summary, presenting the properties of a range of outputs in a single output format (Nisson 22). In unifying a range of outputs into a single output, an aggregation function employs the primary key present in the target table (Rijsbergen 42). Every table that is formed afresh is acted upon by the aggregation function to summarize its related properties in a single output format (Rijsbergen 42). The resultant output is what is employed for multi-view training (Rijsbergen 42). All the resultant multi-view outputs are employed to train a corresponding number of multi-view learners (Rijsbergen 42). Examples of aggregation functions commonly used on data include COUNT, MAX, and MIN (Rijsbergen 42).
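The effect of aggregation functions on a one-to-many relationship can be sketched with sqlite3; the transaction table and its rows are hypothetical.

```python
# Sketch: aggregation functions (COUNT, MIN, MAX) collapse the many rows that
# share one key into a single summarized training row. Data are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE trans (trans_id INTEGER, account_id INTEGER, amount REAL);
INSERT INTO trans VALUES (1, 100, 10.0), (2, 100, 25.0),
                         (3, 100, 5.0),  (4, 200, 40.0);
""")
rows = con.execute("""
SELECT account_id, COUNT(*), MIN(amount), MAX(amount)
FROM trans GROUP BY account_id ORDER BY account_id
""").fetchall()
print(rows)  # one summarized row per key, ready for multi-view training
```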

Since it is the MVC algorithm that is used in linking multi-view learning with relational databases, it is important to evaluate the working of the MVC algorithm (Guo 32). The approach presented here is intended to allow the framework of multi-view learning to input data from relational databases (Guo 32). Three steps are normally involved in a typical MVC algorithm (Guo 32). First, a group of training data is obtained from a relational database (Guo 32). Here, just as we saw in relational classification, groups of data for multi-view training are obtained from a relational database source (Guo 32). Such a process is normally implemented with the aid of foreign-key connections (Guo 32). That is, elements in the target table are associated with elements in other tables through foreign keys (Guo 32). Moreover, aggregation functions are applied to unify the range of related outputs in many-to-one relationships (Guo 32). Secondly, multi-view learners are set in motion to ingrain the intended concept in each of the resultant data groups (Guo 32). Finally, the trained learners created in the second step are employed in creating an information model with useful knowledge (Guo 32).

The above is a summary of the three important steps in the implementation of the MVC algorithm (Srinivasan 300). Let us now evaluate these steps in more detail. In the first step, we intend to create a group of data that will be used for training purposes (Srinivasan 300). This group of training data is obtainable from relational databases (Srinivasan 300). A group of multi-view data for training is created for each element in the target table based on relationships with other tables (Srinivasan 300). Since it is paramount to provide sufficient information for each of the multi-view learners, this approach of relating all elements in the target table to information from other tables is important (Srinivasan 300). Once this process has completed, aggregation functions are employed to solve the problem of the one-to-many relations that may exist between other tables and the target table (Srinivasan 300). For example, in Figure 1, there are about seven associations with the target table from other tables (Nisson 17). In this kind of scenario, the MVC will develop eight groups of training data in a multi-view format (Srinivasan 300). Here, one of the training data groups will be created for the loan table, while the rest will come from the other tables (Srinivasan 300). Aggregation functions will therefore present a kind of summary of all the multi-view training data (Sav 1099). As aggregation functions act on the multi-view relationships, some elements from the tables are unified to create training data groups (Sav 1099). In the end, it is the elements that have direct and indirect associations, through foreign keys, with the target table that are selected (Sav 1099). Therefore, it often happens that once aggregation functions have acted on the relationship data, the number of training data groups decreases (Sav 1099).

The second step implemented in a typical MVC algorithm is the creation of learners to learn from the training groups formed in the previous stage (Sav 1099). The learning process is therefore started here, with an emphasis on the intended concept for learning (Vens 124). It is important to understand that each of the learners will form a unique theory from its group of training data (Vens 124). A range of perspectives from different learners is thus given, allowing for a system of checking and unifying these diverse perspectives in the final step (Vens 124). In the final step of the MVC algorithm, we have a final learner (a meta-learner) that gets inputs from the range of learners from the previous step (Nisson 30). However, before the perspectives of the learners are used by the meta-learner, they first undergo a validation process (King 337). Here, a system is used to check the accuracy of the perspective presented by each of the learners (King 337). If a learner is found to have an error rate surpassing the 50% mark, that learner is ignored (Nisson 28). The performance of the learners is thus evaluated to ensure that the information going on to the meta-learner is accurate (King 337). Once the perspectives of the learners have been evaluated, the validated perspectives are fed to the meta-learner (King 337). The work of the meta-learner is to unify all the perspectives from the learners for the purpose of creating a useful model of information (King 337). Each of the perspectives presented to the meta-learner by a learner consists of a unique judgment in predicting an output (King 337). Let us consider the eight tables in Figure 1. Our task is to find out the truth about the following situations concerning the condition (status) of a loan: whether a loan is good and unfinished, good and finished, bad and unfinished, or bad but finished (Guo 14).
In our target table, there are over six hundred possible records that contain an attribute indicating the condition (status) of a loan (Guo 32). As indicated in Figure 1, all of the tables have some form of association with the target table (Guo 32). This relationship is underlain by direct and indirect foreign-key relationships (Viktor 45). Therefore, all the other tables have stored some form of information about the target table (Viktor 45). We can therefore consider three learning activities that can be undertaken here. First of all, we need to find out whether a loan is bad or good based on about 234 possible finished conditions (Guo 32). Secondly, we also need to find out whether a loan is either bad or good based on about 682 possible records, irrespective of the finished status (Guo 32). Finally, we employ the transaction table to remove some positive relationships in the target table in order to balance and enhance the learning process (Guo 32).
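The validation and meta-learning steps described above can be sketched as follows; the toy view learners, the validation set, and the majority-vote combination are simplified stand-ins for the actual MVC meta-learner, with only the 50% error threshold taken from the text.

```python
# Sketch of the final MVC step: each view learner is validated, learners whose
# error surpasses 50% are ignored, and a meta-learner combines the remaining
# perspectives (here: by simple majority vote).

def validate(learner, validation):
    """Fraction of validation examples the learner gets wrong."""
    errors = sum(1 for x, y in validation if learner(x) != y)
    return errors / len(validation)

def meta_predict(learners, validation, x):
    # keep only learners whose error does not surpass the 50% mark
    kept = [l for l in learners if validate(l, validation) <= 0.5]
    votes = [l(x) for l in kept]
    return max(set(votes), key=votes.count)      # unify perspectives by vote

# Toy view learners predicting a loan's status from its amount.
good_if_small = lambda amount: "good" if amount < 100 else "bad"
always_good   = lambda amount: "good"
always_bad    = lambda amount: "bad"

validation = [(50, "good"), (60, "good"), (150, "bad"), (200, "bad")]
print(meta_predict([good_if_small, always_good, always_bad], validation, 40))
```

In the real algorithm, each learner is trained on one multi-view data group; the sketch only shows how validated perspectives are filtered and unified.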

One of the bigger challenges that have emerged in the study of machine learning is the difficulty of labeling data for training purposes (Muslea 2). Such a process is time-consuming and tiring, and may also result in inaccuracies (Muslea 2). Muslea has argued that it is desirable and possible to reduce and/or eliminate the task of data labeling in machine learning applications (Muslea 2). In multi-view learning, it is possible for the different views of a learning machine (of the multi-view learning type) to perceive a targeted concept in isolation (Muslea 2). For example, by applying either an infra-red sensor or a sonar sensor, it is possible for a robot to navigate around an approaching obstacle (Muslea 2).

An important approach that is desirable in avoiding the use of data labeling for learning machines is Co-testing (Muslea 4). In Co-testing, focus is placed on the usefulness of learning from mistakes (Muslea 4). In a situation where the views present a range of conflicting outputs, the false view will automatically introduce mistakes into the system (Muslea 4). The system therefore learns to adopt the correct label for targeted concepts by referring to the database of mistakes it has made (Witten 13). Through Co-testing, machine learning has moved into the lane of active learning.

The active learning process in Co-testing proceeds as follows. Initially, the system holds a small set of labeled instances from which it can infer a label for a targeted concept in each of its views (Witten 13). When the views output conflicting predictions, the user supplies the correct label for that concept (Muslea 17). The co-testing system then adds the new label to its database for use in identifying and labeling future instances (Muslea 17). What transpires is that whenever the views of a learning machine predict conflicting labels for a targeted concept (Yin 108), at least one of the learners must have made a mistake in interpreting the concept (Muslea 17). The task of identifying the label is therefore passed to a user (Muslea 17). Once the user identifies and authenticates the targeted concept, the view that erred is provided with correctional information (Muslea 17). It is important to note, however, that undesirable parameters in a learning setting, such as noise, can influence the learning process (Yin 108).
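The query step above — hand the user exactly those instances on which the views disagree — can be sketched minimally. The two "views" below are stand-in threshold rules over different features; in a real co-testing system each view would be a classifier trained on its own feature set:

```python
# Minimal sketch of the co-testing query step: collect the "contention points",
# i.e. the unlabeled instances on which the two views predict conflicting labels.
# The views here are illustrative threshold rules, not trained classifiers.

def query_contention_points(view_a, view_b, unlabeled):
    """Return the instances on which the two views disagree; these are the
    points handed to the user for labeling."""
    return [x for x in unlabeled if view_a(x) != view_b(x)]

view_a = lambda x: x[0] > 0      # hypothetical view over feature 0
view_b = lambda x: x[1] > 0      # hypothetical view over feature 1

unlabeled = [(1, 1), (1, -1), (-1, -1), (-1, 1)]
contention = query_contention_points(view_a, view_b, unlabeled)
print(contention)  # the views disagree on (1, -1) and (-1, 1)
```

After the user labels a contention point, the erring view is retrained on it, which is where the "learning from mistakes" happens.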

It is possible to extend Co-testing so that the system can update its database without referring to the user (Wolpert 244) and thereby learn to identify new labels automatically (Muslea 17). A system that combines automatic learning with human-intervened learning in this way is known as a Co-EMT system (Muslea 17). For a simplified understanding, the Co-EMT system can be viewed as a composition of the Co-testing system and the Co-EM system (Muslea 17). First, the system identifies unknown labels in accordance with how the other views in the system understand the concept (Wolpert 244). Thereafter, it updates the learning database of each view by training on the other views' understanding of the concept in question (Muslea 17). An obvious advantage for the Co-EM component in a Co-EMT arrangement is that it can now focus on informative data encountered by the system, unlike the previous arrangement in which it selected labels in an unpredictable manner (Muslea 17).
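The Co-EM half of this arrangement — one view labels the unlabeled pool according to its current hypothesis, and the other view retrains on those labels — can be sketched roughly. This is an illustrative toy, not Muslea's exact algorithm: the views here are nearest-mean classifiers over two different scalar features:

```python
# Rough sketch of one Co-EM step: view 1 labels the pool, and view 2
# re-estimates its per-class means from those labels. Illustrative only.

def nearest_mean_label(x, means):
    """Label x with the class whose mean is closest."""
    return min(means, key=lambda c: abs(x - means[c]))

def co_em_step(pool_view1, pool_view2, means1, means2):
    labels = [nearest_mean_label(x, means1) for x in pool_view1]
    new_means2 = {}
    for c in means2:
        members = [x for x, l in zip(pool_view2, labels) if l == c]
        new_means2[c] = sum(members) / len(members) if members else means2[c]
    return labels, new_means2

means1 = {'pos': 1.0, 'neg': -1.0}
means2 = {'pos': 0.0, 'neg': 0.0}   # view 2 starts uninformed
pool_v1 = [0.9, 1.1, -0.8, -1.2]    # feature seen by view 1
pool_v2 = [2.0, 2.2, -2.1, -1.9]    # same instances, feature seen by view 2
labels, means2 = co_em_step(pool_v1, pool_v2, means1, means2)
print(labels, means2)
```

Iterating this step in both directions, and adding the co-testing user queries on contention points, gives the flavor of the combined Co-EMT arrangement.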

As in any other multi-view learning arrangement, one cannot escape the stage of data validation as a way of checking authenticity, minimizing errors, and thereby forming a model that provides useful information for the system and its users (Watkins 54). Because a Co-testing system contains many unlabeled targeted concepts, the validation stage is considerably more difficult if one employs the usual approach to data evaluation (Watkins 54). Noting that most validation procedures employ both positive and inverse learning for a given view, and that this is not required at all times, Muslea has suggested a new system for authenticating data (Muslea 17). Muslea considered instances in which positive evaluations produced low accuracies while inverse evaluations produced high accuracies on the same piece of multi-view data (Muslea 17).

In this direction, Muslea has suggested a new approach to the authentication of data: adaptive view validation (Muslea 17). This system is essentially a meta-learner that draws on previous circumstances and experiences to perform a data-authentication task (Muslea 17). The meta-learner's accumulated experience serves one important purpose: deciding whether it is useful to incorporate the available views into the learning process (Muslea 17). Thus, some views from learners are judged unnecessary or inadequate for learning (Muslea 17).

Conclusion

Due to its obvious benefits, machine learning has been progressively incorporated into machine and manufacturing systems. Indeed, forms of machine learning existed long before machine-learning experts appeared. The current trend is not just to design machines with human-like intelligence; the scope is wider: to design machines that can learn in ways that humans cannot (Guo 22). As knowledge of machine learning has accumulated, an array of approaches to the design of learning machines has emerged, including the statistical approach, brain models, neural networks, and adaptive control mechanisms, among others. Multi-view learning, whereby machines are given the capacity to learn from an array of different possibilities, has been especially helpful in producing accurate outputs. Here, an array of outputs from a single input is possible because of the different views incorporated in the learning machine. Since a single output can be produced from all of the views by approaches such as aggregation functions, the outputs obtained from learning machines are likely to be accurate (Guo 22). Also, since the design of learning machines has been driven partly by the need to compensate for human limitations, learning systems are useful for analyzing complex data types such as those found in data mining (Guo 22). In this direction, an approach laden with promise is the marriage of multi-view learning and relational databases (Guo 5), which has presented numerous benefits to the process of machine learning.

For one, many people are already familiar with relational databases, so adopting them in multi-view learning simplifies the approach to machine learning. Moreover, the use of the MVC algorithm and aggregation functions has presented an opportunity to obtain useful models from complex data types such as mining data (Guo 22). It is interesting to note the consistent stream of new developments in machine learning. Among the significant directions is the use of the Co-testing approach in understanding labels (Muslea 2). It is also worth observing that non-useful views may be eliminated during data analysis for the development of models (Muslea 10). It can therefore be expected that more effective and fruitful tools will continue to emerge in machine learning for the creation of more efficient learning machines.

References

Blum, Mitchell Combining Labeled and Unlabeled Data with Co-training. New York: McMillan, 1997. Print.

Burges, Cole “A Tutorial on Support Vector Machines for Pattern Recognition.” Data Mining and Knowledge Discovery 2.1, 1994: 121-168. Print

Muggleton, Raedt. “Inductive Logic Programming: Theory and Methods.” The Journal of Logic Programming 20.5, 1994: 629-680. Print.

Dzeroski, Simon. “Multi-relational Data Mining: An Introduction.” ACM SIGKDD Explorations 5.1, 2003: 1-16. Print.

Guo, Hongyu. Mining Relational Databases with Multi-view Learning. Ottawa: University of Ottawa, 2003. Print

King, Camacho. “Proceedings of the Fourteenth International Conference on Inductive Logic Programming.” Springer-Verlag 13.4, 2004: 323-340. Print

Kroegel, Mark “Multi-Relational Learning, Text Mining, and Semi-Supervised Learning for Functional Genomics.” Machine Learning 57.8, 2004: 61-68. Print.

Muggleton, Feng. Efficient induction of Logic Program Tokyo: Ohmsma, 1993. Print.

Muslea, Alexandru. “Active Learning with Multiple Views” New York: University of Southern California Press, 1993. Print

Nisson, Nils. Introduction to Machine Learning London: McMillan, 1995. Print

Perlich, Provost. Aggregation-based feature invention and relational concept classes Washington, D.C. 2003: Mayfield, 2000. Print.

Pierce, Cardie. Limitations of co-training for natural language learning from large Databases McMillan: New York, 2005. Print

Quinlan, Cameron “A midterm Report.” Vienna, Austria: European Learning Press, 1993.

Rijsbergen, John. Information Retrieval. London: Butterworths, 1979. Print

Sav, Ballard “Category learning from multimodality.” Neural Computation, 10.5 1998: 1097-1117. Print

Srinivasan, Bristol. “An assessment of ILP-assisted models for toxicology and the PTE-3 experiment.” London: European Leaning Press

Thomas, Dietterich Machine Learning. Oregon: Oregon University Press, 2003.

Vens, Assche. First order Random Forests with Complex Aggregates. New York: McMillan, 1997. Print

Viktor, Herbert. “The CILT Multi-Agent Learning System” South African Computer Journal (SACJ) 24.3, 1999: 43-48.

Watkins, Simons. “Learning from Delayed Rewards” PhD Thesis, University of Cambridge: England, 1989.

Winder, Ronald. “Threshold Logic.” PhD Dissertation. Princeton University: Princeton press, 1962. Print

Widrow, Stearns. Adaptive Signal Processing Englewood: Prentice-Hall, 1999. Print

Witten, Frank. Data mining – practical machine learning tools and Techniques with Java implementations. London: McMillan, 2002. Print

Wolpert, David. “Stacked Generalization.” Neural Networks 5.2, 1992: 241-259. Print.

Yin, Han. Cross Mine: Efficient Classification across Multiple Database Relations. London: European Learning Press, 2003. Print.

Data Mining and Machine Learning Algorithms

Introduction

Machine learning algorithms are very important in handling real-valued attributes. Other benefits derived from such algorithms include the handling of missing values and of symbolic attributes. One algorithm with these properties is K*, an instance-based learner that applies entropy as its distance measure; it also compares favorably with other machine-learning algorithms. Classification of objects has been undertaken over the years by researchers of every kind throughout the world. The task is demanding because some data are noisy or carry irrelevant attributes, which makes them difficult to learn from. Several approaches and schemes have been tried to this end, including decision trees, rules, and case-based classifiers, among others. Real-valued features have presented an enormous challenge to instance-based algorithms, mainly because of an inadequate theoretical background. K* uses a distance measure, and its performance can be examined on different problems.

There are two broad kinds of data mining methods: supervised and unsupervised. The former identify a target variable while the latter do not; in essence, unsupervised algorithms identify structures and patterns in variables. The main unsupervised methods used in data mining include clustering and association rules, among others. However, most data mining methods are supervised, meaning that they have a target variable; these include decision trees, K-nearest neighbor, and neural networks, among others. This paper will explore two such algorithms: K* and K-nearest neighbor (Cleary 2-14).

K*

K* is an instance-based learner that uses a distance-based measure to classify variables; its performance can be examined on a variety of problems. Such learners classify instances by comparing them to a database of pre-classified examples. The process assumes that similar instances have similar classifications, although defining "similar instance" and "similar classification" is itself a challenge. Instance-based learners include K*, K-nearest neighbor, and IBL, among others. Entropy as a distance measure computes the distance between instances using information theory, following the intuition that the distance reflects the complexity of transforming one instance into the other. This can be done in two steps, one of which involves defining a finite set of transformations that map one instance onto another. Such a sequence of transformations is known as a program and is made prefix-free by appending a termination symbol to each string. The length of the shortest string of transformations between two instances then defines the distance measure. However, the resulting distance does not address smoothness, since it is very sensitive to small changes (Cleary 2-14).

K*, on the other hand, tries to reduce this high sensitivity to change (and hence poor smoothness) by summing over all transformation paths that exist between two instances. Because it is not obvious which transformations should be summed, each program is assigned a probability: a program of length c receives probability 2^-c. This follows the Kolmogorov-complexity tradition, and the summation satisfies the Kraft inequality. The sum can be interpreted as the probability of generating a program through random selection of transformations, or equivalently as the probability of arriving at one instance by a random walk from the other. Units of complexity are then obtained by taking logarithms. This method has been found to yield a realistic and robust measure, with links to DNA-sequence comparison (Cleary 2-14).

K* Algorithm

In order to use K*, which applies this distance measure, one needs a way of selecting the parameters x0 and s, and a way of using the resulting distance measure to make predictions. These parameters govern real-valued and symbolic attributes respectively. As the parameters change, the distance measure changes as well, and interesting behavior emerges. For instance, as s tends to 1, instances that differ from the current symbol receive very low transformation probability, while instances with the same symbol receive high transformation probability; in this case the distance function exhibits nearest-neighbor behavior. Where s tends towards 0, the transformation probability simply follows the symbol's probability distribution. Varying s between these extremes produces a smooth change in behavior (Cleary 2-14).
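The two limiting behaviors of s can be made concrete with a hedged sketch. The interpolation below is illustrative (the exact parameterization in Cleary's K* differs in detail), but it reproduces both extremes: s near 1 gives nearest-neighbor behavior, s near 0 gives the symbol's background distribution:

```python
# Hedged sketch of a symbolic-attribute transformation probability that
# interpolates between "same symbol only" (s -> 1) and the symbol's
# background distribution (s -> 0). Illustrative, not Cleary's exact formula.
import math

def p_star(b, a, s, symbol_probs):
    """P*(b|a): probability of transforming symbol a into symbol b."""
    return s * (1.0 if a == b else 0.0) + (1 - s) * symbol_probs[b]

def k_star_distance(b, a, s, symbol_probs):
    """K* distance is the negative log of the transformation probability."""
    return -math.log2(p_star(b, a, s, symbol_probs))

probs = {'x': 0.5, 'y': 0.5}
# s near 1: same symbol gets high probability, different symbol near zero
print(p_star('x', 'x', 0.99, probs), p_star('y', 'x', 0.99, probs))
# s near 0: probabilities follow the background distribution
print(p_star('x', 'x', 0.01, probs), p_star('y', 'x', 0.01, probs))
```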

Distance measures over real-valued attributes show similar behavior. For example, when x0 is small, the probability of an instance drops sharply with increasing distance, so the measure acts like nearest neighbor. When x0 is very large, virtually all instances have similar transformation probabilities and are weighted equally. In both cases the effective number of instances varies between two extremes: 1, where the distribution is nearest-neighbor, and N, where all instances are equally weighted. The effective number of instances can be bounded as follows (Cleary 2-14):

n0 ≤ (∑b P*(b|a))² / ∑b P*(b|a)² ≤ N

Where:

  • N = total number of training instances
  • n0 = number of training instances at the smallest distance from instance a
  • P*(b|a) = probability of transforming instance a into instance b

The K* algorithm chooses one value for x0 (or s) by selecting a target effective number of instances between n0 and N, set by a blending parameter, and then inverting the expression shown above (Cleary 2-14).
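This selection can be sketched numerically. The sketch below is an illustration under stated assumptions (exponentially decaying probabilities and a simple geometric bisection), not K*'s exact implementation: the effective number of instances n_eff = (∑p)²/∑p² ranges from n0 up to N, and we search for the x0 whose n_eff hits the blend target between them:

```python
# Illustrative sketch: pick x0 so that the effective number of instances
# equals n0 + blend * (N - n0), by inverting n_eff = (sum p)^2 / sum p^2.
import math

def effective_n(distances, x0):
    probs = [math.exp(-d / x0) for d in distances]
    return sum(probs) ** 2 / sum(p * p for p in probs)

def choose_x0(distances, blend, lo=1e-3, hi=1e3, iters=60):
    """Bisection (geometric) for the x0 matching the blend target."""
    n0 = sum(1 for d in distances if d == min(distances))
    target = n0 + blend * (len(distances) - n0)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if effective_n(distances, mid) < target:
            lo = mid      # larger x0 -> more instances effectively weighted in
        else:
            hi = mid
    return math.sqrt(lo * hi)

dists = [0.1, 0.5, 1.0, 2.0, 5.0]
x0 = choose_x0(dists, blend=0.5)
print(round(effective_n(dists, x0), 2))  # ≈ 3.0 = n0 + 0.5 * (N - n0)
```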

K-Nearest Neighbor Algorithm

This type of algorithm is usually used for classification, and in some cases for prediction and estimation as well. It is a proper example of instance-based learning, which stores the training data and uses it to classify new, unclassified records by comparing them with similar records in the training set. Several issues must be considered with this classifier: how many neighbors to consider (that is, the determination of k, the number of nearest neighbors); how to measure the distance to the nearest neighbors; how to combine the information from all of the observations; and whether points should be weighted equally or not (Larose 90-105).

Weighted Voting

In most cases, it is assumed that neighbors closest to the new record should count for more than those far away and should thus be weighted more heavily. Analysts therefore tend to apply weighted voting, which also has the propensity to reduce ties. Several algorithms may be employed in classifying objects; in K-nearest-neighbor classification, one looks at the nearest similar variables in order to classify, predict, or estimate performance. This can be applied in situations such as administering drugs to patients: by using records with known classifications, one can classify an unknown object, or estimate or predict its behavior (Larose 90-105).
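A small sketch of k-nearest-neighbor classification with the inverse-distance weighted voting described above (weights 1/d², with a guard for exact matches). The data and names are illustrative:

```python
# Illustrative k-NN classification with inverse-distance-squared weighted
# voting: closer neighbors contribute more to the vote.
import math
from collections import defaultdict

def knn_weighted_vote(train, query, k):
    """train: list of (point, label) pairs; returns the predicted label."""
    nearest = sorted(train, key=lambda t: math.dist(t[0], query))[:k]
    votes = defaultdict(float)
    for point, label in nearest:
        d = math.dist(point, query)
        votes[label] += 1.0 / (d * d) if d > 0 else float('inf')
    return max(votes, key=votes.get)

train = [((0, 0), 'A'), ((0, 1), 'A'), ((5, 5), 'B'), ((5, 6), 'B')]
print(knn_weighted_vote(train, (1, 1), k=3))  # 'A': the two A's are far closer
```

Even though one 'B' point makes it into the three nearest neighbors here, its tiny weight cannot outvote the two nearby 'A' points, which is precisely the tie-reducing effect of weighting.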

Use of K-nearest neighbor algorithm for prediction and estimation

The K-nearest neighbor algorithm may also be used for prediction and estimation, including for continuous-valued target variables. This can be achieved through the locally weighted averaging method, among others. In the same manner as classification, prediction and estimation are performed by comparing the record with its nearest similar neighbors. For instance, with hospital prescriptions, once we have classified variables we can predict or estimate an unclassified record using those that are classified, such as estimating systolic blood pressure. In this case, the locally weighted method estimates blood pressure from the k nearest neighbors using inverse-distance weights (Larose 90-105).

For instance, the estimated target is ŷ = ∑i wi·yi / ∑i wi, where wi = 1/d(new, xi)² for the records x1, x2, …, xk. This yields the estimated systolic blood pressure (Larose 90-105).
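The locally weighted averaging estimate can be sketched directly from that formula; the blood-pressure numbers below are made up for illustration:

```python
# Sketch of locally weighted averaging: y_hat = sum(w_i * y_i) / sum(w_i),
# with w_i = 1 / d(new, x_i)^2 over the k nearest records.
import math

def locally_weighted_estimate(neighbors, new_x):
    """neighbors: list of (x, y) pairs for the k nearest records."""
    weights = [1.0 / math.dist(x, new_x) ** 2 for x, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

# Three nearest patients: (age, BMI) -> systolic blood pressure (illustrative)
neighbors = [((50, 25), 120.0), ((52, 27), 130.0), ((48, 24), 125.0)]
print(round(locally_weighted_estimate(neighbors, (50, 26)), 1))
```

The nearest record dominates the estimate, which is exactly the behavior the inverse-squared weights are meant to produce.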

Choosing k

Careful consideration should be given to choosing k when classifying variables. Choosing a small k may make the classifier sensitive to problems such as noise, while a k that is not very small may smooth out idiosyncratic behaviors that could be learned from the training set. Moreover, taking a larger k also risks overlooking locally interesting behavior (Larose 90-105).
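One common way to let the data itself decide on k is cross-validation over candidate values. A minimal sketch with leave-one-out accuracy and plain majority voting (data and candidate set are illustrative):

```python
# Illustrative leave-one-out cross-validation for choosing k:
# evaluate each candidate k by classifying every training point with
# the rest of the training set, and keep the most accurate k.
import math
from collections import Counter

def knn_predict(train, query, k):
    nearest = sorted(train, key=lambda t: math.dist(t[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def choose_k(train, candidates):
    def loo_accuracy(k):
        hits = sum(knn_predict(train[:i] + train[i + 1:], x, k) == y
                   for i, (x, y) in enumerate(train))
        return hits / len(train)
    return max(candidates, key=loo_accuracy)

train = [((0, 0), 'A'), ((0, 1), 'A'), ((1, 0), 'A'),
         ((5, 5), 'B'), ((5, 6), 'B'), ((6, 5), 'B')]
best_k = choose_k(train, candidates=[1, 3, 5])
print(best_k)
```

On this toy set, k = 5 starts pulling in points from the wrong cluster, so the small values of k win the leave-one-out comparison.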

Conclusion

There are two kinds of data mining methods: supervised and unsupervised. The former identify a target variable while the latter do not. The paper explored the K* and K-nearest neighbor algorithms and their usage, and found that K* works well on real datasets. Its fundamental method sums the probability of all possible transformation paths from one instance to another, which solves the smoothness problem and contributes greatly to robust and realistic performance. The method also integrates real-valued attributes and symbolic attributes and deals sensibly with missing values. K* can therefore be used to predict real-valued attributes. Where K* underperforms against a simple learning algorithm, performance can be improved by raising the blend for the unimportant attributes and lowering the blend for the important ones (Cleary 2-14).

The paper has also explored the use of the K-nearest neighbor algorithm in classification, prediction, and estimation, enabled by methods such as locally weighted averaging, among others. The paper also went into detail on how to choose k and how it affects classification, prediction, or estimation results: a small k may invite problems such as noise, while a larger k may smooth out idiosyncratic behaviors that could be learned from the training set and may overlook locally interesting behavior. It is therefore quite important to weigh these implications when choosing k. One resolution is to let the data decide on its own, for example by employing a cross-validation procedure. Both methods are therefore useful in the classification of objects (Cleary 2-14).

Works Cited

Cleary, John. “K*: An Instance-based Learner Using an Entropic Distance Measure.” Dept. of Computer Science, University of Waikato, New Zealand.

Larose, Daniel. “k-Nearest Neighbor Algorithm.” Discovering Knowledge in Data: An Introduction to Data Mining. John Wiley & Sons, 2005.

Machine Learning: Bias and Variance

There are critical points in machine learning that may compromise a prediction model. The relation between bias, variance, and learning models requires careful examination of the data sets used for training (Provost and Fawcett, 2013). This paper aims to assess the impact of bias and variance on prediction models and to discuss three ways in which the behavior of such frameworks is adjusted to accommodate their influence.

Model prediction can provide highly valuable insights into many real-life situations. However, the hidden patterns revealed by machine analysis require extrapolating to data that does not explicitly copy the examples on which the models were trained (Rocks and Mehta, 2022). There is therefore a direct relation between bias, variance, and the efficiency of prediction models. High bias leads to models that are fast to generate yet underfitting, meaning the data is not represented correctly (Brand, Koch, and Xu, 2020; Botvinick et al., 2019). High variance can be similarly detrimental, as a model trained on a highly specific data cluster will learn patterns too particular to that example set to be useful outside it (Brand, Koch, and Xu, 2020; Knox, 2018). Optimization of a prediction model can be achieved by using overparameterized sets that are later 'trimmed' for less global methods (Belkin et al., 2019). It is paramount to decide on the desired level of generalizability of a learning model before setting the maximum acceptable bias and variance.

The trade-off in such cases requires one to sacrifice either applicability or accuracy in order to find a suitable level of complexity for a model. The optimal performance of a learning model can only be achieved by minimizing the total error (Singh, 2018). The three states of a prediction model are too complex, too simple, or a good fit (Kadi, 2021). The goals of a model must define its complexity, as leaving decisions to an improperly trained model may severely impact a firm's performance (Delua, 2021). Traditional machine learning methods require finding a sufficient level of generalization at the cost of functional losses (McAfee and Brynjolfsson, 2012; Yang et al., 2020). In real life, any implementation of a statistical predictor carries a margin of error that must be acceptable for the given situation. For example, IBM's AI-powered cancer treatment advisor Watson gave incorrect suggestions due to high bias (Mumtaz, 2020). The potential for harm makes the detrimental impact of such a learning model apparent.

In conclusion, an efficient prediction model requires its creators to find a balance between bias and variance if it is to remain applicable in practice. Oversimplification or overfitting can lead to prediction errors to the point of rendering an algorithm unusable in real life. A trade-off in accuracy is required for a learning model to remain applicable, yet such a decision must be grounded in its practical implications.

Reference List

Belkin, M. et al. (2019) ‘Reconciling modern machine-learning practice and the classical bias-variance trade-off’, Proceedings of the National Academy of Sciences, 116(32), pp. 15849–15854.

Botvinick, M. et al. (2019) ‘Reinforcement learning, fast and slow’, Trends in Cognitive Sciences, 23(5), pp. 408–422.

Brand, J., Koch, B. and Xu, J. (2020) Machine learning. London, UK: SAGE Publications Ltd.

Delua, J. (2021) ‘Supervised vs. unsupervised learning: What’s the difference?’, IBM. Web.

Kadi, J. (2021) The Relationship Between Bias, Variance, Overfitting & Generalisation in Machine Learning Models, Towards Data Science. Web.

Knox, S.W. (2018) Machine learning: A concise introduction. Hoboken, NJ: John Wiley & Sons, Inc.

McAfee, A. and Brynjolfsson, E. (2012) ‘Big data: The management revolution’, Harvard Business Review. Web.

Mumtaz, A. (2020) Web.

Provost, F. and Fawcett, T. (2013) Data science for business: What you need to know about data mining and data-analytic thinking. Sebastopol, CA: O’Reilly.

Rocks, J.W. and Mehta, P. (2022) ‘Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models’, Physical Review Research, 4(1).

Singh, S. (2018) Web.

Yang, Z. et al. (2020) Proceedings of Machine Learning Research. Web.

Machine Learning and Regularization Techniques

Machine learning belongs among the advanced data processing techniques. The data set plays a crucial role in machine learning, providing the material from which to generalize and model specific patterns (Deluna, 2021; McAfee and Brynjolfsson, 2012). However, it is essential to distinguish a model that is “generalizing” from one that is simply “memorizing” (Kotsilieris, Anagnostopoulos, and Livieris, 2022, 1). Consequently, several techniques were developed to adjust the learning process, including regularization (Brand, Koch, and Xu, 2020, 1; Alonso, Belanche, and Avresky, 2011, 163). Regularization is multifaceted: it takes different forms with unique features.

The concise definition of regularization coincides with its primary purpose: simplification. Overfitting means over-optimizing the model's fit to the provided data; in this context, regularization focuses not only on optimizing the fit but also on simplifying it (Provost and Fawcett, 2013, 136; Belkin et al., 2019, 1). The regularization techniques that are of interest to me are L2-norm regularization, dropout, and adversarial regularization.

L2-norm regularization is widely used in machine learning and statistics, typically for regularizing linear models (Nusrat and Jang 2018, 8; Zhu et al. 2018, 6-7). In Bayesian terms, it imposes a diagonal Gaussian prior with zero mean on the weights (Chen et al., 2019, 4). The technique has been extended by using the L2 distance from the trained model's weights to penalize the weights during fine-tuning (Barone et al., 2017). This technique interests me because of its fine-tuning applications, such as improving translation (for example, in systems like Google Translate). Another reason is that L2 is non-sparse, which makes it more flexible than L1. Lastly, it can be used outside machine learning, making it a valuable tool in data processing more broadly.
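The core of L2 regularization can be shown with ridge linear regression. This is a minimal sketch under illustrative data, using the standard closed form w = (XᵀX + λI)⁻¹Xᵀy: the penalty λ‖w‖² shrinks weights toward zero without forcing them exactly to zero, which is the non-sparse property noted above:

```python
# Minimal ridge (L2-regularized) linear regression sketch:
# w = (X^T X + lambda * I)^(-1) X^T y. Data is synthetic and illustrative.
import numpy as np

def ridge_fit(X, y, lam):
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + rng.normal(0, 0.1, size=50)

w_ridge = ridge_fit(X, y, lam=1.0)      # mild penalty: close to true weights
w_big_lam = ridge_fit(X, y, lam=1e6)    # huge penalty: shrunk toward zero
print(np.round(w_ridge, 2))
print(np.round(w_big_lam, 4))           # small but not exactly zero
```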

In the context of neural machine translation, dropout is also worth attention. The principle of its operation is another reason for curiosity: dropout randomly drops units from the model during each training iteration (Barone et al., 2017). In addition, I appreciate that dropout can be used in a learning model without being applied in the testing process. Dropout is also implemented in common deep-learning libraries, such as the Keras framework for Python.
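The "train with dropout, test without it" property follows from the common inverted-dropout formulation, sketched below: each unit is kept with probability p and the surviving activations are rescaled by 1/p, so at test time the layer is used as-is:

```python
# Sketch of inverted dropout: units are dropped with probability 1 - p_keep
# during training and survivors are rescaled by 1/p_keep, so no adjustment
# is needed at test time. Shapes and values are illustrative.
import numpy as np

def dropout(activations, p_keep, rng, training=True):
    if not training:
        return activations              # test time: pass through unchanged
    mask = rng.random(activations.shape) < p_keep
    return activations * mask / p_keep  # survivors rescaled by 1/p_keep

rng = np.random.default_rng(0)
a = np.ones((4, 5))
out = dropout(a, p_keep=0.8, rng=rng)
print(out)  # entries are either 0 or 1/0.8 = 1.25
```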

The last regularization technique is adversarial regularization; the reason for attention here is privacy protection. Machine learning models can leak data through their predictions, and adversarial regularization makes membership in the training data untrackable from those predictions (Nasr, Shokri, and Houmansadr, 2018, 634). Another reason for interest is the authors' ambition to create a truly universal technique. Lastly, I am fascinated by the technique's versatility: it trains an ANN, regularizes it, and ensures privacy protection at once.

Numerous studies showcase the multifaceted nature of regularization techniques: depending on the needs, different features are required. For statistical regularization tasks such as fine-tuning, L2-norm regularization constrains the model. When regularization is needed during training without affecting the testing process, dropout is of use. Finally, where data privacy is a substantial concern, adversarial regularization can provide the needed protection.

Reference List

Alonso, J., Belanche, L., and Avresky, D. R. (2011) 2011 IEEE 10th international symposium on network computing and applications. Cambridge MA, Massachusetts, USA. Massachusetts: IEEE, pp. 163-170. Web.

Barone, A. M. et al. (2017) The University of Edinburgh, Edinburgh, The United Kingdom. Edinburgh: Association for Computational Linguistics, pp. 1489-1494. Web.

Belkin, M. et al. (2019) Proceedings of the National Academy of Sciences, 116(32), pp. 15849-15854. Web.

Brand, J. E., Koch, B., and Xu, J. (2020) SAGE Research Methods Foundations. Web.

Chen, J. et al. (2019) Environment international, 130. Web.

Deluna, J. (2021) ‘Supervised vs. unsupervised learning: What’s the difference?’, IBM.

Kotsilieris, T., Anagnostopoulos, I., and Livieris, I. E. (2022) Electronics, 11(4), p. 521. Web.

McAfee, A. and Brynjolfsson, E. (2012) ‘Big data: The management revolution.’ Harvard Business Review, October. (Accessed 27 May 2022).

Nasr, M., Shokri, R., and Houmansadr, A. (2018) New York: Association for Computing Machinery, pp. 634-646. Web.

Nusrat, I., and Jang, S. B. (2018) Symmetry, 10(11), p. 648. Web.

Provost, F., and Fawcett, T. (2013). Data science for business: What you need to know about data mining and data-analytic thinking. Sebastopol: O’Reilly Media.

Zhu, D. et al. (2018) Big data and cognitive computing, 2(1), p. 5. Web.

Regularization Techniques in Machine Learning

Based on the approach used to overcome overfitting, the forms of regularization can be divided into two categories: those used to prevent interpolation and those that change the effective capacity of a class of functions (Belkin et al., 2019). One of them is L1 regularization, which reduces the weights of uninformative features to zero by subtracting a small amount of weight in each iteration. Its main feature is that such weights eventually become exactly zero, leading to a simpler, more easily optimized model (Oymak, 2018). This form of regularization is interesting because it helps in working with big data, effectively enforces sparsity properties, and uses the method of equating the optimum to zero (Lin et al., 2018). Of no less interest is that it can underlie structures with a reduced generalization error (Zhao et al., 2018).

In real life, L1 regularization can be used when making machine predictions, many of which call for sparse block solutions (Janati, Cuturi and Gramfort, 2019). For example, when predicting housing prices, L1 regularization will retain important factors such as the area, infrastructure, and year of construction, while excluding minor elements, such as the price of the flooring or built-in gas equipment. In another example, when predicting the payback of a business product, the system can use indicators of the area's population and the presence of competitors in the district, while ignoring the age or gender of potential buyers. In general, in this form the solution to sparsity problems can be taken as representative (Yang and Liu, 2018). Thus, the method provides robust results when working with big data (Alizadeh et al., 2020).
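The feature-elimination effect described above can be seen in the proximal (soft-thresholding) update commonly used for L1-penalized models: weights whose magnitude falls below the threshold become exactly zero, while the rest merely shrink. The feature names and values below are illustrative:

```python
# One L1 proximal (soft-thresholding) step: each weight is shrunk by lam,
# and any weight whose magnitude is below lam is set exactly to zero.

def soft_threshold(w, lam):
    return [max(abs(x) - lam, 0.0) * (1 if x > 0 else -1) for x in w]

# e.g. weights for [area, infrastructure, year, flooring-price, gas-equipment]
weights = [3.2, 1.5, 0.9, 0.05, -0.02]
print(soft_threshold(weights, lam=0.1))
# the two minor features are driven exactly to zero; the rest just shrink
```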

Another form is L2 regularization, whose main feature is the optimization of the average cost. This type deploys the most commonly used penalty, the sum of the squares of the weights (Provost and Fawcett, 2013). It is interesting because of the uniqueness of its final solution, its computational inexpensiveness, and the reduced probability of an overall error. Even in the presence of noise, the L2 estimation error may still tend to zero at a near-optimal rate (Hu et al., 2021). The method can also be used to smooth monotonic regression on a single predictor variable, which increases its interest in the context of analysis (Sysoev and Burdakov, 2019).

In practice, L2 regularization is used to evaluate the significance of predictors and can overcome the norm-convergence problems exhibited by other regularization methods (Zhang, Lu and Shai, 2018). In the house-price example, even the slightest factors are retained with small weights, which reduces the difference from the final result. In the business-payback example, however, L2 regularization can complicate the forecast, since weight decay helps less with deeper models on more complex datasets (Tanay and Griffin, 2018).
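The contrast between the two forms can be made concrete: on the same synthetic data, Ridge (L2) shrinks every coefficient but rarely sets any exactly to zero, while Lasso (L1) zeroes the noise features. A sketch under illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 5))
# Only the first two features matter; the remaining three are noise.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0.0, 0.5, n)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.15).fit(X, y)
print("ridge:", np.round(ridge.coef_, 3))  # all five shrunk, but nonzero
print("lasso:", np.round(lasso.coef_, 3))  # noise coefficients at (or near) zero
```

This is the practical difference referred to above: L2 keeps every predictor with a reduced weight, while L1 discards the uninformative ones outright.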

Reference List

Alizadeh, M., Behboodi, A., van Baalen, M., Louizos, C., Blankevoort, T. and Welling, M. (2020) ‘Gradient L1 regularization for quantization robustness’, International Conference on Learning Representations (ICLR 2020).

Belkin, M., Hsu, D., Ma, S. and Mandal, S. (2019) ‘Reconciling modern machine-learning practice and the classical bias–variance trade-off’, Proceedings of the National Academy of Sciences, 116(32), pp. 15849-15854.

Hu, T., Wang, W., Lin, C. and Cheng, G. (2021) ‘Regularization matters: A nonparametric perspective on overparametrized neural network’, International Conference on Artificial Intelligence and Statistics, 130, pp. 829-837.

Janati, H., Cuturi, M. and Gramfort, A. (2019) ‘Wasserstein regularization for sparse multi-task regression’, The 22nd International Conference on Artificial Intelligence and Statistics, 89, pp. 1407-1416.

Lin, P., Peng, S., Zhao, J., Cui, X. and Wang, H. (2018) ‘L1-norm regularization and wavelet transform: An improved plane-wave destruction method’, Journal of Applied Geophysics, 148, pp. 16-22.

Oymak, S. (2018) ‘Learning compact neural networks with regularization’, International Conference on Machine Learning, 80, pp. 3966-3975.

Provost, F. and Fawcett, T. (2013) Data science for business: What you need to know about data mining and data-analytic thinking. Sebastopol, CA: O’Reilly Media.

Sysoev, O. and Burdakov, O. (2019) ‘A smoothed monotonic regression via L2 regularization’, Knowledge and Information Systems, 59(1), pp. 197-218.

Tanay, T. and Griffin, L. D. (2018) ‘A new angle on L2 regularization’, arXiv preprint. doi: 10.48550/arXiv.1806.11186

Yang, D. and Liu, Y. (2018) ‘L1/2 regularization learning for smoothing interval neural networks: Algorithms and convergence analysis’, Neurocomputing, 272, pp. 122-129.

Zhang, Y., Lu, J. and Shai, O. (2018) ‘Improve network embeddings with regularization’, Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 1643-1646.

Zhao, Y., Han, J., Chen, Y., Sun, H., Chen, J., Ke, A., Han, Y., Zhang, P., Zhang, Y., Zhou, J. and Wang, C. (2018) ‘Improving generalization based on l1-norm regularization for EEG-based motor imagery classification’, Frontiers in Neuroscience, 12, p. 272.

Data Analysis Package for Machine Learning

Introduction

The conflict between proprietary and open-source software has been raging for a long time, and all sides investigate and evaluate a variety of topics and concerns. The notion that open-source software is less secure than proprietary software is a misconception that stems from a variety of prejudices; a commercial license does not guarantee security (Melwani, 2019, pg. 1). In contrast to proprietary software, open-source software discloses possible flaws: anyone may access the code because it is open. People frequently complain that seeing the code allows malicious hackers to snoop on it and exploit weaknesses. The advantages and disadvantages of open-source software therefore influence the project manager’s decision when improving technological devices.

Advantages of open source and disadvantages of proprietary software

Open-source software follows four main principles: it can be used for any purpose, anybody has free access to the source code and can edit it, the original code can be redistributed, and modified copies can be redistributed. Not every free program is open source, although open-source software is typically free of charge. Its benefits include the freedom to experiment, use, alter, and redistribute, an accessible discussion board for support, and standard protocols that promote openness (Melwani, 2019, pg. 1). Disadvantages include a lack of a competitive edge, community support that is not appropriate for business settings, and a high level of technicality owing to developer-focused development; as a result, numerous adaptations are required to fulfill unique use cases.

Disadvantages of open source and advantages of proprietary software

Legal safeguards limit the usage, distribution, and alteration of proprietary software. Proprietary software is distinguished by the fact that end users cannot obtain the source code, which protects the owner’s intellectual property; the original firm manages all modifications, updates, and patches. Advantages include practical usability owing to a focused feature set, high product reliability, specialist technical assistance, warranties, and limited-liability protections for customers (Melwani, 2019, pg. 1). Disadvantages include restricted versatility and extensibility, higher initial prices, modifications that may incur additional fees, and dependency on the manufacturer for continued troubleshooting and development. I prefer open-source software, in which one is free to use or alter the code as required, because proprietary software imposes constraints.

Table 1. Summary of Results

Open-source software
  Advantages:
    1. free to experiment, use, alter, and redistribute;
    2. free discussion board for support.
  Disadvantages:
    1. lack of a competitive edge;
    2. community support is not appropriate for business settings;
    3. high level of technicality owing to developer-focused development.

Proprietary software
  Advantages:
    1. practical usability owing to a constrained feature set;
    2. high product reliability;
    3. specialist technical assistance;
    4. warranties;
    5. limited-liability protections for customers.
  Disadvantages:
    1. restricted versatility and extensibility;
    2. higher initial prices;
    3. modifications may incur additional fees;
    4. dependency on the manufacturer.

Problem solving: Improving diabetes sensor

As a project manager, I have to solve the problem of improving a diabetes sensor, which is part of my professional responsibilities. TRIZ, also referred to as the theory of inventive problem solving, is a strategy for encouraging creativity in project teams that have become stuck while trying to resolve a business situation. The case concerns the FreeStyle Libre device used by diabetic patients, which has several issues: the ease with which the device is knocked off and the slight lag between readings.

There are several TRIZ techniques for improving the diabetes sensor and detecting sugar levels. Continuous blood glucose measurements paired with information about food consumption and physical activity generate a unique data set about the patient’s daily patterns and reaction to food intake. This data can improve insulin administration and forecast glucose-level changes (Roessner, 2019, pg. 1). By collecting big data, hospitals may more rapidly detect which patients will profit from a change in therapy, thereby averting the catastrophic consequences of chronic illnesses. Precision medicine refers to tailor-made medical treatment, in which vast volumes of data are analyzed and mined for precise patterns, bringing personal patient information into sharp focus.

The FreeStyle Libre system is a classic illustration of how big data may improve diabetes care. It not only assists individuals in enhancing their glucose management and handling their condition effectively but also generates a massive amount of information (Roessner, 2019, pg. 1). Aggregating, assessing, and interpreting such data can yield quality standards that can be condensed into practical adjustments to help individuals live their best lives.

The second stage is to employ continuous glucose monitoring (CGM), a newer technology that checks glucose levels. A tiny sensor with a needle is placed on the patient’s skin; it continually samples the glucose level, stores it, and sends it to a mobile phone or insulin pump (Roessner, 2019, pg. 1), giving the patient a continuous view of their glucose concentration. The advantages are numerous: the ability to set alerts, adjust insulin pumps, understand the pattern of the glucose concentration, eliminate many fingersticks, and keep the glucose level much better controlled.

A coach can make small but meaningful, individualized modifications to the patient’s lifestyle using CGM device data to bring blood sugar levels into the prescribed range. CGM systems from various manufacturers provide a variety of advantages: they can give real-time, adaptive glucose data and can be used by persons of all ages (Werhane, 2018, pg. 269). They are suitable for children aged four and older, since they allow parents or physicians to monitor blood glucose levels unobtrusively. The system acts as a 24-hour lifeline, alerting patients when their blood sugar dips into a dangerous range so that they may adjust their diet, medication, and exercise. Continuous glucose monitoring and big-data analytics are cutting-edge technologies for assessing glucose levels and managing diabetes; each aims to make the condition simpler to manage and perhaps cure it one day.
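The alerting behaviour described above amounts to a threshold check on each sensor reading; a minimal sketch (the range limits are illustrative placeholders, not clinical guidance):

```python
def glucose_alert(reading_mg_dl, low=70, high=180):
    """Classify one CGM reading against an illustrative target range."""
    if reading_mg_dl < low:
        return "LOW"
    if reading_mg_dl > high:
        return "HIGH"
    return "in range"

for reading in (55, 110, 240):
    print(reading, "->", glucose_alert(reading))  # LOW, in range, HIGH
```

A real CGM pipeline would smooth successive readings and account for sensor lag before alerting, but the core decision is this comparison.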

Output from MS Project

Project management is the art of organizing, planning, coordinating, directing, and regulating resources using tables, graphs, and calculations. A project has a defined beginning and end point and is designed to achieve particular goals. Project crashing aims to shorten the project timetable and deliver the product, service, or outcome earlier than planned; however, it requires additional cost, usually about 1.5 times more than intended. Evaluating and classifying activities enables teams to find the most value for the least possible cost increment, shortening the project duration via crashing (Kumar & Hitesh, 2020, pg. 34). If even crashing cannot complete the project on time, fast-tracking may be applied, meaning the simultaneous execution of operations, although it is a risky approach.
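Crashing candidates are conventionally ranked by cost slope, the extra cost per day of schedule saved; a sketch in which the durations match Tables 2 and 3 but the activity costs are hypothetical figures for illustration:

```python
def cost_slope(normal_cost, crash_cost, normal_days, crash_days):
    """Extra cost incurred for each day of schedule saved by crashing."""
    return (crash_cost - normal_cost) / (normal_days - crash_days)

# Hypothetical costs; durations taken from Tables 2 and 3.
activities = {
    "Order equipment": cost_slope(20_000, 26_000, 56, 42),
    "Install equipment": cost_slope(30_000, 40_500, 70, 49),
}
# Crash the cheapest cost-per-day activity first.
cheapest = min(activities, key=activities.get)
print(cheapest, round(activities[cheapest]))  # Order equipment 429
```

Ranking by cost slope is what lets a team buy schedule reduction at the least possible cost increment.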

Figure 1. A project network diagram; one can note that it takes 217 days at normal duration and 175 days at crash duration. Tables 2 and 3 show the durations of all tasks

A network diagram depicts the project workflow, which enables one to see how a project should be executed. Critical activities are those which must be completed strictly on time: if they are overdue, the whole project is overdue as well. Figure 1 shows this workflow for the current project: for example, needs planning is the starting point, which continues for 70 days under normal conditions or 56 days under crash conditions, as seen in Tables 2 and 3. Then there are two parallel workflows, one for the equipment and one for the training lab, both of which converge on the last stage, system testing. The tables show the latest and earliest starts and finishes for each activity and the project duration overall.
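The 217-day and 175-day totals can be reproduced by taking the longest path through the network, using the task dependencies from Figure 1 and the durations from Tables 2 and 3:

```python
# Dependencies of the project in Figure 1.
predecessors = {
    "Plan needs": [],
    "Order equipment": ["Plan needs"],
    "Install equipment": ["Order equipment"],
    "Set up training lab": ["Plan needs"],
    "Training courses": ["Set up training lab"],
    "Test system": ["Install equipment", "Training courses"],
}
normal = {"Plan needs": 70, "Order equipment": 56, "Install equipment": 70,
          "Set up training lab": 49, "Training courses": 70, "Test system": 21}
crash = {"Plan needs": 56, "Order equipment": 42, "Install equipment": 49,
         "Set up training lab": 42, "Training courses": 56, "Test system": 21}

def project_duration(durations):
    """Earliest finish of the final task, i.e. the critical-path length."""
    cache = {}
    def earliest_finish(task):
        if task not in cache:
            cache[task] = durations[task] + max(
                (earliest_finish(p) for p in predecessors[task]), default=0)
        return cache[task]
    return earliest_finish("Test system")

print(project_duration(normal), project_duration(crash))  # 217 175
```

The equipment branch (70 + 56 + 70) is longer than the training branch (70 + 49 + 70), so it is the critical path under normal conditions; under crash conditions the training branch becomes critical.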

Table 2. The normal project workflow: the latest starts and finishes of each activity, 217 days in total

Task Name Duration Start Finish
Plan needs 70 days Mon 12/12/22 Fri 3/17/23
Order equipment 56 days Mon 3/20/23 Mon 6/5/23
Install equipment 70 days Tue 6/6/23 Mon 9/11/23
Set up training lab 49 days Mon 3/20/23 Thu 5/25/23
Training courses 70 days Fri 5/26/23 Thu 8/31/23
Test system 21 days Tue 9/12/23 Tue 10/10/23

Cost analysis is the calculation of how many pounds one should spend on each project stage, and cashflow diagrams are an excellent way to show it. Two cashflow diagrams, Figures 2 and 3, show the cost of each project workflow and stage for normal and crash activities, respectively. Each diagram has a table that shows the precise cost of each activity, while the numbers on the diagrams indicate the approximate project phases; as the normal workflow is longer, it has five steps, while the crash workflow has four. As one can see, the crash approach costs 565,000 pounds, almost 1.5 times more than the normal one, which costs 400,000 pounds. Therefore, it is wise to use the normal project workflow when possible.

Table 3. The crash project workflow: the earliest starts and finishes of each activity, 175 days in total

Task Name Duration Start Finish
Plan needs 56 days Mon 12/12/22 Mon 2/27/23
Order equipment 42 days Tue 2/28/23 Wed 4/26/23
Install equipment 49 days Thu 4/27/23 Tue 7/4/23
Set up training lab 42 days Tue 2/28/23 Wed 4/26/23
Training courses 56 days Thu 4/27/23 Thu 7/13/23
Test system 21 days Fri 7/14/23 Fri 8/11/23
Figure 2. A cashflow for the normal project workflow, a cost for each activity, and a cumulative cost (£400,000)
Figure 3. A cashflow for the crash project workflow, a cost for each activity, and a cumulative cost (£565,000)

Ethical Awareness of the Engineer

The position of the engineering council determines the code of ethics for professionals in this sphere. Even though I am only preparing to become a qualified engineer, I already apply ethical thinking to my work at the university, and after graduation I plan to contribute to the development of society as an ethical engineer, which is possible only with a high awareness of moral issues. For example, the Institution of Engineering and Technology (IET) highly values engineers acting professionally and ethically: it states that engineers must adhere to the highest levels of honesty and integrity in all interactions, own their mistakes, and not twist or modify the facts. Engineers must notify clients if they suspect a project may fail (Maslen et al., 2020, pg. 417). Project managers are evaluated on how successfully they accomplish tasks within the time frame, budget, and scope.

However, when project choices are made by individuals evaluated on other factors, such as boosting sales or meeting a revenue target, those choices may contradict the superior judgment of a skilled project manager. Whether working alone or for a firm, an engineer must deal with ethical difficulties, most of which arise during the product’s conceptualization, in the development and evaluation departments, or in production, sales, and operations. Moral concerns exist during monitoring and teamwork as well. An engineer’s ethical responsibilities and moral principles must be examined because an engineer’s judgments influence goods and services.

Engineers must adhere to a set of values to avoid moral degradation. Honoring others, respecting other people’s rights, and keeping commitments are all examples of appropriate behavior. Morality demands that we respect others: it entails being fair and reasonable, satisfying obligations and rights, and not causing undue harm through deceit. Whenever a problem arises, certain skills are needed to solve it (Herkert et al., 2020, pg. 1). Engineers must face challenges with tolerance, and a few moral goals must be kept in mind while coping with these issues.

An engineer must be able to detect the moral dilemmas and concerns that arise in engineering. Studying the situation is required in order to discriminate and judge according to ethics or the norms to be followed. The argument must be evaluated and grasped before deciding on a subject: both sides of a debate must weigh all probabilities, and the substance of the dispute must be rational and moral. After working through all the ethical and practical facts, consistent and complete viewpoints should be formed based on a review of the pertinent evidence.

Conclusion

Moral and practical difficulties must be addressed independently. While on the job, the language used to describe one’s moral beliefs should be so precise that the phrasing does not distort the original meaning. Beyond these moral aims, the ethical rationale for attaining moral behavior with commitment and responsibility is gained through a few abilities (Herkert et al., 2020, pg. 1). An engineer should possess ethical reasoning skills, such as the ability and desire to be ethically rational while dealing with problems. Justice can only be served if one is willing to enhance one’s skills and respects all parties involved in the dispute.

List of References

Herkert, J., Borenstein, J. & Miller, K 2020, Science and Engineering Ethics. Web.

Kumar, A & Hitesh, R 2020, Fundamentals of Software Engineering, BPB Publications.

Maslen, S., Hayes, J., Wong, J. & Scott-Young, C 2020, Environment Systems and Decisions, 40(3), pp. 413-426. Web.

Melwani, U 2019, Srijan. Web.

Roessner, K 2019, Abbott. Web.

Werhane, P 2018, Philosophy of Management, 17(3), pp. 265-278. Web.

Approach For Understanding Machine Learning Methods

Introduction

I am a consultant to Diligent Consulting Group. In this case, consultations were conducted with an organization called Loving Organic Foods. To better understand what might motivate shopping habits, I was tasked with analyzing the factors that influence the cost of organic food. To achieve this, I decided to use linear regression analysis.

Regression analysis is one of the most popular statistical research methods. It can be used to establish the degree of influence of independent variables on a dependent one. Winston (2019) claims that by analyzing and examining the raw data, the researcher can draw inferences, compare and contrast, or even classify the data based on a specific attribute. Statistical regression techniques are among the most effective ways to examine these attributes properly. Moreover, according to Zhou (2020), it may be necessary to apply the concepts of correlation and linear regression equations. Once enough data have been collected and analyzed, the attributes with the most and least significant effects can be identified.
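The kind of fit used in this study can be sketched in Python; the data below are synthetic stand-ins for the Loving Organic Foods survey, generated under assumed values, so the fitted numbers will only loosely resemble those reported later:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 124                                        # observation count from the study
age = rng.uniform(20, 75, n)
# Assumed weak age effect plus large noise, mimicking the reported fit.
spend = 9800 + 26 * age + rng.normal(0, 3700, n)

slope, intercept = np.polyfit(age, spend, 1)   # least-squares line
r = np.corrcoef(age, spend)[0, 1]              # correlation coefficient
print(f"slope={slope:.2f}, intercept={intercept:.0f}, R^2={r * r:.3f}")
```

With noise this large relative to the age effect, the fitted R-squared stays near zero, which is exactly the pattern the Excel output below exhibits.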

Before proceeding to the analysis of data, it is vital to identify the variables. In this study, variables such as Annual Amount Spent on Organic Food and consumer Age are used. In this case, the independent variable (x) is Age, and the dependent variable (y) is the Annual Amount Spent on Organic Food.

Interpretation of the obtained results

The regression output generated in Excel:

  • Multiple R = 0.114912552
  • R Square = 0.013204895
  • Adjusted R Square = 0.00511641
  • Standard Error = 3718.777442
  • Observations = 124

Interpretation of the coefficient of determination (R-squared)

R-squared statistically measures how close the data are to the fitted regression line. Here R-squared is 0.013, which means the model explains only 1.3% of the variability in the response data around its mean.

Interpretation of the coefficient estimate for the Age variable

In a simple regression with one predictor, the correlation coefficient is the square root of the R-squared value. It measures how strong the relationship between the two variables is.

Interpretation of the statistical significance of the coefficient estimate for the Age variable

A correlation coefficient of 0.114 shows a weak positive relationship between the Annual Amount Spent on Organic Food and Age.
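With a single predictor, Excel’s “Multiple R” is simply the square root of “R Square”, which is how the two figures in the output above relate:

```python
import math

r_squared = 0.013204895            # "R Square" from the Excel output
multiple_r = math.sqrt(r_squared)
print(round(multiple_r, 6))        # ~0.114913, Excel's "Multiple R"
```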

Furthermore, there is a block of information containing hypothesis tests for a particular coefficient.

             Coefficients    Standard Error   t Stat        P-value        Lower 95%       Upper 95%
Intercept    9778.277424     1047.233888      9.337243124   5.73909E-16    7705.173345     11851.3815
Age          26.29286064     20.57803719      1.277714701   0.203775979    -14.44341928    67.02914055

Since the p-value for Age (0.204) is greater than 0.05, the variable’s contribution to this regression model is negligible. Based on this, it can also be concluded that there is no strong relationship between the Annual Amount Spent on Organic Food and Age. Moreover, following Alexander et al. (2017), the variable Age is not statistically significant, consistent with the weak correlation coefficient.

The regression equation with estimates substituted into the equation

y = 9778.28 + 26.29x

According to Winston (2019), the coefficient of Intercept indicates what value Y will have with all other factors equal to zero. Zhou (2020) asserts that the coefficient of Age shows the level of dependence of Y on X. In this case, it is the level of dependence of the Annual Amount Spent on Organic Food on the Age of consumers.

The slope of the equation, 26.29, tells us that each additional year of Age increases the Annual Amount Spent on Organic Food by 26.29, while the Y-intercept, 9778.28, is the baseline Annual Amount Spent on Organic Food.

An estimate of Annual Amount Spent on Organic Food for the average consumer

Recall that the “average” customer is about 48 years old and spends an average of 11,046 dollars a year on organic food. This indicator corresponds to the sample mean.

Y = 9778.28 + 26.29 × 48 = 11040.2

Thus, the Annual Amount Spent on Organic Food for the average consumer is approximately 11040.2.
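The substitution above is easy to check programmatically with the rounded estimates from the fitted equation:

```python
def predicted_spend(age, intercept=9778.28, slope=26.29):
    """Annual amount spent on organic food, from the fitted line."""
    return intercept + slope * age

print(round(predicted_spend(48), 2))  # 11040.2, matching the estimate above
```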

Conclusion

Regression analysis is a set of statistical methods for evaluating relationships between variables. It can be used to assess the degree of relationship between variables and to model future dependencies. In essence, regression methods show how changes in the dependent variable can be traced to changes in the independent ones. Through the analysis, the correlation coefficient is derived, which expresses the strength of the connection: the larger it is, the easier it is to build a regression model.

As a result of the research, it can be concluded that there is no strong relationship between the Annual Amount Spent on Organic Food and Age. The coefficient of determination is 0.013, which means that Age explains only 1.3% of the total variation in the Annual Amount Spent on Organic Products. When the variable Age increases by one unit while the other factors are held constant, the Annual Amount Spent on Organic Food increases by 26.29 units.

References

Alexander, H., Illowsky, B., & Dean, S. (2017). OpenStax.

Winston, W. (2019). Data analysis and business modeling (6th ed.). Microsoft Press.

Zhou, H. (2020). Learn data mining through excel: A step-by-step approach for understanding machine learning methods. Apress.