There is no doubt that GIS is an integral part of the modern technical basis for Earth exploration. Burrough and McDonnell (1998) understand GIS as a tool for creating, storing, and exploring spatial information. The fields of science and technology in which this tool finds application therefore include the environmental study of territories, their mapping, and their reliable assessment (Longley et al., 2005). However, it is fair to admit that GIS technologies have undergone fundamental changes in recent decades, making it possible to use them in other areas as well. In particular, hydrological research is a promising area of practical application (Chow et al., 1988; Clark, 1998). For thousands of years, floods have been a severe environmental problem that threatens not only human industry but also natural terrestrial ecosystems.
This situation necessitates the prediction of potential flooding sources, for which Digital Elevation Models (DEMs) are appropriate. Specifically, such models make it possible to evaluate complex geographical forms of catchments by studying river systems, stream heads, and basin-specific land use, soil, and climate information (Siart et al., 2009). This means that, according to Wang et al. (2011), GIS is widely used for analytical work on the quantitative and qualitative effects of floods and runoff. Moreover, the use of spatial models addresses the challenges of predicting and investigating sources of coastal flooding (Sarker and Sivertun, 2011; Zerger and Wealands, 2004). Finally, it should be recognized that combining traditional GIS forms with LiDAR technologies can modify practical approaches and improve procedures for obtaining and processing results (Merwade et al., 2008; Gallegos et al., 2009).
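To make the drainage idea behind DEM analysis concrete, the short Python sketch below applies the standard D8 rule, which routes each cell toward its steepest downslope neighbour. It is an illustration only: the tiny elevation grid is invented, and it is not the workflow or data used in this study.

import numpy as np

dem = np.array([[10., 9., 8.],
                [ 9., 7., 6.],
                [ 8., 6., 4.]])   # hypothetical elevations in metres

def d8_flow_direction(dem):
    """Return, for each interior cell, the offset of its steepest downslope neighbour."""
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               ( 0, -1),          ( 0, 1),
               ( 1, -1), ( 1, 0), ( 1, 1)]
    directions = {}
    rows, cols = dem.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            drops = []
            for dr, dc in offsets:
                dist = np.hypot(dr, dc)   # diagonal neighbours are farther away
                drops.append(((dem[r, c] - dem[r + dr, c + dc]) / dist, (dr, dc)))
            steepest, offset = max(drops)
            directions[(r, c)] = offset if steepest > 0 else None   # None marks a pit
    return directions

print(d8_flow_direction(dem))   # {(1, 1): (1, 1)} -> the centre cell drains to the SE corner

Repeating this rule over every cell of a LiDAR-derived DEM is what allows catchment boundaries, stream networks, and contributing areas to be extracted automatically.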
Hydrologic Models
Technical solutions are essential to critically assess the potential for inundation, especially in coastal areas. Until recently, the range of technologies used was limited to hydrological models, such as the reproductive model. This model has been particularly relevant in the context of the effects of climate change: in other words, it has made it possible to identify problems at an early stage to support complex management decisions (Messner and Meyer, 2006). However, according to Djokic and Maidment (1993), today's world dictates new rules, and the GIS environment can be an excellent alternative for obtaining more in-depth knowledge. Therefore, finding solutions to integrate hydrological models and spatial GIS technologies is a priority. It is known that both sides benefit from such integration, so implementation should be initiated as soon as possible (Xu et al., 2001).
The objective of this study is to address this problem by creating an integrated GIS and LiDAR communication model as a unique solution for determining surface runoff and estimating the risk from inundation in built-up coastal areas for CoW. It should be recognized that this model is novel, as a review of the existing literature does not reveal earlier uses of this combination. For example, Pourali et al. (2014b) used conceptual hydrological models along with computer modeling of inundation-sensitive zones for CoW. Based on this, it seems clear that improved models will provide more valuable and complete information on the morphological structure of water basins. It should be further noted that this advanced methodology greatly simplifies the modeling of potential flood threats and provides a logical basis for flood flow estimation. In addition, the use of this combination extends the range of possible threat sources to atmospheric climate conditions, including precipitation.
Machine Learning Techniques
In modern science, there is an increasing trend towards the digitalization of classical research methods. Thus, the introduction of machine learning technologies has the potential to improve the practice of hydrological research on floods significantly. It should be recognized that floods often have sufficient destructive power to cause economic and health damage to settlements (Tehrany et al., 2019). Numerous analytical attempts have been made to forecast floods and to provide early warning as they develop. However, it is fair to say that, because of the lack of knowledge about the factors and conditions of flooding, no one-size-fits-all solution has been developed. Nevertheless, machine learning methods can be used to qualitatively assess the spatial correlations between factors and their significance for mapping. Traditionally, such approaches include studies of the texture, spectral, and structural features of the region (Kuffer et al., 2016). Among the known learning models that have already found application in hydrological analysis are the “Decision Tree” (DT) and the “Support Vector Machine” (SVM). Although both pixel-based and object-based classifications using machine learning algorithms have already been analyzed for DT and SVM, previous studies did not reveal any statistical difference between the two types of classification (Tehrany et al., 2014; Duque et al., 2017; Duro et al., 2012; Wieland et al., 2016).
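The kind of DT versus SVM comparison reported in the cited studies can be sketched in a few lines of Python. The sketch below is illustrative only: the flood-conditioning factors (elevation, slope, rainfall, distance to a stream) and the labelling rule are synthetic stand-ins, not data from this study.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.uniform(0, 120, n),      # elevation (m)
    rng.uniform(0, 25, n),       # slope (degrees)
    rng.uniform(400, 1400, n),   # annual rainfall (mm)
    rng.uniform(0, 3000, n),     # distance to the nearest stream (m)
])
# Invented rule: low-lying cells close to a stream are labelled "flooded".
y = ((X[:, 0] < 50) & (X[:, 3] < 1200)).astype(int)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validated accuracy
    print(f"{name}: mean accuracy {scores.mean():.3f}")

In practice the factor rasters would come from the DEM and rainfall grids, and the labels from historical flood extents; the cross-validation scaffold, however, stays the same.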
An analysis of the applicability of existing solutions, both classical models and more innovative ones, was the starting point in determining the goals of this research. In particular, with the development of computer technology, it can be expected that the introduction of machine learning methods into hydrological practice will yield positive results. There is an established viewpoint in the academic literature that the use of innovative models produces comparatively better results than traditional research methods (Marjanović et al., 2011; Tehrany et al., 2015; Althuwaynee et al., 2016; Tien Bui et al., 2016c). The range of such work is not limited to hydrological analysis of coastal zones only. A large amount of thematic literature refers to the practical application of machine learning to the prevention of floods, landslides, bushfires, and land subsidence (Tehrany et al., 2013; Tien Bui et al., 2016b; Reid et al., 2015; Lee and Park, 2013). Moreover, a wide range of machine learning algorithms is widely used for operational flood control. This includes the already mentioned SVM and DT, as well as artificial neural networks, genetic programming, the adaptive network-based fuzzy inference system, and other methods (Kitsikoudis et al., 2015; Saito et al., 2009). Ultimately, this leads to the idea that the use of innovative computer technologies helps to simplify specialist work practices and provide more qualitative and reliable data on environmental threats.
However, it would be a mistake to assume that stand-alone methods could be highly effective on their own. In other words, neither classical methods nor machine learning models alone will achieve efficiency in the context of flood studies. On the contrary, the integration of these ideas is of high value. Thus, numerous scientific works demonstrate high efficiency from the use of DT and SVM along with well-known technologies (Pham et al., 2016; Tehrany et al., 2013). The disagreement between studies begins with the discussion of which of the methods, DT or SVM, shows higher efficiency for hydrology. In particular, Hong et al. (2015), Pradhan (2013), and Singh et al. (2009) found that DT is more reliable than SVM. In contrast, Marjanović et al. (2011), Wu et al. (2014), and Tien Bui et al. (2012) found the opposite effect. Without a more detailed discussion of the advantages and disadvantages of each method, it is essential to postulate that both DT and SVM have sufficient potential to improve flood modeling in hydrological expertise. For this reason, this study focuses on examining the extent to which external and internal factors influence the procedure for mapping areas vulnerable to flooding.
As mentioned earlier, there is no doubt that GIS technology has enormous potential for early detection and prevention of flood consequences. However, as practice in recent years has shown, it is becoming increasingly important to find ways to combine different technical methods harmoniously and switch to spatial methods of assessment. This approach solves two problems at once. First, using several methods reduces the possibility of errors and inaccuracies. Second, the combination of techniques can significantly expand the range of application.
For this reason, this paper explores the possibility of harmonious integration between the GIS hydrological model and machine learning methods. In other words, this study has a two-way orientation. First, it seeks to shed light on the use of GIS-based hydrological models. Second, it analyses different machine learning tools and assesses their contribution to the hydrological model.
Based on the above, it should be recognized that CoW has acknowledged the need for an urgent scientific project to expand current hydrological directions. In particular, recognizing a deep gap in research, CoW has formed a plan to incorporate digital data from LiDAR into current hydrological spatial technologies (Baby et al., 2019). This is a multidimensional and complex piece of work whose implementation depends on solving specific problems. Five objectives, whose achievement is of paramount importance to the study, have been identified for the success of the project.
First, the practice of mapping water flows should be used to make decisions about land use and the development of vacant areas.
Second, a flood and drainage management program must be developed to ensure that risks to human communities and natural ecosystems are reduced.
Third, it is essential to assess the available data in the CoW regarding government regulation of the problem, including the Local Government Spatial Strategy v2.2.
Fourth, it is necessary to assess existing technologies and the IT barriers and constraints that hinder the functional development of projects for spatial information management in the Department of Environment and Primary Industry, as well as other critical indicators (Rajabifard et al., 2002).
Finally, it is necessary to achieve the conditions under which machine learning methods can be used to assess spatial correlations between potential flood causes, as well as their level of significance for the vulnerability mapping procedure.
It seems evident that a significant challenge in the implementation of this project is to model the form in which it would be possible to integrate different methods and provide relevant information. To solve this problem, an interface was created whose purpose was to provide information about the project to all interested parties. This approach laid the foundation for this study, as it supported GIS users in project groups for decision-making.
The overall objective of this study is to design a support system for decision-making in the context of flood management. It should be emphasized that this project is being implemented in Victoria, Australia. In particular, given the objective of the project, efforts were focused on finding a harmonious mechanism for integrating the methods described. Thus, a hydrological model based on GIS has been developed, in which machine learning elements have been introduced. This approach is expected to significantly support the implementation of floodplain management strategies based on an analysis of potential risks from floods.
As with any technology with the potential to become commonplace, this project requires preliminary tests to determine its degree of effectiveness in hydrological practice. For this purpose, the CoW has been selected as a pilot area where distributed hydrological models and methods for monitoring potential flood sources in both road and building areas will be assessed. The technical implementation of the study is based on the use of high-resolution digital data provided by LiDAR and crowdsourced information. It should be admitted that the number of technological solutions is continuously growing, but two have been chosen from the whole variety: ArcHydro and ArcSWAT. ArcHydro is an extension of ArcGIS for modeling based on hydrological data. ArcSWAT, on the other hand, is an alternative tool included in the ArcGIS package aimed at soil and water assessment. It is fair to say that the second extension is slightly more comfortable to use for those who do not have enough knowledge of GIS systems, while ArcHydro is more suitable for professionals with experience: the program provides more technical features and can be customized to the user's needs. Additionally, it should be noted that among the whole variety of GIS systems, including, for example, QGIS with QSWAT, the ArcGIS tools were chosen because they are more closely integrated with the Python programming language and the project's geodata.
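As an illustration of this Python integration only, the sketch below shows the kind of DEM preprocessing that ArcHydro automates, written with the arcpy Spatial Analyst functions. It assumes an ArcGIS installation with the Spatial Analyst extension licensed; the workspace path and raster names are placeholders, not the CoW datasets.

import arcpy
from arcpy.sa import Fill, FlowAccumulation, FlowDirection

arcpy.CheckOutExtension("Spatial")
arcpy.env.workspace = r"C:\data\cow_lidar.gdb"   # hypothetical workspace

filled = Fill("lidar_dem")              # remove spurious sinks in the LiDAR DEM
flow_dir = FlowDirection(filled)        # D8 flow direction grid
flow_acc = FlowAccumulation(flow_dir)   # count of upstream contributing cells

flow_acc.save("flow_accumulation")      # high-accumulation cells trace the drainage lines
arcpy.CheckInExtension("Spatial")

Because each step is an ordinary Python call, the same chain can be scripted, repeated for new LiDAR deliveries, and fed directly into the machine learning stage described above.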
It is worth noting that the hydrological model proposed in this study, along with mechanisms for monitoring potential flood threats, will provide an excellent solution for both experienced and inexperienced users. The results obtained by the project will inform future predictions of possible coastal inundation for all categories of users, regardless of experience and knowledge (Hine et al., 2017). Users with hydrological expertise are professional GIS specialists, along with staff from design and modeling offices. The proposed integration solution will benefit them in developing visualization tools. Users without sufficient experience, by contrast, are, as a rule, heads and organizers of departments who use the results to promote their company.
The ArcSWAT and ArcHydro tools used in this project are similar in general but show different results in practice. The diversity of approaches to the analysis of hydrological systems makes it possible to assess which model is more appropriate for further research. For example, in field tests it was found that ArcSWAT provided higher-quality and more useful information than ArcHydro, especially for smaller basins such as Darebin Creek. On the other hand, ArcHydro outperforms the competition in terms of ease of use and the working volumes of data for large basins such as Inverloch.
Finally, it should be stressed that built-in GIS-based hydrological models, integrated with machine learning methods, can realize many methodological possibilities. This means that, along with the technical advances of computer-based devices, the era of big data has arrived in hydrology, with many breakthrough discoveries. Automatic methods have become the answer to the current needs of specialists. Therefore, this study gives an overview of the most important machine learning algorithms that have found application in the thematic literature. The main idea of the work is that there is no single method that would be effective on its own. On the contrary, only a comprehensive approach that combines complementary methods can guarantee effectiveness.
Managing people is one of the most complex activities that leaders have to undertake on a regular basis within an organizational setting. It takes time to recruit and train employees who can meet organizational goals within their departments. It is important to ensure that once such talents are recruited and trained, they are retained and constantly motivated to improve their performance. In this study, the researcher focused on determining how machine learning and big data can be used to enhance the work of managers. The study relied on two main sources of data. Secondary data was obtained from books and relevant online sources.
Primary data was collected from sampled respondents. Primary data was analyzed both qualitatively and quantitatively to respond to the research questions. The primary and secondary data both demonstrate the importance of machine learning and big data in enhancing the role of managers in the modern business environment. Managers find themselves in situations where they have to process large volumes of data and come up with decisions within a limited time. Using machines has made it easier to make such decisions and to predict the future with higher accuracy than before, when the tasks were done by people. The study shows that although some managers are still reluctant when it comes to embracing this new approach to management, machine learning is a very promising area in the field of management that organizations cannot ignore.
Introduction
Machine learning is an emerging concept in the field of big data that is increasingly becoming important in management. According to Murphy (2012), the current business environment is very challenging. People in management positions find themselves in very delicate situations where they have to deal with both internal and external forces. External forces such as market competition, changing customer demands, inflation, and government policies have a direct impact on a firm's ability to achieve success. Internal forces relating to employee management, operational management, and policy implementation also need the attention of the managers.
As Raschka (2015) observes, management is becoming a more challenging occupation because some of the strategies that managers used traditionally may not be applicable in the current business environment. In the past, managers had near absolute power when coordinating the work of the employees. The coercive approach to management was popular, and the employees had to follow the directives of the managers without questioning their rationale. However, that is no longer the case in the modern business environment. According to Marsland (2015), stiff competition in the market forces firms to ensure that they hire and retain top talents. Such top talents cannot be retained in a dictatorial regime if they can be hired at other firms. It forces managers to embrace consultative management approaches that involve active engagement of the employees. It means that managers must be very accommodative and understanding of their employees.
Machine learning is a promising field that may help in solving various management problems in the modern business environment. Alpaydin (2016) says that modern-day managers must understand how to embrace diversity in the culture and views of their employees. Bell (2015) says that machine learning can be used to improve the performance of employees through the creation of an environment that is highly accommodating. Instead of the manager struggling to understand the specific needs of individual employees, technology can be used to capture and analyze these diverging needs. After the analysis, the machines can help in the development of the most appropriate governance model that should be used.
The model must take into consideration the strategic goals of a firm and the manner in which the current workforce can be used to achieve these goals. Hackeling (2014) says that through such strategies, managers can find it easier to coordinate the activities of the employees. This technology can also be used to improve the skills and capabilities of the employees. Given that such technologies can be used to compare the capabilities of individual employees with the skills their roles require, the analysis can help in determining the gap. It can also propose ways through which such gaps can be addressed to ensure that the skills of the employees are aligned with the job requirements. In this paper, the researcher seeks to analyze the concept of machine learning and the manner in which it could be used to improve employee engagement and ultimately the performance of an organization.
Rationale for the Study
In the current competitive business environment, companies cannot afford to make constant mistakes. Achieving perfection is critical when it comes to managing challenges in the external environment. In such a highly disruptive external environment, managers still have to deal with very demanding employees who must be handled with care to ensure that they do not consider moving to other companies. The management role has become so complex that success is not guaranteed even among the most knowledgeable and experienced individuals. Alpaydin (2016) says that even in such a challenging environment, firms must find ways of achieving success. Machine learning promises to solve serious managerial problems that firms are currently facing. A manager alone cannot accurately assess the capabilities and needs of every individual employee. However, using information technology, it is now possible to understand the areas of improvement that each employee needs.
The machines are becoming superior management tools with capabilities that are beyond those of a human being. It is true, as Camastra and Vinciarelli (2015) put it, that it may not be possible for machines to replace people completely in the management role. However, they can make the work simpler and more accurate in the current challenging business environment. Machine learning can enable managers to understand their current environment and accurately predict the future in a way that would enhance the success of the organization. That is why this research is very important. It will explain how modern-day managers can increase their reliance on information technology to enhance their managerial functions. It will explain the new role of managers in an environment where technology has become a critical component of management.
Research Questions
When conducting research, it is important to focus on the goals that must be achieved by the end of the study. Coming up with clear research questions makes it possible to define the research focus. It acts as a constant reminder to the researcher about the data that should be collected and analyzed. It also eliminates cases where a researcher would collect massive amounts of data partially related or completely unrelated to the research topic. The following are the research questions that the study seeks to answer:
What are the main challenges that managers face in coordinating and controlling the activities of the employees?
How can companies use the concept of machine learning to improve employee engagement, performance, and motivation?
How can managers use machine learning to improve their performance?
Theory and Hypothesis
Background
Employees are a very critical component of an organization. In fact, Hackeling (2014) says that employees are the wheels of an organization, without which it may not achieve the desired goals and objectives. Once the top management has formulated strategic goals and objectives, the mid-level managers translate them into actionable goals to be assigned to the employees. The success of employees in meeting those goals defines how successful an organization will become. Unlike in the past, when large corporations monopolized the market, Alpaydin (2016) says that today companies are struggling to manage stiff competition because of the liberalized global market. Customers understand that they have several options to choose from whenever they want to purchase a given product.
As clients become more demanding than they were before, firms find it necessary to improve their operational strategies. They have to ensure that they deliver improved quality products at lower costs than they previously did. They have to embrace efficiency to achieve sustainability in such a challenging business environment. Successful firms have learnt how to empower their employees so that they can help in delivering the desired value (Camastra & Vinciarelli, 2015). Managers have to learn how to align the skills of the employees with job requirements. These employees must also remain regularly motivated to improve their level of performance. It explains why employee management has become a critical area that leaders must take seriously.
People management is not as easy as it used to be in the past. Marsland (2015) says that managers find it difficult to keep their employees constantly motivated. The problem is caused in part by the diversity of most current workplace environments. In large organizations, which employ hundreds of employees, managers cannot give their workers personalized attention on a regular basis. It means that understanding their unique needs and capabilities is not easy. Factors that may be motivating to one section of the employees may not be the same as those preferable to another section. Bell (2015) says that satisfying Generation Y and millennial employees is posing another serious challenge to current managers. These employees are very demanding and do not hesitate to move from one organization to another at the slightest provocation. It means that a firm may invest heavily in training an employee only for him or her to move to another organization with the new skills learned. Hackeling (2014) says that whenever a firm commits its resources to train an employee, it must be capable of retaining him or her for a considerable period to get the returns on the investment. The realities that modern-day managers face have made it necessary for them to look at how technology can be used to improve their performance.
According to Pentreath and Paunikar (2015), the concepts of big data, artificial intelligence, and machine learning have been in existence for several decades. However, these concepts have become more relevant today than ever before. The challenges that managers face may not be addressed effectively without the assistance of technology. The decisions that they make have a significant impact on their organization. These managers cannot afford to make decisions that fail to deliver desired results. As such, their decisions must be based on facts. They need information that can help them understand a given pattern. They can then use such patterns to make a decision, having knowledge of the possible consequences of their choices. Developments made in the field of information technology have improved data collection, processing, storing, and sharing whenever it is necessary.
Managers can now access relevant information that can inform their decisions and guide them whenever it is necessary to make choices. As Alpaydin (2016) explains, machines are not replacing managers. They are only taking over the role of decision-making when such decisions must be based on data that has to be processed and interpreted. The machines are reducing, and in some cases eliminating, decisions made without an understanding of the possible outcome. It creates a scenario where all the possible outcomes of each strategic choice are made clear before one of them is chosen. It makes it possible to know the most desirable option based on the organizational goals and the prevailing forces. Hackeling (2014) argues that such systems improve efficiency in decision-making, especially when one is presented with statistics that may influence the future of a firm.
Literature Review
In this section of the report, it is important to focus on reviewing what other scholars have found on this topic. Machine learning is becoming a critical area of research as organizations try to come up with unique ways of achieving success. According to Sugiyama and Kawanabe (2012), the responsibility of managers is increasingly becoming complex in modern organizations, and it is necessary to find ways of making it more effective. Machine learning promises to solve most of the problems that managers face when it comes to making decisions about the future based on large data. Looking at what scholars have found on this topic will help in identifying research gaps and coming up with hypotheses, which can be analyzed using primary data collected from the field.
Understanding the Concept of Big Data
According to Marsland (2015), big data refers to “a set of data that is so complex and voluminous that it cannot be dealt with using traditional data processing applications.” Data can be classified as such based on three premises: volume, variety, and velocity (Christiano & Zhao, 2016). When the volume of data is very large, it may not be possible to analyze it using traditional methods. For instance, the Abu Dhabi National Oil Company (ADNOC) has over 25,000 employees whose activities must be monitored on a daily basis. Information about the daily performance of each of these employees is very voluminous, and a manager using analogue systems cannot collect and analyze it in time to make a critical decision. In fact, it may not be easy to collect data from such employees manually. Collection of such data requires a computerized digital system that captures their performance against the set goals and objectives. Such a manager will need assistance to summarize the daily and monthly performance of each of these employees in a digital format. It is only through that approach that it will be possible to have an accurate understanding of the performance of these employees. At that point, the manager will have a clear picture of the employees who are underperforming, and a decision can be made on what needs to be done to address the weaknesses.
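A minimal Python sketch of this "volume" problem is given below: summarising daily performance records for thousands of employees into monthly figures a manager can act on. The file name and column names are illustrative assumptions, not ADNOC data.

import pandas as pd

# One row per employee per day: employee_id, date, units_completed, target
records = pd.read_csv("daily_performance.csv", parse_dates=["date"])

monthly = (records
           .assign(month=records["date"].dt.to_period("M"),
                   attainment=records["units_completed"] / records["target"])
           .groupby(["employee_id", "month"])["attainment"]
           .mean()
           .reset_index())

# Flag employees averaging below 80% of their monthly target
underperformers = monthly[monthly["attainment"] < 0.8]
print(underperformers.head())

The point is not the specific threshold but that an aggregation of this kind, impossible to keep up with manually at this scale, reduces to a few lines once the records are digital.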
Variety is the second factor that defines how big a given data set is within an organization. Large multinational corporations such as General Electric and Samsung that operate in numerous industries across the world receive a wide variety of data on a daily basis. In most cases, the top management unit may be needed to make decisions about the wide variety of data presented to them (Cleophas & Zwinderman, 2015). It may be information about employees, customers, shareholders, strategic partners, competitors, suppliers, government agencies, environmental watch groups, or any other relevant party. The management unit may have a short time to process and respond to the needs in each of these areas. Their decisions may have a significant impact on the overall performance of the firm. It is not humanly possible to process such a variety of data and make decisions within that short period. Even if the top management engages their junior officers, serious mistakes cannot be avoided if it is done manually. The digital technology platform makes it possible for these managers to have such data processed within a short time (Zhang & Ma, 2012). By stating the possible choices that have to be made, the manager will be able to predict the consequence of each action plan. It makes it possible to know the most appropriate approach that should be taken within such a short period.
The third factor that makes a set of information be classified as big data is velocity. The speed with which new information streams in and the urgency with which it has to be processed define how the management must act (Dehmer & Basak, 2012). Large corporations such as ADNOC constantly have to deal with various issues in their many departments. For instance, the customer requests that such a firm must deal with come in at very high velocity. It is the responsibility of the sales department to ensure that information coming from the clients is processed and acted upon within the shortest time possible. The information may be about changes in an order, a complaint about a previous order that needs to be addressed, the cancellation of an order, the creation of a new order, the existence of a new product in the market that is affecting sales of the company's product, or any other relevant information.
The sales manager must process and act upon that set of information streaming in at high speed within the expected time. It is not possible, even for the most effective manager, to deliver the expected outcome without the help of technology. It requires the manager to use information technology to process such data and give the necessary directives on how decisions should be made to address the concerns raised. Bell (2015) warns that when handling customer needs, efficiency and effectiveness are critical factors that cannot be ignored. The speed with which an issue raised by a customer is addressed determines whether they will remain loyal to the firm. Marsland (2015) says that it is not just about a speedy response to the issue raised. The response should be in line with the expectations of the customer for them to see that their needs are addressed.
According to Hassanien and Gaber (2017), big data is gaining popularity in predictive analytics. The market has become very unpredictable as emerging technologies continue to disrupt the normal operations of companies. When planning, Alpaydin (2016) says, a number of assumptions are always made about the future. These assumptions must be made because, at the time of planning, it is not clear what shall become of the future environment within which a firm operates. The assumptions that a company makes define the approaches and strategies that it will embrace. If wrong assumptions are made, it is likely that the strategies proposed may fail to yield the desired outcome. In such cases, the sustainability of such an organization may be seriously compromised.
It means that managers cannot afford to make assumptions without any proper basis. The assumptions must be based on a given pattern and must hold true for such a company to achieve success using the set strategies. It explains why companies have now embraced the approach of using data to make assumptions and decisions. Major corporations are now using information technology to process large data sets to come up with predictions about the future. By critically analyzing the past and present, Hackeling (2014) says that it may be possible to predict the future. One must understand that most of these trends do not move in a straight line. Sudden changes may occur that completely change the expected direction. Using modern technology, such possible disruptions can be explained, and an alternative approach adopted to deal with them as they emerge.
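As a minimal illustration of fitting a past pattern and extrapolating it, the sketch below fits a straight-line trend to past quarterly sales and projects the next quarter. The figures are invented, and a real deployment would use far richer models than a straight line, precisely because trends rarely stay linear.

import numpy as np

quarters = np.arange(1, 9)                                  # the last eight quarters
sales = np.array([102, 108, 115, 113, 121, 127, 131, 138])  # hypothetical sales ($m)

slope, intercept = np.polyfit(quarters, sales, deg=1)       # least-squares linear trend
next_quarter = slope * 9 + intercept
print(f"Projected sales for quarter 9: {next_quarter:.1f}")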
Machine Learning in Management
According to Farrar and Worden (2012), artificial intelligence and machine learning are sometimes used interchangeably, which should not be the case. Artificial intelligence refers to the ability of machines to undertake tasks in a smart way, while machine learning is a concept based on giving machines access to data so that they can learn for themselves. In machine learning, greater freedom is granted to the machines by allowing them to undertake tasks that would otherwise be done by human beings. Marsland (2015) considers machine learning as a vehicle that is currently driving artificial intelligence so that machines can become more independent of human control and be able to deliver the expected outcome. Alpaydin (2016, p. 87) says, “rather than teaching computers everything they need to know about the world and how to carry out tasks, it might be possible to teach them to learn for themselves.” Machine learning emphasizes the need not to teach computers everything that they need to do, but instead to code them in a way that they would think and act like a human, and then offer them access to information that would inform their decisions (Sjardin et al., 2016). Scientists believe it is a better way of using technology. Machines have proven to be better than humans in undertaking activities entrusted to them in terms of speed, accuracy, uniformity, and other qualitative and quantitative factors. It is believed that they can also be more effective in making decisions if they are allowed to learn and act independently without significant human influence.
According to Ivezić et al. (2014), human beings are not perfect when it comes to decision making. A manager may compromise the success of an organization because of the desire to please a close friend or a relative. A wrong decision may be made deliberately because a leader is biased in a given manner. Corruption is another critical factor that often leads to wrong management decisions. When a manager is corrupt, it is expected that his or her decisions cannot be for the good of the organization. They will serve personal interests at the expense of the organization and the people who deserve better. Compassion is another factor that may impair the decision-making process of managers. Bell (2015) says that there comes a time when difficult choices have to be made.
Such a decision may involve eliminating a very popular employee within the firm because of underperformance or negative influence in the organization. As a human being, it is natural to rethink such decisions, and sometimes one may opt against them. However, Marsland (2015) says that the modern competitive business environment cannot afford weak minds. When hard choices have to be made, then the sooner they are made, the better for the organization (Sra et al., 2012). Success in the market requires managers who are focused only on goals, not personal interest. The forces may be so unforgiving that even if the manager's mistake was caused by misinformation or a lack of information rather than a deliberate decision, the consequence can still be devastating. As Hassanien and Gaber (2017) say, management in the modern business environment requires perfection.
Given that human beings cannot be perfect, machines come in as an alternative meant to address the weaknesses of people. According to Zhang and Ma (2012), when machines are used to make decisions, every stakeholder will be aware that the set standards must be met strictly, because no one will be favored by the machines. If it is about the performance of the employees, the machines will give accurate data on how each one has delivered based on the set goals. When such an organization has a system where underperformers have to be eliminated, it will be clear to everyone who should be sent home (Bell, 2015). Such systems eliminate blame games when an employee is laid off against their wish. Before embracing such a system of people management, everyone should be informed about the new assessment approach and the standards expected of everyone.
At the end of the set period, every employee will see the performance record and the suggestions about their position in the firm made by the machine. It creates a sense of responsibility among the employees because they know that acts of favoritism do not exist. The machine also eliminates challenges associated with corruption, bias or stereotyping, and mistakes caused by sympathy when it is not necessary. According to Marsland (2015), machine learning promises to address the people management challenges explained in McGregor's Theory X and Theory Y. The problem with Theory X is that it is so pessimistic that managers tend to spend a lot of their managerial time looking for mistakes of the employees so that they can be punished. On the other hand, Theory Y is so optimistic that it ignores the need for the manager to monitor the activities of the employees with keenness. The new concept will promote fairness in people management, where every individual is judged based on performance and capabilities.
Hypothesis
The information from secondary sources demonstrates that machine learning and big data offer a bright future to the field of management (Bell, 2015). Currently, people are reluctant to embrace this new concept because it is not properly understood. It is also yet to receive maximum support in many organizations because of the fear that it may replace people in the role of management. As Alpaydin (2016) says, power can be very addictive, and once one gains it, there is always the desire to retain it at all costs.
It is true that the machines will replace people in the decision making process if the concept of machine learning is fully embraced. It is also true that these machines will address human weaknesses such as corruption, favoritism, excessive compassion, and fear when it comes to making critical decisions. These weaknesses must be eliminated in the modern business environment, and that is why it may not be possible to avoid machine learning and artificial intelligence for long (Marsland, 2015). Firms that continue to resist this new concept may find themselves struggling to achieve success in the market. This information from the literature review has informed the formation of the following hypothesis.
H1. The use of machine learning and big data is a solution to improve a company's performance by monitoring, analyzing, and readjusting efforts for employee engagement.
Methods
In this section of the report, the focus is to explain the methods that were used in collecting and analyzing data from various sources. The researcher collected data from two main sources. Secondary sources of data were very important in providing background information and focus for the research. Secondary data came from books, journal articles, and reliable online sources. They formed the basis of literature review and informed the research hypothesis. Primary data was collected from sampled respondents and it helped in answering the research questions and confirming the hypothesis.
Research Design
The overall strategy that a researcher uses in a given study should be based on the goals that should be achieved in the project. According to Sugiyama et al. (2012), choosing the right research strategy is critical in ensuring that the right data is collected and analyzed to inform the conclusion and recommendations of the study. In this research, the primary goal was to determine how machine learning and big data could be used to improve a company's performance by monitoring, analyzing, and readjusting efforts for employee engagement. To achieve the set goal, the researcher chose to use both primary and secondary data. Primary data was collected from a sample of 100 respondents through a survey, as discussed below. Once the data was collected, a series of analyses was conducted to come up with findings to confirm the research hypothesis and to respond to the research questions. Descriptive statistics were considered important in explaining the relevance of machine learning in an organizational setting.
The data collected was analyzed mathematically to help bring out the views of the respondents. An Excel spreadsheet was used to facilitate the statistical analysis. The findings from the analysis were presented in charts and graphs. Other than the statistical analysis, it was also important to conduct a qualitative analysis in the research. Some of the respondents are in managerial positions, and they have experienced the relevance of machine learning in their organizations. A section of them feels that this new concept is very important and can improve their managerial responsibilities, while others are skeptical towards it. Capturing their varying views through qualitative analysis was considered important. It helped in explaining why machine learning is yet to be widely embraced in the country despite its increasing relevance as a management tool.
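The same descriptive tabulation and charting could equally be scripted in Python rather than Excel; a hedged sketch is given below. The response categories and counts are placeholders standing in for the survey spreadsheet, not the study's actual figures.

import pandas as pd
import matplotlib.pyplot as plt

responses = pd.DataFrame({
    "challenge": ["Favoritism", "Limited information", "Corruption", "Pressure from superiors"],
    "respondents": [38, 26, 20, 16],   # hypothetical counts out of 100
})

responses["share_pct"] = 100 * responses["respondents"] / responses["respondents"].sum()
print(responses.describe())            # basic descriptive statistics

responses.plot.bar(x="challenge", y="share_pct", legend=False)
plt.ylabel("Share of respondents (%)")
plt.tight_layout()
plt.savefig("figure1_challenges.png")  # a bar chart analogous to the report's figures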
Sample and Data Sources
Machine learning is an emerging concept in this country, and it was important to collect data from respondents who have been affected by it or have heard about it in their career. A sample of 100 participants was selected to take part in this study. The researcher used simple random sampling to identify the participants. The respondents were selected based on their willingness to be part of the study, their availability, and their knowledge of the concept of machine learning. After sampling the respondents, the researcher used an online survey to collect data from them. It was considered an appropriate approach because of the limited time within which the research project had to be completed.
Procedure
As explained above, the researcher used an online survey to collect data from the respondents. The first procedure was to contact the individuals believed to have the needed knowledge about machine learning. They were contacted by phone, and the researcher explained their role in the study and the significance of the project. They were informed that they had the liberty of withdrawing from the study in case they considered it necessary (Murphy, 2012). The researcher then e-mailed the questionnaires, which had been prepared prior to the data collection process, to them. They responded to the questions and then e-mailed back the completed questionnaire. The researcher then analyzed the data using both quantitative and qualitative techniques.
Measure
In this paper, the independent variable is the use of machine learning. The dependent variable is the people management role in an organizational context. The focus is to measure how machine learning (the independent variable) affects people management (the dependent variable) within an organization. By using statistical methods, it will be possible to quantify the relationship. The researcher will be able to explain the strength of the relationship between these variables. If it is determined that there is a close relationship between these variables, then the study will have confirmed the set hypothesis.
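One standard way such a relationship between two categorical variables can be quantified is a chi-square test of independence; a minimal sketch follows. The contingency counts are invented for illustration and are not the study's survey data.

from scipy.stats import chi2_contingency

#                 improvement reported   no improvement reported
observed = [[42, 8],    # respondents whose firms use machine learning
            [22, 28]]   # respondents whose firms do not

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")   # a small p-value suggests the variables are related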
Analysis and Results
Descriptive Statistics
In this section of the report, the researcher will provide descriptive statistics based on the information that was collected from the respondents. The three questions that were set will be answered in this section based on the primary data. The hypothesis will also be tested to determine whether the respondents believe machine learning and big data can be used to improve a company's performance by monitoring, analyzing, and readjusting efforts for employee engagement.
What are the main challenges that managers face in coordinating and controlling the activities of the employees?
The first question focused on capturing the main challenges that managers are currently facing in their role of coordinating and controlling employees. In the literature review, it was noted that management in the modern-day organizational setting has changed significantly. Managers face unique challenges that require unique solutions. Figure 1 below identifies major challenges that managers have to deal with in the current organizational setting.
The majority of the respondents felt that favoritism is one of the biggest challenges that managers have to deal with in their organizations. It is common to find cases where a manager feels compelled to offer special treatment to a section of the employees for various reasons. Sometimes the relationship between the manager and some employees makes it difficult for the manager to make a decision that would be unfavorable to these employees. The decisions and actions of such a manager will be compromised. Instead of issuing warnings or even punitive measures as would be expected when these employees fail to perform, they will prefer to ignore such issues for fear of ruining the relationship. Such actions may have a significant impact on the overall performance of an organization because other employees will be demoralized by these acts of favoritism. Compassion is a desirable attribute that every person should embrace to lead a fulfilling life, but it poses managerial challenges. Managing people is a very taxing role that may force one to make tough, unpopular decisions for the sake of the firm. When one is so compassionate that he or she cannot afford to lay off an employee, then such a person may not be a good manager.
The respondents also noted that limited access to information is another challenge that managers sometimes face in their respective organizations. For a manager to make a decision, he or she needs information that would guide them in such processes. With limited information, it is possible that the choices or decisions that such a manager makes will not be the most appropriate ones. Such mistakes may be very costly to an organization. Corruption was another factor that was identified in the study. Some respondents felt that when managers allow themselves to be bribed, they become compromised in their decision-making processes. They may ignore organizational values and interests for personal gain. The respondents also noted that in some cases managers are affected by pressure from their superiors. They may be forced to circumvent organizational rules and principles in favor of the interests of a section of the board of directors. Given the authority of these directors, the managers may have limited choice but to do as directed.
How can companies use the concept of machine learning to improve employee engagement, performance, and motivation?
The second question focused on determining how companies can use the concept of machine learning to improve employee engagement, performance, and motivation. The respondents were asked to state whether they believe machine learning can address the problem. Figure 2 below shows their response:
A clear majority (70%) of the respondents agreed with the statement that machine learning can improve employee engagement, performance, and motivation. It improves the process of collecting, analyzing, and sharing the information that is needed for operational purposes. Employees' performance will improve if they are allowed access to relevant information that would guide their actions. It makes it possible for them to engage with the management in consultative debates through improved systems of communication. When machine learning eliminates the weaknesses of management and promotes fairness within an organization, employees will be motivated. They will realize that their career growth and future at the firm will be based on their capabilities and effort rather than favoritism and corruption. Their improved commitment to the firm will enhance the overall performance of the organization.
How can managers use machine learning to improve their performance?
The last question focused on determining how managers can use machine learning to improve their performance. The respondents were asked to state whether they believe machine learning can improve the performance of managers. Figure 3 below shows their response.
The majority of the respondents (90%) agreed with the statement that machine learning can help improve the performance of managers. They explained that using machine learning, it is possible to eliminate major weaknesses associated with human nature, such as greed and favoritism. It also improves accuracy in the decision-making process, especially when it involves predicting the future.
H1. The use of machine learning and big data is a solution to improve a company's performance by monitoring, analyzing, and readjusting efforts for employee engagement.
The researcher then focused on analyzing primary data to confirm or reject the hypothesis that had been set. To confirm the hypothesis, the researcher asked the respondents whether they believe that machine learning and big data can improve a company’s performance. Figure 4 below shows their response:
As shown in the figure above, the majority of the respondents (94%) believe that machine learning and big data can improve a company’s performance. They stated that machine learning could improve the process of monitoring, analyzing, and readjusting efforts for employee engagement. Their response means that the above hypothesis is confirmed.
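As a hedged illustration only (the study itself relied on the descriptive percentages reported above), a one-sided split of this size can also be checked against chance with an exact binomial test, assuming 94 of the 100 sampled respondents agreed.

from scipy.stats import binomtest

result = binomtest(k=94, n=100, p=0.5, alternative="greater")
print(f"p-value = {result.pvalue:.2e}")   # far below 0.05, so the agreement is not a 50/50 coin flip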
Case Study: Machine Learning at General Electric
General Electric is one of the largest multinational corporations in the world, and it has its headquarters in Boston, Massachusetts (Alpaydin, 2016). The company has highly diversified its product portfolio, and as such, it operates in several industries. Managing hundreds of thousands of people and coordinating all operational activities at this company is a very complex process. The top management unit realized that the large size of the company and its diversification into many industries could not be a justification to underperform. As such, it has been working on ways of improving the performance of managers and that of the firm.
In early 2017, the chief executive officer of the company announced the commitment of GE's top management to embrace digital transformation. It was announced that the management was keen on building a digital organization to improve the performance of the managers and that of the employees. One of the areas where machine learning was initiated at that early stage was supplier and client data management. Instead of relying on traditional methods, the company opted to use machine learning and artificial intelligence to ensure that information from clients and suppliers is analyzed within the shortest time possible and that accuracy is maintained when making predictions. The move has significantly improved the performance of GE in these two departments (Raschka, 2015). The success registered at GE in its implementation of machine learning can be replicated in other organizations keen on embracing this new concept.
Discussion
The analysis of primary data and the review of literature show that one of the most important areas that cannot be ignored when researching machine learning is the concept of artificial intelligence. According to Murphy (2012), artificial intelligence refers to intelligence that is displayed by machines. Under normal circumstances, intelligence is expected to be displayed by human beings and animals. These living things need some level of intelligence to monitor their environment, get food, and avoid harm. However, advancement in the field of information technology has created a platform where machines can also learn and display intelligence.
The technology of computer programming has made it possible to empower computers with problem-solving capabilities. Traditionally, machines were expected to receive instructions from people to undertake a given task, and that has been the case for the many years that machines have been used to ease the way of life of humankind. These machines were never trusted with the ability to make independent decisions based on the forces in the environment. The fact that they are not living beings was thought to make it impossible for them to make rational decisions. They lack the compassion to act in a manner that a human being would (Raschka, 2015). However, the advancements witnessed in artificial intelligence promise to change this. It is now possible for machines to make decisions without getting instructions from people.
According to Hackeling (2014), one of the areas where machines have proven effective at making independent decisions is temperature management, such as the technology used in a refrigerator. The system can monitor itself and ensure that a given temperature is not exceeded. Fully automated cars with the capacity to drive themselves on designated roads have also been created. Such machines are empowered, through modern technology, to monitor the immediate environment, collect relevant data, process it, and act upon it within the desirable time. Although such smart cars are yet to be commercialized, they point to a future where technology will be used to make decisions that could previously be made only by human beings. Artificial intelligence is also actively used in the control systems of airplanes. The systems are designed to help pilots in making decisions. They warn them when they make wrong decisions and recommend the appropriate decisions based on the prevailing circumstances (Marsland, 2015). In the military, artificial intelligence is currently used to monitor the actions of enemies and to determine the appropriate actions that should be taken. It helps in eliminating cases where erratic decisions are made by officers due to fear or misinformation.
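The refrigerator example reduces to a simple feedback loop; a minimal sketch of such a self-monitoring control rule is given below. The readings are simulated values, not a real appliance interface.

def thermostat_step(temperature_c, cooling_on, setpoint_c=4.0, band_c=1.0):
    """Switch the compressor on above the band and off below it (hysteresis)."""
    if temperature_c > setpoint_c + band_c:
        return True      # too warm: start cooling
    if temperature_c < setpoint_c - band_c:
        return False     # cold enough: stop cooling
    return cooling_on    # inside the band: keep the current state

cooling = False
for reading in [6.2, 5.4, 4.1, 3.2, 2.8, 4.6, 5.3]:   # simulated readings (deg C)
    cooling = thermostat_step(reading, cooling)
    print(f"{reading:.1f} C -> cooling {'on' if cooling else 'off'}")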
The success of artificial intelligence in various fields means that it can also be applicable in the field of management. Managers are facing various challenges in their respective organizations that may require more than human capacity to come up with an effective answer (Alpaydin, 2016). For instance, employees form a critical component of any organization. Many companies spend substantial resources to train and improve the capabilities of their employees. After spending such resources, it would be necessary to retain all of them for as long as possible so that the organization can benefit from their skills (Murphy, 2012). However, sometimes it may be necessary to let go of some employees even after spending resources to improve their performance.
Theoretical Implications
It is important to look at the theoretical implications of this study. The primary and secondary data demonstrate that machine learning is a critical tool that cannot be ignored in the modern business environment. The study strongly suggests that some theories of management that have been used in the past may no longer be suitable in the modern business environment (Raschka, 2015). For instance, McGregor's Theory X and Theory Y may not be suitable in the modern business context. The outcome of this study shows that Theory X is too pessimistic while Theory Y is too optimistic to be applicable in the business environment today. Both theories also tend to generalize employee performance, something that the machine learning concept strongly opposes.
Managerial Implications
This study strongly suggests an urgent need for a major shift in the management approach of modern organizations. Firms cannot afford to make repeated mistakes, especially in relation to employee management. Sometimes a manager must make a difficult decision that is not popular with the majority within an organization, and such a painful decision depends on many factors that need to be taken into consideration. Without the aid of artificial intelligence, a firm may end up dismissing an employee who might have become a star within the firm if specific issues had been addressed (Alpaydin, 2016). For instance, such an employee may have a family problem that affects productivity at work. Using artificial intelligence, a manager can develop a comprehensive analysis of each employee within a short time to determine whether he or she is worth keeping. It reduces mistakes by presenting a detailed analysis of the key issues that are of interest to the firm.
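As a rough illustration of how such an analysis could be framed, the sketch below treats retention as a simple classification problem. The feature names, figures, and model choice are hypothetical and are not drawn from any study cited here.

```python
# Hypothetical sketch: scoring employees on retention risk from a few
# illustrative features. The column names and numbers are invented for
# demonstration; a real system would use the firm's own HR records.
import numpy as np
from sklearn.linear_model import LogisticRegression

# columns: years of tenure, last appraisal score (0-1), absence days this year
X = np.array([
    [1, 0.55, 12],
    [4, 0.80,  3],
    [2, 0.40, 20],
    [7, 0.90,  1],
    [3, 0.65,  8],
    [1, 0.35, 25],
])
y = np.array([1, 0, 1, 0, 0, 1])  # 1 = left the firm, 0 = stayed

model = LogisticRegression().fit(X, y)

# estimated probability that a new employee (2 years, 0.5 appraisal, 15 absences) leaves
print(model.predict_proba([[2, 0.5, 15]])[0, 1])
```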
Limitations and Future Research
When conducting this research, one of the main limitations faced was the difficulty of collecting data from managers in various top companies in the United Arab Emirates. To understand the current trend in the application of this modern technology in the country, it would be important to hold face-to-face interviews with several managers of top companies so that they can explain the internal practices in their organizations, the challenges they face, and the plans put in place to embrace machine learning in management. Future research should focus on how machine learning is being applied locally in companies such as ADNOC. Existing studies explain how it has been applied in American and European companies, but limited information exists about its application in the United Arab Emirates.
Conclusion
Machine learning is a concept that is increasingly gaining relevance in the modern organizational environment. Machines have long been used to make work easier, but in a controlled manner: in the past, they had to rely on the decisions made by the people who operated them. However, the time has come when machines can be allowed to make decisions independently, without human influence. Machine learning is a concept in which computers are programmed to work like a human mind, given access to data that facilitates learning, and enabled to make independent decisions. One of the areas in which machine learning is rapidly gaining relevance is the field of management. Managers face numerous challenges in their normal duties.
Sometimes they have to make unpopular decisions for the sake of the company. At times, they make decisions with limited information, leading to mistakes. Managers also have personal weaknesses, such as favoritism, greed, and nepotism, that sometimes impair their judgment when making critical managerial decisions. It is important to find ways of addressing these challenges to enhance the performance of managers in their respective organizations. Using machine learning makes it possible to manage people without being affected by the challenges mentioned above. It also enhances accuracy when predicting the future. Machine learning enables managers to process large and complex data sets within a short time and with high levels of accuracy. This new trend cannot be ignored in the modern competitive business environment.
References
Alpaydin, E. (2016). Machine learning: The new AI. Cambridge, MA: MIT Press.
Bell, J. (2015). Machine learning: Hands-on for developers and technical professionals. Hoboken, NJ: John Wiley & Sons, Inc.
Camastra, F., & Vinciarelli, A. (2015). Machine learning for audio, image and video analysis: Theory and applications. London, UK: Springer.
Christiano, T., & Zhao, L. (2016). Machine learning in complex networks. Cham, Switzerland: Springer.
Cleophas, M., & Zwinderman, A. (2015). Machine learning in medicine: A complete overview. Cambridge, MA: MIT Press.
Dehmer, M., & Basak, S. C. (2012). Statistical and machine learning approaches for network analysis. Hoboken, NJ: Wiley.
Farrar, C. R., & Worden, K. (2012). Structural health monitoring: A machine learning perspective. West Sussex, UK: Wiley.
Hackeling, G. (2014). Mastering machine learning with scikit-learn. Birmingham, UK: Packt Publishing.
Hassanien, A. E., & Gaber, T. (2017). Handbook of research on machine learning innovations and trends. New York, NY: Cengage.
Ivezić, Z., Connolly, A., Vanderplas, J. T., & Gray, A. (2014). Statistics, data mining, and machine learning in astronomy: A practical Python guide for the analysis of survey data. Princeton, NJ: Princeton University Press.
Marsland, S. (2015). Machine learning: An algorithmic perspective. New York, NY: Taylor & Francis Group.
Murphy, K. P. (2012). Machine learning: A probabilistic perspective. Cambridge, MA: MIT Press.
Pentreath, N., & Paunikar, A. (2015). Machine learning with Spark: Create scalable machine learning applications to power a modern data-driven business using Spark. Birmingham, UK: Packt Publishing.
Raschka, S. (2015). Python machine learning: Unlock deeper insights into machine learning with this vital guide to cutting-edge predictive analytics. Birmingham, UK: Packt Publishing.
Sjardin, B., Massaron, L., & Boschetti, A. (2016). Large-scale machine learning with Python: Learn to build powerful machine learning models quickly and deploy large-scale predictive applications. Birmingham, UK: Packt Publishing.
Sra, S., Nowozin, S., & Wright, S. J. (2012). Optimization for machine learning. Cambridge, MA: MIT Press.
Sugiyama, M., & Kawanabe, M. (2012). Machine learning in non-stationary environments: Introduction to covariate shift adaptation. Cambridge, MA: MIT Press.
Sugiyama, M., Suzuki, T., & Kanamori, T. (2012). Density ratio estimation in machine learning. New York, NY: Cengage.
Zhang, C., & Ma, Y. (2012). Ensemble machine learning: Methods and applications. New York, NY: Springer.
The topic of machine learning and its role in the work of managers is essential today due to technological development. Various factories, manufacturers, and other businesses implement different innovations in their working processes to make them more accurate, fast, and productive (Choi, Jung & Noh 2015, p. 50). Nowadays, the profession of a manager is not as popular as it was several decades ago because machines and robots can demonstrate better working results. The innovation outlined in the proposal will be examined at the meso-organizational level, as the significant changes in managers' occupations might be adopted by global companies that currently employ thousands of managers. The use of machine learning techniques is attractive to many businesses that cannot afford to train their employees because of the possibility of a significant profit loss.
The problem addressed by the proposed research is that some companies do not know whether it is better to invest their finances in information technologies or whether it would be more advantageous to spend the same amount of money on their employees' education. The purpose of the given study is to identify the role of information technologies and applications in modern production processes.
Theory and Hypothesis
The theoretical perspectives used in the proposed study will address the impact of information technologies on the management of professional activities and on the world in general. The scholarly literature necessary for developing appropriate statements for the given research will be taken only from reliable journals and books published by researchers who have studied the topic for an extended period. Unfortunately, few studies have been done in the area of machine learning in management, as this approach to production processes is new and still needs improvement (Jaiswal 2015, p. 74). One of the most important study questions is whether managers might be deprived of their professional activities because of new technologies. The hypothesis to be tested is that new technologies and applications can replace people in managerial positions.
Research Methodology
The primary research sample will include approximately one thousand participants employed by an organization that requires particular management activities to be applied in the working process. The sample will be divided into two groups: five hundred people will work with applications, and the other employees will be given recommendations by a professional manager. These people will be required to work in such conditions for three months. At the end, their productivity rates will be analyzed and compared with the help of such research instruments as questionnaires and observation.
Analysis of Results and Limitations
As mentioned above, the descriptive statistics acquired during the research will be analyzed to support or refute the hypothesis, as sketched below. This method is appropriate because it is used by the majority of scholars and yields accurate study results and outcomes (Perles-Ribes et al. 2016, p. 693). However, some limitations might emerge due to the participants' age differences: some people might be more comfortable working with a manager because they are used to such a work organization.
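For illustration only, a minimal sketch of the kind of descriptive comparison described above follows; the productivity figures are invented placeholders, not study data.

```python
# Illustrative sketch of the planned analysis: comparing the productivity of
# the application-assisted group and the manager-led group with descriptive
# statistics. The numbers below are placeholders, not collected data.
import statistics

app_group     = [78, 82, 75, 90, 85, 88]   # hypothetical productivity scores
manager_group = [72, 80, 70, 84, 77, 79]

for name, scores in [("application group", app_group), ("manager group", manager_group)]:
    print(name,
          "mean:", round(statistics.mean(scores), 1),
          "stdev:", round(statistics.stdev(scores), 1))
```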
Contributions and Outlets
Scholars who conduct further research in the sphere of machine learning might be interested in the results of this project. Managers' conferences would be the most appropriate outlet for sharing the study results. In addition, professional journals that specialize in information technology might use the research data for one of their issues.
Citations/References and Exhibits
As mentioned above, only credible and scholarly sources will be cited in the study article. All of the references will be structured according to a single citation style. Exhibits such as graphs and tables will help present particular statistical data.
Anticipated Challenges
Gathering results and interviewing the study participants might be difficult due to the large sample size. This issue will be addressed with a questionnaire that every member of the research will be asked to complete. According to the timeline, every employee will be required to work regular shifts during the first four weeks. The progress of both groups will be observed during the next four weeks. The participants will be asked to complete an interview to record precise results during the final month of the research.
Reference List
Choi, S, Jung, K & Noh, S 2015, ‘Virtual reality applications in manufacturing industries: past research, present findings, and future directions’, Concurrent Engineering, vol. 23, no. 1, pp. 40–63.
Jaiswal, S 2015, ‘Book review: common sense talent management: using strategic HR to improve company performance’, Vision: The Journal of Business Perspective, vol. 19, no. 1, pp. 73–75.
Perles-Ribes, J, Ramon-Rodriguez, A, Moreno-Izquierdo, L & Sevilla-Jeminez, M 2016, ‘Economic crises and market performance—a machine learning approach’, Tourism Economics, vol. 23, no. 3, pp. 692–696.
The purpose of this research paper is to conduct a review of concept drift with reference to machine learning. A concept is defined as a quantity that needs to be predicted, where the concept is unstable and changes over a certain period of time. Common examples of concepts are weather patterns, customer preferences, temperature, and behavioral changes. The underlying data distribution used in explaining a concept will also be subject to change as a result of the unstable nature of concepts. Such changes in the underlying data distribution cause models built on old data to become inconsistent with the new concept's data, which leads to the updating of the model. This creates a problem known as concept drift, which complicates the task of learning the new model and the new data that makes up the concept (Tsymbal 1).
Machine learning under concept drift involves learning a target that is shifting, or learning from time-changing data streams. It is also the learning of non-stationary environments with unstable concepts, so that the approaches used in dealing with concept drift problems converge on the final concept. Since the 1990s, various learning approaches have been developed and implemented to deal with the problem of concept drift as it has become common. Such learning approaches include the AQ algorithm and the STAGGER system, which were developed in the 1990s to deal with the problem of concept drift (Koronacki 23).
The discussion in this research paper will therefore focus on concept drift and machine learning by examining these two concepts.
Machine Learning
Machine learning is a branch of artificial intelligence as it involves the use of cognitive science, probability theory, behavioral science and adaptive control disciplines to determine changing behaviors of certain concepts. The major focus of machine learning is to identify and learn complex behavioral patterns that precede concept changes so as to develop intelligent decisions that are based on data. Machine learning involves the use of human cognitive processes when performing data analysis and also collaborative approaches that exist between the machine and the user (Bishop 2).
There are various types of machine learning that are used to achieve the desired outcomes of algorithms. These include supervised learning, unsupervised learning, semi-supervised learning, transduction, and reinforcement learning. In supervised learning the machine learns a mapping from inputs to labelled outputs; in unsupervised learning the inputs are clustered without labels; in reinforcement learning the algorithm learns from observations and feedback on its actions; and in semi-supervised learning both labeled and unlabeled examples of the target concept are used to generate an appropriate function. The other type of machine learning is referred to as transduction, where the learning algorithm tries to predict new outputs based on the training inputs and outputs as well as the testing inputs (Bishop 3).
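The toy sketch below contrasts the supervised and unsupervised settings just listed, assuming scikit-learn is available; the data points and model choices are illustrative only.

```python
# Sketch contrasting two of the learning settings described above: a
# supervised classifier that maps inputs to known labels, and an unsupervised
# clusterer that groups unlabelled inputs. Data values are toy examples.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]])

# Supervised: labels are provided with the inputs.
y = np.array([0, 0, 1, 1])
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[1.2, 1.0]]))          # -> [0]

# Unsupervised: only the inputs are given; the algorithm finds the grouping.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)
```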
The main theory used in explaining machine learning is computational learning theory, which focuses on probabilistic performance bounds for learning algorithms because the training sets are finite and uncertain in nature. This means that computational learning theory does not provide absolute guarantees about learning algorithms. Apart from performance bounds, computational learning theory studies time complexity and the feasibility of machine learning given the unstable nature of concepts. Computations are usually considered feasible under computational learning theory if they can be performed in polynomial time (Yue et al 257).
There are various types of machine learning algorithms used in machine learning activities. The most common is decision tree learning, in which decision trees act as predictive models: the tree maps observations to a general conclusion about the target item under consideration. Another commonly used machine-learning algorithm is association rule learning, which involves discovering relationships and links between variables that exist in large databases. The neural network algorithm, a computational model, processes information in the way biological networks do by using a connectivity approach to computation and simulation (Bishop 225).
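A minimal decision-tree sketch follows, assuming scikit-learn; the features (age, income) and labels are invented purely to show how a fitted tree encodes such rules.

```python
# Minimal decision-tree sketch for the predictive-model use described above.
# The features and labels are hypothetical.
from sklearn.tree import DecisionTreeClassifier, export_text

# columns: [age, income in $1000s]; label: 1 = bought the product, 0 = did not
X = [[25, 30], [35, 60], [45, 80], [20, 20], [50, 90], [30, 40]]
y = [0, 1, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "income"]))
```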
Genetic programming is another learning algorithm used in machine learning. It deals with the evolution of computer programs that can perform user-defined tasks, using operations modelled on biological evolution. Genetic programming can be seen as a specialization of genetic algorithms in which the individuals being evolved are themselves computer programs. Genetic programs are mostly used to optimize machine or computer programs by measuring a program's ability to perform a user-defined task. Bayesian networks are other commonly used machine learning models: they are graphical models that use probabilities to represent random variables and their dependencies. A Bayesian network is commonly used to capture the connection between the symptoms or signs of an illness and the illness itself; once the symptoms have been observed, the network can be used to estimate the probability of the presence of various diseases (Bishop 21).
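The short calculation below works through the symptom-to-illness reasoning with Bayes' rule directly; the prior and conditional probabilities are assumed values chosen only for illustration.

```python
# Hand-worked Bayes-rule sketch of the symptom/illness reasoning a Bayesian
# network performs. The probabilities are illustrative assumptions only.
p_disease = 0.01                 # prior probability of the illness
p_symptom_given_disease = 0.90   # sensitivity
p_symptom_given_healthy = 0.05   # false-positive rate

p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_healthy * (1 - p_disease))
p_disease_given_symptom = p_symptom_given_disease * p_disease / p_symptom
print(round(p_disease_given_symptom, 3))   # ~0.154: still unlikely despite the symptom
```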
Machine learning has a variety of uses in the modern, technological world. The most common applications include natural language processing, the detection of credit card fraud, syntactic pattern recognition technology, and the medical diagnosis of various types of illnesses through the analysis of symptoms. Machine learning is also used in the analysis of the financial market, in the creation of brain and machine interfaces for radiographic equipment, in the classification of DNA sequences and properties, in software engineering processes, and in the development of robot locomotion abilities. Machine learning is further used in structural health monitoring activities, bio-surveillance, and speech and handwriting recognition (Mitchell 2).
Concept Drifts
Concept drifts, as described in the introduction section of this research paper, are the problems caused by a change in the model used to describe the underlying data distribution of a concept. Concept drift can also be described as a phenomenon in which examples have legitimate labels at one time and illegitimate labels at another time. To explain this statement, Koronacki (26) uses the example of a cloud as a target concept, where concept drift occurs when the cloud changes its position, shape, and size in the sky over a certain period of time. With regard to Bayesian decision theory, the transformations of the cloud equate to changes in the form of the prior target cloud (Koronacki 26). Concept drifts have become common occurrences in the real world, especially when it comes to people's changing preferences for products and services.
Concepts are subject to change over time which means that they are unstable in nature. Such changes in the underlying data distribution models make the task of learning especially machine learning more complicated. Learning also becomes difficult if there are changes in the hidden context of the target concept which leads to concept drifts. The problem of handling concept drifts usually arises when it comes to distinguishing between the true concept and noise. Some machine learning algorithms might overreact to noise, misinterpreting the noise to be a concept drift while other algorithms might react to noise by adjusting to the changes very slowly (Perner 236).
Most of the research that has been conducted on concept drifts has been theoretical in nature where assumptions have been drawn to determine the kinds of concept drifts that lead to the establishment of performance bounds. Researchers such as Helmbold and Long (Stanley 2) have established bounds that are based on the extent of the concept drift which can be tolerated by assuming a more permanent drift. The extent of a drift is defined as the probability of two successive concepts being irreconcilable in a random variable. Other researchers such as Freund and Mansour, Barve, Long and Bartlett established the necessary bounds in determining the rate of concept drifts by sampling the complexity of an algorithm to learn the structure of a repeating sequence of concept changes (Stanley 2).
Algorithms used to detect concept drifts fall into two broad categories: single-learner trackers, which select the data that is relevant to learning the target concept (the data combining approach), and ensemble approaches, which formulate and restructure a set of base learners. The data combining approach is a conventional way of dealing with concept drift problems through the use of time windows that are fixed over the data streams. The time window uses the most recent data streams or batches to construct the predictive model. The problem with this approach is that a large time window is unable to adapt quickly to the concept drift, while a small time window is unable to track a target concept that is stable or recurrent (Yeon et al 3).
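A minimal sketch of the fixed-time-window idea just described follows; the synthetic stream, window length, and choice of a logistic regression model are assumptions made for illustration.

```python
# Sketch of the fixed-time-window approach: only the most recent batch of the
# stream is used to refit the model, so older examples are forgotten as the
# concept drifts. The stream here is synthetic.
from collections import deque
import numpy as np
from sklearn.linear_model import LogisticRegression

window = deque(maxlen=200)     # fixed window size; too small or too large both hurt
rng = np.random.default_rng(0)
model = None

for t in range(1000):
    x = rng.normal(size=2)
    # the concept flips halfway through the stream (a sudden drift)
    y = int(x[0] > 0) if t < 500 else int(x[0] < 0)
    window.append((x, y))
    if len(window) == window.maxlen and t % 100 == 0:
        X = np.array([w[0] for w in window])
        Y = np.array([w[1] for w in window])
        model = LogisticRegression().fit(X, Y)   # rebuilt from recent data only
```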
The optimal size of the window in the data combining approach cannot therefore be set unless the type and degree of the concept drift have been determined in advance. Widmer and Kubat, in their 1996 study of concept drift, incorporated the Window Adjustment Heuristic (WAH) to adaptively determine the size of the time window. Other researchers, Klinkenberg and Joachims, proposed an algorithm in 2000 for tracking concept drift with a support vector machine (SVM) while the target concept was continuously changing. Such methods ensured that the size of the time window could be determined (Dries and Ruckert 235).
While the data combining approach is able to select a subset of past data related to the new information, it is unable to define which past data streams relate to the new information. It is also unable to retain all or parts of the previous sets of data, making it an inefficient approach to managing concept drift problems, especially in machine learning. The ensemble approach, on the other hand, uses an ensemble strategy for learning in changing environments. Ensemble approaches such as boosting, bagging, and stacking have been known to produce more stable prediction models in static environments than the data combining approaches, which rely on single models (Yeon 4).
Ensemble approaches maintain a set of data descriptions and predictions that are combined through weighted voting so as to obtain the most relevant description of the new data. Methods that have been used to conduct the weighted voting include STAGGER, which maintains a set of concept descriptions and selects among them according to their relevance to the new data. Another method used in weighted voting is conceptual clustering, where stable hidden contexts are identified by clustering instances of the new concept that are similar to the hidden context. Compared with data combining approaches, ensemble techniques have been more effective in handling concept drift problems, and they are therefore more suitable for data streams and batches because they do not need to retain any previous data sets, as the data combining methods do (Tsymbal 3).
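The sketch below shows one possible form of such weighted voting; the base predictions and relevance weights are invented, and the scheme is a simplified stand-in rather than the STAGGER method itself.

```python
# Sketch of the weighted-voting idea: several base models vote, and each vote
# is weighted by how well that model has described recent data. The
# predictions and weights are invented for illustration.
def weighted_vote(predictions, weights):
    """predictions: list of class labels; weights: matching relevance scores."""
    totals = {}
    for label, w in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)

# three base learners trained on different past batches
preds   = ["spam", "not spam", "spam"]
weights = [0.2, 0.7, 0.4]              # recent relevance of each learner
print(weighted_vote(preds, weights))   # -> "not spam" (0.7 outweighs 0.2 + 0.4 = 0.6)
```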
Types of Concept Drift
The two most common types of concept drift that occur in the real world are sudden (instantaneous) and gradual drifts. A sudden concept drift occurs, for example, when an individual graduates from an institution of higher learning and finds himself or herself in a different environment that is full of monetary concerns and problems; another example is the changing preferences of consumers when they demand products or services that meet their constantly changing needs. A gradual concept drift occurs when a certain aspect changes over a gradual period of time, such as the wear of car tires and factory equipment, which might cause a gradual change in the production of outputs. Both the sudden and gradual concept drifts are referred to as real concept drifts (Tsymbal 2).
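The toy generator below illustrates the difference between the two drift types under assumed thresholds and change points; it is not drawn from any cited experiment.

```python
# Synthetic illustration of the two drift types above: a sudden drift flips
# the concept at one point in time, while a gradual drift mixes the old and
# new concepts over a transition period. Purely toy data.
import numpy as np

rng = np.random.default_rng(1)
T = 1000

def label(x, t, drift="sudden"):
    old, new = x > 0.5, x <= 0.5          # old and new target concepts
    if drift == "sudden":
        return new if t >= 500 else old
    # gradual: probability of using the new concept rises linearly from t=400 to t=600
    p_new = min(max((t - 400) / 200, 0.0), 1.0)
    return new if rng.random() < p_new else old

stream = [(x := rng.random(), label(x, t, "gradual")) for t in range(T)]
print(stream[0], stream[-1])
```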
Other types of concept drifts include the virtual concept drift which is defined as the need to change the current model due to a change in the data distribution. The hidden changes that exist in a certain context might cause a change in the target concept which might in turn cause a change in the underlying data distribution of the concept. If the target concept was to remain the same, the underlying data distribution might change to reflect changes to the concept which might create a need to revise the current model that is used in explaining the concept. This creates a virtual concept drift that necessitates a change in the current model (Tsymbal 2).
The major difference between a virtual concept drift and a real concept drift is that a virtual drift might occur in cases such as spam categorization, where the target concept stays the same while the data distribution changes. Virtual concept drifts ensure that the shifts in the data have been properly represented in the current model used to explain the underlying distribution. Virtual concept drifts, which are also known as sampling shifts, help in determining the types of unwanted messages that remain the same over a long period of time (Tsymbal 2).
Detecting Changes in Concepts
To effectively deal with the problem of concept drift, the changes that take place in concepts have to be suitably detected. The most common method used in detecting concept changes is information filtering, where data streams are classified according to whether they are relevant or irrelevant to the target concept. The main purpose of information filtering is to reduce the information load presented to a user to the items that might be of interest to them. Information filters are supposed to remove irrelevant information from the data streams to ensure that only the relevant information is presented to the user. Because concepts are unstable and constantly changing, information filters used in unstable environments have to consider classification accuracy to ensure that the concept changes have been properly documented (Lanquillon and Renz 538).
Information filtering is an important approach to detecting changes in the data stream of a drifting concept because it turns the problem into a classification task that can be solved with learning techniques such as supervised machine learning. The use of these techniques means that a given set of examples can be learned and then used to categorize new data streams. Supervised machine learning algorithms have proved to be an important technique for detecting changes in data streams because they rest on the assumption that the underlying data distribution of the old data is similar to that of the new data. The hidden context of the data streams changes over time, and it also changes as new data on the concept continues to emerge. The supervised machine learning technique ensures that changes in data streams are suitably detected and that the model is adapted to suit the new data (Lanquillon and Renz 538).
Another method that can be used to detect changes in a concept's data stream is the Shewhart control chart, which tests whether a single observation signals a change in the data stream. This approach assumes that the data stream has been divided into batches that are represented in chronological order. The value allocated to each batch is used to detect changes in the data stream, with each batch evaluated separately to determine whether any change has taken place. The Shewhart control chart therefore detects changes by observing deviations in the data batches (Lanquillon and Renz 539).
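A small sketch of a Shewhart-style check on batch error rates follows; the error values and the three-sigma limit are assumptions used only to show the mechanics.

```python
# Sketch of a Shewhart-style check on batch error rates: each new batch's
# error is compared with control limits derived from earlier batches, and a
# point outside the limits signals a possible concept change. Toy numbers.
import statistics

history = [0.08, 0.10, 0.09, 0.11, 0.10, 0.09]   # error rates of past batches
mean = statistics.mean(history)
sd = statistics.stdev(history)
upper_limit = mean + 3 * sd                       # classic three-sigma limit

new_batch_error = 0.21
if new_batch_error > upper_limit:
    print("possible concept drift: error", new_batch_error, ">", round(upper_limit, 3))
```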
Conclusion
This research paper has focused on the aspects of machine learning and concept drift. Machine learning has been discussed with regard to the various types of machine learning processes, the theoretical work on machine learning that exists, and the applications of machine learning in various processes. Machine learning is commonly used in artificial intelligence activities as well as in the development of various types of technology used in the real world, such as the diagnosis of diseases. The discussion has also focused on concept drifts by defining the term and identifying the various methods that can be used in dealing with concept drifts.
References
Bishop, Christopher M. Pattern recognition and machine learning. New York: Springer Science, 2006. Print.
Dries, Anton and Ulrick Ruckert. Adaptive concept drift detection. n.d. Web.
Yeon, Kyupil, Moon Sup Song, Yongdai Kim, Hosik Choi and Cheolwoo Park. Model averaging via penalized regression for tracking concept drifts. 2010. Web.
Yue, Sun, Mao Guojun, Liu Xu, and Liu Chunnian. Mining concept drifts from data streams based on multi-classifiers. Advanced Information Networking and Applications, vol. 2, pp. 257-263, 2007.
The bagging method improves the accuracy of prediction by using an aggregate predictor constructed from repeated bootstrap samples. According to Breiman, the aggregate predictor is therefore a better predictor than a single predictor (123). To obtain the aggregate predictor, ФA(x), replicate data sets {L(B)} are drawn by bootstrap sampling from the learning set L. The aggregate averages the single predictors ψ(x, L(B)) to improve the accuracy of prediction, especially for unstable procedures such as neural nets, regression trees, and classification trees. However, bagging reduces the efficiency of stable procedures such as the k-nearest neighbor method.
Bagging improves accuracy when used with classification trees on moderate data sets such as the heart and breast cancer data. In constructing the classifiers, the data set is randomly divided into a test set, T, and a learning set, L; a bootstrap sample, L(B), is then drawn from L, a classification tree is grown on it, and L is used for pruning. This procedure is repeated fifty times to give fifty tree classifiers, and the misclassification errors are averaged to improve accuracy. For larger data sets, the Statlog project, which groups classifiers by their average rank, shows that bagging increases the accuracy of prediction by greatly decreasing the misclassification errors. Bagging can also be used to improve the prediction accuracy of regression trees, where a similar procedure is used to construct the regression trees, followed by averaging of the errors generated by each repetition.
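A minimal bagging sketch in this spirit is shown below, assuming scikit-learn and its built-in breast cancer data; the number of bootstrap replicates and the train/test split are arbitrary choices, and the snippet is an illustration rather than a reproduction of Breiman's experiments.

```python
# Bagging sketch: bootstrap samples are drawn from the learning set, a tree is
# grown on each, and their predictions are combined (majority vote for
# classification). The data set is a built-in one, used only for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

single = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                           random_state=0).fit(X_train, y_train)

print("single tree:", round(single.score(X_test, y_test), 3))
print("bagged trees:", round(bagged.score(X_test, y_test), 3))
```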
Bagging is effective in reducing prediction errors when the single predictor, ψ(x, L), is highly variable. In numerical prediction, the mean square error of the aggregated predictor, ФA(x), is much lower than the mean square error averaged over the learning set, L. This means that bagging is effective in reducing prediction errors; however, this holds only for unstable procedures. Another way to test the effectiveness of bagging in improving prediction accuracy is classification. Classification predictors like the Bayes predictor give a near optimal correct-order prediction, but aggregation improves the prediction toward the optimal level. The learning set can also be used to form a test set to determine the effectiveness of bagging: the test set is randomly sampled from the same distribution as the original set, L. The optimal point of early stopping in neural nets is determined using the test set.
Bagging has some limitations when dealing with stable procedures, as shown by linear regression involving variable selection. The linear regression subset predictor is generated through forward entry of variables or through backward variable selection. In this case, small changes in the data cause significant changes in the selected subset of variables, so the procedure is not a good subset predictor. Using simulated data, the most accurate predictor is found to be the one that predicts subset data most accurately, and bagging shows no substantial improvement when the subset predictor is near optimal. Linear regression is a stable procedure; however, the stability of this procedure decreases as the number of predictor variables used is reduced, which makes the bagged predictors produce a larger prediction error than the un-bagged predictors. This indicates an obvious limitation of bagging: for a stable procedure, bagging is not as accurate as for an unstable procedure. As the residual sum of squares (m), which represents the prediction error, decreases, instability increases to a point at which the un-bagged predictor tends to be more accurate than the bagged predictor.
From the period of our ancestors, man has been consistent in trying to improve the quality of his life (Burges 28). This drive led to the development of basic tool forms, which our ancestors used for performing basic tasks such as digging and cutting (Burges 28). At this early stage, man still expended a great deal of physical energy while operating the early forms of machines that he had developed. With the discovery of other forms of energy, such as electricity and fuel energy, the stage was set for the development of a new generation of machines that required very limited input of human energy (Burges 28). However, at this stage, there was still wide-scale monitoring of machine processes by man. To reduce the role of man in machine processes, a generation of automated machines was born. The main purpose of automation has been to reduce the mental and physical participation of man in machine processes. The limitation of human participation in machine processes has been implemented through the incorporation of self-monitoring mechanisms in machine systems. In this direction, there has been a need to develop machines that have human-like aspects for the purposes of self-learning and control. The field of learning machines has therefore been growing considerably in the past two decades. Such machines are capable of utilizing circumstances that they have encountered in the past to improve their future efficiency (Burges 28). As will be seen in this paper, such a system has numerous benefits that cannot be overlooked by machine designers. An important direction that has emerged in the design of learning machines is multi-view learning, in which it is possible for a machine to view (understand) an input in a multi-dimensional manner (Burges 28). With the development of system management frameworks that utilize relational databases, some scientists have suggested combining multi-view learning with relational databases for the purpose of increasing the capabilities of machine learning. In this literature review, a range of developments in the field of machine learning and multi-view learning in relational databases is considered.
Machine learning can be understood as the process by which machines improve their capacity to function more effectively and efficiently in the future (Nisson 4). Such machines are able to adjust their software programs and their general structure for the purpose of improving their future performance (Winder 74). Such changes in the program and structure of machines are catalyzed by the environment in which the machines operate (Nisson 5). Machine learning is therefore an imitation of human intelligence, in which machines acquire some form of learning from their environment (Nisson 5). The environment of learning consists of a machine input, or a piece of information that a machine can respond to (Winder 74). Among the forms of learning that a machine can undergo is the process of updating its database information depending on the kind of inputs that it gets from its environment (Kroegel 16). This form of learning has inspired less interest from professionals in the machine learning field (Nisson 6). Of more interest to professionals in machine learning are impressive learning processes, such as when a machine that is capable of recognizing someone's voice performs better after processing repeated samples of speech from that individual. Therefore, we can think of machine learning as the process in which adjustments are implemented in the mechanism of machine actuators (implementers of given instructions) that perform duties (Kroegel 16). Such a mechanism of a learning machine is usually embedded with a form of intelligence (Nisson 7). Examples of duties that are normally performed by intelligent machines include the recognition of voices, the sensing of parameters in the physical environment, and predictive capacities, among many others (Nisson 8).
Many benefits can be accrued from the process of machine learning (Blum 92). An obvious benefit originating in machine learning is a possible capacity for humans to comprehend how learning occurs in man, hence finding an application among psychologists and educationalists, among others (Blum 92). When it comes to the field of design and manufacture of machines, very important benefits can be accrued from machine learning (Blum 93). Any engineer who has specialized in machine design is aware of the challenge that he or she may face while trying to develop a concise relationship that maps inputs into predetermined outputs (Blum 94). Although we may know the outputs that we should get from a given sample of inputs, we may be unable to understand the function that will generate those outputs for our system (Blum 94). One of the best ways to solve the problem of understanding machine functions is to allow a versatile system of machine learning to operate (Kroegel 20). By adopting the machine learning approach, we are able to design a machine with an inherent system that can approximate a function over some inputs for the purpose of giving us forms of outputs that are useful to us (Blum 95). Moreover, we may not be able to understand, and therefore design for, the complex web of interrelationships that generates machine outputs (Blum 96). Adopting machine learning helps in resolving the challenge of understanding complex interrelationships between outputs and inputs while still generating the expected outputs for us (Nisson 7). It is also true that the projected environment in which a machine will operate cannot be fully understood by a machine designer at the stage of designing the machine. Indeed, the environment in which computer-embedded machines operate is bound to change considerably with time (Nisson 8). Since it is not possible to design for each and every change that will occur, developing learning machines is obviously an excellent approach to undertake (Nisson 8).
In the process of developing and improving machine learning, machine learning engineers have adopted several approaches to obtaining sources of information on machine learning (Widrow 273). Among the important information sources that are applicable in machine learning is statistics (Widrow 273). Among the challenges encountered in the statistical approach is the difficulty of determining which data samples should be adopted, owing to non-uniformity in the probability distributions of the data samples (ReRaed 630). This problem extends to making it impossible to determine an output that is governed by an unknown function, and therefore impossible to map some points to their new positions (ReRaed 630). Machine learning itself has been adopted as an approach for resolving the problems encountered while dealing with statistical sources (Nisson 7).
Another approach adopted in machine learning is the use of what are commonly referred to as brain models (Nisson 8). Here, use is made of elements that have complex, non-linear relationships (Dzeroski 8). The non-linear elements employed in machine learning reside in networks that approximate those inherent in the human brain, that is, neural networks (Dzeroski 8). In the adaptive control approach, on the other hand, an attempt is made to implement a process that has unknown elements, which therefore need to be estimated for the process to complete (Nisson 12). In adaptive control, an attempt is made to determine how a system will behave despite the presence of unknown elements in the system (Dzeroski 8). The presence of unknown elements in a system mostly stems from unpredictable parameters that keep changing their values in some systems (Bollinger and Duffie, 1988). Other approaches that have been used in the study of machine learning include psychological models, evolutionary models, and what is commonly known as artificial intelligence (Nisson 12).
There are two kinds of environments in which the process of machine learning can occur (Dzeroski 6). The first environmental setting of machine learning is commonly referred to as supervised learning (Dzeroski 6). Here, there is at least some knowledge of the kind of outputs that we expect from a given source of inputs (Dzeroski 6). Such knowledge is obtained from an understanding of a function that governs a sample of values in the set of data that we wish to train on (Nisson 13). We therefore expect to obtain a relationship that governs the training sample, making the outputs of the fitted function true to the training set (Nisson 14). A simplified example of supervised learning is a process such as curve fitting (Nisson 14).
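A toy curve-fitting sketch of this supervised setting follows; the underlying quadratic and the noise level are assumed for illustration.

```python
# Toy curve-fitting sketch of supervised learning: a function is fitted so
# that its outputs stay true to the training set.
import numpy as np

# training set: inputs x and noisy outputs y from an unknown quadratic
x = np.linspace(0, 5, 20)
y = 2.0 * x**2 - 3.0 * x + 1.0 + np.random.default_rng(0).normal(0, 1.0, x.size)

coeffs = np.polyfit(x, y, deg=2)       # fit a quadratic to the sample
print(np.round(coeffs, 2))             # approximately [2, -3, 1]
```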
Another kind of environmental setting in which machine learning can occur is unsupervised learning (Muggleton 52). Here, unlike in supervised learning, we only have a set of data samples that we wish to train on, but we do not have a function that maps the inputs in the available set to specific outputs in a way that we can determine (Muggleton 52). A challenge commonly encountered while handling this kind of training set is the difficulty of subdividing the set into smaller subsets so that we can understand the outputs (Muggleton 52). Interestingly, this kind of challenge forms part of the machine learning process (Muggleton 52). The value obtained from a given function is therefore related to a specific subset that takes in certain inputs (an input vector) (Muggleton 52). Unsupervised learning has found much applicability in forming classification systems whereby classified data is understood in a more useful way (Muggleton 52). As is normal, there are many instances where both supervised and unsupervised learning systems exist in parallel within machine learning systems (Muggleton 52). In designing a learning system, it is often appropriate to try to improve an existing function (Muggleton 52). This type of learning is normally referred to as speed-up learning (Nisson 14).
It will be useful to consider a number of important parameters that are commonly used in machine learning (Nisson 14). Among these parameters is the input vector (input set) (Nisson 14). An input vector may contain input elements of different natures (Nisson 14). Among the types of inputs that may be found in an input vector are real numbers, discrete values, and categorical values (Nisson 14). An example of a categorical type of input is information on the sex of a given person, which can be represented as either male or female (Nisson 14). Therefore, a given individual can have a representative input vector of the following format: (Male, Tall, History) (Nisson 14).
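The snippet below shows one common way to turn such a categorical input vector into numbers, assuming scikit-learn; the extra sample rows are invented so that the encoder sees more than one category per position.

```python
# Sketch of turning the categorical input vector mentioned above,
# (Male, Tall, History), into a numeric representation.
from sklearn.preprocessing import OneHotEncoder

samples = [["Male", "Tall", "History"],
           ["Female", "Short", "Biology"],
           ["Female", "Tall", "History"]]

encoder = OneHotEncoder().fit(samples)
print(encoder.transform([["Male", "Tall", "History"]]).toarray())
```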
Another important parameter that is useful in the study of machine learning is the output parameter (Nisson 15). In some instances, an output can take the form of real numbers (Nisson 15). However, in other cases, the output of a learning machine may take the form of categorical values (Nisson 15). Here, the resultant output from a learning machine is used to classify the value of its output to a given category (Nisson 15). Such an output is known as a categorizer; consequently, such an output may represent a label, a decision, a category or a class (Nisson 15). An output that is in a vector format may include both categorical values and numbers (Nisson 15).
Another parameter that one needs to understand in machine learning is the training regime (Nisson 15). Normally, learning machines contain a trainable set of data (Nisson 15). The batch method is one possible approach to training the data set (Nisson 15): all the elements in the set are used to fit a given function at the same time (Nisson 15). The incremental approach, on the other hand, applies a given function to each member of the set separately, so that all the elements contained in the trainable set are iterated through the function one at a time (Nisson 15). The incremental learning process can occur in a predetermined sequence or randomly (Nisson 16). In a common arrangement known as an online process, operations are performed on elements depending on their availability (Nisson 16); that is, operations are performed on the elements that have become available (Nisson 16). Such a system of operation is especially applicable when a preceding process feeds an oncoming process (Nisson 16). As in any other machine process, a machine learning process can be influenced by noise. One type of noise affects the function that operates on the trainable set (Nisson 17), while another type affects the elements contained in the input vector. For an efficient system of machine learning, it is important to evaluate the effectiveness of an implemented learning process (Nisson 17). A common approach used in evaluating supervised learning is to use a special comparison set generated for the purpose of comparison (Nisson 17): the outputs on the comparison set are compared with the outputs on the learning set in order to evaluate how effective the learning process has been (Nisson 17). Moreover, it is important to appreciate that for any learning activity to occur, some form of bias is necessary (Nisson 18). For example, in machine learning, we may decide to restrict our functions to a small set of values (Nisson 18), or we may restrict our functions to quadratic functions for the purpose of achieving the results that we desire (Nisson 18).
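The sketch below contrasts the batch and incremental regimes using a model that supports online updates; the synthetic data and the choice of a linear model are assumptions for illustration.

```python
# Sketch contrasting the batch and incremental training regimes described
# above, using a model that supports online updates. Toy data only.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Batch regime: the whole training set is used at once.
batch_model = SGDClassifier(random_state=0).fit(X, y)

# Incremental (online) regime: elements arrive one at a time.
online_model = SGDClassifier(random_state=0)
for xi, yi in zip(X, y):
    online_model.partial_fit(xi.reshape(1, -1), [yi], classes=[0, 1])

print(batch_model.score(X, y), online_model.score(X, y))
```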
Dietterich Thomas has described machine learning as the study of diverse approaches employed in computer programming for the purpose of learning (Thomas 7). The purpose of machine learning is therefore to solve special tasks that cannot be solved by ordinary computer software (Thomas 7). There are several examples of such complex tasks (Thomas 7). For example, there is a need to predict machine breakdowns in factories by employing systems that scan sensor outputs (Thomas 7). A learning machine is able to learn how recorded sensor inputs have related to machine breakdowns, thereby creating an accurate system that can predict machine breakdowns before they occur (Thomas 7).
In another way, we know that as much as human beings have some inherent skills, such as the ability to recognize unique voices, they cannot really articulate the steps of the process that they follow in employing those skills (Thomas 8). Such a reality has limited the capability of humans to employ their skills in some unique situations in a consistent manner (Thomas 8). By giving a learning machine some examples of sample inputs and corresponding outputs, the learning machine can take over and give us a set of consistent results in unique circumstances (Thomas 8). Moreover, some parameters in the environment of a machine keep changing in a non-predictable manner, such that it is only wise to employ machine learning in such environments (Thomas 8). Still, it has been desirable to tailor computer applications to the specific needs of an individual for effective functioning, hence the need for machine learning in the process (Thomas 8). Examples of areas with the characteristics described above, where machine learning has found an array of applicability, include statistical analysis, data mining, and psychology, among others (Thomas 8). For example, when performing data mining, what we normally try to do with the help of learning machines is to collect the important sets of data that are useful to us (Thomas 8).
The process of learning can be grouped into two categories: empirical and analytical learning. The distinct difference between analytical learning and empirical learning is that while empirical learning relies on some form of input from the external environment, analytical learning does not rely on the external environment (Thomas 9). At times it may not be easy to distinguish between empirical learning and analytical learning (Thomas 9). Take something like file compression, for example (Thomas 9); such a process involves both empirical and analytical learning (Thomas 9). Normally, the process of compression involves the removal of data that is repetitive or irrelevant in a file (Thomas 9), and such information can be retrieved from a kind of dictionary when it is required again (Thomas 9). This can only occur by studying how the file's contents are organized, hence a kind of empirical learning (Thomas 9). On the other hand, the process of compressing and decompressing files is built in, and therefore does not require information from the external environment, hence a type of analytical learning (Thomas 9).
Multi-View Learning with Relational Databases
Today, most systems used in data storage employ relational databases (Guo 5). Here, it is possible to store information that is interrelated through the use of foreign keys (Guo 5). A challenge that has been encountered is the difficulty of storing mining information in the format of relational databases (Guo 5). This challenge has mostly arisen from mining approaches that employ single-dimension data (Guo 5), for example neural networks (Guo 5). A difficult task presented by this kind of arrangement is the tedious effort of converting the multi-dimensional relations inherent in mining data into a one-dimensional format (Guo 5). To overcome this challenge, a number of applications such as Polka [14] have been developed to map mining data into a single dimension (Guo 6). One setback arising from these converting applications is the loss of relational information in the mining data (Guo 6). A considerable amount of information is therefore lost even as a great deal of data baggage is created (Guo 6).
An important approach emerging in the resolution of mining data problems is multi-view learning (Perlich 167). The approach of multi-view learning has been useful in tackling a range of issues in our world (Guo 6). Consider multi-view data such as [4, 11, 14, 21] (Guo 6). We may retrieve this kind of information in the above data such that: the retrieval of data information [4], the recognition of voice [11], and signature identification [21] (Guo 6). It is therefore possible to ingrain the idea of the relational database in multi-view learning. In multi-view learning, it is possible to obtain a specific desired view depending on a set of unique features present in a training set, say [14] (Perlich 168). It is therefore possible to learn diverse concepts from each of the views present in multi-view data (Perlich 168). Following this process, all the concepts that have been learned are combined to form the learning process (Perlich 168). To understand multi-view learning, consider a system that may be employed to group emails depending on their contents and subject (Perlich 168). While one system will learn to classify emails depending on their subject, another system will learn to classify emails depending on their content (Thomas 5). Finally, the learned concepts of the content learner and the subject learner are combined to perform the final classification of emails (Thomas 5). Therefore, for a multi-view system with n views, it is possible to obtain n related relationships that can be employed in multi-relational learning (Perlich 168). For the application of multi-relational learning in mining, we are able to obtain patterns from multidimensional relationships (Perlich 168). For each of the relationships, there is some specific information that is learned (Perlich 168).
Let us consider another kind of problem, where we need to identify whether a banking customer is a good customer or not (Guo 10). From the bank's database, we can obtain relational data about the customer (Guo 10). We can, for example, obtain the name of the customer from the client relation, credit card details from the account relation, and so on, and thus determine whether the customer is good or not (Guo 10). An important thing to note here is that for each of the database relations, there is a distinct view on whether the customer is good or not, and each view contributes to the final concept that will be learned about the customer (Guo 10).
In multi-dimensional learning, a set of instructions in the form of multi-view classification (MVC) is employed (Guo 10). The purpose of multi-view classification is to use the framework of multi-view learning to carry out data mining on the data of a multi-relational database (Guo 10). The multi-relational process can be understood as follows. In the relational database, there are identifiers for each of the characteristics found therein (Guo 10). These characteristics are linked to other dependent characteristics by the use of foreign keys (Guo 10). Once this is done, the second stage involves attributing specific functions to each of the characteristics linked to a specific identifier through a foreign key linkage (Guo 10). Such a direction is helpful in handling each of the many interdependent characteristics present in the concerned data (Guo 10). The next stage involves using each of the foreign-key-assigned characteristics as an input to a unique multi-view learner (Guo 10). Next, normal data mining approaches are applied so that they obtain each of the intended concepts available from each of the present data views (Guo 10). This precedes the final stage, where the learners are used in the development of a useful model that contains the needed information (Guo 10). Therefore, an MVC method that works in a framework of multi-view learning can be used to incorporate normal data mining in a relational database (Guo 10).
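A rough sketch of this multi-view flow is given below; the two views, their features, and the labels are invented, and the simple meta-learner stands in for whatever combination step an actual MVC system would use.

```python
# Rough sketch of the multi-view classification (MVC) flow described above:
# each view (attributes reached through one foreign key) trains its own
# learner, and a meta step combines the view-level predictions. The views,
# features, and labels here are invented for illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 200
view_a = rng.normal(size=(n, 3))        # e.g. attributes joined from one relation
view_b = rng.normal(size=(n, 2))        # attributes joined from another relation
y = (view_a[:, 0] + view_b[:, 1] > 0).astype(int)

# one learner per view
learner_a = DecisionTreeClassifier(max_depth=3, random_state=0).fit(view_a, y)
learner_b = DecisionTreeClassifier(max_depth=3, random_state=0).fit(view_b, y)

# combine the view-level predictions with a simple meta-learner
meta_inputs = np.column_stack([learner_a.predict(view_a), learner_b.predict(view_b)])
meta = DecisionTreeClassifier(max_depth=2, random_state=0).fit(meta_inputs, y)
print(meta.score(meta_inputs, y))
```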
Having described the above process in brief, let us consider important concepts that are employed in that process (Pierce 1). It would be useful to start by understanding relational databases (Pierce 1). In a relational database arrangement, there is a set of tables, represented as [T1, T2, T3, ...] (Pierce 1), and a set that represents the interrelations between the tables (Pierce 1). As in a normal database management system (Pierce 1), each table has at least one unique key called the primary key (Guo 16). The primary key represents a unique attribute that is common to all elements in a given table (Pierce 3). Other attributes that can be found in a table, apart from those covered by the primary key, include descriptive attributes and foreign attributes (Pierce 3). Foreign attributes are used to link table elements to attributes that are present in other tables (Nisson 23). Tables in a relational database are therefore linked with the aid of foreign keys (Pierce 3).
Having understood relational databases, let us now move on to understand the process of relational classification (Pierce 4). An important approach employed in machine learning is the classification of activities for the purpose of effecting targeted learning (Pierce 4). For example, consider a situation in which we have a target relation (U) in a given database (Pierce 4), and let us say this relation also has a unique class variable (Y) (Pierce 4). Here, the purpose of implementing relational classification would be to obtain a function F that gives an output for each of the elements in a given table (Pierce 4). The relationship described above can be represented in the function below:
F(Ptarget.key, A(Pk) − Akey(Pk)) = Y ……… (i)
where A(Pk) denotes the attributes of table Pk and Akey(Pk) denotes the key attributes of table Pk.
We can therefore go ahead and analyze the process of relational classification as described above (Pierce 4). The tables shown in figure 1 represent table interrelationships and can help us understand the process of relational classification (Guo 17). Looking at the target table, called the loan table, the attributes therein include account-id and status, among others. The key attribute of this table is loan-id (Guo 17), and the intended concept for learning is the status. We can see that the target table has been linked to other, foreign tables, including the order table (Guo 17). It is from the order table that we wish to create training views (Guo 18).
Looking at how the arrow is directed between the target table and the order table, we can see that the account-id element links them through a foreign key (Quinlan 19). Each training element built from this link will therefore contain the loan-id (the primary key, and therefore inherent in all fields of the target table) and the status element (the intended concept of learning) (Quinlan 19). In addition to these fields, the training view will also contain all the other fields of the order table (account-id, to-bank, to-account, amount, and type), with the exception of the order-id field (Quinlan 19). In SQL, performing the operations mentioned above would consist of the following (Quinlan 19). One would create a table object with the mentioned attributes from the loan table and the order table (Quinlan 19). The determining condition for the creation of the object would be limited to the rows where the account-id from the order table is equal to the corresponding account-id from the loan table (Quinlan 19).
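A minimal sketch of this join is shown below, using SQLite from Python. The table and column names (loan, order, account_id, status, and so on) are taken from the description of figure 1, the rows are invented for illustration, and the order table is named orders because "order" is a reserved word in SQL; this is not the original implementation, only an approximation of the query described above.

```python
import sqlite3

# Build a tiny in-memory database shaped like the loan/order tables in figure 1.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE loan   (loan_id INTEGER PRIMARY KEY, account_id INTEGER, status TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, account_id INTEGER,
                     to_bank TEXT, to_account TEXT, amount REAL, type TEXT);
INSERT INTO loan   VALUES (1, 10, 'good'), (2, 11, 'bad');
INSERT INTO orders VALUES (100, 10, 'AB', '123', 50.0, 'household'),
                          (101, 11, 'CD', '456', 75.0, 'insurance');
""")

# Training view: loan_id (primary key) and status (target concept), plus every
# order attribute except order_id, restricted to rows where the foreign key
# account_id matches on both sides.
rows = cur.execute("""
SELECT l.loan_id, l.status, o.account_id, o.to_bank, o.to_account, o.amount, o.type
FROM loan AS l JOIN orders AS o ON o.account_id = l.account_id
""").fetchall()
print(rows)
```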
Let us now describe the process of multi-view learning again for a better understanding. Each of the learners present in a multi-view learning environment is given its own group of data for training purposes. To understand how this arrangement applies to multi-relational classification, we need to consider an intended concept for learning, which is contained in the target table (Quinlan 19). The first thing that occurs in a multi-view environment is the relay of the intended learning concept to all other relations that are linked to the target table through foreign keys (Quinlan 19). All the elements that the implementing function requires from the target table are also transferred to the tables linked to the target table with the aid of foreign keys (Quinlan 19).
Let us now consider another kind of situation in the figure above (Pierce 10). The target table remains the loan table (Pierce 10), but the intended training data is obtained in a different way from the previous example (Pierce 10). The client table, from which the training data is drawn, has no direct foreign key linking it with the target table (Pierce 10). Instead, the client table links to the disposition table through the client-id element (Pierce 10), and the disposition table in turn links to the target table through the account-id element (Pierce 10). With the multi-view relationship in mind, this arrangement can be described as follows (Pierce 10). Elements with a client-id in the client table are connected to elements with the same client-id in the disposition table (Pierce 10), and elements with an account-id in the disposition table are linked with their counterparts with the same id in the target table (Pierce 10). The intended training data will therefore consist of two elements (birthday and gender) from the client table (Pierce 10) and two elements (loan-id and status) from the target table (Pierce 10). In SQL, the above process can be implemented in the following way. First, we create an object of four elements from the client, disposition, and loan tables (Pierce 10). We then set two preconditions that act as a threshold in the formation of the table object (Pierce 10). First, the account-id from the disposition table needs to be equal to its corresponding account-id in the target table (Pierce 10); likewise, the client-id from the client table needs to correspond to its counterpart in the disposition table (Pierce 10).
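The sketch below illustrates this indirect, two-hop foreign-key path (client to disposition to loan) in the same SQLite style as before. The column names disp_id, client_id, and account_id, and the sample rows, are assumptions chosen to match the description in the text rather than the actual dataset.

```python
import sqlite3

# A two-hop join: the client table has no direct link to the loan table, so it
# is reached through the disposition table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE loan        (loan_id INTEGER PRIMARY KEY, account_id INTEGER, status TEXT);
CREATE TABLE disposition (disp_id INTEGER PRIMARY KEY, client_id INTEGER, account_id INTEGER);
CREATE TABLE client      (client_id INTEGER PRIMARY KEY, birthday TEXT, gender TEXT);
INSERT INTO loan        VALUES (1, 10, 'good');
INSERT INTO disposition VALUES (500, 7, 10);
INSERT INTO client      VALUES (7, '1970-05-01', 'F');
""")

# Four columns: birthday and gender from client, loan_id and status from loan.
# The two join conditions are the "preconditions" mentioned in the text.
rows = cur.execute("""
SELECT c.birthday, c.gender, l.loan_id, l.status
FROM client AS c
JOIN disposition AS d ON d.client_id  = c.client_id
JOIN loan        AS l ON l.account_id = d.account_id
""").fetchall()
print(rows)
```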
In more difficult applications, such as those encountered in data mining, there is a complex web of interrelationships between tables (Rijsbergen 42). Some of these relationships can be broken down to one table with many links pointing to it, while others break down into many interconnecting links (Rijsbergen 42). What we have examined above is a simple case of many connections linking to one table with the aid of the primary key (Rijsbergen 42). It is possible to obtain a range of outputs from this kind of interconnection (Rijsbergen 42), and a difficulty is therefore presented in identifying the correct output (Rijsbergen 42). An approach that has been taken to resolve this challenge is to employ aggregation functions in an MVC setting (Rijsbergen 42). What an aggregation function does is unify all related outputs into a single output; it therefore acts as a summary of the properties presented by a range of outputs, in a single output format (Nisson 22). In unifying a range of outputs into a single output, an aggregation function uses the primary key that is present in the target table (Rijsbergen 42). Every table that is formed afresh is acted upon by an aggregation function to summarize its related properties in a single output format (Rijsbergen 42). The resultant outputs are what is employed for multi-view training (Rijsbergen 42): all the resultant multi-view outputs are used to train a corresponding number of multi-view learners (Rijsbergen 42). Examples of aggregation functions that are commonly used on data include COUNT, MAX, and MIN (Rijsbergen 42).
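As a rough sketch of this step, the query below collapses a one-to-many foreign-key relationship into one summarized row per target-table key using COUNT, MAX, and MIN. Table, column, and value names are illustrative only.

```python
import sqlite3

# One loan linked to several orders; aggregation turns the many orders into a
# single summary row keyed by the target table's primary key.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE loan   (loan_id INTEGER PRIMARY KEY, account_id INTEGER, status TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, account_id INTEGER, amount REAL);
INSERT INTO loan   VALUES (1, 10, 'good');
INSERT INTO orders VALUES (100, 10, 50.0), (101, 10, 75.0), (102, 10, 20.0);
""")

rows = cur.execute("""
SELECT l.loan_id, l.status,
       COUNT(o.order_id) AS n_orders,
       MAX(o.amount)     AS max_amount,
       MIN(o.amount)     AS min_amount
FROM loan AS l JOIN orders AS o ON o.account_id = l.account_id
GROUP BY l.loan_id, l.status
""").fetchall()
print(rows)   # e.g. [(1, 'good', 3, 75.0, 20.0)]
```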
Since it is the MVC algorithm that links multi-view learning with relational databases, it is important to evaluate the working of the MVC algorithm (Guo 32). The approach presented is intended to allow the framework of multi-view learning to take input data from relational databases (Guo 32). Three steps are normally involved in a typical MVC algorithm (Guo 32). First, a group of training data is obtained from a relational database (Guo 32). Here, just as we saw in relational classification, the groups of data for multi-view training are obtained from a relational database source (Guo 32). This process is normally implemented with the aid of foreign key connections (Guo 32): elements in the target table are associated with elements in other tables through foreign keys (Guo 32), and aggregation functions are applied to unify the range of related outputs in one-to-many relationships (Guo 32). Secondly, multi-view learners are set in motion to learn the intended concept from each of the resultant data groups (Guo 32). Finally, the trained learners created in the second step are employed to build an information model with useful knowledge (Guo 32).
The above is a summary of the three important steps in the implementation of the MVC algorithm (Srinivasan 300). Let us now evaluate these steps in more detail. In the first step, we intend to create the groups of data that will be used for training purposes (Srinivasan 300). These groups of training data are obtained from relational databases (Srinivasan 300): a group of multi-view training data is created for each element in the target table based on its relationships with other tables (Srinivasan 300). Since it is paramount to provide sufficient information for each of the multi-view learners, this approach of relating all elements in the target table to information from other tables is important (Srinivasan 300). Once this process is complete, aggregation functions are employed to solve the problem of one-to-many relations that may exist between other tables and the target table (Srinivasan 300). For example, in figure 1 there are about seven associations with the target table from other tables (Nisson 17). In this kind of scenario, the MVC algorithm will develop eight groups of training data in a multi-view format (Srinivasan 300): one training data group will be created for the loan table itself, while the rest come from the other tables (Srinivasan 300). Aggregation functions therefore present a kind of summary of all the multi-view training data (Sav 1099). As the aggregation functions act on the multi-view relationships, some elements from the tables are unified to create the training data groups (Sav 1099). In the end, it is the elements that have direct and indirect associations, through foreign keys, with the target table that are selected (Sav 1099). Therefore, it often happens that once aggregation functions have acted on the relationship data, the number of training data groups decreases (Sav 1099).
The second step implemented in a typical MVC algorithm is the creation of learners that learn from the training data formed in the previous stage (Sav 1099). The learning process therefore starts here, with an emphasis on the intended concept for learning (Vens 124). It is important to understand that each of the learners will form its own theory from its group of training data (Vens 124). A range of perspectives from different learners is therefore produced, which allows for a system of checking and unifying these diverse perspectives in the final step (Vens 124). In the final step of the MVC algorithm, we have a final learner (the meta-learner) that takes inputs from the learners of the previous step (Nisson 30). However, before the perspectives of the learners are used by the meta-learner, they first undergo a validation process (King 337). Here, a system is used to check the accuracy of the perspective presented by each learner (King 337). If a learner is found to have an error that surpasses the 50% mark, that learner is ignored (Nisson 28). The performance of the learners is thus evaluated to ensure that the information passed on to the meta-learner is accurate (King 337). Once the perspectives of the learners have been evaluated, the validated perspectives are fed to the meta-learner (King 337). The work of the meta-learner is to unify all the perspectives from the learners for the purpose of creating a useful model of information (King 337). Each of the perspectives presented to the meta-learner by a learner consists of a unique judgment in predicting an output (King 337). Let us consider the eight tables in figure 1. Our task is to find out the truth about the following situations concerning the condition (status) of a loan: whether a loan is good and unfinished, good and finished, bad and unfinished, or bad but finished (Guo 14). In our target table, there are over six hundred records that contain an attribute indicating the condition (status) of a loan (Guo 32). As indicated in figure 1, all of the tables have some form of association with the target table (Guo 32), underlain by direct and indirect foreign key relationships (Viktor 45). Therefore, all the other tables store some form of information about the target table (Viktor 45). We can consider three learning activities that can be undertaken here. First, we need to find out whether a loan is bad or good based on about 234 finished loans (Guo 32). Secondly, we need to find out whether a loan is bad or good based on about 682 records, irrespective of the finished status (Guo 32). Finally, we employ the transaction table to remove some positive examples from the target table in order to balance and enhance the learning process (Guo 32).
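The sketch below illustrates the general shape of these three steps with scikit-learn; it is not Guo’s implementation. The “views” are synthetic feature matrices standing in for the joined training sets, any learner whose cross-validated error exceeds 50% is dropped, and a simple logistic-regression meta-learner combines the validated learners’ out-of-fold predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_val_score

rng = np.random.default_rng(0)
n = 600
y = rng.integers(0, 2, n)                      # loan status: 0 = bad, 1 = good

# Each "view" stands in for one training set built from a joined table.
views = {
    "order_view":  np.column_stack([y + rng.normal(0, 1.0, n), rng.normal(size=n)]),
    "client_view": np.column_stack([y + rng.normal(0, 2.0, n), rng.normal(size=n)]),
    "noise_view":  rng.normal(size=(n, 2)),    # pure noise; may fall under the 50% bar
}

# Step 2: train one learner per view; keep out-of-fold predictions for the meta-learner.
validated, meta_features = [], []
for name, X in views.items():
    learner = DecisionTreeClassifier(max_depth=3, random_state=0)
    acc = cross_val_score(learner, X, y, cv=5).mean()
    if 1.0 - acc > 0.5:                        # error above 50%: ignore this learner
        print(f"dropping {name} (error {1 - acc:.2f})")
        continue
    validated.append(name)
    meta_features.append(cross_val_predict(learner, X, y, cv=5))

# Step 3: the meta-learner unifies the validated learners' predictions.
meta_X = np.column_stack(meta_features)
meta = LogisticRegression().fit(meta_X, y)
print("validated views:", validated, "meta accuracy:", meta.score(meta_X, y))
```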
One of the major challenges that have emerged in the study of machine learning is the difficulty of labeling data for training purposes (Muslea 2). Such a process is time consuming and tiring, and may also result in inaccuracies (Muslea 2). Muslea has argued that it is desirable and possible to reduce and/or eliminate the task of data labeling in machine learning applications (Muslea 2). In multi-view learning, it is possible for the different views of a multi-view learning machine to perceive a targeted concept in isolation (Muslea 2). For example, by using either an infra-red sensor or a sonar sensor, a robot can navigate around an approaching obstacle (Muslea 2).
An important approach to reducing the need for labeled data in learning machines is Co-testing (Muslea 4). In Co-testing, focus is placed on the usefulness of learning from mistakes (Muslea 4). In a situation where the views present conflicting outputs, the false view automatically reveals a mistake in the system (Muslea 4). The system therefore learns to adopt the correct label for targeted concepts by referring to the record of mistakes it has made (Witten 13). Through Co-testing, machine learning has moved toward active learning.
The active learning process in Co-testing proceeds in the following way. At the initial point, the system has an inbuilt array of a few instances from which it can infer a label for a targeted concept in each of its views (Witten 13). In a situation where its views output conflicting outcomes, a user inputs a new label for that concept (Muslea 17). Once this has been done, the Co-testing system automatically entrenches the new label into its database for use in identifying and labeling other instances in the future (Muslea 17). What transpires in this process is that, for instances where the learning views of a machine predict conflicting labels for a targeted concept (Yin 108), it must be true that one or more of the view learners has made a mistake in interpreting the learned concept (Muslea 17). The task of identifying the label is therefore taken to a user (Muslea 17). Once the user identifies and authenticates the targeted concept, the view that erred in identifying the label is provided with corrective information (Muslea 17). However, it is important to note that undesirable factors in a learning setting, such as noise, are capable of influencing the learning process (Yin 108).
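A toy sketch of this querying loop is given below. Two view classifiers are trained on a handful of labelled examples; unlabelled examples on which their predictions disagree (so at least one view must be mistaken) are sent for labelling, with an oracle array standing in for the human user. The data, seed, and loop size are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 300
# Two views of the same objects (e.g. sonar vs. infra-red readings for a robot).
true_y = (rng.normal(size=n) > 0).astype(int)
view1 = true_y[:, None] + rng.normal(0, 0.8, (n, 2))
view2 = true_y[:, None] + rng.normal(0, 0.8, (n, 2))

# A few labelled examples to start (both classes present); the dict plays the
# role of the system's label database, and true_y plays the role of the user.
pos = np.where(true_y == 1)[0][:5]
neg = np.where(true_y == 0)[0][:5]
labels = {int(i): int(true_y[i]) for i in np.concatenate([pos, neg])}

for _ in range(5):
    idx = list(labels)
    h1 = LogisticRegression().fit(view1[idx], [labels[i] for i in idx])
    h2 = LogisticRegression().fit(view2[idx], [labels[i] for i in idx])
    p1, p2 = h1.predict(view1), h2.predict(view2)
    # Contention points: unlabelled examples where the two views disagree.
    contention = [i for i in range(n) if i not in labels and p1[i] != p2[i]]
    if not contention:
        break
    query = contention[0]                      # ask the "user" to label one of them
    labels[query] = int(true_y[query])

print("labels acquired:", len(labels))
```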
It is possible to extend Co-testing so that the system can update its database without referring to the user (Wolpert 244), and hence learn to identify new labels automatically (Muslea 17). Such a system, which combines both automatic learning and human-assisted learning, is known as a Co-EMT system (Muslea 17). For a simplified understanding, the Co-EMT system can be seen as a composition of the Co-testing system and the Co-EM system (Muslea 17). In the first place, the system applies an approach whereby unknown labels are identified in accordance with how the other views in the system understand the concept (Wolpert 244). Thereafter, it updates its learning database in each of its views by training on the other views’ understanding of the concerned concept (Muslea 17). An obvious advantage for the Co-EM component in a Co-EMT arrangement is that the Co-EM is now able to focus on the informative data encountered by the system, unlike the previous arrangement, in which it selected examples in an unpredictable manner (Muslea 17).
As in any other arrangement of multi-view learning, one cannot escape the stage of data validation as a way of checking authenticity, minimizing errors, and thus forming a model that provides useful information for the system and its users (Watkins 54). Considering that a Co-testing system involves many targeted concepts that are not labeled, a more difficult challenge is presented at the stage of data validation if one employs the normal approach to data evaluation (Watkins 54). Noting that most validation procedures rely on positive and inverse learning for a given view, and that such a process is not required at all times, Muslea has suggested a new system that can be used to authenticate data (Muslea 17). Muslea considered particular instances in which positive evaluations produced low accuracies while inverse evaluations produced high accuracies on the same piece of multi-view data (Muslea 17).
In this direction, Muslea has suggested a new approach that can be used in the authentication of data: adaptive view validation (Muslea 17). Basically, this system is a kind of meta-learner that draws on previous circumstances and experiences of authenticating data to perform a new data authentication task (Muslea 17). The accumulated experience of the meta-learner is employed for one important purpose: to decide whether it is useful to incorporate the available views from the learners for effective learning (Muslea 17). Thus, a number of views from the learners may be judged unnecessary or inadequate for the learning process (Muslea 17).
Conclusion
Due to its obvious benefits, machine learning has progressively been incorporated into machines and manufacturing systems. In fact, forms of machine learning existed long before the emergence of experts in machine learning. The current trend is not just to design machines with human-like intelligence; the scope is wider, aiming at machines that can learn in ways that humans have been unable to (Guo 22). As knowledge of machine learning has accumulated, an array of approaches to the design of learning machines has emerged. Among the approaches that have been utilized are the statistical approach, brain models, neural networks, and adaptive control mechanisms, among others. With multi-view learning, whereby machines are given the capacity to learn from an array of different perspectives, the multi-view approach has been especially helpful in producing accurate outputs. Here, an array of outputs from a single input is possible because of the different views that are incorporated into learning machines. Since it is possible to produce a single output from all the views by means of approaches such as aggregation functions, the outputs obtained from learning machines are likely to be accurate (Guo 22). Also, since the design of learning machines has been partly driven by a need to compensate for human limitations, it is useful to apply learning systems to the analysis of complex data types such as those found in data mining (Guo 22). In this direction, an approach laden with promise is the marriage of multi-view learning and relational databases (Guo 5). Such an approach presents numerous benefits to the process of machine learning. For one, many people are already familiar with relational databases, and adopting relational databases in multi-view learning therefore offers a simplified approach to machine learning. Moreover, the use of the MVC algorithm and aggregation functions presents an opportunity to obtain useful models from complex data types such as mining data (Guo 22). It is interesting to note that there has been a consistent stream of new developments in machine learning. Among the significant directions is the use of the Co-testing approach to acquiring labels (Muslea 2). It is also interesting to observe that non-useful views may be eliminated during data analysis for the development of models (Muslea 10). It can therefore be expected that more effective and fruitful tools will continue to emerge in machine learning for the creation of more efficient learning machines.
References
Blum, Avrim, and Tom Mitchell. Combining Labeled and Unlabeled Data with Co-training. New York: McMillan, 1997. Print.
Burges, Cole. “A Tutorial on Support Vector Machines for Pattern Recognition.” Data Mining and Knowledge Discovery 2.1 (1994): 121-168. Print.
Dzeroski, Saso. “Multi-relational Data Mining: An Introduction.” ACM SIGKDD Explorations 5.1 (2003): 1-16. Print.
Guo, Hongyu. Mining Relational Databases with Multi-view Learning. Ottawa: University of Ottawa, 2003. Print.
King, Camacho. “Proceedings of the Fourteenth International Conference on Inductive Logic Programming.” Springer-Verlag 13.4 (2004): 323-340. Print.
Kroegel, Mark. “Multi-Relational Learning, Text Mining, and Semi-Supervised Learning for Functional Genomics.” Machine Learning 57.8 (2004): 61-68. Print.
Muggleton, Stephen, and Luc De Raedt. “Inductive Logic Programming: Theory and Methods.” The Journal of Logic Programming 20.5 (1994): 629-680. Print.
Muggleton, Stephen, and Cao Feng. Efficient Induction of Logic Programs. Tokyo: Ohmsha, 1993. Print.
Muslea, Alexandru. “Active Learning with Multiple Views.” New York: University of Southern California Press, 1993. Print.
Nisson, Nils. Introduction to Machine Learning. London: McMillan, 1995. Print.
Machine learning algorithms are very important in handling real-valued attributes. Other benefits derived from such algorithms include the handling of missing values and of symbolic attributes. Algorithms with such properties include K*, among others. K* is an instance-based learner that applies entropy as its distance measure, and it has the advantage of comparing favorably with other machine learning algorithms. Classifying objects is a task that has been undertaken over the years by researchers of all categories throughout the world. The task is very involved, as some data are noisy and may contain irrelevant attributes, which makes them difficult to learn from. To address this, several approaches and schemes have been tried, including decision trees, rules, and case-based classifiers, among others. Real-valued features have presented an enormous challenge to instance-based algorithms, mainly because of inadequate theoretical background. K* uses its distance measure to examine performance on different problems. There are two broad approaches to data mining: supervised and unsupervised methods. The former describes methods that identify target variables, while the latter does not; in essence, unsupervised algorithms identify structures and patterns in the variables. The main unsupervised methods used in data mining include clustering and association rules, among others. However, it is important to note that most data mining methods, as explained above, are supervised, meaning that they have target variables. These include those named above, such as decision trees, K-nearest neighbor, and neural networks, among others. This paper will explore two of those algorithms, namely the K* and K-nearest neighbor algorithms (Cleary 2-14).
K*
K* is an instance-based learner that uses a distance-based measure to classify variables by examining their performance on a variety of problems. Such learners classify instances by comparing them to a database of pre-classified examples. The process assumes that similar instances usually have similar classifications, even though this poses the challenge of defining what counts as similar instances and classifications. Instance-based learners include K*, K-nearest neighbor, and IBL, among others. Entropy as a distance measure uses information theory to compute the distance between instances; it relies on the intuition that this distance reflects the complexity of converting one instance into the other. This can be done by defining a finite set of transformations that map one instance onto another. Such a sequence of transformations is known as a program and is made prefix-free by adding a termination symbol at the end of each string. The length of the shortest string of transformations between two instances then defines the distance measure. This results in a distance that does not solve issues of smoothness, since it is very sensitive to small changes (Cleary 2-14).
K*, on the other hand, tries to reduce this problem of high sensitivity to change, and hence reduced smoothness, by summing over all the transformations that exist between any two instances. Since it is not obvious which transformations should be summed, a probability is assigned to each program; for instance, if the program has length c, its probability becomes 2^-c. This approach is related to the Kolmogorov complexity, and the summation satisfies the Kraft inequality. It can be interpreted as the probability of generating a program through random selection of transformations, or as the probability of arriving at an instance by a random walk from the first instance. The units of complexity are obtained by taking logarithms. This method has been found to give a realistic and robust measure, for example of the similarity between DNA sequences (Cleary 2-14).
K* Algorithm
In order to use K*, which applies this distance measure, one needs a way of selecting the parameters x0 and s, which apply to real and symbolic attributes respectively. One also needs to decide how to use the results of the distance measure to make predictions. As the parameters change, the distance measure also changes, and in the process some interesting behavior emerges. For instance, as s tends to 1, instances that differ from the current one receive very low transformation probabilities, while instances with the same symbol receive high transformation probabilities; in essence, the distance function then shows nearest-neighbor behavior. In the other case, where s tends towards 0, the transformation probabilities follow the symbols’ probability distribution. Varying s between these values causes a smooth transition between the two extremes (Cleary 2-14).
Similar behavior is seen in the distance measure for real-valued attributes. For example, the probability of an instance drops sharply with increasing distance when x0 is small, so the measure behaves like a nearest-neighbor measure. However, when x0 is very large, virtually all instances have similar transformation probabilities and are equally weighted. In both cases the effective number of instances varies between the nearest-neighbor extreme, where only the closest instances matter, and the extreme of N, where the instances are equally weighted. In this regard, the effective number of instances for any value of the parameters can be calculated as follows (Cleary 2-14).
n0 ≤ (∑b P*(b|a))^2 / ∑b P*(b|a)^2 ≤ N
Where:
N = total number of training instances (the middle expression gives the effective number of instances)
n0 = number of training instances at the smallest distance from the query instance a
b = index running over the training instances; the blending parameter, described below, selects an effective number of instances between n0 and N
The K* algorithm works by choosing a value for x0 (or s). To achieve this, it uses the blending parameter to select an effective number of instances between n0 and N, and then inverts the expression shown above (Cleary 2-14).
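The sketch below illustrates this idea numerically. It is only an approximation of K*, not the exact transformation probabilities: the probability of reaching a training instance is modelled as decaying exponentially with distance, scaled by x0, and x0 is then searched so that the effective number of instances lands a fraction "blend" of the way between n0 and N. The distances and the blend values are invented.

```python
import numpy as np

def effective_instances(dists, x0):
    # Rough stand-in for K*'s real-attribute case: the probability of reaching
    # a training instance falls off exponentially with distance, scaled by x0.
    p = np.exp(-dists / x0)
    return p.sum() ** 2 / (p ** 2).sum()

def choose_x0(dists, blend):
    # Pick x0 so that the effective number of instances sits a fraction `blend`
    # of the way between n0 (nearest-neighbour extreme) and N (equal weights).
    n0 = np.sum(dists == dists.min())
    N = len(dists)
    target = n0 + blend * (N - n0)
    candidates = np.logspace(-3, 3, 1000) * dists.mean()
    effs = np.array([effective_instances(dists, x) for x in candidates])
    return candidates[np.argmin(np.abs(effs - target))]

dists = np.abs(np.random.default_rng(0).normal(size=50))   # distances to 50 instances
for b in (0.05, 0.5, 0.95):
    x0 = choose_x0(dists, b)
    print(f"blend={b:.2f}  x0={x0:.3f}  n_eff={effective_instances(dists, x0):.1f}")
```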
K-Nearest Neighbor Algorithm
This type of algorithm is usually used for classification, although in some cases it is also utilized in prediction and estimation. It is a good example of instance-based learning, which stores the training data and uses it to obtain classifications for new, unclassified records by comparing them with similar records in the training set. In dealing with this classifier, several issues must be considered. These include how many neighbors should be considered, that is, the determination of k, since k represents the number of nearest neighbors; how to measure the distance to the nearest neighbors; and how to combine the information from all the observations. The algorithm also involves determining whether points should be weighted equally or not (Larose 90-105).
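A minimal sketch of this classifier with scikit-learn is shown below. The synthetic data stand in for pre-classified records, and the choices of k, distance metric, and uniform weighting simply mirror the decisions listed above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data standing in for pre-classified records in a training set.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k = 5 nearest neighbours, Euclidean distance, equal (unweighted) votes.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean", weights="uniform")
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
print("predicted class of a new record:", knn.predict(X_test[:1]))
```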
Weighted Voting
In most cases, it would be assumed that neighbors closest to the new record should count for more than those farther away and should thus be weighted more heavily. Analysts therefore tend to apply weighted voting, which also has the propensity to reduce ties. Several algorithms may be employed in the classification of objects. In K-nearest neighbor classification, one looks at the nearest similar records in order to classify a new record, or to predict or estimate its behavior. This can be utilized in situations such as administering drugs to patients: by using records with known classifications, one can classify an unknown object, or estimate or predict its behavior (Larose 90-105).
Use of K-nearest neighbor algorithm for prediction and estimation
The K-nearest neighbor algorithm may also be used for prediction and estimation, including for continuous-valued target variables. This can be achieved through the locally weighted averaging method, among others. In the same manner as classification, prediction and estimation are done by comparing the nearest similar neighbors. For instance, in a hospital setting, when we have records with known values, we can predict or estimate values for new records using those known ones; an example is the estimation of systolic blood pressure. In this case, locally weighted averaging would estimate blood pressure from the k nearest neighbors, using inverse distance weights (Larose 90-105).
For instance, the estimated target is y = Σ wi yi / Σ wi, where wi = 1 / d(new, xi)^2 for the records x1, x2, …, xk. This would give the estimated systolic blood pressure when calculated (Larose 90-105).
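A minimal sketch of this locally weighted average is given below. The patient features, blood pressure values, and choice of k are hypothetical; the code assumes the new record does not coincide exactly with a stored record (which would make a distance zero).

```python
import numpy as np

def weighted_knn_estimate(X_train, y_train, x_new, k=3):
    # Locally weighted averaging as in the formula above:
    # y_hat = sum(w_i * y_i) / sum(w_i), with w_i = 1 / d(x_new, x_i)^2.
    d = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(d)[:k]
    w = 1.0 / d[nearest] ** 2
    return np.sum(w * y_train[nearest]) / np.sum(w)

# Hypothetical records: (age, BMI) -> systolic blood pressure.
X_train = np.array([[25, 22.0], [40, 27.5], [58, 30.1], [63, 29.0]])
y_train = np.array([118.0, 126.0, 140.0, 145.0])
print(weighted_knn_estimate(X_train, y_train, np.array([50, 28.0])))
```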
Choosing k
Careful consideration should be given to the choice of k when classifying variables. This is mainly because choosing a small k may cause problems such as sensitivity to noise, among others. On the other hand, a larger k may smooth out the idiosyncratic behavior that could be learned from the training set, and it also carries the risk of overlooking locally interesting behavior (Larose 90-105).
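One way to let the data settle this choice, as the conclusion below suggests, is cross-validation over a range of candidate k values. The sketch below uses scikit-learn on synthetic data; the candidate range and fold count are arbitrary illustrations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Cross-validate over candidate values of k: small k risks fitting noise,
# large k may smooth away locally interesting behaviour.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
search = GridSearchCV(KNeighborsClassifier(),
                      {"n_neighbors": list(range(1, 26))}, cv=10)
search.fit(X, y)
print("best k:", search.best_params_["n_neighbors"],
      "cross-validated accuracy:", round(search.best_score_, 3))
```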
Conclusion
There are two broad approaches to data mining: supervised and unsupervised methods. The former describes methods that identify target variables, while the latter does not. The paper explored both the K* algorithm and the K-nearest neighbor algorithm, as well as their usage. In doing so, it was found that K* works well on real datasets. The fundamental method it employs sums the probability of all possible paths from one instance to another. This helps in solving the smoothness problem, which contributes greatly to robust and realistic performance. The method also enables the integration of real-valued attributes and symbolic attributes, as well as principled ways of dealing with missing values. K* can therefore be used to predict real-valued attributes, and its distance measure extends to similarity tasks such as comparing 2-D images. Only on some problems does a simple learning algorithm such as 1R work better than K*; this can be addressed by raising the blend for unimportant attributes and lowering it for important ones (Cleary 2-14).
The paper has also explored the use of the K-nearest neighbor algorithm in classification as well as in prediction and estimation, enabled by methods such as locally weighted averaging, among others. The paper also goes into detail on how to choose k and how it affects classification, prediction, or estimation results. Choosing a small k may result in problems such as sensitivity to noise, while a larger k may smooth out idiosyncratic behavior that could be learned from the training set and may also overlook locally interesting behavior. It is therefore quite important to consider such implications when choosing k. This may be resolved by allowing the data to settle the choice, for example by employing a cross-validation procedure. The two methods are therefore useful in the classification of objects (Cleary 2-14).
Works Cited
Cleary, John. “K*: An Instance-based Learner Using an Entropic Distance Measure.” Dept. of Computer Science, University of Waikato, New Zealand.
Larose, Daniel. “k-Nearest Neighbor Algorithm.” Discovering Knowledge in Data: An Introduction to Data Mining. John Wiley & Sons, 2005.
There are critical points in machine learning that may compromise a prediction model. The relation between bias, variance, and learning models requires a careful examination of the data sets used for training (Provost and Fawcett, 2013). This paper aims to assess the impact of bias and variance on prediction models and to discuss ways in which the behavior of such frameworks is adjusted to accommodate their influence.
Model prediction can provide highly valuable insights into many real-life situations. However, the hidden patterns revealed by machine analysis require extrapolating to data that do not explicitly copy the examples on which such frameworks were created (Rocks and Mehta, 2022). There is therefore a direct relation between bias, variance, and the efficiency of prediction models. High levels of bias lead to models that are fast to generate yet underfitting, implying that the data are not represented correctly (Brand, Koch, and Xu, 2020; Botvinick et al., 2019). High variance can be similarly detrimental to a prediction, as a model trained on a highly specific data cluster will produce outcomes too tailored to the example set to be useful outside of it (Brand, Koch, and Xu, 2020; Knox, 2018). Optimization of a prediction model can be achieved by utilizing overparameterized sets that can later be ‘trimmed’ for less global methods (Belkin et al., 2019). It is paramount to decide on the desired level of generalizability of a learning model prior to setting the maximum acceptable bias and variance.
The trade-off in such cases requires one to sacrifice either applicability or accuracy in order to find a suitable level of complexity for a model. The optimal performance of a learning model can only be achieved by minimizing the total error (Singh, 2018). The three states of a prediction model are too complex, too simple, or a good fit (Kadi, 2021). The goals of a model must define its complexity, as leaving decisions to an improperly trained model may severely impact a firm’s performance (Delua, 2021). Traditional machine learning methods require finding a sufficient level of generalization at the cost of functional losses (McAfee and Brynjolfsson, 2012; Yang et al., 2020). In real life, any implementation of a statistical predictor is linked with margins of error that must be acceptable for the given situation. For example, IBM’s AI-powered cancer treatment advisor Watson was giving incorrect suggestions due to high bias (Mumtaz, 2020). The detrimental impact of such a learning model is apparent in its potential for harm.
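The toy sketch below illustrates the under- and overfitting extremes and the middle ground between them. The data are synthetic, the polynomial degrees are arbitrary illustrations, and cross-validated mean squared error stands in for the total error discussed above; on this kind of setup the too-simple and too-complex models typically both score worse than the moderate one.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# A too-simple model (degree 1) underfits (high bias), a too-complex one
# (degree 15) overfits (high variance), and a moderate degree sits nearer the
# minimum of the total error.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (80, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 80)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    cv_mse = -cross_val_score(model, X, y, cv=5,
                              scoring="neg_mean_squared_error").mean()
    print(f"degree {degree:2d}: cross-validated MSE = {cv_mse:.3f}")
```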
In conclusion, an efficient prediction model requires its creators to find a balance between bias and variance for it to remain applicable in practice. Oversimplification or overfitting can lead to errors in predictions, to the point of rendering an algorithm unusable in real life. A trade-off in accuracy is required for a learning model to remain applicable, yet such a decision must be based on practical implications.
Reference List
Belkin, M. et al. (2019) ‘Reconciling modern machine-learning practice and the classical bias-variance trade-off’, Proceedings of the National Academy of Sciences, 116(32), pp. 15849–15854.
Botvinick, M. et al. (2019) ‘Reinforcement learning, fast and slow’, Trends in Cognitive Sciences, 23(5), pp. 408–422.
Brand, J., Koch, B. and Xu, J. (2020) Machine learning. London, UK: SAGE Publications Ltd.
Provost, F. and Fawcett, T. (2013) Data science for business: What you need to know about data mining and data-analytic thinking. Sebastopol, CA: O’Reilly.
Rocks, J.W. and Mehta, P. (2022) ‘Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models’, Physical Review Research, 4(1).
Machine learning is among the advanced methods of data processing. The data set plays a crucial role in machine learning, providing the material from which to generalize and model specific patterns (Deluna, 2021; McAfee and Brynjolfsson, 2012). However, it is essential to distinguish the model states of “generalizing” and simply “memorizing” (Kotsilieris, Anagnostopoulos, and Livieris, 2022, 1). Consequently, several techniques have been developed to adjust the learning process, including regularization (Brand, Koch, and Xu, 2020, 1; Alonso, Blanche, and Avresky, 2011, 163). Regularization is multifaceted: it takes different forms with unique features.
The concise definition of regularization coincides with its primary purpose: simplification. Overfitting means over-optimizing the model’s fit to the provided data; in this context, regularization focuses not only on optimizing the fit but also on simplifying the model (Provost and Fawcett, 2013, 136; Belkin et al., 2019, 1). The regularization techniques that are of interest to me are L2-norm regularization, dropout, and adversarial regularization.
L2-norm regularization is widely used in machine learning and statistics, usually for the regularization of linear models (Nusrat and Jang, 2018, 8; Zhu et al., 2018, 6-7). The L2 penalty is equivalent to imposing a diagonal Gaussian prior with zero mean on the weights (Chen et al., 2019, 4). The technique has been extended by using the L2 distance from a trained model’s weights to penalize the weights during fine-tuning (Barone et al., 2017). This technique provokes my interest because of its fine-tuning applications, such as improving machine translation (as in Google Translate). Another reason is that L2 is non-sparse, which makes it more flexible compared to L1. Lastly, it can be used outside machine learning, making it a valuable tool in data processing.
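A small illustrative sketch of L2 regularization on a linear model is shown below, using scikit-learn’s Ridge estimator on synthetic, nearly collinear data. The point is the non-sparse behaviour mentioned above: the penalty shrinks the weights without forcing them exactly to zero.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data with two nearly collinear features, which makes plain least
# squares unstable; the L2 penalty shrinks the weights but keeps them nonzero.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
X[:, 7] = X[:, 6] + rng.normal(0, 0.01, 60)
y = X[:, 0] + 0.5 * X[:, 6] + rng.normal(0, 0.1, 60)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)          # alpha controls the strength of the L2 penalty
print("OLS weights:  ", np.round(ols.coef_, 2))
print("Ridge weights:", np.round(ridge.coef_, 2))
```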
Considering neural machine translation, dropout is also worth attention. The principle of dropout’s operation presents another reason for curiosity: dropout randomly drops units from the model during training in each iteration (Barone et al., 2017). In addition, I appreciate the ability to use dropout in a learning model without the need to use it in the testing process. Dropout is available in common computation libraries (for example, the Keras framework for Python).
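A minimal sketch using the Keras framework mentioned above is given below; the layer sizes, dropout rate, and binary task are invented for illustration. The Dropout layers are only active during training, which is why no dropout needs to be applied when the model is used for prediction.

```python
from tensorflow import keras
from tensorflow.keras import layers

# During training, each Dropout layer randomly zeroes about half of the units
# in each iteration; at prediction time the layers pass values through unchanged.
model = keras.Sequential([
    keras.Input(shape=(100,)),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```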
The last regularization technique is adversarial regularization; the reason for attention here is privacy protection. Machine learning models might leak information about their training data through their predictions, and adversarial regularization makes the predictions resistant to such tracking (Nasr, Shokri, and Houmansadr, 2018, 634). Another reason to be interested is the authors’ ambition to create a truly universal technique. Lastly, I am fascinated by the technique’s versatility itself: it trains an ANN, regularizes it, and ensures privacy protection.
Numerous studies showcase the multifaceted nature of regularization techniques: depending on the needs, different features are required. In the case of statistical regularization, such as fine-tuning, L2-norm regularization will constrain the model. Where additional regularization is needed outside the core learning process, dropout will be of use. Finally, where there is substantial concern for data privacy, adversarial regularization can provide the needed protection.
Reference List
Alonso, J., Belanche, L. and Avresky, D. R. (2011) ‘Predicting software anomalies using machine learning techniques’, 2011 IEEE 10th International Symposium on Network Computing and Applications, Cambridge, MA, USA. IEEE, pp. 163-170. Web.
Provost, F., and Fawcett, T. (2013). Data science for business: What you need to know about data mining and data-analytic thinking. Sebastopol: O’Reilly Media.
Based on the approach used to overcome overfitting, the forms of regularization can be divided into two categories. Both are used either to prevent interpolation or to change the effective capacity of a class of functions (Belkin et al., 2019). One of them is L1 regularization, which reduces the weights of uninformative features to zero by subtracting a small, constant amount from the weights in each iteration. Thus, its main feature is that such weights eventually become exactly zero, leading to a smoother optimization (Oymak, 2018). This form of regularization may be of interest because it helps in working with big data, effectively enforces sparsity properties, and uses the method of equating the optimum to zero (Lin et al., 2018). Of no less interest is that this form can underlie structures that reduce the generalization error (Zhao et al., 2018).
In real life, L1 regularization can be used when making machine predictions, many of which involve finding sparse block solutions (Janati, Cuturi and Gramfort, 2019). For example, when predicting housing prices, L1 regularization will retain important factors such as the area, infrastructure, and year of construction, while excluding minor elements such as the price of flooring or built-in gas equipment. In another example, when predicting the payback of a business product, the system can use indicators such as the area’s population and the presence of competitors in the district, while ignoring the age or gender of potential buyers. In general, in this form, the solution to sparsity problems can be taken as representative (Yang and Liu, 2018). Thus, the method provides robust results when working with big data (Alizadeh et al., 2020).
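The sketch below illustrates this sparsity effect with scikit-learn’s Lasso on an invented housing example; all feature names, coefficients, and the penalty strength are assumptions. In this synthetic setup the uninformative columns are typically driven exactly to zero while the influential ones survive.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Invented housing data: price depends on area and year, while "flooring" and
# "gas_equip" carry no real signal; the L1 penalty tends to zero them out.
rng = np.random.default_rng(0)
n = 200
area      = rng.uniform(30, 200, n)      # square metres
year      = rng.uniform(1950, 2020, n)
flooring  = rng.uniform(0, 5, n)         # minor factor, no effect on price below
gas_equip = rng.uniform(0, 1, n)         # minor factor, no effect on price below
price = 3.0 * area + 0.8 * (year - 1950) + rng.normal(0, 20, n)

X = np.column_stack([area, year, flooring, gas_equip])
lasso = Lasso(alpha=5.0, max_iter=50000).fit(X, price)
print(dict(zip(["area", "year", "flooring", "gas_equip"],
               np.round(lasso.coef_, 2))))
```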
Another form is L2 regularization, whose main feature is the optimization of the average cost. This type deploys the most commonly used penalty, the sum of the squares of the weights (Provost and Fawcett, 2013). It may be of interest because of the uniqueness of the final solution, its computational inexpensiveness, and the reduction of the probability of an overall error. Even in the presence of noise, the L2 estimation error may still tend to zero at a near-optimal rate (Hu et al., 2021). The method can also be used to smooth monotonic regression on a single predictor variable, which increases its interest in the context of analysis (Sysoev and Burdakov, 2019).
In real life, L2 regularization is used to evaluate the significance of predictors. It can become a way to overcome the convergence problems presented by other regularization methods (Zhang, Lu, and Shai, 2018). In the context of the price-forecasting example, even the slightest factors will still be considered, which reduces the difference from the final result. In the machine calculation of the business payback example, L2 regularization can complicate the forecast, since weight decay helps less with deeper models on more complex datasets (Tanay and Griffin, 2018).
Reference List
Alizadeh, M., Behboodi, A., van Baalen, M., Louizos, C., Blankevoort, T. and Welling, M. (2020) ‘Gradient L1 Regularization for Quantization Robustness’, ICLR 2020.
Belkin, M., Hsu, D., Ma, S. and Mandal, S. (2019) ‘Reconciling modern machine-learning practice and the classical bias–variance trade-off’, Proceedings of the National Academy of Sciences, 116(32), pp. 15849-15854.
Hu, T., Wang, W., Lin, C. and Cheng, G. (2021) ‘Regularization matters: A nonparametric perspective on overparametrized neural network’, International Conference on Artificial Intelligence and Statistics, 130(829-837), pp. 829-837.
Janati, H., Cuturi, M. and Gramfort, A. (2019) ‘Wasserstein regularization for sparse multi-task regression’, The 22nd International Conference on Artificial Intelligence and Statistics, 89(1407-1416), pp. 1407-1416.
Lin, P., Peng, S., Zhao, J., Cui, X. and Wang, H. (2018) ‘L1-norm regularization and wavelet transform: An improved plane-wave destruction method’, Journal of Applied Geophysics, 148, pp.16-22.
Oymak, S. (2018) ‘Learning compact neural networks with regularization’, International Conference on Machine Learning, 80(3966-3975), pp. 3966-3975.
Provost, F. and Fawcett, T. (2013) Data science for business: What you need to know about data mining and data-analytic thinking. Sebastopol, CA: O’Reilly Media, Inc.
Sysoev, O. and Burdakov, O. (2019) ‘A smoothed monotonic regression via L2 regularization’, Knowledge and Information Systems, 59(1), pp.197-218.
Tanay, T. and Griffin, L. D. (2018) ‘A new angle on L2 regularization’, Cornell University. doi: 10.48550/arXiv.1806.11186
Yang, D. and Liu, Y. (2018) ‘L1/2 regularization learning for smoothing interval neural networks: Algorithms and convergence analysis’, Neurocomputing, 272, pp.122-129.
Zhang, Y., Lu, J. and Shai, O. (2018) ‘Improve network embeddings with regularization’, Proceedings of the 27th ACM international conference on information and knowledge management, pp. 1643-1646.
Zhao, Y., Han, J., Chen, Y., Sun, H., Chen, J., Ke, A., Han, Y., Zhang, P., Zhang, Y., Zhou, J. and Wang, C. (2018) ‘Improving generalization based on l1-norm regularization for EEG-based motor imagery classification’, Frontiers in Neuroscience, 12, p.272.