The C4.5 algorithm builds decision trees in which a node may have any number of branches. Because the algorithm works only with a discrete dependent attribute, it can solve only classification tasks. C4.5 is considered one of the best-known and most widely used algorithms for generating decision trees. Working with C4.5 imposes the following requirements:
Each record in the data set should be associated with one of the predefined classes; that is, one of the attributes should serve as the class label. Samples given the same label should in fact belong to the same class; otherwise, errors are inevitable.
Each class should be discrete, and each sample should belong to exactly one class.
The number of classes should be much smaller than the number of samples in the data set under consideration.
One should keep in mind that C4.5 runs slowly on very large data sets.
Like ID3, C4.5 uses the concept of information entropy to build decision trees from a data set. The files that C4.5 reads and writes follow the naming pattern filestem.ext, where filestem is the file name and ext is an extension defining the file type. To work with the program, one needs at least two files: the first contains the names and class definitions, and the second contains the data, that is, the set of objects described by the values of the class attributes. A decision tree produced by C4.5 is either a leaf, which identifies a class, or a decision node with a number of branches and subtrees representing the possible outcomes of a test (Quinlan 5).
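The entropy calculation underlying both ID3 and C4.5 can be sketched in a few lines of Python. This is a hedged illustration rather than Quinlan's implementation (C4.5 further normalizes the gain into a gain ratio), and the toy weather rows below are invented for the example:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Reduction in class entropy from splitting the rows on one attribute."""
    total = len(labels)
    splits = {}
    for row, label in zip(rows, labels):
        splits.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in splits.values())
    return entropy(labels) - remainder

# Invented toy data: attribute 0 is "outlook", the class is play / don't play.
rows = [("sunny",), ("sunny",), ("overcast",), ("rain",), ("rain",)]
labels = ["no", "no", "yes", "yes", "yes"]
print(round(entropy(labels), 3))                    # 0.971
print(round(information_gain(rows, labels, 0), 3))  # 0.971 — a perfect split
```

A gain equal to the starting entropy means the attribute separates the classes completely, which is why the algorithm would choose it for the root node.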
The algorithm can generate decision trees in two ways: batch mode and iterative mode. Batch mode (the default) generates a single decision tree covering all the data available for the decision. Iterative mode, by contrast, starts from a randomly selected subset of the data: a decision tree is generated and then grown by adding specific objects that it has misclassified.
These steps are repeated until the tree classifies the data correctly or no further progress is made. Since iterative mode starts from a randomly selected subset, many trials may be run to generate decision trees from the same data. Because multiple trials can produce many different decision trees, the file filestem.unpruned is needed; it is created to collect the decision trees produced along the way. If the same data is used to generate decision trees again, the latest variant of the tree is used, and the best generated decision tree is saved in the file filestem.tree.
Works Cited
Quinlan, John Ross. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
Select two application areas for data mining NOT discussed in the textbook and briefly discuss how data mining is being used to solve a problem (or to explore an opportunity)?
Data mining involves rearranging large volumes of data to create comprehensible information that can be used to solve problems. There are several ways in which data mining can be applied in the real world (Han et al. 76). It can be used to solve problems and explore opportunities.
Data Mining and the Detection of Disturbances in the Ecosystem
The use of data mining to detect disturbances in the ecosystem can help to avert problems that are destructive to the environment and to society. Such calamities include floods and droughts (Kumar and Bhardwaj 258). Remote sensing and earth science techniques are used to understand the radical changes in the environment. Data is collected and archived. It is later mined and used to detect disturbances.
Data Mining in Sports
Data mining can be used to predict the outcomes of sporting events. A case in point is the Advanced Scout system developed by IBM (Leung and Kyle 715). Coaches use the application to improve the performance of players. In most cases, fans predict games by watching them. They may also use archived data, which is mined and analyzed statistically to make predictions based on the history of the game.
What is Association Rule Mining? And explain how Market-basket analysis helps retail business to maximize the profit from business transactions?
Association Rule Mining
Association rule mining is the retrieval of data based on the relationships within a given set of objects. It takes into consideration how often these objects appear together in a database and involves the identification of connections and correlations between them (Ramageri 304).
Market Basket Analysis and Retail Business
Market basket analysis and association rule mining can be used to maximize profits and improve transactions in the retail business. It is used to study the behavior of customers and their shopping trends. Marketers use the information to design catalogs and undertake customer behavior analysis (Han et al. 99). Consequently, the information can be used in marketing and advertisement to maximize profits and improve business transactions.
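The two core measures behind market basket analysis, support and confidence, can be sketched as follows. The baskets and item names are invented for illustration; real systems use algorithms such as Apriori over far larger transaction sets:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent in basket | antecedent in basket)."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]
print(round(support(baskets, {"bread", "milk"}), 2))       # 0.5
print(round(confidence(baskets, {"bread"}, {"milk"}), 2))  # 0.67
```

A retailer reading this output would learn that two-thirds of baskets containing bread also contain milk, the kind of rule that informs catalog design and shelf placement.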
Discuss k-Nearest Neighbor (KNN) learning algorithm. What is the significance of the value of k in k-NN?
K-Nearest Neighbor (KNN) Learning Algorithm
The algorithm is a method used to classify data obtained from sources with similar sets of parameters. It relies on the known classifications in an existing database and uses these separate classes to classify a new pattern: the new data point is assigned to the class most common among its neighbors, the existing samples with the most similar characteristics (Bhatia and Vandana 304). For instance, a bank may get a customer who wants a loan but lack the time to calculate the applicant's credit rating. The bank can use the previous credit ratings of people with similar characteristics, such as earnings and collateral.
The Significance of the Value of k in k-NN
The value of k represents the number of nearest neighbors used in the comparison. Lower values of k are reported to be more accurate than higher values, and increasing the number of random data points raises the percentage error of the approximation (Bhatia and Vandana 304). As such, tuning k makes it possible to obtain the most accurate approximation in data classification and regression.
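The bank example above can be sketched in a few lines, with the effect of k made explicit; the applicant figures (earnings, collateral) are hypothetical:

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_predict(train, query, k):
    """Classify query by majority vote among the k nearest labelled points."""
    neighbors = sorted(train, key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical past applicants: (earnings, collateral) -> credit rating.
train = [((30, 1), "low"), ((35, 2), "low"), ((80, 8), "high"),
         ((90, 9), "high"), ((85, 7), "high")]

print(knn_predict(train, (32, 1), k=1))  # low
print(knn_predict(train, (32, 1), k=5))  # high — a large k drowns out local structure
```

The flip from "low" to "high" as k grows illustrates the significance of k: with k equal to the whole training set, the vote is decided by the global majority rather than the applicant's actual neighborhood.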
Discuss the two estimation methods of classification-type data mining models while considering ANN as a classifier
Supervised Learning
Supervised learning is one of the estimation methods for classification data mining models in artificial neural networks (ANN). In this case, a set of example input-output pairs is provided, and the objective is to identify or estimate a function. The function has to lie within the permitted class of functions and has to reflect the given examples (Nikam 15).
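This idea can be sketched with a drastically simplified "network" of a single weight: the function is estimated from example pairs by gradient descent on a squared-error cost. The data points and learning rate are invented for illustration; a real ANN has many weights and nonlinear activations:

```python
# Fit y ≈ w * x to example pairs by minimizing the mean squared error.
pairs = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # hypothetical (input, output) examples
w = 0.0
lr = 0.01
for _ in range(2000):
    # Gradient of the cost (mean squared error) with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in pairs) / len(pairs)
    w -= lr * grad

print(round(w, 1))  # 2.0 — close to the least-squares solution
```

The loop repeatedly nudges the estimated function toward one that "reflects the given examples", which is the essence of the supervised setting described above.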
Unsupervised Learning
In this estimation method, the ANN works with a given set of data, usually denoted as x, together with a cost function to be minimized. The cost can be any function of x and of the network's output, usually denoted as f. The choice of cost function depends on what the network is trying to model and on the assumptions made (Nikam 16).
Works Cited
Bhatia, Nitin, and Ashev Vandana. Survey of Nearest Neighbor Techniques. International Journal of Computer Science and Information Security, vol. 8, no. 2, 2010, pp. 302-305.
Han, Jiawei, et al. Data Mining: Concepts and Techniques. 3rd ed., Morgan Kaufmann Publishers, 2011.
Kumar, Dharminder, and Deepak Bhardwaj. Rise of Data Mining: Current and Future Application Areas. International Journal of Computer Science Issues, vol. 8, no. 5, 2011, pp. 256-260.
Leung, Carson, and Joseph Kyle. Sports Data Mining: Predicting Results for the College Football Games. Procedia Computer Science, vol. 35, 2014, pp. 710-719.
Nikam, Sagar. A Comparative Study of Classification Techniques in Data Mining Algorithms. Oriental Journal of Computer Science & Technology, vol. 8, no. 1, 2015, pp. 13-19.
Ramageri, Bharati. Data Mining Techniques and Applications. Indian Journal of Computer Science and Engineering, vol. 1, no. 4, 2011, pp. 301-305.
When people talk about data mining, they usually mean analyzing vast amounts of varied information. This information helps organizations solve multiple problems and tasks, predict trends, reduce possible risks, and find new opportunities. Data mining involves searching for patterns, relationships, and anomalies in order to solve a particular problem; the helpful information created or found in the process can play an essential role in that search.
Data mining is an exciting and diverse process that includes many components, some of which are even confused with the mining itself. For example, statistics is an essential element of data mining (Lu, 2021). Data mining and machine learning both fall under data science, but they rest on different principles of operation despite some similarities. Data mining involves several vital steps or stages (Lu, 2021): searching for the necessary information, preparing the data, evaluating the information, and providing a solution.
The process of data mining provides people with many means of solving problems in the digital age. Information mining has many advantages, among which the following can be distinguished (Lu, 2021). The process helps companies collect reliable information from massive amounts of data. Data mining is more efficient and cost-effective than other data processing applications. It helps businesses make profitable production and operational adjustments and works with both new and legacy systems. Data mining allows companies to make informed decisions and identify credit risks and fraud. In addition, this analysis method allows data processing specialists to analyze massive volumes easily and quickly, initiate automated forecasts of behavior and trends, and detect hidden patterns. Thus, it is possible to conclude that data mining is a convenient and effective way of processing information with many advantages.
Reference
Lu, Z. (2021). Research on the application of computer data mining technology in the era of big data. In Journal of Physics: Conference Series (Vol. 1744, No. 4, p. 042118). IOP Publishing.
Data Mining Technology (DMT) is a well-developed technology that assists organizations in making informed decisions based on evidence rather than assumptions and guesswork by mining, or extracting, useful information from the large volumes of data collected in the past (Wook, Yusof, & Nazri, 2014). DMT consists of different approaches and techniques that have been widely deployed in business and industrial processes. It has been noted that the data mining techniques applicable in these organizations can also be applied in the education sector, specifically higher education. This paper proposes the implementation and use of data mining at the Canadian University Dubai.
Data mining refers to "the extraction of hidden predictive information from large databases," and it is "a powerful new technology with great potential to help companies focus on the most important information in their data warehouses" (Thearling, 2012, p. 1). Given the robust nature of data mining technology, it can be applied to understanding the unique elements of data stored in educational databases (Luan, 2004). The aim of mining data in the education environment is to enhance the quality of education for the masses through proactive, knowledge-based decision-making. Given such benefits, institutions of higher learning should focus on Data Mining Technology as a new strategy for comprehending and improving the processes of learning and teaching. In fact, higher education requires technology to remain competitive and improve educational outcomes.
While some institutions of higher learning have made significant investments in Data Mining Technology, they have paid attention mainly to the invention of complex algorithms or to technical elements (Luan, 2004). These institutions have not examined user perception of data mining technologies in much depth. In addition, few studies have focused on understanding the factors in the utilization of data mining technology that could inhibit its adoption and appreciation (Jan & Contreras, 2011). As with other technologies, user support is extremely critical for such technologies to succeed; otherwise, DMT may experience low rates of adoption and acceptance irrespective of notable outcomes and benefits (Jan & Contreras, 2011). It is therefore imperative to observe user behaviors before proposing and implementing any technology. This approach would reduce low usage rates or outright abandonment.
The Technology Acceptance Model (TAM), among other models, has been developed to explain user behavior toward technology adoption. Hence, institutions of higher learning are advised to integrate such tools into their strategies before adopting DMT.
Who uses it?
As previously noted, several organizations, businesses and industries use data mining technology to enhance decision-making. That is, data mining technology cuts across all sectors and organizations, including small and mid-sized ones. For instance, one pharmaceutical firm has been analyzing past sales activity to enhance customer targeting and identify the most robust marketing strategies. Financial institutions have turned to data mining to leverage the abundant data from customer transactions for credit scoring and to identify the customers most likely to apply for new credit facilities. Customer experiences and needs can be used for market segmentation. Finally, retailers may mine data from loyalty cards to understand buying patterns and improve sales processes and the customer experience. Generally, any organization can apply Data Mining Technology, including universities.
How does it work?
Data Mining Technology in education is possible. Educational Data Mining (EDM) is "the area of scientific inquiry centered on the development of methods for making discoveries within the unique kinds of data that come from educational settings" (de Baker, 2010, p. 2). The term also refers to the application of data mining techniques to a given set of data in an educational setting.
Data analysts and scientists have used a wide range of techniques and methods, such as clustering, prediction and relationship mining, decision trees, k-means, Bayesian networks and neural networks, among others (Erdoğan & Timor, 2005). These techniques and methods are applicable to educational settings and data (Romero & Ventura, 2010). Specifically, neural networks, Bayesian networks and decision trees have been noted as the most appropriate for the education sector (Romero & Ventura, 2010).
In education, data mining focuses on the analysis of large volumes of data from various sources for an improved comprehension of learning processes and outcomes. As figure 1 shows, the process of Educational Data Mining involves the transformation of large volumes of data into valuable knowledge for users.
Figure 2 shows "the phases, and the iterative nature of a data mining project" (Oracle, 2015, p. 1). From this process, one can observe that the flow never stops even when a specific solution is found and implemented. Instead, new questions emerge, which can subsequently be used to create models that are more robust.
Data Mining Technology is useful for gaining access to data from both structured and unstructured sources from the Web and traditional data storage tools.
Data can be mined to determine or predict the possible behaviors and trends of learners in a course session, including performance and curriculum. For instance, effective deployment of Data Mining Technology can predict students' failure or success in a given course. Classifiers, or variables, are used to determine the relationships between variables in predicting outcomes. At the same time, the university can determine dropout rates and develop appropriate intervention systems to support the learners inclined to drop out of their courses.
The use of Data Mining Technology has also increased within educational Web-based settings. For instance, data obtained from e-learning, learning management applications, tutoring systems and adaptive systems have been crucial in helping educators better comprehend learners' online learning behaviors and trends.
Predictive approaches rely on students' activities and other related pieces of information, including time spent online, assignment submission, content studied and study results. Educators can use the results of predictive analytics to identify at-risk students and then design the most effective intervention methods to enhance outcomes for them. In this regard, educators and institutions can rely on Data Mining Technology to enhance learning environments for learners and improve their operational activities.
TCO of such technology
Many data mining tools are available in the market today. Some, such as WEKA by the University of Waikato, New Zealand, are free. On the other hand, vendors such as Oracle, SAP, IBM SPSS and SAS offer data mining tools at varied costs, which can be significant. Hence, it is imperative to determine whether CUD should deploy an open-source tool or purchase commercial data mining software from a specific vendor.
The costs associated with hiring or training data scientists for the University could also be enormous. It is imperative to note that there are currently not enough data scientists to meet the rising demand.
Literature Review about Selected Technology
Many scholars have recognized the myriad challenges facing higher learning institutions (Delavari, Phon-Amnuaisuk, & Beikzadeh, 2008). Specifically, decision-making processes have become more difficult because of the many interrelated factors within the education system. In this regard, data mining, supported by more efficient technology and expertise, has been identified as a technique that can assist universities in their decision-making processes. The information needed to facilitate decision-making already lies in the databases universities possess; only the right tools and techniques are required to gain new knowledge from this data (Thearling, 2012).
Researchers continue to develop new data mining models for use in higher education institutions (Al-Twijri & Noaman, 2015). These models are designed to facilitate decision-making processes and control elements of student admission and graduation. In this regard, the major characteristics of learners that could lead to higher retention and graduation can be mined from available data and predicted early enough, within the first semester (Raju & Schumacker, 2015).
For a long time, studies have consistently demonstrated that learners are unique: they have diverse knowledge levels, learn at different paces, face different socioeconomic challenges and differ in topic familiarity. On this note, an intelligent curriculum should account for the unique needs of every learner rather than applying a standard curriculum to all. Learning content should be flexible, adaptive and regularly reviewed. This calls for analytics, so that significant insights can be discovered and applied to shape the content for different learners and improve learning experiences (Wagner & Ice, 2012).
At the lower levels, higher education should use data mining for student acquisitions; course selection; improving performance; student work groups; retention; teacher effectiveness; and attrition (Schmarzo, 2014). These diverse applications show that higher education outcomes can be enhanced through Data Mining Technology. Data Mining can act as the basis for making informed decisions, changing curriculum design and delivery, learning content evaluation, student learning activities, resource allocations and outcome monitoring among others.
Universities need to go deeper with analytics, beyond student acquisition and attrition rates. While these are a good starting point, deeper insights can be obtained by monitoring, for instance, students at greater risk of dropping a course or leaving college. By identifying these issues early enough, higher education can develop programs that effectively reduce dropout rates.
Rapid expansion of universities, online education and technologies are putting much pressure on these institutions to increase performance and graduation rates. Fortunately, universities can benefit from data analytics to enhance education outcomes. Universities can realize and exploit the identified opportunities from studies and adopt data mining to demonstrate how they can solve higher education multiple challenges. The process requires collaboration with technology vendors and data experts to show how these new technologies can support learning in universities. Universities should adopt best practices and models for data mining to provide them with opportunities to transform learning experiences and outcomes for students, teachers and other stakeholders.
Applying the Technology in the Canadian University Dubai
Data Mining Technology is proposed for the Canadian University Dubai (CUD). As globalization increases, global universities have focused on expanding their reach. The University, established in Dubai in 2006, provides students with a Canadian education while respecting the culture and values of the United Arab Emirates (Canadian University Dubai, 2015, p. 1).
The goal of the University is to help every student move forward and become a well-rounded lifelong learner (Canadian University Dubai, 2015, p. 1). Consequently, it has focused on academic achievement and extracurricular engagement. While these goals sound good, they are likely to take longer than necessary to achieve because the University lacks robust decision support systems. It has not adopted Data Mining Technology to gain useful insights from its existing database.
To demonstrate how the University can benefit from Data Mining Technology, various case studies will be used to show how it can overcome specific challenges, as well as the benefits of the outcomes. For instance, the University can use data mining to understand which students take the most credit hours, which classes are most likely to be popular, which students are most likely to come for additional classes, or even to predict the pledges that alumni will make.
A certain institution of higher learning wanted to create consequential learning-outcome typologies, but it faced the challenge of a limited understanding of its students. To overcome this issue, unsupervised data mining was applied (Luan, 2004). A typical university would have an enrolment of about 15,000, with students grouped as transfer-based, vocational or basic skill upgraders (Luan, 2004, p. 4). These means of student identification contain only the basic information that learners declared during enrolment and do not reflect the specific differences between learner types. In this case, the university used data mining techniques to develop exact typologies for its 15,000 students. The researchers applied two clustering algorithms, TwoStep and K-means, to the three general classifications noted above and obtained mixed results (Luan, 2004, p. 4). There were no clear distinctions between cluster boundaries even after data cleaning and repeated measures, and no significant improvements were observed in the results. Thus, the researchers concluded that the initial declaration at enrolment did not reflect the actual behaviors of learners. They then applied a replacement technique that concentrated on educational outcomes alongside lengths of study (Luan, 2004, p. 4).
The educational outcomes were difficult to define. A specific learning period was required to establish that a learner had attained a given milestone. Dropout was also assessed as an outcome of learning, while the researchers had to deal with "stopouts," learners who dropped out but later returned to continue their studies (Luan, 2004, p. 5). Hence, the data scientist must be able to account for all these diverse variables in order to answer the specific typology question and research objective. After taking care of the outliers by eliminating them or including them in another cluster, the TwoStep algorithm generated the following clusters: transfers, vocational students, basic skills students, students with mixed outcomes, and dropouts (Luan, 2004, p. 5). The researchers then used k-means to validate the generated clusters. The length of study was also factored into the variables for every cluster, and new perspectives emerged. Some transfer-cluster learners completed quickly; some vocational learners took longer; and other students appeared to simply take one or two courses at a time (Luan, 2004, p. 5).
The results were informative for the university. From data mining, the university was able to understand demographic and other related information about student typologies. It was also established that some older students took more time to complete their studies while younger ones with more socioeconomic advantages settled for high credit courses and completed studies faster (Luan, 2004, p. 5). The university was able to group students as transfer speeders, college historians, fence sitters and skill upgraders among others. The typologies ensured that the university could understand students beyond the normal homogenous grouping. The data mining project discovered hidden vital information that the university could use to meet diverse needs of learners.
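The kind of clustering used in this case study can be sketched with a plain k-means implementation. The student records below (credit hours per term, years to completion) are invented stand-ins rather than Luan's data, and the real study used the TwoStep and K-means algorithms over many more variables:

```python
from math import dist

def kmeans(points, k, iters=10):
    """Plain k-means: seed centroids from evenly spaced points, then repeat
    assign-to-nearest-centroid / re-average until the loop count runs out."""
    step = max(1, len(points) // k)
    centroids = points[::step][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
        centroids = [tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return clusters

# Hypothetical student records: (credit hours per term, years to completion).
students = [(15, 2), (16, 2), (14, 2.5),   # fast finishers
            (6, 6), (5, 7), (7, 6.5),      # long-stay students
            (3, 1), (2, 1), (4, 1.5)]      # one-or-two-course takers

clusters = kmeans(students, k=3)
print(sorted(len(c) for c in clusters))  # [3, 3, 3]
```

The algorithm recovers the three behavioral groups without being told they exist, which is what allowed the university in the case study to see typologies hidden behind the declared enrolment categories.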
Another demonstration of how the University can apply Data Mining Technology is in the area of academic planning and interventions (Luan, 2004). In this case study, the college faced the difficult challenge of accurately predicting academic outcomes in order to develop appropriate interventions for learners. Colleges apply data mining techniques to determine which learners are at risk of low performance. Consequently, they can develop appropriate interventions to prevent failure even before learners realize the risks. Transferring to a four-year academic program at a university is the main objective of many students. However, academic challenges lead to extended transfer periods, while other students fail to transfer altogether. Traditionally, it has been difficult to understand these issues and transfer behavior among students. However, data miners can mine and match data available from various sources to understand the behaviors and characteristics of students who transfer or fail to transfer. Thus, data scientists and decision-makers can relate these data to students' academic behaviors and outcomes to determine transfer outcomes.
The solution to the transfer issue was found through the application of data mining techniques. Various typologies and domain knowledge were used to develop an appropriate data mining model (Luan, 2004). In this case, it was determined from transfer-education domain knowledge that the most reliable means of handling transfer was to identify transfer-oriented students at the earliest opportunity. Training students who are potential candidates for transfer is more relevant than concentrating on students who have already gathered adequate points to transfer. By relying on transfer outcome data, data miners developed a dataset with different student variables under the major transfer clusters of laggards and speeders (Luan, 2004, p. 5).
The researchers then created a test dataset and a validation dataset from the original dataset through a proprietary randomization technique (Luan, 2004). Transfer was regarded as the outcome variable, while other variables, including units earned, courses taken, demographics, and financial assistance, were classified as predictors (Luan, 2004, p. 5). They were analyzed without a focus on stepwise testing for significance, and the data mining process tolerated interactions between variables and non-linear relations (Luan, 2004, p. 5). The researchers used supervised data mining for the study: neural network and rule induction algorithms were run at the same time to make it easy to compare and contrast the accuracy of the predictions (Luan, 2004, p. 5).
The results allowed the college to identify students with better transfer opportunities. Machine learning with the neural network algorithm increased the accuracy of the prediction (Luan, 2004, p. 5), so the researchers could easily identify the patterns found in the data.
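The rule-induction side of such a supervised study can be sketched as a one-rule "decision stump" that learns a single threshold on one predictor. The records below (units earned, transfer outcome) are hypothetical, not the study's actual dataset:

```python
def best_stump(records):
    """1-rule induction: find the units threshold that best separates
    transfers (label 1) from non-transfers (label 0) in the records."""
    xs = sorted({x for x, _ in records})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    def errors(t):
        return sum((x > t) != bool(y) for x, y in records)
    return min(candidates, key=errors)

# Hypothetical records: units earned in the first year -> transferred (1) or not (0).
records = [(10, 0), (12, 0), (15, 0), (30, 1), (35, 1), (40, 1)]
threshold = best_stump(records)
print(threshold)  # 22.5

predict = lambda units: units > threshold
print(predict(11), predict(38))  # False True
```

A real study would induce many such rules over many predictors (and pair them with a neural network, as Luan's did), but even this single learned threshold shows how transfer-oriented students could be flagged early from outcome data.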
These are just a few cases illustrating how the Canadian University Dubai can apply Data Mining Technology to improve learning and teaching outcomes. Robust algorithms and tools, applied by highly qualified data scientists, can help the University attain its goals.
Discussion and Analysis
One major issue that institutions of higher learning encounter today is predicting the possible outcomes of their students and alumni. Universities need to forecast enrolment and determine which students will require help to transfer. Other issues associated with the traditional management of learning continue to motivate universities to seek better alternatives. Consequently, some universities have noted the potential of Data Mining Technology to overcome such challenges. Data mining gives organizations the ability to exploit their current data resources, using data mining tools, techniques and expertise to uncover and comprehend hidden patterns in large databases. The identified patterns are developed into data mining models that provide useful information universities can use to predict student behaviors. From the insight obtained, universities can allocate teaching and learning resources more accurately and effectively.
Data Mining Technology would, for instance, provide the opportunity and information for the University to predict dropout and develop appropriate interventions to avert it, or possibly to provide more resources for a given course based on the prediction.
Past studies on data mining and its application in higher learning reveal a powerful tool that can transform the education sector. For example, at-risk students can be identified and given the support they require. In addition, there are specific functional areas in higher education where analytics and prediction can be applied to support outcomes, such as finance and budgeting, enrollment, and instructional and student progress management, among others (Mattingly, Rice, & Berge, 2012).
Although few institutions of higher learning have embraced Data Mining Technology, they can leverage student performance, usage, behaviors, faculty performance and social insights such as tendencies, propensities and trends to maximize learner engagement processes, reduce rates of attrition, enhance lifetime value and promote learning advocacy (Schmarzo, 2014).
Summary
The paper proposes the implementation and use of data mining at the Canadian University Dubai. Data mining refers to the extraction of hidden predictive information from large databases (Thearling, 2012, p. 1). The technology is considered powerful, with robust predictive analytics that can assist organizations in concentrating on the valuable data stored in their databases. Data Mining Technology helps organizations predict possible future trends; from the results, organizations can act proactively on available knowledge, making decisions more reliable. It is imperative to recognize that data mining is robust and goes beyond data analysis: it includes the extraction of large volumes of data from various sources, which are analyzed to support decisions. Many data mining tools, including open, freely available ones, can tackle challenging issues that were once considered complex or tedious. These tools can scour multiple databases for information that is predictive of potential future behaviors.
Results from organizations, including the few institutions of higher learning that have embraced data mining, show that data mining is a robust analytical tool that can assist organizations to overcome some challenges. For instance, institutions of higher learning can use data mining techniques to understand student behaviors, improve resource and staff allocation and enhance relationships with alumni. The hidden patterns can provide critical information based on predictive models to manage issues of enrolment, dropouts, graduation and even alumni.
Different organizations have deployed data mining tools and techniques to analyze large volumes of data and get insights that can aid decision-making. Canadian University Dubai can also adopt Data Mining Technology for analytical purposes. Doing so, however, requires expertise in data science and effective machine learning tools that can handle large volumes of structured and unstructured data from different sources.
References
Al-Twijri, M. I., & Noaman, A. Y. (2015). A New Data Mining Model Adopted for Higher Institutions. Procedia Computer Science, 65, 836–844. Web.
de Baker, R. J. (2010). Data Mining for Education. Oxford, UK: Elsevier.
Delavari, N., Phon-Amnuaisuk, S., & Beikzadeh, M. R. (2008). Data Mining Application in Higher Learning Institutions. Informatics in Education, 7(1), 31–54.
Erdoğan, Ş. Z., & Timor, M. (2005). A Data Mining Application in a Student Database. Journal of Aeronautics and Space Technologies, 2(2), 53–57.
Jan, A. U., & Contreras, V. (2011). Technology acceptance model for the use of information technology in universities. Computers in Human Behavior, 27(2), 845–851. Web.
Luan, J. (2004). Data Mining Applications in Higher Education. Chicago: SPSS Inc.
Mattingly, K. D., Rice, M. C., & Berge, Z. L. (2012). Learning analytics as a tool for closing the assessment loop in higher education. Knowledge Management & E-Learning: An International Journal, 4(3), 236–247.
Raju, D., & Schumacker, R. (2015). Exploring Student Characteristics of Retention that Lead to Graduation in Higher Education Using Data Mining Models. Journal of College Student Retention: Research, Theory & Practice, 16(4), 563-591. Web.
Romero, C., & Ventura, S. (2010). Educational Data Mining: A Review of the State of the Art. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 40(6), 601–618. Web.
Schmarzo, B. (2014). What Universities Can Learn from Big Data Higher Education Analytics. Web.
Thearling, K. (2012). An Introduction to Data Mining. Web.
Wagner, E., & Ice, P. (2012). Data Changes Everything: Delivering on the Promise of Learning Analytics in Higher Education. EDUCAUSE Review, 47(4).
Wook, M., Yusof, Z. M., & Nazri, M. Z. (2014). Data Mining Technology Adoption in Institutions of Higher Learning: A Conceptual Framework Incorporating Technology Readiness Index Model and Technology Acceptance Model 3. Journal of Applied Sciences, 14, 2129-2138. Web.
Data mining aims to provide fully detailed reports that allow analysts to work with the necessary statistical data and give access to website traffic. Data mining is an important element in the political and economic spheres of state activity, as it is used in various policies and practices by law enforcement agencies and the Department of Justice. Its role is considered decisive in the national fight against terrorism and in the tracking of criminals.
Main Text
It is a well-known fact that the FBI relies on data mining to track potential terrorists as well as individuals breaking national law. The US Department of Justice states that the development of data mining programs will be especially stressed: "Each of these initiatives is extremely valuable for investigators, allowing them to analyze and process lawfully acquired information more effectively in order to detect potential criminal activity and focus resources appropriately" (Vijayan, 2007). The principal issues covered by such programs are the reduction of the level of terrorism in the USA, focusing on individuals identified as being of interest to the FBI; the identification of theft through customers' complaints; and the examination of real estate transactions. It should be noted that data mining is also concentrated on Internet pharmacy fraud and automobile insurance fraud (Vijayan, 2007).
Law enforcement agencies focus on using data mining to promote national security interests. The Federal Bureau of Investigation has a pressing need to obtain data not only on criminals but also on the people the criminals have had contact with. In the wake of the tragedy of September 11, 2001, the FBI turned to data mining techniques to find more links to potential terrorists and criminals.
The FBI petitioned the Federal Communications Commission for access to Internet connections, explaining that terrorists can use voice over Internet protocol technologies to evade detection. The aim is to introduce changes into the networks so that terrorist patterns can be discerned and data can be tapped. Nevertheless, this caused some problems for the development of the FBI project, as the absence of discrete circuits hinders location identification. The use of data mining has its disadvantages, because it can produce false facts or mix up crimes. "Internet data, whether it is transmitted via a digital subscriber line (DSL), cable modem or dial-up modem, mixes and mingles with packets of data from thousands of other users," said the Aaxis Technologies CEO (Koprowski, 2004).
Conclusion
So, data mining can be analyzed from both positive and negative sides, though its advantages far outweigh its disadvantages. The use of data mining gives the FBI access to the information necessary for tracking terrorism and crime. Besides, it is useful for reducing the economic and social thefts and crimes spreading in the modern world. Certainly, the Internet's underlying infrastructure is complex, and mistakes and data mixing are inevitable, causing trouble and making searches more difficult (Schneider, 2007). Despite this, promoting data mining in the departments of justice, governmental bodies and security agencies will help reduce the number of crimes and improve the chances of fighting the terrorism now spreading worldwide.
References
Koprowski, Gene. 2004. FBI Plans to Track Suspects with Data-Mining Techniques. Tech News World. Web.
The C4.5 algorithm builds a decision tree that allows an unlimited number of branches at a node. The algorithm works only with a discrete dependent attribute, which is why it can solve only classification tasks. C4.5 is considered one of the best-known and most widely used algorithms for generating decision trees. Working with C4.5 requires meeting the following demands:
Each record in the data set should be associated with one of the predefined classes; that is, one of the attributes should be treated as the class label. Every sample must be assigned to one of these classes; otherwise, mistakes are inevitable.
The classes should be discrete, and each sample should belong to exactly one of them.
The number of classes should be much smaller than the number of samples in the data set under consideration.
One should understand that the C4.5 algorithm runs slowly on very large data sets.
Like the ID3 algorithm, C4.5 builds decision trees from a data set using the concept of information entropy. Files read and written by C4.5 take the form filestem.ext, where filestem is the file name and ext is an extension defining the file type. To work with the program one needs at least two files: the first contains the attribute names and class definitions, and the second contains the data, the set of objects described by values of the class attributes. Structurally, a decision tree produced by C4.5 is either a leaf, which identifies a class, or a decision node with a number of branches and subtrees, which represent the possible outcomes of the test (Quinlan 5).
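As an illustration of the entropy-based splitting mentioned above, here is a minimal sketch of the entropy and gain-ratio computations that C4.5-style learners use to choose a split attribute. The function names and data layout are invented for this example, not taken from Quinlan's implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """Information gain of splitting on discrete attribute `attr`,
    normalized by the split's own entropy (C4.5's gain ratio)."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info else 0.0
```

At each decision node, C4.5 evaluates this ratio for every candidate attribute and branches on the one with the highest value, which is why only discrete class labels (and, in this simplified form, discrete attributes) are handled.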
The algorithm can generate decision trees in two ways: batch mode and iterative mode. Batch mode (often called default mode) generates a single decision tree covering all the data available for the decision. Iterative mode, by contrast, starts from a randomly selected subset of the data; a decision tree is generated, and specific objects that have been misclassified are then added to the subset.
These steps are repeated, and the decision tree is refined, until the data is classified correctly or it becomes clear that no further progress is being made. Because iterative mode starts from a randomly selected subset, many trials may be used to generate decision trees from the same data. Since the multiple trials can produce many different decision trees, the file filestem.unpruned is needed: it is created to collect the decision trees produced in the process. If the same data is used to generate decision trees, the latest variant of the tree is used, and the program saves the best generated decision tree in the file filestem.tree.
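The iterative (windowing) procedure just described can be sketched as follows. Here `build_tree` and `classify` stand in for a real tree learner; all names and the stopping logic are illustrative, not Quinlan's exact procedure.

```python
import random

def iterative_mode(data, labels, build_tree, classify,
                   window_size, max_rounds=10):
    """Sketch of C4.5's iterative (windowing) mode: train on a random
    subset, then grow the subset with misclassified objects until the
    tree is consistent or no further progress can be made."""
    indices = list(range(len(data)))
    window = set(random.sample(indices, min(window_size, len(indices))))
    tree = None
    for _ in range(max_rounds):
        tree = build_tree([data[i] for i in window],
                          [labels[i] for i in window])
        missed = {i for i in indices if classify(tree, data[i]) != labels[i]}
        if not missed or missed <= window:
            break  # consistent tree, or no new objects left to add
        window |= missed  # add misclassified objects and retry
    return tree
```

Each round corresponds to one "trial" in the text: the window grows only when new objects are misclassified, and the best tree found is the one that would be saved to filestem.tree.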
Works Cited
Quinlan, John Ross. C4.5: Programs for Machine Learning. Burlington: Morgan Kaufmann, 1993.
When people talk about data mining, they usually mean analyzing vast amounts of information and varied data. This information helps organizations solve multiple problems and tasks, predict trends, reduce possible risks and find new opportunities. Data mining involves searching for patterns, relationships, anomalies, and deviations in order to solve a particular problem. In data mining, helpful information is created or found that can play an essential role in the search.
Data mining is an exciting and diverse process that includes many components, some of which are even confused with the mining itself. For example, statistics is an essential element of data mining (Lu, 2021). Data mining and machine learning both fall under data science, but despite some similarities they operate on different principles. Data mining involves several vital steps or stages (Lu, 2021): the search for the necessary information, the preparation of the data, the evaluation of the information, and the provision of a solution.
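The stages listed above can be sketched as a minimal pipeline. All the callables here are placeholders for real tooling, not an actual data mining API.

```python
def mine(records, prepare, model, evaluate):
    """Illustrative pipeline mirroring the stages in the text:
    gather records, prepare the data, discover patterns, then
    evaluate and return a solution."""
    prepared = [prepare(r) for r in records]   # data preparation
    patterns = model(prepared)                 # pattern discovery
    return evaluate(patterns)                  # evaluation -> solution
```

The point of the sketch is only the separation of concerns: each stage can be swapped out (a different cleaning step, a different model) without touching the others.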
The process of data mining provides people with many means of solving problems in the digital age. Information mining has many advantages, among which the following can be distinguished (Lu, 2021). The process helps companies collect reliable information from massive amounts of data. Data mining is more efficient and cost-effective than other data processing applications. It helps businesses make profitable production and operational adjustments and works with both new and legacy systems. Data mining allows companies to make informed decisions and identify credit risks and fraud. In addition, this analysis method allows data specialists to analyze massive volumes easily and quickly, initiate automated forecasts of behavior and trends, and detect hidden patterns. Thus, it is possible to conclude that data mining is a convenient and effective way of processing information with many advantages.
Reference
Lu, Z. (2021). Research on the application of computer data mining technology in the era of big data. In Journal of Physics: Conference Series (Vol. 1744, No. 4, p. 042118). IOP Publishing.
Ethnography refers to the study of specific cultures. The study of cultures is of great importance under normal circumstances, as it enhances our understanding of them; it is against this background that the study of cultures occupies a central role in society. Anthropology involves the study of human behavior, including the history and cultures of peoples. However, anthropology takes a general approach to the whole of human culture, behavior, and experience, whereas ethnography focuses on a specific aspect of culture. Normally, ethnography involves the selection and study of a specific culture.
This is considered vital since it brings more accuracy and authenticity to the whole field of anthropology; knowledge that could not be achieved through anthropology can be achieved through ethnography. Ethnography is therefore considered a better way of understanding human behavior, specifically culture, and it complements anthropology in many ways. Ethnography works in many ways: the collection of information and the process of study involve several methods and parameters. One common method used in ethnography is data mining. Data mining aids ethnography because it is through data mining that the necessary information is obtained, analyzed, and evaluated. This paper aims to take a keen look at the concept of ethnography. To succeed in this endeavor, the paper will also analyze data mining and its essence, referring to several articles and sources in the discussion of the whole concept.
Culture and Psychology
Ethnography plays a key role in the study of psychological behaviors. These behaviors are shaped by culture and human experience (Atkinson & Hammersley, 2007). It is against this background that, through data mining, ethnography collects information and evaluates it, bringing out its essence. Psychological behaviors are features that emanate from the state of students' minds at any given time. The psychological state of an individual at any given time determines the effect of whatever activity is prevailing. Psychological behaviors might take the shape of personality traits or symptoms of mental disorder. Under normal circumstances, psychological behaviors involve several conditions, most of which represent a malfunction of various mental faculties. Examples of psychological behaviors include anxiety, attitude, and motivation (Havemeyer, 2007).
Various studies have indicated that anxiety is a major cause of poor performance among students from certain cultural backgrounds. It is a proven fact that anxiety contributes to a high degree of low performance by students. However, this does not mean that only anxiety causes poor performance, since virtually all psychological behaviors tend to harm students' performance. Fears of all kinds and aspects of mental and psychological disorders have a profound effect on students' performance. Yet as far as the learning process is concerned, performance is the most important aspect of all (Fetterman, 2009). Students' performance is an indicator of the success or failure of the whole program; when a student's performance is affected, the program loses its significance.
Children who have psychological disorders lack the concentration and focus that are necessary for learning (Keong, 2006). Under normal circumstances, the process of learning requires a great deal of focus and attention at the same time. There is therefore a very clear relationship between concentration and the learning process: without adequate concentration and attention, learning is rendered ineffective. As a result, the role played by psychological factors such as anxiety is great and cannot be underestimated. Such factors are not limited to the learning process alone (Larose et al., 2006); their impact goes beyond the learning environment. Under normal circumstances, such students encounter problems in almost all other areas of life. For instance, their social lives are also affected by such psychological factors.
Conclusion
Ethnography plays an important role in the field of anthropology, which it complements. However, ethnography is more effective, successful, and specific than anthropology because it focuses on a specific subject in its study. Under normal circumstances, ethnography involves the study of cultures, carried out in such a way that a specific culture is selected for study; in this way, the process is more objective and successful on many counts. Ethnography involves several methodologies and aspects, and for the whole process to be successful, several parameters are needed. As a result, ethnography relies on the process of data mining, the process by which data is sought for study. Ethnography cannot operate without data mining; data mining is the success secret of ethnography. Through data mining, ethnography establishes the information needed for the analysis and understanding of human culture: the data involving the selected culture is sought and analyzed thoroughly before conclusions are drawn. The paper has discussed the concept of ethnography fully. Since there are several related aspects, the paper has focused on several parameters so as to navigate through all the necessary factors of ethnography.
References
Atkinson, P. & Hammersley, M. (2007). Ethnography: principles in practice. Washington: Taylor & Francis.
Fetterman, D. (2009). Ethnography: step-by-step. New York: SAGE.
Havemeyer, L. (2007). Ethnography. Washington: LLC.
Keong, W. (2006). Advances in knowledge discovery and data mining: 10th Pacific-Asia conference. New York: PAKDD.
Larose et al. (2006). Data mining the Web: uncovering patterns in Web content, structure, and usage. Washington: Wiley-Interscience.
Data mining is primarily used today by companies with a strong consumer focus – retail, financial, communication and marketing organizations. It enables these companies to determine relationships among 'internal' factors, such as price, product positioning or staff skills, and 'external' factors, such as economic indicators, competition and customer demographics. It also enables them to determine the impact on sales, customer satisfaction, and corporate profits, and to 'drill down' into summary information to view detailed transactional data. With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions that appeal to specific customer segments. For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers, and American Express can suggest products to its cardholders based on analysis of their monthly expenditures. More generally, data mining is the procedure of posing questions and extracting patterns, often previously unknown, from large volumes of data using pattern matching or other reasoning techniques. Data mining has several applications in security, both national security and cyber security. Threats to national security include attacks on buildings and the destruction of critical infrastructures such as power grids and telecommunication systems; data mining techniques are being investigated to determine who the suspicious people are and who is capable of carrying out terrorist activities. Cyber security is concerned with defending computer and network systems against attacks by Trojan horses, worms and viruses. Data mining is also being applied to provide solutions for intrusion detection and auditing.
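The retail scenario above – mining point-of-sale records to target promotions – can be illustrated with a toy pair-frequency count. All basket contents here are invented.

```python
from collections import Counter
from itertools import combinations

# Toy point-of-sale baskets; all items are invented for illustration
baskets = [
    {"popcorn", "soda", "dvd"},
    {"popcorn", "dvd"},
    {"soda", "chips"},
    {"popcorn", "soda", "dvd"},
]

# Count how often each pair of items appears in the same basket
pair_counts = Counter()
for basket in baskets:
    pair_counts.update(combinations(sorted(basket), 2))

# The most frequent pair is a natural cross-promotion candidate
top_pair, freq = pair_counts.most_common(1)[0]
```

Real association-rule miners (e.g. Apriori-style algorithms) generalize this idea to larger item sets with support and confidence thresholds; the principle of counting co-occurrences is the same.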
While data mining has several applications in security, there are also serious privacy concerns. Because of data mining, even inexperienced users can link data and draw sensitive associations. Therefore, we must protect the privacy of individuals while carrying out practical data mining. In this paper we discuss developments and directions in privacy and data mining. In particular, we give a general overview of data mining and the different types of threats, and then discuss the consequences for privacy.
Data Mining for Security Applications
Data mining is becoming a key technology for identifying suspicious activities. In this section, data mining is discussed with respect to both non-real-time and real-time applications. To carry out data mining for counter-terrorism applications, one needs to gather data from several sources. For example, the following information on terrorist attacks is needed at the very least: who, what, where, when, and how; personal and business data on potential terrorists: place of birth, religion, education, ethnic origin, work history, finances, criminal record, relatives, friends and associates, and travel history; and unstructured data: newspaper articles, video clips, dialogues, e-mails, and phone calls. The data has to be integrated, warehoused and mined. One needs to develop profiles of terrorists and of activities and threats, and the data has to be mined to extract patterns of potential terrorists and forecast future activities and targets. Essentially, one needs to find the 'needle in the haystack' – or, more precisely, suspicious needles among possibly millions of needles. Data integrity is essential, and the methods have to scale. For applications such as emergency response, one needs to carry out real-time data mining. Data will be arriving from sensors and other devices in the form of continuous data streams, including breaking news, video releases, and satellite images; some critical data may also reside in caches. One needs to sift through the data quickly, discarding what is redundant and retaining what is needed for later use and analysis (non-real-time data mining). Data mining techniques need to meet timing constraints and may have to make quality-of-service (QoS) tradeoffs among timeliness, accuracy and precision. The results have to be presented and visualized in real time, and alerts and triggers will also have to be employed.
To apply data mining effectively for security applications and to develop suitable tools, we first need to determine our present capabilities. For instance, do the commercial tools scale? Do they work only on particular data and limited cases? Do they deliver what they promise? We need a balanced, objective study with demonstrations. At the same time, we also need to work on the big picture. For instance, what do we want the data mining tools to deliver? What are our expected end results for the foreseeable future? What are the criteria for success? How do we evaluate the data mining algorithms? What test beds do we build? We need both near-term and longer-term solutions. For the near term, we need to leverage current efforts, fill the gaps in a goal-directed way, and complete technology transfer. For the longer term, we need a research and development roadmap. In summary, data mining is very helpful for solving security problems. Tools can be used to inspect audit data and flag abnormal behavior. There is much recent work on applying data mining to cyber security applications; tools are being examined to detect abnormal patterns for national security, including those based on classification and link analysis. Law enforcement is also using these kinds of tools for fraud detection and crime solving.
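The idea of inspecting audit data and flagging abnormal behavior can be illustrated with a deliberately simple statistical check. The data, the threshold, and the z-score rule are all invented for this sketch; real intrusion-detection tools use far richer models.

```python
import statistics

def flag_anomalies(event_counts, threshold):
    """Toy audit-log check: flag users whose event count deviates from
    the mean by more than `threshold` standard deviations."""
    values = list(event_counts.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # no variation, nothing stands out
    return [user for user, n in event_counts.items()
            if abs(n - mean) / stdev > threshold]

# Invented login counts; one account is wildly out of line
logins = {"alice": 12, "bob": 9, "carol": 11, "mallory": 480, "dave": 10}
flagged = flag_anomalies(logins, threshold=1.5)
```

Note that a single extreme outlier inflates the standard deviation, which is why the threshold here is set low; production systems typically use robust statistics or learned baselines instead.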
Privacy Implications
We need to determine what is meant by privacy before we examine the privacy implications of data mining and propose effective solutions. In fact, different communities have different notions of privacy. In the medical community, privacy is about a patient determining what details the doctor may release about him or her. Employers, marketers and insurance corporations commonly seek information about individuals; it is up to the individual to determine what details may be released. In the financial community, a bank customer determines what financial details the bank may release about him or her. Likewise, retail corporations should not release sales details about individuals unless the individuals have approved the release. In the government community, privacy takes on a whole new significance. For example, students who attend my classes at AFCEA have pointed out to me that the FBI gathers data about US citizens; however, the FBI determines what data about a US citizen it can provide to, say, the CIA. That is, the FBI has to ensure the privacy of US citizens. Similarly, access to an individual's travel and spending data, as well as his or her web surfing activities, should be granted only with the individual's permission. Now that we have explained what we mean by privacy, we can examine the privacy implications of data mining. Data mining gives us 'facts' that are not obvious to human analysts of the data. For instance, can general trends across individuals be computed without revealing details about individuals? Conversely, can highly private associations be extracted from public data? In the former case we need to protect the individual data values while revealing the associations or aggregates, while in the latter case we need to protect the associations and correlations between the data.
Developments in Privacy
Researchers have considered different types of privacy problems. We point out the various problems and the solutions proposed:
Privacy violations that result from data mining. In this case the solution is privacy-preserving data mining: we perform data mining and release the results without revealing the data values used to perform the mining.
Privacy violations that result from the inference problem. Inference is the process of deducing sensitive details from the legitimate answers received to user queries. The solution to this problem is privacy constraint processing.
Privacy violations due to unencrypted data. The solution to this problem is to use encryption at different levels.
Privacy violations due to poor system design. Here the solution is to develop a methodology for designing privacy-enhanced systems.
Below we examine the solutions proposed both for privacy constraint/policy processing and for privacy-preserving data mining. Privacy constraint or policy processing research was carried out building on earlier research on security constraint processing. Examples of privacy constraints include the following:
Simple constraint: an attribute of a document is private.
Content-based constraint: if a document contains information about X, then it is private.
Association-based constraint: two or more documents taken together are private; individually, each document is public.
Release constraint: after X is released, Y becomes private.
The solution proposed is to augment a database system with a privacy checker for constraint processing. During query processing, the constraints are checked, and only public information is released unless, of course, the user is authorized to obtain the private information. Our approach also includes processing constraints during database update and design operations.
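A toy version of such a privacy checker, covering the four constraint types listed above, might look like the following. The `(kind, payload)` encoding and all document names are invented for illustration; a real constraint processor would sit inside the query engine.

```python
def may_release(doc_id, content, released, constraints):
    """Return False if releasing this document would violate any of the
    four constraint types: simple, content-based, association-based,
    or release (temporal) constraints."""
    for kind, payload in constraints:
        if kind == "simple" and payload == doc_id:
            return False              # the document itself is private
        if kind == "content" and payload in content:
            return False              # mentions sensitive topic X
        if kind == "association" and doc_id in payload:
            # the group is private only when released together
            if all(d in released or d == doc_id for d in payload):
                return False
        if kind == "release":
            x, y = payload            # once X is out, Y becomes private
            if doc_id == y and x in released:
                return False
    return True

constraints = [
    ("simple", "doc1"),
    ("content", "salary"),
    ("association", {"doc2", "doc3"}),
    ("release", ("doc4", "doc5")),
]
```

Note how the association and release constraints depend on what has already been released, which is exactly why constraint checking has to happen at query time rather than once at design time.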
Some early work on managing the privacy problem that results from data mining was performed by Clifton at the MITRE Corporation. The suggestion there was to prevent useful results from being mined: one could introduce 'cover stories' to produce 'false' results, or make only a sample of the data available so that an adversary cannot derive useful rules and predictive functions. However, these approaches did not catch on, as they defeat the purpose of data mining. The goal is to perform effective data mining while protecting individual data values and sensitive relationships. Agrawal was the first to coin the term privacy-preserving data mining. His early work introduced random values into the data, or perturbed the data, so that the real data would be protected; the challenge is to introduce random values or perturb the values without affecting the data mining results. Another approach is secure multi-party computation (SMC), by Kantarcioglu and Clifton. Here, each party knows its own contribution but not the others', while the final data mining results are known to all; various encryption techniques are used to ensure that the individual values are protected. SMC has shown considerable promise and can also be used for privacy-preserving distributed data mining. It is provably secure under certain assumptions, and the learned models are accurate. It assumes that the protocols are followed, i.e., a semi-honest model; the malicious model has also been investigated in recent work by Kantarcioglu and Kardes. Many SMC-based privacy-preserving data mining algorithms share common sub-protocols (e.g. dot product, summation). SMC does have a disadvantage: it is not efficient enough for very large data sets (e.g. petabyte-sized data sets). The semi-honest model may not be realistic, and the malicious model is still slower.
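The perturbation idea attributed to Agrawal above can be sketched very simply: add zero-mean random noise to each value so that individual records are hidden, while aggregates such as the mean remain approximately recoverable. The data, noise scale, and function name are invented for this illustration.

```python
import random
import statistics

def perturb(values, noise_scale):
    """Value perturbation sketch: add zero-mean Gaussian noise to each
    record. Individual values are obscured, but aggregate statistics
    (e.g. the mean) can still be estimated from the noisy data."""
    return [v + random.gauss(0.0, noise_scale) for v in values]

random.seed(0)                                              # reproducible demo
salaries = [40_000, 55_000, 62_000, 48_000, 70_000] * 200   # toy data
noisy = perturb(salaries, noise_scale=10_000)
```

The tension the text describes is visible here: larger `noise_scale` protects individuals better but degrades the aggregates a miner needs, which is exactly the trade-off the original-distribution reconstruction techniques try to manage.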
There are some new directions in which new models are being explored that can trade off better between efficiency and security. Game-theoretic and incentive issues are also being explored, and combining anonymization with cryptographic techniques is another route. Before evaluating the data mining algorithms, one needs to determine the objectives. In some cases the objective is to distort the data while still preserving some properties for data mining; another objective is to achieve high data mining accuracy with maximum privacy protection.
Our current work assumes that privacy is a personal preference and should therefore be individually adjustable. That is, we want privacy-preserving data mining approaches to reflect reality. We examined perturbation-based approaches with real-world data sets and carried out applicability studies of the existing approaches. We found that reconstruction of the original distribution may not work well with real-world data sets. We attempted to modify the perturbation techniques and adapt the data mining tools, and we also developed new privacy-preserving decision tree algorithms. Another development is the Platform for Privacy Preferences (P3P) by the World Wide Web Consortium (W3C). P3P is an emerging standard that enables web sites to express their privacy practices in a standard format; the policies can be automatically retrieved and understood by user agents. When a user enters a web site, the site's privacy policies are communicated to the user; if the policies differ from the user's preferences, the user is notified and can then decide how to proceed. Several major corporations are working on P3P standards.
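The P3P-style matching just described (a user agent comparing a site's declared practices against the user's preferences) can be modeled with a toy set comparison. The vocabulary below is invented and is not actual P3P syntax.

```python
def policy_conflicts(site_policy, user_prefs):
    """Report the data-use purposes a site declares that the user has
    not allowed; a user agent would warn when this list is non-empty."""
    return sorted(set(site_policy["purposes"])
                  - set(user_prefs["allowed_purposes"]))

# Invented site policy and user preferences
site = {"purposes": {"admin", "marketing", "profiling"}}
prefs = {"allowed_purposes": {"admin"}}
```

With these inputs the agent would flag the "marketing" and "profiling" purposes and let the user decide how to proceed, mirroring the notification step described above.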
Directions for Privacy
Thuraisingham showed in 1990 that the inference problem in general is unsolvable; the suggestion, therefore, was to explore the solvability aspects of the problem. We were able to show comparable results for the privacy problem. Therefore, we need to examine the complexity classes as well as the storage and time complexity. We also need to explore the foundations of privacy-preserving data mining algorithms and related privacy solutions. There are many such algorithms; how do they compare with each other? We need a test bed with realistic constraints to test the algorithms. Is it meaningful to examine privacy-preserving data mining for every data mining algorithm and every application? It is also time to develop real-world scenarios where these algorithms can be used. Is it possible to develop realistic commercial products, or should each organization adapt products to suit its own needs? Examining privacy may make sense for healthcare and financial applications, but does privacy work for defense and intelligence purposes? Is it even meaningful to have privacy for surveillance and geospatial applications? Once the image of my home is on Google Earth, how much privacy can I have? I may wish for my location to be private, but does that make sense if a camera can capture a picture of me? If there are sensors all over the place, is it meaningful to have privacy-preserving surveillance? This suggests that we need application-specific privacy. Next, what is the relationship between confidentiality, privacy and trust? If I, as a user of Organization A, send data about myself to Organization B, then suppose I read the privacy policies enforced by Organization B. If I agree with the privacy policies of Organization B, then I will send my data to Organization B; if I do not agree with the policies of Organization B, then I can negotiate with Organization B.
Even if the web site states that it will not share private information with others, do I trust the web site? Note that while confidentiality is enforced by the organization, privacy is determined by the user. That is, for confidentiality, the organization decides whether a user may have the data; if so, the organization can further decide whether the user can be trusted. Another question is how we can ensure the confidentiality of the data mining processes and results. What kinds of access control policies do we enforce? How can we trust the data mining processes and results, and how do we authenticate and validate the results? How can we integrate confidentiality, privacy, and trust with respect to data mining? We need to examine the research challenges and form a research agenda. One question that Rakesh Agrawal asked at the 2003 SIGKDD panel on privacy was: "Are privacy and data mining friends or foes?" We think that they are neither friends nor foes. We need advances in both data mining and privacy, and we need to design flexible systems. For some applications one may have to focus entirely on 'pure' data mining, while for others there may be a need for 'privacy-preserving' data mining. We need flexible data mining techniques that can adapt to changing environments. We believe that technologists, legal specialists, social scientists, policy makers, and privacy advocates must work together.
Conclusion
In this paper we have examined data mining applications in security and their implications for privacy. We examined the notion of privacy and then discussed recent developments, particularly those in privacy-preserving data mining. We then presented an agenda for research on privacy and data mining. Our conclusions are as follows. There is no universal definition of privacy; each organization must define what it means by privacy and develop appropriate privacy policies. Technology alone is not sufficient for privacy; we need technologists, policy experts, legal experts, and social scientists to work on privacy. Some well-known people have said 'forget about privacy.' Should we therefore pursue research on privacy? We believe that there are interesting research problems, so we need to continue this research. Furthermore, some privacy is better than none. Another school of thought is to try to prevent privacy violations and, if violations do occur, to prosecute. We need to enforce appropriate policies and examine the legal aspects. We need to approach privacy from all directions.
References
Agrawal, R., Srikant, R.: Privacy-Preserving Data Mining. In: SIGMOD Conference, pp. 439–450 (2000).
Agrawal, R.: Data Mining and Privacy: Friends or Foes. In: SIGKDD Panel (2003).
Kantarcioglu, M., Clifton, C.: Privately Computing a Distributed k-nn Classifier. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) PKDD 2004. LNCS, vol. 3202, pp. 279–290. Springer, Heidelberg (2004).
Kantarcioglu, M., Kardes, O.: Privacy-Preserving Data Mining Applications in the Malicious Model. In: ICDM Workshops, pp. 717–722 (2007).
Liu, L., Kantarcioglu, M., Thuraisingham, B.M.: The Applicability of the Perturbation-Based Privacy Preserving Data Mining for Real-World Data. Data Knowl. Eng. 65(1), 5–21 (2008).
Liu, L., Kantarcioglu, M., Thuraisingham, B.M.: A Novel Privacy Preserving Decision Tree. In: Proceedings Hawaii International Conf. on Systems Sciences (2009).
Thuraisingham, B.: On the Complexity of the Inference Problem. In: IEEE Computer Security Foundations Workshop (1990) (also available as MITRE Report, MTP-291).
Thuraisingham, B.M.: Privacy Constraint Processing in a Privacy-Enhanced Database Management System. Data Knowl. Eng. 55(2), 159–188 (2005).
Clifton, C.: Using Sample Size to Limit Exposure to Data Mining. Journal of Computer Security 8(4) (2000).
In this era, where there is a great deal of information to be handled at a go and little time available, it is useful and wise to analyze data from different viewpoints and summarize it so that it takes less time for the final user of that information to interpret.
This, in effect, reduces the costs associated with processing data and information for its intended purpose, while at the same time increasing the revenue base of the business through internal savings. Data mining is the process by which data is analyzed from different perspectives and summarized into useful information; this ensures the data becomes relevant to its intended purpose. Data mining is also referred to as knowledge discovery.
In this period of high-capacity business operation, the role played by data mining cannot be underestimated. Using statistical and mathematical techniques to dig through data warehouses helps in recognizing significant facts, relationships, trends, patterns, exceptions, and anomalies that might otherwise go unnoticed in the business environment. With this kind of approach, predicting customers' behavior becomes much easier.
The areas of application for this software are also numerous. First of all, it aids in market segmentation and direct marketing, where the businessman learns the best way to get the highest response rate from clients; this can be done by establishing customer preferences. Data mining can also help a business entity establish its level of customer churn, that is, the number of customers who leave and join a competitor.
Fraud detection is made easy by this software, and it also creates room for interactive marketing. It is much easier to identify the products commonly purchased together by carrying out a market basket analysis. This helps the business promote products that can be purchased together and thus gain from their complementary effects. Using the same software, trend analysis can be carried out, which helps in learning customer preferences and, therefore, the style of service provision best suited to the particular customers in question.
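The core of a market basket analysis can be sketched in a few lines: count how often each pair of products occurs together across transactions and express that count as a support value (the fraction of all baskets containing both items). The function name and the sample baskets below are illustrative assumptions; real tools would also compute confidence and lift over much larger transaction logs:

```python
from itertools import combinations
from collections import Counter

def pair_supports(transactions):
    """For each unordered pair of products, compute the fraction of
    transactions in which both products appear together."""
    counts = Counter()
    for basket in transactions:
        # deduplicate and sort so each pair has one canonical form
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items()}

baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
]
supports = pair_supports(baskets)
# ("bread", "butter") appears in 3 of the 4 baskets -> support 0.75
```

Pairs with high support are the candidates for the complementary-product promotions described above.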
Discussion
Data mining is not a standalone phenomenon but part of a broader series of analytical techniques. It involves extracting, transforming, and loading transaction data into the data warehouse system. Secondly, there is storing and managing the data in a database system, after which the data is made available to analysts and technology experts. These experts analyze the data using application software to ensure that the data presented performs its intended duty satisfactorily.
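The extract-transform-load (ETL) sequence described above can be sketched minimally: read raw records, clean and normalize them, and load them into a warehouse table that analysts can query. The CSV columns, the validity rule, and the table name are illustrative assumptions, not part of any specific product mentioned in the text:

```python
import csv
import io
import sqlite3

# extract: read raw transaction records (here from an in-memory CSV
# standing in for an operational source system)
raw = io.StringIO("customer,amount\nalice,120.50\nbob,-5.00\nalice,80.00\n")
rows = list(csv.DictReader(raw))

# transform: drop invalid records and normalize field types
clean = [
    {"customer": r["customer"], "amount": float(r["amount"])}
    for r in rows
    if float(r["amount"]) > 0  # assumed rule: negative amounts are invalid
]

# load: store the cleaned records in a warehouse table for analysts
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (:customer, :amount)", clean)

# analysts can now query the loaded data directly
total = db.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
# total spending across the cleaned records: 200.5
```

In practice each stage runs at scale against dedicated warehouse infrastructure, but the three-stage shape is the same.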
Lastly, the data is presented in a useful design such as a graph (ViewBI: Group). This advancement has led to the rise of new developments such as internet sales and web marketing, among other recent developments like web seminars and web advertising. These developments have led to more work being handled by a given business entity while ensuring the efficiency of the whole system.
Other types of analytical developments include business intelligence, data modeling, data visualization, metrics, and web analytics. All of these are part of an information management system meant to improve the performance of the business and, in the long run, produce powerful results, ensuring a rapid return on the investment channeled into business operations. Looking at the demo of the ACME bank's profitability, it is easy to see that the whole operation of the bank is summarized in one place, making it possible to understand the operations as a whole at a glance (Bank profitability.SWF). This minimizes the cost that would otherwise be incurred if the whole issue were checked stage by stage and department by department.
The quarterly internet sales on ViewBI (ViewBI: Group) provide a good summary, giving a general presentation of the scorecard detailing the group's finances in terms of revenue by category. The development and use of these financial visual models offers a vibrant combination of "what-if" scenarios and drill-down features that are helpful in projecting future financial performance, growth, risk, and market share. Every specific value point shows the viewer the underlying individual report, which can be used to modify the sales growth rate and all other relevant accounts measured.
In conclusion, the emergence of analytical tools for analyzing data has improved the rate at which operations are carried out and made it possible for data to be scrutinized from different perspectives, categorized, and summarized in a concise manner.
Reference
ViewBI- Business Intelligence Reporting Dashboard. ADWEB Agency. 2007. Web.
Crystal Xcelsius Templates. Business Objects SA. 2007. Web.
Palace, B. (1996). Data Mining: What is Data Mining? Web.