Large Volume Data Handling: An Efficient Data Mining Solution

Introduction

This proposal addresses the problem of handling large volumes of data and offers an efficient data mining solution. It focuses on the service delivery problems of a communications provider and proposes a practical way to solve them. The proposal begins with the basic concepts of data mining and related terms, followed by the company background and business process. Later sections describe the existing problem, the proposed solution and a discussion. The proposal ends with a conclusion and references.

Data Mining

Data mining is a commonly used term in computing. It is the process of sorting through large amounts of data to find the relevant information, and it is also known as knowledge discovery. Data is analyzed from several perspectives and summarized into useful information for further processing. Large organizations usually use ERP systems to sort their data. Many companies apply data mining techniques to make their systems more effective and to save time. Data mining also supports the maintenance of data, which helps an organization organize its resources and capital properly. In other words, data mining is the process of finding relationships among dozens of fields. A database system suffers if data mining techniques are not applied properly in a given domain. Data mining is widely used by large firms with a strong consumer focus. Its techniques enable organizations to identify relationships among internal factors such as cost, positioning and staff skills, and among external factors such as economic indicators, competition and customer demographics. Data mining also helps determine the impact on sales of changes in internal factors.
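One simple way to quantify a relationship between two fields, such as an internal factor and sales, is a correlation coefficient. The sketch below computes Pearson's correlation by hand. The field names and the monthly figures are invented for illustration and do not come from PCCW; correlation is also only one of many possible measures, not a method prescribed by this proposal.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant field")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly figures: a price (internal factor) and units sold.
monthly_price = [10, 12, 14, 16, 18]
monthly_sales = [200, 190, 170, 160, 140]

r = pearson(monthly_price, monthly_sales)
print(f"correlation between price and sales: {r:.2f}")
```

A value close to -1 or +1 suggests a strong linear relationship between the two fields, while a value near 0 suggests none. It does not by itself show that one factor causes the other.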

Terms

  • Data: Data is raw material, usually found in bulk. There are three types of data: operational or transactional data, non-operational data and metadata.
  • Information: The patterns, relationships and associations among all this data provide information.
  • Knowledge: Information becomes knowledge when it is related to an organization’s previous facts, figures and statistics (Han & Kamber, 2000).
  • Data warehouses: A data warehouse is a central place for storing data, and it makes it easy to integrate new data with existing data. Data warehousing is commonly defined as a process of proper data management and retrieval. It is based on the concept of storing all data centrally in a large organization.

Knowledge discovery in databases (KDD): KDD is a commonly used term in the database field. It is the non-trivial extraction of previously unknown information from large databases.

Company Background

PCCW is the largest communications network in Hong Kong. PCCW Limited (PCCW) is one of the leading communications companies in Hong Kong (PCCW, 2008). HKT Group Holdings Limited (HKT) is Hong Kong’s premier telecommunications provider and a world-class player in information and communication technologies. PCCW also attracts strong interest from foreign investors. It holds a strong position in the market and employs a total of 16,200 people. Its headquarters are in Hong Kong, and it maintains a presence in Europe, the Middle East, Africa, the Americas, mainland China and other regions of Asia.

HKT has become well known in the telecommunications business. HKT Group Holdings Limited (HKT) was founded in 2008 to provide telecommunications services, media and IT solutions. As Hong Kong’s first quadruple-play provider, PCCW/HKT offers a wide range of media content and services across four platforms: fixed-line, broadband Internet, TV and mobile. The group has been successful worldwide and is credited as Hong Kong’s leading telecoms player, a genuine quadruple-play provider, an expert in ICT solutions and a company expanding into international markets. It offers the following services:

Voice services, data services, Internet services, mobile services, equipment solutions, ICT solutions, contact center services, telecom and IPTV solutions, interconnect services and TSCM services. PCCW is also well known for its outsourcing business.

Business process

The business process starts with setting up a wireless telecommunication network using routers and switches. A number of employees and tools are needed to keep the network fault-free. The basic problem is managing a large amount of data efficiently. Because the company provides telecommunication and IT services, the main problems are data redundancy and maintaining data consistency. Both are major hurdles to providing valuable services. There is also a huge amount of customer data to deal with.

Existing Problem

Problem question: How can large amounts of customer data and service information be handled in order to provide a fast communication system?

PCCW is a large organization whose primary responsibility is to provide the best possible communication network to all its customers. There is a high risk of losing customers if the network fails under heavy traffic. The company offers telecommunication services as well as IPTV solutions, which make it possible to integrate satellite systems. The main problems in providing fast connections are data redundancy, the time spent searching for a particular record and noise distortion over large networks. It is important that the service provided to all customers is cost-effective, fast and based on a fault-free network.

Goals and objectives

The main objective of using data mining techniques is to reduce data redundancy across a large network. Data mining also helps the company make better use of its resources and allocate them effectively. The techniques provide solid facts and data that support decision making, and they open a path to better growth.

Proposed solution

Numerous data mining techniques suit the scenario above, and many of them handle data well over large networks if applied properly. Distortion and clustering techniques are proposed to solve the problem stated above. Distortion techniques are designed for the changing needs of data exploration: they support the exploration process by emphasizing detail in one part of the data while preserving an overview of the whole. Their main objective is to show a high level of detail for some portions of the data in combination with a lower level of detail for the rest. For multidimensional data sets, a dynamic projection method is widely used to change the projections. To apply the distortion technique, the PCCW team needs a structure with strong data handling. Because PCCW uses an ERP solution for data handling, the proposed solution is to obtain the relationships between fields and then define a structure that links high-level data to lower-level detail based on the attributes of the data. Once both levels are linked, a search for any particular data returns results that match the requirements, and much time is saved. Browsing is very difficult over large networks that hold a bulk quantity of data. With interactive filtering, the division of large data sets into smaller groups and the relationships between fields, this problem can be solved to a large extent.
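The link between a high-level overview and lower-level detail can be sketched as a small data structure. The example below is only an illustration under stated assumptions: the function names `build_overview` and `focus`, the field names and the sample records are all hypothetical, and the code is a data-structure analogue of the focus-plus-context idea rather than an actual distortion visualization.

```python
from collections import defaultdict


def build_overview(records, group_field):
    """Return (overview, detail): a count per group and the records behind it."""
    detail = defaultdict(list)
    for rec in records:
        detail[rec[group_field]].append(rec)
    overview = {group: len(items) for group, items in detail.items()}
    return overview, dict(detail)


def focus(overview, detail, group, predicate=lambda rec: True):
    """Show one group in full detail; keep all other groups as summary counts.

    `predicate` plays the role of an interactive filter on the focused group.
    """
    context = {g: n for g, n in overview.items() if g != group}
    focused = [rec for rec in detail.get(group, []) if predicate(rec)]
    return {"focus": focused, "context": context}


# Hypothetical customer records; the field names are assumptions.
records = [
    {"customer": "A", "service": "mobile", "plan": "4G"},
    {"customer": "B", "service": "mobile", "plan": "5G"},
    {"customer": "C", "service": "iptv", "plan": "basic"},
]

overview, detail = build_overview(records, "service")
view = focus(overview, detail, "mobile", lambda rec: rec["plan"] == "5G")
print(overview)
print(view)
```

Only the focused group is expanded and filtered, so a search touches a small part of the data while the counts for the other groups keep the overall picture visible.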

Clustering is a process widely used for partitioning data sets into meaningful classes for further processing. It is commonly described as unsupervised classification, that is, classification of data without predefined classes. Clustering divides a large volume of data into small groups of similar items. Clustering techniques can be classified from numerous perspectives in the data mining domain, and clustering plays a pivotal role in data mining applications. In the past few years it has become a significant problem in databases, pattern recognition, neural networks and computer graphics. Clustering can solve the identified problem because PCCW has a wide network: dividing the data into small groups according to its metadata and storing it in a central repository would be beneficial and save time. ERP solutions provide a well-defined data structure, but many companies still use other software alongside ERP because of the integration problems associated with ERP solutions. PCCW needs a well-defined, integrated data structure for effective service. If clustering is applied, the data will be stored in different groups; whenever a particular record is searched for, the crawler or pointer will first check its metadata and then enter the matching group. In this way the data redundancy problem can be solved, because dividing data by metadata does not allow the same entry with the same metadata to be stored twice. Without data redundancy in the PCCW structure, much less time is needed to find a particular record. The results and deliverables of this approach may vary because the number of customers increases day by day. The proposed solution is significant for handling large volumes of data in large databases.
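The lookup described above, where the metadata is checked first and only the matching group is searched, can be sketched as follows. This is a minimal illustration, not PCCW’s system: the class name `ClusteredStore`, the metadata fields and the sample records are assumptions. It groups records by exact metadata values (keyed partitioning) rather than running a statistical clustering algorithm such as k-means.

```python
from collections import defaultdict


class ClusteredStore:
    """Groups records by metadata and rejects duplicate entries in a group."""

    def __init__(self, key_fields):
        # key_fields: metadata attributes that decide which group a record joins.
        self.key_fields = tuple(key_fields)
        self.groups = defaultdict(dict)

    def _group_key(self, record):
        return tuple(record[field] for field in self.key_fields)

    def add(self, record_id, record):
        """Store a record; return False if its id already exists in its group."""
        group = self.groups[self._group_key(record)]
        if record_id in group:
            return False  # same entry with the same metadata: not stored twice
        group[record_id] = record
        return True

    def find(self, record_id, **metadata):
        """Check the metadata first, then search only inside the matching group."""
        key = tuple(metadata[field] for field in self.key_fields)
        return self.groups.get(key, {}).get(record_id)


# Hypothetical usage with invented metadata fields.
store = ClusteredStore(["service", "region"])
first = store.add("c1", {"service": "mobile", "region": "HK", "name": "A"})
second = store.add("c1", {"service": "mobile", "region": "HK", "name": "A"})
hit = store.find("c1", service="mobile", region="HK")
miss = store.find("c1", service="iptv", region="HK")
print(first, second, hit, miss)
```

Because each lookup goes straight to one group, the search cost depends on the size of that group rather than on the size of the whole data set, which is the time saving the proposal aims for.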

Sample Process Model

Figure 1. In the figure above, a Perl script is applied to define the paths for data. PCCW is a network in which data travels from different directions, so data must follow the correct path for network traffic to be handled properly and for data storage to be made easy.

Deliverables

Details          Before                 After
Time required    1-2 minutes            40-50 seconds
Project load     Uneven & distracted    Organized & balanced
Data placement   Uneven & unorganized   Well utilized
Searching time   1-2 minutes            40-50 seconds
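The time figures in the table are ranges, so the implied saving is also a range. The short sketch below works it out, taking "1-2 minutes" as 60 to 120 seconds. The function name is hypothetical, and the result follows only from the table’s own figures, not from any additional measurement.

```python
def reduction_range(before, after):
    """Return (smallest, largest) fractional reduction for two (low, high) ranges."""
    before_low, before_high = before
    after_low, after_high = after
    # Smallest saving: the fastest "before" time against the slowest "after" time.
    smallest = (before_low - after_high) / before_low
    # Largest saving: the slowest "before" time against the fastest "after" time.
    largest = (before_high - after_low) / before_high
    return smallest, largest


# Times in seconds: 1-2 minutes before, 40-50 seconds after.
smallest, largest = reduction_range(before=(60, 120), after=(40, 50))
print(f"time reduction: {smallest:.0%} to {largest:.0%}")
```

On these figures, the expected reduction in time lies between about one sixth and two thirds, depending on where in each range the actual times fall.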

Discussion

Using both techniques has many advantages in the PCCW environment, because PCCW is a very large network that holds a bulk quantity of data. The complexity of large data becomes harder to manage as the number of customers grows, and data may be mishandled in such cases. To solve this problem, it is necessary to identify the exact problem first and then apply the proposed technique to get good results. An alternative approach is to apply a neural network in such an environment. Every algorithm and proposed solution has advantages and disadvantages. The choice of solution depends on the environment, the requirements and how well the solution fits the problem.

Conclusion

PCCW is a leading firm in Hong Kong that offers telecommunication services. It has a huge customer base, and the rate at which new customers join is also very high. PCCW is a wide network that has held its position for many years. It faces problems in handling large volumes of data. The proposed data mining techniques help establish a fault-tolerant network and support the proper allocation of resources and staff. The proposal justifies and offers a solution to the identified problem. Management can make decisions on the basis of the fair and reliable data obtained with the aid of the proposed model.

References

  1. Han, J. & Kamber, M. (2000). Data Mining: Concepts and Techniques. Morgan Kaufmann.
  2. Prefix Clustering. (2006). C-BGP and prefix clustering.
  3. PCCW. (2008). Web.