💡⏳ Mining deep into Data Mining - PART I ⏳💡

 "Necessity is the mother of invention"

The need for knowledge is the root of data collection, discovery, and analysis. To be precise, we could say that today's technological world is drowning in data but starving for knowledge. This is where data mining comes in handy.

What is Data Mining?

It is the extraction of interesting, non-trivial, previously unknown, and potentially useful patterns or knowledge from huge amounts of data.

Want to know the alternative names of Data Mining?

👉 Knowledge Discovery in Databases (KDD)
👉 Data or pattern analysis
👉 Data archeology
👉 Data dredging
👉 Information harvesting
👉 Business intelligence

Data mining is indeed a confluence of multiple disciplines, mainly:

👉 Statistics
👉 Algorithms
👉 Data visualization
👉 Machine learning
👉 Pattern recognition
👉 Database technology

Why not follow traditional data analysis?

👉 Traditional data analysis cannot handle terabytes of data.
👉 High-dimensional data adds complexity to the analysis.
👉 Complex data, such as social network data, temporal and spatial data, and multimedia (audio, video), is hard to process in a traditional manner.

On what kinds of data can Data Mining be done?

👉 Relational databases and data warehouses
👉 Advanced applications such as heterogeneous databases, social network data, graphs, multimedia, the WWW, sensor data, data streams, multi-linked data, and spatial, temporal, or spatio-temporal data

Functionalities of Data Mining:

👉 Classification and prediction

                 ➡ Construct models that can distinguish classes - classify countries based on climate, classify cars based on performance.
                 ➡ Predict a numerical or missing value from the input data - predict the price of a house five years from now, or estimate a cost after a certain period of time.

👉 Cluster analysis

                 ➡ Data points are grouped together to form clusters.
                 ➡ The class label (i.e. the cluster name) is unknown.
                 ➡ Maximize intra-class similarity and minimize inter-class similarity.

👉 Outlier analysis

                 ➡ What are outliers? Data points that do not match the general behaviour of the rest of the data. They generally stand out from other data points.
                 ➡ They can be noise or exceptions.

👉 Trend and evolution analysis

                 ➡ Regression-based analysis
                 ➡ Pattern mining
                 ➡ Periodicity analysis
                 ➡ Similarity-based analysis

👉 Statistical analysis, association, etc.
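
To make the first three functionalities a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the function names and toy numbers are hypothetical, and the techniques picked here (1-nearest-neighbour for classification, k-means for clustering, z-scores for outliers) are common textbook choices rather than anything prescribed above.

```python
from statistics import mean, pstdev


def classify_1nn(labelled, value):
    """Classification: give `value` the label of its nearest labelled example."""
    _, label = min(labelled, key=lambda pair: abs(pair[0] - value))
    return label


def kmeans_1d(values, centroids, iterations=10):
    """Cluster analysis: group unlabelled values around the given start centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its closest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return clusters, centroids


def zscore_outliers(values, threshold=2.0):
    """Outlier analysis: flag values more than `threshold` std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical toy data: average temperatures with a climate label.
    temperatures = [(5.0, "cold"), (8.0, "cold"), (25.0, "warm"), (30.0, "warm")]
    print(classify_1nn(temperatures, 27.0))
    print(kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centroids=[1.0, 12.0]))
    print(zscore_outliers([10, 11, 9, 10, 12, 10, 11, 50]))
```

Note the difference between the first two functions: the classifier needs labelled examples, while the clustering function only sees raw values and has to discover the groups on its own.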

Basic steps involved in the KDD process / Data Mining:

👉 Select suitable target data from the pool of available data
👉 Pre-process the data
👉 Transform the data into a format suitable for mining
👉 Mine the data
👉 Interpret and evaluate the mined patterns
👉 Finally, obtain knowledge
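
The steps above can be sketched as a tiny pipeline. This is a rough illustration, not a definitive KDD implementation: the house records, the function names, and the choice of a least-squares line as the "mining" step are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical raw records: (house_id, area_m2, price). None marks a missing value.
RAW = [
    ("h1", 50.0, 100.0),
    ("h2", 80.0, 160.0),
    ("h3", None, 150.0),
    ("h4", 100.0, 200.0),
    ("h5", 120.0, 240.0),
]


def select(records):
    """Step 1: keep only the attributes relevant to the task."""
    return [(area, price) for _house_id, area, price in records]


def preprocess(rows):
    """Step 2: drop rows that contain a missing value."""
    return [row for row in rows if None not in row]


def transform(rows):
    """Step 3: rescale the area to the range [0, 1] (min-max scaling)."""
    areas = [area for area, _ in rows]
    low = min(areas)
    span = (max(areas) - low) or 1.0  # avoid dividing by zero
    return [((area - low) / span, price) for area, price in rows]


def mine(rows):
    """Step 4: fit a least-squares line, price = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def evaluate(rows, model):
    """Step 5: mean absolute error of the fitted line on the data."""
    slope, intercept = model
    return mean([abs(y - (slope * x + intercept)) for x, y in rows])


if __name__ == "__main__":
    rows = transform(preprocess(select(RAW)))
    model = mine(rows)
    error = evaluate(rows, model)
    # Step 6: state the result as knowledge a person can act on.
    print(f"Price rises by about {model[0]:.0f} from the smallest to the largest house "
          f"(mean absolute error {error:.2f}).")
```

Each function maps to one step, so a step can be swapped out (say, filling in missing values instead of dropping rows) without touching the others.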
                

Issues in Data Mining

👉 Protection of data and privacy issues

                 ➡ Sensitive credit card information is fed into prediction models for fraud detection.
                 ➡ Domain-specific data mining is highly ineffective.

👉 Handling noisy and incomplete data

👉 Background knowledge of the data is required

👉 Knowledge fusion - integrating the discovered knowledge with existing knowledge may be difficult.

👉 Diverse data types, such as web and multimedia data, require different mining techniques, such as parallel, distributed, and incremental methods.
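
To give a feel for what an "incremental" method means, here is a small sketch that keeps a running mean and variance up to date as values arrive, without storing or re-reading the older data. It uses Welford's online algorithm, which is my own choice of illustration; the class name and sample values are hypothetical.

```python
class RunningStats:
    """Incrementally maintained mean and variance (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new value into the summary."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Population variance of the values seen so far."""
        return self._m2 / self.count if self.count else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    for reading in [2, 4, 4, 4, 5, 5, 7, 9]:  # e.g. values from a sensor stream
        stats.update(reading)
    print(stats.count, stats.mean, stats.variance)
```

The same idea, updating a compact summary instead of reprocessing everything, is what makes incremental mining attractive for data streams.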



                         💡 Let's keep mining deeper!!! 💡



