💡⏳ Mining deep into Data Mining - PART I ⏳💡

 "Necessity is the mother of invention"

The need for knowledge is the root of data collection, discovery, and analysis. To be precise, the current technological world is drowning in data but starving for knowledge. This is where data mining comes in handy.

What is Data Mining?

It is the extraction of interesting, non-trivial, previously unknown, and potentially useful patterns or knowledge from huge amounts of data.

Want to know the alternative names of Data Mining?

👉 Knowledge Discovery in Databases (KDD)
👉 Data or pattern analysis
👉 Data archeology
👉 Data dredging
👉 Information harvesting
👉 Business intelligence

Data mining is indeed a confluence of multiple disciplines, mainly:

👉 Statistics
👉 Algorithms
👉 Data visualization
👉 Machine learning
👉 Pattern recognition
👉 Database technology

Why not follow traditional data analysis?

👉 Traditional data analysis cannot handle terabytes of data.
👉 High-dimensional data adds complexity to the analysis.
👉 Complex data such as social network data, temporal and spatial data, and multimedia (audio, video) is hard to process in a traditional manner.

On what kinds of data can Data Mining be done?

👉 Relational databases and data warehouses
👉 Advanced applications such as heterogeneous databases, social network data, graphs, multimedia, WWW, sensor data, data streams, multi-linked data, and spatial, temporal or spatio-temporal data

Functionalities of Data Mining:

👉 Classification and prediction

                 ➡ Construct models that can distinguish classes, e.g. classify countries based on climate or cars based on performance.
                 ➡ Predict a numerical or missing value from the input data, e.g. the price of a house five years from now or a cost estimate after a certain period of time.

👉 Cluster analysis

                  ➡ Similar data points are grouped together to form clusters.
                  ➡ The class label (i.e. the cluster name) is unknown beforehand.
                  ➡ The goal is to maximize intra-class similarity and minimize inter-class similarity.

👉 Outlier analysis

                  ➡ What are outliers? Data points that do not match the general behaviour of the rest of the data. They stand out from the other data points.
                  ➡ An outlier can be noise or a genuine exception.

👉 Trend and evolution analysis

                   ➡ Regression-based analysis
                   ➡ Pattern mining
                   ➡ Periodicity analysis
                   ➡ Similarity-based analysis

👉 Statistical analysis, association, etc.
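
To make two of these functionalities a bit more concrete, here is a minimal sketch in plain Python. It is not a reference implementation: `kmeans_1d` is a toy one-dimensional k-means for cluster analysis, and `iqr_outliers` uses the common 1.5 × IQR rule of thumb for outlier analysis. Both function names, the starting centroids, and all the numbers are made up for illustration.

```python
from statistics import mean, quantiles


def kmeans_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid (a toy k-means sketch)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign each point to the closest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


def iqr_outliers(values, k=1.5):
    """Flag points lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical temperatures: note that no class labels are supplied.
temperatures = [1, 2, 3, 21, 22, 23]
centres, groups = kmeans_1d(temperatures, centroids=[0, 30])

# Hypothetical purchase amounts with one value that stands out.
purchases = [10, 12, 11, 13, 12, 11, 95]
suspicious = iqr_outliers(purchases)
```

Here `groups` ends up holding the cold and the warm readings separately, even though nobody told the code which reading belongs where, and `suspicious` contains only the purchase of 95. Whether that purchase is noise or a genuine exception is something the algorithm cannot decide; that part needs domain knowledge.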

Basic steps involved in the KDD process / Data Mining:

👉 Select suitable target data from the pool of common data
👉 Pre-process the data
👉 Transform the data into a format suitable for mining
👉 Mine the data
👉 Interpret or evaluate the results and analyse the patterns
👉 Finally, obtain knowledge
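
The steps above can be sketched as a small chain of functions. This is only an illustration under loud assumptions: the house records, the field names (`city`, `area_sqft`, `price`) and the function names are all invented, pre-processing is reduced to dropping incomplete rows, and the mining step is a simple least-squares line fit standing in for whatever technique a real project would use.

```python
from statistics import mean

# Hypothetical raw records; every field and value is invented for illustration.
raw_records = [
    {"city": "A", "area_sqft": 1000, "price": 200000},
    {"city": "A", "area_sqft": 1500, "price": 300000},
    {"city": "B", "area_sqft": 900, "price": 150000},
    {"city": "A", "area_sqft": None, "price": 250000},  # incomplete row
    {"city": "A", "area_sqft": 2000, "price": 400000},
]


def select(records, city):
    """Step 1: pick the target data out of the common pool."""
    return [r for r in records if r["city"] == city]


def preprocess(records):
    """Step 2: drop rows that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records):
    """Step 3: reshape into (x, y) pairs, both expressed in thousands."""
    return [(r["area_sqft"] / 1000, r["price"] / 1000) for r in records]


def mine(pairs):
    """Step 4: fit y = slope * x + intercept by ordinary least squares."""
    xs, ys = zip(*pairs)
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def evaluate(pairs, model):
    """Step 5: mean absolute error of the fitted line on the same data."""
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in pairs)


pairs = transform(preprocess(select(raw_records, city="A")))
model = mine(pairs)
error = evaluate(pairs, model)
```

The last step, obtaining knowledge, is the human reading of `model` and `error`: on this made-up data the slope says how much the price rises per additional 1000 sq ft in city A, and the error says how far that pattern can be trusted.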
                

Issues in Data Mining

👉 Protection of data and privacy issues

                ➡ Sensitive credit card information is fed into the prediction model for fraud detection.
                ➡ Domain-specific data mining is highly ineffective.

👉 Handling noise and incomplete data

👉 Background knowledge of the data is required

👉 Knowledge fusion - Integrating the discovered knowledge with existing knowledge may be difficult.

👉 Diversity of data types - Data such as web and multimedia content requires different mining techniques, including parallel, distributed and incremental methods.



                         💡 Let's keep mining deeper!!! 💡



