⏳ Mining deep into Data Mining - PART I ⏳
"Necessity is the mother of invention"
The need for knowledge is the root of data collection, discovery, and analysis. The current technological world is drowning in data but starving for knowledge, and this is where data mining comes in handy.
What is Data Mining?
It is the extraction of interesting, non-trivial, previously unknown, and potentially useful patterns or knowledge from huge amounts of data.
Want to know the alternative names of Data Mining?
π Knowledge Discovery in Databases (KDD)
π Data or Pattern analysis
π Data archeology
π Data dredging
π Information harvesting
π Business Intelligence
Data mining is a confluence of multiple disciplines, mainly:
π Statistics
π Algorithms
π Data visualization
π Machine learning
π Pattern recognition
π Database Technology
Why not follow traditional data analysis?
π Traditional data analysis cannot handle terabytes of data.
π High-dimensional data adds complexity to the analysis.
π Complex data, such as social network data, temporal and spatial data, and multimedia like audio and video, is hard to process in a traditional manner.
On what kinds of data can Data Mining be done?
π Relational database and data warehouse
π Advanced applications such as heterogeneous databases, social network data, graphs, multimedia, the WWW, sensor data, data streams, multi-linked data, and spatial, temporal, or spatio-temporal data
Functionalities of Data Mining:
π Classification and prediction
➡ Construct models that can distinguish classes: for example, classify countries based on climate, or cars based on performance.
➡ Predict a numerical or missing value from the input data: for example, the price of a house 5 years from now, or a cost estimate after a certain period of time.
π Cluster analysis
➡ Similar data points are grouped together to form clusters.
➡ The class label (i.e., the cluster name) is not known in advance.
➡ Maximize intra-class similarity and minimize inter-class similarity.
π Outlier analysis
➡ What are outliers? Data points that do not match the general behaviour of the rest of the data. They stand out from the other data points.
➡ An outlier can be noise or a genuine exception.
π Trend and evolution analysis
➡ Regression-based analysis
➡ Pattern mining
➡ Periodicity analysis
➡ Similarity-based analysis
π Statistical analysis, association analysis, etc.
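To make two of these functionalities concrete, here is a minimal sketch of cluster analysis and outlier analysis on one-dimensional data. It is only an illustration: plain k-means and a z-score rule are common choices, not the only ones, and the function names, the evenly spaced centroid seeding, and the threshold of 2.0 are assumptions made for this example.

```python
from statistics import mean, pstdev


def kmeans_1d(values, k, iterations=20):
    """Group 1-D values into k clusters with plain k-means."""
    ordered = sorted(values)
    # Assumption: seed the centroids with k evenly spaced values from the sorted data.
    step = max(1, len(ordered) // k)
    centroids = [ordered[min(i * step, len(ordered) - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable, so stop early
        centroids = updated
    return clusters


def zscore_outliers(values, threshold=2.0):
    """Return values whose z-score exceeds the (assumed) threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


clusters = kmeans_1d([1, 2, 3, 10, 11, 12], k=2)
outliers = zscore_outliers([10, 11, 9, 10, 12, 10, 11, 9, 10, 50])
print(clusters)
print(outliers)
```

Note that the clusters come back without names, which is exactly the "class label is unknown" point above: giving each group a meaning is left to the analyst.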
Basic steps involved in the KDD process / Data Mining:
π Select suitable target data from the pool of available data
π Pre-process the data
π Transform the data into a flexible format
π Mine the data
π Interpret and evaluate the mined patterns
π Finally, obtain knowledge
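The steps above can be sketched as a tiny pipeline. This is a toy illustration, not a real KDD system: the records, the field names (`store`, `items`), the pair-counting "mining" step, and the 0.6 support threshold are all made up for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records; the field names are invented for illustration.
RAW = [
    {"id": 1, "store": "A", "items": "bread,milk"},
    {"id": 2, "store": "A", "items": "bread,milk,eggs"},
    {"id": 3, "store": "B", "items": "milk,eggs"},
    {"id": 4, "store": "A", "items": None},  # incomplete record
    {"id": 5, "store": "A", "items": "bread,milk"},
]


def select(records, store):
    """Selection: keep only the target data (one store here)."""
    return [r for r in records if r["store"] == store]


def preprocess(records):
    """Pre-processing: drop records with a missing item list."""
    return [r for r in records if r["items"]]


def transform(records):
    """Transformation: turn comma-separated strings into sets of items."""
    return [frozenset(r["items"].split(",")) for r in records]


def mine(baskets):
    """Mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts


def evaluate(pair_counts, n_baskets, min_support=0.6):
    """Evaluation: keep pairs whose support meets the (assumed) threshold."""
    return {
        pair: count / n_baskets
        for pair, count in pair_counts.items()
        if count / n_baskets >= min_support
    }


baskets = transform(preprocess(select(RAW, "A")))
knowledge = evaluate(mine(baskets), len(baskets))
print(knowledge)
```

Each function maps to one step of the list, so the final dictionary plays the role of the "knowledge" obtained at the end: the item pairs that are bought together often enough to be interesting.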
Issues in Data Mining
π Protection of data and privacy issues
➡ Sensitive data, such as credit card information, is fed into prediction models for fraud detection.
➡ Domain-specific data mining is highly ineffective.
π Handling noise and incomplete data
π Background knowledge of the data is required
π Knowledge fusion - Integration of the discovered knowledge with the existing one may be difficult.
π Diversity of data types: data such as web and multimedia data requires different mining techniques, such as parallel, distributed, and incremental methods.
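As a small illustration of the "incomplete data" issue in the list above, here is one simple and commonly used way to fill missing values: replace them with the mean of the observed ones. This is only a sketch under assumptions. The function name `impute_mean` and the use of `None` to mark a missing value are choices made for this example, and mean imputation is just one of several possible strategies.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


ages = [20, None, 40, 30, None]  # hypothetical column with missing entries
filled = impute_mean(ages)
print(filled)
```

Filling gaps this way keeps every record usable, but it also shrinks the spread of the data, so the choice of strategy should depend on what is going to be mined afterwards.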