Data mining has passed through a number of historical stages. The very first
stage was the foundation of data mining as an independent field, marked by the
series of Knowledge Discovery in Databases (KDD) workshops that started in 1989
and later grew into the premier data mining conference. This stage corresponds
to the first “V” of the data mining era we live in today: “volume”. Although
data volumes two decades ago were small compared with today's, they were large
enough at the time to challenge traditional statistical and machine learning
techniques, as well as state-of-the-art hardware. The second “V”, “velocity”,
is linked to the second stage of this historical development: data stream
mining, defined as the process of performing approximate data mining on
high-speed input data records [3]. Early developments in this area can be
traced to the second half of the 1990s [1], and the area reached maturity after
about a decade of continuous research, with thousands of papers published. The
last “V” stands for “variety”. Variety is a more recent development in data
mining, resulting from the maturity of techniques for storing and retrieving
semi-structured and unstructured data. This, in turn, has been an important
development given the increasing reliance on social media websites as a source
of information.
The combination of the three Vs has been referred to as Big Data analytics,
and this combination constitutes the third wave of developments in data mining
[2]. Successful deployment of Big Data analytics will change the scale at which
data are analysed, and the digital humanities will gain tools that will mark
the field's rise and success.
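The "approximate data mining on high-speed input data records" of the velocity stage can be made concrete with the kind of problem studied in [1]: estimating the second frequency moment F2 (the sum of squared item frequencies) in a single pass, using memory that does not grow with the number of distinct items. The Python snippet below is a minimal sketch under stated assumptions, not code from any of the cited works: the class and parameter names (`AMSSketch`, `groups`, `per_group`, `seed`) are hypothetical, items are assumed to be integer identifiers, and the random ±1 signs come from a random degree-3 polynomial hash modulo a prime, one common way to obtain the 4-wise independence this estimator relies on.

```python
import random
import statistics

# Prime field size for the polynomial hash (an assumption of this sketch;
# any prime larger than the item universe works).
PRIME = (1 << 61) - 1


class AMSSketch:
    """Hypothetical one-pass estimator of F2 = sum of squared item frequencies,
    in the spirit of Alon, Matias and Szegedy [1]."""

    def __init__(self, groups=5, per_group=16, seed=0):
        rng = random.Random(seed)
        self.groups = groups
        self.per_group = per_group
        n = groups * per_group
        # One random degree-3 polynomial per counter gives a 4-wise
        # independent hash; its parity is used as a random +1/-1 sign.
        self.coeffs = [[rng.randrange(PRIME) for _ in range(4)] for _ in range(n)]
        self.counters = [0] * n

    def _sign(self, coeffs, item):
        a0, a1, a2, a3 = coeffs
        x = item % PRIME
        h = (((a3 * x + a2) * x + a1) * x + a0) % PRIME  # Horner evaluation
        return 1 if h & 1 else -1

    def update(self, item, count=1):
        # Each arriving record adjusts every counter by +/-count; no per-item
        # state is kept, so memory is fixed by groups * per_group.
        for j, coeffs in enumerate(self.coeffs):
            self.counters[j] += count * self._sign(coeffs, item)

    def estimate(self):
        # Average the squared counters within each group, then take the
        # median of the group means (median of means).
        squares = [z * z for z in self.counters]
        means = [
            statistics.fmean(squares[g * self.per_group:(g + 1) * self.per_group])
            for g in range(self.groups)
        ]
        return statistics.median(means)


if __name__ == "__main__":
    sketch = AMSSketch(seed=42)
    exact = {}
    for i in range(1000):
        item = i % 100  # a toy stream: 100 distinct items, 10 occurrences each
        sketch.update(item)
        exact[item] = exact.get(item, 0) + 1
    print("exact F2:", sum(f * f for f in exact.values()))
    print("estimated F2:", sketch.estimate())
```

Each counter is a random ±1 projection of the frequency vector, so its square is an (essentially) unbiased estimate of F2; averaging within a group reduces the variance, and the median across groups guards against one unlucky group. Raising `per_group` tightens the estimate at the cost of more work per arriving record, which is the accuracy-for-speed trade-off that characterises stream mining.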
References
[1] Alon, N., Matias, Y., & Szegedy, M. (1996). The space complexity of
approximating the frequency moments. In Proceedings of the Twenty-Eighth Annual
ACM Symposium on Theory of Computing (pp. 20-29). ACM.
[2] Cuzzocrea, A., & Gaber, M. M. (2013). Data science and distributed
intelligence: Recent developments and future insights. In Intelligent
Distributed Computing VI (pp. 139-147). Springer Berlin Heidelberg.
[3] Gaber, M. M., Zaslavsky, A., & Krishnaswamy, S. (2005). Mining data
streams: A review. ACM SIGMOD Record, 34(2), 18-26.