Tuesday, 30 July 2013

The Era of 3V Data Mining

Data mining has passed through a number of historical stages. The first was the establishment of data mining as an independent field, beginning with the series of Knowledge Discovery in Databases (KDD) workshops that started in 1989 and later became the premier data mining conference. This stage corresponds to the first “V” of the data mining era we live in today: “volume”. Although the data volumes of over two decades ago were small compared with current ones, they were big enough at the time to challenge traditional statistics and machine learning techniques as well as state-of-the-art hardware.

The second “V”, “velocity”, is linked to the second stage of the field's historical development. This stage refers to data stream mining, defined as the process of performing approximate data mining on high-speed input data records [3]. We can trace the early developments in this area to the late 1990s [1]; the area reached maturity after about a decade of continuous research and thousands of published papers.

The last “V” stands for “variety”. Variety is a more recent development in data mining, and it is the outcome of maturity in the storage and retrieval of semi-structured and unstructured data. That maturity has in turn been important in dealing with the increasing reliance on social media websites as an important source of information.
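To make the “velocity” stage a little more concrete, here is a minimal sketch of what approximate mining over a stream can look like. It uses the Misra-Gries frequent-items summary, which I picked purely for illustration; it is not the method of [1] or [3]. The function name misra_gries and the parameter k are my own choices. The point is that the stream is read once, only a handful of counters is kept in memory, and the counts that come out are approximate.

```python
def misra_gries(stream, k):
    """One-pass approximate frequent-items summary (Misra-Gries).

    Keeps at most k - 1 counters. For a stream of n items, every item
    occurring more than n / k times is guaranteed to be in the result,
    and each reported count undercounts the true count by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: count it.
            counters[item] += 1
        elif len(counters) < k - 1:
            # There is a free counter: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement all counters and drop
            # those that reach zero. The new item is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Illustrative stream: "a" is the only heavy item (9 of 12 records).
example_stream = ["a"] * 6 + ["b", "c", "d"] + ["a"] * 3
summary = misra_gries(example_stream, k=3)
print(summary)
```

In this toy stream the heavy item "a" survives in the summary with a count slightly below its true frequency, which is the kind of accuracy-for-memory trade that stream mining techniques accept.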

The combination of the 3 Vs has been referred to as Big Data analytics. This combination is the third wave of developments in data mining [2]. Successful deployment of Big Data analytics will change the scale at which data are analysed, and it will provide the digital humanities with tools that will mark their rise and success.

References
[1] Alon, N., Matias, Y., & Szegedy, M. (1996, July). The space complexity of approximating the frequency moments. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing (pp. 20-29). ACM.
[2] Cuzzocrea, A., & Gaber, M. M. (2013). Data science and distributed intelligence: recent developments and future insights. In Intelligent Distributed Computing VI (pp. 139-147). Springer Berlin Heidelberg.
[3] Gaber, M. M., Zaslavsky, A., & Krishnaswamy, S. (2005). Mining data streams: a review. ACM SIGMOD Record, 34(2), 18-26.

Tuesday, 31 August 2010

Pocket Data Mining

The term "Pocket Data Mining" was coined in our first paper on the topic, which describes a general architecture that we hope to see fully implemented in the next couple of years. The paper will be presented at the 22nd IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2010) in Arras, France, between the 27th and the 29th of October, 2010.

Pocket data mining refers to the use of agent technology to enable mobile data stream mining in an ad hoc collaborative computing environment.

From the user's perspective, we would be able to use each other's smartphones to perform online data analysis that can help us in many important applications.

I will be posting more details in due course. For now, the full reference for the paper is:

Frederic Stahl, Mohamed Medhat Gaber, Max Bramer, and Philip S. Yu, Pocket Data Mining: Towards Collaborative Data Mining in Mobile Computing Environments, to appear in the Proceedings of the IEEE 22nd International Conference on Tools with Artificial Intelligence (ICTAI 2010), Arras, France, 27-29 October, 2010.

Sunday, 8 August 2010

My first blog

This is my first blog post. I will be posting my thoughts and activities here whenever I have time to do so.