[12-16]Research on Spatial database and on data quality
Date:2010-12-13
Title: Research on Spatial database and on data quality
Speaker: Dr. Ke DENG (University of Queensland)
Time: Dec. 16th, 2010. 9:30-11:30am
Location: Lecture room, Lab for Computer Science, Level 3 Building #5, Institute of Software, CAS
Bio:
Dr. Ke Deng is a Research Fellow and an assistant professor in the DKE division of the ITEE School at the University of Queensland. Before joining DKE, he worked at the CSIRO ICT Centre. His research interests include data quality and spatial databases. He received his PhD from the University of Queensland in 2007 and his Master's degree in Information and Communication Technology from Griffith University in 2001. From 2001 to 2003, he worked as a software engineer at an Internet service provider. He has published papers in top journals and conferences such as VLDB, VLDB Journal, ICDE, and IEEE TKDE, and has also served many international journals and conferences.
Abstract:
Dr. Ke Deng will talk about his recent research in the following two directions:
1. Best Point Detour Query in Road Networks
A point detour is a temporary deviation from a user-preferred path P (not necessarily a shortest network path) to visit a data point such as a supermarket or a McDonald's. The quality of a point detour is measured by the additional travel it introduces, called the point detour cost or simply the detour cost. Given a preferred path to be traveled, the Best Point Detour (BPD) query identifies the point detour with the minimum detour cost. This problem arises frequently in daily life but has received little study. This work investigates efficient processing of the BPD query, supported by purpose-built optimization techniques. It also investigates the continuous-BPD query, which targets the scenario where the remaining path changes continuously as the user moves toward the destination along the preferred path. The challenge of the continuous-BPD query lies in finding a set of update locations that split P into partitions such that the user has the same BPD anywhere within one partition. The continuous-BPD query is processed by running BPD queries according to a deliberately planned strategy, and the analysis shows that the number of BPD queries executed is optimal. The efficiency of both BPD and continuous-BPD query processing has been verified by extensive experiments.
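To make the detour cost concrete, here is a minimal brute-force sketch in Python. It is not the optimized method from the talk. It assumes an undirected road network stored as a nested dictionary of edge weights, and it assumes the detour cost of a point p is the minimum, over a leave node u and a later (or identical) return node v on P, of d(u, p) + d(p, v) minus the length of P between u and v. The abstract does not give this formula, and the names `dijkstra` and `best_point_detour` are hypothetical.

```python
import heapq

INF = float("inf")

def dijkstra(graph, source):
    """Shortest network distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def best_point_detour(graph, path, points):
    """Return (detour_cost, point) with the smallest assumed detour cost."""
    # prefix[i] = length of the preferred path from path[0] to path[i]
    prefix = [0.0]
    for a, b in zip(path, path[1:]):
        prefix.append(prefix[-1] + graph[a][b])

    best = (INF, None)
    for p in points:
        dist = dijkstra(graph, p)  # symmetric because the graph is undirected
        best_leave = INF  # min over i <= j of dist[path[i]] + prefix[i]
        for j, node in enumerate(path):
            dj = dist.get(node, INF)
            best_leave = min(best_leave, dj + prefix[j])
            # leave at the best earlier node, return at path[j],
            # and subtract the stretch of P that was skipped
            cost = best_leave + dj - prefix[j]
            if cost < best[0]:
                best = (cost, p)
    return best

def make_graph(edges):
    """Build an undirected adjacency dictionary from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph
```

For a path A-B-C-D with unit edges, a supermarket S attached to B and C by edges of length 2, and a restaurant M attached to C by an edge of length 1, this sketch picks M. One Dijkstra run per data point is exactly the kind of cost that the optimization techniques mentioned above would aim to avoid.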
2. Active Duplicate Detection (DASFAA 2009 Best Paper Runner Up)
The aim of duplicate detection is to group the records in a relation that refer to the same real-world entity, such as a person or a business. Most existing methods require user-specified parameters, such as a similarity threshold, before duplicate detection can be run. These methods are called user-first in this paper. In many scenarios, however, such pre-specification is very hard for the user and often unreliable, which limits the applicability of user-first methods. This paper proposes a user-last method, called Active Duplicate Detection (ADD), in which an initial solution is returned without forcing the user to specify such parameters, and the user is then involved in refining the initial solution. Unlike user-first methods, where the user makes decisions before any processing, ADD allows the user to make decisions based on an initial solution. The initial solution identified by ADD is of comparatively high quality and is easy to refine in a systematic way (at almost zero cost).
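The dependence of user-first methods on a pre-specified threshold can be illustrated with a small Python sketch. This shows a generic threshold-based baseline, not ADD itself. Token-level Jaccard similarity, union-find grouping, and the names `jaccard` and `group_duplicates` are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two record strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def group_duplicates(records, threshold):
    """Group records whose pairwise similarity reaches the user's threshold.

    Records are linked transitively through a simple union-find structure.
    """
    n = len(records)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if jaccard(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(records[i])
    return list(groups.values())

records = [
    "John Smith Brisbane",
    "john smith brisbane qld",
    "J Smith Brisbane",
    "Acme Pty Ltd",
]
strict = group_duplicates(records, 0.7)  # "J Smith Brisbane" stays separate
loose = group_duplicates(records, 0.5)   # "J Smith Brisbane" joins the group
```

On these four made-up records, a threshold of 0.7 yields three groups and a threshold of 0.5 yields two. The result depends on a value the user must guess before seeing any output, which is the limitation a user-last approach is meant to remove.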
He will also share with the audience his experience of conducting top research and writing high-quality papers.