An Entity Resolution Approach to Isolate Instances of Human Trafficking Online
Workshop Paper, EMNLP '17 3rd Workshop on Noisy User-generated Text, pp. 77-84, September 2017
Abstract
Human trafficking is a challenging law enforcement problem, and traces of victims of such activity manifest as ‘escort advertisements’ on various online forums. Given the large, heterogeneous, and noisy structure of this data, building models to predict instances of trafficking is a convoluted task. In this paper, we propose an entity resolution pipeline using a notion of proxy labels, in order to extract clusters from this data with prior history of human trafficking activity. We apply this pipeline to 5M records from backpage.com and report on the performance of this approach, challenges in terms of scalability, and some significant domain-specific characteristics of our resolved entities.
BibTeX
@workshop{Nagpal-2017-121824,
author = {Chirag Nagpal and Kyle Miller and Benedikt Boecking and Artur Dubrawski},
title = {An Entity Resolution Approach to Isolate Instances of Human Trafficking Online},
booktitle = {Proceedings of EMNLP '17 3rd Workshop on Noisy User-generated Text},
year = {2017},
month = {September},
pages = {77--84},
}
Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.