Authors: Ihab Ilyas; Xu Chu
Affiliations: University of Waterloo; Georgia Institute of Technology
Series: ACM Books
Publisher: ACM, NY, 2019
ISBN: 978-1-4503-7152-0
Source database: Association for Computing Machinery
Abstract (original language): Data quality is one of the most important problems in data management, since dirty data often leads to inaccurate data analytics results and incorrect business decisions. Poor data across businesses and the U.S. government are reported to cost trillions of dollars a year. Multiple surveys show that dirty data is the most common barrier faced by data scientists. Not surprisingly, developing effective and efficient data cleaning solutions is challenging and is rife with deep theoretical and engineering problems.
This book is about data cleaning, which is used to refer to all kinds of tasks and activities to detect and repair errors in the data. Rather than focus on a particular data cleaning task, we give an overview of the end-to-end data cleaning process, describing various error...
Full-text access: ACM (partner)
