Cost-Sensitive Machine Learning (Chapman & Hall/CRC Machine Learning & Pattern Recognition)

Edited by Balaji Krishnapuram, Shipeng Yu, and R. Bharat Rao

Language: English

Pages: 331

ISBN: 1439839255

Format: PDF / Kindle (mobi) / ePub


In machine learning applications, practitioners must account for several distinct costs, not only predictive accuracy. These costs include the following (a worked sketch of the last item appears after the list):

  • Cost of acquiring training data
  • Cost of data annotation/labeling and cleaning
  • Computational cost for model fitting, validation, and testing
  • Cost of collecting features/attributes for test data
  • Cost of user feedback collection
  • Cost of incorrect prediction/classification
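
The last item, the cost of incorrect prediction, is the classic cost-sensitive setting: instead of maximizing raw accuracy, the model picks the label with the lowest expected cost under a cost matrix. A minimal sketch, using a hypothetical binary cost matrix in which a false negative is assumed ten times as costly as a false positive:

```python
import numpy as np

# Hypothetical cost matrix C[true, predicted] for a binary task:
# a false negative (missing a positive case) is assumed 10x as
# costly as a false positive; correct predictions cost nothing.
C = np.array([[0.0,  1.0],    # true class 0: correct, false positive
              [10.0, 0.0]])   # true class 1: false negative, correct

def min_expected_cost_prediction(class_probs, cost_matrix):
    """Pick, per sample, the class with the lowest expected cost.

    class_probs: array of shape (n_samples, n_classes), P(y = c | x).
    """
    expected_costs = class_probs @ cost_matrix   # (n_samples, n_classes)
    return expected_costs.argmin(axis=1)

probs = np.array([[0.8, 0.2], [0.95, 0.05]])
print(min_expected_cost_prediction(probs, C))    # -> [1 0]
```

Note that the first sample is labeled positive even though class 0 is four times as probable: the asymmetric costs shift the decision threshold, which is exactly the effect a cost-sensitive learner exploits.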

Cost-Sensitive Machine Learning is one of the first books to provide an overview of the current research efforts and problems in this area. It discusses real-world applications that incorporate the cost of learning into the modeling process.

The first part of the book presents the theoretical underpinnings of cost-sensitive machine learning. It describes well-established machine learning approaches for reducing data acquisition costs during training, as well as approaches for reducing costs when systems must make predictions for new samples. The second part covers real-world applications that effectively trade off different types of costs. These applications not only use traditional machine learning approaches; they also incorporate cutting-edge research that moves beyond the constraining assumptions of those approaches by analyzing application needs from first principles.

Spurring further research on several open problems, this volume highlights assumptions that are often implicit in machine learning techniques and were not fully examined in the past. The book also illustrates the commercial importance of cost-sensitive machine learning through its coverage of the rapid application developments made by leading companies and academic research labs.

Sample excerpts from the book's chapters:

…approximate the φ_{0/1} utility measure from the same section. Under this interpretation, the φ_{VE} function from the QBC framework in Section 1.2.2 is very similar to φ_{H}. Namely, it makes the assumptions that x is representative of U and that the expected future vote entropy of the query point is zero. The crucial difference is that QBC methods replace the point estimate θ with a distribution over possible hypotheses θ ∈ C, approximating the version space with a committee. This Bayesian flavor…
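
The vote-entropy idea in this excerpt can be made concrete. A minimal sketch with a hypothetical committee and candidate pool (not the chapter's own code): each committee member votes a label for every unlabeled candidate, and the candidate whose vote distribution has the highest entropy is queried.

```python
import numpy as np

def vote_entropy(committee_votes, n_classes):
    """Entropy of the committee's vote distribution for one candidate.

    committee_votes: shape (n_members,), each entry a predicted label.
    """
    counts = np.bincount(committee_votes, minlength=n_classes)
    p = counts / counts.sum()        # empirical vote distribution
    p = p[p > 0]                     # treat 0 * log 0 as 0
    return float(-(p * np.log(p)).sum())

# Rows: candidate query points; columns: committee members' votes.
votes = np.array([[0, 0, 0, 0],     # unanimous -> entropy 0
                  [0, 1, 0, 1],     # even split -> maximal entropy
                  [0, 0, 1, 0]])    # mild disagreement
scores = [vote_entropy(v, n_classes=2) for v in votes]
query_index = int(np.argmax(scores))  # -> 1, the most contested point
```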

…[table: cross-domain sentiment classification accuracy; the three method column labels did not survive extraction]

  Task                   (1)     (2)     (3)
  Book → DVD             77.3%   78.5%   81.4%
  DVD → Book             74.1%   77.6%   77.1%
  Kitchen → Electronic   82.8%   85.1%   85.0%
  Electronic → Kitchen   85.0%   85.1%   86.8%
  Bound (one per task)   82.6%   81.4%   84.6%   87.1%

…and target domains to find a mapping between the features from these domains, by which a common feature space is constructed. Extending this idea, Pan et al. [44] develop a spectral feature alignment (SFA) algorithm to align domain-specific words from different domains into unified clusters, with the help of domain-independent words as…
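
The alignment idea can be illustrated with a toy sketch. The following is not the SFA algorithm itself, only its underlying intuition on made-up data: domain-specific words are embedded through their co-occurrence with shared, domain-independent pivot words, so words playing the same role in different domains land near each other in the common feature space.

```python
import numpy as np

# Hypothetical co-occurrence counts: rows are domain-specific words,
# columns are domain-independent pivot words (e.g., "good", "bad").
M = np.array([[8.0, 0.0, 1.0],   # "sharp"   (kitchen reviews)
              [9.0, 1.0, 0.0],   # "durable" (electronics reviews)
              [0.0, 7.0, 6.0],   # "blunt"   (kitchen reviews)
              [1.0, 9.0, 8.0]])  # "noisy"   (electronics reviews)

# Row-normalize, then take a low-rank spectral (SVD) embedding;
# rows with similar pivot profiles end up close together.
M /= M.sum(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(M, full_matrices=False)
embedding = U[:, :2] * s[:2]     # 2-D feature per word
print(embedding)  # "sharp"/"durable" cluster apart from "blunt"/"noisy"
```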

…researched (and still somewhat controversial) topic of semisupervised learning, clearly highlighting the open challenges. Bin Cao, Yu Zhang, and Qiang Yang summarize the current approaches to transfer learning and multi-task learning in Chapter 3. In Chapter 4, Vikas C. Raykar describes recent advances in the design of cascaded classifiers that attempt to reduce the cost of acquiring features at prediction time by making a decision after acquiring the most important information where…
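
A prediction-time cascade of the kind described here can be sketched as follows; the stage models, per-feature costs, and confidence threshold below are hypothetical placeholders, not the chapter's design:

```python
def cascade_predict(x, stages, threshold=0.9):
    """Run a feature-acquisition cascade on one example x (a dict).

    stages: list of (predict_proba, feature_name, feature_cost),
    ordered from cheapest to most expensive feature; predict_proba
    maps the features acquired so far to P(y = 1).
    Returns (predicted_label, total_acquisition_cost).
    """
    acquired, total_cost, p = {}, 0.0, 0.5
    for predict_proba, feature, cost in stages:
        acquired[feature] = x[feature]    # pay to measure this feature
        total_cost += cost
        p = predict_proba(acquired)
        if max(p, 1.0 - p) >= threshold:  # confident enough: stop early
            break
    return int(p >= 0.5), total_cost

# Example with two toy stages over a cheap and an expensive feature.
stages = [
    (lambda f: 0.95 if f["cheap"] > 0 else 0.05, "cheap", 1.0),
    (lambda f: 0.99 if f["expensive"] > 0 else 0.01, "expensive", 10.0),
]
label, cost = cascade_predict({"cheap": 1, "expensive": 0}, stages)
# -> label 1 at cost 1.0: the cheap feature alone was decisive.
```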

…estimate the generalization performance of the classifier with respect to each class, A(c); class c can then be sampled according to:

  p_A^t(c) ∝ max{0, A_{t−1}(c) − A_{t−2}(c)} / Σ_{c′} max{0, A_{t−1}(c′) − A_{t−2}(c′)}

Alternatively, we could consider general volatility in class members' predicted labels, beyond improvement in the model's ability to predict the class. Again, by using cross-validated predictions at successive epochs, it is possible to…
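
The sampling rule above is straightforward to implement. A minimal sketch, assuming cross-validated per-class accuracy estimates at epochs t−2 and t−1; the uniform fallback for the case where no class improved is an assumption the excerpt does not specify:

```python
import numpy as np

def class_sampling_probs(A_prev2, A_prev1):
    """p_A^t(c) ∝ max(0, A_{t-1}(c) - A_{t-2}(c)), normalized.

    A_prev2, A_prev1: per-class accuracy estimates at epochs t-2, t-1.
    """
    gains = np.maximum(0.0, A_prev1 - A_prev2)
    if gains.sum() == 0.0:               # no class improved:
        return np.full_like(gains, 1.0 / len(gains))  # assumed uniform
    return gains / gains.sum()

p = class_sampling_probs(np.array([0.70, 0.55, 0.90]),
                         np.array([0.75, 0.65, 0.88]))
# p == [1/3, 2/3, 0]: classes with recent gains get more draws.
sampled_class = np.random.choice(len(p), p=p)
```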

…the prospective acquisition's rank numbers across all learning tasks. Both of these methods [69] are general, in that they can be used to learn different types of models. In fact, in principle, these policies can also apply when a different active learning policy is used to rank prospective acquisitions for different types of models (e.g., aggregation and classification models). However, their effectiveness has not been evaluated in such settings, where multiple active learning techniques are combined.
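
One simple instance of such a rank-based policy is Borda-style aggregation: each task's own active learning policy ranks the candidate acquisitions, and candidates are ordered by their mean rank across tasks. This toy sketch illustrates the idea, not necessarily the exact scheme of [69]:

```python
import numpy as np

def aggregate_ranks(scores_per_task):
    """Order candidates by mean rank across tasks (best first).

    scores_per_task: (n_tasks, n_candidates) utility scores, where
    higher is better under each task's own active learning policy.
    """
    # Double argsort converts scores to ranks (rank 0 = best per task).
    ranks = np.argsort(np.argsort(-scores_per_task, axis=1), axis=1)
    return np.argsort(ranks.mean(axis=0))

scores = np.array([[0.9, 0.5, 0.2],    # task 1's policy scores
                   [0.2, 0.8, 0.6]])   # task 2's policy scores
print(aggregate_ranks(scores))  # -> [1 0 2]: candidate 1 suits both
```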
