Advancing Adversarial Attacks in Tabular Machine Learning: A Deep Dive into CAA
January 6, 2025
Paper Review


Tags: Machine Learning, Adversarial Attacks, Tabular Data, Security, NeurIPS, Research

Authors: Mohammad Sadat Hossain, Nafis Tahmid, Shattik Islam Rhythm


Introduction

In the rapidly evolving landscape of machine learning security, adversarial attacks have predominantly focused on computer vision and natural language processing. However, a significant portion of real-world machine learning applications actually process tabular data, especially in critical domains like finance, healthcare, and cybersecurity. The paper “Constrained Adaptive Attack: Effective Adversarial Attack Against Deep Neural Networks for Tabular Data” addresses this crucial gap by introducing novel approaches to generate adversarial examples for tabular data while respecting real-world constraints.


The Challenge of Tabular Adversarial Attacks

Unlike images or text, tabular data comes with inherent constraints that make traditional adversarial attack methods ineffective. For example:

  • In a financial dataset, features like “total debt” and “monthly payment” must maintain specific mathematical relationships.
  • Categorical features like “education level” cannot be arbitrarily modified to continuous values.

Key Limitations in Existing Approaches:

  • Most attacks ignore feature relationships.
  • Current methods fail to handle mixed data types or categorical features.
  • Available attacks like CPGD show low success rates.
  • Search-based methods like MOEVA are computationally expensive.

Technical Innovation: CAPGD

Adaptive Step Size

CAPGD adaptively adjusts the step size based on optimization progress:

  • The step size is halved when either:
    • The loss has increased in fewer than 75% of the steps since the last checkpoint, or
    • The maximum loss attained has not improved since the last checkpoint.

Mathematically:

$$\eta^{k+1} = \begin{cases} \eta^{k} & \text{if progress is good} \\ \dfrac{\eta^{k}}{2} & \text{otherwise} \end{cases}$$
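The checkpoint test above can be sketched in a few lines. This is an illustrative implementation, not the paper's code; `update_step_size`, its arguments, and the loss-history convention are assumptions made for the example.

```python
def update_step_size(eta, losses, best_loss_prev, best_loss_now, rho=0.75):
    """Halve the step size unless optimization is making progress.

    `losses` holds the per-step loss values since the last checkpoint.
    Condition 1: the loss increased in fewer than rho (75%) of the steps.
    Condition 2: the best loss has not improved since the previous
    checkpoint.  Either condition triggers the halving.
    """
    increases = sum(1 for a, b in zip(losses, losses[1:]) if b > a)
    too_few_increases = increases < rho * (len(losses) - 1)
    no_best_improvement = best_loss_now <= best_loss_prev
    if too_few_increases or no_best_improvement:
        return eta / 2.0  # progress stalled: halve the step size
    return eta  # progress is good: keep the step size
```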

Momentum Integration

Incorporates momentum for improved stability:

$$z^{k+1} = P_S\left( x^{k} + \eta^{k} \nabla L\left(x^{k}\right) \right)$$

$$x^{k+1} = R_{\Omega}\left( P_S\left( x^{k} + \alpha \left( z^{k+1} - x^{k} \right) + (1-\alpha) \left( x^{k} - x^{k-1} \right) \right) \right)$$

where $\alpha = 0.75$ balances the current gradient and previous updates.
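One momentum iteration could be sketched as follows. The helper names (`momentum_step`, `project`, `repair`) are placeholders standing in for $P_S$ and $R_{\Omega}$, not the paper's API.

```python
import numpy as np

def momentum_step(x_k, x_prev, grad, eta, project, repair, alpha=0.75):
    """One CAPGD-style update with momentum (a sketch).

    `project` maps a point onto the feasible set P_S and `repair` is
    the constraint repair operator R_Omega; both are callables
    supplied by the caller."""
    # Plain gradient step, projected onto the valid perturbation set.
    z_next = project(x_k + eta * grad)
    # Blend the new direction with the previous displacement (momentum).
    x_next = repair(project(x_k + alpha * (z_next - x_k)
                            + (1 - alpha) * (x_k - x_prev)))
    return x_next
```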

Repair Operator

A novel repair operator $R_{\Omega}$ ensures constraint satisfaction by projecting examples back into the valid data space.
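A minimal repair sketch, assuming features are restored, clipped, and rounded in that order; the index arguments and bounds format are illustrative, not the paper's exact formulation.

```python
import numpy as np

def repair(x, immutable_idx, x_orig, bounds, int_idx):
    """Project an adversarial candidate back into the valid data space."""
    x = np.array(x, dtype=float)
    # 1. Restore immutable features to their original values.
    x[immutable_idx] = x_orig[immutable_idx]
    # 2. Clip each feature into its allowed range (lower, upper).
    x = np.clip(x, bounds[0], bounds[1])
    # 3. Round integer/categorical-coded features back to whole values.
    x[int_idx] = np.round(x[int_idx])
    return x
```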


Formulation of Constraints

CAPGD and CAA respect domain-specific constraints, expressed in a structured grammar, so that generated adversarial examples remain valid.

Types of Constraints:

  1. Immutability: Certain features cannot be modified (e.g., “loan ID”).
  2. Boundaries: Features must remain within specific ranges (e.g., $5000 \leq \text{Loan Amount} \leq 100000$).
  3. Type: Features retain their data type (e.g., categorical values remain categorical).
  4. Feature Relationships: Logical/mathematical relationships must be preserved (e.g., $\text{Total Debt} \geq \text{Monthly Payments}$).

Constraint Grammar

$$\omega := \omega_1 \land \omega_2 \mid \omega_1 \lor \omega_2 \mid \psi_1 \geq \psi_2 \mid f \in \{\psi_1, \ldots, \psi_k\},$$

where:

  • $\omega$: Constraint
  • $\psi$: Numeric expression
  • $f$: Feature
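The grammar above can be made concrete by encoding each constraint as a predicate and conjoining them. The feature names and thresholds below are made up for illustration.

```python
# Each lambda encodes one production of the grammar over a feature dict.
constraints = [
    lambda r: r["total_debt"] >= r["monthly_payment"],   # psi_1 >= psi_2
    lambda r: 5000 <= r["loan_amount"] <= 100000,        # boundary
    lambda r: r["education"] in {"HS", "BSc", "MSc"},    # f in {psi_1..psi_k}
]

def is_valid(row):
    """omega_1 AND omega_2 AND ...: a candidate is valid only if every
    constraint predicate holds."""
    return all(c(row) for c in constraints)
```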

The Power of Ensemble: CAA

CAA combines CAPGD with MOEVA, leveraging their strengths:

  1. CAPGD is fast but less effective.
  2. MOEVA is slower but highly effective.
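The combination can be read as a cascade: try the cheap attack first, and spend the expensive search budget only on examples it fails to flip. A sketch, where `capgd`, `moeva`, and `is_adversarial` are placeholder callables, not the paper's interfaces:

```python
def caa_attack(x, model, capgd, moeva, is_adversarial):
    """CAA as a two-stage cascade (sketch)."""
    x_adv = capgd(x, model)          # fast gradient-based attempt
    if is_adversarial(model, x_adv):
        return x_adv                 # success: skip the costly search
    return moeva(x, model)           # fall back to evolutionary search
```

Skipping MOEVA on every example CAPGD already flips is what yields the reported speedup while keeping MOEVA's effectiveness on the hard cases.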

Achievements:

  • Up to 96.1% decrease in model accuracy.
  • 5x faster than pure MOEVA.
  • Best performance in 19 out of 20 experimental settings.

Experimental Validation

Datasets:

  • URL (phishing detection)
  • LCLD (credit scoring)
  • CTU (botnet detection)
  • WiDS (medical)

Architectures:

  • TabTransformer
  • RLN
  • VIME
  • STG
  • TabNet

Key Findings:

  • CAPGD outperforms all other gradient-based attacks.
  • CAA balances effectiveness with reduced computational cost.
  • Adversarial training shows varying effectiveness across architectures.

Future Directions

  1. Development of defenses against constrained adversarial attacks.
  2. Advanced approaches for handling complex feature relationships.
  3. Design of robust architectures for tabular data.
  4. Optimization of search-based components in CAA.

Conclusion

This work advances adversarial machine learning for tabular data by introducing CAPGD and CAA. These contributions not only enhance attack effectiveness but also establish new benchmarks for evaluating robustness in tabular ML models. The paper underscores the critical need to address vulnerabilities in sensitive domains like finance and healthcare.