Sachdeva M., Narayanan C., Wiedenkeller M., Sedlakova J., Bernard J.

LLM-generated tabular data is creating new opportunities for data-driven applications in academia, business, and society. To leverage benefits such as missing value imputation, labeling, and enrichment with context-aware attributes, LLM-generated data needs a critical validation process. The number of pioneering approaches is growing quickly, opening a promising validation space that has so far remained unstructured. We present a design space for the critical validation of LLM-generated tabular data with two dimensions. The first is the Analysis Granularity dimension, which ranges from within-attribute (single-item and multi-item) to across-attribute perspectives (1 × 1, 1 × m, and n × n). The second is the Data Source dimension, which differentiates between LLM-generated values, ground truth values, explanations, and their combinations. We discuss analysis tasks for each cross-cut of the two dimensions, map 19 existing validation approaches onto the design space, and discuss the characteristics of two approaches in detail, demonstrating the descriptive power of the design space.

Design Space for the Critical Validation of LLM-Generated Tabular Data

