2026-03-17-xldiff.md (58492B)
1 # xldiff Implementation Plan 2 3 > **For agentic workers:** REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (`- [ ]`) syntax for tracking. 4 5 **Goal:** Port go-xldiff to Rust as the `xldiff` binary in this repo, with type-aware comparison and tolerance support. 6 7 **Architecture:** New `src/diff.rs` module for core diff logic (data structures, positional/key-based algorithms, tolerance). New `src/bin/xldiff.rs` binary for CLI parsing, file:sheet argument handling, and four output formatters (text, markdown, json, csv). Reuses existing `reader`, `metadata`, `filter` (for `resolve_column`), and `formatter` modules. 8 9 **Tech Stack:** Rust, polars (DataFrames), calamine (reading), clap (CLI), serde_json (JSON output), std::io::IsTerminal (TTY detection) 10 11 **Spec:** `docs/superpowers/specs/2026-03-17-xldiff-design.md` 12 13 --- 14 15 ### Task 1: Cargo.toml & lib.rs setup 16 17 **Files:** 18 - Modify: `Cargo.toml` 19 - Modify: `src/lib.rs` 20 21 - [ ] **Step 1: Add xldiff binary entry and dependencies to Cargo.toml** 22 23 Add after the xlfilter `[[bin]]` block: 24 25 ```toml 26 [[bin]] 27 name = "xldiff" 28 path = "src/bin/xldiff.rs" 29 ``` 30 31 Add to `[dependencies]`: 32 33 ```toml 34 serde_json = { version = "1", features = ["preserve_order"] } 35 ``` 36 37 - [ ] **Step 2: Add diff module to lib.rs** 38 39 ```rust 40 pub mod cell; 41 pub mod diff; 42 pub mod filter; 43 pub mod formatter; 44 pub mod metadata; 45 pub mod reader; 46 pub mod writer; 47 ``` 48 49 - [ ] **Step 3: Create empty diff.rs so the project compiles** 50 51 Create `src/diff.rs`: 52 53 ```rust 54 // Diff engine for comparing two Excel sheets. 55 ``` 56 57 - [ ] **Step 4: Verify project compiles** 58 59 Run: `cargo check` 60 Expected: success (no errors) 61 62 - [ ] **Step 5: Commit** 63 64 ```bash 65 git add Cargo.toml src/lib.rs src/diff.rs 66 git commit -m "chore: add xldiff binary entry, serde deps, empty diff module" 67 ``` 68 69 --- 70 71 ### Task 2: Make formatter helpers public 72 73 **Files:** 74 - Modify: `src/formatter.rs` 75 76 The markdown output formatter in xldiff needs `render_table`, `compute_col_widths`, `render_table_header`, `render_table_rows`, and `format_any_value` from `formatter.rs`. These are currently private. 77 78 - [ ] **Step 1: Make the five helper functions public** 79 80 In `src/formatter.rs`, change: 81 82 - `fn compute_col_widths(` → `pub fn compute_col_widths(` 83 - `fn render_table_header(` → `pub fn render_table_header(` 84 - `fn render_table_rows(` → `pub fn render_table_rows(` 85 - `fn render_table(` → `pub fn render_table(` 86 - `fn format_any_value(` → `pub fn format_any_value(` 87 88 - [ ] **Step 2: Verify project compiles** 89 90 Run: `cargo check` 91 Expected: success 92 93 - [ ] **Step 3: Commit** 94 95 ```bash 96 git add src/formatter.rs 97 git commit -m "refactor: make formatter table helpers public for xldiff reuse" 98 ``` 99 100 --- 101 102 ### Task 3: Make `col_letter_to_index` and `resolve_column` available from filter module 103 104 **Files:** 105 - Modify: `src/filter.rs` 106 107 `col_letter_to_index` is currently private in `filter.rs`. The xldiff binary needs `resolve_column` (already public) and `resolve_columns` (already public), which internally use `col_letter_to_index`. Since `resolve_column` is already public, no changes are needed here — xldiff will call `xlcat::filter::resolve_column()` directly. 108 109 **This task is a no-op** — confirming that the existing public API is sufficient. Skip to Task 4. 110 111 --- 112 113 ### Task 4: diff.rs — data structures 114 115 **Files:** 116 - Modify: `src/diff.rs` 117 118 - [ ] **Step 1: Write the data structures** 119 120 ```rust 121 use polars::prelude::*; 122 123 /// Source file and sheet metadata for display. 124 #[derive(Debug, Clone)] 125 pub struct SheetSource { 126 pub file_name: String, 127 pub sheet_name: String, 128 } 129 130 /// A single row from an added or removed set. 131 #[derive(Debug, Clone)] 132 pub struct DiffRow { 133 pub values: Vec<String>, 134 } 135 136 /// A change in a single cell. 137 #[derive(Debug, Clone)] 138 pub struct CellChange { 139 pub column: String, 140 pub old_value: String, 141 pub new_value: String, 142 } 143 144 /// A row present in both files with cell-level differences. 145 #[derive(Debug, Clone)] 146 pub struct ModifiedRow { 147 pub key: Vec<String>, 148 pub changes: Vec<CellChange>, 149 } 150 151 /// Result of comparing two sheets. 152 #[derive(Debug, Clone)] 153 pub struct DiffResult { 154 pub headers: Vec<String>, 155 pub key_columns: Vec<String>, 156 pub added: Vec<DiffRow>, 157 pub removed: Vec<DiffRow>, 158 pub modified: Vec<ModifiedRow>, 159 pub source_a: SheetSource, 160 pub source_b: SheetSource, 161 } 162 163 impl DiffResult { 164 pub fn has_differences(&self) -> bool { 165 !self.added.is_empty() || !self.removed.is_empty() || !self.modified.is_empty() 166 } 167 } 168 169 /// Options controlling how the diff is performed. 170 #[derive(Debug, Clone, Default)] 171 pub struct DiffOptions { 172 pub key_columns: Vec<String>, 173 pub tolerance: Option<f64>, 174 } 175 ``` 176 177 - [ ] **Step 2: Verify project compiles** 178 179 Run: `cargo check` 180 Expected: success 181 182 - [ ] **Step 3: Commit** 183 184 ```bash 185 git add src/diff.rs 186 git commit -m "feat(xldiff): add diff data structures — DiffResult, DiffRow, ModifiedRow, CellChange" 187 ``` 188 189 --- 190 191 ### Task 5: diff.rs — positional mode 192 193 **Files:** 194 - Modify: `src/diff.rs` 195 - Create: `tests/test_xldiff.rs` (unit tests only at first) 196 197 - [ ] **Step 1: Write the failing test for positional diff** 198 199 Add at the bottom of `src/diff.rs`: 200 201 ```rust 202 #[cfg(test)] 203 mod tests { 204 use super::*; 205 206 fn make_source(name: &str) -> SheetSource { 207 SheetSource { 208 file_name: format!("{name}.xlsx"), 209 sheet_name: "Sheet1".to_string(), 210 } 211 } 212 213 #[test] 214 fn test_positional_no_diff() { 215 let s1 = Series::new("name".into(), &["Alice", "Bob"]); 216 let s2 = Series::new("score".into(), &[90i64, 80]); 217 let df_a = DataFrame::new(vec![s1.clone().into_column(), s2.clone().into_column()]).unwrap(); 218 let df_b = DataFrame::new(vec![s1.into_column(), s2.into_column()]).unwrap(); 219 220 let opts = DiffOptions::default(); 221 let result = diff_positional(&df_a, &df_b, &opts, make_source("a"), make_source("b")).unwrap(); 222 assert!(!result.has_differences()); 223 assert!(result.added.is_empty()); 224 assert!(result.removed.is_empty()); 225 } 226 227 #[test] 228 fn test_positional_added_removed() { 229 let df_a = DataFrame::new(vec![ 230 Series::new("name".into(), &["Alice", "Bob"]).into_column(), 231 ]).unwrap(); 232 let df_b = DataFrame::new(vec![ 233 Series::new("name".into(), &["Alice", "Charlie"]).into_column(), 234 ]).unwrap(); 235 236 let opts = DiffOptions::default(); 237 let result = diff_positional(&df_a, &df_b, &opts, make_source("a"), make_source("b")).unwrap(); 238 assert_eq!(result.removed.len(), 1); 239 assert_eq!(result.removed[0].values, vec!["Bob"]); 240 assert_eq!(result.added.len(), 1); 241 assert_eq!(result.added[0].values, vec!["Charlie"]); 242 assert!(result.modified.is_empty()); 243 } 244 245 #[test] 246 fn test_positional_duplicate_rows() { 247 let df_a = DataFrame::new(vec![ 248 Series::new("x".into(), &["A", "A", "A"]).into_column(), 249 ]).unwrap(); 250 let df_b = DataFrame::new(vec![ 251 Series::new("x".into(), &["A", "A"]).into_column(), 252 ]).unwrap(); 253 254 let opts = DiffOptions::default(); 255 let result = diff_positional(&df_a, &df_b, &opts, make_source("a"), make_source("b")).unwrap(); 256 assert_eq!(result.removed.len(), 1); 257 assert!(result.added.is_empty()); 258 } 259 } 260 ``` 261 262 - [ ] **Step 2: Run tests to verify they fail** 263 264 Run: `cargo test --lib diff::tests` 265 Expected: FAIL — `diff_positional` not found 266 267 - [ ] **Step 3: Implement positional diff** 268 269 Add to `src/diff.rs` before the tests module: 270 271 ```rust 272 use crate::formatter; 273 use std::collections::HashMap; 274 275 /// Format a cell value for hashing/display, using a null sentinel to distinguish null from empty. 276 fn cell_to_string(col: &Column, idx: usize) -> String { 277 match col.get(idx) { 278 Ok(AnyValue::Null) | Err(_) => String::new(), 279 Ok(v) => formatter::format_any_value(&v), 280 } 281 } 282 283 fn cell_to_key_part(col: &Column, idx: usize) -> String { 284 match col.get(idx) { 285 Ok(AnyValue::Null) | Err(_) => "\x01NULL\x01".to_string(), 286 Ok(v) => formatter::format_any_value(&v), 287 } 288 } 289 290 /// Convert a DataFrame row to a string key by joining cell values with null bytes. 291 fn row_to_key(df: &DataFrame, row_idx: usize) -> String { 292 let cols = df.get_columns(); 293 let parts: Vec<String> = cols 294 .iter() 295 .map(|c| cell_to_key_part(c, row_idx)) 296 .collect(); 297 parts.join("\0") 298 } 299 300 /// Convert a DataFrame row to a Vec<String> of display values. 301 fn row_to_strings(df: &DataFrame, row_idx: usize) -> Vec<String> { 302 df.get_columns() 303 .iter() 304 .map(|c| cell_to_string(c, row_idx)) 305 .collect() 306 } 307 308 /// Positional diff: treats entire row as identity using multiset comparison. 309 pub fn diff_positional( 310 df_a: &DataFrame, 311 df_b: &DataFrame, 312 _opts: &DiffOptions, 313 source_a: SheetSource, 314 source_b: SheetSource, 315 ) -> anyhow::Result<DiffResult> { 316 let headers_a: Vec<String> = df_a.get_column_names().iter().map(|s| s.to_string()).collect(); 317 let headers_b: Vec<String> = df_b.get_column_names().iter().map(|s| s.to_string()).collect(); 318 319 // Warn about mismatched column counts in positional mode 320 if headers_a.len() != headers_b.len() { 321 eprintln!( 322 "warning: column count differs ({} vs {}), padding shorter side with empty values", 323 headers_a.len(), headers_b.len() 324 ); 325 } 326 327 // Use headers from the file with more columns 328 let headers = if headers_b.len() > headers_a.len() { headers_b } else { headers_a }; 329 330 // Build frequency maps 331 let mut counts_a: HashMap<String, Vec<usize>> = HashMap::new(); 332 for i in 0..df_a.height() { 333 counts_a.entry(row_to_key(df_a, i)).or_default().push(i); 334 } 335 let mut counts_b: HashMap<String, Vec<usize>> = HashMap::new(); 336 for i in 0..df_b.height() { 337 counts_b.entry(row_to_key(df_b, i)).or_default().push(i); 338 } 339 340 let mut removed = Vec::new(); 341 let mut consumed_b: HashMap<String, usize> = HashMap::new(); 342 343 // Walk A: try to match with B 344 for i in 0..df_a.height() { 345 let key = row_to_key(df_a, i); 346 let b_count = counts_b.get(&key).map(|v| v.len()).unwrap_or(0); 347 let already_consumed = consumed_b.get(&key).copied().unwrap_or(0); 348 if already_consumed < b_count { 349 *consumed_b.entry(key).or_insert(0) += 1; 350 } else { 351 let mut vals = row_to_strings(df_a, i); 352 vals.resize(headers.len(), String::new()); // pad if needed 353 removed.push(DiffRow { values: vals }); 354 } 355 } 356 357 // Walk B: anything not consumed is added 358 let mut consumed_a: HashMap<String, usize> = HashMap::new(); 359 let mut added = Vec::new(); 360 for i in 0..df_b.height() { 361 let key = row_to_key(df_b, i); 362 let a_count = counts_a.get(&key).map(|v| v.len()).unwrap_or(0); 363 let already_consumed = consumed_a.get(&key).copied().unwrap_or(0); 364 if already_consumed < a_count { 365 *consumed_a.entry(key).or_insert(0) += 1; 366 } else { 367 let mut vals = row_to_strings(df_b, i); 368 vals.resize(headers.len(), String::new()); // pad if needed 369 added.push(DiffRow { values: vals }); 370 } 371 } 372 373 Ok(DiffResult { 374 headers, 375 key_columns: vec![], 376 added, 377 removed, 378 modified: vec![], 379 source_a, 380 source_b, 381 }) 382 } 383 ``` 384 385 - [ ] **Step 4: Run tests to verify they pass** 386 387 Run: `cargo test --lib diff::tests` 388 Expected: all 3 tests PASS 389 390 - [ ] **Step 5: Commit** 391 392 ```bash 393 git add src/diff.rs 394 git commit -m "feat(xldiff): implement positional diff with multiset comparison" 395 ``` 396 397 --- 398 399 ### Task 6: diff.rs — key-based mode 400 401 **Files:** 402 - Modify: `src/diff.rs` 403 404 - [ ] **Step 1: Write failing tests for key-based diff** 405 406 Add to the `tests` module in `src/diff.rs`: 407 408 ```rust 409 #[test] 410 fn test_keyed_no_diff() { 411 let df_a = DataFrame::new(vec![ 412 Series::new("id".into(), &["1", "2"]).into_column(), 413 Series::new("val".into(), &[10i64, 20]).into_column(), 414 ]).unwrap(); 415 let df_b = df_a.clone(); 416 417 let opts = DiffOptions { key_columns: vec!["id".to_string()], ..Default::default() }; 418 let result = diff_keyed(&df_a, &df_b, &opts, make_source("a"), make_source("b")).unwrap(); 419 assert!(!result.has_differences()); 420 } 421 422 #[test] 423 fn test_keyed_added_removed() { 424 let df_a = DataFrame::new(vec![ 425 Series::new("id".into(), &["1", "2"]).into_column(), 426 Series::new("val".into(), &[10i64, 20]).into_column(), 427 ]).unwrap(); 428 let df_b = DataFrame::new(vec![ 429 Series::new("id".into(), &["2", "3"]).into_column(), 430 Series::new("val".into(), &[20i64, 30]).into_column(), 431 ]).unwrap(); 432 433 let opts = DiffOptions { key_columns: vec!["id".to_string()], ..Default::default() }; 434 let result = diff_keyed(&df_a, &df_b, &opts, make_source("a"), make_source("b")).unwrap(); 435 assert_eq!(result.removed.len(), 1); 436 assert_eq!(result.added.len(), 1); 437 assert_eq!(result.removed[0].values, vec!["1", "10"]); 438 assert_eq!(result.added[0].values, vec!["3", "30"]); 439 } 440 441 #[test] 442 fn test_keyed_modified() { 443 let df_a = DataFrame::new(vec![ 444 Series::new("id".into(), &["1", "2"]).into_column(), 445 Series::new("score".into(), &[90i64, 80]).into_column(), 446 ]).unwrap(); 447 let df_b = DataFrame::new(vec![ 448 Series::new("id".into(), &["1", "2"]).into_column(), 449 Series::new("score".into(), &[95i64, 80]).into_column(), 450 ]).unwrap(); 451 452 let opts = DiffOptions { key_columns: vec!["id".to_string()], ..Default::default() }; 453 let result = diff_keyed(&df_a, &df_b, &opts, make_source("a"), make_source("b")).unwrap(); 454 assert!(result.added.is_empty()); 455 assert!(result.removed.is_empty()); 456 assert_eq!(result.modified.len(), 1); 457 assert_eq!(result.modified[0].key, vec!["1"]); 458 assert_eq!(result.modified[0].changes.len(), 1); 459 assert_eq!(result.modified[0].changes[0].column, "score"); 460 assert_eq!(result.modified[0].changes[0].old_value, "90"); 461 assert_eq!(result.modified[0].changes[0].new_value, "95"); 462 } 463 464 #[test] 465 fn test_keyed_composite_key() { 466 let df_a = DataFrame::new(vec![ 467 Series::new("date".into(), &["2024-01", "2024-01"]).into_column(), 468 Series::new("ticker".into(), &["AAPL", "GOOG"]).into_column(), 469 Series::new("price".into(), &[150i64, 100]).into_column(), 470 ]).unwrap(); 471 let df_b = DataFrame::new(vec![ 472 Series::new("date".into(), &["2024-01", "2024-01"]).into_column(), 473 Series::new("ticker".into(), &["AAPL", "GOOG"]).into_column(), 474 Series::new("price".into(), &[155i64, 100]).into_column(), 475 ]).unwrap(); 476 477 let opts = DiffOptions { 478 key_columns: vec!["date".to_string(), "ticker".to_string()], 479 ..Default::default() 480 }; 481 let result = diff_keyed(&df_a, &df_b, &opts, make_source("a"), make_source("b")).unwrap(); 482 assert_eq!(result.modified.len(), 1); 483 assert_eq!(result.modified[0].key, vec!["2024-01", "AAPL"]); 484 } 485 ``` 486 487 - [ ] **Step 2: Run tests to verify they fail** 488 489 Run: `cargo test --lib diff::tests` 490 Expected: FAIL — `diff_keyed` not found 491 492 - [ ] **Step 3: Implement key-based diff** 493 494 Add to `src/diff.rs`: 495 496 ```rust 497 /// Row data stored during key-based comparison. 498 struct KeyedRow { 499 values: Vec<String>, 500 key_values: Vec<String>, 501 } 502 503 /// Key-based diff: match rows by key columns, compare remaining cells. 504 pub fn diff_keyed( 505 df_a: &DataFrame, 506 df_b: &DataFrame, 507 opts: &DiffOptions, 508 source_a: SheetSource, 509 source_b: SheetSource, 510 ) -> anyhow::Result<DiffResult> { 511 let headers_a: Vec<String> = df_a.get_column_names().iter().map(|s| s.to_string()).collect(); 512 let headers_b: Vec<String> = df_b.get_column_names().iter().map(|s| s.to_string()).collect(); 513 514 // Resolve key column indices for both frames 515 let key_indices_a: Vec<usize> = opts.key_columns.iter() 516 .map(|k| headers_a.iter().position(|h| h == k) 517 .ok_or_else(|| anyhow::anyhow!("key column '{}' not found in first file", k))) 518 .collect::<anyhow::Result<Vec<_>>>()?; 519 520 let key_indices_b: Vec<usize> = opts.key_columns.iter() 521 .map(|k| headers_b.iter().position(|h| h == k) 522 .ok_or_else(|| anyhow::anyhow!("key column '{}' not found in second file", k))) 523 .collect::<anyhow::Result<Vec<_>>>()?; 524 525 // Find common non-key columns for comparison 526 let non_key_a: Vec<String> = headers_a.iter() 527 .filter(|h| !opts.key_columns.contains(h)) 528 .cloned().collect(); 529 let non_key_b: Vec<String> = headers_b.iter() 530 .filter(|h| !opts.key_columns.contains(h)) 531 .cloned().collect(); 532 let common_non_key: Vec<String> = non_key_a.iter() 533 .filter(|h| non_key_b.contains(h)) 534 .cloned().collect(); 535 536 // Warn about columns only in one file 537 for col in &non_key_a { 538 if !non_key_b.contains(col) { 539 eprintln!("warning: column '{}' only in first file, skipping", col); 540 } 541 } 542 for col in &non_key_b { 543 if !non_key_a.contains(col) { 544 eprintln!("warning: column '{}' only in second file, skipping", col); 545 } 546 } 547 548 // Headers = key columns + union of non-key columns (common first, then only-in-A, only-in-B) 549 let mut headers = opts.key_columns.clone(); 550 headers.extend(non_key_a.iter().chain( 551 non_key_b.iter().filter(|h| !non_key_a.contains(h)) 552 ).cloned()); 553 554 // Build row maps 555 let map_a = build_key_map(df_a, &key_indices_a); 556 let map_b = build_key_map(df_b, &key_indices_b); 557 558 // Detect duplicate keys 559 check_duplicate_keys(df_a, &key_indices_a, &source_a); 560 check_duplicate_keys(df_b, &key_indices_b, &source_b); 561 562 let mut added = Vec::new(); 563 let mut removed = Vec::new(); 564 let mut modified = Vec::new(); 565 566 // Removed: keys in A but not B 567 for (key_str, row_a) in &map_a { 568 if !map_b.contains_key(key_str) { 569 removed.push(DiffRow { values: row_a.values.clone() }); 570 } 571 } 572 573 // Added: keys in B but not A; Modified: keys in both 574 for (key_str, row_b) in &map_b { 575 match map_a.get(key_str) { 576 None => { 577 added.push(DiffRow { values: row_b.values.clone() }); 578 } 579 Some(row_a) => { 580 let changes = compare_rows( 581 df_a, df_b, 582 &headers_a, &headers_b, 583 row_a, row_b, 584 &common_non_key, 585 opts, 586 ); 587 if !changes.is_empty() { 588 modified.push(ModifiedRow { 589 key: row_a.key_values.clone(), 590 changes, 591 }); 592 } 593 } 594 } 595 } 596 597 Ok(DiffResult { 598 headers, 599 key_columns: opts.key_columns.clone(), 600 added, 601 removed, 602 modified, 603 source_a, 604 source_b, 605 }) 606 } 607 608 fn build_key_map(df: &DataFrame, key_indices: &[usize]) -> HashMap<String, KeyedRow> { 609 let cols = df.get_columns(); 610 let mut map = HashMap::new(); 611 for i in 0..df.height() { 612 let key_values: Vec<String> = key_indices.iter() 613 .map(|&idx| formatter::format_any_value(&cols[idx].get(i).unwrap_or(AnyValue::Null))) 614 .collect(); 615 let key_str = key_values.join("\0"); 616 let values: Vec<String> = cols.iter() 617 .map(|c| formatter::format_any_value(&c.get(i).unwrap_or(AnyValue::Null))) 618 .collect(); 619 map.insert(key_str, KeyedRow { values, key_values }); 620 } 621 map 622 } 623 624 fn check_duplicate_keys(df: &DataFrame, key_indices: &[usize], source: &SheetSource) { 625 let cols = df.get_columns(); 626 let mut seen: HashMap<String, usize> = HashMap::new(); 627 for i in 0..df.height() { 628 let key: Vec<String> = key_indices.iter() 629 .map(|&idx| formatter::format_any_value(&cols[idx].get(i).unwrap_or(AnyValue::Null))) 630 .collect(); 631 let key_str = key.join(", "); 632 let count = seen.entry(key_str.clone()).or_insert(0); 633 *count += 1; 634 if *count == 2 { 635 eprintln!("warning: duplicate key \"{}\" in {}", key_str, source.file_name); 636 } 637 } 638 } 639 640 /// Compare non-key columns of two matched rows. 641 fn compare_rows( 642 df_a: &DataFrame, 643 df_b: &DataFrame, 644 headers_a: &[String], 645 headers_b: &[String], 646 row_a: &KeyedRow, 647 row_b: &KeyedRow, 648 common_columns: &[String], 649 opts: &DiffOptions, 650 ) -> Vec<CellChange> { 651 let mut changes = Vec::new(); 652 for col_name in common_columns { 653 let idx_a = headers_a.iter().position(|h| h == col_name).unwrap(); 654 let idx_b = headers_b.iter().position(|h| h == col_name).unwrap(); 655 let val_a = &row_a.values[idx_a]; 656 let val_b = &row_b.values[idx_b]; 657 658 let is_equal = if let Some(tol) = opts.tolerance { 659 values_equal_with_tolerance(val_a, val_b, tol, df_a, df_b, col_name) 660 } else { 661 val_a == val_b 662 }; 663 664 if !is_equal { 665 changes.push(CellChange { 666 column: col_name.clone(), 667 old_value: val_a.clone(), 668 new_value: val_b.clone(), 669 }); 670 } 671 } 672 changes 673 } 674 675 fn is_float_dtype(dt: &DataType) -> bool { 676 matches!(dt, DataType::Float32 | DataType::Float64) 677 } 678 679 fn is_int_dtype(dt: &DataType) -> bool { 680 matches!(dt, DataType::Int8 | DataType::Int16 | DataType::Int32 | DataType::Int64 681 | DataType::UInt8 | DataType::UInt16 | DataType::UInt32 | DataType::UInt64) 682 } 683 684 /// Check if two values are equal, applying tolerance for float columns. 685 /// Tolerance applies to Float64 columns and Int64-vs-Float64 cross-type, 686 /// but NOT to pure integer columns (spec requirement). 687 fn values_equal_with_tolerance( 688 val_a: &str, 689 val_b: &str, 690 tolerance: f64, 691 df_a: &DataFrame, 692 df_b: &DataFrame, 693 col_name: &str, 694 ) -> bool { 695 if let (Ok(a), Ok(b)) = (val_a.parse::<f64>(), val_b.parse::<f64>()) { 696 // Handle NaN 697 if a.is_nan() && b.is_nan() { 698 return true; 699 } 700 if a.is_nan() || b.is_nan() { 701 return false; 702 } 703 let type_a = df_a.column(col_name).ok().map(|c| c.dtype().clone()); 704 let type_b = df_b.column(col_name).ok().map(|c| c.dtype().clone()); 705 706 let a_is_float = type_a.as_ref().map_or(false, is_float_dtype); 707 let b_is_float = type_b.as_ref().map_or(false, is_float_dtype); 708 let a_is_int = type_a.as_ref().map_or(false, is_int_dtype); 709 let b_is_int = type_b.as_ref().map_or(false, is_int_dtype); 710 711 // Pure integer columns: exact comparison, no tolerance 712 if a_is_int && b_is_int { 713 return val_a == val_b; 714 } 715 // At least one float, or cross-type int/float: apply tolerance 716 if a_is_float || b_is_float { 717 return (a - b).abs() <= tolerance; 718 } 719 } 720 // Fallback: exact string comparison 721 val_a == val_b 722 } 723 ``` 724 725 - [ ] **Step 4: Run tests to verify they pass** 726 727 Run: `cargo test --lib diff::tests` 728 Expected: all 7 tests PASS 729 730 - [ ] **Step 5: Commit** 731 732 ```bash 733 git add src/diff.rs 734 git commit -m "feat(xldiff): implement key-based diff with composite keys and duplicate detection" 735 ``` 736 737 --- 738 739 ### Task 7: diff.rs — tolerance tests 740 741 **Files:** 742 - Modify: `src/diff.rs` 743 744 - [ ] **Step 1: Write tolerance tests** 745 746 Add to the `tests` module: 747 748 ```rust 749 #[test] 750 fn test_keyed_tolerance_within() { 751 let df_a = DataFrame::new(vec![ 752 Series::new("id".into(), &["1"]).into_column(), 753 Series::new("price".into(), &[100.001f64]).into_column(), 754 ]).unwrap(); 755 let df_b = DataFrame::new(vec![ 756 Series::new("id".into(), &["1"]).into_column(), 757 Series::new("price".into(), &[100.002f64]).into_column(), 758 ]).unwrap(); 759 760 let opts = DiffOptions { 761 key_columns: vec!["id".to_string()], 762 tolerance: Some(0.01), 763 }; 764 let result = diff_keyed(&df_a, &df_b, &opts, make_source("a"), make_source("b")).unwrap(); 765 assert!(!result.has_differences(), "values within tolerance should be equal"); 766 } 767 768 #[test] 769 fn test_keyed_tolerance_exceeded() { 770 let df_a = DataFrame::new(vec![ 771 Series::new("id".into(), &["1"]).into_column(), 772 Series::new("price".into(), &[100.0f64]).into_column(), 773 ]).unwrap(); 774 let df_b = DataFrame::new(vec![ 775 Series::new("id".into(), &["1"]).into_column(), 776 Series::new("price".into(), &[100.05f64]).into_column(), 777 ]).unwrap(); 778 779 let opts = DiffOptions { 780 key_columns: vec!["id".to_string()], 781 tolerance: Some(0.01), 782 }; 783 let result = diff_keyed(&df_a, &df_b, &opts, make_source("a"), make_source("b")).unwrap(); 784 assert_eq!(result.modified.len(), 1); 785 } 786 787 #[test] 788 fn test_keyed_nan_handling() { 789 let df_a = DataFrame::new(vec![ 790 Series::new("id".into(), &["1"]).into_column(), 791 Series::new("val".into(), &[f64::NAN]).into_column(), 792 ]).unwrap(); 793 let df_b = DataFrame::new(vec![ 794 Series::new("id".into(), &["1"]).into_column(), 795 Series::new("val".into(), &[f64::NAN]).into_column(), 796 ]).unwrap(); 797 798 let opts = DiffOptions { 799 key_columns: vec!["id".to_string()], 800 tolerance: Some(0.01), 801 }; 802 let result = diff_keyed(&df_a, &df_b, &opts, make_source("a"), make_source("b")).unwrap(); 803 assert!(!result.has_differences(), "NaN == NaN should be equal"); 804 } 805 ``` 806 807 - [ ] **Step 2: Run tests** 808 809 Run: `cargo test --lib diff::tests` 810 Expected: all 10 tests PASS 811 812 - [ ] **Step 3: Commit** 813 814 ```bash 815 git add src/diff.rs 816 git commit -m "test(xldiff): add tolerance and NaN handling tests" 817 ``` 818 819 --- 820 821 ### Task 8: diff.rs — public entry point `diff_sheets` 822 823 **Files:** 824 - Modify: `src/diff.rs` 825 826 - [ ] **Step 1: Write test for the unified entry point** 827 828 Add to tests: 829 830 ```rust 831 #[test] 832 fn test_diff_sheets_positional() { 833 let df_a = DataFrame::new(vec![ 834 Series::new("x".into(), &["a", "b"]).into_column(), 835 ]).unwrap(); 836 let df_b = DataFrame::new(vec![ 837 Series::new("x".into(), &["a", "c"]).into_column(), 838 ]).unwrap(); 839 840 let opts = DiffOptions::default(); // no key → positional 841 let result = diff_sheets(&df_a, &df_b, &opts, make_source("a"), make_source("b")).unwrap(); 842 assert_eq!(result.removed.len(), 1); 843 assert_eq!(result.added.len(), 1); 844 } 845 846 #[test] 847 fn test_diff_sheets_keyed() { 848 let df_a = DataFrame::new(vec![ 849 Series::new("id".into(), &["1"]).into_column(), 850 Series::new("v".into(), &[10i64]).into_column(), 851 ]).unwrap(); 852 let df_b = DataFrame::new(vec![ 853 Series::new("id".into(), &["1"]).into_column(), 854 Series::new("v".into(), &[20i64]).into_column(), 855 ]).unwrap(); 856 857 let opts = DiffOptions { key_columns: vec!["id".to_string()], ..Default::default() }; 858 let result = diff_sheets(&df_a, &df_b, &opts, make_source("a"), make_source("b")).unwrap(); 859 assert_eq!(result.modified.len(), 1); 860 } 861 ``` 862 863 - [ ] **Step 2: Run tests to verify they fail** 864 865 Run: `cargo test --lib diff::tests::test_diff_sheets` 866 Expected: FAIL 867 868 - [ ] **Step 3: Implement `diff_sheets`** 869 870 Add to `src/diff.rs`: 871 872 ```rust 873 /// Compare two DataFrames. Dispatches to positional or key-based mode. 874 pub fn diff_sheets( 875 df_a: &DataFrame, 876 df_b: &DataFrame, 877 opts: &DiffOptions, 878 source_a: SheetSource, 879 source_b: SheetSource, 880 ) -> anyhow::Result<DiffResult> { 881 if opts.key_columns.is_empty() { 882 diff_positional(df_a, df_b, opts, source_a, source_b) 883 } else { 884 diff_keyed(df_a, df_b, opts, source_a, source_b) 885 } 886 } 887 ``` 888 889 - [ ] **Step 4: Run tests** 890 891 Run: `cargo test --lib diff::tests` 892 Expected: all 12 tests PASS 893 894 - [ ] **Step 5: Commit** 895 896 ```bash 897 git add src/diff.rs 898 git commit -m "feat(xldiff): add diff_sheets entry point dispatching positional/keyed modes" 899 ``` 900 901 --- 902 903 ### Task 9: xldiff.rs — CLI skeleton + file:sheet parsing 904 905 **Files:** 906 - Create: `src/bin/xldiff.rs` 907 908 - [ ] **Step 1: Write the CLI binary with clap args and file:sheet parsing** 909 910 Create `src/bin/xldiff.rs`: 911 912 ```rust 913 use std::io::IsTerminal; 914 use std::path::PathBuf; 915 use std::process; 916 917 use anyhow::Result; 918 use clap::Parser; 919 920 use xlcat::diff::{self, DiffOptions, DiffResult, SheetSource}; 921 use xlcat::filter; 922 use xlcat::metadata; 923 use xlcat::reader; 924 925 #[derive(Parser)] 926 #[command( 927 name = "xldiff", 928 about = "Compare two Excel sheets and report differences", 929 version 930 )] 931 struct Args { 932 /// First file, optionally with :sheet suffix (e.g., old.xlsx:Sales) 933 file_a: String, 934 935 /// Second file, optionally with :sheet suffix 936 file_b: String, 937 938 /// Comma-separated key columns for row matching (name or column letter). 939 /// Omit for positional (whole-row) comparison. 940 #[arg(long)] 941 key: Option<String>, 942 943 /// Comma-separated columns to compare (name or column letter). 944 /// Default: all columns. Key columns are always included. 945 #[arg(long)] 946 cols: Option<String>, 947 948 /// Rows to skip before header. Single value for both files, or N,M for independent skips. 949 #[arg(long, default_value = "0")] 950 skip: String, 951 952 /// Treat first row as data; generate synthetic column_0, column_1, … headers. 953 /// NOTE: declared for CLI compatibility but not yet implemented — follow-up task. 954 #[arg(long)] 955 no_header: bool, 956 957 /// Numeric tolerance for float comparisons (e.g., 0.01) 958 #[arg(long)] 959 tolerance: Option<f64>, 960 961 /// Output format: text (default), markdown, json, csv 962 #[arg(long, default_value = "text")] 963 format: String, 964 965 /// Force-disable ANSI colors 966 #[arg(long)] 967 no_color: bool, 968 } 969 970 /// Parse a file argument like "path.xlsx:Sheet1" into (path, Option<sheet>). 971 /// Uses rfind(':') to handle Windows drive letters (C:\...). 972 fn parse_file_arg(arg: &str) -> (PathBuf, Option<String>) { 973 if let Some(colon_pos) = arg.rfind(':') { 974 // Don't split if the colon is part of a Windows drive letter (e.g., C:\) 975 // A drive letter colon is at position 1 and followed by \ or / 976 if colon_pos == 1 && arg.len() > 2 && (arg.as_bytes()[2] == b'\\' || arg.as_bytes()[2] == b'/') { 977 return (PathBuf::from(arg), None); 978 } 979 // Don't split if nothing after the colon 980 if colon_pos == arg.len() - 1 { 981 return (PathBuf::from(&arg[..colon_pos]), None); 982 } 983 let path = PathBuf::from(&arg[..colon_pos]); 984 let sheet = arg[colon_pos + 1..].to_string(); 985 (path, Some(sheet)) 986 } else { 987 (PathBuf::from(arg), None) 988 } 989 } 990 991 /// Parse the --skip flag: "3" or "3,5" 992 fn parse_skip(s: &str) -> Result<(usize, usize)> { 993 if let Some((a, b)) = s.split_once(',') { 994 let skip_a = a.trim().parse::<usize>().map_err(|_| anyhow::anyhow!("invalid skip value: {}", a))?; 995 let skip_b = b.trim().parse::<usize>().map_err(|_| anyhow::anyhow!("invalid skip value: {}", b))?; 996 Ok((skip_a, skip_b)) 997 } else { 998 let skip = s.trim().parse::<usize>().map_err(|_| anyhow::anyhow!("invalid skip value: {}", s))?; 999 Ok((skip, skip)) 1000 } 1001 } 1002 1003 /// Resolve sheet name from user input or default to first sheet. 1004 fn resolve_sheet(info: &metadata::FileInfo, sheet_arg: Option<&str>) -> Result<String> { 1005 match sheet_arg { 1006 None => info.sheets.first() 1007 .map(|s| s.name.clone()) 1008 .ok_or_else(|| anyhow::anyhow!("workbook has no sheets")), 1009 Some(s) => { 1010 // Try exact name match 1011 if let Some(sheet) = info.sheets.iter().find(|si| si.name == s) { 1012 return Ok(sheet.name.clone()); 1013 } 1014 // Try as 0-based index 1015 if let Ok(idx) = s.parse::<usize>() { 1016 if let Some(sheet) = info.sheets.get(idx) { 1017 return Ok(sheet.name.clone()); 1018 } 1019 } 1020 let names: Vec<_> = info.sheets.iter().map(|s| s.name.as_str()).collect(); 1021 anyhow::bail!("sheet '{}' not found. Available: {}", s, names.join(", ")) 1022 } 1023 } 1024 } 1025 1026 fn run(args: Args) -> Result<()> { 1027 let (path_a, sheet_sel_a) = parse_file_arg(&args.file_a); 1028 let (path_b, sheet_sel_b) = parse_file_arg(&args.file_b); 1029 1030 // Validate files exist 1031 if !path_a.exists() { 1032 anyhow::bail!("file not found: {}", path_a.display()); 1033 } 1034 if !path_b.exists() { 1035 anyhow::bail!("file not found: {}", path_b.display()); 1036 } 1037 1038 // Validate format 1039 let format = match args.format.as_str() { 1040 "text" | "markdown" | "json" | "csv" => args.format.as_str(), 1041 other => anyhow::bail!("unknown format '{}'. Use: text, markdown, json, csv", other), 1042 }; 1043 1044 // Parse skip 1045 let (skip_a, skip_b) = parse_skip(&args.skip)?; 1046 1047 // Resolve sheets 1048 let info_a = metadata::read_file_info(&path_a)?; 1049 let info_b = metadata::read_file_info(&path_b)?; 1050 let sheet_a = resolve_sheet(&info_a, sheet_sel_a.as_deref())?; 1051 let sheet_b = resolve_sheet(&info_b, sheet_sel_b.as_deref())?; 1052 1053 // Read DataFrames 1054 let df_a = if skip_a > 0 { 1055 reader::read_sheet_with_skip(&path_a, &sheet_a, skip_a)? 1056 } else { 1057 reader::read_sheet(&path_a, &sheet_a)? 1058 }; 1059 let df_b = if skip_b > 0 { 1060 reader::read_sheet_with_skip(&path_b, &sheet_b, skip_b)? 1061 } else { 1062 reader::read_sheet(&path_b, &sheet_b)? 1063 }; 1064 1065 // Resolve key columns (validate they exist in both files) 1066 let key_columns: Vec<String> = if let Some(ref key_str) = args.key { 1067 let specs: Vec<String> = key_str.split(',').map(|s| s.trim().to_string()).collect(); 1068 let cols_a: Vec<String> = df_a.get_column_names().iter().map(|s| s.to_string()).collect(); 1069 let cols_b: Vec<String> = df_b.get_column_names().iter().map(|s| s.to_string()).collect(); 1070 let resolved = filter::resolve_columns(&specs, &cols_a) 1071 .map_err(|e| anyhow::anyhow!("first file: {}", e))?; 1072 // Validate key columns also exist in second file 1073 for k in &resolved { 1074 if !cols_b.iter().any(|c| c == k) { 1075 anyhow::bail!("key column '{}' not found in second file", k); 1076 } 1077 } 1078 resolved 1079 } else { 1080 vec![] 1081 }; 1082 1083 // Column filtering 1084 let (df_a, df_b) = if let Some(ref cols_str) = args.cols { 1085 let specs: Vec<String> = cols_str.split(',').map(|s| s.trim().to_string()).collect(); 1086 let cols_a: Vec<String> = df_a.get_column_names().iter().map(|s| s.to_string()).collect(); 1087 let mut selected = filter::resolve_columns(&specs, &cols_a) 1088 .map_err(|e| anyhow::anyhow!(e))?; 1089 // Ensure key columns are included 1090 for k in &key_columns { 1091 if !selected.contains(k) { 1092 selected.insert(0, k.clone()); 1093 } 1094 } 1095 let sel_refs: Vec<&str> = selected.iter().map(|s| s.as_str()).collect(); 1096 let df_a = df_a.select(&sel_refs)?; 1097 let df_b = df_b.select(&sel_refs)?; 1098 (df_a, df_b) 1099 } else { 1100 (df_a, df_b) 1101 }; 1102 1103 let source_a = SheetSource { 1104 file_name: path_a.file_name().unwrap_or_default().to_string_lossy().to_string(), 1105 sheet_name: sheet_a, 1106 }; 1107 let source_b = SheetSource { 1108 file_name: path_b.file_name().unwrap_or_default().to_string_lossy().to_string(), 1109 sheet_name: sheet_b, 1110 }; 1111 1112 let opts = DiffOptions { 1113 key_columns, 1114 tolerance: args.tolerance, 1115 }; 1116 1117 let result = diff::diff_sheets(&df_a, &df_b, &opts, source_a, source_b)?; 1118 1119 // Determine color 1120 let use_color = !args.no_color && std::io::stdout().is_terminal(); 1121 1122 // Format and print 1123 let output = match format { 1124 "text" => format_text(&result, use_color), 1125 "markdown" => format_markdown(&result), 1126 "json" => format_json(&result), 1127 "csv" => format_csv(&result), 1128 _ => unreachable!(), 1129 }; 1130 print!("{output}"); 1131 1132 // Exit code 1133 if result.has_differences() { 1134 process::exit(1); 1135 } 1136 Ok(()) 1137 } 1138 1139 fn main() { 1140 let args = Args::parse(); 1141 if let Err(err) = run(args) { 1142 eprintln!("xldiff: {err}"); 1143 process::exit(2); 1144 } 1145 } 1146 1147 // Output formatters — implemented in subsequent tasks 1148 fn format_text(_result: &DiffResult, _color: bool) -> String { todo!() } 1149 fn format_markdown(_result: &DiffResult) -> String { todo!() } 1150 fn format_json(_result: &DiffResult) -> String { todo!() } 1151 fn format_csv(_result: &DiffResult) -> String { todo!() } 1152 ``` 1153 1154 - [ ] **Step 2: Verify it compiles** 1155 1156 Run: `cargo check --bin xldiff` 1157 Expected: success (todo!() stubs compile but will panic at runtime) 1158 1159 - [ ] **Step 3: Commit** 1160 1161 ```bash 1162 git add src/bin/xldiff.rs 1163 git commit -m "feat(xldiff): add CLI skeleton with file:sheet parsing, skip, key/cols resolution" 1164 ``` 1165 1166 --- 1167 1168 ### Task 10: xldiff.rs — text output formatter 1169 1170 **Files:** 1171 - Modify: `src/bin/xldiff.rs` 1172 1173 - [ ] **Step 1: Implement `format_text`** 1174 1175 Replace the `format_text` stub: 1176 1177 ```rust 1178 fn format_text(result: &DiffResult, color: bool) -> String { 1179 if !result.has_differences() { 1180 return "No differences found.\n".to_string(); 1181 } 1182 1183 let mut out = String::new(); 1184 1185 // Header 1186 let (red, green, yellow, reset) = if color { 1187 ("\x1b[31m", "\x1b[32m", "\x1b[33m", "\x1b[0m") 1188 } else { 1189 ("", "", "", "") 1190 }; 1191 1192 out.push_str(&format!( 1193 "--- {} ({})\n+++ {} ({})\n\n", 1194 result.source_a.sheet_name, result.source_a.file_name, 1195 result.source_b.sheet_name, result.source_b.file_name, 1196 )); 1197 1198 // Summary 1199 out.push_str(&format!( 1200 "Added: {} | Removed: {} | Modified: {}\n\n", 1201 result.added.len(), 1202 result.removed.len(), 1203 result.modified.len(), 1204 )); 1205 1206 // Removed rows 1207 for row in &result.removed { 1208 out.push_str(red); 1209 out.push_str("- "); 1210 out.push_str(&format_row_inline(&result.headers, &row.values)); 1211 out.push_str(reset); 1212 out.push('\n'); 1213 } 1214 1215 // Added rows 1216 for row in &result.added { 1217 out.push_str(green); 1218 out.push_str("+ "); 1219 out.push_str(&format_row_inline(&result.headers, &row.values)); 1220 out.push_str(reset); 1221 out.push('\n'); 1222 } 1223 1224 if (!result.removed.is_empty() || !result.added.is_empty()) && !result.modified.is_empty() { 1225 out.push('\n'); 1226 } 1227 1228 // Modified rows 1229 for m in &result.modified { 1230 out.push_str(yellow); 1231 out.push_str("~ "); 1232 // Show key 1233 let key_display: Vec<String> = result.key_columns.iter() 1234 .zip(m.key.iter()) 1235 .map(|(col, val)| format!("{}: \"{}\"", col, val)) 1236 .collect(); 1237 out.push_str(&key_display.join(" ")); 1238 out.push_str(reset); 1239 out.push('\n'); 1240 for change in &m.changes { 1241 out.push_str(yellow); 1242 out.push_str(&format!(" {}: \"{}\" → \"{}\"", change.column, change.old_value, change.new_value)); 1243 out.push_str(reset); 1244 out.push('\n'); 1245 } 1246 } 1247 1248 out 1249 } 1250 1251 /// Format a row as inline key-value pairs: Name: "Alice" Score: "90" 1252 fn format_row_inline(headers: &[String], values: &[String]) -> String { 1253 headers.iter() 1254 .zip(values.iter()) 1255 .map(|(h, v)| format!("{}: \"{}\"", h, v)) 1256 .collect::<Vec<_>>() 1257 .join(" ") 1258 } 1259 ``` 1260 1261 - [ ] **Step 2: Verify it compiles** 1262 1263 Run: `cargo check --bin xldiff` 1264 Expected: success 1265 1266 - [ ] **Step 3: Commit** 1267 1268 ```bash 1269 git add src/bin/xldiff.rs 1270 git commit -m "feat(xldiff): implement colored text output formatter" 1271 ``` 1272 1273 --- 1274 1275 ### Task 11: xldiff.rs — markdown output formatter 1276 1277 **Files:** 1278 - Modify: `src/bin/xldiff.rs` 1279 1280 - [ ] **Step 1: Implement `format_markdown`** 1281 1282 Replace the `format_markdown` stub: 1283 1284 ```rust 1285 fn format_markdown(result: &DiffResult) -> String { 1286 use xlcat::formatter; 1287 1288 if !result.has_differences() { 1289 return "No differences found.\n".to_string(); 1290 } 1291 1292 let mut out = String::new(); 1293 1294 // Added 1295 if !result.added.is_empty() { 1296 out.push_str(&format!("## Added ({})\n", result.added.len())); 1297 let rows: Vec<Vec<String>> = result.added.iter().map(|r| r.values.clone()).collect(); 1298 out.push_str(&formatter::render_table(&result.headers, &rows)); 1299 out.push('\n'); 1300 } 1301 1302 // Removed 1303 if !result.removed.is_empty() { 1304 out.push_str(&format!("## Removed ({})\n", result.removed.len())); 1305 let rows: Vec<Vec<String>> = result.removed.iter().map(|r| r.values.clone()).collect(); 1306 out.push_str(&formatter::render_table(&result.headers, &rows)); 1307 out.push('\n'); 1308 } 1309 1310 // Modified 1311 if !result.modified.is_empty() { 1312 out.push_str(&format!("## Modified ({})\n", result.modified.len())); 1313 let key_label = if result.key_columns.len() == 1 { 1314 format!("Key ({})", result.key_columns[0]) 1315 } else { 1316 format!("Key ({})", result.key_columns.join(", ")) 1317 }; 1318 let mod_headers = vec![key_label, "Column".to_string(), "Old".to_string(), "New".to_string()]; 1319 let mut mod_rows = Vec::new(); 1320 for m in &result.modified { 1321 let key_str = m.key.join(", "); 1322 for (i, change) in m.changes.iter().enumerate() { 1323 let key_cell = if i == 0 { key_str.clone() } else { String::new() }; 1324 mod_rows.push(vec![ 1325 key_cell, 1326 change.column.clone(), 1327 change.old_value.clone(), 1328 change.new_value.clone(), 1329 ]); 1330 } 1331 } 1332 out.push_str(&formatter::render_table(&mod_headers, &mod_rows)); 1333 out.push('\n'); 1334 } 1335 1336 out 1337 } 1338 ``` 1339 1340 - [ ] **Step 2: Verify it compiles** 1341 1342 Run: `cargo check --bin xldiff` 1343 Expected: success 1344 1345 - [ ] **Step 3: Commit** 1346 1347 ```bash 1348 git add src/bin/xldiff.rs 1349 git commit -m "feat(xldiff): implement markdown output formatter" 1350 ``` 1351 1352 --- 1353 1354 ### Task 12: xldiff.rs — JSON output formatter 1355 1356 **Files:** 1357 - Modify: `src/bin/xldiff.rs` 1358 1359 - [ ] **Step 1: Implement `format_json`** 1360 1361 Replace the `format_json` stub: 1362 1363 ```rust 1364 fn format_json(result: &DiffResult) -> String { 1365 use serde_json::{json, Map, Value}; 1366 1367 let row_to_obj = |row: &diff::DiffRow| -> Value { 1368 let mut map = Map::new(); 1369 for (h, v) in result.headers.iter().zip(row.values.iter()) { 1370 map.insert(h.clone(), Value::String(v.clone())); 1371 } 1372 Value::Object(map) 1373 }; 1374 1375 let added: Vec<Value> = result.added.iter().map(row_to_obj).collect(); 1376 let removed: Vec<Value> = result.removed.iter().map(row_to_obj).collect(); 1377 1378 let modified: Vec<Value> = result.modified.iter().map(|m| { 1379 let mut key_map = Map::new(); 1380 for (col, val) in result.key_columns.iter().zip(m.key.iter()) { 1381 key_map.insert(col.clone(), Value::String(val.clone())); 1382 } 1383 let changes: Vec<Value> = m.changes.iter().map(|c| { 1384 json!({ 1385 "column": c.column, 1386 "old": c.old_value, 1387 "new": c.new_value, 1388 }) 1389 }).collect(); 1390 json!({ 1391 "key": Value::Object(key_map), 1392 "changes": changes, 1393 }) 1394 }).collect(); 1395 1396 let obj = json!({ 1397 "added": added, 1398 "removed": removed, 1399 "modified": modified, 1400 }); 1401 1402 serde_json::to_string_pretty(&obj).unwrap() + "\n" 1403 } 1404 ``` 1405 1406 - [ ] **Step 2: Verify it compiles** 1407 1408 Run: `cargo check --bin xldiff` 1409 Expected: success 1410 1411 - [ ] **Step 3: Commit** 1412 1413 ```bash 1414 git add src/bin/xldiff.rs 1415 git commit -m "feat(xldiff): implement JSON output formatter" 1416 ``` 1417 1418 --- 1419 1420 ### Task 13: xldiff.rs — CSV output formatter 1421 1422 **Files:** 1423 - Modify: `src/bin/xldiff.rs` 1424 1425 - [ ] **Step 1: Implement `format_csv`** 1426 1427 Replace the `format_csv` stub: 1428 1429 ```rust 1430 fn format_csv(result: &DiffResult) -> String { 1431 let mut out = String::new(); 1432 1433 // Header: _status, col1, col2, ..., _old_col1, _old_col2, ... 1434 let mut header_parts = vec!["_status".to_string()]; 1435 header_parts.extend(result.headers.clone()); 1436 for h in &result.headers { 1437 header_parts.push(format!("_old_{}", h)); 1438 } 1439 out.push_str(&csv_row(&header_parts)); 1440 1441 // Added rows 1442 for row in &result.added { 1443 let mut parts = vec!["added".to_string()]; 1444 parts.extend(row.values.clone()); 1445 // Empty _old_ columns 1446 for _ in &result.headers { 1447 parts.push(String::new()); 1448 } 1449 out.push_str(&csv_row(&parts)); 1450 } 1451 1452 // Removed rows 1453 for row in &result.removed { 1454 let mut parts = vec!["removed".to_string()]; 1455 parts.extend(row.values.clone()); 1456 // Empty _old_ columns 1457 for _ in &result.headers { 1458 parts.push(String::new()); 1459 } 1460 out.push_str(&csv_row(&parts)); 1461 } 1462 1463 // Modified rows 1464 for m in &result.modified { 1465 let mut main_vals = vec![String::new(); result.headers.len()]; 1466 let mut old_vals = vec![String::new(); result.headers.len()]; 1467 1468 // Fill key columns in main values 1469 for (key_col, key_val) in result.key_columns.iter().zip(m.key.iter()) { 1470 if let Some(idx) = result.headers.iter().position(|h| h == key_col) { 1471 main_vals[idx] = key_val.clone(); 1472 } 1473 } 1474 1475 // Fill changed columns 1476 for change in &m.changes { 1477 if let Some(idx) = result.headers.iter().position(|h| h == &change.column) { 1478 main_vals[idx] = change.new_value.clone(); 1479 old_vals[idx] = change.old_value.clone(); 1480 } 1481 } 1482 1483 let mut parts = vec!["modified".to_string()]; 1484 parts.extend(main_vals); 1485 parts.extend(old_vals); 1486 out.push_str(&csv_row(&parts)); 1487 } 1488 1489 out 1490 } 1491 1492 /// Format a single CSV row with RFC 4180 quoting. 1493 fn csv_row(values: &[String]) -> String { 1494 let escaped: Vec<String> = values.iter().map(|v| { 1495 if v.contains(',') || v.contains('"') || v.contains('\n') { 1496 format!("\"{}\"", v.replace('"', "\"\"")) 1497 } else { 1498 v.clone() 1499 } 1500 }).collect(); 1501 escaped.join(",") + "\n" 1502 } 1503 ``` 1504 1505 - [ ] **Step 2: Verify it compiles** 1506 1507 Run: `cargo check --bin xldiff` 1508 Expected: success 1509 1510 - [ ] **Step 3: Commit** 1511 1512 ```bash 1513 git add src/bin/xldiff.rs 1514 git commit -m "feat(xldiff): implement CSV output formatter" 1515 ``` 1516 1517 --- 1518 1519 ### Task 14: Test fixtures 1520 1521 **Files:** 1522 - Modify: `tests/common/mod.rs` 1523 1524 - [ ] **Step 1: Add diff test fixtures** 1525 1526 Append to `tests/common/mod.rs`: 1527 1528 ```rust 1529 /// Create a pair of files for positional diff testing. 1530 /// File A: Alice/90, Bob/80, Charlie/70 1531 /// File B: Alice/90, Charlie/70, Dana/85 (Bob removed, Dana added) 1532 pub fn create_diff_pair(path_a: &Path, path_b: &Path) { 1533 let mut wb = Workbook::new(); 1534 let ws = wb.add_worksheet().set_name("Data").unwrap(); 1535 ws.write_string(0, 0, "Name").unwrap(); 1536 ws.write_string(0, 1, "Score").unwrap(); 1537 ws.write_string(1, 0, "Alice").unwrap(); 1538 ws.write_number(1, 1, 90.0).unwrap(); 1539 ws.write_string(2, 0, "Bob").unwrap(); 1540 ws.write_number(2, 1, 80.0).unwrap(); 1541 ws.write_string(3, 0, "Charlie").unwrap(); 1542 ws.write_number(3, 1, 70.0).unwrap(); 1543 wb.save(path_a).unwrap(); 1544 1545 let mut wb = Workbook::new(); 1546 let ws = wb.add_worksheet().set_name("Data").unwrap(); 1547 ws.write_string(0, 0, "Name").unwrap(); 1548 ws.write_string(0, 1, "Score").unwrap(); 1549 ws.write_string(1, 0, "Alice").unwrap(); 1550 ws.write_number(1, 1, 90.0).unwrap(); 1551 ws.write_string(2, 0, "Charlie").unwrap(); 1552 ws.write_number(2, 1, 70.0).unwrap(); 1553 ws.write_string(3, 0, "Dana").unwrap(); 1554 ws.write_number(3, 1, 85.0).unwrap(); 1555 wb.save(path_b).unwrap(); 1556 } 1557 1558 /// Create a pair for key-based diff testing. 1559 /// File A: ID=1/Score=90, ID=2/Score=80, ID=3/Score=70 1560 /// File B: ID=1/Score=95, ID=2/Score=80, ID=4/Score=85 1561 /// Expected: ID=1 modified (90→95), ID=3 removed, ID=4 added 1562 pub fn create_diff_pair_with_keys(path_a: &Path, path_b: &Path) { 1563 let mut wb = Workbook::new(); 1564 let ws = wb.add_worksheet().set_name("Data").unwrap(); 1565 ws.write_string(0, 0, "ID").unwrap(); 1566 ws.write_string(0, 1, "Name").unwrap(); 1567 ws.write_string(0, 2, "Score").unwrap(); 1568 ws.write_string(1, 0, "1").unwrap(); 1569 ws.write_string(1, 1, "Alice").unwrap(); 1570 ws.write_number(1, 2, 90.0).unwrap(); 1571 ws.write_string(2, 0, "2").unwrap(); 1572 ws.write_string(2, 1, "Bob").unwrap(); 1573 ws.write_number(2, 2, 80.0).unwrap(); 1574 ws.write_string(3, 0, "3").unwrap(); 1575 ws.write_string(3, 1, "Charlie").unwrap(); 1576 ws.write_number(3, 2, 70.0).unwrap(); 1577 wb.save(path_a).unwrap(); 1578 1579 let mut wb = Workbook::new(); 1580 let ws = wb.add_worksheet().set_name("Data").unwrap(); 1581 ws.write_string(0, 0, "ID").unwrap(); 1582 ws.write_string(0, 1, "Name").unwrap(); 1583 ws.write_string(0, 2, "Score").unwrap(); 1584 ws.write_string(1, 0, "1").unwrap(); 1585 ws.write_string(1, 1, "Alice").unwrap(); 1586 ws.write_number(1, 2, 95.0).unwrap(); 1587 ws.write_string(2, 0, "2").unwrap(); 1588 ws.write_string(2, 1, "Bob").unwrap(); 1589 ws.write_number(2, 2, 80.0).unwrap(); 1590 ws.write_string(3, 0, "4").unwrap(); 1591 ws.write_string(3, 1, "Dana").unwrap(); 1592 ws.write_number(3, 2, 85.0).unwrap(); 1593 wb.save(path_b).unwrap(); 1594 } 1595 1596 /// Create a pair for tolerance testing. 1597 /// File A: ID=1/Price=100.001, ID=2/Price=200.5 1598 /// File B: ID=1/Price=100.002, ID=2/Price=200.6 1599 /// With tolerance 0.01: only ID=2 should be modified (diff=0.1 > 0.01) 1600 pub fn create_diff_pair_with_floats(path_a: &Path, path_b: &Path) { 1601 let mut wb = Workbook::new(); 1602 let ws = wb.add_worksheet().set_name("Data").unwrap(); 1603 ws.write_string(0, 0, "ID").unwrap(); 1604 ws.write_string(0, 1, "Price").unwrap(); 1605 ws.write_string(1, 0, "1").unwrap(); 1606 ws.write_number(1, 1, 100.001).unwrap(); 1607 ws.write_string(2, 0, "2").unwrap(); 1608 ws.write_number(2, 1, 200.5).unwrap(); 1609 wb.save(path_a).unwrap(); 1610 1611 let mut wb = Workbook::new(); 1612 let ws = wb.add_worksheet().set_name("Data").unwrap(); 1613 ws.write_string(0, 0, "ID").unwrap(); 1614 ws.write_string(0, 1, "Price").unwrap(); 1615 ws.write_string(1, 0, "1").unwrap(); 1616 ws.write_number(1, 1, 100.002).unwrap(); 1617 ws.write_string(2, 0, "2").unwrap(); 1618 ws.write_number(2, 1, 200.6).unwrap(); 1619 wb.save(path_b).unwrap(); 1620 } 1621 ``` 1622 1623 - [ ] **Step 2: Verify compilation** 1624 1625 Run: `cargo test --no-run` 1626 Expected: compiles successfully 1627 1628 - [ ] **Step 3: Commit** 1629 1630 ```bash 1631 git add tests/common/mod.rs 1632 git commit -m "test(xldiff): add diff test fixtures — positional, keyed, and float tolerance pairs" 1633 ``` 1634 1635 --- 1636 1637 ### Task 15: Integration tests 1638 1639 **Files:** 1640 - Create: `tests/test_xldiff.rs` 1641 1642 - [ ] **Step 1: Write integration tests** 1643 1644 Create `tests/test_xldiff.rs`: 1645 1646 ```rust 1647 mod common; 1648 1649 use assert_cmd::Command; 1650 use predicates::prelude::*; 1651 use tempfile::TempDir; 1652 1653 #[test] 1654 fn test_no_diff() { 1655 let dir = TempDir::new().unwrap(); 1656 let path_a = dir.path().join("a.xlsx"); 1657 let path_b = dir.path().join("b.xlsx"); 1658 // Use same data for both 1659 common::create_simple(&path_a); 1660 common::create_simple(&path_b); 1661 1662 Command::cargo_bin("xldiff").unwrap() 1663 .arg(path_a.to_str().unwrap()) 1664 .arg(path_b.to_str().unwrap()) 1665 .arg("--no-color") 1666 .assert() 1667 .success() // exit 0 1668 .stdout(predicate::str::contains("No differences found.")); 1669 } 1670 1671 #[test] 1672 fn test_positional_diff() { 1673 let dir = TempDir::new().unwrap(); 1674 let path_a = dir.path().join("a.xlsx"); 1675 let path_b = dir.path().join("b.xlsx"); 1676 common::create_diff_pair(&path_a, &path_b); 1677 1678 Command::cargo_bin("xldiff").unwrap() 1679 .arg(path_a.to_str().unwrap()) 1680 .arg(path_b.to_str().unwrap()) 1681 .arg("--no-color") 1682 .assert() 1683 .code(1) // differences found 1684 .stdout(predicate::str::contains("Removed: 1")) 1685 .stdout(predicate::str::contains("Added: 1")) 1686 .stdout(predicate::str::contains("Bob")); 1687 } 1688 1689 #[test] 1690 fn test_keyed_diff() { 1691 let dir = TempDir::new().unwrap(); 1692 let path_a = dir.path().join("a.xlsx"); 1693 let path_b = dir.path().join("b.xlsx"); 1694 common::create_diff_pair_with_keys(&path_a, &path_b); 1695 1696 Command::cargo_bin("xldiff").unwrap() 1697 .arg(path_a.to_str().unwrap()) 1698 .arg(path_b.to_str().unwrap()) 1699 .arg("--key").arg("ID") 1700 .arg("--no-color") 1701 .assert() 1702 .code(1) 1703 .stdout(predicate::str::contains("Modified: 1")) 1704 .stdout(predicate::str::contains("Removed: 1")) 1705 .stdout(predicate::str::contains("Added: 1")); 1706 } 1707 1708 #[test] 1709 fn test_tolerance() { 1710 let dir = TempDir::new().unwrap(); 1711 let path_a = dir.path().join("a.xlsx"); 1712 let path_b = dir.path().join("b.xlsx"); 1713 common::create_diff_pair_with_floats(&path_a, &path_b); 1714 1715 // With tolerance=0.01, ID=1 diff (0.001) within tolerance, ID=2 diff (0.1) exceeds 1716 Command::cargo_bin("xldiff").unwrap() 1717 .arg(path_a.to_str().unwrap()) 1718 .arg(path_b.to_str().unwrap()) 1719 .arg("--key").arg("ID") 1720 .arg("--tolerance").arg("0.01") 1721 .arg("--no-color") 1722 .assert() 1723 .code(1) 1724 .stdout(predicate::str::contains("Modified: 1")); 1725 } 1726 1727 #[test] 1728 fn test_json_format() { 1729 let dir = TempDir::new().unwrap(); 1730 let path_a = dir.path().join("a.xlsx"); 1731 let path_b = dir.path().join("b.xlsx"); 1732 common::create_diff_pair_with_keys(&path_a, &path_b); 1733 1734 Command::cargo_bin("xldiff").unwrap() 1735 .arg(path_a.to_str().unwrap()) 1736 .arg(path_b.to_str().unwrap()) 1737 .arg("--key").arg("ID") 1738 .arg("--format").arg("json") 1739 .assert() 1740 .code(1) 1741 .stdout(predicate::str::contains("\"added\"")) 1742 .stdout(predicate::str::contains("\"removed\"")) 1743 .stdout(predicate::str::contains("\"modified\"")); 1744 } 1745 1746 #[test] 1747 fn test_markdown_format() { 1748 let dir = TempDir::new().unwrap(); 1749 let path_a = dir.path().join("a.xlsx"); 1750 let path_b = dir.path().join("b.xlsx"); 1751 common::create_diff_pair_with_keys(&path_a, &path_b); 1752 1753 Command::cargo_bin("xldiff").unwrap() 1754 .arg(path_a.to_str().unwrap()) 1755 .arg(path_b.to_str().unwrap()) 1756 .arg("--key").arg("ID") 1757 .arg("--format").arg("markdown") 1758 .assert() 1759 .code(1) 1760 .stdout(predicate::str::contains("## Added")) 1761 .stdout(predicate::str::contains("## Removed")) 1762 .stdout(predicate::str::contains("## Modified")); 1763 } 1764 1765 #[test] 1766 fn test_csv_format() { 1767 let dir = TempDir::new().unwrap(); 1768 let path_a = dir.path().join("a.xlsx"); 1769 let path_b = dir.path().join("b.xlsx"); 1770 common::create_diff_pair_with_keys(&path_a, &path_b); 1771 1772 Command::cargo_bin("xldiff").unwrap() 1773 .arg(path_a.to_str().unwrap()) 1774 .arg(path_b.to_str().unwrap()) 1775 .arg("--key").arg("ID") 1776 .arg("--format").arg("csv") 1777 .assert() 1778 .code(1) 1779 .stdout(predicate::str::contains("_status")) 1780 .stdout(predicate::str::contains("added")) 1781 .stdout(predicate::str::contains("removed")) 1782 .stdout(predicate::str::contains("modified")); 1783 } 1784 1785 #[test] 1786 fn test_file_not_found() { 1787 Command::cargo_bin("xldiff").unwrap() 1788 .arg("nonexistent.xlsx") 1789 .arg("also_nonexistent.xlsx") 1790 .assert() 1791 .code(2) 1792 .stderr(predicate::str::contains("file not found")); 1793 } 1794 1795 #[test] 1796 fn test_sheet_selector() { 1797 let dir = TempDir::new().unwrap(); 1798 let path = dir.path().join("multi.xlsx"); 1799 common::create_multi_sheet(&path); 1800 let path_str = path.to_str().unwrap(); 1801 1802 // Compare Revenue vs Expenses sheets within same file 1803 Command::cargo_bin("xldiff").unwrap() 1804 .arg(format!("{}:Revenue", path_str)) 1805 .arg(format!("{}:Expenses", path_str)) 1806 .arg("--no-color") 1807 .assert() 1808 .code(1); // different data → exit 1 1809 } 1810 1811 #[test] 1812 fn test_cols_filter() { 1813 let dir = TempDir::new().unwrap(); 1814 let path_a = dir.path().join("a.xlsx"); 1815 let path_b = dir.path().join("b.xlsx"); 1816 common::create_diff_pair_with_keys(&path_a, &path_b); 1817 1818 // Only compare Name column (ignore Score), so ID=1 shows no modification 1819 Command::cargo_bin("xldiff").unwrap() 1820 .arg(path_a.to_str().unwrap()) 1821 .arg(path_b.to_str().unwrap()) 1822 .arg("--key").arg("ID") 1823 .arg("--cols").arg("ID,Name") 1824 .arg("--no-color") 1825 .assert() 1826 .code(1) 1827 // ID=1 has same Name in both, so no modification for that row 1828 .stdout(predicate::str::contains("Modified: 0").or(predicate::str::contains("Modified").not())); 1829 } 1830 ``` 1831 1832 - [ ] **Step 2: Run integration tests** 1833 1834 Run: `cargo test --test test_xldiff` 1835 Expected: all tests PASS 1836 1837 - [ ] **Step 3: Commit** 1838 1839 ```bash 1840 git add tests/test_xldiff.rs 1841 git commit -m "test(xldiff): add integration tests — all modes, formats, tolerance, errors" 1842 ``` 1843 1844 --- 1845 1846 ### Task 16: Update CLAUDE.md and README 1847 1848 **Files:** 1849 - Modify: `CLAUDE.md` 1850 1851 - [ ] **Step 1: Update CLAUDE.md** 1852 1853 Update the "Current tools" section to include xldiff, remove it from "Related tools" and "Planned": 1854 1855 In CLAUDE.md, update: 1856 - Add `- **xldiff** — compare two sheets, report added/removed/modified rows` to "Current tools" 1857 - Remove or update the "Related tools" xldiff entry 1858 - Remove xlfilter from "Planned" if still there (it's already built) 1859 - Update the Architecture line to mention the diff module 1860 1861 - [ ] **Step 2: Commit** 1862 1863 ```bash 1864 git add CLAUDE.md 1865 git commit -m "docs: update CLAUDE.md — add xldiff to current tools, remove from planned" 1866 ``` 1867 1868 --- 1869 1870 ### Task 17: Final verification 1871 1872 - [ ] **Step 1: Run all tests** 1873 1874 Run: `cargo test` 1875 Expected: all tests pass (existing + new) 1876 1877 - [ ] **Step 2: Run clippy** 1878 1879 Run: `cargo clippy --all-targets` 1880 Expected: no warnings 1881 1882 - [ ] **Step 3: Build release binary** 1883 1884 Run: `cargo build --release --bin xldiff` 1885 Expected: binary created at `target/release/xldiff` 1886 1887 - [ ] **Step 4: Manual smoke test** 1888 1889 Run: `cargo run --bin xldiff -- demo/budget.xlsx demo/sales.xlsx --no-color` 1890 Expected: shows differences between the two demo files 1891 1892 - [ ] **Step 5: Commit any fixes** 1893 1894 If clippy or tests revealed issues, fix and commit.