FinanceRoutines.jl

Financial data routines for Julia

2026-03-22-v0.5.0-hardening-and-extensions.md (28612B)


      1 # FinanceRoutines.jl v0.5.0 — Hardening & Extensions
      2 
      3 > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
      4 
**Goal:** Fix all identified quality/robustness issues, restructure ImportYields.jl, add CI path filtering, create NEWS.md, and implement three extensions (FF5 and momentum imports, portfolio returns, diagnostics), releasing as v0.5.0. Event study utilities are deferred (see "Extensions deferred for user decision").
      6 
      7 **Architecture:** Fixes first (tasks 1–8), then extensions (tasks 9–11), then integration + release (tasks 12–13). Each task is independently testable and committable. The ImportYields.jl split (task 5) is the highest-risk refactor — it moves code into two new files without changing any public API.
      8 
      9 **Tech Stack:** Julia 1.10+, LibPQ, DataFrames, CSV, Downloads, ZipFile, Roots, LinearAlgebra, FlexiJoins, BazerData, GitHub Actions
     10 
     11 ---
     12 
     13 ## File Map
     14 
     15 ### Files to create
     16 - `src/GSW.jl` — GSW parameter struct, yield/price/forward/return calculations, DataFrame wrappers
     17 - `src/BondPricing.jl` — `bond_yield`, `bond_yield_excel`, day-count helpers
     18 - `src/ImportFamaFrench5.jl` — FF5 + momentum import functions
     19 - `src/PortfolioUtils.jl` — portfolio return calculations
     20 - `src/Diagnostics.jl` — data quality diagnostics
     21 - `test/UnitTests/FF5.jl` — tests for FF5/momentum imports
     22 - `test/UnitTests/PortfolioUtils.jl` — tests for portfolio returns
     23 - `test/UnitTests/Diagnostics.jl` — tests for diagnostics
     24 - `NEWS.md` — changelog
     25 
     26 ### Files to modify
     27 - `src/FinanceRoutines.jl` — update includes/exports
     28 - `src/Utilities.jl` — add retry logic, remove broken logging macro
     29 - `src/ImportFamaFrench.jl` — make parsing more robust, refactor for FF5 reuse
- `src/ImportYields.jl`: expand the missing-value flags in `_safe_parse_float` in place first (task 3); the file is then deleted in task 5, when its contents move to `GSW.jl` + `BondPricing.jl`
- `src/ImportCRSP.jl`: replace the `@log_msg` calls with `@debug` (task 1)
     32 - `.github/workflows/CI.yml` — add path filters, consider macOS
     33 - `Project.toml` — bump to 0.5.0
     34 - `test/runtests.jl` — add new test suites
     35 
     36 ### Files to delete
     37 - `src/ImportYields.jl` — replaced by `src/GSW.jl` + `src/BondPricing.jl`
     38 
     39 ---
     40 
     41 ## Task 1: Remove broken logging macro & clean up Utilities.jl
     42 
     43 **Files:**
     44 - Modify: `src/Utilities.jl:70-102`
     45 - Modify: `src/FinanceRoutines.jl:19` (remove Logging import if unused)
     46 - Modify: `src/ImportCRSP.jl:303,381,400` (replace `@log_msg` calls)
     47 
     48 - [ ] **Step 1: Check all usages of `@log_msg` and `log_with_level`**
     49 
     50 Run: `grep -rn "log_msg\|log_with_level" src/`
     51 
     52 - [ ] **Step 2: Replace `@log_msg` calls in ImportCRSP.jl with `@debug`**
     53 
     54 In `src/ImportCRSP.jl`, replace the three `@log_msg` calls (lines 303, 381, 400) with `@debug`:
     55 ```julia
     56 # Before:
     57 @log_msg "# -- GETTING MONTHLY STOCK FILE (CIZ) ... msf_v2"
     58 # After:
     59 @debug "Getting monthly stock file (CIZ) ... msf_v2"
     60 ```
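Note that `@debug` messages are dropped by a logger whose minimum level is `Info`, so the replaced messages only show up when debug logging is enabled. A minimal sketch of the difference, using the `Logging` standard library in a user session (the `captured_output` helper and the buffer-based loggers are illustrative only, not part of the package):

```julia
using Logging

# Capture what a logger with the given minimum level prints for the new @debug call.
function captured_output(min_level)
    io = IOBuffer()
    with_logger(ConsoleLogger(io, min_level)) do
        @debug "Getting monthly stock file (CIZ) ... msf_v2"
    end
    return String(take!(io))
end

at_info  = captured_output(Logging.Info)   # Info-level logger: message is dropped
at_debug = captured_output(Logging.Debug)  # Debug-level logger: message is printed

println("Info level shows message:  ", !isempty(at_info))
println("Debug level shows message: ", occursin("msf_v2", at_debug))
```

In an interactive session, setting `ENV["JULIA_DEBUG"] = "FinanceRoutines"` before calling the import functions should have the same effect for this package's messages.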
     61 
     62 - [ ] **Step 3: Remove `log_with_level` and `@log_msg` from Utilities.jl**
     63 
     64 Delete lines 69–102 (the `log_with_level` function and `@log_msg` macro).
     65 
     66 - [ ] **Step 4: Clean up Logging import and stale export in FinanceRoutines.jl**
     67 
     68 Remove the entire `import Logging: ...` line from `FinanceRoutines.jl` (line 19). `@debug` and `@warn` are available from `Base.CoreLogging` without explicit import. Also remove `Logging` from the `[deps]` section of `Project.toml`. Also remove the stale `export greet_FinanceRoutines` (line 45) — this function is not defined anywhere.
     69 
     70 - [ ] **Step 5: Run tests to verify nothing breaks**
     71 
Run: `julia --project=. -e 'using Pkg; Pkg.test()'` locally. (The commit in Step 6 carries `[skip ci]` because this is internal cleanup.)
     73 
     74 - [ ] **Step 6: Commit**
     75 
```bash
# Project.toml is included because Step 4 removes Logging from [deps]
git add src/Utilities.jl src/ImportCRSP.jl src/FinanceRoutines.jl Project.toml
git commit -m "Remove broken @log_msg macro, replace with @debug [skip ci]"
```
     80 
     81 ---
     82 
     83 ## Task 2: Add WRDS connection retry logic
     84 
     85 **Files:**
     86 - Modify: `src/Utilities.jl:19-29` (the `open_wrds_pg(user, password)` method)
     87 
     88 - [ ] **Step 1: Write test for retry behavior**
     89 
     90 This is hard to unit test (requires WRDS), so we verify by code review. The retry logic should:
     91 - Attempt up to 3 connections
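- Exponential backoff between attempts: 1s, then 2s (with the default of 3 attempts there is no wait after the final failure; a 4s delay would only occur with a fourth attempt)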
     93 - Log warnings on retry
     94 - Rethrow on final failure
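Although the real connection cannot be exercised without WRDS, the retry behavior itself can be checked in isolation. The sketch below assumes a hypothetical `with_retry` helper (not part of FinanceRoutines.jl) that takes the action and the sleep function as arguments, so a simulated connection can stand in for `Connection` and delays can be recorded instead of waited out:

```julia
# Hypothetical helper: retry `f` with exponential backoff.
# `sleep_fn` is injectable so a test can record delays instead of sleeping.
function with_retry(f; max_retries::Int=3, base_delay::Real=1.0, sleep_fn=sleep)
    for attempt in 1:max_retries
        try
            return f()
        catch
            # Last attempt: propagate the original exception
            attempt == max_retries && rethrow()
            sleep_fn(base_delay * 2^(attempt - 1))
        end
    end
end

# Simulated connection that fails twice, then succeeds.
calls = Ref(0)
delays = Float64[]
flaky_connect() = (calls[] += 1) < 3 ? error("connection refused") : :connected

result = with_retry(flaky_connect; sleep_fn = d -> push!(delays, d))
println(result, " after ", calls[], " attempts; delays = ", delays)

# A connection that always fails exhausts the retries and rethrows.
always_fails() = error("connection refused")
failed = try
    with_retry(always_fails; sleep_fn = _ -> nothing)
    false
catch
    true
end
```

Whether `open_wrds_pg` should be restructured around such a helper is a design choice left open here; the sketch only shows that the four listed properties are testable without a network.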
     95 
     96 - [ ] **Step 2: Add retry wrapper to `open_wrds_pg`**
     97 
     98 Replace the `open_wrds_pg(user, password)` function:
     99 
```julia
function open_wrds_pg(user::AbstractString, password::AbstractString;
                      max_retries::Int=3, base_delay::Real=1.0)
    # Guard against a loop that would never run and silently return `nothing`
    max_retries >= 1 || throw(ArgumentError("max_retries must be at least 1"))
    conn_str = """
        host = wrds-pgdata.wharton.upenn.edu
        port = 9737
        user='$user'
        password='$password'
        sslmode = 'require' dbname = wrds
    """
    for attempt in 1:max_retries
        try
            return Connection(conn_str)
        catch e
            # Out of attempts: propagate the original exception
            attempt == max_retries && rethrow()
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            delay = base_delay * 2^(attempt - 1)
            @warn "WRDS connection attempt $attempt/$max_retries failed, retrying in $(delay)s" exception=e
            sleep(delay)
        end
    end
end
```
    124 
    125 - [ ] **Step 3: Verify the package loads and existing tests still pass**
    126 
    127 Run: `julia --project=. -e 'using FinanceRoutines'`
    128 
    129 - [ ] **Step 4: Commit**
    130 
    131 ```bash
    132 git add src/Utilities.jl
    133 git commit -m "Add retry logic with exponential backoff for WRDS connections"
    134 ```
    135 
    136 ---
    137 
    138 ## Task 3: Expand missing-value flags in `_safe_parse_float`
    139 
    140 **Files:**
    141 - Modify: `src/ImportYields.jl:281-309` (will move to GSW.jl in task 5, but fix first)
    142 
    143 - [ ] **Step 1: Write test for expanded flags**
    144 
    145 Add to `test/UnitTests/Yields.jl` inside a new testset:
    146 
    147 ```julia
    148 @testset "Missing value flag handling" begin
    149     @test ismissing(FinanceRoutines._safe_parse_float(-999.99))
    150     @test ismissing(FinanceRoutines._safe_parse_float(-999.0))
    151     @test ismissing(FinanceRoutines._safe_parse_float(-9999.0))
    152     @test ismissing(FinanceRoutines._safe_parse_float(-99.99))
    153     @test !ismissing(FinanceRoutines._safe_parse_float(-5.0))  # legitimate negative
    154     @test FinanceRoutines._safe_parse_float(3.14) ≈ 3.14
    155     @test ismissing(FinanceRoutines._safe_parse_float(""))
    156     @test ismissing(FinanceRoutines._safe_parse_float(missing))
    157 end
    158 ```
    159 
    160 - [ ] **Step 2: Run test to verify it fails**
    161 
    162 Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/Yields.jl")'`
Expected: FAIL on `-999.0`, `-9999.0`, and `-99.99` (only `-999.99` is flagged before this change)
    164 
    165 - [ ] **Step 3: Update `_safe_parse_float`**
    166 
    167 ```julia
    168 function _safe_parse_float(value)
    169     if ismissing(value) || value == ""
    170         return missing
    171     end
    172 
    173     if value isa AbstractString
    174         parsed = tryparse(Float64, strip(value))
    175         if isnothing(parsed)
    176             return missing
    177         end
    178         value = parsed
    179     end
    180 
    181     try
    182         numeric_value = Float64(value)
    183         # Common missing data flags in economic/financial datasets
    184         if numeric_value in (-999.99, -999.0, -9999.0, -99.99)
    185             return missing
    186         end
    187         return numeric_value
    188     catch
    189         return missing
    190     end
    191 end
    192 ```
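The flag check relies on `in`, which is an exact `==` comparison on `Float64`. A short Base-only check of when that exact comparison matches; the `is_flag` helper and `MISSING_FLAGS` constant are illustrative names, not part of the package:

```julia
const MISSING_FLAGS = (-999.99, -999.0, -9999.0, -99.99)

# Illustrative helper mirroring the comparison inside `_safe_parse_float`.
is_flag(x) = Float64(x) in MISSING_FLAGS

# A string parsed to Float64 rounds to the same value as the Float64 literal, so it matches.
from_string = is_flag(tryparse(Float64, strip(" -999.99 ")))
# Integers convert exactly.
from_int = is_flag(-999)
# Float32 inputs: -999.0f0 is exactly representable, but -999.99f0 is rounded
# differently than the Float64 literal -999.99, so it does NOT match after widening.
from_f32_exact   = is_flag(-999.0f0)
from_f32_inexact = is_flag(-999.99f0)

println((from_string, from_int, from_f32_exact, from_f32_inexact))
```

If the source data could ever arrive as `Float32`, an approximate comparison would be needed; whether that can happen for the GSW file is not established in this plan.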
    193 
    194 - [ ] **Step 4: Run test to verify it passes**
    195 
    196 Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/Yields.jl")'`
    197 Expected: PASS
    198 
    199 - [ ] **Step 5: Commit**
    200 
    201 ```bash
    202 git add src/ImportYields.jl test/UnitTests/Yields.jl
    203 git commit -m "Expand missing-value flags to cover -999, -9999, -99.99"
    204 ```
    205 
    206 ---
    207 
    208 ## Task 4: Make Ken French parsing more robust
    209 
    210 **Files:**
    211 - Modify: `src/ImportFamaFrench.jl:118-159` (`_parse_ff_annual`)
    212 - Modify: `src/ImportFamaFrench.jl:164-205` (`_parse_ff_monthly`)
    213 
    214 - [ ] **Step 1: Refactor `_parse_ff_annual` to use data-pattern detection**
    215 
    216 Instead of `occursin(r"Annual Factors", line)`, detect the annual section by:
    217 1. Skip past the monthly data (lines starting with 6-digit YYYYMM)
    218 2. Find the next block of lines starting with 4-digit YYYY
    219 
    220 ```julia
    221 function _parse_ff_annual(zip_file; types=nothing)
    222     file_lines = split(String(read(zip_file)), '\n')
    223 
    224     # Find annual data: lines starting with a 4-digit year that are NOT 6-digit monthly dates
    225     # Annual section comes after monthly section
    226     found_monthly = false
    227     past_monthly = false
    228     lines = String[]
    229 
    230     for line in file_lines
    231         stripped = strip(line)
    232 
    233         # Track when we're past the monthly data section
    234         if !found_monthly && occursin(r"^\s*\d{6}", stripped)
    235             found_monthly = true
    236             continue
    237         end
    238 
    239         if found_monthly && !past_monthly
    240             # Still in monthly section until we hit a non-data line
    241             if occursin(r"^\s*\d{6}", stripped)
    242                 continue
    243             elseif !occursin(r"^\s*$", stripped) && !occursin(r"^\s*\d", stripped)
    244                 past_monthly = true
    245                 continue
    246             else
    247                 continue
    248             end
    249         end
    250 
    251         if past_monthly
    252             # Look for annual data lines (4-digit year)
    253             if occursin(r"^\s*\d{4}\s*,", stripped)
    254                 push!(lines, replace(stripped, r"[\r]" => ""))
    255             elseif !isempty(lines) && occursin(r"^\s*$", stripped)
    256                 break  # End of annual section
    257             end
    258         end
    259     end
    260 
    261     if isempty(lines)
    262         error("Annual Factors section not found in file")
    263     end
    264 
    265     lines_buffer = IOBuffer(join(lines, "\n"))
    266     return CSV.File(lines_buffer, header=false, delim=",", ntasks=1, types=types) |> DataFrame |>
    267            df -> rename!(df, [:datey, :mktrf, :smb, :hml, :rf])
    268 end
    269 ```
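The section detection can be tried without downloading anything. The following is a simplified, self-contained sketch of the same idea (no zip or CSV handling); the `annual_lines` name and the sample text are made up for illustration and are not real Ken French data:

```julia
# Collect the annual rows (YYYY, ...) that follow the monthly block (YYYYMM, ...).
function annual_lines(text::AbstractString)
    in_monthly = false    # we have seen at least one YYYYMM row
    past_monthly = false  # a non-data line has closed the monthly block
    out = String[]
    for raw in split(text, '\n')
        line = strip(raw)
        if !past_monthly
            if occursin(r"^\d{6}", line)
                in_monthly = true
            elseif in_monthly && !isempty(line) && !occursin(r"^\d", line)
                past_monthly = true
            end
        elseif occursin(r"^\d{4}\s*,", line)
            push!(out, String(line))
        elseif !isempty(out) && isempty(line)
            break  # blank line after annual rows ends the section
        end
    end
    return out
end

sample = """
Synthetic header text, not real data
,Mkt-RF,SMB,HML,RF
192607,    1.00,   -2.00,   -3.00,    0.20
192608,    2.00,   -1.00,    4.00,    0.25

 Annual Factors: January-December
,Mkt-RF,SMB,HML,RF
   1927,   10.00,   -2.00,   -3.00,    3.00
   1928,   20.00,    4.00,   -6.00,    3.50

Copyright line
"""

rows = annual_lines(sample)
foreach(println, rows)
```

Note that this approach, like the function above, still needs some non-data line between the monthly and annual blocks to detect the transition; it no longer depends on that line's exact wording.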
    270 
    271 - [ ] **Step 2: Run Ken French tests**
    272 
    273 Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/KenFrench.jl")'`
    274 Expected: PASS
    275 
    276 - [ ] **Step 3: Commit**
    277 
    278 ```bash
    279 git add src/ImportFamaFrench.jl
    280 git commit -m "Make FF3 parsing use data patterns instead of hardcoded headers"
    281 ```
    282 
    283 ---
    284 
    285 ## Task 5: Split ImportYields.jl into GSW.jl + BondPricing.jl
    286 
    287 This is the largest refactor. No API changes — just file reorganization.
    288 
    289 **Files:**
    290 - Create: `src/GSW.jl` — everything from ImportYields.jl lines 1–1368 (GSWParameters struct, all gsw_* functions, DataFrame wrappers, helpers)
    291 - Create: `src/BondPricing.jl` — everything from ImportYields.jl lines 1371–1694 (bond_yield, bond_yield_excel, day-count functions)
    292 - Delete: `src/ImportYields.jl`
    293 - Modify: `src/FinanceRoutines.jl` — update includes
    294 
    295 - [ ] **Step 1: Create `src/GSW.jl`**
    296 
    297 Copy lines 1–1368 from ImportYields.jl into `src/GSW.jl`. This includes:
    298 - `GSWParameters` struct and constructors
    299 - `is_three_factor_model`, `_extract_params`
    300 - `import_gsw_parameters`, `_clean_gsw_data`, `_safe_parse_float`, `_validate_gsw_data`
    301 - `gsw_yield`, `gsw_price`, `gsw_forward_rate`
    302 - `gsw_yield_curve`, `gsw_price_curve`
    303 - `gsw_return`, `gsw_excess_return`
    304 - `add_yields!`, `add_prices!`, `add_returns!`, `add_excess_returns!`
    305 - `gsw_curve_snapshot`
    306 - `_validate_gsw_dataframe`, `_maturity_to_column_name`
    307 
    308 - [ ] **Step 2: Create `src/BondPricing.jl`**
    309 
    310 Copy lines 1371–1694 from ImportYields.jl into `src/BondPricing.jl`. This includes:
    311 - `bond_yield_excel`
    312 - `bond_yield`
    313 - `_day_count_days`
    314 - `_date_difference`
    315 
    316 - [ ] **Step 3: Update `src/FinanceRoutines.jl`**
    317 
    318 Replace `include("ImportYields.jl")` with:
    319 ```julia
    320 include("GSW.jl")
    321 include("BondPricing.jl")
    322 ```
    323 
    324 - [ ] **Step 4: Delete `src/ImportYields.jl`**
    325 
- [ ] **Step 5: Run the yields test suite to verify nothing broke**
    327 
    328 Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/Yields.jl")'`
    329 Expected: All 70+ assertions PASS
    330 
    331 - [ ] **Step 6: Commit**
    332 
    333 ```bash
    334 git add src/GSW.jl src/BondPricing.jl src/FinanceRoutines.jl
    335 git rm src/ImportYields.jl
    336 git commit -m "Split ImportYields.jl into GSW.jl and BondPricing.jl (no API changes)"
    337 ```
    338 
    339 ---
    340 
    341 ## Task 6: Add CI path filters and macOS runner
    342 
    343 **Files:**
    344 - Modify: `.github/workflows/CI.yml`
    345 
    346 - [ ] **Step 1: Add path filters to CI.yml**
    347 
    348 ```yaml
    349 on:
    350   push:
    351     branches:
    352       - main
    353     tags:
    354       - "*"
    355     paths:
    356       - 'src/**'
    357       - 'test/**'
    358       - 'Project.toml'
    359       - '.github/workflows/CI.yml'
    360   pull_request:
    361     paths:
    362       - 'src/**'
    363       - 'test/**'
    364       - 'Project.toml'
    365       - '.github/workflows/CI.yml'
    366 ```
    367 
    368 - [ ] **Step 2: Add macOS to the matrix**
    369 
    370 ```yaml
    371 matrix:
    372   version:
    373     - "1.11"
    374     - nightly
    375   os:
    376     - ubuntu-latest
    377     - macos-latest
    378   arch:
    379     - x64
    380 ```
    381 
    382 - [ ] **Step 3: Commit**
    383 
    384 ```bash
    385 git add .github/workflows/CI.yml
    386 git commit -m "Add CI path filters and macOS runner [skip ci]"
    387 ```
    388 
    389 ---
    390 
    391 ## Task 7: Clarify env parsing in test/runtests.jl
    392 
    393 **Files:**
    394 - Modify: `test/runtests.jl:33`
    395 
    396 Line 33 uses `!startswith(line, "#") || continue` which correctly skips comment lines (the `||` evaluates `continue` when the left side is `false`, i.e. when the line IS a comment). This is logically correct but reads awkwardly. Rewrite to the more idiomatic `&&` form for clarity.
    397 
    398 - [ ] **Step 1: Rewrite for readability**
    399 
    400 ```julia
    401 # Before (correct but hard to read):
    402 !startswith(line, "#") || continue
    403 # After (same logic, clearer):
    404 startswith(line, "#") && continue
    405 ```
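To confirm the two forms skip exactly the same lines, a small check on made-up input can be run before committing. The variable names in the sample lines are placeholders, not the actual contents of the env file:

```julia
sample_lines = ["# credentials", "WRDS_USERNAME=alice", "", "# another comment", "WRDS_PASSWORD=secret"]

# Original idiom: `a || continue` runs `continue` only when `a` is false.
function keep_with_or(lines)
    kept = String[]
    for line in lines
        !startswith(line, "#") || continue
        push!(kept, line)
    end
    return kept
end

# Rewritten idiom: `a && continue` runs `continue` only when `a` is true.
function keep_with_and(lines)
    kept = String[]
    for line in lines
        startswith(line, "#") && continue
        push!(kept, line)
    end
    return kept
end

println(keep_with_or(sample_lines) == keep_with_and(sample_lines))
```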
    406 
    407 - [ ] **Step 2: Commit**
    408 
    409 ```bash
    410 git add test/runtests.jl
    411 git commit -m "Clarify env parsing idiom in test runner [skip ci]"
    412 ```
    413 
    414 ---
    415 
    416 ## Task 8: Create NEWS.md
    417 
    418 **Files:**
    419 - Create: `NEWS.md`
    420 
    421 - [ ] **Step 1: Create NEWS.md with v0.5.0 changelog**
    422 
    423 ```markdown
    424 # FinanceRoutines.jl Changelog
    425 
    426 ## v0.5.0
    427 
    428 ### Breaking changes
    429 - `ImportYields.jl` split into `GSW.jl` (yield curve model) and `BondPricing.jl` (bond math). No public API changes, but code that `include`d `ImportYields.jl` directly will need updating.
    430 - Missing-value flags expanded: `-999.0`, `-9999.0`, `-99.99` now treated as missing in GSW data (previously only `-999.99`). **Migration note:** if your downstream code used these numeric values (e.g., `-999.0` as an actual number), they will now silently become `missing`. Check any filtering or aggregation that might be affected.
    431 
    432 ### New features
    433 - `import_FF5`: Import Fama-French 5-factor model data (market, size, value, profitability, investment)
    434 - `import_FF_momentum`: Import Fama-French momentum factor
    435 - `calculate_portfolio_returns`: Value-weighted and equal-weighted portfolio return calculations
    436 - `diagnose`: Data quality diagnostics for financial DataFrames
    437 - WRDS connections now retry up to 3 times with exponential backoff
    438 
    439 ### Internal improvements
    440 - Removed broken `@log_msg` macro, replaced with `@debug`
    441 - Removed stale `export greet_FinanceRoutines` (function was never defined)
    442 - Removed `Logging` from dependencies (macros available from Base)
    443 - Ken French file parsing generalized with shared helpers for FF3/FF5 reuse
    444 - CI now filters by path (skips runs for docs-only changes)
    445 - CI matrix includes macOS
    446 ```
    447 
    448 - [ ] **Step 2: Commit**
    449 
    450 ```bash
    451 git add NEWS.md
    452 git commit -m "Add NEWS.md for v0.5.0 [skip ci]"
    453 ```
    454 
    455 ---
    456 
    457 ## Task 9: Add Fama-French 5-factor and Momentum imports
    458 
    459 **Files:**
    460 - Create: `src/ImportFamaFrench5.jl`
    461 - Modify: `src/FinanceRoutines.jl` (add include + exports)
    462 - Create: `test/UnitTests/FF5.jl`
    463 - Modify: `test/runtests.jl` (add "FF5" to testsuite)
    464 
    465 The FF5 and momentum files follow the same zip+CSV format as FF3 on Ken French's site.
    466 
    467 - [ ] **Step 1: Write failing tests**
    468 
    469 ```julia
    470 # test/UnitTests/FF5.jl
    471 @testset "Importing Fama-French 5 factors and Momentum" begin
    472     import Dates
    473 
    474     # FF5 monthly
    475     df_FF5_monthly = import_FF5(frequency=:monthly)
    476     @test names(df_FF5_monthly) == ["datem", "mktrf", "smb", "hml", "rmw", "cma", "rf"]
    477     @test nrow(df_FF5_monthly) >= (Dates.year(Dates.today()) - 1963 - 1) * 12
    478 
    479     # FF5 annual
    480     df_FF5_annual = import_FF5(frequency=:annual)
    481     @test names(df_FF5_annual) == ["datey", "mktrf", "smb", "hml", "rmw", "cma", "rf"]
    482     @test nrow(df_FF5_annual) >= Dates.year(Dates.today()) - 1963 - 2
    483 
    484     # FF5 daily
    485     df_FF5_daily = import_FF5(frequency=:daily)
    486     @test names(df_FF5_daily) == ["date", "mktrf", "smb", "hml", "rmw", "cma", "rf"]
    487     @test nrow(df_FF5_daily) >= 15_000
    488 
    489     # Momentum monthly
    490     df_mom_monthly = import_FF_momentum(frequency=:monthly)
    491     @test "mom" in names(df_mom_monthly)
    492     @test nrow(df_mom_monthly) > 1000
    493 end
    494 ```
    495 
    496 - [ ] **Step 2: Run tests to verify they fail**
    497 
    498 Expected: `import_FF5` and `import_FF_momentum` not defined
    499 
    500 - [ ] **Step 3: Generalize `_parse_ff_annual` and `_parse_ff_monthly` to accept `col_names`**
    501 
    502 Before writing FF5, first update the existing parsers in `src/ImportFamaFrench.jl` to accept a `col_names` keyword argument. Default to the FF3 column names so `import_FF3` continues to work unchanged.
    503 
    504 ```julia
    505 # In _parse_ff_annual:
    506 function _parse_ff_annual(zip_file; types=nothing, col_names=[:datey, :mktrf, :smb, :hml, :rf])
    507     # ... existing logic ...
    508     return CSV.File(...) |> DataFrame |> df -> rename!(df, col_names)
    509 end
    510 
    511 # In _parse_ff_monthly:
    512 function _parse_ff_monthly(zip_file; types=nothing, col_names=[:datem, :mktrf, :smb, :hml, :rf])
    513     # ... existing logic ...
    514     return CSV.File(...) |> DataFrame |> df -> rename!(df, col_names)
    515 end
    516 ```
    517 
    518 Also extract a shared `_download_and_parse_ff_zip` helper to DRY up the download+zip+parse logic shared by FF3 and FF5.
    519 
    520 - [ ] **Step 4: Run existing KenFrench tests to verify no regression**
    521 
    522 Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/KenFrench.jl")'`
    523 Expected: PASS (existing FF3 behavior unchanged)
    524 
    525 - [ ] **Step 5: Implement `import_FF5` in `src/ImportFamaFrench5.jl`**
    526 
    527 The FF5 file URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_CSV.zip`
    528 Daily URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_daily_CSV.zip`
    529 
    530 Uses the generalized `_download_and_parse_ff_zip` helper with 7-column names:
    531 
    532 ```julia
    533 function import_FF5(; frequency::Symbol=:monthly)
    534     ff_col_classes = [String7, Float64, Float64, Float64, Float64, Float64, Float64]
    535     url_mth_yr = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_CSV.zip"
    536     url_daily  = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_daily_CSV.zip"
    537     col_names_mth = [:datem, :mktrf, :smb, :hml, :rmw, :cma, :rf]
    538     col_names_yr  = [:datey, :mktrf, :smb, :hml, :rmw, :cma, :rf]
    539     col_names_day = [:date, :mktrf, :smb, :hml, :rmw, :cma, :rf]
    540     # ... uses shared helper
    541 end
    542 ```
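One way to organize the frequency handling is to resolve the URL and column names up front and reject unknown frequencies early. This is only a sketch: the URLs and column names are the ones listed above, while the `ff5_source` name and its return shape are assumptions, and the download/parse step is deliberately left out:

```julia
const FF5_BASE = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/"

# Hypothetical helper: map a frequency to the zip URL and the 7 column names.
function ff5_source(frequency::Symbol)
    factors = [:mktrf, :smb, :hml, :rmw, :cma, :rf]
    if frequency == :monthly
        return (url = FF5_BASE * "F-F_Research_Data_5_Factors_2x3_CSV.zip",
                col_names = [:datem; factors])
    elseif frequency == :annual
        # Annual data lives in the same zip as the monthly data
        return (url = FF5_BASE * "F-F_Research_Data_5_Factors_2x3_CSV.zip",
                col_names = [:datey; factors])
    elseif frequency == :daily
        return (url = FF5_BASE * "F-F_Research_Data_5_Factors_2x3_daily_CSV.zip",
                col_names = [:date; factors])
    else
        throw(ArgumentError("frequency must be :monthly, :annual, or :daily, got :$frequency"))
    end
end

src = ff5_source(:daily)
println(src.url)
println(src.col_names)
```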
    543 
- [ ] **Step 6: Implement `import_FF_momentum`**

Momentum URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Momentum_Factor_CSV.zip`
Daily URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Momentum_Factor_daily_CSV.zip`

Single-factor file with columns `date`, `mom`.

- [ ] **Step 7: Add exports to FinanceRoutines.jl**

```julia
include("ImportFamaFrench5.jl")
export import_FF5, import_FF_momentum
```

- [ ] **Step 8: Run tests**

Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/FF5.jl")'`
Expected: PASS

- [ ] **Step 9: Commit**

```bash
# src/ImportFamaFrench.jl is included because Step 3 changes the shared parsers
git add src/ImportFamaFrench.jl src/ImportFamaFrench5.jl src/FinanceRoutines.jl test/UnitTests/FF5.jl test/runtests.jl
git commit -m "Add import_FF5 and import_FF_momentum for 5-factor model and momentum"
```
    569 
    570 ---
    571 
    572 ## Task 10: Add portfolio return calculations
    573 
    574 **Files:**
    575 - Create: `src/PortfolioUtils.jl`
    576 - Modify: `src/FinanceRoutines.jl` (add include + exports)
    577 - Create: `test/UnitTests/PortfolioUtils.jl`
    578 - Modify: `test/runtests.jl`
    579 
    580 - [ ] **Step 1: Write failing tests**
    581 
    582 ```julia
    583 # test/UnitTests/PortfolioUtils.jl
    584 @testset "Portfolio Return Calculations" begin
    585     import Dates: Date, Month
    586     import DataFrames: DataFrame, groupby, combine, nrow, transform!
    587 
    588     # Create test data: 3 stocks, 12 months
    589     dates = repeat(Date(2020,1,1):Month(1):Date(2020,12,1), inner=3)
    590     df = DataFrame(
    591         datem = dates,
    592         permno = repeat([1, 2, 3], 12),
    593         ret = rand(36) .* 0.1 .- 0.05,
    594         mktcap = [100.0, 200.0, 300.0, # weights sum to 600
    595                   repeat([100.0, 200.0, 300.0], 11)...]
    596     )
    597 
    598     # Equal-weighted returns
    599     df_ew = calculate_portfolio_returns(df, :ret, :datem; weighting=:equal)
    600     @test nrow(df_ew) == 12
    601     @test "port_ret" in names(df_ew)
    602 
    603     # Value-weighted returns
    604     df_vw = calculate_portfolio_returns(df, :ret, :datem;
    605                                          weighting=:value, weight_col=:mktcap)
    606     @test nrow(df_vw) == 12
    607     @test "port_ret" in names(df_vw)
    608 
    609     # Grouped portfolios (e.g., by size quintile)
    610     df.group = repeat([1, 1, 2], 12)
    611     df_grouped = calculate_portfolio_returns(df, :ret, :datem;
    612                                               weighting=:value, weight_col=:mktcap,
    613                                               groupby=:group)
    614     @test nrow(df_grouped) == 24  # 12 months x 2 groups
    615 end
    616 ```
    617 
    618 - [ ] **Step 2: Implement `calculate_portfolio_returns`**
    619 
    620 ```julia
    621 """
    622     calculate_portfolio_returns(df, ret_col, date_col;
    623         weighting=:value, weight_col=nothing, groupby=nothing)
    624 
    625 Calculate portfolio returns from individual stock returns.
    626 
    627 # Arguments
    628 - `df::DataFrame`: Panel data with stock returns
    629 - `ret_col::Symbol`: Column name for returns
    630 - `date_col::Symbol`: Column name for dates
    631 - `weighting::Symbol`: `:equal` or `:value`
    632 - `weight_col::Union{Nothing,Symbol}`: Column for weights (required if weighting=:value)
    633 - `groupby::Union{Nothing,Symbol,Vector{Symbol}}`: Optional grouping columns
    634 
    635 # Returns
    636 - `DataFrame`: Portfolio returns by date (and group if specified)
    637 """
    638 function calculate_portfolio_returns(df::AbstractDataFrame, ret_col::Symbol, date_col::Symbol;
    639     weighting::Symbol=:value, weight_col::Union{Nothing,Symbol}=nothing,
    640     groupby::Union{Nothing,Symbol,Vector{Symbol}}=nothing)
    641 
    642     if weighting == :value && isnothing(weight_col)
    643         throw(ArgumentError("weight_col required for value-weighted portfolios"))
    644     end
    645 
    646     group_cols = isnothing(groupby) ? [date_col] : vcat([date_col], groupby isa Symbol ? [groupby] : groupby)
    647 
    648     grouped = DataFrames.groupby(df, group_cols)
    649 
    650     if weighting == :equal
    651         return combine(grouped, ret_col => (r -> mean(skipmissing(r))) => :port_ret)
    652     else
    653         return combine(grouped,
    654             [ret_col, weight_col] => ((r, w) -> begin
    655                 valid = .!ismissing.(r) .& .!ismissing.(w)
    656                 any(valid) || return missing
    657                 rv, wv = r[valid], w[valid]
    658                 sum(rv .* wv) / sum(wv)
    659             end) => :port_ret)
    660     end
    661 end
    662 ```
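The arithmetic of the two weighting schemes, including how a missing return is handled, can be checked by hand on plain vectors. The sketch below reuses the logic of the value-weighted branch in a standalone function; `vw_return` and the numbers are illustrative only:

```julia
using Statistics: mean

# Same arithmetic as the value-weighted branch, on plain vectors.
function vw_return(r, w)
    valid = .!ismissing.(r) .& .!ismissing.(w)
    any(valid) || return missing
    rv, wv = r[valid], w[valid]
    return sum(rv .* wv) / sum(wv)
end

ret    = [0.10, 0.20, missing]   # third stock has no return this period
mktcap = [100.0, 200.0, 300.0]

ew = mean(skipmissing(ret))      # (0.10 + 0.20) / 2
vw = vw_return(ret, mktcap)      # (0.10*100 + 0.20*200) / (100 + 200)

println("equal-weighted: ", ew)
println("value-weighted: ", vw)
```

Note the design consequence: a stock with a missing return drops out of both the numerator and the denominator, so the remaining weights are implicitly renormalized rather than the missing return being treated as zero.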
    663 
    664 - [ ] **Step 3: Add dependencies and imports**
    665 
    666 First, move `Statistics` from `[extras]` to `[deps]` in `Project.toml` (it's currently test-only but `calculate_portfolio_returns` uses `mean` at runtime). Add the UUID line to `[deps]`:
    667 ```toml
    668 Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
    669 ```
    670 Keep it in `[extras]` too (valid and harmless in Julia 1.10+).
    671 
    672 Then update `src/FinanceRoutines.jl`:
    673 - Add `import Statistics: mean` to the imports
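- Add `combine` to the `import DataFrames:` line (used by `calculate_portfolio_returns`). The function also calls `DataFrames.groupby` by its qualified name, because the `groupby` keyword argument shadows the function inside the body. That qualified call needs the module name `DataFrames` itself in scope, which `import DataFrames: ...` alone does not provide, so add a plain `import DataFrames` if it is not already present.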
    675 - Add include and export:
    676 ```julia
    677 include("PortfolioUtils.jl")
    678 export calculate_portfolio_returns
    679 ```
    680 
    681 - [ ] **Step 4: Run tests**
    682 
    683 Expected: PASS
    684 
    685 - [ ] **Step 5: Commit**
    686 
```bash
# Project.toml is included because Step 3 moves Statistics into [deps]
git add Project.toml src/PortfolioUtils.jl src/FinanceRoutines.jl test/UnitTests/PortfolioUtils.jl test/runtests.jl
git commit -m "Add calculate_portfolio_returns for equal/value-weighted portfolios"
```
    691 
    692 ---
    693 
    694 ## Task 11: Add data quality diagnostics
    695 
    696 **Files:**
    697 - Create: `src/Diagnostics.jl`
    698 - Modify: `src/FinanceRoutines.jl` (add include + exports)
    699 - Create: `test/UnitTests/Diagnostics.jl`
    700 - Modify: `test/runtests.jl`
    701 
    702 - [ ] **Step 1: Write failing tests**
    703 
```julia
# test/UnitTests/Diagnostics.jl
@testset "Data Quality Diagnostics" begin
    import DataFrames: DataFrame, allowmissing!
    import Dates: Date  # needed for the Date(...) constructors below

    # Create test data with known issues
    df = DataFrame(
        permno = [1, 1, 1, 2, 2, 2],
        date = [Date(2020,1,1), Date(2020,2,1), Date(2020,2,1),  # duplicate for permno 1
                Date(2020,1,1), Date(2020,3,1), Date(2020,4,1)],  # gap for permno 2
        ret = [0.05, missing, 0.03, -1.5, 0.02, 150.0],  # suspicious: -1.5, 150.0
        prc = [10.0, 20.0, 20.0, -5.0, 30.0, 40.0]  # negative price
    )
    allowmissing!(df, :ret)

    report = diagnose(df)

    @test haskey(report, :missing_rates)
    @test haskey(report, :suspicious_values)
    @test haskey(report, :duplicate_keys)
    @test report[:missing_rates][:ret] > 0
    @test length(report[:suspicious_values]) > 0
end
```
    728 
    729 - [ ] **Step 2: Implement `diagnose`**
    730 
    731 ```julia
    732 """
    733     diagnose(df; id_col=:permno, date_col=:date, ret_col=:ret, price_col=:prc)
    734 
    735 Run data quality diagnostics on a financial DataFrame.
    736 
    737 Returns a Dict with:
    738 - `:missing_rates` — fraction missing per column
    739 - `:suspicious_values` — rows with returns > 100% or < -100%, negative prices
    740 - `:duplicate_keys` — duplicate (id, date) pairs
    741 - `:nrow`, `:ncol` — dimensions
    742 """
    743 function diagnose(df::AbstractDataFrame;
    744     id_col::Symbol=:permno, date_col::Symbol=:date,
    745     ret_col::Union{Nothing,Symbol}=:ret,
    746     price_col::Union{Nothing,Symbol}=:prc)
    747 
    748     report = Dict{Symbol, Any}()
    749     report[:nrow] = nrow(df)
    750     report[:ncol] = ncol(df)
    751 
    752     # Missing rates
    753     missing_rates = Dict{Symbol, Float64}()
    754     for col in names(df)
    755         col_sym = Symbol(col)
    756         missing_rates[col_sym] = count(ismissing, df[!, col]) / nrow(df)
    757     end
    758     report[:missing_rates] = missing_rates
    759 
    760     # Duplicate keys
    761     if id_col in propertynames(df) && date_col in propertynames(df)
    762         dup_count = nrow(df) - nrow(unique(df, [id_col, date_col]))
    763         report[:duplicate_keys] = dup_count
    764     end
    765 
    766     # Suspicious values
    767     suspicious = String[]
    768     if !isnothing(ret_col) && ret_col in propertynames(df)
    769         n_extreme = count(r -> !ismissing(r) && (r > 1.0 || r < -1.0), df[!, ret_col])
    770         n_extreme > 0 && push!(suspicious, "$n_extreme returns outside [-100%, +100%]")
    771     end
    772     if !isnothing(price_col) && price_col in propertynames(df)
    773         n_neg = count(r -> !ismissing(r) && r < 0, df[!, price_col])
    774         n_neg > 0 && push!(suspicious, "$n_neg negative prices (CRSP convention for bid/ask midpoint)")
    775     end
    776     report[:suspicious_values] = suspicious
    777 
    778     return report
    779 end
    780 ```
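For the test frame from Step 1, the values `diagnose` should report can be worked out on plain vectors, which gives concrete numbers to assert against if the tests are tightened later. This is a Base-only sketch; the integer `month` vector stands in for the dates and is an illustrative simplification:

```julia
permno = [1, 1, 1, 2, 2, 2]
month  = [1, 2, 2, 1, 3, 4]   # stand-in for the Date values in the test data
ret    = [0.05, missing, 0.03, -1.5, 0.02, 150.0]
prc    = [10.0, 20.0, 20.0, -5.0, 30.0, 40.0]

# Fraction of missing returns: one of six
missing_rate = count(ismissing, ret) / length(ret)

# Duplicate (id, date) pairs: (1, 2) appears twice
id_date_pairs = collect(zip(permno, month))
duplicate_keys = length(id_date_pairs) - length(unique(id_date_pairs))

# Returns outside [-100%, +100%]: -1.5 and 150.0
n_extreme = count(r -> !ismissing(r) && (r > 1.0 || r < -1.0), ret)

# Negative prices: -5.0
n_negative_prices = count(p -> p < 0, prc)

println((missing_rate, duplicate_keys, n_extreme, n_negative_prices))
```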
    781 
    782 - [ ] **Step 3: Add to FinanceRoutines.jl**
    783 
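Update the `import DataFrames:` line to also include `ncol` (and `nrow`, if it is not already imported). `unique` needs no import: it is a `Base` function that DataFrames extends with a method accepting column names, which `diagnose` uses for duplicate key detection. Then add: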
    785 
    786 ```julia
    787 include("Diagnostics.jl")
    788 export diagnose
    789 ```
    790 
    791 - [ ] **Step 4: Run tests**
    792 
    793 Expected: PASS
    794 
    795 - [ ] **Step 5: Commit**
    796 
    797 ```bash
    798 git add src/Diagnostics.jl src/FinanceRoutines.jl test/UnitTests/Diagnostics.jl test/runtests.jl
    799 git commit -m "Add diagnose() for data quality diagnostics on financial DataFrames"
    800 ```
    801 
    802 ---
    803 
    804 ## Task 12: Version bump and final integration
    805 
    806 **Files:**
    807 - Modify: `Project.toml` — version to "0.5.0", add Statistics to [deps] if needed for PortfolioUtils
    808 - Modify: `NEWS.md` — finalize
    809 - Modify: `test/runtests.jl` — ensure all new test suites are listed
    810 
    811 - [ ] **Step 1: Update Project.toml version**
    812 
    813 Change `version = "0.4.5"` to `version = "0.5.0"`
    814 
    815 - [ ] **Step 2: Verify all dependencies are correct in Project.toml**
    816 
    817 Statistics should already be in `[deps]` (added in Task 10). Logging should already be removed (Task 1). Verify no stale entries.
    818 
    819 - [ ] **Step 3: Update test/runtests.jl testsuite list**
    820 
    821 ```julia
    822 const testsuite = [
    823     "KenFrench",
    824     "FF5",
    825     "WRDS",
    826     "betas",
    827     "Yields",
    828     "PortfolioUtils",
    829     "Diagnostics",
    830 ]
    831 ```
    832 
    833 - [ ] **Step 4: Run full test suite**
    834 
    835 Run: `julia --project=. -e 'using Pkg; Pkg.test()'`
    836 Expected: ALL PASS
    837 
    838 - [ ] **Step 5: Commit**
    839 
    840 ```bash
    841 git add Project.toml test/runtests.jl NEWS.md
    842 git commit -m "Bump version to v0.5.0, finalize test suite and changelog"
    843 ```
    844 
    845 ---
    846 
    847 ## Task 13: Tag release and update registry
    848 
    849 Follow the release workflow in CLAUDE.md:
    850 
    851 - [ ] **Step 1: Tag**
    852 
    853 ```bash
    854 git tag v0.5.0
    855 git push origin v0.5.0
    856 ```
    857 
    858 - [ ] **Step 2: Get tree SHA**
    859 
    860 ```bash
    861 git rev-parse v0.5.0^{tree}
    862 ```
    863 
    864 - [ ] **Step 3: Update LouLouLibs/loulouJL registry**
    865 
    866 Update `F/FinanceRoutines/Versions.toml`, `Deps.toml`, `Compat.toml` via `gh api`.
    867 
    868 ---
    869 
    870 ## Extensions deferred for user decision
    871 
    872 These were listed as extensions A–E. Tasks 9–11 cover B (FF5), E (diagnostics), and A (portfolio returns). The remaining two are:
    873 
- **C: Event study utilities**: `event_study(events_df, returns_df; ...)` computing CARs/BHARs. Can be added as Task 14 if desired.
- **D: Treasury yield interpolation**: `treasury_zero_rate(date, maturity)` incorporating T-bill rates. Requires a new data source. Can be added as Task 15 if desired.
    876 
    877 Both are independent of the above tasks and can be planned separately.