commit 95151f27d7ab321862e8e22627a9a5b9783655c3
parent 12f368cb2156503e11fecd914949e72000e6c452
Author: Erik Loualiche <[email protected]>
Date: Sun, 22 Mar 2026 10:07:49 -0500
Add v0.5.0 implementation plan: hardening + extensions
13-task plan covering:
- Fixes: dead logging, WRDS retry, missing-value flags, FF parsing robustness,
ImportYields.jl split, CI path filters, env parsing cleanup
- Extensions: FF5+momentum, portfolio returns, data diagnostics
- Release: version bump, NEWS.md, registry update
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Diffstat:
1 file changed, 877 insertions(+), 0 deletions(-)
diff --git a/docs/superpowers/plans/2026-03-22-v0.5.0-hardening-and-extensions.md b/docs/superpowers/plans/2026-03-22-v0.5.0-hardening-and-extensions.md
@@ -0,0 +1,877 @@
+# FinanceRoutines.jl v0.5.0 — Hardening & Extensions
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Fix all identified quality/robustness issues, restructure ImportYields.jl, add CI path filtering, create NEWS.md, and implement extensions (FF5+momentum, portfolio returns, data diagnostics) — releasing as v0.5.0. (Event studies and Treasury interpolation are deferred; see the final section.)
+
+**Architecture:** Fixes first (tasks 1–8), then extensions (tasks 9–11), then integration + release (tasks 12–13). Each task is independently testable and committable. The ImportYields.jl split (task 5) is the highest-risk refactor — it moves code into two new files without changing any public API.
+
+**Tech Stack:** Julia 1.10+, LibPQ, DataFrames, CSV, Downloads, ZipFile, Roots, LinearAlgebra, FlexiJoins, BazerData, GitHub Actions
+
+---
+
+## File Map
+
+### Files to create
+- `src/GSW.jl` — GSW parameter struct, yield/price/forward/return calculations, DataFrame wrappers
+- `src/BondPricing.jl` — `bond_yield`, `bond_yield_excel`, day-count helpers
+- `src/ImportFamaFrench5.jl` — FF5 + momentum import functions
+- `src/PortfolioUtils.jl` — portfolio return calculations
+- `src/Diagnostics.jl` — data quality diagnostics
+- `test/UnitTests/FF5.jl` — tests for FF5/momentum imports
+- `test/UnitTests/PortfolioUtils.jl` — tests for portfolio returns
+- `test/UnitTests/Diagnostics.jl` — tests for diagnostics
+- `NEWS.md` — changelog
+
+### Files to modify
+- `src/FinanceRoutines.jl` — update includes/exports
+- `src/Utilities.jl` — add retry logic, remove broken logging macro
+- `src/ImportFamaFrench.jl` — make parsing more robust, refactor for FF5 reuse
+- `src/ImportYields.jl` — expand missing-value flags in `_safe_parse_float` (task 3), then split into `GSW.jl` + `BondPricing.jl` (task 5)
+- `src/ImportCRSP.jl` — replace `@log_msg` calls with `@debug` (task 1)
+- `.github/workflows/CI.yml` — add path filters, consider macOS
+- `Project.toml` — bump to 0.5.0
+- `test/runtests.jl` — add new test suites
+
+### Files to delete
+- `src/ImportYields.jl` — replaced by `src/GSW.jl` + `src/BondPricing.jl`
+
+---
+
+## Task 1: Remove broken logging macro & clean up Utilities.jl
+
+**Files:**
+- Modify: `src/Utilities.jl:70-102`
+- Modify: `src/FinanceRoutines.jl:19` (remove Logging import if unused)
+- Modify: `src/ImportCRSP.jl:303,381,400` (replace `@log_msg` calls)
+
+- [ ] **Step 1: Check all usages of `@log_msg` and `log_with_level`**
+
+Run: `grep -rn "log_msg\|log_with_level" src/`
+
+- [ ] **Step 2: Replace `@log_msg` calls in ImportCRSP.jl with `@debug`**
+
+In `src/ImportCRSP.jl`, replace the three `@log_msg` calls (lines 303, 381, 400) with `@debug`:
+```julia
+# Before:
+@log_msg "# -- GETTING MONTHLY STOCK FILE (CIZ) ... msf_v2"
+# After:
+@debug "Getting monthly stock file (CIZ) ... msf_v2"
+```
+
+- [ ] **Step 3: Remove `log_with_level` and `@log_msg` from Utilities.jl**
+
+Delete lines 69–102 (the `log_with_level` function and `@log_msg` macro).
+
+- [ ] **Step 4: Clean up Logging import and stale export in FinanceRoutines.jl**
+
+Remove the entire `import Logging: ...` line from `FinanceRoutines.jl` (line 19). `@debug` and `@warn` are available from `Base.CoreLogging` without explicit import. Also remove `Logging` from the `[deps]` section of `Project.toml`. Also remove the stale `export greet_FinanceRoutines` (line 45) — this function is not defined anywhere.
+
+- [ ] **Step 5: Run tests to verify nothing breaks**
+
+Run: `julia --project=. -e 'using Pkg; Pkg.test()'`
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/Utilities.jl src/ImportCRSP.jl src/FinanceRoutines.jl
+git commit -m "Remove broken @log_msg macro, replace with @debug [skip ci]"
+```
+
+---
+
+## Task 2: Add WRDS connection retry logic
+
+**Files:**
+- Modify: `src/Utilities.jl:19-29` (the `open_wrds_pg(user, password)` method)
+
+- [ ] **Step 1: Write test for retry behavior**
+
+This is hard to unit test (requires WRDS), so we verify by code review. The retry logic should:
+- Attempt up to 3 connections
+- Exponential backoff: 1s, 2s, 4s
+- Log warnings on retry
+- Rethrow on final failure
+
+- [ ] **Step 2: Add retry wrapper to `open_wrds_pg`**
+
+Replace the `open_wrds_pg(user, password)` function:
+
+```julia
+function open_wrds_pg(user::AbstractString, password::AbstractString;
+ max_retries::Int=3, base_delay::Float64=1.0)
+ conn_str = """
+ host = wrds-pgdata.wharton.upenn.edu
+ port = 9737
+ user='$user'
+ password='$password'
+ sslmode = 'require' dbname = wrds
+ """
+ for attempt in 1:max_retries
+ try
+ return Connection(conn_str)
+ catch e
+ if attempt == max_retries
+ rethrow(e)
+ end
+ delay = base_delay * 2^(attempt - 1)
+ @warn "WRDS connection attempt $attempt/$max_retries failed, retrying in $(delay)s" exception=e
+ sleep(delay)
+ end
+ end
+end
+```
+
+- [ ] **Step 3: Verify the package loads and existing tests still pass**
+
+Run: `julia --project=. -e 'using FinanceRoutines'`
+
+- [ ] **Step 4: Commit**
+
+```bash
+git add src/Utilities.jl
+git commit -m "Add retry logic with exponential backoff for WRDS connections"
+```
+
+---
+
+## Task 3: Expand missing-value flags in `_safe_parse_float`
+
+**Files:**
+- Modify: `src/ImportYields.jl:281-309` (will move to GSW.jl in task 5, but fix first)
+
+- [ ] **Step 1: Write test for expanded flags**
+
+Add to `test/UnitTests/Yields.jl` inside a new testset:
+
+```julia
+@testset "Missing value flag handling" begin
+ @test ismissing(FinanceRoutines._safe_parse_float(-999.99))
+ @test ismissing(FinanceRoutines._safe_parse_float(-999.0))
+ @test ismissing(FinanceRoutines._safe_parse_float(-9999.0))
+ @test ismissing(FinanceRoutines._safe_parse_float(-99.99))
+ @test !ismissing(FinanceRoutines._safe_parse_float(-5.0)) # legitimate negative
+ @test FinanceRoutines._safe_parse_float(3.14) ≈ 3.14
+ @test ismissing(FinanceRoutines._safe_parse_float(""))
+ @test ismissing(FinanceRoutines._safe_parse_float(missing))
+end
+```
+
+- [ ] **Step 2: Run test to verify it fails**
+
+Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/Yields.jl")'`
+Expected: FAIL on `-999.0` and `-9999.0`
+
+- [ ] **Step 3: Update `_safe_parse_float`**
+
+```julia
+function _safe_parse_float(value)
+ if ismissing(value) || value == ""
+ return missing
+ end
+
+ if value isa AbstractString
+ parsed = tryparse(Float64, strip(value))
+ if isnothing(parsed)
+ return missing
+ end
+ value = parsed
+ end
+
+ try
+ numeric_value = Float64(value)
+ # Common missing data flags in economic/financial datasets
+ if numeric_value in (-999.99, -999.0, -9999.0, -99.99)
+ return missing
+ end
+ return numeric_value
+ catch
+ return missing
+ end
+end
+```
+
+- [ ] **Step 4: Run test to verify it passes**
+
+Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/Yields.jl")'`
+Expected: PASS
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/ImportYields.jl test/UnitTests/Yields.jl
+git commit -m "Expand missing-value flags to cover -999, -9999, -99.99"
+```
+
+---
+
+## Task 4: Make Ken French parsing more robust
+
+**Files:**
+- Modify: `src/ImportFamaFrench.jl:118-159` (`_parse_ff_annual`)
+- Modify: `src/ImportFamaFrench.jl:164-205` (`_parse_ff_monthly`)
+
+- [ ] **Step 1: Refactor `_parse_ff_annual` to use data-pattern detection**
+
+Instead of `occursin(r"Annual Factors", line)`, detect the annual section by:
+1. Skip past the monthly data (lines starting with 6-digit YYYYMM)
+2. Find the next block of lines starting with 4-digit YYYY
+
+```julia
+function _parse_ff_annual(zip_file; types=nothing)
+ file_lines = split(String(read(zip_file)), '\n')
+
+ # Find annual data: lines starting with a 4-digit year that are NOT 6-digit monthly dates
+ # Annual section comes after monthly section
+ found_monthly = false
+ past_monthly = false
+ lines = String[]
+
+ for line in file_lines
+ stripped = strip(line)
+
+ # Track when we're past the monthly data section
+ if !found_monthly && occursin(r"^\s*\d{6}", stripped)
+ found_monthly = true
+ continue
+ end
+
+ if found_monthly && !past_monthly
+ # Still in monthly section until we hit a non-data line
+ if occursin(r"^\s*\d{6}", stripped)
+ continue
+ elseif !occursin(r"^\s*$", stripped) && !occursin(r"^\s*\d", stripped)
+ past_monthly = true
+ continue
+ else
+ continue
+ end
+ end
+
+ if past_monthly
+ # Look for annual data lines (4-digit year)
+ if occursin(r"^\s*\d{4}\s*,", stripped)
+ push!(lines, replace(stripped, r"[\r]" => ""))
+ elseif !isempty(lines) && occursin(r"^\s*$", stripped)
+ break # End of annual section
+ end
+ end
+ end
+
+ if isempty(lines)
+ error("Annual Factors section not found in file")
+ end
+
+ lines_buffer = IOBuffer(join(lines, "\n"))
+ return CSV.File(lines_buffer, header=false, delim=",", ntasks=1, types=types) |> DataFrame |>
+ df -> rename!(df, [:datey, :mktrf, :smb, :hml, :rf])
+end
+```
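+
+For symmetry, `_parse_ff_monthly` can use the same data-pattern approach. A sketch (simpler, since the monthly block is the first date-prefixed data in the file; the `{6,8}` quantifier also covers 8-digit YYYYMMDD daily dates):
+
+```julia
+function _parse_ff_monthly(zip_file; types=nothing)
+    file_lines = split(String(read(zip_file)), '\n')
+
+    # Monthly (YYYYMM) and daily (YYYYMMDD) data lines start with 6-8 digits
+    # followed by a comma; the block ends at the first non-matching line
+    # after data has started.
+    lines = String[]
+    for line in file_lines
+        stripped = strip(line)
+        if occursin(r"^\d{6,8}\s*,", stripped)
+            push!(lines, replace(stripped, r"[\r]" => ""))
+        elseif !isempty(lines)
+            break  # end of the first data block
+        end
+    end
+
+    isempty(lines) && error("Monthly data section not found in file")
+
+    lines_buffer = IOBuffer(join(lines, "\n"))
+    return CSV.File(lines_buffer, header=false, delim=",", ntasks=1, types=types) |>
+        DataFrame |> df -> rename!(df, [:datem, :mktrf, :smb, :hml, :rf])
+end
+```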
+
+- [ ] **Step 2: Run Ken French tests**
+
+Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/KenFrench.jl")'`
+Expected: PASS
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add src/ImportFamaFrench.jl
+git commit -m "Make FF3 parsing use data patterns instead of hardcoded headers"
+```
+
+---
+
+## Task 5: Split ImportYields.jl into GSW.jl + BondPricing.jl
+
+This is the largest refactor. No API changes — just file reorganization.
+
+**Files:**
+- Create: `src/GSW.jl` — everything from ImportYields.jl lines 1–1368 (GSWParameters struct, all gsw_* functions, DataFrame wrappers, helpers)
+- Create: `src/BondPricing.jl` — everything from ImportYields.jl lines 1371–1694 (bond_yield, bond_yield_excel, day-count functions)
+- Delete: `src/ImportYields.jl`
+- Modify: `src/FinanceRoutines.jl` — update includes
+
+- [ ] **Step 1: Create `src/GSW.jl`**
+
+Copy lines 1–1368 from ImportYields.jl into `src/GSW.jl`. This includes:
+- `GSWParameters` struct and constructors
+- `is_three_factor_model`, `_extract_params`
+- `import_gsw_parameters`, `_clean_gsw_data`, `_safe_parse_float`, `_validate_gsw_data`
+- `gsw_yield`, `gsw_price`, `gsw_forward_rate`
+- `gsw_yield_curve`, `gsw_price_curve`
+- `gsw_return`, `gsw_excess_return`
+- `add_yields!`, `add_prices!`, `add_returns!`, `add_excess_returns!`
+- `gsw_curve_snapshot`
+- `_validate_gsw_dataframe`, `_maturity_to_column_name`
+
+- [ ] **Step 2: Create `src/BondPricing.jl`**
+
+Copy lines 1371–1694 from ImportYields.jl into `src/BondPricing.jl`. This includes:
+- `bond_yield_excel`
+- `bond_yield`
+- `_day_count_days`
+- `_date_difference`
+
+- [ ] **Step 3: Update `src/FinanceRoutines.jl`**
+
+Replace `include("ImportYields.jl")` with:
+```julia
+include("GSW.jl")
+include("BondPricing.jl")
+```
+
+- [ ] **Step 4: Delete `src/ImportYields.jl`**
+
+- [ ] **Step 5: Run full test suite to verify nothing broke**
+
+Run: `julia --project=. -e 'using Pkg; Pkg.test()'`
+Expected: all suites PASS, including the 70+ Yields assertions
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/GSW.jl src/BondPricing.jl src/FinanceRoutines.jl
+git rm src/ImportYields.jl
+git commit -m "Split ImportYields.jl into GSW.jl and BondPricing.jl (no API changes)"
+```
+
+---
+
+## Task 6: Add CI path filters and macOS runner
+
+**Files:**
+- Modify: `.github/workflows/CI.yml`
+
+- [ ] **Step 1: Add path filters to CI.yml**
+
+```yaml
+on:
+ push:
+ branches:
+ - main
+ tags:
+ - "*"
+ paths:
+ - 'src/**'
+ - 'test/**'
+ - 'Project.toml'
+ - '.github/workflows/CI.yml'
+ pull_request:
+ paths:
+ - 'src/**'
+ - 'test/**'
+ - 'Project.toml'
+ - '.github/workflows/CI.yml'
+```
+
+- [ ] **Step 2: Add macOS to the matrix**
+
+```yaml
+matrix:
+ version:
+ - "1.11"
+ - nightly
+ os:
+ - ubuntu-latest
+ - macos-latest
+ arch:
+ - x64
+```
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add .github/workflows/CI.yml
+git commit -m "Add CI path filters and macOS runner [skip ci]"
+```
+
+---
+
+## Task 7: Clarify env parsing in test/runtests.jl
+
+**Files:**
+- Modify: `test/runtests.jl:33`
+
+Line 33 uses `!startswith(line, "#") || continue` which correctly skips comment lines (the `||` evaluates `continue` when the left side is `false`, i.e. when the line IS a comment). This is logically correct but reads awkwardly. Rewrite to the more idiomatic `&&` form for clarity.
+
+- [ ] **Step 1: Rewrite for readability**
+
+```julia
+# Before (correct but hard to read):
+!startswith(line, "#") || continue
+# After (same logic, clearer):
+startswith(line, "#") && continue
+```
+
+- [ ] **Step 2: Commit**
+
+```bash
+git add test/runtests.jl
+git commit -m "Clarify env parsing idiom in test runner [skip ci]"
+```
+
+---
+
+## Task 8: Create NEWS.md
+
+**Files:**
+- Create: `NEWS.md`
+
+- [ ] **Step 1: Create NEWS.md with v0.5.0 changelog**
+
+```markdown
+# FinanceRoutines.jl Changelog
+
+## v0.5.0
+
+### Breaking changes
+- `ImportYields.jl` split into `GSW.jl` (yield curve model) and `BondPricing.jl` (bond math). No public API changes, but code that `include`d `ImportYields.jl` directly will need updating.
+- Missing-value flags expanded: `-999.0`, `-9999.0`, `-99.99` now treated as missing in GSW data (previously only `-999.99`). **Migration note:** if your downstream code used these numeric values (e.g., `-999.0` as an actual number), they will now silently become `missing`. Check any filtering or aggregation that might be affected.
+
+### New features
+- `import_FF5`: Import Fama-French 5-factor model data (market, size, value, profitability, investment)
+- `import_FF_momentum`: Import Fama-French momentum factor
+- `calculate_portfolio_returns`: Value-weighted and equal-weighted portfolio return calculations
+- `diagnose`: Data quality diagnostics for financial DataFrames
+- WRDS connections now retry up to 3 times with exponential backoff
+
+### Internal improvements
+- Removed broken `@log_msg` macro, replaced with `@debug`
+- Removed stale `export greet_FinanceRoutines` (function was never defined)
+- Removed `Logging` from dependencies (macros available from Base)
+- Ken French file parsing generalized with shared helpers for FF3/FF5 reuse
+- CI now filters by path (skips runs for docs-only changes)
+- CI matrix includes macOS
+```
+
+- [ ] **Step 2: Commit**
+
+```bash
+git add NEWS.md
+git commit -m "Add NEWS.md for v0.5.0 [skip ci]"
+```
+
+---
+
+## Task 9: Add Fama-French 5-factor and Momentum imports
+
+**Files:**
+- Create: `src/ImportFamaFrench5.jl`
+- Modify: `src/FinanceRoutines.jl` (add include + exports)
+- Create: `test/UnitTests/FF5.jl`
+- Modify: `test/runtests.jl` (add "FF5" to testsuite)
+
+The FF5 and momentum files follow the same zip+CSV format as FF3 on Ken French's site.
+
+- [ ] **Step 1: Write failing tests**
+
+```julia
+# test/UnitTests/FF5.jl
+@testset "Importing Fama-French 5 factors and Momentum" begin
+ import Dates
+
+ # FF5 monthly
+ df_FF5_monthly = import_FF5(frequency=:monthly)
+ @test names(df_FF5_monthly) == ["datem", "mktrf", "smb", "hml", "rmw", "cma", "rf"]
+ @test nrow(df_FF5_monthly) >= (Dates.year(Dates.today()) - 1963 - 1) * 12
+
+ # FF5 annual
+ df_FF5_annual = import_FF5(frequency=:annual)
+ @test names(df_FF5_annual) == ["datey", "mktrf", "smb", "hml", "rmw", "cma", "rf"]
+ @test nrow(df_FF5_annual) >= Dates.year(Dates.today()) - 1963 - 2
+
+ # FF5 daily
+ df_FF5_daily = import_FF5(frequency=:daily)
+ @test names(df_FF5_daily) == ["date", "mktrf", "smb", "hml", "rmw", "cma", "rf"]
+ @test nrow(df_FF5_daily) >= 15_000
+
+ # Momentum monthly
+ df_mom_monthly = import_FF_momentum(frequency=:monthly)
+ @test "mom" in names(df_mom_monthly)
+ @test nrow(df_mom_monthly) > 1000
+end
+```
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+Expected: `import_FF5` and `import_FF_momentum` not defined
+
+- [ ] **Step 3: Generalize `_parse_ff_annual` and `_parse_ff_monthly` to accept `col_names`**
+
+Before writing FF5, first update the existing parsers in `src/ImportFamaFrench.jl` to accept a `col_names` keyword argument. Default to the FF3 column names so `import_FF3` continues to work unchanged.
+
+```julia
+# In _parse_ff_annual:
+function _parse_ff_annual(zip_file; types=nothing, col_names=[:datey, :mktrf, :smb, :hml, :rf])
+ # ... existing logic ...
+ return CSV.File(...) |> DataFrame |> df -> rename!(df, col_names)
+end
+
+# In _parse_ff_monthly:
+function _parse_ff_monthly(zip_file; types=nothing, col_names=[:datem, :mktrf, :smb, :hml, :rf])
+ # ... existing logic ...
+ return CSV.File(...) |> DataFrame |> df -> rename!(df, col_names)
+end
+```
+
+Also extract a shared `_download_and_parse_ff_zip` helper to DRY up the download+zip+parse logic shared by FF3 and FF5.
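+
+A possible shape for that helper (the name and signature come from this plan, not an existing API; assumes `Downloads` and `ZipFile` are already imported by the module):
+
+```julia
+# Hypothetical shared helper sketched from this plan; not existing API.
+function _download_and_parse_ff_zip(url::AbstractString, frequency::Symbol;
+                                    col_names, types=nothing)
+    zip_path = Downloads.download(url)
+    zarchive = ZipFile.Reader(zip_path)
+    csv_file = only(zarchive.files)  # Ken French zips contain a single CSV
+    df = if frequency == :annual
+        _parse_ff_annual(csv_file; types=types, col_names=col_names)
+    else
+        # monthly and daily blocks are both leading date-prefixed data
+        _parse_ff_monthly(csv_file; types=types, col_names=col_names)
+    end
+    close(zarchive)
+    return df
+end
+```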
+
+- [ ] **Step 4: Run existing KenFrench tests to verify no regression**
+
+Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/KenFrench.jl")'`
+Expected: PASS (existing FF3 behavior unchanged)
+
+- [ ] **Step 5: Implement `import_FF5` in `src/ImportFamaFrench5.jl`**
+
+The FF5 file URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_CSV.zip`
+Daily URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_daily_CSV.zip`
+
+Uses the generalized `_download_and_parse_ff_zip` helper with 7-column names:
+
+```julia
+function import_FF5(; frequency::Symbol=:monthly)
+ ff_col_classes = [String7, Float64, Float64, Float64, Float64, Float64, Float64]
+ url_mth_yr = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_CSV.zip"
+ url_daily = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Research_Data_5_Factors_2x3_daily_CSV.zip"
+ col_names_mth = [:datem, :mktrf, :smb, :hml, :rmw, :cma, :rf]
+ col_names_yr = [:datey, :mktrf, :smb, :hml, :rmw, :cma, :rf]
+ col_names_day = [:date, :mktrf, :smb, :hml, :rmw, :cma, :rf]
+    # YYYYMMDD daily dates do not fit in String7; use a wider inline string type
+    ff_col_classes_daily = [String15, Float64, Float64, Float64, Float64, Float64, Float64]
+    # dispatch via the shared helper (signature per Step 3 of this plan)
+    if frequency == :monthly
+        return _download_and_parse_ff_zip(url_mth_yr, :monthly; col_names=col_names_mth, types=ff_col_classes)
+    elseif frequency == :annual
+        return _download_and_parse_ff_zip(url_mth_yr, :annual; col_names=col_names_yr, types=ff_col_classes)
+    elseif frequency == :daily
+        return _download_and_parse_ff_zip(url_daily, :daily; col_names=col_names_day, types=ff_col_classes_daily)
+    else
+        throw(ArgumentError("frequency must be :monthly, :annual, or :daily"))
+    end
+end
+```
+
+- [ ] **Step 6: Implement `import_FF_momentum`**
+
+Momentum URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Momentum_Factor_CSV.zip`
+Daily URL: `https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Momentum_Factor_daily_CSV.zip`
+
+Single factor file, columns: date, mom
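+
+With that helper in place, a minimal sketch of the momentum import (`_download_and_parse_ff_zip` is the hypothetical helper from Step 3 of this plan; the momentum CSV also carries an annual block, handled by the `:annual` branch):
+
+```julia
+function import_FF_momentum(; frequency::Symbol=:monthly)
+    url_mth_yr = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Momentum_Factor_CSV.zip"
+    url_daily  = "https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/F-F_Momentum_Factor_daily_CSV.zip"
+    # pick the date column name to match the FF3/FF5 conventions
+    date_col = frequency == :monthly ? :datem :
+               frequency == :annual  ? :datey : :date
+    url = frequency == :daily ? url_daily : url_mth_yr
+    return _download_and_parse_ff_zip(url, frequency; col_names=[date_col, :mom])
+end
+```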
+
+- [ ] **Step 7: Add exports to FinanceRoutines.jl**
+
+```julia
+include("ImportFamaFrench5.jl")
+export import_FF5, import_FF_momentum
+```
+
+- [ ] **Step 8: Run tests**
+
+Run: `julia --project=. -e 'using FinanceRoutines, Test; include("test/UnitTests/FF5.jl")'`
+Expected: PASS
+
+- [ ] **Step 9: Commit**
+
+```bash
+git add src/ImportFamaFrench5.jl src/FinanceRoutines.jl test/UnitTests/FF5.jl test/runtests.jl
+git commit -m "Add import_FF5 and import_FF_momentum for 5-factor model and momentum"
+```
+
+---
+
+## Task 10: Add portfolio return calculations
+
+**Files:**
+- Create: `src/PortfolioUtils.jl`
+- Modify: `src/FinanceRoutines.jl` (add include + exports)
+- Create: `test/UnitTests/PortfolioUtils.jl`
+- Modify: `test/runtests.jl`
+
+- [ ] **Step 1: Write failing tests**
+
+```julia
+# test/UnitTests/PortfolioUtils.jl
+@testset "Portfolio Return Calculations" begin
+ import Dates: Date, Month
+ import DataFrames: DataFrame, groupby, combine, nrow, transform!
+
+ # Create test data: 3 stocks, 12 months
+ dates = repeat(Date(2020,1,1):Month(1):Date(2020,12,1), inner=3)
+ df = DataFrame(
+ datem = dates,
+ permno = repeat([1, 2, 3], 12),
+ ret = rand(36) .* 0.1 .- 0.05,
+        mktcap = repeat([100.0, 200.0, 300.0], 12)  # per-month weights sum to 600
+ )
+
+ # Equal-weighted returns
+ df_ew = calculate_portfolio_returns(df, :ret, :datem; weighting=:equal)
+ @test nrow(df_ew) == 12
+ @test "port_ret" in names(df_ew)
+
+ # Value-weighted returns
+ df_vw = calculate_portfolio_returns(df, :ret, :datem;
+ weighting=:value, weight_col=:mktcap)
+ @test nrow(df_vw) == 12
+ @test "port_ret" in names(df_vw)
+
+ # Grouped portfolios (e.g., by size quintile)
+ df.group = repeat([1, 1, 2], 12)
+ df_grouped = calculate_portfolio_returns(df, :ret, :datem;
+ weighting=:value, weight_col=:mktcap,
+ groupby=:group)
+ @test nrow(df_grouped) == 24 # 12 months x 2 groups
+end
+```
+
+- [ ] **Step 2: Implement `calculate_portfolio_returns`**
+
+```julia
+"""
+ calculate_portfolio_returns(df, ret_col, date_col;
+ weighting=:value, weight_col=nothing, groupby=nothing)
+
+Calculate portfolio returns from individual stock returns.
+
+# Arguments
+- `df::DataFrame`: Panel data with stock returns
+- `ret_col::Symbol`: Column name for returns
+- `date_col::Symbol`: Column name for dates
+- `weighting::Symbol`: `:equal` or `:value`
+- `weight_col::Union{Nothing,Symbol}`: Column for weights (required if weighting=:value)
+- `groupby::Union{Nothing,Symbol,Vector{Symbol}}`: Optional grouping columns
+
+# Returns
+- `DataFrame`: Portfolio returns by date (and group if specified)
+"""
+function calculate_portfolio_returns(df::AbstractDataFrame, ret_col::Symbol, date_col::Symbol;
+ weighting::Symbol=:value, weight_col::Union{Nothing,Symbol}=nothing,
+ groupby::Union{Nothing,Symbol,Vector{Symbol}}=nothing)
+
+ if weighting == :value && isnothing(weight_col)
+ throw(ArgumentError("weight_col required for value-weighted portfolios"))
+ end
+
+ group_cols = isnothing(groupby) ? [date_col] : vcat([date_col], groupby isa Symbol ? [groupby] : groupby)
+
+ grouped = DataFrames.groupby(df, group_cols)
+
+ if weighting == :equal
+ return combine(grouped, ret_col => (r -> mean(skipmissing(r))) => :port_ret)
+ else
+ return combine(grouped,
+ [ret_col, weight_col] => ((r, w) -> begin
+ valid = .!ismissing.(r) .& .!ismissing.(w)
+ any(valid) || return missing
+ rv, wv = r[valid], w[valid]
+ sum(rv .* wv) / sum(wv)
+ end) => :port_ret)
+ end
+end
+```
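+
+Assuming the implementation above is loaded, a quick hand-checked usage sketch:
+
+```julia
+using DataFrames, Dates
+
+df = DataFrame(
+    datem  = fill(Date(2020, 1, 1), 3),
+    ret    = [0.01, 0.02, 0.03],
+    mktcap = [100.0, 200.0, 300.0],
+)
+
+df_vw = calculate_portfolio_returns(df, :ret, :datem; weighting=:value, weight_col=:mktcap)
+# value-weighted: (0.01*100 + 0.02*200 + 0.03*300) / 600 = 14/600 ≈ 0.0233
+df_ew = calculate_portfolio_returns(df, :ret, :datem; weighting=:equal)
+# equal-weighted: mean([0.01, 0.02, 0.03]) = 0.02
+```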
+
+- [ ] **Step 3: Add dependencies and imports**
+
+First, move `Statistics` from `[extras]` to `[deps]` in `Project.toml` (it's currently test-only but `calculate_portfolio_returns` uses `mean` at runtime). Add the UUID line to `[deps]`:
+```toml
+Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
+```
+Keep it in `[extras]` too (valid and harmless in Julia 1.10+).
+
+Then update `src/FinanceRoutines.jl`:
+- Add `import Statistics: mean` to the imports
+- Add `combine` to the `import DataFrames:` line (used by `calculate_portfolio_returns`)
+- Add include and export:
+```julia
+include("PortfolioUtils.jl")
+export calculate_portfolio_returns
+```
+
+- [ ] **Step 4: Run tests**
+
+Expected: PASS
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/PortfolioUtils.jl src/FinanceRoutines.jl test/UnitTests/PortfolioUtils.jl test/runtests.jl
+git commit -m "Add calculate_portfolio_returns for equal/value-weighted portfolios"
+```
+
+---
+
+## Task 11: Add data quality diagnostics
+
+**Files:**
+- Create: `src/Diagnostics.jl`
+- Modify: `src/FinanceRoutines.jl` (add include + exports)
+- Create: `test/UnitTests/Diagnostics.jl`
+- Modify: `test/runtests.jl`
+
+- [ ] **Step 1: Write failing tests**
+
+```julia
+# test/UnitTests/Diagnostics.jl
+@testset "Data Quality Diagnostics" begin
+    import Dates: Date
+    import DataFrames: DataFrame, allowmissing!
+
+ # Create test data with known issues
+ df = DataFrame(
+ permno = [1, 1, 1, 2, 2, 2],
+ date = [Date(2020,1,1), Date(2020,2,1), Date(2020,2,1), # duplicate for permno 1
+ Date(2020,1,1), Date(2020,3,1), Date(2020,4,1)], # gap for permno 2
+ ret = [0.05, missing, 0.03, -1.5, 0.02, 150.0], # suspicious: -1.5, 150.0
+ prc = [10.0, 20.0, 20.0, -5.0, 30.0, 40.0] # negative price
+ )
+ allowmissing!(df, :ret)
+
+ report = diagnose(df)
+
+ @test haskey(report, :missing_rates)
+ @test haskey(report, :suspicious_values)
+ @test haskey(report, :duplicate_keys)
+ @test report[:missing_rates][:ret] > 0
+ @test length(report[:suspicious_values]) > 0
+end
+```
+
+- [ ] **Step 2: Implement `diagnose`**
+
+```julia
+"""
+ diagnose(df; id_col=:permno, date_col=:date, ret_col=:ret, price_col=:prc)
+
+Run data quality diagnostics on a financial DataFrame.
+
+Returns a Dict with:
+- `:missing_rates` — fraction missing per column
+- `:suspicious_values` — rows with returns > 100% or < -100%, negative prices
+- `:duplicate_keys` — duplicate (id, date) pairs
+- `:nrow`, `:ncol` — dimensions
+"""
+function diagnose(df::AbstractDataFrame;
+ id_col::Symbol=:permno, date_col::Symbol=:date,
+ ret_col::Union{Nothing,Symbol}=:ret,
+ price_col::Union{Nothing,Symbol}=:prc)
+
+ report = Dict{Symbol, Any}()
+ report[:nrow] = nrow(df)
+ report[:ncol] = ncol(df)
+
+ # Missing rates
+ missing_rates = Dict{Symbol, Float64}()
+ for col in names(df)
+ col_sym = Symbol(col)
+ missing_rates[col_sym] = count(ismissing, df[!, col]) / nrow(df)
+ end
+ report[:missing_rates] = missing_rates
+
+ # Duplicate keys
+ if id_col in propertynames(df) && date_col in propertynames(df)
+ dup_count = nrow(df) - nrow(unique(df, [id_col, date_col]))
+ report[:duplicate_keys] = dup_count
+ end
+
+ # Suspicious values
+ suspicious = String[]
+ if !isnothing(ret_col) && ret_col in propertynames(df)
+ n_extreme = count(r -> !ismissing(r) && (r > 1.0 || r < -1.0), df[!, ret_col])
+ n_extreme > 0 && push!(suspicious, "$n_extreme returns outside [-100%, +100%]")
+ end
+ if !isnothing(price_col) && price_col in propertynames(df)
+ n_neg = count(r -> !ismissing(r) && r < 0, df[!, price_col])
+ n_neg > 0 && push!(suspicious, "$n_neg negative prices (CRSP convention for bid/ask midpoint)")
+ end
+ report[:suspicious_values] = suspicious
+
+ return report
+end
+```
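+
+Assuming the implementation above is loaded, a small worked example of what the report contains:
+
+```julia
+using DataFrames, Dates
+
+df = DataFrame(
+    permno = [1, 1, 2],
+    date   = [Date(2020,1,1), Date(2020,1,1), Date(2020,2,1)],  # duplicate (1, 2020-01-01)
+    ret    = [0.05, missing, 2.0],                              # one missing, one return > 100%
+    prc    = [10.0, 12.0, -8.0],                                # one negative price
+)
+
+report = diagnose(df)
+# report[:missing_rates][:ret] == 1/3
+# report[:duplicate_keys]      == 1
+# report[:suspicious_values]   flags the extreme return and the negative price
+```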
+
+- [ ] **Step 3: Add to FinanceRoutines.jl**
+
+Update the `import DataFrames:` line to also include `ncol`. (`unique(df, cols)` extends `Base.unique`, so it needs no explicit import once DataFrames is loaded.) Then add:
+
+```julia
+include("Diagnostics.jl")
+export diagnose
+```
+
+- [ ] **Step 4: Run tests**
+
+Expected: PASS
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add src/Diagnostics.jl src/FinanceRoutines.jl test/UnitTests/Diagnostics.jl test/runtests.jl
+git commit -m "Add diagnose() for data quality diagnostics on financial DataFrames"
+```
+
+---
+
+## Task 12: Version bump and final integration
+
+**Files:**
+- Modify: `Project.toml` — version to "0.5.0", add Statistics to [deps] if needed for PortfolioUtils
+- Modify: `NEWS.md` — finalize
+- Modify: `test/runtests.jl` — ensure all new test suites are listed
+
+- [ ] **Step 1: Update Project.toml version**
+
+Change `version = "0.4.5"` to `version = "0.5.0"`
+
+- [ ] **Step 2: Verify all dependencies are correct in Project.toml**
+
+Statistics should already be in `[deps]` (added in Task 10). Logging should already be removed (Task 1). Verify no stale entries.
+
+- [ ] **Step 3: Update test/runtests.jl testsuite list**
+
+```julia
+const testsuite = [
+ "KenFrench",
+ "FF5",
+ "WRDS",
+ "betas",
+ "Yields",
+ "PortfolioUtils",
+ "Diagnostics",
+]
+```
+
+- [ ] **Step 4: Run full test suite**
+
+Run: `julia --project=. -e 'using Pkg; Pkg.test()'`
+Expected: ALL PASS
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add Project.toml test/runtests.jl NEWS.md
+git commit -m "Bump version to v0.5.0, finalize test suite and changelog"
+```
+
+---
+
+## Task 13: Tag release and update registry
+
+Follow the release workflow in CLAUDE.md:
+
+- [ ] **Step 1: Tag**
+
+```bash
+git tag v0.5.0
+git push origin v0.5.0
+```
+
+- [ ] **Step 2: Get tree SHA**
+
+```bash
+git rev-parse v0.5.0^{tree}
+```
+
+- [ ] **Step 3: Update LouLouLibs/loulouJL registry**
+
+Update `F/FinanceRoutines/Versions.toml`, `Deps.toml`, `Compat.toml` via `gh api`.
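+
+For reference, the `Versions.toml` entry follows the standard Julia registry format (the tree SHA below is a placeholder for the Step 2 output):
+
+```toml
+["0.5.0"]
+git-tree-sha1 = "<tree SHA from Step 2>"
+```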
+
+---
+
+## Extensions deferred for user decision
+
+These were listed as extensions A–E. Tasks 9–11 cover B (FF5, task 9), A (portfolio returns, task 10), and E (diagnostics, task 11). The remaining two are:
+
+- **C: Event study utilities** — `event_study(events_df, returns_df; ...)` computing CARs/BHARs. Can be added as Task 14 if desired.
+- **D: Treasury yield interpolation** — `treasury_zero_rate(date, maturity)` incorporating T-bill rates. Requires a new data source. Can be added as Task 15 if desired.
+
+Both are independent of the above tasks and can be planned separately.