SKILL.md (5198B)
1 --- 2 name: wrds-download 3 description: Download data from the WRDS (Wharton Research Data Services) PostgreSQL database to local Parquet or CSV files. Use when the user asks to get data from WRDS, download financial data, or mentions WRDS schemas like crsp, comp, optionm, ibes, etc. 4 allowed-tools: Bash(wrds-dl *), Read, Grep 5 argument-hint: [description of data needed] 6 --- 7 8 # WRDS Data Download 9 10 You help users download data from the Wharton Research Data Services (WRDS) PostgreSQL database using the `wrds-dl` CLI tool. 11 12 ## Prerequisites 13 14 The `wrds-dl` CLI must be installed and on the PATH. Either the Go binary or the Python version works — they have the same commands and flags. The user must have WRDS credentials configured via one of: 15 - Environment variables: `PGUSER` and `PGPASSWORD` 16 - Saved credentials at `~/.config/wrds-dl/credentials` 17 - Standard `~/.pgpass` file 18 19 If `wrds-dl` is not found, tell the user to install it: 20 - **Go binary**: download from https://github.com/LouLouLibs/wrds-download/releases 21 - **Python (via uv)**: `uv tool install wrds-dl --from /path/to/wrds-download/python` 22 23 ## Workflow 24 25 Follow these steps for every download request: 26 27 ### Step 1: Identify the table 28 29 Parse the user's request to determine the WRDS schema and table. Common mappings: 30 31 | Dataset | Schema | Key Tables | 32 |---------|--------|------------| 33 | CRSP daily stock | `crsp` | `dsf` (daily), `msf` (monthly), `dsi` (index) | 34 | CRSP events | `crsp` | `dsedelist`, `stocknames` | 35 | Compustat annual | `comp` | `funda` | 36 | Compustat quarterly | `comp` | `fundq` | 37 | Compustat global | `comp_global_daily` | `g_funda`, `g_fundq` | 38 | IBES | `ibes` | `statsum_epsus`, `actu_epsus` | 39 | OptionMetrics | `optionm` | `opprcd` (prices), `secprd` (security) | 40 | TAQ | `taqmsec` | `ctm_YYYYMMDD` | 41 | CRSP/Compustat merged | `crsp` | `ccmxpf_linktable` | 42 | BoardEx | `boardex` | `na_wrds_company_profile` | 43 | Institutional (13F) | `tfn` | `s34` | 44 | Audit Analytics | `audit` | `auditnonreli` | 45 | Ravenpack | `ravenpack` | `rpa_djnw` | 46 | Bank Regulatory | `bank` | `call_schedule_rc`, `bhck` | 47 48 If unsure which table, ask the user or use `wrds-dl info` to explore. 49 50 ### Step 2: Inspect the table 51 52 Always run `wrds-dl info` first to understand the table structure: 53 54 ```bash 55 wrds-dl info --schema <schema> --table <table> 56 ``` 57 58 Use the output to: 59 - Confirm the table exists and has the expected columns 60 - Note column names for the user's requested variables 61 - Check the estimated row count to warn about large downloads 62 63 For JSON output (useful for parsing): `wrds-dl info --schema <schema> --table <table> --json` 64 65 ### Step 3: Dry run 66 67 For tables with more than 1 million estimated rows, or when a WHERE clause is involved, always do a dry run first: 68 69 ```bash 70 wrds-dl download --schema <schema> --table <table> \ 71 --columns "<cols>" --where "<filter>" --dry-run 72 ``` 73 74 Show the user the row count and sample rows. Ask for confirmation before proceeding if the row count is very large (>10M rows). 75 76 ### Step 4: Download 77 78 Build and run the download command: 79 80 ```bash 81 wrds-dl download \ 82 --schema <schema> \ 83 --table <table> \ 84 --columns "<comma-separated columns>" \ 85 --where "<SQL filter>" \ 86 --out <output_file> \ 87 --format <parquet|csv> 88 ``` 89 90 #### Defaults and conventions 91 - **Format**: Use Parquet unless the user asks for CSV. Parquet is smaller and faster. 92 - **Output path**: Name the file descriptively, e.g., `crsp_dsf_2020.parquet` or `comp_funda_2010_2023.parquet`. 93 - **Columns**: Select only the columns the user needs. Don't use `*` on wide tables — ask what variables they need. 94 - **Limit**: Use `--limit` for testing. Suggest `--limit 1000` if the user is exploring. 95 96 #### Common filters 97 - Date ranges: `--where "date >= '2020-01-01' AND date < '2021-01-01'"` 98 - Specific firms by permno: `--where "permno IN (10107, 93436)"` 99 - Specific firms by gvkey: `--where "gvkey IN ('001690', '012141')"` 100 - Fiscal year: `--where "fyear >= 2010 AND fyear <= 2023"` 101 102 ### Step 5: Verify 103 104 After download completes, confirm the file was created and report its size: 105 106 ```bash 107 ls -lh <output_file> 108 ``` 109 110 ## Error handling 111 112 - **Authentication errors**: Remind the user to set `PGUSER`/`PGPASSWORD` or run `wrds-dl tui` to save credentials. 113 - **Table not found**: Use `wrds-dl info` to check schema/table names. WRDS schemas and table names are lowercase. 114 - **Timeout on large tables**: Suggest adding a `--where` filter or `--limit` to reduce the result set. 115 - **Duo 2FA prompt**: The connection triggers a Duo push. Tell the user to approve it on their phone. 116 117 ## Example interactions 118 119 **User**: "Download CRSP daily stock data for 2020" 120 → `wrds-dl info --schema crsp --table dsf` 121 → `wrds-dl download --schema crsp --table dsf --where "date >= '2020-01-01' AND date < '2021-01-01'" --out crsp_dsf_2020.parquet` 122 123 **User**: "Get Compustat annual fundamentals, just gvkey, datadate, and sales" 124 → `wrds-dl info --schema comp --table funda` 125 → `wrds-dl download --schema comp --table funda --columns "gvkey,datadate,sale" --out comp_funda.parquet` 126 127 **User**: "I need IBES analyst estimates" 128 → `wrds-dl info --schema ibes --table statsum_epsus` 129 → Ask what date range and variables they need, then download.