SKILL.md (5210B)
1 --- 2 name: wrds-download 3 description: Download data from the WRDS (Wharton Research Data Services) PostgreSQL database to local Parquet or CSV files. Use when the user asks to get data from WRDS, download financial data, or mentions WRDS schemas like crsp, comp, optionm, ibes, etc. 4 allowed-tools: Bash(wrds-dl *), Bash(uv run *wrds*), Read, Grep 5 argument-hint: [description of data needed] 6 --- 7 8 # WRDS Data Download (Python) 9 10 You help users download data from the Wharton Research Data Services (WRDS) PostgreSQL database using the `wrds-dl` Python CLI tool (via `uv`). 11 12 ## Prerequisites 13 14 The user must have: 15 - **`uv`** installed (https://docs.astral.sh/uv/) 16 - **WRDS credentials** configured via one of: 17 - Environment variables: `PGUSER` and `PGPASSWORD` 18 - Saved credentials at `~/.config/wrds-dl/credentials` 19 - Standard `~/.pgpass` file 20 21 Run the CLI with `uv run wrds-dl` from the `python/` directory of the wrds-download repo, or `wrds-dl` if installed via `uv tool install`. 22 23 If `wrds-dl` is not found and `uv` is available, install it: 24 ```bash 25 uv tool install wrds-dl --from /path/to/wrds-download/python 26 ``` 27 28 ## Workflow 29 30 Follow these steps for every download request: 31 32 ### Step 1: Identify the table 33 34 Parse the user's request to determine the WRDS schema and table. Common mappings: 35 36 | Dataset | Schema | Key Tables | 37 |---------|--------|------------| 38 | CRSP daily stock | `crsp` | `dsf` (daily), `msf` (monthly), `dsi` (index) | 39 | CRSP events | `crsp` | `dsedelist`, `stocknames` | 40 | Compustat annual | `comp` | `funda` | 41 | Compustat quarterly | `comp` | `fundq` | 42 | Compustat global | `comp_global_daily` | `g_funda`, `g_fundq` | 43 | IBES | `ibes` | `statsum_epsus`, `actu_epsus` | 44 | OptionMetrics | `optionm` | `opprcd` (prices), `secprd` (security) | 45 | TAQ | `taqmsec` | `ctm_YYYYMMDD` | 46 | CRSP/Compustat merged | `crsp` | `ccmxpf_linktable` | 47 | BoardEx | `boardex` | `na_wrds_company_profile` | 48 | Institutional (13F) | `tfn` | `s34` | 49 | Audit Analytics | `audit` | `auditnonreli` | 50 | Ravenpack | `ravenpack` | `rpa_djnw` | 51 | Bank Regulatory | `bank` | `call_schedule_rc`, `bhck` | 52 53 If unsure which table, ask the user or use `wrds-dl info` to explore. 54 55 ### Step 2: Inspect the table 56 57 Always run `wrds-dl info` first to understand the table structure: 58 59 ```bash 60 wrds-dl info --schema <schema> --table <table> 61 ``` 62 63 Use the output to: 64 - Confirm the table exists and has the expected columns 65 - Note column names for the user's requested variables 66 - Check the estimated row count to warn about large downloads 67 68 For JSON output (useful for parsing): `wrds-dl info --schema <schema> --table <table> --json` 69 70 ### Step 3: Dry run 71 72 For tables with more than 1 million estimated rows, or when a WHERE clause is involved, always do a dry run first: 73 74 ```bash 75 wrds-dl download --schema <schema> --table <table> \ 76 --columns "<cols>" --where "<filter>" --dry-run 77 ``` 78 79 Show the user the row count and sample rows. Ask for confirmation before proceeding if the row count is very large (>10M rows). 80 81 ### Step 4: Download 82 83 Build and run the download command: 84 85 ```bash 86 wrds-dl download \ 87 --schema <schema> \ 88 --table <table> \ 89 --columns "<comma-separated columns>" \ 90 --where "<SQL filter>" \ 91 --out <output_file> \ 92 --format <parquet|csv> 93 ``` 94 95 #### Defaults and conventions 96 - **Format**: Use Parquet unless the user asks for CSV. Parquet is smaller and faster. 97 - **Output path**: Name the file descriptively, e.g., `crsp_dsf_2020.parquet` or `comp_funda_2010_2023.parquet`. 98 - **Columns**: Select only the columns the user needs. Don't use `*` on wide tables — ask what variables they need. 99 - **Limit**: Use `--limit` for testing. Suggest `--limit 1000` if the user is exploring. 100 101 #### Common filters 102 - Date ranges: `--where "date >= '2020-01-01' AND date < '2021-01-01'"` 103 - Specific firms by permno: `--where "permno IN (10107, 93436)"` 104 - Specific firms by gvkey: `--where "gvkey IN ('001690', '012141')"` 105 - Fiscal year: `--where "fyear >= 2010 AND fyear <= 2023"` 106 107 ### Step 5: Verify 108 109 After download completes, confirm the file was created and report its size: 110 111 ```bash 112 ls -lh <output_file> 113 ``` 114 115 ## Error handling 116 117 - **Authentication errors**: Remind the user to set `PGUSER`/`PGPASSWORD` or configure `~/.config/wrds-dl/credentials`. 118 - **Table not found**: Use `wrds-dl info` to check schema/table names. WRDS schemas and table names are lowercase. 119 - **Timeout on large tables**: Suggest adding a `--where` filter or `--limit` to reduce the result set. 120 - **Duo 2FA prompt**: The connection triggers a Duo push. Tell the user to approve it on their phone. 121 122 ## Example interactions 123 124 **User**: "Download CRSP daily stock data for 2020" 125 -> `wrds-dl info --schema crsp --table dsf` 126 -> `wrds-dl download --schema crsp --table dsf --where "date >= '2020-01-01' AND date < '2021-01-01'" --out crsp_dsf_2020.parquet` 127 128 **User**: "Get Compustat annual fundamentals, just gvkey, datadate, and sales" 129 -> `wrds-dl info --schema comp --table funda` 130 -> `wrds-dl download --schema comp --table funda --columns "gvkey,datadate,sale" --out comp_funda.parquet` 131 132 **User**: "I need IBES analyst estimates" 133 -> `wrds-dl info --schema ibes --table statsum_epsus` 134 -> Ask what date range and variables they need, then download.