README.md (3942B)
1 # wrds-dl (Python) 2 3 Lightweight CLI for downloading data from the [WRDS](https://wrds-www.wharton.upenn.edu/) PostgreSQL database. No binary to install — runs anywhere Python and [`uv`](https://docs.astral.sh/uv/) are available. 4 5 Same CLI interface as the [Go version](../go/) (`download` and `info` commands with identical flags), but without the TUI. 6 7 ## Installation 8 9 ### With uv (recommended) 10 11 ```sh 12 # Install as a tool (available system-wide) 13 uv tool install wrds-dl --from ./python 14 15 # Or run directly without installing 16 cd python 17 uv run wrds-dl --help 18 ``` 19 20 ### With pip 21 22 ```sh 23 pip install ./python 24 wrds-dl --help 25 ``` 26 27 ## CLI 28 29 ### CRSP monthly stock file 30 31 ```sh 32 # Inspect the table first 33 wrds-dl info --schema crsp --table msf 34 35 # Download prices and returns for 2020 36 wrds-dl download \ 37 --schema crsp \ 38 --table msf \ 39 --columns "permno,date,prc,ret,shrout" \ 40 --where "date >= '2020-01-01' AND date < '2021-01-01'" \ 41 --out crsp_msf_2020.parquet 42 43 # Dry run — check row count before committing 44 wrds-dl download \ 45 --schema crsp \ 46 --table msf \ 47 --where "date = '2020-01-31'" \ 48 --dry-run 49 ``` 50 51 ### CRSP daily stock file 52 53 ```sh 54 wrds-dl download \ 55 --schema crsp \ 56 --table dsf \ 57 --where "date >= '2020-01-01' AND date < '2021-01-01'" \ 58 --out crsp_dsf_2020.parquet 59 ``` 60 61 ### Select specific columns 62 63 ```sh 64 wrds-dl download \ 65 --schema comp \ 66 --table funda \ 67 --columns "gvkey,datadate,sale,at" \ 68 --out funda_subset.parquet 69 ``` 70 71 ### Raw SQL 72 73 ```sh 74 wrds-dl download \ 75 --query "SELECT permno, date, prc FROM crsp.msf WHERE date >= '2015-01-01'" \ 76 --out crsp_msf_2015_onwards.parquet 77 ``` 78 79 ### CSV output 80 81 ```sh 82 wrds-dl download \ 83 --schema crsp \ 84 --table msf \ 85 --columns "permno,date,ret" --limit 1000 \ 86 --out crsp_msf_sample.csv 87 ``` 88 89 Format is inferred from the output file extension (`.parquet` or `.csv`). Override with `--format`. 90 91 ### All download flags 92 93 | Flag | Description | 94 |---|---| 95 | `--schema` | Schema name (e.g. `crsp`, `comp`) | 96 | `--table` | Table name (e.g. `dsf`, `funda`) | 97 | `-c`, `--columns` | Columns to select (comma-separated, default `*`) | 98 | `--where` | SQL `WHERE` clause, without the keyword | 99 | `--query` | Full SQL query — overrides `--schema`, `--table`, `--where`, `--columns` | 100 | `--out` | Output file path (required unless `--dry-run`) | 101 | `--format` | `parquet` or `csv` (inferred from extension if omitted) | 102 | `--limit` | Row limit, useful for testing (default: no limit) | 103 | `--dry-run` | Preview query, row count, and first 5 rows without downloading | 104 105 ### Table info 106 107 Inspect table metadata without downloading data: 108 109 ```sh 110 wrds-dl info --schema crsp --table dsf 111 ``` 112 113 Output: 114 115 ``` 116 crsp.dsf 117 Daily Stock File 118 ~245302893 rows, 47 GB 119 120 NAME TYPE NULLABLE DESCRIPTION 121 cusip character varying(8) YES CUSIP - HISTORICAL 122 permno double precision YES PERMNO 123 ... 124 ``` 125 126 For machine-readable output: 127 128 ```sh 129 wrds-dl info --schema crsp --table dsf --json 130 ``` 131 132 ## How it works 133 134 Connects directly to the WRDS PostgreSQL server using [psycopg](https://www.psycopg.org/) (v3) with bundled `libpq` for cross-platform portability (ARM macOS, x86 Linux, etc.). 135 136 **Downloads** use server-side cursors to stream rows without loading the entire result into memory: 137 - **Parquet**: rows are batched (10,000 per row group) and written with ZSTD compression via [PyArrow](https://arrow.apache.org/docs/python/). 138 - **CSV**: rows are streamed to disk via Python's `csv` module. 139 140 Progress is reported to stderr every 10,000 rows. 141 142 ## Dependencies 143 144 | Package | Purpose | 145 |---|---| 146 | [`psycopg[binary]`](https://www.psycopg.org/) | PostgreSQL driver with bundled libpq | 147 | [`pyarrow`](https://arrow.apache.org/docs/python/) | Parquet writing with ZSTD compression | 148 | [`click`](https://click.palletsprojects.com/) | CLI framework | 149 150 ## Development 151 152 ```sh 153 cd python 154 uv sync 155 uv run pytest 156 uv run ruff check src tests 157 ```