commit a67c81dbd8a2bdf6b4caf94cdbc8689d771629ac
parent 4dfe69b8c4d7ef38c9e42cebade9fae9e06a2bb0
Author: Erik Loualiche <[email protected]>
Date: Fri, 20 Feb 2026 14:24:11 -0600
Merge pull request #3 from eloualiche/docs/update-readme
Update README for pure Go rewrite
Diffstat:
| M | README.md | | | 152 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------- |
1 file changed, 110 insertions(+), 42 deletions(-)
diff --git a/README.md b/README.md
@@ -1,14 +1,16 @@
# wrds-dl
-A terminal tool for browsing and downloading data from the [WRDS](https://wrds-www.wharton.upenn.edu/) PostgreSQL database. Comes with an interactive TUI for exploration and a CLI for scripted downloads. Output is Parquet (via DuckDB) or CSV.
+A terminal tool for browsing and downloading data from the [WRDS](https://wrds-www.wharton.upenn.edu/) PostgreSQL database. Comes with an interactive TUI for exploration and a CLI for scripted downloads. Output is Parquet or CSV — pure Go, no CGo, cross-platform.
## Features
-- **TUI** — browse schemas and tables, preview rows, trigger downloads without leaving the terminal
+- **TUI** — browse schemas and tables, inspect column metadata, trigger downloads without leaving the terminal
- **CLI** — scriptable `download` command with structured flags or raw SQL
-- **Parquet output** — uses DuckDB's `postgres_scanner` for fast, efficient export with ZSTD compression
-- **CSV output** — plain CSV alternative via the same DuckDB pipeline
-- **Standard auth** — reads from `PG*` environment variables or `~/.pgpass`, no configuration file needed
+- **Parquet output** — streams rows via pgx and writes Parquet with ZSTD compression using parquet-go (pure Go)
+- **CSV output** — streams rows to CSV via encoding/csv
+- **Login flow** — interactive login screen with Duo 2FA support; saved credentials for one-press reconnect
+- **Database switching** — browse and switch between WRDS databases from within the TUI
+- **Standard auth** — reads from `PG*` environment variables, `~/.config/wrds-dl/credentials`, or `~/.pgpass`
## Installation
@@ -19,7 +21,9 @@ Download the latest release from the [Releases page](https://github.com/eloualic
| Platform | Binary |
|---|---|
| macOS (Apple Silicon) | `wrds-dl-darwin-arm64` |
+| macOS (Intel) | `wrds-dl-darwin-amd64` |
| Linux x86-64 | `wrds-dl-linux-amd64` |
+| Windows x86-64 | `wrds-dl-windows-amd64.exe` |
```sh
# macOS example
@@ -30,31 +34,52 @@ chmod +x /usr/local/bin/wrds-dl
### Build from source
-Requires Go 1.21+, CGo, and a C++ compiler (`gcc-c++` on Linux, Xcode CLT on macOS).
+Requires Go 1.25+. No CGo or C compiler needed.
```sh
git clone https://github.com/eloualiche/wrds-download
cd wrds-download
-go build -o wrds-dl .
+CGO_ENABLED=0 go build -ldflags="-s -w" -o wrds-dl .
mv wrds-dl /usr/local/bin/
```
## Authentication
-`wrds-dl` uses the standard PostgreSQL environment variables. Set them before running:
+WRDS uses Duo two-factor authentication. The TUI always starts on a login screen so you control when the connection (and Duo push) fires.
+
+### Option 1: Environment variables
+
+Set the standard PostgreSQL environment variables before running:
```sh
-export PGHOST=wrds-pgdata.wharton.upenn.edu
-export PGPORT=9737
export PGUSER=your_username
export PGPASSWORD=your_password
-export PGDATABASE=your_username # on WRDS, database name = username
```
-Alternatively, store credentials in `~/.pgpass` (no `PGPASSWORD` needed):
+Optional (defaults shown):
+
+```sh
+export PGHOST=wrds-pgdata.wharton.upenn.edu
+export PGPORT=9737
+export PGDATABASE=wrds
+```
+
+### Option 2: Saved credentials
+
+On first login via the TUI, check "Save to ~/.config/wrds-dl/credentials". On subsequent launches, press `enter` on the "Login as ..." button. The credentials file is stored at `~/.config/wrds-dl/credentials` (or `$XDG_CONFIG_HOME/wrds-dl/credentials`) with `0600` permissions:
```
-wrds-pgdata.wharton.upenn.edu:9737:your_username:your_username:your_password
+PGUSER=your_username
+PGPASSWORD=your_password
+PGDATABASE=wrds
+```
+
+### Option 3: ~/.pgpass
+
+Standard PostgreSQL password file:
+
+```
+wrds-pgdata.wharton.upenn.edu:9737:*:your_username:your_password
```
## TUI
@@ -65,32 +90,33 @@ Launch the interactive browser:
wrds-dl tui
```
-```
-┌─ WRDS ──────────────────────────────────────────────────────────┐
-│ Schemas │ Tables (crsp) │ Preview: crsp.dsf │
-│ ───────────── │ ───────────────── │ ────────────────────── │
-│ > crsp │ > dsf │ permno date prc │
-│ comp │ mse │ 10001 2020-01-02 45.23 │
-│ ibes │ ccm_final │ 10001 2020-01-03 47.11 │
-│ optionm │ ... │ ... │
-│ ... │ │ ~2.1M rows │
-│ │ │ │
-│ [tab] switch pane [d] download [/] filter [q] quit │
-└─────────────────────────────────────────────────────────────────┘
-```
+The TUI has three panes: **Schemas**, **Tables**, and **Preview** (column catalog with types, descriptions, and table stats).
### Keybindings
| Key | Action |
|---|---|
| `tab` / `shift+tab` | Cycle focus between panes |
-| `enter` | Drill into schema or table |
+| `right` / `l` | Drill into selected schema or table |
+| `left` / `h` | Go back one pane |
| `d` | Open download dialog for the selected table |
-| `/` | Filter list |
-| `esc` | Cancel / dismiss |
+| `b` | Switch database |
+| `/` | Filter current list (schemas, tables, or columns) |
+| `esc` | Cancel / dismiss overlay |
| `q` / `ctrl+c` | Quit |
-In the download dialog, `tab` moves between fields and `enter` on the last field confirms.
+### Download dialog
+
+Press `d` on a selected table to open the download form:
+
+| Field | Description |
+|---|---|
+| SELECT columns | Comma-separated column names, or `*` for all |
+| WHERE clause | SQL filter without the `WHERE` keyword |
+| Output path | File path; defaults to `./schema_table.parquet` |
+| Format | `parquet` or `csv` |
+
+Navigate with `tab`/`shift+tab`, confirm with `enter` on the last field.
## CLI
@@ -104,6 +130,16 @@ wrds-dl download \
--out crsp_dsf_2020.parquet
```
+### Select specific columns
+
+```sh
+wrds-dl download \
+ --schema comp \
+ --table funda \
+ --columns "gvkey,datadate,sale,at" \
+ --out funda_subset.parquet
+```
+
### Raw SQL
```sh
@@ -121,32 +157,64 @@ wrds-dl download \
--out funda.csv
```
-Format is inferred from the output file extension (`.parquet` → Parquet, `.csv` → CSV). Override with `--format`.
+Format is inferred from the output file extension (`.parquet` or `.csv`). Override with `--format`.
-### Flags
+### All flags
| Flag | Description |
|---|---|
| `--schema` | Schema name (e.g. `crsp`, `comp`) |
| `--table` | Table name (e.g. `dsf`, `funda`) |
-| `--where` | SQL `WHERE` clause, without the keyword (e.g. `date > '2020-01-01'`) |
-| `--query` | Full SQL query — overrides `--schema`, `--table`, `--where` |
+| `-c`, `--columns` | Columns to select (comma-separated, default `*`) |
+| `--where` | SQL `WHERE` clause, without the keyword |
+| `--query` | Full SQL query — overrides `--schema`, `--table`, `--where`, `--columns` |
| `--out` | Output file path (required) |
| `--format` | `parquet` or `csv` (inferred from extension if omitted) |
| `--limit` | Row limit, useful for testing (default: no limit) |
## How it works
-- **Metadata** (schema/table/column listing, row preview) uses a `pgx` connection pool talking directly to the WRDS PostgreSQL server.
-- **Downloads** use [DuckDB](https://duckdb.org/) with the `postgres_scanner` extension. DuckDB attaches to WRDS as a read-only source and streams data directly into Parquet or CSV without loading it all into memory first.
+`wrds-dl` connects directly to the WRDS PostgreSQL server using [pgx](https://github.com/jackc/pgx). All operations — metadata browsing, column inspection, and data download — go through a single pooled connection (limited to 1 to avoid triggering multiple Duo 2FA prompts).
+
+**Downloads** stream rows from Postgres and write them incrementally:
+- **Parquet**: rows are batched (10,000 per row group) and written with ZSTD compression via [parquet-go](https://github.com/parquet-go/parquet-go). String columns use PLAIN encoding for broad compatibility (R, Python, Julia).
+- **CSV**: rows are streamed directly to disk via Go's `encoding/csv`.
+
+PostgreSQL types are mapped to Parquet types: `bool` → BOOLEAN, `int2/int4` → INT32, `int8` → INT64, `float4` → FLOAT, `float8` → DOUBLE, `date` → DATE, `timestamp/timestamptz` → TIMESTAMP (microseconds), `numeric` → STRING, `text/varchar/char` → STRING.
+
+## Project structure
+
+```
+wrds-download/
+├── main.go # entrypoint
+├── cmd/
+│ ├── root.go # cobra root command
+│ ├── tui.go # `wrds-dl tui` — launches interactive browser
+│ └── download.go # `wrds-dl download` — CLI download command
+├── internal/
+│ ├── db/
+│ │ ├── client.go # pgx pool, DSN construction, connection management
+│ │ └── meta.go # schema/table/column queries against pg_catalog
+│ ├── export/
+│ │ └── export.go # Export() — pgx streaming → parquet-go / csv writer
+│ ├── tui/
+│ │ ├── app.go # root Bubble Tea model, Update/View, pane navigation
+│ │ ├── loginform.go # login dialog with saved-credentials support
+│ │ ├── dlform.go # download dialog (columns, where, output, format)
+│ │ └── styles.go # lipgloss styles and colors
+│ └── config/
+│ └── config.go # credentials file read/write (~/.config/wrds-dl/)
+└── .github/workflows/
+ └── release.yml # CI: cross-compile 4 targets, attach to GitHub Release
+```
## Dependencies
| Package | Purpose |
|---|---|
-| `charmbracelet/bubbletea` | TUI framework |
-| `charmbracelet/bubbles` | List, table, text-input, spinner components |
-| `charmbracelet/lipgloss` | Layout and styling |
-| `jackc/pgx/v5` | PostgreSQL driver for metadata and preview |
-| `spf13/cobra` | CLI commands and flags |
-| `marcboeker/go-duckdb` | Parquet/CSV export via `postgres_scanner` |
+| [`jackc/pgx/v5`](https://github.com/jackc/pgx) | PostgreSQL driver — all queries and data streaming |
+| [`parquet-go/parquet-go`](https://github.com/parquet-go/parquet-go) | Parquet file writing with ZSTD compression (pure Go) |
+| [`charmbracelet/bubbletea`](https://github.com/charmbracelet/bubbletea) | TUI framework |
+| [`charmbracelet/bubbles`](https://github.com/charmbracelet/bubbles) | List, text-input, spinner components |
+| [`charmbracelet/lipgloss`](https://github.com/charmbracelet/lipgloss) | Terminal layout and styling |
+| [`spf13/cobra`](https://github.com/spf13/cobra) | CLI commands and flags |