commit 9d976b3ceb6bf074f3ef92eeec19e2409958b620
parent 91a9d2e16e071bfebe3b6c0b181fc3cd333ae370
Author: Erik Loualiche <[email protected]>
Date: Thu, 5 Jun 2025 11:31:40 -0500
Merge pull request #6 from LouLouLibs/feature/jsonlines
doc jsonl
Diffstat:
2 files changed, 113 insertions(+), 2 deletions(-)
diff --git a/docs/make.jl b/docs/make.jl
@@ -7,12 +7,12 @@
# --------------------------------------------------------------------------------------------------
-# --
+# --
using BazerUtils
using Documenter
using DocumenterVitepress
-# --
+# --
makedocs(
format = Documenter.HTML(
size_threshold = 512_000, # KiB — raise above your largest file
@@ -30,6 +30,7 @@ makedocs(
"Home" => "index.md",
"Manual" => [
"man/logger_guide.md",
+ "man/read_jsonl.md",
],
# "Demos" => [
# ],
diff --git a/docs/src/man/read_jsonl.md b/docs/src/man/read_jsonl.md
@@ -0,0 +1,109 @@
+# Working with JSON Lines Files
+
+JSON Lines (JSONL) is a convenient format for storing structured data that can be processed one record at a time. Each line holds one valid JSON value, and lines are separated by a newline character. This makes the format well suited to large datasets and streaming applications.
+
+For more details, see [jsonlines.org](https://jsonlines.org/).
+
+---
+
+## What is JSON Lines?
+
+- **UTF-8 Encoding:** Files must be UTF-8 encoded. Do not include a byte order mark (BOM).
+- **One JSON Value Per Line:** Each line is a valid JSON value (object, array, string, number, boolean, or null). Blank lines are ignored.
+- **Line Separator:** Each line ends with `\n` (or `\r\n`). The last line may or may not end with a newline.
+
+**Example:**
+```json
+{"name": "Alice", "score": 42}
+{"name": "Bob", "score": 17}
+[1, 2, 3]
+"hello"
+null
+```
+
+---
+
+## Reading JSON Lines Files
+
+You can use the `read_jsonl` and `stream_jsonl` functions to read JSONL files or streams.
+
+### `read_jsonl`
+
+Reads the entire file or stream into memory and returns a vector of parsed JSON values.
+
+```julia
+data = read_jsonl("data.jsonl")
+# or from an IOBuffer
+buf = IOBuffer("{\"a\": 1}\n{\"a\": 2}\n")
+data = read_jsonl(buf)
+```
+
+- **Arguments:** `source::Union{AbstractString, IO}`
+- **Returns:** `Vector` of parsed JSON values
+- **Note:** Loads all data into memory. For large files, use `stream_jsonl`.
+
+---
+
+### `stream_jsonl`
+
+Creates a lazy iterator (Channel) that yields one parsed JSON value at a time, without loading the entire file into memory.
+
+```julia
+for record in stream_jsonl("data.jsonl")
+ println(record)
+end
+
+# Collect the first 10 records
+first10 = collect(Iterators.take(stream_jsonl("data.jsonl"), 10))
+```
+
+- **Arguments:** `source::Union{AbstractString, IO}`
+- **Returns:** `Channel` (iterator) of parsed JSON values
+- **Note:** Ideal for large files and streaming workflows.
+
+---
+
+## Writing JSON Lines Files
+
+Use `write_jsonl` to write an iterable of JSON-serializable values to a JSONL file.
+
+```julia
+write_jsonl("out.jsonl", [Dict("a"=>1), Dict("b"=>2)])
+write_jsonl("out.jsonl.gz", (Dict("i"=>i) for i in 1:100); compress=true)
+```
+
+- **Arguments:**
+ - `filename::AbstractString`
+ - `data`: iterable of JSON-serializable values
+ - `compress::Bool=false`: write gzip-compressed if true or filename ends with `.gz`
+- **Returns:** The filename
+
+---
+
+
+## Example: Roundtrip with IOBuffer
+
+Note that a roundtrip through write and read is not stable: because of the way `JSON3` parses records into dictionaries, the values read back are not guaranteed to compare equal to the originals.
+
+```julia
+using BazerUtils, JSON3
+data = [Dict("a"=>1), Dict("b"=>2)]
+buf = IOBuffer()
+for obj in data
+    println(buf, JSON3.write(obj))  # one JSON value per line
+end
+seekstart(buf)
+read_data = read_jsonl(buf)
+@assert length(read_data) == length(data)  # `read_data == data` may not hold
+```
+
+---
+
+## See Also
+
+- [`JSON3.jl`](https://github.com/quinnj/JSON3.jl): Fast, flexible JSON parsing and serialization for Julia.
+- [`CodecZlib.jl`](https://github.com/JuliaIO/CodecZlib.jl): Gzip compression support.
+
+---
+
+For more advanced usage and performance tips, see the main documentation and function docstrings.
\ No newline at end of file
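
The line rules listed in the new manual page (one JSON value per line, blank lines ignored, `\n` or `\r\n` as separator, optional final newline) can be illustrated with the standard library alone. The sketch below is an assumption-laden illustration, not the BazerUtils implementation: `raw_jsonl_records` is a hypothetical helper that only splits the input into raw record strings and does not parse the JSON.

```julia
# Hypothetical helper (not part of BazerUtils): collect the raw JSON text of
# each record, applying the line rules described in the manual page.
function raw_jsonl_records(source::IO)
    records = String[]
    for line in eachline(source)          # drops a trailing "\n" or "\r\n"
        isempty(strip(line)) && continue  # blank lines are ignored
        push!(records, line)
    end
    return records
end

# Mixed separators, one blank line, and no newline after the last record.
buf = IOBuffer("{\"a\": 1}\r\n\n[1, 2, 3]\n\"hello\"")
records = raw_jsonl_records(buf)
```

Each element of `records` is a string that a JSON parser such as `JSON3.read` could then turn into a value; keeping the splitting step separate makes the format rules easy to check in isolation.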