BazerUtils.jl

Assorted Julia utilities including custom logging
# Working with JSON Lines Files

!!! warning "Deprecated"
    The JSONL functions in BazerUtils (`read_jsonl`, `stream_jsonl`, `write_jsonl`) are deprecated.
    Use [JSON.jl](https://github.com/JuliaIO/JSON.jl) v1 instead, which has native support:
    ```julia
    using JSON
    data = JSON.parsefile("data.jsonl"; jsonlines=true)   # read (JSON.parse expects JSON text, not a path)
    JSON.json("out.jsonl", data; jsonlines=true)          # write
    ```

---

## From the website: what is JSON Lines?

> JSON Lines (JSONL) is a convenient format for storing structured data that may be processed one record at a time. Each line is a valid JSON value, separated by a newline character. This format is ideal for large datasets and streaming applications.

For more details, see [jsonlines.org](https://jsonlines.org/).

---

## Legacy API (deprecated)

### `read_jsonl`

Reads the entire file or stream into memory and returns a vector of parsed JSON values.

```julia
using BazerUtils
data = read_jsonl("data.jsonl")                               # from a file path
data = read_jsonl(IOBuffer("{\"a\": 1}\n{\"a\": 2}\n"))       # from any IO stream
data = read_jsonl(IOBuffer("{\"a\": 1}\n{\"a\": 2}\n"); dict_of_json=true)
```

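For code that needs to move off `read_jsonl` without depending on a particular JSON.jl release, the eager read described above can be sketched by parsing one line at a time. This is a minimal illustration, not the BazerUtils implementation: `parse_jsonl_lines` is a hypothetical helper name, and the sketch assumes JSON.jl is installed and relies only on `JSON.parse` applied to a string.

```julia
using JSON  # assumption: JSON.jl is available in the active environment

# Hypothetical helper (not part of BazerUtils or JSON.jl):
# eagerly read every non-blank line of `io` and parse it as one JSON value.
function parse_jsonl_lines(io::IO)
    records = Any[]
    for line in eachline(io)
        isempty(strip(line)) && continue   # tolerate blank lines
        push!(records, JSON.parse(line))   # one JSON value per line
    end
    return records
end

records = parse_jsonl_lines(IOBuffer("{\"a\": 1}\n{\"a\": 2}\n"))
```

Because the whole input ends up in `records`, memory use grows with file size, which is the trade-off the streaming variant is meant to avoid.
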
### `stream_jsonl`

Creates a lazy iterator (a `Channel`) that yields one parsed JSON value at a time.

```julia
# Process records one at a time without loading the whole file
for record in stream_jsonl("data.jsonl")
    println(record)
end

# Take only the first 10 records
first10 = collect(Iterators.take(stream_jsonl("data.jsonl"), 10))
```

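To make the idea of a `Channel`-backed lazy iterator concrete, here is a rough sketch of how one could be built with Base and JSON.jl. It is an assumption about the general technique, not a copy of how `stream_jsonl` is written; `lazy_jsonl` is a hypothetical name, and the temporary file exists only for the demonstration.

```julia
using JSON  # assumption: JSON.jl is available in the active environment

# Hypothetical helper: a producer task reads the file line by line and
# hands each parsed record to the consumer through an unbuffered Channel.
function lazy_jsonl(path::AbstractString)
    Channel() do ch
        open(path) do io
            for line in eachline(io)
                isempty(strip(line)) && continue
                put!(ch, JSON.parse(line))   # blocks until a consumer takes it
            end
        end
    end
end

# Demonstration on a small temporary file
path = tempname() * ".jsonl"
write(path, "{\"i\": 1}\n{\"i\": 2}\n{\"i\": 3}\n")
first2 = collect(Iterators.take(lazy_jsonl(path), 2))
```

In this sketch only one record is parsed ahead of the consumer. Note that stopping early, as `Iterators.take` does here, leaves the producer task suspended with the file still open until the channel is closed or garbage collected.
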
### `write_jsonl`

Writes an iterable of JSON-serializable values to a JSONL file.

```julia
write_jsonl("out.jsonl", [Dict("a"=>1), Dict("b"=>2)])
write_jsonl("out.jsonl.gz", (Dict("i"=>i) for i in 1:100); compress=true)
```

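Writing JSONL amounts to serializing each value on its own line. The sketch below shows that idea for the uncompressed case only, assuming JSON.jl is installed; `write_jsonl_lines` is a hypothetical helper, not the BazerUtils function, and it does not attempt the `compress=true` behavior. It accepts any iterable, including a generator, so records need not be materialized first.

```julia
using JSON  # assumption: JSON.jl is available in the active environment

# Hypothetical helper: write one compact JSON value per line.
function write_jsonl_lines(path::AbstractString, records)
    open(path, "w") do io
        for rec in records
            println(io, JSON.json(rec))   # JSON.json(x) returns a one-line string
        end
    end
    return path
end

out = write_jsonl_lines(tempname() * ".jsonl", (Dict("i" => i) for i in 1:3))
lines = readlines(out)   # read back the raw lines to inspect the result
```
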
---

## See Also

- [`JSON.jl`](https://github.com/JuliaIO/JSON.jl): The recommended replacement. Use `jsonlines=true` for JSONL support.
- [`CodecZlib.jl`](https://github.com/JuliaIO/CodecZlib.jl): Gzip compression support.