2026-03-24-log-rotation-design.md (7601B)
1 # Log Rotation Design 2 3 **Date:** 2026-03-24 4 **Scope:** Size-based log rotation for FileSink 5 **Version target:** v0.11.1 (non-breaking, additive) 6 7 ## Summary 8 9 Add opt-in size-based log rotation to `FileSink`. When a log file exceeds `max_size` bytes, it is closed, renamed with a timestamp suffix, and a fresh file is opened. Old rotated files can be automatically cleaned up via `max_files`. 10 11 ## Public API 12 13 Two new kwargs on the convenience constructors of `custom_logger` (the `filename` and batch-mode methods). These kwargs are threaded into `FileSink` — the core `custom_logger(sink::LogSink; ...)` method does not need them since they live on the sink. 14 15 | Kwarg | Type | Default | Description | 16 |-------|------|---------|-------------| 17 | `max_size` | `Int` | `0` (disabled) | Max bytes per log file before rotation | 18 | `max_files` | `Int` | `0` (no limit) | Max rotated files to keep per log path. `0` = keep all | 19 20 Rotation is opt-in — `max_size=0` means no rotation (current behavior unchanged). 21 22 **Validation:** `max_size < 0` or `max_files < 0` throws `ArgumentError`. No minimum `max_size` enforced, but values under 1024 emit a warning. 23 24 ```julia 25 custom_logger("/tmp/myapp"; 26 create_log_files=true, 27 max_size=10_000_000, # 10 MB 28 max_files=5) # keep 5 rotated + 1 current 29 ``` 30 31 ## RotatableIO Wrapper 32 33 **Problem:** `FormatLogger` from LoggingExtras.jl captures its IO handle at construction time. After rotation replaces `sink.ios[i]`, the `FormatLogger` would still write to the old closed IO. Additionally, `format_cb` never sees the formatted output string, so there is no natural place to count bytes. 34 35 **Solution:** A thin IO wrapper that solves both problems: 36 37 ```julia 38 mutable struct RotatableIO <: IO 39 sink::FileSink 40 index::Int 41 bytes::Ref{Int} # shared counter (deduplicated like ios/locks) 42 end 43 44 function Base.write(rio::RotatableIO, data::Union{UInt8, AbstractVector{UInt8}}) 45 n = write(rio.sink.ios[rio.index], data) 46 rio.bytes[] += n 47 # Check rotation after write 48 if rio.sink.max_size > 0 && rio.bytes[] > rio.sink.max_size 49 rotate!(rio.sink, rio.index) 50 end 51 return n 52 end 53 54 # Forward other IO methods needed by FormatLogger 55 Base.flush(rio::RotatableIO) = flush(rio.sink.ios[rio.index]) 56 Base.isopen(rio::RotatableIO) = isopen(rio.sink.ios[rio.index]) 57 ``` 58 59 **How it works:** 60 - `FormatLogger` holds a `RotatableIO` instead of the raw file IO 61 - Every `write` delegates to `sink.ios[rio.index]` — always the current handle 62 - Byte counting is exact (counts actual bytes flowing through `write`) 63 - Rotation check happens in `write`, inside the existing lock 64 - No changes needed to `format_cb` or `custom_format` or any `format_log` method 65 66 ## Byte Tracking 67 68 Byte counting is exact — `RotatableIO.write` counts every byte as it flows through. No `stat()` calls, no approximation. 69 70 In single-file mode (IO deduplication), all `RotatableIO` wrappers that share the same file path share the same `Ref{Int}` counter. This mirrors the existing deduplication pattern for `ios` and `locks`: a `Dict(path => Ref{Int}(initial_size))` is built in the constructor, and each slot maps to its shared counter. 71 72 On construction, each counter is initialized to the current file size via `stat(path).size` (one-time cost at startup, handles append mode correctly). 73 74 ## FileSink Changes 75 76 ```julia 77 mutable struct FileSink <: LogSink 78 files::Vector{String} 79 ios::Vector{IO} 80 locks::Vector{ReentrantLock} 81 rios::Vector{RotatableIO} # wrapper IOs passed to FormatLogger 82 max_size::Int 83 max_files::Int 84 end 85 ``` 86 87 The constructor builds `rios` after `ios` and `locks`: 88 89 ```julia 90 # Deduplicated byte counters (one per unique path) 91 path_to_bytes = Dict(p => Ref{Int}(stat(p).size) for p in unique_paths) 92 rios = [RotatableIO(obj, i, path_to_bytes[files[i]]) for i in eachindex(files)] 93 ``` 94 95 `create_demux_logger` passes `sink.rios[io_index]` to `FormatLogger` instead of `sink.ios[io_index]`. 96 97 ## Rotation Mechanics 98 99 ### rotate! function 100 101 ```julia 102 function rotate!(sink::FileSink, trigger_index::Int) 103 path = sink.files[trigger_index] 104 105 # 1. Close the current IO for this path 106 old_io = sink.ios[trigger_index] 107 isopen(old_io) && close(old_io) 108 109 # 2. Rename to timestamp suffix 110 rotated_name = make_rotated_name(path) 111 mv(path, rotated_name) 112 113 # 3. Open fresh file 114 new_io = open(path, "a") 115 116 # 4. Update ALL slots sharing this path (handles IO dedup) 117 for j in eachindex(sink.ios) 118 if sink.files[j] == path 119 sink.ios[j] = new_io 120 end 121 end 122 123 # 5. Reset the shared byte counter 124 sink.rios[trigger_index].bytes[] = 0 125 126 # 6. Cleanup old files (done inline — fast for typical file counts) 127 if sink.max_files > 0 128 cleanup_rotated_files!(path, sink.max_files) 129 end 130 end 131 ``` 132 133 ### Thread safety 134 135 Rotation happens inside the existing `lock(sink.locks[i]) do ... end` block (via the `RotatableIO.write` method, which is called from within the locked `format_cb` closure). Since the lock is already held, rotation is atomic with respect to concurrent writers on the same stream. 136 137 For single-file mode, all streams share one lock, so rotation of the shared file is safe. The `rotate!` function updates ALL slots sharing the same path, preventing stale IO references. 138 139 ### Cleanup 140 141 `cleanup_rotated_files!(path, max_files)`: 142 1. Extract base name and extension from `path` 143 2. Glob for `base.*.ext` in the same directory 144 3. Sort by filename (timestamp suffix sorts chronologically) 145 4. Delete oldest files until count <= `max_files` 146 147 ## Timestamp Suffix Format 148 149 | State | Filename | 150 |-------|----------| 151 | Current | `app.log` | 152 | Rotated | `app.2026-03-24T14-30-00-123.log` | 153 154 Pattern: insert timestamp before the final extension. Format `yyyy-mm-ddTHH-MM-SS-sss` (includes milliseconds to avoid sub-second collisions). Uses hyphens instead of colons for filesystem compatibility. 155 156 For files without an extension (e.g., `myapp`), the suffix is appended: `myapp.2026-03-24T14-30-00-123`. 157 158 For multi-level extensions (e.g., `app_error.log`), the timestamp is inserted before `.log`: `app_error.2026-03-24T14-30-00-123.log`. 159 160 ## Testing Plan 161 162 ### Unit tests 163 - `RotatableIO`: write bytes, verify counter incremented correctly 164 - `rotate!` on a FileSink: verify old file renamed with timestamp, new file opened, counter reset 165 - `rotate!` in single-file mode: verify all slots updated, shared counter reset 166 - `rotate!` with `max_files=2`: verify only 2 rotated files kept after 3 rotations 167 - `rotate!` with `max_files=0`: verify all rotated files kept 168 - `make_rotated_name`: correct timestamp insertion for `.log`, no-extension, multi-extension 169 - Validation: `max_size=-1` throws `ArgumentError`, `max_files=-1` throws `ArgumentError` 170 171 ### Integration tests 172 - Create logger with `max_size=500`, write enough messages to trigger rotation, verify multiple files exist with timestamp suffixes 173 - Verify current log file is always the original filename 174 - Verify content continuity: messages before and after rotation are all present across files 175 - Verify log format is preserved after rotation (e.g., JSON lines still valid) 176 177 ### Edge cases 178 - `max_size=0`: no rotation (current behavior) 179 - Single-file mode with rotation: all levels rotate together since they share one IO 180 - Append to existing file: byte counter initialized to current file size, rotation triggers correctly 181 - Sub-second rotation: millisecond suffix prevents filename collisions 182 183 ## Out of Scope 184 185 - Time-based rotation 186 - Compression of rotated files (could be a future addition) 187 - Remote/network log sinks