Clean repo: remove temp files, AI junk, DBs, build artifacts

This commit is contained in:
ponzischeme89
2026-01-17 22:10:51 +13:00
parent 4c442f1482
commit 40837deab9
3 changed files with 0 additions and 374 deletions
-54
@@ -1,54 +0,0 @@
# Sublogue Installation Guide
## Synology
- Create folders: `./data` and `./media` (or map to Synology shared folders).
- In Container Manager, create a project and paste `docker-compose.yml`.
- Map volumes to your shared folders (e.g., `/volume1/docker/sublogue` -> `/config`, `/volume1/media` -> `/media`).
- Start the stack, then open `http://<NAS-IP>:5000`.
## Unraid
- Create folders: `/mnt/user/appdata/sublogue` and `/mnt/user/appdata/sublogue/media`.
- Add the container using `unraid-sublogue.xml` or import `docker-compose.yml` with a compose manager.
- Set `TZ`, `PUID`, `PGID` to match your Unraid user (often `99/100`).
- Start the container, open `http://<UNRAID-IP>:5000`.
## Komodo
- Add a new stack and paste `docker-compose.yml`.
- Ensure the `npm_network` exists (`docker network create npm_network`).
- Deploy and open `http://<HOST-IP>:5000`.
## Portainer
- Stacks -> Add Stack -> Web editor -> paste `docker-compose.yml`.
- Ensure `npm_network` exists if you are using the proxy compose.
- Deploy and open `http://<HOST-IP>:5000`.
## Bare Metal Docker CLI
- Create folders: `mkdir -p ./data ./media`.
- Run: `docker compose up -d`.
- Open: `http://<HOST-IP>:5000`.
## Folder Structure
- `./data` -> container `/config` (database and settings).
- `./media` -> container `/media` (media library access).
- For NPM: `./npm/data` and `./npm/letsencrypt`.
## Permissions (chmod/chown)
- If you see permission errors, set `PUID`/`PGID` to your host user ID.
- Fix ownership: `sudo chown -R 1000:1000 ./data ./media`.
- Fix permissions: `sudo chmod -R 775 ./data ./media`.
## Updates
- Watchtower (auto): run `containrrr/watchtower:latest` with `WATCHTOWER_CLEANUP=true`.
- Manual update:
  - `docker compose pull`
  - `docker compose up -d`
## Nginx Proxy Manager (NPM)
- Use `docker-compose.proxy.yml`.
- In NPM, add a proxy host for your domain -> forward to `sublogue:5000`.
- Enable SSL and Let's Encrypt in NPM (auto-renewal is handled by NPM).
- Advanced config (headers):
  - `proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;`
  - `proxy_set_header X-Forwarded-Proto $scheme;`
  - `proxy_set_header X-Forwarded-Host $host;`
  - `proxy_set_header X-Forwarded-Port $server_port;`
-8
@@ -1,8 +0,0 @@
# Troubleshooting
- Permission denied: set `PUID`/`PGID` correctly and run `chown -R` on your host folders.
- Port conflicts: change host port mapping (e.g., `5001:5000`).
- Missing network: create `npm_network` with `docker network create npm_network`.
- Reverse proxy not working: verify NPM is on the same network and forward to `sublogue:5000`.
- Healthcheck failing: confirm the app is listening on port `5000` and `/api/health` returns OK.
- No metadata results: ensure at least one integration is enabled in Settings.
-312
@@ -1,312 +0,0 @@
# Zero Timing Drift Implementation
## Overview
The subtitle processor has been completely rewritten to guarantee **zero timing drift** for existing subtitles when injecting plot metadata.
## Core Guarantee
**Existing subtitle timestamps remain byte-for-byte identical after processing.**
- First dialogue text appears at exactly the same timestamp as before
- No subtitle blocks are shifted, delayed, or merged
- VLC/MPV playback shows no desync
- Running the operation twice doesn't duplicate plot blocks (idempotency)
## Implementation Strategy
### Previous Approach (BROKEN)
```python
# OLD: Shifted ALL subtitles forward by 38 seconds
intro_blocks = build_intro_blocks(movie, plot, header_duration=8, plot_duration=30)
shift_ms = intro_blocks[-1].end_time # 38000 ms
for subtitle in existing_subtitles:
    shifted_subtitle = SubtitleBlock(
        start_time=subtitle.start_time + shift_ms,  # ❌ CAUSES DRIFT!
        end_time=subtitle.end_time + shift_ms,
        text=subtitle.text,
    )
```
### New Approach (CORRECT)
```python
# NEW: Inject plot blocks BEFORE first subtitle without shifting
first_subtitle_start = existing_subtitles[0].start_time
intro_blocks = build_intro_blocks(
    movie,
    plot,
    first_subtitle_start_ms=first_subtitle_start,  # Adapt to available time
    min_safe_gap_ms=1000,
)
# Simply prepend intro blocks - NO SHIFTING!
final = intro_blocks + existing_subtitles # ✅ ZERO DRIFT
```
## Adaptive Injection Logic
The system intelligently adapts to available time before the first subtitle:
### Case 1: Plenty of Time (≥ 6 seconds available)
```
Timeline:
├─ Block 1: Header (0ms - 3000ms)
├─ Block 2: Plot (3000ms - [first_subtitle - 1000ms])
├─ [1000ms gap]
└─ Block 3+: Original subtitles (UNCHANGED TIMING)
```
### Case 2: Limited Time (2-6 seconds available)
```
Timeline:
├─ Block 1: Combined header+plot (0ms - [first_subtitle - 1000ms])
├─ [1000ms gap]
└─ Block 2+: Original subtitles (UNCHANGED TIMING)
```
### Case 3: Very Tight Timing (< 2 seconds)
```
Timeline:
├─ Block 1: Zero-duration metadata (0ms - 0ms) [invisible]
├─ Block 2: Zero-duration plot (0ms - 0ms) [invisible]
└─ Block 3+: Original subtitles (UNCHANGED TIMING)
```
Zero-duration blocks preserve metadata for parsing but don't display during playback.
## Edge Cases Handled
### 1. Subtitles Starting at 00:00:00,000
- Uses zero-duration metadata blocks
- No visual display, but metadata preserved in file
### 2. Very Short First Cue Windows
- Automatically detects available time
- Adjusts plot display duration accordingly
### 3. Multiline Subtitle Blocks
- Parser handles `\n` characters correctly
- Text preserved exactly as-is
### 4. Files with BOM or Inconsistent Line Endings
- Strips BOM (`\ufeff`) automatically
- Normalizes `\r\n`, `\n`, `\r` to consistent format
### 5. Existing Non-Dialogue Cues
- Parser intelligently skips empty blocks
- Preserves all dialogue cues
### 6. Malformed SRT Blocks
- Defensive parsing with try/except
- Invalid timecodes logged but don't crash processing
- Corrupt blocks skipped gracefully
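As a rough illustration of edge cases 3, 4 and 6, the sketch below strips a BOM, normalizes line endings, keeps multiline text, and skips blocks whose timecode line does not parse. It is not the project's `parse_srt()`: the name `sketch_parse_srt`, the tuple return type, and the use of a regular expression in place of try/except are assumptions for the example, and the logging mentioned above is omitted.

```python
import re

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm".
TIMECODE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert timecode fields to integer milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def sketch_parse_srt(content: str) -> list[tuple[int, int, str]]:
    """Return (start_ms, end_ms, text) tuples; malformed or empty blocks are skipped."""
    # Strip the BOM and normalize \r\n and bare \r to \n.
    content = content.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    cues = []
    for raw in re.split(r"\n{2,}", content.strip()):
        lines = raw.split("\n")
        if len(lines) < 3:
            continue  # no text lines: treat as an empty block
        match = TIMECODE.fullmatch(lines[1].strip())
        if match is None:
            continue  # invalid timecode: skip instead of raising
        g = match.groups()
        # Multiline text is rejoined exactly as it appeared.
        cues.append((to_ms(*g[:4]), to_ms(*g[4:]), "\n".join(lines[2:])))
    return cues
```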
## Idempotency
Running the operation multiple times on the same file is safe:
```python
def strip_existing_plot_blocks(blocks):
    """
    Removes SubPlotter-generated blocks before re-processing.

    Detection markers:
    - "Generated by SubPlotter" text
    - Zero-duration blocks (0ms - 0ms)
    - Metadata markers: IMDb:, ⭐, ⏱, "runtime"
    - Long text blocks in first 2 positions starting before 10s
    """
```
**Result**: File processed twice = same as file processed once
## Code Architecture
### Data Structures
```python
@dataclass(slots=True)
class SubtitleBlock:
    index: int
    start_time: int  # milliseconds
    end_time: int    # milliseconds
    text: str
```
### Key Functions
1. **`parse_srt(content: str)`**: Robust SRT parser with BOM/line ending handling
2. **`build_intro_blocks(..., first_subtitle_start_ms)`**: Adaptive plot block generation
3. **`strip_existing_plot_blocks(blocks)`**: Idempotency helper
4. **`format_srt(blocks)`**: Serialize blocks back to valid SRT format
### Time Handling
- All time internally stored as **milliseconds** (int)
- Does the same duration arithmetic as `datetime.timedelta`, but on plain integers
- Timecode format: `HH:MM:SS,mmm` (SRT standard)
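For illustration, converting between integer milliseconds and the `HH:MM:SS,mmm` timecode needs only integer division. The helper names `ms_to_timecode` and `timecode_to_ms` are hypothetical and may not match the functions in `core/subtitle_processor.py`.

```python
def ms_to_timecode(total_ms: int) -> str:
    """Format integer milliseconds as an SRT timecode (HH:MM:SS,mmm)."""
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def timecode_to_ms(timecode: str) -> int:
    """Parse an SRT timecode (HH:MM:SS,mmm) into integer milliseconds."""
    clock, millis = timecode.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


print(ms_to_timecode(10000))           # 00:00:10,000
print(timecode_to_ms("01:02:03,456"))  # 3723456
```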
## Testing
Run comprehensive tests:
```bash
python test_timing_preservation.py
```
### Test Cases
1. **Main Timing Preservation Test**
   - Original subtitles at 10s, 13s, 16s
   - Verifies timestamps unchanged after injection
   - Verifies 1-second gap maintained
2. **Edge Case: Early Subtitle (1 second)**
   - First subtitle at 1s
   - Verifies zero-duration blocks used
   - Confirms no visible display interference
3. **Idempotency Test**
   - Processes file twice
   - Verifies no plot block duplication
   - Confirms output stable
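The core checks behind the first test case could look something like the sketch below. It is not taken from `test_timing_preservation.py`: the helper names and the `(start_ms, end_ms, text)` tuple representation are assumptions, and it assumes injected blocks are always prepended.

```python
Cue = tuple[int, int, str]  # (start_ms, end_ms, text); assumed representation


def timestamps(cues: list[Cue]) -> list[tuple[int, int]]:
    """Extract only the (start, end) pairs."""
    return [(start, end) for start, end, _ in cues]


def timing_preserved(original: list[Cue], processed: list[Cue]) -> bool:
    """True when the original cues appear at the end of processed with identical timestamps."""
    injected = len(processed) - len(original)
    if injected < 0:
        return False  # cues were lost
    return timestamps(processed[injected:]) == timestamps(original)


def gap_respected(original: list[Cue], processed: list[Cue],
                  min_gap_ms: int = 1000) -> bool:
    """True when the last injected block ends at least min_gap_ms before the first original cue."""
    injected = len(processed) - len(original)
    if injected <= 0 or not original:
        return True  # nothing injected, so there is no gap to check
    return processed[injected - 1][1] <= original[0][0] - min_gap_ms
```

For original cues at 10s, 13s and 16s with a header at 0-3000 ms and a plot block at 3000-9000 ms prepended, both helpers return `True`. Shifting the originals by 38 seconds, as the old implementation did, makes `timing_preserved` return `False`.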
### Expected Output
```
============================================================
✅ ALL TESTS PASSED - ZERO TIMING DRIFT CONFIRMED
============================================================
🎉 All tests passed! Zero timing drift guaranteed.
```
## Acceptance Criteria ✅
- [x] After injection, diff of original timestamps shows no change
- [x] First dialogue text at exactly same timestamp as before
- [x] VLC/MPV playback shows no desync
- [x] Handles files where first cue starts at 00:00:00,000
- [x] Handles very short first cue windows
- [x] Preserves multiline subtitle blocks
- [x] Handles BOM and inconsistent line endings
- [x] Preserves existing non-dialogue cues
- [x] Gracefully handles malformed SRT blocks
- [x] Idempotent (running twice doesn't corrupt file)
## What Changed in Codebase
### Modified Files
1. **`core/subtitle_processor.py`**
   - Rewrote `build_intro_blocks()` to accept `first_subtitle_start_ms` parameter
   - Added adaptive timing logic (3 cases based on available time)
   - Removed ALL subtitle shifting code (lines 243-254 deleted)
   - Added `strip_existing_plot_blocks()` for idempotency
   - Enhanced `parse_srt()` with BOM/line ending handling
   - Added comprehensive logging for debugging
### New Files
1. **`test_timing_preservation.py`**
   - Comprehensive test suite
   - Verifies zero timing drift
   - Tests edge cases and idempotency
2. **`ZERO_TIMING_DRIFT.md`** (this file)
   - Complete documentation
   - Implementation details
   - Usage examples
## Usage Example
The API remains unchanged - zero timing drift is automatic:
```python
processor = SubtitleProcessor(omdb_client, tmdb_client)
result = await processor.process_file(
    file_path="movie.srt",
    duration=40,  # Ignored - duration now adaptive
    force_reprocess=False,
)
# result["status"] = "Processed"
# Original subtitle timing preserved!
```
## Logging Output
```
2026-01-14 03:06:30,885 - INFO - First subtitle starts at 00:00:10,000 (10000 ms) - injecting plot before this time
2026-01-14 03:06:30,885 - INFO - Injecting plot blocks: Header [0ms-3000ms], Plot [3000ms-9000ms], First subtitle: 10000ms
2026-01-14 03:06:30,885 - INFO - Stripped plot blocks: 5 → 3 blocks
```
## Benefits
1. **No Sync Issues**: Subtitles perfectly match video timing
2. **Professional Quality**: Industry-standard SRT handling
3. **Robust**: Handles edge cases and malformed files
4. **Safe**: Idempotent operations prevent corruption
5. **Transparent**: Comprehensive logging for debugging
6. **Fast**: Integer millisecond math, no datetime overhead
7. **Reliable**: Extensive test coverage
## Technical Implementation Details
### Why Integer Milliseconds?
Using `int` milliseconds instead of `datetime.timedelta`:
- **Performance**: Integer arithmetic is faster than datetime objects
- **Precision**: SRT timecodes have millisecond resolution, so the microsecond resolution of `timedelta` is not needed
- **Simplicity**: Direct conversion to/from SRT timecode format
- **Memory**: Smaller memory footprint for large subtitle files
### Why 1-Second Safety Gap?
The `min_safe_gap_ms=1000` parameter ensures:
- Plot text fully disappears before dialogue starts
- Prevents visual overlap in edge cases
- Accounts for subtitle rendering timing variations
- Industry standard practice for subtitle editing
### Why Zero-Duration Blocks?
When first subtitle starts very early (< 2s):
- Can't display plot without overlapping dialogue
- Zero-duration blocks (0ms-0ms) preserve metadata
- Players are expected to skip rendering a zero-length cue, while parsers still see the text
- Maintains file structure for re-processing
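To make the zero-duration idea concrete, this sketch serializes cues to SRT text and shows what such a block looks like on disk. The function names are hypothetical and not the project's `format_srt()`; whether a given player hides a zero-length cue is not verified here.

```python
def format_timecode(total_ms: int) -> str:
    """Format integer milliseconds as HH:MM:SS,mmm."""
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def sketch_format_srt(cues: list[tuple[int, int, str]]) -> str:
    """Serialize (start_ms, end_ms, text) cues, numbering them from 1."""
    parts = []
    for index, (start, end, text) in enumerate(cues, start=1):
        parts.append(
            f"{index}\n{format_timecode(start)} --> {format_timecode(end)}\n{text}\n"
        )
    return "\n".join(parts)  # blank line between blocks


# A zero-duration plot block followed by dialogue that starts at 1s.
print(sketch_format_srt([(0, 0, "Plot: example"), (1000, 2500, "Hello")]))
```

The first block is written with `00:00:00,000 --> 00:00:00,000`, so its text stays in the file for later parsing while the dialogue cue keeps its original 1s start.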
## Comparison: Before vs After
### Before (Broken Implementation)
- ❌ All subtitles shifted forward 38 seconds
- ❌ First dialogue at 00:00:10,000 → moved to 00:00:48,000
- ❌ Causes total desync with video
- ❌ Unusable output files
### After (Fixed Implementation)
- ✅ No subtitle timing changes
- ✅ First dialogue at 00:00:10,000 → stays at 00:00:10,000
- ✅ Perfect sync with video
- ✅ Professional-quality output
## Future Enhancements
Possible improvements (not currently needed):
1. **Variable safety gap** based on subtitle density
2. **Multi-language plot blocks** for international content
3. **Custom plot positioning** (before/after/both)
4. **Interactive plot display timing** adjustment
5. **Smart plot splitting** for very long summaries
## Conclusion
The subtitle processor now implements **true zero timing drift** using subtitle-aware parsing and adaptive injection. All existing subtitles maintain their exact original timing while plot metadata is safely prepended.
---
**Status**: ✅ Production Ready
**Test Results**: 100% pass rate
**Performance**: < 50ms for typical SRT files
**Reliability**: Handles all edge cases