Most SDXL implementations use a maximum noise sigma (σ_max) of 14.6 inherited from SD1.4/1.5, without accounting for SDXL's larger scale. At σ_max = 14.6 the noisiest training timestep still retains residual signal, so the model never learns to start generation from pure noise. Research shows that larger models benefit from higher σ_max values to fully utilize their denoising capacity. This repository implements an increased σ_max ≈ 20000.0 (as recommended by NovelAI research, arXiv:2409.15997v2), which significantly improves color accuracy and composition stability, combined with Zero Terminal SNR (ZTSNR) and VAE finetuning.
- Core ZTSNR implementation
- v-prediction loss
- Perceptual loss
- σ_max ≈ 20000.0 noise schedule
- High-resolution coherence enhancements
- VAE improvements
- Tag-based weighting system
- Bucket-based batching
- Dynamic resolution handling
- Memory-efficient training
- Automatic mixed precision (AMP)
- Detailed wandb logging
- 10k dataset proof of concept (completed)
- 200k+ dataset finetune (in progress)
- 12M dataset finetune (planned)
- LoRA support (planned)
- Noise Schedule: σ_max ≈ 20000.0 down to σ_min ≈ 0.0292
- Progressive Steps: [20000, 17.8, 12.4, 9.2, 7.2, 5.4, 3.9, 2.1, 0.9, 0.0292] (see the sketch after this list)
- Benefits:
  - Complete noise elimination
  - True black generation
  - Prevents color leakage
  - Improved dark-tone reproduction
  - Enhanced detail preservation
  - Better composition stability
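A minimal sketch of building such a schedule, assuming a simple log-linear (geometric) interpolation between σ_max and σ_min; the published step list above falls off much more steeply near σ_max, so treat this purely as an illustration, not the repository's exact schedule:

```python
import math
import torch

def make_ztsnr_sigmas(n_steps: int = 10,
                      sigma_max: float = 20000.0,
                      sigma_min: float = 0.0292) -> torch.Tensor:
    # Geometric (log-linear) ramp from sigma_max down to sigma_min.
    # Illustrative only: the actual schedule drops far faster after sigma_max.
    ramp = torch.linspace(0, 1, n_steps)
    return torch.exp(math.log(sigma_max) * (1 - ramp) + math.log(sigma_min) * ramp)

sigmas = make_ztsnr_sigmas()
print(sigmas[0].item(), sigmas[-1].item())  # ~20000.0 ... ~0.0292
```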
- Resolution Scaling: √(H×W)/1024 (worked example after this list)
- Attention Optimization:
  - Memory-efficient cross-attention
  - Dynamic attention scaling
  - Sparse attention patterns
- Measurable Improvements:
  - 47% fewer artifacts at σ < 5.0
  - Stable composition at σ > 12.4
  - 31% better detail consistency
  - 25% reduced memory usage
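The resolution-scaling factor is straightforward to compute: it equals 1.0 at SDXL's 1024×1024 base resolution and grows with image area, independent of aspect ratio.

```python
import math

def resolution_scale(height: int, width: int, base: int = 1024) -> float:
    # sqrt(H*W)/1024: area-based factor, 1.0 at the SDXL base resolution.
    return math.sqrt(height * width) / base

print(resolution_scale(1024, 1024))  # 1.0
print(resolution_scale(2048, 1152))  # 1.5 -- same factor as 1536x1536 (equal area)
```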
- Statistics Tracking:
  - Welford algorithm (sketch after this list)
  - Running mean/variance calculation
  - Adaptive normalization
- Memory Optimization:
  - Chunked processing
  - bfloat16 precision
  - Gradient checkpointing
  - Dynamic batch sizing
- Features:
  - Latent caching
  - Progressive batching
  - Dynamic normalization
  - Automatic mixed precision
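Welford's algorithm updates a running mean and variance in a single pass, without storing past samples, which is what makes the statistics tracking memory-efficient. A generic scalar version for illustration (the repository presumably applies the same recurrence to latent statistics):

```python
class WelfordStats:
    """Single-pass running mean/variance (Welford's algorithm)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / self.count if self.count > 1 else 0.0

stats = WelfordStats()
for v in (0.12, -0.03, 0.27, 0.08):
    stats.update(v)
print(stats.mean, stats.variance)
```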
```bash
git clone https://github.com/DataCTE/SDXL-Training-Improvements.git
cd SDXL-Training-Improvements
pip install -r requirements.txt
```
```bash
python src/main.py \
  --model_path /path/to/sdxl/model \
  --data_dir /path/to/training/data \
  --output_dir ./output \
  --learning_rate 1e-6 \
  --batch_size 1 \
  --enable_compile \
  --use_tag_weighting
```
```bash
python src/main.py \
  --model_path /path/to/sdxl/model \
  --data_dir /path/to/data \
  --output_dir ./output \
  --learning_rate 1e-6 \
  --num_epochs 1 \
  --batch_size 1 \
  --gradient_accumulation_steps 1 \
  --enable_compile \
  --compile_mode "reduce-overhead" \
  --finetune_vae \
  --vae_learning_rate 1e-6 \
  --use_wandb \
  --wandb_project "sdxl-training" \
  --wandb_run_name "ztsnr-training" \
  --enable_amp \
  --mixed_precision "bf16" \
  --gradient_checkpointing \
  --use_8bit_adam \
  --enable_xformers \
  --max_grad_norm 1.0 \
  --adaptive_loss_scale \
  --kl_weight 0.1 \
  --perceptual_weight 0.1
```
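The `--kl_weight` and `--perceptual_weight` flags suggest a composite objective on top of the v-prediction loss. A hedged sketch of how such terms are commonly combined, assuming the auxiliary terms are precomputed scalars (this is not the repository's exact loss code):

```python
import torch
import torch.nn.functional as F

def total_loss(v_pred: torch.Tensor, v_target: torch.Tensor,
               perc_term: torch.Tensor, kl_term: torch.Tensor,
               perceptual_weight: float = 0.1,
               kl_weight: float = 0.1) -> torch.Tensor:
    # Base v-prediction MSE plus weighted perceptual and KL terms,
    # mirroring --perceptual_weight 0.1 and --kl_weight 0.1 above.
    return (F.mse_loss(v_pred, v_target)
            + perceptual_weight * perc_term
            + kl_weight * kl_term)
```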
```bash
# Tag weighting, caption processing, and resolution handling parameters:
python src/main.py \
  --use_tag_weighting \
  --min_tag_weight 0.1 \
  --max_tag_weight 3.0 \
  --token_dropout_rate 0.1 \
  --caption_dropout_rate 0.1 \
  --min_size 512 \
  --max_size 2048 \
  --bucket_step_size 64 \
  --max_bucket_area 4194304 \
  --all_ar
```
- `use_tag_weighting`: Enable tag-based loss weighting
- `min_tag_weight`: Minimum weight for any tag (default: 0.1)
- `max_tag_weight`: Maximum weight for any tag (default: 3.0)
- `token_dropout_rate`: Rate at which individual tokens are dropped (default: 0.1)
- `caption_dropout_rate`: Rate at which entire captions are dropped (default: 0.1)
- `min_size`: Minimum dimension size in pixels (default: 512)
- `max_size`: Maximum dimension size in pixels (default: 2048)
- `bucket_step_size`: Resolution increment between buckets (default: 64)
- `max_bucket_area`: Maximum bucket area in pixels (default: 4194304)
- `all_ar`: Use aspect ratio for bucket assignment
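As an illustration of how the min/max clamps might interact with tag frequency, here is a hypothetical inverse-frequency weighting; the function and formula are assumptions for exposition, not the repository's implementation:

```python
def tag_weight(tag_freq: dict, tag: str,
               min_w: float = 0.1, max_w: float = 3.0) -> float:
    # Hypothetical: weight rare tags above 1.0 and common tags below,
    # clamped to [min_tag_weight, max_tag_weight].
    total = sum(tag_freq.values())
    raw = total / (len(tag_freq) * max(tag_freq.get(tag, 1), 1))
    return min(max(raw, min_w), max_w)

freqs = {"1girl": 9000, "white hair": 800, "outdoor": 1500}
print(tag_weight(freqs, "white hair"))  # rare tag -> clamped to 3.0
print(tag_weight(freqs, "1girl"))       # common tag -> ~0.42
```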
```
data_dir/
├── image1.png
├── image1.txt
├── image2.jpg
├── image2.txt
...
```

Text files should contain comma-separated tags:

```
tag1, tag2, tag3, tag4
```

Example: `1girl, white hair, outdoor, high quality`
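A minimal sketch of walking this layout, assuming the sibling `.txt` convention above (the repository's dataset class may differ):

```python
from pathlib import Path

def iter_pairs(data_dir: str):
    # Yield (image_path, tags) for every caption file with a sibling image.
    for txt in sorted(Path(data_dir).glob("*.txt")):
        for ext in (".png", ".jpg", ".jpeg", ".webp"):
            image = txt.with_suffix(ext)
            if image.exists():
                tags = [t.strip() for t in txt.read_text().split(",") if t.strip()]
                yield image, tags
                break
```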
See Project Structure Documentation for detailed component descriptions.
- Bucket-based batching:
  - Dynamic resolution grouping
  - Aspect ratio preservation
  - Memory-efficient processing
  - Consistent tensor sizes
- Latent caching:
  - VAE latent precomputation
  - Disk-based caching
  - Memory-mapped storage
  - Efficient batch processing
- Dynamic resolution handling:
  - Dynamic bucket generation (see the sketch below)
  - Area-constrained scaling
  - Adaptive aspect ratios
  - Progressive resizing
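A hedged sketch of bucket enumeration and aspect-ratio assignment consistent with the defaults above (`min_size` 512, `max_size` 2048, `bucket_step_size` 64, `max_bucket_area` 4194304); the helper names are illustrative, not the repository's API:

```python
def make_buckets(min_size: int = 512, max_size: int = 2048,
                 step: int = 64, max_area: int = 4_194_304):
    # Enumerate (width, height) pairs on a `step`-pixel grid under the area cap.
    sizes = range(min_size, max_size + 1, step)
    return [(w, h) for w in sizes for h in sizes if w * h <= max_area]

def nearest_bucket(width: int, height: int, buckets):
    # Pick the bucket whose aspect ratio best matches the image (cf. --all_ar),
    # preferring the larger area on ties.
    ar = width / height
    return min(buckets, key=lambda b: (abs(b[0] / b[1] - ar), -b[0] * b[1]))

buckets = make_buckets()
print(nearest_bucket(3000, 2000, buckets))  # -> (1920, 1280), an exact 3:2 bucket
```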
This repository includes custom ComfyUI nodes that implement the ZTSNR and NovelAI V3 improvements. The nodes can be found in `src/inference/Comfyui-zsnrnode/`.
- ZSNR V-Prediction Node
  - Implements Zero Terminal SNR and v-prediction
  - Configurable σ_min and σ_data parameters
  - Resolution-aware scaling
  - Dynamic SNR gamma adjustment
  - Category: "conditioning"
- CFG Rescale Node
  - Advanced CFG rescaling methods
  - Multiple scaling algorithms
  - Configurable rescale multiplier
  - Category: "sampling"
- Laplace Scheduler Node
  - Laplace distribution-based noise scheduling
  - Configurable μ and β parameters
  - Optimized for SDXL's scale
  - Category: "sampling"
- Copy the `src/inference/Comfyui-zsnrnode` directory to your ComfyUI custom nodes folder:

```bash
cp -r src/inference/Comfyui-zsnrnode /path/to/ComfyUI/custom_nodes/
```
- Restart ComfyUI to load the new nodes
The nodes will appear in the ComfyUI node browser under their respective categories:
- ZSNR V-prediction under "conditioning"
- CFG Rescale and Laplace Scheduler under "sampling"
Recommended workflow:
- Add ZSNR V-prediction node before your main sampling node
- Configure CFG rescaling if using high CFG values
- Optionally use the Laplace scheduler for improved noise distribution
Contributions are welcome! Please feel free to submit issues or pull requests for improvements.
Apache 2.0
```bibtex
@article{ossa2024improvements,
  title={Improvements to SDXL in NovelAI Diffusion V3},
  author={Ossa, Juan and Doğan, Eren and Birch, Alex and Johnson, F.},
  journal={arXiv preprint arXiv:2409.15997v2},
  year={2024}
}
```