FLUX 2 vs FLUX SRPO, New FLUX Training Kohya SS GUI Premium App With Presets & Features

SECourses · 25 Nov 2025 · 19:23
TLDR: In this video, the presenter compares the new FLUX 2 model with FLUX SRPO, showcasing their respective features and improvements. The FLUX SRPO model is noted for being lightweight and highly realistic, ideal for users with lower VRAM. Meanwhile, FLUX 2, with its 32 billion parameters, offers better quality but requires much more computational power. New features in the Kohya SS GUI, such as FP8 scaling and optimized training configurations, are highlighted. The video also delves into practical aspects like model fine-tuning, image generation quality, and training performance.

Takeaways

  • 🚀 The new FLUX Kohya trainer by SECourses brings major improvements, enabling high-quality FLUX SRPO model training even on low-VRAM GPUs (as low as 6 GB).
  • 🧠 FLUX SRPO performs similarly to the FLUX Dev model while being significantly smaller and more hardware-friendly.
  • 📦 FLUX 2 was released recently, but its massive size (32B parameters, ~64 GB in BF16) makes local training difficult until quantized versions become widely available.
  • ⚙️ The updated Kohya trainer now supports Torch Compile, reducing VRAM usage and increasing training speed across all presets.
  • 📉 FP8 Scaled LoRA training is now supported—offering almost identical quality to BF16 while dramatically reducing VRAM consumption and eliminating block swaps for certain configurations.
  • 🔧 DreamBooth currently cannot use FP8 Scaled training, but 32 GB configs fit in VRAM fully thanks to Torch Compile optimizations.
  • ⚡ New 80 GB GPU configuration provides major speed boosts; for example, training on an RTX 6000 Pro can cost as little as ~$3.30 for a full-quality fine-tune.
  • 🖼️ A new image preprocessing tool shows exactly how images are bucketed and transformed during training, helping users fix orientation and dataset issues beforehand.
  • 💾 The built-in FP8 Converter halves model size (22.2 GB → 11.1 GB) with no noticeable quality loss, enabling high-quality inference on 12 GB GPUs.
  • ⬇️ An improved Windows model downloader ensures accurate, hash-verified downloads of FLUX Dev, FLUX SRPO, and other supported models.
  • 🔍 Comparisons show FLUX SRPO, fine-tuned models, and FLUX 2 each excel in different areas—FLUX 2 has strong prompt following but requires heavy hardware.
  • 👑 For realism, Qwen Image Realism currently delivers the best results until FLUX 2 fine-tuning and LoRA workflows mature.
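
The hash-verified downloader mentioned in the takeaways is a good illustration of how duplicate downloads can be skipped safely. The snippet below is only a minimal sketch of that idea (the video does not show the downloader's actual code): a model file is re-downloaded only when it is missing or its SHA-256 digest does not match the expected one.

```python
import hashlib
from pathlib import Path

def needs_download(path: Path, expected_sha256: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file is missing or its SHA-256 does not match."""
    if not path.exists():
        return True
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in fixed-size chunks so multi-GB checkpoints never load into RAM at once.
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest() != expected_sha256
```

Chunked hashing keeps memory use flat even for the ~22 GB BF16 checkpoints discussed later, which is what makes verification cheap enough to run on every launch.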

Q & A

  • What is the FLUX SRPO model, and how does it compare to the FLUX 2 model?

    -The FLUX SRPO model is a realistic model with low VRAM requirements, requiring only 6 GB of VRAM for local training. It is smaller compared to the FLUX 2 model, which has 32 billion parameters and is significantly larger, requiring high-end hardware to run effectively.

  • What are the primary differences between FLUX 2 and FLUX SRPO in terms of hardware requirements?

    -FLUX 2 requires powerful hardware with large amounts of VRAM, especially for its 32 billion parameter model. In contrast, the FLUX SRPO model is designed to run with low VRAM (as low as 6 GB) and does not need high-end GPUs, making it more accessible for users with limited hardware.

  • How does the SECourses Premium Kohya Trainer enhance the FLUX training experience?

    -The SECourses Premium Kohya Trainer provides several updates, including improved GUIs, support for Torch Compile, FP8 Scaled training, and more efficient VRAM usage. These updates improve training speed, efficiency, and model quality, making it easier for users to train high-quality models with less powerful hardware.

  • What is the benefit of using FP8 Scaled training for LoRA and DreamBooth models?

    -FP8 Scaled training helps reduce VRAM usage and improves training speed. For LoRA models, FP8 Scaled models can run significantly faster with lower VRAM requirements, and for DreamBooth models, while only BF16 precision is supported, FP8 can still provide efficiency gains in other configurations.

  • How does the new 'Image Pre-processing' tool improve the FLUX training process?

    -The Image Pre-processing tool helps users prepare their datasets by processing images exactly as they will be used during training. It identifies potential issues like inaccurate orientation or problematic images, which can improve the overall quality and accuracy of the model.
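
The bucketing step this tool visualizes can be sketched in a few lines. This is not Kohya's exact resolution-bucket algorithm, just the general idea: preserve the image's aspect ratio, target a fixed pixel budget, and snap both sides to a step size (64 here is an assumed latent-grid constraint).

```python
def nearest_bucket(width: int, height: int, target_area: int = 1024 * 1024,
                   step: int = 64) -> tuple[int, int]:
    """Pick bucket dimensions that keep the aspect ratio, stay near the
    target pixel area, and are multiples of `step`."""
    aspect = width / height
    # Solve w*h ~= target_area with w/h ~= aspect, then snap to the grid.
    bucket_w = round((target_area * aspect) ** 0.5 / step) * step
    bucket_h = round((target_area / aspect) ** 0.5 / step) * step
    return bucket_w, bucket_h
```

For a 1920×1080 photo with a 1024×1024 budget this yields a 1344×768 bucket; the preprocessing tool then shows exactly how each image is resized and padded into its bucket, which is where orientation mistakes become visible.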

  • What is the significance of the '80 GB GPU configuration' in the SECourses Premium Kohya Trainer?

    -The 80 GB GPU configuration significantly speeds up the training process. It allows for faster batch processing and model training, especially when working with larger images or longer training durations. This configuration is ideal for users with high-end GPUs like the RTX 6000 Pro.

  • What are the benefits of converting FLUX DreamBooth models to FP8?

    -Converting FLUX DreamBooth models to FP8 Scaled precision reduces model size by half, making them easier to fit into GPUs with lower VRAM, such as 12 GB GPUs. Importantly, this conversion does not result in any noticeable loss of quality, making it a valuable tool for optimizing model performance.
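
The "Scaled" part of FP8 Scaled refers to storing each quantized tensor alongside a floating-point scale factor. Native FP8 types require PyTorch, so the sketch below uses NumPy with int8 purely to illustrate the same scale-and-cast idea; it is a conceptual example, not the converter's actual code.

```python
import numpy as np

def quantize_scaled(t: np.ndarray, qmax: float = 127.0):
    """Per-tensor scaled quantization: keep low-precision values plus one
    float scale, so dequantization is simply q * scale."""
    scale = float(np.abs(t).max()) / qmax or 1.0  # guard against a zero scale
    q = np.clip(np.round(t / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale
```

Each stored value now takes 1 byte instead of BF16's 2 bytes, which is the same halving the video reports (~22.2 GB → ~11.1 GB), and the per-element reconstruction error is bounded by the scale, which is why the quality loss is hard to notice.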

  • How does FLUX 2 compare to other models, such as Nano Banana and Seedream 4, in terms of image realism?

    -FLUX 2 offers better realism compared to Nano Banana and Seedream 4. The FLUX 2 Pro model produces highly detailed and realistic images, although certain elements (such as imaginary creatures) may appear unrealistic. Nano Banana, especially in its Pro version, shows inferior realism and lower image quality.

  • What is the advantage of using the FLUX FP8 Converter in the SECourses Premium Kohya GUI?

    -The FLUX FP8 Converter allows users to convert their FLUX DreamBooth models into FP8 Scaled versions, reducing their size without sacrificing quality. This tool makes it easier to fit models into GPUs with lower VRAM while maintaining high-quality outputs.

  • What steps should users follow when training with the SECourses Premium Kohya GUI?

    -Users should ensure their LoRA or DreamBooth training configurations are correctly loaded. They can then utilize the updated presets, which now support Torch Compile and FP8 Scaled precision for faster training. It's also important to follow the training tutorial steps and load the appropriate models, ensuring efficient use of VRAM and RAM for optimal results.

Outlines

  • 00:00

    ⚙️ Intro & FLUX SRPO + Kohya Trainer Updates

    The presenter introduces recent improvements to FLUX training using a locally-run, customized Kohya trainer (SECourses Premium). He shows images trained with the lightweight FLUX SRPO model—designed to be VRAM-efficient and realistic—and explains that the same presets work for both FLUX Dev and FLUX SRPO so users can train locally on modest GPUs (as low as ~6 GB VRAM). He highlights that training tutorials (DreamBooth, RunPod, MassedCompute, and LoRA tutorials) remain valid and available. Key technical changes include adding Torch Compile support across presets, on-the-fly conversion to FP8 Scaled for LoRA (similar to Musubi Tuner) which yields almost no quality loss versus BF16, and VRAM usage and block-swap improvements. The speaker demonstrates the massive difference in model size between FLUX SRPO and the new FLUX 2 Dev (the latter being extremely large — tens of GBs and billions of parameters — making it hard to run on consumer GPUs). He also shows concrete VRAM/block-swap behavior: non-FP8 setups require many block swaps (slow), while FP8 Scaled can reduce block-swap counts to zero on 24 GB GPUs, producing large speedups.

  • 05:05

    💾 DreamBooth, Configs, FP8 Converter & Performance Numbers

    This paragraph explains limitations and optimizations for different training modes. DreamBooth currently supports BF16 mixed-precision only (FP8 Scaled isn't supported for DreamBooth fine-tuning), but 32 GB configurations fit in VRAM thanks to Torch Compile. The author presents new high-memory configurations (including an 80 GB profile) and shares example throughput: an RTX 6000 Pro running batch=1 at 1024×1024 achieved ~1.7 seconds/iteration, and the example cost estimate on MassedCompute (with a coupon) is calculated to be only a few dollars for a short high-quality run. He describes a new FLUX FP8 Converter utility that converts trained BF16 DreamBooth models into Scaled FP8 — halving model size (e.g., ~22.2 GB → ~11.1 GB) while preserving visual quality. The paragraph also introduces an Image Pre-processing utility that replicates Kohya’s training preprocessing (bucketing, EXIF orientation, padding) so users can inspect how images will actually be used and identify problematic orientations or padding before training.
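
The quoted cost figure follows from simple arithmetic on the iteration speed. In the sketch below, the ~1.7 s/it number comes from the video, while the step count and hourly rental rate are illustrative assumptions (the exact MassedCompute pricing and coupon are not restated here):

```python
def training_cost(steps: int, sec_per_it: float, usd_per_hour: float) -> float:
    """Estimated GPU rental cost in USD for a fine-tuning run."""
    hours = steps * sec_per_it / 3600
    return hours * usd_per_hour

# ~1.7 s/it was measured on an RTX 6000 Pro at 1024x1024, batch size 1;
# 6000 steps and $1.20/hour are hypothetical placeholder values.
cost = training_cost(steps=6000, sec_per_it=1.7, usd_per_hour=1.20)  # ~$3.40
```

At these speeds a few-dollar total for a full-quality run is plausible, matching the ballpark the presenter reports.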

  • 10:07

    🧰 Presets, Downloader, Memory Efficiency & FLUX 2 Comparisons

    Here the presenter covers usability and tooling improvements: presets that start at 8 GB, optimized and memory-efficient model loading to reduce both VRAM and system RAM needs, CPU-based text-encoder caching to help low-VRAM GPUs, and an improved model downloader that bundles latest configs, DreamBooth/LoRA tabs, test prompts, and avoids duplicate downloads with hash verification. The downloader lets users choose between FLUX Dev, FLUX Krea Dev (not recommended for realism), and FLUX SRPO Realism (recommended for realistic results). He then compares local FLUX SRPO fine-tuned outputs with images generated by FLUX 2 (Pro/Dev) on the official playground: FLUX 2 is promising but not yet a drop-in replacement—its huge size and resource requirements limit practical local usage. He notes ComfyUI is adding support for quantized FLUX 2 models and promises future tutorials and automated presets/downloaders for easier adoption.

  • 15:08

    🔎 Cross-model Tests, Service Differences & Recommendation

    The final paragraph walks through comparative tests across providers and models. The presenter notes that third-party FLUX 2 deployments (e.g., Fal.ai) can produce lower quality than the official BFL playground and stresses using the highest resolution for best results. He demonstrates that models struggle with non-real or impossible prompt elements (unreal animals, etc.), explains that further speed improvements (e.g., faster LoRAs) are needed to make FLUX 2 practical for local fine-tuning, and shares quick comparisons with other models/services (Nano Banana, Seedream 4). While Seedream 4 and FLUX 2 show strong detail, the author currently recommends using Qwen Image Realism as the “king” for image realism until FLUX 2 fine-tuning/LoRA workflows mature. The video closes with standard channel asks (like & subscribe) and pointers to tutorials for learning more.

Keywords

  • 💡FLUX 2

    FLUX 2 is the newly published, large-generation model discussed in the video; it represents the next major FLUX release and was released just hours before the presenter recorded. In the script the presenter compares FLUX 2 (including FLUX 2 Pro and Dev variants) against smaller local models, noting that FLUX 2 is very large (32 billion parameters) and requires massive memory, so it is currently hard to run on consumer GPUs. The video frames FLUX 2 as promising for prompt-following and high-quality native generations, but not yet a drop-in replacement for the local training workflow described.

  • 💡FLUX SRPO

    FLUX SRPO is presented as a smaller, realism-focused FLUX model that is practical to run locally and train with limited VRAM. The script repeatedly references FLUX SRPO as the base used for local fine-tuning and demonstrations — the presenter trained images locally with the FLUX SRPO model and shows that it achieves high realism while being much lighter than FLUX 2. In context, FLUX SRPO is recommended for realism use-cases and for users with modest hardware (as low as 6 GB VRAM).

  • 💡FLUX Dev

    FLUX Dev refers to another FLUX model variant that is described as more general-purpose or stylized (3D, anime, stylization) compared to the realism-focused SRPO. The presenter explains that the SRPO model is essentially the same as the Dev model in behavior but optimized for realism and lower VRAM usage; presets are said to work on both FLUX Dev and FLUX SRPO. Examples in the script show the presenter using FLUX Dev presets for base generations and comparing results across models.

  • 💡SECourses Premium Kohya Trainer

    The SECourses Premium Kohya Trainer is the presenter's forked and enhanced Kohya training GUI and script bundle that adds features, presets and usability improvements. In the video the presenter demonstrates new version 35 features of this trainer (GUI, presets, FP8 conversion, image pre-processing, Torch Compile support), showing how it helps users train locally with improved speeds and lower memory requirements. The trainer is positioned as the central tool for running DreamBooth and LoRA workflows with the FLUX models in the demo.

  • 💡LoRA

    LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method referenced throughout the script; the presenter describes separate LoRA presets and a LoRA tab in the GUI. The video explains practical LoRA features added to the trainer — such as FP8 Scaled support, optimized VRAM usage, and multipreset configurations — and shows how LoRA presets can be tuned for different VRAM targets (e.g., 24 GB vs 32 GB). In the script, LoRA examples are used to demonstrate speed and memory trade-offs and training-quality comparisons (BF16 vs FP8 Scaled).

  • 💡DreamBooth

    DreamBooth is a fine-tuning approach for personalizing models (often used for subject-driven generation) and is implemented as a separate tab in the presenter's GUI. The script notes that DreamBooth currently supports BF16 mixed-precision training (not FP8 Scaled) and emphasizes the importance of loading the DreamBooth tab correctly to perform fine-tuning. The presenter refers to earlier tutorials (Windows, RunPod, MassedCompute) that cover DreamBooth training and clarifies how the new presets integrate with that workflow.

  • 💡FP8 Scaled

    FP8 Scaled is a quantization/precision format the presenter adds support for in LoRA training and model conversion to reduce VRAM and disk size. In the video the presenter shows that FP8 Scaled LoRA weights produce almost no measurable quality difference compared to BF16 while significantly reducing memory usage (for example, halving model size when converting from BF16 to FP8 Scaled). The script uses FP8 Scaled to demonstrate practical speed and VRAM improvements (such as eliminating block swap counts on 24 GB GPUs).

  • 💡BF16

    BF16 (bfloat16) is a numeric precision format commonly used for training large models; the script uses BF16 as the baseline precision for many training examples. The presenter compares BF16 results against FP8 Scaled equivalents, showing minimal visual quality differences in generated grids while highlighting that BF16 remains the required format for some modes like DreamBooth fine-tuning. BF16 is therefore described as the reliable default precision the training pipeline supports.

  • 💡Torch Compile

    Torch Compile is a PyTorch feature the presenter has enabled across new presets to improve runtime speed and reduce VRAM usage. In the script the presenter explicitly points out that all new presets support Torch Compile and that this contributes to better memory efficiency (for example, reducing the text encoder and model memory footprint). The practical effect described is faster training steps and the ability to fit larger configurations onto the same GPU hardware.
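
In user code, enabling Torch Compile is essentially a one-line change. The sketch below uses the debug-oriented `"eager"` backend only so it runs without a C++ toolchain; the default `"inductor"` backend is what performs the actual kernel fusion and delivers the speedups the presets presumably rely on.

```python
import torch

model = torch.nn.Linear(8, 8)

# torch.compile wraps the module; optimization happens lazily on the first call.
# backend="eager" skips code generation so this sketch runs anywhere torch does;
# the default backend ("inductor") is what provides real fusion and speedups.
compiled = torch.compile(model, backend="eager")

out = compiled(torch.randn(2, 8))  # same results as calling `model` directly
```

Because compilation happens on the first call, the first training step is slower and subsequent steps benefit, which is consistent with per-preset toggles rather than always-on behavior.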

  • 💡VRAM

    VRAM (video RAM) refers to the GPU memory available for model loading and training; it is a repeated concern throughout the video when discussing what users can run locally. The presenter gives concrete VRAM-targeted presets (8 GB, 12 GB, 24 GB, 32 GB, 80 GB), explains how FP8 Scaled and Torch Compile reduce VRAM needs, and shows example timings for RTX-class GPUs to help viewers choose appropriate configurations. VRAM considerations determine whether a model fits in GPU memory (eliminating swap) and directly affect training speed and feasibility.

  • 💡FP8 Converter

    The FLUX FP8 Converter is a utility in the presenter's GUI that converts trained BF16 DreamBooth models into FP8 Scaled versions to reduce storage and VRAM footprint. The video highlights that converting a typical 22.2 GB BF16 model to FP8 Scaled results in an ~11.1 GB model without noticeable quality loss in the presenter’s tests. This converter is presented as a major convenience for users who want trained models to run on 12 GB GPUs or otherwise save disk space while preserving visual fidelity.

  • 💡Image Pre-processing

    Image Pre-processing is a GUI utility that shows exactly how training images will be transformed before being fed to Kohya during training (padding, resizing, orientation fixes, bucketing, etc.). The presenter demonstrates that pre-processing helps users spot problematic images (wrong orientation, aspect-ratio issues) that would otherwise degrade training quality and recommends using this tool to inspect buckets and corrected outputs. In short, image pre-processing is framed as a quality-control step that improves final model output by catching dataset issues early.

  • 💡Presets

    Presets are pre-configured training and generation settings included in the SECourses GUI to simplify training across different hardware targets and tasks (LoRA, DreamBooth, generation, upscale). The script describes updated presets for various VRAM levels (starting at 8 GB), for different precisions (BF16, FP8 Scaled), and for enabling Torch Compile — all intended to make training faster and more reliable with minimal manual tuning. Examples in the video include LoRA presets, DreamBooth presets, generation/upscale presets, and a downloader that installs the matching configs automatically.

  • 💡BFL (Black Forest Labs) playground

    The BFL playground (Black Forest Labs) is the online environment the presenter uses to compare model generations (for example, testing FLUX 2 Pro generations at 2048×2048). In the script the presenter runs identical prompts on the BFL playground for FLUX 2 and compares them to local FLUX SRPO or fine-tuned outputs to show differences in quality and prompt-following. The playground is therefore used as an external benchmark to evaluate native FLUX 2 output versus locally trained models.

Highlights

  • FLUX SRPO model allows for training with low VRAM usage, as low as 6 GB, while still producing high realism.

  • FLUX 2, a new and massive model with 32 billion parameters, offers high-quality generation but requires significant hardware.

  • The SECourses Premium Kohya Trainer has been updated to support Torch Compile, improving training efficiency.

  • New LoRA presets optimize VRAM usage and training speeds, significantly improving performance for GPUs with 24 GB of VRAM.

  • FP8 Scaled weights support in LoRA allows for faster training without compromising quality, as shown in BF16 versus FP8 comparisons.

  • DreamBooth fine-tuning with the SECourses GUI now supports both BF16 mixed precision and optimized VRAM usage for 32 GB GPUs.

  • New 80 GB GPU configurations offer significant speed-ups, reducing batch processing time to just 1.7 seconds per iteration for high-quality training.

  • FLUX FP8 Converter tool allows conversion of FLUX DreamBooth models into FP8 Scaled models, reducing model size by half without quality loss.

  • Image Pre-processing tool in the SECourses GUI allows users to visualize how training images will be processed, helping to identify image quality issues.

  • The new FLUX model downloader makes it easy to select and download the appropriate models without duplication or errors.

  • FLUX 2 has better prompt-following capabilities but requires more time and resources compared to the FLUX SRPO model.

  • Although FLUX 2's quality is promising, its massive size and hardware demands make it unsuitable for casual or low-end users at this stage.

  • FLUX SRPO fine-tuned models achieve high realism, particularly in human portrait generation, showcasing the model's strength in realistic rendering.

  • Despite FLUX 2's impressive realism, third-party providers like Fal.ai struggle to match the model's performance when compared to BFL's platform.

  • Qwen Image Realism currently stands as the leading choice for high-quality image realism until FLUX 2 fine-tuning capabilities are fully developed.