EDIT #1: replaced some dead links. It's rather perplexing to see fresh uploads die this quickly...maybe it has something to do with the hatred for all things AI-generated on Imgur, who knows.
EDIT #2: did some spring cleaning on this post while I was cleaning up my HDD and uploading the remaining few goodies worth posting; I have now reordered all pics by subject character, as it helps me resurrect pictures that are surprisingly quick to get deleted from Imgur...for some reason (despite being marked "Mature", hidden and not posted to the community, which is annoying). I'll be reupping those dead links whenever I drop by again; Imgur seems to have it in for Motoko Kusanagi and Milly Ashford especially - just to cite two that I've had to restore twice already.
EDIT #3: had further discussions with a fellow Stable Diffuser on the subject of multi-LoRA compositions and he advised me to switch to "Latent" mode in Regional Prompter and force SD's precision level to fp32 across the board. The results are fantastic (much better separation of character traits and higher detail even at low-res), but the cost to generation speed is, on the whole...just loathsome. Not that it matters much now; it's water under the bridge at this point. *shrugs* Anyway, I added two new pics at the bottom of the TLDR tags right above. Cheers.
EDIT #4: Oh. Em. Eff. Gee. I just discovered an extension called Multidiffusion Upscaler for Automatic1111, which looks like an instant, no-look-back replacement for Regional Prompter. Why in God's good name did I only learn of it now? I just need to test it with hires fix to make sure I'm not imagining things.
EDIT #5: well, I've got a new personal record for upscaling ratio: 2.5, up from 2, which allowed me to create a 1280x1920px image, bigger than the 1024x1536px I thought would be the hard limit for me. So...in the end, the MdU extension might be a smidge better than RP thanks to its improved VRAM management and memory savings, but it's still a lot more time-consuming than I'm comfortable with. Also, when it comes to multi-character compositions, it doesn't feel like it integrates the individual components into the greater whole with as much synergy and interaction as RP does; they feel a bit independent of one another as if they'd been inpainted separately, which...sometimes results in perspective issues (e.g. one individual being drawn a LOT bigger than the other, despite having same-sized regions specified in the prompt). It does, however, completely negate the risk of blending different characters into a single one, as RP is, most unfortunately, quite prone to do. I'm still seeing some strange outputs, though. I also added a 1920x1280px pic (a landscape variant of the 1280x1920 one) - the faces were then extracted to separate image files, run through a no-prompt img2img (I made the mistake of NOT enabling After Detailer during initial generation) and Photoshopped back in.
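For anyone wondering what that jump from 2 to 2.5 actually means in pixels, here's a quick back-of-the-envelope sketch. The 512x768 base resolution is my assumption, inferred from the output sizes quoted above (1024/2 and 1280/2.5 both give 512), and `hires_size` is just a made-up helper name, not anything from SD itself.

```python
def hires_size(base_w, base_h, scale):
    """Return the (width, height) you get by upscaling a base resolution by `scale`."""
    return round(base_w * scale), round(base_h * scale)

# Assumed base resolution, inferred from the sizes mentioned in the post.
BASE = (512, 768)

old = hires_size(*BASE, 2.0)   # the previous personal limit
new = hires_size(*BASE, 2.5)   # the new record

# How many more pixels the bigger image holds compared to the old limit.
pixel_ratio = (new[0] * new[1]) / (old[0] * old[1])

print(f"2.0x -> {old[0]}x{old[1]}")
print(f"2.5x -> {new[0]}x{new[1]}")
print(f"pixel count ratio: {pixel_ratio}")
```

The pixel count grows with the square of the ratio, which goes some way toward explaining why a seemingly small bump in the upscale factor hits an old PC so hard.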
EDIT #6: this will be my final contribution to this post aside from routine maintenance (curse you, Imgur deletions!). I tried the Multidiffusion extension and I found it a most interesting piece of optional software, in some ways better than Regional Prompter but in some ways more...lacking and irritating. The memory management system is impressive (my SD installation on my potato PC became less crash-prone) but Tiled VAE and Tiled Diffusion do make everything slow down to a crawl even after removing --medvram from SD's launch parameters. The upside? For the first time since I started diffusing, I've been able to generate 1920x1280px and 1280x1920px images - something that I thought was too much to ask from my aging PC until now. Furthermore, MD does allow more than two LoRAs per picture! Without using ControlNet and img2img inpainting? Foul witchcraft, I say! But it does. However, one ensuing problem that I've been grappling with for the last two days is that if the bounding boxes are too close to one another, the separate outputs tend to fade into each other instead of fusing like RP does. There are also perspective issues because MD doesn't allow you to specify the desired depth for each region (ControlNet can solve this, but it's more resource-taxing, naturally).
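To make the "bounding boxes too close" problem a bit more concrete, here's a rough sketch of how one could sanity-check a side-by-side layout before hitting Generate. Everything in it is hypothetical: the `Region` type, the idea of expressing boxes as fractions of the canvas, and especially the `MIN_GAP` threshold, which is an arbitrary number picked for illustration and not something the extension defines.

```python
from dataclasses import dataclass

@dataclass
class Region:
    """A rectangular region in canvas fractions (0.0 to 1.0). Hypothetical helper type."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

def horizontal_gap(a: Region, b: Region) -> float:
    """Gap between two side-by-side regions as a canvas fraction; negative means overlap."""
    left, right = sorted((a, b), key=lambda r: r.x)
    return right.x - (left.x + left.w)

MIN_GAP = 0.05  # arbitrary illustrative threshold, not a value taken from the extension

touching = (Region(0.0, 0.0, 0.5, 1.0), Region(0.5, 0.0, 0.5, 1.0))
spaced = (Region(0.0, 0.0, 0.4, 1.0), Region(0.6, 0.0, 0.4, 1.0))

for name, (a, b) in {"touching": touching, "spaced": spaced}.items():
    gap = horizontal_gap(a, b)
    verdict = "may bleed together" if gap < MIN_GAP else "probably fine"
    print(f"{name}: gap={gap:.2f} -> {verdict}")
```

It only looks at horizontal spacing, so it's meant for characters standing next to each other; a sensible gap for your own setup is something you'd have to find by trial and error.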
On a different topic, I'm sure some will ask, "your pics...why is it always Milly and Kallen together? Why not C.C. or Kaguya? It's kinda repetitive..." Well, here's the thing with Stable Diffusion: you start by testing small, basic prompts and figure out what works and what doesn't (SD doesn't use natural speech, by the way - it's all about how you parse the description); you take the prompt that seems to produce the best or most accurate results and underline the keywords (or strings thereof) that seem to make a difference (like "finely detailed face and eyes", "dramatic lighting", "from above", "medium shot" or "cowboy shot", etc.). Then you build upon it; you tweak a few things here and there and try to move out of your comfort zone. You test different viewing angles, poses, clothes, camera/atmospheric styles. It's an iterative exercise of trial and error, pure and simple...and I did a lot of it while trying to create a triple-LoRA image, which never seemed to quite work for me. Regional Prompter just spat out two characters, or one...or fused them all together into an eldritch, multilimbed and highly elastic creature. Two separate characters seemed to be the limit for THAT extension. As for my choice of character models...well, the Kallen and Milly LoRAs were of very good quality and I had some success with C.C., although I never quite managed to get her bored expression just right. Villetta and Cornelia were also very nice (though Villetta often showed disconnected ponytails and SD refused to get her skin tone right), but Lelouch...! Urgh, what a tricky bastard he was - either I got good outputs or very bad ones. Nothing in between.
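For the curious, the "start small, keep what works, then vary one thing at a time" routine described above can be sketched in a few lines. This is only an illustration, not my actual workflow script: the helper names are made up, the LoRA name `example_character` is a placeholder, and I'm assuming the usual Automatic1111 prompt conventions of `(keyword:weight)` for emphasis and `<lora:name:weight>` for loading a LoRA.

```python
def emphasize(keyword, weight=1.0):
    """Wrap a keyword in (keyword:weight) emphasis syntax; leave it bare at weight 1."""
    return keyword if weight == 1.0 else f"({keyword}:{weight})"

def lora_tag(name, weight=1.0):
    """Build a <lora:name:weight> tag. The name used below is a placeholder."""
    return f"<lora:{name}:{weight}>"

def build_prompt(subject_tags, style_tags, loras=()):
    """Join comma-separated keywords, then append any LoRA tags at the end."""
    parts = list(subject_tags) + list(style_tags)
    parts += [lora_tag(name, weight) for name, weight in loras]
    return ", ".join(parts)

# Keywords that seemed to make a difference, kept fixed between tests.
style = [emphasize("finely detailed face and eyes", 1.2), "dramatic lighting"]
loras = [("example_character", 0.8)]  # hypothetical LoRA name and weight

# Vary one thing at a time (here: the framing) and compare the outputs.
variants = [
    build_prompt([shot], style, loras)
    for shot in ("cowboy shot", "medium shot", "from above")
]

for prompt in variants:
    print(prompt)
```

Keeping the proven keywords in one list and changing a single element per batch makes it much easier to tell which tweak actually caused a difference.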
EDIT #7: Multidiffusion Upscaler (region prompt + tiled VAE options both activated) + ControlNet (either Depth, Normal or Segmentation) + Hires fix (4x-UltraSharp) is an absolutely brilliant combination (ControlNet's finally stable, I finally got Insightface to install from within SD's venv session and xformers to work properly). OpenPose is still a bit iffy (I sometimes got characters in the right pose, but facing the other way), but I've had fantastic results with Depth and Normal. Just look at three pictures (#1: C.C. on her bed; #2: Milly in the snow; #3: Lelouch + Kallen + Milly) to see the sheer possibilities this combo unlocks. It neatly solves the perspective issues I previously had with MultiDiffusion and Regional Prompter. The only issue with Depth and Normal is that you need a reference picture (for the character pose), and unless you have one on hand, you'll be forced to use additional software like VRoid Studio or PoseMy.art to create that ControlNet reference picture, therefore increasing your workload. Still, it was well worth the trouble for me. Also, on the topic of generation speed, it is highly advised to go into Settings > Hypertile and turn on all the checkboxes to shave off a massive amount of time (it could be up to four times faster!) but I have yet to see if it causes massive visual differences in the output. And that's it.
Voilà. I'm done. Folder's all cleared out.
Last but not least:
For the record, my latest multi-LoRA works have also seen me going back to Photoshop for some heavy post-processing/fine-tuning (I found it easier to run cropped pics of the faces, hands and bodies through After Detailer img2img and then copy-paste everything back together in Photoshop, rather than regenerating). A bit ironic, but not surprising.
Anyway, I've therefore uploaded a few more images as proof of my latest breakthrough, and I think it very fitting that these would be my last uploads (the final one being Lelouch sitting in a fauteuil, flanked by Kallen and Milly) for this post - multi-LoRA composition has long been a road fraught with failure and setback for me, and I'm happy, for the time being, to leave the visual AI field with my head held high and able to brag about finally overcoming a hurdle that long stood in my way.
But enough of my technical rambling...enjoy!