Creating images from prompts with Stable Diffusion is great. But what if we could create videos too?
Videos are essentially a set of frames played one after another. Using Stable Diffusion's img2img mode, many people have started experimenting with turning videos into something more special by stylizing them frame by frame.
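The frame-by-frame idea can be sketched in a few lines. This is a minimal sketch, not a real pipeline: `stylize_frame` is a hypothetical stand-in for an actual img2img call (which would load a model and run on a GPU); here a trivial colour inversion plays that role.

```python
from typing import Callable, List
import numpy as np

def stylize_video(frames: List[np.ndarray],
                  stylize_frame: Callable[[np.ndarray], np.ndarray]) -> List[np.ndarray]:
    """Apply a per-frame stylizer (e.g. an img2img call) to every frame."""
    return [stylize_frame(frame) for frame in frames]

# Stand-in stylizer: inverts colours. A real setup would call img2img here,
# feeding each frame in as the init image.
frames = [np.full((2, 2, 3), i * 10, dtype=np.uint8) for i in range(3)]
styled = stylize_video(frames, lambda f: 255 - f)
print(styled[0][0, 0])  # first frame was all zeros, so it becomes all 255s
```

The key point is that every frame is processed independently, which is exactly what makes the results flicker and the GPU bill grow.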
Check out this example from the POV video,
or this diffused (is that even a word?) Billie Eilish video.
My takeaways from this:
Stable Diffusion video creation is almost there.
The generated videos are still glitchy because the models are not yet fully consistent across frames, even with the same configuration (seed, CFG scale, etc.), but it is only a matter of time.
Applying the model to every frame in full is an inefficient and expensive use of scarce GPU resources. A better approach would be to detect which regions actually change between frames and apply the model only to those parts.
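As a rough illustration of that idea, here is a minimal sketch of detecting changed regions with a simple per-pixel difference threshold. The function name and the threshold value are my own illustrative choices, not from any real pipeline; real systems would use something more robust, such as optical flow or background subtraction.

```python
import numpy as np

def changed_pixel_mask(prev_frame: np.ndarray, next_frame: np.ndarray,
                       threshold: int = 25) -> np.ndarray:
    """Return a boolean (H, W) mask of pixels that changed noticeably.

    Frames are (H, W, 3) uint8 arrays; the threshold is an illustrative value.
    """
    diff = np.abs(prev_frame.astype(np.int16) - next_frame.astype(np.int16))
    return diff.max(axis=-1) > threshold

# Toy example: two 4x4 frames that differ in a single pixel.
prev = np.zeros((4, 4, 3), dtype=np.uint8)
nxt = prev.copy()
nxt[0, 0] = [200, 200, 200]  # only this pixel changes

mask = changed_pixel_mask(prev, nxt)
print(mask.sum())  # prints 1: only the changed pixel is flagged
```

The model would then only need to regenerate the masked region, reusing the previous frame's output everywhere else.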
In the long term, this creates many opportunities for augmented reality. Can you imagine walking through Berlin's streets on a grey winter day and seeing everything as colourful and enjoyable?