使用Stable-Diffusion生成视频的完整教程

SD指南 2年前 (2023) Rui

1,520 0 0

本文是关于如何使用cuda和Stable-Diffusion生成视频的完整指南，将使用cuda来加速视频生成，并且可以使用Kaggle的TESLA GPU来免费执行我们的模型。

#install the diffuser package
#pip install --upgrade pip
!pip install --upgrade diffusers transformers scipy#load the model from stable-diffusion model card
import torch
from diffusers import StableDiffusionPipeline

from huggingface_hub import notebook_login

模型加载

模型的权重是是在CreateML OpenRail-M许可下发布的。这是一个开放的许可证，不要求对生成的输出有任何权利，并禁止我们故意生产非法或有害的内容。如果你对这个许可有疑问，可以看这里

https://huggingface.co/CompVis/stable-diffusion-v1-4

我们首先要成为huggingface Hub的注册用户，并使用访问令牌才能使代码工作。我们使用是notebook，所以需要使用notebook_login()来进行登录的工作

执行完代码下面的单元格将显示一个登录界面，需要粘贴访问令牌。

if not (Path.home()/.huggingface/token).exists(): notebook_login()

然后就是加载模型

model_id = "CompVis/stable-diffusion-v1-4"
device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to(device)

显示根据文本生成图像

%%time
#Provide the Keywords
prompts = [
"a couple holding hands with plants growing out of their heads, growth of a couple, rainy day, atmospheric, bokeh matte masterpiece by artgerm by wlop by alphonse muhca ",
"detailed portrait beautiful Neon Operator Girl, cyberpunk futuristic neon, reflective puffy coat, decorated with traditional Japanese ornaments by Ismail inceoglu dragan bibin hans thoma greg rutkowski Alexandros Pyromallis Nekro Rene Maritte Illustrated, Perfect face, fine details, realistic shaded, fine-face, pretty face",
"symmetry!! portrait of minotaur, sci - fi, glowing lights!! intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, 8 k ",
"Human, Simon Stalenhag in forest clearing style, trends on artstation, artstation HD, artstation, unreal engine, 4k, 8k",
"portrait of a young ruggedly handsome but joyful pirate, male, masculine, upper body, red hair, long hair, d & d, fantasy, roguish smirk, intricate, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha ",
"Symmetry!! portrait of a sith lord, warrior in sci-fi armour, tech wear, muscular!! sci-fi, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha",
"highly detailed portrait of a cat knight wearing heavy armor, stephen bliss, unreal engine, greg rutkowski, loish, rhads, beeple, makoto shinkai and lois van baarle, ilya kuvshinov, rossdraws, tom bagshaw, tom whalen, alphonse mucha, global illumination, god rays, detailed and intricate environment ",
"black and white portrait photo, the most beautiful girl in the world, earth, year 2447, cdx"
]

显示

%%time
#show the results
images = pipe(prompts).images
images#show a single result
images[0]

第一个文本：a couple holding hands with plants growing out of their heads, growth of a couple, rainy day, atmospheric, bokeh matte masterpiece 的图像如下

将生成的图像显示在一起

#show the results in grid
from PIL import Image
def image_grid(imgs, rows, cols):
w,h = imgs[0].size
grid = Image.new(RGB, size=(cols*w, rows*h))
for i, img in enumerate(imgs): grid.paste(img, box=(i%cols*w, i//cols*h))
return gridgrid = image_grid(images, rows=2, cols=4)
grid

#Save the results
grid.save("result_images.png")

如果你的GPU内存有限（可用的GPU RAM小于4GB），请确保以float16精度加载StableDiffusionPipeline，而不是如上所述的默认float32精度。这可以通过告诉扩散器期望权重为float16精度来实现:

%%time
import torch
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to(device)
pipe.enable_attention_slicing()images2 = pipe(prompts)
images2[0]

grid2 = image_grid(images, rows=2, cols=4)
grid2

如果要更换噪声调度器，也需要将它传递给from_pretrained:

%%time
from diffusers import StableDiffusionPipeline, EulerDiscreteSchedulermodel_id = "CompVis/stable-diffusion-v1-4"
# Use the Euler scheduler here instead
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
images3 = pipe(prompts)
images3[0][0]

#save the final output
grid3.save("results_stable_diffusionv1.4.png")

看看这图就是更换不同调度器的结果

#results are saved in tuple
images3[0][0]grid3 = image_grid(images3[0], rows=2, cols=4)
grid3

#save the final output
grid3.save("results_stable_diffusionv1.4.png")

查看全部图片

创建视频。

基本的操作已经完成了，现在我们来使用Kaggle生成视频

首先进入notebook设置:在加速器选择GPU，

然后安装所需的软件包

pip install -U stable_diffusion_videos

from huggingface_hub import notebook_login
notebook_login()
#Making Videos
from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch
#"CompVis/stable-diffusion-v1-4" for 1.4

pipeline = StableDiffusionWalkPipeline.from_pretrained(
"runwayml/stable-diffusion-v1-5",
torch_dtype=torch.float16,
revision="fp16",
).to("cuda")
#Generate the video Prompts 1
video_path = pipeline.walk(
prompts=[environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000,
environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000,
environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000,
environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000,
environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000],
seeds=[42,333,444,555],
num_interpolation_steps=50,
#height=1280, # use multiples of 64 if > 512. Multiples of 8 if < 512.
#width=720, # use multiples of 64 if > 512. Multiples of 8 if < 512.
output_dir=dreams, # Where images/videos will be saved
name=imagine, # Subdirectory of output_dir where images/videos will be saved
guidance_scale=8.5, # Higher adheres to prompt more, lower lets model take the wheel
num_inference_steps=50, # Number of diffusion steps per image generated. 50 is good default

)

将图像扩大到4k，这样可以生成视频

from stable_diffusion_videos import RealESRGANModel
model = RealESRGANModel.from_pretrained(nateraw/real-esrgan)
model.upsample_imagefolder(/kaggle/working/dreams/imagine/imagine_000000/, /kaggle/working/dreams/imagine4K_00)

为视频添加音乐

为视频增加音乐可以通过提供音频文件的将音频添加到视频中。

%%capture
! pip install youtube-dl
! youtube-dl -f bestaudio --extract-audio --audio-format mp3 --audio-quality 0 -o "music/thoughts.%(ext)s" https://soundcloud.com/nateraw/thoughtsfrom IPython.display import Audio

Audio(filename=music/thoughts.mp3)

这里我们使用youtube-dl下载音频（需要注意该音频的版权），然后将音频加入到视频中

# Seconds in the song.
audio_offsets = [7, 9]
fps = 8# Convert seconds to frames
num_interpolation_steps = [(b-a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]

video_path = pipeline.walk(
prompts=[blueberry spaghetti, strawberry spaghetti],
seeds=[42, 1337],
num_interpolation_steps=num_interpolation_steps,
height=512, # use multiples of 64
width=512, # use multiples of 64
audio_filepath=music/thoughts.mp3, # Use your own file
audio_start_sec=audio_offsets[0], # Start second of the provided audio
fps=fps, # important to set yourself based on the num_interpolation_steps you defined
batch_size=4, # increase until you go out of memory.
output_dir=dreams, # Where images will be saved
name=None, # Subdir of output dir. will be timestamp by default
)

最后是生成的视频的展示：

本文代码你可以在这里找到：

https://www.kaggle.com/code/rupakroy/stable-diffusion-videos/notebook