Output person identity and image quality are far behind InstantID and FaceID Plus #15

Open
flankechen opened this issue May 7, 2024 · 6 comments

@flankechen

Thanks for your work and code.
But to be honest, in my tests, person identity and image quality are far behind InstantID or FaceID Plus with the currently released model and weights.

Some of my tests:

  1. xujiaqi 许佳琪
    (two images attached)

  2. liuyifei 刘亦菲
    (two images attached)

@jshilong
Contributor

jshilong commented May 7, 2024

Thanks for your feedback.

FlashFace's main feature is precise control of the face through language (age, gender, accessories, expressions, etc.), but this also places higher demands on the prompt (for example, you can add "a beautiful young woman" to the prompt). You can also increase lamda_feat, face_guidence, and step_to_launch_face_guidence to obtain better face similarity.

If you share your prompt, I can test it. Providing a face box such as face_bbox = [0.3, 0.1, 0.6, 0.4] (a face box at the center) will also give better results.

You can also check #11 to see if it helps.
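To see where such a face box lands on the output image, the relative values can be converted to pixels. This is only a minimal sketch: it assumes the box is laid out as [x1, y1, x2, y2] in fractions of the image width and height, which the thread does not state explicitly. `bbox_to_pixels` and the 512x512 canvas are hypothetical and not part of FlashFace.

```python
from typing import Sequence, Tuple


def bbox_to_pixels(face_bbox: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a relative face box to pixel coordinates.

    Assumption (not confirmed in the thread): the box is [x1, y1, x2, y2],
    each value a fraction of the image width or height.
    """
    if len(face_bbox) != 4:
        raise ValueError("face_bbox must contain exactly four values")
    x1, y1, x2, y2 = face_bbox
    if not (0.0 <= x1 < x2 <= 1.0 and 0.0 <= y1 < y2 <= 1.0):
        raise ValueError("expected 0 <= x1 < x2 <= 1 and 0 <= y1 < y2 <= 1")
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))


# Hypothetical 512x512 canvas; the real output size may differ.
print(bbox_to_pixels([0.3, 0.1, 0.6, 0.4], 512, 512))
```

Under that assumption, a malformed box (for example x1 >= x2) is rejected before any generation time is spent.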

@jshilong
Contributor

jshilong commented May 7, 2024

Because there are few Asian faces in my training set, you can add "Asian" to the prompt and set lamda_feat = 1.2, face_guidence = 3, step_to_launch_face_guidence = 800, and face_bbox = [0.3, 0.2, 0.6, 0.5].

xujiaqi

# Reference images of the person (four photos).
# `generate` and `package_dir` are assumed to come from the FlashFace demo notebook.
from PIL import Image

face_imgs = [Image.open(f"{package_dir}/jiaqixu/{i+1}.png").convert("RGB") for i in range(4)]
need_detect = True
pos_prompt = 'A beautiful young asian woman, white skin on the street, sunny day, soft light'
# No negative prompt
neg_prompt = None
# Face position box
face_bbox = [0.3, 0.2, 0.6, 0.5]
# Larger values of these three parameters give more fidelity but less diversity
lamda_feat = 1.2
face_guidence = 3
step_to_launch_face_guidence = 800

steps = 30
default_text_control_scale = 7.5

default_seed = 0


imgs = generate(pos_prompt=pos_prompt, 
                    neg_prompt=neg_prompt, 
                    steps=steps, 
                    face_bbox=face_bbox,
                    lamda_feat=lamda_feat, 
                    face_guidence=face_guidence, 
                    num_sample=4, 
                    text_control_scale=default_text_control_scale, 
                    seed=default_seed, 
                    step_to_launch_face_guidence=step_to_launch_face_guidence, 
                    reference_faces=face_imgs,
                    need_detect=need_detect
                    )


# Build a contact sheet: reference faces in the first cell, generated images after it
img_size = imgs[0].size
num_imgs = len(imgs)
save_img = Image.new('RGB', (img_size[0] * (num_imgs + 1), img_size[1]))
for i, img in enumerate(imgs):
    save_img.paste(img, ((i + 1) * img_size[0], 0))

# Paste the reference face images into the first cell as a 2x2 grid of thumbnails

resize_w = img_size[0] // 2
resize_h = img_size[1] // 2

for idx, ref_img in enumerate(face_imgs):
    # Resize ref_img, keeping its aspect ratio, to fit within (resize_w, resize_h)
    w_ratio = resize_w / ref_img.size[0]
    h_ratio = resize_h / ref_img.size[1]
    ratio = min(w_ratio, h_ratio)
    ref_img = ref_img.resize(
        (int(ref_img.size[0] * ratio), int(ref_img.size[1] * ratio)))

    # First two thumbnails go in the top row, the rest in the bottom row
    if idx < 2:
        save_img.paste(ref_img, (idx * resize_w, 0))
    else:
        save_img.paste(ref_img, ((idx - 2) * resize_w, resize_h))

# display() is the notebook (IPython) helper
display(save_img)
[Screenshot, 2024-05-07 11:53:49 AM]

@jshilong
Contributor

jshilong commented May 7, 2024

liuyifei

# Reference images of the person (three photos).
# `generate` and `package_dir` are assumed to come from the FlashFace demo notebook.
from PIL import Image

face_imgs = [Image.open(f"{package_dir}/liuyifei/{i+1}.png").convert("RGB") for i in range(3)]
need_detect = True
pos_prompt = 'A beautiful young asian woman in the garden, sunny day'
# No negative prompt
neg_prompt = None
# Face position box
face_bbox = [0.3, 0.2, 0.6, 0.5]

# Larger values of these three parameters give more fidelity but less diversity
lamda_feat = 1.3
face_guidence = 2.3
step_to_launch_face_guidence = 600

steps = 25
default_text_control_scale = 7.5

default_seed = 0

imgs = generate(pos_prompt=pos_prompt, 
                    neg_prompt=neg_prompt, 
                    steps=steps, 
                    face_bbox=face_bbox,
                    lamda_feat=lamda_feat, 
                    face_guidence=face_guidence, 
                    num_sample=4, 
                    text_control_scale=default_text_control_scale, 
                    seed=default_seed, 
                    step_to_launch_face_guidence=step_to_launch_face_guidence, 
                    reference_faces=face_imgs,
                    need_detect=need_detect
                    )


# Build a contact sheet: reference faces in the first cell, generated images after it
img_size = imgs[0].size
num_imgs = len(imgs)
save_img = Image.new('RGB', (img_size[0] * (num_imgs + 1), img_size[1]))
for i, img in enumerate(imgs):
    save_img.paste(img, ((i + 1) * img_size[0], 0))

# Paste the reference face images (three here) into the first cell as a 2x2 grid of thumbnails

resize_w = img_size[0] // 2
resize_h = img_size[1] // 2

for idx, ref_img in enumerate(face_imgs):
    # Resize ref_img, keeping its aspect ratio, to fit within (resize_w, resize_h)
    w_ratio = resize_w / ref_img.size[0]
    h_ratio = resize_h / ref_img.size[1]
    ratio = min(w_ratio, h_ratio)
    ref_img = ref_img.resize(
        (int(ref_img.size[0] * ratio), int(ref_img.size[1] * ratio)))

    # First two thumbnails go in the top row, the rest in the bottom row
    if idx < 2:
        save_img.paste(ref_img, (idx * resize_w, 0))
    else:
        save_img.paste(ref_img, ((idx - 2) * resize_w, resize_h))

# display() is the notebook (IPython) helper
display(save_img)
[Screenshot, 2024-05-07 12:16:44 PM]
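Both snippets above repeat the same thumbnail logic: shrink each reference face to fit a half-size cell, then place it in a 2x2 grid. A minimal sketch of just that arithmetic, without Pillow, may make it easier to check. `fit_within` and `grid_offset` are hypothetical helper names, not part of FlashFace.

```python
from typing import Tuple


def fit_within(size: Tuple[int, int], max_w: int, max_h: int) -> Tuple[int, int]:
    """Scale (width, height) to fit inside (max_w, max_h), keeping the aspect ratio."""
    ratio = min(max_w / size[0], max_h / size[1])
    return int(size[0] * ratio), int(size[1] * ratio)


def grid_offset(idx: int, cell_w: int, cell_h: int) -> Tuple[int, int]:
    """Top-left corner of thumbnail idx in a 2x2 grid (two per row)."""
    if idx < 2:
        return idx * cell_w, 0
    return (idx - 2) * cell_w, cell_h


# Example: a 1024x512 reference photo placed into 256x256 cells.
print(fit_within((1024, 512), 256, 256))
print([grid_offset(i, 256, 256) for i in range(4)])
```

With three reference images, as in the liuyifei run, the fourth cell simply stays empty.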

@jshilong
Contributor

jshilong commented May 7, 2024

  1. In terms of image quality, FlashFace is heavily constrained by its SD 1.5 base, which lags significantly behind SDXL-based methods.

  2. Regarding Asian faces, the FlashFace training dataset contains few to none. Consequently, it may be necessary to add a face bounding box and increase parameters such as lamda_feat, face_guidence, and step_to_launch_face_guidence for better person identity.
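For reference, the two sets of values tried earlier in this thread can be collected side by side. This is only a sketch: `FaceSettings` and `PRESETS` are hypothetical names, not part of FlashFace, and the field names mirror the keyword arguments of the `generate(...)` calls shown above.

```python
from dataclasses import dataclass, asdict
from typing import Dict, List


@dataclass(frozen=True)
class FaceSettings:
    """Hypothetical container for the identity-related settings quoted in this thread."""
    lamda_feat: float
    face_guidence: float
    step_to_launch_face_guidence: int
    steps: int
    face_bbox: List[float]

    def as_kwargs(self) -> Dict[str, object]:
        """Return the settings as a plain dict of keyword arguments."""
        return asdict(self)


# Values copied from the two test runs in this thread.
PRESETS = {
    "xujiaqi": FaceSettings(lamda_feat=1.2, face_guidence=3,
                            step_to_launch_face_guidence=800, steps=30,
                            face_bbox=[0.3, 0.2, 0.6, 0.5]),
    "liuyifei": FaceSettings(lamda_feat=1.3, face_guidence=2.3,
                             step_to_launch_face_guidence=600, steps=25,
                             face_bbox=[0.3, 0.2, 0.6, 0.5]),
}

for name, preset in PRESETS.items():
    print(name, preset.as_kwargs())
```

Assuming the `generate` signature used in this thread, a preset could be passed as `generate(..., **PRESETS["xujiaqi"].as_kwargs())` together with the prompt and reference faces.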

@jshilong
Contributor

jshilong commented May 7, 2024

https://github.com/ali-vilab/FlashFace/blob/main/docs/zh_cn.md

I added some notes on hyperparameter tuning here; they might help you.
