Annotate photos using the Google Cloud Vision API #14

simonw · 2020-04-28T18:09:03Z

It can detect faces, run OCR, do image labeling (it knows what a lemur is!) and do object localization where it identifies objects and returns bounding polygons for them.

simonw · 2020-04-28T18:09:21Z

Pricing is pretty good: free for the first 1,000 calls per month, then $1.50 per thousand after that.
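
As a rough back-of-envelope check, the pricing above can be turned into a small estimator. This is only a sketch: it assumes billing is linear (pro-rated per call rather than in blocks of a thousand) and that each call requests a single feature. The function name `estimate_monthly_cost` is made up for illustration.

```python
# Figures taken from the pricing note above; the linear pro-rating is an assumption.
FREE_CALLS_PER_MONTH = 1_000
PRICE_PER_THOUSAND_USD = 1.50


def estimate_monthly_cost(calls: int) -> float:
    """Return an estimated monthly cost in USD for the given number of calls."""
    billable = max(0, calls - FREE_CALLS_PER_MONTH)
    return billable * PRICE_PER_THOUSAND_USD / 1_000


for calls in (500, 1_000, 10_000, 100_000):
    print(f"{calls:>7,} calls -> ${estimate_monthly_cost(calls):,.2f}")
```

Under those assumptions a library of 10,000 photos costs about $13.50 to annotate in a single month.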

simonw · 2020-04-28T18:12:34Z

Python library docs: https://googleapis.dev/python/vision/latest/index.html

I'm creating a new project for this called simonwillison-photos: https://console.cloud.google.com/projectcreate

https://console.cloud.google.com/home/dashboard?project=simonwillison-photos

Then I enabled the Vision API. The direct link provided in the docs (https://console.cloud.google.com/flows/enableapi?apiid=vision-json.googleapis.com) didn't work: it gave me a "You don't have sufficient permissions to use the requested API" error. Starting at the "Enable APIs" page and searching for it worked fine.

I created a new service account as an "owner" of that project: https://console.cloud.google.com/apis/credentials/serviceaccountkey (and complained about it on Twitter and through their feedback form)

pip install google-cloud-vision

from google.cloud import vision
client = vision.ImageAnnotatorClient.from_service_account_file("simonwillison-photos-18c570b301fe.json")
# Photo of a lemur
response = client.annotate_image(
    {
        "image": {
            "source": {
                "image_uri": "https://photos.simonwillison.net/i/1b3414ee9ade67ce04ade9042e6d4b433d1e523c9a16af17f490e2c0a619755b.jpeg"
            }
        },
        "features": [
            {"type": vision.enums.Feature.Type.IMAGE_PROPERTIES},
            {"type": vision.enums.Feature.Type.OBJECT_LOCALIZATION},
            {"type": vision.enums.Feature.Type.LABEL_DETECTION},
        ],
    }
)
response

Output is:

label_annotations {
  mid: "/m/09686"
  description: "Vertebrate"
  score: 0.9851104021072388
  topicality: 0.9851104021072388
}
label_annotations {
  mid: "/m/04rky"
  description: "Mammal"
  score: 0.975814163684845
  topicality: 0.975814163684845
}
label_annotations {
  mid: "/m/01280g"
  description: "Wildlife"
  score: 0.8973650336265564
  topicality: 0.8973650336265564
}
label_annotations {
  mid: "/m/02f9pk"
  description: "Lemur"
  score: 0.8270352482795715
  topicality: 0.8270352482795715
}
label_annotations {
  mid: "/m/0fbf1m"
  description: "Terrestrial animal"
  score: 0.7443860769271851
  topicality: 0.7443860769271851
}
label_annotations {
  mid: "/m/06z_nw"
  description: "Tail"
  score: 0.6934166550636292
  topicality: 0.6934166550636292
}
label_annotations {
  mid: "/m/0b5gs"
  description: "Branch"
  score: 0.6203985214233398
  topicality: 0.6203985214233398
}
label_annotations {
  mid: "/m/05s2s"
  description: "Plant"
  score: 0.585474967956543
  topicality: 0.585474967956543
}
label_annotations {
  mid: "/m/089v3"
  description: "Zoo"
  score: 0.5488107800483704
  topicality: 0.5488107800483704
}
label_annotations {
  mid: "/m/02tcwp"
  description: "Trunk"
  score: 0.5200017690658569
  topicality: 0.5200017690658569
}
image_properties_annotation {
  dominant_colors {
    colors {
      color {
        red: 172.0
        green: 146.0
        blue: 116.0
      }
      score: 0.24523821473121643
      pixel_fraction: 0.027533333748579025
    }
    colors {
      color {
        red: 54.0
        green: 50.0
        blue: 42.0
      }
      score: 0.10449723154306412
      pixel_fraction: 0.12893334031105042
    }
    colors {
      color {
        red: 141.0
        green: 121.0
        blue: 97.0
      }
      score: 0.1391485631465912
      pixel_fraction: 0.039133332669734955
    }
    colors {
      color {
        red: 28.0
        green: 25.0
        blue: 20.0
      }
      score: 0.08589499443769455
      pixel_fraction: 0.11506666988134384
    }
    colors {
      color {
        red: 87.0
        green: 82.0
        blue: 74.0
      }
      score: 0.0845794677734375
      pixel_fraction: 0.16113333404064178
    }
    colors {
      color {
        red: 121.0
        green: 117.0
        blue: 108.0
      }
      score: 0.05901569500565529
      pixel_fraction: 0.13379999995231628
    }
    colors {
      color {
        red: 94.0
        green: 83.0
        blue: 66.0
      }
      score: 0.049011144787073135
      pixel_fraction: 0.03946666792035103
    }
    colors {
      color {
        red: 155.0
        green: 117.0
        blue: 90.0
      }
      score: 0.04164913296699524
      pixel_fraction: 0.0023333332501351833
    }
    colors {
      color {
        red: 178.0
        green: 143.0
        blue: 102.0
      }
      score: 0.02993861958384514
      pixel_fraction: 0.0012666666880249977
    }
    colors {
      color {
        red: 61.0
        green: 51.0
        blue: 35.0
      }
      score: 0.027391711249947548
      pixel_fraction: 0.01953333243727684
    }
  }
}
crop_hints_annotation {
  crop_hints {
    bounding_poly {
      vertices {
        x: 2073
      }
      vertices {
        x: 4008
      }
      vertices {
        x: 4008
        y: 3455
      }
      vertices {
        x: 2073
        y: 3455
      }
    }
    confidence: 0.65625
    importance_fraction: 0.746666669845581
  }
}
localized_object_annotations {
  mid: "/m/0jbk"
  name: "Animal"
  score: 0.7008256912231445
  bounding_poly {
    normalized_vertices {
      x: 0.0390297956764698
      y: 0.26235100626945496
    }
    normalized_vertices {
      x: 0.8466796875
      y: 0.26235100626945496
    }
    normalized_vertices {
      x: 0.8466796875
      y: 0.9386426210403442
    }
    normalized_vertices {
      x: 0.0390297956764698
      y: 0.9386426210403442
    }
  }
}
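
The object localization results use normalized vertices (0 to 1), so drawing or cropping needs them scaled to the image size. Here is a minimal sketch of that conversion, working on plain dicts rather than the protobuf objects. The helper name `to_pixel_box` and the 1000x800 image size are assumptions for illustration, not something the API provides. Missing keys default to 0 because the protobuf text output omits zero values (the first crop hint vertex above has an `x` but no `y`).

```python
def to_pixel_box(normalized_vertices, width, height):
    """Convert normalized (0..1) vertices into a (left, top, right, bottom) pixel box."""
    # Zero-valued coordinates are omitted from the output, so default them to 0.
    xs = [vertex.get("x", 0.0) * width for vertex in normalized_vertices]
    ys = [vertex.get("y", 0.0) * height for vertex in normalized_vertices]
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))


# Vertices copied from the "Animal" annotation in the output above.
animal_vertices = [
    {"x": 0.0390297956764698, "y": 0.26235100626945496},
    {"x": 0.8466796875, "y": 0.26235100626945496},
    {"x": 0.8466796875, "y": 0.9386426210403442},
    {"x": 0.0390297956764698, "y": 0.9386426210403442},
]

# Hypothetical dimensions of a resized copy of the photo.
box = to_pixel_box(animal_vertices, width=1000, height=800)
print(box)
```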

simonw · 2020-04-28T18:13:48Z

For face detection:

    {"type": vision.enums.Feature.Type.FACE_DETECTION}

For OCR:

    {"type": vision.enums.Feature.Type.DOCUMENT_TEXT_DETECTION}

simonw · 2020-04-28T18:14:43Z

Database schema for this will require some thought. Just dumping the output into a JSON column isn't going to be flexible enough - I want to be able to FTS against labels and OCR text, and potentially query against other characteristics too.
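
One possible shape for this, sketched with the standard library `sqlite3` module: a regular `labels` table for structured queries (by score, by `mid`) plus a full-text index over the label descriptions. The table names, column names, and the `lemur-photo` ID are all assumptions made up for illustration. The code prefers FTS5 and falls back to FTS4 if the local SQLite build lacks it. OCR text could get a similar FTS table of its own.

```python
import sqlite3


def create_schema(conn):
    """Create a labels table and a full-text index; return the FTS module used."""
    conn.execute(
        "CREATE TABLE labels ("
        "id INTEGER PRIMARY KEY, photo_id TEXT, mid TEXT, description TEXT, score REAL)"
    )
    for module in ("fts5", "fts4"):
        try:
            conn.execute(f"CREATE VIRTUAL TABLE labels_fts USING {module}(description)")
            return module
        except sqlite3.OperationalError:
            continue  # this SQLite build lacks the module, try the next one
    raise RuntimeError("This SQLite build has neither FTS5 nor FTS4")


def add_label(conn, photo_id, mid, description, score):
    """Insert one label and index its description under the same rowid."""
    cursor = conn.execute(
        "INSERT INTO labels (photo_id, mid, description, score) VALUES (?, ?, ?, ?)",
        (photo_id, mid, description, score),
    )
    conn.execute(
        "INSERT INTO labels_fts (rowid, description) VALUES (?, ?)",
        (cursor.lastrowid, description),
    )


def search_labels(conn, query):
    """Return (photo_id, description, score) rows whose description matches the query."""
    return conn.execute(
        "SELECT photo_id, description, score FROM labels "
        "WHERE id IN (SELECT rowid FROM labels_fts WHERE labels_fts MATCH ?) "
        "ORDER BY score DESC",
        (query,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)
# A few labels copied from the output above; "lemur-photo" is a placeholder ID.
add_label(conn, "lemur-photo", "/m/04rky", "Mammal", 0.975814163684845)
add_label(conn, "lemur-photo", "/m/02f9pk", "Lemur", 0.8270352482795715)
add_label(conn, "lemur-photo", "/m/0fbf1m", "Terrestrial animal", 0.7443860769271851)
print(search_labels(conn, "lemur"))
```

Keeping the scores in a real column means a query can combine a text match with a threshold such as `score > 0.8`, which a single JSON blob would make awkward.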

simonw · 2020-04-28T18:19:06Z

The default timeout is a bit aggressive and sometimes failed for me if my resizing proxy took too long to fetch and resize the image.

client.annotate_image(..., timeout=3.0) may be worth trying.
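
Another option for the slow-proxy case is to retry with progressively longer timeouts. This is only a sketch: `call_with_retries` is a made-up helper, the timeout values are illustrative, and the exact exception the client raises on a timeout is not confirmed here, so the helper takes the exception types as a parameter. It is demonstrated against a fake function rather than the real API.

```python
import time


def call_with_retries(func, timeouts=(10.0, 30.0, 60.0), retry_on=(Exception,), pause=0.0):
    """Call func(timeout=...) once per timeout value until one attempt succeeds."""
    if not timeouts:
        raise ValueError("timeouts must contain at least one value")
    last_error = None
    for timeout in timeouts:
        try:
            return func(timeout=timeout)
        except retry_on as error:
            last_error = error
            time.sleep(pause)  # optional pause before the next, longer attempt
    raise last_error


def fake_annotate(timeout):
    """Stand-in for the API call: succeeds only when given at least 20 seconds."""
    if timeout < 20:
        raise TimeoutError(f"gave up after {timeout}s")
    return {"timeout_used": timeout}


result = call_with_retries(fake_annotate, retry_on=(TimeoutError,))
print(result)

# With the real client this might look like (untested):
# call_with_retries(lambda timeout: client.annotate_image(request, timeout=timeout))
```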

simonw added the "enhancement" (New feature or request) label Apr 28, 2020

ligurio mentioned this issue Mar 28, 2021: Support to annotate photos on other than macOS OSes #35 (Open)
