Annotate photos using the Google Cloud Vision API #14
Pricing is pretty good: free for the first 1,000 calls per month, then $1.50 per thousand after that.
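To make the pricing concrete, here is a minimal sketch of the arithmetic. It assumes the charge is prorated linearly above the free tier and that `calls` already counts billable units (if each requested feature is billed separately, multiply accordingly). The function name and defaults are illustrative, not part of any Google library.

```python
def estimated_monthly_cost(calls, free_calls=1000, price_per_thousand=1.50):
    """Estimate a monthly bill in USD for a number of billable calls.

    Assumption: usage above the free tier is charged pro rata,
    not rounded up to whole blocks of 1,000.
    """
    billable = max(0, calls - free_calls)
    return billable / 1000 * price_per_thousand


if __name__ == "__main__":
    for calls in (500, 1_000, 10_000, 50_000):
        print(f"{calls:>6} calls: ${estimated_monthly_cost(calls):.2f}")
```

Under these assumptions, annotating 10,000 photos in a month leaves 9,000 billable calls.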
Python library docs: https://googleapis.dev/python/vision/latest/index.html

I'm creating a new project for this called simonwillison-photos: https://console.cloud.google.com/projectcreate

https://console.cloud.google.com/home/dashboard?project=simonwillison-photos

Then I enabled the Vision API. The direct link to https://console.cloud.google.com/flows/enableapi?apiid=vision-json.googleapis.com which they provided in the docs didn't work - it gave me a "You don't have sufficient permissions to use the requested API" error - but starting at the "Enable APIs" page and searching for it worked fine.

I created a new service account as an "owner" of that project: https://console.cloud.google.com/apis/credentials/serviceaccountkey (and complained about it on Twitter and through their feedback form)
from google.cloud import vision

client = vision.ImageAnnotatorClient.from_service_account_file("simonwillison-photos-18c570b301fe.json")

# Photo of a lemur
response = client.annotate_image(
    {
        "image": {
            "source": {
                "image_uri": "https://photos.simonwillison.net/i/1b3414ee9ade67ce04ade9042e6d4b433d1e523c9a16af17f490e2c0a619755b.jpeg"
            }
        },
        "features": [
            {"type": vision.enums.Feature.Type.IMAGE_PROPERTIES},
            {"type": vision.enums.Feature.Type.OBJECT_LOCALIZATION},
            {"type": vision.enums.Feature.Type.LABEL_DETECTION},
        ],
    }
)
response

Output is:
For face detection:
For OCR:
The database schema for this will require some thought. Just dumping the output into a JSON column isn't going to be flexible enough - I want to be able to run full-text search (FTS) against labels and OCR text, and potentially query against other characteristics too.
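One possible shape for such a schema, sketched with Python's built-in `sqlite3` module. Every table and column name here is hypothetical, and it assumes the SQLite build includes the FTS5 extension. Labels go in an ordinary table so scores can be queried, and both labels and OCR text are also written to a single FTS5 table for searching.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical layout: one relational table for label scores,
    # plus one FTS5 table covering both labels and OCR text.
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS labels (
            photo_id TEXT NOT NULL,
            description TEXT NOT NULL,
            score REAL,
            PRIMARY KEY (photo_id, description)
        );
        CREATE VIRTUAL TABLE IF NOT EXISTS annotations_fts USING fts5(
            photo_id UNINDEXED,  -- stored but not searched
            kind UNINDEXED,      -- 'label' or 'ocr'
            content
        );
        """
    )


def add_label(conn, photo_id, description, score):
    conn.execute(
        "INSERT OR REPLACE INTO labels (photo_id, description, score) VALUES (?, ?, ?)",
        (photo_id, description, score),
    )
    conn.execute(
        "INSERT INTO annotations_fts (photo_id, kind, content) VALUES (?, 'label', ?)",
        (photo_id, description),
    )


def add_ocr_text(conn, photo_id, text):
    conn.execute(
        "INSERT INTO annotations_fts (photo_id, kind, content) VALUES (?, 'ocr', ?)",
        (photo_id, text),
    )


def search(conn, query):
    """Return the sorted IDs of photos whose labels or OCR text match the query."""
    rows = conn.execute(
        "SELECT DISTINCT photo_id FROM annotations_fts "
        "WHERE annotations_fts MATCH ? ORDER BY photo_id",
        (query,),
    )
    return [row[0] for row in rows]
```

Keeping the raw JSON response in a separate column alongside these tables would still be possible; the point of the sketch is only that the searchable pieces are pulled out into their own rows.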
The default timeout is a bit aggressive and sometimes failed for me if my resizing proxy took too long to fetch and resize the image.
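A minimal sketch of one way to cope with that: wrap the call in a small retry helper with exponential backoff. The helper `call_with_retries` is hypothetical and works with any callable. The usage shown in the trailing comment assumes `annotate_image` accepts a `timeout` keyword argument in seconds, which is worth checking against the installed library version.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(), retrying on failure with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts,
    and re-raises the last error once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical usage (assumes a `timeout` keyword on annotate_image):
#
#   response = call_with_retries(
#       lambda: client.annotate_image(request, timeout=60.0)
#   )
```

Passing a narrower `retry_on` tuple (for example only timeout-style exceptions) avoids retrying errors that will never succeed, such as bad credentials.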
It can detect faces, run OCR, do image labeling (it knows what a lemur is!) and do object localization where it identifies objects and returns bounding polygons for them.
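As a rough illustration of working with those results, the sketch below pulls labels, localized objects and OCR text out of a response. It assumes the response exposes `label_annotations`, `localized_object_annotations` and `text_annotations` attributes, and that the first text annotation holds the full detected text. The function name is hypothetical, and the stand-in object in the demo holds made-up values purely so the example runs without credentials.

```python
from types import SimpleNamespace


def summarize_response(response):
    """Flatten the parts of an annotation response that are easy to index."""
    labels = [(a.description, a.score) for a in response.label_annotations]
    objects = [(o.name, o.score) for o in response.localized_object_annotations]
    texts = response.text_annotations
    # Assumption: the first text annotation is the full block of detected text.
    full_text = texts[0].description if texts else ""
    return {"labels": labels, "objects": objects, "text": full_text}


if __name__ == "__main__":
    # Made-up stand-in for a real response, for illustration only.
    fake = SimpleNamespace(
        label_annotations=[SimpleNamespace(description="Lemur", score=0.9)],
        localized_object_annotations=[SimpleNamespace(name="Animal", score=0.8)],
        text_annotations=[],
    )
    print(summarize_response(fake))
```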