
Update misc.txt #23

Open · wants to merge 1 commit into master
Conversation


@dobkeratops dobkeratops commented Oct 2, 2017

Another, simpler suggestion: some 'broader labels' to allow growing your dataset. The existing preset labels are too restrictive IMO. Being able to select between these would still give labels for training, and would gather scenes that can be further segmented later (with individual object labels).
See issue #18 for more rationale.

This list isn't as big; you could start growing the dataset with it, then retroactively add 'has-a' / 'is-a' relations to give more detailed labels ('scenes' and 'objects' alike could gain components, e.g. 'urban scene' has components 'person', 'car', etc.; 'car' has components 'wheels', 'headlamps', etc.).

Scene-based recognition is quite important: perceiving things in context. Machine vision will be used for navigation by delivery bots, self-driving cars, domestic robots, agricultural robots, security, etc., and those are all context-based situations.

Perhaps you could introduce a simple hierarchy of labels as an intermediate step if the whole 'traits' idea is too much initially. Alternatively, maybe some clickable tags (multiple tags applying to the whole image?) would suffice; e.g. just 5 tags like 'animals', 'plants', 'buildings', 'machines', 'people' instead of fiddlier concepts wouldn't be much UI. Instead of 'urban' you'd just click 'buildings, machines, people', etc.

@bbernhard
Collaborator

bbernhard commented Oct 3, 2017

I really like the idea of putting the labels into some context (scene). However, at the moment I am not sure if we should mix up (broader) scenes with (fine-grained) labels. Currently, all the labels in the misc.txt file are exposed via the REST API (and used by the imagemonkey-client).


I think it might get distracting for users if we mix up scenes with labels, because users may not know what to choose if both the label and the scene match. What do you think?

Does it make sense to introduce some sort of hierarchy concept already at this point? If so, how should we model that hierarchy? Some JSON-like syntax?

edit: We should probably also think about whether and how we want to represent the hierarchy in the database, as a badly chosen database schema could become a bottleneck some day.

@dobkeratops
Author

dobkeratops commented Oct 4, 2017

"I think it might get distracting for users if we mix up scenes with labels, because users maybe won't know what they should choose if both the label and the scene matches. What do you think?"

Yes, I can see that (and I posted another issue about yet another type - textures, #25; so now I actually see 3 distinct types of labels: scenes, objects, textures. You don't want to clutter the same list with all those). (And later you might want verbs, adjectives...)

I'm not sure what the most user-friendly way to present that is.

You could have 2 drop-down boxes?:

Scene:<?>  Object:<?>

Scene:<urban>  Object:  <many..>      ('many' is a default 'object' if you specify a scene)
                                                         (or you could just leave it as <?>)

Scene:<?>  Object: <cat>

Or alternatively, "scene vs object" could be presented as the first choice, with further UI unfolding from there;

or object:<many..> then gives you the choice of 'scene=..'.

Re: presenting the labels in the API, perhaps the stored labels could be as specific as possible ("urban scene", etc.) to spell out that they're not just objects. Alternatively, the 'base type' system could sort that out.

@dobkeratops
Author

dobkeratops commented Oct 4, 2017

"Does it make sense to introduce some sort of hierarchy concept already at this point? If so, how should we model that hierarchy? Some JSON-like syntax?"

Unsure... I'd agree JSON is definitely the most obvious choice (my earlier suggestions were based on programming-language syntax for classes, but those would need a custom parser).

Of course, how to actually structure the JSON?
One-to-many 'is-a' relations will complicate the specification, but will prevent ambiguity in classification later (is a "bicycle" a vehicle, or a piece of sports equipment, or just a machine?, etc.).

Does this look too complicated?

{
  "cat" : { "has": ["head", "paws", ..], "is_a": ["animal"] },
  "car" : { "has": ["wheel", "headlight", "bumper"], "is_a": ["vehicle"] },
  "tractor" : { "is_a": ["vehicle", "agricultural equipment"] },
  "hatchback" : { "is_a": ["car"] },
  "vehicle" : { "is_a": ["machine"] }
}

(could you allow "is_a" to be either an array or a single item?)

You might want to specify synonyms too ("hatchback" = "hatchback car", "saloon car" = "sedan").
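
For example (just a rough sketch; the "synonyms" field and the single-item shorthand for "is_a" are hypothetical, not anything already agreed):

{
  "hatchback" : { "is_a": "car", "synonyms": ["hatchback car"] },
  "sedan" : { "is_a": ["car"], "synonyms": ["saloon car"] },
  "bicycle" : { "is_a": ["vehicle", "sports equipment"] }
}

Parsers would just have to accept either a string or an array wherever "is_a" appears.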

It might be simpler to specify a simple hierarchy first (tree organisation), then allow a separate way of building additional links where you couldn't fit something in one category (unsure exactly how):

{
  "machine": {
    "vehicle": {
      "land vehicle": {
        "car": {}, "van": {}, "tractor": {}, "truck": {}
      },
      "aircraft": {
        "aeroplane": {}, "helicopter": {}, "autogyro": {}, "airship": {}
      },
      "aquatic vehicle": {
        "ship": {}, "hovercraft": {}, "submarine": {}, ..
      }
    },
    "lathe": {}, ..
  },
  "animal": {...}, ..
  "plant": {..}, ..
}

However, if you get these wrong you've imposed a sub-optimal structure. You're left wondering 'how deep should it be?', whereas with "is_a" you can easily just say "porsche" is_a "car" without first having to decide whether it's a "grand tourer" or a "sports car".
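
One possible compromise (only a sketch; the "tree" / "extra_links" names are made up) would be to keep the simple tree, plus a separate list of additional 'is_a' links for the cases that don't fit a single branch:

{
  "tree": {
    "machine": { "vehicle": { "car": { "porsche": {} }, "bicycle": {} } }
  },
  "extra_links": [
    { "label": "bicycle", "is_a": ["sports equipment"] }
  ]
}

That way "porsche" only needs a place somewhere under "car", and the awkward multi-parent cases are handled separately.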

@dobkeratops
Author

dobkeratops commented Oct 4, 2017

Another way to go might be to forget 'scenes' and figure out 'objects' that represent them: "object=city", "object=town", "object=street", "object=kitchen", "object=living room", "object=forest", "object=farm".

However, there is a distinction in how they're used:

scenes - aren't used for area labelling; they always apply to the entire image
objects - suitable for either the entire image or area labelling
textures - you really want to highlight patches within objects/scenes, rather than outlines around objects
        ...but you might still have a photo zoomed in on a patch of grass
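
If that distinction is kept, each label entry could carry it explicitly, e.g. (rough sketch, hypothetical field names):

{
  "kitchen" : { "kind": "scene", "whole_image_only": true },
  "cat" : { "kind": "object", "whole_image_only": false },
  "grass" : { "kind": "texture", "whole_image_only": false }
}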

Alternatively you could forget that distinction, and allow for the possibility of, say, aerial photography, where you'd use the same labels "forest", "town" as for ground-level photos taken within them (interesting video sequences from drones, or aircraft taking off, or weather balloons released to the edge of space...).

It would certainly look silly if you gave the label "kitchen" to a photo of a sink with a kettle beside it, and then asked "annotate all the kitchens in this photo".

Then again, you could have a photo of a corridor with an open door, where you could highlight the interior beyond the door as "kitchen"; some kitchens have serving windows through to the dining area too.

@dobkeratops
Author

dobkeratops commented Oct 4, 2017

What if the annotation tool didn't actually bother asking you to annotate 'cat' in a photo tagged as 'cat', but rather moved straight on to components: 'cat' - 'annotate all the eyes, ears, mouths, noses, paws, tails...'; 'car' - 'annotate all the wheels, headlights, taillights, license plates, windscreens...'.

Maybe you could assume that if the single image label is one object, you needn't bother with fine-grained labelling of that object again. Then you could indeed have 'photo=forest' -> 'highlight all the leaves, tree trunks...', 'photo=street' -> 'highlight all the cars, pedestrians, signposts', 'photo=cat' -> 'highlight all the eyes, noses, mouths'... all treated the same.

Component annotations would give your vision net more power, e.g. distinguishing the orientation of objects ('this is a car travelling towards you').
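
The prompts could fall straight out of the 'has' relations from the JSON above, e.g. (sketch):

{
  "cat" : { "has": ["eye", "ear", "nose", "mouth", "paw", "tail"] },
  "street" : { "has": ["car", "pedestrian", "signpost"] },
  "forest" : { "has": ["leaf", "tree trunk"] }
}

i.e. if the image-level tag is "street", the tool would just ask 'highlight all the cars, pedestrians, signposts'.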

@bbernhard
Collaborator

I really like the JSON structure - I think something like that makes the whole thing pretty flexible and expressive. But in order to see if the JSON structure is flexible enough to work for every use case we intend to implement, it's probably a good idea to think about the concrete things we want to implement.

I created a "Project Planning" ticket (#24) with all the thoughts and ideas that are currently going through my mind.
