Update misc.txt #23
base: master
Conversation
another, simpler suggestion for some 'broader labels' to allow growing your dataset: the preset labels are too restrictive IMO. being able to select between these will still give labels for training, and gather scenes that can be further segmented (with individual object labels)
I really like the idea of putting the labels into some context (scene). However, at the moment I am not sure if we should mix up (broader) scenes with (fine-granular) labels. Currently all the labels in the list are fine-granular object labels. I think it might get distracting for users if we mix up scenes with labels, because users maybe won't know what they should choose if both the label and the scene match. What do you think? Does it make sense to introduce some sort of hierarchy concept already at this point? If so, how should we model that hierarchy? Some JSON-like syntax? edit: We should probably also think about if and how we want to represent the hierarchy in the database, as a badly chosen database schema could become a bottleneck some day. |
Yes, I can see that (and I posted another issue about yet another type - textures, #25; so now I actually see 3 distinct types of labels: scenes, objects, textures. You don't want to clutter the same list with all those). (and later you might want verbs, adjectives..) I'm not sure what the most user-friendly way to present that is; you could have 2 drop boxes? :-
or alternatively, "scene vs object" could be presented as the first choice, and further UI unfolds from there. or, r.e. presenting the labels in the API, perhaps the stored labels could be as specific as possible ("urban scene", etc.) to spell out that they're not just objects. Alternatively, the 'base type' system could sort that out |
unsure.. I'd agree JSON is definitely the most obvious choice (my earlier suggestions were based on programming-language syntax for classes, but those would need a custom parser). of course, how to actually structure the JSON? does this look too complicated?
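something like this, maybe - a minimal sketch only, where the label names and the "name"/"children" field names are just hypothetical examples, not a worked-out schema:

```python
import json

# Hypothetical nested hierarchy: each node has a name and optional children.
hierarchy = {
    "name": "entity",
    "children": [
        {"name": "scene", "children": [
            {"name": "urban scene"},
            {"name": "forest"},
        ]},
        {"name": "object", "children": [
            {"name": "car", "children": [
                {"name": "sedan"},
                {"name": "hatchback"},
            ]},
        ]},
    ],
}

def all_labels(node):
    """Flatten the tree into a plain label list (e.g. for the existing UI)."""
    yield node["name"]
    for child in node.get("children", []):
        yield from all_labels(child)

print(json.dumps(hierarchy, indent=2))
print(sorted(all_labels(hierarchy)))
```

the nice part is the existing flat label list falls out of the tree for free, so the current UI wouldn't have to change at first.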
(could you allow "is_a" to be either an array or a single item?) you might want to specify synonyms too ("hatchback" = "hatchback car", "saloon car" = "sedan"). it might be simpler to specify a simple hierarchy first (tree organisation), then allow a separate way of building additional links where you couldn't fit something in one category (unsure exactly how):-
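roughly what I mean by flexible "is_a" plus synonyms - again just a sketch, with made-up field names and labels:

```python
# Hypothetical flat records: "is_a" may be a string or a list of strings,
# and "synonyms" maps alternative names onto the canonical label.
labels = [
    {"name": "car", "is_a": "object"},
    {"name": "porsche", "is_a": ["car", "sports car"]},
    {"name": "hatchback", "is_a": "car", "synonyms": ["hatchback car"]},
]

def parents(record):
    """Normalise 'is_a' so consumers always see a list."""
    value = record.get("is_a", [])
    return [value] if isinstance(value, str) else list(value)

# Synonym -> canonical-name lookup table.
canonical = {syn: rec["name"]
             for rec in labels
             for syn in rec.get("synonyms", [])}

print(parents(labels[1]))          # ['car', 'sports car']
print(canonical["hatchback car"])  # hatchback
```

normalising at read time means authors can write the terse single-item form by hand without breaking consumers.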
however, if you get these wrong you've imposed a sub-optimal structure. you're left wondering 'how deep should it be?', whereas with "is_a" you can easily just say "porsche" is_a "car" without first having to decide if it's a "grand tourer" or "sports car" |
another way to go might be to forget 'scenes' and figure out 'objects' that do represent them: "object=city", "object=town", "object=street", "object=kitchen", "object=living room", "object=forest", "object=farm". however, there is a distinction in how they're used?
alternatively you could forget that distinction and allow for the possibility of, say, aerial photography, where you use the same labels "forest", "town" as for ground-level photos taken within them (interesting video sequences from drones, or aircraft taking off, or weather balloons being released to the edge of space..). It would certainly look silly if you gave the label "kitchen" to a photo of a sink with a kettle beside it, then asked "annotate all the kitchens in this photo". then again, you could have a photo of a corridor with an open door, where you could highlight the interior through the door as "kitchen"; some kitchens have windows through to the dining area too |
what if the annotation tool didn't actually bother asking you to annotate 'cat' in a photo tagged as 'cat', but rather moved onto components? 'cat' - 'annotate all the eyes, ears, mouths, noses, paws, tails..'. 'car' - 'annotate all the wheels, headlights, taillights, license plates, windscreens..'. Maybe you could assume that if the single image label is one object, you needn't bother with fine-grain labelling of that object itself again. Then you could indeed have 'photo=forest' -> 'highlight all the leaves, tree trunks..', 'photo=street' -> 'highlight all the cars, pedestrians, signposts', 'photo=cat' -> 'highlight all the eyes, noses, mouths'.. all treated the same. component annotations would give your vision net more power, e.g. distinguishing the orientation of objects ('this is a car travelling toward you') |
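the 'move onto components' flow could be as dumb as a lookup table - a sketch, where the mapping, labels and prompt wording are all made-up examples:

```python
# Hypothetical image-label -> component-labels mapping; if an image already
# carries a single object label, the tool skips straight to its parts.
components = {
    "cat": ["eyes", "ears", "mouths", "noses", "paws", "tails"],
    "car": ["wheels", "headlights", "taillights", "license plates", "windscreens"],
    "forest": ["leaves", "tree trunks"],
    "street": ["cars", "pedestrians", "signposts"],
}

def next_prompt(image_label):
    """Pick the follow-up annotation task for an already-labelled image."""
    parts = components.get(image_label)
    if parts is None:
        # No component list yet: fall back to annotating the label itself.
        return f"annotate all the {image_label}s in this photo"
    return "highlight all the " + ", ".join(parts)

print(next_prompt("cat"))
print(next_prompt("kitchen"))
```

scenes and objects really are treated identically here - 'forest' drives a component pass the same way 'cat' does.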
I really like the JSON structure - I think something like that makes the whole thing pretty flexible and expressible. But in order to see if the JSON structure is flexible enough to work for every use case we intend to implement, it's probably a good idea to think about concrete things we want to implement. I created a "Project Planning" ticket (#24) with all the thoughts and ideas that are currently going through my mind. |
see issue #18 for more rationale
this list isn't as big; you could start growing the dataset with it, then retroactively add 'has-a' / 'is-a' relations to give more detailed labels ('scenes' and 'objects' alike could both gain components, e.g. 'urban scene' has components 'person', 'car', etc.; 'car' has components 'wheels', 'headlamps', etc.)
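i.e. collect flat labels today and bolt the relations on later as separate records, no migration of the labels themselves - a sketch (the relation tuples and names are hypothetical):

```python
# Start with a plain flat label list today...
flat_labels = ["urban scene", "person", "car", "wheels", "headlamps"]

# ...and retroactively add relations as standalone records that merely
# reference existing label names.
relations = [
    ("urban scene", "has_component", "person"),
    ("urban scene", "has_component", "car"),
    ("car", "has_component", "wheels"),
    ("car", "has_component", "headlamps"),
]

def components_of(label):
    """Look up a label's components from the relation records."""
    return [child for (parent, rel, child) in relations
            if parent == label and rel == "has_component"]

print(components_of("car"))          # ['wheels', 'headlamps']
print(components_of("urban scene"))  # ['person', 'car']
```

since relations live in their own records (or table), the early flat dataset stays valid forever and the hierarchy can grow incrementally.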
scene-based recognition is quite important.. perceiving things in context.. machine vision is to be used for navigation for delivery bots, self-driving cars, domestic robots, agricultural robots, security, etc... it's all context-based situations
Perhaps you could introduce a simple hierarchy of labels as an intermediate step if the whole 'traits' idea is too much initially.. alternatively maybe some clickable tags (multiple tags applying to the whole image?) would suffice. e.g. just 5 tags like 'animals', 'plants', 'buildings', 'machines', 'people' instead of fiddlier concepts wouldn't be so much UI; instead of 'urban' you'd just click 'buildings, machines, people', etc.