Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix paramsBoostedFields and paramsFilterFields #25

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

zebraf1
Copy link

@zebraf1 zebraf1 commented Apr 17, 2017

Description:
According to documentation: http://actionml.com/docs/ur_config
fields: optional, default = none. An array of ... . Positive biases are boosts any negative number will filter out any results that do not contain the values in the field name.

Problem:
When adding fields to algorithm parameters with a negative bias (-1) they will not be filtered, but instead added as boosted fields with a negative value. Providing a positive number will cause this to be a filter.

If you look at the annotations, someone has tried to fix this but instead of swapping the ">" and "<", he moved "=" instead.

Solution:
I have corrected this issue so that paramsFields are consistent with queryFields. (see queryBoostedFields below)

According to documentation: http://actionml.com/docs/ur_config
fields: optional, default = none. An array of ... . Positive biases are boosts any negative number will filter out any results that do not contain the values in the field name.

When adding fields to algorithm parameters with a negative bias (-1) they will not be filtered, but instead added as boosted fields with a negative value. Providing a positive number will cause this to be a filter.

If you look at the annotations, someone has tried to fix this but instead of swapping the ">" and "<", he moved "=" instead.
I have corrected this issue
@pferrel
Copy link
Owner

pferrel commented Apr 17, 2017

Do you have a test to prove correct behavior? See examples/integration-test for how to do this. It should be easy, to create and add more tests. Also I believe the test is already included.

The "=" is because 0 is reserved for a new meaning in the UR v0.6.0, namely as an exclusion filter.

Bias in v 0.6.0:

  • 0: exclude items with properties specified (must_not in ES parlance)
  • <0: include only items with properties specified (must in ES parlance)
  • >0: use the bias as a multiplier on all scores

The next release will be timed to coincide with PIO 0.11.0 (being voted on) and Mahout v0.113.0 (being assembled for release). Watch for a new branch labeled v0.6.0-SNAPSHOT.

@zebraf1
Copy link
Author

zebraf1 commented Apr 17, 2017

Good point, I couldn't find any tests where algorithm params include the fields attribute. I will try to create a new test to cover it.

@pferrel
Copy link
Owner

pferrel commented Apr 17, 2017

it is very difficult to test params in engine.json in an automated way. So this is a case of fields in engine.json?

That helps to know since they have had minimal use or testing, except the time stamp filters. Putting a field in engine.json means it will be applied to every query, which is not very useful. Why include data for the items if they are never needed as recs? I was thinking of deprecating that but if you need it, let's make sure it works as advertized.

I'd be happy for a very simple test run on it's own not integrated with examples/integration-test. You may see another one regarding rankings there too, with it's own data.

@zebraf1
Copy link
Author

zebraf1 commented Apr 17, 2017

I don't use the fields parameter currently. I'm using the template and was reviewing your code for a better understanding of the algorithm and its opportunities. This just caught my eye.

One use case I can imagine for the fields parameter is setting default boosts and filters for items. Since query parameters override this, it would make sense to describe common filters here too.

However this could cause issues on the application side as the results are silently boosted and filtered which may not always be a good practice.

Fox example in my case (movie recommendations) I would populate a field isAdult = yes/no (or perhaps K-rating) to each item. By default I want to exclude all adult content from recommendations, unless I explicitly say so in the query (adult recommendations feed). Currenly I have to add isAdult = [no] to each query (which is not a problem).

Another case is when I want to set default boosts to item attributes (ie title boost 1.5, description 1, cast 1.2, etc). What do you think where these should be described? (application for each query, algo params fields, somewhere else).

Ofcourse this would be a separate test with special data.
Test cases:

  • Test with a simple query - must not return adult content
  • Test with query isAdult = [yes] - must return only adult content
  • Test with query isAdult = [yes,no] - must return all content (perhaps a better way to include all?)

I think there might be a broken case if you add a fields with negative bias (filter) and in the query add it as a boost. In this case the query boost would not override fields filter as they're added in separate sections of the ES query (should, must_not).

I see 2 options:
a) I'm not sure if people use this feature if it's broken. Remove it without deprecation notice.
b) Add tests, solve the filter/boost issue, merge in working solution if there is a useful use case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants