Neural Click Models #8

Status: Open. Wants to merge 15 commits into base branch main.
Conversation

arabel1a (Author):

No description provided.

pyproject.toml (review comment resolved)
print("Warning: the historical data is empty")
hist_data = spark.createDataFrame([], schema=SIM_LOG_SCHEMA)
# filter users whom we don't need
hist_data = hist_data.join(new_recs, on="user_idx", how="inner").select(
@monkey0head (Collaborator), Dec 2, 2024:
What is going on here? Why do you join all new_recs columns? You need to take at least new_recs.select("user_idx").distinct(), and then you do not need select(hist_data["*"]) after the join.
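To see the bug the reviewer is pointing at, here is a minimal plain-Python sketch (lists of dicts instead of Spark DataFrames, data values are made up): an inner join on user_idx against the full new_recs produces one output row per (history, recommendation) pair, multiplying rows and dragging in new_recs columns, whereas the intended filter keeps each history row at most once.

```python
# Plain-Python sketch (no Spark) of the row-multiplication bug:
# an inner join on user_idx repeats each history row once per
# matching recommendation row, and merges in new_recs columns.

hist_data = [
    {"user_idx": 1, "item_idx": 10, "response": 1},
    {"user_idx": 1, "item_idx": 11, "response": 0},
]
new_recs = [
    {"user_idx": 1, "item_idx": 20},
    {"user_idx": 1, "item_idx": 21},
]

# Buggy: one output row per (hist, rec) pair; note that new_recs'
# item_idx also overwrites the historical item_idx on merge.
buggy = [
    {**h, **r} for h in hist_data for r in new_recs
    if h["user_idx"] == r["user_idx"]
]
print(len(buggy))  # 4 rows instead of 2

# Intended: keep each history row once, for users present in new_recs.
wanted_users = {r["user_idx"] for r in new_recs}
fixed = [h for h in hist_data if h["user_idx"] in wanted_users]
print(len(fixed))  # 2 rows, original columns only
```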

Collaborator:

pls fix

Author:

Wow, this is really a huge bug. I think it is an artifact of one of the intermediate versions, where I tried to work with tables whose user_idx is unique and each row represents a whole iteration. I'll fix it.

Author:

done

Collaborator:

Seems the fix was wrong; see the new suggestion in #8 (comment)

print("Warning: the simulator log is empty")
simlog = spark.createDataFrame([], schema=SIM_LOG_SCHEMA)
# filter users whom we don't need
simlog = simlog.join(new_recs, on="user_idx", how="inner").select(simlog["*"])
Collaborator:

same as above

Collaborator:

pls fix

Author:

done


)
)

# not very optimal way, it makes one worker to
Collaborator:

Need to discuss. Your batch id should not influence the partitioning: one partition != one batch, and the users are grouped into batches within one partition. I do not know how to implement it for now.

Author:

BatchID won't influence the partitioning, because each batch must consist of the whole interaction history of a specific group of users.
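The invariant the author describes can be sketched in plain Python (the function name and users_per_batch value are illustrative, not from the PR): the batch id is a pure function of user_idx, so every row of a user's history lands in the same batch no matter how rows are physically partitioned.

```python
# Hypothetical sketch: derive the batch id from user_idx alone, so a
# user's whole interaction history falls into exactly one batch,
# independent of how Spark happens to partition the rows.

def batch_of(user_idx: int, users_per_batch: int = 2) -> int:
    """Map a user to a batch; depends only on user_idx."""
    return user_idx // users_per_batch

rows = [
    {"user_idx": 0, "iteration": 0},
    {"user_idx": 0, "iteration": 1},
    {"user_idx": 1, "iteration": 0},
    {"user_idx": 3, "iteration": 0},
]
for row in rows:
    row["batch_id"] = batch_of(row["user_idx"])

# User 0's history is never split across batches.
assert len({r["batch_id"] for r in rows if r["user_idx"] == 0}) == 1
print([r["batch_id"] for r in rows])  # [0, 0, 0, 1]
```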

Collaborator:

ok, i see, just remove the comment

self.backbone_response_model = None

def _fit(self, train_data):
"""
@monkey0head (Collaborator), Dec 2, 2024:

Pls describe the dataframe format here and for transform: what should be included to properly convert the dataframe to RecommendationData? Pls add corresponding docstrings.

@arabel1a (Author), Dec 16, 2024:

It is exactly the same as the simulator log format. Please advise where I can obtain its description.
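A hypothetical docstring template for the requested documentation, given the author's statement that the input matches the simulator log format; the class name is a placeholder and the exact column list should be copied from SIM_LOG_SCHEMA rather than from this sketch.

```python
# Hypothetical sketch of the requested docstring; the host class name
# is a placeholder, and the definitive column list is SIM_LOG_SCHEMA.

class PlaceholderClickModel:
    def _fit(self, train_data):
        """
        Fit the backbone response model.

        :param train_data: Spark DataFrame in the simulator log format,
            i.e. the columns of ``SIM_LOG_SCHEMA`` (``user_idx`` plus the
            item and response columns); rows are converted internally to
            ``RecommendationData``.
        """
        return self
```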

@monkey0head (Collaborator):

Thank you for your contribution! Please have a look at the comments and add time measurements to the notebook to show the speed of the main stages of the simulation pipeline.
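A minimal sketch of what such time measurements could look like in the notebook, assuming `time.perf_counter`; the stage names and the work inside each block are placeholders, not the PR's actual pipeline stages.

```python
# Minimal timing helper for measuring the main simulation stages.
# Stage names below are placeholders for the real pipeline steps.
import time
from contextlib import contextmanager

@contextmanager
def timed(stage: str, timings: dict):
    """Record the wall-clock duration of one pipeline stage."""
    start = time.perf_counter()
    yield
    timings[stage] = time.perf_counter() - start

timings = {}
with timed("sample_responses", timings):  # placeholder stage name
    sum(range(10_000))                    # stand-in for real work
with timed("update_log", timings):        # placeholder stage name
    sum(range(10_000))

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.3f}s")
```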

"""
Predict responses for given dataframe with recommendations.

:param dataframe: new recommendations.
@Veronika-Ivanova (Contributor), Dec 18, 2024:

the param name is not correct, should be new_recs

print("Warning: the historical data is empty")
hist_data = spark.createDataFrame([], schema=SIM_LOG_SCHEMA)
# filter users whom we don't need
hist_data = hist_data.join(new_recs, on="user_idx", how="semi")
Collaborator:

If you really want to keep in hist_data only the history of the distinct users from new_recs:

Suggested change:
- hist_data = hist_data.join(new_recs, on="user_idx", how="semi")
+ hist_data = hist_data.join(sf.broadcast(new_recs.select("user_idx").distinct()), on="user_idx", how="inner")
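For anyone comparing the two variants: a left-semi join and an inner join against the deduplicated key set keep exactly the same hist_data rows; the broadcast only changes the physical plan, not the result. A plain-Python sketch of that equivalence (no Spark, made-up data):

```python
# Plain-Python check (no Spark) that a left-semi join and an inner
# join on the distinct key set keep the same hist_data rows.

hist_data = [
    {"user_idx": 1, "item_idx": 10},
    {"user_idx": 2, "item_idx": 11},
    {"user_idx": 3, "item_idx": 12},
]
new_recs = [{"user_idx": 1}, {"user_idx": 1}, {"user_idx": 2}]

# how="semi": keep hist rows whose key appears in new_recs.
semi = [h for h in hist_data
        if any(h["user_idx"] == r["user_idx"] for r in new_recs)]

# Inner join on the deduplicated key set: at most one match per hist
# row, so no duplication and no extra columns.
keys = {r["user_idx"] for r in new_recs}
inner_distinct = [h for h in hist_data if h["user_idx"] in keys]

assert semi == inner_distinct
print(len(semi))  # 2
```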

print("Warning: the simulator log is empty")
simlog = spark.createDataFrame([], schema=SIM_LOG_SCHEMA)
# filter users whom we don't need
simlog = simlog.join(new_recs, on="user_idx", how="semi")
Collaborator:

same as for hist data...

Suggested change:
- simlog = simlog.join(new_recs, on="user_idx", how="semi")
+ simlog = simlog.join(sf.broadcast(new_recs.select("user_idx").distinct()), on="user_idx", how="inner")

3 participants