Clustering kwargs exposed #14

Open
wants to merge 1 commit into
base: main

Conversation

andrewmackie
Contributor

I have exposed the kwargs for all of the sklearn-based clustering algorithms so that they can be set from cluster_SC(), cluster_AHC(), Diarizer.diarize() and the command line.

Every kwarg accepted by the underlying sklearn algorithms should now be available. I noticed that you set default values for some kwargs and have retained those.

I haven't done comprehensive testing. I won't be offended if you want to change the way it is implemented.

FYI, the reason I did this is that the 'arpack' eigen solver in sklearn.cluster.SpectralClustering fails when attempting to cluster a large number (>2k) of embeddings. Using the 'lobpcg' eigen solver appears to address this problem, but the eigen_solver kwarg could not be set from Diarizer.diarize(). Now it can.
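To illustrate the idea, here is a minimal sketch of how caller-supplied kwargs can be layered over retained defaults before being forwarded to sklearn. The `DEFAULT_SC_KWARGS` dict and the `merge_clustering_kwargs()` helper are hypothetical names used only for illustration, and the default shown is an assumption rather than the project's actual value.

```python
# Hypothetical defaults; the project's real defaults may differ.
DEFAULT_SC_KWARGS = {"affinity": "precomputed"}


def merge_clustering_kwargs(defaults, **kwargs):
    """Return a copy of the defaults with caller-supplied kwargs layered on top."""
    merged = dict(defaults)  # copy, so the shared defaults are never mutated
    merged.update(kwargs)    # caller values win over defaults
    return merged


sc_kwargs = merge_clustering_kwargs(DEFAULT_SC_KWARGS, eigen_solver="lobpcg")
print(sc_kwargs)  # {'affinity': 'precomputed', 'eigen_solver': 'lobpcg'}

# With scikit-learn installed, the merged dict could be forwarded like this
# (similarity is assumed to be a precomputed affinity matrix):
# from sklearn.cluster import SpectralClustering
# labels = SpectralClustering(n_clusters=4, **sc_kwargs).fit_predict(similarity)
```

Copying the defaults before updating keeps them intact across calls, and any kwarg the caller passes overrides the default of the same name.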

@andrewmackie
Contributor Author

I've realised that when kwargs are passed from the command line, all of the values will be received as strings, so some will need to be converted.

The most thorough method of doing this would probably be to:

  1. create a dictionary which contains the types of the known clustering kwargs and convert the values into those types, and
  2. guess the type of any unknown kwargs, e.g. foo=True -> {'foo': True}
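The two steps above could be sketched roughly as follows. This is not code from the PR: `KNOWN_KWARG_TYPES`, `convert_kwarg()` and `parse_cli_kwargs()` are hypothetical names, the type table is a small illustrative subset, and the `name=value` input format is an assumption about how the command line would pass kwargs.

```python
import ast

# Hypothetical subset of known clustering kwargs and their types.
# bool is deliberately absent: bool("False") is True, so booleans are
# left to the literal_eval fallback below.
KNOWN_KWARG_TYPES = {
    "eigen_solver": str,
    "n_init": int,
    "gamma": float,
}


def convert_kwarg(name, raw):
    """Convert one command-line string to a typed value."""
    converter = KNOWN_KWARG_TYPES.get(name)
    if converter is not None:
        return converter(raw)  # step 1: known kwarg, use its declared type
    try:
        return ast.literal_eval(raw)  # step 2: guess the type of unknown kwargs
    except (ValueError, SyntaxError):
        return raw  # not a Python literal, so keep it as a string


def parse_cli_kwargs(pairs):
    """Turn ['name=value', ...] into a dict of typed kwargs."""
    parsed = {}
    for pair in pairs:
        name, _, raw = pair.partition("=")
        parsed[name] = convert_kwarg(name, raw)
    return parsed


print(parse_cli_kwargs(["eigen_solver=lobpcg", "n_init=20", "foo=True"]))
# {'eigen_solver': 'lobpcg', 'n_init': 20, 'foo': True}
```

`ast.literal_eval` only evaluates Python literals, so it can guess types for unknown kwargs without executing arbitrary input.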

Please let me know if you would like me to do this (I'm very happy for you to do it as well).
