Closes #163 #193

Merged
merged 108 commits into from
Nov 8, 2017
108 commits
773b3b1
initial commit with some ideas on how to do multilabel.
deaktator Aug 30, 2017
66817f7
Additional hacking on multilabel model.
deaktator Aug 30, 2017
c2516f5
Added multilabel type aliases. Updated MultilabelModel but it's broken.
deaktator Aug 31, 2017
73d378b
Added additional comments.
deaktator Aug 31, 2017
48a6157
Made SparseLabelDepFeatures type alias a little nicer but abused nota…
deaktator Aug 31, 2017
0ebc108
New plan. No label-dependent features for now.
deaktator Sep 1, 2017
8ba3e6e
moved MultilabelModel to multilabel package.
deaktator Sep 1, 2017
5931ef0
updated comments
deaktator Sep 1, 2017
5e1f1dc
Removed the requirements that SparseMultiLabelPredictor is Closeable.
deaktator Sep 1, 2017
0aea10c
Added comment to predictor.
deaktator Sep 1, 2017
85229f9
Updated skeleton of MultilabelModel and small change to RegressionFea…
deaktator Sep 1, 2017
7812019
Added B <: U in helper methods.
deaktator Sep 1, 2017
f78e4bd
Added a few comments to model and added test skeleton.
deaktator Sep 1, 2017
f21fa4f
removed parameters to Auditor. Just use defaults because the values …
deaktator Sep 1, 2017
a9a0f8b
Added import to companion object Label type and Auditor.
deaktator Sep 1, 2017
2d6de6f
Added additional test.
deaktator Sep 1, 2017
6defbe0
SparseMultiLabelPredictor was made package private for testing.
deaktator Sep 1, 2017
7661df2
updated privacy of type aliases in multilabel package object
deaktator Sep 1, 2017
9fd15ad
Serialization test
amirziai Sep 1, 2017
ce66b7c
First test passing
amirziai Sep 1, 2017
75dee22
Hopefully code complete for the case class.
deaktator Sep 1, 2017
599e336
Merge branch '163-multilabel' into 163-multilabel-test
amirziai Sep 1, 2017
2e0f4eb
Added comment.
deaktator Sep 1, 2017
3dc2921
Merge branch '163-multilabel' into 163-multilabel-test
amirziai Sep 1, 2017
3b2652b
lessened privileges.
deaktator Sep 5, 2017
dd7dc71
Merge remote-tracking branch 'upstream/163-multilabel' into 163-multi…
amirziai Sep 5, 2017
999348b
Adding more multi-label tests
amirziai Sep 5, 2017
a38b5ea
Success report test case
amirziai Sep 6, 2017
a5c5259
Addressing JMorra's PR comments.
deaktator Sep 6, 2017
2657672
More tests
amirziai Sep 6, 2017
163609e
Merge remote-tracking branch 'upstream/163-multilabel' into 163-multi…
amirziai Sep 6, 2017
71e21aa
java Serializable needs to be here
amirziai Sep 6, 2017
ed08228
Adding MultilabelModel parsing stuff, plugins, VW version, etc.
deaktator Sep 7, 2017
e450ce1
Empty label problems test
amirziai Sep 7, 2017
77d57e0
More explicit val name
amirziai Sep 7, 2017
6dc5636
MultilabelModel parsing is compiling.
deaktator Sep 8, 2017
c6152ac
Added changed from Iterable[(String, Double)] to Sparse.
deaktator Sep 8, 2017
ff1725a
new line at EOF.
deaktator Sep 8, 2017
79422ab
VW compiling but still a few holes to fill in. Added Namespaces trait.
deaktator Sep 9, 2017
7546262
VW compiling. Added test shell. Fill in shell.
deaktator Sep 11, 2017
6d5671d
Number of new changes
amirziai Sep 12, 2017
6f45145
Test passing. It appears we don't need the dummy classes in test mode.
deaktator Sep 12, 2017
8d1caaa
Merge remote-tracking branch 'upstream/163-multilabel' into 163-multi…
amirziai Sep 12, 2017
f71aba5
Updated VwSparseMultilabelPredictor. It now seems to be fully workin…
deaktator Sep 12, 2017
e8159fc
Added some test comments.
deaktator Sep 12, 2017
6f98ce2
Added comments to VwSparseMultilabelPredictor.
deaktator Sep 12, 2017
47c0658
One more passing test, moving to performance testing now
amirziai Sep 12, 2017
c54d915
updated split
deaktator Sep 13, 2017
1356881
Merged from master.
deaktator Sep 14, 2017
4c00c6f
Merging updates
amirziai Sep 18, 2017
a0a2c0d
Merge remote-tracking branch 'upstream/163-multilabel' into 163-multi…
amirziai Sep 18, 2017
86890c1
Figured out the gist of a few more tests, terrible code though
amirziai Sep 18, 2017
daffd68
First pass over all tests
amirziai Sep 19, 2017
8280ef7
Simplified some of the tests
amirziai Sep 21, 2017
80fc41c
Refactoring
amirziai Sep 22, 2017
a18a827
Refactoring common patterns into the companion object
amirziai Sep 22, 2017
406b1f9
committing VwMultilabelRowCreator and updating other stuff to use it.
deaktator Sep 22, 2017
dc363e6
All tests pass, code structured a bit better
amirziai Sep 23, 2017
b0c1b77
Merge remote-tracking branch 'upstream/163-multilabel' into 163-multi…
amirziai Sep 23, 2017
0d35d4c
Wasn't compiling after merge
amirziai Sep 23, 2017
201a823
labels not in training set should be reported
amirziai Sep 23, 2017
a42c9d7
Renamed missingLabels->labelsNotInTrainingSet to conform to new signa…
amirziai Sep 23, 2017
c8902ec
Added some unit tests. Still plenty more to do.
deaktator Sep 23, 2017
a1ce0dd
Adding PR template
amirziai Sep 24, 2017
d1a40f3
Addressing comments
amirziai Sep 25, 2017
ddf40fd
Merge pull request #182 from amirziai/163-multilabel-test
deaktator Sep 25, 2017
e3f9c59
Getting everything to compile. Still some work to be done.
deaktator Sep 26, 2017
9dd35ee
VW multi-label model parsing working correctly. Tests prove it!
deaktator Sep 27, 2017
82746e7
exposed VW parameters to VwSparseMultilabelPredictor
deaktator Sep 27, 2017
5ab8e7c
removed TODO.
deaktator Sep 27, 2017
2023e74
updated tests to add coverage.
deaktator Sep 27, 2017
86c5c50
Added additional tests.
deaktator Sep 27, 2017
b3fdff4
End to end testing working. Need to clean it up.
deaktator Sep 29, 2017
7087d04
a little cleanup.
deaktator Sep 29, 2017
2a0b5c0
Removed implicit fn com.eharmony.aloha.factory.ScalaJsonFormats.lift(…
deaktator Sep 29, 2017
e11aef1
simplifying tests.
deaktator Sep 29, 2017
c2a048e
vw param function skeleton.
deaktator Oct 3, 2017
0ce7d68
merge from development branch. Build passing.
deaktator Oct 4, 2017
3cb9d2e
non working VwMultilabelModel.updatedVwParams. Skeleton laid out.
deaktator Oct 5, 2017
fb7a8f2
quadratics and cubics seem to be working.
deaktator Oct 5, 2017
f86b610
removed println
deaktator Oct 5, 2017
4c25972
made ignore_linear more concise
deaktator Oct 5, 2017
79f988b
lots of stuff working. More tests to write for VwMultilabelParamAugm…
deaktator Oct 7, 2017
9f61597
tested higher order interactions.
deaktator Oct 7, 2017
ca84c36
removed extra whitespace in string output.
deaktator Oct 7, 2017
db8967f
working but will change regex padding to use zero-width positive look…
deaktator Oct 9, 2017
6f89630
added different padding.
deaktator Oct 9, 2017
95e0c92
Updated documentation and tests. Looks good.
deaktator Oct 9, 2017
a7b168a
Updated VW label NS algo. Added test for when a NS can't be found.
deaktator Oct 9, 2017
9c4362b
hacky solution to flags with options referencing files. Use tmp file…
deaktator Oct 11, 2017
6df3f83
Looks good.
deaktator Oct 30, 2017
cd50006
Precompute positive and negative dummy class strings.
deaktator Nov 1, 2017
ffd10ee
Adding numUniqueLabels parameter to updatedVwParams to add VW's --rin…
deaktator Nov 1, 2017
9592b09
stateful row creator and reservoir sampling.
deaktator Nov 3, 2017
8e478f3
provided concrete implementations of iterator and vector apply method.
deaktator Nov 4, 2017
b7594fa
more purity in test.
deaktator Nov 4, 2017
819095a
separated pure and impure code.
deaktator Nov 4, 2017
b190821
no more. good enough.
deaktator Nov 4, 2017
5e1ebd0
It's never good enough, even on a football Saturday.
deaktator Nov 4, 2017
0889971
Seq -> List
deaktator Nov 5, 2017
d60f226
VwDownsampledMultilabelRowCreator and supporting infrastructure and t…
deaktator Nov 7, 2017
fe8dca2
Made Rand use Int indices, made k < 2^15 in neg label sampling. Upda…
deaktator Nov 7, 2017
baf7466
Addressing PR comments. Removed VW params from VW multi-label model …
deaktator Nov 8, 2017
22d9547
Downsampling can now operate over 2^31 - 1 (2 billion) labels.
deaktator Nov 8, 2017
c463c11
changed Iterator.isEmpty to hasNext. Updated docs.
deaktator Nov 8, 2017
79fddb6
forgot logical not in if statement.
deaktator Nov 8, 2017
1c0102c
removed toShort from Rand.
deaktator Nov 8, 2017
6e21f8c
addressing PR comments. Changed name of multilabel model to 'SparseM…
deaktator Nov 8, 2017
@@ -28,6 +28,7 @@ class ModelTypesTest {
"ModelDecisionTree",
"Regression",
"Segmentation",
"SparseMultilabel",
"VwJNI"
)

@@ -0,0 +1,76 @@
package com.eharmony.aloha.dataset

import com.eharmony.aloha.util.StatefulMapOps

import scala.collection.{SeqLike, immutable => sci}
import scala.collection.generic.{CanBuildFrom => CBF}

/**
* A row creator that requires state. This state should be modeled functionally, meaning
* implementations should be referentially transparent.
*
* Created by ryan.deak on 11/2/17.
*/
trait StatefulRowCreator[-A, +B, S] extends Serializable {

/**
* Some initial state that can be used on the very first call to `apply(A, S)`.
* @return some state.
*/
val initialState: S

/**
* Given an `a` and some `state`, produce output, including a new state.
*
* When using this function, the user is responsible for keeping track of,
* and providing the state.
*
* The implementation of this function should be referentially transparent.
*
* @param a input
* @param state the state
* @return a tuple whose first element is a Tuple2 containing the missing and
* error information and an optional result, and whose second element
* is the new state.
*/
def apply(a: A, state: S): ((MissingAndErroneousFeatureInfo, Option[B]), S)

/**
* Apply the `apply(A, S)` method to the elements of the iterator. In the first
* application of `apply(A, S)`, `state` will be used as the state. In subsequent
* applications, the state will come from the state generated in the output of the
* previous application of `apply(A, S)`.
*
* For more information, see [[com.eharmony.aloha.util.StatefulMapOps]]
*
* @param as Note the first element of `as` ''will be forced'' in this method in order
* to construct the output.
* @param state the initial state to use at the start of the iterator.
* @return an iterator containing the `a` mapped to a
* `(MissingAndErroneousFeatureInfo, Option[B])` along with the resulting
* state that is created in the process.
*/
def statefulMap(as: Iterator[A], state: S): Iterator[((MissingAndErroneousFeatureInfo, Option[B]), S)] =
StatefulMapOps.statefulMap(as, state)(apply)

/**
* Apply the `apply(A, S)` method to the elements of the sequence. In the first
* application of `apply(A, S)`, `state` will be used as the state. In subsequent
* applications, the state will come from the state generated in the output of the
* previous application of `apply(A, S)`.
*
* '''NOTE''': This method isn't really parallelizable via chunking. The way to
* parallelize this method is to provide a separate starting state for each unit
* of parallelism.
*
* For more information, see [[com.eharmony.aloha.util.StatefulMapOps]]
*
* @param as input to map.
* @param state the initial state to use at the start of mapping.
* @param cbf object responsible for building the output collection.
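* @return a collection containing each `a` mapped to a
* `(MissingAndErroneousFeatureInfo, Option[B])` along with the resulting
* state that is created in the process.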
*/
def statefulMap[In <: sci.Seq[A], Out](as: SeqLike[A, In], state: S)(implicit
cbf: CBF[In, ((MissingAndErroneousFeatureInfo, Option[B]), S), Out]
): Out = StatefulMapOps.statefulMap(as, state)(apply)
}
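The state-threading contract documented above can be illustrated with a small, self-contained sketch. This is not the implementation in `com.eharmony.aloha.util.StatefulMapOps`, which is not shown in this diff; `StatefulMapSketch` and its `statefulMap` are hypothetical names, and unlike the documented method this version does not force the first element of the iterator.

```scala
object StatefulMapSketch {

  /**
   * Hypothetical helper (not Aloha's StatefulMapOps): apply `f` to each element,
   * feeding the state produced by one application into the next one.
   *
   * The mutable variable is confined to this method. `f` itself is expected to
   * be referentially transparent, as the trait's documentation requires.
   */
  def statefulMap[A, B, S](as: Iterator[A], initial: S)(f: (A, S) => (B, S)): Iterator[(B, S)] = {
    var state = initial
    as.map { a =>
      val (b, next) = f(a, state)
      state = next   // the next element will see the updated state
      (b, next)
    }
  }
}

// Usage: the state is a running sum and the output is the doubled input.
// StatefulMapSketch.statefulMap(Iterator(1, 2, 3), 0)((a, s) => (a * 2, s + a)).toList
// yields List((2, 1), (4, 3), (6, 6))
```

Because each output depends on the state produced by the previous element, splitting the input into chunks only works if every chunk is given its own starting state, which matches the note on parallelism in the Scaladoc.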
@@ -0,0 +1,43 @@
package com.eharmony.aloha.dataset

import com.eharmony.aloha.semantics.compiled.CompiledSemantics
import spray.json.JsValue

import scala.util.Try

/**
* Created by deaktator on 11/6/17.
*
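* @tparam A input type of the row creator.
* @tparam B output type of the row creator.
* @tparam S type of the state threaded through the row creator.
* @tparam Impl the type of [[StatefulRowCreator]] that is produced.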
*/
trait StatefulRowCreatorProducer[A, +B, S, +Impl <: StatefulRowCreator[A, B, S]] {

/**
* Type of parsed JSON object.
*/
type JsonType

/**
* Name of this producer.
* @return the name of this producer.
*/
def name: String

/**
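* Attempt to parse the JSON AST to an intermediate representation that is used
* to create the row creator.
* @param json the JSON AST to parse.
* @return the intermediate representation, or a Failure if parsing fails.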
*/
def parse(json: JsValue): Try[JsonType]

/**
* Attempt to produce a StatefulRowCreator.
* @param semantics semantics used to make sense of the features in the JsonSpec
* @param jsonSpec a JSON specification to transform into a StatefulRowCreator.
* @return the row creator, or a Failure if it could not be produced.
*/
def getRowCreator(semantics: CompiledSemantics[A], jsonSpec: JsonType): Try[Impl]
}
@@ -0,0 +1,42 @@
package com.eharmony.aloha.dataset.vw.multilabel

import com.eharmony.aloha.AlohaException
import com.eharmony.aloha.dataset.DvProducer
import com.eharmony.aloha.dataset.vw.multilabel.VwMultilabelRowCreator.{determineLabelNamespaces, LabelNamespaces}
import com.eharmony.aloha.reflect.RefInfo
import com.eharmony.aloha.semantics.compiled.CompiledSemantics
import com.eharmony.aloha.semantics.func.GenAggFunc

import scala.collection.breakOut
import scala.util.{Failure, Success, Try}
import scala.collection.{immutable => sci}

/**
* Created by ryan.deak on 11/6/17.
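* @param ev$1 reflection information (`RefInfo`) for `K`, supplied by the context bound.
* @tparam A input type from which the positive labels are extracted.
* @tparam K label type.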
*/
private[multilabel] abstract class PositiveLabelsFunction[A, K: RefInfo] { self: DvProducer =>

private[multilabel] def positiveLabelsFn(
semantics: CompiledSemantics[A],
positiveLabels: String
): Try[GenAggFunc[A, sci.IndexedSeq[K]]] =
getDv[A, sci.IndexedSeq[K]](
semantics, "positiveLabels", Option(positiveLabels), Option(Vector.empty[K]))

private[multilabel] def labelNamespaces(nss: List[(String, List[Int])]): Try[LabelNamespaces] = {
val nsNames: Set[String] = nss.map(_._1)(breakOut)
determineLabelNamespaces(nsNames) match {
case Some(ns) => Success(ns)

// If there are so many VW namespaces that all available Unicode characters are taken,
// then a memory error will probably already have occurred.
case None => Failure(new AlohaException(
"Could not find any Unicode characters to use as VW namespaces. Namespaces provided: " +
nsNames.mkString(", ")
))
}
}
}
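The failure branch in `labelNamespaces` is easier to follow with a picture of what finding free namespace characters might look like. The sketch below is not `VwMultilabelRowCreator.determineLabelNamespaces`, whose body is not part of this diff. It assumes that VW tells namespaces apart by their first character and that candidate characters are restricted to letters; `UnusedNamespaceSketch` and `pickUnused` are hypothetical names.

```scala
object UnusedNamespaceSketch {

  /**
   * Hypothetical helper: return `n` letters that are not the first character of
   * any name in `nsNames`, or None if fewer than `n` such letters exist.
   *
   * Assumption: only the first character of a namespace name matters, and only
   * letters are acceptable as generated namespace names.
   */
  def pickUnused(nsNames: Set[String], n: Int): Option[List[Char]] = {
    // First characters already claimed by the feature namespaces.
    val used: Set[Char] = nsNames.flatMap(_.headOption)

    // Walk the whole Char range lazily and keep the first `n` free letters.
    val free = (Char.MinValue to Char.MaxValue).iterator
      .filter(c => Character.isLetter(c) && !used.contains(c))
      .take(n)
      .toList

    if (free.size == n) Some(free) else None
  }
}

// Usage: 'A' and 'B' are taken, so the first two free letters are 'C' and 'D'.
// UnusedNamespaceSketch.pickUnused(Set("Apple", "B"), 2) == Some(List('C', 'D'))
```

Returning an `Option` keeps the search itself free of error handling, so a caller can translate `None` into a `Failure`, as the `match` in `labelNamespaces` does.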