object HashNHotWeightedEncoder extends SettingsBuilder with Serializable
Transforms a collection of weighted categorical features into columns of weight sums, with at most N values. Similar to NHotWeightedEncoder, but uses MurmurHash3 to hash features into buckets, which reduces CPU and memory overhead.
Weights of the same label within a row are summed, rather than set to 1.0 as in the plain NHotEncoder.
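The hashing and weight summing described above can be sketched with the Scala standard library alone. This is a minimal illustration, not Featran's implementation: the `Label` case class, the `bucket` helper, and the `encode` function are hypothetical names introduced here, and the non-negative modulo is an assumed way of turning a hash into a bucket index.

```scala
import scala.util.hashing.MurmurHash3

object HashedWeightedSketch {
  // Hypothetical stand-in for a weighted label: a name plus its weight.
  final case class Label(name: String, weight: Double)

  // Map a label name to a bucket index in [0, buckets).
  // The second modulo keeps the index non-negative when the hash is negative.
  def bucket(name: String, buckets: Int): Int =
    ((MurmurHash3.stringHash(name) % buckets) + buckets) % buckets

  // Encode one row: each output column holds the sum of the weights
  // of every label in the row that hashes to that bucket.
  def encode(row: Seq[Label], buckets: Int): Array[Double] = {
    val out = new Array[Double](buckets)
    row.foreach(l => out(bucket(l.name, buckets)) += l.weight)
    out
  }
}
```

Repeated labels in a row always share a bucket, so their weights add up. Distinct labels that happen to collide are also summed into the same column, which is the trade-off for the fixed output width.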
If hashBucketSize is inferred with HLL, the estimate is scaled by sizeScalingFactor to reduce the number of collisions.
Rough table of relationship of scaling factor to % collisions, measured from a corpus of 466544 English words:
sizeScalingFactor     % Collisions
-----------------     ------------
                2         17.9934%
                4         10.5686%
                8          5.7236%
               16          3.0019%
               32          1.5313%
               64          0.7864%
              128          0.3920%
              256          0.1998%
              512          0.0975%
             1024          0.0478%
             2048          0.0236%
             4096          0.0071%
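A comparable measurement can be sketched as follows. Several things here are assumptions rather than the method behind the table: the exact distinct count is used in place of an HLL estimate, a collision is counted as a word landing in a bucket that an earlier word already occupies, and the corpus is a synthetic list of strings. `CollisionSketch` and `collisionRate` are hypothetical names, and the input is expected to be non-empty.

```scala
import scala.util.hashing.MurmurHash3

object CollisionSketch {
  // Fraction of distinct words that land in a bucket already occupied
  // by another word, using ceil(distinctCount * scalingFactor) buckets.
  def collisionRate(words: Seq[String], scalingFactor: Double): Double = {
    val distinct = words.distinct
    val buckets = math.ceil(distinct.size * scalingFactor).toInt
    val occupied = distinct
      .map(w => ((MurmurHash3.stringHash(w) % buckets) + buckets) % buckets)
      .toSet
    (distinct.size - occupied.size).toDouble / distinct.size
  }

  def main(args: Array[String]): Unit = {
    // Synthetic stand-in corpus; the table above used English words.
    val words = (0 until 10000).map(i => s"word$i")
    Seq(2.0, 8.0, 64.0).foreach { f =>
      println(f"$f%6.1f ${collisionRate(words, f) * 100}%.4f%%")
    }
  }
}
```

The printed percentages will not match the table exactly, since the corpus and the size estimate differ, but the rate should fall as the scaling factor grows.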
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
apply(name: String, hashBucketSize: Int = 0, sizeScalingFactor: Double = 8.0): Transformer[Seq[WeightedLabel], HLL, Int]
Create a new HashNHotWeightedEncoder instance.
- hashBucketSize
number of buckets, or 0 to infer from data with HyperLogLog
- sizeScalingFactor
when hashBucketSize is 0, scale HLL estimate by this amount
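How the two parameters interact can be sketched as a small helper. This is an assumption-laden illustration, not the library's code: `BucketSizeSketch` and `resolve` are hypothetical names, `estimatedDistinct` stands in for the HLL cardinality estimate, and rounding the scaled estimate up is an assumed choice.

```scala
object BucketSizeSketch {
  // An explicit positive hashBucketSize is used as-is.
  // A value of 0 means "infer": scale the estimated number of distinct
  // labels by sizeScalingFactor and round up (rounding is an assumption).
  def resolve(
      hashBucketSize: Int,
      estimatedDistinct: Double,
      sizeScalingFactor: Double = 8.0
  ): Int =
    if (hashBucketSize > 0) hashBucketSize
    else math.ceil(estimatedDistinct * sizeScalingFactor).toInt
}
```

With the default factor of 8.0, an estimate of 1000 distinct labels would give 8000 buckets under this sketch, while passing an explicit size such as 256 bypasses the estimate entirely.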
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
fromSettings(setting: Settings): Transformer[Seq[WeightedLabel], HLL, Int]
Create a new HashNHotWeightedEncoder from a settings object.
- setting
Settings object
- Definition Classes
- HashNHotWeightedEncoder → SettingsBuilder
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()