Train a DSSM (or a convolutional DSSM) model
DSSM (or Deep Semantic Similarity Model) is a DNN model trained on pairs of source-target texts, for learning a short-text embedding space where relevant source and target text pairs are closer. Text input to the model is represented by its pre-computed trigram hash (see Huang et al.). For C-DSSM, the trigram hash is computed per word and then concatenated in the order in which the words occur in the text. The input to both models is of fixed size. If we consider 50K trigrams, then the DSSM input corresponding to the source and the target text would be a vector of length 50K each. For C-DSSM, the vector would be of length 50K x n, where the first n-1 word vectors are concatenated, and the nth vector contains a sum of the vectors corresponding to all the remaining words in the text. If there are fewer than n words in the text, then the rest of the vector is padded with zeros. To draw an analogy with images, you can think of the text input for C-DSSM as an image with dimensions 10x1 and 50K channels stored in a [C x H x W] format.
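To make this representation concrete, here is a minimal Python sketch of letter-trigram word hashing and of assembling the fixed-size DSSM / C-DSSM input vectors described above. The helper names (letter_trigrams, trigram_vector, cdssm_input, dssm_input) and the use of Python's built-in hash in place of a fixed trigram vocabulary are illustrative assumptions, not part of CNTK; a real pipeline would use a trigram-to-index vocabulary built once from the training corpus.

    import numpy as np

    NUM_TRIGRAMS = 50000   # size of the trigram hash space (50K in the text above)
    NUM_WORDS    = 10      # n: fixed number of word slots for C-DSSM

    def letter_trigrams(word):
        # Break a word into letter trigrams with boundary markers,
        # e.g. 'cat' -> ['#ca', 'cat', 'at#']
        padded = "#" + word + "#"
        return [padded[i:i + 3] for i in range(len(padded) - 2)]

    def trigram_vector(word):
        # Count vector of length NUM_TRIGRAMS for one word; a plain hash
        # stands in here for a fixed trigram vocabulary (assumption).
        v = np.zeros(NUM_TRIGRAMS, dtype=np.float32)
        for tg in letter_trigrams(word):
            v[hash(tg) % NUM_TRIGRAMS] += 1.0
        return v

    def cdssm_input(text):
        # Slots 0..n-2 hold the first n-1 words; slot n-1 holds the sum of
        # all remaining words; unused slots stay zero-padded.
        # Result has length NUM_TRIGRAMS * NUM_WORDS (50K x n).
        words = text.lower().split()
        slots = np.zeros((NUM_WORDS, NUM_TRIGRAMS), dtype=np.float32)
        for i, w in enumerate(words[:NUM_WORDS - 1]):
            slots[i] = trigram_vector(w)
        for w in words[NUM_WORDS - 1:]:
            slots[NUM_WORDS - 1] += trigram_vector(w)
        return slots.reshape(-1)

    def dssm_input(text):
        # For plain DSSM, the trigram counts of all words are summed into a
        # single vector of length NUM_TRIGRAMS.
        v = np.zeros(NUM_TRIGRAMS, dtype=np.float32)
        for w in text.lower().split():
            v += trigram_vector(w)
        return v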
This example demonstrates how to train a DSSM / C-DSSM model using the CNTKTextFormatReader. The data should contain 2 features (source and target text) and 1 label (which is always set to the value 1 in the training data, since it contains only positive samples; the negative target examples are generated by random sampling during training). Here's the reader configuration,
reader = {
    verbosity = 0
    randomize = true

    deserializers = ({
        type = "CNTKTextFormatDeserializer"
        module = "CNTKTextFormatReader"
        file = "data.txt"

        input = {
            Q = { dim = 500000; format = "sparse" }
            D = { dim = 500000; format = "sparse" }
            L = { dim = 1;      format = "dense" }
        }
    })
}
A sample of the input data,
|L 1 |Q 482:1 761:1 1832:1 2117:1 12370:1 17131:1 17854:1 24976:1 27676:1 28055:1 28177:1 29507:1|D 482:1 761:1 1832:1 2117:1 12370:1 17131:1 17854:1 24976:1 27676:1 28055:1 28177:1 29507:1
|L 1 |Q 149:1 153:1 595:1 671:1 675:1 1110:1 1517:1 2077:1 2114:1 5533:1 5662:1 6886:1 6901:1 7294:1 12846:1 13033:1 16614:1 19425:1 22015:1 24839:1 24994:1 26196:1 26358:1 27565:1|D 149:1 153:1 595:1 671:1 675:1 1110:1 1517:1 2077:1 2114:1 5533:1 5662:1 6886:1 6901:1 7294:1 12846:1 13033:1 16614:1 19425:1 22015:1 24839:1 24994:1 26196:1 26358:1 27565:1
|L 1 |Q 187:1 2294:1 2800:1 6920:1|D 187:1 2294:1 2800:1 6920:1
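These lines follow the CNTKTextFormat: |L carries the dense label (always 1, since the training data holds only positive pairs), while |Q and |D list the sparse index:value entries of the 500000-dimensional hashed inputs. A hypothetical helper for emitting such lines might look like the sketch below (ctf_line is an illustrative name, not a CNTK API):

    def ctf_line(q_indices, d_indices):
        # Sparse entries are written as "index:value"; the label is always 1
        # because the training data contains only positive source-target pairs.
        q = " ".join(f"{i}:1" for i in sorted(q_indices))
        d = " ".join(f"{i}:1" for i in sorted(d_indices))
        return f"|L 1 |Q {q} |D {d}"

    # For example, writing one training pair (the last sample above) to data.txt:
    with open("data.txt", "w") as f:
        f.write(ctf_line({187, 2294, 2800, 6920}, {187, 2294, 2800, 6920}) + "\n")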
And finally the network definition,
BrainScriptNetworkBuilder = {
    # Scalar constants
    isConvolutional    = true
    numWords           = (if isConvolutional then 10 else 1)
    numTrigramsPerWord = 50000
    numHiddenNodes     = 300
    wordWindowSize     = 3
    numWindows         = numWords - wordWindowSize + 1
    numNeg             = 50

    # Constant tensors
    CONST_GAMMA   = Constant(10)
    CONST_SHIFT   = Constant(1)
    CONST_NEG     = Constant(numNeg)
    CONST_PAD_NEG = Constant(0, rows=numNeg, cols=1)
    CONST_PAD_POS = Constant(1, rows=1, cols=1)
    CONST_PAD     = Splice(CONST_PAD_POS : CONST_PAD_NEG, axis=1)

    # Inputs
    Q = Input(500000)
    D = Input(500000)
    L = Input(1)

    # For C-DSSM, reshape the flat input into [numWords x 1 x numTrigramsPerWord]
    # (the 10x1 "image" with 50K channels); for DSSM, take the single trigram vector
    qr = if isConvolutional
         then TransposeDimensions(ReshapeDimension(Q, 1, numTrigramsPerWord:1:numWords), 1, 3)
         else Slice(0, numTrigramsPerWord, Q, axis=1)
    dr = if isConvolutional
         then TransposeDimensions(ReshapeDimension(D, 1, numTrigramsPerWord:1:numWords), 1, 3)
         else Slice(0, numTrigramsPerWord, D, axis=1)

    qdssm = Sequential (
        DenseLayer {numHiddenNodes, activation=Tanh} :
        DenseLayer {numHiddenNodes, activation=Tanh} :
        DenseLayer {numHiddenNodes, activation=Tanh})
    qcdssm = Sequential (
        ConvolutionalLayer {numHiddenNodes, (wordWindowSize:1), pad=false, activation=Tanh} :
        MaxPoolingLayer {(numWindows:1), stride=(1:1)} :
        DenseLayer {numHiddenNodes, activation=Tanh} :
        DenseLayer {numHiddenNodes, activation=Tanh})
    ddssm = Sequential (
        DenseLayer {numHiddenNodes, activation=Tanh} :
        DenseLayer {numHiddenNodes, activation=Tanh} :
        DenseLayer {numHiddenNodes, activation=Tanh})
    dcdssm = Sequential (
        ConvolutionalLayer {numHiddenNodes, (wordWindowSize:1), pad=false, activation=Tanh} :
        MaxPoolingLayer {(numWindows:1), stride=(1:1)} :
        DenseLayer {numHiddenNodes, activation=Tanh} :
        DenseLayer {numHiddenNodes, activation=Tanh})

    qembed = if isConvolutional
             then qcdssm
             else qdssm
    dembed = if isConvolutional
             then dcdssm
             else ddssm

    qf = qembed(qr)
    df = dembed(dr)
    lf = Times(CONST_PAD, L)    # expand the label to [1, 0, ..., 0] over 1 + numNeg candidates

    c  = CosDistanceWithNegativeSamples(qf, df, CONST_SHIFT, CONST_NEG)
    s  = Slice(0, 1, c, axis=1, tag="output")
    ce = CrossEntropyWithSoftmax(lf, Scale(CONST_GAMMA, c), tag="criterion")
}
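To clarify the objective built in the last few lines, here is a NumPy sketch of the loss, under the assumption that CosDistanceWithNegativeSamples draws its numNeg negatives by circularly shifting the document embeddings within the minibatch (starting at the given shift) and places the positive cosine at position 0, matching the [1, 0, ..., 0] label from Times(CONST_PAD, L). The function names (cos_with_negatives, dssm_loss) are illustrative, and CNTK's internal implementation may differ in detail:

    import numpy as np

    def cosine(a, b):
        # Row-wise cosine similarity for [batch, dim] arrays
        return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))

    def cos_with_negatives(qf, df, shift=1, num_neg=50):
        # Column 0: cosine of the true (query, document) pair.
        # Columns 1..num_neg: cosines against documents of other samples in the
        # minibatch via circular shifts (assumes batch size > shift + num_neg).
        cols = [cosine(qf, df)]
        for k in range(num_neg):
            cols.append(cosine(qf, np.roll(df, shift + k, axis=0)))
        return np.stack(cols, axis=1)          # [batch, 1 + num_neg]

    def dssm_loss(qf, df, gamma=10.0, shift=1, num_neg=50):
        # Softmax cross-entropy over 1 + num_neg candidates; the label is
        # one-hot at position 0 (the positive pair), as produced by CONST_PAD.
        c = gamma * cos_with_negatives(qf, df, shift, num_neg)
        c = c - c.max(axis=1, keepdims=True)   # for numerical stability
        logp = c - np.log(np.exp(c).sum(axis=1, keepdims=True))
        return -logp[:, 0].mean()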
Note:
- While C-DSSM has been shown to consistently perform better than DSSM, it also trains slower (sometimes up to 5-10x slower). So in some cases you may get better performance from DSSM in the same training time by training over more data (or for more epochs).
- The original DSSM / C-DSSM models were trained on query and document title pairs, but you can learn other relationships between short texts by training on other kinds of data, such as session query pairs or query prefix-suffix pairs.