
Commit

merge with master
emjotde committed Nov 9, 2023
2 parents 53b0b0d + 757f7c1 commit 60e3be2
Showing 15 changed files with 183 additions and 69 deletions.
8 changes: 6 additions & 2 deletions CHANGELOG.md
@@ -10,13 +10,17 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

### Added
- Added `--no-spm-encode` option, allowing the model to use vocabulary IDs directly to train/decode.
- Added MSE and MAE costs to COMET-QE training.
- Added augmentation of shuffled examples to COMET-QE training via `--comet-augment-bad`.
- Minor changes and fixes related to metric training.
- Added `--quiet-validation` option that disables printing Hyp/Ref samples during validation.
- Added `--custom-fallbacks` option that allows specifying a list of option sets that are traversed for subsequent fallbacks upon divergence.
- Added `--overwrite-checkpoint` option that (when set to false) can be used to dump checkpoints with iteration numbers.
- Implementations of COMET-20 (reference-based) and BLEURT-20 for inference with conversion scripts.
- `./marian evaluate` sub-command for evaluation with COMET-QE-20, COMET-20 and BLEURT-20.
- A bunch of scripts for metrics use and early MBR experiments.
- LSH vocab filtering for GPU. Speed is not competitive with non-LSH. Checking in for completeness and possible future use of LSH on GPU for non-filtering purposes.
- Added `--throw-on-divergence` and `--fp16-fallback-to-fp32` options to detect (fp16 and fp32) and recover from (fp16 only) diverged runs. If not recoverable, the exception is rethrown and goes unhandled to force a fatal error and shutdown.
- Re-implementation of COMET-QE for inference and training; conversion scripts from Unbabel-Comet to Marian.
- Validator that generates embeddings and can be used during COMET training with an external script.
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
v1.12.12
v1.12.14
15 changes: 9 additions & 6 deletions azure-pipelines.yml
@@ -293,8 +293,9 @@ stages:

# https://software.intel.com/content/www/us/en/develop/articles/installing-intel-free-libs-and-python-apt-repo.html
- bash: |
wget -qO- "https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB" | sudo apt-key add -
sudo sh -c "echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list"
sudo mkdir -p /usr/share/keyrings
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/intel.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/intel.gpg] https://apt.repos.intel.com/mkl all main" | sudo tee /etc/apt/sources.list.d/intel-mkl.list
sudo apt-get update -o Dir::Etc::sourcelist="/etc/apt/sources.list.d/intel-mkl.list"
sudo apt-get install -y --no-install-recommends intel-mkl-64bit-2020.0-088
displayName: Install MKL
@@ -414,8 +415,9 @@ stages:

# https://software.intel.com/content/www/us/en/develop/articles/installing-intel-free-libs-and-python-apt-repo.html
- bash: |
wget -qO- "https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB" | sudo apt-key add -
sudo sh -c "echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list"
sudo mkdir -p /usr/share/keyrings
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/intel.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/intel.gpg] https://apt.repos.intel.com/mkl all main" | sudo tee /etc/apt/sources.list.d/intel-mkl.list
sudo apt-get update -o Dir::Etc::sourcelist="/etc/apt/sources.list.d/intel-mkl.list"
sudo apt-get install -y --no-install-recommends intel-mkl-64bit-2020.0-088
displayName: Install MKL
@@ -606,8 +608,9 @@ stages:
# https://software.intel.com/content/www/us/en/develop/articles/installing-intel-free-libs-and-python-apt-repo.html
- bash: |
wget -qO- "https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB" | sudo apt-key add -
sudo sh -c "echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list"
sudo mkdir -p /usr/share/keyrings
wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/intel.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/intel.gpg] https://apt.repos.intel.com/mkl all main" | sudo tee /etc/apt/sources.list.d/intel-mkl.list
sudo apt-get update -o Dir::Etc::sourcelist="/etc/apt/sources.list.d/intel-mkl.list"
sudo apt-get install -y --no-install-recommends intel-mkl-64bit-2020.0-088
displayName: Install MKL
8 changes: 6 additions & 2 deletions scripts/comet/comet2marian.py
@@ -90,6 +90,8 @@ def load_comet_model(model_path):
config["type"] = "comet"
elif model_type == "ReferencelessRegression":
config["type"] = "comet-qe"
elif model_type == "XLMRobertaModel":
config["type"] = "comet-qe"
else:
raise Exception(f'Unknown type of model {model_type}')

@@ -104,8 +106,10 @@ def load_comet_model(model_path):
config["bert-train-type-embeddings"] = False
config["bert-type-vocab-size"] = 0
config["comet-prepend-zero"] = True
config["comet-final-sigmoid"] = args.add_sigmoid
config["comet-pooler-ffn"] = [2048, 1024]
if not args.roberta:
config["comet-final-sigmoid"] = args.add_sigmoid
config["comet-pooler-ffn"] = [2048, 1024]

# @TODO: figure out if it's worth adding `cometModel.name_or_path` to the end of this version string.
config["version"] = "comet2marian2.py conversion"
config["enc-depth"] = 0
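The type dispatch in the first hunk above can be sketched as a small mapping (only the cases visible in this diff are included; the surrounding conversion logic is assumed):

```python
# Sketch (not Marian's code) of the model-type dispatch in comet2marian.py:
# a bare XLMRobertaModel checkpoint is now converted like a COMET-QE model.
def comet_type_to_marian(model_type):
    """Map a COMET model class name to a Marian model type (visible cases only)."""
    mapping = {
        "ReferencelessRegression": "comet-qe",
        "XLMRobertaModel": "comet-qe",  # new in this commit
    }
    if model_type not in mapping:
        raise Exception(f"Unknown type of model {model_type}")
    return mapping[model_type]
```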
8 changes: 8 additions & 0 deletions src/common/config_parser.cpp
@@ -334,11 +334,14 @@ void ConfigParser::addOptionsModel(cli::CLIWrapper& cli) {

// Options specific for the "comet-qe" model type
cli.add<bool>("--comet-final-sigmoid", "Add final sigmoid to COMET model");
cli.add<bool>("--comet-stop-grad", "Do not propagate gradients through COMET model");

cli.add<bool>("--comet-mix", "Mix encoder layers to produce embedding");
cli.add<bool>("--comet-mix-norm", "Normalize layers prior to mixing");
cli.add<float>("--comet-dropout", "Dropout for pooler layers", 0.1f);
cli.add<float>("--comet-mixup", "Alpha parameter for Beta distribution for mixup", 0.0f);
cli.add<bool>("--comet-mixup-reg", "Use original and mixed-up samples in training");
cli.add<float>("--comet-augment-bad", "Fraction of bad examples added via shuffling for class/label 0.f", 0.0f);
cli.add<std::vector<int>>("--comet-pooler-ffn", "Hidden sizes for comet pooler", {2048, 1024});
cli.add<bool>("--comet-prepend-zero", "Add a start symbol to batch entries");
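The idea behind `--comet-augment-bad` (a fraction of shuffled, mismatched examples added with label 0) can be sketched as follows; this is an illustration of the concept only, not Marian's data pipeline:

```python
import random

def augment_bad(examples, fraction, seed=0):
    """Append fraction * len(examples) mismatched pairs labeled 0.0.

    `examples` holds (source, translation, score) tuples; a "bad" example
    pairs a source with a randomly sampled translation, so the metric
    learns to score mismatches near 0. Sketch of --comet-augment-bad only.
    """
    rng = random.Random(seed)
    n_bad = int(len(examples) * fraction)
    bad = []
    for _ in range(n_bad):
        src, _, _ = rng.choice(examples)
        _, wrong_hyp, _ = rng.choice(examples)
        bad.append((src, wrong_hyp, 0.0))
    return examples + bad
```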

@@ -705,6 +708,11 @@ void ConfigParser::addOptionsValidation(cli::CLIWrapper& cli) {
"Keep best model for each validation metric");
cli.add<std::string>("--valid-log",
"Log validation scores to file given by arg");

// general options for validation
cli.add<bool>("--quiet-validation",
"Suppress logging hyp./ref. samples during validation");

cli.switchGroup(previous_group);
// clang-format on
}
2 changes: 1 addition & 1 deletion src/data/corpus.cpp
@@ -143,7 +143,7 @@ SentenceTuple Corpus::next() {
// check if all streams are valid, that is, non-empty and no longer than maximum allowed length
if(std::all_of(tup.begin(), tup.end(), [=](const Words& words) {
return words.size() > 0 && words.size() <= maxLength_;
})) {
return tup;
} else {
return SentenceTupleImpl(); // return an empty tuple if above test does not pass
26 changes: 19 additions & 7 deletions src/data/corpus_base.cpp
@@ -434,22 +434,34 @@ void CorpusBase::addWordsToSentenceTuple(const std::string& line,
auto prependedWord = Word::fromWordIndex(0);
words.insert(words.begin(), prependedWord);
}

if(maxLengthCrop_ && words.size() > maxLength_) {
words.resize(maxLength_);

// if fields are joined and the current sentence is not the first one, we need to make sure that
// the current sentence is not longer than the maximum length minus the length of the previous sentence
// (minus 1 for the separator <eos> token)
size_t localMaxLength = maxLength_;
if(joinFields_ && !tup.empty())
localMaxLength = std::max(1, (int)maxLength_ - (int)tup.back().size());

// if the current sentence is longer than the maximum length, we need to crop it
if(maxLengthCrop_ && words.size() > localMaxLength) {
words.resize(localMaxLength);
if(addEOS_[batchIndex])
words.back() = vocabs_[batchIndex]->getEosId();
}

// if true, the words are reversed
if(rightLeft_)
std::reverse(words.begin(), words.end() - 1);

// if true, the numeric indices get joined with the previous sentence, <eos> acts as a separator here
// @TODO: make this cleaner.
if(joinFields_)
tup.appendToBack(words);
else
if(joinFields_) {
size_t currLength = tup.empty() ? 0 : tup.back().size();
// if the current sentence would exceed the maximum length we don't add any more fields
if(currLength + words.size() < maxLength_)
tup.appendToBack(words);
} else {
tup.pushBack(words);
}
}
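The cropping and field-joining logic above can be mirrored in a short sketch (Python stand-in for the C++; `max_length` plays the role of `maxLength_`, EOS is assumed to be kept when cropping):

```python
def add_words(tup, words, max_length, join_fields, eos="</s>"):
    """Mirror of the C++ cropping logic above (a sketch, not Marian's API).

    When fields are joined, the length budget for the current sentence
    shrinks by the size of the previous field; a field that would still
    make the joined sequence exceed max_length is dropped entirely.
    """
    local_max = max_length
    if join_fields and tup:
        local_max = max(1, max_length - len(tup[-1]))
    if len(words) > local_max:
        words = words[:local_max - 1] + [eos]  # crop, keep EOS at the end
    if join_fields:
        curr = len(tup[-1]) if tup else 0
        if curr + len(words) < max_length:  # otherwise skip this field
            if tup:
                tup[-1] = tup[-1] + words
            else:
                tup.append(words)
    else:
        tup.append(words)
    return tup
```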

void CorpusBase::addAlignmentToSentenceTuple(const std::string& line,
3 changes: 3 additions & 0 deletions src/data/corpus_base.h
@@ -117,6 +117,9 @@ class SentenceTupleImpl {
auto begin() const -> decltype(tuple_.begin()) { return tuple_.begin(); }
auto end() const -> decltype(tuple_.end()) { return tuple_.end(); }

auto begin() -> decltype(tuple_.begin()) { return tuple_.begin(); }
auto end() -> decltype(tuple_.end()) { return tuple_.end(); }

auto rbegin() const -> decltype(tuple_.rbegin()) { return tuple_.rbegin(); }
auto rend() const -> decltype(tuple_.rend()) { return tuple_.rend(); }

22 changes: 2 additions & 20 deletions src/graph/expression_operators.cpp
@@ -362,27 +362,9 @@ Expr flatten_2d(Expr a) {
return Expression<ReshapeNodeOp>(a, shape);
}

// Stop gradients from flowing back through this node
Expr stopGradient(Expr a) {
#if 0
// This is a different implementation which is more reliable than the original,
// but it introduces a full copy which hogs memory. Keeping it around for now
// to decide later which one to use.

auto fwd = [](Expr output, const std::vector<Expr> inputs) {
CopyCast(output->val(), inputs[0]->val());
};

auto bwd = [](Expr output, const std::vector<Expr> inputs) {
/*Dummy*/
};

return lambda({a}, a->shape(), a->value_type(), fwd, bwd, (size_t)&fwd);
#else
// implemented as a dummy reshape that is not trainable
auto res = Expression<ReshapeNodeOp>(a, a->shape());
res->setTrainable(false);
return res;
#endif
return Expression<StopGradientNodeOp>(a);
}

Expr choose(std::vector<Expr> nodes, size_t index) {
23 changes: 23 additions & 0 deletions src/graph/node_operators_unary.h
@@ -1042,6 +1042,29 @@ class ClipGradientNodeOp : public UnaryNodeOp {
}
};

// This class is used to stop gradients from flowing through a node.
// This is slightly costly, as memory is allocated and a copy is made on the forward step.
class StopGradientNodeOp : public UnaryNodeOp {
public:
StopGradientNodeOp(Expr a)
: UnaryNodeOp(a) {}

~StopGradientNodeOp() {}

// On forward the values are just copied from the input node.
NodeOps forwardOps() override {
return { NodeOp(CopyCast(val_, child(0)->val())) };
}

// On backward nothing happens, i.e. the gradient does not flow through.
NodeOps backwardOps() override {
return { NodeOp( /*dummy*/; ) };
}

const std::string type() override { return "stopGradient"; }
const std::string color() override { return "grey"; }
};
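The semantics of `StopGradientNodeOp` (copy on forward, no gradient on backward) can be sketched outside Marian; plain Python lists stand in for tensors here, and the names are illustrative, not Marian's API:

```python
def stop_gradient_forward(x):
    """Forward: values are copied from the input node (as CopyCast does)."""
    return list(x)

def stop_gradient_backward(upstream_grad):
    """Backward: nothing propagates; the child effectively receives zeros."""
    return [0.0 for _ in upstream_grad]
```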

// narrow an axis to [begin, end)
// The resulting object must be consecutive in memory.
class SliceViewNodeOp : public UnaryNodeOp {
