Commit

Modify how locale is set for testing non-printable characters, to support more systems (kaldi-asr#4612)
danpovey authored Aug 23, 2021
1 parent 8e92112 commit d53b62f
Showing 1 changed file with 7 additions and 2 deletions.
egs/wsj/s5/utils/validate_data_dir.sh: 9 changes (7 additions, 2 deletions)
@@ -20,7 +20,7 @@ function show_help
   echo "By default, utt2spk is expected to be sorted by both, which can be "
   echo "achieved by making the speaker-id prefixes of the utterance-ids"
   echo "e.g.: $0 data/train"
-}
+}
 
 while [ $# -ne 0 ] ; do
   case "$1" in
@@ -126,7 +126,12 @@ fi
 num_utts=`cat $tmpdir/utts | wc -l`
 if ! $no_text; then
   if ! $non_print; then
-    n_non_print=$(LC_ALL="C.UTF-8" grep -c '[^[:print:][:space:]]' $data/text) && \
+    if locale -a | grep "C.UTF-8" >/dev/null; then
+      L=C.UTF-8
+    else
+      L=en_US.UTF-8
+    fi
+    n_non_print=$(LC_ALL="$L" grep -c '[^[:print:][:space:]]' $data/text) && \
     echo "$0: text contains $n_non_print lines with non-printable characters" &&\
     exit 1;
   fi
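The locale fallback introduced by this change can be sketched as two small reusable shell functions. This is a hypothetical refactoring for illustration, not code from Kaldi: the names `pick_utf8_locale` and `count_non_printable` are made up. It also departs from the commit in one assumed respect: on some glibc systems `locale -a` lists the C UTF-8 locale as `C.utf8` rather than `C.UTF-8`, so the sketch matches case-insensitively with an optional dash and uses the name exactly as listed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Kaldi) that factor out the check added in
# this commit: choose a UTF-8 locale, then count lines of a text file that
# contain characters that are neither printable nor whitespace.

# pick_utf8_locale: print the name of an installed C UTF-8 locale if one is
# listed by `locale -a`, else fall back to en_US.UTF-8 as the commit does.
# Assumption: the locale may be listed as "C.utf8" instead of "C.UTF-8",
# so the match is case-insensitive and the dash is optional.
pick_utf8_locale() {
  local found
  found=$(locale -a 2>/dev/null | grep -iE '^C\.utf-?8$' | head -n 1)
  if [ -n "$found" ]; then
    printf '%s\n' "$found"
  else
    printf '%s\n' "en_US.UTF-8"
  fi
}

# count_non_printable FILE: print how many lines of FILE contain at least one
# character outside the [:print:] and [:space:] classes of the chosen locale.
count_non_printable() {
  local loc
  loc=$(pick_utf8_locale)
  # grep -c prints 0 and exits with status 1 when nothing matches; "|| true"
  # keeps that from aborting callers under "set -e". It also hides status 2
  # (e.g. an unreadable file), so callers should check the file exists first.
  LC_ALL="$loc" grep -c '[^[:print:][:space:]]' "$1" || true
}
```

A caller could then run `n=$(count_non_printable "$data/text")` and fail when `n` is greater than zero, which mirrors the `&& echo ... && exit 1` chain in the diff while keeping the locale choice in one place.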
