A Refined Similarity-Based Bigram Model
In my previous post, I discussed the similarity-based bigram model (Dagan et al., 1998) and showed its performance against other classic ngram smoothing techniques. Although the original similarity-based bigram model had less perplexity than the Katz backoff model, it didn’t fare as well as the interpolated Kneser-Ney model on both Penn-tree bank (PTB) and Wikitext103 datasets. In this blog post, I’ll introduce a new similarity-based bigram model with better performance.