A Refined Similarity-Based Bigram Model

In my previous post, I discussed the similarity-based bigram model (Dagan et al., 1998) and compared its performance against other classic n-gram smoothing techniques. Although the original similarity-based bigram model achieved lower perplexity than the Katz backoff model, it didn't fare as well as the interpolated Kneser-Ney model on either the Penn Treebank (PTB) or the WikiText-103 dataset. In this blog post, I'll introduce a new similarity-based bigram model with better performance.
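
As a quick refresher, the core of the original similarity-based estimate replaces P(w2 | w1) with a weighted average of conditional probabilities given words similar to w1. The following is only a rough sketch in my own notation, not the exact formulation from the paper or from the new model introduced in the post:

```latex
P_{\mathrm{SIM}}(w_2 \mid w_1)
  = \sum_{w_1' \in S(w_1)}
    \frac{W(w_1, w_1')}{\sum_{w_1'' \in S(w_1)} W(w_1, w_1'')}
    \, P(w_2 \mid w_1')
```

Here S(w1) is a set of words most similar to w1, and W(w1, w1') is a similarity weight between the two words.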

Read More

Introduction to N-gram language models

Language modeling has been a pivotal area in Natural Language Processing (NLP). It forms the foundation for applications such as speech recognition, machine translation, and spelling correction. One of the earliest and simplest techniques in language modeling is the N-gram model.
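
To make the idea concrete, here is a toy sketch (my own illustration, not code from the post) of the maximum-likelihood bigram estimate P(w2 | w1) = count(w1, w2) / count(w1):

```python
from collections import Counter

# Toy corpus; a real model would be trained on something like PTB or WikiText-103.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

bigrams = Counter(zip(corpus, corpus[1:]))  # counts of (w1, w2) pairs
unigrams = Counter(corpus)                  # counts of single words

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate P(w2 | w1); assumes w1 occurs in the corpus."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("sat", "on"))   # 1.0  -- "sat" is always followed by "on" here
print(bigram_prob("the", "cat"))  # 0.25 -- "the" precedes "cat" in 1 of its 4 occurrences
```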

Read More

Introduction to Distributional Semantics

What makes car and automobile, or rage and anger, synonymous? A simple answer: they can be used interchangeably in many situations/contexts. This observation is where the intuition behind the Distributional Hypothesis comes from: words that occur in the same contexts tend to have similar meanings.
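
As a toy illustration of the idea (my own sketch, not code from the post): if we represent each word by the counts of the words around it, words that are interchangeable end up with similar context vectors:

```python
from collections import Counter
from math import sqrt

# Tiny toy corpus: "car" and "automobile" appear in similar contexts.
sentences = [
    "he drove the car to work",
    "she drove the automobile to work",
    "the car needs fuel",
    "the automobile needs fuel",
    "he felt anger at the delay",
    "she felt rage at the delay",
]

def context_vector(word, window=2):
    """Count the words occurring within `window` positions of `word`."""
    counts = Counter()
    for sent in sentences:
        tokens = sent.split()
        for i, tok in enumerate(tokens):
            if tok == word:
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j != i:
                        counts[tokens[j]] += 1
    return counts

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    return dot / (sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values())))

print(cosine(context_vector("car"), context_vector("automobile")))  # high similarity
print(cosine(context_vector("car"), context_vector("anger")))       # low similarity
```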

Read More

Modifying Custom Matmul CUDA Kernels

I started learning CUDA last year and began writing matrix multiplication kernels as a learning project. After some struggles I got them to work, but I was disappointed to find that my kernels were about 10 times slower than cuBLAS GEMM kernels. Maybe my expectations were a bit too high. I tried many open-source matmul kernels on GitHub, but the best one I found was still about 5 times slower (some of them were optimized for older architectures). So I started the journey of optimizing my own matmul kernel. After a few months of trial and error, my matmul kernel finally reaches speeds comparable to cuBLAS GEMM.

Read More