T5 is an encoder-decoder transformer from Google that was once SOTA on several NLU and NLG problems and is still very useful as a base for seq2seq tasks such as text summarization. The first T5 model was English-only; then the massively multilingual version followed. That model covers 101 languages and is massive indeed.

This post shows how to extract a single-language model from the multilingual one by pruning its redundant embeddings. This cuts the number of parameters by more than half without a significant loss in quality. …
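The core of the trick is simple: since the embedding matrix dominates the parameter count of a multilingual model, keeping only the rows for tokens actually used in the target language shrinks the model dramatically. As a minimal sketch (not the post's actual code; the function name and the toy matrix are made up for illustration), pruning an embedding matrix down to a kept token set looks like this:

```python
import numpy as np

def prune_embeddings(emb: np.ndarray, kept_token_ids: list):
    """Keep only the embedding rows for the tokens we retain.

    Returns the smaller matrix and a map from old token id to new id,
    so the tokenizer vocabulary can be re-indexed consistently.
    """
    new_emb = emb[kept_token_ids]
    old2new = {old: new for new, old in enumerate(kept_token_ids)}
    return new_emb, old2new

# toy example: a 10-token vocabulary with 4-dim embeddings; keep 3 tokens
emb = np.arange(40, dtype=np.float32).reshape(10, 4)
small, old2new = prune_embeddings(emb, [0, 3, 7])
print(small.shape)   # (3, 4)
print(old2new[7])    # 2
```

In a real model the same remapping would also be applied to the tied output (LM head) matrix and to the tokenizer's vocabulary file, so that token ids stay consistent end to end.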

If you don’t like and don’t understand math, does it mean you are stupid?

Do you need to love math to succeed in finance/engineering/science? No.

People say that math is the queen of all sciences, that without mathematical thinking it’s impossible to survive and thrive, that everyone should be good at it. Mathematical problems are assigned at school admission exams and job interviews. Mathman is a modern superhero who solves all problems and saves the world when no one else can.

This tendency starts in junior school: all children have to get good math grades, otherwise they are difficult…

Recently I bumped into a question on Stack Overflow about how to recover phrases from abbreviations, e.g. turning “*wtrbtl*” into “*water bottle*” and “*bsktball*” into “*basketball*”. The question had an additional complication: the lack of a comprehensive word list. That means we need an algorithm able to invent new, plausible words.

I was intrigued and started researching which algorithms and math lie behind modern spell-checkers. It turned out that a good spell-checker can be built from an n-gram language model, a model of word distortions, and a greedy beam search algorithm. The whole construction is called a noisy channel model.
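To make the noisy channel idea concrete: the decoder looks for the word maximizing P(word) · P(abbreviation | word), where the first factor is the language model and the second is the distortion model. A deliberately tiny sketch (the vocabulary, probabilities, and drop penalty below are invented for illustration, and the distortion model is reduced to “letters may be dropped but never reordered”):

```python
import math

# toy unigram language model: P(word)
VOCAB = {"water": 0.02, "bottle": 0.015, "basketball": 0.01, "wart": 0.001}

def is_subsequence(abbr: str, word: str) -> bool:
    """Crude distortion model: the abbreviation must occur in the word
    as a subsequence (letters dropped, none reordered)."""
    it = iter(word)
    return all(ch in it for ch in abbr)

def expand(abbr: str):
    """Noisy-channel decoding: among words consistent with the abbreviation,
    pick the one maximizing log P(word) + log P(abbr | word); here the
    distortion term is just a fixed penalty per dropped letter."""
    best, best_score = None, -math.inf
    for word, p in VOCAB.items():
        if is_subsequence(abbr, word):
            dropped = len(word) - len(abbr)
            score = math.log(p) - 0.5 * dropped
            if score > best_score:
                best, best_score = word, score
    return best

print(expand("wtr"))       # water
print(expand("bsktball"))  # basketball
```

A real system replaces the unigram table with an n-gram model over a large corpus, the subsequence check with a weighted edit-distance distortion model, and the exhaustive loop with beam search, but the scoring structure stays the same.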

With this knowledge…

NLP researcher, chatbot developer, teacher of applied math. See daviddale.ru/en