The Czech National Corpus has released its new GrammatiKat tool, developed as part of our project by co-investigator Dominika Kováříková.

GrammatiKat provides information on the grammatical categories within a part of speech (e.g. which case is used most frequently for nouns) and on individual lemmas (grammatical profiles). The tool is designed primarily for research into grammatical categories and for lexicological and lexicographic exploration, but it can also serve other purposes, such as teaching Czech as a second language. At the moment, only information on Czech nouns is available, but the CNCI plans to add adjectives and verbs in the future.
Data are drawn from two components of the Czech National Corpus, SYN2015 (a representative corpus of written Czech) and ORALv4 (a corpus of spoken Czech), and comprise nouns with a frequency of 100 or higher. The tool offers a summary of word form distribution within a part of speech that allows for quick comparisons. The summary is based on the distribution of the word forms of each lemma, with each lemma given equal weight in the calculations regardless of its frequency, so that extremely frequent lemmas do not distort the overall results. See and use it at the GrammatiKat website.
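
The effect of giving each lemma equal weight can be illustrated with a minimal sketch. It is not GrammatiKat's actual code: the function names are hypothetical and the counts are invented purely for illustration. It compares averaging per-lemma proportions with simply pooling all tokens.

```python
from collections import Counter, defaultdict


def lemma_weighted_distribution(counts_by_lemma):
    """Average the per-lemma proportions, so every lemma counts equally."""
    summed = defaultdict(float)
    for form_counts in counts_by_lemma.values():
        lemma_total = sum(form_counts.values())
        for category, n in form_counts.items():
            # Each lemma contributes proportions that sum to 1,
            # however frequent the lemma is.
            summed[category] += n / lemma_total
    n_lemmas = len(counts_by_lemma)
    return {category: value / n_lemmas for category, value in summed.items()}


def token_weighted_distribution(counts_by_lemma):
    """Pool all tokens, so frequent lemmas dominate the result."""
    pooled = Counter()
    for form_counts in counts_by_lemma.values():
        pooled.update(form_counts)
    total = sum(pooled.values())
    return {category: n / total for category, n in pooled.items()}


# Invented counts (assumption): one very frequent and one rare lemma.
toy_counts = {
    "frequent_lemma": {"nominative": 9000, "genitive": 1000},
    "rare_lemma": {"nominative": 20, "genitive": 80},
}

by_lemma = lemma_weighted_distribution(toy_counts)
by_token = token_weighted_distribution(toy_counts)
print(by_lemma)
print(by_token)
```

With these invented counts, pooling tokens gives the nominative a share of roughly 0.89, almost entirely because of the frequent lemma, whereas the lemma-weighted average gives about 0.55, reflecting both lemmas equally.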