10 points · sethbarrettAU · 2 days ago
github.com
So I built latex-wc, a small Python CLI that:
- extracts tokens from LaTeX while ignoring common LaTeX “noise” (commands, comments, math, refs/cites, etc.)
- can take a single .tex file or a directory and recursively scan all *.tex files
- prints a combined report once (total words, unique words, top-N frequencies)
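A minimal sketch of the kind of filtering and reporting the list above describes, in Python since latex-wc is a Python CLI. The regexes, the names `tokens`, `tex_files` and `report`, and the exact set of commands treated as noise are all assumptions for illustration, not latex-wc's actual implementation.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical noise filters; these are NOT latex-wc's actual rules.
COMMENT = re.compile(r"(?<!\\)%.*")  # % to end of line, unless written as \%
DISPLAY_MATH = re.compile(
    r"\$\$.*?\$\$"                    # $$ ... $$
    r"|\\\[.*?\\\]"                   # \[ ... \]
    r"|\\begin\{(equation|align|gather|multline)(\*?)\}.*?\\end\{\1\2\}",
    re.DOTALL,
)
INLINE_MATH = re.compile(r"(?<!\\)\$.*?(?<!\\)\$|\\\(.*?\\\)", re.DOTALL)
# Commands whose braced argument is a key, label, path or environment name, not prose.
ARG_COMMANDS = re.compile(
    r"\\(?:ref|eqref|cref|Cref|autoref|label|cite[A-Za-z]*|begin|end"
    r"|usepackage|documentclass|includegraphics|input|include)"
    r"\*?(?:\[[^\]]*\])*\{[^}]*\}"
)
COMMAND = re.compile(r"\\[A-Za-z@]+\*?|\\.")  # \emph, \section*, \%, \\ (arguments are kept)
WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")  # letters, allowing don't / well-known


def tokens(tex: str) -> list[str]:
    """Blank out LaTeX noise in a fixed order, then return lowercase words."""
    for pattern in (COMMENT, DISPLAY_MATH, INLINE_MATH, ARG_COMMANDS, COMMAND):
        tex = pattern.sub(" ", tex)
    return [w.lower() for w in WORD.findall(tex)]


def tex_files(path: Path) -> list[Path]:
    """A single file as given, or every *.tex file found recursively under a directory."""
    return [path] if path.is_file() else sorted(path.rglob("*.tex"))


def report(path: Path, top_n: int = 10) -> None:
    """Print one combined report across all files: totals, then top-N frequencies."""
    counts: Counter[str] = Counter()
    for f in tex_files(path):
        counts.update(tokens(f.read_text(encoding="utf-8", errors="replace")))
    print(f"total words:  {sum(counts.values())}")
    print(f"unique words: {len(counts)}")
    for word, n in counts.most_common(top_n):
        print(f"{n:>8}  {word}")
```

For example, `report(Path("thesis"), top_n=20)` would scan `thesis/` recursively and print a single report. The filters run in a fixed order on purpose: comments go first so a stray `$` inside a comment cannot pair with real math, and argument-carrying commands such as `\cite{...}` are removed whole before the generic command pass, which drops only the command name and keeps its braced text (so `\emph{famous}` still counts as a word).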
Fastest way to try it is `uvx latex-wc [path]` (file or directory). Feedback welcome, especially on edge cases where you think the heuristic filters are too aggressive or not aggressive enough.
gucci-on-fleek
mci
detex "$@" | wc
detex "$@" | tr -cs '[:alnum:]' '\n' | grep . | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn
dang
[0]: https://ctan.org/pkg/texcount?lang=en