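The person I interviewed recently has an NLP background, so I did some quick cramming on the related topics.
Wikipedia is always a good teacher! Reading it this way for long stretches is just hard on the eyes.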
References
- http://en.wikipedia.org/wiki/Word_sense_disambiguation
- http://en.wikipedia.org/wiki/Part-of-speech_tagging
WSD (word sense disambiguation) is harder than POS (part-of-speech) tagging.
- Accuracy: 70% (WSD) vs. 95% (POS tagging).
- WSD data is harder to annotate: even humans find it hard to know every sense of a word.
Both areas are dominated by machine learning methods.
- WSD: SVM
- POS: HMM
Interestingly, Wikipedia says WSD uses classifiers, whereas POS tagging uses models such as HMMs that take word order into account.
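To make "a model that takes word order into account" concrete, here is a minimal sketch of Viterbi decoding for an HMM tagger. All probabilities and the tiny tag set below are made up for illustration; a real tagger would estimate them from a tagged corpus.

```python
# Toy HMM parameters (hypothetical numbers, not estimated from any corpus).
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS_P = {  # TRANS_P[prev][cur] = P(cur tag | prev tag)
    "DET": {"DET": 0.0, "NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT_P = {  # EMIT_P[tag][word] = P(word | tag)
    "DET": {"the": 1.0},
    "NOUN": {"dog": 0.4, "dogs": 0.3, "bark": 0.3},
    "VERB": {"bark": 0.6, "barks": 0.4},
}


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under the HMM."""
    # best[i][t] = (probability of the best path ending in tag t at word i,
    #               the previous tag on that path)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for word in words[1:]:
        row = {}
        for t in tags:
            row[t] = max(
                (best[-1][p][0] * trans_p[p][t] * emit_p[t].get(word, 0.0), p)
                for p in tags
            )
        best.append(row)
    # Backtrack from the best final tag to recover the whole path.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for row in reversed(best[1:]):
        path.append(row[path[-1]][1])
    return list(reversed(path))


print(viterbi(["the", "bark"], TAGS, START_P, TRANS_P, EMIT_P))
print(viterbi(["dogs", "bark"], TAGS, START_P, TRANS_P, EMIT_P))
```

With these toy numbers, "bark" comes out as NOUN after "the" and as VERB after "dogs": the tag depends on the neighbouring tags, not only on the word itself.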
Two kinds of approaches to WSD
- Deep approach:
  - Encode human knowledge in a computer-readable format.
  - It is very hard to use in practice.
- Shallow approach:
  - A statistical approach.
  - It works most of the time; to resolve an ambiguous word, look at the words in a small window around it.
  - bass: a low frequency or a kind of fish.
  - The senses of bass can be distinguished by counting word co-occurrences.
    - bass + sound -> low frequency
    - bass + fish or sea -> fish
  - A hard example: A dog barks at a tree.
    - bark: the sound a dog makes, or the outer layer of a tree?
    - With a window size <= 2, only "dog" is in range, so "bark" is linked to the dog; a wider window also picks up "tree".
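The window idea above can be sketched as follows. This is only an illustration: the cue-word sets in `SENSE_CUES` are hand-picked and hypothetical (a real system would learn co-occurrence counts from a sense-tagged corpus), and the tokens are assumed to be already lowercased and lemmatized ("barks" -> "bark").

```python
# Hypothetical cue words per sense, hand-written for this example.
SENSE_CUES = {
    "bass": {
        "low frequency": {"sound", "music", "guitar"},
        "fish": {"fish", "sea", "river"},
    },
    "bark": {
        "dog sound": {"dog", "loud"},
        "tree covering": {"tree", "trunk"},
    },
}


def disambiguate(tokens, index, window):
    """Score each sense of tokens[index] by cue words within `window` tokens.

    Returns (best_sense_or_None, scores). The best sense is None when no cue
    word falls inside the window. Ties go to the sense listed first.
    """
    word = tokens[index]
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    context = left + right
    scores = {
        sense: sum(1 for t in context if t in cues)
        for sense, cues in SENSE_CUES[word].items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


sentence = ["a", "dog", "bark", "at", "a", "tree"]
print(disambiguate(sentence, 2, window=2))
print(disambiguate(sentence, 2, window=3))
```

With `window=2` only "dog" is counted, so the dog sense wins. With `window=3` "tree" is counted too and the two senses tie at one cue each, which is why this sentence is a hard case for plain co-occurrence counting.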
Misc
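- A naive approach (simply assigning each word its most frequent POS tag) already achieves 90% accuracy. That result leaves me a bit speechless: from the very start there is at most 10% of room for improvement.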
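A minimal sketch of that naive baseline, assuming a tiny hand-made tagged corpus (`TOY_CORPUS` and its tag names are hypothetical, and the 90% figure comes from real corpora, not from this toy data):

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: sentences as lists of (word, tag) pairs.
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("bark", "NOUN"), ("is", "VERB"), ("rough", "ADJ")],
    [("dogs", "NOUN"), ("bark", "VERB")],
    [("the", "DET"), ("bark", "NOUN")],
]


def train_baseline(tagged_sentences):
    """Learn each word's most frequent tag, plus the overall most frequent tag."""
    per_word = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            per_word[word.lower()][tag] += 1
            all_tags[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    default_tag = all_tags.most_common(1)[0][0]  # fallback for unseen words
    return lexicon, default_tag


def tag(tokens, lexicon, default_tag):
    """Tag every token with its most frequent tag, ignoring all context."""
    return [(t, lexicon.get(t.lower(), default_tag)) for t in tokens]


lexicon, default_tag = train_baseline(TOY_CORPUS)
print(tag(["the", "dogs", "bark"], lexicon, default_tag))
```

Because the baseline ignores context, "bark" is tagged NOUN here even though it is a verb in "the dogs bark". Errors of this kind are what the remaining 10% consists of.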