fcamel 技術隨手記: Very shallow study notes on word sense disambiguation

Tuesday, February 2, 2010

Very shallow study notes on word sense disambiguation

(I wrote this a few months ago and am posting it here as a backup. It may contain errors; when in doubt, trust the original sources.)

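I recently interviewed someone with an NLP background, so I crammed a bit of the related material.
Wikipedia is always your good teacher! It's just that reading it like this for long stretches is hard on the eyes.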


WSD (word sense disambiguation) is harder than POS (part-of-speech) tagging.

Accuracy: about 70% for WSD vs. 95% for POS tagging.
WSD data is also harder to annotate: even humans find it hard to know every sense of a word.

Both areas are dominated by machine learning methods.

WSD: SVM
POS: HMM

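In recent years ML / statistics have been barging into every field and beating the locals. I'm rather curious why, given how similar POS tagging and WSD look,
Wikipedia says WSD uses classifiers while POS tagging uses models such as HMMs that take word order into account.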
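To make "a model that takes word order into account" concrete, here is a minimal sketch of HMM decoding with the Viterbi algorithm. Everything in it is an assumption for illustration: the three-tag set, the function name `viterbi`, and all the probabilities are made up, not taken from any corpus. The point is that "bark" has the same emission probability as a noun and as a verb, so only the previous tag decides the outcome.

```python
import math

TAGS = ["DET", "NOUN", "VERB"]

# Made-up toy probabilities (assumptions, not estimated from data).
START_P = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT_P = {
    "DET": {"the": 1.0},
    "NOUN": {"dogs": 0.5, "bark": 0.5},
    "VERB": {"bark": 0.5, "run": 0.5},
}
UNSEEN = 1e-6  # tiny probability for (tag, word) pairs not in EMIT_P


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under the HMM."""
    # Each row maps tag -> (log prob of best path ending in tag, previous tag).
    first = words[0]
    rows = [{
        t: (math.log(start_p[t]) + math.log(emit_p[t].get(first, UNSEEN)), None)
        for t in tags
    }]
    for word in words[1:]:
        row = {}
        for t in tags:
            # Pick the previous tag that maximizes the path score into t.
            prev = max(tags, key=lambda p: rows[-1][p][0] + math.log(trans_p[p][t]))
            score = rows[-1][prev][0] + math.log(trans_p[prev][t])
            row[t] = (score + math.log(emit_p[t].get(word, UNSEEN)), prev)
        rows.append(row)
    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: rows[-1][t][0])
    path = [tag]
    for row in reversed(rows[1:]):
        tag = row[tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["the", "bark"], TAGS, START_P, TRANS_P, EMIT_P))   # ['DET', 'NOUN']
print(viterbi(["dogs", "bark"], TAGS, START_P, TRANS_P, EMIT_P))  # ['NOUN', 'VERB']
```

A per-word classifier that ignores neighbors would have to give "bark" the same tag in both inputs; here the transition probabilities break the tie.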

Two kinds of approaches to WSD

Deep approach:

Encode human knowledge in a computer-readable format.
It is very hard to use in practice.

Shallow approach:

A statistical approach.
It works most of the time; to resolve an ambiguous word, look at a window of the words around it.

Example

bass: low frequency or a kind of fish.

bass can be distinguished by counting word co-occurrence.
bass + sound -> low frequency
bass + fish or sea -> fish
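The bass example can be sketched in a few lines. This is only a toy under stated assumptions: the cue words are hand-written from the example above (a real system would learn co-occurrence counts from a sense-tagged corpus), and `SENSE_CUES` and `disambiguate` are hypothetical names.

```python
# Hand-written cue words per sense, taken from the example above.
# Assumption: a real system would learn these from a corpus.
SENSE_CUES = {
    "low frequency": {"sound"},
    "fish": {"fish", "sea"},
}


def disambiguate(tokens, cues=SENSE_CUES):
    """Return the sense whose cue words co-occur most with the target.

    Returns None when no cue word appears in the context at all.
    """
    context = [t.lower() for t in tokens]
    scores = {
        sense: sum(word in cue_words for word in context)
        for sense, cue_words in cues.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("the bass sound shook the room".split()))  # low frequency
print(disambiguate("he caught a bass in the sea".split()))    # fish
```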

A hard example: A dog barks at a tree.

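bark: the sound a dog makes, or the outer layer of a tree?
If we use window size <= 2, only "dog" falls inside the window ("tree" is three words away), so "barks" is related to dog.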
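A small sketch of what the window contains for this sentence. The helper name `context_window` is made up for illustration, and tokenization is just a whitespace split.

```python
def context_window(tokens, index, size):
    """Return the words within `size` positions of tokens[index], excluding it."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


tokens = "A dog barks at a tree".lower().split()
target = tokens.index("barks")

print(context_window(tokens, target, 2))  # ['a', 'dog', 'at', 'a']
print(context_window(tokens, target, 3))  # ['a', 'dog', 'at', 'a', 'tree']
```

With size 2 the only content word in the window is "dog"; "tree" only shows up once the window grows to 3, which is where the two senses start to compete.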

Misc

A naive approach to POS tagging (simply assign each word its most likely tag) achieves 90%. That result leaves me a bit speechless: from the very start there is at most 10% of room for improvement.
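The naive baseline is easy to sketch. Assumptions here: the tiny tagged list is made-up toy data, the function names are hypothetical, and unknown words fall back to "NOUN", which is just one common choice.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_words):
    """Map each word to the tag it was seen with most often."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_words(words, model, default="NOUN"):
    """Tag each word with its most frequent tag; unseen words get `default`."""
    return [(w, model.get(w.lower(), default)) for w in words]


# Made-up toy training data, only for illustration.
training = [
    ("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
    ("the", "DET"), ("bark", "NOUN"), ("bark", "NOUN"), ("bark", "VERB"),
]
model = train_baseline(training)

print(tag_words("the dog bark".split(), model))
# [('the', 'DET'), ('dog', 'NOUN'), ('bark', 'NOUN')]
```

Note that this tagger never looks at neighboring words, so "bark" always gets the same tag no matter where it appears.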
