Korean morpheme analyzer tools

C/C++
  • KTS (1995) GPL v2

    • By 이상호, 서정연, 오영환 (KAIST & Sogang University)
    • code
  • MACH (2002) custom

    • By Prof. Kwangseob Shim (Sungshin Women's University)
  • MeCab-ko (2013) GPL/LGPL/BSD

    • By Yong-woon Lee and Youngho Yoo
Java
  • Arirang (2009) Apache v2

    • By SooMyung Lee
    • code
  • Hannanum (1999) GPL v3

    • By Prof. Key-Sun Choi’s research team (KAIST)
    • code, docs
  • KKMA (2010) GPL v2

    • By Prof. Sang-goo Lee’s research team (Seoul National University)
    • Generates morpheme candidates using dynamic programming
    • Tags morphemes by checking neighboring morphemes and applying heuristics and hidden Markov models (HMMs)
    • Developer blog: Dongjoo Lee
  • KOMORAN (2013) custom

    • By shineware
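
The KKMA entry above mentions HMM-based tagging. As a rough illustration of that general idea only, here is a minimal Viterbi sketch over a toy hidden Markov model. It is not KKMA's code: the tag set, the morphemes, every probability, and the `viterbi` helper are hypothetical values made up for this example, and the input is assumed to be already split into morphemes.

```python
import math

# Hypothetical toy model: the tags, morphemes, and probabilities below are
# invented for illustration and are not taken from KKMA or any real corpus.
TAGS = ["NOUN", "PARTICLE", "VERB"]
START = {"NOUN": 0.6, "PARTICLE": 0.1, "VERB": 0.3}
TRANS = {
    "NOUN": {"NOUN": 0.2, "PARTICLE": 0.6, "VERB": 0.2},
    "PARTICLE": {"NOUN": 0.4, "PARTICLE": 0.1, "VERB": 0.5},
    "VERB": {"NOUN": 0.5, "PARTICLE": 0.2, "VERB": 0.3},
}
EMIT = {
    "NOUN": {"나": 0.5, "밥": 0.5},
    "PARTICLE": {"는": 0.5, "을": 0.5},
    "VERB": {"먹": 0.7, "나": 0.3},
}
UNSEEN = 1e-6  # small fallback probability for unseen (tag, morpheme) pairs


def viterbi(morphemes):
    """Return the most probable tag sequence for a list of morphemes."""
    if not morphemes:
        return []
    # best[i][tag] = (log-probability of the best path ending in `tag`
    #                 at position i, previous tag on that path)
    best = [{}]
    for tag in TAGS:
        emit = math.log(EMIT[tag].get(morphemes[0], UNSEEN))
        best[0][tag] = (math.log(START[tag]) + emit, None)
    for i in range(1, len(morphemes)):
        best.append({})
        for tag in TAGS:
            emit = math.log(EMIT[tag].get(morphemes[i], UNSEEN))
            # Dynamic programming step: extend the best path of each
            # possible previous tag and keep the highest-scoring one.
            best[i][tag] = max(
                (best[i - 1][prev][0] + math.log(TRANS[prev][tag]) + emit, prev)
                for prev in TAGS
            )
    # Backtrack from the best final tag to recover the full sequence.
    path = [max(TAGS, key=lambda t: best[-1][t][0])]
    for i in range(len(morphemes) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi(["나", "는", "밥", "을", "먹"]))
```

In this toy model "나" can be emitted by either NOUN or VERB; the start and transition probabilities are what settle the choice. Working in log space keeps long sequences from underflowing.
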
Python
  • KoNLPy (2014) GPL v3

    • By Lucy Park (Seoul National University)
  • UMorpheme (2014) MIT

    • By Kyunghoon Kim (UNIST)
R
  • KoNLP (2011) GPL v3

    • By Heewon Jeon
Others

Other NLP tools

Language parser
  • KoreanParser – By DongHyun Choi, Jungyeul Park, Key-Sun Choi (KAIST)

Corpora

  • Yonsei Corpus, Yonsei University, 1987.

    • 42M tokens of Korean text from the 1960s onward
  • Korea University Korean Corpus, 1995.

    • 10M tokens of Korean text from the 1970s to the 1990s
  • HANTEC 2.0, KISTI & Chungnam National University, 1998-2003.

    • 120,000 test documents (237MB)
    • 50 TREC-type questions for QA (48KB)
  • HKIB-40075, KISTI & Hankook Ilbo, 2002.

    • 40,075 test documents for text categorization (88MB)
  • KAIST Corpus, KAIST, 1997-2005.

  • Sejong Corpus, National Institute of the Korean Language, 1998-2007.

General NLP resources

 

 

Original Link: http://konlpy.org/en/latest/references/
