Abstract:
This thesis studies the segmentation of ambiguous words in the Thai language,
where the ambiguity lies in whether a given word should be segmented as a single
combined (whole) word or as multiple separate words. Correct segmentation
requires an understanding of the structure of the language and of the context in
which the word is used in the sentence; this makes automatic word segmentation a
challenging task for a computer. Three main approaches have been taken in the
research community: rule-based, dictionary-based, and machine-learning-based
approaches. In this thesis, we propose a machine-learning-based word segmentation
algorithm that learns from a large, existing database of pre-segmented words. We
count how many times the ambiguous word in question appears in combined form
and in separated form. We also examine the collocations of words surrounding
the ambiguous word, allowing us to take into account the context in which the
word is used. To decide how to segment an
ambiguous word, each segmented form is assigned a score, or weight, computed
using one of four proposed methods: weight with probability, weight with
frequency, weight with frequency and distance, and weight with term
frequency-inverse document frequency (TF-IDF). In the experimental evaluation,
the four weighting methods gave similar performance, up to 84% in terms of
segmentation correctness, but the weight-with-probability method required the
least runtime. In
summary, the thesis proposes an ambiguous-word-segmentation method that
achieves 84.40% segmentation correctness.
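To illustrate the probability-based weighting mentioned above, the following Python sketch scores the two candidate forms of an ambiguous word by their relative frequency in a pre-segmented corpus. All names and the two-form setup here are assumptions for illustration only; the thesis's actual method also incorporates surrounding collocations, which this sketch omits.

```python
# Minimal sketch of "weight with probability" (assumed interpretation):
# each segmentation form is weighted by its relative frequency among all
# occurrences of the ambiguous word in a pre-segmented corpus.

def probability_weights(combined_count: int, separated_count: int) -> dict:
    """Return a weight in [0, 1] for each segmentation form."""
    total = combined_count + separated_count
    if total == 0:
        # No corpus evidence for either form.
        return {"combined": 0.0, "separated": 0.0}
    return {
        "combined": combined_count / total,
        "separated": separated_count / total,
    }

def choose_form(combined_count: int, separated_count: int) -> str:
    """Pick the segmentation form with the higher weight."""
    weights = probability_weights(combined_count, separated_count)
    return max(weights, key=weights.get)
```

For example, if a word appeared 84 times in combined form and 16 times in separated form, `choose_form(84, 16)` returns `"combined"` with a weight of 0.84. Because this weight needs only two counts and one division, it is plausible that such a method requires less runtime than the frequency-and-distance or TF-IDF variants.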