How to say 詞類標注 in English
Pinyin: [cílèibiāozhù]
詞類標注
English
part-of-speech tagging
- 詞 : 名詞1 (說話或詩歌、文章、戲劇中的語句) speech; statement; lines of play 2 (一種韻文形式 起於唐...
- 類 : Ⅰ名1 (許多相似或相同的事物的綜合; 種類) class; category; kind; type 2 (姓氏) a surname Ⅱ動詞...
- 標 : Ⅰ名詞1 [書面語] (樹梢) treetop; the tip of a tree 2 (枝節或表面) symptom; outside appearance; ...
- 注 : Ⅰ動詞1 (灌入) pour; irrigate 2 (集中) concentrate on; fix on; focus on 3 (用文字來解釋字句)...
- 標注 : affix
The letter reference is printed in small capital letter(s) to the right of the core design to indicate the type of certification. The letter reference reflects the type of certification; for example, "s" represents "safety". CNCA will design and announce for mandatory implementation the letter reference required.
在認證標志基本圖案的右部印製認證種類標注,證明產品所獲得的認證種類,認證種類標注由代表認證種類的英文單詞的縮寫字母組成,如圖二中的「s」代表安全認證。
The Contemporary Chinese Dictionary (fifth edition) has two outstanding points: labeling word classes on the basis of differentiating between words and non-words; and labeling subclasses for nouns, verbs, and especially adjectives.
第5版《現代漢語詞典》有兩個突出的地方:在區分詞與非詞的基礎上給詞標注詞類;名詞、動詞,尤其是形容詞下標注附類。
First, the text is segmented into words and converted to a sequence of part-of-speech tags; then, based on the POS tag sequence parameters and phrase-break distance information obtained from training, a Markov model is used to get the most likely phrase-break sequence.
首先,文本進行分詞,並轉換為一列由詞性標記所組成的序列;然後使用馬爾可夫模型,利用人工標注數據庫訓練詞語連接處詞性標注序列的概率分佈和連接類型序列的距離信息,得到輸入的詞性標記序列對應的具有最大似然概率的連接類型序列,最後利用后處理規則進行適當的糾錯。
In addition to word segmentation and part-of-speech tagging, the processing involves the tagging of proper nouns (person names, place names, organization names, and so on), morpheme subcategories, and the special usages of verbs and adjectives.
加工項目除詞語切分和詞性標注外,還包括專有名詞(人名、地名、團體機構名稱等)標注、語素子類標注以及動詞、形容詞的特殊用法標注。
Good as it is, it still has such weaknesses as an insufficient number of common words, a lack of lexical labels, and no cross-reference system or English-Chinese index.
但也存在以下一些問題:常用詞收量不足,字詞無詞類標注,且缺乏參見系統和英漢索引。
To address this problem, the paper describes an approach to automatically correcting the part-of-speech tagging of multi-category words.
針對這一難點問題,本文提出了一種兼類詞詞性標注的自動校對方法。
Part-of-speech tagging, as part of syntactic tagging, is to mark each word's part of speech in a sentence, according to its definition and context.
詞性標注是根據詞義及其上下文信息,標注出其在句中所屬詞類的過程,屬于句法范疇的標注。
Verb subdivision is similar to part-of-speech tagging. It subdivides verbs into more detailed classes based on the result of part-of-speech tagging.
動詞細分類和詞性標注有些類似,它是在詞性標注基礎上對其中的動詞進行更細致的類別標注。
The experimental results show that tagging accuracy and disambiguation accuracy are raised by combining rule-based and statistical techniques.
試驗測試結果表明規則和統計相結合的兼類詞處理機制可以有效地提高詞性排歧正確率和詞性標注正確率。
Nevertheless, it has some problems with respect to identifying attribute words, missing or incorrect labels, and inconsistency in treating three-syllable units as words or non-words.
同時,文章認為存在以下幾個方面問題:屬性詞的確認;詞類失標或標注不當;對某些三音節習用單位的詞和非詞的處理不一致。
It adopts hierarchical clustering in the vocabulary VSM model because of its special function; on the other hand, it enriches the subcategory tagging information with rules, which can reduce the data sparseness problem, and introduces confidence intervals into the model to select the priority between statistics and rules.
另外還對標注模型從兩方面作了優化,由於詞匯特徵向量的特殊作用,本文對特徵詞匯採用層次聚類來提高其分類精度;另一方面,引入規則來進一步豐富細分類標注信息,減少數據稀疏等問題,並且引入置信度來選擇統計與規則的優先關系。
Part-of-speech tagging and verb subdivision can provide richer grammatical information for upper-level applications. For example, a parser can utilize part-of-speech information to distinguish syntactic relationships of different types.
詞性標注和動詞細分類可以為上層應用提供更豐富的語法信息,例如句法分析可以利用這些詞性信息進行句法關系的識別。
With the above method, a disambiguation system is implemented. The overall accuracy of the closed test is 97.85% and the accuracy of the open test is 96.71%.
按照上述策略,實現了一個兼類詞處理系統,閉式標注正確率達97.85%,開式標注正確率達96.71%。
The disambiguation of multi-category words is one of the difficulties in part-of-speech tagging of Chinese text, and it greatly affects the processing quality of corpora.
兼類詞的詞類排歧是漢語語料詞性標注中的難點問題,它嚴重影響語料的詞性標注質量。
It discusses and analyzes the current state of Chinese part-of-speech tagging, and describes an approach to correcting Chinese part-of-speech tagging automatically.
討論和分析了詞性標注的現狀,並針對詞性標注問題,提出了一種基於粗糙集的兼類詞詞性標注校對規則的自動獲取方法。
According to the results of closed and open tests on a corpus of 500,000 Chinese characters, the accuracy of part-of-speech tagging for multi-category words can be increased by 11.32% and 5.97%, respectively.
分別對50萬漢語語料做封閉測試和開放測試,結果顯示,校對后語料的兼類詞詞性標注正確率分別可提高11.32%和5.97%。
It acquires correction rules for the part-of-speech tagging of multi-category words from correctly tagged corpora based on rough sets and data mining, and then automatically corrects the corpora based on these rules.
它利用數據挖掘的方法從正確標注的訓練語料中挖掘獲取有效信息,自動生成兼類詞詞性校對規則,並應用獲取的規則實現對機器初始標注語料的自動校對,從而提高語料中兼類詞的詞性標注質量。
It draws on international methods for automatically classifying and tagging verb subcategories, analyzes the domestic research situation in related fields, and investigates resources such as the subcategory system, part-of-speech tagging methods, and corpora. It proposes a tagging model for part-of-speech subcategories that integrates statistics and rules, and introduces the vocabulary VSM and fuzzy set theory into this field.
本文參考了國際上關于動詞自動分類和標注的研究方法,分析了國內相關領域關于詞性細分類標注研究的分類體系、詞性標注方法,以及語料庫資源等研究狀況,提出了一種統計與規則相結合的詞性細分類標注模型,並且把詞匯向量空間模型以及模糊集的方法引入詞性細分類自動標注領域。
The experiments separately adopt tagging methods based on part-of-speech information and on the vocabulary VSM, after comparing the traditional tagging methods. The two techniques are then combined to build the tagging model for part-of-speech subcategories, and the model is improved in two ways.
現代漢語詞性細分類標注模型是在對傳統的各種標注方法進行對比分析的基礎上提出的,實驗分別獨立採用基於詞性信息以及基於詞匯向量空間的細分類標注方法,最後兩種方法結合起來建立標注模型。