What does "word segmentation" mean in Chinese?

word segmentation: definition
詞語切分
  • word : n 1 單詞;〈pl 〉歌詞,臺詞。2 〈常 pl 〉談話,話,言語。3 〈不加冠詞〉音信,消息,傳言,口信;【...
  • segmentation : n. 1. 分割;切斷。2. 【生物學】(細胞)分裂;(動物)分節;斷裂。
  1. This article is composed of three parts: data collection, data filtering, and machine learning. The three parts are organically integrated, intelligence is enhanced wherever possible at every stage, and the traditional word segmentation and inductive learning algorithms are combined and improved.

    三個子系統通過知識庫有機的結合在一起,並盡可能地在系統的各個環節利用agent的思想提高智能化,並對傳統的分詞演算法,歸納學習演算法做了融合和改進。
  2. Word segmentation is the basis of Chinese information processing (NLP).

    自動分詞技術是中文信息處理的基礎工程。
  3. First, this thesis discusses the requirements of Chinese word segmentation. Based on these, seven algorithms are designed to suit the characteristics of the grid.

    本文首先研究了漢語分詞應用需求的多樣性,結合網格的特點設計了七個服務演算法。
  4. This thesis first explains the necessity of computer character recognition technology and describes the significance of handwritten numeral recognition. It then discusses the preprocessing techniques for handwritten numeral recognition, including binarization, line segmentation, word segmentation, smoothing, noise removal, normalization, and thinning. For binarization, it covers several common threshold selection methods: global thresholding, local thresholding, dynamic thresholding, and thresholding based on spatial information, with the last described in detail. Building on thinning based on mathematical morphology, it discusses a sequential homotopic thinning algorithm and a shape-preserving thinning algorithm. Next, following the block diagram of online character recognition and an analysis of the structural features of handwritten numerals, the thesis proposes online recognition of unconstrained handwritten numerals based on stroke features and online recognition based on a multistage classifier, detailing noise removal, the definition and recognition of stroke features, and the distance criterion for whole-character matching. Then, on the basis of handwritten numeral segmentation, it studies offline handwritten numeral recognition, discussing in particular the minimum distance classifier, the tree classifier, and the adaptive resonance (ART) network classifier, and introduces confidence analysis to combine multiple classifiers. Finally, it briefly describes typical applications of handwritten numeral recognition and explores its use in large-scale data statistics, financial affairs, taxation, finance, and mail sorting.

    二值化時對整體閾值二值化、局部閾值二值化、動態閾值二值化和利用空間信息進行閾值選取幾種常用的閾值選取方法進行討論,特別對利用空間信息進行閾值選取進行了詳細論述;在對通過對基於數學形態學的細化的基礎上,討論序貫同倫形態細化演算法和保形的快速形態細化演算法;然後依據聯機字元識別原理框圖,分析了手寫數字的結構特點,提出了基於筆劃特徵的任意手寫數字在線識別技術和基於多級分類器任意手寫數字在線識別技術,對其中涉及的筆劃識別前的噪聲處理、筆劃間特徵量的定義及識別、整字匹配的距離準則進行了詳細敘述;繼而在對手寫數字的分割的基礎下對脫機手寫數字識別進行了研究,對基於最小距離分類器字元識別、基於樹分類器的字元識別、基於自適應共振( art )網路的字元識別分別進行了詳細討論,並引入置信度分析將多個分類器進行了混合集成;最後簡單闡述了手寫數字識別的典型應用,對其在大規模數據統計、財務、稅務、金融及郵件分揀中的應用進行了探索。
  5. Proper nouns frequently give rise to new words, change or disappear easily, and follow no fixed formation rules. All of this makes proper nouns hard to recognize in a sentence and makes Chinese word segmentation difficult.

    專有名詞存在新詞出現快,成詞無固定規則,容易變化等特點,給漢語分詞帶來了很大的干擾,使專有名詞的識別成為漢語分詞的一大瓶頸。
  6. This paper focuses on word segmentation technology for Chinese text classification. It proposes a word segmentation method based on 2-gram phrase labeling that combines the delimiter-setting method with the word-frequency statistics method, and that can recognize vocabulary the dictionary-based method cannot handle.

    對于基於信息過濾的自動分類問題,使用字典分詞並不是一個必須的過程,因而本文提出了基於2元語法短語標引的分詞方法,它將設立切分標志法與基於詞頻統計的方法相結合,可以識別基於詞典方法處理不了的詞匯,如:人名、地名、專業術語等。
  7. In addition to word segmentation and part-of-speech tagging, the processing involves the tagging of proper nouns (person names, place names, organization names, and so on), morpheme subcategories, and the special usages of verbs and adjectives.

    加工項目除詞語切分和詞性標注外,還包括專有名詞(人名、地名、團體機構名稱等)標注、語素子類標注以及動詞、形容詞的特殊用法標注。
  9. The system first performs automatic Chinese word segmentation and part-of-speech tagging through a Chinese input method with word segmentation, then builds the corresponding surface semantic network according to the semantic structure grammar, and finally derives the corresponding data flow diagram and data dictionary using automatic generation algorithms for data flow diagrams and data dictionaries. Once fully completed, this work can not only provide a natural language description environment for CASE, but can also develop into a system that takes problems described in natural language as its input.

    工作的中心是自然語言篇章理解。系統首先通過分詞輸入法實現漢語自動分詞與詞性標注,然後根據語義結構文法產生相應的表層語義網路,最後根據數據流圖、數據字典自動生成演算法轉換為相應的數據流圖和數據字典。這項工作的徹底完成,不僅可以給case提供一個自然語言的描述環境,而且可進一步發展為基於自然語言描述問題作為輸入的系統。
  10. The former includes Chinese word segmentation, part-of-speech tagging, pinyin tagging, named entity recognition, new word detection, syntactic parsing, word sense disambiguation, etc.

    前者涉及到詞法、句法、語義分析,包括漢語分詞、詞性標注、注音、命名實體識別、新詞發現、句法分析、詞義消歧等。
  11. The first step in NNTCS is Chinese word segmentation of the Chinese documents. Feature terms are then selected from the documents, and the frequency of each term is recorded.

    在nntcs中,第一步是對中文文檔進行漢語分詞,從文檔中抽出特徵詞,並且統計各特徵詞的詞頻。
  12. In view of the particular characteristics of Chinese information processing, the author improves the maximum matching (MM) word segmentation algorithm and obtains a better word segmentation algorithm.

    本文根據中文信息處理的特殊性,在原有的最大匹配分詞演算法的基礎上進行改進,通過引進預處理過程,利用長詞優先規則得到一個較好的分詞演算法。
  13. At the same time, taking the characteristics of network applications into account, we propose a maximum matching word segmentation algorithm.

    其間兼顧網路上應用的特點,提出了基於無指導的最大匹配分詞演算法。
  14. Meanwhile, it accelerates the search process with a cache. (3) The Chinese word segmentation module supports the system's text segmentation; it uses an HMM-based disambiguation algorithm to improve the accuracy of word segmentation. (4) The search module responds to users' search requests; it applies an efficient clustering/classification algorithm to optimize the quality of the search service.

    它使用基於hmm模型的歧義消除演算法來提高分詞預處理的切分精度。 ( 4 )檢索模塊用來響應用戶的查詢請求。它利用簡單靈活的聚類/分類演算法來優化系統的搜索服務。
  15. It is composed of three main modules: a word segmentation module based on the maximum word-length matching algorithm, a part-of-speech tagging module based on a statistical method trained on relative frequencies, and a syntax parsing module based on an improved chart parsing algorithm.

    該系統實現了基於最大詞長匹配演算法的分詞模塊、基於統計方法的詞性標注模塊和基於改進的線圖分析演算法的句法分析模塊。
  16. After analyzing and summarizing the characteristics of technology sentences, the paper selects translation memory as the core of the technology translation system and studies in depth several key technologies, including word segmentation, similarity computation, alignment, English sentence generation, and the design of the bilingual dictionary, example corpus, and sub-chunk library.

    論文在分析和總結工藝語句特點的基礎上,提出用翻譯記憶技術作為工藝翻譯系統核心,並分別對分詞演算法、相似度計算、對齊方法、譯文生成等關鍵技術進行了研究,建立了雙語詞典庫、例句庫和子塊庫。
  17. By discussing core technologies in the automatic processing of Chinese information, such as automatic word segmentation, feature selection, and automatic text representation, the thesis improves on the shortcomings of current methods for automatic word segmentation and dimensionality reduction of Chinese texts, thereby improving their efficiency and effectiveness. Regarding text classification methods, the paper introduces two supervised multi-class methods for the automatic classification of Chinese texts, namely fuzzy clustering and boosting, which address the problem of low recall. By comparing the experimental results of the two methods, a multi-class automatic text classification system is built on the boosting method; it performs well in application and offers a good solution to the problem of real-time information classification.

    通過對漢語信息自動處理中自動分詞、特徵提取、文本自動表示等核心技術討論,對目前漢語文本自動分詞和文本降維方法中的不足和缺陷作了改進,提高了分詞和文本分類的效率和效果;在文本自動分類方法上,介紹了兩種有監督的基於多類的漢語文本自動分類處理方法:模糊聚類方法和boosting方法,解決了實踐中文本分類查全率不高的問題;通過對兩種方法的實驗比較結果,構建了基於boosting方法的多類文本自動分類系統,在實際應用中收到了良好的效果,較好的解決了信息的實時分類問題。
  18. Asian Language Information Processing, 2002, 1: 225-268. 44 Peng F, Huang X, Schuurmans D, et al. Investigating the relationship between word segmentation performance and retrieval performance in Chinese IR

    中文動詞次范疇化的研究主要研究了漢語動詞次范疇化現象的語言學理論和漢語動詞scf信息的自動獲取技術,並獲得了目前國內外同類研究的最優性能。
  19. Sun Maosong and Zuo Zhengping have presented a word segmentation algorithm based on a large Chinese corpus. The approach may be beneficial to understanding unrestricted Chinese texts.

    他們給出了一個基於大規模語料的歧義切分演算法,該方法有助於理解非受限中文文本。
  20. The article introduces intelligent agents, machine learning, and Chinese word segmentation, focusing in particular on the agent's intelligence, agency, and proactivity.

    文章系統介紹了智能代理,機器學習和漢語分詞技術,其中著重研究了agent的智能性,代理性,主動性。
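
Several of the example sentences above (12, 13, and 15) mention maximum matching word segmentation. As a rough illustration only, the sketch below shows the textbook forward maximum matching idea: scan left to right and, at each position, take the longest string found in a dictionary. It is not the improved algorithm of any of the quoted works; the function name `forward_max_match`, the `max_len` parameter, and the toy dictionary are assumptions made for this example.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation (illustrative sketch, not a quoted method).

    At each position, the longest dictionary word starting there is taken;
    a single character is used as a fallback when nothing matches.
    """
    if max_len is None:
        # Longest dictionary entry bounds the candidate length (1 if dictionary is empty).
        max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, chosen only to exercise the long-word-first behavior.
toy_dictionary = {"中文", "信息", "處理", "信息處理", "基礎"}
print(forward_max_match("中文信息處理的基礎", toy_dictionary))
# → ['中文', '信息處理', '的', '基礎']
```

Because the longest candidate is tried first, "信息處理" is kept as one token even though "信息" and "處理" are also in the dictionary. A purely greedy scan like this cannot resolve genuine segmentation ambiguities, which is where the statistical disambiguation mentioned in sentence 14 would come in.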