What does "document clustering" mean in Chinese?

Definition of document clustering
文件聚類
  • document : n 1 文獻,文件;公文。2 證件,證書,憑證。3 記錄影片,記實小說。4 【航海】船舶執照。vt 1 用文件[...
  • clustering : 叢聚
  1. To improve document clustering, a document similarity measure is proposed that combines cosine vector similarity and keyword frequency in documents with an input ontology.

    為了改進文本聚類的效果,提出了將領域知識本體和文本關鍵詞詞頻相結合的基於餘弦向量的文本相似性測度方法。
  2. A method that combines category-based and keyword-based concepts for a better information retrieval system is introduced. To improve document clustering, a document similarity measure is proposed that combines cosine vector similarity and keyword frequency in documents with an input ontology. The ontology is domain-specific and includes a list of keywords organized by degree of importance to the categories of the ontology; by means of its semantic knowledge, the ontology can improve the document similarity measure and the feedback of information retrieval systems. Two approaches to evaluating the performance of this similarity measure, and a comparison with the standard cosine vector similarity measure, are also described.

    介紹了一種綜合各層級分類類目和對應關鍵詞來構造概念體系並用於改進信息檢索系統效果的方法.為了改進文本聚類的效果,提出了將領域知識本體和文本關鍵詞詞頻相結合的基於餘弦向量的文本相似性測度方法.該本體面向特定領域,將關鍵詞以不同權值對應于各分類類目,通過其語義知識來改進文本相似性測度以及信息檢索系統的效果.進一步給出了對基於本體的相似性測度方法進行效果評價的2種策略以及該方法與經典餘弦向量測度方法的比較結果
  3. The thesis proposes a new document clustering method that uses a model named the document index graph to represent Chinese documents.

    本文提出的一種新的文本聚類方法,採用一種稱為文檔索引圖的結構來構建中文文本表示模型。
  4. Document clustering techniques have received more and more attention as a fundamental and enabling tool for efficient organization, navigation, retrieval, and summarization of huge volumes of text documents. The aim of document clustering is to cluster the documents into different semantic classes in an unsupervised manner.

    文本聚類作為一種對大規模文本信息進行有效地組織、導航、檢索和概括匯總的關鍵的、基本的技術而日益受到關注,其主要目的是在語義空間里以無監督的方式將文本集中的文本劃分成不同的類。
  5. The document space is generally of high dimensionality, and clustering in such a high-dimensional space is often infeasible due to the curse of dimensionality. So the primary step in document clustering is to project the documents into a lower-rank semantic space in which documents related to the same semantics are close to each other.

    基於文本空間的文本聚類因為其具有高維的特徵而不容易直接實現,所以文本聚類的首要步驟就是將文本空間的數據投影到較低維的語義空間里,使在文本空間里相鄰的數據向量在語義空間里根據某些提取的特徵參數而相似。
  6. Users of web search engines are often forced to sift through the long ordered list of document "snippets" returned by the engines. This paper applies web content mining to the field of search engines. Search engine results clustering relies on the information returned by the search engine.

    本文將web內容挖掘技術應用於搜索引擎領域,它依賴于搜索引擎結果所提供的信息來歸納出聚類,使得在搜索引擎返回的非常大的文檔列表中的過濾操作變得十分方便。
  7. Unlike previous document clustering methods based on NMF, our methods try to discover both the geometric and the discriminating structures of the document space in an unsupervised manner, with high accuracy at an acceptable computational cost.

    與基於nmf演算法的文本聚類不同,我們的演算法力求以無監督的方式,在時間復雜度允許的范圍內,找到更適合於分類操作的數據向量間的幾何局部特徵向量及相應的各文檔的編碼向量。
  8. However, their current status is still far from satisfying users. The problems include: (1) the content that the search engine returns is an enormous flat list (the information overload problem); (2) the items returned by the search engine are not always what the user actually needs (the low precision problem). This paper presents a fuzzy (soft) clustering algorithm, HTSC (hyperlink-text based soft clustering), which uses a mixed similarity metric of document content and inter-document hyperlinks to cluster web search results from a search engine, in order to help users find relevant web information more easily.

    這主要表現為: ( 1 )搜索引擎返回的結果是一個龐大的平坦結構的資源清單(即信息負載問題) ; ( 2 )搜索結果中的信息項並非都是用戶真正需要的信息資源(即低精度問題) ;論文提出了一種基於文檔文本內容和文檔間超鏈信息的混合相似度的模糊(軟)聚類演算法htsc 。該演算法可對搜索引擎返回的結果進行模糊聚類,以方便用戶從中找到真正需要的信息。
  9. Document [14] proposes modified clustering analyses: variance square sum clustering analysis, weighted variance square sum clustering analysis, etc. However, the authors do not explain how many economic zones there should be; they only show how to divide the economic zones.

    在文獻[ 14 ]中作者給出了改進的聚類分析的方法:方差平方和聚類分析、加權變量影響的方差平方和法聚類分析、預先確定組數的擬和聚類分析。
  10. In the system implementation part, we first analyze the overall structure and working principle of our system. Then we show the relationships among the tables in the core database. Lastly, we study the automatic document categorization algorithm and give algorithm descriptions and experimental results for Chinese word segmentation, pattern matching of paper titles, and clustering.

    在系統實現部分分析了系統的整體結構和工作原理,介紹了系統核心數據庫中各表的聯系,最後重點研究了文檔自動分類演算法,給出了漢語分詞演算法、論文標題模式匹配演算法、聚類演算法的演算法描述及實驗結果。
  11. In this paper, a document layout analysis scheme based on visual similarity is proposed which, by using a clustering strategy, can adaptively deal with documents in different languages and with different layout structures and skew angles.

    事實上,不僅排版參數會有不同,不同文檔在內容復雜度上的變化也是很大的,用於簡單版面的演算法不能適應復雜版面,反之用於復雜版面的演算法也不能適應簡單版面。
  12. This paper researches and discusses the theory of latent semantic indexing, including singular value decomposition and the word-document matrix. The author discusses the application of latent semantic indexing to Chinese document clustering, and also examines the vector space model, the electronic dictionary, word segmentation, and the k-means algorithm. The paper presents an improved electronic dictionary structure and an improved word segmentation algorithm.

    本文對潛在語義索引模型進行系統的研究和探討,包括奇異值分解等相關矩陣理論、詞-文檔矩陣等;同時本文研究和探討了潛在語義索引模型在中文文本聚類中的具體應用和實現,包括文本間相似度的度量、詞-文檔矩陣、奇異值分解的具體實現;同時本文對中文文本聚類所涉及的其他一些中文處理技術,包括向量空間模型、電子字典、切詞、 k - means聚類演算法等也進行了研究和探討。
  13. A research of document clustering algorithm based on vector space model

    基於向量空間模型的文本檢索系統
  14. The combination of document clustering techniques and web search engines has become a hot spot in the document mining area.

    文本聚類技術和網路搜索引擎服務相結合,已經成為文本挖掘領域的一個熱點研究課題。
  15. But there is still little research on applying document clustering techniques to Chinese web documents and combining them with Chinese web search engine services.

    但是,把文本聚類技術應用於中文web文檔,與中文搜索引擎服務相結合的研究仍然比較匱乏。
  16. On document clustering based on the fuzzy c-means algorithm

    均值演算法文檔聚類問題的研究
  17. Fast and high-quality document clustering algorithms play an important role towards this goal, as they have been shown both to provide a navigation/browsing mechanism, by organizing large amounts of information into a small number of meaningful clusters, and to greatly improve retrieval performance via either cluster-driven dimensionality reduction or term weighting.

    快速和高質量的文本聚類技術在實現這個目標過程中扮演了重要的角色。通過將大量信息組織成少數有意義的簇,這種技術能夠提供導航瀏覽機制,或者,通過聚類驅動的降維或權值調整來極大地改善檢索性能。
  18. Document clustering techniques have received more and more attention as a fundamental and enabling tool for efficient organization, navigation, retrieval, and summarization of huge volumes of text documents. The aim of document clustering is to cluster different documents into different semantic classes based on their content in an unsupervised manner.

    文本聚類作為一種對大規模文本信息進行有效組織、導航、檢索和概括匯總的基礎、關鍵技術而日益受到關注,其主要目的是以無監督指導的方式根據文本的內在關系將內容近似的文本分成不同的類。
  19. Our experimental evaluations show that our methods surpass NMF not only in the easy and reliable derivation of document clustering results, but also in document clustering accuracy.

    實驗結果顯示,在聚類的容易度、準確度、時間復雜度上均取得較nmf演算法更合理的效果。
  20. Dynamic document clustering based on genetic algorithm

    基於遺傳演算法的動態文本聚類
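
Several of the example sentences above mention cosine vector similarity and k-means clustering. As a rough illustration of what "document clustering" refers to, here is a minimal sketch in Python. It is not taken from any of the cited papers: the function names, the whitespace tokenizer, the sample documents, and the choice of seed documents are all assumptions made for this example.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def kmeans_cosine(vectors, seed_indices, iterations=10):
    """k-means-style loop that assigns documents by cosine similarity.

    seed_indices (hypothetical parameter) picks the documents used as
    initial centroids, which keeps this sketch deterministic.
    """
    centroids = [Counter(vectors[i]) for i in seed_indices]
    labels = None
    for _ in range(iterations):
        # Assignment step: each document goes to its most similar centroid.
        new_labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: sum the member vectors of each cluster. Cosine
        # similarity ignores vector length, so the sum acts like the mean.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[c] = total
    return labels


# Made-up sample documents: two about similarity, two about search engines.
docs = [
    "cluster documents by cosine similarity",
    "cosine similarity of document vectors",
    "search engine returns ranked results",
    "web search engine results list",
]
vectors = [tf_vector(d) for d in docs]
labels = kmeans_cosine(vectors, seed_indices=[0, 2])
print(labels)
```

The documents are grouped without any labelled training data, which is what the example sentences mean by "in an unsupervised manner". Real systems typically add term weighting and dimensionality reduction before clustering.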