We use regular expressions and neural network models to systematically harvest data from primary and secondary sources and employ an entity-relationship model to organize our data. As a relational database with both online and offline versions, CBDB provides freely accessible, structured data for macroscopic, quantitative studies of premodern China.
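The harvesting and organizing steps described here can be illustrated with a minimal sketch. It assumes a single hypothetical text formula ("name, courtesy name, native place") and a two-table schema invented for illustration; neither the pattern nor the table layout is taken from CBDB itself, and the neural network component is not shown.

```python
import re
import sqlite3

# Hypothetical pattern for one common biographical formula:
# "<name>，字<courtesy name>，<place>人" ("X, courtesy name Y, native of Z").
BIO_PATTERN = re.compile(
    r"(?P<name>[\u4e00-\u9fff]{2,4})，字(?P<courtesy>[\u4e00-\u9fff]{1,3})，"
    r"(?P<place>[\u4e00-\u9fff]{2,4})人"
)


def harvest(text):
    """Return (name, courtesy name, native place) tuples found in text."""
    return [(m["name"], m["courtesy"], m["place"]) for m in BIO_PATTERN.finditer(text)]


def build_db(records):
    """Load harvested tuples into an illustrative entity-relationship schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE person (
            id INTEGER PRIMARY KEY,
            name TEXT,
            courtesy_name TEXT,
            place_id INTEGER REFERENCES place(id)
        );
        """
    )
    for name, courtesy, place in records:
        # Each place is stored once; persons point to it by id.
        conn.execute("INSERT OR IGNORE INTO place (name) VALUES (?)", (place,))
        (place_id,) = conn.execute(
            "SELECT id FROM place WHERE name = ?", (place,)
        ).fetchone()
        conn.execute(
            "INSERT INTO person (name, courtesy_name, place_id) VALUES (?, ?, ?)",
            (name, courtesy, place_id),
        )
    return conn


def people_per_place(conn):
    """Count persons per native place, a simple quantitative query."""
    return conn.execute(
        """
        SELECT place.name, COUNT(*) FROM person
        JOIN place ON person.place_id = place.id
        GROUP BY place.name
        """
    ).fetchall()


SAMPLE = "王安石，字介甫，臨川人。蘇軾，字子瞻，眉山人。蘇轍，字子由，眉山人。"
```

Calling `people_per_place(build_db(harvest(SAMPLE)))` groups the three sample persons by native place. Keeping places in their own table is what makes this kind of aggregate query straightforward.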
The data in CBDB are continuously disambiguated and readily formatted for statistical, social network, and spatial analyses; they also have value for tagging named entities in historical texts and contextualizing other data collections.
This paper reports the current status of using natural language processing and text mining methods to identify biographical information about government officers so that we can add the information to the China Biographical Database (CBDB), which is hosted by Harvard University.
Mining texts in Difangzhi (local gazetteers) is not easy, partly because little is known so far about the grammar of literary Chinese. We employed language modeling and conditional random fields to find person names, location names, and the relationships between them.
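As a rough illustration of how text is typically prepared for a conditional random field, the sketch below converts annotated name spans into character-level BIO labels and builds simple context features for each character. The span format, label names, and feature set are assumptions made for this example, not the feature templates used in the work described; the resulting feature dictionaries would still need to be passed to a CRF toolkit for training.

```python
def bio_labels(text, spans):
    """Turn (start, end, type) spans, end exclusive, into per-character BIO labels."""
    labels = ["O"] * len(text)
    for start, end, kind in spans:
        labels[start] = f"B-{kind}"
        for i in range(start + 1, end):
            labels[i] = f"I-{kind}"
    return labels


def char_features(text, i):
    """Illustrative context features for the character at position i."""
    prev_char = text[i - 1] if i > 0 else "<BOS>"
    next_char = text[i + 1] if i < len(text) - 1 else "<EOS>"
    return {
        "char": text[i],
        "prev": prev_char,
        "next": next_char,
        "prev_bigram": prev_char + text[i],
    }


# Hypothetical annotated fragment: a person name followed by a native place.
TEXT = "王安石臨川人"
SPANS = [(0, 3, "PER"), (3, 5, "LOC")]

LABELS = bio_labels(TEXT, SPANS)
FEATURES = [char_features(TEXT, i) for i in range(len(TEXT))]
```

Working at the character level avoids a separate word segmentation step, which is convenient when no reliable segmenter exists for the language of the source texts.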
The methods were evaluated with realistic Difangzhi data of more than 2 million Chinese characters written in literary Chinese.
In this paper, we combine natural language processing (NLP) techniques and network analysis to systematically map the individuals mentioned in the Biographical Dictionary of Republican China, thus revealing its underlying structure.
We depart from previous studies by distinguishing between the subject of a biography (bionode) and the individuals mentioned within a biography (object-node). We examine whether the bionodes form sociocentric networks based on shared attributes (provincial origin, education, etc.). Our major contribution consists of annotating the links between individuals in order to (1) question the assumption that word co-occurrences equate to actual relations and (2) define a more accurate classification of relationships among elites in Republican China.
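The bionode and object-node distinction, together with annotated links, can be sketched with a small data structure. The individuals, provinces, and relation types below are placeholders invented for illustration, and treating someone who is both a biography subject and a mentioned person as a bionode is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical toy data: each key is the subject of a biography (a bionode);
# "mentions" lists (mentioned individual, annotated relation type) pairs.
BIOGRAPHIES = {
    "A": {"province": "Zhejiang", "mentions": [("B", "political"), ("C", "kinship")]},
    "B": {"province": "Zhejiang", "mentions": [("A", "political"), ("D", "professional")]},
    "E": {"province": "Guangdong", "mentions": [("D", "professional")]},
}


def classify_nodes(bios):
    """Return (bionodes, individuals who appear only as object-nodes)."""
    bionodes = set(bios)
    mentioned = {target for bio in bios.values() for target, _ in bio["mentions"]}
    return bionodes, mentioned - bionodes


def relation_counts(bios):
    """Count annotated links by relation type rather than by raw co-occurrence."""
    counts = defaultdict(int)
    for bio in bios.values():
        for _, relation in bio["mentions"]:
            counts[relation] += 1
    return dict(counts)


def group_by_attribute(bios, attribute):
    """Group bionodes that share an attribute such as provincial origin."""
    groups = defaultdict(set)
    for name, bio in bios.items():
        groups[bio[attribute]].add(name)
    return dict(groups)


BIONODES, OBJECT_ONLY = classify_nodes(BIOGRAPHIES)
```

Comparing `relation_counts(BIOGRAPHIES)` with `group_by_attribute(BIOGRAPHIES, "province")` shows the two views side by side: ties classified by annotated relation type versus groupings by shared attribute.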
We demonstrate that political and professional relations in this population outweigh the types of social ties commonly accepted in scholarship on modern-day China. We ultimately develop a method that can be applied to similar corpora in a critical and comparative perspective.
Person names and location names are essential building blocks for identifying events and social networks in historical documents written in literary Chinese.
We take the lead in exploring the algorithmic recognition of named entities in literary Chinese for historical studies, using language-model-based and conditional-random-field-based methods, and extend our work to mining the structure of historical documents.