Wikipedia demo

Categories assigned to text using ngram model (with relative ranking score):

Category assigned by SVM classifier using a linear model:

religion_Christianity

Summary:

("7th c BCE") attests that verb "Śavati" in the sense "to go" was used by only the Kambojas. Aggarwala; Geographical Data in the Early Puranas, A Critical Study, 1972, p 164, Dr M. Vidyalankar; Geographical and Economical Studies in the Mahabharata, Upayana Parva, p 37, Dr Motichandra; Ancient Kamboja, People and the Country, 1981, pp 127-28, 167, 218, Dr J.

Similar documents

44 49 53 72 73

Original text:

The current Tajik Republic harkens to the Samanid Empire (Anno Domini|AD The Tajik people came under Russian rule in the 1860s. The Basmachi revolt that broke out in the wake of the Russian Revolution of 1917 was quelled in the early 1920s and Tajikistan became an autonomous Soviet socialist republic (Tajik ASSR) within Uzbekistan in 1924. In 1929 Tajikistan was made one of the component republics of the Soviet Union – Tajik Soviet Socialist Republic (Tajik SSR) – and it kept this status until 1991. Tajikistan gained independence in 1991, and has experienced three changes in government and a Tajik civil war|civil war since then. A peace agreement among rival factions was signed in 1997 but its implementation has progressed slowly. Pre-Islamic Period (600 BC–AD 651) Tajikistan was part of the Archaeological Complex in the Bronze Age, candidate for or Proto-Iranian culture. Tajikistan was part of Scythia in Classical Antiquity. Most of modern Tajikstan had formed parts of ancient Location of the Kamboja and Parama Kamboja kingdoms, which find references in the ancient Indian epics like the Mahabharata. evidence, combined with ancient literary and inscriptional evidence has led many eminent Indologists to conclude that ancient Kambojas ("an Avestan speaking Iranain tribe") originally belonged to the area" of Central Asia. Achariya Yasaka's Nirukta Nirukta II. 2. ("7th c BCE") attests that verb "Śavati" in the sense "to go" was used by only the Kambojas. It has been shown that the modern Ghalcha dialects, "Valkhi, Shigali, Sriqoli, Jebaka (also called Sanglichi or Ishkashim), Munjani, Yidga and Yagnobi", mainly spoken in Pamir and countries on the headwaters of the Oxus, still use terms derived from ancient Kamboja "Śavati" in the sense "to go". Linguistic Survey of India, Vol X, pp 456ff, 468, 473, 474, 476, 500, 511, 524 etc; Journal of Royal Asiatic Society of Asia, 1911, pp 801-802, Sir Griersen; India as Known to Panini, 1968, p 49, Dr V. S. 
Aggarwala; Geographical Data in the Early Puranas, A Critical Study, 1972, p 164, Dr M. R. Singh; Bharata Bhumi aur uske Nivasi, Samvat 1987, pp 297-305, Dr J. C. Vidyalankar; Geographical and Economical Studies in the Mahabharata, Upayana Parva, p 37, Dr Motichandra; Ancient Kamboja, People and the Country, 1981, pp 127-28, 167, 218, Dr J. L. Kamboj; Sindhant Kaumudi 1966, pp 20-22, Acharya R. R. Pande. The Yagnobi dialect spoken in Yagnobi province around the headwaters of Zeravshan valley in Sogdiana. Further, Sir G Grierson says that the speech of Badakshan was a Ghalcha till about three centuries ago when it was supplanted by a form of Persian Linguistic Survey of India

Entities resolved to DBPedia URIs:

Category           Name          DBPedia URI (if available)
countries          Tajikistan    http://dbpedia.org/resource/Tajikistan
countries          India         http://dbpedia.org/resource/India
political parties  Soviet Union  http://dbpedia.org/resource/Communist_Party_of_the_Soviet_Union
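
The table pairs each recognized entity with a DBPedia URI where one is available. The following is a minimal sketch of such a lookup, assuming a hand-built dictionary keyed by (category, name). The names `DBPEDIA_URIS` and `resolve_uri` are hypothetical and are not taken from the demo's implementation, which is not shown on this page.

```python
from typing import Optional

# Rows copied from the table above; this dictionary is an illustrative
# stand-in, not the demo's real data store.
DBPEDIA_URIS = {
    ("countries", "Tajikistan"): "http://dbpedia.org/resource/Tajikistan",
    ("countries", "India"): "http://dbpedia.org/resource/India",
    ("political parties", "Soviet Union"):
        "http://dbpedia.org/resource/Communist_Party_of_the_Soviet_Union",
}


def resolve_uri(category: str, name: str) -> Optional[str]:
    """Return the DBPedia URI for an entity, or None if none is available."""
    return DBPEDIA_URIS.get((category, name))


print(resolve_uri("countries", "India"))
print(resolve_uri("countries", "Uzbekistan"))  # no row in the table -> None
```

Returning `None` for a missing entry mirrors the "if available" wording of the column header: Uzbekistan is listed among the entities but has no row in the table.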


Entities:

Tajikistan

Location in text: sentence index: 2; starting index in sentence: 22; ending index in sentence +1: 23

Entity data: {:entity=>"Tajikistan", :gender=>:none, :type=>:place, :place_type=>"country"}

Uzbekistan

Location in text: sentence index: 2; starting index in sentence: 33; ending index in sentence +1: 34

Entity data: {:entity=>"Uzbekistan", :gender=>:none, :type=>:place, :place_type=>"country"}

Tajikistan

Location in text: sentence index: 3; starting index in sentence: 2; ending index in sentence +1: 3

Entity data: {:entity=>"Tajikistan", :gender=>:none, :type=>:place, :place_type=>"country"}

Tajikistan

Location in text: sentence index: 4; starting index in sentence: 0; ending index in sentence +1: 1

Entity data: {:entity=>"Tajikistan", :gender=>:none, :type=>:place, :place_type=>"country"}

Tajikistan

Location in text: sentence index: 7; starting index in sentence: 7; ending index in sentence +1: 8

Entity data: {:entity=>"Tajikistan", :gender=>:none, :type=>:place, :place_type=>"country"}

Tajikistan

Location in text: sentence index: 8; starting index in sentence: 0; ending index in sentence +1: 1

Entity data: {:entity=>"Tajikistan", :gender=>:none, :type=>:place, :place_type=>"country"}

India

Location in text: sentence index: 14; starting index in sentence: 3; ending index in sentence +1: 4

Entity data: {:entity=>"India", :gender=>:none, :type=>:place, :place_type=>"country"}

India

Location in text: sentence index: 14; starting index in sentence: 42; ending index in sentence +1: 43

Entity data: {:entity=>"India", :gender=>:none, :type=>:place, :place_type=>"country"}
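
The "Location in text" values appear to be zero-based token offsets into the arrays listed under "Segmented sentences from raw text", with the "ending index in sentence +1" acting as an exclusive upper bound. The sketch below checks that reading against sentence 2, assuming the token array is copied verbatim from that section. The helper name `entity_tokens` is hypothetical.

```python
# Tokens of sentence 2, copied from the "Segmented sentences" output.
SENTENCE_2 = [
    "The", "Basmachi", "revolt", "that", "broke", "out", "in", "the", "wake",
    "of", "the", "Russian", "Revolution", "of", "1917", "was", "quelled", "in",
    "the", "early", "1920s", "and", "Tajikistan", "became", "an", "autonomous",
    "Soviet", "socialist", "republic", "(Tajik", "ASSR", ")", "within",
    "Uzbekistan", "in", "1924", ".",
]


def entity_tokens(tokens, start, end_plus_one):
    """Return the tokens in the half-open range [start, end_plus_one)."""
    return tokens[start:end_plus_one]


print(entity_tokens(SENTENCE_2, 22, 23))  # ['Tajikistan']
print(entity_tokens(SENTENCE_2, 33, 34))  # ['Uzbekistan']
```

Because the end value is already "index + 1", it can be passed directly as the stop of a Python slice without adjustment.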


Part-of-speech tags with (very experimental) anaphora resolution annotations (shown in braces after the tag, e.g. it/PRP/{SSR}):

The/DT, current/JJ, Tajik/NN, Republic/NNP, harkens/NN, to/TO, the/DT, Samanid/NN, Empire/NNP, Anno/NN, Domini/NN, AD/NN, The/DT, Tajik/NN, people/NNS, came/VBD, under/IN, Russian/NNP, rule/NN, in/IN, the/DT, 1860s/NN, ./., The/DT, Basmachi/NN, revolt/NN, that/IN, broke/VBD, out/IN, in/IN, the/DT, wake/NN, of/IN, the/DT, Russian/NNP, Revolution/NNP, of/IN, 1917/NN, was/VBD, quelled/NN, in/IN, the/DT, early/RB, 1920s/NN, and/CC, Tajikistan/NN, became/VBD, an/DT, autonomous/JJ, Soviet/JJ, socialist/JJ, republic/NN, Tajik/NN, ASSR/NN, within/IN, Uzbekistan/NNP, in/IN, 1924/NN, In/IN, 1929/NN, Tajikistan/NN, was/VBD, made/VBN, one/NN, of/IN, the/DT, component/NN, republics/NNS, of/IN, the/DT, Soviet/JJ, Union/NNP, Tajik/NN, Soviet/JJ, Socialist/NNP, Republic/NNP, Tajik/NN, SSR/NN, and/CC, it/PRP/{SSR}, kept/VBD, this/DT, status/NNS, until/IN, 1991/NN, Tajikistan/NN, gained/VBD, independence/NN, in/IN, 1991/NN, and/CC, has/VBZ, experienced/VBN, three/NN, changes/NNS, in/IN, government/NN, and/CC, a/DT, Tajik/NN, civil/JJ, war/NN, civil/JJ, war/NN, since/IN, then/RB, ./., A/DT, peace/NN, agreement/NN, among/IN, rival/JJ, factions/NNS, was/VBD, signed/VBD, in/IN, 1997/NN, but/CC, its/PRP$, implementation/NN, has/VBZ, progressed/VBD, slowly/RB, ./., Pre/NN, Islamic/NNP, Period/NN, 600/NN, BC/NN, ndash/NN, AD/NN, 651/NN, Tajikistan/NN, was/VBD, part/NN, of/IN, the/DT, Archaeological/NN, Complex/NNP, in/IN, the/DT, Bronze/NN, Age/NNP, candidate/NN, for/IN, or/CC, Proto/NN, Iranian/JJ, culture/NN, ./., Tajikistan/NN, was/VBD, part/NN, of/IN, Scythia/NN, in/IN, Classical/JJ, Antiquity/NN, ./., Most/JJS, of/IN, modern/JJ, Tajikstan/NN, had/VBD, formed/VBN, parts/NNS, of/IN, ancient/JJ, Location/NNP, of/IN, the/DT, Kamboja/NN, and/CC, Parama/NN, Kamboja/NN, kingdoms/NNS, which/WDT, find/VB, references/NNS, in/IN, the/DT, ancient/JJ, Indian/NNP, epics/NNS, like/IN, the/DT, Mahabharata/NN, ./., evidence/NN, combined/VBN, with/IN, ancient/JJ, literary/JJ, and/CC, inscriptional/NN, 
evidence/NN, has/VBZ, led/VBN, many/JJ, eminent/JJ, Indologists/NN, to/TO, conclude/VB, that/IN, ancient/JJ, Kambojas/NN, an/DT, Avestan/NN, speaking/VBG, Iranain/NN, tribe/NN, originally/RB, belonged/VBD, to/TO, the/DT, area/NN, of/IN, Central/JJ, Asia/NNP, ./., Achariya/NN, Yasaka/NN, s/PRP, Nirukta/NN, Nirukta/NN, II/NN, ./., 7th/NN, c/NN, BCE/NNP, attests/VBZ, that/IN, verb/NN, avati/NN, in/IN, the/DT, sense/NN, to/TO, go/VB, was/VBD, used/VBN, by/IN, only/RB, the/DT, Kambojas/NN, ./., It/PRP/{Kambojas}, has/VBZ, been/VBN, shown/VBN, that/IN, the/DT, modern/JJ, Ghalcha/NN, dialects/NNS, Valkhi/NN, Shigali/NN, Sriqoli/NN, Jebaka/NN, also/RB, called/VBN, Sanglichi/NN, or/CC, Ishkashim/NN, Munjani/NN, Yidga/NN, and/CC, Yagnobi/NN, mainly/RB, spoken/VBN, in/IN, Pamir/NN, and/CC, countries/NNS, on/IN, the/DT, headwaters/NNS, of/IN, the/DT, Oxus/NN, still/RB, use/NN, terms/NNS, derived/VBN, from/IN, ancient/JJ, Kamboja/NN, avati/NN, in/IN, the/DT, sense/NN, to/TO, go/VB, ./., Linguistic/JJ, Survey/NNP, of/IN, India/NNP, Vol/NN, X/NN, pp/NN, 456ff/NN, 468/NN, 473/NN, 474/NN, 476/NN, 500/NN, 511/NN, 524/NN, etc/FW, Journal/JJ, of/IN, Royal/JJ, Asiatic/JJ, Society/NNP, of/IN, Asia/NNP, 1911/NN, pp/NN, 801/NN, 802/NN, Sir/NNP, Griersen/NN, India/NNP, as/IN, Known/VBN, to/TO, Panini/NN, 1968/NN, p/NN, 49/NN, Dr/NNP, V/NN, ./., S/NNP, ./., Aggarwala/NN, Geographical/JJ, Data/NNP, in/IN, the/DT, Early/RB, Puranas/NN, A/DT, Critical/JJ, Study/NNP, 1972/NN, p/NN, 164/NN, Dr/NNP, M/NNP, ./., R/NN, ./., Singh/NNP, Bharata/NN, Bhumi/NN, aur/NN, uske/NN, Nivasi/NN, Samvat/NN, 1987/NN, pp/NN, 297/NN, 305/NN, Dr/NNP, J/NNP, ./., C/NN, ./., Vidyalankar/NN, Geographical/JJ, and/CC, Economical/NN, Studies/NNS, in/IN, the/DT, Mahabharata/NN, Upayana/NN, Parva/NN, p/NN, 37/NN, Dr/NNP, Motichandra/NN, Ancient/NNP, Kamboja/NN, People/NNS, and/CC, the/DT, Country/NNP, 1981/NN, pp/NN, 127/NN, 28/NN, 167/NN, 218/NN, Dr/NNP, J/NNP, ./., L/NNP, ./., Kamboj/NN, Sindhant/NN, Kaumudi/NN, 1966/NN, 
pp/NN, 20/NN, 22/NN, Acharya/NN, R/NN, ./., R/NN, ./., Pande/NN, ./., The/DT, Yagnobi/NN, dialect/NN, spoken/VBN, in/IN, Yagnobi/NN, province/NN, around/IN, the/DT, headwaters/NNS, of/IN, Zeravshan/NN, valley/NN, in/IN, Sogdiana/NN, ./., Further/RB, Sir/NNP, G/NN, Grierson/NN, says/VBZ, that/IN, the/DT, speech/NN, of/IN, Badakshan/NN, was/VBD, a/DT, Ghalcha/NN, till/IN, about/IN, three/NN, centuries/NNS, ago/RB, when/WRB, it/PRP/{centuries}, was/VBD, supplanted/VBN, by/IN, a/DT, form/NN, of/IN, Persian/NNP, Linguistic/JJ, Survey/NNP, of/IN, Indi/NN, a/DT
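
Each item in the list above has the form word/TAG, optionally followed by /{antecedent} when the anaphora resolver has linked a pronoun to an earlier noun. A minimal sketch of reading that format back into structured records follows, assuming items are separated by ", " and that the word itself contains no "/". The names `TaggedToken` and `parse_tagged` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class TaggedToken(NamedTuple):
    word: str
    tag: str
    antecedent: Optional[str]  # resolved referent for a pronoun, if annotated


def parse_tagged(text: str) -> List[TaggedToken]:
    """Parse 'word/TAG' or 'word/TAG/{antecedent}' items separated by ', '."""
    tokens = []
    for item in text.split(", "):
        item = item.strip()
        if not item:
            continue
        parts = item.split("/")
        word, tag = parts[0], parts[1]
        antecedent = None
        # A third field wrapped in braces is the anaphora annotation.
        if len(parts) > 2 and parts[2].startswith("{") and parts[2].endswith("}"):
            antecedent = parts[2][1:-1]
        tokens.append(TaggedToken(word, tag, antecedent))
    return tokens


sample = "and/CC, it/PRP/{SSR}, kept/VBD, this/DT, status/NNS, until/IN, 1991/NN"
for token in parse_tagged(sample):
    print(token)
```

In the sample, `it` is parsed with tag `PRP` and antecedent `SSR`; all other tokens have no antecedent.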


Segmented sentences from raw text:

0 ["The", "current", "Tajik", "Republic", "harkens", "to", "the", "Samanid", "Empire", "(Anno", "Domini", "|"]
1 ["AD", "The", "Tajik", "people", "came", "under", "Russian", "rule", "in", "the", "1860s", "."]
2 ["The", "Basmachi", "revolt", "that", "broke", "out", "in", "the", "wake", "of", "the", "Russian", "Revolution", "of", "1917", "was", "quelled", "in", "the", "early", "1920s", "and", "Tajikistan", "became", "an", "autonomous", "Soviet", "socialist", "republic", "(Tajik", "ASSR", ")", "within", "Uzbekistan", "in", "1924", "."]
3 ["In", "1929", "Tajikistan", "was", "made", "one", "of", "the", "component", "republics", "of", "the", "Soviet", "Union", "–", "Tajik", "Soviet", "Socialist", "Republic", "(Tajik", "SSR", ")", "–", "and", "it", "kept", "this", "status", "until", "1991", "."]
4 ["Tajikistan", "gained", "independence", "in", "1991", ",", "and", "has", "experienced", "three", "changes", "in", "government", "and", "a", "Tajik", "civil", "war", "|"]
5 ["civil", "war", "since", "then", "."]
6 ["A", "peace", "agreement", "among", "rival", "factions", "was", "signed", "in", "1997", "but", "its", "implementation", "has", "progressed", "slowly", "."]
7 ["Pre-Islamic", "Period", "(600", "BC&ndash", ";AD", "651", ")", "Tajikistan", "was", "part", "of", "the", "Archaeological", "Complex", "in", "the", "Bronze", "Age", ",", "candidate", "for", "or", "Proto-Iranian", "culture", "."]
8 ["Tajikistan", "was", "part", "of", "Scythia", "in", "Classical", "Antiquity", "."]
9 ["Most", "of", "modern", "Tajikstan", "had", "formed", "parts", "of", "ancient", "Location", "of", "the", "Kamboja", "and", "Parama", "Kamboja", "kingdoms", ",", "which", "find", "references", "in", "the", "ancient", "Indian", "epics", "like", "the", "Mahabharata", "."]
10 ["evidence", ",", "combined", "with", "ancient", "literary", "and", "inscriptional", "evidence", "has", "led", "many", "eminent", "Indologists", "to", "conclude", "that", "ancient", "Kambojas", "(\"an", "Avestan", "speaking", "Iranain", "tribe\"", ")", "originally", "belonged", "to", "the", "area\"", "of", "Central", "Asia", "."]
11 ["Achariya", "Yasaka's", "Nirukta", "Nirukta", "II", ".2", "."]
12 ["(\"7th", "c", "BCE\"", ")", "attests", "that", "verb", "\"Śavati\"", "in", "the", "sense", "\"to", "go\"", "was", "used", "by", "only", "the", "Kambojas", "."]
13 ["It", "has", "been", "shown", "that", "the", "modern", "Ghalcha", "dialects", ",", "\"Valkhi", ",", "Shigali", ",", "Sriqoli", ",", "Jebaka", "(also", "called", "Sanglichi", "or", "Ishkashim", ")", ",", "Munjani", ",", "Yidga", "and", "Yagnobi\"", ",", "mainly", "spoken", "in", "Pamir", "and", "countries", "on", "the", "headwaters", "of", "the", "Oxus", ",", "still", "use", "terms", "derived", "from", "ancient", "Kamboja", "\"Śavati\"", "in", "the", "sense", "\"to", "go\"", "."]
14 ["Linguistic", "Survey", "of", "India", ",", "Vol", "X", ",", "pp", "456ff", ",", "468", ",", "473", ",", "474", ",", "476", ",", "500", ",", "511", ",", "524", "etc", ";", "Journal", "of", "Royal", "Asiatic", "Society", "of", "Asia", ",", "1911", ",", "pp", "801-802", ",", "Sir", "Griersen", ";", "India", "as", "Known", "to", "Panini", ",", "1968", ",", "p", "49", ",", "Dr", "V.", "S.", "Aggarwala", ";", "Geographical", "Data", "in", "the", "Early", "Puranas", ",", "A", "Critical", "Study", ",", "1972", ",", "p", "164", ",", "Dr", "M.", "R.", "Singh", ";", "Bharata", "Bhumi", "aur", "uske", "Nivasi", ",", "Samvat", "1987", ",", "pp", "297-305", ",", "Dr", "J.", "C.", "Vidyalankar", ";", "Geographical", "and", "Economical", "Studies", "in", "the", "Mahabharata", ",", "Upayana", "Parva", ",", "p", "37", ",", "Dr", "Motichandra", ";", "Ancient", "Kamboja", ",", "People", "and", "the", "Country", ",", "1981", ",", "pp", "127-28", ",", "167", ",", "218", ",", "Dr", "J.", "L.", "Kamboj", ";", "Sindhant", "Kaumudi", "1966", ",", "pp", "20-22", ",", "Acharya", "R.", "R.", "Pande", "."]
15 ["The", "Yagnobi", "dialect", "spoken", "in", "Yagnobi", "province", "around", "the", "headwaters", "of", "Zeravshan", "valley", "in", "Sogdiana", "."]