Journal of Research (Urdu), BZU - Multan

جرنل آف ریسرچ (اردو)

Bahauddin Zakariya University, Multan (Pakistan)
ISSN (print): 1726-9067
ISSN (online): 1816-3424

Urdu Corpus: Technical Introduction, Importance, Need, Scope and Plan of Action (اردو کارپس: تکنیکی تعارف، اہمیت، ضرورت اور دائرہ و لائحہ عمل)

  • Hafiz Safwan Muhammad Chohan
  • December 31, 2008
Keywords
Urdu, Corpus, Language, Linguistics, Terms, Techniques, COCA, TBU
Abstract

This article argues for an Urdu corpus modeled on the Bank of English and the Corpus of Contemporary American English (COCA), which serve as the backbone of English language engineering, discourse analysis, corpus and lexicon development, and related work. The proposed corpus, named The Bank of Urdu (TBU), will be a repository of written and spoken Urdu texts gathered in platform-independent, machine-readable Indo-Perso-Arabic script. Because the two English corpora have exactly the same architecture and interface, the name "English Corpus" refers to both of them wherever TBU is compared with their structure in this document. In addition to defining the scope of TBU, this introductory paper discusses the technical and design issues of its architecture and interface. It addresses issues such as code-mixing, false friends, and homonyms in Urdu, and proposes a way to standardize Urdu orthography for this work. A sample web view of the user interface is provided. Available written Urdu texts are mostly literature-oriented, so in data gathering the proposed TBU must depart at many points from the standard practice of the English corpora; this point receives special attention. The paper also includes a study of word counts and of how high-frequency Urdu words are lexicalized in notable Urdu dictionaries. Aimed at discourse analysis, language engineering, and natural language processing in Urdu, and at providing a vital base for contemporary Urdu lexicon development, the proposed portal will not only separate Urdu language from Urdu literature but will also help regional Pakistani languages house their scholarly resources in their own scripts for similar research.
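The orthography standardization and word-count study mentioned above can be illustrated with a minimal sketch. It is not the paper's method: the mapping table, the function names `normalize_urdu` and `word_frequencies`, and the choice of whitespace tokenization are all assumptions made for illustration. The sketch folds a few Arabic code points that are commonly typed in place of their Urdu counterparts, so that visually identical spellings are counted as one word.

```python
import unicodedata
from collections import Counter

# Assumed, illustrative mapping (not taken from the paper): Arabic code
# points that are often typed in place of the corresponding Urdu letters.
ARABIC_TO_URDU = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u0647": "\u06C1",  # ARABIC LETTER HEH -> ARABIC LETTER HEH GOAL
})


def normalize_urdu(text):
    """Compose characters with NFC, then fold Arabic variants to Urdu forms."""
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(ARABIC_TO_URDU)


def word_frequencies(texts):
    """Count whitespace-separated tokens after normalization.

    Simplification: punctuation is not stripped, so a token followed by a
    full stop is counted separately from the bare token.
    """
    counts = Counter()
    for text in texts:
        counts.update(normalize_urdu(text).split())
    return counts


if __name__ == "__main__":
    # The same word typed once with Arabic KAF and once with Urdu KEHEH.
    samples = ["\u0643\u062A\u0627\u0628 \u0627\u0631\u062F\u0648",
               "\u06A9\u062A\u0627\u0628"]
    for word, count in word_frequencies(samples).most_common():
        print(word, count)
```

A real corpus pipeline would need a fuller mapping table and a proper tokenizer; the point here is only that frequency counts are unreliable until spelling variants are unified.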
This paper on TBU is a proposal by Dr Hafiz Safwan Muhammad Chohan to give initial shape to the idea of the Urdu Data Bank (UDB) of the Center of Excellence for Urdu Informatics (CEUI), National Language Authority (NLA), Islamabad. Because the abbreviation UDB could be confused with "Urdu Data Base", UDB was renamed TBU at the CEUI by consensus among scholars of Urdu, IT professionals, and representatives of the Government of Pakistan (GoP) from the Cabinet Division and the Planning Division. This took place at the national workshop "Urdu Informatics: Today & Tomorrow", held on 7-8 June 2008 at the NLA, where Dr Chohan also coined the Urdu equivalent of TBU, اردو مثال گھر (literally, "Urdu house of examples"), which the participants accepted.

Acknowledgement & Dedication: Dr Hafiz Safwan Muhammad Chohan was in contact with Prof John McHardy Sinclair (June 14, 1933 - March 13, 2007), Professor of Modern English Language at Birmingham University, 1965-2000. Prof Sinclair pioneered work in corpus linguistics, discourse analysis, lexicography, and language teaching, and was the driving force behind the British National Corpus (BNC) and the Collins COBUILD dictionaries. Although research papers are not customarily dedicated to individuals, this work is dedicated to him, with deep regret that the paper (in both Urdu and English) was not written during his lifetime.

How to cite

Chohan, H. S. M. (2020). اردو کارپس: تکنیکی تعارف، اہمیت، ضرورت اور دائرہ و لائحہ عمل. جرنل آف ریسرچ (اردو), 14(1), 121–146. https://jorurdu.bzu.edu.pk/website/journal/article/5e7f442b70397/page

Author(s):

Pakistan Telecommunication Company Limited, Faisalabad

Pakistan

  • 0092 333 524 60 94

Details:

Type: Article
Volume: 14
Issue: 1
Language: Urdu
Id: 5e7f442b70397
Pages: 121–146
Discipline: Urdu
Published: December 31, 2008

Copyrights

Journal of Research (Urdu) uses a Creative Commons license. Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.