An assessment of the Nam Pehchan computer program for the identification of names of south Asian ethnic origin

Carole Cummins; Heather Winter; Kar Keung Cheng; Roger Maric; P. Silcocks; Cherian Varghese

doi:10.1093/pubmed/21.4.401


Corresponding author: Carole Cummins

Research output: Contribution to journal, Article, peer-reviewed

124 Citations (Scopus)

Abstract

Background. An assessment was made of the usefulness and accuracy of a computer program for identifying the south Asian population by classifying names on a disease register.

Methods. The computer program, Nam Pehchan, was used to classify names as either south Asian or non-south Asian. The results were compared with a reference standard that combined use of the program with visual inspection; the visual inspection was facilitated by a computer-generated dictionary of common non-south Asian names. The data set consisted of 356 555 cases of incident cancer (ICD-9: 140-208) registered between 1990 and 1992 by the Thames, Trent, West Midlands and Yorkshire cancer registries.

Results. Nam Pehchan classified 5506 cases as south Asian. Visual inspection identified 2024 false positives (36.8 per cent of all cases identified as south Asian by Nam Pehchan) and 363 false negatives (9.5 per cent of those identified by the reference standard). Compared with the reference standard, Nam Pehchan had a sensitivity of 90.5 per cent and a positive predictive value of 63.2 per cent.

Conclusion. The Nam Pehchan program quickly identified a high proportion of the names classified as south Asian by the reference standard, but the high proportion of false positives among its south Asian classifications means that the program is not adequate as a single strategy. The time-consuming visual inspection of program negatives in large data sets can be substantially reduced by first comparing them with dictionaries of common non-south Asian names.
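The accuracy figures in the Results follow from a standard two-by-two comparison against the reference standard. The Python sketch below reconstructs them from the three counts the abstract reports (5506 program positives, 2024 false positives, 363 false negatives) and separately illustrates the dictionary step described in the Conclusion. The function names and the dictionary rule (treating any surname that occurs at least `min_count` times among program negatives as a common non-south Asian name) are hypothetical illustrations, not the authors' procedure, which the abstract does not describe in detail.

```python
from collections import Counter
from collections.abc import Iterable


def screening_metrics(program_positives: int, false_positives: int,
                      false_negatives: int) -> dict:
    """Two-by-two accuracy figures for a name classifier versus a reference standard."""
    true_positives = program_positives - false_positives
    # Reference-standard positives = names the program found correctly + names it missed.
    reference_positives = true_positives + false_negatives
    return {
        "true_positives": true_positives,
        "reference_positives": reference_positives,
        "sensitivity": true_positives / reference_positives,
        "ppv": true_positives / program_positives,
        "fp_share": false_positives / program_positives,    # share of program positives
        "fn_share": false_negatives / reference_positives,  # share of reference positives
    }


def build_common_name_dictionary(negative_surnames: Iterable[str], min_count: int) -> set[str]:
    """HYPOTHETICAL rule: a surname seen at least min_count times among program
    negatives is treated as a common non-south Asian name."""
    counts = Counter(name.strip().upper() for name in negative_surnames)
    return {name for name, n in counts.items() if n >= min_count}


def names_needing_inspection(negative_surnames: Iterable[str], common_names: set[str]) -> list[str]:
    """Program negatives not matched by the dictionary; only these go to visual inspection."""
    return [name for name in negative_surnames if name.strip().upper() not in common_names]


if __name__ == "__main__":
    m = screening_metrics(program_positives=5506, false_positives=2024, false_negatives=363)
    print(m["true_positives"], m["reference_positives"])  # 3482 3845
    print(f"sensitivity {m['sensitivity']:.1%}, PPV {m['ppv']:.1%}")  # sensitivity 90.6%, PPV 63.2%

    # Toy surnames (illustrative only, not data from the study).
    negatives = ["SMITH", "Smith", "JONES", "JONES", "OKAFOR"]
    common = build_common_name_dictionary(negatives, min_count=2)
    print(names_needing_inspection(negatives, common))  # ['OKAFOR']
```

Run on the abstract's counts, this gives 3482 true positives and 3845 reference-standard positives, a positive predictive value of 63.2 per cent and a false-positive share of 36.8 per cent, matching the abstract. Sensitivity comes out at about 90.6 per cent (3482/3845) and the false-negative share at about 9.4 per cent, slightly different from the published 90.5 and 9.5 per cent; the abstract does not say how these were rounded or exactly which denominator was used. A frequency-derived dictionary would also need manual checking, so that common south Asian surnames missed by the program are not screened out of the inspection step.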

Original language	English
Pages	401-406
Number of pages	6
Journal	Journal of Public Health Medicine
Volume	21
Issue number	4
DOI	https://doi.org/10.1093/pubmed/21.4.401
Publication status	Published, December 1999

Keywords

Nam Pehchan computer program
South Asian names

ASJC Scopus subject areas

Public Health, Environmental and Occupational Health



Cite this

@article{59cb999fe5e24cb28f38196728dca0d5,
  title = "An assessment of the {Nam Pehchan} computer program for the identification of names of south {Asian} ethnic origin",
  abstract = "Background. An assessment was made of the usefulness and accuracy of a computer program for the identification of the south Asian population through the classification of names on a disease register. Methods. The computer program, Nam Pehchan, was used to classify names as either south Asian or non south Asian. The results were compared with a reference standard, which combined use of the program with visual inspection. The latter was facilitated by a computer-generated dictionary of common non south Asian names. The data set consisted of 356 555 cases of incident cancer (ICD9: 140-208) registered between 1990 and 1992 by Thames, Trent, West Midlands and Yorkshire cancer registries. Results. Nam Pehchan classified 5506 cases as south Asian. Visual inspection identified 2024 false positives (36.8 per cent of all cases identified as south Asian by Nam Pehchan) and 363 false negatives (9.5 per cent of those identified by the reference standard). Compared with the reference standard, Nam Pehchan had a sensitivity of 90.5 per cent and a positive predictive value of 63.2 per cent. Conclusion. The Nam Pehchan program quickly identified a high proportion of the names classified as south Asian by the reference standard, but the high false positive rate means that the program alone is not an adequate single strategy. The time-consuming process of inspection of program negatives for large data sets can be substantially reduced by comparison with dictionaries of common non south Asian names.",
  keywords = "Nam Pehchan computer program, South Asian names",
  author = "Cummins, Carole and Winter, Heather and Cheng, Kar Keung and Maric, Roger and Silcocks, P. and Varghese, Cherian",
  year = "1999",
  month = dec,
  doi = "10.1093/pubmed/21.4.401",
  language = "English",
  volume = "21",
  pages = "401--406",
  journal = "Journal of Public Health Medicine",
  issn = "0957-4832",
  publisher = "Oxford University Press",
  number = "4",
}

TY  - JOUR
T1  - An assessment of the Nam Pehchan computer program for the identification of names of south Asian ethnic origin
AU  - Cummins, Carole
AU  - Winter, Heather
AU  - Cheng, Kar Keung
AU  - Maric, Roger
AU  - Silcocks, P.
AU  - Varghese, Cherian
PY  - 1999/12
Y1  - 1999/12
AB  - Background. An assessment was made of the usefulness and accuracy of a computer program for the identification of the south Asian population through the classification of names on a disease register. Methods. The computer program, Nam Pehchan, was used to classify names as either south Asian or non south Asian. The results were compared with a reference standard, which combined use of the program with visual inspection. The latter was facilitated by a computer-generated dictionary of common non south Asian names. The data set consisted of 356 555 cases of incident cancer (ICD9: 140-208) registered between 1990 and 1992 by Thames, Trent, West Midlands and Yorkshire cancer registries. Results. Nam Pehchan classified 5506 cases as south Asian. Visual inspection identified 2024 false positives (36.8 per cent of all cases identified as south Asian by Nam Pehchan) and 363 false negatives (9.5 per cent of those identified by the reference standard). Compared with the reference standard, Nam Pehchan had a sensitivity of 90.5 per cent and a positive predictive value of 63.2 per cent. Conclusion. The Nam Pehchan program quickly identified a high proportion of the names classified as south Asian by the reference standard, but the high false positive rate means that the program alone is not an adequate single strategy. The time-consuming process of inspection of program negatives for large data sets can be substantially reduced by comparison with dictionaries of common non south Asian names.
KW  - Nam Pehchan computer program
KW  - South Asian names
UR  - http://www.scopus.com/inward/record.url?scp=0033401410&partnerID=8YFLogxK
U2  - 10.1093/pubmed/21.4.401
DO  - 10.1093/pubmed/21.4.401
M3  - Article
C2  - 11469361
AN  - SCOPUS:0033401410
SN  - 0957-4832
VL  - 21
SP  - 401
EP  - 406
JO  - Journal of Public Health Medicine
JF  - Journal of Public Health Medicine
IS  - 4
ER  - 
