Production probability estimators for context-free grammars

Keith Humenik, Roger S. Pinkham

Research output: Contribution to journalArticlepeer-review

1 Scopus citations

Abstract

The problem of production probability estimation is considered. We examine the estimation of production probabilities for context-free probabilistic grammars (CFPGs). Given a context-free grammar, G, and a random sample of strings from L(G), we define ratio estimators to estimate the production probabilities of G. It is shown that the statistical analysis of the estimators becomes much less complex by using the theory of random walks. The main result of the article is that the biases and variances of all ratio estimators for any Chomsky Normal Form CFPG are approximately directly proportional to 1 n, where n is the size of the random sample. Random samples consisting of strings generated from an independent source are used to validate theoretical results. We show that the estimates can be used to increase the efficiency of parsing strings in a language. The ability to parse strings efficiently is extremely useful in compiler theory and the theory of artificial intelligence.

Original languageEnglish
Pages (from-to)43-53
Number of pages11
JournalThe Journal of Systems and Software
Volume12
Issue number1
DOIs
StatePublished - Apr 1990

Fingerprint

Dive into the research topics of 'Production probability estimators for context-free grammars'. Together they form a unique fingerprint.

Cite this