english-gigaword

Dataset Profile

Odm ID
50b2e0a5-7413-4128-82b6-f429bb14e44d
Title
english-gigaword
Notes
This recipe trains word n-gram language models on the newswire text in the English Gigaword corpus (1200M words from NYT, APW, AFE, and XIE). It also prepares the dictionaries needed to use the language models (LMs) with the HTK and Sphinx speech recognizers.
Author
Keith Vertanen
Author Email
Catalogue Url
Dataset Url
Metadata Updated
2015-09-14 22:59:26
Tags
Date Released
Date Updated
Update Frequency
Organisation
Global
Country
State
Platform
ckan
Language
en
Version
(not set)