nltk.corpus package
Subpackages
- nltk.corpus.reader package
  - Submodules
    - nltk.corpus.reader.aligned module
    - nltk.corpus.reader.api module
    - nltk.corpus.reader.bcp47 module
    - nltk.corpus.reader.bnc module
    - nltk.corpus.reader.bracket_parse module
    - nltk.corpus.reader.categorized_sents module
    - nltk.corpus.reader.chasen module
    - nltk.corpus.reader.childes module
    - nltk.corpus.reader.chunked module
    - nltk.corpus.reader.cmudict module
    - nltk.corpus.reader.comparative_sents module
    - nltk.corpus.reader.conll module
    - nltk.corpus.reader.crubadan module
    - nltk.corpus.reader.dependency module
    - nltk.corpus.reader.framenet module
    - nltk.corpus.reader.ieer module
    - nltk.corpus.reader.indian module
    - nltk.corpus.reader.ipipan module
    - nltk.corpus.reader.knbc module
    - nltk.corpus.reader.lin module
    - nltk.corpus.reader.markdown module
    - nltk.corpus.reader.mte module
    - nltk.corpus.reader.nkjp module
    - nltk.corpus.reader.nombank module
    - nltk.corpus.reader.nps_chat module
    - nltk.corpus.reader.opinion_lexicon module
    - nltk.corpus.reader.panlex_lite module
    - nltk.corpus.reader.panlex_swadesh module
    - nltk.corpus.reader.pl196x module
    - nltk.corpus.reader.plaintext module
    - nltk.corpus.reader.ppattach module
    - nltk.corpus.reader.propbank module
    - nltk.corpus.reader.pros_cons module
    - nltk.corpus.reader.reviews module
    - nltk.corpus.reader.rte module
    - nltk.corpus.reader.semcor module
    - nltk.corpus.reader.senseval module
    - nltk.corpus.reader.sentiwordnet module
    - nltk.corpus.reader.sinica_treebank module
    - nltk.corpus.reader.string_category module
    - nltk.corpus.reader.switchboard module
    - nltk.corpus.reader.tagged module
    - nltk.corpus.reader.timit module
    - nltk.corpus.reader.toolbox module
    - nltk.corpus.reader.twitter module
    - nltk.corpus.reader.udhr module
    - nltk.corpus.reader.util module
    - nltk.corpus.reader.verbnet module
    - nltk.corpus.reader.wordlist module
    - nltk.corpus.reader.wordnet module
    - nltk.corpus.reader.xmldocs module
    - nltk.corpus.reader.ycoe module
  - Module contents
    - Corpus Reader Functions
      - AlignedCorpusReader
      - AlpinoCorpusReader
      - BCP47CorpusReader
      - BNCCorpusReader
      - BracketParseCorpusReader
      - CHILDESCorpusReader
        - CHILDESCorpusReader.MLU()
        - CHILDESCorpusReader.__init__()
        - CHILDESCorpusReader.age()
        - CHILDESCorpusReader.childes_url_base
        - CHILDESCorpusReader.convert_age()
        - CHILDESCorpusReader.corpus()
        - CHILDESCorpusReader.participants()
        - CHILDESCorpusReader.sents()
        - CHILDESCorpusReader.tagged_sents()
        - CHILDESCorpusReader.tagged_words()
        - CHILDESCorpusReader.webview_file()
        - CHILDESCorpusReader.words()
      - CMUDictCorpusReader
      - CategorizedBracketParseCorpusReader
        - CategorizedBracketParseCorpusReader.__init__()
        - CategorizedBracketParseCorpusReader.parsed_paras()
        - CategorizedBracketParseCorpusReader.parsed_sents()
        - CategorizedBracketParseCorpusReader.parsed_words()
        - CategorizedBracketParseCorpusReader.tagged_paras()
        - CategorizedBracketParseCorpusReader.tagged_sents()
        - CategorizedBracketParseCorpusReader.tagged_words()
      - CategorizedCorpusReader
      - CategorizedPlaintextCorpusReader
      - CategorizedSentencesCorpusReader
      - CategorizedTaggedCorpusReader
      - ChasenCorpusReader
      - ChunkedCorpusReader
        - ChunkedCorpusReader.__init__()
        - ChunkedCorpusReader.chunked_paras()
        - ChunkedCorpusReader.chunked_sents()
        - ChunkedCorpusReader.chunked_words()
        - ChunkedCorpusReader.paras()
        - ChunkedCorpusReader.sents()
        - ChunkedCorpusReader.tagged_paras()
        - ChunkedCorpusReader.tagged_sents()
        - ChunkedCorpusReader.tagged_words()
        - ChunkedCorpusReader.words()
      - ComparativeSentencesCorpusReader
        - ComparativeSentencesCorpusReader.CorpusView
        - ComparativeSentencesCorpusReader.__init__()
        - ComparativeSentencesCorpusReader.comparisons()
        - ComparativeSentencesCorpusReader.keywords()
        - ComparativeSentencesCorpusReader.keywords_readme()
        - ComparativeSentencesCorpusReader.sents()
        - ComparativeSentencesCorpusReader.words()
      - ConllChunkCorpusReader
      - ConllCorpusReader
        - ConllCorpusReader.CHUNK
        - ConllCorpusReader.COLUMN_TYPES
        - ConllCorpusReader.IGNORE
        - ConllCorpusReader.NE
        - ConllCorpusReader.POS
        - ConllCorpusReader.SRL
        - ConllCorpusReader.TREE
        - ConllCorpusReader.WORDS
        - ConllCorpusReader.__init__()
        - ConllCorpusReader.chunked_sents()
        - ConllCorpusReader.chunked_words()
        - ConllCorpusReader.iob_sents()
        - ConllCorpusReader.iob_words()
        - ConllCorpusReader.parsed_sents()
        - ConllCorpusReader.sents()
        - ConllCorpusReader.srl_instances()
        - ConllCorpusReader.srl_spans()
        - ConllCorpusReader.tagged_sents()
        - ConllCorpusReader.tagged_words()
        - ConllCorpusReader.words()
      - CorpusReader
      - CrubadanCorpusReader
      - DependencyCorpusReader
      - EuroparlCorpusReader
      - FramenetCorpusReader
        - FramenetCorpusReader.__init__()
        - FramenetCorpusReader.annotations()
        - FramenetCorpusReader.buildindexes()
        - FramenetCorpusReader.doc()
        - FramenetCorpusReader.docs()
        - FramenetCorpusReader.docs_metadata()
        - FramenetCorpusReader.exemplars()
        - FramenetCorpusReader.fe_relations()
        - FramenetCorpusReader.fes()
        - FramenetCorpusReader.frame()
        - FramenetCorpusReader.frame_by_id()
        - FramenetCorpusReader.frame_by_name()
        - FramenetCorpusReader.frame_ids_and_names()
        - FramenetCorpusReader.frame_relation_types()
        - FramenetCorpusReader.frame_relations()
        - FramenetCorpusReader.frames()
        - FramenetCorpusReader.frames_by_lemma()
        - FramenetCorpusReader.ft_sents()
        - FramenetCorpusReader.help()
        - FramenetCorpusReader.lu()
        - FramenetCorpusReader.lu_basic()
        - FramenetCorpusReader.lu_ids_and_names()
        - FramenetCorpusReader.lus()
        - FramenetCorpusReader.propagate_semtypes()
        - FramenetCorpusReader.semtype()
        - FramenetCorpusReader.semtype_inherits()
        - FramenetCorpusReader.semtypes()
        - FramenetCorpusReader.sents()
        - FramenetCorpusReader.warnings()
      - IEERCorpusReader
      - IPIPANCorpusReader
        - IPIPANCorpusReader.__init__()
        - IPIPANCorpusReader.categories()
        - IPIPANCorpusReader.channels()
        - IPIPANCorpusReader.domains()
        - IPIPANCorpusReader.fileids()
        - IPIPANCorpusReader.paras()
        - IPIPANCorpusReader.sents()
        - IPIPANCorpusReader.tagged_paras()
        - IPIPANCorpusReader.tagged_sents()
        - IPIPANCorpusReader.tagged_words()
        - IPIPANCorpusReader.words()
      - IndianCorpusReader
      - KNBCorpusReader
      - LinThesaurusCorpusReader
      - MTECorpusReader
      - MWAPPDBCorpusReader
      - MacMorphoCorpusReader
      - NKJPCorpusReader
        - NKJPCorpusReader.HEADER_MODE
        - NKJPCorpusReader.RAW_MODE
        - NKJPCorpusReader.SENTS_MODE
        - NKJPCorpusReader.WORDS_MODE
        - NKJPCorpusReader.__init__()
        - NKJPCorpusReader.add_root()
        - NKJPCorpusReader.fileids()
        - NKJPCorpusReader.get_paths()
        - NKJPCorpusReader.header()
        - NKJPCorpusReader.raw()
        - NKJPCorpusReader.sents()
        - NKJPCorpusReader.tagged_words()
        - NKJPCorpusReader.words()
      - NPSChatCorpusReader
      - NombankCorpusReader
      - NonbreakingPrefixesCorpusReader
      - OpinionLexiconCorpusReader
      - PPAttachmentCorpusReader
      - PanLexLiteCorpusReader
      - PanlexSwadeshCorpusReader
        - PanlexSwadeshCorpusReader.__init__()
        - PanlexSwadeshCorpusReader.entries()
        - PanlexSwadeshCorpusReader.get_languages()
        - PanlexSwadeshCorpusReader.get_macrolanguages()
        - PanlexSwadeshCorpusReader.language_codes()
        - PanlexSwadeshCorpusReader.license()
        - PanlexSwadeshCorpusReader.words_by_iso639()
        - PanlexSwadeshCorpusReader.words_by_lang()
      - Pl196xCorpusReader
        - Pl196xCorpusReader.__init__()
        - Pl196xCorpusReader.decode_tag()
        - Pl196xCorpusReader.head_len
        - Pl196xCorpusReader.paras()
        - Pl196xCorpusReader.sents()
        - Pl196xCorpusReader.tagged_paras()
        - Pl196xCorpusReader.tagged_sents()
        - Pl196xCorpusReader.tagged_words()
        - Pl196xCorpusReader.textids()
        - Pl196xCorpusReader.words()
        - Pl196xCorpusReader.xml()
      - PlaintextCorpusReader
      - PortugueseCategorizedPlaintextCorpusReader
      - PropbankCorpusReader
      - ProsConsCorpusReader
      - RTECorpusReader
      - ReviewsCorpusReader
      - SemcorCorpusReader
      - SensevalCorpusReader
      - SentiSynset
      - SentiWordNetCorpusReader
      - SinicaTreebankCorpusReader
      - StringCategoryCorpusReader
      - SwadeshCorpusReader
      - SwitchboardCorpusReader
      - SyntaxCorpusReader
      - TEICorpusView
      - TaggedCorpusReader
      - TimitCorpusReader
        - TimitCorpusReader.__init__()
        - TimitCorpusReader.audiodata()
        - TimitCorpusReader.fileids()
        - TimitCorpusReader.phone_times()
        - TimitCorpusReader.phone_trees()
        - TimitCorpusReader.phones()
        - TimitCorpusReader.play()
        - TimitCorpusReader.sent_times()
        - TimitCorpusReader.sentid()
        - TimitCorpusReader.sents()
        - TimitCorpusReader.spkrid()
        - TimitCorpusReader.spkrinfo()
        - TimitCorpusReader.spkrutteranceids()
        - TimitCorpusReader.transcription_dict()
        - TimitCorpusReader.utterance()
        - TimitCorpusReader.utteranceids()
        - TimitCorpusReader.wav()
        - TimitCorpusReader.word_times()
        - TimitCorpusReader.words()
      - TimitTaggedCorpusReader
      - ToolboxCorpusReader
      - TwitterCorpusReader
      - UdhrCorpusReader
      - UnicharsCorpusReader
      - VerbnetCorpusReader
        - VerbnetCorpusReader.__init__()
        - VerbnetCorpusReader.classids()
        - VerbnetCorpusReader.fileids()
        - VerbnetCorpusReader.frames()
        - VerbnetCorpusReader.lemmas()
        - VerbnetCorpusReader.longid()
        - VerbnetCorpusReader.pprint()
        - VerbnetCorpusReader.pprint_frames()
        - VerbnetCorpusReader.pprint_members()
        - VerbnetCorpusReader.pprint_subclasses()
        - VerbnetCorpusReader.pprint_themroles()
        - VerbnetCorpusReader.shortid()
        - VerbnetCorpusReader.subclasses()
        - VerbnetCorpusReader.themroles()
        - VerbnetCorpusReader.vnclass()
        - VerbnetCorpusReader.wordnetids()
      - WordListCorpusReader
      - WordNetCorpusReader
        - WordNetCorpusReader.ADJ
        - WordNetCorpusReader.ADJ_SAT
        - WordNetCorpusReader.ADV
        - WordNetCorpusReader.MORPHOLOGICAL_SUBSTITUTIONS
        - WordNetCorpusReader.NOUN
        - WordNetCorpusReader.VERB
        - WordNetCorpusReader.__init__()
        - WordNetCorpusReader.add_exomw()
        - WordNetCorpusReader.add_omw()
        - WordNetCorpusReader.add_provs()
        - WordNetCorpusReader.all_eng_synsets()
        - WordNetCorpusReader.all_lemma_names()
        - WordNetCorpusReader.all_omw_synsets()
        - WordNetCorpusReader.all_synsets()
        - WordNetCorpusReader.citation()
        - WordNetCorpusReader.custom_lemmas()
        - WordNetCorpusReader.digraph()
        - WordNetCorpusReader.disable_custom_lemmas()
        - WordNetCorpusReader.doc()
        - WordNetCorpusReader.get_version()
        - WordNetCorpusReader.ic()
        - WordNetCorpusReader.index_sense()
        - WordNetCorpusReader.jcn_similarity()
        - WordNetCorpusReader.langs()
        - WordNetCorpusReader.lch_similarity()
        - WordNetCorpusReader.lemma()
        - WordNetCorpusReader.lemma_count()
        - WordNetCorpusReader.lemma_from_key()
        - WordNetCorpusReader.lemmas()
        - WordNetCorpusReader.license()
        - WordNetCorpusReader.lin_similarity()
        - WordNetCorpusReader.map_to_many()
        - WordNetCorpusReader.map_to_one()
        - WordNetCorpusReader.map_wn()
        - WordNetCorpusReader.merged_synsets()
        - WordNetCorpusReader.morphy()
        - WordNetCorpusReader.of2ss()
        - WordNetCorpusReader.path_similarity()
        - WordNetCorpusReader.readme()
        - WordNetCorpusReader.res_similarity()
        - WordNetCorpusReader.split_synsets()
        - WordNetCorpusReader.ss2of()
        - WordNetCorpusReader.synonyms()
        - WordNetCorpusReader.synset()
        - WordNetCorpusReader.synset_from_pos_and_offset()
        - WordNetCorpusReader.synset_from_sense_key()
        - WordNetCorpusReader.synsets()
        - WordNetCorpusReader.words()
        - WordNetCorpusReader.wup_similarity()
      - WordNetICCorpusReader
      - XMLCorpusReader
      - YCOECorpusReader
      - find_corpus_fileids()
      - tagged_treebank_para_block_reader()
- Submodules
Submodules
Module contents
NLTK corpus readers. The modules in this package provide functions that can be used to read corpus files in a variety of formats. These functions can be used to read both the corpus files that are distributed in the NLTK corpus package, and corpus files that are part of external corpora.
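As a minimal sketch of reading an external corpus (assuming NLTK 3.x is installed; the temporary directory, the file names doc1.txt and doc2.txt, and their contents are made up for illustration), PlaintextCorpusReader can be pointed at any directory, with a regular expression selecting which files belong to the corpus. No downloaded NLTK data is needed here, because words() only uses the reader's default WordPunctTokenizer.

```python
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader

# Build a tiny, hypothetical "external" corpus: two plain-text files.
root = tempfile.mkdtemp()
texts = {
    "doc1.txt": "Hello, corpus world!\n",
    "doc2.txt": "Readers work on any directory.\n",
}
for name, text in texts.items():
    with open(os.path.join(root, name), "w", encoding="utf8") as f:
        f.write(text)

# The second argument is a regular expression matched against file names.
reader = PlaintextCorpusReader(root, r".*\.txt")

fileids = reader.fileids()               # matched file identifiers, sorted
words = list(reader.words("doc1.txt"))   # tokens from WordPunctTokenizer
raw = reader.raw("doc2.txt")             # unprocessed file contents

print(fileids)
print(words)
print(raw)
```

The same reader classes are what the objects in nltk.corpus (such as brown) are built on; the only difference is that those point at directories inside the installed NLTK data.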
Available Corpora
Please see https://www.nltk.org/nltk_data/ for a complete list. Install corpora using nltk.download().
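A small helper along these lines can avoid re-downloading a corpus that is already installed. The function names corpus_installed and ensure_corpus are hypothetical, not part of NLTK; the sketch assumes nltk.data.find() raises LookupError for a missing resource and that nltk.download() takes a package identifier such as "brown". Only ensure_corpus() touches the network.

```python
import nltk


def corpus_installed(name: str) -> bool:
    """Return True if corpora/<name> is found on nltk.data.path (hypothetical helper)."""
    try:
        # find() also looks for the zipped form, e.g. corpora/brown.zip.
        nltk.data.find(f"corpora/{name}")
    except LookupError:
        return False
    return True


def ensure_corpus(name: str) -> bool:
    """Download a corpus with nltk.download() only when it is missing (hypothetical helper)."""
    if corpus_installed(name):
        return True
    # Requires network access; nltk.download() reports success as a boolean.
    return nltk.download(name, quiet=True)
```

For example, calling ensure_corpus("brown") once before running the Brown Corpus example at the end of this section would fetch that corpus if it is not yet present.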
Corpus Reader Functions
Each corpus module defines one or more “corpus reader functions”, which can be used to read documents from that corpus. These functions take an argument, item, which is used to indicate which document should be read from the corpus:
- If item is one of the unique identifiers listed in the corpus module’s items variable, then the corresponding document will be loaded from the NLTK corpus package.
- If item is a filename, then that file will be read.
Additionally, corpus reader functions can be given lists of item names; in which case, they will return a concatenation of the corresponding documents.
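In the NLTK 3.x reader classes we are aware of, this document selector is passed as a fileids argument (a single identifier, a list of identifiers, or None for every file), and reader.fileids() lists the valid identifiers. The sketch below assumes NLTK 3.x and uses made-up file names a.txt and b.txt to show the concatenation behaviour with PlaintextCorpusReader.

```python
import os
import tempfile

from nltk.corpus.reader import PlaintextCorpusReader

# Two hypothetical one-line documents in a temporary directory.
root = tempfile.mkdtemp()
for name, text in {"a.txt": "one two\n", "b.txt": "three four\n"}.items():
    with open(os.path.join(root, name), "w", encoding="utf8") as f:
        f.write(text)

reader = PlaintextCorpusReader(root, r".*\.txt")

# A single identifier selects one document.
single = list(reader.words("a.txt"))
# A list of identifiers returns the documents concatenated, in the order given.
combined = list(reader.words(["b.txt", "a.txt"]))
# With no argument, every file the reader knows about is read.
everything = list(reader.words())

print(single, combined, everything)
```

The readers return lazy corpus views rather than plain lists; list() is used here only so the results can be compared directly.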
Corpus reader functions are named based on the type of information they return. Some common examples, and their return types, are:
- words(): list of str
- sents(): list of (list of str)
- paras(): list of (list of (list of str))
- tagged_words(): list of (str,str) tuple
- tagged_sents(): list of (list of (str,str))
- tagged_paras(): list of (list of (list of (str,str)))
- chunked_sents(): list of (Tree w/ (str,str) leaves)
- parsed_sents(): list of (Tree with str leaves)
- parsed_paras(): list of (list of (Tree with str leaves))
- xml(): A single xml ElementTree
- raw(): unprocessed corpus contents
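The nesting of these return types can be checked without downloading anything by reading a tiny hand-written file with TaggedCorpusReader, which by default expects word/TAG tokens, one sentence per line, and blank lines between paragraphs. The file name sample.pos and its contents are invented for illustration, and the sketch assumes NLTK 3.x; the views are converted with list() only to make the structure visible.

```python
import os
import tempfile

from nltk.corpus.reader import TaggedCorpusReader

# One hypothetical file: two paragraphs separated by a blank line,
# one sentence per line, tokens written as word/TAG.
root = tempfile.mkdtemp()
with open(os.path.join(root, "sample.pos"), "w", encoding="utf8") as f:
    f.write(
        "The/DT cat/NN sat/VBD ./.\n"
        "It/PRP slept/VBD ./.\n"
        "\n"
        "Dogs/NNS bark/VBP ./.\n"
    )

reader = TaggedCorpusReader(root, r".*\.pos")

words = list(reader.words())                # list of str
sents = list(reader.sents())                # list of (list of str)
paras = list(reader.paras())                # list of (list of (list of str))
tagged_words = list(reader.tagged_words())  # list of (str, str) tuples
tagged_sents = list(reader.tagged_sents())  # list of (list of (str, str))

print(sents)
print(tagged_sents[0])
```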
For example, to read a list of the words in the Brown Corpus, use nltk.corpus.brown.words():
>>> from nltk.corpus import brown
>>> print(", ".join(brown.words()))
The, Fulton, County, Grand, Jury, said, ...