Author : tmlab / Date : 2016. 10. 27. 17:54 / Category : Text Mining/Python
import nltk
nltk.download("all")
[nltk_data] Downloading collection 'all' [nltk_data] | [nltk_data] | Downloading package abc to /home/jester/nltk_data... [nltk_data] | Package abc is already up-to-date! [nltk_data] | Downloading package alpino to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package alpino is already up-to-date! [nltk_data] | Downloading package biocreative_ppi to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package biocreative_ppi is already up-to-date! [nltk_data] | Downloading package brown to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package brown is already up-to-date! [nltk_data] | Downloading package brown_tei to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package brown_tei is already up-to-date! [nltk_data] | Downloading package cess_cat to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package cess_cat is already up-to-date! [nltk_data] | Downloading package cess_esp to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package cess_esp is already up-to-date! [nltk_data] | Downloading package chat80 to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package chat80 is already up-to-date! [nltk_data] | Downloading package city_database to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package city_database is already up-to-date! [nltk_data] | Downloading package cmudict to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package cmudict is already up-to-date! [nltk_data] | Downloading package comparative_sentences to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package comparative_sentences is already up-to- [nltk_data] | date! [nltk_data] | Downloading package comtrans to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package comtrans is already up-to-date! [nltk_data] | Downloading package conll2000 to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package conll2000 is already up-to-date! [nltk_data] | Downloading package conll2002 to [nltk_data] | /home/jester/nltk_data... 
[nltk_data] | Package conll2002 is already up-to-date! [nltk_data] | Downloading package conll2007 to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package conll2007 is already up-to-date! [nltk_data] | Downloading package crubadan to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package crubadan is already up-to-date! [nltk_data] | Downloading package dependency_treebank to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package dependency_treebank is already up-to-date! [nltk_data] | Downloading package europarl_raw to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package europarl_raw is already up-to-date! [nltk_data] | Downloading package floresta to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package floresta is already up-to-date! [nltk_data] | Downloading package framenet_v15 to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package framenet_v15 is already up-to-date! [nltk_data] | Downloading package gazetteers to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package gazetteers is already up-to-date! [nltk_data] | Downloading package genesis to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package genesis is already up-to-date! [nltk_data] | Downloading package gutenberg to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package gutenberg is already up-to-date! [nltk_data] | Downloading package ieer to /home/jester/nltk_data... [nltk_data] | Package ieer is already up-to-date! [nltk_data] | Downloading package inaugural to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package inaugural is already up-to-date! [nltk_data] | Downloading package indian to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package indian is already up-to-date! [nltk_data] | Downloading package jeita to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package jeita is already up-to-date! [nltk_data] | Downloading package kimmo to [nltk_data] | /home/jester/nltk_data... 
[nltk_data] | Package kimmo is already up-to-date! [nltk_data] | Downloading package knbc to /home/jester/nltk_data... [nltk_data] | Package knbc is already up-to-date! [nltk_data] | Downloading package lin_thesaurus to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package lin_thesaurus is already up-to-date! [nltk_data] | Downloading package mac_morpho to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package mac_morpho is already up-to-date! [nltk_data] | Downloading package machado to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package machado is already up-to-date! [nltk_data] | Downloading package masc_tagged to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package masc_tagged is already up-to-date! [nltk_data] | Downloading package moses_sample to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package moses_sample is already up-to-date! [nltk_data] | Downloading package movie_reviews to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package movie_reviews is already up-to-date! [nltk_data] | Downloading package names to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package names is already up-to-date! [nltk_data] | Downloading package nombank.1.0 to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package nombank.1.0 is already up-to-date! [nltk_data] | Downloading package nps_chat to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package nps_chat is already up-to-date! [nltk_data] | Downloading package omw to /home/jester/nltk_data... [nltk_data] | Package omw is already up-to-date! [nltk_data] | Downloading package opinion_lexicon to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package opinion_lexicon is already up-to-date! [nltk_data] | Downloading package paradigms to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package paradigms is already up-to-date! [nltk_data] | Downloading package pil to /home/jester/nltk_data... [nltk_data] | Package pil is already up-to-date! 
[nltk_data] | Downloading package pl196x to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package pl196x is already up-to-date! [nltk_data] | Downloading package ppattach to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package ppattach is already up-to-date! [nltk_data] | Downloading package problem_reports to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package problem_reports is already up-to-date! [nltk_data] | Downloading package propbank to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package propbank is already up-to-date! [nltk_data] | Downloading package ptb to /home/jester/nltk_data... [nltk_data] | Package ptb is already up-to-date! [nltk_data] | Downloading package product_reviews_1 to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package product_reviews_1 is already up-to-date! [nltk_data] | Downloading package product_reviews_2 to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package product_reviews_2 is already up-to-date! [nltk_data] | Downloading package pros_cons to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package pros_cons is already up-to-date! [nltk_data] | Downloading package qc to /home/jester/nltk_data... [nltk_data] | Package qc is already up-to-date! [nltk_data] | Downloading package reuters to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package reuters is already up-to-date! [nltk_data] | Downloading package rte to /home/jester/nltk_data... [nltk_data] | Package rte is already up-to-date! [nltk_data] | Downloading package semcor to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package semcor is already up-to-date! [nltk_data] | Downloading package senseval to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package senseval is already up-to-date! [nltk_data] | Downloading package sentiwordnet to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package sentiwordnet is already up-to-date! 
[nltk_data] | Downloading package sentence_polarity to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package sentence_polarity is already up-to-date! [nltk_data] | Downloading package shakespeare to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package shakespeare is already up-to-date! [nltk_data] | Downloading package sinica_treebank to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package sinica_treebank is already up-to-date! [nltk_data] | Downloading package smultron to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package smultron is already up-to-date! [nltk_data] | Downloading package state_union to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package state_union is already up-to-date! [nltk_data] | Downloading package stopwords to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package stopwords is already up-to-date! [nltk_data] | Downloading package subjectivity to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package subjectivity is already up-to-date! [nltk_data] | Downloading package swadesh to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package swadesh is already up-to-date! [nltk_data] | Downloading package switchboard to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package switchboard is already up-to-date! [nltk_data] | Downloading package timit to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package timit is already up-to-date! [nltk_data] | Downloading package toolbox to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package toolbox is already up-to-date! [nltk_data] | Downloading package treebank to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package treebank is already up-to-date! [nltk_data] | Downloading package twitter_samples to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package twitter_samples is already up-to-date! [nltk_data] | Downloading package udhr to /home/jester/nltk_data... 
[nltk_data] | Package udhr is already up-to-date! [nltk_data] | Downloading package udhr2 to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package udhr2 is already up-to-date! [nltk_data] | Downloading package unicode_samples to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package unicode_samples is already up-to-date! [nltk_data] | Downloading package universal_treebanks_v20 to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package universal_treebanks_v20 is already up-to- [nltk_data] | date! [nltk_data] | Downloading package verbnet to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package verbnet is already up-to-date! [nltk_data] | Downloading package webtext to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package webtext is already up-to-date! [nltk_data] | Downloading package wordnet to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package wordnet is already up-to-date! [nltk_data] | Downloading package wordnet_ic to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package wordnet_ic is already up-to-date! [nltk_data] | Downloading package words to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package words is already up-to-date! [nltk_data] | Downloading package ycoe to /home/jester/nltk_data... [nltk_data] | Package ycoe is already up-to-date! [nltk_data] | Downloading package rslp to /home/jester/nltk_data... [nltk_data] | Package rslp is already up-to-date! [nltk_data] | Downloading package hmm_treebank_pos_tagger to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package hmm_treebank_pos_tagger is already up-to- [nltk_data] | date! [nltk_data] | Downloading package maxent_treebank_pos_tagger to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package maxent_treebank_pos_tagger is already up- [nltk_data] | to-date! [nltk_data] | Downloading package universal_tagset to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package universal_tagset is already up-to-date! 
[nltk_data] | Downloading package maxent_ne_chunker to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package maxent_ne_chunker is already up-to-date! [nltk_data] | Downloading package punkt to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package punkt is already up-to-date! [nltk_data] | Downloading package book_grammars to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package book_grammars is already up-to-date! [nltk_data] | Downloading package sample_grammars to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package sample_grammars is already up-to-date! [nltk_data] | Downloading package spanish_grammars to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package spanish_grammars is already up-to-date! [nltk_data] | Downloading package basque_grammars to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package basque_grammars is already up-to-date! [nltk_data] | Downloading package large_grammars to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package large_grammars is already up-to-date! [nltk_data] | Downloading package tagsets to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package tagsets is already up-to-date! [nltk_data] | Downloading package snowball_data to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package snowball_data is already up-to-date! [nltk_data] | Downloading package bllip_wsj_no_aux to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package bllip_wsj_no_aux is already up-to-date! [nltk_data] | Downloading package word2vec_sample to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package word2vec_sample is already up-to-date! [nltk_data] | Downloading package panlex_swadesh to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package panlex_swadesh is already up-to-date! [nltk_data] | Downloading package mte_teip5 to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package mte_teip5 is already up-to-date! 
[nltk_data] | Downloading package averaged_perceptron_tagger to [nltk_data] | /home/jester/nltk_data... [nltk_data] | Package averaged_perceptron_tagger is already up- [nltk_data] | to-date! [nltk_data] | Downloading package panlex_lite to [nltk_data] | /home/jester/nltk_data...
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-1-d22ab9bfeafe> in <module>()
      1 import nltk
----> 2 nltk.download("all")

/usr/local/lib/python3.5/dist-packages/nltk/downloader.py in download(self, info_or_id, download_dir, quiet, force, prefix, halt_on_error, raise_on_error)
    662 subsequent_indent=prefix+prefix2+' '*4))
    663
--> 664 for msg in self.incr_download(info_or_id, download_dir, force):
    665 # Error messages
    666 if isinstance(msg, ErrorMessage):

/usr/local/lib/python3.5/dist-packages/nltk/downloader.py in incr_download(self, info_or_id, download_dir, force)
    541 if isinstance(info, Collection):
    542 yield StartCollectionMessage(info)
--> 543 for msg in self.incr_download(info.children, download_dir, force):
    544 yield msg
    545 yield FinishCollectionMessage(info)

/usr/local/lib/python3.5/dist-packages/nltk/downloader.py in incr_download(self, info_or_id, download_dir, force)
    527 # If they gave us a list of ids, then download each one.
    528 if isinstance(info_or_id, (list,tuple)):
--> 529 for msg in self._download_list(info_or_id, download_dir, force):
    530 yield msg
    531 return

/usr/local/lib/python3.5/dist-packages/nltk/downloader.py in _download_list(self, items, download_dir, force)
    570 else:
    571 delta = len(item.packages)/num_packages
--> 572 for msg in self.incr_download(item, download_dir, force):
    573 if isinstance(msg, ProgressMessage):
    574 yield ProgressMessage(progress + msg.progress*delta)

/usr/local/lib/python3.5/dist-packages/nltk/downloader.py in incr_download(self, info_or_id, download_dir, force)
    547 # Handle Packages (delegate to a helper function).
    548 else:
--> 549 for msg in self._download_package(info, download_dir, force):
    550 yield msg
    551

/usr/local/lib/python3.5/dist-packages/nltk/downloader.py in _download_package(self, info, download_dir, force)
    616 num_blocks = max(1, info.size/(1024*16))
    617 for block in itertools.count():
--> 618 s = infile.read(1024*16) # 16k blocks.
    619 outfile.write(s)
    620 if not s: break

/usr/lib/python3.5/http/client.py in read(self, amt)
    446 # Amount is given, implement using readinto
    447 b = bytearray(amt)
--> 448 n = self.readinto(b)
    449 return memoryview(b)[:n].tobytes()
    450 else:

/usr/lib/python3.5/http/client.py in readinto(self, b)
    486 # connection, and the user is reading more bytes than will be provided
    487 # (for example, reading in 1k chunks)
--> 488 n = self.fp.readinto(b)
    489 if not n and b:
    490 # Ideally, we would raise IncompleteRead if the content-length

/usr/lib/python3.5/socket.py in readinto(self, b)
    573 while True:
    574 try:
--> 575 return self._sock.recv_into(b)
    576 except timeout:
    577 self._timeout_occurred = True

/usr/lib/python3.5/ssl.py in recv_into(self, buffer, nbytes, flags)
    927 "non-zero flags not allowed in calls to recv_into() on %s" %
    928 self.__class__)
--> 929 return self.read(nbytes, buffer)
    930 else:
    931 return socket.recv_into(self, buffer, nbytes, flags)

/usr/lib/python3.5/ssl.py in read(self, len, buffer)
    789 raise ValueError("Read on closed or unwrapped SSL socket.")
    790 try:
--> 791 return self._sslobj.read(len, buffer)
    792 except SSLError as x:
    793 if x.args[0] == SSL_ERROR_EOF and self.suppress_ragged_eofs:

/usr/lib/python3.5/ssl.py in read(self, len, buffer)
    573 """
    574 if buffer is not None:
--> 575 v = self._sslobj.read(len, buffer)
    576 else:
    577 v = self._sslobj.read(len)

KeyboardInterrupt:
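The run above was interrupted by hand while the `all` collection was still downloading. If only the data used by `nltk.book` is needed, requesting the smaller `book` collection is usually enough. The helper below is a minimal sketch of that idea: `fetch_packages` is a hypothetical name (not part of NLTK), and its `download` parameter is an assumption added so the function can be tried without network access.

```python
def fetch_packages(package_ids, download=None):
    """Download each NLTK data package by id and report which ones succeeded.

    fetch_packages is a hypothetical helper, not an NLTK function.
    download defaults to nltk.download; pass another callable for a dry run.
    """
    if download is None:
        import nltk  # imported lazily so a dry run does not need NLTK installed
        download = nltk.download
    status = {}
    for package_id in package_ids:
        # nltk.download returns True on success and False on failure.
        status[package_id] = bool(download(package_id, quiet=True))
    return status
```

In the notebook this could be called as `fetch_packages(["book"])`, which asks for the `book` collection instead of everything.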
import nltk
from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
text1.concordance('monstrous')
# Q1. Use the concordance function to search for other words.
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
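A concordance lists every occurrence of a word together with a fixed-width window of text on each side, and the match ignores case (note `Monstrous Pictures` in the output above). The function below is a rough pure-Python sketch of that idea, not NLTK's implementation; `concordance_lines` and its `width` parameter are names chosen here for illustration.

```python
def concordance_lines(tokens, word, width=30):
    """Return one line per occurrence of word, with up to width characters
    of context on each side. Matching is case-insensitive."""
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() != target:
            continue
        # Join the neighbouring tokens and keep only the characters nearest the match.
        left = " ".join(tokens[:i])[-width:]
        right = " ".join(tokens[i + 1:])[:width]
        # Right-align the left context so the keyword lines up in a column.
        lines.append(f"{left:>{width}} {token} {right}")
    return lines
```

Calling `concordance_lines(list(text1), "monstrous")` should give lines similar in spirit to the output above, though the exact window widths will differ.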
text1.similar('monstrous')
horrible maddens careful exasperate uncommon tyrannical wise fearless curious christian untoward true mean pitiable trustworthy domineering singular passing puzzled contemptible
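`similar` looks for words that appear in the same surrounding contexts as the query word. The sketch below approximates this by treating a context as the (previous word, next word) pair and ranking other words by how many such pairs they share with the query. This is a simplified assumption for illustration; NLTK's own scoring may rank words differently, and `similar_words` is a hypothetical name.

```python
from collections import Counter, defaultdict


def similar_words(tokens, word, num=5):
    """Rank words by the number of (previous, next) contexts shared with word."""
    tokens = [t.lower() for t in tokens]
    target_word = word.lower()
    contexts = defaultdict(set)  # word -> set of (previous, next) pairs
    for prev, mid, nxt in zip(tokens, tokens[1:], tokens[2:]):
        contexts[mid].add((prev, nxt))
    target = contexts.get(target_word, set())
    scores = Counter()
    for other, ctxs in contexts.items():
        if other == target_word:
            continue
        shared = len(target & ctxs)
        if shared:
            scores[other] = shared
    return [w for w, _ in scores.most_common(num)]
```

On a toy input such as `"a monstrous size a curious size".split()`, the words `monstrous` and `curious` both occur between `a` and `size`, so each is reported as similar to the other.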
text2.common_contexts(['monstrous','very'])
is_pretty be_glad a_lucky a_pretty am_glad
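Each item in the output above is a context written as `previous_next`, meaning both words were seen between those two neighbours. A minimal sketch of how such shared contexts could be collected is shown below; `shared_contexts` is a hypothetical helper and only approximates what NLTK does.

```python
def shared_contexts(tokens, words):
    """Return the 'previous_next' contexts in which every word in words occurs."""
    tokens = [t.lower() for t in tokens]
    seen = {w.lower(): set() for w in words}
    for prev, mid, nxt in zip(tokens, tokens[1:], tokens[2:]):
        if mid in seen:
            seen[mid].add(f"{prev}_{nxt}")
    # Keep only the contexts that every requested word appeared in.
    common = set.intersection(*seen.values()) if seen else set()
    return sorted(common)
```

For example, if a text contains both "am very glad" and "am monstrous glad", then `am_glad` is a context common to `very` and `monstrous`.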
import matplotlib
%matplotlib nbagg
text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])
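A dispersion plot draws one mark per occurrence of each word, positioned by how many tokens from the start of the text it appears. The data behind such a plot can be sketched without matplotlib as a mapping from word to token offsets. `dispersion_offsets` is a hypothetical helper, and the exact-match (case-sensitive) comparison is an assumption based on `dispersion_plot`'s default behaviour.

```python
def dispersion_offsets(tokens, words):
    """Map each word to the list of token positions where it occurs (exact match)."""
    offsets = {w: [] for w in words}
    for position, token in enumerate(tokens):
        if token in offsets:
            offsets[token].append(position)
    return offsets
```

With `text4`, `dispersion_offsets(list(text4), ["citizens", "America"])` would return the x-coordinates of the marks for those two rows of the plot.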
len(text3)
44764
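`len(text3)` counts tokens, so every occurrence of a word or punctuation mark adds one to the total. Distinct items (types) are counted by first converting the tokens to a set. The sketch below shows both counts on a plain list, plus their ratio as a simple measure of lexical diversity; `token_type_counts` is a hypothetical helper name.

```python
def token_type_counts(tokens):
    """Return (number of tokens, number of distinct types, types per token)."""
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    # Guard against an empty text to avoid dividing by zero.
    diversity = n_types / n_tokens if n_tokens else 0.0
    return n_tokens, n_types, diversity
```

For the toy list `["In", "the", "beginning", "the", "."]` this gives 5 tokens and 4 types, because `the` occurs twice.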
print(sorted(set(text3)))  # show every type (distinct token) in text3
print(len(set(text3)))  # show the number of types in text3
['!', "'", '(', ')', ',', ',)', '.', '.)', ':', ';', ';)', '?', '?)', 'A', 'Abel', 'Abelmizraim', 'Abidah', 'Abide', 'Abimael', 'Abimelech', 'Abr', 'Abrah', 'Abraham', 'Abram', 'Accad', 'Achbor', 'Adah', 'Adam', 'Adbeel', 'Admah', 'Adullamite', 'After', 'Aholibamah', 'Ahuzzath', 'Ajah', 'Akan', 'All', 'Allonbachuth', 'Almighty', 'Almodad', 'Also', 'Alvah', 'Alvan', 'Am', 'Amal', 'Amalek', 'Amalekites', 'Ammon', 'Amorite', 'Amorites', 'Amraphel', 'An', 'Anah', 'Anamim', 'And', 'Aner', 'Angel', 'Appoint', 'Aram', 'Aran', 'Ararat', 'Arbah', 'Ard', 'Are', 'Areli', 'Arioch', 'Arise', 'Arkite', 'Arodi', 'Arphaxad', 'Art', 'Arvadite', 'As', 'Asenath', 'Ashbel', 'Asher', 'Ashkenaz', 'Ashteroth', 'Ask', 'Asshur', 'Asshurim', 'Assyr', 'Assyria', 'At', 'Atad', 'Avith', 'Baalhanan', 'Babel', 'Bashemath', 'Be', 'Because', 'Becher', 'Bedad', 'Beeri', 'Beerlahairoi', 'Beersheba', 'Behold', 'Bela', 'Belah', 'Benam', 'Benjamin', 'Beno', 'Beor', 'Bera', 'Bered', 'Beriah', 'Bethel', 'Bethlehem', 'Bethuel', 'Beware', 'Bilhah', 'Bilhan', 'Binding', 'Birsha', 'Bless', 'Blessed', 'Both', 'Bow', 'Bozrah', 'Bring', 'But', 'Buz', 'By', 'Cain', 'Cainan', 'Calah', 'Calneh', 'Can', 'Cana', 'Canaan', 'Canaanite', 'Canaanites', 'Canaanitish', 'Caphtorim', 'Carmi', 'Casluhim', 'Cast', 'Cause', 'Chaldees', 'Chedorlaomer', 'Cheran', 'Cherubims', 'Chesed', 'Chezib', 'Come', 'Cursed', 'Cush', 'Damascus', 'Dan', 'Day', 'Deborah', 'Dedan', 'Deliver', 'Diklah', 'Din', 'Dinah', 'Dinhabah', 'Discern', 'Dishan', 'Dishon', 'Do', 'Dodanim', 'Dothan', 'Drink', 'Duke', 'Dumah', 'Earth', 'Ebal', 'Eber', 'Edar', 'Eden', 'Edom', 'Edomites', 'Egy', 'Egypt', 'Egyptia', 'Egyptian', 'Egyptians', 'Ehi', 'Elah', 'Elam', 'Elbethel', 'Eldaah', 'EleloheIsrael', 'Eliezer', 'Eliphaz', 'Elishah', 'Ellasar', 'Elon', 'Elparan', 'Emins', 'En', 'Enmishpat', 'Eno', 'Enoch', 'Enos', 'Ephah', 'Epher', 'Ephra', 'Ephraim', 'Ephrath', 'Ephron', 'Er', 'Erech', 'Eri', 'Es', 'Esau', 'Escape', 'Esek', 'Eshban', 'Eshcol', 'Ethiopia', 
'Euphrat', 'Euphrates', 'Eve', 'Even', 'Every', 'Except', 'Ezbon', 'Ezer', 'Fear', 'Feed', 'Fifteen', 'Fill', 'For', 'Forasmuch', 'Forgive', 'From', 'Fulfil', 'G', 'Gad', 'Gaham', 'Galeed', 'Gatam', 'Gather', 'Gaza', 'Gentiles', 'Gera', 'Gerar', 'Gershon', 'Get', 'Gether', 'Gihon', 'Gilead', 'Girgashites', 'Girgasite', 'Give', 'Go', 'God', 'Gomer', 'Gomorrah', 'Goshen', 'Guni', 'Hadad', 'Hadar', 'Hadoram', 'Hagar', 'Haggi', 'Hai', 'Ham', 'Hamathite', 'Hamor', 'Hamul', 'Hanoch', 'Happy', 'Haran', 'Hast', 'Haste', 'Have', 'Havilah', 'Hazarmaveth', 'Hazezontamar', 'Hazo', 'He', 'Hear', 'Heaven', 'Heber', 'Hebrew', 'Hebrews', 'Hebron', 'Hemam', 'Hemdan', 'Here', 'Hereby', 'Heth', 'Hezron', 'Hiddekel', 'Hinder', 'Hirah', 'His', 'Hitti', 'Hittite', 'Hittites', 'Hivite', 'Hobah', 'Hori', 'Horite', 'Horites', 'How', 'Hul', 'Huppim', 'Husham', 'Hushim', 'Huz', 'I', 'If', 'In', 'Irad', 'Iram', 'Is', 'Isa', 'Isaac', 'Iscah', 'Ishbak', 'Ishmael', 'Ishmeelites', 'Ishuah', 'Isra', 'Israel', 'Issachar', 'Isui', 'It', 'Ithran', 'Jaalam', 'Jabal', 'Jabbok', 'Jac', 'Jachin', 'Jacob', 'Jahleel', 'Jahzeel', 'Jamin', 'Japhe', 'Japheth', 'Jared', 'Javan', 'Jebusite', 'Jebusites', 'Jegarsahadutha', 'Jehovahjireh', 'Jemuel', 'Jerah', 'Jetheth', 'Jetur', 'Jeush', 'Jezer', 'Jidlaph', 'Jimnah', 'Job', 'Jobab', 'Jokshan', 'Joktan', 'Jordan', 'Joseph', 'Jubal', 'Judah', 'Judge', 'Judith', 'Kadesh', 'Kadmonites', 'Karnaim', 'Kedar', 'Kedemah', 'Kemuel', 'Kenaz', 'Kenites', 'Kenizzites', 'Keturah', 'Kiriathaim', 'Kirjatharba', 'Kittim', 'Know', 'Kohath', 'Kor', 'Korah', 'LO', 'LORD', 'Laban', 'Lahairoi', 'Lamech', 'Lasha', 'Lay', 'Leah', 'Lehabim', 'Lest', 'Let', 'Letushim', 'Leummim', 'Levi', 'Lie', 'Lift', 'Lo', 'Look', 'Lot', 'Lotan', 'Lud', 'Ludim', 'Luz', 'Maachah', 'Machir', 'Machpelah', 'Madai', 'Magdiel', 'Magog', 'Mahalaleel', 'Mahalath', 'Mahanaim', 'Make', 'Malchiel', 'Male', 'Mam', 'Mamre', 'Man', 'Manahath', 'Manass', 'Manasseh', 'Mash', 'Masrekah', 'Massa', 'Matred', 'Me', 'Medan', 
'Mehetabel', 'Mehujael', 'Melchizedek', 'Merari', 'Mesha', 'Meshech', 'Mesopotamia', 'Methusa', 'Methusael', 'Methuselah', 'Mezahab', 'Mibsam', 'Mibzar', 'Midian', 'Midianites', 'Milcah', 'Mishma', 'Mizpah', 'Mizraim', 'Mizz', 'Moab', 'Moabites', 'Moreh', 'Moreover', 'Moriah', 'Muppim', 'My', 'Naamah', 'Naaman', 'Nahath', 'Nahor', 'Naphish', 'Naphtali', 'Naphtuhim', 'Nay', 'Nebajoth', 'Neither', 'Night', 'Nimrod', 'Nineveh', 'Noah', 'Nod', 'Not', 'Now', 'O', 'Obal', 'Of', 'Oh', 'Ohad', 'Omar', 'On', 'Onam', 'Onan', 'Only', 'Ophir', 'Our', 'Out', 'Padan', 'Padanaram', 'Paran', 'Pass', 'Pathrusim', 'Pau', 'Peace', 'Peleg', 'Peniel', 'Penuel', 'Peradventure', 'Perizzit', 'Perizzite', 'Perizzites', 'Phallu', 'Phara', 'Pharaoh', 'Pharez', 'Phichol', 'Philistim', 'Philistines', 'Phut', 'Phuvah', 'Pildash', 'Pinon', 'Pison', 'Potiphar', 'Potipherah', 'Put', 'Raamah', 'Rachel', 'Rameses', 'Rebek', 'Rebekah', 'Rehoboth', 'Remain', 'Rephaims', 'Resen', 'Return', 'Reu', 'Reub', 'Reuben', 'Reuel', 'Reumah', 'Riphath', 'Rosh', 'Sabtah', 'Sabtech', 'Said', 'Salah', 'Salem', 'Samlah', 'Sarah', 'Sarai', 'Saul', 'Save', 'Say', 'Se', 'Seba', 'See', 'Seeing', 'Seir', 'Sell', 'Send', 'Sephar', 'Serah', 'Sered', 'Serug', 'Set', 'Seth', 'Shalem', 'Shall', 'Shalt', 'Shammah', 'Shaul', 'Shaveh', 'She', 'Sheba', 'Shebah', 'Shechem', 'Shed', 'Shel', 'Shelah', 'Sheleph', 'Shem', 'Shemeber', 'Shepho', 'Shillem', 'Shiloh', 'Shimron', 'Shinab', 'Shinar', 'Shobal', 'Should', 'Shuah', 'Shuni', 'Shur', 'Sichem', 'Siddim', 'Sidon', 'Simeon', 'Sinite', 'Sitnah', 'Slay', 'So', 'Sod', 'Sodom', 'Sojourn', 'Some', 'Spake', 'Speak', 'Spirit', 'Stand', 'Succoth', 'Surely', 'Swear', 'Syrian', 'Take', 'Tamar', 'Tarshish', 'Tebah', 'Tell', 'Tema', 'Teman', 'Temani', 'Terah', 'Thahash', 'That', 'The', 'Then', 'There', 'Therefore', 'These', 'They', 'Thirty', 'This', 'Thorns', 'Thou', 'Thus', 'Thy', 'Tidal', 'Timna', 'Timnah', 'Timnath', 'Tiras', 'To', 'Togarmah', 'Tola', 'Tubal', 'Tubalcain', 'Twelve', 'Two', 
'Unstable', 'Until', 'Unto', 'Up', 'Upon', 'Ur', 'Uz', 'Uzal', 'We', 'What', 'When', 'Whence', 'Where', 'Whereas', 'Wherefore', 'Which', 'While', 'Who', 'Whose', 'Whoso', 'Why', 'Wilt', 'With', 'Woman', 'Ye', 'Yea', 'Yet', 'Zaavan', 'Zaphnathpaaneah', 'Zar', 'Zarah', 'Zeboiim', 'Zeboim', 'Zebul', 'Zebulun', 'Zemarite', 'Zepho', 'Zerah', 'Zibeon', 'Zidon', 'Zillah', 'Zilpah', 'Zimran', 'Ziphion', 'Zo', 'Zoar', 'Zohar', 'Zuzims', 'a', 'abated', 'abide', 'able', 'abode', 'abomination', 'about', 'above', 'abroad', 'absent', 'abundantly', 'accept', 'accepted', 'according', 'acknowledged', 'activity', 'add', 'adder', 'afar', 'afflict', 'affliction', 'afraid', 'after', 'afterward', 'afterwards', 'aga', 'again', 'against', 'age', 'aileth', 'air', 'al', 'alive', 'all', 'almon', 'alo', 'alone', 'aloud', 'also', 'altar', 'altogether', 'always', 'am', 'among', 'amongst', 'an', 'and', 'angel', 'angels', 'anger', 'angry', 'anguish', 'anointedst', 'anoth', 'another', 'answer', 'answered', 'any', 'anything', 'appe', 'appear', 'appeared', 'appease', 'appoint', 'appointed', 'aprons', 'archer', 'archers', 'are', 'arise', 'ark', 'armed', 'arms', 'army', 'arose', 'arrayed', 'art', 'artificer', 'as', 'ascending', 'ash', 'ashamed', 'ask', 'asked', 'asketh', 'ass', 'assembly', 'asses', 'assigned', 'asswaged', 'at', 'attained', 'audience', 'avenged', 'aw', 'awaked', 'away', 'awoke', 'back', 'backward', 'bad', 'bade', 'badest', 'badne', 'bak', 'bake', 'bakemeats', 'baker', 'bakers', 'balm', 'bands', 'bank', 'bare', 'barr', 'barren', 'basket', 'baskets', 'battle', 'bdellium', 'be', 'bear', 'beari', 'bearing', 'beast', 'beasts', 'beautiful', 'became', 'because', 'become', 'bed', 'been', 'befall', 'befell', 'before', 'began', 'begat', 'beget', 'begettest', 'begin', 'beginning', 'begotten', 'beguiled', 'beheld', 'behind', 'behold', 'being', 'believed', 'belly', 'belong', 'beneath', 'bereaved', 'beside', 'besides', 'besought', 'best', 'betimes', 'better', 'between', 'betwixt', 'beyond', 
'binding', 'bird', 'birds', 'birthday', 'birthright', 'biteth', 'bitter', 'blame', 'blameless', 'blasted', 'bless', 'blessed', 'blesseth', 'blessi', 'blessing', 'blessings', 'blindness', 'blood', 'blossoms', 'bodies', 'boldly', 'bondman', 'bondmen', 'bondwoman', 'bone', 'bones', 'book', 'booths', 'border', 'borders', 'born', 'bosom', 'both', 'bottle', 'bou', 'boug', 'bough', 'bought', 'bound', 'bow', 'bowed', 'bowels', 'bowing', 'boys', 'bracelets', 'branches', 'brass', 'bre', 'breach', 'bread', 'breadth', 'break', 'breaketh', 'breaking', 'breasts', 'breath', 'breathed', 'breed', 'brethren', 'brick', 'brimstone', 'bring', 'brink', 'broken', 'brook', 'broth', 'brother', 'brought', 'brown', 'bruise', 'budded', 'build', 'builded', 'built', 'bulls', 'bundle', 'bundles', 'burdens', 'buried', 'burn', 'burning', 'burnt', 'bury', 'buryingplace', 'business', 'but', 'butler', 'butlers', 'butlership', 'butter', 'buy', 'by', 'cakes', 'calf', 'call', 'called', 'came', 'camel', 'camels', 'camest', 'can', 'cannot', 'canst', 'captain', 'captive', 'captives', 'carcases', 'carried', 'carry', 'cast', 'castles', 'catt', 'cattle', 'caught', 'cause', 'caused', 'cave', 'cease', 'ceased', 'certain', 'certainly', 'chain', 'chamber', 'change', 'changed', 'changes', 'charge', 'charged', 'chariot', 'chariots', 'chesnut', 'chi', 'chief', 'child', 'childless', 'childr', 'children', 'chode', 'choice', 'chose', 'circumcis', 'circumcise', 'circumcised', 'citi', 'cities', 'city', 'clave', 'clean', 'clear', 'cleave', 'clo', 'closed', 'clothed', 'clothes', 'cloud', 'clusters', 'co', 'coat', 'coats', 'coffin', 'cold', 'colours', 'colt', 'colts', 'come', 'comest', 'cometh', 'comfort', 'comforted', 'comi', 'coming', 'command', 'commanded', 'commanding', 'commandment', 'commandments', 'commended', 'committed', 'commune', 'communed', 'communing', 'company', 'compassed', 'compasseth', 'conceal', 'conceive', 'conceived', 'conception', 'concerning', 'concubi', 'concubine', 'concubines', 'confederate', 
'confound', 'consent', 'conspired', 'consume', 'consumed', 'content', 'continually', 'continued', 'cool', 'corn', 'corrupt', 'corrupted', 'couch', 'couched', 'couching', 'could', 'counted', 'countenance', 'countries', 'country', 'covenant', 'covered', 'covering', 'created', 'creature', 'creepeth', 'creeping', 'cried', 'crieth', 'crown', 'cru', 'cruelty', 'cry', 'cubit', 'cubits', 'cunning', 'cup', 'current', 'curse', 'cursed', 'curseth', 'custom', 'cut', 'd', 'da', 'dainties', 'dale', 'damsel', 'damsels', 'dark', 'darkne', 'darkness', 'daughers', 'daught', 'daughte', 'daughter', 'daughters', 'day', 'days', 'dea', 'dead', 'deal', 'dealt', 'dearth', 'death', 'deceitfully', 'deceived', 'deceiver', 'declare', 'decreased', 'deed', 'deeds', 'deep', 'deferred', 'defiled', 'defiledst', 'delight', 'deliver', 'deliverance', 'delivered', 'denied', 'depart', 'departed', 'departing', 'deprived', 'descending', 'desire', 'desired', 'desolate', 'despised', 'destitute', 'destroy', 'destroyed', 'devour', 'devoured', 'dew', 'did', 'didst', 'die', 'died', 'digged', 'dignity', 'dim', 'dine', 'dipped', 'direct', 'discern', 'discerned', 'discreet', 'displease', 'displeased', 'distress', 'distressed', 'divide', 'divided', 'divine', 'divineth', 'do', 'doe', 'doer', 'doest', 'doeth', 'doing', 'dominion', 'done', 'door', 'dost', 'doth', 'double', 'doubled', 'doubt', 'dove', 'down', 'dowry', 'drank', 'draw', 'dread', 'dreadful', 'dream', 'dreamed', 'dreamer', 'dreams', 'dress', 'dressed', 'drew', 'dried', 'drink', 'drinketh', 'drinking', 'driven', 'drought', 'drove', 'droves', 'drunken', 'dry', 'duke', 'dukes', 'dunge', 'dungeon', 'dust', 'dwe', 'dwell', 'dwelled', 'dwelling', 'dwelt', 'e', 'ea', 'each', 'ear', 'earing', 'early', 'earring', 'earrings', 'ears', 'earth', 'east', 'eastward', 'eat', 'eaten', 'eatest', 'edge', 'eight', 'eighteen', 'eighty', 'either', 'elder', 'elders', 'eldest', 'eleven', 'else', 'embalm', 'embalmed', 'embraced', 'emptied', 'empty', 'end', 'ended', 'endued', 
'endure', 'enemies', 'enlarge', 'enmity', 'enough', 'enquire', 'enter', 'entered', 'entreated', 'envied', 'erected', 'errand', 'escape', 'escaped', 'espied', 'establish', 'established', 'ev', 'even', 'evening', 'eventide', 'ever', 'everlasting', 'every', 'evil', 'ewe', 'ewes', 'exceeding', 'exceedingly', 'excel', 'excellency', 'except', 'exchange', 'experience', 'ey', 'eyed', 'eyes', 'fa', 'face', 'faces', 'fai', 'fail', 'failed', 'faileth', 'fainted', 'fair', 'fall', 'fallen', 'falsely', 'fame', 'families', 'famine', 'famished', 'far', 'fashion', 'fast', 'fat', 'fatfleshed', 'fath', 'fathe', 'father', 'fathers', 'fatness', 'faults', 'favour', 'favoured', 'fear', 'feared', 'fearest', 'feast', 'fed', 'feeble', 'feebler', 'feed', 'feeding', 'feel', 'feet', 'fell', 'fellow', 'felt', 'fema', 'female', 'fetch', 'fetched', 'fetcht', 'few', 'fie', 'field', 'fierce', 'fifteen', 'fifth', 'fifty', 'fig', 'fill', 'filled', 'find', 'findest', 'findeth', 'finding', 'fine', 'finish', 'finished', 'fir', 'fire', 'firmame', 'firmament', 'first', 'firstborn', 'firstlings', 'fish', 'fishes', 'five', 'flaming', 'fle', 'fled', 'fleddest', 'flee', 'flesh', 'flo', 'floc', 'flock', 'flocks', 'flood', 'floor', 'fly', 'fo', 'foal', 'foals', 'folk', 'follow', 'followed', 'following', 'folly', 'food', 'foolishly', 'foot', 'for', 'forbid', 'force', 'ford', 'foremost', 'foreskin', 'forgat', 'forget', 'forgive', 'forgotten', 'form', 'formed', 'former', 'forth', 'forty', 'forward', 'fou', 'found', 'fountain', 'fountains', 'four', 'fourscore', 'fourteen', 'fourteenth', 'fourth', 'fowl', 'fowls', 'freely', 'friend', 'friends', 'fro', 'from', 'frost', 'fruit', 'fruitful', 'fruits', 'fugitive', 'fulfilled', 'full', 'furnace', 'furniture', 'fury', 'gard', 'garden', 'garmen', 'garment', 'garments', 'gat', 'gate', 'gather', 'gathered', 'gathering', 'gave', 'gavest', 'generatio', 'generation', 'generations', 'get', 'getting', 'ghost', 'giants', 'gift', 'gifts', 'give', 'given', 'giveth', 'giving', 
'glory', 'go', 'goa', 'goat', 'goats', 'gods', 'goest', 'goeth', 'going', 'gold', 'golden', 'gone', 'good', 'goodly', 'goods', 'gopher', 'got', 'gotten', 'governor', 'gr', 'grace', 'gracious', 'graciously', 'grap', 'grapes', 'grass', 'grave', 'gray', 'gre', 'great', 'greater', 'greatly', 'green', 'grew', 'grief', 'grieved', 'grievous', 'grisl', 'grisled', 'gro', 'ground', 'grove', 'grow', 'grown', 'guard', 'guiding', 'guiltiness', 'guilty', 'gutters', 'h', 'ha', 'habitations', 'had', 'hadst', 'hairs', 'hairy', 'half', 'halted', 'han', 'hand', 'handfuls', 'handle', 'handmaid', 'handmaidens', 'handmaids', 'hands', 'hang', 'hanged', 'hard', 'hardly', 'harlot', 'harm', 'harp', 'harvest', 'hast', 'haste', 'hasted', 'hastened', 'hastily', 'hate', 'hated', 'hath', 'have', 'haven', 'having', 'hazel', 'he', 'head', 'heads', 'healed', 'health', 'heap', 'hear', 'heard', 'hearken', 'hearkened', 'heart', 'hearth', 'hearts', 'heat', 'heav', 'heaven', 'heavens', 'heed', 'heel', 'heels', 'heifer', 'height', 'heir', 'held', 'help', 'hence', 'henceforth', 'her', 'herb', 'herd', 'herdmen', 'herds', 'here', 'herein', 'herself', 'hid', 'hide', 'high', 'hil', 'hills', 'him', 'himself', 'hind', 'hindermost', 'hire', 'hired', 'his', 'hith', 'hither', 'hold', 'hollow', 'home', 'honey', 'honour', 'honourable', 'hor', 'horror', 'horse', 'horsemen', 'horses', 'host', 'hotly', 'hou', 'hous', 'house', 'household', 'households', 'how', 'hundred', 'hundredfo', 'hundredth', 'hunt', 'hunter', 'hunting', 'hurt', 'husba', 'husband', 'husbandman', 'if', 'ill', 'image', 'images', 'imagination', 'imagined', 'in', 'increase', 'increased', 'indeed', 'inhabitants', 'inhabited', 'inherit', 'inheritance', 'iniquity', 'inn', 'innocency', 'instead', 'instructor', 'instruments', 'integrity', 'interpret', 'interpretation', 'interpretations', 'interpreted', 'interpreter', 'into', 'intreat', 'intreated', 'ir', 'is', 'isles', 'issue', 'it', 'itself', 'jewels', 'joined', 'joint', 'journey', 'journeyed', 'journeys', 
'jud', 'judge', 'judged', 'judgment', 'just', 'justice', 'keep', 'keeper', 'kept', 'ki', 'kid', 'kids', 'kill', 'killed', 'kind', 'kindled', 'kindly', 'kindness', 'kindred', 'kinds', 'kine', 'king', 'kingdom', 'kings', 'kiss', 'kissed', 'kn', 'knead', 'kneel', 'knees', 'knew', 'knife', 'know', 'knowest', 'knoweth', 'knowing', 'knowledge', 'known', 'la', 'labour', 'lack', 'lad', 'ladder', 'lade', 'laded', 'laden', 'lads', 'laid', 'lamb', 'lambs', 'lamentati', 'lamp', 'lan', 'land', 'lands', 'language', 'large', 'last', 'laugh', 'laughed', 'law', 'lawgiver', 'laws', 'lay', 'lead', 'leaf', 'lean', 'leanfleshed', 'leap', 'leaped', 'learned', 'least', 'leave', 'leaves', 'led', 'left', 'length', 'lentiles', 'lesser', 'lest', 'let', 'li', 'lie', 'lien', 'liest', 'lieth', 'life', 'lift', 'lifted', 'light', 'lighted', 'lightly', 'lights', 'like', 'likene', 'likeness', 'linen', 'lingered', 'lion', 'little', 'live', 'lived', 'lives', 'liveth', 'living', 'lo', 'lodge', 'lodged', 'loins', 'long', 'longedst', 'longeth', 'look', 'looked', 'loose', 'lord', 'lords', 'loss', 'loud', 'love', 'loved', 'lovest', 'loveth', 'lower', 'lying', 'm', 'ma', 'made', 'magicians', 'magnified', 'maid', 'maiden', 'maidservants', 'make', 'male', 'males', 'man', 'mandrakes', 'manner', 'many', 'mark', 'marriages', 'married', 'marry', 'marvelled', 'mast', 'master', 'matter', 'may', 'mayest', 'me', 'mead', 'meadow', 'meal', 'mean', 'meanest', 'meant', 'measures', 'meat', 'meditate', 'meet', 'meeteth', 'men', 'menservants', 'mention', 'merchant', 'merchantmen', 'mercies', 'merciful', 'mercy', 'merry', 'mess', 'messenger', 'messengers', 'messes', 'met', 'mi', 'midst', 'midwife', 'might', 'mightier', 'mighty', 'milch', 'milk', 'millions', 'mind', 'mine', 'mirth', 'mischief', 'mist', 'mistress', 'mock', 'mocked', 'mocking', 'money', 'month', 'months', 'moon', 'more', 'moreover', 'morever', 'morning', 'morrow', 'morsel', 'morter', 'most', 'mother', 'mou', 'mount', 'mountain', 'mountains', 'mourn', 
'mourned', 'mourning', 'mouth', 'mouths', 'moved', 'moveth', 'moving', 'much', 'mules', 'multiplied', 'multiply', 'multiplying', 'multitude', 'must', 'my', 'myrrh', 'myself', 'n', 'na', 'naked', 'nakedness', 'name', 'named', 'names', 'nati', 'natio', 'nation', 'nations', 'nativity', 'ne', 'near', 'neck', 'needeth', 'needs', 'neither', 'never', 'next', 'nig', 'nigh', 'night', 'nights', 'nine', 'nineteen', 'ninety', 'no', 'none', 'noon', 'nor', 'north', 'northward', 'nostrils', 'not', 'nothing', 'nought', 'nourish', 'nourished', 'now', 'number', 'numbered', 'numbering', 'nurse', 'nuts', 'o', 'oa', 'oak', 'oath', 'obeisance', 'obey', 'obeyed', 'observed', 'obtain', 'occasion', 'occupation', 'of', 'off', 'offended', 'offer', 'offered', 'offeri', 'offering', 'offerings', 'office', 'officer', 'officers', 'oil', 'old', 'olive', 'on', 'one', 'ones', 'only', 'onyx', 'open', 'opened', 'openly', 'or', 'order', 'organ', 'oth', 'other', 'ou', 'ought', 'our', 'ours', 'ourselves', 'out', 'over', 'overcome', 'overdrive', 'overseer', 'oversig', 'overspread', 'overtake', 'overthrew', 'overthrow', 'overtook', 'own', 'oxen', 'parcel', 'part', 'parted', 'parts', 'pass', 'passed', 'past', 'pasture', 'path', 'pea', 'peace', 'peaceable', 'peaceably', 'peop', 'people', 'peradventure', 'perceived', 'perfect', 'perform', 'perish', 'perpetual', 'person', 'persons', 'physicians', 'piece', 'pieces', 'pigeon', 'pilgrimage', 'pillar', 'pilled', 'pillows', 'pit', 'pitch', 'pitched', 'pitcher', 'pla', 'place', 'placed', 'places', 'plagued', 'plagues', 'plain', 'plains', 'plant', 'planted', 'played', 'pleasant', 'pleased', 'pleaseth', 'pleasure', 'pledge', 'plenteous', 'plenteousness', 'plenty', 'pluckt', 'point', 'poor', 'poplar', 'portion', 'possess', 'possessi', 'possession', 'possessions', 'possessor', 'posterity', 'pottage', 'poured', 'poverty', 'pow', 'power', 'praise', 'pray', 'prayed', 'precious', 'prepared', 'presence', 'present', 'presented', 'preserve', 'preserved', 'pressed', 'prevail', 
'prevailed', 'prey', 'priest', 'priests', 'prince', 'princes', 'pris', 'prison', 'prisoners', 'proceedeth', 'process', 'profit', 'progenitors', 'prophet', 'prosper', 'prospered', 'prosperous', 'protest', 'proved', 'provender', 'provide', 'provision', 'pulled', 'punishment', 'purchase', 'purchased', 'purposing', 'pursue', 'pursued', 'put', 'putting', 'quart', 'quickly', 'quite', 'quiver', 'raiment', 'rain', 'rained', 'raise', 'ram', 'rams', 'ran', 'rank', 'raven', 'ravin', 'reach', 'reached', 'ready', 'reason', 'rebelled', 'rebuked', 'receive', 'received', 'red', 'redeemed', 'refrain', 'refrained', 'refused', 'regard', 'reign', 'reigned', 'remained', 'remaineth', 'remember', 'remembered', 'remove', 'removed', 'removing', 'renown', 'rent', 'repented', 'repenteth', 'replenish', 'report', 'reproa', 'reproach', 'reproved', 'require', 'required', 'requite', 'reserved', 'respect', 'rest', 'rested', 'restore', 'restored', 'restrained', 'return', 'returned', 'reviv', 'reward', 'rewarded', 'ri', 'rib', 'ribs', 'rich', 'riches', 'rid', 'ride', 'rider', 'right', 'righteous', 'righteousness', 'rightly', 'ring', 'ringstraked', 'ripe', 'rise', 'risen', 'riv', 'river', 'rode', 'rods', 'roll', 'rolled', 'roof', 'room', 'rooms', 'rose', 'roughly', 'round', 'rouse', 'royal', 'rul', 'rule', 'ruled', 'ruler', 'rulers', 'run', 's', 'sa', 'sac', 'sack', 'sackcloth', 'sacks', 'sacrifice', 'sacrifices', 'sad', 'saddled', 'sadly', 'said', 'saidst', 'saith', 'sake', 'sakes', 'salt', 'salvation', 'same', 'sanctified', 'sand', 'sat', 'save', 'saved', 'saving', 'savour', 'savoury', 'saw', 'sawest', 'say', 'saying', 'scarce', 'scarlet', 'scatter', 'scattered', 'sceptre', 'sea', 'searched', 'seas', 'season', 'seasons', 'second', 'secret', 'secretly', 'see', 'seed', 'seedtime', 'seeing', 'seek', 'seekest', 'seem', 'seemed', 'seen', 'seest', 'seeth', 'selfsame', 'selfwill', 'sell', 'send', 'sent', 'separate', 'separated', 'sepulchre', 'sepulchres', 'serpent', 'serva', 'servan', 'servant', 
'servants', 'serve', 'served', 'service', 'set', 'seven', 'sevenfold', 'sevens', 'seventeen', 'seventeenth', 'seventh', 'seventy', 'sewed', 'sh', 'shadow', 'shall', 'shalt', 'shamed', 'shaved', 'she', 'sheaf', 'shear', 'sheaves', 'shed', 'sheddeth', 'sheep', 'sheepshearers', 'shekel', 'shekels', 'shepherd', 'shepherds', 'shew', 'shewed', 'sheweth', 'shield', 'ships', 'shoelatchet', 'shore', 'shortly', 'shot', 'should', 'shoulder', 'shoulders', 'shouldest', 'shrank', 'shrubs', 'shut', 'si', 'side', 'sight', 'signet', 'signs', 'silv', 'silver', 'sin', 'since', 'sinew', 'sinners', 'sinning', 'sir', 'sist', 'sister', 'sit', 'six', 'sixteen', 'sixth', 'sixty', 'skins', 'slain', 'slaughter', 'slay', 'slayeth', 'sle', 'sleep', 'slept', 'slew', 'slime', 'slimepits', 'small', 'smell', 'smelled', 'smite', 'smoke', 'smoking', 'smooth', 'smote', 'so', 'sod', 'softly', 'sojourn', 'sojourned', 'sojourner', 'sold', 'sole', 'solemnly', 'some', 'son', 'songs', 'sons', 'soon', 'sore', 'sorely', 'sorrow', 'sort', 'sou', 'sought', 'soul', 'souls', 'south', 'southward', 'sow', 'sowed', 'space', 'spake', 'spare', 'spe', 'speak', 'speaketh', 'speaking', 'speckl', 'speckled', 'spee', 'speech', 'speed', 'speedily', 'spent', 'spi', 'spicery', 'spices', 'spies', 'spilled', 'spirit', 'spoil', 'spoiled', 'spoken', 'sporting', 'spotted', 'spread', 'springing', 'sprung', 'staff', 'stalk', 'stand', 'standest', 'stars', 'state', 'statutes', 'stay', 'stayed', 'ste', 'stead', 'steal', 'steward', 'still', 'stink', 'sto', 'stole', 'stolen', 'stone', 'stones', 'stood', 'stooped', 'stopped', 'store', 'storehouses', 'stories', 'straitly', 'strakes', 'strange', 'stranger', 'strangers', 'straw', 'street', 'strength', 'strengthened', 'stretched', 'stricken', 'strife', 'stript', 'strive', 'strong', 'stronger', 'strove', 'struggled', 'stuff', 'subdue', 'submit', 'substance', 'subtil', 'subtilty', 'such', 'suck', 'suffered', 'summer', 'sun', 'supplanted', 'sure', 'surely', 'surety', 'sustained', 'sware', 
'swear', 'sweat', 'sweet', 'sword', 'sworn', 'tabret', 'tak', 'take', 'taken', 'talked', 'talking', 'tar', 'tarried', 'tarry', 'teeth', 'tell', 'tempt', 'ten', 'tender', 'tenor', 'tent', 'tenth', 'tents', 'terror', 'th', 'than', 'that', 'the', 'thee', 'their', 'them', 'themselv', 'themselves', 'then', 'thence', 'there', 'thereby', 'therefore', 'therein', 'thereof', 'thereon', 'these', 'they', 'thi', 'thicket', 'thigh', 'thin', 'thine', 'thing', 'things', 'think', 'third', 'thirteen', 'thirteenth', 'thirty', 'this', 'thistles', 'thither', 'thoroughly', 'those', 'thou', 'though', 'thought', 'thoughts', 'thousand', 'thousands', 'thread', 'three', 'threescore', 'threshingfloor', 'throne', 'through', 'throughout', 'thus', 'thy', 'thyself', 'tidings', 'till', 'tiller', 'tillest', 'tim', 'time', 'times', 'tithes', 'to', 'togeth', 'together', 'toil', 'token', 'told', 'tongue', 'tongues', 'too', 'took', 'top', 'tops', 'torn', 'touch', 'touched', 'toucheth', 'touching', 'toward', 'tower', 'towns', 'tr', 'trade', 'traffick', 'trained', 'travail', 'travailed', 'treasure', 'tree', 'trees', 'trembled', 'trespass', 'tribes', 'tribute', 'troop', 'troubled', 'trough', 'troughs', 'tru', 'true', 'truly', 'truth', 'turn', 'turned', 'turtledove', 'twel', 'twelve', 'twentieth', 'twenty', 'twice', 'twins', 'two', 'unawares', 'uncircumcised', 'uncovered', 'under', 'understand', 'understood', 'ungirded', 'unit', 'unleavened', 'until', 'unto', 'up', 'upon', 'uppermost', 'upright', 'upward', 'urged', 'us', 'utmost', 'vagabond', 'vail', 'vale', 'valley', 'vengeance', 'venison', 'verified', 'verily', 'very', 'vessels', 'vestures', 'victuals', 'vine', 'vineyard', 'violence', 'violently', 'virgin', 'vision', 'visions', 'visit', 'visited', 'voi', 'voice', 'void', 'vow', 'vowed', 'vowedst', 'w', 'wa', 'wages', 'wagons', 'waited', 'walk', 'walked', 'walketh', 'walking', 'wall', 'wander', 'wandered', 'wandering', 'war', 'ward', 'was', 'wash', 'washed', 'wast', 'wat', 'watch', 'water', 'watered', 
'watering', 'waters', 'waxed', 'waxen', 'way', 'ways', 'we', 'wealth', 'weaned', 'weapons', 'wearied', 'weary', 'week', 'weep', 'weig', 'weighed', 'weight', 'welfare', 'well', 'wells', 'went', 'wentest', 'wept', 'were', 'west', 'westwa', 'whales', 'what', 'whatsoever', 'wheat', 'whelp', 'when', 'whence', 'whensoever', 'where', 'whereby', 'wherefore', 'wherein', 'whereof', 'whereon', 'wherewith', 'whether', 'which', 'while', 'white', 'whither', 'who', 'whole', 'whom', 'whomsoever', 'whoredom', 'whose', 'whosoever', 'why', 'wi', 'wick', 'wicked', 'wickedly', 'wickedness', 'widow', 'widowhood', 'wife', 'wild', 'wilderness', 'will', 'willing', 'wilt', 'wind', 'window', 'windows', 'wine', 'winged', 'winter', 'wise', 'wit', 'with', 'withered', 'withheld', 'withhold', 'within', 'without', 'witness', 'wittingly', 'wiv', 'wives', 'wo', 'wolf', 'woman', 'womb', 'wombs', 'women', 'womenservan', 'womenservants', 'wondering', 'wood', 'wor', 'word', 'words', 'work', 'worse', 'worship', 'worshipped', 'worth', 'worthy', 'wot', 'wotteth', 'would', 'wouldest', 'wounding', 'wrapped', 'wrath', 'wrestled', 'wrestlings', 'wrong', 'wroth', 'wrought', 'y', 'ye', 'yea', 'year', 'yearn', 'years', 'yesternight', 'yet', 'yield', 'yielded', 'yielding', 'yoke', 'yonder', 'you', 'young', 'younge', 'younger', 'youngest', 'your', 'yourselves', 'youth'] 2789
len(set(text3)) / len(text3)  # lexical diversity: distinct word types / total tokens
0.06230453042623537
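The ratio above can be wrapped in a small helper so it can be reused on any token list. The sketch below is only an illustration: the function name `lexical_diversity` and the toy token list are assumptions, not part of the original notebook, and it does not need NLTK.

```python
def lexical_diversity(tokens):
    """Return distinct tokens / total tokens, or 0.0 for an empty list."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Hypothetical toy tokens (not taken from text3).
sample_tokens = ["in", "the", "beginning", "god", "created",
                 "the", "heaven", "and", "the", "earth"]

# 8 distinct tokens out of 10 in total.
diversity = lexical_diversity(sample_tokens)
print(diversity)
```

A value close to 0, like the 0.062 for text3, means the same words are repeated many times; a value close to 1 means almost every token is a different word.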
print(text3.count("smote"))
# Number of times "smote" occurs in the document
print(len(text3))
print(100 * text3.count('smote') / len(text3))
# Percentage of the document's tokens that are "smote"
# Q3. Compute the percentage of "lol" in text5 and think about how to interpret it.
5
44764
0.01116968992940756
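The percentage calculation can be factored into a helper, which is handy for exercises such as Q3. This is a minimal sketch: the names `percentage` and `word_percentage` and the toy chat tokens are assumptions for illustration. The figures 5 and 44764 are the counts printed above.

```python
def percentage(count, total):
    """Return count as a percentage of total."""
    return 100 * count / total


def word_percentage(tokens, word):
    """Percentage of tokens that are exactly `word` (case-sensitive)."""
    return percentage(tokens.count(word), len(tokens))


# Hypothetical toy chat tokens (not taken from text5).
sample_tokens = ["lol", "hi", "everyone", "lol", "ok", "bye", "hi", "lol"]
share = word_percentage(sample_tokens, "lol")
print(share)

# Same arithmetic as the notebook: 5 occurrences of "smote" in 44764 tokens.
print(percentage(5, 44764))
```

For the real exercise, the same call would take `text5` in place of `sample_tokens`, assuming `from nltk.book import *` has been run.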
fdist1 = FreqDist(text1)
print(fdist1)  # samples = distinct word types, outcomes = total tokens
<FreqDist with 19317 samples and 260819 outcomes>
type(fdist1)
nltk.probability.FreqDist
fdist1.most_common(50)
[(',', 18713), ('the', 13721), ('.', 6862), ('of', 6536), ('and', 6024), ('a', 4569), ('to', 4542), (';', 4072), ('in', 3916), ('that', 2982), ("'", 2684), ('-', 2552), ('his', 2459), ('it', 2209), ('I', 2124), ('s', 1739), ('is', 1695), ('he', 1661), ('with', 1659), ('was', 1632), ('as', 1620), ('"', 1478), ('all', 1462), ('for', 1414), ('this', 1280), ('!', 1269), ('at', 1231), ('by', 1137), ('but', 1113), ('not', 1103), ('--', 1070), ('him', 1058), ('from', 1052), ('be', 1030), ('on', 1005), ('so', 918), ('whale', 906), ('one', 889), ('you', 841), ('had', 767), ('have', 760), ('there', 715), ('But', 705), ('or', 697), ('were', 680), ('now', 646), ('which', 640), ('?', 637), ('me', 627), ('like', 624)]
fdist1['whale']
906
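In NLTK 3, `FreqDist` is built on Python's `collections.Counter`, so `most_common()` and dictionary-style lookups can be tried out without downloading any corpus. The sketch below uses a plain `Counter` and a hypothetical toy token list as a stand-in for `fdist1`.

```python
from collections import Counter

# Hypothetical toy tokens standing in for text1.
sample_tokens = ["the", "whale", "and", "the", "sea", "and", "the", "ship"]

counts = Counter(sample_tokens)

# Number of distinct types ("samples") and total tokens ("outcomes").
print(len(counts), sum(counts.values()))

# The two most frequent tokens with their counts.
top_two = counts.most_common(2)
print(top_two)

# Lookup by word; a word that never occurs gives 0 instead of an error.
print(counts["whale"], counts["harpoon"])
```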
fdist1.plot(10, cumulative=True)  # cumulative frequency plot of the 10 most common tokens
fdist1.hapaxes()  # lists the words that occur only once (hapaxes)
['recede', 'clayey', 'hardest', 'greedy', 'transferringly', 'corkscrew', 'salamander', 'languishing', 'insupportable', 'dun', 'circumnavigations', 'lunges', 'rogues', 'gizzard', 'stifle', 'bedraggled', 'subdivide', 'nosed', 'crimsoned', 'Tusked', 'Carson', 'bravadoes', 'Vermont', 'discreetly', 'Grenadier', 'bought', 'Want', 'Pressing', 'fatherless', 'enchantment', 'deformity', 'hoarded', 'Maccabees', 'Cholo', 'blusterer', 'Anyhow', 'genteel', 'forethrown', 'FEET', 'hieroglyphical', 'Partners', 'Created', 'Plains', 'speckled', 'splendors', 'BRACTON', 'overbalance', 'absorbingly', 'strays', 'meddling', 'Bentham', 'boatmen', 'couldst', 'vignettes', 'APPLICATION', 'swagger', 'silences', 'Herman', 'Caw', 'Kedron', 'servile', 'Randolphs', 'paregoric', 'clenching', 'literal', 'amputations', 'analogical', 'COWPER', 'bush', 'lover', 'frisky', 'Respectively', 'Retribution', 'grievances', 'incommodiously', 'arrah', 'swum', 'Mistress', 'honouring', 'repute', 'Plunge', 'sinker', 'OAKES', 'stultifying', 'ungracious', 'unskilful', 'illness', 'gross', 'End', 'Doctor', 'reclines', 'ruffled', 'augmented', 'cutlets', 'wainscots', 'Lights', 'infantileness', 'agonies', 'enticing', 'Roses', ';"--', 'invertedly', 'honesty', 'bounteous', 'CHACE', 'rechurned', 'gaffs', 'Kills', 'madden', 'penetrating', 'FIGURED', 'studious', 'Pity', 'insufficient', 'YARD', 'satirizing', 'stumbled', 'tunnel', 'bucks', 'Detached', 'comparable', 'bleakness', 'retraced', 'mountaineers', 'Giver', 'orchestra', 'analysis', 'enthrone', 'Saul', 'Turkey', 'soles', 'Led', 'ruffed', 'railways', 'weeps', 'grasps', 'droves', 'summits', 'sulphurous', 'Fountain', 'drench', 'recrossing', 'douse', 'Belated', 'hornpipe', 'Ombay', 'reproachfully', 'dissolutions', 'kine', 'alpacas', 'graved', 'thrills', 'sobriety', 'voided', 'unrolling', 'cinders', 'uncomfortableness', 'THAR', 'outdone', 'radical', 'Archbishop', '129', 'bumpers', 'AROUND', 'guarding', 'FIRMLY', 'prose', 'couples', 'Channel', 'scorchingly', 'missent', 'chases', 
'wipe', 'RED', 'particles', 'Louisiana', 'baulks', 'mannikin', 'paralysed', 'missionaries', 'FEGEE', 'metaphysically', 'habitual', 'exception', 'crackers', 'Usually', 'Miserable', 'familyless', 'perpetuates', 'Warmest', 'conjure', 'aliment', 'autumn', 'bearings', 'vacillations', 'assistance', 'indite', 'characterized', 'gamboge', 'outspreadingly', 'spraining', 'maddens', 'patches', 'landscapes', 'tenement', 'nestling', 'Power', 'hussar', 'ebbs', 'manoeuvred', 'persuasiveness', 'Orleans', 'appoint', 'pledges', 'conceives', 'maccaroni', 'mooted', 'unendurable', 'fixing', 'Zeuglodon', 'HIMSELF', 'ulceration', 'fencer', 'footman', 'complain', 'evanescence', 'silks', 'drugging', 'Inasmuch', 'lacks', 'mystically', 'Heart', 'Levanter', 'shan', 'searches', 'shave', 'fattening', 'senate', 'unlock', 'imposed', 'needful', 'Exploring', 'CHAPTERS', 'impeach', 'odoriferous', 'giddy', 'favoured', 'horrifying', 'warfare', 'deliriums', 'Help', 'cloves', 'Decapitation', 'unscientific', 'roundingly', 'transit', 'straddling', 'revels', 'marchings', 'Bellies', 'beaters', 'sagacity', 'Patience', 'threatens', 'wasps', 'Earthsman', 'soliloquizer', 'Constantine', 'exterminated', 'Japans', 'repent', 'Anak', 'Pampas', 'infatuation', 'pelvis', 'abjectly', 'seignories', 'flanked', 'SOLANDER', 'RESPECTABLE', 'dissent', 'catalogue', 'transformed', 'SCREWS', 'append', 'refining', 'religionists', 'resent', 'Sachem', 'unwound', 'starved', 'corruption', 'middling', 'overgrowing', 'outyell', 'peltry', 'Netherlands', 'unchanged', 'farmers', 'consulting', 'TWISTED', 'ATTACK', 'slaughter', 'NOSTRIL', 'Sixteen', 'Secretary', 'groin', 'Measured', '5TH', 'GOES', 'Either', 'capsize', 'divined', 'elevations', 'persuading', 'threading', 'WAY', 'bladder', 'tipping', 'dungeoned', 'symbolically', 'tiny', 'Oft', 'IAN', 'educated', 'tissues', 'Pottsfich', 'Mesopotamian', 'Utter', 'interrupt', 'shallowest', 'clearer', 'penning', 'reprehensible', 'classical', 'snuffling', 'Fisheries', 'libertines', 'muffledness', 
'apertures', 'clad', 'sashless', 'unbiased', 'confines', 'foible', 'REQUIEM', 'jostle', 'lancers', 'judgmatically', 'ungainly', 'consecrating', 'unequal', 'comets', 'brutal', 'expert', 'creative', 'conducting', 'Meshach', 'bat', 'Probably', 'frankincense', 'saving', 'backwardly', 'spoiling', 'roundly', 'launch', 'paternity', 'simplicity', 'strutting', 'Pompey', 'perusal', 'chastisements', 'peeringly', 'panel', 'bewildered', 'prophesies', 'bountifully', 'bilocular', 'initials', 'aggregations', 'stair', 'habituated', 'blisters', 'appeal', 'durable', 'Judges', 'fared', 'Stylites', 'forswears', 'depose', 'disembowelments', 'betokening', 'gleamings', 'Sphinx', 'cynical', 'chalking', 'remembrances', 'engrossing', 'freight', 'tapered', 'dispenses', 'derision', 'burnish', 'developing', 'popularize', 'Bungle', 'Feegeeans', 'deposited', 'murmuring', 'limpid', 'feebler', 'elevation', 'gowns', 'cleets', 'Stammering', 'Horner', 'Said', 'Mohawk', 'inscribed', 'profitable', 'Slid', 'preservers', 'fetches', 'Pooh', 'disinfecting', 'HAILS', 'WIDOW', 'fastener', 'MEN', 'outspread', 'Cold', 'blisteringly', 'keg', 'kneepans', 'Earl', 'acerbities', 'uncheered', 'imminglings', 'overtaking', 'intrepidly', 'MSS', 'asses', 'Trinity', 'bevy', 'extinguishing', 'waxy', 'liberally', 'amidst', 'thirteenth', 'agrees', 'confessed', 'juniper', 'ungentlemanly', 'uncapturable', 'slippered', 'PARLIAMENT', 'overtakes', 'Stoic', 'acquiescence', 'lovings', 'jeering', 'treating', 'Horrible', 'allegory', 'Tiger', 'loungingly', 'CURRENTS', 'headmost', 'dalliance', 'Fiery', 'weazel', 'surest', 'bigamist', 'mace', 'napping', 'glee', 'betaken', 'ceti', 'Calais', 'superincumbent', 'aforesaid', 'substantiate', 'motionlessly', 'Rocky', 'sagged', 'parried', 'results', 'erudition', '76', 'cultivate', 'domineered', 'spiracles', 'Damocles', 'soladoes', 'stilts', 'thoughtfulness', 'confabulations', 'unsays', 'promissory', 'FLOOD', 'contingent', 'mobbing', 'fiendish', 'complaints', 'paine', 'reverential', 'Growlands', 
'issues', 'submits', 'shindy', 'leopard', 'imperceptibly', 'ST', 'cats', 'tolling', 'STRAPS', 'slaughtering', 'pulsations', 'bestowal', 'butteries', 'Savage', 'sayst', 'Bag', 'herrings', 'rudeness', 'exhaustive', 'Future', 'SCATTER', 'le', 'dispute', 'descendants', 'brackish', 'rocket', 'Champagne', 'LINE', 'Somehow', 'ejaculated', 'Europa', 'scrambled', 'bodied', 'amounted', 'ARE', 'metropolis', 'sleeplessness', 'Saco', 'Starboard', 'props', 'Baling', 'sweethearts', 'destinations', '32', 'digester', 'inanimate', 'cartloads', 'unrifled', '24', 'shod', 'vintage', 'hermaphroditical', 'pirouetting', 'Bendigoes', 'Juba', 'childlessness', 'entrenchments', 'sentinels', 'usefulness', 'suffused', 'uncontinented', 'worldly', 'bordering', 'privations', 'ELIZABETH', 'disorder', 'HOMEWARD', 'fumbled', 'Melville', 'sixteenth', 'overleap', 'bushy', 'predominating', 'Lieutenant', 'professed', 'Joe', 'gauntleted', 'collecting', 'damsels', 'CONVERSATIONS', 'advised', 'finical', 'ruptured', 'joist', 'Affected', 'abominate', 'pitcher', 'watchmen', 'commentator', 'whets', 'hoop', '18', 'dentists', 'abstraction', 'sheered', 'emptying', 'interlacings', 'bartered', 'brats', 'Crockett', 'giddily', 'invariability', 'dictionaries', 'anticipative', 'damning', 'wept', "?'--'", 'lingers', 'Wise', 'Matsmai', 'bamboo', 'Remembering', 'tucking', 'remind', 'Chartering', 'Helena', 'Carthage', 'rending', 'Argo', 'Monsoons', 'nibbling', 'crashing', 'sordid', 'gnashing', 'uniqueness', 'entreated', 'skrimshandering', 'deepeningly', 'fencing', 'voluntary', 'Slope', 'Devils', 'modifies', 'Gurry', 'Physiognomy', 'Spinoza', 'KEDGER', 'loon', 'mannerly', 'mildewed', 'swaller', 'stig', 'Bonneterre', 'batten', 'PEQUOD', 'assigns', 'hacked', 'reliance', 'encasing', 'packs', 'wrapall', 'convulsive', 'designates', 'MAT', 'stereotype', 'sprouts', 'parade', 'keeper', 'Hindoos', 'perspective', 'GRIMLY', '131', 'inspectingly', 'bejuggled', 'sooth', 'trending', 'FLASHES', 'blessing', 'sinecure', 'analytic', 'Mason', 
'phiz', 'undervalue', 'concernments', 'mornings', 'grooved', 'rifled', 'ENSUING', 'slanderous', 'complimentary', 'scolds', 'hastier', 'plebeian', 'toughness', 'exceeds', 'flyin', 'painters', 'communities', 'equanimity', 'Muffled', 'imported', 'wickedness', 'supernaturalism', 'Ebony', 'braining', 'Sodom', 'journeyman', 'raked', 'arbitrary', 'experiments', 'layeth', 'expresses', 'waxes', 'sharpest', 'pallet', 'YORK', 'blazed', 'exaggerating', 'inferentially', 'tanned', 'drawlingly', 'Deep', 'watergate', 'inkstand', 'digressively', 'TROIL', 'Physiognomist', 'spectral', 'Arter', 'fallacious', 'heartwoes', 'kindhearted', 'enchanter', 'patriot', 'Proceed', 'KETOS', 'Cattegat', 'Aldrovandi', 'dissemble', 'keeling', 'Exception', 'fastenings', 'heath', 'breedeth', 'Coke', 'panels', 'contingencies', 'Melancthon', 'tumblers', 'canonicals', 'unreluctantly', 'Judea', 'slatternly', 'Shake', 'salamed', 'thwack', 'quadrupeds', 'hilarity', 'lengthen', 'goring', 'CAP', 'tellin', 'voyaged', 'axles', 'Keeping', 'knockings', 'heraldic', 'HORRID', 'Dame', 'knightly', 'magnify', 'admonitory', 'flickering', 'fibrous', 'Stir', 'thunderings', 'regardless', 'toadstools', 'intensities', 'ungovernable', 'Saxon', 'Olassen', 'Bottom', 'butchering', 'gown', 'hie', 'disjointedly', 'firmer', 'unbecomingness', 'Horned', '15', 'reservoirs', 'bump', 'Befooled', 'communicated', 'freshening', 'convicts', 'siding', 'delectable', 'RICHARDSON', 'panellings', 'cosmopolite', 'songster', 'Sultan', 'boon', 'overruns', 'patrolled', 'reap', 'uniformity', 'leadership', 'knaves', 'glitters', 'vats', 'bitterer', 'palpableness', 'Wretched', 'markest', 'Swimming', 'incidental', 'overrunningly', 'cheered', 'affecting', 'cylindrically', 'Split', 'Ochotsh', 'whalin', 'blackest', 'BREACH', 'unappalled', 'torsoes', 'Common', 'asunder', 'spectre', 'wigwams', 'sigh', 'Jig', 'sinks', 'patentees', 'rural', 'Hydriote', 'disrated', 'cypher', 'changeful', 'mustered', 'aesthetics', 'Scorpio', 'demonstrations', 'distilled', 'Won', 
'quaff', 'Canadian', 'bulge', 'OCTAVOES', 'fished', 'osseous', 'Regent', 'derisive', 'Satanic', 'Ezekiel', 'Gros', 'knob', 'untidy', 'inuendoes', 'squatting', 'premised', 'elevate', 'inactive', 'BOARD', 'palavering', 'DARKENS', 'beholder', 'forges', 'pertains', 'weaves', 'chilled', 'Abominable', 'Kingdom', 'exotic', 'Expedition', 'cleats', 'cannikin', 'princess', '49', 'allowances', 'Welding', 'slacken', 'Raise', 'spiral', 'Whosoever', 'nipper', 'soils', 'drilled', 'incomputable', 'amuck', 'pedestals', 'clusters', 'ineffably', 'BELOW', 'abstained', 'basso', 'dissociated', 'instructions', 'rumor', 'indigenous', 'Regarded', 'Immense', 'coax', 'slink', 'unsettled', 'undetached', 'Yoke', 'PROGRESS', 'tee', 'abbreviation', 'showest', 'gruff', 'haunt', 'Pascal', 'caput', 'passively', 'parcelling', 'sends', 'Pandects', 'Englander', 'Cancer', 'Roll', 'transfix', 'sprat', 'gnaw', 'tumults', 'empties', 'retires', 'glutinous', 'ERROMANGOAN', 'gesticulated', 'harvesting', 'piercer', 'Berlin', 'deathful', 'peer', 'Cave', 'midship', 'Lit', 'Cato', 'BIT', 'detects', 'Tattoo', 'anomalous', 'masted', 'enkindling', 'frayed', 'worried', 'shrinked', 'oilpainting', 'amputated', 'Advancement', 'rugged', 'passionlessness', 'pealing', 'deserving', 'CHEERLY', 'sinned', 'grows', 'perturbation', 'Friar', 'crumpled', 'hostility', 'mud', 'undressing', 'localness', 'soulless', 'BEWARE', 'intervene', 'tougher', 'wrinkling', 'tubes', 'truths', 'shipyards', 'lookouts', 'solaces', 'Paint', 'lurches', 'crucified', 'aboriginalness', 'trover', 'Tekel', 'shyness', '73', 'exhort', 'Astronomy', 'samphire', 'hallo', 'albatrosses', 'sympathetical', 'spice', 'predicament', 'miner', 'adjust', 'punctiliously', 'hymns', 'coyings', 'marge', 'irregularity', ...]
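A hapax is simply a word whose count is exactly 1. As a rough sketch of that idea (not NLTK's own code), the same list can be built from a `Counter`. The function name and toy tokens below are assumptions for illustration.

```python
from collections import Counter


def hapaxes(tokens):
    """Return the tokens that occur exactly once, in first-seen order."""
    counts = Counter(tokens)
    return [word for word, n in counts.items() if n == 1]


# Hypothetical toy tokens standing in for text1.
sample_tokens = ["the", "whale", "and", "the", "sea", "and", "the", "ship"]
print(hapaxes(sample_tokens))
```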
V = set(text1)
long_words = [w for w in V if len(w) > 15]
sorted(long_words)
['CIRCUMNAVIGATION', 'Physiognomically', 'apprehensiveness', 'cannibalistically', 'characteristically', 'circumnavigating', 'circumnavigation', 'circumnavigations', 'comprehensiveness', 'hermaphroditical', 'indiscriminately', 'indispensableness', 'irresistibleness', 'physiognomically', 'preternaturalness', 'responsibilities', 'simultaneousness', 'subterraneousness', 'supernaturalness', 'superstitiousness', 'uncomfortableness', 'uncompromisedness', 'undiscriminating', 'uninterpenetratingly']
fdist5 = FreqDist(text5)
sorted(w for w in set(text5) if len(w) > 7 and fdist5[w] > 7)
['#14-19teens', '#talkcity_adults', '((((((((((', '........', 'Question', 'actually', 'anything', 'computer', 'cute.-ass', 'everyone', 'football', 'innocent', 'listening', 'remember', 'seriously', 'something', 'together', 'tomorrow', 'watching']
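The expression above combines two filters: a length threshold and a frequency threshold. The sketch below makes both thresholds parameters. The function name, the parameter names and the toy tokens are hypothetical, and a `Counter` stands in for `FreqDist`.

```python
from collections import Counter


def long_frequent_words(tokens, min_length, min_count):
    """Sorted words longer than min_length that occur more than min_count times."""
    counts = Counter(tokens)
    return sorted(w for w in counts
                  if len(w) > min_length and counts[w] > min_count)


# Hypothetical toy tokens (not taken from text5).
sample_tokens = (["something"] * 3 + ["tomorrow"] * 2
                 + ["lol"] * 5 + ["remember"])

# "lol" is frequent but short; "remember" is long but occurs only once.
result = long_frequent_words(sample_tokens, min_length=7, min_count=1)
print(result)
```

Filtering on both conditions drops short function words as well as long one-off words, which tends to leave the words that characterize a text.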
list(nltk.bigrams(['more', 'is', 'said', 'than', 'done']))
[('more', 'is'), ('is', 'said'), ('said', 'than'), ('than', 'done')]
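A bigram list is just each token paired with the one that follows it. As a minimal sketch of the idea (not how NLTK implements it), the same pairs can be produced with `zip`. The helper below is a hypothetical stand-in that returns a list directly.

```python
def bigrams(tokens):
    """Pair every token with its successor."""
    return list(zip(tokens, tokens[1:]))


pairs = bigrams(['more', 'is', 'said', 'than', 'done'])
print(pairs)
```

A sequence of n tokens yields n - 1 bigrams, so a single token yields none.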
text4.collocations()
United States; fellow citizens; four years; years ago; Federal Government; General Government; American people; Vice President; Old World; Almighty God; Fellow citizens; Chief Magistrate; Chief Justice; God bless; every citizen; Indian tribes; public debt; one another; foreign nations; political parties
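Collocations are word pairs that occur together unusually often. The sketch below is a deliberately simplified approximation: it only counts adjacent pairs and skips pairs containing a word from a small stopword set. NLTK's own scoring is more sophisticated, so the output will not match `collocations()` in general. The stopword set, function name and toy sentence are all assumptions.

```python
from collections import Counter

# Hypothetical, tiny stopword set used only for this illustration.
STOPWORDS = {"the", "of", "and", "a", "to", "in"}


def frequent_bigrams(tokens, min_count=2):
    """Adjacent word pairs without stopwords that occur at least min_count times."""
    pairs = zip(tokens, tokens[1:])
    counts = Counter(
        (a, b) for a, b in pairs
        if a.lower() not in STOPWORDS and b.lower() not in STOPWORDS
    )
    return [pair for pair, n in counts.most_common() if n >= min_count]


# Hypothetical toy sentence (not taken from text4).
sample_tokens = "the United States and the people of the United States".split()
print(frequent_bigrams(sample_tokens))
```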
[len(w) for w in text1]
fdist = FreqDist(len(w) for w in text1)
print(fdist)
fdist  # word length: frequency
<FreqDist with 19 samples and 260819 outcomes>
FreqDist({1: 47933, 2: 38513, 3: 50223, 4: 42345, 5: 26597, 6: 17111, 7: 14399, 8: 9966, 9: 6428, 10: 3528, 11: 1873, 12: 1053, 13: 567, 14: 177, 15: 70, 16: 22, 17: 12, 18: 1, 20: 1})
fdist.most_common()
[(3, 50223), (1, 47933), (4, 42345), (2, 38513), (5, 26597), (6, 17111), (7, 14399), (8, 9966), (9, 6428), (10, 3528), (11, 1873), (12, 1053), (13, 567), (14, 177), (15, 70), (16, 22), (17, 12), (18, 1), (20, 1)]
fdist.max()
3
fdist[3]
50223
fdist.freq(3)  # words of length 3 account for about 19% of all tokens
# The frequency-distribution functions provided by NLTK are summarized in Table 3.1.
0.19255882431878046
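The relative frequency returned by `freq()` is the count for one sample divided by the total number of outcomes. The sketch below reproduces that arithmetic with a `Counter` over a hypothetical toy token list, then checks it against the counts printed above (50223 tokens of length 3 out of 260819).

```python
from collections import Counter

# Hypothetical toy tokens standing in for text1.
sample_tokens = ["call", "me", "ishmael", ".", "some",
                 "years", "ago", "-", "never", "mind"]

# Map each word length to the number of tokens with that length.
length_counts = Counter(len(w) for w in sample_tokens)

# Most common length and its count (the counterpart of fdist.max() and fdist[...]).
most_common_length, top_count = length_counts.most_common(1)[0]

# Relative frequency (the counterpart of fdist.freq(...)).
relative_frequency = top_count / sum(length_counts.values())
print(most_common_length, top_count, relative_frequency)

# Same arithmetic with the notebook's figures for length-3 tokens in text1.
print(50223 / 260819)
```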