WoGroFuBiCo wc
word | count | word | count |
---|---|---|---|
library | 263 | bibliographic | 236 |
data | 170 | libraries | 144 |
lc | 127 | control | 109 |
information | 98 | cataloging | 91 |
records | 88 | subject | 82 |
materials | 81 | standards | 81 |
use | 80 | congress | 79 |
work | 76 | record | 73 |
community | 67 | users | 61 |
working | 59 | group | 58 |
access | 57 | recommendations | 56 |
resources | 53 | authority | 52 |
metadata | 47 | future | 46 |
new | 40 | environment | 37 |
development | 37 | web | 36 |
collections | 35 | systems | 35 |
available | 35 | creation | 35 |
services | 34 | headings | 32 |
national | 31 | findings | 30 |
research | 30 | unique | 29 |
sharing | 29 | oclc | 28 |
model | 28 | catalog | 28 |
international | 27 | develop | 27 |
value | 27 | lcsh | 26 |
pcc | 26 | user | 26 |
need | 26 | report | 25 |
make | 25 | practices | 25 |
rda | 25 | used | 25 |
time | 24 | needs | 24 |
rare | 24 | including | 24 |
provide | 23 | discovery | 23 |
communities | 23 | special | 23 |
frbr | 23 | current | 22 |
resource | 22 | rules | 22 |
digital | 21 | cooperative | 21 |
program | 21 | participants | 21 |
management | 21 | service | 20 |
dc | 20 | programs | 20 |
online | 20 | costs | 20 |
washington | 20 | standard | 19 |
support | 19 | knowledge | 19 |
different | 19 | appropriate | 19 |
effort | 18 | applications | 18 |
marc | 18 | shared | 18 |
exchange | 18 | process | 18 |
changes | 17 | lcs | 17 |
increase | 16 | public | 16 |
search | 16 | creating | 16 |
broader | 16 | catalogs | 16 |
controlled | 16 |
I converted the PDF to a text file called ‘lc’ with xpdf and then wrote a little Python:

    #!/usr/bin/env python
    # Python 2 script: count the most frequent non-stop-words in the file 'lc'.
    from urllib import urlopen
    from re import sub

    # Fetch a whitespace-separated list of English stop words.
    stop_words = urlopen('http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words').read().split()
    text = file('lc').read()

    counts = {}
    for word in text.split():
        # Normalize: lowercase, then strip punctuation and digits.
        word = word.lower()
        word = sub(r'\W', '', word)
        word = sub(r'\d+', '', word)
        if word == '' or word in stop_words: continue
        counts[word] = counts.get(word,0) + 1

    # Sort words by descending count and print the top 100.
    words = counts.keys()
    words.sort(lambda a,b: cmp(counts[b], counts[a]))
    for word in words[0:100]:
        print "%20s %i" % (word, counts[word])

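
For anyone running this today, here is a rough Python 3 sketch of the same counting step. It is not the script that produced the table above: `top_words` and `STOP_WORDS` are hypothetical names, and the tiny inline stop-word set is an assumption standing in for the much longer downloaded list.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word set for illustration only; the original
# script downloads a much longer list.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "for"}


def top_words(text, stop_words=STOP_WORDS, n=100):
    """Return up to n (word, count) pairs, most frequent first."""
    counts = Counter()
    for token in text.split():
        # Lowercase, then strip non-word characters and digits,
        # following the same normalization as the original script.
        word = re.sub(r"\d+", "", re.sub(r"\W", "", token.lower()))
        if word and word not in stop_words:
            counts[word] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "The library of Congress: library data, and the data of 2008."
    for word, count in top_words(sample, n=3):
        print("%20s %i" % (word, count))
```

To run it on the report itself, you would read the converted text file and pass its contents to `top_words`, ideally with a fuller stop-word list.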
Does me writing code to read the report count as reading the report? …