
Tag Archives: data

APIs Suck

With TransparencyCamp last weekend, news of the mandated use of feed syndication by Federal Agencies receiving funds from the Recovery Act, recent blog posts by Tim O’Reilly and the Special Libraries Association, an article in Newsweek, news of Carl Malamud’s bid to become the Public Printer of the United States (aka head of the GPO), [...]

public.resource.org to liberate Code of Federal Regulations

good news via the GovTrack mailing list

Carl Malamud of public.resource.org, with funding from a bunch of places including a small bit from GovTrack’s ad profits, announced his intention to purchase from the Government Printing Office documents they produce in the course of their statutory obligations and then have the nerve to sell back to the [...]

provide and enable

I got a chance to meet Jennifer Rigby of the National Archives UK at the LinkedDataPlanet Conference in New York City (thanks Ian). Jennifer is the Head of IT Strategy, and told me lots of interesting stuff related to a profound shift they’ve had in their online strategies to:

Provide and Enable

So rather than pouring [...]

WoGroFuBiCo cloud

access accessible addition al american analysis application applications appropriate archives areas association authority available based benefit benefits bibliographic broad broader catalog catalogers cataloging catalogs cataloguing chain change changes classification code collaboration collections committee communities community congress consequences consider considered content continue control controlled cooperative cost costs create created creating creation current data databases dc description [...]
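The post doesn't say how this cloud was built. The sketch below shows one common approach, offered under that assumption: keep the N most frequent words, list them alphabetically, and scale each word's display size linearly with its count. The function name `cloud` and the five size steps are illustrative choices, not the original method. The sample counts come from the word-count table in the "WoGroFuBiCo wc" post.

```python
def cloud(counts, n=100, sizes=5):
    """Return (word, size) pairs for the n most frequent words, sorted alphabetically.

    size runs from 1 (least frequent word kept) to `sizes` (most frequent),
    scaled linearly between the smallest and largest count among the words kept.
    """
    top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
    if not top:
        return []
    lo = min(c for _, c in top)
    hi = max(c for _, c in top)
    span = hi - lo or 1  # avoid dividing by zero when all counts are equal
    return sorted((word, 1 + (count - lo) * (sizes - 1) // span) for word, count in top)


# A few counts taken from the WoGroFuBiCo word-count table.
sample = {'library': 263, 'bibliographic': 236, 'data': 170,
          'lc': 127, 'control': 109, 'controlled': 16}
for word, size in cloud(sample, n=5):
    print(word, size)
```

With `n=5`, "controlled" drops out and the remaining words print in alphabetical order, each paired with a size from 1 to 5. An HTML page could map those sizes to font sizes.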

WoGroFuBiCo wc

word             count
library            263
bibliographic      236
data               170
libraries          144
lc                 127
control            109
information         98
cataloging          91
records             88
subject             82
materials           81
standards           81
use                 80
congress            79
work                76
record              73
community           67
users               61
working             59
group               58
access              57
recommendations     56
resources           53
authority           52
metadata            47
future              46
new                 40
environment         37
development         37
web                 36
collections         35
systems             35
available           35
creation            35
services            34
headings            32
national            31
findings            30
research            30
unique              29
sharing             29
oclc                28
model               28
catalog             28
international       27
develop             27
value               27
lcsh                26
pcc                 26
user                26
need                26
report              25
make                25
practices           25
rda                 25
used                25
time                24
needs               24
rare                24
including           24
provide             23
discovery           23
communities         23
special             23
frbr                23
current             22
resource            22
rules               22
digital             21
cooperative         21
program             21
participants        21
management          21
service             20
dc                  20
programs            20
online              20
costs               20
washington          20
standard            19
support             19
knowledge           19
different           19
appropriate         19
effort              18
applications        18
marc                18
shared              18
exchange            18
process             18
changes             17
lcs                 17
increase            16
public              16
search              16
creating            16
broader             16
catalogs            16
controlled          16

I converted the PDF to a text file called ‘lc’ with xpdf and then wrote a little Python:

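#!/usr/bin/env python3

from urllib.request import urlopen
from re import sub

# Download a list of common English stop words (whitespace-separated).
stop_words = urlopen('http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words').read().decode().split()

# The report text, extracted from the PDF with xpdf.
text = open('lc').read()

counts = {}
for word in text.split():
    word = word.lower()
    word = sub(r'\W', '', word)   # strip punctuation and other non-word characters
    word = sub(r'\d+', '', word)  # strip digits
[...]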
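The excerpt stops partway through the counting loop. The sketch below shows one way the counting could be finished, assuming stop words and empty strings are skipped and the most frequent words are reported. It swaps the plain dict for `collections.Counter`. The `top_words` helper, the tiny `STOP_WORDS` set and the sample sentence are hypothetical stand-ins: the original script downloads its stop list and reads the ‘lc’ file.

```python
import re
from collections import Counter


def top_words(text, stop_words, n=10):
    """Count lowercased words in text, skipping stop words, and return the n most common."""
    counts = Counter()
    for word in text.split():
        word = word.lower()
        word = re.sub(r'\W', '', word)   # drop punctuation and other non-word characters
        word = re.sub(r'\d+', '', word)  # drop digits
        if word and word not in stop_words:
            counts[word] += 1
    return counts.most_common(n)


# Hypothetical stand-in for the downloaded stop-word list.
STOP_WORDS = {'the', 'of', 'and', 'to', 'a', 'in', 'for', 'is'}

sample = "The library of the future: library data, library standards, and data for 2008."
for word, count in top_words(sample, STOP_WORDS, n=3):
    print(word, count)
```

`Counter.most_common` orders words with equal counts by first appearance, so ties come out in the order the text first uses them. A token made only of digits, such as "2008.", ends up empty and is skipped.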