
Tag Archives: lc

q & a

Q: What do 100-year-old knitting patterns and a lost Robert Louis Stevenson story have in common?
A: A digitally preserved newspaper page.
Q: What about if you add:

URIs for knitting materials
William Blake’s Engravings
The similarities/differences between XMPP, HTTP and NNTP
Web crawling as data integration
Project coordination with rooms on FriendFeed
brewing Kombucha

A: Just a typical lunchtime conversation at [...]

digital-curation

Some folks at LC and CDL are trying to kick-start a new public discussion list for talking about digital curation in its many guises: repositories, tools, standards, techniques, practices, etc. The intuition is that there is a social component to the problems of digital preservation and repository interoperability.
Of course NDIIPP (the arena for the [...]

BagIt

One little bit of goodness that has percolated out from my group at $work in collaboration with the California Digital Library is the BagIt spec (more readable version). BagIt is an IETF RFC for bundling up files for transfer over the network, or for shipping on physical media. Just yesterday a little article about BagIt [...]

justify my links

Thanks to a tip from Ian, I’m looking forward to (hopefully) attending the Linked Data Planet conference in New York City as a volunteer. The idea is that I just have to pay for my hotel, and the cost of admission is waived. It seems my travel money is a bit limited at the moment [...]

tripleshot

Recently there was a bit of interesting news around a MARBI Discussion Paper 2008-DP04 regarding semweb technologies at LC.

Related to this work are RDF/OWL representations and models for MODS and MARC, which we are also developing. Several representations of MODS in RDF/OWL, such as the one from the SIMILE project, have been made [...]

WoGroFuBiCo cloud

access accessible addition al american analysis application applications appropriate archives areas association authority available based benefit benefits bibliographic broad broader catalog catalogers cataloging catalogs cataloguing chain change changes classification code collaboration collections committee communities community congress consequences consider considered content continue control controlled cooperative cost costs create created creating creation current data databases dc description [...]

WoGroFuBiCo wc

word             count
library            263
bibliographic      236
data               170
libraries          144
lc                 127
control            109
information         98
cataloging          91
records             88
subject             82
materials           81
standards           81
use                 80
congress            79
work                76
record              73
community           67
users               61
working             59
group               58
access              57
recommendations     56
resources           53
authority           52
metadata            47
future              46
new                 40
environment         37
development         37
web                 36
collections         35
systems             35
available           35
creation            35
services            34
headings            32
national            31
findings            30
research            30
unique              29
sharing             29
oclc                28
model               28
catalog             28
international       27
develop             27
value               27
lcsh                26
pcc                 26
user                26
need                26
report              25
make                25
practices           25
rda                 25
used                25
time                24
needs               24
rare                24
including           24
provide             23
discovery           23
communities         23
special             23
frbr                23
current             22
resource            22
rules               22
digital             21
cooperative         21
program             21
participants        21
management          21
service             20
dc                  20
programs            20
online              20
costs               20
washington          20
standard            19
support             19
knowledge           19
different           19
appropriate         19
effort              18
applications        18
marc                18
shared              18
exchange            18
process             18
changes             17
lcs                 17
increase            16
public              16
search              16
creating            16
broader             16
catalogs            16
controlled          16

I converted the PDF to a text file called ‘lc’ with xpdf and then wrote a little Python:

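#!/usr/bin/env python
# Python 2 script: count word frequencies in the text file 'lc'.

from urllib import urlopen
from re import sub

# fetch a list of stop words
stop_words = urlopen('http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words').read().split()
text = file('lc').read()

counts = {}
for word in text.split():
    # normalize: lowercase, then strip non-word characters and digits
    word = word.lower()
    word = sub(r'\W', '', word)
    word = sub(r'\d+', '', word)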
[...]
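
Since the script above is cut off, here is a rough Python 3 sketch of the same counting idea. It is not the original code: the function name count_words is made up for illustration, and a tiny inline stop-word list stands in for the remote list so the example runs without a network connection. It applies the same normalization steps (lowercase, strip non-word characters, strip digits) and tallies whatever is left.

```python
import re
from collections import Counter

# Assumption: a small inline stop-word list replaces the remote list
# that the original script downloads.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "for"}


def count_words(text, stop_words=STOP_WORDS):
    """Return a Counter of normalized words in text, skipping stop words."""
    counts = Counter()
    for word in text.split():
        word = word.lower()
        word = re.sub(r"\W", "", word)   # strip non-word characters
        word = re.sub(r"\d+", "", word)  # strip digits
        # skip tokens that normalized to nothing, and stop words
        if word and word not in stop_words:
            counts[word] += 1
    return counts


if __name__ == "__main__":
    sample = "The Library of Congress and the library community: data, data, data in 2008."
    # print words from most to least frequent
    for word, count in count_words(sample).most_common():
        print(word, count)
```

To run it against the converted report, something like count_words(open('lc').read()) should do, assuming the text file sits in the current directory.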