Wednesday, April 30th, 2008
If you happen to be in the DC area on May 8th and are interested in linked data and the practical application of semantic web technologies like RDF and OWL, please join us at the Library of Congress for a presentation by Alistair Miles, a key developer of SKOS and a semantic web practitioner at the University of Oxford.
The announcement is below; I hope you can make it. Oh, and if you are really interested in this stuff, we’re having some brown bag sessions later in the afternoon that you are welcome to attend. Just email me at ehs [at] pobox [dot] com.
The Simple Knowledge Organization System (SKOS), in the Context of Semantic Web Deployment, Alistair Miles, University of Oxford
May 8th, 10am – 11:30am, 2008,
Montpelier Room, Madison Building, Library of Congress.
Links are valuable. Links between documents, between people, between ideas, between data. Data is now a first-class Web citizen, and the Web is expanding as more of these valuable networks are deployed within its fabric. Well-established knowledge organization systems like the Library of Congress Subject Headings will play a major role within these networks, as hubs, connecting people with information and providing a firm foundation for network growth as many new routes to the discovery of information emerge through the collective action of individuals. Or will they?
This talk introduces the Simple Knowledge Organization System (SKOS), a soon-to-be-completed W3C standard for publishing thesauri, classification schemes and subject headings as linked data in the Web. This talk also presents SKOS in the context of the W3C’s Semantic Web Activity, and in particular the work of the W3C’s Semantic Web Deployment Working Group where other specifications are being developed for publishing linked data in the Web, for embedding linked data in Web pages, and for managing Semantic Web vocabularies. Finally, this talk takes a mildly inquisitive look at the value propositions for linked data in the Web, and how LCSH might be deployed in the Web for better information discovery.
Alistair’s background is in the development of Web technologies for scientific applications. He was a research associate in the e-Science department of the Rutherford Appleton Laboratory, where he was introduced to Semantic Web technologies and first developed SKOS. He has recently moved to the University of Oxford to work on linking fruit fly genomics research data, and he hopes everything he knows about the Semantic Web will turn out to be useful after all.
Tags: cataloging, lcsh, libraries, links, metadata, owl, rdf, skos, web
Posted in libraries, opensource, semweb
Friday, January 11th, 2008
access accessible addition al american analysis application applications appropriate archives areas association authority available based benefit benefits bibliographic broad broader catalog catalogers cataloging catalogs cataloguing chain change changes classification code collaboration collections committee communities community congress consequences consider considered content continue control controlled cooperative cost costs create created creating creation current data databases dc description descriptive desired develop developed development different digital discovery distribution dublin ed education effort encourage enhance environment et evidence exchange exist findings focus format formats frameworks frbr future greater group headings hidden identifiers identify ifla impact include including increase increasingly information institution institutions international knowledge language lc lcs lcsh libraries limited lis maintaining make management marc materials metadata model national need needs networks new number oclc online organization organizations outcomes outside participants particular pcc possible potential practice practices primary principles process processes production program programs provide public publishers quo range rare rda recommendations records reference relationships report require requirements research resource resources responsibility results role rules search serve service services share shared sharing sources special specific standards states status subject supply support systems technology terms time today tools types unique united university use used users using value variety various vendors vocabularies washington ways web working works
Same stats as before, but the top 200 words this time, and as a cloud. It’s crying out for some kind of stemming to collapse some terms together, I suppose… but it’s also 3:17 AM.
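The stemming idea could be roughed out along these lines. This is only a sketch: `crude_stem`, `collapse`, and `cloud` are made-up names, the suffix rules are a naive stand-in for a real stemmer (they would mangle words like "status" or "analysis"), and the sample counts are a handful of figures from the earlier word-count post.

```python
from collections import Counter


def crude_stem(word):
    """Strip a couple of common plural endings; not a real stemmer."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"      # libraries -> library
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]            # catalogs -> catalog
    return word


def collapse(counts):
    """Merge the counts of words that share the same crude stem."""
    merged = Counter()
    for word, n in counts.items():
        merged[crude_stem(word)] += n
    return merged


def cloud(counts, top=200):
    """Return the `top` most frequent words, alphabetized, as one string."""
    top_words = [word for word, _ in Counter(counts).most_common(top)]
    return " ".join(sorted(top_words))


# A few counts taken from the earlier word-count post.
sample = {
    "library": 263, "libraries": 144,
    "records": 88, "record": 73,
    "catalog": 28, "catalogs": 16,
    "access": 57,
}

merged = collapse(sample)
print(merged["library"], merged["record"], merged["catalog"])
print(cloud(merged, top=3))
```

With only two rules it collapses simple plurals and leaves words like "access" alone; anything more ambitious probably calls for a proper stemming library.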
Tags: bibliography, cataloging, cloud, data, lc, libraries
Posted in libraries
Thursday, January 10th, 2008
| word | count |
| library | 263 |
| bibliographic | 236 |
| data | 170 |
| libraries | 144 |
| lc | 127 |
| control | 109 |
| information | 98 |
| cataloging | 91 |
| records | 88 |
| subject | 82 |
| materials | 81 |
| standards | 81 |
| use | 80 |
| congress | 79 |
| work | 76 |
| record | 73 |
| community | 67 |
| users | 61 |
| working | 59 |
| group | 58 |
| access | 57 |
| recommendations | 56 |
| resources | 53 |
| authority | 52 |
| metadata | 47 |
| future | 46 |
| new | 40 |
| environment | 37 |
| development | 37 |
| web | 36 |
| collections | 35 |
| systems | 35 |
| available | 35 |
| creation | 35 |
| services | 34 |
| headings | 32 |
| national | 31 |
| findings | 30 |
| research | 30 |
| unique | 29 |
| sharing | 29 |
| oclc | 28 |
| model | 28 |
| catalog | 28 |
| international | 27 |
| develop | 27 |
| value | 27 |
| lcsh | 26 |
| pcc | 26 |
| user | 26 |
| need | 26 |
| report | 25 |
| make | 25 |
| practices | 25 |
| rda | 25 |
| used | 25 |
| time | 24 |
| needs | 24 |
| rare | 24 |
| including | 24 |
| provide | 23 |
| discovery | 23 |
| communities | 23 |
| special | 23 |
| frbr | 23 |
| current | 22 |
| resource | 22 |
| rules | 22 |
| digital | 21 |
| cooperative | 21 |
| program | 21 |
| participants | 21 |
| management | 21 |
| service | 20 |
| dc | 20 |
| programs | 20 |
| online | 20 |
| costs | 20 |
| washington | 20 |
| standard | 19 |
| support | 19 |
| knowledge | 19 |
| different | 19 |
| appropriate | 19 |
| effort | 18 |
| applications | 18 |
| marc | 18 |
| shared | 18 |
| exchange | 18 |
| process | 18 |
| changes | 17 |
| lcs | 17 |
| increase | 16 |
| public | 16 |
| search | 16 |
| creating | 16 |
| broader | 16 |
| catalogs | 16 |
| controlled | 16 |
I converted the PDF to a text file called ‘lc’ with xpdf and then wrote a little Python:
#!/usr/bin/env python3
from re import sub
from urllib.request import urlopen

# Fetch a whitespace-separated stop word list; a set makes lookups fast.
stop_words = set(urlopen('http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words').read().decode().split())

# 'lc' is the plain-text file produced from the PDF by xpdf.
with open('lc') as f:
    text = f.read()

counts = {}
for word in text.split():
    word = word.lower()
    word = sub(r'\W', '', word)   # strip punctuation
    word = sub(r'\d+', '', word)  # strip digits
    if word == '' or word in stop_words:
        continue
    counts[word] = counts.get(word, 0) + 1

# Sort words by descending count and print the top 100.
words = sorted(counts, key=counts.get, reverse=True)
for word in words[:100]:
    print("%20s %i" % (word, counts[word]))
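For trying the same counting on a small string, without the PDF or the network, a variant can be sketched in Python 3 with `collections.Counter`. The function name `word_counts`, the sample sentence, and the two-word stop list are all made up for illustration.

```python
import re
from collections import Counter


def word_counts(text, stop_words=()):
    """Count lowercased words after dropping punctuation, digits and stop words."""
    stop = set(stop_words)
    counts = Counter()
    for token in text.split():
        # Remove non-word characters and digits in a single pass.
        word = re.sub(r"[\W\d]", "", token.lower())
        if word and word not in stop:
            counts[word] += 1
    return counts


sample = "The Library of Congress: library data, 2008 data!"
counts = word_counts(sample, stop_words=["the", "of"])

# Print the two most frequent words in the same layout as the script above.
for word, n in counts.most_common(2):
    print("%20s %i" % (word, n))
```

`Counter.most_common` does the sorting, so the explicit sort step goes away.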
Does me writing code to read the report count as reading the report? …
Tags: bibliography, cataloging, data, lc, libraries, metadata, word
Posted in libraries, python