more marcdb
This morning Clay and I were chatting about Library of Congress Subject Headings and SKOS a bit. At one point we found ourselves musing about how much reuse there is of topical subdivisions in topical headings in the LC authority file. You know how it is. Anyhow, I remembered that I’d used marcdb to import all of Simon Spero’s authority data, so I fired up psql and wrote a query:
SELECT subfields.value AS subdivision, count(*) AS total
FROM subfields, data_fields
WHERE subfields.code = 'x'
  AND subfields.data_field_id = data_fields.id
  AND data_fields.tag = '150'
GROUP BY subfields.value
ORDER BY total DESC;
And a few seconds later…
             subdivision              | total
--------------------------------------+-------
 Law and legislation                  |  3342
 Religious aspects                    |  2500
 Buddhism, [Christianity, etc.]       |   898
 History                              |   847
 Equipment and supplies               |   571
 Taxation                             |   566
 Baptists, [Catholic Church, etc.]    |   476
 Diseases                             |   450
 Research                             |   422
 Campaigns                            |   378
 Awards                               |   342
 Finance                              |   284
 Study and teaching                   |   284
 Surgery                              |   275
 Employees                            |   269
 Spectra                              |   261
 Computer programs                    |   259
 Labor unions                         |   218
 Testing                              |   207
 Diagnosis                            |   194
 Isotopes                             |   190
 Complications                        |   183
 Physiological effect                 |   172
 Programming                          |   163
There’s nothin’ like the smell of strong set theory in the morning. Although something seems a bit fishy about [Christianity, etc.] and [Catholic Church, etc.]…

If you want to try similar stuff, you use Postgres, and you don’t want to wait hours for marcdb to import all the data, here’s the full database dump, which you ought to be able to import:
% createdb authorities
% wget http://inkdroid.org/data/authorities.sql.bz2
% bunzip2 authorities.sql.bz2
% psql authorities < authorities.sql
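If you just want to get a feel for the shape of the query before pulling down the whole dump, here is a rough sketch that runs it against a toy in-memory SQLite database from Python. The two tables only model the columns the query above touches (the real marcdb schema surely has more), and the rows are invented for illustration. The second query is my guess at one way to chase down the fishy bracketed subdivisions, assuming the heading itself lives in subfield a of the same field.

```python
import sqlite3

# Toy stand-in for the marcdb tables: only the columns used by the query
# in the post are modeled, and every row below is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data_fields (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE subfields (
        id INTEGER PRIMARY KEY,
        data_field_id INTEGER REFERENCES data_fields(id),
        code TEXT,
        value TEXT
    );
""")

# (data_field id, tag, [(subfield code, value), ...])
fields = [
    (1, "150", [("a", "Abortion"), ("x", "Law and legislation")]),
    (2, "150", [("a", "Abortion"), ("x", "Religious aspects"),
                ("x", "Buddhism, [Christianity, etc.]")]),
    (3, "150", [("a", "Banks and banking"), ("x", "Law and legislation")]),
    (4, "550", [("a", "Birth control"), ("x", "Law and legislation")]),
]
for field_id, tag, subs in fields:
    conn.execute("INSERT INTO data_fields (id, tag) VALUES (?, ?)", (field_id, tag))
    conn.executemany(
        "INSERT INTO subfields (data_field_id, code, value) VALUES (?, ?, ?)",
        [(field_id, code, value) for code, value in subs],
    )

# Same counting query as in the post, written with an explicit JOIN.
# The extra sort key only makes ties come out in a stable order.
counts = conn.execute("""
    SELECT s.value AS subdivision, count(*) AS total
    FROM subfields s
    JOIN data_fields d ON s.data_field_id = d.id
    WHERE s.code = 'x' AND d.tag = '150'
    GROUP BY s.value
    ORDER BY total DESC, subdivision
""").fetchall()

# Which headings carry a bracketed "[..., etc.]" subdivision?
# Assumes the heading text is in subfield a of the same data field.
bracketed = conn.execute("""
    SELECT a.value AS heading, x.value AS subdivision
    FROM subfields x
    JOIN subfields a ON a.data_field_id = x.data_field_id AND a.code = 'a'
    JOIN data_fields d ON x.data_field_id = d.id
    WHERE x.code = 'x' AND d.tag = '150' AND x.value LIKE '%[%etc.]'
    ORDER BY heading
""").fetchall()

for subdivision, total in counts:
    print(f"{subdivision:<36} | {total}")
for heading, subdivision in bracketed:
    print(f"{heading} -- {subdivision}")
```

Both SELECT statements should run unchanged in psql against the real authorities database, as long as the table and column names match what the original query uses.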