This morning Clay and I were chatting about Library of Congress Subject Headings and SKOS a bit. At one point we found ourselves musing about how much reuse there is of topical subdivisions in topical headings in the LC authority file. You know how it is. Anyhow, I remembered that I’d used marcdb to import all of Simon Spiro’s authority data–so I fired up psql and wrote a query:

SELECT subfields.value AS subdivision, count(*) AS total
FROM subfields, data_fields
WHERE subfields.code = 'x'
  AND subfields.data_field_id = data_fields.id
  AND data_fields.tag = '150'
GROUP BY subfields.value
ORDER BY total DESC;

And a few seconds later…

 subdivision                          | total  
--------------------------------------+-------
 Law and legislation                  |  3342
 Religious aspects                    |  2500
 Buddhism, [Christianity, etc.]       |   898
 History                              |   847
 Equipment and supplies               |   571
 Taxation                             |   566
 Baptists, [Catholic Church, etc.]    |   476
 Diseases                             |   450
 Research                             |   422
 Campaigns                            |   378
 Awards                               |   342
 Finance                              |   284
 Study and teaching                   |   284
 Surgery                              |   275
 Employees                            |   269
 Spectra                              |   261
 Computer programs                    |   259
 Labor unions                         |   218
 Testing                              |   207
 Diagnosis                            |   194
 Isotopes                             |   190
 Complications                        |   183
 Physiological effect                 |   172
 Programming                          |   163

There’s nothin’ like the smell of strong set theory in the morning. Although something seems a bit fishy about [Christianity, etc.] and [Catholic Church, etc.]… If you want to try similar stuff and don’t want to wait hours for marcdb to import all the data and you use postgres, here’s the full database dump which you ought to be able to import:

  % createdb authorities
  % wget http://inkdroid.org/data/authorities.sql.bz2
  % bunzip2 authorities.sql.bz2
  % psql authorities < authorities.sql