natha 0 Newbie Poster

Please forgive the long post, but I thought it best to be thorough, even if I'm only looking for pointers.

I am working on a schema for a PostgreSQL database that is meant to contain the metadata for all of our digital assets distributed online: audio, video, multimedia, and text/HTML publications on our web site. It will serve as the back end driving a front-end CMS. We are still in the early stages of design. Having looked into the back ends of Drupal, Joomla, WordPress et al. (and used some of them), I'm convinced that we need a more human-readable framework, so I've adopted the Dublin Core (DCMI) set of terms as a starting point, with additional field names from the Media Annotation Initiative. My problem is that the DCMI vocabulary is RDF in nature and doesn't translate very well into a database schema. (See the initial field lists below; some of the fields will be keys from other tables.)

I'm a bit of a newbie and obviously out of my depth, but being fearless I go forward anyway... My question today is threefold:

1) Are there any existing models for digital-library assets that I could look at? I feel like I'm reinventing the wheel, though that may not be a bad idea.
2) Suppose a single "title" of a digital asset has six incarnations: print (the URL locator would be in our eStore), a text/HTML version (free online), an audio version, an ePub version (available through the iBookstore), a PDF, and an iPhone app. I'm not sure how best to build this. Do we have a single record for the title, plus additional "keys" that tell us which formats the title is available in? Or is it better to have a separate record for each format, given that they have very different properties (URL locators, MIME types, etc.) for the same "digital asset"?

3) Many of the text/HTML resources are better handled by putting the data to be presented into the database itself and then using CGI and JS to drive runtime delivery. This is especially true where we may have a title in several languages. Dublin Core addresses metadata for "unitary" assets; it comes from the library model, where the "object" for which metadata is stored is a book or an article.
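
To make the first option in question 2 concrete (one record for the title, plus one record per format), here is a minimal sketch. It is only an illustration under assumptions: Python's built-in sqlite3 module stands in for PostgreSQL, and the table names, column names, and example.org URLs are all hypothetical, not part of our actual design.

```python
import sqlite3

# SQLite in memory stands in for PostgreSQL; every name below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE title (
    title_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    creator  TEXT
);
-- One row per incarnation of a title; format-specific properties live here.
CREATE TABLE title_format (
    format_id INTEGER PRIMARY KEY,
    title_id  INTEGER NOT NULL REFERENCES title(title_id),
    mime_type TEXT NOT NULL,
    locator   TEXT NOT NULL
);
""")

conn.execute(
    "INSERT INTO title (title_id, title, creator) VALUES (1, 'Example Title', 'Example Author')"
)
conn.executemany(
    "INSERT INTO title_format (title_id, mime_type, locator) VALUES (?, ?, ?)",
    [
        (1, "text/html", "https://example.org/read/example-title"),
        (1, "application/pdf", "https://example.org/pdf/example-title.pdf"),
        (1, "application/epub+zip", "https://example.org/epub/example-title.epub"),
    ],
)

def formats_for(title_id):
    """Return (mime_type, locator) pairs for every format of one title."""
    return conn.execute(
        "SELECT mime_type, locator FROM title_format WHERE title_id = ? ORDER BY mime_type",
        (title_id,),
    ).fetchall()

available = formats_for(1)
print(available)
```

In this sketch the shared metadata (title, creator) is stored once, and adding a seventh format later would mean adding a row rather than a column. Whether that outweighs the simplicity of one flat record per format is exactly what I am unsure about.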

For some of what we do, we would like to access the text/HTML data in a very "atomic" form. For example, from a series of verses from the Rig-Veda (10,000 or more), we may only keep 2,000 online. I'm trying to find a model (I'm sure one exists in the world of academic database design) for dealing with what I call "text fragments": not entire books but parts of a book or parts of a work, where, e.g., a single record contains a single verse. The citation for a verse is easily defined by bibliographic-library standards, where a typical full citation might include some or all of the following:

Collection Title, Volume No., Part, Section, Chapter, Verse Number (or paragraph number), e.g.

Atharva Veda, Bk 1, Part 1, Section 1, Chapter 4, Vs 18

How do we set up a schema for the text fragments so that we can meet this output requirement: fragments that are adjacent in the original work need to be gathered and concatenated correctly as a unit for web delivery? For example:

Atharva Veda, Bk 1, Part 1, Section 1, Chapter 4, Vs 18
Atharva Veda, Bk 1, Part 1, Section 1, Chapter 4, Vs 19
Atharva Veda, Bk 1, Part 1, Section 1, Chapter 4, Vs 20
(where the above may all be about the same subject, e.g. law and order)

Our real-life use case is Vedic verses, where the current concept is one verse per record. How do we get all five of the verses that belong together to come out together on output? Are they tied by subject? Are they tied by precise ordinal enumerators in the bibliographic citation? If so, how do we handle such a citation in terms of database fields? Or would it be better to keep all the verses for a single chapter in a single record with some delimiter, and then programmatically extract them by number on query (get the third line of field "chapter 4" = ch. 4, vs. 3)?
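
As a rough sketch of the "ordinal enumerators" idea with one verse per record, each level of the citation could be its own integer column, so that adjacent verses are gathered with a range query and ordered by verse number. This is an assumption-laden illustration, not a tested design: sqlite3 stands in for PostgreSQL, the table and column names are hypothetical, and the verse text is placeholder data.

```python
import sqlite3

# SQLite in memory stands in for PostgreSQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE verse (
    work     TEXT    NOT NULL,
    book     INTEGER NOT NULL,
    part     INTEGER NOT NULL,
    section  INTEGER NOT NULL,
    chapter  INTEGER NOT NULL,
    verse_no INTEGER NOT NULL,
    subject  TEXT,
    content  TEXT    NOT NULL,
    -- The full citation is the key, so each verse can be stored only once.
    PRIMARY KEY (work, book, part, section, chapter, verse_no)
);
""")

# Rows are inserted out of order on purpose: order comes from verse_no, not insertion.
conn.executemany(
    "INSERT INTO verse VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("Example Work", 1, 1, 1, 4, 20, "law and order", "text of verse 20"),
        ("Example Work", 1, 1, 1, 4, 18, "law and order", "text of verse 18"),
        ("Example Work", 1, 1, 1, 4, 21, "another subject", "text of verse 21"),
        ("Example Work", 1, 1, 1, 4, 19, "law and order", "text of verse 19"),
    ],
)

def passage(work, book, part, section, chapter, first, last):
    """Return verses first..last of one chapter, joined in citation order."""
    rows = conn.execute(
        """
        SELECT content FROM verse
        WHERE work = ? AND book = ? AND part = ? AND section = ? AND chapter = ?
          AND verse_no BETWEEN ? AND ?
        ORDER BY verse_no
        """,
        (work, book, part, section, chapter, first, last),
    ).fetchall()
    return "\n".join(content for (content,) in rows)

unit = passage("Example Work", 1, 1, 1, 4, 18, 20)
print(unit)
```

With this layout the subject column stays available for searching, while the grouping on output relies only on the citation numbers. In PostgreSQL itself the concatenation could probably be done in the query with string_agg and an ORDER BY inside the aggregate, rather than in application code.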

Here is what I have so far, without a solution to the above problem of storing text fragments and the fields for their bibliographic citation.

Can anyone point me to existing models for digital assets that I can look at, and to models for storing text fragments and their citations?

Thanks. Again, sorry this is so long, but sometimes more detail is better.

Natha

---------


Table: Collection
* collection_id # KEY
* name

Table: Component
* component_id # KEY
* description
* collection_id # from Collection

Table: Item
* item_id
* component_id #From Component
* title # Can include a subtitle or “tagline”. Follow the convention: separate title from subtitle with space, colon, space, e.g. “The Life Cycle of Oil Spills : What Happens 15 Years After the Clean Up?” Keep subtitles short; more info goes into “description”.
* creator # maps to author, composer, artist
* subject # CONTROLLED LIST, multiple words/phrases allowed
* description # obvious
* publisher # obvious
* contributor # obvious
* date # obvious
* type # MIME type. CONTROLLED LIST.
* identifier # Not sure what this maps to. TO DO: find out what values are appropriate for “identifier” (“ma-identifier”). A URI identifies an entity, which can be either a “Resource” (abstract concept) or a “Representation” (instance/file). See “4.4 Annotating Media Fragments”. What does this mean?
* source # obvious
* Location # obvious
* language # obvious
* relation # Use the MA standard for this: a pair identifying the resource and the nature of the relationship, e.g. transcript : URL to audio.
* coverage # CONTROLLED LIST OF GEO AREAS, e.g. Kauai, Mauritius, Sri Lanka, India, USA-West, USA-East, UK, etc.
* IsFormatOf # ?? I think this must key into another table.

* content: the text itself, where we choose to keep it in the database
* file_size: in bytes. Where required.

Additions from the Media Annotation Initiative (these replace DCMI “format” and “rights” with more specific fields):

* keywords: CONTROLLED LIST
* genre: for music, art
* FrameSize: in pixels; use for image width and height, or for movies or multi-media
* Copyright: This plus policy, take the place of generic “Rights”
* Policy: Second half of “rights” when needed.
* Compression: mimetype plus codec (see RFC 4281)
* Locator: URL that points to the content.
* Audience: can use as a flag for whether a resource is internal or available for public distribution. CONTROLLED LIST
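
To show how the keys in the lists above might fit together, here is a minimal sketch of the Collection, Component, and Item tables, with the "subject" controlled list modeled as its own table plus a link table so that an item can carry several subjects. This is one possible reading of the lists, not a settled design: sqlite3 stands in for PostgreSQL, only a few of the Item fields are included, and the subject and item_subject tables and all sample values are hypothetical.

```python
import sqlite3

# SQLite in memory stands in for PostgreSQL; subject/item_subject are hypothetical additions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE collection (
    collection_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE component (
    component_id  INTEGER PRIMARY KEY,
    description   TEXT,
    collection_id INTEGER NOT NULL REFERENCES collection(collection_id)
);
CREATE TABLE item (
    item_id      INTEGER PRIMARY KEY,
    component_id INTEGER NOT NULL REFERENCES component(component_id),
    title        TEXT NOT NULL
);
-- Controlled list: only terms present here can be attached to an item.
CREATE TABLE subject (
    subject_id INTEGER PRIMARY KEY,
    term       TEXT NOT NULL UNIQUE
);
CREATE TABLE item_subject (
    item_id    INTEGER NOT NULL REFERENCES item(item_id),
    subject_id INTEGER NOT NULL REFERENCES subject(subject_id),
    PRIMARY KEY (item_id, subject_id)
);
""")

conn.execute("INSERT INTO collection VALUES (1, 'Example Collection')")
conn.execute("INSERT INTO component VALUES (1, 'Example component', 1)")
conn.execute("INSERT INTO item VALUES (1, 1, 'Example Item')")
conn.executemany("INSERT INTO subject VALUES (?, ?)", [(1, "law"), (2, "ritual")])
conn.executemany("INSERT INTO item_subject VALUES (?, ?)", [(1, 1), (1, 2)])

def subjects_of(item_id):
    """Return the controlled-list terms attached to one item, alphabetically."""
    rows = conn.execute(
        """
        SELECT s.term FROM item_subject AS link
        JOIN subject AS s ON s.subject_id = link.subject_id
        WHERE link.item_id = ?
        ORDER BY s.term
        """,
        (item_id,),
    ).fetchall()
    return [term for (term,) in rows]

# A subject id that is not in the controlled list should be rejected.
try:
    conn.execute("INSERT INTO item_subject VALUES (1, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(subjects_of(1), rejected)
```

The same link-table pattern could presumably be reused for the other CONTROLLED LIST fields (keywords, coverage, Audience), though for a field that takes only one value per item a plain foreign key column may be enough.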