Data Provenance
Organizations that publish large amounts of data constantly face the problem of notifying consumers when a data set changes. We are working on defining an XML format that can be used to syndicate those changes to consumers. Ideally, this format would be included in an Atom or XML feed to which consumers can subscribe.
<dataset id="sunlight:earmarks:recipients"
         xmlns="http://sunlightlabs.com/cda/provenance">
  <insert>
    <key name="id" type="integer">00001</key>
    <element name="name" type="string">County Hosp. Corp.</element>
    <element name="description">fix description later</element>
  </insert>
  <update>
    <key name="id" type="integer">00001</key>
    <element name="name" type="string">County Hospital Corp.</element>
    <element name="description" />
  </update>
  <delete>
    <key name="id" type="integer">00001</key>
  </delete>
</dataset>
The example above shows the change records produced by inserting, updating, and deleting a record in the data set. A few elements of note:
- dataset
- One or more dataset elements may be included in the feed. Each dataset should refer to a table or other data structure. The id attribute is a URI that serves as the unique identifier for that data structure.
- key
- The key element references the unique identifier for each data entry. For databases this would be the primary key of the table. Multiple key elements may be used for compound keys.
- element
- Represents an individual piece of data within the set. The type attribute indicates the type of the data. We do not currently have a standard set of values for this attribute.
- insert, update, delete
- Self-descriptive data operations. Each one identifies the affected entry by its key element(s).
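To make the element descriptions above concrete, here is a minimal sketch of how a consumer might apply one dataset element to a local copy of the data. It is not part of the format. The function name `apply_dataset`, the in-memory dictionary layout, and several behaviors are assumptions made for illustration: operations are applied in document order, an empty element (such as the description in the update above) is stored as None, and the type attribute is ignored because no standard exists for it yet.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example document above.
NS = "{http://sunlightlabs.com/cda/provenance}"


def apply_dataset(dataset, tables):
    """Apply one <dataset> element's operations, in document order.

    `tables` maps a dataset id to {key: {element name: text}}.
    This layout is hypothetical, chosen only for the sketch.
    """
    rows = tables.setdefault(dataset.get("id"), {})
    for op in dataset:
        # Multiple <key> children form a compound key, kept as a tuple.
        key = tuple((k.get("name"), k.text) for k in op.findall(NS + "key"))
        # Values stay as strings; the type attribute is not interpreted.
        fields = {e.get("name"): e.text for e in op.findall(NS + "element")}
        kind = op.tag.rsplit("}", 1)[-1]  # strip the namespace prefix
        if kind == "insert":
            rows[key] = fields
        elif kind == "update":
            rows.setdefault(key, {}).update(fields)
        elif kind == "delete":
            rows.pop(key, None)
    return tables


FEED = """<dataset id="sunlight:earmarks:recipients"
         xmlns="http://sunlightlabs.com/cda/provenance">
  <insert>
    <key name="id" type="integer">00001</key>
    <element name="name" type="string">County Hosp. Corp.</element>
    <element name="description">fix description later</element>
  </insert>
  <update>
    <key name="id" type="integer">00001</key>
    <element name="name" type="string">County Hospital Corp.</element>
    <element name="description" />
  </update>
</dataset>"""

tables = apply_dataset(ET.fromstring(FEED), {})
print(tables)
```

Keeping the key as a tuple of (name, value) pairs means single and compound keys are handled the same way. A real consumer would more likely translate each operation into a statement against its own database than hold the rows in memory.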