The metadata spreadsheet is a list of all items that were added to the SUCHO collection in the Internet Archive before we had a clearly laid out metadata template. The headings of this spreadsheet indicate the metadata we would ideally like for each SUCHO item. While we realize that filling in each cell will not always be possible, the columns that are mandatory for IA are indicated with an * in the header.
Add your name or initials in Column B claimed by for a selection of rows you’d like to work on, and use the Column A Status dropdown to mark them as In Progress.
Click on the URL in Column D identifier and open the link. Examine the item. It may be helpful to look at the links after 'scraped from' and 'linked from' in Column O description to gain additional context and information about the item. These links are also found in the description on the IA page for the item.
- Sometimes an item will display with the screen "There is no preview available for this item." In these cases, check the file types listed in the download options.
- If the only options are HTML and/or TORRENT, mark the row as "Remove from IA" in Column A and in Column M comment "This is a [webpage/torrent] file."
- If ZIP is an option (meaning the item is a zip folder), describe the folder as a unit. For example, the CFP and the program for the same conference may have been zipped together.
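For anyone who prefers to see the two rules above as logic, here is a minimal sketch in Python. The function name `triage_download_options`, the returned dictionary keys, and the idea of passing the download options as a list of strings are all assumptions made for illustration; they are not part of the spreadsheet or of any IA tool.

```python
def triage_download_options(formats):
    """Suggest spreadsheet values from the file types in an item's download options.

    `formats` is a hypothetical list of labels such as ["HTML", "TORRENT"].
    """
    available = {f.strip().upper() for f in formats if f.strip()}

    # Rule 1: only HTML and/or TORRENT are offered, so the item should be removed.
    if available and available <= {"HTML", "TORRENT"}:
        kinds = []
        if "HTML" in available:
            kinds.append("webpage")
        if "TORRENT" in available:
            kinds.append("torrent")
        return {
            "status": "Remove from IA",
            "comment": f"This is a {'/'.join(kinds)} file.",
            "note": None,
        }

    # Rule 2: a ZIP is offered, so the folder is described as one unit.
    if "ZIP" in available:
        return {"status": None, "comment": None,
                "note": "Describe the zip folder as a unit."}

    # Anything else is not covered by the two rules (assumption: review by hand).
    return {"status": None, "comment": None,
            "note": "No rule applies; examine the item manually."}
```

For example, `triage_download_options(["TORRENT"])` suggests the status "Remove from IA" with the comment "This is a torrent file."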
Below is an example of what the first case often looks like:
Occasionally, the wrong image may have been attached to the scraped metadata in the description Column O (see rows 2169-2180, where the same image has been attached to all 11 records). This will become apparent if the image linked to in the URL after "image linked from" does not match the image displayed on IA. In those instances:
- Mark the row as "Needs Help" in Column A, and in Column M comment "incorrect image, needs to be replaced." Anna or Alex will be able to correct this in the IA collection.
- Fill out the metadata as if you were describing the correct image.
Often there are images that are logos from webpages or advertisements. Use your best judgment here: these are not part of the goal of saving Cultural Heritage Collections. If you do not feel that your time is best spent locating information about these, simply mark them as Low Priority in the Column A Status and leave the rest of the metadata as is. Note: Institutional logos are NOT low priority images (the web designers’ generic stamps are). See these examples:
Determine, as best you can, what the proper name of the resource should be (in its original language wherever possible) and edit the data in Column C title.
- When images are scraped from a website, it is useful to indicate if the image is a thumbnail (a smaller version of a full digital image). If describing a thumbnail, please add [thumbnail] after the title. Here is an example of a thumbnail vs. the full image:
- Thumbnail on this page
- Thumbnails that are so small that they cannot be easily seen should be categorized as low priority images (see Step 2) Example
- Full image of the same work on this page
For items that do not have a link pointing back to the original item at the host institution, check the 'done' tab of the (insert link to updated main SUCHO spreadsheet) to see if you can find the collection the item came from, then go back to the item record.
- If you are unsure where to locate information about the images, see our FAQ section
- Website down: If the source URL or domain for the item is down, check whether the URL was archived in the Wayback Machine.
- Not actively linked: If a document is not actively linked to on a host institution's website, but the host institution is readily identifiable:
- Find the host institution's website on the Wayback Machine and see if an older version of the website contains the page where the document was found. If you find it, use the Wayback Machine URL as the source URL Column J.
- If no older version of the site exists, give the domain of the host institution as the source URL Column J and put "Source URL to domain; object no longer actively linked to on website" in the comments Column M.
- Example: see row 4396 (Title="Положення про тендерний комітет")
- "Data rescue", "scraped from", and/or "image linked from" domains different: In the description Column O, there are often three URLs: "data rescue copy of", "scraped from", and "image linked from". Normally, all three are from the same or similar domains (e.g., all are "lib.kherson.ua"; or they're "artkavun.kherson.ua" and "lib.kherson.ua"). However, sometimes they're different. In those cases do the following, and continue through the rest of these directions for filling out the metadata:
- If the domains for "scraped from" and "image linked from" match (or are similar), but "data rescue copy of" does not, still fill out the metadata. The "image linked from" URL is likely to be the source URL Column J, and the host institution Column K is the institution of "image linked from".
- If the domains for "scraped from" and "image linked from" don't match, use "image linked from" as the source URL Column J and the institution of that source URL as the host institution Column K.
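The domain comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (`hostname`, `similar`, `pick_source_url`) are hypothetical, each URL is assumed to include its scheme (`http://` or `https://`), and "similar" is approximated as sharing the last two domain labels (so "artkavun.kherson.ua" and "lib.kherson.ua" count as similar). That approximation is crude, so your own judgment still takes precedence.

```python
from urllib.parse import urlparse


def hostname(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def similar(host_a: str, host_b: str, labels: int = 2) -> bool:
    """Assumption: two hosts are 'similar' if their last `labels` labels match."""
    if not host_a or not host_b:
        return False
    return host_a.split(".")[-labels:] == host_b.split(".")[-labels:]


def pick_source_url(data_rescue: str, scraped_from: str, image_linked_from: str) -> dict:
    """Classify the three description URLs and suggest the source URL (Column J)."""
    rescue = hostname(data_rescue)
    scraped = hostname(scraped_from)
    image = hostname(image_linked_from)

    if similar(scraped, image):
        case = "all similar" if similar(rescue, image) else "data rescue differs"
    else:
        case = "scraped from and image linked from differ"

    # In every case the guide points to "image linked from" as the source URL,
    # and to the institution behind that domain as the host institution.
    return {
        "case": case,
        "source_url": image_linked_from,
        "host_institution_domain": image,
    }
```

Note that the suggested source URL is always the "image linked from" URL; the classification only records which situation you are in, which may be worth mentioning in the comments Column M.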
Wherever possible, fill in Columns F to O, using the details in the metadata template. Please keep the following in mind:
- Subject headings should be selected from Column A in the Subject Headings List tab. If you need a term that isn’t currently available there, please see our FAQs for details on how to add one.
- If the item has subject headings (SH) in Ukrainian, please add or move that to Column F original_subject_heading.
- If the item has a description in Ukrainian, please add or move that to Column N original_description.
- Host Institution (Column K): please use the transliterated version of the institution’s name (e.g., Khersons′ka oblasna universal′na naukova biblioteka imeni Olesi͡a Honchara). For transliterating, see the FAQs.
- Host institution and creator are not the same thing. Host institution is a required field, and the host institution is usually the owner of the website (e.g., the library, the archive, the theater, etc.). The creator is the creator of the actual item (the photographer, the author, the painter, etc.).
- Host location (Column L) should be at the city level. See instructions in the metadata template for details on how to fill out this column. Refer to WikiData for location names.
- Consult the FAQs!
- If you have questions or confusion over the Ukrainian text, please use the Status dropdown to mark the row as Translation and ping the folks in the #translation channel in Slack.
- If you can’t figure out the metadata for an item, or you have questions about what has already been entered into IA, change the Column A Status dropdown to Needs Help and add some comments in Column M comments. Someone will come along and try to help or answer your questions.
Once the entire row is complete, change the Status to Done.