BIBFRAME Scriptshifter Utility Enhancement

Location:	District of Columbia
Posted:	Jun 17, 2025
Agency:	LIBRARY OF CONGRESS
Type of Contract:	Awards
Type of Government:	Federal
Category:	23 - Ground Effect Vehicles, Motor Vehicles, Trailers, and Cycles
Solicitation No:	2025-LGC-0006

Active

Contract Opportunity

Notice ID: 2025-LGC-0006

Department/Ind. Agency: LIBRARY OF CONGRESS
Sub-tier: LIBRARY OF CONGRESS
Office: CONTRACTS SERVICES

Award Details

Contract Award Date: Jun 13, 2025
Contract Award Number: LCLGD25P0012
Contractor Awarded Unique Entity ID: GB9ARM36YKG5
Contractor Awarded Name: Stefano Cossu
Contractor Awarded Address: Philadelphia, PA 19143 USA
Base and All Options Value (Total Contract Value): $112,000.00

General Information

Contract Opportunity Type: Award Notice (Original)
Original Published Date: Jun 17, 2025 01:26 pm EDT
Inactive Policy: 15 days after contract award date
Original Inactive Date: Jun 28, 2025
Initiative:
- None

Classification

Original Set Aside: No Set aside used
Place of Performance: Washington, DC, USA

Description

The Library of Congress (LC) collects information resources from around the world and makes them known and available to potential users through bibliographic records that describe them. These records contain typical bibliographic data such as titles, names of creators and other associated entities, and publication information such as publisher and place of publication. In the publications themselves, this information appears in various scripts, including many non-Latin scripts such as Chinese, Korean, Cyrillic, Arabic, Greek, Hebrew, and over 30 others. Where technology allows, the Library records descriptive information in the original script of the item being cataloged, but it must transliterate certain data elements into the Latin script for various processing components and, in some cases, to assist end users. When technology does not support non-Latin scripts, LC staff must manually transliterate more of the descriptive information into the Latin script.

The Library of Congress maintains transliteration tables for over 75 languages and scripts, published as ALA-LC Romanization Tables: Transliteration Schemes for Non-Roman Scripts and available on its website: http://www.loc.gov/catdir/cpso/roman.html. LC and the American Library Association (ALA) maintain these tables jointly, and they are used by libraries in the United States and by many libraries elsewhere. The Library of Congress catalog contains several million bibliographic records for resources in non-Latin scripts, and the Library collects over 75,000 additional non-Latin resources each year. The Library therefore requires a utility that can transliterate between non-Latin and Latin scripts using the tables approved by LC and ALA.

A utility called Scriptshifter has been developed to transliterate 20+ scripts in the Balkan/Caucasian, Slavic, Turkic, and Chinese script families, as well as Korean, Greek, Arabic, and Hebrew. Scriptshifter must be enhanced continually to add scripts and to improve its handling of very complex scripts in which the Library receives resources, such as Arabic, the Southeast Asian scripts, and Asian scripts like Japanese. The software framework also needs continual updating to improve efficiency as technical possibilities and the transliteration tables change.

The Contractor shall design, code, test, and document additions to Scriptshifter's capability to transliterate non-Latin data into the Latin alphabet according to the ALA-LC Romanization Tables and, where possible, to convert Latin-script transliterations back into non-Latin script. The work will focus on research and improvement for Indic and related scripts such as Devanagari and Brahmi; Southeast Asian scripts such as Thai, Laotian, Khmer, Burmese, and Tibetan; and languages written in Arabic script such as Kurdish, Sindhi, Persian, Pushto, Urdu, and Mophah. In addition, the Contractor will research the development of a Japanese transliteration tool and refine support for other Asian scripts such as Korean and Chinese. More specifically, the Contractor shall:

- Improve Persian transliteration.
- Improve Thai (Roman-to-script direction only).
- Improve the handling of South Asian languages that the current software transliterates inaccurately.
- Implement Lao.
- Improve Tibetan: fix the tables in the current software or create a new table managed by Scriptshifter (SS).
- Improve overall testing.
- Research existing tools for transliterating Japanese and, if an appropriate tool is identified, implement script-to-Roman (S2R) transliteration only.
- Research a better machine learning model for Arabic scripts, possibly one usable with multiple languages.
- Implement a better method to separate words.
- Research possible solutions for other Arabic-script languages with less available data (Pushto, Urdu, etc.).
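One common reason a transliteration tool needs a word-separation method is that scripts such as Thai and Lao are written without spaces between words. The notice does not say which scripts the word-separation task targets or what method will be used. As an illustration only, the sketch below shows greedy longest-match ("maximal matching") segmentation, a common dictionary-based baseline; the `segment` function and its lexicon argument are hypothetical and not part of Scriptshifter.

```python
# Greedy longest-match ("maximal matching") word segmentation.
# Illustrative baseline only: the lexicon is supplied by the caller and
# is NOT Scriptshifter data; segment() is a hypothetical helper.

def segment(text: str, lexicon: set[str]) -> list[str]:
    """Split text into words, always taking the longest lexicon entry
    that matches at the current position. Characters not covered by any
    entry are emitted as single-character tokens."""
    max_len = max((len(w) for w in lexicon), default=1)
    words: list[str] = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: one unknown (or one-letter) character
        # Try candidate lengths from longest to 2; length 1 is the fallback.
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                match = candidate
                break
        words.append(match)
        i += len(match)
    return words


if __name__ == "__main__":
    # Toy Thai lexicon: "ภาษา" (language) and "ไทย" (Thai).
    print(segment("ภาษาไทย", {"ภาษา", "ไทย"}))
```

Greedy matching is fast but can mis-split text where choosing a shorter first word would produce a better overall segmentation; statistical or machine-learned segmenters are the usual alternatives.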

Where feasible, the utility must also perform reverse transliteration, converting Latin-script transliterated strings into non-Latin strings. It must remain adjustable as the transliteration tables change.
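The requirements that the utility work in both directions and remain adjustable as transliterations change suggest keeping the mappings in data rather than in code. The sketch below illustrates that idea under stated assumptions: `TABLE` is a small illustrative subset of Russian Cyrillic romanizations, not the official ALA-LC table or Scriptshifter's actual configuration format, and the `transliterate` and `invert` helpers are hypothetical. A single longest-match function handles script-to-Roman and, through an inverted table, Roman-to-script.

```python
# Table-driven transliteration with greedy longest-match lookup.
# TABLE is an illustrative subset of Russian Cyrillic values only; it is
# not the official ALA-LC table or Scriptshifter's configuration.

TABLE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "zh", "з": "z", "и": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "х": "kh", "ч": "ch", "ш": "sh",
}


def transliterate(text: str, table: dict[str, str]) -> str:
    """Replace each span of text with the value of the longest matching
    table key; characters with no entry pass through unchanged."""
    max_len = max((len(k) for k in table), default=1)
    out: list[str] = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            # No entry at this position: copy the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


def invert(table: dict[str, str]) -> dict[str, str]:
    """Build the reverse (Roman-to-script) table. Raises ValueError when
    two source keys share a romanization, since the reverse mapping would
    then be ambiguous without extra rules."""
    reverse: dict[str, str] = {}
    for src, dst in table.items():
        if dst in reverse:
            raise ValueError(f"ambiguous reverse mapping for {dst!r}")
        reverse[dst] = src
    return reverse


if __name__ == "__main__":
    roman = transliterate("хорошо", TABLE)
    print(roman, transliterate(roman, invert(TABLE)))
```

In this sketch, `transliterate("хорошо", TABLE)` yields `"khorosho"`, and applying the inverted table restores `"хорошо"`; longest-match also keeps `кх` (romanized `kkh`) from being misread as `к` + `h`. Real tables are harder to reverse: where two characters share a romanization, `invert` raises an error, marking the places where disambiguating rules (such as the ligature ties some ALA-LC tables use) would be needed.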

The Contractor will also review the software utility as a whole and make general improvements to its framework. One area of focus will be Aksharamukha, a tool that has been incorporated into Scriptshifter for Asian scripts. Work will also address authentication and external use of the tool.

Contact Information

Contracting Office Address

101 Independence Ave SE LA 325
Washington, DC 20540
USA

Primary Point of Contact

Betsy Lewis-Matsuoka
bmatsuoka@loc.gov
Phone Number: 202-707-0170

History

Jun 17, 2025 01:26 pm EDT Award Notice (Original)
Jun 06, 2025 10:49 am EDT Special Notice (Original)