JLIFF: Where we are, and where we're going

Chase Tingley
Spartan Software, Inc

https://www.flickr.com/photos/anirvan/12002658/
The Story So Far

XLIFF has always been XML-based
● No abstraction independent of the syntax
● Limits our ability to reason about the format
● Limits its adoption and (possibly) longevity

The OASIS XLIFF Object Model & Other Serializations Technical Committee
The XLIFF-OMOS TC
“Since December 2015”

● Develop an Object Model (OM)
● Develop non-XML representations of the OM
● Also in scope:
○ Mappings between different *LIFFs
○ Future versions of TMX
○ APIs related to XLIFF or related standard data exchange
XLIFF-OMOS Goals

Brief Digression
The XLIFF Object Model
http://blog.blprnt.com/blog/blprnt/data-in-an-alien-context-kepler-visualization-source-code

● Define “LIFF” independent of representation
● Streamline introduction to XLIFF/JLIFF/etc
● Standardize terminology of language interchange concepts
OM Goals

● Working draft of prose spec
● UML of XLIFF core
● Work ongoing at https://github.com/oasis-tcs/xliff-omos-om
OM Status

● Ubiquitous in web service implementations
● Syntactically simple
● Widely understood
● Good tooling
Why JSON?

● Webservices
● Online translation environments
● Data storage in JSON-based stores
Use Cases

● Proof of concept for OM idea
● Improve availability of support for LIOM-compatible implementations in various development scenarios
● Data should be interchangeable without loss between JLIFF and XLIFF!
Goals for JLIFF

● Simple
● Reasonably concise
● Widely supported
● Good Unicode support
JSON: Strengths and Limitations
● Incompletely specified (Nicolas Seriot, “Parsing JSON is a Minefield”)
● Lack of attributes
● Difficult to express some XML concepts

● JSON-Schema
○ Data typing, validation
● JSON-LD
○ Namespacing
Complementary Technologies

XLIFF always works at the file level
<xliff>
<file id="f1">
...
</file>
<file id="f2">
...
</file>
</xliff>

JLIFF is attempting a more flexible approach
{
"files": [...]
}
{
"groups": [...]
}
{
"fragment": {...}
}
* These representations may change
** What are the semantics of converting this to XLIFF?

"unit": {
"type": "object",
"properties": {
"id": { "type": "string" },
"name": { "type": "string" },
"canResegment": { "type": "boolean", "default": false },
"translate": { "type": "boolean", "default": false },
"srcDir": { "$ref": "#/definitions/dir" },
"trgDir": { "$ref": "#/definitions/dir" },
"type": { "$ref": "#/definitions/type" },
"notes": { "$ref": "#/definitions/notes" },
"subunits": { "$ref": "#/definitions/subunits" },
"originalData": { "$ref": "#/definitions/originalData" }
},
"additionalProperties": false,
"required": [ "id" ]
},
Schema work has been straightforward
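
The constraints in the schema fragment above (a required "id", no additional properties, boolean flags) can be checked by hand. The following is only a sketch in Python: `check_unit` is a hypothetical helper, not part of any JLIFF tooling, and a real project would feed the schema to a JSON Schema validator instead.

```python
# Hypothetical, hand-rolled check of the "unit" constraints shown above.
# It covers only "required", "additionalProperties": false, and the two
# boolean properties; "$ref" targets are not resolved.

ALLOWED = {
    "id", "name", "canResegment", "translate", "srcDir", "trgDir",
    "type", "notes", "subunits", "originalData",
}
BOOLEAN_PROPS = {"canResegment", "translate"}


def check_unit(unit: dict) -> list:
    """Return a list of problems; an empty list means the unit passed."""
    problems = []
    if "id" not in unit:
        problems.append("missing required property: id")
    elif not isinstance(unit["id"], str):
        problems.append("id must be a string")
    # "additionalProperties": false rejects anything not listed in the schema.
    for key in sorted(unit.keys() - ALLOWED):
        problems.append(f"unexpected property: {key}")
    for key in sorted(unit.keys() & BOOLEAN_PROPS):
        if not isinstance(unit[key], bool):
            problems.append(f"{key} must be a boolean")
    return problems


print(check_unit({"id": "u1", "translate": True}))
print(check_unit({"id": "u1", "translate": "false", "foo": 1}))
```

The second call shows why a string such as "false" is not interchangeable with the boolean false once the schema declares the property as a boolean.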

<unit id="1">
<gls:glossary>
<gls:glossEntry ref="#m1">
<gls:term source="publicTermbase">TAB key</gls:term>
<gls:translation id="1" source="myTermbase">Tabstopptaste</gls:translation>
<gls:translation ref="#m2" source="myTermbase">TAB-TASTE</gls:translation>
<gls:definition source="publicTermbase">A keyboard key that is traditionally used to insert tab characters into a document.</gls:definition>
</gls:glossEntry>
</gls:glossary>
<segment>
<source>Press the <mrk id="m1" type="term">TAB key</mrk>.</source>
<target>Drücken Sie die <mrk id="m2" type="term">TAB-TASTE</mrk>.
</target>
</segment>
</unit>
What about Namespaces?
urn:oasis:names:tc:xliff:glossary:2.0 glossary
urn:oasis:names:tc:xliff:document:2.0 segment

Namespacing via JSON-LD
Defined at http://docs.oasis-open.org/xliff-omos/jliff/v2.1/jliff-v2.1.jsonld:
{
"@context": {
"gls": "urn:oasis:names:tc:xliff:glossary:2.0:glossary"
}
}
{
"@context": "http://docs.oasis-open.org/xliff-omos/jliff/v2.1/jliff-v2.1.jsonld",
....
"gls:glossary": [
{
"gls:definition": { ... }
}
]
}
Use of a colon in property names is a minor inconvenience for some languages (in JavaScript, "gls:glossary" requires bracket notation rather than dot access)

● Source, Target data are flat lists of objects
○ snippet of text
○ marker
○ inline code
● Segments, Ignorables collected as
"subunits" array
● Default property values in JSON-schema
make things more concise
XLIFF 2.x Unit Model

{
"@context":"http://docs.oasis-open.org/xliff-omos/jliff/v2.1/jliff-v2.1.jsonld",
"jliff": "2.1",
"srcLang": "en",
"trgLang": "de",
"units": [ {
"id": "u1",
"subunits": [
{
"source": [
{ "text": "Press the " },
{ "kind": "sm", "id": "m1", "type": "term" },
{ "text": "TAB key" },
{ "kind": "em", "startRef": "m1" },
{ "text": "." }
],
"target": [ ... ]
},
{ "type": "ignorable", "source": "<br/>" }
],
A real example, part 1
Markers (these may change)
Always preserve space

"gls:glossary": [ {
"ref": "m1",
"term": { "text": "TAB key", "source": "publicTermbase" },
"translations": [ {
"id": "1",
"source": "myTermbase",
"text": "Tabstopptaste"
}
],
"definition": {
"text": "A keyboard key that is traditionally used to insert tab characters into a document.",
"source": "publicTermbase"
}
} ]
}
]
}
A real example, part 2
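
The flat-list layout of "source" in the example above is easy to walk. A minimal sketch in Python, assuming the draft property names from the example ("text", "kind", "sm", "em", "startRef"), which the deck flags as subject to change; `plain_text` and `marked_spans` are hypothetical helper names.

```python
def plain_text(fragments):
    """Concatenate the text snippets of a flat source/target list."""
    return "".join(f["text"] for f in fragments if "text" in f)


def marked_spans(fragments):
    """Map each marker id to the text between its "sm" and "em" entries."""
    open_ids = {}  # marker id -> text pieces collected since its "sm"
    spans = {}
    for f in fragments:
        kind = f.get("kind")
        if kind == "sm":
            open_ids[f["id"]] = []
        elif kind == "em":
            spans[f["startRef"]] = "".join(open_ids.pop(f["startRef"]))
        elif "text" in f:
            # Text belongs to every marker that is currently open.
            for pieces in open_ids.values():
                pieces.append(f["text"])
    return spans


source = [
    {"text": "Press the "},
    {"kind": "sm", "id": "m1", "type": "term"},
    {"text": "TAB key"},
    {"kind": "em", "startRef": "m1"},
    {"text": "."},
]

print(plain_text(source))    # Press the TAB key.
print(marked_spans(source))  # {'m1': 'TAB key'}
```

Because markers are siblings of the text rather than wrappers around it, no recursion is needed; the cost is that start/end pairing has to be tracked by id.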

This looks... fine to a machine
{"@context":"http://docs.oasis-open.org/xliff-omos/jliff/v2.1/jliff-v2.1.jsonld",
"jliff":"2.1","srcLang":"en","trgLang":"de","units":[{"id":"u1","subunits":[{"source":[
{"text":"Press the "},{"id":"m1","kind":"sm","type":"term"},{"text":"TAB key"},
{"startRef":"m1","kind":"em"},{"text":"."}],"target":[{"text":"Drücken Sie die "},
{"id":"m2","kind":"sm","type":"term"},{"text":"TAB-TASTE"},{"startRef":"m2","kind":"em"},
{"text":"."}]},{"type":"ignorable","source":"<br/>"}],"gls:glossary":
[{"ref":"m1","term":{"text":"TAB key","source":"publicTermbase"},"translations":
[{"id":"1","source":"myTermbase","text":"Tabstopptaste"}],"definition":
{"text":"A keyboard key that is traditionally used to insert tab characters into a document.",
"source":"publicTermbase"}}]}]}

Obstacles
https://www.flickr.com/photos/fernando/2620041065

<mda:metadata>
<mda:metaGroup category="document_xml_attribute">
<mda:meta type="version">3</mda:meta>
<mda:meta type="phase">draft</mda:meta>
</mda:metaGroup>
</mda:metadata>
Obstacle: XML Haunts Us
It's just a bunch of key/value pairs!

"mda:metadata": [
{
"mda:category": "document_xml_attribute",
"mda:meta": {
"mda:version": 3,
"mda:phase": "draft"
}
}
]
First Attempt at JLIFF mda
Simple list of metadata groups
Use object properties for keys/values

"mda:metadata": {
"id": "optional-id",
"mda:metaGroups": [
{
"mda:category": "document_xml_attribute",
"mda:meta": [
{ "type": "version", "value": 3 },
{ "type": "phase", "value": "draft" }
]
}
]
}
Second Attempt at JLIFF mda
Additional structure necessary
XML smell

Unfortunately, the first attempt doesn't work
Duplicate type keys are possible, so meta entries cannot be object properties
metadata must be an object, not a list, to hold the optional id property
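
The duplicate-key problem can be reproduced in a few lines. A sketch in Python using only the standard library: the second "version" entry is invented for illustration, and the namespace URI is assumed to be the XLIFF 2.0 metadata module namespace.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the XLIFF metadata module.
MDA = "urn:oasis:names:tc:xliff:metadata:2.0"

# Illustrative input: the same type appears twice, which XML permits.
xml = f"""
<mda:metaGroup xmlns:mda="{MDA}" category="document_xml_attribute">
  <mda:meta type="version">3</mda:meta>
  <mda:meta type="version">4</mda:meta>
  <mda:meta type="phase">draft</mda:meta>
</mda:metaGroup>
"""
group = ET.fromstring(xml)
metas = group.findall(f"{{{MDA}}}meta")

# First attempt: meta types become object property names.
# The second "version" silently overwrites the first.
as_object = {m.get("type"): m.text for m in metas}

# Second attempt: each meta is its own list entry, so nothing is lost.
as_list = [{"type": m.get("type"), "value": m.text} for m in metas]

print(as_object)     # {'version': '4', 'phase': 'draft'}
print(len(as_list))  # 3
```

The list form is more verbose, but it is the only one of the two that survives a round trip back to XML.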

"Writers that do not support a given custom
namespace based user extension SHOULD
preserve that extension without Modification."
Obstacle: Cross-format Extension Handling
This works across XLIFF/JLIFF conversion, right?

<foo:data id="id123"
xmlns:foo="urn:foo:bar">
<foo:value>50</foo:value>
</foo:data>
Imagine a Custom XLIFF extension
This is easy to pass through in JLIFF! I can just use a local @context declaration for the namespace, and then...
Oh.
Oh dear.

The Revenge of Untyped Values
<foo:data id="id123"
xmlns:foo="urn:foo:bar">
<foo:value>50</foo:value>
</foo:data>
What is the data type of this value?
Is the appropriate JLIFF representation
"value" : 50
or
"value" : "50"
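
The ambiguity shows up as soon as a converter reads the extension. A sketch in Python with the standard library only: the choice of `int()` is a guess on the converter's part, since nothing in the XML declares a type.

```python
import json
import xml.etree.ElementTree as ET

xml = """
<foo:data id="id123" xmlns:foo="urn:foo:bar">
  <foo:value>50</foo:value>
</foo:data>
"""
data = ET.fromstring(xml)
raw = data.find("{urn:foo:bar}value").text  # element text is always a str

as_string = json.dumps({"value": raw})
as_number = json.dumps({"value": int(raw)})  # a guess, not something the XML states

print(as_string)  # {"value": "50"}
print(as_number)  # {"value": 50}

# Going back to XML, both JSON forms produce the same element text, so the
# choice made on the way in cannot be verified on the way out.
back_from_string = str(json.loads(as_string)["value"])
back_from_number = str(json.loads(as_number)["value"])
print(back_from_string == back_from_number)  # True
```

A converter without knowledge of the extension's schema has no basis for picking one form, which is what makes pass-through of unknown extensions hard to specify.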

Looking Ahead
https://www.flickr.com/photos/pmillera4/13570027834/

● Finish the schema! (Modules, modules)
● Write the doc!
● Implementations! (Okapi, ???)
● (Finish OM)
● (Look at other formats)
Goals for 2018

JLIFF work on GitHub:
https://github.com/oasis-tcs/xliff-omos-jliff
OM work on GitHub:
https://github.com/oasis-tcs/xliff-omos-om
Where to learn more

Discussion!
Questions?
chase@spartansoftwareinc.com
@ctatwork
