=encoding UTF-8
=head1 NAME
po4a - framework to translate documentation and other materials
=head1 Introduction
po4a (PO for anything) eases the maintenance of documentation translation using
the classical gettext tools. The main feature of po4a is that it decouples the
translation of content from its document structure.
This document serves as an introduction to the po4a project with a focus on
potential users considering whether to use this tool and on the curious wanting
to understand why things are the way they are.
=head1 Why po4a?
The philosophy of Free Software is to make the technology truly available to
everyone. But licensing is not the only consideration: untranslated free
software is useless for non-English speakers. Therefore, we still have some
work to do to make software available to everybody.
This situation is well understood by most projects and everybody is now
convinced of the necessity to translate everything. Yet, the actual
translations represent a huge effort of many individuals, crippled by small
technical difficulties.
Thankfully, Open Source software is actually very well translated using the
gettext tool suite. These tools are used to extract the strings to translate
from a program and present the strings to translate in a standardized format
(called PO files, or translation catalogs). A whole ecosystem of tools has
emerged to help the translators actually translate these PO files. The result
is then used by gettext at run time to display translated messages to the end
users.
Regarding documentation, however, the situation is still somewhat disappointing.
At first translating documentation may seem to be easier than translating a
program as it would seem that you just have to copy the documentation source
file and start translating the content. However, when the original
documentation is modified, keeping track of the modifications quickly turns
into a nightmare for the translators. If done manually, this task is unpleasant
and error-prone.
Outdated translations are often worse than no translation at all. End-users can
be tricked by documentation describing an old behavior of the program.
Furthermore, they cannot interact directly with the maintainers since they
don't speak English. Additionally, the maintainer cannot fix the problem as
they don't know every language in which their documentation is translated.
These difficulties, often caused by poor tooling, can undermine the motivation
of volunteer translators, further aggravating the problem.
B<The goal of the po4a project is to ease the work of documentation translators>.
In particular, it makes documentation translations I<maintainable>.
The idea is to reuse and adapt the gettext approach to this field. As with
gettext, texts are extracted from their original locations and presented to
translators as PO translation catalogs. The translators can leverage the
classical gettext tools to monitor the work to do, collaborate and organize as
teams. po4a then injects the translations directly into the documentation
structure to produce translated source files that can be processed and
distributed just like the English files. Any paragraph that is not translated
is left in English in the resulting document, ensuring that the end users never
see an outdated translation in the documentation.
This automates most of the grunt work of the translation maintenance.
Discovering the paragraphs needing an update becomes very easy, and the process
is completely automated when elements are reordered without further
modification. Specific verification can also be used to reduce the chance of
formatting errors that would result in a broken document.
Please also see the B<FAQ> below in this document for a more complete list
of the advantages and disadvantages of this approach.
=head2 Supported formats
Currently, this approach has been successfully implemented for several kinds
of text formatting formats:
=over
=cut
# TRANSLATOR: 'man' refers to manpages and should probably not be translated
=item man (mature parser)
The good old manual pages' format, used by so many programs out there. po4a
support is very welcome here since this format is somewhat difficult to use and
not really friendly to newbies.
The L<Locale::Po4a::Man(3pm)|Man> module also supports the mdoc format, used by
the BSD man pages (they are also quite common on Linux).
=item AsciiDoc (mature parser)
This format is a lightweight markup format intended to ease the authoring of
documentation. It is for example used to document the git system. Those
manpages are translated using po4a.
See L<Locale::Po4a::AsciiDoc> for details.
=item pod (mature parser)
This is the Perl Online Documentation format. The language and extensions
themselves are documented using this format in addition to most existing Perl
scripts. It makes it easy to keep the documentation close to the actual code by
embedding them both in the same file. It makes the programmer's life easier, but
unfortunately, not the translator's, until you use po4a.
See L<Locale::Po4a::Pod> for details.
=item sgml (mature parser)
Even if superseded by XML nowadays, this format is still used for documents
which are more than a few screens long. It can even be used for complete books.
Documents of this length can be very challenging to update. B<diff> often
proves useless when the original text was re-indented after an update.
Fortunately, po4a can help you after that process.
Currently, only DebianDoc and DocBook DTD are supported, but adding support for
a new one is really easy. It is even possible to use po4a on an unknown SGML
DTD without changing the code by providing the needed information on the
command line. See L<Locale::Po4a::Sgml(3pm)> for details.
=item TeX / LaTeX (mature parser)
The LaTeX format is a major documentation format used in the Free Software
world and for publications.
The L<Locale::Po4a::LaTeX(3pm)|LaTeX> module was tested with the Python
documentation, a book and some presentations.
=item text (mature parser)
The Text format is the base format for many formats that include long blocks of
text, including Markdown, fortunes, YAML front matter section, debian/changelog,
and debian/control.
This supports the common format used in Static Site Generators, READMEs, and
other documentation systems. See L<Locale::Po4a::Text(3pm)|Text> for details.
=item xml and XHTML (probably mature parser)
The XML format is a base format for many documentation formats.
Currently, the DocBook DTD (see L<Locale::Po4a::Docbook(3pm)> for details) and
XHTML are supported by po4a.
=item BibTeX (probably mature parser)
The BibTeX format is used alongside LaTeX for formatting lists of references (bibliographies).
See L<Locale::Po4a::BibTex> for details.
=item Docbook (probably mature parser)
An XML-based markup language that uses semantic tags to describe documents.
See L<Locale::Po4a::Docbook> for greater details.
=item Guide XML (probably mature parser)
An XML documentation format. This module was developed specifically to help with supporting
and maintaining translations of Gentoo Linux documentation up until at least March 2016
(based on the Wayback Machine). Gentoo has since moved to the DevBook XML format.
See L<Locale::Po4a::Guide> for greater details.
=item Wml (probably mature parser)
The Web Markup Language. Do not mix up WML with the WAP stuff used on cell phones.
This module relies on the Xhtml module, which itself relies on the Xml module.
See L<Locale::Po4a::Wml> for greater details.
=item Yaml (probably mature parser)
A strict superset of JSON. YAML is often used for system and project configuration files.
YAML is at the core of Red Hat's Ansible.
See L<Locale::Po4a::Yaml> for greater details.
=item RubyDoc (probably mature parser)
The Ruby Document (RD) format, originally the default documentation format for Ruby
and Ruby projects before they converted to RDoc in 2002. The Japanese version of the
Ruby Reference Manual apparently still uses RD.
See L<Locale::Po4a::RubyDoc> for greater details.
=item Halibut (probably experimental parser)
A documentation production system, with elements similar to TeX, debiandoc-sgml,
TeXinfo, and others, developed by Simon Tatham, the developer of PuTTY.
See L<Locale::Po4a::Halibut> for greater details.
=item Ini (probably experimental parser)
Configuration file format popularized by MS-DOS.
See L<Locale::Po4a::Ini> for greater details.
=item texinfo (very highly experimental parser)
All of the GNU documentation is written in this format (it's even one of the
requirements to become an official GNU project). The support for
L<Locale::Po4a::Texinfo(3pm)|Texinfo> in po4a is still in its early stages.
Please report bugs and feature requests.
=item gemtext (very highly experimental parser)
The native plain text format of the Gemini protocol. The extension
".gmi" is commonly used. Support for this module in po4a is still in
its infancy. If you find anything, please file a bug or feature
request.
=item Other supported formats
Po4a can also handle some rarer or more specialized formats, such as the
documentation of compilation options for the 2.4+ Linux kernels (L<Locale::Po4a::KernelHelp>) or the diagrams
produced by the dia tool (L<Locale::Po4a::Dia>). Adding a new format is often very easy and the main
task is to come up with a parser for your target format. See
L<Locale::Po4a::TransTractor(3pm)> for more information about this.
=item Unsupported formats
Unfortunately, po4a still lacks support for several documentation formats. Many
of them would be easy to support in po4a. This includes formats that are not only used
for documentation, such as package descriptions (deb and rpm), package
installation script questions, package changelogs, and all the specialized
file formats used by programs, such as game scenarios or wine resource files.
=back
=head1 Using po4a
The easiest way to use this tool in your project is to write a configuration file for the B<po4a>
program, and only interact with this program. Please refer to its documentation, in L<po4a(1)>. The
rest of this section provides more details for the advanced users of po4a wanting to deepen their
understanding.
=head2 Detailed schema of the po4a workflow
Make sure to read L<po4a(1)> before this overly detailed section to get a simplified overview of the
po4a workflow. Come back here when you want to get the full scary picture, with almost all details.
In the following schema, F<master.doc> is an example name for the documentation to be translated;
F<XX.doc> is the same document translated in the language XX while F<doc.XX.po> is the translation
catalog for that document in the XX language. Documentation authors will mostly be concerned with
F<master.doc> (which can be a manpage, an XML document, an AsciiDoc file, etc.); the translators
will be mostly concerned with the PO file, while the end users will only see the F<XX.doc> file.
Transitions with square brackets such as C<[po4a updates po]> represent the execution of a po4a tool
while transitions with curly brackets such as C<{update of master.doc}> represent a manual
modification of the project's files.
                                     master.doc
                                         |
                                         V
       +<-----<----+<-----<-----<--------+------->-------->-------+
       :           |                     |                        :
  {translation}    |          {update of master.doc}              :
       :           |                     |                        :
     XX.doc        |                     V                        V
   (optional)      |                 master.doc ->-------->------>+
       :           |                   (new)                      |
       V           V                     |                        |
    [po4a-gettextize]   doc.XX.po -->+   |                        |
            |            (old)       |   |                        |
            |              ^         V   V                        |
            |              |     [po4a updates po]                |
            V              |           |                          V
     translation.pot       ^           V                          |
            |              |        doc.XX.po                     |
            |              |         (fuzzy)                      |
     {translation}         |           |                          |
            |              ^           V                          V
            |              |     {manual editing}                 |
            |              |           |                          |
            V              |           V                          V
        doc.XX.po --->---->+<---<-- doc.XX.po    addendum     master.doc
        (initial)                 (up-to-date)  (optional)   (up-to-date)
            :                          |            |             |
            :                          V            |             |
            +----->----->----->------> +            |             |
                                       |            |             |
                                       V            V             V
                                       +------>-----+------<------+
                                                    |
                                                    V
                                       [po4a updates translations]
                                                    |
                                                    V
                                                  XX.doc
                                               (up-to-date)
Again, this schema is overly complicated. See L<po4a(1)> for a simplified overview.
The left part depicts how L<po4a-gettextize(1)> can be used to convert an existing translation
project to the po4a infrastructure. This script takes an original document and its translated
counterpart, and tries to build the corresponding PO file. Such manual conversion is rather
cumbersome (see the L<po4a-gettextize(1)> documentation for more details), but it is only needed
once to convert your existing translations. If you don't have any translation to convert, you can
forget about this and focus on the right part of the schema.
On the top right part, the action of the original author is depicted, updating the documentation.
The middle right part depicts the automatic updates of translation files: the new material is
extracted and compared against the existing translation. The previous translation is used for the
parts that didn't change, while partially modified parts are connected to the previous translation
with a "fuzzy" marker indicating that the translation must be updated. New or heavily modified
material is left untranslated.
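The three outcomes of this update step (reused, fuzzy, untranslated) can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: po4a delegates this work to B<msgmerge>, whose matching algorithm differs, and the C<update_catalog> helper, the C<difflib> similarity measure and the 0.8 cutoff are all invented for this example.

```python
import difflib


def update_catalog(old_po, new_msgids, cutoff=0.8):
    """Rebuild a catalog for new_msgids, reusing translations from old_po.

    old_po maps msgid -> msgstr. Returns msgid -> (msgstr, is_fuzzy).
    Hypothetical helper: msgmerge uses its own matching algorithm.
    """
    updated = {}
    for msgid in new_msgids:
        if msgid in old_po:
            # Unchanged string: the previous translation is kept as is.
            updated[msgid] = (old_po[msgid], False)
            continue
        close = difflib.get_close_matches(msgid, list(old_po), n=1, cutoff=cutoff)
        if close:
            # Slightly modified string: reuse the old translation, flag it fuzzy.
            updated[msgid] = (old_po[close[0]], True)
        else:
            # New or heavily modified string: left untranslated.
            updated[msgid] = ("", False)
    return updated


old_po = {"Run the program with -v.": "Lancez le programme avec -v."}
new_msgids = [
    "Run the program with -vv.",  # slightly modified
    "Quit.",                      # new
]
catalog = update_catalog(old_po, new_msgids)
```

In this sketch the slightly modified string inherits the old translation with the fuzzy flag set, so a translator knows it needs a review, while the new string gets an empty translation.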
Then, the I<manual editing> block depicts the action of the translators, who
modify the PO files to provide translations for every original string and
paragraph. This can be done using either a specific editor such as the B<GNOME
Translation Editor>, KDE's B<Lokalize> or B<poedit>, or using an online
localization platform such as B<weblate> or B<pootle>. The translation result is
a set of PO files, one per language. Please refer to the gettext documentation
for more details.
The bottom part of the figure shows how B<po4a> creates a translated source document from the
F<master.doc> original document and the F<doc.XX.po> translation catalog that was updated by the
translators. The structure of the document is reused, while the original content is replaced by its
translated counterpart. Optionally, an addendum can be used to add some extra text to the
translation. This is often used to add the name of the translator to the final document. See below
for details.
Upon invocation, B<po4a> updates both the translation files and the translated documentation files
automatically.
=head2 Starting a new translation project
If you start from scratch, you just have to write a configuration file for po4a, and you are set.
The relevant templates are created for the missing files, allowing your contributors to translate
your project to their language. Please refer to L<po4a(1)> for a quick start tutorial and for all
details.
If you have an existing translation, i.e. a documentation file that was translated manually, you can
integrate its content in your po4a workflow using B<po4a-gettextize>. This task is a bit cumbersome
(as described in the tool's manpage), but once your project is converted to the po4a workflow,
everything will be updated automatically.
=head2 Updating the translations and documents
Once set up, invoking B<po4a> is enough to update both the translation PO files and translated
documents. You may pass C<--no-translations> to B<po4a> to not update the translations (thus
only updating the PO files) or C<--no-update> to not update the PO files (thus only updating the
translations). This roughly corresponds to the individual B<po4a-updatepo> and B<po4a-translate>
scripts, which are now deprecated (see "Why are the individual scripts deprecated" in the FAQ below).
=head2 Using addenda to add extra text to translations
Adding new text to the translation is probably the only thing that is easier in
the long run when you translate files manually :). This happens when you want
to add an extra section to the translated document, not corresponding to any
content in the original document. The classical use case is to give credits to
the translation team, and to indicate how to report translation-specific
issues.
With po4a, you have to specify B<addendum> files, which can be conceptually
viewed as patches applied to the localized document after processing. Each
addendum must be provided as a separate file, whose format is, however, very
different from classical patches. The first line is a I<header line>,
defining the insertion point of the addendum (with an unfortunately cryptic
syntax -- see below) while the rest of the file is added verbatim at the
determined position.
The header line must begin with the string B<PO4A-HEADER:>, followed by a
semicolon-separated list of I<key>B<=>I<value> fields.
For example, the following header declares an addendum that must be placed
at the very end of the translation.
PO4A-HEADER: mode=eof
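To make the header syntax concrete, here is a minimal Python sketch that splits a header line into its fields. It is not po4a's parser: the C<parse_addendum_header> name is hypothetical, and the choice to drop whitespace before a key while keeping the value verbatim is an assumption of this example.

```python
def parse_addendum_header(line):
    """Split a PO4A-HEADER line into a dict of fields (illustrative only)."""
    prefix = "PO4A-HEADER:"
    if not line.startswith(prefix):
        raise ValueError("an addendum must start with a PO4A-HEADER: line")
    fields = {}
    for part in line[len(prefix):].split(";"):
        if not part.strip():
            continue  # tolerate a trailing semicolon
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {part!r}")
        # Whitespace before the key is dropped, but the value is kept
        # verbatim (an assumption of this sketch).
        fields[key.strip()] = value
    return fields


header = parse_addendum_header("PO4A-HEADER: mode=eof")
```

Here C<header> is a dictionary with the single key C<mode>. A header with several fields yields one entry per I<key>B<=>I<value> pair.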
Things are more complex when you want to add your extra content in the middle
of the document. The following header declares an addendum that must be
placed after the XML section containing the string C<About this document> in
translation.
=cut
#TRANSLATORS: you can keep the French example here, or invert it: Use a header in your own language at the beginning, and then say that it shouldn't be in English, translating the second header example to English
=pod
PO4A-HEADER: position=About this document; mode=after; endboundary=</section>
In practice, when trying to apply an addendum, po4a searches for the first line
matching the C<position> argument (this can be a regexp). Do not forget that
po4a considers the B<translated> document here. This documentation is in
English, but your line should probably read as follows if you intend your
addendum to apply to the French translation of the document.
PO4A-HEADER: position=À propos de ce document; mode=after; endboundary=</section>
Once the C<position> is found in the target document, po4a searches for the next
line after the C<position> that matches the provided C<endboundary>. The
addendum is added right B<after> that line (because we provided an
I<endboundary>, i.e. a boundary ending the current section).
The exact same effect could be obtained with the following, equivalent header:
PO4A-HEADER: position=About this document; mode=after; beginboundary=<section>
Here, po4a searches for the first line matching C<< <section> >> after the line
matching C<About this document> in the translation, and adds the addendum
B<before> that line since we provided a I<beginboundary>, i.e. a boundary marking
the beginning of the next section. So this header line requires placing the
addendum after the section containing C<About this document>, and instructs po4a
that a section starts with a line containing the C<< <section> >> tag. This is
equivalent to the previous example because what you really want is to add this
addendum either after C<< </section> >> or before C<< <section> >>.
You can also set the insertion I<mode> to the value C<before>, with similar
semantics: combining C<mode=before> with an C<endboundary> will put the addendum
just B<after> the matched boundary, that is the last potential boundary line before
the C<position>. Combining C<mode=before> with a C<beginboundary> will put the
addendum just B<before> the matched boundary, that is the last potential boundary
line before the C<position>.
   Mode   | Boundary kind |     Used boundary      | Insertion point compared to the boundary
 =========|===============|========================|=========================================
 'before' | 'endboundary' | last before 'position' | Right after the selected boundary
 'before' |'beginboundary'| last before 'position' | Right before the selected boundary
 'after'  | 'endboundary' | first after 'position' | Right after the selected boundary
 'after'  |'beginboundary'| first after 'position' | Right before the selected boundary
 'eof'    |    (none)     |          n/a           | End of file
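The table can be read as a small algorithm. The following Python sketch is a literal reading of it and not po4a's actual code. It works on a plain list of lines, whereas po4a works on its internal data strings. The C<insertion_index> function and its fallbacks are assumptions of this example: a boundary that is never found in C<after> mode sends the addendum to the end of the document, and a missing boundary in C<before> mode inserts right before the C<position> line.

```python
import re


def insertion_index(lines, mode, position=None, beginboundary=None, endboundary=None):
    """Return the index in `lines` at which an addendum would be inserted.

    Illustrative reading of the table above, not po4a's implementation.
    """
    if mode == "eof":
        return len(lines)
    hits = [i for i, line in enumerate(lines) if re.search(position, line)]
    if len(hits) != 1:
        raise ValueError(f"position must match exactly one line, got {len(hits)}")
    pos = hits[0]
    boundary = beginboundary if beginboundary is not None else endboundary
    if mode == "after":
        candidates = range(pos + 1, len(lines))  # first boundary after position
    elif mode == "before":
        if boundary is None:
            return pos  # assumption: no boundary means "right before position"
        candidates = range(pos - 1, -1, -1)      # last boundary before position
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if boundary is None:
        raise ValueError("mode 'after' needs a beginboundary or an endboundary")
    for i in candidates:
        if re.search(boundary, lines[i]):
            # beginboundary: insert right before it; endboundary: right after it.
            return i if beginboundary is not None else i + 1
    if mode == "after":
        return len(lines)  # assumption: unmatched boundary means end of document
    raise ValueError("no line before the position matches the boundary")


doc = [
    "<section>",
    "<title>About this document</title>",
    "</section>",
    "<section>",
    "<title>Credits</title>",
    "</section>",
]
after_end = insertion_index(doc, "after", "About this document", endboundary="</section>")
after_begin = insertion_index(doc, "after", "About this document", beginboundary="<section>")
```

Both calls select the gap between the two sections, which is why the C<endboundary> and C<beginboundary> forms of the header are interchangeable for this document.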
=head3 Hints and tricks about addenda
=over
=item
Remember that these are regexps. For example, if you want to match the end of an
nroff section ending with the line C<.fi>, do not use C<.fi> as B<endboundary>,
because it will match with C<the[ fi]le>, which is obviously not what you
expect. The correct B<endboundary> in that case is: C<^\.fi$>.
=item
White spaces ARE important in the content of the C<position> and boundaries. So
the two following lines B<are different>. The second one will only be found if
there are enough trailing spaces in the translated document.
PO4A-HEADER: position=About this document; mode=after; beginboundary=<section>
PO4A-HEADER: position=About this document ; mode=after; beginboundary=<section>
=item
Although this context search may be considered to operate roughly on each line
of the B<translated> document, it actually operates on the internal data string of
the translated document. This internal data string may be a text spanning a
paragraph containing multiple lines or may be an XML tag by itself. The
exact I<insertion point> of the addendum must be before or after the internal
data string and cannot be within the internal data string.
=item
Pass the C<-vv> argument to B<po4a> to understand how the addenda are added to the
translation. It may also help to run B<po4a> in debug mode to see the actual
internal data string when your addendum does not apply.
=back
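The first two pitfalls above are easy to reproduce. The following sketch uses Python's C<re> module as a stand-in for Perl regexps, which behave the same way for these simple patterns. The sample strings are made up for the illustration.

```python
import re

line = "the file"
# An unescaped dot matches any character, so ".fi" matches " fi" in "the file".
loose = re.search(r".fi", line)
# Escaping the dot and anchoring both ends only accepts a line that is exactly ".fi".
strict_on_text = re.search(r"^\.fi$", line)
strict_on_macro = re.search(r"^\.fi$", ".fi")

# Whitespace is significant: a trailing space in the pattern must be in the text.
title = "<title>About this document</title>"
without_space = re.search(r"About this document", title)
with_space = re.search(r"About this document ", title)
```

Only C<loose>, C<strict_on_macro> and C<without_space> hold a match. The other two searches return nothing, which is how an addendum silently fails to apply.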
=head3 Addenda examples
=over
=item
If you want to add something after the following nroff section:
.SH "AUTHORS"
You should select a two-step approach by setting B<mode=after>. Then you should
narrow down the search to the line after B<AUTHORS> with the B<position> argument
regex. Then, you should match the beginning of the next section (i.e.,
B<^\.SH>) with the B<beginboundary> argument regex. That is to say:
PO4A-HEADER:mode=after;position=AUTHORS;beginboundary=\.SH
=item
If you want to add something right after a given line (e.g. after the line
"Copyright Big Dude"), use a B<position> matching this line, B<mode=after> and
give a B<beginboundary> matching any line.
PO4A-HEADER:mode=after;position=Copyright Big Dude, 2004;beginboundary=^
=item
If you want to add something at the end of the document, give a B<position>
matching any line of your document (but only one line: po4a won't proceed if
the match is not unique), and give a boundary matching nothing. Don't use simple
strings here like B<"EOF">, but prefer those which have less chance to be in
your document.
PO4A-HEADER:mode=after;position=About this document;beginboundary=FakePo4aBoundary
=back
=head3 More detailed example
Original document (POD formatted):
|=head1 NAME
|
|dummy - a dummy program
|
|=head1 AUTHOR
|
|me
Then, the following addendum will ensure that a section (in French) about
the translator is added at the end of the file (in French, "TRADUCTEUR"
means "TRANSLATOR", and "moi" means "me").
|PO4A-HEADER:mode=after;position=AUTEUR;beginboundary=^=head
|
|=head1 TRADUCTEUR
|
|moi
|
To put your addendum before the AUTHOR section, use the following header:
PO4A-HEADER:mode=after;position=NOM;beginboundary=^=head1
This works because the next line matching the B<beginboundary> C</^=head1/> after
the section "NAME" (translated to "NOM" in French), is the one declaring the
authors. So, the addendum will be put between both sections. Note that if
another section is added between the NAME and AUTHOR sections later, po4a
will wrongly put the addendum before the new section.
To avoid this you may accomplish the same using B<mode>=I<before>:
PO4A-HEADER:mode=before;position=^=head1 AUTEUR
=head1 How does it work?
This chapter gives you a brief overview of the po4a internals, so that you
may feel more confident to help us to maintain and to improve it. It may also
help you to understand why it does not do what you expected, and how to
solve your problems.
=head2 TransTractors and project architecture
At the core of the po4a project, the
L<Locale::Po4a::TransTractor(3pm)|TransTractor> class is the common ancestor to
all po4a parsers. This strange name comes from the fact that it is at the same
time in charge of translating documents and extracting strings.
More formally, it takes a document to translate plus a PO file containing
the translations to use as input while producing two separate outputs:
another PO file (resulting from the extraction of translatable strings from
the input document), and a translated document (with the same structure as
the input one, but with all translatable strings replaced with content of
the input PO). Here is a graphical representation of this:
   Input document --\                             /---> Output document
                     \      TransTractor::       /       (translated)
                      +-->--   parse()  --------+
                     /                           \
   Input PO --------/                             \---> Output PO
                                                         (extracted)
This little bone is the core of all the po4a architecture. If you provide both
inputs and disregard the output PO, you get B<po4a-translate>. If you disregard
the output document instead, you get B<po4a-updatepo>. The B<po4a> program uses a first
TransTractor to get an up-to-date output POT file (disregarding the output
documents), calls B<msgmerge -U> to update the translation PO files on disk, and
builds a second TransTractor with these updated PO files to update the output
documents. In short, B<po4a> provides a one-stop solution to update what needs to
be, using a single configuration file.
B<po4a-gettextize> also uses two TransTractors, but in a different way: it builds one
TransTractor per language, and then builds a new PO file using the msgids of the
original document as msgids, and the msgids of the translated document as
msgstrs. Much care is needed to ensure that the strings matched this way
actually match, as described in L<po4a-gettextize(1)>.
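The pairing idea can be sketched in a few lines of Python. This is a simplified illustration, not the real tool: the C<gettextize> function below is hypothetical and only checks that both documents yield the same number of strings, whereas B<po4a-gettextize> performs more checks, as described in its manual page.

```python
def gettextize(original_msgids, translated_msgids):
    """Pair the strings extracted from both documents into msgid -> msgstr.

    Hypothetical sketch: it only verifies that both lists have the same
    length, which is the bare minimum for the pairing to make sense.
    """
    if len(original_msgids) != len(translated_msgids):
        raise ValueError(
            "structure mismatch: %d original strings, %d translated strings"
            % (len(original_msgids), len(translated_msgids))
        )
    # The n-th string of the original becomes the msgid of the n-th
    # string of the translation.
    return dict(zip(original_msgids, translated_msgids))


po = gettextize(
    ["NAME", "dummy - a dummy program"],
    ["NOM", "dummy - un programme factice"],
)
```

The sketch also shows why care is needed: if the translator merged or split a paragraph, every following pair would be shifted, so the conversion has to stop instead of producing wrong translations.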
=head2 Format-specific parsers
All po4a format parsers are implemented on top of the TransTractor. Some of them
are very simple, such as the Text, Markdown and AsciiDoc ones. They load the
lines one by one using C<TransTractor::shiftline()> and accumulate the content
of each paragraph. Once a string is completely parsed, the parser uses
C<TransTractor::translate()> to (1) add this string to the output PO file and
(2) get the translation from the input PO file. The parser then pushes the
result to the output file using C<TransTractor::pushline()>.
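The following Python class is a toy model of that loop, meant only to show how the pieces fit together. It is not the Perl API: the method names mirror C<shiftline()>, C<translate()> and C<pushline()>, but their signatures, the dictionary-based catalogs and the blank-line paragraph rule are assumptions of this sketch.

```python
class ToyTransTractor:
    """Toy stand-in for the TransTractor flow; not the Perl API."""

    def __init__(self, lines, input_po):
        self.doc_in = list(lines)    # lines still to be parsed
        self.doc_out = []            # translated document
        self.po_in = dict(input_po)  # msgid -> msgstr
        self.po_out = {}             # extracted msgids, untranslated

    def shiftline(self):
        """Return the next input line, or None at the end of the document."""
        return self.doc_in.pop(0) if self.doc_in else None

    def pushline(self, text):
        """Append a piece of text to the output document."""
        self.doc_out.append(text)

    def translate(self, string):
        self.po_out.setdefault(string, "")       # (1) add to the output PO
        return self.po_in.get(string) or string  # (2) translate, else keep original

    def flush(self, paragraph):
        if paragraph:
            self.pushline(self.translate("\n".join(paragraph)))

    def parse(self):
        paragraph = []
        line = self.shiftline()
        while line is not None:
            if line.strip():
                paragraph.append(line)  # still inside the current paragraph
            else:
                self.flush(paragraph)   # a blank line ends the paragraph
                self.pushline("")
                paragraph = []
            line = self.shiftline()
        self.flush(paragraph)


tt = ToyTransTractor(
    ["Hello world.", "", "Second", "paragraph."],
    {"Hello world.": "Bonjour le monde."},
)
tt.parse()
```

After the call, C<tt.doc_out> holds the translated first paragraph followed by the second paragraph left in English, and C<tt.po_out> lists both extracted strings. Keeping only C<doc_out> corresponds to the role of B<po4a-translate>, keeping only C<po_out> to that of B<po4a-updatepo>.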
Some other parsers are more complex because they rely on an external parser to
analyze the input document. The Xml, HTML, SGML and Pod parsers are built on top
of SAX parsers. They declare callbacks to events such as "I found a new title
whose content is the following" to update the output document and output POT
files according to the input content using C<TransTractor::translate()> and
C<TransTractor::pushline()>. The Yaml parser is similar but different: it
serializes a data structure produced by the YAML::Tiny parser. This is why the
Yaml module of po4a fails to declare the reference lines: the location of each
string in the input file is not kept by the parser, so we can only provide
"$filename:1" as a string location. The SAX-oriented parsers use globals and
other tricks to save the file name and line numbers of references.
One specific issue arises from file encodings and BOM markers. Simple parsers
can forget about this issue, which is handled by C<TransTractor::read()> (used
internally to get the lines of an input document), but the modules relying on an
external parser must ensure that all files are read with an appropriate PerlIO
decoding layer. The easiest way is to open the file yourself, and provide a
filehandle or directly the full string to your external parser. See
C<Pod::read()> and C<Pod::parse()> for an example. The content read by the
TransTractor is ignored, but a fresh filehandle is passed to the external
parser. The important part is the C<< "<:encoding($charset)" >> mode that is passed
to the B<open()> perl function.
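The effect of opening a file with an explicit decoding layer can be shown with a Python analogue. This is not how po4a does it (po4a relies on PerlIO layers as described above). The temporary file and the C<utf-8-sig> codec, which is Python's UTF-8 decoder that also drops a leading BOM, are choices made for this illustration.

```python
import os
import tempfile

# Write a small UTF-8 file that starts with a byte order mark (BOM).
with tempfile.NamedTemporaryFile("wb", suffix=".pod", delete=False) as handle:
    handle.write(b"\xef\xbb\xbf=head1 NOM\n")
    path = handle.name

try:
    # Reading raw bytes leaves the decoding (and the BOM) to the consumer.
    with open(path, "rb") as raw:
        raw_bytes = raw.read()
    # Opening with an explicit decoding layer yields clean text.
    with open(path, "r", encoding="utf-8-sig") as decoded:
        text = decoded.read()
finally:
    os.remove(path)
```

An external parser fed with C<raw_bytes> would see three stray bytes before the first C<=head1> command, while one fed with the decoded handle sees the document as its author wrote it.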
=head2 Po objects
The L<Locale::Po4a::Po(3pm)|Po> class is in charge of loading and using PO and
POT files. Basically, you can read a file, add entries, get translations with
the B<gettext()> method, and write the PO into a file. More advanced features such as
merging a PO file against a POT file or validating a file are delegated to
B<msgmerge> and B<msgfmt> respectively.
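As a rough picture of what such a catalog object does, here is a toy Python class. It is not the L<Locale::Po4a::Po(3pm)|Po> API: the C<ToyPo> class, its method signatures and the reduced output (no header entry, no comments, no line wrapping) are assumptions of this sketch.

```python
def po_escape(text):
    """Escape a string for use inside a PO double-quoted literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


class ToyPo:
    """Minimal catalog: add entries, look up translations, serialize."""

    def __init__(self):
        self.entries = {}  # msgid -> msgstr, in insertion order

    def push(self, msgid, msgstr=""):
        self.entries[msgid] = msgstr

    def gettext(self, msgid):
        # Fall back to the original string when no translation exists.
        return self.entries.get(msgid) or msgid

    def dumps(self):
        blocks = [
            'msgid "%s"\nmsgstr "%s"\n' % (po_escape(msgid), po_escape(msgstr))
            for msgid, msgstr in self.entries.items()
        ]
        return "\n".join(blocks)


po = ToyPo()
po.push("Hello", "Bonjour")
po.push('Say "hi"\n')
serialized = po.dumps()
```

The fallback in C<gettext()> mirrors the behavior described earlier in this document: a string without translation is kept in English instead of being dropped.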
=head2 Contributing to po4a
Even if you have never contributed to any Open Source project in the past, you
are welcome: we are willing to help and mentor you here. po4a is best maintained
by its users nowadays. As we lack manpower, we try to make the project welcoming
by improving the doc and the automatic tests to make you confident in
contributing to the project. Please refer to the CONTRIBUTING.md file for more
details.
=head1 Open-source projects using po4a
Here is a very partial list of projects that use po4a in production for their
documentation. If you want to add your project to the list, just drop
us an email (or a Merge Request).
=over
=item
adduser (man): users and groups management tool.
=item
apt (man, docbook): Debian package manager.
=item
aptitude (docbook, svg): terminal-based package manager for Debian
=item
L<F-Droid website|https://gitlab.com/fdroid/fdroid-website> (markdown):
installable catalog of FOSS (Free and Open Source Software) applications for the
Android platform.
=item
L<git|https://github.com/jnavila/git-manpages-l10n> (asciidoc):
distributed version-control system for tracking changes in source code.
=item
L<Linux manpages|https://salsa.debian.org/manpages-l10n-team/manpages-l10n> (man):
this project provides an infrastructure for translating many manpages
to different languages, ready for integration into several major
distributions (Arch Linux, Debian and derivatives, Fedora).
=item
L<Stellarium|https://github.com/Stellarium/stellarium> (HTML):
a free open source planetarium for your computer. po4a is used to
translate the sky culture descriptions.
=item
L<Jamulus|https://jamulus.io/> (markdown, yaml, HTML):
a FOSS application for online jamming in real time. The website
documentation is maintained in multiple languages using po4a.
=item
Other items to sort out:
L<https://gitlab.com/fdroid/fdroid-website/>
L<https://github.com/fsfe/reuse-docs/pull/61>
=back
=head1 FAQ
=head2 How do you pronounce po4a?
I personally vocalize it as L<pouah|https://en.wiktionary.org/wiki/pouah>, which
is a French onomatopoeia that we use in place of yuck :) I may have a strange sense of humor :)
=head2 Why are the individual scripts deprecated?
Indeed, B<po4a-updatepo> and B<po4a-translate> are deprecated in favor of B<po4a>. The reason is
that while B<po4a> can be used as a drop-in replacement for these scripts, there is quite a lot of
code duplication here. Individual scripts are around 150 lines of code each while the B<po4a> program
is 1200 lines long, so they do a lot in addition to the common internals. The code duplication results
in bugs occurring in both versions and needing two fixes. One example of such duplication is
bug #1022216 in Debian and issue #442 on GitHub, which had the exact same fix, but one in B<po4a>
and the other in B<po4a-updatepo>.
In the long run, I would like to drop the individual scripts and only maintain one version of this
code. The sure thing is that the individual scripts will not get improved anymore, so only B<po4a>
will get the new features. That being said, there is no deprecation urgency. I plan to keep the
individual scripts as long as possible, and at least until 2030. If your project still uses
B<po4a-updatepo> and B<po4a-translate> in 2030, you may have a problem.
We may also remove the deprecation of these scripts at some point, if a refactoring reduces the code
duplication to zero. If you have an idea (or better: a patch), your help is welcome.
=head2 What about the other translation tools for documentation using gettext?
There are a few of them. Here is a possibly incomplete list, and more tools are
on the horizon.
=over
=item B<poxml>
This is the tool developed by KDE people to handle DocBook XML. AFAIK, it
was the first program to extract strings to translate from documentation to
PO files, and inject them back after translation.
It can only handle XML, and only a particular DTD. I'm quite unhappy with
the handling of lists, which end in one big msgid. When the list becomes big,
the chunk becomes harder to swallow.
=item B<po-debiandoc>
This program, written by Denis Barbier, is a sort of precursor of the po4a SGML
module, which more or less deprecates it. As the name says, it handles only
the DebianDoc DTD, which is more or less a deprecated DTD.
=item B<xml2po.py>
Used by the GIMP Documentation Team since 2004, it works quite well, although,
as the name suggests, only with XML files, and it needs specially configured
makefiles.
=item B<Sphinx>
The Sphinx Documentation Project also uses gettext extensively to manage its
translations. Unfortunately, it works only for a few text formats, reStructuredText
and Markdown, although it is perhaps the only tool of this kind that manages the whole
translation process.
=back
The main advantages of po4a over them are the ease of extra content addition
(which is even worse there) and the ability to achieve gettextization.
=head2 SUMMARY of the advantages of the gettext based approach
=over 2
=item
The translations are not stored along with the original, which makes it
possible to detect if translations become out of date.
=item
The translations are stored in separate files from each other, which prevents
translators of different languages from interfering, both when submitting
their patch and at the file encoding level.
=item
It is based internally on B<gettext> (but B<po4a> offers a very simple
interface so that you don't need to understand the internals to use it).
That way, we don't have to reinvent the wheel, and because of their
wide use, we can assume that these tools are more or less bug free.
=item
Nothing changes for the end user (besides the fact that translations will
hopefully be better maintained). The resulting documentation file
distributed is exactly the same.
=item
No need for translators to learn a new file syntax, and their favorite PO
file editor (like Emacs' PO mode, Lokalize or Gtranslator) will work just fine.
=item
gettext offers a simple way to get statistics about what is done, what should
be reviewed and updated, and what is still to do. Some examples can be found
at these addresses:
- https://docs.kde.org/stable5/en/kdesdk/lokalize/project-view.html
- http://www.debian.org/intl/l10n/
=back
But not everything is rosy, and this approach also has some disadvantages
we have to deal with.
=over 2
=item
Addenda are somewhat strange at first glance.
=item
You can't adapt the translated text to your preferences, like splitting a
paragraph here, and joining two other ones there. But in some sense, if
there is an issue with the original, it should be reported as a bug anyway.
=item
Even with an easy interface, it remains a new tool people have to learn.
One of my dreams would be to somehow integrate po4a into Gtranslator or
Lokalize. When a documentation file is opened, the strings would be
automatically extracted, and a translated file + PO file could be
written to disk. If we manage to write an MS Word (TM) module (or at
least RTF), professional translators may even use it.
=back
=head1 SEE ALSO
=over
=item
The documentation of the all-in-one tool that you should use: L<po4a(1)>.
=item
The documentation of the individual po4a scripts: L<po4a-gettextize(1)>,
L<po4a-updatepo(1)>, L<po4a-translate(1)>, L<po4a-normalize(1)>.
=item
The additional helping scripts:
L<msguntypot(1)>, L<po4a-display-man(1)>, L<po4a-display-pod(1)>.
=item
The parsers of each format, in particular to see the options accepted by each
of them: L<Locale::Po4a::AsciiDoc(3pm)>, L<Locale::Po4a::Dia(3pm)>,
L<Locale::Po4a::Guide(3pm)>, L<Locale::Po4a::Ini(3pm)>,
L<Locale::Po4a::KernelHelp(3pm)>, L<Locale::Po4a::Man(3pm)>,
L<Locale::Po4a::RubyDoc(3pm)>, L<Locale::Po4a::Texinfo(3pm)>,
L<Locale::Po4a::Text(3pm)>, L<Locale::Po4a::Xhtml(3pm)>,
L<Locale::Po4a::Yaml(3pm)>, L<Locale::Po4a::BibTeX(3pm)>,
L<Locale::Po4a::Docbook(3pm)>, L<Locale::Po4a::Halibut(3pm)>,
L<Locale::Po4a::LaTeX(3pm)>, L<Locale::Po4a::Pod(3pm)>,
L<Locale::Po4a::Sgml(3pm)>, L<Locale::Po4a::TeX(3pm)>,
L<Locale::Po4a::Wml(3pm)>, L<Locale::Po4a::Xml(3pm)>.
=item
The implementation of the core infrastructure:
L<Locale::Po4a::TransTractor(3pm)> (particularly important to understand the
code organization), L<Locale::Po4a::Chooser(3pm)>, L<Locale::Po4a::Po(3pm)>,
L<Locale::Po4a::Common(3pm)>. Please also check the F<CONTRIBUTING.md> file in
the source tree.
=back
=head1 AUTHORS
Denis Barbier <barbier,linuxfr.org>
Martin Quinson (mquinson#debian.org)
=cut
LocalWords: PO gettext SGML XML texinfo perl gettextize fr Lokalize KDE updatepo
LocalWords: Gtranslator gettextization VCS regexp boundary
LocalWords: lang TransTractor debconf diff poxml debiandoc LocalWords
LocalWords: Denis barbier linuxfr org Quinson