Roundtrip Conversion from LyX to ODT

This page contains information about the Roundtrip Conversion from LyX to ODT GSoC 2014 project.
1. About the Project

The overall objective of the project is to develop a framework that enables round-trip conversion from LyX to ODT. Unlike a normal export, the same LyX file should result when the exported ODT file is imported back into LyX. The first goal of the project is to get one-click export to ODT up and running. The framework should be backend independent, so that the same framework, with appropriate changes, can be reused for another conversion once its feature set has been decided. Metadata needs to be carried along to facilitate the round-trip conversion. The final outcome of the project will be a robust conversion tool from LyX to ODT and a prototype of the reverse conversion tool.

2. Developers Involved in this Project

Student participant: Prannoy Pilligundla
Mentors: Stefano Franchi, Richard Heck, Cyrille Artho, Vincent van Ravesteijn

3. Implementation

After analysing and exploring many approaches, going ahead with TeX4ht for the conversion seems the best way to go. TeX4ht doesn't work on LyX files directly, so LyX to LaTeX export is essential in this approach. The first step is to fix all the issues with TeX4ht in the conversion of the feature set: sections, headers, lists, comments, tracked changes, math, bibliography, tables, figures, cross references, footnotes, emphasis etc. This will involve editing sources written in a special literate programming format and inserting configurable hooks into tex4ht-ooffice.tex, which in turn involves some command redefinitions in the *.4ht files. After tuning the export to ODT as per the requirements, the next step is to replace the last step of TeX4ht with a Lua module in LuaTeX that relies on the same *.4ht files. In that last step, TeX4ht processes the DVI file, which contains special instructions to complete the conversion.
A package needs to be written in Lua that has full access to the nodes created by LuaTeX and loads tex4ht.sty to have access to the *.4ht files. LuaTeX gives access to internals of the TeX engine that were previously inaccessible to programmers. Using LuaTeX, the special instructions of TeX4ht need to be intercepted before they are written into the DVI file and then handled appropriately to complete the conversion.

4. Schedule

Till 18 May: Build TeX4ht from source and make changes in the makefile to fix issues (if any) in building from source. Dig deep into TeX4ht and understand how it works. Write some basic configuration files to get into the flow of things. Get familiar with LuaTeX and explore the Lua-based approach of intercepting special instructions before they are written into the DVI file.
2 June (First Subgoal): TeX4ht successfully converts LaTeX to ODT for a limited feature set of standard paragraphs, sections, emphasis, comments, lists and greyed-out notes.

5. Scope

Though this project focuses on conversion of LyX to ODT, it may still attract users who want to convert LyX to .docx. When the generated ODT is converted to .docx using LibreOffice and the .docx file is opened in MS Word, the results are pretty good for most things, but there are some issues with tables: all the information is properly formatted, but no lines are drawn between rows and columns. As expected, everything works fine when the .docx file is opened with LibreOffice itself. Similar issues arise when the ODT file is opened directly with MS Word. These tests are not conclusive for math, because translation of formulas from LaTeX to ODT is not yet perfect, so nothing can be said about ODT to .docx conversion of math as of now.

6. Progress

6.1 LyX to ODT
6.2 ODT to LyX
7. General Documentation

7.1 Compiling tex4ht-ooffice.tex

After making changes in tex4ht-ooffice.tex, if we compile it as "tex tex4ht-ooffice.tex", one of the output files is ooffice.4ht, which needs to be kept in the same directory as the tex file to be converted for the changes to take effect. During development on Linux I wrote a shell script which uses the Makefile to compile tex4ht-ooffice.tex directly if the argument given is ooffice.4ht. After compilation is done, it moves ooffice.4ht into the 4htfiles directory and the other generated files into appropriate directories.

7.2 Configuring Greyedout Text in tex4ht

TeX4ht was treating lyxgreyedout as normal text, so I just had to style it appropriately. As it is an environment, it has four default hooks: before-env, after-env, before-list and after-list. For lyxgreyedout, before-list and after-list are of no use, as lyxgreyedout doesn't have any kind of heading. So we only need to configure two of the four hooks, before-env and after-env. This is the code I added to tex4ht-ooffice.tex for configuring lyxgreyedout:

    \<ooffice begin-end env\><<<
    \NewConfigureOO{lyxgreyedout}
    \ConfigureOO{lyxgreyedout}{%
    <style:style style:name="First-line-indent" style:family="paragraph" style:parent-style-name="Text-body" style:class="text">\Hnewline
    <style:paragraph-properties fo:margin-left="0cm" fo:margin-right="0cm" fo:text-indent="0.499cm" style:auto-text-indent="false"/>
    </style:style>\Hnewline
    <style:style style:name="greyedoutnote" style:family="text">\Hnewline
    <style:text-properties fo:color="\#808080"/>\Hnewline
    </style:style>\Hnewline
    }
    \ConfigureEnv{lyxgreyedout}{\HCode{<text:p text:style-name="First-line-indent">}\HCode{<text:span text:style-name="greyedoutnote">}}
    {\HCode{</text:span>}\HCode{</text:p>\Hnewline}}{}{}
    >>>

The \<ooffice begin-end env\> line is also very important, as it determines how the code under it is treated while compiling tex4ht-ooffice.tex to generate ooffice.4ht.
If we write something like \<configure ooffice tex4ht\> instead, things will not work. \NewConfigureOO{} is used to write something into the styles.xml of the ODT file. So in \ConfigureOO{lyxgreyedout} I created a proper style to be used for lyxgreyedout. I can use this style directly inside content.xml with text:style-name set to "greyedoutnote". Next I used \ConfigureEnv{lyxgreyedout} to configure the environment. The hooks of \ConfigureEnv are as follows:

    \ConfigureEnv{name}{before-env}{after-env}{before-list}{after-list}

As we are using only before-env and after-env, I styled lyxgreyedout accordingly.

7.3 Configuring PoemTitle Environment for memoir

The difference between configuring lyxgreyedout and PoemTitle is that lyxgreyedout is, from the point of view of the tex file, a custom environment defined by us, whereas PoemTitle comes from the memoir class, so we need to insert hooks into it to be able to style it inside tex4ht-ooffice.tex. While handling lyxgreyedout we didn't have to bother about inserting new hooks, because it is a custom environment. I thank Michal Hoftich for guiding me on how to insert hooks; the entire credit for this subsection goes to Michal.

First we need to find out where the memoir class is redefined in order to insert the hooks, which is the file `tex4ht-4ht.tex`, and we should also look at the memoir source code in `memoir.cls` to see which macros would be best to redefine. In the case of `\PoemTitle`, the code in `memoir.cls` is complex; a lot happens because of the table of contents, numbering etc. The title itself is printed with the macro `\@makePoemTitlehead`, and `\printPoemTitlenum`, which prints the poem number, can be used to configure the number. `\afterPoemTitlenum` inserts an unwanted paragraph, so we also need to deal with it. Now we need to find the `memoir` section in the file `tex4ht-4ht.tex`. All command redefinitions happen in this file.
I added the following code to the `memoir cfg` section of tex4ht-4ht.tex:

    \pend:defI\@makePoemTitlehead{\a:PoemTitle}
    \append:defI\@makePoemTitlehead{\b:PoemTitle}
    \pend:defI\printPoemTitlenum{\a:PoemTitlenum}
    \append:defI\printPoemTitlenum{\b:PoemTitlenum}
    \NewConfigure{PoemTitle}{2}
    \NewConfigure{PoemTitlenum}{2}
    \let\afterPoemTitlenum\@empty
    \let\afterPoemTitle\@empty

We define configurable hooks with `\NewConfigure` commands. The first parameter is the name of the configuration, the second is the number of hooks to create. We need two of them: one for code inserted before the command and one for code inserted after it. These hooks are named `\a:ConfigureName`, `\b:ConfigureName` and so on, so `\a:PoemTitle` and `\b:PoemTitle` were created in our case. In this simple case we can use the commands `\pend:defI` and `\append:defI` to prepend and append these hooks to the redefined command. We could also have used `\def\@makePoemTitlehead#1{\a:PoemTitle#1\b:PoemTitle}`, but we would lose the code executed in `\@makePoemTitlehead` in that case.

Generate the output files with `tex tex4ht-4ht.tex` and copy the generated memoir.4ht into the same directory as the tex file that is about to be converted. Now that the hooks are in place, we can configure PoemTitle inside tex4ht-ooffice.tex. There was no subsection for memoir in tex4ht-ooffice.tex, so I added the following code to style PoemTitle:

    \subsection{Memoir}
    \<configure ooffice memoir\><<<
    |<memoir poemtitle|>
    >>>
    \<memoir poemtitle\><<<
    \NewConfigureOO{PoemTitle}
    \ConfigureOO{PoemTitle}{%
    <style:style style:name="poemtitle" style:family="text" style:parent-style-name="Heading" style:class="chapter" >\Hnewline
    <style:text-properties fo:font-size="15pt" fo:font-weight="normal" style:font-size-asian="18pt" style:font-weight-asian="bold" style:font-size-complex="18pt" style:font-weight-complex="bold" fo:text-align="center" style:justify-single-word="false"/>
    </style:style> \Hnewline
    }
    \Configure{PoemTitle}{%
    \ifvmode\IgnorePar\fi\EndP
    \HCode{<text:span text:style-name="poemtitle">}\NoFonts}
    {\EndNoFonts\HCode{</text:span>\Hnewline}}
    >>>

"\ifvmode\IgnorePar\fi\EndP" closes the preceding paragraph, otherwise a tag mismatch happens. "\NoFonts" disables further styling of the title. "\\" starts a new \verseline. The rest of the styling should be self-explanatory after reading the previous subsection.

7.4 Fixing Errors with Tables in Memoir

TeX4ht handles tables correctly in the article and book classes, but in memoir it gives an error. The following code needs to be added to the beginning of memoir.4ht to fix this error:

    % Tables Handling
    \input array.4ht
    \input dcolumn.4ht
    \input tabularx.4ht
    \input booktabs.4ht
    \let\columnlines\empty

The \input lines are needed because both memoir and tex4ht redefine a lot of standard macros. I read somewhere online that memoir also changes the standard table macros in a way that is incompatible with tex4ht, which is the main reason for adding this code to memoir.4ht. Doing so exposes a bug in tex4ht that causes errors while handling math such as eqnarray. Adding \let\columnlines\empty resolves this error, and conversion of eqnarray and array works correctly from then on.
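If memoir.4ht is regenerated often, the patch from section 7.4 has to be reapplied each time. The following is only a sketch of one way to do that, not part of the project: the helper name `prepend_table_patch` is hypothetical, and it assumes memoir.4ht sits at a path you pass in. It works on bytes so that the file's encoding and line endings are left alone.

```python
from pathlib import Path

# The lines from section 7.4 that must come first in memoir.4ht.
TABLE_PATCH = b"\n".join([
    b"% Tables Handling",
    rb"\input array.4ht",
    rb"\input dcolumn.4ht",
    rb"\input tabularx.4ht",
    rb"\input booktabs.4ht",
    rb"\let\columnlines\empty",
]) + b"\n"


def prepend_table_patch(path):
    """Prepend TABLE_PATCH to the file unless it already starts with it.

    Hypothetical helper. Returns True when the file was modified.
    """
    path = Path(path)
    original = path.read_bytes()
    if original.startswith(TABLE_PATCH):
        return False  # already patched, keep the file untouched
    path.write_bytes(TABLE_PATCH + original)
    return True
```

Calling `prepend_table_patch("memoir.4ht")` a second time leaves the file unchanged, so the call can safely be placed in a build script.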
7.5 Configuring a PDF image converter

TeX4ht doesn't support graphics in PDF format, so we have to use a PDF to PNG converter and then instruct TeX4ht to use the generated PNG file instead of the PDF file. I added the following code into the \subsection{Graphics} part of tex4ht-ooffice.tex:

    \Configure{graphics*}
    {pdf}
    {\Needs{"convert \csname Gin@base\endcsname.pdf -scale \expandafter\dim\the\csname Gin@req@width\endcsname\space "\csname Gin@base\endcsname-\expandafter\the\csname Gin@req@width\endcsname.png" "}%
    \Picture[pict]{\csname Gin@base\endcsname-\expandafter\the\csname Gin@req@width\endcsname.png}%
    \special{t4ht+@File: \csname Gin@base\endcsname-\expandafter\the\csname Gin@req@width\endcsname.png}
    }
    {
    \def\dim{%
    \catcode`\p=12
    \catcode`\t=12
    \gdef\dim}
    \dim#1pt{#1}
    }

Gin@req@width is a macro from the graphicx package, and it directly gives us the width of the final image after applying the scale given by the user. \Needs{} instructs TeX4ht to write its argument into the .lg file, which is read in the last step of TeX4ht, i.e. when t4ht is run. So we convert the given PDF to a PNG named originalname-widthOfTheImage.png. We then need to tell tex4ht to use this newly generated PNG file, which is done with \Picture[pict]{}, where I write the name of the PNG image to be used.

7.6 Fixing the issue with incorrect sizes of images

I used an approach similar to the previous subsection to fix this issue. Instead of converting PDF to PNG, I just convert the original PNG image to a new PNG image with dimensions according to the scale specified. So the original filename.png is converted to filename-widthOfTheImage.png, which is in turn used by TeX4ht. I added the following code below the \Configure{graphics*}{pdf}{} section inside tex4ht-ooffice.tex:

    \Configure{graphics*}
    {png}
    {\Needs{"convert \csname Gin@base\endcsname.png -scale \expandafter\dim\the\csname Gin@req@width\endcsname\space "\csname Gin@base\endcsname-\expandafter\the\csname Gin@req@width\endcsname.png" "}%
    \Picture[pict]{\csname Gin@base\endcsname-\expandafter\the\csname Gin@req@width\endcsname.png }}
    {
    \def\dim{%
    \catcode`\p=12
    \catcode`\t=12
    \gdef\dim}
    \dim#1pt{#1}
    }

This code is self-explanatory after reading the previous subsection on PDF conversion. Similar code needs to be added to tex4ht-ooffice.tex to fix sizing issues with other formats like jpg, jpeg, gif etc.

7.7 Issues with Math

In math there were two main issues. The first was that array, equations and tables with expressions get squashed into a single line. The second is that environments like eqnarray and align are not displayed correctly in the resulting ODT file. The first problem was fixed when we added \let\columnlines\empty to the memoir.4ht file; as mentioned earlier, this was an acknowledged bug in TeX4ht. For the second issue, when I rendered the generated MathML code separately, the equations were displayed correctly. That means correct MathML code is being generated, which in turn means TeX4ht is doing nothing wrong. The source of the issue is that support for MathML is limited in LibreOffice and OpenOffice.

7.8 Injecting MetaData

In the case of math I had to carry the LaTeX expression as it is into the content.xml of the resulting ODT file. I stored the math expression inside the annotation tag. The first task was to identify math unambiguously in the tex file. LyX makes this task simpler, as it keeps each formula inside a Formula inset. For example, the following is the LyX syntax for an inline formula:

    \begin_inset Formula $m$
    \end_inset

The following is the syntax for representing other kinds of math:

    \begin_inset Formula \begin{align}
    zw & =(3+2i)(2-i)\notag

With this syntax it is very easy to identify whether a formula is an inline formula or not.
In simple words, if "\n" is found between \begin_inset Formula and \end_inset, it is not an inline formula. Now the main task is to put this formula appropriately into the tex file and make TeX4ht understand that it needs to go into the annotation tag. So I made a new package named roundtrip and defined an environment and a command in it, one for handling multiline math and one for inline math. The following is the code I wrote inside roundtrip.sty:

    \ProvidesPackage{roundtrip}
    \usepackage{fancyvrb}
    \usepackage{environ}
    \usepackage{cprotect}
    \newenvironment{metadata}
    {\VerbatimEnvironment
    \begin{Verbatim}}
    {\end{Verbatim}}
    \newcommand{\inlinemeta}[1]
    {\Pre \texttt{#1} \Post}
    \cMakeRobust{\inlinemeta}

The new environment metadata handles multiline math like eqnarray, align etc., and the new command \inlinemeta handles inline math, as the name suggests. Now the task is to put every math expression inside the tex file into one of these two. So I wrote a Python script which is called after the LyX to LaTeX conversion and before calling mk4ht for the LaTeX to ODT conversion. This script first parses the LyX file and identifies all the math expressions, then opens the tex file, searches for each math expression in it and replaces it as follows. Let's take an example to understand this better. Take the following math expression inside LyX:

    \begin_inset Formula $m$
    \end_inset

We extract $m$ from here, search for it in the tex file and replace it with

    $m$ \inlinemeta\verb

In the case of a multiline math expression, the following is the LyX syntax:

    \begin_inset Formula \[
    \vec{p}
    \]
    \end_inset

We extract this formula from the LyX file, search for it in the tex file and replace it with the following:

    \[
    \vec{p}
    \]
    \begin{metadata}
    \[
    \vec{p}
    \]
    \end{metadata}

As mentioned, the Python script does all this before mk4ht is called.
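The script itself is not reproduced on this page. What follows is only a minimal Python sketch of the steps described above, not the project's actual script. All function names are hypothetical. It assumes that formulas occur in the same order and with identical text in the LyX and tex files, and it ignores the line break that precedes \end_inset when deciding whether a formula is inline. The `\inlinemeta{\verb|...|}` form with `|` delimiters is also an assumption, since the exact argument is not shown here.

```python
import re

# One Formula inset; the body is everything up to the closing \end_inset.
FORMULA_RE = re.compile(r"\\begin_inset Formula (.*?)\s*\\end_inset", re.DOTALL)


def extract_formulas(lyx_text):
    """Return the body of every Formula inset, in document order."""
    return [m.group(1).strip() for m in FORMULA_RE.finditer(lyx_text)]


def is_inline(formula):
    """A formula is inline when its inset body contains no newline."""
    return "\n" not in formula


def wrap_formula(formula):
    """Return the formula followed by a verbatim copy for the annotation tag."""
    if is_inline(formula):
        # Assumed form: \verb with '|' delimiters (breaks if the formula has '|').
        return formula + r" \inlinemeta{\verb|" + formula + "|}"
    return formula + "\n\\begin{metadata}\n" + formula + "\n\\end{metadata}"


def inject_metadata(lyx_text, tex_text):
    """Append the metadata copy after each formula found in the tex source."""
    out, pos = [], 0
    for formula in extract_formulas(lyx_text):
        start = tex_text.find(formula, pos)
        if start == -1:
            continue  # formula text differs in the tex file; leave it alone
        out.append(tex_text[pos:start])
        out.append(wrap_formula(formula))
        pos = start + len(formula)  # never search inside text already handled
    out.append(tex_text[pos:])
    return "".join(out)


if __name__ == "__main__":
    lyx = "\\begin_inset Formula $m$\n\\end_inset\n"
    print(inject_metadata(lyx, "The mass $m$ is constant.\n"))
```

Searching forward from a moving position, instead of replacing globally, keeps the verbatim copies that were just inserted from being matched again when the same formula occurs more than once.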
Now the final task is to configure this environment and command in roundtrip.4ht, which is picked up automatically because we are using the roundtrip package. The following is the code in roundtrip.4ht:

    \:CheckOption{ooffice}\if:Option
    \ConfigureEnv{metadata}{
    \ifvmode \IgnorePar\fi \EndP
    \HCode{<annotation encoding="\jobname-m\math:obj">}}
    {\ifvmode \IgnorePar\fi \EndP\HCode{</annotation> \Hnewline}\par}{}{}
    \NewConfigure{inlinemeta}[2]{%
    \newcommand\Pre{#1}%
    \newcommand\Post{#2}}
    \Configure{inlinemeta}{
    \Configure{texttt}{\NoFonts}{\EndNoFonts}
    \Configure{verb}{}{}
    \Configure{obeylines}{}{}{}
    \HCode{<annotation encoding="\jobname-m\math:obj">}}
    {\HCode{</annotation> \Hnewline}}
    \fi

Here \jobname-m\math:obj is the link used when decoding to find the formula in content.xml to which this annotation tag belongs.

8. Future Work

8.1 LyX to ODT
8.2 ODT to LyX