Archive for category LaTeX

svg2pdf and svg2eps (convert svg to pdf or eps from the command line)

I’ve been working on a new makefile for my latex projects. The goal is to have single-source builds of dvi, pdf, and xhtml documents. I ran into a problem generating figures: latex expects eps graphics, pdflatex expects pdf figures, and latexml expects png figures (or will try to generate them). In order to build the documents with make, I need a way to generate eps and pdf figures from svg sources (I usually make my figures in inkscape). Inkscape can be run from the command line, but I don’t want to install it on my server because that would pull in a ton of libraries that make no sense on a machine with no graphical interface.

As it turns out, writing such a command line tool is very easy with librsvg and cairo. Carl Worth of Red Hat has produced a nice demo of svg2pdf which can be found at freedesktop.org. I cloned his project, made a cmake project out of it, and made a trivial modification to turn it into svg2eps as well.

You can find my code at git://git.cheshirekow.com/svg2x.git, and there’s some Doxygen documentation as well.
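Once the tools are built, wiring them into the makefile can be as simple as a couple of pattern rules. Here is a rough sketch; I’m assuming the figures live in a fig/ directory and that svg2pdf and svg2eps take the input svg and the output file as their two arguments (check the Doxygen documentation for the exact usage):

FIG_SVG := $(wildcard fig/*.svg)
FIG_PDF := $(FIG_SVG:.svg=.pdf)
FIG_EPS := $(FIG_SVG:.svg=.eps)

# build every figure output from its svg source
figures: $(FIG_PDF) $(FIG_EPS)

# one svg source yields a pdf (for pdflatex) and an eps (for latex)
%.pdf: %.svg
	svg2pdf $< $@

%.eps: %.svg
	svg2eps $< $@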


Generating HTML pages from Latex

While latex was never really designed for web content, it is still very useful to be able to generate a web version of a latex document. Latex is clearly aimed at typesetting layouts on a predefined page, but when you want to share the information with others, it’s generally a lot easier for them to go to a webpage than it is to download and open a PDF. In addition, it’s generally easier to view a webpage than a PDF because the content is continuous, and one can scroll around and click hyperlinks in a way that is far more fluid than in a PDF.

Now that MathML and SVG are becoming better supported by web browsers, there is a strong case for sharing mathy documents on the web in addition to paper documents (or PDFs, which are only slightly more readable than paper).

To this end, I’ve been evaluating several different LaTeX-to-HTML converters. I’ve tried the following on Linux (Ubuntu):

  1. TTH
  2. LaTeX2HTML
  3. tex4ht
  4. LaTeXML

By far my favorite is LaTeXML. It generates crisp, simple pages using MathML and CSS, making it easy to customize the style. It doesn’t support a whole lot of the packages that I generally like to use (such as algorithm2e), but then again none of the converters do. Also, the arXiv project is working on a branch of LaTeXML, so there is promise that it will quickly grow to support many of the best packages.

Document Setup

My current approach to generating both PDF and HTML output from latex source is to use a separate top-level document for each. The directory structure looks something like this:

    document
     |- document_html.tex
     |- document_pdf.tex
     |- document.tex
     |- preamble_common.tex
     |- preamble_html.tex
     |- preamble_pdf.tex
     \- references.bib

The two versions of document_[output].tex are the top-level files. They look like this:

%document_html.tex
 
\documentclass[10pt]{article}
\input{preamble_common}
\input{preamble_html} 
\begin{document}
\input{document}
\end{document}

The pdf version is the same but it uses preamble_pdf as an input. Note that in latex you cannot nest \include directives, but you can nest \input directives. Also, \include inserts a page break, so there is no need to use it here. Rather, document.tex may \include its chapters as separate tex files or the like.
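For reference, the pdf top-level is identical apart from the preamble it pulls in:

%document_pdf.tex

\documentclass[10pt]{article}
\input{preamble_common}
\input{preamble_pdf}
\begin{document}
\input{document}
\end{document}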

Makefile

To ease the process of generating the different output types, I’m using a makefile.

# The following definitions are the specifics of this project
PDF_OUTPUT  :=  document.pdf
HTML_OUTPUT :=  document.html
 
PDF_MAIN	:=  document_pdf.tex
HTML_MAIN   :=  document_html.tex
 
COMMON_TEX 	:=	document.tex \
                preamble_common.tex
 
PDF_TEX     :=  $(COMMON_TEX) \
                document_pdf.tex \
                preamble_pdf.tex
 
HTML_TEX    :=  $(COMMON_TEX) \
                document_html.tex \
                preamble_html.tex 
 
BIB         :=  references.bib
 
 
 
# these variables are the dependencies for the outputs
PDF_SRC     := $(PDF_TEX) $(BIB)
HTML_SRC    := $(HTML_TEX) $(BIB)
 
# declare targets that are not actual files
.PHONY: all pdf html clean

# the 'all' target will make both the pdf and html outputs
all: pdf html
 
# the 'pdf' target will make the pdf output
pdf: $(PDF_OUTPUT)
 
# the 'html' target will make the html output
html: $(HTML_OUTPUT)
 
# the pdf output depends on the pdf tex files and the bibliography
# we use a shell script to optionally run pdflatex multiple times until the
# output does not suggest that we rerun latex
$(PDF_OUTPUT): $(PDF_SRC)
	@echo "Running pdflatex on $(PDF_MAIN)"
	@pdflatex $(basename $(PDF_MAIN)) > $(basename $(PDF_MAIN))_0.log
	@echo "Running bibtex"
	@-bibtex   $(basename $(PDF_MAIN)) > bibtex_pdf.log 
	@echo "Checking for rerun suggestion"
	@for ITER in 1 2 3 4; do \
		STABILIZED=`cat $(basename $(PDF_MAIN)).log | grep "Rerun"`; \
		if [ -z "$$STABILIZED" ]; then \
			echo "Document stabilized after $$ITER iterations"; \
			break; \
		fi; \
		echo "Document not stabilized, rerunning pdflatex"; \
		pdflatex $(basename $(PDF_MAIN)) > $(basename $(PDF_MAIN))_$$ITER.log; \
	done
	@echo "Copying pdf to target file"
	@cp $(basename $(PDF_MAIN)).pdf $(PDF_OUTPUT)
 
# the html output depends on the html tex files and the bibliography
# we have to process all of the bibliography files separately into xml files,
# and then include them all in the call to the postprocessor
$(HTML_OUTPUT): $(HTML_SRC)
	@echo "Running latexml on $(HTML_MAIN)"
	@latexml $(HTML_MAIN) --dest=$(basename $(HTML_OUTPUT)).xml > $(basename $(HTML_MAIN)).log 2>&1
	@BIBSTRING=""; \
	for BIBFILE in $(BIB); do \
		echo "Running latexml on $$BIBFILE"; \
		XMLFILE=`basename "$$BIBFILE" .bib`.xml; \
		LOGFILE=`basename "$$BIBFILE" .bib`_html.log; \
	    latexml $$BIBFILE --dest=$$XMLFILE > $$LOGFILE 2>&1; \
	    BIBSTRING="$$BIBSTRING --bibliography=$$XMLFILE"; \
	done; \
	echo $$BIBSTRING > bibstring.txt
	@echo "postprocessing with `cat bibstring.txt`"
	@latexmlpost $(basename $(HTML_OUTPUT)).xml `cat bibstring.txt` --dest=$(HTML_OUTPUT) --css=navbar-left.css
 
# the 2>/dev/null redirects stderr to the null device so that we don't get error
# messages in the console when rm has nothing to remove
clean:
	@-rm -v *.log 2>/dev/null
	@-rm -v *.out 2>/dev/null
	@-rm -v *.aux 2>/dev/null
	@-rm -v *.xml 2>/dev/null
	@-rm -v *.pdf 2>/dev/null
	@-rm -v *.html 2>/dev/null
	@-rm -v bibstring.txt 2>/dev/null

Some notes on the makefile. I execute bibtex ignoring errors (the dash before ‘bibtex’) because bibtex will exit with an error if it doesn’t find any citations, or if there is no bibliography. Each iteration of pdflatex writes its output to a logfile named “document_pdf_<i>.log”, where “<i>” is the iteration number. The output of pdflatex and bibtex is suppressed by dumping it to the logfile (I find the verbosity useless to have in the console).

The shell script in the PDF recipe iterates up to four times. The first thing it does is grep the log of the most recent pdflatex run, looking for the line where latex recommends that we “Rerun” latex. If it finds such a line it sets the shell variable STABILIZED to that string; otherwise it gets the empty string. Then we test whether the string is empty. If it’s empty, we’re done, so we break out of the loop. If it’s not, we rerun pdflatex.

The shell script in the HTML recipe iterates over each of the (potentially multiple, potentially zero) bibliography files, processing each of them with latexml. For each file it appends the string “--bibliography=<filename>.xml” to the BIBSTRING shell variable. The last thing it does is echo the contents of that variable to the file “bibstring.txt”, so that subsequent commands run by make can find it.


Getting Inkscape to Use Latex Fonts (in Windows)

Introduction

Creating graphics for latex can be a real pain. There are a number of different options for doing this, though none of them is completely ideal. If you’re comfortable using regular latex and generating DVIs, then the pstricks package is a very powerful tool. If you prefer generating PDF files (now an open standard), as I do, then the PGF/TikZ latex packages are very powerful and can do just about everything… except that you have to code your graphics… which is a very slow, iterative process. The GNU diagramming tool Dia can create block diagrams and flow charts and can export either pstricks or tikz code. Inkscape is a much nicer user-oriented graphical vector drawing tool, but it doesn’t have native support for LaTeX, and creating graphics that include LaTeX math-mode content is a real pain. In any case, I’ve found a number of situations where a figure like the following was pretty easy to create in Inkscape.

[Figure: Inkscape figure for LaTeX]
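To illustrate the earlier point about coding your graphics with PGF/TikZ: even a trivial figure is specified entirely by coordinates and drawing commands in the LaTeX source. A generic sketch (not the figure above) looks something like this:

\documentclass{article}
\usepackage{tikz}
\begin{document}
\begin{tikzpicture}
  % axes and a labeled line, all placed by explicit coordinates
  \draw[->] (0,0) -- (3,0) node[right] {$x$};
  \draw[->] (0,0) -- (0,2) node[above] {$y$};
  \draw[thick] (0,0) -- (2.5,1.5) node[midway, above] {$f(x)$};
\end{tikzpicture}
\end{document}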

Getting the Fonts

In order to get the math fonts to look like they do in LaTeX, though, you need to have the fonts installed where Inkscape can find them. Unfortunately, LaTeX uses Type 1 postscript fonts, while Inkscape can only find fonts that Windows has installed in the system, which means TrueType or OpenType fonts. Fortunately, you can get the fonts for “Computer Modern” (Knuth’s font, used as the default in LaTeX) from the TeX archives in these formats. Simply download the fonts and install them in Windows (drag them to C:/Windows/Fonts). The next time you run Inkscape, it will have these fonts available and you can use them in your pretty graphics.

Other Fonts

There are some other fonts used by latex that you won’t find in OTF or TTF format, though. The only (open-source) way I’ve found to convert Type 1 fonts to OTF is through an ancient tool called FontForge. It’s an X Window System program, so you’ll have to install the Cygwin X server, or, luckily, someone has ported it to MinGW (native Win32).
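If you do end up with FontForge available from a command line (under Cygwin, for example), the conversion can also be scripted instead of done through the GUI. A rough sketch, assuming the input is a Type 1 .pfb file (cmsy10.pfb here is just an example name):

# convert a Type 1 font to OpenType using FontForge's scripting interface
fontforge -lang=ff -c 'Open($1); Generate($2)' cmsy10.pfb cmsy10.otf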

Tex Text plugin

Lately, I’ve been using the Tex Text plugin instead of using latex fonts with regular Inkscape text. The interface is a little tedious, but it works quite well (and it’s a lot less tedious than laying out the text by hand).
