Archive for June, 2011

Generating HTML pages from Latex

Posted by cheshirekow in LaTeX on June 29, 2011

While LaTeX is not really designed for web content, it is still very useful to generate a web version of a LaTeX document. LaTeX is clearly meant for typesetting layouts on a pre-defined page, but when you want to share the information with others, it’s generally a lot easier for them to go to a webpage than it is to download and open a PDF. In addition, it’s generally easier to view a webpage than a PDF because the content is continuous, and one can scroll around and click hyperlinks in a way that is far more fluid than in a PDF.

Now that MathML and SVG are becoming more supported by web browsers, there is a strong case for sharing mathy documents on the web in addition to paper documents (or PDFs, which are only slightly more readable than paper).

To this end, I’ve been evaluating various LaTeX-to-HTML converters. I’ve tried the following on Linux (Ubuntu):

By far my favorite is LaTeXML. It generates crisp, simple pages using MathML and CSS, making it easy to customize the style. It doesn’t support many of the packages that I would generally like to use (like algorithm2e), but then again neither do any of the others. Also, the arXiv project is working on a branch of LaTeXML, so there is promise that it will grow quickly to support a lot of the best packages.

Document Setup

My current approach to generating both PDF and HTML output from LaTeX source is to use a separate top-level document for each. The directory structure looks something like this:

    document
     |- document_html.tex
     |- document_pdf.tex
     |- document.tex
     |- preamble_common.tex
     |- preamble_html.tex
     |- preamble_pdf.tex
     \- references.bib

The two versions of document_[output].tex are the top-level files. They look like this:

%document_html.tex
 
\documentclass[10pt]{article}
\input{preamble_common}
\input{preamble_html} 
\begin{document}
\input{document}
\end{document}

The PDF version is the same, but it inputs preamble_pdf instead. Note that in LaTeX you cannot nest \include directives, but you can nest \input directives. Also, \include inserts a page break, so there is no need to use it here. Rather, document.tex may \include its chapters as tex files or the like.

Makefile

To ease the process of generating the different types, I’m using a makefile.

# The following definitions are the specifics of this project
PDF_OUTPUT  :=  document.pdf
HTML_OUTPUT :=  document.html
 
PDF_MAIN	:=  document_pdf.tex
HTML_MAIN   :=  document_html.tex
 
COMMON_TEX 	:=	document.tex \
                preamble_common.tex
 
PDF_TEX		:=  $(COMMON_TEX) \
                document_pdf.tex \
                preamble_pdf.tex
 
HTML_TEX    :=  $(COMMON_TEX) \
                document_html.tex \
                preamble_html.tex 
 
BIB         :=  references.bib
 
 
 
# these variables are the dependencies for the outputs
PDF_SRC     := $(PDF_TEX) $(BIB)
HTML_SRC    := $(HTML_TEX) $(BIB)
 
# the 'all' target will make both the pdf and html outputs
all: pdf html
 
# the 'pdf' target will make the pdf output
pdf: $(PDF_OUTPUT)
 
# the 'html' target will make the html output
html: $(HTML_OUTPUT)
 
# the pdf output depends on the pdf tex files and the bibliography
# we use a shell script to optionally run pdflatex multiple times until the
# output does not suggest that we rerun latex
$(PDF_OUTPUT): $(PDF_SRC)
	@echo "Running pdflatex on $(PDF_MAIN)"
	@pdflatex $(basename $(PDF_MAIN)) > $(basename $(PDF_MAIN))_0.log
	@echo "Running bibtex"
	@-bibtex   $(basename $(PDF_MAIN)) > bibtex_pdf.log 
	@echo "Checking for rerun suggestion"
	@for ITER in 1 2 3 4; do \
		STABELIZED=`cat $(basename $(PDF_MAIN)).log | grep "Rerun"`; \
		if [ -z "$$STABELIZED" ]; then \
			echo "Document stabelized after $$ITER iterations"; \
			break; \
		fi; \
		echo "Document not stabelized, rerunning pdflatex"; \
		pdflatex $(basename $(PDF_MAIN)) > $(basename $(PDF_MAIN))_$$ITER.log; \
	done
	@echo "Copying pdf to target file"
	@cp $(basename $(PDF_MAIN)).pdf $(PDF_OUTPUT)
 
# the html output depends on the html tex files and the bibliography
# we have to process all of the bibliography files separately into xml files,
# and then include them all in the call to the postprocessor
$(HTML_OUTPUT): $(HTML_SRC)
	@echo "Running latexml on $(HTML_MAIN)"
	@latexml $(HTML_MAIN) --dest=$(basename $(HTML_OUTPUT)).xml > $(basename $(HTML_MAIN)).log 2>&1
	@BIBSTRING=""; \
	for BIBFILE in $(BIB); do \
		echo "Running latexml on $$BIBFILE"; \
		XMLFILE=`basename "$$BIBFILE" .bib`.xml; \
		LOGFILE=`basename "$$BIBFILE" .bib`_html.log; \
	    latexml $$BIBFILE --dest=$$XMLFILE > $$LOGFILE 2>&1; \
	    BIBSTRING="$$BIBSTRING --bibliography=$$XMLFILE"; \
	done; \
	echo $$BIBSTRING > bibstring.txt
	@echo "postprocessing with `cat bibstring.txt`"
	@latexmlpost $(basename $(HTML_OUTPUT)).xml `cat bibstring.txt` --dest=$(HTML_OUTPUT) --css=navbar-left.css
 
# the 2>/dev/null redirects stderr to the null device so that we don't get error
# messages in the console when rm has nothing to remove
clean:
	@-rm -v *.log 2>/dev/null
	@-rm -v *.out 2>/dev/null
	@-rm -v *.aux 2>/dev/null
	@-rm -v *.xml 2>/dev/null
	@-rm -v *.pdf 2>/dev/null
	@-rm -v *.html 2>/dev/null
	@-rm -v bibstring.txt 2>/dev/null


Some notes on the makefile. I execute bibtex ignoring errors (the dash before ‘bibtex’) because bibtex will exit with an error if it doesn’t find any citations, or if there is no bibliography. The output of each pdflatex run goes to a logfile named “document_pdf_<i>.log”, where “<i>” is the iteration number. The output of pdflatex and bibtex is suppressed by dumping it to these logfiles (I find the verbosity useless to have in the console).

The shell script in the PDF recipe iterates up to four times. The first thing it does is grep the log of the most recent pdflatex run, looking for a line where latex recommends that we “Rerun” latex. If it finds such a line, it sets the shell variable STABELIZED to that string. Otherwise the variable gets the empty string. Then we test whether the string is empty. If it’s empty, we’re done, so we break out of the loop. If it’s not, we rerun pdflatex.
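The rerun check can also be pulled out of the makefile into a small shell helper. This is only a sketch of the same idea, not what the makefile above actually calls: the function names needs_rerun and rerun_until_stable are made up for illustration, and the log file is assumed to be the one pdflatex writes next to the document.

```shell
# Succeeds (exit status 0) if the given LaTeX log asks for another pass.
needs_rerun() {
    grep -q "Rerun" "$1"
}

# Re-run a command while the log keeps asking for a rerun.
# Arguments: log file, maximum number of reruns, then the command itself.
# The console output of each rerun goes to <log>_<i>.log, as in the makefile.
# Prints the number of reruns that were needed.
rerun_until_stable() {
    log=$1
    max_runs=$2
    shift 2
    iter=0
    while [ "$iter" -lt "$max_runs" ] && needs_rerun "$log"; do
        iter=$((iter + 1))
        "$@" > "${log%.log}_$iter.log"
    done
    echo "$iter"
}
```

A call such as "rerun_until_stable document_pdf.log 4 pdflatex document_pdf" would then stand in for the for-loop in the PDF recipe.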

The shell script in the HTML recipe iterates over each of the (potentially multiple, potentially zero) bibliography files, processing each of them with latexml. For each one it appends the string “--bibliography=<filename>.xml” to the BIBSTRING shell variable. The last thing it does is echo the contents of that shell variable to the file “bibstring.txt”. This is so that subsequent commands run by make can find it: make runs each recipe line in its own shell, so the variable itself does not survive past the loop.
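As a rough sketch, the flag-building part of that loop looks like this when written as a standalone shell function. The name bib_flags is hypothetical, and the function only builds the option string; it does not run latexml on the .bib files the way the recipe does.

```shell
# Build one "--bibliography=<name>.xml" flag per .bib file given as argument.
bib_flags() {
    flags=""
    for bibfile in "$@"; do
        # Strip the directory and the .bib suffix, then add .xml.
        xmlfile="$(basename "$bibfile" .bib).xml"
        flags="$flags --bibliography=$xmlfile"
    done
    # Drop the leading space left over from the first append.
    echo "${flags# }"
}
```

With no arguments the function prints an empty line, which matches the “potentially zero” bibliography case.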


Personal Dynamic DNS in Ubuntu

Posted by cheshirekow in Uncategorized on June 13, 2011

I finally got around to purchasing a personal server and one of the first things I did was set up a private DNS server for cheshirekow.com. As it turns out, setting it up to be dynamic is quite easy. In this post I’ll go through the steps I took to get it up and running.

I won’t bother with all the fun stuff about how dynamic DNS works or how to properly configure everything, but instead I’ll just post my configuration files for posterity.

More detailed information on configuring bind can be found in the Ubuntu Server Guide. A good article on nsupdate and dynamic updates to bind can be found on Jeff Garzik’s Linux pages. I found the information I needed on NetworkManager hooks at Sysadmin’s Journey.

Why Dynamic DNS?

Mostly because I’m lazy. I have a work laptop, a personal desktop, a netbook, an android tablet, and an android phone. I’m constantly scp’ing files from one to another, and I really hate having to write out the ip address specifically all the time. Since I own the domain cheshirekow.com, I figured it would be really slick to be able to address all of my machines as subdomains. For instance, I could label them as “laptop.cheshirekow.com”, “desktop.cheshirekow.com”, “netbook.cheshirekow.com”, “tablet.cheshirekow.com”, and “phone.cheshirekow.com”. If these dns entries are automatically updated when each of these devices connects to a wifi access point using DHCP, then I can even get files from one machine to another without even being physically near them.

named.conf.local

Following the ubuntu guide, I edited /etc/bind/named.conf.local to look like the following:

//
// Do any local configuration here
//
 
// Consider adding the 1918 zones here, if they are not used in your
// organization
//include "/etc/bind/zones.rfc1918";
 
zone "cheshirekow.com" {
	type master;
	file "/var/lib/bind/db.cheshirekow.com";
	allow-transfer { aaa.bbb.ccc.ddd; };
	allow-update { key "user.cheshirekow.com."; };
};

Note that the file is in /var/lib/bind/db.cheshirekow.com, not in /etc/bind/db.cheshirekow.com like a lot of tutorials will tell you. This is because Ubuntu prevents bind from writing to files in /etc/bind. You can either change the apparmor profile for bind or just do as I do and put the file where it’s supposed to go, in /var/lib/bind/ (there’s a note in the bind apparmor profile about this). Putting it in /etc/bind is fine if the DNS entries are all static, but if there are dynamic entries then bind will try to create a .jnl file in the same directory as the db.xxx file. Since bind can’t write to /etc/bind, we need to put the db file somewhere else.

Also, note that aaa.bbb.ccc.ddd is the ip address of my secondary name server for cheshirekow.com. I’m using afraid.org to host my secondary DNS.

The allow-update line allows the user user@cheshirekow.com to update the DNS entries (the dynamic part), as verified by a key (generating the key comes later). Note that I don’t use the literal “user”.

/var/lib/bind/db.cheshirekow.com

The next thing was to create the db.cheshirekow.com file which looks like this.

$ORIGIN .
$TTL 604800	; 1 week
cheshirekow.com		IN SOA	ns1.cheshirekow.com. cheshirekow.gmail.com. (
				9          ; serial
				604800     ; refresh (1 week)
				86400      ; retry (1 day)
				2419200    ; expire (4 weeks)
				604800     ; minimum (1 week)
				)
			NS	ns1.cheshirekow.com.
			A	aaa.bbb.ccc.ddd
			AAAA	::1
$ORIGIN cheshirekow.com.
ns1			A	aaa.bbb.ccc.ddd
www			A	eee.fff.ggg.hhh

Note that aaa.bbb.ccc.ddd is the IP address of the name server itself and eee.fff.ggg.hhh is the IP address of my web server (where you are currently reading this). Also note that my email address is cheshirekow@gmail.com, but in this file it is written as “cheshirekow.gmail.com.”: the @ is replaced by a dot and the name ends with a trailing dot.
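The email rewriting for the SOA record is mechanical enough to sketch as a shell function. The name email_to_rname is made up for this example. The one extra rule it handles, which my own address doesn’t need, is that a dot in the part before the @ has to be escaped with a backslash so that it isn’t read as a label separator.

```shell
# Convert user@example.com into the zone-file form user.example.com.
# (with the trailing dot), escaping any dots in the local part.
email_to_rname() {
    local_part=${1%%@*}
    domain=${1#*@}
    escaped=$(printf '%s' "$local_part" | sed 's/\./\\./g')
    printf '%s.%s.\n' "$escaped" "$domain"
}
```

For example, “email_to_rname cheshirekow@gmail.com” prints “cheshirekow.gmail.com.”.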

You can (should?) also set up reverse dns entries for all these things but I did not as the server is actually sitting in a different physical domain. In other words I don’t own a network of ip-addresses so there’s no reason to expect my server to be queried for reverse dns lookups.

Create Keys

The next thing we need to do is set up a key that we can use to do dynamic updates. This can be done on a separate machine from the name server; it doesn’t matter.

user@ns1:~$ mkdir .bind
user@ns1:~$ cd .bind
user@ns1:~/.bind$ dnssec-keygen -a HMAC-MD5 -b 512 -n USER user.cheshirekow.com.

Note that “USER” is a literal string, not a placeholder for something that you create. Also note that “user.cheshirekow.com” is the name of this key, and corresponds to the email address “user@cheshirekow.com”.

This command creates two files, a .key file and a .private file. (For an HMAC key, both files hold the same shared secret.)

user@ns1:~/.bind$ ls -l
total 8
-rw------- 1 user user 127 2011-06-10 16:51 Kuser.cheshirekow.com.+157+56713.key
-rw------- 1 user user 229 2011-06-10 16:51 Kuser.cheshirekow.com.+157+56713.private

Install Keys

Now we create a file to store these keys. I put them in /etc/bind/keys.local

key "user.cheshirekow.com." {
	algorithm HMAC-MD5;
	secret "2345A/bkd7GDcu9orjzblkj2r37ajglk489DLHD/m987addzjDCadsh8 bbIUOY809glkashDEmPj5alIUoiEeA==";
};

Note that this is not a real key, but random gibberish I pounded out on the keyboard. In reality, this key is copied directly from Kuser.cheshirekow.com.+157+56713.key.

I then included this file from /etc/bind/named.conf so that it looks like this:

// This is the primary configuration file for the BIND DNS server named.
//
// Please read /usr/share/doc/bind9/README.Debian.gz for information on the 
// structure of BIND configuration files in Debian, *BEFORE* you customize 
// this configuration file.
//
// If you are just adding zones, please do that in /etc/bind/named.conf.local
 
include "/etc/bind/named.conf.options";
include "/etc/bind/named.conf.local";
include "/etc/bind/named.conf.default-zones";
include "/etc/bind/keys.local";

Restart bind

That’s it for the bind setup, so restart bind:

user@ns1:~$ sudo /etc/init.d/bind9 restart

Client Update Script

I then created the following update script in /etc/NetworkManager/dispatcher.d/99updatedns. This script is called as a hook from NetworkManager every time an interface goes up or down. It receives two parameters. The first is the name of the interface (e.g. eth0 or wlan0) and the second is the status (e.g. up or down).

#!/bin/bash
 
INTERFACE=$1
STATUS=$2
DIRECTORY="/home/user/Codes/shell/dyndns"
 
if [ "$STATUS" = "up" ]; then
    IPADDRESS=`ifconfig $INTERFACE | grep inet | grep -v inet6 | cut -d ":" -f 2 | cut -d " " -f 1`
    cp $DIRECTORY/nsupdate_src.txt /tmp/nsupdate.txt
    sed -i "s/IPADDRESS/$IPADDRESS/" /tmp/nsupdate.txt 
    nsupdate -k /home/user/.bind/Kuser.cheshirekow.com.+157+56713.private -v /tmp/nsupdate.txt
fi

Note that this script requires the nsupdate_src.txt which is here:

server ns1.cheshirekow.com
zone cheshirekow.com
update delete netbook.cheshirekow.com. A
update add netbook.cheshirekow.com. 86400 A IPADDRESS
show
send

The script extracts the IP address from the output of ifconfig for the given interface, copies the template file to /tmp/, replaces IPADDRESS with the actual address of the machine, and then calls nsupdate using the private key and the file. This script is saved as /etc/NetworkManager/dispatcher.d/99updatedns, owned by root and flagged executable. Note that this script accesses the key for my specific user, which is fine in my case because my netbook is a single-user machine. If the machine has multiple users, you may want to store the key and text file in root’s home directory (/root) or somewhere similar.
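The two text-processing steps can be tried out without touching the network by feeding them canned input. The sketch below wraps them in two functions whose names (extract_ipv4 and fill_template) are invented for this example. It assumes the ifconfig output format my machines print, with lines like “inet addr:192.168.1.5  Bcast:...”; newer versions of ifconfig print a different layout, and the cut fields would need adjusting.

```shell
# Read ifconfig-style text on stdin and print the first IPv4 address.
# Assumes lines of the form "inet addr:192.168.1.5  Bcast:...".
extract_ipv4() {
    grep inet | grep -v inet6 | cut -d ":" -f 2 | cut -d " " -f 1 | head -n 1
}

# Read an nsupdate template on stdin and replace the IPADDRESS placeholder
# with the address given as the first argument.
fill_template() {
    sed "s/IPADDRESS/$1/"
}
```

Something like “ifconfig wlan0 | extract_ipv4” followed by “fill_template "$IPADDRESS" < nsupdate_src.txt > /tmp/nsupdate.txt” would then do the same job as the cp and sed lines in the script.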

Result

The result of this process is that netbook.cheshirekow.com always points to the ip address of my netbook, given that it is connected to a wifi access point. Whenever the netbook (re)connects to an access point, the network manager calls the script, and the dns entry on ns1.cheshirekow.com is updated.

(Update) Better Script

I changed the update script a little bit. Since I use a wired connection on my laptop most of the time, I don’t want the IP address of the wireless connection to supersede that of the wired connection if the wired one is active.

#!/bin/bash
 
INTERFACE=$1
STATUS=$2
DIRECTORY="/home/user/Codes/shell/dyndns"
 
echo "network interface change hook:"
echo "----------------------------";
 
#first, check to see if eth0 is up and running
ETH0STR=`ifconfig eth0 | grep inet | grep -v inet6`
if [ -z "$ETH0STR" ]
then
    echo "eth0 has no address (probably is down or disconnected)"
    echo "checking interface $INTERFACE whose change launched this script"
    if [ "$STATUS" = "up" ]
    then
        IPADDRESS=`ifconfig $INTERFACE | grep inet | grep -v inet6 | cut -d ":" -f 2 | cut -d " " -f 1`
        if [ -z "$IPADDRESS" ]
        then
            echo "$INTERFACE has no address, aborting (str = $IPADDRESS)"
        else
            echo "$INTERFACE has address $IPADDRESS"
            cp $DIRECTORY/nsupdate_src.txt /tmp/nsupdate.txt
            sed -i "s/IPADDRESS/$IPADDRESS/" /tmp/nsupdate.txt 
            nsupdate -k /home/user/.bind/Kuser.cheshirekow.com.+157+56713.private -v /tmp/nsupdate.txt            
        fi
    else
        echo "Status is not 'up', aborting"
    fi
else
    IPADDRESS=`echo $ETH0STR | cut -d ":" -f 2 | cut -d " " -f 1`
    echo "eth0 has address $IPADDRESS, ignoring changed interface $INTERFACE"
    cp $DIRECTORY/nsupdate_src.txt /tmp/nsupdate.txt
    sed -i "s/IPADDRESS/$IPADDRESS/" /tmp/nsupdate.txt 
    nsupdate -k /home/user/.bind/Kuser.cheshirekow.com.+157+56713.private -v /tmp/nsupdate.txt
fi

Edit:

For some reason whenever I update db.cheshirekow.com bind refuses to restart correctly. When I do this update, I have to delete the file /var/lib/bind/db.cheshirekow.com.jnl and restart.


Inkbook Introduction

Posted by cheshirekow in Inkbook, Uncategorized on June 7, 2011

Inkbook is a new project I’ve started to replace Xournal for my needs. What I really want is a tightly integrated, full-featured inking experience for Ubuntu.

What’s wrong with xournal?

Xournal is great. I use it all the time. However, there are a lot of really simple features I would like it to have. I took a look at the code, and it’s pretty hard to understand. The lack of good documentation means it’s not worth my time. There’s no sense in committing a ton of time trying to learn the code base, just to find out that an apparently simple feature is impossible to implement without restructuring the whole thing. So, I’m just restructuring the whole thing :).

I’ll start by going through all the things that I don’t like about Xournal.

Memory Usage

One of the biggest problems I have with Xournal is its memory usage. A typical 10-page Xournal document consumes around 300MB of RAM, and takes about 60 seconds to open. This is a big nuisance to me. I suspect the cause is that Xournal stores the whole document in memory.

Bitmaps

A lot of times I really want to paste some snippet into my notes. There is a Xournal patch for using bitmaps, and it’s not terrible, but the images render fuzzy and it’s difficult to scale and place them in the document. I usually end up exporting the whole thing to PDF for later reference. I’ve written a script which can copy parts of the screen to the clipboard (like the Adobe Reader snapshot tool), so I’d really like to be able to drop a bunch of images into a notebook and draw around them, write on them, etc.

Layers

I think that layers are a really useful tool, but it’s hard to use them in Xournal. First of all, you have to select them from a drop-down list at the bottom of the screen, not a list box. You can’t reorder them. And if you move to a lower layer, all the layers above it disappear.

Pen Options

Only three line widths and no fast-access colorwheel.

Shapes

Can only draw shapes by having the recognizer interpret them. Why not have shape tools that allow you to drop the shape and then resize, move around?

No lasso tool

Rectangular selection just doesn’t cut it for me, especially when I have potato-shaped drawings that I want to move around without moving the text around them.

Inkbook

What I really want is a digital notebook. Inkbook aims to be just that. Inkbook is really a merger of features that I like from both Xournal and Inkscape, and an attempt to fix some of the problems I have with both. Here is a list of the features I’m currently focusing on.

very large documents
ability to organize notebooks (like folders)
ability to link individual pages to multiple notebooks
multiple layers per page
multiple page sizes
continuous range of brush sizes
continuous color picking
bitmap cut & paste
grouping of paths
objects (shapes)
collaboration (openbook module?)

Very large documents and Organization

I want to be able to have several dozens of pages in a document, which basically means that the entire document can’t be stored in memory. Therefore, I’m attempting to store the data in a SQLite database. This also addresses the desire to have better organizational facilities. I’m implementing separate database objects for notebooks, pages, layers, objects, and paths.

A notebook is an ordered list of notebooks and an ordered list of pages (i.e. a folder). A page is an ordered list of layers. A layer is an ordered list of objects. An object is an ordered list of objects, images, or paths. A path is an ordered list of drawing primitives (most likely a one-to-one mapping to the cairo API).

Organization and View

For organizing notebooks, I plan to have a tree view (i.e. a directory tree). I’ll have a thumbnail page view which shows the current page and those near it, and allows for scrolling through the whole notebook. This will be a custom widget which renders each of the pages via its thumbnail image. I’ll have a list view to organize layers on the page. The list view will also list complex objects so they can be easily selected and edited (but it won’t display any information about hand-drawn paths, as there will be a large number of these). The main view will display a viewport of the page.

Current Progress

I’ve got a proof-of-concept running with the SQLite database file backend and working views of the notebook organization and layers. I’ve got a proof-of-concept for the thumbnail view but it needs more work. It’s written in C++ and meant to be very easy to understand and extend. I’m using Gtkmm3 (unstable) because it’s GTK, but it’s C++, and it has cairo as the native API. Here’s a screenshot:

Inkbook Screenshot