ix01.html is the index generated by DocBook. Unfortunately, DocBook does not seem to generate a machine-readable representation of the index that we could parse.
DocBook builds the index by looking for <indexterm> tags in the source XML files. They look like this:
(There could also be a secondary part, but we don’t use those yet as far as I know.)
Basically, you could either take all the XML files that we use as DocBook sources and use a simple Python script to collect all the <indexterm> tags (or just use grep for a crude solution), or you could implement an alternative DocBook XSL stylesheet that collects the <indexterm> tags and dumps them somewhere. I believe the former is the easier option. You can also try parsing ix01.html with a lenient HTML parser like BeautifulSoup in Python.
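For the first option, a minimal sketch of such a collection script could look like the following. It is only an illustration: the function names (`collect_index_terms`, `collect_from_files`) are made up for this example, and it assumes the source files are well-formed XML that Python's standard `xml.etree.ElementTree` can parse. Files that rely on entities defined in an external DTD may fail to parse and would need extra handling. Namespaces are stripped so that both namespaced and non-namespaced DocBook sources are matched.

```python
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a namespace prefix such as '{http://docbook.org/ns/docbook}'."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def text_of(element):
    """Return the element's full text content with whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def collect_index_terms(root):
    """Return a sorted list of unique (primary, secondary) pairs under root.

    The secondary part is an empty string when the <indexterm> has none.
    """
    terms = set()
    for element in root.iter():
        if local_name(element.tag) != "indexterm":
            continue
        primary = secondary = ""
        for child in element:
            name = local_name(child.tag)
            if name == "primary":
                primary = text_of(child)
            elif name == "secondary":
                secondary = text_of(child)
        if primary:
            terms.add((primary, secondary))
    return sorted(terms)


def collect_from_files(paths):
    """Parse each XML file (hypothetical helper) and merge the index terms."""
    terms = set()
    for path in paths:
        terms.update(collect_index_terms(ET.parse(path).getroot()))
    return sorted(terms)


if __name__ == "__main__":
    # Usage: pass the DocBook source files as command-line arguments.
    for primary, secondary in collect_from_files(sys.argv[1:]):
        print(f"{primary}, {secondary}" if secondary else primary)
```

Saved under some name of your choosing, the script would be run with the DocBook source files as arguments and prints one index term per line. Collecting into a set means a term that is marked in several places shows up only once.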
Or we could hack doxrox.py further to create a list of index terms for us.