I am trying to extract LaTeX equations from an HTML page (generated with latex2html) in order to replace the equation images with MathJax formulas.
Here is my first idea, with an example.
Input:
<div align="CENTER" class="mathdisplay"><a name="eq402"></a><!-- MATH
\begin{equation}
\text{d}\,v_{k}=\partial_{j}\,v_{k}\,\dfrac{\text{d}\,y^{j}}{\text{d}\,s}\,\text{d}\,s
\end{equation}
-->
<table class="equation" cellpadding="0" width="100%" align="CENTER">
<tr valign="MIDDLE">
<td nowrap align="CENTER"><span class="MATH">d<img width="150" height="65" align="MIDDLE" border="0" src="img1919.gif" alt="$\displaystyle \,v_{k}=\partial_{j}\,v_{k}\,\dfrac{\text{d}\,y^{j}}{\text{d}\,s}\,\text{d}\,s$"></span></td>
<td nowrap class="eqno" width="10" align="RIGHT">
(<span class="arabic">5</span>.<span class="arabic">65</span>)</td></tr>
</table></div>
By inserting the following JavaScript code at the bottom of the HTML page:
<script type="text/javascript">
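function transform() {
  // Visit every image inside a table row (the equation images live there).
  [].forEach.call(document.querySelectorAll('table tr img'), function(img) {
    var puretext = img.getAttribute('alt');
    // Skip images without alt text and the latex2html navigation icons.
    if (!puretext || puretext === 'up' || puretext === 'previous' || puretext === 'next' || puretext === 'contents') return;
    // Turn the leading "$\displaystyle " into a plain "$" delimiter.
    puretext = puretext.replace(/\$\\displaystyle /g, "$");
    // Insert the LaTeX source as text before the image, then hide the image.
    var text = document.createTextNode(puretext);
    img.parentNode.insertBefore(text, img);
    img.style.display = 'none';
  });
}
transform();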
</script>
I get the following rendering on my HTML page, i.e. the MathJax formula:
$\,v_{k}=\partial_{j}\,v_{k}\,\dfrac{\text{d}\,y^{j}}{\text{d}\,s}\,\text{d}\,s$
That could be enough, but I noticed that the "alt" attribute in the HTML page sometimes contains an incomplete formula. Here is an example:
<div align="CENTER" class="mathdisplay"><a name="eq407"></a><!-- MATH
\begin{equation}
\text{d}\,(\mathbf{V}\,\cdot\,\mathbf{n})=\mathbf{V_{M}}(M')\,\cdot\,\mathbf{n}-\mathbf{V}(M)\,\cdot\,\mathbf{n}=[\mathbf{V_{M}}(M')-\mathbf{V}(M)]\,\cdot\,\mathbf{n}=\text{d}\,\mathbf{V}\,\cdot\,\mathbf{n}
\end{equation}
-->
<table class="equation" cellpadding="0" width="100%" align="CENTER">
<tr valign="MIDDLE">
<td nowrap align="CENTER"><span class="MATH">d<img width="538" height="38" align="MIDDLE" border="0" src="img1929.gif" alt="$\displaystyle \,(\mathbf{V}\,\cdot\,\mathbf{n})=\mathbf{V_{M}}(M')\,\cdot\,\mat...
...V}(M)\,\cdot\,\mathbf{n}=[\mathbf{V_{M}}(M')-\mathbf{V}(M)]\,\cdot\,\mathbf{n}=$">d<img width="56" height="34" align="MIDDLE" border="0" src="img1930.gif" alt="$\displaystyle \,\mathbf{V}\,\cdot\,\mathbf{n}$"></span></td>
<td nowrap class="eqno" width="10" align="RIGHT">
(<span class="arabic">5</span>.<span class="arabic">70</span>)</td></tr>
</table></div>
As you can see, the "alt" attribute of the first <img> tag is:
$\displaystyle \,(\mathbf{V}\,\cdot\,\mathbf{n})=\mathbf{V_{M}}(M')\,\cdot\,\mat... ...V}(M)\,\cdot\,\mathbf{n}=[\mathbf{V_{M}}(M')-\mathbf{V}(M)]\,\cdot\,\mathbf{n}=$
latex2html did not write the whole LaTeX equation into the attribute (see the ... characters).
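To tell the two cases apart, a small check along these lines might be enough. It is only a sketch: `isTruncatedAlt` is a name I made up, and it assumes that a literal "..." run in the alt text always means latex2html cut the formula (a formula that legitimately contains "..." would be flagged too).

```javascript
// Hypothetical helper: returns true when the alt text looks truncated.
// Assumption: latex2html marks the elided part with a literal "..." run.
function isTruncatedAlt(alt) {
  return typeof alt === 'string' && alt.indexOf('...') !== -1;
}
```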
So I can't always rely on the img alt attribute, and I would like to use the \begin{equation} ... \end{equation} block that sits inside the HTML comment ( <!-- ... --> ) instead.
How can I get this comment block with querySelectorAll? Is there, for example, something like document.querySelectorAll('div.mathdisplay a comments'),function(comments) { that would let me extract the comment block?
If I could get this text block, I would store it in a variable and insert it before the img tag, as in my first idea:
var text = document.createTextNode(puretext);
img.parentNode.insertBefore(text, img);
img.style.display = 'none';
Any help would be nice
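For reference, here is a rough sketch of one direction that might work. As far as I know, CSS selectors (and therefore querySelectorAll) only match elements, never comment nodes, but comments do appear in an element's childNodes with nodeType 8 (Node.COMMENT_NODE). The names `extractEquation` and `replaceImagesWithLatex` are made up for this sketch. It assumes the comment is a direct child of div.mathdisplay, as in the samples above, and that MathJax is configured to typeset `$$ ... $$` and runs after this script.

```javascript
// Pull the LaTeX body out of a latex2html comment such as
// " MATH\n\\begin{equation}\n...\n\\end{equation}\n".
// Returns null when no equation environment is found.
function extractEquation(commentText) {
  var match = /\\begin\{equation\}([\s\S]*?)\\end\{equation\}/.exec(commentText);
  return match ? match[1].trim() : null;
}

// For each div.mathdisplay, find the first comment child that contains an
// equation and replace the content of span.MATH with the LaTeX source.
function replaceImagesWithLatex(root) {
  [].forEach.call(root.querySelectorAll('div.mathdisplay'), function (div) {
    var latex = null;
    [].forEach.call(div.childNodes, function (node) {
      // nodeType 8 is Node.COMMENT_NODE
      if (latex === null && node.nodeType === 8) {
        latex = extractEquation(node.nodeValue);
      }
    });
    if (latex === null) return;
    var span = div.querySelector('span.MATH');
    if (!span) return;
    // Assumption: "$$" is an enabled MathJax display-math delimiter.
    span.textContent = '$$' + latex + '$$';
  });
}

// Only touch the DOM when running in a browser.
if (typeof document !== 'undefined') {
  replaceImagesWithLatex(document);
}
```

Replacing the whole span content, rather than inserting text before each img, also drops the stray "d" text nodes, since the comment already holds the full equation including \text{d}.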