- Jun 6, 2020
MiniHtmlParser is a cross-platform class that parses HTML strings and builds a tree of the elements it finds.
It is a less powerful alternative to jTidy or jSoup; however, it is simple to use, cross-platform and, because it is implemented in B4X, quite easy to extend.
Note that many real-world HTML pages are not 100% valid. The parser tries to handle a few such cases, but it is far less forgiving than browsers, which recover from many common HTML problems.
The example demonstrates how to use the parser.
It parses the HTML saved from a web page and finds the rates of the top 10 currencies. This is done only as an example.
Depends on B4XCollections.
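As a rough illustration of the kind of code the example contains, here is a minimal sketch. The HTML snippet, the "rates" class name and the sub name are made up for illustration and are not taken from the attached project. The parser members used here (Parse, FindNode, FindDirectNodes, CreateHtmlAttribute, GetTextFromNode, GetAttributeValue) are written from memory of the class, so check them against the attached version before relying on them.

```b4x
' Sketch only: hypothetical HTML, not the page used in the attached example.
Sub LogRatesSketch
	Dim html As String = $"<table class="rates"><tbody>
<tr><td>Euro</td><td><a href="/eur">0.89</a></td></tr>
<tr><td>Yen</td><td><a href="/jpy">107.5</a></td></tr>
</tbody></table>"$
	Dim parser As MiniHtmlParser
	parser.Initialize
	' Parse returns the root node of the element tree.
	Dim root As HtmlNode = parser.Parse(html)
	' Find the first <table> with class="rates" anywhere under the root.
	Dim table As HtmlNode = parser.FindNode(root, "table", parser.CreateHtmlAttribute("class", "rates"))
	If table.IsInitialized = False Then Return
	Dim tbody As HtmlNode = parser.FindNode(table, "tbody", Null)
	If tbody.IsInitialized = False Then Return
	' Walk the rows that are direct children of <tbody>.
	For Each tr As HtmlNode In parser.FindDirectNodes(tbody, "tr", Null)
		Dim tds As List = parser.FindDirectNodes(tr, "td", Null)
		If tds.Size < 2 Then Continue
		Dim nameCell As HtmlNode = tds.Get(0)
		Dim rateCell As HtmlNode = tds.Get(1)
		Dim link As HtmlNode = parser.FindNode(rateCell, "a", Null)
		If link.IsInitialized Then
			Log(parser.GetTextFromNode(nameCell, 0) & " = " & parser.GetTextFromNode(link, 0))
			Log(parser.GetAttributeValue(link, "href", ""))
		End If
	Next
End Sub
```

The IsInitialized and Size checks are there because real pages are often not well formed, so a node that should exist may simply be missing from the tree.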
Latest version is attached separately.
Updates:
- 0.93 - Fixes an issue with whitespace characters being removed too aggressively.
- 0.92 - Unescapes more entities, including entities written with the Unicode code point, e.g. &#8501; (ℵ)
- 0.91 - Fixes an issue with text after the last element.