A Dart-native package inspired by the Beautiful Soup 4 Python library. It provides easy ways of navigating, searching, and modifying the HTML tree.
A simple usage example:
import 'package:beautiful_soup_dart/beautiful_soup.dart';
/// 1. parse a document String
BeautifulSoup bs = BeautifulSoup(html_doc_string);
// use BeautifulSoup.fragment(html_doc_string) if you parse a part of html
/// 2. navigate quickly to any element
bs.body!.a!; // navigate by tag name; use outerHtml or toString() to get the outer HTML
bs.find('p', class_: 'story'); // finds the first "p" element whose "class" attribute has the value "story"
bs.findAll('a', attrs: {'class': true}); // finds all "a" elements that define a "class" attribute, whatever its value
bs.find('', selector: '#link1'); // find with a custom CSS selector (other parameters are ignored)
bs.find('*', id: 'link1'); // any element with id "link1"
bs.find('*', regex: r'^b'); // find any element whose tag name starts with "b", for example: body, b, ...
bs.find('p', string: r'^Article #\d*'); // find a "p" element whose text starts with "Article #[number]"
bs.find('a', attrs: {'href': 'http://example.com/elsie'}); // finds by "href" attribute
/// 3. perform any other actions for the navigated element
Bs4Element bs4 = bs.body!.p!; // navigate quickly with tags
bs4.name; // get tag name
bs4.string; // get text
bs4.toString(); // get String representation of this element, same as outerHtml
bs4.innerHtml; // get html elements inside the element
bs4.className; // get class attribute value
bs4['class']; // get class attribute value
bs4['class'] = 'board'; // change class attribute value to 'board'
bs4.children; // get all child elements of this element
bs4.replaceWith(otherBs4Element); // replace with other element
... and many more
Check the test folder for more examples.
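The snippets above assume an existing `html_doc_string`. A minimal self-contained sketch might look like the following. The `htmlDoc` sample is hypothetical and made up for illustration, and the null checks assume that `find()` returns `null` when nothing matches, as the `!` operators in the navigation examples suggest.

```dart
import 'package:beautiful_soup_dart/beautiful_soup.dart';

// Hypothetical sample document, made up for this sketch.
const htmlDoc = '''
<html>
  <body>
    <p class="story">Once upon a time there were two little sisters:
      <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
      <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
    </p>
  </body>
</html>
''';

void main() {
  final bs = BeautifulSoup(htmlDoc);

  // Assumption: find() yields null when there is no match, so check first.
  final story = bs.find('p', class_: 'story');
  if (story == null) {
    print('no <p class="story"> found');
    return;
  }
  print(story.name);

  // findAll() returns a list of matches; read attributes with the [] operator.
  for (final link in bs.findAll('a', attrs: {'class': true})) {
    print('${link['id']}: ${link['href']}');
  }

  // Look up a single element by id and read its text.
  final first = bs.find('*', id: 'link1');
  print(first?.string);
}
```

Checking for `null` instead of using `!` keeps the program from throwing when the document does not have the expected shape.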
The unlinked titles are not yet implemented.
- Navigating the tree
  - Going down
  - Going up
  - Going sideways
  - Going back and forth
    - .nextElement and .previousElement - returns the next/previous Bs4Element
    - .nextElements and .previousElements
    - .nextParsed and .previousParsed - returns the next/previous parsed Node of any kind (doc comments, tags, text); to get its data as a String, use node.data
    - .nextParsedAll and .previousParsedAll
- Searching the tree
  - findFirstAny() - returns the topmost (first) element of the parse tree, of any tag type
  - findAll()
  - find()
  - findParents() and findParent()
  - findNextSiblings() and findNextSibling()
  - findPreviousSiblings() and findPreviousSibling()
  - findAllNextElements() and findNextElement()
  - findAllPreviousElements() and findPreviousElement()
  - findNextParsedAll() and findNextParsed()
  - findPreviousParsedAll() and findPreviousParsed()
- Modifying the tree
- Output
  - prettify() - partial support
  - .text and getText()
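A few of the methods listed above can be combined to move around relative to an element. The following is only a rough sketch: the `htmlDoc` sample is hypothetical, and it assumes that `findParent()` and `findNextSibling()` take a tag name as their first positional argument, in the same way as `find()`.

```dart
import 'package:beautiful_soup_dart/beautiful_soup.dart';

// Hypothetical sample document, made up for this sketch.
const htmlDoc = '''
<html>
  <body>
    <p class="story">
      <a href="http://example.com/elsie" id="link1">Elsie</a>
      <a href="http://example.com/lacie" id="link2">Lacie</a>
    </p>
  </body>
</html>
''';

void main() {
  final bs = BeautifulSoup(htmlDoc);
  final link = bs.find('a', id: 'link1');
  if (link == null) return;

  // Search upward and sideways from the current element.
  // Assumption: both methods accept a tag name, like find().
  final parent = link.findParent('p');
  final sibling = link.findNextSibling('a');
  print(parent?.name);
  print(sibling?['id']);

  // Step to the next element in document order, if there is one.
  print(link.nextElement?.name);

  // Text content of the element.
  print(link.text);
}
```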
Other methods from the Element class of the html package can be accessed via bs4element.element.
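As a small sketch of dropping down to the underlying element, the example below reads `localName` and `attributes`, which are members of `Element` in the html package. The `htmlDoc` sample is hypothetical, and the null-aware access is a precaution because this sketch does not assume whether `element` is nullable.

```dart
import 'package:beautiful_soup_dart/beautiful_soup.dart';

// Hypothetical sample document, made up for this sketch.
const htmlDoc = '''
<html>
  <body>
    <a href="http://example.com/elsie" id="link1">Elsie</a>
  </body>
</html>
''';

void main() {
  final bs = BeautifulSoup(htmlDoc);

  // The wrapped Element from the html package; null if no <a> was found.
  final el = bs.find('a')?.element;

  // Members of package:html's Element, not of Bs4Element.
  print(el?.localName);
  print(el?.attributes['href']);
}
```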
Please file feature requests and bugs at the issue tracker or feel free to raise a PR.