ericchiang/pup - stats on ReviewGithub

Languages: HTML, Go, Other

These are the star and fork stats for the ericchiang/pup repository. As of 11 May 2024, the repository has 7,842 stars and 281 forks.

pup

pup is a command-line tool for processing HTML. It reads from stdin, prints to stdout, and lets you filter parts of the page using CSS selectors.

Inspired by jq, pup aims to be a fast and flexible way of exploring HTML from the terminal.

Install

Direct downloads are available through the releases page.

If you have Go installed on your computer, just run go get:

    go get github.com/ericchiang/pup

Note that since Go 1.18, go get no longer builds or installs commands; on current toolchains the equivalent is go install github.com/ericchiang/pup@latest.

If you're on OS X, use Homebrew to install (no Go required):

    brew install https://raw.githubusercontent.com/EricChiang/pup/master/pup.rb

Quick start

    $ curl -s https://news.ycombinator.com/

Ew, HTML. Let's run that through some pup selectors:

    $ curl -s https://news.ycombinator.com/ | pup 'table table tr:nth-last-of-type(n+2) td.title a'

Okay, how about only the links?

    $ curl -s https://news.ycombinator.com/ | pup 'table table tr:nth-last-of-type(n+2) td.title a attr{href}'

Even better, let's grab the titles too:

    $ curl -s https://news.ycombinator.com/ | pup 'table table tr:nth-last-of-type(n+2) td.title a json{}'

Basic Usage

    $ cat index.html | pup [flags] '[selectors] [display function]'

Examples

Download a webpage with wget:

    $ wget http://en.wikipedia.org/wiki/Robots_exclusion_standard -O robots.html

Clean and indent

By default pup will fill in missing tags and properly indent the page.
    $ cat robots.html # nasty looking HTML
    $ cat robots.html | pup --color # cleaned, indented, and colorful HTML

Filter by tag

    $ cat robots.html | pup 'title'
    <title>
     Robots exclusion standard - Wikipedia, the free encyclopedia
    </title>

Filter by id

    $ cat robots.html | pup 'span#See_also'
    See also

Filter by attribute

    $ cat robots.html | pup 'th[scope="row"]'
    <th scope="row" class="navbox-group">
     Exclusion standards
    </th>
    <th scope="row" class="navbox-group">
     Related marketing topics
    </th>
    <th scope="row" class="navbox-group">
     Search marketing related topics
    </th>
    <th scope="row" class="navbox-group">
     Search engine spam
    </th>
    <th scope="row" class="navbox-group">
     Linking
    </th>
    <th scope="row" class="navbox-group">
     People
    </th>
    <th scope="row" class="navbox-group">
     Other
    </th>

Pseudo Classes

CSS selectors have a group of specifiers called "pseudo classes", which are pretty cool. pup implements most of the relevant ones. Here are some examples:

    $ cat robots.html | pup 'a[rel]:empty'
    <a rel="license" href="//creativecommons.org/licenses/by-sa/3.0/" style="display:none;">
    </a>

    $ cat robots.html | pup ':contains("History")'
    History
    History

    $ cat robots.html | pup ':parent-of([action="edit"])'
    <a action="edit" href="//www.wikidata.org/wiki/Q80776#sitelinks-wikipedia" text="Edit links" title="Edit interlanguage links" class="wbc-editpage">
     Edit links
    </a>

For a complete list, see the Implemented Selectors section below.

+, >, and ,

These are intermediate characters placed between selectors that declare special instructions. As in CSS, + selects an element that immediately follows another, > selects direct children, and a comma (,) lets you give pup multiple independent groups of selectors at once:

    $ cat robots.html | pup 'title, h1 span[dir="auto"]'
    <title>
     Robots exclusion standard - Wikipedia, the free encyclopedia
    </title>
    Robots exclusion standard

Chain selectors together

When combining selectors, the HTML nodes selected by the previous selector are passed on to the next one.
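The chaining rule can be sketched in Go, the language pup is written in. This is a toy model under stated assumptions, not pup's implementation: Node, matches, descendants, and chain are hypothetical names, the tree is hand-built instead of parsed, and the matcher only understands tag, #id, and tag#id selectors. Each space-separated selector is searched for among the descendants of the previous step's matches, which is why 'h1#firstHeading span' finds only spans inside the heading.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, stripped-down element used only for this sketch;
// pup itself works on a real HTML parse tree.
type Node struct {
	Tag      string
	ID       string
	Text     string
	Children []*Node
}

// matches understands only "tag", "#id", and "tag#id" simple selectors.
func matches(n *Node, sel string) bool {
	tag, id, hasID := strings.Cut(sel, "#")
	if tag != "" && tag != n.Tag {
		return false
	}
	if hasID && id != n.ID {
		return false
	}
	return true
}

// descendants appends every node below n (excluding n) in depth-first order.
func descendants(n *Node, out []*Node) []*Node {
	for _, c := range n.Children {
		out = append(out, c)
		out = descendants(c, out)
	}
	return out
}

// chain feeds the matches of each selector into the next one: every selector
// is searched for among the descendants of the previous step's matches.
func chain(root *Node, selectors []string) []*Node {
	current := []*Node{root}
	for _, sel := range selectors {
		seen := map[*Node]bool{} // avoid duplicates when matched subtrees nest
		var next []*Node
		for _, n := range current {
			for _, d := range descendants(n, nil) {
				if !seen[d] && matches(d, sel) {
					seen[d] = true
					next = append(next, d)
				}
			}
		}
		current = next
	}
	return current
}

func main() {
	inner := &Node{Tag: "span", Text: "Robots exclusion standard"}
	heading := &Node{Tag: "h1", ID: "firstHeading", Children: []*Node{inner}}
	stray := &Node{Tag: "span", Text: "a span outside the heading"}
	doc := &Node{Tag: "body", Children: []*Node{heading, stray}}

	// Roughly what pup 'h1#firstHeading span' does; prints: Robots exclusion standard
	for _, n := range chain(doc, strings.Fields("h1#firstHeading span")) {
		fmt.Println(n.Text)
	}
}
```

If any step matches nothing, every later step starts from an empty set, so the whole chain returns no nodes.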
    $ cat robots.html | pup 'h1#firstHeading'
    <h1 id="firstHeading" class="firstHeading" lang="en">
     Robots exclusion standard
    </h1>
    $ cat robots.html | pup 'h1#firstHeading span'
    Robots exclusion standard

Implemented Selectors

For further examples of these selectors, head over to MDN.

    pup '.class'
    pup '#id'
    pup 'element'
    pup 'selector + selector'
    pup 'selector > selector'
    pup '[attribute]'
    pup '[attribute="value"]'
    pup '[attribute*="value"]'
    pup '[attribute~="value"]'
    pup '[attribute^="value"]'
    pup '[attribute$="value"]'
    pup ':empty'
    pup ':first-child'
    pup ':first-of-type'
    pup ':last-child'
    pup ':last-of-type'
    pup ':only-child'
    pup ':only-of-type'
    pup ':contains("text")'
    pup ':nth-child(n)'
    pup ':nth-of-type(n)'
    pup ':nth-last-child(n)'
    pup ':nth-last-of-type(n)'
    pup ':not(selector)'
    pup ':parent-of(selector)'

You can mix and match selectors as you wish:

    $ cat index.html | pup 'element#id[attribute="value"]:first-of-type'

Display Functions

Non-HTML selectors that affect the output type are implemented as functions, which can be provided as the final argument.

text{}

Print all text from the selected nodes and their children in depth-first order.

    $ cat robots.html | pup '.mw-headline text{}'
    History
    About the standard
    Disadvantages
    Alternatives
    Examples
    Nonstandard extensions
    Crawl-delay directive
    Allow directive
    Sitemap
    Host
    Universal "*" match
    Meta tags and headers
    See also
    References
    External links

attr{attrkey}

Print the values of all attributes with a given key from all selected nodes.

    $ cat robots.html | pup '.catlinks div attr{id}'
    mw-normal-catlinks
    mw-hidden-catlinks

json{}

Print HTML as JSON.
    $ cat robots.html | pup 'div#p-namespaces a'
    <a href="/wiki/Robots_exclusion_standard" title="View the content page [c]" accesskey="c">
     Article
    </a>
    <a href="/wiki/Talk:Robots_exclusion_standard" title="Discussion about the content page [t]" accesskey="t">
     Talk
    </a>

    $ cat robots.html | pup 'div#p-namespaces a json{}'
    [
     {
      "accesskey": "c",
      "href": "/wiki/Robots_exclusion_standard",
      "tag": "a",
      "text": "Article",
      "title": "View the content page [c]"
     },
     {
      "accesskey": "t",
      "href": "/wiki/Talk:Robots_exclusion_standard",
      "tag": "a",
      "text": "Talk",
      "title": "Discussion about the content page [t]"
     }
    ]

Use the -i / --indent flag to control the indent level.

    $ cat robots.html | pup -i 4 'div#p-namespaces a json{}'
    [
        {
            "accesskey": "c",
            "href": "/wiki/Robots_exclusion_standard",
            "tag": "a",
            "text": "Article",
            "title": "View the content page [c]"
        },
        {
            "accesskey": "t",
            "href": "/wiki/Talk:Robots_exclusion_standard",
            "tag": "a",
            "text": "Talk",
            "title": "Discussion about the content page [t]"
        }
    ]

If the selectors return only one element, the result is printed as a JSON object, not a list.

    $ cat robots.html | pup --indent 4 'title json{}'
    {
        "tag": "title",
        "text": "Robots exclusion standard - Wikipedia, the free encyclopedia"
    }

Because there is no universal standard for converting HTML/XML to JSON, pup uses a mapping that hopefully fits most needs. The goal is simply to get pup's output into a more consumable format.

Flags

Run pup --help for a list of further options.


Repository	Languages	Stars	Weekly change	Forks
darwinanddavis/worldmaps	HTML, JavaScript, Other	59	0	7
yidongnan/grpc-spring-boot-starter	Java, Other	3.1k	0	744
ShirasawaSama/CefDetectorX	JavaScript, CSS, HTML	1.6k	0	25
neherlab/pangraph	Julia, Python, Shell	68	0	6
MIT-LCP/mimic-code	Jupyter Notebook, HTML, Python	2.1k	0	1.4k
scalameta/nvim-metals	Lua, Other	372	+6	68
g-s-k/matlab-toml	MATLAB, Other	9	0	7
Leanplum/Leanplum-iOS-SDK	Objective-C, Swift, Other	69	0	57
VSoftTechnologies/DUnitX	Pascal, Batchfile, Other	364	0	197
voxel51/fiftyone	Python, TypeScript, JavaScript	5.2k	0	427