Simplest way to extract text content regularly from HTML?

When I call wget on a webpage, I get raw HTML in response. I would like to write a simple parsing script which extracts the main article text content from web pages with similar structure, i.e. different documentation articles about Microsoft Visual Basic for Applications. I am pretty sure I should inspect the HTML tree, figure out which nodes tend to contain article headers and paragraphs, and then just write a script that retrieves those nodes. What would be the simplest way to inspect the HTML tree to find the nodes, and then with which library should I extract the text content from those nodes? Thank you

All Replies (2)

Are you saying the webpage is not loading properly?

Load the web page. Then, to reload it while bypassing the cache and force a fresh retrieval, press Ctrl+Shift+R (Mac: Command+Shift+R).

Try this several times.


See "View web pages in Reader View": https://support.mozilla.org/en-US/kb/view-web-pages-reader-view-firefox-ios

Reader View in Firefox for iOS strips away images, ads, videos and menus from a web page, so you can focus on reading. Reader View is available for articles, blog posts and other web pages that can be simplified.

This question is beyond the scope of Firefox support. You could consider Stack Overflow, or if this is specific to Microsoft tooling, one of their developer forums.

If you are using JavaScript, you could look at the Readability library, which is the foundation for Firefox's Reader View feature. It has some code to "guess" the important parts of a page:

https://github.com/mozilla/readability

(I have always found it difficult to follow, but you may have better code-reading skills than me.)
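
If a full library is more than you need, the approach described in the question (decide which tags hold the headings and paragraphs, then collect their text) can be sketched with nothing but Python's standard-library `html.parser`. Python is only one possible choice and is not something named in this thread. The tag sets `CONTENT_TAGS` and `SKIP_TAGS` and the names `ArticleTextExtractor` and `extract_article_text` are hypothetical: they are guesses that would need adjusting after you inspect the real pages, for example with Firefox's Page Inspector (right-click an element and choose Inspect).

```python
from html.parser import HTMLParser

# Assumption: article content lives in these tags. Adjust after inspecting the pages.
CONTENT_TAGS = {"h1", "h2", "h3", "p"}
# Assumption: text inside these tags is never article content.
SKIP_TAGS = {"script", "style", "nav", "footer"}


class ArticleTextExtractor(HTMLParser):
    """Collects the text of heading and paragraph elements in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []      # finished text blocks
        self._buffer = None   # text pieces of the element currently being read
        self._skip_depth = 0  # greater than 0 while inside a skipped element

    def _flush(self):
        # Store the current element's text with whitespace runs collapsed.
        if self._buffer is not None:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.blocks.append(text)
            self._buffer = None

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in CONTENT_TAGS and self._skip_depth == 0:
            # A new content element implicitly ends an unclosed previous one.
            self._flush()
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self._skip_depth > 0:
                self._skip_depth -= 1
        elif tag in CONTENT_TAGS:
            self._flush()

    def handle_data(self, data):
        # Keep text only while inside a content element and outside skipped ones.
        if self._buffer is not None and self._skip_depth == 0:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # pick up a trailing element that was never closed


def extract_article_text(html):
    """Return a list of text blocks, one per heading or paragraph."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


if __name__ == "__main__":
    # Made-up sample markup, not taken from any real documentation page.
    sample = """
    <html><body>
    <nav><p>Site menu</p></nav>
    <h1>Sample article title</h1>
    <p>First paragraph with <b>bold</b> text.</p>
    <script>var x = "<p>not text</p>";</script>
    </body></html>
    """
    print(extract_article_text(sample))
    # ['Sample article title', 'First paragraph with bold text.']
```

To use this on a page saved by wget, read the file into a string and pass it to `extract_article_text`. Filtering by tag name alone is crude: if the real pages wrap the article in a specific container, you would probably want to start collecting only once that container's start tag has been seen, which the `attrs` argument of `handle_starttag` lets you detect.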