I'm trying to scrape this web page because there is no other way to automatically be alerted when its contents change:
Using cURL works fine to download the page (at which point its contents can be parsed), but using Invoke-WebRequest or the DownloadFile/DownloadString methods of System.Net.WebClient fails with an error saying that the web server returned a 404.
Checking in Chrome confirms that the page always responds with a 404, but also returns content, which is what I want.
Using PowerShell 5.1, is there a way to instruct Invoke-WebRequest to ignore the spurious 404 error, or some method by which I can get the response data regardless?
1 Answer
In PowerShell 7, Invoke-WebRequest has a -SkipHttpErrorCheck parameter that makes it behave the way you want for your use case.
Invoke-WebRequest -SkipHttpErrorCheck -OutFile C:\install\test.html
In PowerShell 5.1, use curl.exe.
If you're on Windows 10 v1803 or later, curl.exe ships with the OS; on an older version you need to download it manually.
curl.exe --output C:\install\abc.html
Remember to specify the .exe, because curl without it is just an alias for Invoke-WebRequest.
If you don't want to use curl.exe, all you can do is wrap the call in try/catch and access the response data through the exception. This doesn't really download it as a file, and it gives you less information than you would probably like.
try {
    Invoke-WebRequest -ErrorAction Stop
} catch {
    $_.Exception.Response
}
IsMutuallyAuthenticated : False
Cookies : {}
Headers : {Connection, Vary, X-Content-Type-Options, X-XSS-Protection...}
SupportsHeaders : True
ContentLength : 1123
ContentEncoding :
ContentType : text/html;charset=UTF-8
CharacterSet : UTF-8
Server :
LastModified : 08.10.2021 19:01:01
StatusCode : NotFound
StatusDescription :
ProtocolVersion : 1.1
ResponseUri :
Method : GET
IsFromCache : False
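As a possible extension of the try/catch approach, the response object shown above also exposes GetResponseStream(), so the body of the 404 page can usually still be read by hand. The following is a minimal sketch, assuming Windows PowerShell 5.1 (where the exception carries a System.Net.HttpWebResponse) and UTF-8 content as in the output above. The function names Read-StreamText and Get-PageIgnoringHttpError and the URL in the commented-out usage lines are hypothetical and not part of the original answer.

```powershell
# Reads a stream to the end as text. Invoke-WebRequest may already have
# consumed the error stream, so rewind it first when that is possible.
function Read-StreamText {
    param(
        [Parameter(Mandatory)][System.IO.Stream]$Stream,
        # Assumption: the page is UTF-8, as the CharacterSet in the output suggests.
        [System.Text.Encoding]$Encoding = [System.Text.Encoding]::UTF8
    )
    if ($Stream.CanSeek) { $Stream.Position = 0 }
    $reader = New-Object -TypeName System.IO.StreamReader -ArgumentList $Stream, $Encoding
    try {
        $reader.ReadToEnd()
    } finally {
        $reader.Dispose()
    }
}

# Returns the page body whether the server answers with a success or an
# HTTP error status. Errors without an HTTP response (DNS failure, timeout)
# are rethrown unchanged.
function Get-PageIgnoringHttpError {
    param([Parameter(Mandatory)][string]$Uri)
    try {
        (Invoke-WebRequest -Uri $Uri -UseBasicParsing -ErrorAction Stop).Content
    } catch {
        $response = $_.Exception.Response
        if (-not ($response -is [System.Net.HttpWebResponse])) { throw }
        Read-StreamText -Stream $response.GetResponseStream()
    }
}

# Hypothetical usage (placeholder URL):
# $html = Get-PageIgnoringHttpError -Uri 'https://example.com/page'
# Set-Content -Path C:\install\test.html -Value $html -Encoding UTF8
```

Saving the returned string with Set-Content gives roughly what -OutFile would have produced. This sketch does not cover PowerShell 7, where the exception type is different and -SkipHttpErrorCheck is the simpler route.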