Node Unblocker for Web Scraping
We all hate the moment when we try to access a page and the site blocks our request for no good reason. Geo-blocking, for example, can usually be bypassed only by routing the request through a proxy.
Node Unblocker helps us create a custom proxy and have it up and running in a matter of minutes.
What is Node Unblocker?
Node Unblocker is a general-purpose library for creating a web proxy that can intercept and alter requests and responses.
In web scraping, it is also used to get around site restrictions such as geo-blocking and rate limiting, to hide the scraper's IP address, or to attach authentication tokens to requests.
In short, with this library you can kiss content blocks and censorship goodbye.
In this article, we create an Express application with a custom proxy using Node Unblocker, add a middleware that changes the user agent for each request, discuss the proxy's limitations, deploy it to Heroku, and compare it to a managed service like WebScrapingAPI.
Prerequisites
Before we start, make sure you have the latest version of Node.js installed. Installing Node.js on each platform (Windows, Linux, Mac) is a subject for a separate article, so instead of going into details here, check the official website and follow the instructions.
Setting things up
We start by creating a project directory named unblocked and initializing a Node.js project inside it, for example by running mkdir unblocked, then cd unblocked, then npm init -y.
Install dependencies
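For this application, we install two libraries: Express, a minimalist web framework for Node.js, and Node Unblocker, which is published on npm as unblocker. Both can be installed in one step with npm install express unblocker.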
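With both packages installed, the user-agent middleware mentioned earlier could be sketched roughly as follows. This is only an illustration, not the article's original code: the USER_AGENTS pool, the rotateUserAgent name, and the round-robin strategy are assumptions made for the example. It relies on Node Unblocker's requestMiddleware option, where each middleware is a function that receives a data object and may modify data.headers before the request is forwarded.

```javascript
// Hypothetical pool of user-agent strings (an assumption for this sketch);
// replace these with values that suit the sites you are scraping.
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
  'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0',
];

// Index of the user agent to hand out on the next request.
let nextIndex = 0;

// Request middleware: overwrites the outgoing User-Agent header,
// cycling through the pool one entry per request (round robin).
function rotateUserAgent(data) {
  data.headers['user-agent'] = USER_AGENTS[nextIndex];
  nextIndex = (nextIndex + 1) % USER_AGENTS.length;
}

// Options object meant to be passed to the Unblocker constructor.
// '/proxy/' is the path prefix under which proxied URLs are served.
const config = {
  prefix: '/proxy/',
  requestMiddleware: [rotateUserAgent],
};
```

In an Express app, this object would be passed to new Unblocker(config) and the resulting instance registered with app.use(). Requests to paths under /proxy/ followed by the target URL would then go out with the rotated header. Round robin is only one option; picking a random entry per request would work just as well.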
Originally published as Node Unblocker for Web Scraping on the WebScrapingAPI blog