Web Scraping Tool with Node.js and Express

Web scraping involves extracting data from websites. Building a web scraping tool with Node.js and Express comes down to combining an Express API with an HTTP client (Axios) to fetch pages and an HTML parser (Cheerio) to pull out the data you need. Below is a simplified example to help you get started.

Step 1: Set Up Your Project

  1. Create a new project folder:

     mkdir web-scraping-tool
     cd web-scraping-tool
    
  2. Initialize a new Node.js project:

     npm init -y
    
  3. Install necessary dependencies:

     npm install express axios cheerio
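
Once the install finishes, the three packages appear under dependencies in package.json. The version numbers below are only illustrative and will differ depending on when you install:

{
  "name": "web-scraping-tool",
  "version": "1.0.0",
  "dependencies": {
    "axios": "^1.6.0",
    "cheerio": "^1.0.0",
    "express": "^4.18.2"
  }
}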
    

Step 2: Create Your Folder Structure

Create the following folder structure:

web-scraping-tool/
|-- src/
|   |-- routes/
|   |   |-- scraper.js
|   |-- app.js
|-- server.js

Step 3: Set Up Express Server

In src/app.js, set up the Express application:

const express = require('express');
const scraperRoutes = require('./routes/scraper');

const app = express();

app.use(express.json());
app.use('/scraper', scraperRoutes);

module.exports = app;

Step 4: Create Routes for Web Scraping

In src/routes/scraper.js, set up the route that performs the scraping:

const express = require('express');
const router = express.Router();
const axios = require('axios');
const cheerio = require('cheerio');

router.post('/scrape', async (req, res) => {
  const { url } = req.body;

  // Reject requests that don't include a URL to scrape
  if (!url) {
    return res.status(400).json({ error: 'Please provide a "url" in the request body.' });
  }

  try {
    // Fetch the page HTML and load it into Cheerio for querying
    const response = await axios.get(url);
    const html = response.data;
    const $ = cheerio.load(html);

    // Example: extracting title and links
    const title = $('head title').text();
    const links = [];
    $('a').each((index, element) => {
      links.push($(element).attr('href'));
    });

    res.json({ title, links });
  } catch (error) {
    console.error(error);
    res.status(500).json({ error: 'Error scraping data. Please try again.' });
  }
});

module.exports = router;
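
Cheerio supports most jQuery-style selectors, so the route can extract far more than the page title and anchor hrefs. As an illustrative sketch (the selectors and field names below are examples, not part of the route above), you could add something like this inside the try block:

// After cheerio.load(html), pull out extra pieces of the page
const metaDescription = $('meta[name="description"]').attr('content');
const headings = [];
$('h1, h2').each((index, element) => {
  headings.push($(element).text().trim());
});

// ...and return them alongside the existing fields:
// res.json({ title, links, metaDescription, headings });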

Step 5: Run Your Application

In server.js, use the exported app to start the application:

const app = require('./src/app');

const PORT = process.env.PORT || 3000;

app.listen(PORT, () => {
  console.log(`Server is running on http://localhost:${PORT}`);
});

Start the server:

node server.js
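
Optionally, add scripts to package.json so the server can be started with npm. The dev script below assumes you also install nodemon (npm install --save-dev nodemon), which restarts the server whenever a file changes:

"scripts": {
  "start": "node server.js",
  "dev": "nodemon server.js"
}

Then run npm start for a one-off run, or npm run dev while developing.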

Step 6: Test Web Scraping

Use a tool like Postman or Insomnia to send a POST request to http://localhost:3000/scraper/scrape with a JSON body containing the website URL:

{
  "url": "https://example.com"
}
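
If you prefer the command line, the same request can be sent with curl:

curl -X POST http://localhost:3000/scraper/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'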

You should receive a JSON response with the extracted data (title and links).

Congratulations! You've created a simple web scraping tool using Node.js, Express, Axios, and Cheerio. Note that web scraping should be done responsibly and within legal and ethical boundaries. Always check and respect a website's terms of service and robots.txt file.
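
As a starting point for the robots.txt advice above, here is a minimal sketch (reusing the axios dependency) that downloads a site's robots.txt so you can review its rules before scraping. It only prints the raw file; proper rule matching is more involved, and a dedicated package such as robots-parser handles it for you:

const axios = require('axios');

// Fetch a site's robots.txt and return its raw contents for manual review.
async function fetchRobotsTxt(siteUrl) {
  const robotsUrl = new URL('/robots.txt', siteUrl).href;
  const response = await axios.get(robotsUrl);
  return response.data;
}

fetchRobotsTxt('https://example.com')
  .then((rules) => console.log(rules))
  .catch((error) => console.error('Could not fetch robots.txt:', error.message));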
