Scraping Instagram Profile Data with NodeJs

I’m going to give you a quick and nice snippet to use for fun or for your projects:

a way to get the basic information of any Instagram profile, even private accounts.

Let’s get right into it and see what’s next.

How are we doing this?

Screen scraping with an automated browser like Puppeteer in NodeJs can be problematic, and it is not efficient for this use case.

Why is that?

It’s because Instagram renders each page dynamically, and scraping its dynamic HTML content can be pretty difficult on a platform that constantly pushes changes to its website.

So, here are the downsides of using a Browser API to do this:

  • slow, because multiple images and JavaScript files have to load in order to generate the view
  • not reliable, because the structure of the HTML changes constantly, so selectors need to be updated all the time to keep things working, which means more maintenance.

Which is the better option?

Now that we’ve ruled out scraping this data with Puppeteer or any other automated browser, we can talk about the proper method for doing this.

First, you need some background on why we chose this option, so here it is:

What you can see in this image is exactly what Instagram has in its HTML content BEFORE rendering the actual page.

So if you go to your IG account, right click and View Source on that page, you can search for and find this exact JSON object inside of a <script> tag.

So, with this being said..

Programming Steps with NodeJs

How are we going to make use of this data with our NodeJs scraper?

  1. Use the Request library to send a direct request to the Instagram page
  2. Take the exact HTML content of the response (before rendering) and send it to Cheerio for parsing
  3. Use the Cheerio library to navigate to the exact <script> content that contains the JSON object with the user’s data in it
  4. Use Regex magic to get the JSON out of the variable assignment
  5. Traverse the JSON and get the details that you want.
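To make steps 4 and 5 a bit more concrete, here is a minimal sketch that runs on a hand-written string instead of a live page. The `sampleScript` value is a made-up, heavily trimmed stand-in for the real inline script, so treat the shape of the object as an assumption and not as what Instagram serves today.

```javascript
// Hypothetical, heavily trimmed stand-in for the inline script found in the page source;
// the real object is much larger and its shape may have changed since.
const sampleScript =
    'window._sharedData = {"entry_data":{"ProfilePage":[{"graphql":{"user":{"username":"example_user","full_name":"Example User"}}}]}};';

// Step 4: capture everything between the assignment and the trailing semicolon.
const match = /window\._sharedData = (.+);/.exec(sampleScript);
if (!match) {
    throw new Error('window._sharedData was not found in the script');
}

// Step 5: parse the captured text and walk down to the user object.
const sharedData = JSON.parse(match[1]);
const user = sharedData.entry_data.ProfilePage[0].graphql.user;

console.log(user.username, '/', user.full_name);
```

The regex only needs one match, so it is written without the `g` flag here.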

Also make sure that you’ve read my previous blog post on 4 Easy Steps to Web Scraping with NodeJs if you want a little more information on how to start and what we are actually using.

Full Code

I’m just gonna put this right here so that you don’t have to scroll to the bottom to get the actual code that I’m talking about.

And if you want some more details about this, continue scrolling..

const request = require('request-promise');
const cheerio = require('cheerio');

/* Create the base function to be run */
const start = async () => {
    /* Here you replace the username with your actual instagram username that you want to check */
    const USERNAME = 'motivational.coder';
    const BASE_URL = `https://www.instagram.com/${USERNAME}/`;

    /* Send the request and get the html content */
    let response = await request(
        BASE_URL,
        {
            /* Let the library decompress gzip/deflate responses for us */
            gzip: true,
            /* Headers have to be nested under the `headers` option, otherwise they are ignored */
            headers: {
                'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
                'accept-language': 'en-US,en;q=0.9,fr;q=0.8,ro;q=0.7,ru;q=0.6,la;q=0.5,pt;q=0.4,de;q=0.3',
                'cache-control': 'max-age=0',
                'upgrade-insecure-requests': '1',
                'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
            }
        }
    );
    
    /* Initiate Cheerio with the response */
    let $ = cheerio.load(response);
    
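    /* Get the script of the html page which contains the json, matching on its content instead of a fixed position */
    let script = $('script')
        .filter((i, el) => ($(el).html() || '').includes('window._sharedData'))
        .first()
        .html();
    
    /* Parse the JSON assigned to window._sharedData and pull the user object out of it */
    let { entry_data: { ProfilePage : {[0] : { graphql : {user} }} } } = JSON.parse(/window\._sharedData = (.+);/.exec(script)[1]);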
    
    /* Output the data */
    console.log(user);

    debugger;
}

start();

Keep in mind, this may work now but Instagram can always change its structure and break something, so use it with care.
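Since the structure can change at any time, one option is to wrap the extraction in a small helper that gives up quietly instead of throwing. This is only a sketch: `extractUser` is a hypothetical name, the two HTML strings are made up, and it uses a plain regex on the whole page so it runs without any dependency.

```javascript
// Hypothetical helper: returns the user object, or null when the page no longer
// matches the structure this post relies on.
function extractUser(html) {
    // Lazily capture the text between the assignment and the end of the script tag.
    const match = /window\._sharedData = (.+?);<\/script>/.exec(html);
    if (!match) return null;

    let sharedData;
    try {
        sharedData = JSON.parse(match[1]);
    } catch (err) {
        return null; // The captured text was not valid JSON.
    }

    // Walk down step by step so a missing level gives null instead of a TypeError.
    const pages = sharedData && sharedData.entry_data && sharedData.entry_data.ProfilePage;
    const page = Array.isArray(pages) ? pages[0] : undefined;
    return (page && page.graphql && page.graphql.user) || null;
}

// Made-up inputs, only to show both outcomes.
const goodHtml = '<script>window._sharedData = {"entry_data":{"ProfilePage":[{"graphql":{"user":{"username":"example_user"}}}]}};</script>';
const changedHtml = '<script>window.__somethingElse = {};</script>';

console.log(extractUser(goodHtml));    // { username: 'example_user' }
console.log(extractUser(changedHtml)); // null
```

With a helper like this, the calling code can check for `null` and log a clear "the page layout changed" message instead of crashing in the middle of a destructuring.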

Running the code

I’m running the code with VSCode because, as I’ve said in my other blog posts, it comes with the NodeJs debugger by default and it is on point.

Now, after running the code from above you should get a big JSON object in a variable named user that holds everything related to that Instagram user’s statistics, like: total followers, total following, uploads, description, full name.

the actual result when running the code
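If you only care about a handful of those values, you can flatten them into a smaller object. The sketch below assumes field names like `edge_followed_by` and `edge_owner_to_timeline_media`, which is how I remember the object being shaped at the time of writing; check them against your own output first. The `sampleUser` object and the `summarize` helper are made up for illustration.

```javascript
// Hypothetical sample shaped like the scraped user object;
// the field names are assumptions and the numbers are made up.
const sampleUser = {
    full_name: 'Example User',
    biography: 'Just an example.',
    is_private: false,
    edge_followed_by: { count: 1200 },
    edge_follow: { count: 150 },
    edge_owner_to_timeline_media: { count: 42 }
};

// Hypothetical helper that flattens the nested counters into a small summary.
const summarize = (u) => ({
    fullName: u.full_name,
    description: u.biography,
    isPrivate: u.is_private,
    followers: u.edge_followed_by.count,
    following: u.edge_follow.count,
    uploads: u.edge_owner_to_timeline_media.count
});

console.log(summarize(sampleUser));
```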

Make sure to read my other blog post to see how I ran this and other interesting details with another example.

Want to learn more?

Also if you want to learn more and go much more in-depth with the downloading of files, I have a great course with more  hours of secret content on web scraping with nodejs.

You’re in for a treat! Get 95% Off my first course on Udemy ( Make sure you’re fast because it is a LIMITED offer )
