Scrape Pinterest Data to CSV

For this task, I want to navigate to a Pinterest search page (e.g. https://uk.pinterest.com/search/pins/?q=orange%20trees) and scrape the pin data from it into a CSV file.

I then wrote a bot to scrape data from a specified number of pins. For each pin, the bot scrapes the Pin URL, the Pin Description and the Number of Likes (the count that appears on the pin with the word ‘saves’).

The bot then saves this data as a CSV file in a folder on my computer.

To accomplish this task I’m going to use UBot Studio. There are many ways to achieve this, and even within UBot Studio there are a multitude of ways it can be tackled. In this instance I’ve opted to use the built-in UBot browser.

setup csv()
Go to URL()
scrape to csv()
divider
define Go to URL {
navigate(#searchURL,"Wait")
wait($rand(7,9))
}
define setup csv {
set(#row,0,"Global")
set table cell(&results,#row,0,"Pin URL")
set table cell(&results,#row,1,"Pin Title")
set table cell(&results,#row,2,"Number of Likes")
}
define scrape to csv {
clear list(%innertext)
wait(0.5)
add list to list(%innertext,$scrape attribute(<class="pinWrapper ">,"outerhtml"),"Delete","Global")
wait(0.75)
set(#mainURL,$find regular expression($url,"^(http(s)?://)?[^/]+"),"Global")
wait(1)
set(#row,1,"Global")
loop(#noPins) {
set(#pin,$next list item(%innertext),"Global")
wait(1)
load html(#pin)
wait(1)
set table cell(&results,#row,0,"{#mainURL}{$scrape attribute(<href=w"/pin/*">,"href")}")
set table cell(&results,#row,1,$scrape attribute(<class="pinDescription">,"innertext"))
set table cell(&results,#row,2,$replace($scrape attribute(<class="socialMetaCount repinCountSmall">,"innertext"),"saves",$nothing))
wait(1)
increment(#row)
}
wait(1)
save to file(#fileLocation,&results)
}
divider
ui text box("Search Page URL:",#searchURL)
ui text box("Number of Pins:",#noPins)
ui save file("Save to File Location: ",#fileLocation)

I started with a define called ‘setup csv’. This sets #row to 0 and writes the title for each column into the first row of the &results table, which later becomes the CSV file.

The second define is ‘Go to URL’. This navigates to the URL specified in the user interface by the variable #searchURL, then waits a random time of between 7 and 9 seconds to give the page time to load.

Finally, the last define is called ‘scrape to csv’, and this is where the magic happens. First I use ‘clear list’ to make sure no data was accidentally left in the %innertext list.

Then I fill the list using ‘add list to list’. This scrapes the outerhtml from all the visible pins on the page, and each pin becomes its own item in the list.

I then set a variable called #mainURL, which holds the root of the domain taken from the current page URL (it may be Pinterest UK, Pinterest USA, etc.).
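
To show what that regular expression actually keeps, here is a minimal sketch of the same idea in Python rather than UBot script. The function names and the pin path in the usage note are my own, made up for illustration; only the pattern itself comes from the bot.

```python
import re

# Same pattern the bot passes to $find regular expression:
# an optional http:// or https:// followed by everything up to the first "/".
ROOT_PATTERN = re.compile(r"^(http(s)?://)?[^/]+")


def root_of(url: str) -> str:
    """Return the scheme and host part of a URL, or "" if nothing matches."""
    match = ROOT_PATTERN.search(url)
    return match.group(0) if match else ""


def full_pin_url(page_url: str, href: str) -> str:
    """Join the root of the current page with a relative pin href."""
    return root_of(page_url) + href
```

For example, root_of("https://uk.pinterest.com/search/pins/?q=orange%20trees") gives "https://uk.pinterest.com", and joining that with a hypothetical href such as "/pin/123/" gives a full pin URL on the same regional domain.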

Then I set the #row variable to 1 so that it starts populating the CSV on the second row, which is the row below the titles.

Now it’s time to enter the loop. The loop runs once for each pin required, as specified in the user interface by the variable #noPins.

On each pass of the loop the bot loads the html from the next list item. It then scrapes the data and sets three table cells, which populates the next row of the table. The first table cell is the pin URL: the bot takes the root domain from the variable #mainURL and completes it with the href scraped from the pin.

The second table cell is the pin Description and the third table cell is the number of likes. I included a $replace function here to remove the word ‘saves’ from the scraped text, so that only the number is left.
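
As a rough illustration of that clean-up step, again in Python rather than UBot script: assume the scraped text looks something like "12 saves" (a made-up value, not one taken from a real pin). The strip() call is my addition, because a plain replace on text like that would leave the space before the word behind.

```python
def clean_count(text: str) -> str:
    """Remove the word "saves" from the scraped count text.

    Mirrors the bot's $replace(..., "saves", $nothing), then trims any
    whitespace left around the number (the trimming is an extra step
    that the bot itself does not perform).
    """
    return text.replace("saves", "").strip()
```

With the assumed input, clean_count("12 saves") returns "12". Text that does not contain the word at all passes through unchanged apart from the trimming.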

Once the loop has completed, the table is saved as a .csv file at the location specified in the user interface.
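
To make the shape of the finished file clearer, here is a small Python sketch of the same table layout: a title row followed by one row per pin. It is not how UBot Studio writes the file internally (the post does not show how ‘save to file’ escapes fields); the function names and the sample pin are assumptions of mine, and the titles are copied from the ‘setup csv’ define.

```python
import csv
import io

# Column titles as set in the bot's 'setup csv' define.
HEADER = ["Pin URL", "Pin Title", "Number of Likes"]


def build_table(pins):
    """Put the title row first, then one row per (url, description, count)."""
    table = [HEADER]
    for url, description, count in pins:
        table.append([url, description, count])
    return table


def table_to_csv(table) -> str:
    """Render the table as CSV text, quoting fields that contain commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical pin, for illustration only.
    sample = [("https://uk.pinterest.com/pin/1/", "Orange tree, in a pot", "12")]
    print(table_to_csv(build_table(sample)), end="")
```

Using Python's csv module here means a description containing a comma is wrapped in quotes, so it still lands in a single column when the file is opened in a spreadsheet.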

NOTE:

This code is for information purposes only. It is most likely against the Pinterest terms of service to scrape their data. Web scraping is a legal grey area with many factors to consider. An alternative to scraping is to use an API which already does what you need to do.
