Ruby on Rails Wednesday, January 2, 2019

If you get "Connection reset by peer" while scraping a website, it is very likely that your scraping attempts were detected and automatically blocked for a while. The site may also have been alerted by all of these malformed requests.

Consider pacing out your requests so that it doesn't look like you are scraping.
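A minimal sketch of pacing requests in Ruby follows. It waits a fixed delay between consecutive requests and backs off exponentially when the connection is reset. The method name `polite_fetch`, the 2-second delay, the retry count and the injectable `sleeper`/`fetcher` parameters are illustrative assumptions, not limits published by any site. If a site has already blocked you, retrying sooner will not help.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: fetch URLs one at a time with a pause between them.
# `sleeper` and `fetcher` are injectable so the pacing can be tested offline.
def polite_fetch(urls, delay: 2.0, max_retries: 3,
                 sleeper: ->(seconds) { sleep(seconds) },
                 fetcher: ->(uri) { Net::HTTP.get_response(uri) })
  results = {}
  urls.each_with_index do |url, i|
    sleeper.call(delay) if i > 0 # fixed pause between consecutive requests
    attempts = 0
    begin
      results[url] = fetcher.call(URI(url))
    rescue Errno::ECONNRESET # Ruby's exception for "Connection reset by peer"
      attempts += 1
      raise if attempts > max_retries
      sleeper.call(delay * 2**attempts) # exponential backoff: 4s, 8s, 16s by default
      retry
    end
  end
  results
end
```

Called with only a list of URLs, it uses `Net::HTTP.get_response` and the real `sleep`, returning a hash of URL to response.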

On Mon, Dec 31, 2018 at 7:58 AM fugee ohu <fugee279@gmail.com> wrote:


On Monday, December 31, 2018 at 7:25:13 AM UTC-5, Colin Law wrote:
On Mon, 31 Dec 2018 at 12:12, fugee ohu <fuge...@gmail.com> wrote:
>
>
>
> On Sunday, December 30, 2018 at 10:56:30 AM UTC-5, Colin Law wrote:
>>
>> On Sun, 30 Dec 2018 at 15:45, fugee ohu <fuge...@gmail.com> wrote:
>> >
>> >
>> >
>> > On Sunday, December 30, 2018 at 8:36:40 AM UTC-5, Colin Law wrote:
>> >>
>> >> On Sun, 30 Dec 2018 at 12:06, fugee ohu <fuge...@gmail.com> wrote:
>> >> > ...
>> >> > It's not json it's javascript so I don't have to run JSON.parse
>> >>
>> >> {"success":true,"code":0,"results":[{"productId":32815555905, ...
>> >> Looks like JSON to me (embedded in js admittedly).  You said that the
>> >> data you want is in that string.  If that is correct then all you have
>> >> to do is to extract it and parse it as JSON.
>> >>
>> >> Colin
>> >
>> >
>> > It's not json it's javascript
>>
>> What is it about the string
>> {"success":true,"code":0,"results":[{"productId":32815555905, ...
>> that makes it not JSON?
>>
>> Colin
>
>
> Now I only get connection reset by peer when I try to make the request

Can't help you there, presumably either the website has changed or you
have changed the way you are fetching it.  Try the url in a browser.

Colin
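Colin's suggestion (cut the JSON object out of the surrounding JavaScript, then hand it to `JSON.parse`) can be sketched in Ruby as below. The helper name `extract_embedded_json`, the default marker `{"success"` and the `window.runParams = ...` wrapper in the usage line are assumptions; the real page's script is not shown in this thread. The helper matches braces while skipping over string literals, so braces inside strings do not end the object early.

```ruby
require "json"

# Hypothetical helper: find the first object that starts with `marker` inside
# a chunk of JavaScript, locate its matching closing brace, and parse it.
# Assumes the embedded object is strict JSON (double-quoted keys and strings).
def extract_embedded_json(js, marker = '{"success"')
  start = js.index(marker)
  return nil unless start

  depth = 0
  in_string = false
  escaped = false
  js[start..-1].each_char.with_index(start) do |ch, i|
    if in_string
      # Inside a string literal: ignore braces, track escapes and the closing quote
      if escaped
        escaped = false
      elsif ch == "\\"
        escaped = true
      elsif ch == '"'
        in_string = false
      end
    elsif ch == '"'
      in_string = true
    elsif ch == "{" || ch == "["
      depth += 1
    elsif ch == "}" || ch == "]"
      depth -= 1
      # Back at depth zero: js[start..i] is the complete object
      return JSON.parse(js[start..i]) if depth.zero?
    end
  end
  nil # the object never closed
end

# Usage with a made-up script fragment shaped like the sample in this thread:
js = 'window.runParams = {"success":true,"code":0,"results":[{"productId":32815555905}]};'
data = extract_embedded_json(js)
puts data["results"].first["productId"] # prints 32815555905
```

If the response body is an HTML page rather than the script, the marker is not found and the helper returns `nil` instead of raising.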

I changed my code and now I'm getting a text/html content-type response. I'm not sure what I'm doing: I commented out my previous creation of the http object and used the inline syntax that creates it as part of building the res object.

require "net/http"
require "uri"
url = URI.parse("https://www.ali<notshown>.com/item/Robotic-Vacuum-Cleaner-Proscenic-790T-Vacuum-Mop-Sweep-3-in-1-Cleaner-for-Pet-Hair-Wifi/32840149410.html?spm=2114.search0104.3.1.24d566b6GAD2uI&ws_ab_test=searchweb0_0,searchweb201602_1_10065_10068_10130_10890_10547_319_10546_317_10548_5730311_10545_10696_453_10084_454_10083_5729211_10618_10307_538_537_536_10059_10884_10887_100031_321_322_10103-10890,searchweb201603_51,ppcSwitch_0&algo_expid=99dc32b9-d1ce-4020-8bec-624c18225f44-0&algo_pvid=99dc32b9-d1ce-4020-8bec-624c18225f44")
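# Previous approach (replaced by the block form of Net::HTTP.start below):
#http = Net::HTTP.new(url.host, url.port)
#http.use_ssl = true
req = Net::HTTP::Get.new(url)
# Net::HTTP.start opens the connection (TLS when the scheme is https), yields it, then closes it
res = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == "https") { |http| http.request(req) }
puts res.content_type # the content type the server actually sent back
puts res.body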
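The URL above is an item page ending in `.html`, so an HTML body may simply be what that page returns. Another possibility worth ruling out is a redirect: `Net::HTTP` does not follow 3xx responses on its own, so a redirect to some other page is only visible if you check the status. The sketch below follows up to 5 redirects and returns the final response. The names `fetch_with_redirects`, `DIRECT_GET` and the `requester` parameter are hypothetical, added so the loop can be exercised without a network.

```ruby
require "net/http"
require "uri"

# One plain GET with no redirect handling (Net::HTTP never follows redirects itself).
DIRECT_GET = lambda do |uri|
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(Net::HTTP::Get.new(uri))
  end
end

# Follow up to `limit` redirects and return the first non-redirect response.
def fetch_with_redirects(url, limit: 5, requester: DIRECT_GET)
  uri = URI(url)
  limit.times do
    res = requester.call(uri)
    return res unless res.is_a?(Net::HTTPRedirection)
    # Location may be relative, so resolve it against the current URI
    uri = URI.join(uri.to_s, res["location"])
  end
  raise "too many redirects"
end

# Usage (needs network access):
#   res = fetch_with_redirects("https://example.com/")
#   puts "#{res.code} #{res.content_type}"
```

Printing the final status code next to the content type makes it easier to tell an interstitial or redirected page apart from the data you expected.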

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rubyonrails-talk+unsubscribe@googlegroups.com.
To post to this group, send email to rubyonrails-talk@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/638bfc50-c217-4a52-892e-a3302b537f47%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

