switch instagram to scraping due to new permission policy :( #603

Closed
11 tasks done
snarfed opened this issue Jan 15, 2016 · 38 comments

@snarfed
Owner

snarfed commented Jan 15, 2016

Instagram is locking down their API and requiring all apps to go through a review process similar to facebook's. details in snarfed/granary#65.

they're mainly locking down /users/self/feed, /media/popular, and sending photos outside of instagram, none of which bridgy does, so i think we'll be ok, but no guarantees.

TODO for switching to scraping:

  • poll
  • mf2 handlers. (added scraping support to get_comment, get_like, etc.)
  • signup. started in the [scrape_instagram](https://github.com/snarfed/bridgy/tree/scrape_instagram) branch.
    • if their account is protected, complain and don't finish signup.
  • cron job that updates profile pictures.
  • figure out backward compatibility for existing accounts. poll/propagate work ok, but...
    • data migration: remove the publish feature from all existing accts so that when they delete listen, their acct disappears correctly.
    • delete. indieauth into the first website we have for them. if they don't have any, make them add one and re-login first.
  • cache comment and like counts? like we already do for twitter and G+, so that we only fetch individual photo pages when we need to. should help keep our load lower and off IG's radar for a while.
  • handle /instagram/bret.io. he evidently changed his username from bret.io to uhhyeahbret, but we don't periodically refetch profiles (see #304, "re-fetch silo profiles periodically to pick up new homepage links"), so we didn't notice. plus username is the datastore entity key id, so it's tough to change. easiest answer will be to ask him to sign up again after we've ported signup. (done.)
  • delete. (was briefly blocked on aaronpk/IndieAuth.com#113, "state isn't quoted properly in embedded JS".)
@snarfed snarfed added the now label Jan 15, 2016
@snarfed
Owner Author

snarfed commented Jan 15, 2016

the new set of oauth scopes aka permissions is on https://www.instagram.com/developer/authorization/ :

  • basic - to read a user’s profile info and media (granted by default)
  • public_content - to read any public profile info and media on a user’s behalf
  • follower_list - to read the list of followers and followed-by users
  • comments - to post and delete comments on a user’s behalf
  • relationships - to follow and unfollow accounts on a user’s behalf
  • likes - to like and unlike media on a user’s behalf

@snarfed
Owner Author

snarfed commented Jan 15, 2016

i started on the review process, but stopped when i saw it requires a screencast. ugh.

i'll do that eventually. here's the rest of what i have written up so far:

https://www.instagram.com/developer/clients/580be8883446443d8216ebdf0462f3b8/review/

1. Description

Got a blog? Do you post your public Instagram photos on your blog? Bridgy notifies your blog posts when people like or comment on your photos on Instagram.

2. How does your app use the Instagram API?

Bridgy helps individual users share their own content with their own web sites. Specifically, when a user posts a photo on their own web site (by any means) as well as Instagram, Bridgy notifies their web site when people like that photo or comment on it inside Instagram. This requires the basic permission.

Bridgy only operates on public accounts. It does not support private accounts.

Bridgy also has a publish feature that integrates with users' web sites in the other direction. Users can post on their web site that they like an Instagram photo, or have a comment on it, and they can then use Bridgy to post that comment or like that photo inside Instagram. These require the likes permission, which Bridgy currently has, and the comments permission, which it doesn't.

3. Do you need additional permissions?

Permission: likes

Users can post on their web site that they like an Instagram photo. They can then use Bridgy to like that photo inside Instagram.

Permission: comments

Users can post on their web site a comment on an Instagram photo. They can then use Bridgy to post that comment on that photo inside Instagram.

@snarfed
Owner Author

snarfed commented Jan 19, 2016

made the screencast: https://youtu.be/eGMNItivBdY

@snarfed
Owner Author

snarfed commented Jan 19, 2016

...and submitted to instagram for approval. fingers crossed! https://www.instagram.com/developer/clients/580be8883446443d8216ebdf0462f3b8/edit/#permissions

@snarfed
Owner Author

snarfed commented Jan 22, 2016

they denied us. :(

Invalid Use Case: The use case described in your submission notes, screencast and website is not a valid use case that we allow on our Platform. Please see our Permissions Review and valid use cases description (https://www.instagram.com/developer/review/) for more information.

well. that's a problem.

they also denied commenting and liking, which is a bit less surprising, and due to a technicality: we didn't describe our use case well. meh.

comments:

This permission (comments) does not support the use case you described in your submission notes, screencast and website. Please review Login Permissions (http://instagram.com/developer/authorization) for a comprehensive list of permissions and valid use cases.

likes:

This permission (likes) does not support the use case you described in your submission notes, screencast and website. Please review Login Permissions (http://instagram.com/developer/authorization/) for a comprehensive list of permissions and valid use cases.

@snarfed
Owner Author

snarfed commented Jan 22, 2016

next step: apply for oauth-dropins and see if i can get it approved. not holding my breath, but i'd like to find at least one app i can get approved, just to see how the process works all the way through.

@snarfed
Owner Author

snarfed commented Jan 24, 2016

done. fingers crossed!

@snarfed
Owner Author

snarfed commented Jan 26, 2016

oauth-dropins got rejected too. :/

Still in Development: Your app is still in development. Please resubmit only when your app is ready to go live and no longer in development.
Invalid Use Case: The use case described in your submission notes, screencast and website is not a valid use case that we allow on our Platform.

@snarfed
Owner Author

snarfed commented Jan 26, 2016

i'm running out of ideas. i may have to start scraping. :/

@snarfed snarfed changed the title from "submit to instagram's new app review/sandbox process" to "handle instagram's new app permissioning" Feb 1, 2016
@snarfed snarfed changed the title from "handle instagram's new app permissioning" to "handle instagram's new permission policy" Feb 1, 2016
@snarfed
Owner Author

snarfed commented Feb 5, 2016

i took a brief look at what it would take to switch to scraping. the good news is, it's doable. instagram profile and photo pages happily serve without being logged in, and the data is easily available in JSON that we already have code to extract and parse.

the bad news is, profile pages only include counts of comments and likes for each photo, not the actual data about them. we'd have to fetch the individual photo pages to get the data. annoying, but not too bad. we already do this for twitter and google+.

the more worrisome part is that comments and likes are paged, so fetching the photo only gets us the first 10 of each. hrmph. if it's the most recent 10, we'll be able to backfeed at least 10 comments and likes per photo per poll period (20m right now)...but i expect some people peak above that sometimes. hrmph.
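the "data is easily available in JSON" part above can be sketched roughly like this. it's a minimal sketch, not bridgy's or granary's actual code: it assumes the page embeds its data as a JavaScript assignment, and the marker name `window._sharedData = `, the function name, and the shape of the sample JSON are all assumptions for illustration.

```python
import json

# assumed marker; the real variable name in instagram's HTML is not confirmed in this thread
MARKER = 'window._sharedData = '


def extract_embedded_json(html, marker=MARKER):
    """Return the JSON value assigned right after `marker`, or None if the marker is absent."""
    start = html.find(marker)
    if start == -1:
        return None
    # raw_decode parses a single JSON value starting at the given index and
    # ignores whatever follows it (e.g. ';</script>')
    obj, _ = json.JSONDecoder().raw_decode(html, start + len(marker))
    return obj


# made-up sample page; the JSON structure is hypothetical
sample = '<script>window._sharedData = {"media": {"comments": {"count": 12}}};</script>'
data = extract_embedded_json(sample)
print(data['media']['comments']['count'])
```

using `raw_decode` avoids having to find the end of the JSON with a regex, which tends to break on nested braces or semicolons inside strings.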

@kylewm
Contributor

kylewm commented Feb 6, 2016

Iiiiiii'd give some serious thought to whether it's worth the effort. Because of aaronpk/OwnYourGram#16 PESOS doesn't work for many people any more anyway, and it's very likely OYG will be cut off altogether (even if he rebrands it).

I'm curious what the situation with IFTTT/Zapier/etc. integration is... whether their channels will be shut off too.

@snarfed
Owner Author

snarfed commented Feb 6, 2016

hrmph, true. point taken.

i still posse to IG manually, so i may still do it if only for myself. we'll see.

@kylewm
Contributor

kylewm commented Feb 6, 2016

well, if you do do it, I'll certainly continue to use it :P

snarfed added a commit to snarfed/granary that referenced this issue Feb 12, 2016
@snarfed snarfed self-assigned this Feb 21, 2016
@snarfed snarfed changed the title from "handle instagram's new permission policy" to "switch instagram to scraping due to new permission policy :(" Feb 21, 2016
snarfed added a commit to snarfed/granary that referenced this issue Feb 21, 2016
snarfed added a commit to snarfed/granary that referenced this issue Feb 21, 2016
snarfed added a commit to snarfed/granary that referenced this issue Feb 21, 2016
@snarfed
Owner Author

snarfed commented Feb 21, 2016

ok, this is implemented, naively. it has to do an HTTP fetch per picture, in serial, to get comments and likes. ideally, those would be parallelized, and we'd also cache and check the counts, like we do for G+ now (and i think twitter), so it only does the fetches when there are new comments or likes.
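a rough sketch of the "cache the counts and parallelize" idea above, under stated assumptions: the dicts mapping photo id to a (comments, likes) tuple, the function names, and the thread pool are all hypothetical, not what bridgy does. the fetcher is passed in so the example runs without any network access.

```python
from concurrent.futures import ThreadPoolExecutor


def photos_needing_fetch(profile_counts, cached_counts):
    """Return ids of photos whose (comments, likes) counts differ from the cache.

    Both arguments are hypothetical dicts: photo id -> (comment count, like count).
    """
    return [pid for pid, counts in profile_counts.items()
            if cached_counts.get(pid) != counts]


def fetch_changed(profile_counts, cached_counts, fetch_photo, max_workers=4):
    """Fetch only the changed photos, in parallel, then update the cache in place."""
    changed = photos_needing_fetch(profile_counts, cached_counts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `changed`
        results = dict(zip(changed, pool.map(fetch_photo, changed)))
    for pid in changed:
        cached_counts[pid] = profile_counts[pid]
    return results


# usage with a stand-in fetcher instead of a real HTTP request
cache = {'a': (2, 5), 'b': (0, 1)}
profile = {'a': (2, 5), 'b': (1, 1), 'c': (0, 0)}
fetched = fetch_changed(profile, cache, fetch_photo=lambda pid: 'page for ' + pid)
print(sorted(fetched))
```

photo `a` is skipped because its counts match the cache, so only `b` and `c` cost a request. one caveat with comparing counts: a deleted comment plus a new one in the same poll period leaves the count unchanged and would be missed.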

@snarfed
Owner Author

snarfed commented Apr 3, 2016

current plan for deleting legacy API accounts is that we'll indieauth into their first web site in domain_urls, which means delete won't work for accounts without any web sites. they'll need to re-login (with indieauth) first. here are those accounts:

/instagram/adamdohm
/instagram/amohd2
/instagram/andresin87
/instagram/chellebb
/instagram/debbite
/instagram/dougmckown
/instagram/eddy.arnold
/instagram/espylaub
/instagram/fck_yeah_
/instagram/fermentationfan
/instagram/hendryque
/instagram/isapien
/instagram/jamieontiveros
/instagram/johnbenson
/instagram/mathewi
/instagram/mistermaumau
/instagram/nikolnieto
/instagram/njashanmal
/instagram/photofox
/instagram/realkoyuchan
/instagram/silveradepy
/instagram/srevo
/instagram/the_timweston
/instagram/tylergillies
/instagram/zlojkashtan
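the rule behind that list, sketched under assumptions: `domain_urls` is treated as a plain list of strings here, and the function name and sample accounts are made up for illustration, not taken from bridgy's code.

```python
def indieauth_target(domain_urls):
    """Return the first web site to IndieAuth into, or None if the user must re-login first."""
    return domain_urls[0] if domain_urls else None


# hypothetical accounts; usernames and URLs are invented for this example
accounts = {
    'alice': ['https://alice.example/', 'https://blog.alice.example/'],
    'bob': [],
}
needs_relogin = sorted(name for name, urls in accounts.items()
                       if indieauth_target(urls) is None)
print(needs_relogin)
```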

@snarfed
Owner Author

snarfed commented Apr 3, 2016

ran this in remote_api_shell to remove publish from all instagram accounts:

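# Instagram is bridgy's datastore model class, already loaded in remote_api_shell
for source in Instagram.query(Instagram.features == 'publish'):
  source.features.remove('publish')  # drop publish, keep any other features
  source.put()  # write the updated entity back to the datastore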

@snarfed
Owner Author

snarfed commented Apr 3, 2016

flipped the switch! all instagram accounts are now on scraping and using indieauth for login/delete. fingers crossed!

@snarfed
Owner Author

snarfed commented Apr 4, 2016

looking good so far. tentatively closing. woo!

@Johnathangalliano

@snarfed May I ask how you passed the "Still in Development: Your app is still in development. Please resubmit only when your app is ready to go live and no longer in development." part? I am trying to submit my app now and I get this error back. And I can't for the life of me understand what it means. Sorry to hijack your thread but you seem to be the only one that has faced this issue.

@snarfed
Owner Author

snarfed commented May 30, 2016

@Johnathangalliano sounds like your app is still in sandbox mode? https://www.instagram.com/developer/sandbox/

i didn't actually get approved, so i don't have more specific advice, sorry. i switched to scraping their html instead. :/

@rummykhan

i was also scraping, and it was all going very well, but suddenly all my accounts started getting limit exceeded errors, even when i sign in. do you have a fix for this? and did you monitor the rate limit on different endpoints?
thanks

@snarfed
Owner Author

snarfed commented Aug 15, 2016

@rummykhan if you're getting 429s, then yeah, instagram rate limits HTTP requests by IP address or subnet. i hit that at one point too. lots of details in #665 and https://groups.google.com/d/msg/google-appengine/rpendSIxJMo/_u4G6uXiBQAJ .
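one common way to react to 429s is to back off and retry. a minimal sketch, with assumptions: the `fetch` callable, the retry count, and the delays are all illustrative, and nothing in this thread establishes how long instagram's limit actually lasts, so waiting may not be enough on its own.

```python
import time


def fetch_with_backoff(fetch, url, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) -> (status, body); on a 429, wait and retry with doubling delays.

    fetch is any caller-supplied callable. The delays are illustrative guesses,
    not limits documented by instagram.
    """
    delay = base_delay
    for attempt in range(max_tries):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt < max_tries - 1:
            sleep(delay)
            delay *= 2
    # still rate limited after max_tries attempts; hand the last response back
    return status, body


# usage with a fake fetcher that is rate limited twice, then succeeds
responses = iter([(429, ''), (429, ''), (200, 'ok')])
waits = []
result = fetch_with_backoff(lambda url: next(responses), 'https://example.com/p/1',
                            sleep=waits.append)
print(result, waits)
```

`sleep` is injectable so the example records the waits instead of actually pausing.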

@rummykhan

rummykhan commented Aug 15, 2016

thanks @snarfed and yea i was getting response code 429, today i did some testing and what i found is here..
maybe it'll help somebody.

Instagram Scraping WORKAROUND

Tests:

2.  Get Posts of a user


    Test # 1 (Instagram Form Auth - Account 1)
    ------------------------------
        Login Status = Success

        Minutes     = 2:42
        Seconds     = 162
        Requests    = 354

        After this got response code 429 (Limit Exceeded)


    Test # 2 (Instagram Form Auth - Account 2)
    ------------------------------
        Login Status = Success

        Minutes     = 3:11
        Seconds     = 191
        Requests    = 400

        After this got response code 429 (Limit Exceeded)


    Test # 3 (Instagram Form Auth - Account 3)
    ------------------------------
        Login Status = Fail (Asked for email/phone verification)

        Minutes     = 3:13
        Seconds     = 182
        Requests    = 393

        After this got response code 429 (Limit Exceeded)

        Observation
        -----------
        1. We can get the user posts without being logged in.


    Test # 4 (No Auth - Time Delay 1 Second)
    ------------------------------
        Minutes     = 173
        Seconds     = 10438
        Requests    = 7051

        State: Stopped intentionally

Key Observation

  1. Request counts are IP based (previously i thought they were user based).

Solution


  1. Use proxies to avoid rate limiting. (Switch to a different proxy whenever you receive a 429.)
  2. To increase speed, use Python multiprocessing with proxy chaining.
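Test # 4 above spaced requests out with a 1 second delay and was stopped intentionally rather than by a 429. A minimal sketch of that kind of fixed delay, assuming a small helper class (the name `Throttle` and its parameters are hypothetical, not code from these tests):

```python
import time


class Throttle:
    """Enforce a minimum interval between calls; clock and sleep are injectable for testing."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous call, None before the first one

    def wait(self):
        """Sleep just long enough that calls are at least min_interval apart."""
        if self.last is not None:
            remaining = self.min_interval - (self.clock() - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()


# usage: call wait() before each request (tiny interval here so the demo runs quickly)
throttle = Throttle(min_interval=0.01)
for page in ['p1', 'p2', 'p3']:
    throttle.wait()
    # a real scraper would fetch `page` here
```

A fixed delay only caps the request rate from one process. Since the counts appear to be IP based, several processes behind the same address would need to share one throttle.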

@jgozal

jgozal commented Sep 27, 2016

If I may ask, about how many requests were you guys making per hour before hitting that limit?

@snarfed
Owner Author

snarfed commented Sep 27, 2016

sure! details on request volume above and in https://groups.google.com/d/msg/google-appengine/rpendSIxJMo/_u4G6uXiBQAJ :

  • i saw it after doing ~.1qps for a week or two
  • a Google engineer saw it after doing 100qps for 15s
  • @rummykhan saw it after doing ~2qps for a few min
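for reference, the "~2qps" figure can be recomputed from the request counts and durations @rummykhan posted above. a small sketch; the labels are mine, and note that test 3's minutes (3:13) and seconds (182) figures don't agree with each other, so this uses the seconds value as posted.

```python
# (label, requests, seconds) copied from @rummykhan's test results in this thread
tests = [
    ('test 1, logged in', 354, 162),
    ('test 2, logged in', 400, 191),
    ('test 3, login failed', 393, 182),
    ('test 4, no auth, 1s delay', 7051, 10438),
]


def qps(requests, seconds):
    """Average queries per second over the whole run."""
    return requests / seconds


rates = {label: round(qps(n, s), 2) for label, n, s in tests}
for label, rate in rates.items():
    print(label, rate)
```

the first three runs come out a bit above 2qps and each ended in a 429. test 4 averaged under 1qps and was stopped intentionally after roughly three hours, so it says nothing about what happens at that rate over days or weeks.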

@jgozal

jgozal commented Sep 27, 2016

Thanks @snarfed. I'm making ~.7qps non-stop for the whole day and haven't gotten any 429s (it's only been 3-4 days). Do you think I should be concerned about getting them in the future at that rate?

@snarfed
Owner Author

snarfed commented Sep 27, 2016

based on this data, maybe yes, within weeks. good luck though!

@harshdamaniahd

Do I have to go through app approval process even for fetching photos from my Instagram account?

@snarfed
Owner Author

snarfed commented Jul 20, 2018

@harshdamaniahd to write your own Instagram app? yes, and good luck even then.

@harshdamaniahd

but here i see this, which means that for a non-business account we are redirected to the old developer site:
[image]
