Scrapy img scraping has no result with a correct XPath - xpath

I get the correct links from http://tieba.baidu.com/f?kw=dota2&fr=index using Chrome XPath Helper, but Scrapy's spider gets no result, as in this log:
> E:\ladder\tieba\tieba\spiders\tiebaSpiber.py:11: ScrapyDeprecationWarning: tieba.spiders.tiebaSpiber.tiebaSpider inherits from deprecated class scrapy.spiders.BaseSpider, please inherit from scrapy.spiders.Spider. (warning only on first subclass, there may be others)
class tiebaSpider(BaseSpider):
img_url:
['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
Spider code:
class tiebaSpider(BaseSpider):
    name = "tiebaSpider"
    allowed_domains = ["tieba.baidu.com"]
    download_delay = 1
    start_urls = ["http://tieba.baidu.com/f?ie=utf-8&kw=dota2", ]
    rules = (
        Rule(LinkExtractor(allow=(r'http://tieba.baidu.com/f?kw=dota2&ie=utf-8&pn=')), callback='parse_tieba',
             follow=True),
    )

    def parse_tieba(self, response):
        self.log("Fetch Dota2 Tieba Page:%s" % response.url)
        sel = Selector(response)
        rep_num = sel.xpath('//span[@class="threadlist_rep_num center_text"]/text()').extract()
        title = sel.xpath('//div[@class="threadlist_title pull_left j_th_tit "]/a/text()').extract()
        author = sel.xpath('//span[@class="frs-author-name-wrap"]/a/text()').extract()
        img_url = sel.xpath('//div[@class="threadlist_text pull_left"]//div[@class="small_wrap j_small_wrap"]//a[@class="thumbnail vpic_wrap"]/img/@src').extract()
        item = TiebaItem()
        item['rep_num'] = [n for n in rep_num]
        item['title'] = [n for n in title]
        item['author'] = [n for n in author]
        item['img_url'] = [n for n in img_url]
        print("img_url:\n")
        print(img_url)
        yield item

If you check what is actually received as HTML from the webserver, you notice that the src attribute of the <img> tags is empty:
$ scrapy shell 'http://tieba.baidu.com/f?kw=dota2&fr=index'
2016-10-28 11:13:58 [scrapy] INFO: Scrapy 1.2.1 started (bot: scrapybot)
2016-10-28 11:14:00 [scrapy] DEBUG: Crawled (200) <GET http://tieba.baidu.com/f?kw=dota2&fr=index> (referer: None)
>>> print(response.xpath('//div[@class="threadlist_text pull_left"]//div[@class="small_wrap j_small_wrap"]//a[@class="thumbnail vpic_wrap"]').extract_first())
<a class="thumbnail vpic_wrap"><img src="" attr="71814" data-original="http://imgsrc.baidu.com/forum/wh%3D135%2C90/sign=d25862d404d79123e0b59c759e0175bb/a92cb751f3deb48f948c9302f81f3a292ff5785e.jpg" bpic="http://imgsrc.baidu.com/forum/pic/item/a92cb751f3deb48f948c9302f81f3a292ff5785e.jpg" class="threadlist_pic j_m_pic "></a>
>>>
But you can also notice that the data-original attribute looks more interesting:
>>> from pprint import pprint
>>> pprint(response.xpath('//div[@class="threadlist_text pull_left"]//div[@class="small_wrap j_small_wrap"]//a[@class="thumbnail vpic_wrap"]/img/@data-original').extract())
[u'http://imgsrc.baidu.com/forum/wh%3D135%2C90/sign=d25862d404d79123e0b59c759e0175bb/a92cb751f3deb48f948c9302f81f3a292ff5785e.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C90%3Bcrop%3D0%2C0%2C90%2C90/sign=4909678ffe246b607b5bba7ddbd4237c/9f396e094b36acafd9ddaf2074d98d1000e99c07.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C180%3Bcrop%3D0%2C0%2C90%2C90/sign=6d1bc479d943ad4ba67b4ec9b22e6b97/5c2c493d269759ee89455917bafb43166c22df2f.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C90/sign=46c1cc9483d4b31cf0699cb2b7fa1e4f/bd862d2ac65c10385f6f1915ba119313b17e892e.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D91%2C90/sign=de722bda78cf3bc7e855c5e5e02c8391/accf9e18367adab4f396cc9483d4b31c8501e4fe.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C90/sign=9549bad85182b2b7a7ca31cd0181f2df/9dc1673e6709c93d44c22c2b973df8dcd000540b.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C160%3Bcrop%3D0%2C0%2C90%2C90/sign=1361b72e751ed21b799c26ec9d42ecf2/caf91f0828381f307dd1ab75a1014c086c06f07c.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C160%3Bcrop%3D0%2C0%2C90%2C90/sign=003bc7ff7bf082022dc799367bd7cadb/0d38256d55fbb2fbee667bce474a20a44423dcf7.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C160%3Bcrop%3D0%2C0%2C90%2C90/sign=c30aaadd546034a829b7b088fb3f7862/c3fdcc0735fae6cd21a688bd07b30f2443a70f35.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C159%3Bcrop%3D0%2C0%2C90%2C90/sign=8ff4a1d85182b2b7a7ca31cd0181fada/3857980a19d8bc3e9f853f168a8ba61eaad345b6.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C159%3Bcrop%3D0%2C0%2C90%2C90/sign=fd928ccac7fc1e17fdea84387abcc736/5d2188529822720eb2c8d92673cb0a46f31fab3a.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C159%3Bcrop%3D0%2C0%2C90%2C90/sign=4cb4bdf006f41bd5da06e0fd61f6b0fe/6410b912c8fcc3cef25793e89a45d688d53f2051.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C159%3Bcrop%3D0%2C0%2C90%2C90/sign=f5025694962f07085f502209d90889ac/7ce22c9b033b5bb5bb3e64253ed3d539b400bc52.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D145%2C90/sign=86be70ceb4315c6043c063eeb984e72a/241923c79f3df8dcb740534ac511728b451028c6.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C90%3Bcrop%3D0%2C0%2C90%2C90/sign=f5d7eef34934970a47261826a5e6e8f8/c3fdcc0735fae6cd268d8dbd07b30f2443a70f02.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C120%3Bcrop%3D0%2C0%2C90%2C90/sign=2d5511d753b5c9ea62a60beae5158732/08b62ca85edf8db1d2dd5bb80123dd54544e7454.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D136%2C90/sign=24d6709da751f3dec3e7b165a7d8dc26/64983d1f95cad1c81e464470773e6709c83d513a.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C106%3Bcrop%3D0%2C0%2C90%2C90/sign=b985bccac3ef76093c5e91961ef192fc/fc05e51f4134970a853fa8789dcad1c8a6865d6b.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C90%3Bcrop%3D0%2C0%2C90%2C90/sign=04db575ec1ea15ce41bbe800862c03c3/edee83504fc2d56282d5e936ef1190ef74c66c65.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C90%3Bcrop%3D0%2C0%2C90%2C90/sign=e97ea5f2a0d3fd1f365caa3300621c2f/5df2b318972bd4075e0fe52173899e510db30973.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C110%3Bcrop%3D0%2C0%2C90%2C90/sign=e9f47f005ce736d158468401ab7c7ef3/99c76a8b4710b9129906c722cbfdfc0390452278.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D159%2C90/sign=0034aae273ec54e741b9121f8c01b769/02988a58d109b3debd89c3b4c4bf6c81810a4c09.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D160%2C90/sign=19b0661bf4039245a1e0e90eb1a488fb/d6d442afa40f4bfb21930e820b4f78f0f53618ff.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D160%2C90/sign=06e4f2d0a4af2eddd4a441e8bb202dd0/cdf3a4315c6034a82e323457c31349540b23766e.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C159%3Bcrop%3D0%2C0%2C90%2C90/sign=743f9e9fa98b87d65017a3163724190d/348f3d2dd42a2834f3b81aa553b5c9ea14cebf5c.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C159%3Bcrop%3D0%2C0%2C90%2C90/sign=a3aea53fc511728b3078842bf8d0f2fb/4ac19282b9014a9084ad6c13a1773912b11beee7.jpg',
u'http://imgsrc.baidu.com/forum/wh%3D90%2C159%3Bcrop%3D0%2C0%2C90%2C90/sign=e82781cf07b30f2435cfe40af8b9e076/56de63f40ad162d9617d48b219dfa9ec8813cde7.jpg']
>>>
So try using img_url = sel.xpath('//div[@class="threadlist_text pull_left"]//div[@class="small_wrap j_small_wrap"]//a[@class="thumbnail vpic_wrap"]/img/@data-original').extract()
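Applied to the spider from the question, only the attribute name in the XPath changes; a minimal sketch of the adjusted extraction (the rest of parse_tieba stays exactly as in the question):

    # The visible src attribute is presumably filled in by JavaScript on the live
    # page; the raw HTML only carries the image URL in data-original (and bpic),
    # so extract that attribute instead of src.
    img_url = sel.xpath(
        '//div[@class="threadlist_text pull_left"]'
        '//div[@class="small_wrap j_small_wrap"]'
        '//a[@class="thumbnail vpic_wrap"]/img/@data-original').extract()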


About the 'experimental' of 'Changed in version 1.4: Index key for glossary term should be considered experimental'

question
https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html
This is about the note 'Changed in version 1.4: Index key for glossary term should be considered experimental.'
What makes the index key for glossary terms 'experimental'?
confirmed
When editing a glossary that uses an index key, 'make clean' was needed before 'make html'; otherwise the index key disappeared from genindex.html.
I think all of the following test cases should pass; the first one is the case related to 'make clean'. They all pass on a new indexer.
#!/usr/bin/python
import unittest

from sphinx.environment.adapters.indexentries import IndexEntries

testcase01i = {
    'doc1': [
        ('single', 'aaa', 'id-111', '', 'clf1'),
        ('single', 'bbb', 'id-112', '', None),
    ],
    'doc2': [
        ('single', 'aaa', 'id-121', '', None),
        ('single', 'bbb', 'id-122', '', 'clf2'),
    ],
}
testcase01o = [
    ('clf1',
     [('aaa', [[('', 'doc1.html#id-111'), ('', 'doc2.html#id-121')], [], 'clf1']), ]
     ),
    ('clf2',
     [('bbb', [[('', 'doc1.html#id-112'), ('', 'doc2.html#id-122')], [], None]), ]
     ),
]
testcase02i = {
    'doc1': [
        ('see', 'hogehoge; foo', 'id-211', 'main', None),
        ('seealso', 'hogehoge; bar', 'id-212', 'main', None),
    ]
}
testcase02o = [
    ('H',
     [('hogehoge',
       [[],
        [('see also bar', []), ('see foo', [])],
        None
        ])
      ])
]
testcase03i = {
    'doc1': [
        ('single', 'func1() (aaa module)', 'id-311', '', None),
        ('single', 'func1() (bbb module)', 'id-312', '', None),
        ('single', 'func1() (ccc module)', 'id-313', '', None),
    ]
}
testcase03o = [
    ('F',
     [('func1()',
       [[],
        [('(aaa module)', [('', 'doc1.html#id-311')]),
         ('(bbb module)', [('', 'doc1.html#id-312')]),
         ('(ccc module)', [('', 'doc1.html#id-313')])
         ],
        None])])
]
testcase04i = {
    'doc1': [
        ('single', 'func1() (aaa module)', 'id-411', '', None),
        ('single', 'func1() (bbb module)', 'id-412', '', None),
        ('single', 'func1() (ccc module)', 'id-413', 'main', None),
    ]
}
testcase04o = [
    ('F',
     [('func1()',
       [[],
        [('(aaa module)', [('', 'doc1.html#id-411')]),
         ('(bbb module)', [('', 'doc1.html#id-412')]),
         ('(ccc module)', [('main', 'doc1.html#id-413')]),
         ],
        None])])
]
#-------------------------------------------------------------------
class _domain(object):
    def __init__(self, entries):
        self.entries = entries

class _env(object):
    def __init__(self, domain):
        self.domain = {}
        self.domain['index'] = domain

    def get_domain(self, domain_type):
        return self.domain[domain_type]

class _builder(object):
    def get_relative_uri(self, uri_type, file_name):
        return f'{file_name}.html'

#-------------------------------------------------------------------
bld = _builder()

class TestcaseIndexEntries(unittest.TestCase):
    def test01_classifier(self):
        self.maxDiff = None
        dmn = _domain(testcase01i)
        env = _env(dmn)
        gidx = IndexEntries(env).create_index(bld)
        self.assertEqual(testcase01o, gidx)

    def test02_see_and_seealso(self):
        self.maxDiff = None
        dmn = _domain(testcase02i)
        env = _env(dmn)
        gidx = IndexEntries(env).create_index(bld)
        self.assertEqual(testcase02o, gidx)

    def test03_homonymous_function(self):
        self.maxDiff = None
        dmn = _domain(testcase03i)
        env = _env(dmn)
        gidx = IndexEntries(env).create_index(bld)
        self.assertEqual(testcase03o, gidx)

    def test04_homonymous_function(self):
        self.maxDiff = None
        dmn = _domain(testcase04i)
        env = _env(dmn)
        gidx = IndexEntries(env).create_index(bld)
        self.assertEqual(testcase04o, gidx)

#-------------------------------------------------------------------
if __name__ == '__main__':
    unittest.main()
purpose
I want to know whether there is anything else that makes the index key 'experimental'. If there is, I want to see whether it is fixed by the indexer.
note
I am writing this using translation software. If anything is difficult to understand, I will add more detail.

Problem with the error 'A server error occurred. Please contact the administrator' - Django REST framework

I am trying to test some basic backend functionality, but I keep getting the error A server error occurred. Please contact the administrator. when connecting to localhost at http://127.0.0.1:8000 or http://127.0.0.1:8000/sise/.
At first I had the error django.core.exceptions.ImproperlyConfigured: The included URLconf 'Dashboard_sise.urls' does not appear to have any patterns in it. If you see valid patterns in the file then the issue is probably caused by a circular import. After I commented out the line ROOT_URLCONF = 'Dashboard_sise.urls' in the settings file, the error changed to A server error occurred. Please contact the administrator.
Can anyone please help me figure this problem out? I already tried changing the urlpatterns in the urls.py files, but it didn't work. I also tried manipulating the MIDDLEWARE section in the settings file, but nothing changed.
This is the Dashboard_sise.urls code
from django.contrib import admin
from django.urls import include, path
urlpatterns = [
    path('admin/', admin.sites.urls),
    path('sise/', include('Dashboard.urls')),
]
This is the Dashboard.urls code
from rest_framework.routers import DefaultRouter
from Dashboard.views import *
router = DefaultRouter()
router.register(r'accee', AcceeViewSet, basename='accee')
router.register(r'rapport', RapportViewSet, basename='rapport')
router.register(r'prise_fonction', PointageUtilisateurViewSet, basename='prise_fonction')
urlPatterns = router.urls
and finally the settings file
# Build paths inside the project like this: os.path.join(BASE_DIR, ...)
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
# Quick-start development settings - unsuitable for production
# See https://docs.djangoproject.com/en/3.0/howto/deployment/checklist/
# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = '+f-#$j*(-8^*7ijk#6_hpki#)am4e%na6ttp)54#-ddcs0#fgy'
# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True
ALLOWED_HOSTS = ['*']
PREPEND_WWW = False
# Application definition
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'rest_framework',
'corsheaders',
'Dashboard',
'frontend'
]
MIDDLEWARE = [
'django.middleware.security.SecurityMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware',
'django.middleware.common.CommonMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
'django.middleware.clickjacking.XFrameOptionsMiddleware',
'corsheaders.middleware.CorsMiddleware',
]
TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [os.path.join(BASE_DIR, 'templates')],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
WSGI_APPLICATION = 'Dashboard_sise.wsgi.application'
'''ROOT_URLCONF = 'Dashboard_sise.urls'''
# Database
# https://docs.djangoproject.com/en/3.0/ref/settings/#databases
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql',
'NAME': 'MainCourante',
'USER': 'postgres',
'PASSWORD': 'root',
'HOST': 'localhost',
'PORT': '5432',
}
}
# Password validation
# https://docs.djangoproject.com/en/3.0/ref/settings/#auth-password-validators
AUTH_PASSWORD_VALIDATORS = [
{
'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
},
{
'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
},
{
'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
},
{
'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
},
]
# Internationalization
# https://docs.djangoproject.com/en/3.0/topics/i18n/
LANGUAGE_CODE = 'en-us'
TIME_ZONE = 'UTC'
USE_I18N = True
USE_L10N = True
USE_TZ = True
APPEND_SLASH = False
# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/3.0/howto/static-files/
STATIC_URL = '/static/'
'''REST_FRAMEWORK = {
'DEFAULT_RENDERER_CLASSES': (
'rest_framework.renderers.JSONRenderer',
'rest_framework.authentification.BasicAuthentification'
'rest_framework.authentification.SessionAuthentification'
# 'rest_framework.permissions.IsAuthentificated'
# 'rest_framework.permissions.AllowAny'
)
}'''
CORS_ORIGIN_ALLOW_ALL = True # If this is used then `CORS_ORIGIN_WHITELIST` will not have any effect
Thank you in advance.
I will answer this in case someone faces the same bug. I had a problem with the database structure: the models implemented in models.py and the database that had been created didn't match, so it kept showing me this error. Once I fixed the models.py file, it all worked well.
Maybe this was the cause:
urlPatterns = router.urls
It's usually urlpatterns.
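If that was indeed the cause, Dashboard.urls would need the lowercase name so that include('Dashboard.urls') can find the patterns; a minimal sketch based on the code in the question:

    # Dashboard/urls.py -- only the name of the variable changes.
    from rest_framework.routers import DefaultRouter
    from Dashboard.views import AcceeViewSet, RapportViewSet, PointageUtilisateurViewSet

    router = DefaultRouter()
    router.register(r'accee', AcceeViewSet, basename='accee')
    router.register(r'rapport', RapportViewSet, basename='rapport')
    router.register(r'prise_fonction', PointageUtilisateurViewSet, basename='prise_fonction')

    # Django's include() looks for a module-level variable named exactly 'urlpatterns'.
    urlpatterns = router.urls

Note that ROOT_URLCONF = 'Dashboard_sise.urls' also needs to stay set in settings.py; without it Django cannot resolve any URL at all.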

Scrapy Image Pipeline: How to rename images?

I have a spider which fetches both data and images. I want to rename the images with the respective 'title' that I'm fetching.
Following is my code:
spider1.py
from imageToFileSystemCheck.items import ImagetofilesystemcheckItem
import scrapy
class TestSpider(scrapy.Spider):
    name = 'imagecheck'

    def start_requests(self):
        searchterms = ['keyword1', 'keyword2', ]
        for item in searchterms:
            yield scrapy.Request('http://www.example.com/s?=%s' % item, callback=self.parse, meta={'item': item})

    def parse(self, response):
        start_urls = []
        item = response.meta.get('item')
        for i in range(0, 2):
            link = str(response.css("div.tt a.chek::attr(href)")[i].extract())
            start_urls.append(link)
        for url in start_urls:
            print(url)
            yield scrapy.Request(url=url, callback=self.parse_info, meta={'item': item})

    def parse_info(self, response):
        url = response.url
        title = str(response.xpath('//*[@id="Title"]/text()').extract_first())
        img_url_1 = response.xpath("//img[@id='images']/@src").extract_first()
        scraped_info = {
            'url': url,
            'title': title,
            'image_urls': [img_url_1]
        }
        yield scraped_info
items.py
import scrapy
class ImagetofilesystemcheckItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    image_urls = scrapy.Field()
    images = scrapy.Field()
    pass
pipelines.py
class ImagetofilesystemcheckPipeline(object):
    def process_item(self, item, spider):
        return item
settings.py
BOT_NAME = 'imageToFileSystemCheck'
SPIDER_MODULES = ['imageToFileSystemCheck.spiders']
NEWSPIDER_MODULE = 'imageToFileSystemCheck.spiders'
ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = '/home/imageToFileSystemCheck/images/'
ROBOTSTXT_OBEY = True
Can you please help me with the required changes so that Scrapy saves the scraped images in 'title'.jpg format, where the title is scraped by the spider?
Create a spider like this:
import scrapy

class ShopeeSpider(scrapy.Spider):
    _TEMP_IMAGES_STORE = "/home/crawler/scrapers/images"
    custom_settings = {
        'ITEM_PIPELINES': {
            'coszi.pipelines.CustomImagePipeline': 400,
        },
        "IMAGES_STORE": _TEMP_IMAGES_STORE,
    }

    def parse(self, response):
        data = {}
        # Map each image URL to the file name you want it saved under.
        data['images'] = {"image_link_here": "image_name_here"}
Then your pipelines.py should be like this:
import os

import scrapy
from scrapy.pipelines.images import ImagesPipeline

class CustomImagePipeline(ImagesPipeline):
    def get_media_requests(self, item, info):
        if 'images' in item:
            # 'images_path' and 'img_name_here' are expected to be set on the item by the spider.
            for image_url, img_name in item['images'].items():
                if not os.path.exists(os.path.join(item['images_path'], img_name)):
                    request = scrapy.Request(url=image_url)
                    request.meta['img_name'] = img_name
                    request.meta['this_prod_img_folder'] = item['img_name_here']
                    request.dont_filter = True
                    yield request

    def file_path(self, request, response=None, info=None):
        # The spider above defines _TEMP_IMAGES_STORE, so use that as the base directory.
        return os.path.join(info.spider._TEMP_IMAGES_STORE, request.meta['this_prod_img_folder'], request.meta['img_name'])
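For the original question, which yields a plain dict with 'title' and 'image_urls', a more self-contained sketch (not taken from the answer above; the class name and the file-name sanitising rule are my own assumptions) is to derive the file name directly from the scraped title:

    # pipelines.py (sketch): name each downloaded image after the item's 'title'.
    import scrapy
    from scrapy.pipelines.images import ImagesPipeline

    class TitleImagePipeline(ImagesPipeline):
        def get_media_requests(self, item, info):
            # Forward the scraped title with every image request via request.meta.
            for image_url in item.get('image_urls', []):
                yield scrapy.Request(image_url, meta={'title': item.get('title', 'image')})

        def file_path(self, request, response=None, info=None, *args, **kwargs):
            # Build '<title>.jpg', keeping only characters that are safe in file names.
            title = request.meta.get('title', 'image')
            safe = ''.join(c for c in title if c.isalnum() or c in ' -_').strip() or 'image'
            return '%s.jpg' % safe

Then point ITEM_PIPELINES at this class instead of the stock one, for example ITEM_PIPELINES = {'imageToFileSystemCheck.pipelines.TitleImagePipeline': 1}, and keep IMAGES_STORE as before. If two items share the same title, the later download will overwrite the earlier file, so a uniqueness suffix may be needed.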

Fetching image binary data from the CRML

Trying to fetch :all (first :item) from the CRML Media Resource, using the Estately RETS repo. Here is my Ruby example file:
require 'rets'

client = Rets::Client.new({
  login_url: 'url',
  username: 'user',
  password: 'password',
  version: 'RETS/1.7.2'
})

begin
  client.login
rescue => e
  puts 'Error: ' + e.message
  exit!
end

puts 'We connected! Lets get all the photos for a property...'

photos = client.find (:first), {
  search_type: 'Media',
  class: 'Media',
  query: '(MediaModificationTimestamp=2017-04-15+),(MediaType=Image)'
}

photo = open(photo = photos['MediaURL'])

require 'base64'
image = Base64.encode64(photo.read)

File.open('property-1.gif', 'wb') do |f|
  f.write(Base64.decode64(image))
end

puts photos.length.to_s + ' photos saved.'

client.logout
but I'm only getting one image instead of the 26 expected. I'm also not sure if this will be the best method of retrieving all of the images for all of the listings, once I get the first one working. Here is more information regarding this issue: https://github.com/estately/rets/issues/210
require 'rets'

client = Rets::Client.new({
  login_url: 'url',
  username: 'username',
  password: 'password',
  version: 'RETS/1.7.2'
})

begin
  client.login
rescue => e
  puts 'Error: ' + e.message
  exit!
end

puts 'We connected! Lets get all the photos for a property...'

photos = client.find (:all), {
  search_type: 'Media',
  class: 'Media',
  query: '(ResourceRecordKeyNumeric=117562969),(MediaType=Image)'
}

photos.each_with_index do |data, index|
  photo = open(photo = data['MediaURL'])
  puts data['MediaURL']

  require 'base64'
  image = Base64.encode64(photo.read)

  File.open("property-#{index.to_s}.jpg", 'wb') do |f|
    f.write(Base64.decode64(image))
  end
end

puts photos.length.to_s + ' photos saved.'

client.logout
You can try giving listing IDs comma-separated in the query part, to get all images of multiple listings at a time.
photos = client.find (:all), {
  search_type: 'Media',
  class: 'Media',
  query: '(ResourceRecordKeyNumeric=117562969,117562970,117562971),(MediaType=Image)'
}

Image not loading after updating a Google document

I have a problem trying to update a Google document containing an image. In the first revision the image loads as expected, but after updating the document with the same HTML code I keep getting a spinner instead of the image.
I am using the Ruby gem created by Google (https://github.com/google/google-api-ruby-client).
Here is my test code:
# Setting up the client instance
require "google/api_client"
require "tempfile"
client = Google::APIClient.new
client.authorization.client_id = "<CLIENTID>"
client.authorization.client_secret = "<CLIENTSECRET>"
client.authorization.redirect_uri = "<REDIRECTURI>"
client.authorization.scope = "https://www.googleapis.com/auth/drive"
client.authorization.access_token = "<ACCESSTOKEN>"
client.authorization.refresh_token = "<REFRESHTOKEN>"
drive = client.discovered_api("drive", "v2")
# Creating the document (IMAGE DISPLAYED CORRECTLY)
file = drive.files.insert.request_schema.new({"title" => "Test document", "mimeType" => "text/html"})
temp = Tempfile.new "temp.html"
temp.write "<h1>Testing!</h1><p>Lorem ipsum.</p><img width='400px' src='http://www.digitaleconomics.nl/wp-content/uploads/2013/04/see-how-your-google-results-measure-up-with-google-grader-video-6b8bbb4b41.jpg'>"
temp.rewind
media = Google::APIClient::UploadIO.new(temp, "text/html")
result = client.execute(:api_method => drive.files.insert, :body_object => file, :media => media, :parameters => {"uploadType" => "multipart", "convert" => true})
temp.close
# Updating the document (GETTING SPINNER INSTEAD OF IMAGE)
file = client.execute(:api_method => drive.files.get, :parameters => {"fileId" => result.data.to_hash["id"]}).data
file.title = "Updated test document"
temp = Tempfile.new "temp.html"
temp.write "<h1>Testing!</h1><p>Lorem ipsum.</p><img width='400px' src='http://www.digitaleconomics.nl/wp-content/uploads/2013/04/see-how-your-google-results-measure-up-with-google-grader-video-6b8bbb4b41.jpg'>"
temp.rewind
media = Google::APIClient::UploadIO.new(temp, "text/html")
result = client.execute(:api_method => drive.files.update, :body_object => file, :media => media, :parameters => {"uploadType" => "multipart", "convert" => true, "fileId" => result.data.to_hash["id"], "newRevision" => false})
temp.close
Also, setting newRevision to false does not prevent a new revision from being created.
Can anyone help me out?
