Scraping external links

Vashiel

Mar 20, 2016
Hi, maybe someone can find a solution to my problem?

I wrote a small video addon with the help of some tutorials, just for fun. It works so far, but not as well as I would like.

This is how it looks now (only the relevant part of the code):
Code:
import urllib, urllib2, re, os, sys
import xbmc, xbmcplugin, xbmcgui, xbmcaddon

def main():
	add_dir('Name of the Website', 'startingpage', 2, logos, fanart)

def start(url):
	home()
	if 'Name of the Website' in url:
		content = make_request(url)
		# Each match is (sub page URL, thumbnail URL, title).
		match = re.compile('href="([^"]+)"><img src="([^"]+)".+?alt="([^"]+)"').findall(content)
		for link, thumb, name in match:
			add_link(name, link, 4, thumb, fanart)


def resolve_url(url):
	content = make_request(url)
	if 'Name of the Website' in url:
		# The first match is the direct video link.
		media_url = re.compile('<div class="video-wrap" data-origin-source="([^"]+)">').findall(content)[0]


def add_dir(name, url, mode, iconimage, fanart):
	u = sys.argv[0] + "?url=" + urllib.quote_plus(url) + "&mode=" + str(mode) + "&name=" + urllib.quote_plus(name) + "&iconimage=" + urllib.quote_plus(iconimage)
	ok = True
	liz = xbmcgui.ListItem(name, iconImage = "DefaultFolder.png", thumbnailImage = iconimage)
	liz.setInfo( type = "Video", infoLabels = { "Title": name } )
	liz.setProperty('fanart_image', fanart)
	ok = xbmcplugin.addDirectoryItem(handle = int(sys.argv[1]), url = u, listitem = liz, isFolder = True)
	return ok

	
def add_link(name, url, mode, iconimage, fanart):
	u = sys.argv[0] + "?url=" + urllib.quote_plus(url) + "&mode=" + str(mode) + "&name=" + urllib.quote_plus(name) + "&iconimage=" + urllib.quote_plus(iconimage)
	liz = xbmcgui.ListItem(name, iconImage = "DefaultVideo.png", thumbnailImage = iconimage)
	liz.setInfo( type = "Video", infoLabels = { "Title": name } )
	liz.setProperty('fanart_image', fanart)
	# Mark the item as playable so that selecting it calls the plugin with mode 4.
	liz.setProperty('IsPlayable', 'true')
	ok = xbmcplugin.addDirectoryItem(handle = int(sys.argv[1]), url = u, listitem = liz)
	return ok


if mode is None or not url:
	main()

elif mode == 2:
	start(url)
  
elif mode == 4:
	resolve_url(url)
That is how it is now: start(url) lists the links, a selected link is passed on to resolve_url(url), and the match there is the video link (media_url). This works just fine.
BUT... on some websites the video links are external.
The structure is: FirstPage ----> SubPage ----> ExternalPage ----> Videolink


In that case resolve_url(url) only gets the URL of the ExternalPage. So I thought maybe I should match twice, on the FirstPage and on the SubPage, and send the URL of the ExternalPage to resolve_url(url) to get the video link. Like this:

Code:
	content = make_request(url)
	match = re.compile('href="([^"]+)"><img src="([^"]+)".+?alt="([^"]+)"').findall(content)
	source_re = re.compile('<div class="video-wrap" data-origin-source="([^"]+)">')
	for sub_url, thumb, name in match:
		# One extra request per entry: this is what slows down the listing.
		content2 = make_request(sub_url)
		for external_url in source_re.findall(content2):
			add_link(name, external_url, 4, thumb, fanart)
It works, but now it takes about 10 seconds to load the first page, because every SubPage has to be fetched before the list can be shown. There must be a better solution.
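
One direction that might avoid the delay is to keep start(url) as in the first listing and follow both hops only when an entry is actually selected. The following is a minimal sketch under stated assumptions, not a tested addon: resolve_chain, fake_fetch, PAGES and the example.com-style URLs are hypothetical names for illustration, fetch stands in for make_request, and EXTERNAL_RE is a guess, since the markup of the ExternalPage is not shown above.

```python
import re

# Pattern from the listings above: SubPage -> URL of the ExternalPage.
SUBPAGE_RE = re.compile(r'<div class="video-wrap" data-origin-source="([^"]+)">')
# Hypothetical pattern: ExternalPage -> video link (that markup is an assumption).
EXTERNAL_RE = re.compile(r'<source src="([^"]+)"')


def resolve_chain(url, fetch, patterns):
    """Follow one hop per pattern; return the last captured URL, or None if a hop fails."""
    for pattern in patterns:
        found = pattern.search(fetch(url))
        if found is None:
            return None
        url = found.group(1)
    return url


# Stand-in pages so the sketch runs without a network connection or Kodi.
PAGES = {
    'http://site.example/sub/1':
        '<div class="video-wrap" data-origin-source="http://host.example/embed/1">',
    'http://host.example/embed/1':
        '<video><source src="http://host.example/v/1.mp4"></video>',
}
requested = []


def fake_fetch(url):
    """Hypothetical replacement for make_request(url); records every request."""
    requested.append(url)
    return PAGES.get(url, '')


video_url = resolve_chain('http://site.example/sub/1', fake_fetch,
                          [SUBPAGE_RE, EXTERNAL_RE])
print(video_url)
print(len(requested))
```

In the addon, resolve_url(url) could call something like this with make_request as fetch. Building the list would then need only the single request for the FirstPage, and the two extra requests would be made just for the entry that gets played.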
 