Hello, I need help with this link, please. Thanks.
hxxp://cablegratis.online/canal-4-montecarlo-tv-en-vivo/
http://www1.fastdrama.me/browse/chinese/movies/all/all/all/0
#This regex works
<regex>
<name>getUrlPart</name>
<expres><![CDATA[return.+?getElementById[(]"(.*?)"[)].innerHTML]]></expres>
<page>https://PageImRegexing</page>
<cookieJar></cookieJar>
</regex>
#This regex doesn't work. I'm trying to take the text captured by the regex above, find it elsewhere on the page, and capture the text that comes right after it.
<regex>
<name>finalUrlPart</name>
<expres><![CDATA[span.+?('$doregex[getUrlPart]')>(.*?)</span>]]></expres>
<page>https://PageImRegexing</page>
<cookieJar></cookieJar>
</regex>
<regex>
<name>finalUrlPart</name>
<expres><![CDATA[$doregex[getUrlPart].>([^<]+)]]></expres>
<page>https://PageImRegexing</page>
<cookieJar></cookieJar>
</regex>
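For anyone who wants to test this outside the addon, the chained lookup described above can be sketched in plain Python. The page snippet, the id and the value below are made up for illustration, and the second pattern is only an assumption about how the id appears in the markup. The one real point is `re.escape`, which keeps the captured text from being read as regex syntax when it is dropped into the second pattern.

```python
import re

# Hypothetical page snippet, invented for illustration only.
page_data = '''
<span id=tok_91>secretPart</span>
<script>
function get() { return document.getElementById("tok_91").innerHTML; }
</script>
'''

def chained_lookup(page):
    # Step 1: same expression as getUrlPart, captures the element id.
    first = re.search(r'return.+?getElementById[(]"(.*?)"[)].innerHTML', page)
    if first is None:
        return None
    element_id = first.group(1)
    # Step 2: reuse the captured id (escaped) to grab the text right after it.
    second = re.search(r'id=%s>([^<]+)' % re.escape(element_id), page)
    return second.group(1) if second else None

result = chained_lookup(page_data)
print(result)
```

If step 2 returns nothing on the real page, printing `element_id` first is a quick way to see whether the first capture is what the second pattern expects.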
<item>
<title>Test URL</title>
<link>http:$doregex[MainUrlPart]$doregex[final2ndUrlPart]$doregex[final3rdUrlPart]|Referer=hxxps://www.str**mlive.to/view/46476/ABC-(HD)&User-Agent=Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36</link>
#Here I'm trying to get the main part of the url
<regex> THIS WORKS
<name>getMainUrl</name>
<expres><![CDATA[return.+?"(.*?)["]].join]]></expres>
<page>hxxps://www.str**mlive.to/view/46476/ABC-(HD)</page>
<cookieJar></cookieJar>
</regex>
#Here I'm removing the "," separators and any leftover commas, and replacing \/ with /, to get the real main url part
<regex> THIS WORKS
<name>MainUrlPart</name>
<expres>$pyFunction:('$doregex[getMainUrl]').replace('","','').replace('\/','/').replace(',','')</expres>
<page>hxxps://www.str**mlive.to/view/46476/ABC-(HD)</page>
<cookieJar></cookieJar>
</regex>
#Here I'm trying to get the variable for the 2nd part of the url
<regex> THIS WORKS
<name>find2ndUrlPart</name>
<expres><![CDATA[return.+?join.+?[("][")]\s*[+]\s*(.*?).join]]></expres>
<page>hxxps://www.str**mlive.to/view/46476/ABC-(HD)</page>
<cookieJar></cookieJar>
</regex>
#Here I'm searching for the find2ndUrlPart variable which is just a little further down the page, to get the next part of the url
<regex> THIS DOES NOT WORK
<name>get2ndUrlPart</name>
<expres><![CDATA[innerHTML.+?var\s*$doregex[find2ndUrlPart] =.+?[["](.*?)["];]]]></expres>
<page>hxxps://www.str**mlive.to/view/46476/ABC-(HD)</page>
<cookieJar></cookieJar>
</regex>
#Here I'm replacing the "," to get the real 2nd url part
<regex> THIS SHOULD WORK WHEN THE get2ndUrlPart WORKS
<name>final2ndUrlPart</name>
<expres>$pyFunction:('$doregex[get2ndUrlPart]').replace('","','')</expres>
<page>hxxps://www.str**mlive.to/view/46476/ABC-(HD)</page>
<cookieJar></cookieJar>
</regex>
#Here I'm trying to get the variable for the last part of the url
<regex> THIS WORKS
<name>get3rdUrlPart</name>
<expres><![CDATA[return.+?getElementById[(]"(.*?)"[)].innerHTML]]></expres>
<page>hxxps://www.str**mlive.to/view/46476/ABC-(HD)</page>
<cookieJar></cookieJar>
</regex>
#Here I'm searching for the get3rdUrlPart variable which is at the top of the page, to get the final part of the url
<regex> THIS DOES NOT WORK
<name>final3rdUrlPart</name>
<expres><![CDATA[id=$doregex[getUrlPart]>([^<]+)]]></expres>
<page>hxxps://www.str**mlive.to/view/46476/ABC-(HD)</page>
<cookieJar></cookieJar>
</regex>
<thumbnail></thumbnail>
<fanart></fanart>
</item>
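The clean-up steps from the comments in this item can be checked on their own in plain Python. This is only a sketch: the sample capture is invented, and the JSON variant assumes the captured text is the inside of a JavaScript array of double-quoted strings, which may not hold for every page.

```python
import json

def join_by_replace(captured):
    # Same three replacements as the MainUrlPart $pyFunction above.
    return captured.replace('","', '').replace('\\/', '/').replace(',', '')

def join_by_json(captured):
    # Alternative: treat the capture as the inside of a JS array of
    # double-quoted strings and let the JSON parser undo the \/ escapes.
    return ''.join(json.loads('["' + captured + '"]'))

# Hypothetical capture, invented for illustration only.
sample = r'\/\/exam","ple.inv","alid\/live\/'

print('http:' + join_by_replace(sample))
print('http:' + join_by_json(sample))
```

The replace chain also strips any comma that legitimately belongs to the URL, while the JSON variant only removes the array separators, so the two can differ on other inputs.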
<item>
<title>ABC (HD)</title>
<link>$doregex[getUrl]</link>
<regex>
<name>getUrl</name>
<expres><![CDATA[#$pyFunction
import re
def GetLSProData(page_data,Cookie_Jar,m):
    sid=re.findall('source:\s*([^\(]+)',page_data)[0]
    url,tok1,tok2=re.findall('%s[\w\W]*?return.+?\[(.*?)\].+?\+\s*([^\.]+).+?"(\w[^"]+)'%sid,page_data)[0]
    rtmp=''.join(eval(url)).replace('\\','')
    token=re.findall('var\s*%s.+?\[([^\]]+)'%tok1,page_data)[0];token=''.join(eval(token))
    atoken=re.findall('id=%s>(.*?)<'%tok2,page_data)[0]
    return 'https:%s%s%s|user-agent=ipad&referer=https://www.streamlive.to/view/46476/ABC-(HD)'%(rtmp,token,atoken)
]]></expres>
<page>hxxps://www.st**mlive.to/view/46476/ABC-(HD)</page>
</regex>
</item>
<item>
<title>ABC</title>
<link>http:$doregex[MainUrlPart]c2Vydm$doregex[final2ndUrlPart]dXRlcz0yNDAmc3RybV9sZW49MjMmaXA9MTc2LjEwMy4xMzAuMTMw|Referer=hxxps://www.str**mlive.to/view/46476/ABC-(HD)&User-Agent=Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36</link>
<regex>
<name>getMainUrl</name>
<expres><![CDATA[return.+?"(.*?)["]].join]]></expres>
<page>hxxps://www.str**mlive.to/view/46476/ABC-(HD)</page>
<cookieJar></cookieJar>
</regex>
<regex>
<name>MainUrlPart</name>
<expres>$pyFunction:('$doregex[getMainUrl]').replace('","','').replace('\/','/').replace(',','')</expres>
<page>hxxps://www.str**mlive.to/view/46476/ABC-(HD)</page>
<cookieJar></cookieJar>
</regex>
<regex>
<name>getThe2ndUrlPart</name>
<expres><![CDATA[document.getElementById.*[\w\W\s]*?[[]"c2V","ydm(.*?)["]]]]></expres>
<page>hxxps://www.str**mlive.to/view/46476/ABC-(HD)</page>
<cookieJar></cookieJar>
</regex>
<regex>
<name>final2ndUrlPart</name>
<expres>$pyFunction:('$doregex[getThe2ndUrlPart]').replace('","','')</expres>
<page>hxxps://www.str**mlive.to/view/46476/ABC-(HD)</page>
<cookieJar></cookieJar>
</regex>
<thumbnail></thumbnail>
<fanart></fanart>
</item>
OZtO2nwiRsctAed+MdIGV23YPz9vlEq3xXDinXbBm0CP6AYSk2KBmd8DSbaBu0b7LnpXxiGzh6h//Uk4XG7RD6X6U3AxRCIAqtExc2h6jaEwpKSiTzedv4btIncIqsfj83qAqfZ+cUSq825rlCtFrgTddMgZDZ+tYrUa32vqyzu8HoBaV0eTDlS/AGbxnKkN2n1Gcf5u1dyDA+Rpn0YDv/azfi0wMqzm21Tp/UEsBzs/ScktU/E2yc9ivkhk3b/USa2biO1xjkyMrKjNHAnaCBQBlfyGqyr2078G0xd8F2Q=
https://www.ballsoi8.com/ztvapi/generator.php?channel_id=beinsport1fr
https://cdn227.cloud-streaming.com/ballsoi8_r4/beinsport1fr/playlist.m3u8?wmsAuthSign=c2VydmVyX3RpbWU9MTEvMTEvMjAxOSAyOjAzOjI0IFBNJmhhc2hfdmFsdWU9emY0anJQbDhIdXROc202RWFoZXh6Zz09JnZhbGlkbWludXRlcz0xJmlkPTgxMjkyNg==
https://canlitvizle.com/star-tv-hd-izle-4
<item>
<title>Star TV</title>
<link>$doregex[getUrl]|User-Agent=iPad</link>
<regex>
<name>getUrl</name>
<expres><![CDATA[#$pyFunction
def GetLSProData(page_data,Cookie_Jar,m):
    import re, unwise
    r = re.search("file:\s*(?:''\+)?(?P<var1>[^+]+)\+'(?P<part2>[^']+)'\+(?P<var3>[^+]+)", page_data)
    while 'w,i,s,e' in page_data:
        page_data = unwise.unwise_process(page_data)
    part1 = re.findall("{0}\s*=\s*'([^']+)".format(r.group('var1')), page_data)[0]
    part3 = re.findall("{0}\s*=\s*'([^']+)".format(r.group('var3')), page_data)[0]
    return part1 + r.group('part2') + part3
]]></expres>
<page>$doregex[iframe]</page>
<referer>https://canlitvizle.com/</referer>
</regex>
<regex>
<name>iframe</name>
<expres>iframe.+?src="([^"]+)</expres>
<page>https://canlitvizle.com/star-tv-hd-izle-4</page>
</regex>
</item>
playerInstance.setup({
    file: "https://unlimited5-us.dps.live/nettv/nettv.smil/nettv/livestream1/chunks.m3u8",
    type: "hls",
    width: "100%",
    height: "100%",
    aspectratio: "16:9",
    autostart: true,
    cast:{},
    ga:{}
});
<item>
<title>https://m.surfmusik.de/land/deutschland.html</title>
<link>$doregex[makelist]</link>
<regex>
<name>makelist</name>
<listrepeat><![CDATA[
<title>[makelist.param2]</title>
<link>$doregex[getmp3]</link>
]]></listrepeat>
<expres><![CDATA[<li><a\s*href="(.*?)".+?>([^<]+)]]></expres>
<page>https://m.surfmusik.de/land/deutschland.html</page>
</regex>
<regex>
<name>getmp3</name>
<expres>'file':\s*'([^']+)</expres>
<page>[makelist.param1]</page>
</regex>
</item>
<item>
<title>Television Libre</title>
<link>$doregex[makelist]</link>
<regex>
<name>makelist</name>
<listrepeat><![CDATA[
<title>[makelist.param3]</title>
<link>$doregex[getpage]|User-Agent=iPad&Referer=http://ustvgo.tv/</link>
<thumbnail>[makelist.param2]</thumbnail>
]]></listrepeat>
<expres><![CDATA[(?s)card-wrapper">\s*<a href="(?:\.\.)?(/[^"]+).+?src="([^"]+).+?title="([^"]+)]]></expres>
<page>https://televisionlibre.net/es/</page>
</regex>
<regex>
<name>getpage</name>
<expres>file:\s*['"]([^'""]+)</expres>
<page>https://televisionlibre.net$doregex[embed]</page>
<referer>https://televisionlibre.net/</referer>
</regex>
<regex>
<name>embed</name>
<expres><![CDATA[<iframe.+?src="(?:\.\./\.\.)?(/[^"]+)]]></expres>
<page>https://televisionlibre.net[makelist.param1]</page>
<referer>https://televisionlibre.net/</referer>
</regex>
</item>
<item>
<title>https://zzanime.com/movies/</title>
<link>$doregex[makelist]</link>
<regex>
<name>makelist</name>
<listrepeat><![CDATA[
<title>[makelist.param3]</title>
<link>$doregex[getUrl]</link>
<thumbnail>[makelist.param2]</thumbnail>
]]></listrepeat>
<expres><![CDATA[#$pyFunction
import re,requests
def GetLSProData(page_data,Cookie_Jar,m):
    mdata=re.findall('(?s)"poster".+?href="([^"]+).+?src="([^"]+).+?alt="([^"]+)',re.findall('id="archive-content"(?s)(.*?)pagination',page_data)[0].replace('’','\'').replace('–','-').replace('#038;',''))
    count=10;pn=2;data=[]
    while pn <= int(count):
        page='https://zzanime.com/movies/page/'+str(pn)+'/';source=requests.get(page).content.replace('’','\'').replace('–','-').replace('#038;','')
        data +=re.findall('(?s)"poster".+?href="([^"]+).+?src="([^"]+).+?alt="([^"]+)',re.findall('id="archive-content"(?s)(.*?)pagination',source)[0]);pn +=1
    return mdata+data
]]></expres>
<page>https://zzanime.com/movies/</page>
</regex>
<regex>
<name>getUrl</name>
<expres><![CDATA[#$pyFunction
import re,requests
def GetLSProData(page_data,Cookie_Jar,m):
    streamID=int(re.findall('id=.player-option-1.+?data-post=.(\d+)',page_data)[0])
    source=requests.post('https://zzanime.com/wp-admin/admin-ajax.php',headers={'user-agent':'Mozilla/5.0','referer':'[makelist.param1]','x-requested-with':'XMLHttpRequest'},data={'action':'doo_player_ajax','post':streamID,'nume':'1','type':'movie'}).content
    return re.findall('"file":"([^"]+)',source)[0]+'|user-agent=ipad&referer=[makelist.param1]'
]]></expres>
<page>[makelist.param1]</page>
</regex>
</item>
<item>
<title>https://teveplay.xavitec.net/</title>
<link>$doregex[makelist]</link>
<regex>
<name>makelist</name>
<listrepeat><![CDATA[
<title>[makelist.param3] ([makelist.param4])</title>
<link>$doregex[getUrl]</link>
<thumbnail>[makelist.param1]</thumbnail>
<info>[makelist.param5]</info>
]]></listrepeat>
<expres><![CDATA[#$pyFunction
import re,requests
def GetLSProData(page_data,Cookie_Jar,m):
    mdata=re.findall('(?s)"item movies".+?src="([^\?]+).+?<h3.+?href="(.*?)">([^<]+).+?imdb".+?<span>(\d+).+?"texto">([^<]+)',re.findall('id="archive-content"(?s)(.*?)pagi',page_data)[0].replace('’','\'').replace('–','-').replace('#038;',''))
    count=re.findall('"pagination".+?de\s*(\d+)',page_data)[0];pn=2;data=[]
    while pn <= int(count):
        page='https://teveplay.xavitec.net/movies/page/'+str(pn)+'/';source=requests.get(page).content.replace('’','\'').replace('–','-').replace('#038;','')
        data +=re.findall('(?s)"item movies".+?src="([^\?]+).+?<h3.+?href="(.*?)">([^<]+).+?imdb".+?<span>(\d+).+?"texto">([^<]+)',re.findall('id="archive-content"(?s)(.*?)pagi',source)[0]);pn +=1
    return mdata+data
]]></expres>
<page>https://teveplay.xavitec.net/movies/</page>
</regex>
<regex>
<name>getUrl</name>
<expres><![CDATA[#$pyFunction
import re,requests
def GetLSProData(page_data,Cookie_Jar,m):
    streamID=int(re.findall('id=.player-option-1.+?data-post=.(\d+)',page_data)[0])
    source=requests.post('https://teveplay.xavitec.net/wp-admin/admin-ajax.php',headers={'user-agent':'Mozilla/5.0','referer':'[makelist.param2]','x-requested-with':'XMLHttpRequest'},data={'action':'doo_player_ajax','post':streamID,'nume':'1','type':'movie'}).content
    if 'iframe' in source:
        link=re.findall('<iframe.+?src=[\'"]([^\'"]+)',source)[0]
        source=requests.get(link,headers={'user-agent':'Mozilla/5.0','referer':'[makelist.param2]'}).content
        return re.findall('<video.+?source\s*src="([^"]+)',source)[0]+'|user-agent=ipad&referer='+link
    else:
        return re.findall('<video.+?source\s*src="([^"]+)',source)[0]+'|user-agent=ipad&referer=[makelist.param2]'
]]></expres>
<page>[makelist.param2]</page>
</regex>
</item>