My development add-on had been working great, scraping the site and pulling metadata, until the site started allowing non-ASCII names, and now the metahandler stops at the first one. Here is an example of one of the names: 'Antikörper'.
Just to get my add-on working again, I started filtering with a pattern like '([a-zA-Z \d # $ . : / = * , - " " ! @ & ?]+)', trying to add as many special characters as appear in the movie list. It scrapes the list and pulls metadata, but there has to be a better way of fixing the non-ASCII issue?
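The root cause here is usually that the page bytes are never decoded with the site's charset, so accented characters break the regex stage. A minimal sketch of the idea (the raw bytes and the link pattern below are hypothetical, not from the actual site):

```python
import re

# Hypothetical raw bytes as fetched from the site, encoded in iso-8859-1
# (0xf6 is 'ö' in that charset).
raw = b'<a href="/movie/123">Antik\xf6rper</a>'

# Decode once, at the boundary, using the site's declared charset.
page = raw.decode('iso-8859-1')

# Match against decoded text: with re.UNICODE, \w also matches accented
# letters, so no hand-maintained list of special characters is needed.
titles = re.findall(r'<a href="[^"]+">([\w #$.:/=*,\-"!@&?]+)</a>',
                    page, re.UNICODE)
```

After decoding, `titles` contains 'Antikörper' even though it is non-ASCII, without any per-character filtering.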
--UPDATE--
After spending the weekend working on this and asking questions of the awesome devs here at xbmchub, I was able to come up with a solution that works with my add-on. This is the code I used to set the default encoding to match the website I am scraping:
import sys
reload(sys)  # had to reload sys first, or setdefaultencoding raises an error
sys.setdefaultencoding('iso8859-1')  # iso8859-1 is the encoding of the website I am scraping
The Python documentation states this is not the best way to handle encoding, but it was easy and works for my situation. Any feedback or recommendations are welcome.
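The reason `setdefaultencoding` is discouraged is that it changes implicit conversions process-wide, which can mask bugs elsewhere. The usual recommendation is to decode explicitly at the point where bytes enter the program and encode only where bytes leave it. A minimal sketch (the helper name `fetch_decoded` and the sample bytes are my own, for illustration):

```python
def fetch_decoded(raw_bytes, charset='iso-8859-1'):
    """Decode scraped page bytes into text.

    The charset should ideally come from the HTTP Content-Type header or
    the page's <meta> tag; iso-8859-1 matches the site in this post.
    """
    return raw_bytes.decode(charset)

# 0xf6 is 'ö' in iso-8859-1.
title = fetch_decoded(b'Antik\xf6rper')

# Keep text as unicode internally; encode only at an output boundary,
# e.g. when writing to a file or sending bytes over the network.
utf8_bytes = title.encode('utf-8')
```

This keeps the conversion visible and local, instead of relying on a hidden process-wide default.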