Character encoding in HTTP-Request

Posted: 27 Nov 2019 22:37
by Horschte
Hi guys,

I'm screen-scraping a web page using the HTTP-Request action. The problem is that the page is encoded in ISO-8859-1. The response I get from the action shows some special characters as question marks. I suspect that Automagic uses UTF-8 by default, so the special characters get replaced.
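
The symptom can be reproduced outside Automagic. The following is a minimal Python sketch, not Automagic code, and the sample text is made up for illustration. It shows that bytes produced in ISO-8859-1 turn into replacement characters (often rendered as question marks) when they are decoded as UTF-8:

```python
# Bytes as a server using ISO-8859-1 would send them (sample text is an assumption).
raw = "Grüße aus München".encode("iso-8859-1")

# Wrong charset: byte sequences that are invalid in UTF-8 become U+FFFD,
# the replacement character that many viewers draw as a question mark.
wrong = raw.decode("utf-8", errors="replace")

# Right charset: the original text comes back unchanged.
right = raw.decode("iso-8859-1")

# ascii() escapes non-ASCII characters so the output is readable on any console.
print(ascii(wrong))
print(ascii(right))
```

The raw bytes are the same in both cases; only the charset used for decoding differs.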

How can I set the HTTP-Request to use ISO-8859-1? I already tried the custom header option with "Accept-Charset:iso-8859-1", but it's not working.

Any help?


Thank you very much.

Re: Character encoding in HTTP-Request

Posted: 28 Nov 2019 18:02
by Desmanto
I have never tried this. Do you maybe have another website that uses iso-8859-1 that I can test with?

Also maybe try custom header Accept-Charset: utf-8, iso-8859-1;q=0.5

Re: Character encoding in HTTP-Request

Posted: 07 Dec 2019 16:19
by Horschte
The solution goes like this:

Instead of saving the response from the HTTP-Request to a variable, save it to a file. Then load that file using the action Init Variable Text File and set the Encoding to "iso-8859-1".
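
The same two-step idea, sketched in Python for readers who want to see why it works. This is only an analogy to the Automagic actions, not their implementation. The helper name `save_then_load`, the file name, and the sample text are assumptions:

```python
import tempfile
from pathlib import Path


def save_then_load(raw: bytes, path: Path, encoding: str = "iso-8859-1") -> str:
    """Store the response bytes untouched, then decode them explicitly."""
    path.write_bytes(raw)                      # step 1: no decoding happens here
    return path.read_text(encoding=encoding)   # step 2: decode with the chosen charset


# Stand-in for the body returned by the HTTP request (made-up sample).
raw_response = "Preis: 5 £ für Müller".encode("iso-8859-1")

with tempfile.TemporaryDirectory() as tmp:
    text = save_then_load(raw_response, Path(tmp) / "response.html")
```

Writing to a file sidesteps the problem because the bytes are never interpreted as text until the encoding is chosen explicitly on load.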

Credits for this solution go to Desmanto.

Re: Character encoding in HTTP-Request

Posted: 07 Dec 2019 17:11
by Desmanto
Nice to see the solution works.

@Martin: I wonder whether this might be a bug. What Horschte encountered is that when we make an HTTP request to a page that uses the iso-8859-1 charset, store the result in {response}, and view it in the debug dialog, some characters won't show up properly. It seems the debug dialog is forced to show it in UTF-8.

Saving it to a file keeps the charset correct. And if we init the file back using iso-8859-1, the debug dialog shows it properly this time.

Re: Character encoding in HTTP-Request

Posted: 09 Dec 2019 15:34
by Martin
This might indeed be a bug. The action should respect the encoding, but it likely does not handle it correctly in all circumstances.
Is the URL publicly accessible so I can test it myself?
What device model and Android version are you using?

Thanks & Regards,
Martin

Re: Character encoding in HTTP-Request

Posted: 13 Dec 2019 21:38
by Martin
The server does not indicate the encoding, so Automagic falls back to UTF-8, which is not correct. However, I fear that not all files without a declared encoding are actually ISO-8859-1, so I will add a new configuration option to optionally specify the encoding.

Regards,
Martin
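
The behaviour Martin describes (use the charset the server declares, fall back to UTF-8, allow an optional override) could look roughly like the Python sketch below. It does not reflect Automagic's actual code. The function names `pick_charset` and `decode_body` and the `override` parameter are hypothetical:

```python
from email.message import Message
from typing import Optional


def pick_charset(content_type: Optional[str],
                 override: Optional[str] = None,
                 default: str = "utf-8") -> str:
    """Choose a charset: explicit override, then Content-Type, then the default."""
    if override:
        return override
    if content_type:
        # Let the standard library parse the "charset" parameter of the header.
        msg = Message()
        msg["Content-Type"] = content_type
        declared = msg.get_content_charset()
        if declared:
            return declared
    return default


def decode_body(raw: bytes,
                content_type: Optional[str],
                override: Optional[str] = None) -> str:
    """Decode a response body using the charset chosen by pick_charset."""
    return raw.decode(pick_charset(content_type, override), errors="replace")


body = "Müller".encode("iso-8859-1")
# The server sends no charset, so the caller supplies one explicitly.
text = decode_body(body, "text/html", override="iso-8859-1")
```

Making the override optional keeps the UTF-8 fallback for servers that omit the charset, while letting the user correct cases like the one in this thread.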