Character encoding in HTTP-Request


Moderator: Martin

Horschte
Posts: 56
Joined: 03 Nov 2014 18:00

Character encoding in HTTP-Request

Post by Horschte » 27 Nov 2019 22:37

Hi guys,

I'm screen-scraping a web page using the HTTP-Request action. The problem is that the page is encoded in iso-8859-1. The response I get from the action shows some special characters as question marks. I suspect that Automagic uses UTF-8 by default, so the special characters get replaced.

How can I set the HTTP-Request to use iso-8859-1? I already tried the custom header option using "Accept-Charset:iso-8859-1", but it's not working.

Any help?


Thank you very much.

Desmanto
Posts: 2709
Joined: 21 Jul 2017 17:50

Re: Character encoding in HTTP-Request

Post by Desmanto » 28 Nov 2019 18:02

I have never tried it out. Do you have another website that uses iso-8859-1 that I can test with?

Also, maybe try the custom header Accept-Charset: utf-8, iso-8859-1;q=0.5
Index of Automagic useful thread List of my other useful posts (and others')
Xiaomi Redmi Note 5 (whyred), AOSP Extended v6.7 build 20200310 Official, Android Pie 9.0, Rooted.

Horschte
Posts: 56
Joined: 03 Nov 2014 18:00

Re: Character encoding in HTTP-Request

Post by Horschte » 07 Dec 2019 16:19

The solution goes like this:

Instead of saving the response from the HTTP-Request to a variable, save it to a file. Then load that file using the action Init Variable Text File and set the Encoding to "iso-8859-1".

Credits for this solution go to Desmanto.
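
For readers outside Automagic, the same idea can be sketched in Python. This is only an illustration of why the workaround helps, not Automagic's own code: the sample text and the file name `response.html` are made up, and the raw bytes stand in for what the server sends.

```python
import tempfile
from pathlib import Path

# Hypothetical sample: the bytes an iso-8859-1 encoded page would contain.
raw = "Grüße aus München".encode("iso-8859-1")

# Decoding those bytes as UTF-8 damages the non-ASCII characters.
# errors="replace" substitutes U+FFFD, which many viewers show as "?".
as_utf8 = raw.decode("utf-8", errors="replace")

# Workaround: store the bytes untouched, then read the file back
# while stating the encoding explicitly.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "response.html"  # hypothetical file name
    path.write_bytes(raw)
    as_latin1 = path.read_text(encoding="iso-8859-1")

print(as_utf8)
print(as_latin1)
```

Saving to a file works because the bytes are written as received; the charset is only applied later, when the file is read with the encoding you choose.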

Desmanto
Posts: 2709
Joined: 21 Jul 2017 17:50

Re: Character encoding in HTTP-Request

Post by Desmanto » 07 Dec 2019 17:11

Nice to see the solution works.

@Martin: I wonder whether this might be a bug. What Horschte encountered is that when we make an HTTP request to a page that uses the iso-8859-1 charset, store the result in {response}, and view it in the debug dialog, some characters don't show up properly. It seems the debug dialog is forced to show it in UTF-8.

Saving it to a file keeps the charset correct. If we then init the file back using iso-8859-1, the debug dialog shows it properly.

Martin
Posts: 4468
Joined: 09 Nov 2012 14:23

Re: Character encoding in HTTP-Request

Post by Martin » 09 Dec 2019 15:34

This might indeed be a bug. The action should respect the encoding, but it likely does not do so correctly in all circumstances.
Is the URL publicly accessible so I can test it myself?
What device model and Android version are you using?

Thanks & Regards,
Martin

Martin
Posts: 4468
Joined: 09 Nov 2012 14:23

Re: Character encoding in HTTP-Request

Post by Martin » 13 Dec 2019 21:38

The server does not indicate the encoding, so Automagic falls back to UTF-8, which is not correct. However, I fear that not all files without encoding are actually ISO-8859-1, so I will provide a new setting to optionally specify the encoding.
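
The fallback described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not Automagic's implementation: the function name `pick_encoding`, its parameters, and the choice to let a user-supplied encoding take priority over the server's header are all hypothetical.

```python
from email.message import Message
from typing import Optional


def pick_encoding(content_type: Optional[str],
                  override: Optional[str] = None,
                  fallback: str = "utf-8") -> str:
    """Choose the charset used to decode a response body.

    Order (an assumption for this sketch): explicit override,
    then the charset in the Content-Type header, then the fallback.
    """
    if override:
        return override
    if content_type:
        # Reuse the standard library's header parser to read "charset=...".
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lower-cased, or None if absent
        if charset:
            return charset
    return fallback


print(pick_encoding("text/html; charset=ISO-8859-1"))
print(pick_encoding("text/html"))
print(pick_encoding("text/html", override="iso-8859-1"))
```

The second call shows the case from this thread: the header carries no charset, so the fallback is used unless an encoding is specified by hand.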

Regards,
Martin
