mgran's ars technica

Monday, August 21, 2006

Downloading Binary Streams with Javascript XMLHttpRequest

(or How to do Binary Ajax)

I've been fiddling around trying to download a binary stream using JavaScript's XMLHttpRequest (XHR) object in Mozilla Firefox.

What I wanted was to be able to create a byte array containing the original bytes of the downloaded binary file. This mechanism would be used to implement a ROM loader, without modifying the original binary file (e.g. changing its binary encoding to some form of text at the server), because I wanted to reuse legacy binary files already stored in third-party servers which I couldn't modify.

After looking at developer.mozilla.org, many online docs (e.g. here and here), IANA's docs, checking hundreds of technical blogs, googling around etc., I wasn't able to find any example (or even an implied acknowledgement) that the XMLHttpRequest object could be used to fetch a binary file into a byte array. Instead, the general consensus was that any such attempt is destined to result in a garbled binary stream, and that the 'problematic' binary download should never, ever be tried with XMLHttpRequest.

I decided to set out to find a way to use this object to download any generic binary file. The problem I faced is that the usual JavaScript code for downloading a text file in Firefox would by default use a Unicode-derived encoding, and it would (1) eat up bytes in the binary stream, either because several of them were combined into a single Unicode character or because they were undefined in the character encoding and were simply ignored; (2) unpredictably map some bytes into others (it seems the most affected range in the Unicode charsets is between values 128-160, but in the default charset the affected range is 128-255). The result of using the code below to fetch a binary file is a mess.


//fetches text/plain file synchronously. Do NOT use for binary files!
load_url = function(url) {
    netscape.security.PrivilegeManager.enablePrivilege("UniversalBrowserRead");
    var req = new XMLHttpRequest();
    req.open('GET',url,false);
    req.send(null);
    if (req.status != 200) return '';
    return req.responseText;
}
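
To see concretely why a text decode damages binary data, the sketch below decodes four raw bytes as UTF-8. It is only an illustration: it relies on the standard `TextDecoder` API, which postdates this post and is assumed to be available in the runtime (modern browsers and Node.js), and the sample bytes are made up.

```javascript
// Four raw bytes: 'A', a valid two-byte UTF-8 sequence, and a byte (0xFF)
// that can never appear in valid UTF-8.
var rawBytes = new Uint8Array([0x41, 0xC3, 0xA9, 0xFF]);

// Decode them the way a text download would (UTF-8 is assumed here).
var text = new TextDecoder('utf-8').decode(rawBytes);

// Collect the character codes that JavaScript actually sees.
var codes = [];
for (var i = 0; i < text.length; i++) {
  codes.push(text.charCodeAt(i));
}

// 4 bytes went in but only 3 characters came out: 0xC3 0xA9 were merged into
// one character (U+00E9) and 0xFF was replaced by U+FFFD.
console.log(rawBytes.length, text.length, codes);
```

Once two bytes have been merged or one has been replaced, the original values cannot be recovered from the string, which is why the charset has to be fixed before the response is decoded.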

After looking in the XMLHttpRequest documentation I realized I could override the MIME type (and with it the charset) that the browser assumes for the response. In principle, the only necessary thing would be to force JavaScript to interpret the received bytes with a fixed-width, one-byte-per-character encoding, such as US-ASCII, as can be seen in the code below. UTF-8 or any other variable-length encoding, such as the default one, should be avoided for the reasons above.


//should fetch binary files synchronously. But it DOESN'T work.
load_url = function(url) {
    netscape.security.PrivilegeManager.enablePrivilege("UniversalBrowserRead");
    var req = new XMLHttpRequest();
    req.open('GET',url,false);
    req.overrideMimeType('text/plain; charset=us-ascii');
    req.send(null);
    if (req.status != 200) return '';
    return req.responseText;
}


However, I discovered that even though the resulting string length was now correct, the underlying US-ASCII charset conversion in JavaScript still mapped bytes in the range 128-160 to garbage.
I tried many (I really mean many) other 8-bit-length charsets without success. It seems that all available charsets reserve part of their range for undefined and control characters, in a fierce, purposeful attempt to defeat my initial objective.

Then, after grepping around, I found the file firefox/res/charsetalias.properties, which includes all accepted charsets in Firefox. After trying some extra charsets whose aliases were not available in any other source, I FINALLY found a simple charset that makes JavaScript apply an identity mapping between the received stream bytes and the low 8 bits of the characters returned in the string responseText. The code below is the correct one to download a byte stream in JavaScript:


//fetches BINARY FILES synchronously using XMLHttpRequest
load_url = function(url) {
    netscape.security.PrivilegeManager.enablePrivilege("UniversalBrowserRead");
    var req = new XMLHttpRequest();
    req.open('GET',url,false);
    //XHR binary charset opt by Marcus Granado 2006 [http://mgran.blogspot.com]
    req.overrideMimeType('text/plain; charset=x-user-defined');
    req.send(null);
    if (req.status != 200) return '';
    return req.responseText;
}


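After reviewing some Unicode documents, the explanation seems to be that the charset x-user-defined maps its upper range into the Unicode Private Use Area: bytes 0x80-0xFF become the characters 0xF780-0xF7FF (inside the block 0xF700-0xF7FF), so the low 8 bits of every character code are still the original byte.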
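
The mapping can be simulated in a few lines. This is a sketch, assuming the mapping given in the WHATWG Encoding Standard for x-user-defined (bytes 0x00-0x7F unchanged, bytes 0x80-0xFF shifted to U+F780-U+F7FF); `simulateXUserDefined` is a hypothetical helper written for this illustration, not a browser API.

```javascript
// Simulates how the x-user-defined charset turns bytes into characters.
// Assumption: bytes 0x00-0x7F map to themselves and bytes 0x80-0xFF map to
// U+F780-U+F7FF (i.e. 0xF700 + byte).
function simulateXUserDefined(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    out += String.fromCharCode(b < 0x80 ? b : 0xF780 + (b - 0x80));
  }
  return out;
}

// Build every possible byte value and check that masking each decoded
// character with 0xff gives the original byte back.
var allBytes = [];
for (var v = 0; v < 256; v++) {
  allBytes.push(v);
}
var decoded = simulateXUserDefined(allBytes);
var roundTripOk = true;
for (var k = 0; k < 256; k++) {
  if ((decoded.charCodeAt(k) & 0xff) !== k) {
    roundTripOk = false;
  }
}
console.log(roundTripOk);
```

Because the shift is a multiple of 0x100, a single `& 0xff` is enough to undo it, and the string keeps one character per byte.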

Now you just access the bytes in the received binary stream as you would access characters in the returned string:


var filestream = load_url(url);
var abyte = filestream.charCodeAt(x) & 0xff;


where x is the offset (i.e. position) of the byte in the returned binary file stream. The valid range for x is from 0 up to filestream.length-1.
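
For a ROM loader it is usually handier to convert the whole string once into an array of byte values. The following is a minimal sketch of that step; `toByteArray` and `readUint16LE` are hypothetical helper names, and the sample string is made up to stand in for the value returned by load_url(url).

```javascript
// Converts a string fetched with charset=x-user-defined into an array of
// byte values (0-255) by keeping only the low 8 bits of each character code.
function toByteArray(filestream) {
  var bytes = new Array(filestream.length);
  for (var x = 0; x < filestream.length; x++) {
    bytes[x] = filestream.charCodeAt(x) & 0xff;
  }
  return bytes;
}

// Reads an unsigned 16-bit little-endian value starting at the given offset.
function readUint16LE(bytes, offset) {
  return bytes[offset] | (bytes[offset + 1] << 8);
}

// Made-up sample standing in for the string returned by load_url(url):
// the bytes 0x4D 0x5A 0x90 0xFF as an x-user-defined decode would deliver them.
var sample = String.fromCharCode(0x4D, 0x5A, 0xF790, 0xF7FF);
var romBytes = toByteArray(sample);
console.log(romBytes, readUint16LE(romBytes, 2));
```

Doing the masking once up front keeps the `& 0xff` out of the rest of the parsing code.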

If you use the code above, please leave the comment about the author (me) for recognition of the many hours of labor to find this out. Please bear in mind that even though it worked for me and I'm making this code available in the hope it may help you too, I do not take any responsibility for any problems this code may cause to you or your project!


39 Comments:

At 8/24/2006 7:52 AM , Anonymous Joe said...

In one word: Wow! This opens up a whole lot of new applications, for instance with Greasemonkey scripts.

A few months ago I concluded that this simply was not possible (see Retrieving binary data with XMLHttpRequest), I'm glad I was wrong...

 
At 9/14/2006 9:50 AM , Blogger Ben said...

This sounds like a great way to fix this problem but unfortunately it doesn't seem to work in IE. I'm trying to preload images using this method but the Microsoft.XMLHTTP implementation of XmlHttpRequest unfortunately doesn't include overrideMimeType. I've tried to use req.setRequestHeader('Accept','text/plain; charset=x-user-defined') but this doesn't work either.

Any help would be greatly appreciated :)

 
At 9/14/2006 11:00 AM , Blogger Marcus Granado said...

Hi, Ben. I developed the technique above to work on Mozilla/Firefox, but I didn't test on IE or any other browser (BTW, IE understands x-user-defined charset). It would be interesting if you could sniff the http headers being sent by IE after using setRequestHeader() as you suggested, to see if the headers are really being modified and compare with the output of FF. You might also want to set the headers 'accept-charset' and 'content-type' to the charset/mimetype above and have a try.

Further, IE seems to be able to natively read binary raw bytes by reading the property 'XHR.responseStream', which FF lacks. Have e.g. a look here, here and
here.
Let me know if any of the above works.

 
At 9/15/2006 8:23 PM , Anonymous Anonymous said...

The Statement "netscape.security.PrivilegeManager.enablePrivilege("UniversalBrowserRead");" causes my Mozilla Firefox to abort script execution. If I remove the line, the script wont work. If there is a way around it, it would be great

 
At 9/15/2006 8:30 PM , Anonymous Anonymous said...

P.S.: If I remove the said line, the script will stop at the line with "overrideMimeType" - the most important line in this script.
I am trying to read an image file (a favincon.ico).

 
At 9/16/2006 5:14 AM , Blogger Marcus Granado said...

Hello, Anonymous,
if you could paste your error messages, I might be able to find out what is going on. Try this image fetcher based on my script above and see if it works for you: Thumbnail fetcher

 
At 9/28/2006 1:49 PM , Anonymous Anonymous said...

Now, how would one get the offset (x) if the binary is server generated dynamic data?

 
At 9/28/2006 7:06 PM , Blogger Marcus Granado said...

>Now, how would one get the offset(x)
>if the binary is server generated dynamic data?

by offset x I mean: the x-th byte of the data stream of y bytes generated and returned by the server (0<=x<y) and stored in responseText.

 
At 10/10/2006 5:58 PM , Anonymous Anonymous said...

Here is a solution I came up with that works on IE,FF, and Opera. it uses base 64 encoding to send the binary data to avoid character set encoding problems then is decoded at the client via js. http://h1.ripway.com/BillySurgent/binaryWebFun.zip

 
At 10/12/2006 8:52 PM , Anonymous Anonymous said...

Update: the link above now includes a
24bit bitmap loader and example page demonstrating how to use it!

 
At 11/19/2006 2:41 PM , Blogger Sebastien said...

Guy, you saved me hours and hours of work !!!
This is really excellent

 
At 1/03/2007 4:11 AM , Anonymous k12u said...

I have posted a blog entry about this article in Japanese here . I am going to use this technique for my research project. Thank you.

 
At 1/04/2007 1:59 PM , Blogger Alex said...

Can't say how helpful this was... outstanding!!

 
At 1/06/2007 5:04 PM , Blogger Marcus Granado said...

Hi, k12u, thanks for making it available in japanese! good luck with your research project. alex, sebastian and all the others: it's good to know this hack is being so useful.

 
At 1/10/2007 5:41 PM , Anonymous Anonymous said...

Hi, Marcus.
Great is your work. I appreciate that.
I was planning to make a site
consisted by only pure htmls and javascripts which have embeded-DB functions.
I was searching some solution like yours about binary-accessing with js. Thanks for your effort, I may complete my plan faster then I thought.
Once again, good job, thanks.

 
At 1/31/2007 9:13 AM , Anonymous Anonymous said...

Marcus

I have a Q.
I need to Upload a Document file using Javascript and then stream to the server.

How can i convert the Client side document to Binary format??

Thanks
Vin

 
At 4/12/2007 1:17 AM , Blogger Anthony said...

What about streaming a file to the client and then forcing a save dialog? Any ideas?

 
At 4/19/2007 12:10 PM , Blogger Mark said...

Could you have streamed base64 encoded data to browser where then it is decoded by a javascript decoder?

 
At 4/20/2007 3:30 AM , Blogger Marcus Granado said...

vin: i didn't try uploading binary data, just downloading it. i believe you could fill a buffer/array with your binary data and then use the same charset x-user-defined to send it to the server using post, but this is untested, let me know if you manage to do it.

anthony: if using the correct mime type doesn't work, try injecting an iframe tag pointing to the resource.

mark: yes, you can, but the main idea here is to reuse legacy data, when you do not have access to the server side in order to convert your original data (like a binary format) to something else (like base64), or just to save you the time to do this conversion.

 
At 4/25/2007 11:10 AM , Anonymous Anonymous said...

Cool solution! What I can't figure out is how to display (or handle) the binary data once I have it...

For instance if the data is a PDF, how would I get the browser to display it? Or, if the data was an MP3, how to play it. I can open a new window with content type "text/html" and write to it, but how can I open a new window with "application/pdf" and stream binary data to it?

Any ideas?
Thanks,
lparker@oppenheimerfunds.com

 
At 5/07/2007 5:41 AM , Blogger manish said...

Great work man!! :) You saved lot of work of lot of people..

 
At 5/17/2007 11:14 AM , Blogger Marcus Granado said...

lparker: in firefox, try using a data URL scheme (http://developer.mozilla.org/en/docs/The_data_URL_scheme and http://en.wikipedia.org/wiki/Data:_URI_scheme), and setting the mediatype to application/pdf while opening a new window/frame. let me know if it works.

e.g. for text/html and image/png:

window.open('data:text/html;charset=utf-8,%3C!DOCTYPE%20HTML%20PUBLIC%20%22-'+
'%2F%2FW3C%2F%2FDTD%20HTML%204.0%2F%2FEN%22%3E%0D%0A%3Chtml%20lang%3D%22en'+
'%22%3E%0D%0A%3Chead%3E%3Ctitle%3EEmbedded%20Window%3C%2Ftitle%3E%3C%2Fhea'+
'd%3E%0D%0A%3Cbody%3E%3Ch1%3E42%3C%2Fh1%3E%3C%2Fbody%3E%0D%0A%3C%2Fhtml%3E'+
'%0D%0A','_blank','height=300,width=400');

[img src="data:image/png;base64,
iVBORw0KGgoAAAANSUhEUgAAAAoAAAAKCAYAAACNMs+9AAAABGdBTUEAALGP
C/xhBQAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9YGARc5KB0XV+IA
AAAddEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIFRoZSBHSU1Q72QlbgAAAF1J
REFUGNO9zL0NglAAxPEfdLTs4BZM4DIO4C7OwQg2JoQ9LE1exdlYvBBeZ7jq
ch9//q1uH4TLzw4d6+ErXMMcXuHWxId3KOETnnXXV6MJpcq2MLaI97CER3N0
vr4MkhoXe0rZigAAAABJRU5ErkJggg==" alt="Red dot" /]

 
At 5/28/2007 2:12 AM , Blogger Cerber said...

Thanx a lot for the piece of code, I'll be using it to save images directly inside the img src attribute.
I'm going to use it to save images which tend to disappear from the Web little time after their appearance.

Below you'll find my code, it adds a function to translate the incoming text to a valid binary string, when the main function downloads an image url and returns a (so far) valid data scheme url.
function translateToBinaryString(text){
    var out = '';
    for (var i = 0; i < text.length; i++) {
        //*bugfix* by Marcus Granado 2006 [http://mgran.blogspot.com] adapted by Thomas Belot
        out += String.fromCharCode(text.charCodeAt(i) & 0xff);
    }
    return out;
}
function getImgAsDataScheme(url) {
    try {
        //asking for privilege
        netscape.security.PrivilegeManager.enablePrivilege("UniversalBrowserRead");
        //building the request
        var req = new XMLHttpRequest();
        //setting the URL with a synchronous GET
        req.open('GET',url,false);
        //XHR binary charset opt by Marcus Granado 2006 [http://mgran.blogspot.com]
        req.overrideMimeType('text/plain; charset=x-user-defined');
        //sending the request
        req.send(null);

        //server reported something bad ==> exiting with null
        if (req.status != 200) return null;

        //building the dataScheme given the true content-type
        var dataScheme = 'data:'+req.getResponseHeader('content-type')+';base64,';

        //translating the text to a valid binary stream
        var stream = translateToBinaryString(req.responseText);
        //translating the binary stream to Base64
        dataScheme += window.btoa(stream);
        //returning the result
        return dataScheme;
    } catch(e) {
        alert(e);
    }
}

 
At 6/14/2007 2:14 PM , Anonymous Anonymous said...

Doing the same in IE works fine in VBScript, JavaScript is not able to handle the SafeArray present in responseBody, I also tried toArray of VBArray but didn't work.

If you are ok with using VBScript the following might help.
in my test 'folder.bin' is a BMP file folder.bmp

<html>
<head>
<script language="VBScript">

Sub doOnload()

    Dim xhr
    Set xhr = CreateObject("Microsoft.XMLHTTP")

    xhr.Open "GET", "folder.bin", False

    xhr.setRequestHeader "Accept-Charset", "x-user-defined"
    xhr.setRequestHeader "Content-Type", "application/pdf"
    xhr.send Null

    Dim xx
    xx = xhr.responseBody

    Dim dump
    dump = ""

    Dim i
    i = 0

    Dim x
    Dim c

    For i = 1 To LenB(xx)
        c = Hex( AscB(MidB(xx, i, 1)) )
        if Len(c)=1 then
            c = "0" & c
        end if
        dump = dump & c & " "
        if (i Mod 16) = 0 then
            dump = dump & Chr(10)
        end if
    Next

    document.form1.hexval.value=dump

End Sub

</script>
</head>
<body onLoad="doOnload">
Hello image
<form name="form1">
<textarea name="hexval" rows=20 cols=90>
</textarea>
</form>
</body>
</html>

-- ifti

 
At 6/14/2007 2:18 PM , Anonymous Anonymous said...

also note, even if you don't set the request header for accept-char
and content-type, the responseBody seems to work out fine.
I think the trick was simply to use the MidB and LenB VBScript code.
--ifti

 
At 9/26/2007 4:15 AM , Anonymous Anonymous said...

Hi Marcus,
thanks much for download trick! I'm trying to adopt it for upload, but Firefox's send method seems to set Content-Length with strlen of passed 'string', so it stops on first \0. I can't even prove your 'charset x-user-defined' idea at upload, since I got zero chars very soon at stream(zip file).

Bad luck, I'd have to base64 encode mine uploads, loosing bandwith. FF will be again the worse compared to IE (since IE's send accept binary arrays).

--
Martin

 
At 9/30/2007 5:01 PM , Anonymous Anonymous said...

This is just what I needed, but I can't get the last bit to work.

I have a site that has some large PDFs that take 10 to 30 seconds to download. When displayed inline, the users get impatient, so I wanted to throw up a progress bar. I'm using the above script, in async mode with a callback, to update a progress bar, and it all works fine. I end up with the PDF file in the javascript variable just fine ... but I can't get it into the browser. I tried the data URI Kitchen at http://software.hixie.ch/utilities/cgi/data/data to get the correct format, but I still can't get Adobe to display the PDF data. Adobe will attempt to open when I feed the javascript var with the appropriately coded data: URI "data: data:application/pdf,%25PDF-1.6%0D%25%E2%E3 ..." to the browser, but Adobe hangs, never shows up in the browser, and has to be killed in task manager.

Any suggestions?

 
At 10/03/2007 4:38 PM , Blogger troy a. said...

hi fokes,

this is really neat, and kudos to mgran. however, I'd like to point people to this article:

http://www.rodsdot.com/ee/scriptingRemoteImages.asp

As the author says, trying to send binary data using the XMLHTTP object isn't really what AJAX was meant for:

"You can encapsulate a binary stream inside an XML envelope and encode the binary data (say Base64) then decode it on the client page, but this is not practical for an image file data.

In keeping with our tool metaphors, that would be like driving a nail with a steamroller."

 
At 10/08/2007 6:25 AM , Blogger Pavel Šavara said...

Hi Marcus,
thanks for the article. I summed your ideas and ideas from comments of your readers and implemented Single byte reader for both FF and IE here.

Pavel

 
At 11/08/2007 6:09 AM , Anonymous Anonymous said...

Genial article, my most sincerely congratulations.

The link http://h1.ripway.com/BillySurgent/binaryWebFun.zip isn't available

Anyone could provide a new link to this or explain how it works with Opera?

Thanks in advance

 
At 11/18/2007 8:17 AM , Blogger molchuvka said...

IE-related: I have read around, tried all kinds of tricks, and the only way I found is to transfer via base64 or similar encodings. If anyone will find a better method, please tell.

 
At 2/02/2008 1:09 AM , Anonymous kjeet said...

I want to save an image selected by user to the server database.
and vice versa.
I am sure this will help me a lot.
but i want to do it in IE,

a link sent by anonymous doesnt seem to open,
http://h1.ripway.com/BillySurgent/binaryWebFun.zip

can you please give me some examples
where the same has been done in IE

 
At 4/02/2008 3:04 PM , Anonymous Edoardo Marcora said...

Has anybody figured out how to post binary data using XmlHttpRequest? Downloading binary data works like a charm using this trick... but I can't get uploads to work unless I do the string2binary conversion and a base64 encoding on the client side, which is very heavy. I would like to send the string received as x-user-defined, sending to a server via XHR and do the conversion on the server.

Any progress in this area?!?

Thanx!

 
At 4/05/2008 11:21 PM , Anonymous bianbian.org said...

Excellent! Thank you very much. I translated this technique in Chinese: [译] JavaScript (XMLHttpRequest) 读取二进制数据流. Hope working in IE also.

 
At 4/07/2008 7:30 AM , Blogger nagoon97 said...

I've put together this with an IE solution as a single function at
http://nagoon97.com/reading-binary-files-using-ajax/
:-D

 
At 5/14/2008 8:32 PM , Blogger Martin said...

Dude I am working in computer/information security (don't know how to say in english) and that shit will be really useful for me but I hope not too many hacker will find it :)

Good Job

 
At 6/01/2008 7:59 AM , Anonymous Anonymous said...

Hi! I successfully used part of your solution. The only funny thing is that I don't even have to do any "&"-operation at all.

I was first pointed to your page at

http://codingforums.com/showthread.php?p=695420
(My code-result there.)

I just hope it will work on all jpg-pictures, so I won't have to readd the "&"-operations. =)
So far I downloaded and saved 2 pictures and it worked.

Greetings
DH

 
At 7/11/2008 9:46 AM , Blogger chenjl said...

great trick!
This helps me a lot.

However, I found that this trick only works with readyState==4. Since I am doing XHR streaming and keep fetching data in readyState==3. The binary data I got between 128~159 are still messed up...

But I found it is messed with patterns. So I do lookup table to map it back, and this works!

here is my code:

R3Mapping = [8364,129,8218,402,8222,8230,
             8224,8225,710,8240,352,8249,
             338,141,381,143,144,8216,8217,
             8220,8221,8226,8211,8212,732,
             8482,353,8250,339,157,382,376];
pos = 0;
xmlhttp.onreadystatechange = function() {
    if (this.readyState == 3) {
        if (this.status == 200 && this.responseText.length > 0) {
            var recvdata = "";
            for (var i = pos; i < this.responseText.length; i++) {
                var cb = this.responseText.charCodeAt(i);
                if (cb > 255 || (cb >= 128 && cb < 160)) {
                    // map the garbled character back to its original byte value
                    recvdata += String.fromCharCode(R3Mapping.indexOf(cb) + 128);
                } else {
                    recvdata += String.fromCharCode(cb & 0xff);
                }
            }
            pos = this.responseText.length;
            if (recvdata.length > 0) {
                // do job with recvdata
            }
        }
    } else if (this.readyState == 4) {
        // blah, blah, blah...
    }
}

Now, I can do binary data XHR streaming with Firefox! GREAT!

(I might write this more detail later in my blog.)

 
At 7/25/2008 8:10 AM , Blogger Johan Sundström said...

You are going to extremes for little reason. To send and receive raw octets, it is easiest to declare the encoding, where the eight low-order bits of Unicode are sent as is. The name of that encoding is iso-8859-1 a k a latin-1 in layman's terms. Ideally you set up your server correctly to declare this encoding for the files you receive, so there is no need to override content-type headers from the client end.

Then you also need not do the byte-by-byte binary and with 0xFF, because each byte sent already is in the \u00XX range.

This goes equally for GET and PUT and POST, by the way, so you can send raw data in any direction and need no manual recoding step anywhere.

 
