How to go further with the fileupload

Posted by Matthias 
February 07, 2013 10:08PM
Currently, there is a discussion about what to do with Droopy and the upload stuff.

Droopy
Droopy is currently included via an iframe on the start page. Droopy is an extra Python script launched during startup.

Pro
- the current implementation is relatively stable
- writes the incoming data directly to the target location (called "tmp*")
- has a first implementation of sub-directory upload
- Droopy's UI looks OK
- any file size is possible

Con
- (Python)
- an extra thread is active
- uploads are relatively slow (are they really?)
- needs an extra TCP port
- integration via JS is not really possible (with a different port, it counts as a different origin)
- the iframe works with the hostname, which can't be changed easily
- remote (mesh) upload is not so easy to implement (because of the hostname)
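To illustrate the port problem: browsers treat two URLs as the same origin only when scheme, host and port all match, so a start page on port 80 and Droopy on its own port are different origins for JS. A minimal C sketch of that comparison; the struct, the hostname "piratebox.lan" and port 8080 are hypothetical example values, not PirateBox's actual configuration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical representation of a web origin. Hosts are assumed to be
 * already lower-cased. */
struct origin {
    const char *scheme; /* e.g. "http" */
    const char *host;   /* e.g. "piratebox.lan" (hypothetical) */
    int port;           /* e.g. 80 */
};

/* Same-origin check: scheme, host AND port must all match, so the same
 * host on a different port is a different origin. */
static bool same_origin(const struct origin *a, const struct origin *b)
{
    return strcmp(a->scheme, b->scheme) == 0 &&
           strcmp(a->host, b->host) == 0 &&
           a->port == b->port;
}
```

With `{"http", "piratebox.lan", 80}` for the start page and `{"http", "piratebox.lan", 8080}` for Droopy, `same_origin` returns false, which is why JS on the start page can't simply talk to the Droopy port.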

Lighttpd "complete file" upload
This is a simple solution for uploading files with a normal HTML form. In the background, lighttpd calls a <python/php/perl> script, which receives the data and handles the file operations.
The problem is that lighttpd buffers all incoming POST file data in temporary files until the upload is complete. Only when everything has been transferred is the script called; it then receives the data stream (reading it from disk again) and saves it to the destination.

Pro
- (Mesh) remote upload works
- can be integrated very nicely
- relatively simple and classic solution
- no extra thread

Con
- needs to be programmed
- needs good validation of the upload (do not overwrite, no index.html, etc.)
- any language other than Python has to be installed
- the lighttpd buffering problem wastes time
- recent attempts with files > 100 MB caused hard resets of the router because of heavy system load
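The validation point could look roughly like this: a minimal sketch, assuming uploads land in one flat directory and only plain file names are allowed. The helper names `is_safe_upload_name` and `open_new_upload` and the exact rules are hypothetical; `O_CREAT | O_EXCL` is the standard POSIX way to make `open` fail instead of overwriting an existing file.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Hypothetical policy: only a plain file name, no directories, no hidden
 * files (this also rejects "." and ".."), and never the start page. */
static bool is_safe_upload_name(const char *name)
{
    if (name == NULL || name[0] == '\0' || name[0] == '.')
        return false;                      /* empty, hidden, "." or ".." */
    if (strchr(name, '/') != NULL || strchr(name, '\\') != NULL)
        return false;                      /* no path components */
    if (strcasecmp(name, "index.html") == 0)
        return false;                      /* do not shadow the start page */
    return strlen(name) < 256;             /* typical NAME_MAX is 255 */
}

/* Opens dirfd/name for writing; fails with EEXIST instead of overwriting
 * an existing file. Returns a file descriptor, or -1 with errno set. */
static int open_new_upload(int dirfd, const char *name)
{
    if (!is_safe_upload_name(name)) {
        errno = EINVAL;
        return -1;
    }
    return openat(dirfd, name, O_WRONLY | O_CREAT | O_EXCL, 0644);
}
```

Because the existence check and the creation happen in one `openat` call, two simultaneous uploads with the same name can't both win.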


---
? how to configure the upload destination? Config file?
? upload folder support ?
? upload with password ?

Lighttpd with JS uploader
A complex solution made of JS and backend scripts. The frontend uses JS to send the file in parts, called chunks, with an optimized chunk size.
The tested solution can have a very nice look & feel, but the tested uploader2 was very unstable when the backend was under high load.
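For the chunked approach, the frontend has to cut the file into byte ranges and tell the backend where each piece belongs. A sketch of that arithmetic, assuming a Content-Range style value ("bytes first-last/total", the HTTP range format); the tested uploader2 may well use its own request parameters instead, and `chunk_range` is a hypothetical helper.

```c
#include <stdio.h>

/* Writes the Content-Range value for chunk `index` of a file of `total`
 * bytes split into `chunk_size`-byte pieces (chunk_size must be > 0).
 * Returns the total number of chunks, or -1 if index is out of range. */
static long chunk_range(long long total, long long chunk_size, long index,
                        char *out, size_t outlen)
{
    long count = (long)((total + chunk_size - 1) / chunk_size); /* ceiling division */
    if (index < 0 || index >= count)
        return -1;
    long long first = (long long)index * chunk_size;
    long long last = first + chunk_size - 1;
    if (last > total - 1)
        last = total - 1;                 /* the final chunk may be shorter */
    snprintf(out, outlen, "bytes %lld-%lld/%lld", first, last, total);
    return count;
}
```

With a 1 MiB chunk size, a 5,000,000-byte file becomes 5 requests, the last one covering bytes 4194304-4999999. That is the trade-off mentioned below: smaller chunks mean more requests (each one starting a new backend process), larger chunks mean more data to resend after an interruption.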

Pro
- can look very nice
- cool user experience, because of upload progress and stop & resume
- no extra thread
- (Mesh) remote upload works fine

Con
- very complex solution, and the tested example was unstable
- causes high load because every chunk triggers a new PHP process
- chunk size has to be optimized (trade-off between single-upload and multi-upload performance)
- multiple uploads sometimes cause a lot of load and trouble
- any other language has to be installed
- (if you use /tmp as lighttpd's buffer) a lot of simultaneous uploads can cause reboots because of /tmp usage
- needs a fallback for normal clients
- had problems with uncommon filenames

---
? how to configure the upload destination? Config file?
? upload folder support ?
? upload with password ?
? need limitations to limit load ?

Lighttpd with fcgi
FastCGI keeps the web server's worker processes (e.g. PHP) running, to save resources in high-load situations.
It saves the time of starting a new process for every request.

Is this really so?
Is the more complex solution worth it?
Does it really help? We don't know yet.

Switch webserver
Does this help? It could solve the problem of lighttpd's data buffering for the normal or JS upload solution.
Re: How to go further with the fileupload
January 26, 2017 11:20PM
I know this thread is very old, but I think it's still worth talking about. I might also have some info, but first a screenshot:

What you see here is an upload handled by lighttpd + CGI. The CGI script is written in C and designed to be fast and stupid, so I guess you guys won't need it, but it proves that uploading via lighttpd directly is possible. Not only that: it's possible without excessive disk thrashing and without the CGI tool having to wait for the upload to complete. Just add "server.stream-request-body = 2" to lighttpd's config file (tested with lighttpd 1.4.45 on Debian Jessie, using the .deb from unstable; it might work on older versions, too).

The largest file I uploaded with this so far was 7.2 GB, so handling large files isn't an issue either.

With that said, I really think removing Droopy and using lighttpd + CGI is the way to go (if the devs are still thinking about it).
Re: How to go further with the fileupload
January 31, 2017 06:43PM
I'm thinking that we shouldn't rely on CGI scripts to upload large files.

Uploading to lighttpd is a process in which data arrives as a POST request and gets written in 1 MB chunks to the server.upload-dirs location, which you can set in the config file.

To process it with a CGI script we have to load the whole file on disk and manipulate it.

The alternative is using a streaming request, as V10lator pointed out: just pass everything to a program as a byte stream. But this still has a certain number of issues: file rewriting, interruptions, ...
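One common way to handle interruptions in the streaming variant is to write to a temporary name and only rename once all CONTENT_LENGTH bytes have arrived. A minimal sketch; the ".part" suffix and `receive_to_file` are hypothetical, and it copies the raw stream without any multipart/boundary parsing.

```c
#include <stdio.h>

/* Copies exactly `len` bytes from `in` into "<final_path>.part" and renames
 * it to final_path only if every byte arrived and was written, so an
 * interrupted upload never appears under the real name.
 * Returns 0 on success, -1 on failure. */
static int receive_to_file(FILE *in, long long len, const char *final_path)
{
    char part_path[4096];
    char buffer[64 * 1024];
    int n = snprintf(part_path, sizeof part_path, "%s.part", final_path);
    if (n < 0 || (size_t)n >= sizeof part_path)
        return -1;                              /* path too long */
    FILE *out = fopen(part_path, "wb");
    if (out == NULL)
        return -1;
    while (len > 0) {
        size_t want = len < (long long)sizeof buffer ? (size_t)len : sizeof buffer;
        size_t got = fread(buffer, 1, want, in);
        if (got == 0)
            break;                              /* client went away: EOF or read error */
        if (fwrite(buffer, 1, got, out) != got)
            break;                              /* disk full or write error */
        len -= (long long)got;
    }
    if (fclose(out) != 0 || len != 0) {
        remove(part_path);                      /* discard the incomplete upload */
        return -1;
    }
    return rename(part_path, final_path) == 0 ? 0 : -1;
}
```

Note that on POSIX systems `rename` silently replaces an existing target, so the "do not overwrite" rule still needs a separate check.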

I would happily cast my vote for keeping Droopy for smaller uploads and using an FTP server for bigger ones.
Re: How to go further with the fileupload
January 31, 2017 08:01PM
Hey guys,
thanks for digging up that old topic. Awesome!
To be honest, I'm so swamped with my daily work that I haven't found much energy to work on PirateBox over the last month. :-(

I vote against a C program, because it changes our "script-collection" into device-dependent binaries. :-(
Do you think it is possible to use the stream option with another script language?

Best regards
Matthias
Re: How to go further with the fileupload
January 31, 2017 08:28PM
edoput Wrote:
-------------------------------------------------------
> To process it with a CGI script we have to load
> the whole file on disk and manipulate it.

No. lighttpd will give you a bytestream on stdin no matter what. These on-disk 1MB files are just for lighttpd internally and you should never use them directly. Also re-read what "server.stream-request-body = 2" is good for.


Matthias Wrote:
-------------------------------------------------------
> I vote against a C program, because it changes our
> "script-collection" into device-dependent binaries.
> :-(
> Do you think it is possible to use the stream
> option with another script language?

I'm also against using C programs in PirateBox. Yes, it should be possible to use other languages (anything that can be used for CGI and can read environment variables and stdin should be fine).

Lighttpd passes two important environment variables to the CGI script: REQUEST_METHOD (should be "POST") and CONTENT_TYPE (should start with "multipart/form-data" and also contain the boundary). Then there's a third one: CONTENT_LENGTH, the length of the byte stream waiting on stdin. There are more environment variables, but these are the ones my C tool uses to determine that it's really a file upload. After that, it's code that should be simple to port to any other language:

Language: C
while(len > 0) // len starts as CONTENT_LENGTH
{
    size_t got = fread(buffer, 1, len < BUFSIZE ? len : BUFSIZE, stdin);
    if(got < 1)
    {
        // Error handling here: stdin ended (or failed) before CONTENT_LENGTH bytes arrived
        break;
    }
    len -= got;
    // Do something with the buffer, e.g. find boundary, read header, pipe body to a file, ...
}

Hope this helps. :)
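The environment checks described above could look like this: a minimal sketch that validates the three variables and pulls the boundary out of CONTENT_TYPE. `get_upload_boundary` is a hypothetical helper; it assumes the parameter is spelled `boundary=` in lower case (as browsers send it) and uses the 70-character boundary limit from the multipart spec. In the body, each part then starts with "--" followed by the boundary.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Returns 1 if the CGI environment looks like a multipart file upload,
 * copying the boundary into `out` and the body length into *content_length;
 * returns 0 otherwise. */
static int get_upload_boundary(char *out, size_t outlen, long long *content_length)
{
    const char *method = getenv("REQUEST_METHOD");
    const char *type = getenv("CONTENT_TYPE");
    const char *length = getenv("CONTENT_LENGTH");
    if (method == NULL || type == NULL || length == NULL)
        return 0;
    if (strcmp(method, "POST") != 0)
        return 0;
    if (strncasecmp(type, "multipart/form-data", 19) != 0)
        return 0;
    const char *b = strstr(type, "boundary=");
    if (b == NULL)
        return 0;
    b += 9;                                   /* skip "boundary=" */
    size_t n = strcspn(b, ";");               /* stop at the next parameter, if any */
    if (n >= 2 && b[0] == '"' && b[n - 1] == '"') {
        b++;                                  /* strip optional quotes */
        n -= 2;
    }
    if (n == 0 || n > 70 || n >= outlen)
        return 0;
    memcpy(out, b, n);
    out[n] = '\0';
    char *end;
    *content_length = strtoll(length, &end, 10);
    return *end == '\0' && *content_length > 0;
}
```

A CGI `main` would call this first and only then start the read loop on stdin with `len` set to the returned length.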
Re: How to go further with the fileupload
January 31, 2017 08:38PM
Thank you, that's very cool info.
I coded a Perl uploader when I did the first prototype of the lighttpd-based PirateBox... that was ages ago.

I'll add this as a GitHub issue so it doesn't get buried in the forum.
Re: How to go further with the fileupload
February 01, 2017 12:05PM
V10lator Wrote:
-------------------------------------------------------
> edoput Wrote:
> -------------------------------------------------------
> > To process it with a CGI script we have to load
> > the whole file on disk and manipulate it.
>
> No. lighttpd will give you a bytestream on stdin
> no matter what. These on-disk 1MB files are just
> for lighttpd internally and you should never use
> them directly. Also re-read what
> "server.stream-request-body = 2" is good for.

Server stream requests in v1.5

The default is to write the upload to a buffer and then store it somewhere; the other options are the streaming request and a hybrid of the two.

I can't find the documentation for v0.9, the one on my PirateBox, but maybe it already had these options back then.

Again, I don't feel comfortable leaving everything to HTTP. I understand that we have Content-Range multipart uploads, but that requires too many things compared to an FTP server and a dead-simple browser add-on that we can store on the PirateBox.