[quote]You don't know how the partial block at the end will be filled [...][/quote]
Oh, I see the problem now. Thanks for your explanation. The last block of the file might contain information from the next file, and since that would be included in the block hash, it wouldn't be useful for detecting duplicates.
My thought process was to split the file into blocks and hash the partial last block, but I didn't consider the interaction with torrents containing multiple files.
Sorry to be a bother, but if I understand correctly, this would also prevent detecting duplicate files that aren't aligned to block boundaries - is that the case?
It would also be nice to know the rich text formatting used on this feedback page. Figured it out, it's pseudo-HTML.
Another minor question: the documentation mentions "CRC32" in several places; which CRC polynomial is actually used here? The Ethernet one?
As opposed to including the partial block at the end? If so, I don't see how you could make it work.
Keep in mind that the point is to provide an indicator to detect duplicates. Arguably it's not 100% accurate, but it should be close. You don't know how the partial block at the end will be filled: if it's the last block of the torrent, it isn't padded; if it's not the last block, it's filled with the contents of the next file. There's no way to compute the hash for the latter case, since you can't anticipate which file will be concatenated onto the end. The former case is computable, but would only be useful for matching against torrents where that file is at the end and the file's offset is block-aligned - basically it'd only be useful for matching against single-file torrents, and useless for matching batches.
[quote]The hash isn't intended to detect errors, but if you used it for such, then yes.[/quote] Ah, of course, my bad. Ditto for duplicate detection though: files with different trailing data will produce the same hash. Is there a reason it's done this way? In any case, it's not like it can be changed at this point, and even if it could, the benefit would be small, if any. Thanks for the quick response.
> torpc_sha1_*: hex encoded SHA1 hash of concatenated SHA1 hashes (binary encoded) of the respective block size. For example, the torpc_sha1_16k hash is obtained by breaking the file into 16KB blocks (if the last block is less than 16KB, it is discarded), calculating a 20 byte SHA1 hash for each block, concatenating these hashes, and feeding the concatenation through SHA1 to obtain the final hash. The selected block sizes correspond to the most common piece sizes used for torrents, and hence this hash can be useful in trying to detect duplicate torrents which have different info hash values.
This means that errors in the last block will not be detected, right?
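For concreteness, here's a small Python sketch of the scheme as quoted above (the function name and signature are mine, not from the site's code):

```python
import hashlib

def torpc_sha1(data: bytes, block_size: int = 16 * 1024) -> str:
    """Sketch of the described scheme: split the file into fixed-size
    blocks, discard a partial last block, SHA1 each full block,
    concatenate the binary digests, then SHA1 the concatenation."""
    full = len(data) - len(data) % block_size  # bytes covered by full blocks
    digests = b"".join(
        hashlib.sha1(data[off:off + block_size]).digest()
        for off in range(0, full, block_size)
    )
    return hashlib.sha1(digests).hexdigest()
```

Because the partial last block is dropped, two files that differ only after the last full-block boundary hash identically - which is exactly the caveat being discussed here.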
> Mediainfo and related data are currently not included, mainly due to the size and the time it takes to dump the data. The data is also compressed using a custom LZMA2-based scheme, which users would need to implement a decompressor for. I may consider including this data if many are interested in such.
This has made me curious about that scheme. Would you mind sharing it or some code implementing it?
Thanks for the awesome site, cute design, and making these database dumps available :) Very cool!
No, accessing a subdomain of an onion address (storage.xxxxx.onion) has no special meaning in Tor; it will always reach the onion service identified by the xxxxx public key. The HTTP client/web browser will however send `Host: storage.xxxxx.onion`, so it's useful for virtual hosting on the same webserver/onion identity. Since the main site and storage server are physically separate, you should instead run two onion services, one on each server, which will have different identities (xxxxx.onion for the server behind animetosho.org, yyyyy.onion for the server behind storage.animetosho.org).
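For reference, a minimal torrc sketch of that two-service setup (the directory paths are illustrative, not the site's actual config):

```
# torrc on the server behind animetosho.org (publishes xxxxx.onion)
HiddenServiceDir /var/lib/tor/main_onion/
HiddenServicePort 80 127.0.0.1:80

# torrc on the server behind storage.animetosho.org (publishes yyyyy.onion)
HiddenServiceDir /var/lib/tor/storage_onion/
HiddenServicePort 80 127.0.0.1:80
```

Each server generates its own key pair under its HiddenServiceDir, which is why the two services end up with different .onion identities.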
Thanks for the suggestion. This site doesn't actually use load balancing; subdomains direct requests to the secondary server (e.g. storage.animetosho.org). Do you know if subdomains can be set up to direct requests to different servers (without a load balancer)?
Hello! Is there an "Animetosho for live action" site? (There is an anglo-american TV series said to be very similar to Bokuyaba / Dangers in My Heart in its characters and premise, so I want to check it out.)
I had a quick look at it and it initially appears straightforward. A complication is that this site runs across two servers, so I'm not sure how easy it is to configure it that way. I'll put it on the nice-to-have list for now.
Yes, but the uploader will still be you - even if you automated the process of retrieving the files from the torrent, uploading them, and then linking the NZB.
Ah damn, you're right, I can just use the GZip'd files in my downloader. Thanks for the tip! I meant: are the contents of the NZBs uploaded by you, or by someone else?
Thanks. Most clients should accept a GZip'd NZB directly, so it shouldn't really matter whether it's archived or not. You can just remove the '.gz' at the end of the URL to get a version that isn't GZip'd. I don't get your 'hotlinked' question - the NZB is generated here.
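For anyone scripting this: if your tooling can't take the GZip'd file directly, unpacking it is one call in Python (the helper name here is mine, just for illustration):

```python
import gzip

def unpack_nzb(gz_bytes: bytes) -> str:
    """Decompress a downloaded .nzb.gz into the plain NZB XML text."""
    return gzip.decompress(gz_bytes).decode("utf-8")
```

So even the archived links are usable without hitting the non-GZip'd URL.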
Love this website. Would love to have the NZBs not archived, but I understand the attempt to save space. Are the NZBs uploaded by the website, or are they 'hotlinked' too?
Thanks for the question. The size limit is mostly there to minimise bandwidth wastage in the case of failures (often, you can only tell if the upload succeeded or not after all the data has been sent). It's sometimes also helpful for downloaders, as not all hosts play nicely with resume. For Gofile, it's set at 2GB, so the vast majority of files won't be split.
I can look at increasing the threshold if there's interest in such.
20/03/2024 13:07 — Anonymous