Comment in Feedback 05/08/2020 23:41 — Anonymous: "LastExile"
Found something that's part of Fate/Grand Order out of place and thought you might want to give it the home you made for it. --> Manga de Wakaru! Fate/Grand Order https://animetosho.org/series/manga-de...rder.14579
Pre-2016, mplayer was used to render the video and save out PNG images.
Now it works by extracting frames and rendering them as separate steps. Frame extraction is done using ffmpeg, subtitle rendering via VapourSynth, and image rendering with PyAV (a libav* wrapper). There's a more detailed write-up on how it works here.
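Just to illustrate the last step, the frame-dumping part on its own is only a few lines of PyAV. This is a rough sketch rather than the actual code - the file names are placeholders, and to_image() needs Pillow installed:

    import av  # PyAV, the libav* wrapper mentioned above

    # Sketch: decode a video and dump every frame as a PNG.
    # "episode.mkv" and the output pattern are placeholders, not real paths.
    container = av.open("episode.mkv")
    stream = container.streams.video[0]

    for i, frame in enumerate(container.decode(stream)):
        # to_image() converts the decoded frame to a PIL image (requires Pillow)
        frame.to_image().save(f"frame_{i:06d}.png")

    container.close()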
The main issue with rolling CRC is that it's relatively slow (and problematic if it finds too many false matches), so it's often restricted. By default, par2cmdline will only do rolling CRC checks for 64 bytes, so it only really works for small movements. Since this is all very custom though, you could just increase the limit, at the expense of processing speed.
Selecting a compression block size could be difficult. In general, you want it to be large to maximise efficiency and to minimise how often a compression block straddles PAR2 block boundaries, but not so large that small changes require lots of recovery data.
The data is in TSV format. Padding is rather unusual there, but I suppose not impossible. I don't really get the aim of it though, since compression would eliminate any padding you add to uncompressed data.
> output something that can be processed with standard tools that someone else maintains because they're useful elsewhere.
Do you have an example of such a standard tool which can handle the scheme you describe?
> the compressed blocks won't be 4KB - their sizes will vary (hence they won't align to some PAR2 block boundary)
So long as the changes to the output are more than a block-length apart, par2 will find a block anyway, because it uses a rolling CRC to look at one-byte intervals for candidate blocks (which it then tests against a proper hash that's not CRC32). So long as you're resetting the Huffman table at a deterministic place (for example the nth-id'd INSERT statement), it doesn't matter to PAR2 that this new block isn't located at a block-length-multiple offset, just that there's at least a block's worth of unchanged compressed output. What would trip up PAR2 is changes happening across the file at a (compressed) distance less than a block length.
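To illustrate the rolling-scan idea, here's a toy sketch (not PAR2's actual code - real PAR2 uses a windowed CRC32 as the weak check and MD5 as the strong one, and the function and parameter names below are made up):

    import hashlib

    def find_block(data, block_md5, weak_target, block_len):
        # Toy stand-in for PAR2's scan: a simple byte-sum plays the role of the
        # rolling CRC, just to show the slide-by-one-byte search + strong-hash check.
        if len(data) < block_len:
            return None
        weak = sum(data[:block_len]) & 0xffffffff
        for pos in range(len(data) - block_len + 1):
            if weak == weak_target:
                # cheap match - confirm with the strong hash before trusting it
                if hashlib.md5(data[pos:pos + block_len]).digest() == block_md5:
                    return pos
            if pos + block_len < len(data):
                # slide one byte: drop the byte leaving the window, add the one entering
                weak = (weak - data[pos] + data[pos + block_len]) & 0xffffffff
        return None

As long as at least one block's worth of bytes survives unchanged somewhere in the file, a scan like this finds it even if it has shifted, which is why the block doesn't need to sit at a block-length-multiple offset.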
> Also, later changes can affect the output of earlier bytes in the block
This is true; it's not a design goal that changes inside a block are confined to a subset of the compressed block, merely that two compressed blocks are independent of each other.
Admittedly I've not looked at a dump (because they're huge), but if they're standard [My]SQL dumps, then you could sort the INSERT statements by their primary key (if they're not already), then pad every nth id with a comment to align it to an mKB boundary. The idea would be to constrain all the bespoke code to the server side, and output something that can be processed with standard tools that someone else maintains because they're useful elsewhere.
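A rough sketch of what that server-side padding pass could look like (the boundary, the every-nth interval and the names are all made up for illustration; it assumes one INSERT per line and a single-byte encoding):

    BOUNDARY = 64 * 1024   # the "mKB" alignment target - arbitrary value for the sketch
    EVERY_N = 1000         # pad after every nth INSERT statement

    def pad_dump(lines):
        # lines: INSERT statements (one per line, already sorted by primary key).
        # Yields the same statements, inserting a throwaway "-- " SQL comment so
        # that every EVERY_N-th statement starts on a BOUNDARY-aligned offset.
        offset = 0
        for i, line in enumerate(lines):
            if i and i % EVERY_N == 0:
                short = (-offset) % BOUNDARY
                if short >= 5:
                    pad = "-- " + "x" * (short - 4) + "\n"   # exactly `short` bytes
                    yield pad
                    offset += short
                # a shortfall under 5 bytes is simply left unaligned in this sketch
            yield line
            offset += len(line)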
Ahh, I see - if you're looking to make some end-user application, database dumps aren't the right solution. There's currently no database index on this sort of information, but if you can write up exactly what you want, I can help look into it for you.
Thanks for considering an API. Just looking for a way to search for subtitles, an API that lists torrent entries that have subtitles for a given AniDB ID would be nice. If you're ok with hosting an API for that, I could make it.
If you happen to see this: I'm not sure what is exactly required, but in terms of numerical categories, 5070 is the anime category (and the only one that gets served here).
Comment in Feedback 12/07/2020 21:29 * — Anonymous: "Jacklyn Leboeuf"
Completely new to the game and I'm wondering how I would go about adding anime tosho to my Sonarr as an indexer. I have no idea what I should enter in the "categories" field, and all my searches with anime tosho on Sonarr return no results (even though the episodes exist on this site). Could anybody tell me which numbers to add to "categories" and to "Anime categories"?
Thanks for the explanation. That sounds like typical schemes where compression is broken into blocks (or where zlib full flushes are issued periodically). I think your understanding may be a little incorrect though. If you break the input into 4KB chunks and compress them separately, the compressed blocks won't be 4KB - their sizes will vary (hence they won't align to some PAR2 block boundary). Also, later changes can affect the output of earlier bytes in the block - a change in byte 2203 can affect the entropy coding used for the first bytes of that block. Splitting like this also degrades compression efficiency.
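A quick way to see the size-variation point with plain zlib (just a sketch; the filename is a placeholder):

    import zlib

    data = open("dump.sql", "rb").read()   # placeholder filename
    CHUNK = 4096

    sizes = []
    for i in range(0, len(data), CHUNK):
        # one independent compressor per 4KB chunk, as in the scheme described above
        c = zlib.compressobj()
        out = c.compress(data[i:i + CHUNK]) + c.flush()
        sizes.append(len(out))

    # the compressed chunks all come out different lengths, so their boundaries
    # drift away from any fixed PAR2 block grid
    print(sizes[:10])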
I probably should also mention this: it doesn't sound like there's any standard tool for doing this either, which means that someone would have to write it. And even then, it's going to be a fairly custom setup that few are going to adopt. Engineering a solution may be interesting, but I don't really want to over-complicate the export process (can make things more likely to fail, takes effort etc). To be brutally honest, I don't actually see the size as a big issue. Currently the total size of all dumps is 250MB, which is smaller than most video files on offer here. Even if you downloaded them every day, that's only 7.5GB/month. Now I understand that some people have more restricted internet and the like, but then I question what use you would have in getting full dumps every day. However, I'm happy to be corrected here. May I ask if you're trying to achieve something in particular with these dumps? Maybe there's another way.
If you're willing to develop something custom, I might suggest just scraping feeds or the like for data. Alternatively, if you're willing to develop some API which can query a MySQL database for the data you want, I may consider hosting it.
Thanks again for the suggestions, and hope that this is of some value.
How about incremental snapshots? Like 1 full dump, followed by one snapshot for each day, limited to x days. To keep in sync, check whether the local version is older than x days: if it is, get the full dump; if not, get the snapshots since the last local sync.
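Something like this on the client side (just a sketch of the decision logic; the x-days value and the names are made up):

    from datetime import date, timedelta

    MAX_AGE_DAYS = 14   # the "x days" of snapshots kept on the server (made up)

    def plan_sync(last_sync, today):
        # Fetch a fresh full dump if the local copy fell too far behind,
        # otherwise just the daily snapshots that are missing.
        if today - last_sync > timedelta(days=MAX_AGE_DAYS):
            return ["full-dump"]
        days = (today - last_sync).days
        return [f"snapshot-{last_sync + timedelta(days=d + 1)}" for d in range(days)]

    print(plan_sync(date(2020, 7, 1), date(2020, 7, 5)))
    # ['snapshot-2020-07-02', 'snapshot-2020-07-03', 'snapshot-2020-07-04', 'snapshot-2020-07-05']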
So in principle, if the format is laid out chronologically and one file per table, the changes will all be at the end, and the start of each large file should always be the same, and compress the same.
"rsync-compatible" is a mode where it resets the dictionary every so often, so there's a finite limit to how far changes propagate, for example if the interval is 4096, and byte 2203 changes, this changes the compressed stream for bytes 2204-4095, but byte 4096 will be the same.
If you just want stuff without subtitles, you can download the soft-subbed files here and remove the subtitles. Technically, many of the webrips here are "raw" in the sense that they're straight rips from the source. If you're looking for BD/DVD rips, your choices might be limited, due to them being less popular and expensive to deal with (large file sizes). I recall jpddl.com used to offer raws, but don't know how usable they are now.
Thanks for the suggestion. I'm not sure what you mean by "rsync-compatible compression" (rsync just uses zlib for compression), but PAR2 only really works well for corruption type situations where the data isn't shifting around much. I suppose dumps often do contain the same data across days, so it could work if the PAR2 was generated at 100% redundancy, but it is quite involved (i.e. a few manual steps required). The other problem is that you can't use compression at all.
If the archive is done with rsync-compatible compression, then generating a PAR2 set would do. Download the .par2, feed in the old dump, and out comes how many blocks you need to download to get the current dump. I've not downloaded or looked at the underlying data, but if it's a series of table dumps, I think each table would need to be in its own file or padded to PAR2 block-size alignment.
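Sketched out with par2cmdline, the workflow would be something along these lines (the block-size and redundancy values are arbitrary and the file names are placeholders):

    import subprocess

    # Server side: create a 100%-redundancy PAR2 set for the new dump.
    subprocess.run(["par2", "create", "-s65536", "-r100",
                    "newdump.sql.par2", "newdump.sql"], check=True)

    # Client side: put the *old* dump where the new one is expected and verify;
    # par2 reports how many blocks are already present, i.e. how much recovery
    # data would actually need to be downloaded to reconstruct the new dump.
    subprocess.run(["par2", "verify", "newdump.sql.par2", "newdump.sql"])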
Firstly, it's nice to know that they're actually being used. I'm not too sure how to do this nicely. rsync is problematic for public distribution. Patches might be possible, if performance is reasonable on dump files, but it'd only work if you always have the latest files. A 'workaround' solution may be just to download them less frequently, or at least for the 'files' table.
I haven't ever accepted any style submissions here, but it's not out of the question. If you're familiar with CSS, you can edit the style code and apply it using a plugin like Stylist or Stylish. There are two custom styles posted here, and if you create one yourself, you could post it there too.
You can create your own style just for your machine using your browser's tools. No AT involvement is required. AT styles already cover most of the spectrum for most people.