> does the search engine/DB only return the subset/limit?
Yes, a LIMIT clause is being used.
> Is it more expensive to filter the limit at the search engine/DB level?
No, it's generally best to do the filtering on the DB side.
> For example, if I query for a series that has 150 items and the limit is 50, then the script calls for the search engine to find all 150 items.
The problem is if there happens to be 100,000 items - trying to pull these off disk into memory isn't going to be performant. With a limit set to 50, you only need to read 50 items off disk instead of 100k.
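The point about pushing the limit down to the database can be sketched with a toy example. This is a minimal sketch using SQLite with an illustrative table name, not the site's actual schema or backend:

```python
import sqlite3

# In-memory demo database standing in for the real search backend;
# the table and column names here are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, series TEXT)")
conn.executemany(
    "INSERT INTO items (series) VALUES (?)",
    [("example-series",)] * 150,
)

# With LIMIT/OFFSET, the database only materialises one page of rows
# for the client, instead of shipping all 150 matches back.
page = conn.execute(
    "SELECT id FROM items WHERE series = ? LIMIT 50 OFFSET 0",
    ("example-series",),
).fetchall()
print(len(page))  # 50 rows returned, not 150
```

The same `LIMIT`/`OFFSET` pattern applies whether the backend is MySQL, SQLite, or a full text engine speaking an SQL dialect.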
On further look, I think you can use the sphinx_total_found meta attribute.
Thanks for finding that. I'm not sure why my searches didn't find the SHOW META query. This looks feasible, but will only work if a search is being performed (which is probably all that these apps care about anyway).
I've added it to the API, so hopefully it starts showing up soon (cached result sets may take a while to clear). Thanks for the tip!
I have minimal experience with full text search engines and decent experience with SQL databases. When the API performs a search query on behalf of a user, does the search engine/DB only return the subset/limit? Is it more expensive to filter the limit at the search engine/DB level? To determine which items to return, the search engine/DB has to query all matching records anyway before limiting.
For example, if I query for a series that has 150 items and the limit is 50, then the script calls for the search engine to find all 150 items. Once the items are known, count them and use the offset/limit to determine which items to return for pagination. I suppose this can be done in the script or in SQL.
Thanks for pointing that out. Unfortunately, I'm rather hesitant to implement a total (which is why it isn't there). The problem is, to get this figure, the script needs to issue a SELECT COUNT query with the same set of filters used in the search. To put it another way, every time someone makes an API call, the script would effectively need to perform two searches: one to grab the results, and a second to grab the total. In terms of resource usage, it's likely worse than 2x server load, because the results query only needs to find 75 records, whereas the count query needs to find all of them (though it may be possible to put a cap on this).
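The "two queries per API call" cost described above can be sketched like this. A minimal SQLite sketch with a made-up table; the page size of 75 is taken from the comment, everything else is illustrative:

```python
import sqlite3

# Toy table standing in for the real release index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE releases (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO releases (title) VALUES (?)", [("ep",)] * 300)

where = "title = ?"   # same filter must be used by both queries
params = ("ep",)

# Query 1: fetch one page of results; can stop after 75 rows.
rows = conn.execute(
    f"SELECT id FROM releases WHERE {where} LIMIT 75", params
).fetchall()

# Query 2: count every match; must visit all 300 matching rows.
total = conn.execute(
    f"SELECT COUNT(*) FROM releases WHERE {where}", params
).fetchone()[0]

print(len(rows), total)  # 75 300
```

The asymmetry is the point: the first query's work is bounded by the page size, while the second scales with the full result set.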
It may be possible to selectively include a total, but it likely won't help your cause. If you or anyone knows of an efficient way to get counts in Sphinx Search and MySQL, I'm willing to investigate it (I tried searching, but couldn't find anything). Otherwise, it's a maybe at best. The API gets >1M hits per day, and adding a total is a fair chunk of additional load for a relatively inconsequential (at least in my opinion) thing, but I'm not completely against the idea, as it is probably more in line with other implementations.
Note that results on this website don't display totals; navigation is done via Next/Previous links for this very reason.
Aside rant: I really don't like how Newznab's API has effectively become a standard. It's rather unfriendly to third party implementations (which is fair, since I doubt that was an aim of their design). I can kinda see why it's become the case, but it would be beneficial if a standard that isn't so tied to one implementation were widely adopted. If the API supported AniDB, for example, you wouldn't be as reliant on full text indexing just to find entries for a particular series.
Are you referring to the Nyaa.si repository takedown? If someone decides to send takedowns to my repositories, then there's not really anything that can be done about it. They aren't necessary for this site to function though, so I'm not seeing much concern.
Glad you found the info useful and good luck on your project when you get back to it!
NZBHydra2 is a popular Usenet meta search aggregator used in the media server community and the dev is requesting that this be implemented so that it can support AnimeTosho as an indexer better.
Hey Admin, given the attack on nyuu's code I'm wondering if you have a plan if they come for you?
Also, thanks so much for your recent logistics page. Last summer I asked about doing a "jav tosho", which I made some small progress on. Mainly I learned I don't have the money to do what I want to do right now, and your page confirmed my cost estimates. Still hoping for a better paying job so I can make it happen one day.
Congrats. You're running 7 years old software with (at least one) 5 year old severe security vulnerability. Remote code execution made easy. https://www.cvedetails.com/cve/CVE-2015-5474/
Thanks for letting me know. May I ask which version of uTorrent you're using, and your OS? I disabled TLS 1.0 and TLS 1.1 recently (so TLS 1.2 is the minimum required) - a number of sites have already done this. I don't mind re-enabling it, if that's the issue, though would like to know what's the baseline to support.
Update: from a search, it seems that 2.2.1 may only support TLS 1.0, so I've re-enabled TLS 1.0 and 1.1. It may be a good idea to look at adopting a newer client though, as I expect TLS 1.2 to become the new minimum across the internet.
From your speedtest, I'm guessing you're in Russia, where I think they go to some effort with enforcing their blocking. According to this tester page, there's no IPv6 blocking, but it appears various IPv4 addresses are blocked, including those from providers like CloudFlare (which the meow mirror uses and might get fully blocked). So it sounds like your theory is valid.
Using a mirror is likely best at the moment.
If your VPN doesn't provide some split tunneling setup, and you're willing to mess with routing, you could always have your VPN active but set the preference on your main NIC higher (i.e. VPN active but not used by default). Then, the script could just bind to your VPN NIC, so only that uses the VPN. Alternatively, if you don't want to mess with routing, you could set up a VM/container and run the VPN+script in there.
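The "bind the script to the VPN NIC" idea above can be sketched in a few lines. This is a minimal sketch: the address used here is a loopback stand-in, and you would substitute the VPN interface's actual local address (e.g. the `tun0` address), which is an assumption about your setup:

```python
import socket

# Stand-in for the VPN interface's local address; replace with the
# real tun/tap address on your system.
vpn_local_ip = "127.0.0.1"

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Binding before connect() pins the source address, so the OS routes
# this socket's traffic out via that interface. Port 0 = pick any port.
s.bind((vpn_local_ip, 0))
# s.connect(("example.com", 443))  # would now leave via the VPN NIC

bound_ip = s.getsockname()[0]
print(bound_ip)
s.close()
```

Note this only steers traffic if the routing table allows sending from that source address, which is why the interface-preference tweak described above matters.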
Comment in Feedback 18/01/2021 20:55 — Anonymous: "Sad panda"
RSS not working on uTorrent since this morning (European time) [2021-01-18 06:38:17] RSS: Unable to download: Could not establish secure connection. Error -2146893054
Please fix
Comment in Feedback 18/01/2021 15:25 * — Anonymous: "Kennith from GetPeople.io"
> https://meowinjapanese.cf/
Hm, didn't see that one before; this one actually works. But some others don't: I tried a few before, and they got blocked later too.
Some time ago I tried another tool, GoodbyeDPI, but it seems my ISP is checking logs or improving its protection: over time I had to switch from the least protective method (the -4 flag) to the most protective (-1), and then it stopped working completely. It's getting wilder over time, but I don't want to use a VPN. Partly because I have a static, white IP and host some stuff on my PC on that IP, and partly because I now have really good internet quality and speed (optic fiber straight into my router, with an SFP module at the end: https://www.speedtest.net/result/10536540645) and don't want to downgrade to a slower VPN. A VPN also can't be limited to specific apps and would apply globally, which I don't want, and ordering one of the same quality would cost a lot.
To sum it up, I have a really good ISP, but it just does what the government orders, and there's not much I can do about it.
Oh, and one more thing: I also have an IPv6 address, and I've configured several things on my end, both in the router and in the browser: I use DNS-over-HTTPS/TLS (Cloudflare's 1.1.1.1), and IPv6 is prioritized over IPv4. I just checked three sites (nyaa, this one, and meow) here: https://ipv6-test.com/validate.php Nyaa doesn't have an IPv6 address; yours and meow do. It may be that both your site and meow are blocked for me as well, but I can connect over IPv6 and thus bypass the blocking. That might be why I can open this site and yours.
Oops, forgot to respond to that. Have you tried any of the many mirrors available? For example https://meowinjapanese.cf/ Or have you looked at how the block is implemented - e.g. DNS redirection? SNI sniffing? There may be easy ways to work around it.
Ah, for such a bot, it sounds like Nyaa's RSS would suit you better. As you've pointed out, AT only does one category.
Subtitles are a separate process from releases and don't occur at the same time (i.e. they could appear much later). You could try to build an attachments link from the entry ID I suppose, though there's no guarantee that it exists.
I just checked the JSON feed and it's really nice, way easier to parse.
It's a bit of a shame that I lost access to the other stuff. I didn't mention it, but the blocking also affected the channels where the bot posted, so we also had feeds for other types of content, like a music feed in the music channel. I know your site only collects anime, but it's at least something rather than nothing at all.
In the past I had a bot which fetched new releases from nyaa.si and posted them to Discord servers, filtered by conditions people added, like a releaser name (e.g. HorribleSubs and several others), possibly then filtered by name (e.g. excluding Dragon Ball, Boruto, Fairy Tail) and with a check to show only 1080p.
My bot fetched the following: a link to the view page, name, category, size, torrent download link and magnet link, and constructed a formatted string like this:
There is no formatting on this site, so to describe it: view and name were **bold**, view was actually a link formatted like [text](https://url), torrent and magnet were also links, and the magnet code after them was enclosed in an inline `code` block.
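The message format described above can be reconstructed roughly like this. All field values and URLs below are hypothetical placeholders; only the Markdown layout follows the comment's description:

```python
# Hypothetical release entry; the keys mirror the fields the bot
# fetched, but the values are made up for illustration.
entry = {
    "view": "https://example.com/view/12345",
    "name": "Some Release 1080p",
    "category": "Anime",
    "size": "1.4 GB",
    "torrent": "https://example.com/dl/12345.torrent",
    "magnet": "magnet:?xt=urn:btih:abcdef",
}

# Bold name that doubles as a link, plain-text metadata, then the
# torrent/magnet links with the raw magnet in an inline code block.
message = (
    f"**[{entry['name']}]({entry['view']})**\n"
    f"Category: {entry['category']} | Size: {entry['size']}\n"
    f"[Torrent]({entry['torrent']}) | [Magnet]({entry['magnet']}) "
    f"`{entry['magnet']}`"
)
print(message)
```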
However, I host my bot locally, and as it happened, that website was blocked by my country, so I was unable to open it without a VPN, which I don't want to use, so I just shut the bot down. Some days ago I saw a conversation on Discord where some guy in SubsPlease was arguing about them not posting softsubs (.ass) anywhere without the raw; he didn't want to download a whole episode only to extract the subtitle. That's where another guy suggested this site, which I've found useful for myself as well.
So, back to the bot: I was thinking that if you had an attachment URL in the feed, I could include it in my message like above, but:
The majority of languages should have XML parsers available - you should avoid trying to parse it yourself if that's what you are thinking. I'm not sure the RSS spec allows for a magnet to be specified directly. The feed is designed to list releases, so information specific to that is included. I can't really go about listing everything in there. If I may ask, what are you trying to do with an API for extracted subtitles?
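As an example of using a stock XML parser rather than hand-rolling one, here is a minimal sketch with Python's standard library. The RSS snippet below is made up for illustration (the real feed's element names and where the magnet appears may differ), and the magnet extraction is just a naive substring search:

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 snippet; not the site's actual feed layout.
sample = """<rss version="2.0"><channel>
  <item>
    <title>Example Release</title>
    <link>https://example.com/view/1</link>
    <description>magnet:?xt=urn:btih:abc123</description>
  </item>
</channel></rss>"""

root = ET.fromstring(sample)
for item in root.iter("item"):
    title = item.findtext("title")
    link = item.findtext("link")
    # Naive magnet extraction from the description text, assuming the
    # magnet URI runs to the end of the field.
    desc = item.findtext("description") or ""
    magnet = desc[desc.find("magnet:"):] if "magnet:" in desc else None
    print(title, link, magnet)
```

Most languages ship an equivalent parser, which is far safer than regex-parsing the raw XML.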
I've modified the feed so that it can serialize to JSON instead of XML. You can access it by taking a feed URL and replacing rss2 (or atom) with json.
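Concretely, the URL transformation described above is just a path substitution. The feed URL below is a made-up example to show the substitution, not a real endpoint:

```python
# Illustrative feed URL; only the rss2 -> json swap is the point here.
feed_url = "https://example.com/feed/rss2?cat=anime"
json_url = feed_url.replace("rss2", "json")
print(json_url)
```

The JSON variant can then be consumed with any standard JSON parser instead of an XML one.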
Also, two things I found: there is no plain magnet link (you need to parse it from the description), and there is no link to the extracted subs in the RSS. Were they left out on purpose?
Not JSON, but if it helps, there are RSS/Atom feeds available (use the feed icon in the top-right of listing pages). Alternatively, there are also database dumps available to get a full listing.
Thanks for the notice. It seems to be up to date here; perhaps it was slow for a period. Do you know where it stalled (i.e. the last entry available)?
Glad you found it interesting! Fortunately, with the heavy emphasis on automation here, I personally don't spend all that much time on it these days. (Gurphy_TC probably spends more time keeping things running smoothly here.)
Thank you for the logistics page, admin! Fascinating stuff.
The monetary costs may not be that high, but the expenditure of expertise, time, and repeated explanations that no, you cannot upload things that no longer exist, is truly inspirational. A toast to all you do!
There were reports of others experiencing similar issues. Dunno if it's still problematic - I tried from a few endpoints but couldn't reproduce. Nothing reported on their Twitter.
It's interesting to note that Nyaa uses DDoS-Guard, which may come under increasing scrutiny in the future. I don't suspect it to be the cause of the 503 errors, but it could be something to watch out for going forward.
21/01/2021 20:53 — Anonymous