Freezing while downloading large datasets through Shodan? - anaconda

I'm using Shodan's API through the Anaconda Terminal on Windows 10 to download data for the query below, but after a few seconds of running, the ETA timer freezes and my network activity drops to zero. Hitting Ctrl+C when this happens gets it moving again for a few seconds, but it soon stalls again.
shodan download --limit 3100000 data state:"wa"
Also, while it is running, the download speed seems pretty slow; is there any way I can speed it up? My university's internet is capable of upwards of 300 Mbps, but the download seems to cap at 5 Mbps.
I don't know how to solve either of these issues; my device has enough space and my internet isn't disconnecting. We've tried running the Anaconda Terminal as an Administrator, but that hasn't helped either.

I am not familiar with this specific service, but in general, limited speeds or stalled downloads like this are not caused by things 'on your side' such as the university connection, or even your download script.
Odds are that the service wants to protect itself and that you need to use the API differently (for example, with a different account), or that your account has usage limits that you are hitting.
The best course of action may be to contact the service and ask them how to do this.

I heard back from Shodan support; cross-posting some of their reply here:
The API is not designed for large, bulk export of data. As a result, you're encountering a few problems/limits:
There is a hard limit of 1 million results per search query. This means that it isn't possible to download all results for the search query "state:wa".
The search API performs best on the first few pages and responds progressively more slowly the deeper into the results you get. This means that the first few pages return instantly, whereas the 100th page can potentially take 10+ seconds.
You can only send 1 request per second, so you can't multiplex/parallelize the search requests.
A lot of high-level analysis can be performed using search facets.
There's documentation on facets, which return summary information from the API, in the shodan.pdf booklet floating around their site.
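To make the facets suggestion concrete, here is a minimal sketch using the official shodan Python library (the same package that backs the CLI). The API key is a placeholder and the facet names (port, org) are only examples; adjust them to whatever summary you actually need:

import shodan

api = shodan.Shodan("YOUR_API_KEY")  # placeholder key

# Summary information via facets: a single request instead of paging
# through hundreds of thousands of results.
facets = [("port", 10), ("org", 10)]
summary = api.count('state:"WA"', facets=facets)

print("Total results:", summary["total"])
for facet_name, buckets in summary["facets"].items():
    print(f"\nTop values for {facet_name}:")
    for bucket in buckets:
        print(f"  {bucket['value']}: {bucket['count']}")

# If you really do need raw banners, search_cursor() handles the paging
# for you, but it is still bound by the 1 request/second rate limit and
# the 1 million result cap mentioned above.
# for banner in api.search_cursor('state:"WA"'):
#     process(banner)

I believe the CLI equivalent is the stats command (something like shodan stats --facets port:10 state:wa), which avoids writing any code at all.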

Related

Why does PageSpeed Insights keep returning a high TTI (Time to Interactive) for a simple game?

I submitted my app/game/PWA to PageSpeed Insights and it keeps giving me TTI values > 7000 ms and TBT values > 2000 ms, as can be seen in the screenshot below (the overall score for a mobile experience is around 63):
I read what those values mean over and over, but I just cannot make them lower!
What is most annoying is that when accessing the page in a real-life browser, I don't need to wait 7 seconds for the page to become interactive, even with a clear cache!
The game can be accessed here and its source is here.
What comforts me is that Google's own game, Doodle Cricket, also scores terribly. In fact, PageSpeed Insights gives it an overall score of "?".
Summing up: is there a way to tell PageSpeed Insights the page is actually a game with only a simple canvas in it and that it is indeed interactive as soon as the first frame is rendered on the canvas (not 7 seconds later)?
UPDATE: Partial Solution
Thanks to Graham Ritchie's answer, I was able to detect the two slowest points, simulating a mid-tier mobile phone:
Loading/Compiling WebAssembly files: there was not much I could do about this, and this alone consumes almost 1.5 seconds...
Loading the main script file, script.min.js: I split the file in two, since almost two thirds of it is just string constants; I now load the main script with async and delay loading the string constants, which has saved more than 1.2 seconds from the load time.
The improvements have also saved some time on better mobile devices/desktop devices.
The commit diff is here.
UPDATE 2: Improving the Tooling
For anyone who gets here from Google, two extra tips that I forgot to mention before...
Use the Lighthouse CLI rather than the website (both for localhost and for internet websites): npm install -g lighthouse, then call lighthouse --view http.... (or use any other arguments as necessary).
If running on a notebook, make sure it is not running on the battery, but actually connected to a power source 😅
Summing up: is there a way to tell PageSpeed Insights the page is actually a game with only a simple canvas in it and that it is indeed interactive as soon as the first frame is rendered on the canvas (not 7 seconds later)?
No, and unfortunately I think you have missed one key piece of the puzzle as to why those numbers are so high.
PageSpeed Insights applies throttling to the network AND the CPU to simulate a mid-tier mobile phone on a 4G connection.
The CPU throttling is your issue.
If I run your game within the "performance" tab on Google Chrome Developer Tools with "4x slowdown" on the CPU I get a few long tasks, one of which takes 5.19s to run!
Your issue isn't page weight, as the site is lightweight; it is JavaScript execution time.
You would have to look through your code and see why you have a task that takes so long to run; look for nested loops, as they are normally the issue!
There are several other tasks that take 1-2 seconds in total between them, but that 5-second task is the main culprit!
Hopefully that clears things up a bit, any questions just ask.

XPages performance - 2 apps on same server, 1 runs and 1 doesn't

We have been having a bit of a nightmare this last week with a business-critical XPages application; all of a sudden it has started crawling really badly, to the point where I have to reboot the server daily, and even then some pages can take 30 seconds to open.
The server has 12 GB of RAM and 2 CPUs; I am waiting for another 2 to be added to see if that helps.
The database has around 100,000 documents in it, with no more than 50,000 displayed in any one view.
The same database, set up as a training application with far fewer documents on the same server, always responds, even when the main copy is crawling.
There are a number of view panels in this application - I have read these are really slow. Should I get rid of them and replace them with a Repeat control?
There are also Readers fields on the documents containing roles, and Authors fields, as it's a workflow application.
I removed quite a few unnecessary views from the back end over the weekend to help speed it up but that has done very little.
Any ideas where I can check to see what's causing this massive performance hit? It's only really become unworkable in the last week but as far as I know nothing in the design has changed, apart from me deleting some old views.
Try to get more info about the state of your server and application.
Hardware troubleshooting is summarized here: http://www-10.lotus.com/ldd/dominowiki.nsf/dx/Domino_Server_performance_troubleshooting_best_practices
Since, in your experience, only one of the two applications has slowed down, it is more likely a code problem. The best thing is to profile your code: http://www.openntf.org/main.nsf/blog.xsp?permaLink=NHEF-84X8MU
To go deeper, you can start to look for semaphore locks (http://www-01.ibm.com/support/docview.wss?uid=swg21094630), look at Java dumps (http://lazynotesguy.net/blog/2013/10/04/peeking-inside-jvms-heap-part-2-usage/) and NSDs (http://www-10.lotus.com/ldd/dominowiki.nsf/dx/Using_NSD_A_Practical_Guide/$file/HND202%20-%20LAB.pdf), and check the garbage collector (see "Best setting for HTTPJVMMaxHeapSize in Domino 8.5.3 64 Bit").
This presentation gives a good overview of Domino troubleshooting (among many others on the web).
OK, so we resolved the performance issues by doing a number of things. I'll list the changes we made in order of the improvement gained, starting with the simple tweaks that weren't really noticeable.
Defragged the Domino drive - it was showing as 32% fragmented and I thought I was on to a winner, but it was really no better after the defrag, even though IBM docs say even 1% fragmentation can cause performance issues.
Reviewed all the main code in the application and took out a number of needless lookups that could be replaced with applicationScope variables. For instance, on the search page, one of the drop-downs gets its choices by doing an @Unique lookup on all documents in the database. Changed it to a keyword and put that in the applicationScope.
Removed multiple checks on database.queryAccessRole and put the user's roles in a sessionScope.
The DB had 103,000 documents - 70,000 of them were tiny little docs with about 5 fields on them. They don't need to be indexed by the FT index, so we moved them into a separate database and pointed the data source at that DB when these docs were needed. The FT index went from 500 MB to 200 MB, which meant faster indexing and searches, but the overall performance of the app was still rubbish.
The big one: I finally got around to checking the application properties, Advanced tab. I set the following options:
Optimize document table map (ran a copy-style compact)
Don't overwrite free space
Don't support specialized response hierarchy
Use LZ1 compression (ran a copy-style compact with the -ZU option to convert existing attachments)
Don't allow headline monitoring
Limit entries in $UpdatedBy and $Revisions to 10 (as per the Domino documentation)
And also don't allow the use of stored forms.
Now I don't know which one of these options was the biggest gain, and not all of them will be applicable to your own apps, but after doing this the application flies! It's running like there are no documents in there at all, views load super fast, documents open like they should - quickly - and everyone is happy.
Until the HTTP threads get locked out - that's another question of mine that I am about to post, so please take a look if you have any idea of what's going on :-)
Thanks to all who have suggested things to try.

Heroku taking 2 seconds to load every page--including pages that simply render a single text string

No, this is NOT the "my page doesn't have any traffic and it has to be reloaded" issue.
We have 4 dynos for an alpha application. The reason we do is that each page takes over 2 seconds to load - even little things like rendering a text string (no layouts, ERB, or anything).
If I watch our logs, our longer queries report response times in the 300-700 ms range - which is far shorter than 2 seconds.
The DNS is cached, so the collective load time isn't a slow-DNS issue. And that shouldn't affect subsequent page loads anyway, right?
Any thoughts on how to get to the bottom of this would be appreciated.
Here are two screenshots to show what I mean.
http://dl.dropbox.com/u/7175041/Screenshots/qo.png
http://dl.dropbox.com/u/7175041/Screenshots/qq.png
Thanks!
First thing I'd do is to switch on NewRelic Basic - it's a free performance monitor integrated with Heroku. That'll help you get a bearing on the basics of where the trouble is coming from.
I take it that you don't see similar results locally? If you don't, then skip this step, but if you do, you can also run NewRelic locally and interrogate all of your queries for response times.
I'd stay away from using things like the Benchmark library - that was my first thought in troubleshooting a speed issue, but Benchmark is necessarily going to ignore elements of your app that are outside the pure Ruby layer, and if that's where you're slow then NewRelic catches that anyway.
Finally, if all else fails, a support ticket with Heroku's team has always been extremely helpful to me. Just make sure you check the box that lets them clone your app, it makes things a lot easier for them.
Let us know what you find out - I'm curious to see what the particular gremlin is!
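If you want a rough cross-check while you wait for NewRelic data, a quick timing script can at least split the 2 seconds into "server time" versus "everything else" (network, routing/queueing, client). Python here is purely for illustration, the /ping path is a hypothetical page that renders a plain string, and the X-Runtime header is only present if Rack::Runtime is in your middleware stack (it is in a default Rails app):

import time
import requests  # third-party: pip install requests

url = "https://your-app.herokuapp.com/ping"  # hypothetical endpoint

start = time.perf_counter()
resp = requests.get(url)
total = time.perf_counter() - start

# Rack::Runtime reports server-side processing time in the X-Runtime
# header; if your stack doesn't set it, this just prints "n/a".
server_side = resp.headers.get("X-Runtime", "n/a")

print("status:", resp.status_code)
print(f"total round trip: {total:.3f}s")
print("server-reported time (X-Runtime):", server_side)

If the round trip is consistently around 2 s while X-Runtime (and your logs) stay in the 300-700 ms range, the time is being lost outside your app code, which is exactly the kind of thing Heroku support can dig into.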

Website performance problem

We have a website based on Drupal. There are ~30 modules, which is not a huge amount for our VPS. We don't have high traffic, so traffic isn't overloading the site.
On the same VPS we have other sites, which load properly.
Site: http://jnews.am
Where can I start? How can I check which part of my server/website is causing the performance issues? What investigation methods can you suggest?
You need to dig in to find the bottleneck. Here is a good article about it: http://www.mindyourcode.com/php/drupal-profiling-performance-analysis-for-optimization-finding-the-performance-bottlenecks-in-your-application-at-core-level/
Good question. Here are some answers:
1) Download the YSlow extension for Firefox and install it. This will let you test a number of different items that can suggest why your site may be slow. This doesn't look to be your big problem right now, though.
2) Install the Firebug extension for Firefox. The 'Net' tab of Firebug tells you how long every document took to download. Your core page is taking 5 seconds, and for some reason system.css is taking almost as long, which is unusual in that it's a static file.
3) Check and see if you've got any slow queries, and why. Assuming you're using MySQL, this page http://dev.mysql.com/doc/refman/5.0/en/slow-query-log.html tells you how to set up the slow query log, which will collect and report which queries are taking a long time to complete.
Beyond that, some suggestions: You're almost certainly not using some of Drupal's performance options, such as caching, and I would suggest using memcache to speed the site up as well. (See http://drupal.org/project/memcache)
It really looks like you've got a query that's taking too long. It seems to me that the slow query log is what's going to be the most useful tool - it'll tell you where you need to optimize your site. Do note that MySQL tends to use the first index on a table that it finds in the WHERE clause instead of the fastest, and as such a query like "WHERE type='story' AND status = 1" is faster than "WHERE status = 1 AND type='story'", because the type index filters the data better than the status one. (And Views tends to put the items in the WHERE clause in the same order that they're in the Filter section.)

Is there a generally acceptable definition of (soft) realtime delays?

I'm trying to find a benchmark for how long users are willing to wait for a response from a remote service. In my case the response is for very useful but not business critical validation of data entry. I guess that there must have been some work done in the HCI space on this.
If you know of a generally accepted definition for soft realtime responses then great but I'd also appreciate your well reasoned thoughts.
Chris
US DOD MIL-STD 1472-F Human Engineering Standard has the most widely accepted requirements for maximum allowed response time (from Table XXII, page 196, times in seconds):
Key Response (Key depression until positive response, e.g., "click"): 0.1
Key Print (Key depression until appearance of character): 0.2
Page Turn (End of request until first few lines are visible): 1.0
Page Scan (End of request until text begins to scroll): 0.5
XY Entry (From selection of field until visual verification): 0.2
Function (From selection of command until response): 2.0
Pointing (From input of point to display point): 0.2
Sketching (From input of point to display of line): 0.2
Local Update (Change to image using local data base, e.g., new menu list): 0.5
Host Update (from display buffer): 2.0
File Update (Change where data is at host in readily accessible form): 10.0
Inquiry - Simple (e.g., a scale change of existing image): 2.0
Inquiry - Complex (Image update requires an access to a host file): 10.0
Error Feedback (From command until display of a commonly used message): 2.0
As you can see, acceptable response time depends on what response the user is waiting for. For something like a pulldown menu appearing, it's 0.5 seconds max. For a full page load in a browser, you want something to appear in 1.0 s to 2.0 s and the full page loaded in 10.0 s. In all the above, shorter response times are better. Only in bizarre circumstances will users object to a 0.001s response time.
In any case, if the response time will be greater than 0.5 s, then you need to provide feedback such as a throbber or hourglass sprite. If the response time is at least 5-15 s (depending on which standard you use), provide a progress bar. With a progress bar, very long response times (on the order of minutes or even hours) may be acceptable as long as you set it up for the user as a “batch” process rather than an interactive program. It's much better for the user to make all input and wait an hour than to make input on four occasions, waiting 15 minutes after each.
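To make those thresholds concrete, here is a tiny illustrative sketch (the function name is made up, and the 0.5 s and 5 s cutoffs are the rules of thumb from the paragraph above, not values from any particular toolkit):

def choose_feedback(expected_seconds):
    # Rules of thumb: no indicator needed below ~0.5 s, a throbber or
    # hourglass up to ~5 s, and a progress bar for anything longer.
    if expected_seconds <= 0.5:
        return "none"
    if expected_seconds < 5.0:
        return "throbber/hourglass"
    return "progress bar"

for t in (0.2, 2.0, 30.0):
    print(f"{t:>5.1f} s -> {choose_feedback(t)}")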
The above list has the accepted standards. How long your users are willing to wait (e.g., before giving up) essentially boils down to the user making a cost-benefit analysis. Is what I’m going to get worth the wait? What are my sunk costs? Is there an alternative (e.g., another web site) that can do it better? Can I do other things while I wait to make the most of my time? However, whatever users are willing to do, you can bet they’ll resent delays greater than the standards above.
Human reaction time seems to be around 200 ms - anything around there will be perceived as instantaneous. That sort of number is hard to achieve, especially in an application that gets information from remote services.
If you take a look at Google's search suggestion box, the lag there is minimal - less than a second. It's astoundingly fast, and really remarkable for a web application. This is really nice for Google's users, but it's bad news for you. These days, users expect most applications to react with the same sort of speed and efficiency; anything slower is considered rather laggy. However, it's worth noting that people's patience usually varies with the complexity of the task at hand. A simple form submit should never take much time, but something like uploading photos is expected to take a while.
My feeling is this: go with your gut. If your application is fairly simple then you should try to get the wait/load time down to less than a second. If you can't, then your best bet is to add an indicator so the user knows that some computations are being done in the background. This can be in the form of a small animation or a progress bar.
Unfortunately, the answer to this question is not typically a well-defined number. Users' expectations vary widely and can change depending on what it is you're talking about.
As computers continue to become more ubiquitous and we (the consumers) continue to have growing expectations of speed, remote services, websites, and even applications will need to continue to respond more quickly. Generally speaking, you want everything to be as fast as possible.
With this said, I would look at what your remote service is for. Since you said the response is for something "very useful", to me that means it probably will get used frequently. People tend to use what is useful. If that's the case, I would look for ways to make that remote service respond quickly.
Of course, there is also the caveat that you don't want to start optimizing before the service is written. What is the current response time? What is the context in which this will be used? Those factors will do a lot to determine the longest users are willing to wait for the service.
You might want to search for "SLA" or "Service Level Agreement". Those are the documents in a web business that make guarantees as to how long data will take to get back to the user, whether it's an HTML document or a web service call.
