How to get statistics from Scopus

I am working on a survey and I would like to retrieve statistics on the number of papers published within two given years (e.g. 1990 and 2018).
In a paper, for instance https://doi.org/10.1016/j.apm.2009.10.005, I have seen that such statistics can be obtained from Scopus, but I have no clue how to get them.
I would appreciate it if you could help!

I realized that one has to use the network / proxy of a library (e.g. a university) that has access to SCOPUS; then one can use the search engine to obtain these and other data and statistics. Regards
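If you do have access (directly or via such a proxy), the Scopus Search API is one programmatic way to get per-year counts. Below is a minimal sketch, not an official recipe: it assumes you have an Elsevier API key with Scopus entitlement, and the query uses the public Scopus search syntax (TITLE-ABS-KEY, PUBYEAR).

import requests

API_KEY = "YOUR_ELSEVIER_API_KEY"  # assumption: a key registered at dev.elsevier.com
BASE = "https://api.elsevier.com/content/search/scopus"

def papers_per_year(query, start_year, end_year):
    # return {year: number of Scopus records matching the query in that year}
    counts = {}
    for year in range(start_year, end_year + 1):
        params = {"query": f"{query} AND PUBYEAR = {year}", "count": 1}
        headers = {"X-ELS-APIKey": API_KEY, "Accept": "application/json"}
        resp = requests.get(BASE, params=params, headers=headers, timeout=30)
        resp.raise_for_status()
        counts[year] = int(resp.json()["search-results"]["opensearch:totalResults"])
    return counts

# e.g. number of papers on a topic per year between 1990 and 2018
print(papers_per_year('TITLE-ABS-KEY("model predictive control")', 1990, 2018))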


Where can I find a good amount of sample elastic search data for practice?

Hello ElasticSearchers,
I want to do some practice on a sample data set. Can anyone please point me to a freely available data set for practice?
Thanks.
The Global Terrorism Database (GTD) is an open-source database including information on terrorist events around the world from 1970 through 2016 (with annual updates planned for the future). https://www.start.umd.edu/gtd/
The Armed Conflict Location and Event Data Project (ACLED) is a project that collates data on political violence in developing states, from 1997 to the present. https://www.acleddata.com/data/
Two datasets in the same category, covering the same topic and timeline, both geo-tagged: a good example for collect/explore/transform work, and ideal for search engine, aggregation, and deduplication exercises.
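If you want to practice indexing one of these, a rough sketch with the official Python client might look like the following; the CSV column names (event_date, country, latitude, longitude, notes) are placeholders, so adjust them to the actual GTD/ACLED export headers.

import csv
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # assumes a local test cluster

def actions(path, index_name):
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            yield {
                "_index": index_name,
                "_source": {
                    "event_date": row.get("event_date"),
                    "country": row.get("country"),
                    # store coordinates so you can try geo queries and aggregations
                    "location": {"lat": float(row["latitude"]), "lon": float(row["longitude"])},
                    "description": row.get("notes", ""),
                },
            }

helpers.bulk(es, actions("acled_sample.csv", "conflict-events"))

With both datasets in separate indices you can then practice cross-index searches, aggregations, and deduplication.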

Project Server: sharing resources across departments

I am a newbie with Microsoft Project Server 2013. I previously had a system under MS Project Professional to see capacity when I only had fractions of people's time.
We have a number of staff who work a percentage of their time in multiple departments. Is there a way of seeing the total capacity of a department (e.g. the Software Department) when we have staff who are, say, 50% on software and 50% on the help desk?
This also applies to roles: we expect staff to have multiple roles, so when working on the help desk they have a different role than when working as a Developer.
In Microsoft Project there are many different ways to accomplish this. I know this is a "programming" site and I have used it for query help in the past, but what you are asking for is built into Project already without having to go into development mode. One way you can do this is with one or two custom fields. Or you could use the RBS and the units on each resource. The goal is to answer the question, but this one is pretty wide open because there are two or three good solutions, each based on a different business scenario. So, in the spirit of giving a path to do this, I would suggest you add a custom field for each resource with the department classification, and also enter the resource's capacity; then you can run as many slices of this data as you want from the Resource Center. That will show you availability versus capacity and workload. You also mention roles: you could use a skill-set field for that and use it as another way of slicing the data. Again, there are many ways to get that data directly from the system.
In the end I was able to do this with custom fields on the resources, working with percentage-split roles and positive/negative capacity by role to give full capacity planning. Managers were able to see forecast resource capacity, and down at the individual level see how people were being utilised.
This was not real time, but an overnight process that collated the data into a set of Excel pivots to give graphs on trends etc.
It worked a treat, as it was even able to model based on history for different types and classes of projects. It basically gave a detailed portfolio management facility, so it even worked when projects had no detailed plans, using both forecasts and actuals. It also showed issues with the profile of projects (e.g. an optimistic testing profile), highlighted resource bottlenecks and places where high-value staff were doing low-value tasks, and even identified quality issues where a high percentage of rework on user stories usually pointed to missing unit tests.

badoo.com user search - how can this be done?

Badoo.com has 56,000,000 user profiles. Profiles can be searched by sex, age, hair color, zodiac sign, education and so on, plus distance from my hometown, online status, and date of registration. So far this seems doable, even if it's quite a query on huge tables (56M members...); it can be cached in a general way.
The interesting part is that they also have an individual "exclude list" (with every profile you look at, you can say that you don't want to meet this person). Plus, your friends don't show up either.
The second interesting part is the OR parts of the query. You can search for someone who's a woman, 25-35, blonde OR brunette, non-smoker, hetero OR bisexual, Virgo OR Gemini OR Cancer, living within a 50 km radius of Paris, who is not your friend and not on your exclude list, and who's online now. Many ORs, a heavy query, sort options, no way of caching or pre-calculating all this, yet the search returns 11,298 results in milliseconds.
How do they do such a thing with 56 million records and 250K people using it at the same time? Full-text search indexes? Relational databases? Key-value stores?
Does anyone have an idea about the concept or architecture?
They are most likely built using an inverted indexing technology like Lucene or Sphinx. If you are looking to build a solution, my recommendation would be Apache Solr (a search server built using Lucene). It is very popular, has an active OSS community, and is used by sites such as Netflix, Cnet etc.
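To make that concrete, a query like the one described might be expressed against Solr roughly as follows; the core name ("profiles") and all field names are illustrative assumptions, not Badoo's actual schema.

import requests

params = {
    "q": "*:*",
    "fq": [
        "gender:female",
        "age:[25 TO 35]",
        "hair:(blonde OR brunette)",
        "smoker:false",
        "orientation:(hetero OR bisexual)",
        "zodiac:(virgo OR gemini OR cancer)",
        "online:true",
        # spatial filter: within 50 km of Paris
        "{!geofilt sfield=location pt=48.8566,2.3522 d=50}",
        # the "exclude list" and friends are just negative filters on ID fields
        "-id:(123 456 789)",
    ],
    "rows": 20,
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/profiles/select", params=params)
print(resp.json()["response"]["numFound"])

Each fq clause is cached independently in Solr's filter cache, which is one reason this kind of many-OR query can still come back in milliseconds.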
I'd recommend taking a look at the Badoo Dev Blog. It's in Russian, but Google Translate helps a lot.
In short, they are using sharded MySQL and memcached. Here is a list of some of Badoo's evolution.

Can OLAP be done in BigTable?

In the past I used to build WebAnalytics using OLAP cubes running on MySQL.
Now, an OLAP cube the way I used it is simply a large table (OK, it was stored a bit smarter than that) where each row is basically a measurement or an aggregated set of measurements. Each measurement has a bunch of dimensions (e.g. pagename, user agent, IP, etc.) and a bunch of values (e.g. how many pageviews, how many visitors, etc.).
The queries that you run on a table like this are usually of the form (meta-SQL):
SELECT hour, SUM(hits), SUM(bytes)
FROM MyCube
WHERE date='20090914' AND pagename='Homepage' AND browser!='googlebot'
GROUP BY hour
So you get the totals for each hour of the selected day with the mentioned filters.
One snag was that querying these cubes usually meant a full table scan (for various reasons), and this put a practical limit on the size (in MiB) you could let them grow to.
I'm currently learning the ins and outs of Hadoop and the likes.
Running the above query as a mapreduce on a BigTable looks easy enough:
Simply make 'hour' the key, filter in the map and reduce by summing the values.
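In Hadoop Streaming terms, that batch job might look roughly like this (a sketch only; the tab-separated log layout and column order are assumptions):

import sys

def mapper():
    # filter in the map, emit hour as the key plus the two measures
    for line in sys.stdin:
        date, hour, pagename, browser, hits, nbytes = line.rstrip("\n").split("\t")
        if date == "20090914" and pagename == "Homepage" and browser != "googlebot":
            print(f"{hour}\t{hits}\t{nbytes}")

def reducer():
    # reduce by summing the values; input arrives sorted by key
    current_hour, sum_hits, sum_bytes = None, 0, 0
    for line in sys.stdin:
        hour, hits, nbytes = line.rstrip("\n").split("\t")
        if current_hour is not None and hour != current_hour:
            print(f"{current_hour}\t{sum_hits}\t{sum_bytes}")
            sum_hits, sum_bytes = 0, 0
        current_hour = hour
        sum_hits += int(hits)
        sum_bytes += int(nbytes)
    if current_hour is not None:
        print(f"{current_hour}\t{sum_hits}\t{sum_bytes}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()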
Can you run a query like the one I showed above (or at least one with the same output) on a BigTable kind of system in 'real time' (i.e. via a user interface where the user gets their answer ASAP) instead of in batch mode?
If not; what is the appropriate technology to do something like this in the realm of BigTable/Hadoop/HBase/Hive and the likes?
Something like this has even been done (kind of).
LastFm's aggregation/summary engine: http://github.com/zohmg/zohmg
A Google search turned up a Google Code project, "mroll", but it doesn't have anything except contact info (no code, nothing). Still, you might want to reach out to the owner and see what's up. http://code.google.com/p/mroll/
We managed to create low-latency OLAP in HBase by pre-aggregating a SQL query and mapping it into appropriate HBase qualifiers. For more detail, see the site below.
http://soumyajitswain.blogspot.in/2012/10/hbase-low-latency-olap.html
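The general shape of that idea, sketched with the happybase client (table name, column family, and key layout are assumptions): encode the dimension values in the row key, keep the measures as HBase counters, and the read side becomes a narrow prefix scan instead of a full table scan.

import happybase

connection = happybase.Connection("localhost")  # assumes the HBase Thrift server is running
table = connection.table("web_metrics")         # column family "m" holding counter columns

def record_hit(date, pagename, browser, hour, hits, nbytes):
    # write side: one counter row per (date, page, browser, hour) combination
    row = f"{date}|{pagename}|{browser}|{hour:02d}".encode()
    table.counter_inc(row, b"m:hits", hits)
    table.counter_inc(row, b"m:bytes", nbytes)

def hourly_totals(date, pagename, browser):
    # read side: scan only the keys for one (date, page, browser) slice
    prefix = f"{date}|{pagename}|{browser}|".encode()
    for key, _ in table.scan(row_prefix=prefix):
        hour = key.decode().rsplit("|", 1)[-1]
        yield hour, table.counter_get(key, b"m:hits"), table.counter_get(key, b"m:bytes")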
My answer relates to HBase, but applies equally to BigTable.
Urban Airship open-sourced datacube, which I think is close to what you want. See their presentation here.
Adobe also has a couple of presentations (here and here) on how they do "low-latency OLAP" with HBase.
Andrei Dragomir made an interesting talk about how Adobe performs OLAP functionality with M/R and HBase.
Video: http://www.youtube.com/watch?v=5U3EnfiKs44
Slides: http://hstack.org/hbasecon-low-latency-olap-with-hbase/
If you are looking for a table-scan approach, have you considered Google BigQuery? BigQuery does automatic scale-out on the back end, which gives interactive response times. There is a good session by Jordan Tigani from the 2012 Google I/O event that explains some of the internals.
http://www.youtube.com/watch?v=QI8623HlYd4
It's not MapReduce, but it is geared towards high-speed table scans like what you described.
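For comparison, the same hourly roll-up expressed as an interactive BigQuery query via the Python client might look like this; the project/dataset/table and column names are placeholders standing in for the cube.

from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials
sql = """
    SELECT hour, SUM(hits) AS hits, SUM(bytes) AS bytes
    FROM `my_project.analytics.pageviews`
    WHERE date = '20090914'
      AND pagename = 'Homepage'
      AND browser != 'googlebot'
    GROUP BY hour
    ORDER BY hour
"""
for row in client.query(sql).result():
    print(row.hour, row.hits, row.bytes)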

Filter by zip code, or other location based data retrieval strategies

My little site should be pulling a list of items from a table using the active user's location as a filter. Think Craigslist, where you search for "dvd" but the results are not from the whole DB; they are filtered by a location you select. My question has two levels:
Should I go à la Craigslist and ask users to pick a city-level location? My problem with this is that you need to generate what seems to me a hard-coded, hand-made list of locations.
Or should I go à la zip code: just ask the user to type his zip code, and then pull all items that are in the same zip code or within a certain distance of it.
I seem to prefer the zip code way, as it seems the more elegant solution, but how on earth does one go about creating a DB of all zip codes and implementing the function that, given zip code 12345, gets all zip codes within a 1-mile distance?
This should be a fairly common task, as many sites have a need similar to mine, so I am hoping not to reinvent the wheel here.
Getting a Zip Code database is no problem. You can try this free one:
http://zips.sourceforge.net/
Although I don't know how current it is. Or you can use one of many commercial providers: we have an annual subscription to ZipCodeDownload.com, and for maybe $100 we get monthly updates with the latest zip code data, complete with the lat/long of each zip code's centroid.
As for querying for all zips within a certain radius, you are going to need a spatial library of some sort. If you just have a table of zips with lats/longs, you will need a database-oriented mechanism. SQL Server 2008 has the capability built in, and there are open source libraries and commercial libraries that will add such capabilities to SQL Server 2005. The open source database PostgreSQL has a project, PostGIS that adds this capability to that database. It is here: http://postgis.refractions.net/
Other database platforms probably have similar projects, but those are the ones I am aware of. With one of these DB based libraries you should be able to directly query for any zip codes (or any rows of any kind that have lat/long columns) within a given radius.
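For example, with PostGIS the radius query might look roughly like this from Python; the zips table (a zip_code column plus a geography point column) is an assumption about your schema.

import psycopg2

conn = psycopg2.connect("dbname=geo user=app")
with conn, conn.cursor() as cur:
    # all zip codes whose centroid is within 1 mile (~1609 m) of a given lon/lat
    cur.execute(
        """
        SELECT zip_code
        FROM zips
        WHERE ST_DWithin(geog, ST_MakePoint(%s, %s)::geography, %s)
        """,
        (-73.99, 40.75, 1609.34),  # (longitude, latitude, radius in metres)
    )
    nearby = [row[0] for row in cur.fetchall()]
print(nearby)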
If you want to go a different route, you can use spatial tools with a mapping library. There are open-source options here as well, such as SharpMap and many others (Google can help out), that can use the free TIGER maps for the United States as the data source. However, this route is somewhat more complicated and possibly less performant if all you need is a radius search.
Finally, you may want to look into a web service. This, as you say, is a common need, and I imagine there are any number of web services you can subscribe to that will provide all zip codes in a given radius from a provided zip code. A quick Google search turned up this:
http://www.zip-codes.com/free-zip-code-tools.asp#radius
But there are many resources to be had by searching on this subject.
how on earth does one [...] implement the function that, given zip code 12345, gets all zip codes within a 1-mile distance?
Here is a sample on how to do that:
http://www.codeproject.com/KB/cs/zipcodeutil.aspx
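If you only have a plain table of zip codes with lat/long centroids (like the free database above), the distance check itself is just the haversine formula; here is a self-contained sketch where the centroid data is made up for illustration.

from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    # great-circle distance between two points, Earth radius ~3959 miles
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3959 * asin(sqrt(a))

def zips_within(zip_centroids, origin_zip, radius_miles=1.0):
    lat0, lon0 = zip_centroids[origin_zip]
    return [z for z, (lat, lon) in zip_centroids.items()
            if haversine_miles(lat0, lon0, lat, lon) <= radius_miles]

# toy data: zip -> (lat, lon) centroid
centroids = {"12345": (42.81, -73.94), "12346": (42.83, -73.96), "10001": (40.75, -73.99)}
print(zips_within(centroids, "12345", radius_miles=5))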
Just to be technical... PostGIS isn't a project of the Postgres community; it's a stand-alone project that is built on top of Postgres. If you want help or support with PostGIS, you'll want to go to its community instead of the Postgres one.
You can use PostGIS. Additionally, I've used deCarta's mapping libraries. They have technology which allows you to geokey any arbitrary data type. Then you can query these spatially.
disclaimer: I work for deCarta
Wouldn't it be more efficient to just figure out which cities are within a 1-mile radius and store that information in a table? Then you don't have to do distance calculations in the database all the time.
