I've recently inherited a large codebase for an online product. One of the things I'm trying to determine is which clients people are predominantly using to access the product online (e.g. browsers, mobile devices, iPads, etc.). On the plus side, I have a database table with all the User Agent strings (about 10 million records). Can anyone recommend a tool that can analyze and summarize the User Agent data?
Note: Keep in mind, these are not IIS logs. This is a table of just User Agent strings that were captured by a variety of other processes.
I haven't heard of software made specifically for this, but I can suggest the following option.
You can export this table (the user agent log) to a text file that simulates the IIS or Apache log structure, and then feed that log file to a standard web log analysis tool.
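If you prefer to do the conversion in Python, here is a rough sketch of that idea, assuming the table has been exported to a plain text file with one user agent string per line (the file names below are placeholders). It writes a fake Apache "combined" log that standard analyzers such as AWStats or Webalizer can turn into their usual browser/OS breakdowns.

# fake_access_log.py - rough sketch, not production code.
# Assumes the user agent table was exported to "user_agents.txt"
# (hypothetical name), one string per line. Only the user agent field
# matters for a browser report; the other fields are dummies that keep
# each line parseable by a standard log analyzer.
from datetime import datetime, timezone

APACHE_TS = "%d/%b/%Y:%H:%M:%S %z"

def to_combined_line(user_agent, ts):
    return ('127.0.0.1 - - [{ts}] "GET / HTTP/1.0" 200 0 "-" "{ua}"'
            .format(ts=ts.strftime(APACHE_TS),
                    ua=user_agent.replace('"', "'")))

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    with open("user_agents.txt", encoding="utf-8") as src, \
         open("fake_access.log", "w", encoding="utf-8") as dst:
        for line in src:
            ua = line.strip()
            if ua:
                dst.write(to_combined_line(ua, now) + "\n")

Once the fake log exists, point the analyzer at it exactly as you would at a real access log.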
Assuming I have many Python processes running on an automation server such as Jenkins, let's say I want to use Python's native logging module and, other than writing to the Jenkins console or to a log file, I want to store & centralize the logs somewhere.
I thought of using ELK for that, but then I realized that I can just as well create a dedicated log table in an existing database (I'm using Redshift), use something like Grafana for log dashboards/visualization and save myself the trouble of deploying a new system (most of the people in my team are familiar with Redshift but not with ElasticSearch).
Although it sounds straightforward, I feel like I'm not looking at the big picture and that I would be missing some powerful capabilities that components like Logstash were written for in the first place. What would these capabilities be, and how would it be advantageous to use ELK instead of my solution?
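For concreteness, this is roughly what I had in mind for the "log table" route - a minimal sketch only, assuming a DB-API connection (e.g. psycopg2 against Redshift) and a hypothetical app_logs table; all names and the schema are placeholders:

# db_log_handler.py - minimal sketch of the "dedicated log table" idea.
# Assumes an existing DB-API connection (e.g. psycopg2 for Redshift) and
# a hypothetical table:
#   CREATE TABLE app_logs (logged_at TIMESTAMP, logger VARCHAR(256),
#                          level VARCHAR(16), message VARCHAR(4096));
import logging
from datetime import datetime, timezone

class DBLogHandler(logging.Handler):
    def __init__(self, connection):
        super().__init__()
        self.connection = connection

    def emit(self, record):
        try:
            with self.connection.cursor() as cur:
                cur.execute(
                    "INSERT INTO app_logs (logged_at, logger, level, message) "
                    "VALUES (%s, %s, %s, %s)",
                    (datetime.now(timezone.utc), record.name,
                     record.levelname, self.format(record)))
            self.connection.commit()
        except Exception:
            self.handleError(record)  # never let logging crash the job

# usage (connection creation omitted):
# logging.getLogger().addHandler(DBLogHandler(connection))

One thing I already suspect is that Redshift prefers bulk loads over many small single-row INSERTs, so maybe that alone is an argument for a buffered pipeline.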
Thank you!
I have implemented a full ELK stack in my company in the past year.
The project was huge and took a lot of time to properly implement. The advantages of using ELK and not implementing our own centralized logging solution would be:
Not needing to reinvent the wheel - there is already a product that does exactly that (and the installation part is extremely easy).
It is battle tested and can handle a huge amount of logs in a short time.
As your business and product grow and shift, you will need to parse more logs with different structures, which would mean DB changes in a self-built system. Logstash gives you endless possibilities for filtering and parsing those newly formatted logs.
It has Cluster and HA capabilities, and you can scale your logging system vertically and horizontally.
Very easy to maintain and change over time.
It can send the needed output to a variety of products, including Zabbix, Grafana, Elasticsearch and many more.
Kibana will give you the ability to view the logs, build graphs and dashboards, set up alerts and more...
The options with ELK are really endless, and the more I work with it, the more I find new ways it can help me - not just for viewing logs from distributed remote server systems, but also for security alerts, SLA graphs and many other insights.
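To make the comparison concrete, here is a minimal sketch of how logs from Python's logging module can be shipped straight into Logstash, assuming a tcp input with a json_lines codec on the Logstash side (host, port and field names below are placeholders; a real setup would keep the connection open or go through Filebeat or the python-logstash package rather than opening a socket per record):

# logstash_tcp_handler.py - rough sketch of shipping Python logs to ELK.
# Assumed Logstash input (placeholder port):
#   input { tcp { port => 5000 codec => json_lines } }
import json
import logging
import socket
from datetime import datetime, timezone

class LogstashTCPHandler(logging.Handler):
    def __init__(self, host, port):
        super().__init__()
        self.address = (host, port)

    def emit(self, record):
        try:
            doc = {
                "@timestamp": datetime.now(timezone.utc).isoformat(),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
            with socket.create_connection(self.address, timeout=5) as sock:
                sock.sendall((json.dumps(doc) + "\n").encode("utf-8"))
        except Exception:
            self.handleError(record)

# usage:
# logging.getLogger().addHandler(LogstashTCPHandler("logstash.example.com", 5000))
# logging.getLogger(__name__).info("build finished")

Once the events are in Elasticsearch, Kibana takes care of the searching, dashboards and alerting without any extra code on your side.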
I have an Access 2010 application. Access is only the frontend; the backend is a SQL Server 2008. The connection between them is ODBC. The ODBC driver is "SQL Server" (version 6.01.7601.17514).
In Access there is a table with over 500,000 rows, and every row has 58 columns. Performance is very, very slow most of the time. Searching on a single column is not possible - Access freezes.
I know that's not a new problem...
Now my questions:
Is the driver OK? When I create an ODBC connection locally (Windows 8), I can also choose the driver "SQL Server", but there the version is 6.03.9600.17415.
Is there a difference in speed? I have a feeling that when I use Access locally under Win8 with the newer driver, it is faster than the Terminal Server with the older driver.
Locally under Win8 I can also choose the driver "SQL Server Native Client 10.0" (version 2009.100.1600.01). What is the difference between those Win8 ODBC drivers? Which driver would you use, and why?
What about a newer SQL Server? For example 2014 vs. 2008 - is 2014 faster than 2008 over ODBC?
What about the server hardware? What if I use an SSD instead of the HDD - does an SSD make the ODBC connection faster?
All users are working on the Terminal Servers, mainly with Office 2010, but also with proAlpha (an ERP system), and also with Access. Now one user told me that sometimes, when there are not many users on the TS, Access is much faster. What do you think? If I take one TS and work on it only with Access, without other applications, is ODBC faster then?
What else can I try?
Thank you very much.
I have noticed some performance improvements with SQL Server Native Client 10.0 (also using SQL Server 2008 with Access 2010) over the original driver.
I would question why you need to search/load all 500,000 rows of your table. Assuming this is in a form, it sounds a bit like poor form design. All your forms should only load the records you are interested in, not all records by default. In fact, it's considered reasonably good practice not to load any records on form load, until you know what the user is looking for.
58 columns also sounds a little excessive - are there memo (varchar(max)) fields included in these columns? These should probably be moved into a separate table. Examine your data structure and see whether you have normalised it correctly.
Are your fields indexed correctly? If you are searching on them an index will considerably improve performance.
Creating views on SQL Server that return only a suitable subset of records, which can then be linked as tables within Access, can also have performance benefits.
A table with 500,000 rows is small – even for Access. Any search you do should give results in WELL UNDER 1 SECOND!
The best way to approach this is to ask a 90 year old lady at a bus stop the following question:
When you use an instant teller machine does it make sense to download EVERY account and THEN ask the user for the account number? Even 90 year old ladies at bus stops will tell you it would be FAR better to ASK for the account number and then download 1 record!
And when you use Google, you don't download the WHOLE internet and THEN ask the user what to search for. Nor do you build one huge massive web page and then tell the user to hit Ctrl+F to search that huge browser page.
So think about how nearly all software works. That software does not download and prepare all the data locally and THEN ask you what you want to look for. You do the reverse!
So the simple solution here is to ask the user BEFORE you start pulling data from the server. Build a simple search form with an unbound text box (say txtLastName) for the user to type the search term into.
Then, to match the search (say on LastName), you use this code in the After Update event of the text box:
Dim strSQL As String
strSQL = "select * from tblCustomers where LastName like '" & Me.txtLastName & "*'"
Me.RecordSource = strSQL
That way the form ONLY pulls the data you require - this approach, even with 10 million rows, will run INSTANTLY on your computer. The above uses a "*" wildcard, so only the first few characters of the LastName need to be typed in. The result is a form of "choices". You can then jump to or edit a single record by clicking a details (glasses) button on that row, which simply launches and opens one detail form. The code is:
DoCmd.OpenForm "frmCustomer", , , "id = " & Me!id
Let’s address a few more of your questions:
Is there a difference in speed? (ODBC drivers)
No, there is really no performance difference between the drivers - they all perform about the same, and users will likely never see or notice a difference in performance when using different drivers.
For example 2014 vs. 2008 - is 2014 faster than 2008 over ODBC?
Not usually. Think of ANY experience you have with computers (unless you are new to computers). Every time you upgrade to a new Word or a new accounting program, that program is larger, takes longer to load, uses more memory, uses more disk space, and nearly always uses more processing. Given the last 30 years of desktop computing, in almost EVERY case the next version of a piece of software requires more RAM, more disk and more processing, and thus runs slower than the previous version (I am willing to bet that matches YOUR knowledge and experience). There are a few "rare" exceptions in computer history, but later versions of any software tend to require more computer resources, not less - so newer versions tend not to run faster.
Now one user told me that sometimes, when there are not many users on the TS, Access is much faster. What do you think?
The above has nothing to do with ODBC drivers. In this context, when you are using Terminal Server, both the database engine and the front end (Access) are running on the same computer/server. That means data transfer from the server to the application is BLISTERINGLY fast and occurs not at network speed but at computer speed (since both database and application are running on the SAME server). You could instead install Access on each computer and have Access pull data OVER the network from the server to the client workstation - that is slow, because there is a network in-between. With TS, the application and server run very fast without a network in-between: the massive processing power of the application and server work together, and once the data is pulled and the screen rendered, ONLY the screen data comes down the network wire. The result is thus FAR FASTER than running Access on each workstation.
that sometimes, when there are not many users on the TS,
Correct. Since the user's application is running on the server, no network exists between the application and SQL Server. However, since each user has their application running on the server (as opposed to on each workstation), more load and resources are required on the server. If many users are on the server, it now has a big workload, since it has to run SQL Server and also allocate memory and processing for each copy of Access running on it.
A traditional setup means that Access runs on each computer. So the memory and CPU to run Access are used on each workstation - the server does not have to supply Access with CPU and memory; it ONLY runs SQL Server and services requests for data from each workstation. However, because networks are FAR slower than processing data on one computer, your bottleneck is not processing but the VERY limited network speed.

Since with TS both Access and SQL Server and all processing occur on the server, it is far easier to overload the resources and capacity of that server. However, the speed of the network is usually the slowest link in a computer setup. Since all processing and data crunching occurs server side, only the RESULTING screens and display are sent down the network wire. If the software has to process 1 million rows of data and then display ONE total, then only that one total result comes down the network wire to be displayed. If you run Access locally on each workstation and process 1 million rows, then 1 million rows of data must come down the network pipe (although you can modify your Access design to have SQL Server process the data FIRST, before it comes down the network pipe, to avoid this issue).

So with TS, since Access is not running on your computer, you don't have to worry about network traffic - but you MUST STILL worry about how much data Access grabs from SQL Server. Hence the above tips about ONLY loading the data you require into a form: don't load huge data sets into the Access form, but simply ask the user BEFORE you start pulling that data from SQL Server.
I'm not sure how to categorize this question, so let me just explain what I would like and hopefully it will make sense.
I'm after a product (with an API) which I can send different numbers to with tags, and it will take care of all the monitoring/logging stuff.
So for example, say I have a program that downloads a file from a website every 10 seconds. I would like to monitor how long each of these downloads is taking. It is quite easy in my application to time how long it takes. I would now like to send this number and tag (e.g., tag='download time', value = '1.234') to a 3rd party product. The 3rd party product will now store this value/tag for me. The product will have a website I can go to, and configure a bunch of things. So in this example, I could setup an alert like "if 'download time' > 5 send me an email". I could also visit a website, and view a graph of the logged values and maybe some random statistics (e.g., how often the value has been in the warning/error zone).
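To make that concrete, this is roughly what I imagine the client side looking like - purely illustrative, since the endpoint, API key and payload shape are all hypothetical:

# push_metric.py - illustrative only; the endpoint, API key and payload
# shape are hypothetical and just show the kind of call I have in mind.
import json
import time
import urllib.request

METRICS_URL = "https://metrics.example.com/api/v1/points"  # hypothetical
API_KEY = "my-api-key"                                     # hypothetical

def push_metric(tag, value):
    payload = json.dumps({"tag": tag, "value": value}).encode("utf-8")
    req = urllib.request.Request(
        METRICS_URL, data=payload,
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer " + API_KEY})
    urllib.request.urlopen(req, timeout=10)

def timed_download(url):
    start = time.monotonic()
    data = urllib.request.urlopen(url, timeout=30).read()
    push_metric("download time", time.monotonic() - start)
    return data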
I think that's about it. Sure it wouldn't be too hard to do this myself, but I'm no web designer and it'd end up looking pretty ugly. The more user friendly this kind of product is the more willing users will be to look at the data and actually monitor stuff.
Does such a service exist?
EDIT: Products similar to this: http://dashboard.kpilibrary.com/. This is pretty much exactly what I was after, but am still searching around.
There are many monitoring tools out there. Nagios or RHQ (http://rhq-project.org/) come to mind. Most of these tools work a little differently than you describe: rather than you throwing stuff at them, they have plugins that actively go out and do something to take the measurement. In your example, the plugin would download the file and then report the measurement data to the central server, which can then show you graphs or run alerts on it.
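To give a feel for that model, here is a rough sketch of what such a check could look like as a Nagios-style plugin, assuming the usual plugin conventions (exit codes 0/1/2 for OK/WARNING/CRITICAL, performance data after the "|"); the URL and thresholds are placeholders:

#!/usr/bin/env python3
# check_download_time.py - sketch of a Nagios-style check plugin.
import sys
import time
import urllib.request

URL = "https://example.com/file.bin"  # placeholder
WARN_SECONDS = 3.0
CRIT_SECONDS = 5.0

def main():
    start = time.monotonic()
    try:
        urllib.request.urlopen(URL, timeout=30).read()
    except Exception as exc:
        print("CRITICAL - download failed: {}".format(exc))
        return 2
    elapsed = time.monotonic() - start
    perfdata = "download_time={:.3f}s;{};{}".format(
        elapsed, WARN_SECONDS, CRIT_SECONDS)
    if elapsed >= CRIT_SECONDS:
        print("CRITICAL - download took {:.3f}s | {}".format(elapsed, perfdata))
        return 2
    if elapsed >= WARN_SECONDS:
        print("WARNING - download took {:.3f}s | {}".format(elapsed, perfdata))
        return 1
    print("OK - download took {:.3f}s | {}".format(elapsed, perfdata))
    return 0

if __name__ == "__main__":
    sys.exit(main())

The monitoring server schedules the check, stores the performance data, and handles the graphing and alert emails for you.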
On Windows, you can use this:
http://technet.microsoft.com/en-us/library/cc771692%28WS.10%29.aspx
(Windows Performance Monitor)
It pretty much does what you are looking for:
Passively collects performance data (E.g. CPU Usage)
Can be fed App specific performance metrics (E.g. download time)
Can alert you on various thresholds
Has a reporting interface for analyzing metrics
EDIT : http://technet.microsoft.com/en-us/library/cc749249.aspx , more documentation on this.
This answer is specific to Windows.
If you are looking to analyze events from various systems, and you also want the ability to create your own events, you should consider ETW.
The ETW system allows you to consume data events from any number of sub-systems. You can look at an exhaustive list of built in providers by running the following command:
logman query providers
The beauty of ETW is that you also have the opportunity to create your own providers and push your own data into the resulting report. This is a high-performance logging mechanism and is used by Windows itself for many performance investigations.
The resulting report will be an ETL file. This is a standard file that can be viewed using xperf (which ships with the Windows SDK) or with the built-in ETL analyzer, tracerpt.exe.
I'm writing a report and thought you guys could help by providing me with the cost of company support in setting up and training a client on a data integrator for Salesforce. E.g., if someone wants to use Salesforce, but first needs a tool to consolidate and transfer data from back-office systems to Salesforce, how much would that support service cost?
Salesforce actually comes with a very good integration tool called Data Loader. It can be run as an interactive application under Windows or Macintosh, or it can be run as a command-line tool on Windows, Mac or Linux.
In interactive mode, it can import & export CSV files.
In batch mode it can also read data from, and write data to, a database.
For example, I have a Linux server where a daily cron job activates the Data Loader which runs several jobs. Some of these jobs run SQL against a database and upload the resulting data into Salesforce. Other jobs extract from Salesforce (using their SOQL query language, which is SQL-like) and store the information into a database.
Data Loader has a bit of a learning curve for batch mode (mostly around creating some XML configuration files), but the Interactive mode is very easy to use.
So, to answer your question... If it's a one-time data load, just run the interactive version and it's easy. If you want regularly updated data, then use the batch mode. Support costs for operating the integration are really all in the setup. Once it's running, there shouldn't be any ongoing costs unless the data structures change and you want to change the data being transferred. Better yet, if the system is set up by somebody who has done it before, you'll avoid a big learning curve.
If you want a figure to put into your report, then allow 3 days for the initial integration (allows for learning curve) and then a half-day for each additional one. That's generous, but provides extra time to debug problems.
To some degree, it depends on two factors:
Where is the data's source of truth?
How often do you want to sync the data?
If the answers are "it's a weird place and I only need to sync it once," then you probably want to figure out how to get it in CSV form and then use tools built into Salesforce to import it.
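For that one-time route, the export side can be as small as the sketch below; sqlite3 is used purely as a stand-in for whatever back-office database holds the data, and the table, column and file names are hypothetical. The resulting CSV is then fed to the Data Import Wizard or Data Loader.

# export_to_csv.py - minimal sketch of the "get it into CSV" route.
import csv
import sqlite3  # stand-in; a real source would use its own DB-API driver

def export_contacts(db_path, out_path):
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(
            "SELECT first_name, last_name, email FROM contacts")  # hypothetical
        with open(out_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            # The header row should match the field mapping you choose in
            # the Salesforce import tool.
            writer.writerow(["FirstName", "LastName", "Email"])
            writer.writerows(cur)
    finally:
        con.close()

# export_contacts("backoffice.db", "contacts.csv")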
However, if the data lives in a database or data warehouse (postgres, mysql, mongo, redshift, snowflake, big query, etc) and especially if you want to keep Salesforce up to date with that source of truth continuously, then you could look into so-called "Reverse ETL" tools made for this purpose.
Costs depend on the tool chosen and the data volumes and other factors, but here are some options:
Grouparoo is an open source Reverse ETL tool. You can host it yourself for free. Paid plans start at $150/month.
Census is a SaaS Reverse ETL tool. Paid plans start at $300/month.
Hightouch is a SaaS Reverse ETL tool. Paid plans start at $350/month.
I have an ASP.NET application that connects to an Oracle or a SQL Server database. An installer has been developed to install a fresh database onto an existing SQL Server using SQL commands such as "RESTORE DATABASE...", which simply restores a ".bak" file that we keep under source control.
I'm very new to Oracle and our application has only recently been ported to be compatible with 10g.
We are currently using the "exp.exe" tool to generate a ".dmp" file and then using the "imp.exe" to import it into a developers box.
How would you go about creating an "Oracle Database Installer"?
Would you create the database using script files and then populate the database with required default data?
Would you run the "imp.exe" tool behind the scenes?
Do we need to provide a clean interface for system administrators so that they can just select the destination server and be done, or should we just provide them with the ".dmp" file? What are the best practices?
Thanks.
The question is -- what do your customers know about Oracle?
Nothing? You should probably rethink this position. Oracle is very large and complex. If you assume your customers know nothing, you'll end up providing tutorials and help that are inappropriate.
Minimally Competent? If they're competent, they know enough to run imp by themselves. Also, they know enough to run a script that executes SQL.
Actual DBA's? Most organizations that can afford Oracle can afford real DBA's. Real DBA's can cope with a lot of things -- they do not need much hand-holding. Some of them like to assign storage parameters according to their shop standards.
You should provide a script with reasonable defaults. You should define your script in a way that someone can easily find all of your storage parameters and tweak them if necessary.
Your initial data can be via export/import or via a script. I prefer a script.
I have done this repeatedly from both sides (consumer and provider) as a DBA, developer, and architect.
As a provider, one of my grand accomplishments (in 1996) was the creation of an installation CD for a commercial insurance claims management software product targeted to the largest insurance carriers (a multi-million dollar item). That installation CD installed the Oracle 7.2 RDBMS engine, the FileNet optical storage system (scans paper documents and creates cataloged binary versions), and our custom claim-processing application (built in VB 4.0), all integrated and ready to run. As part of the installation process, the user could skip the Oracle software installation or customize it, and the user could customize/override the database configuration in all of its major details (database, schemas, tablespaces, sizes, disks, etc.).
I also provided the field service for this product, which included traveling to the client site as necessary. I tested the installation CD literally hundreds of times under every imaginable scenario that I could replicate, and we NEVER had a field failure that required even a phone call, let alone a trip (I did travel on four occasions, but for pre-sales stuff instead).
More recently (2007), I scripted the creation of an Oracle 10g database for an internal system at a megacorp. In production, the database was sized at 8 TB, mostly for a single transaction table with high data volume. In test, the database was sized around 1 TB for a modest server. In development, the database was sized around 100 MB to run on my laptop. The EXACT SAME SCRIPTS created all three environments, and I could extend them to handle a new environment/machine in about five minutes. This database involved extreme performance tuning, so customization of all pertinent characteristics was absolutely crucial.
Back to the insurance claims processing product--let me please add that I was originally hired to lead its conversion from a SQL Server database to an Oracle database. That conversion was identified as a business necessity because most potential clients did not view a SQL-Server-based product as a professional, serious solution. That is not quite as common today, but it still applies in general: a software product has a better chance of market penetration if it can accommodate multiple database options as preferred by the target customers (especially enterprise-class customers).
Likewise, the installation CD was also viewed as an essential element. However, that situation and many more have revealed to me that most "real" DBAs will not accept an import-based database installation. As a DBA and architect, I know that I definitely will not for the same reasons.
Simply put, an import-based database installation gives the customer almost no control over the resulting database. It is opaque to the customer, leaving them questioning what it did. It forces the customer to expend massive efforts to attempt to exercise what little control they can. It is notoriously fragile and error-prone (Oracle imports are well known for ownership and permission problems, constraint problems, etc.). Weighing all those impacts, an import-based database installation is unprofessional--it does not put the customers' needs first.
Scripting the database installation provides the right kind of transparency, configurability, selective repeatability, and overall customer control that professionalism demands. It also encourages you to properly understand the impacts of your database design decisions in a way that an import does not.
Best wishes.
Personally I favour SQL scripts for database creation and data loads where possible. I tend to use PL/SQL Developer; it has some good options to generate scripts from an existing database. Once you have these, you can run the scripts using sqlplus or any application code that can execute arbitrary SQL (e.g. JDBC with Java). Toad is the more common (and more expensive) tool for Oracle development.
The only limitation of a SQL export is that it can't export CLOB/BLOB fields. If you have those, you either need to do them separately (as a PL/SQL export) or do the whole thing as a PL/SQL export. There's no drama with this, except that the file is effectively a binary export (extension .pde) and is more limited in how you can execute it.
The other big advantage of SQL source files is they can be version controlled easily. It's really handy to be able to create a database environment by running one or two scripts.
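As a sketch of what "running one or two scripts" can look like in practice, here is a small wrapper (Python in this example) that pushes a directory of version-controlled SQL scripts through sqlplus in order; the connection string and directory layout are placeholders, and it assumes sqlplus is on the PATH:

# build_db.py - rough sketch of running versioned SQL scripts in order.
import pathlib
import subprocess
import sys

CONNECT = "app_owner/password@//dbhost:1521/ORCL"  # placeholder
SCRIPT_DIR = pathlib.Path("db/scripts")  # e.g. 001_schema.sql, 002_data.sql

def run_script(script):
    # WHENEVER SQLERROR makes sqlplus exit non-zero on the first failing
    # statement, so the build stops early; -L limits it to one logon attempt.
    sql = "WHENEVER SQLERROR EXIT SQL.SQLCODE\n@{}\nEXIT\n".format(script)
    result = subprocess.run(["sqlplus", "-L", "-S", CONNECT],
                            input=sql, text=True, capture_output=True)
    if result.returncode != 0:
        sys.exit("{} failed:\n{}".format(script.name,
                                         result.stdout + result.stderr))

if __name__ == "__main__":
    for script in sorted(SCRIPT_DIR.glob("*.sql")):
        print("running", script.name)
        run_script(script)

The same set of scripts can then be checked into source control next to the application code and replayed for any new environment.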
The import and export tools for Oracle I think are more applicable for backup and restore operations.
Now, as for delivering that to a customer, from your comments it seems that you'll be giving this to DBAs. Pretty much any Oracle installation will have DBAs involved. They will be fine with SQL scripts to create the schema and do the data load. They will be doing a lot of site-specific configuration (eg tuning the SGA, temp tablespaces, # of concurrent connections, etc based on expected load).
You, as the vendor, can give guidance on any relevant configuration, and you may get involved in support and possibly installation, but ultimately it's up to them to figure out what works for them. Oracle runs on a large number of operating systems and hardware variants, with infinite variations in network topology and firewall configuration. You can't factor all of these into an installer, or even into a set of instructions (other than the guidelines mentioned previously).
The last time I was involved in the creation of an (Oracle) db (for a reasonably large company with in-house DBAs), the DBAs wanted to know things like:
what we wanted to call the db,
what tablespaces we would need, and an estimate of how much data would be in each one
how many users would be connecting.
(From memory) they set up the db and tablespaces, then we provided a combination of simple scripts that they could run (or clear instructions if a task wasn't easy to automate)
As I say, this was for an in-house app, so your mileage may vary, but in my case they wanted all instructions clearly spelt out so that (a) there was no possibility of a misunderstanding leading to the wrong thing being done, and (b) there was no culpability on their part if something didn't work ("we were just following the instructions").