Can I use big data in e-commerce (Magento)? We have a custom website that extracts data from the DB and displays it as a report, but because of the large data volume the request times out. What can I do?
Yes, you can use big data in e-commerce.
But I don't think the problem you're solving is related to big data. Try generating the report on the server side (batch it and store it in a table) and use pagination to display the report on the website. If you have to generate the report on user action, try using materialized views or a stored procedure to speed up generation.
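If it helps, here's a minimal sketch of that batch-and-paginate idea in Python, assuming a MySQL backend (the usual choice for Magento) and made-up table/column names; adjust to your schema:

```python
# Sketch: pre-aggregate the heavy report into a summary table on a schedule,
# then serve it to the website with cheap paginated queries.
# Assumes MySQL (typical for Magento) and hypothetical table/column names.
import pymysql

conn = pymysql.connect(host="localhost", user="report", password="...", database="magento")

def rebuild_report():
    """Run this from a cron job / scheduler, not from the web request."""
    with conn.cursor() as cur:
        cur.execute("DROP TABLE IF EXISTS report_daily_sales")
        cur.execute("""
            CREATE TABLE report_daily_sales AS
            SELECT DATE(created_at) AS day,
                   sku,
                   SUM(qty_ordered) AS qty,
                   SUM(row_total)   AS revenue
            FROM sales_order_item
            GROUP BY DATE(created_at), sku
        """)
    conn.commit()

def report_page(page, page_size=50):
    """The web request only touches the small pre-built table, one page at a time."""
    offset = (page - 1) * page_size
    with conn.cursor() as cur:
        cur.execute(
            "SELECT day, sku, qty, revenue FROM report_daily_sales "
            "ORDER BY day DESC LIMIT %s OFFSET %s",
            (page_size, offset),
        )
        return cur.fetchall()
```

The point is that the expensive aggregation runs once in the background, so the web request never has to scan the large tables and cannot time out on them.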
Yes, you can use a big data platform such as Hadoop or a NoSQL store, whichever best fits your use case. In my work experience so far, people have used a Hadoop back end with BI tools like Tableau and QlikView.
Hope this will help.
We're considering Snowflake and want to understand how we could use it, and possibly other tools, to overcome one of our main problems - ETL! We currently use a legacy DWH with an ETL process consisting of SSIS and some views. This has all the common pitfalls of this methodology - most notably that it takes ages!
I was under the assumption that we'd move to an ELT model in Snowflake, so I started to research tools to do the 'T' part of it. However, I've just been listening to this podcast: https://www.dataengineeringpodcast.com/snowflakedb-cloud-data-warehouse-episode-110/
And it's suggesting that just slapping a SQL view over something and exposing it in, say, Power BI or Tableau is enough for the T part of things!...
Just wondering what people's experience was here?
- Do you do transformations just by writing a view in Snowflake?
- Do you use a third party tool specifically to address this need?
Secondary to this, for the Extraction and Loading, do you:
- Do this using Snowflake only
- Use a third party tool
I'm specifically interested in whether you do this to create some kind of time series in Snowflake from a non-time-series source. That's something we'd be keen to do.
This question is hard to answer without sounding opinionated, especially not knowing your use case. For what it's worth here is what I think:
Don't stick views on top of your tables and expose them to a reporting tool unless you have a very, very simple setup. If you're considering a tool like Snowflake then you will probably want to go for something more sustainable; this approach can become prohibitive in terms of cost and the complexity of your views.
Use a third-party tool to manage your ELT process. Your choice of tool will depend on your internal skills and cloud strategy; have a look at the tools out there like Stitch, Fivetran, etc. If you don't mind having on-premises technologies, why not stick with SSIS or use something like Apache Airflow (requires up-skilling)?
Snowflake will not help you with the E of ELT; you will need a third-party tool (such as SSIS) to manage the extraction of data from your other systems. It will help with the L part: for this you can use Snowpipe or the COPY command, which are available within the Snowflake ecosystem. Snowflake will also help you share your data with external parties, which is really nice.
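For the L part, a minimal sketch of staging an extracted file and running COPY INTO with the snowflake-connector-python package (the stage, table and file names here are made up):

```python
# Sketch: load a CSV extracted from a source system into Snowflake.
# Table, stage and file names are hypothetical; credentials come from your own config.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="loader", password="...",
    warehouse="LOAD_WH", database="RAW", schema="SALES",
)
cur = conn.cursor()

# Push the extracted file to the table's internal stage, then COPY it in.
cur.execute("PUT file:///tmp/orders_2024_01.csv @%ORDERS_RAW")
cur.execute("""
    COPY INTO ORDERS_RAW
    FROM @%ORDERS_RAW
    FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
""")
conn.close()
```

Snowpipe automates the same COPY pattern for files landing continuously in cloud storage, which is usually the better fit once the pipeline is in production.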
My organization has created a fairly complicated dimensional model in Snowflake using layers of SQL views, against which we can point our reporting tools. We use a separate replication tool for extraction from source systems and loading into Snowflake. Using views simplifies our approach in that we don't need to use an additional tool. It also makes managing the code easier than something like SSIS. For instance, we can search for code using the Snowflake interface or our version control tool instead of having to open individual SSIS packages.
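As a rough illustration of that layering (view and column names invented), the 'T' is just view DDL kept in version control and applied to Snowflake:

```python
# Sketch: transformations expressed as layered views, applied from version-controlled SQL.
# All object and column names are invented for illustration.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="deployer", password="...",
    database="ANALYTICS", schema="REPORTING",
)
cur = conn.cursor()

# Staging layer: light cleanup over the raw, replicated table.
cur.execute("""
    CREATE OR REPLACE VIEW STG_ORDERS AS
    SELECT order_id, customer_id, TO_DATE(order_ts) AS order_date, amount
    FROM RAW.SALES.ORDERS_RAW
    WHERE order_id IS NOT NULL
""")

# Dimensional layer: what the reporting tools actually point at.
cur.execute("""
    CREATE OR REPLACE VIEW FCT_DAILY_SALES AS
    SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM STG_ORDERS
    GROUP BY order_date
""")
conn.close()
```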
I want to do the data integration part with Talend, and the reporting and dashboard work will be done in MicroStrategy. How can I connect them?
Is ODBC or some other kind of process possible?
As I understand it, you want an ETL tool (Talend) to prepare data for presentation; the only thing that isn't clear to me is why you want to integrate the two tools directly.
They are two separate processes, so you can land the data processed by the ETL tool (Talend or any other) in a database, and have that database be the source for whichever BI tool you prefer.
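In practice the "connection" is simply both tools pointing at the same database: Talend loads it, MicroStrategy reads it over ODBC. A quick sanity check of such a DSN from Python (the DSN name and table are made up) might look like:

```python
# Sketch: verify the warehouse DB that Talend loads and MicroStrategy reads from.
# The DSN name and table are hypothetical; configure the DSN in your ODBC manager first.
import pyodbc

conn = pyodbc.connect("DSN=warehouse_dsn;UID=report_user;PWD=...")
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM sales_fact")  # table populated by the Talend job
print("rows available to MicroStrategy:", cur.fetchone()[0])
conn.close()
```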
Regards,
Alejandro
I've spent a lot of time reading and watching videos of people talking about how they use tools designed for handling huge datasets and real-time processing in their architectures. And while I understand what it is that tools like Hadoop/Cassandra/Kafka etc do, no one seems to explain how the data gets from these large processing tools to rendering something on a client/webpage.
From what I understand of big data tools, you can't build your application the same way you would a standard web app querying MySQL, which makes sense given the size of the data that flows through these tools. However, for all this talk of "realtime data analytics", I cannot find any explanation of how the actual analytics get put in front of someone as a chart, table, etc.
> explain how the data gets from these large processing tools to rendering something on a client/webpage.
With respect to this, one way would be to process the big data using Spark or Hadoop and store the results in an RDBMS. Then have your web app pull data from the RDBMS to render charts, tables, etc. I can provide examples that I have done myself if you need more information.
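For example, a minimal PySpark sketch that aggregates a large dataset and writes the small result to MySQL over JDBC (paths, table names and credentials are placeholders, and the MySQL JDBC driver must be on the Spark classpath):

```python
# Sketch: crunch the big dataset with Spark, store only the aggregated result in an RDBMS,
# and let the web app chart from the RDBMS. Paths, table names and credentials are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-rollup").getOrCreate()

events = spark.read.parquet("hdfs:///data/events/")          # the "big" side
daily = (events
         .groupBy(F.to_date("event_ts").alias("day"), "event_type")
         .agg(F.count("*").alias("events")))

# The rolled-up result is small enough for the web app to query directly.
(daily.write
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/analytics")
      .option("dbtable", "daily_event_counts")
      .option("user", "webapp")
      .option("password", "...")
      .mode("overwrite")
      .save())
```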
Impala supports ODBC/JDBC interfaces. So, you actually could hook up a web app to it the same way you do with MySQL.
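For instance, with the impyla package you can query Impala from Python much like you would MySQL (host, port and table are placeholders):

```python
# Sketch: query Impala over its DB-API interface, the way a MySQL-backed page would query MySQL.
# Host, port and table name are placeholders.
from impala.dbapi import connect

conn = connect(host="impala-host", port=21050)
cur = conn.cursor()
cur.execute("SELECT event_type, COUNT(*) AS events FROM events GROUP BY event_type")
for event_type, events in cur.fetchall():
    print(event_type, events)
conn.close()
```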
Other stuff you might want to check out is HBase, Kudu or Solr. In some realtime architectures data ends up in one of those. And all of them have some sort of an API that you can use in your web app to access their data.
If you want a simple solution for realtime data processing and analytics, check out the new Stride API, which enables developers to collect, process, and analyze streaming data and then either visualize summary data in Stride or push processed data out to applications in realtime. This is a very easy way to build the kind of realtime reporting dashboards and monitoring / alerting systems you described above.
Take a look at the Stride API technical docs for examples and more info on how to implement this.
I am optimizing Postgres with Ruby on Rails. For the last few days my site has been loading slowly. The application uses various queries joining 3-4 tables to fetch data.
Could you advise what I need to do to improve the performance of the application at the database level?
Have a look here. You need to capture and look at all the activity and go from there. The link below provides an open source tool to do that.
http://dalibo.github.io/pgbadger/
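Separately from pgBadger's log analysis, a quick way to capture what is running right now is the pg_stat_activity view; a minimal sketch with psycopg2 (connection details are placeholders):

```python
# Sketch: snapshot currently running queries and how long they have been running.
# Connection parameters are placeholders.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="myapp_production", user="postgres", password="...")
cur = conn.cursor()
cur.execute("""
    SELECT pid, now() - query_start AS runtime, state, query
    FROM pg_stat_activity
    WHERE state <> 'idle'
    ORDER BY runtime DESC
""")
for pid, runtime, state, query in cur.fetchall():
    print(pid, runtime, state, query[:80])
conn.close()
```

The long-running joins this surfaces are the ones worth running EXPLAIN ANALYZE on and indexing for.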
I am working on a social networking web-based application that uses the Apache web server and a MySQL database with the CodeIgniter MVC framework. I don't know how to integrate Hadoop into this application or how to write a MapReduce program.
Hadoop and map-reduce have no direct relationship to web applications. You should not integrate Hadoop into a web application, as long as by "web application" you mean something that responds (quickly) to user input (web requests).
Hadoop and map-reduce are very useful for algorithms that run on large datasets in order to transform/extract data/knowledge from those datasets.
While it is true that Hadoop is nowadays mostly used for "offline analytics", it can be useful to web projects as well. For example, to pre-compute recommendations or suggestions that are then provided to the users of a website.
Another use case is running ETL over multiple data sources to produce an inverted index for a website (for example, jobs/cars/rentals-style websites with huge amounts of input data).
Always think of Hadoop when you have a "Big Data" problem, not if your website is managing small amounts of data.
Using Hadoop to tackle this sort of problem has some advantages and disadvantages. The obvious advantage is that it makes any sort of batch process (like the examples I mentioned) scale transparently. The disadvantage is that it isn't real-time: you can't use Hadoop to update your website every 5 seconds.
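As a concrete, if simplified, sketch of that pre-compute pattern: a Hadoop Streaming job in Python that counts views per item from web logs, whose output you would then bulk-load into the site's MySQL database. The log format (item id in the third field) and paths are assumptions:

```python
#!/usr/bin/env python
# mapper.py - emit (item_id, 1) for every page-view log line.
# Assumes a log format where the viewed item id is the third whitespace-separated field.
import sys

for line in sys.stdin:
    fields = line.split()
    if len(fields) >= 3:
        print("%s\t1" % fields[2])
```

```python
#!/usr/bin/env python
# reducer.py - sum the counts per item id (input arrives sorted by key, as Hadoop guarantees).
import sys

current_item, count = None, 0
for line in sys.stdin:
    item, value = line.rstrip("\n").split("\t")
    if item == current_item:
        count += int(value)
    else:
        if current_item is not None:
            print("%s\t%d" % (current_item, count))
        current_item, count = item, int(value)
if current_item is not None:
    print("%s\t%d" % (current_item, count))
```

You would run this with the streaming jar that ships with Hadoop, along the lines of: hadoop jar hadoop-streaming-*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /logs/views -output /reports/item_views (the exact jar path depends on your distribution). The web application never calls Hadoop; it only reads the small pre-computed result.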
I think Hadoop can have two "classic" usages for the social network style of applications.
The first is using HBase to store messaging and other dynamic information. Storing user profiles in HBase can also be considered, in order to completely replace MySQL with this kind of NoSQL solution.
The second is using Hadoop MapReduce for analysis of your network. A good example of such analysis is generating friend suggestions.
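For the first (HBase) usage, a minimal sketch with the happybase library, going through the HBase Thrift gateway (host, table and column names are assumptions):

```python
# Sketch: store and read user messages in HBase via the Thrift gateway.
# Host, table name and column family are hypothetical.
import happybase

conn = happybase.Connection("hbase-thrift-host")
table = conn.table("messages")

# Row key: user id plus a (reversed) timestamp is a common pattern for "latest first" scans.
table.put(b"user42_9999999999", {b"msg:from": b"user7", b"msg:body": b"hello!"})

# Fetch the latest messages for user42.
for key, data in table.scan(row_prefix=b"user42_", limit=20):
    print(key, data[b"msg:body"])
conn.close()
```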
Yes, it is possible to make a web application using Apache Hadoop as a back end.
You can build it using Apache Hive and Pig, writing custom mappers and reducers and using them as UDFs, but in my personal experience it is slow. If you have relatively little data, it is better to use another database and do the analytics there. I prefer Spark as the solution for better response time.
Use Hadoop to analyse your data and load the results into your MySQL database. Then use that with your web application.
In your web application you can get the required data from Hadoop (like job results) using its REST services: https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
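For instance, a minimal sketch that polls the YARN ResourceManager REST API for the state of a submitted job (the ResourceManager host and the application id are placeholders):

```python
# Sketch: check the status of a Hadoop/YARN application from the web tier via REST.
# ResourceManager host/port and the application id are placeholders.
import requests

RM = "http://resourcemanager-host:8088"
app_id = "application_1700000000000_0001"

resp = requests.get("%s/ws/v1/cluster/apps/%s" % (RM, app_id), timeout=5)
resp.raise_for_status()
app = resp.json()["app"]
print(app["state"], app["finalStatus"], app["progress"])
```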