Appropriate data structure to read this file - logic

I have the following info in a text file.
Item Rate
pencil 2
eraser 1
laser 3
pencil 1
torch 4
eraser 1
Specifically, I want to know if any item in the above list has a different price.
For eg: In the above one, you can see that pencil has 2 rates ie 2 and 1.
The price of the eraser is same in both entries, so no problem.
Further complexities - The text file is very huge.
Since dicts don't allow us to store duplicate keys, please suggest ways to solve this problem along with appropriate data structure.

You Can use Hash Table with Separate Chaining Method.Hope it will works

Does the file have to be plain text ? I recommend tackling this problem by using XML format and parsing it with SAX (not DOM !). SAX will not load the entire file in the memory, so it works well with huge file sizes.
As for the data structure, you could always define your own or you could just use something like this Map<KeyType, List<ValueType>>. I feel it's counter-intuitive to have different prices mapped for the same product name. You could create a unique ID for every type of product and have a new field: quantity.

Related

Filter Data for Each Row in a Column

EVE Online Manufacturing Spreadsheet
In Batch!F3:G, I'm attempting to break down the data input from columns B3:C to their components (and eventually materials/minerals in I3:J) by using filter to compare results in Engine!P:R. Multiplied of course by the total number of each finished product I need.
I've been trying to figure out ways to arrayformula this together, and even tried quite a few query functions without success. The best I've been able to come up with is to string the actual formula together, appending them with {}, but this gets bloated quickly. I need this to be open ended because I have a tendency to build a lot of things at once. Any help would be appreciated, even just point me in the right direction!
Well, based on my limited knowledge about google sheet, I can only think of one way to do this automatically.
Here's a sheet I constructed based on your sheet.
https://docs.google.com/spreadsheets/d/1AfX8o05gUGPiN5S90w4o0yxuIYjsJRaXsaYUFTJuEPo/edit?usp=sharing
First, on Engine sheet, add one more column which will give you the number of materials required for that part, which is looked up in the PART LIST of BATCH sheet. For this I use VLOOKUP, as you see in D2.
Then on BATCH sheet, query the materials that VLOOKUP return positive, multiply it by the amount of item and then sum them.
This is done by the QUERY used in F3
This method only if you don't have duplicate item in your PART LIST, due to the way VLOOKUP work.
Of course if you want to break the material list further, you can do the same approach..

Does Spotfire provide any simple way of creating an "Other" category to group entries within a filter?

Right now, I am using a filtering scheme which only looks at the data of the 5 or 6 most common entries in the 'Clinic' field. But, there are a handful of other possibilities which might account for a few rows each. They are too inconsequential to include on their own (I am using pie charts and bar charts), but I would like these rows to be accounted for. For this reason, I would like to create an "Other" category which groups these entries together. What is the best way of doing this? I know I can create a calculated column that groups everything aside from the top 5 or 6 in an other category, but I thought there might be a way to keep working with the original column and achieve the same result.
Unfortunately not. In 6.5.x you will have to write a case statement that will specify everything that is not most common to other.
In 7.0.x you can go to insert binned column. Add the bottom you can use values to create a bin. Add the values you want to the bin and call them "Other". Of course if you look at the column created like this, it is a case statement. But it is a whole lot faster than writing it yourself.
Following phiver I came out with this solution in Spotfire 6.5.2:
Add a calculated column
With something like this:
If(DenseRank(Count([Formation]) OVER ([Formation]),"desc")<10, DenseRank(Count([Formation]) OVER ([Formation]),"desc") & " " & [Formation], "10 Other")
Hope that helps.

Comparison of alphanumeric data in two columns, Excel VBA

I got a simple sheet in which I got some columns of data representing certain grid-line i.e. A-25, B-35, C-41 etc and I got another column in which the data is like B-25.5, C-26.5 etc. I want to compare the both columns and know for each item that which item is bigger or smaller (grid-line sense) than the other. e.g. C-26.5 will surely be lesser than C-41. I need a simple algorithm or code to achieve the same.

Reading XML-files with StAX / Kettle (Pentaho)

I'm doing an ETL-process with Pentaho (Spoon / Kettle) where I'd like to read XML-file and store element values to db.
This works just fine with "Get data from XML" -component...but the XML file is quite big, several giga bytes, and there fore reading the file takes too long.
Pentaho Wiki says:
The existing Get Data from XML step is easier to use but uses DOM
parsers that need in memory processing and even the purging of parts
of the file is not sufficient when these parts are very big.
The XML Input Stream (StAX) step uses a completely different approach
to solve use cases with very big and complex data stuctures and the
need for very fast data loads...
There fore I'm now trying to do the same with StAX, but it just doesn't seem to work out like planned. I'm testing this with XML-file which only has one element group. The file is read and then mapped/inserted to table...but now I get multiple rows to table where all the values are "undefined" and some rows where I have the right values. In total I have 92 rows in the table, even though it should only have one row.
Flow goes like:
1) read with StAX
2) Modified Java Script Value
3) Output to DB
At step 2) I'm doing as follow:
var id;
if ( xml_data_type_description.equals("CHARACTERS") &&
xml_path.equals("/labels/label/id") ) {
id = xml_data_value; }
...
I'm using positional-staz.zip from http://forums.pentaho.com/showthread.php?83480-XPath-in-Get-data-from-XML-tool&p=261230#post261230 as an example.
How to use StAX for reading XML-file and storing the element values to DB?
I've been trying to look for examples but haven't found much. The above example uses "Filter Rows" -component before inserting the rows. I don't quite understand why it's being used, can't I just map the values I need? It might be that this problem occurs because I don't use, or know how to use, Filter Rows -component.
Cheers!
I posted a possible StAX-based solution on the forum listed above, but I'll post the gist of it here since it is awaiting moderator approval.
Using the StAX parser, you can select just those elements that you care about, namely those with a data type of CHARACTERS. For the forum example, you basically need to denormalize the rows in sets of 4 (EXPR, EXCH, DATE, ASK). To do this you add the row number to the stream (using an Add Sequence step) then use a Calculator to determine a "bucket number" = INT((rownum-1)/4). This will give you a grouping field for a Row Denormaliser step.
When the post is approved, you'll see a link to a transformation that uses StAX and the method I describe above.
Is this what you're looking for? If not please let me know where I misunderstood and maybe I can help.

a bit of a string matching conundrum in excel-vba

i'm writing a program at work for a categorizing issue.
i get data in the form of CODE, DESCRIPTION, SUB-TOTAL for example:
LIQ013 COGNAC 25
LIQ023 VODKA 21
FD0001 PRETZELS 10
PP0502 NAPKINS 5
Now it all generally follows something like this...the problem is my company supplies numerous different bars. So there are like 800 records a month with data like this. My boss wants to breakdown the data so she knows how much we spend on a certain category each month. For example:
ALCOHOL 46
FOOD 10
PAPER 5
What I've thought of is I setup a sort of "data-base" which is really a csv text file that contains entries like this:
LIQ,COGNAC,ALCOHOL
LIQ,VODKA,ALCOHOL
FD,PRETZELS,FOOD
FD,POPCORN,FOOD
I've already written code that imports the database as a worksheet and separates each field into its own column. I want excel to look through the file and when it sees LIQ and COGNAC to assign it the ALCOHOL designator. That way I can use a pivot table to get the category sums. For example I want the final product to look like this:
LIQ013 COGNAC 25 ALCOHOL
LIQ023 VODKA 21 ALCOHOL
FD0001 PRETZELS 10 FOOD
PP0502 NAPKINS 5 PAPER
Does anyone have any suggestions? My worry is that a single point expression match to JUST the code i.e. just to LIQ without a match to COGNAC as well would maybe result in problems later when there are conflicting descriptions? I'd also like the user to be able to add ledger entries so that the database of recognized terms grows and becomes more expansive and hopefully more accurate.
EDIT
as per #Marc 's request i'm including my solution:
code file
please note that this is a pretty dumb-ed down solution. i removed a bunch of the fail-safes and other bits of code that were relevant to a robust code but not to our particular solution.
in order to get this to work there are two parts:
the first is the macro source code
the second is the actual file
because all the fail-safes are removed, the file needs to be imported to excel exactly the way it appears. i.e. Sheet1 on the googleDoc should be Sheet1 on the excel, start pasting data at cell "A1". before the macro is run, be sure to select cell "A1" in Sheet1. as i said, there are implementations in the finished product to make it more user friendly! enjoy!
EDIT2
These links suck. They don't paste well into excel.
If your comfortable with it I can email you the actual workbook. Which would help in preserving the formatting etc.
Use a lookup table in a separate sheet. Column A of the lookup sheet contains the lookup value (e.g. PRETZELS), Column B contains the category (FOOD, ALCOHOL, etc). In the cells where you want the category to show up in your original sheet (let's use D3 for the result where B3 holds the "PRETZELS" value), type this formula:
=VLOOKUP(B3,OtherSheet!$A$1:$B$500,2,FALSE)
That assumes that your lookup table is in range A1:B500 of a worksheet named "OtherSheet".
This formula tells Excel to find the lookup value (B3) in column A of your lookup and return the corresponding value from column B of your lookup table. Absolute references (the $) ensure that your formula won't increment cell references when you copy/paste the formula in other cells.
When you get new categories and/or inventory, you can update your lookup table in this one place by just adding new rows to it.

Resources