Removing an entire entry if a line is less than a desired amount - Windows

I have a long list made up of text like this:
Email: example#example.com
Language Spoken: Sample
Points: 52600
Lifetime points: 100000
Country: US
Number: 1234
Gender: Male
Status: Activated
=============================================
I need a way of filtering this list so that only students with more than 52600 points get shown. I am currently looking at solutions for this; I thought maybe Excel would be a start, but I am not too sure and wanted input.

Here's a solution in Excel:
1) Copy Text into Column A
2) In B1 enter "1", then in B2 enter the formula: =IF(LEFT(A1,1)="=",B1+1,B1), then copy that down to the end.
(This splits the text into groups divided by the equal signs)
3) In C1 enter the formula: =IF(LEFT(A1,8)="Points: ",VALUE(RIGHT(A1,LEN(A1)-8)),0), then copy that down to the end.
(Basically this populates the points in column C)
4) In D1 enter the formula: =SUMIF(B:B,B1,C:C), then copy that down to the end.
(This sums the point amounts in column C, grouped by the group numbers in column B)
5) Finally put a filter on Column D, and filter by greater than or equal to the amount desired.
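If the list lives in a plain text file rather than Excel, the same group-and-filter idea can be sketched in a few lines of Python. This is a minimal sketch; the second record and its values are made up for demonstration:

```python
# Hypothetical two-record sample in the same shape as the question's list
text = """Email: example#example.com
Language Spoken: Sample
Points: 52600
Lifetime points: 100000
Country: US
Number: 1234
Gender: Male
Status: Activated
=============================================
Email: example2#example.com
Language Spoken: Sample
Points: 60000
Lifetime points: 120000
Country: US
Number: 5678
Gender: Female
Status: Activated
============================================="""

THRESHOLD = 52600

# Split the text into records on the ===== separator lines
records, current = [], []
for line in text.splitlines():
    if line.startswith("="):
        if current:
            records.append("\n".join(current))
        current = []
    else:
        current.append(line)
if current:
    records.append("\n".join(current))

def points(record):
    # Parse the number on the "Points:" line of one record
    for line in record.splitlines():
        if line.startswith("Points:"):
            return int(line.split(":", 1)[1])
    return 0

# Keep only records with more than THRESHOLD points
kept = [r for r in records if points(r) > THRESHOLD]
```

This mirrors the Excel steps: the separator lines play the role of the column-B group numbers, and the filter keeps whole records rather than single rows.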


Most common "denominators" in a two column list in Google Sheets

How can I find the most commonly found 'Code' (Col B) associated with each unique 'Name' in (Col A) and find the closest value if the 'Code' in Col B is unique?
The image below shows the shared Google Sheet with starting data in columns A & B and the desired output in columns C and D. Each unique name has associated codes. Column D displays the most commonly occurring code for each unique name. For example, Buick La Sabre 1 has 3 associated codes in B3, B4, B5, but D3 shows only 98761 because it appears more frequently than the other 2 codes do in B2:B. I will explain what I mean by the closest value below.
The Codes that have a count = 1 are unique so the output in column D tries to find the closest match.
However, when the count of the code in B2:B > 1, then the output in column D = to the most frequent code associated with the Name.
Approach when there are 2 or more of the same values in column B
Query
I thought I might use a QUERY with an ORDER BY count(B) DESC LIMIT 2 in a fashion similar to this working formula:
QUERY($A$1:$D$25,"SELECT A, B ORDER BY B DESC Limit 2",1)
but I could not get it to work when I substituted in the Count function.
SORT & INDEX OR VLOOKUP
If the query function can't be fixed to work, then I thought another approach might be to combine a Vlookup/Index after sorting column B in a descending order.
UNIQUE(sort($B$3:$B,if(len($B$3:$B),countif($B$3:$B,$B$3:$B),),0,1,1))
Since a VLOOKUP or INDEX using multiple criteria would just pull the first value it finds, after sorting column B by count in descending order the first matching value would be the most frequent one.
Approach when there are fewer than 2 of the same values in column B
This is a little more complicated since the values can be numbers and letters.
A solution like that seen in the image below could be used if everything were a number. In our case the codes will usually be 3 - 5 character alphanumeric codes starting with 0 - 1 letters followed by numbers. I'm not sure what the best way to match a code like A1234 would be. I imagine a solution might be to SPLIT off the letters and try to match those first. For example, A1234 would be split into A | 1234, then matching the closest letter and then the closest number. But I really am not sure what the best solution might be that works within the constraints of Google Sheets.
In the event that a number is equidistant between two numbers, the lower number should be chosen. For example, if 8 is the number and the closest match would be 6 or 10, then 6 should be selected.
In the event that a letter is being used it should work in a similar fashion. For example, thinking of {A, B, C} as {1, 2, 3}, B should preferentially match to A since it comes before C.
In summary, I am looking for a way to find the most frequent code in col B associated with each unique name in col A in this sheet and, in the event where there are none of the same codes in B2:B, a formula that will find the closest match for a number or alphanumeric code.
You can use this formula:
=QUERY({range of numerators & denominators}, "select Col2, count(Col2) group by Col2 label Col2 'Denominator', count(Col2) 'Count'")
That outputs something like this:
Denominator   Count
Den 1         Count 1
Den 2         Count 2
use:
=ARRAY_CONSTRAIN(SORTN(QUERY({A3:B},
"select Col1,Col2,count(Col2)
where Col1 is not null
group by Col1,Col2
order by count(Col2) desc,Col2 asc
label count(Col2)''"), 9^9, 2, 1, 1), 9^9, 2)
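The grouping logic behind these formulas (pick the most frequent code for each name) can be illustrated outside Sheets. A minimal Python sketch with made-up sample data standing in for columns A and B; the closest-match fallback for unique codes is not shown:

```python
from collections import Counter

# Hypothetical (name, code) pairs standing in for columns A and B
pairs = [
    ("Buick La Sabre 1", "98761"),
    ("Buick La Sabre 1", "98761"),
    ("Buick La Sabre 1", "12345"),
    ("Ford Focus", "A1234"),
]

# Group codes by name, counting occurrences of each code
by_name = {}
for name, code in pairs:
    by_name.setdefault(name, Counter())[code] += 1

# For each name, keep the most common code
most_common = {name: counts.most_common(1)[0][0]
               for name, counts in by_name.items()}
```

This is what the `group by Col1,Col2` plus `order by count(Col2) desc` in the QUERY accomplishes: count each (name, code) pair, then take the top code per name.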

RStudio Beginner: Joining tables

So I am doing a project on trip start and end points for a bike sharing program. I have two .csv files - one with the trips, which shows a start and end station ID (e.g. Start at 1, end at 5). I then have another .csv file which contains the lat/lon coordinates for each station number.
How do I join these together? I basically just want to create a lat and lon column alongside my trip data so it's one .csv file ready to be mapped.
I am completely new to R and programming/data in general, so go easy! I realize it's probably super simple. I could do it by hand in Excel, but I have over 100,000 trips so it might take a while...
Thanks in advance!
You should be able to achieve this using just Excel and the VLOOKUP function.
You would need your two CSV files in the same spreadsheet but on different tabs. Your stations would need to be in order of ID (you can order it in Excel if you need to) and then follow the instructions in the video below.
Example use of VLOOKUP.
Hope that helps!
Here is a step-by-step on how to use start and end station ids from one csv, and get the corresponding latitude and longitudes from another.
In technical terms, this shows you how to make use of merge() to find commonalities between two data frames:
Files
Firstly, simple fake data for demonstration purposes:
coordinates.csv:
station_id,lat,lon
1,lat1,lon1
2,lat2,lon2
3,lat3,lon3
4,lat4,lon4
trips.csv:
start,end
1,3
2,4
Import
Start R or RStudio in the same directory containing the CSVs.
Then import the CSVs into two new data frames, trips and coords. In the R console:
> trips = read.csv('trips.csv')
> coords = read.csv('coordinates.csv')
Merges
A first merge can then be used to get start station's coordinates:
> trip_coords = merge(trips, coords, by.x = "start", by.y = "station_id")
by.x = "start" tells R that in the first data set trips, the unique id variable is named start
by.y = "station_id" tells R that in the second data set coords, the unique id variable is named station_id
this is an example of how to merge data frames when the same id variable is named differently in each data set, and you have to explicitly tell R
We check and see trip_coords indeed has combined data, having start, end but also latitude and longitude for the station specified by start:
> head(trip_coords)
start end lat lon
1 1 3 lat1 lon1
2 2 4 lat2 lon2
Next, we want the latitude and longitude for end. We don't need to make a separate data frame, we can use merge() again, and build upon our trip_coords:
> trip_coords = merge(trip_coords, coords, by.x = "end", by.y = "station_id")
Check again:
> head(trip_coords)
end start lat.x lon.x lat.y lon.y
1 3 1 lat1 lon1 lat3 lon3
2 4 2 lat2 lon2 lat4 lon4
the .x and .y suffixes appear because merge() combined two data frames that both had lat and lon columns: data frame 1 was trip_coords, which already had a lat and lon, and data frame 2, coords, also has lat and lon. merge() needs to tell them apart after the merge, so
for data frame 1, aka the original trip_coords, lat and lon are automatically renamed to lat.x and lon.x
for data frame 2, aka coords, lat and lon are automatically renamed to lat.y and lon.y
But now, the default result puts variable end first. We may prefer to see the order start followed by end, so to fix this:
> trip_coords = trip_coords[c(2, 1, 3, 4, 5, 6)]
we re-order and then save the result back into trip_coords
We can check the results:
> head(trip_coords)
start end lat.x lon.x lat.y lon.y
1 1 3 lat1 lon1 lat3 lon3
2 2 4 lat2 lon2 lat4 lon4
Export
> write.csv(trip_coords, file = "trip_coordinates.csv", row.names = FALSE)
saves the CSV
file = sets the file path to save to. In this case just trip_coordinates.csv, so it will appear in the current working dir, where you have the other CSVs
row.names = FALSE skips the first column, which by default is filled with automatic row numbers
You can check the results, for example on Linux, on your command prompt:
$ cat trip_coordinates.csv
"start","end","lat.x","lon.x","lat.y","lon.y"
1,3,"lat1","lon1","lat3","lon3"
2,4,"lat2","lon2","lat4","lon4"
So now you have a method for taking trips.csv, getting lat/lon for each of start and end, and outputting a csv again.
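For comparison only (the question is about R), the same two-merge workflow can be sketched in Python with pandas; the merge semantics, including the suffixing of clashing column names, carry over almost one-to-one:

```python
import pandas as pd

# Same fake data as the CSVs above, built inline
coords = pd.DataFrame({
    "station_id": [1, 2, 3, 4],
    "lat": ["lat1", "lat2", "lat3", "lat4"],
    "lon": ["lon1", "lon2", "lon3", "lon4"],
})
trips = pd.DataFrame({"start": [1, 2], "end": [3, 4]})

# First merge: coordinates of the start station
trip_coords = (trips.merge(coords, left_on="start", right_on="station_id")
                    .drop(columns="station_id"))

# Second merge: coordinates of the end station; pandas adds _x/_y
# suffixes to the clashing lat/lon columns, analogous to R's .x/.y
trip_coords = (trip_coords.merge(coords, left_on="end", right_on="station_id")
                          .drop(columns="station_id"))

# Export without the automatic row index, like row.names = FALSE
trip_coords.to_csv("trip_coordinates.csv", index=False)
```

Note that pandas keeps the left frame's column order, so no manual re-ordering step is needed here.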
Automation
Remember that with R you can automate: write the exact commands you want to run and save them in a myscript.R. Then, if your source data changes and you wish to re-generate the latest trip_coordinates.csv without having to type all those commands again, you have at least two options to run the script.
Within R or the R console you see in rstudio:
> source('myscript.R')
Or, if on the Linux command prompt, use Rscript command:
$ Rscript myscript.R
and the trip_coordinates.csv would be automatically generated.
Further resources
How to Use the merge() Function...: good Venn diagrams of the different joins

Highlighting minimum row value in Pander

I am trying to display a dataframe in an RMarkdown document using the Pander package.
I would like to highlight the minimum value in each row of values. Here's what I have tried:
df <- replicate(4, rnorm(5))
df <- as.data.frame(df)
df$min <- apply(df, 1, min)
emphasize.strong.cells(which(df == df$min, arr.ind = T))
pander(df[1:4])
When I do this I get the error:
Error in check.highlight.parameters(emphasize.strong.cells, nrow(t), ncol(t)) :
Too high number passed for column indexes that should be kept below 6
I can print out the whole table (with the min column) without any trouble or I can print out a partial table without emphasis, but neither of these is ideal. I want the highlighting, but I do not wish to include the 'min' column.
I imagine the fact that I am leaving some highlighted cells out of the pander command is causing the error.
Is there a way around this? Or a better way to do this?
Thanks.
Subquestion: What if I wanted to highlight the minimum in the first few rows and the maximum in the next few. Is that possible in a single table?
Instead of the which lookup, which can match row minimums in the wrong rows, you can easily construct those array indices with a simple sequence (1:N) and a call to which.min on each row, e.g. with apply:
> df <- replicate(4, rnorm(5))
> df <- as.data.frame(df)
> emphasize.strong.cells(cbind(1:nrow(df), apply(df, 1, which.min)))
> pander(df)
----------------------------------------------
V1 V2 V3 V4
----------- ----------- ----------- ----------
0.6802 0.1409 **-0.7992** 0.1997
0.6797 **-0.2212** 1.016 0.6874
2.031 -0.009855 0.3881 **-1.275**
1.376 0.2619 **-2.337** -0.1066
**-0.4541** 1.135 -0.1566 0.2912
----------------------------------------------
About your next question: you could of course do that in a single table, e.g. rbind two matrices created similarly as described above with which.min and which.max.

How to remove values in spreadsheet B based on values in spreadsheet A?

I am working on automating a business process using excel macros in VB and I have it all completed except for one part. I have an inventory sheet that I would like updated upon running the macro. What it would do is search an order file for part numbers, compare those part numbers with the inventory sheet, and then remove inventory quantities within the inventory sheet based on the values found within the order sheet. These are in two separate workbooks. Here is an example of how it looks:
Spreadsheet A - Order Sheet:
A B C
Part #: Description: Quantity
123456 Item 1 1
1234567 Item 2 1
12345678 Item 3 1
Spreadsheet B - Inventory Sheet:
A B C
Part #: Description: Quantity
123456 Item 1 580
1234567 Item 2 790
12345678 Item 3 578
So this program would subtract values in Spreadsheet B - Column C based on the values in Spreadsheet A - Column C and Column A
In the order sheet even if a customer orders multiple items it shows each purchase as a separate item, so this program would only need to remove quantities of one at a time.
I'm rather new to this type of Excel Automation so any input would be greatly appreciated. I've been looking into Vlookup but from what I understand it only looks for information and displays existing values.
If the idea is to remove the "Orders" from the "Inventory" every time you run the macro, the right thing to do, in words, is: for each line in "Orders", search for the corresponding item in the inventory and subtract the quantity.
In code, it's as easy as in words:
For j = 2 To Sheets("Orders").Range("A2").End(xlDown).Row
    For k = 2 To Sheets("Inventory").Range("A2").End(xlDown).Row
        If Sheets("Orders").Range("A" & j).Value = Sheets("Inventory").Range("A" & k).Value Then '<-- when the object is found
            Sheets("Inventory").Range("C" & k).Value = Sheets("Inventory").Range("C" & k).Value - Sheets("Orders").Range("C" & j).Value '<-- subtract order's value
            Exit For '<-- you don't need to loop any further after having found the object
        End If
    Next k
Next j
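The subtraction logic itself, stripped of the worksheet plumbing, can be sketched in Python with made-up part numbers (a dictionary lookup replaces the inner loop over inventory rows):

```python
# Hypothetical data mirroring the two sheets: part number -> quantity
inventory = {"123456": 580, "1234567": 790, "12345678": 578}

# Each order line is (part number, quantity); one line per purchased item
orders = [("123456", 1), ("1234567", 1), ("12345678", 1)]

# For each order line, subtract its quantity from the matching inventory entry
for part, qty in orders:
    if part in inventory:
        inventory[part] -= qty
```

Since the order sheet lists each purchase as a separate line with quantity 1, processing the lines one at a time is exactly what the VBA loop above does as well.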
Press Alt+F11,
right-click the project on the left and insert a module.
Type into the main code pane:
Public Sub UpdateInventory()
    'place some code like
    For n1 = 0 To 1000
        For n2 = 0 To 100
            InventoryItemCode = Sheets("Inventory").Range("A1").Offset(n1, 0).Value
            OrderCode = Sheets("Orders").Range("A1").Offset(n2, 0).Value
            If InventoryItemCode = OrderCode Then
                'etc....
            End If
        Next n2
    Next n1
End Sub
See Google for troubleshooting.

separate a row of strings into separate columns using ruby

I am trying to manipulate a CSV file using Ruby, separating a row of strings into separate columns: starting with 'Part#' to create a column, then moving past the comma to 'Quantity' to create a second column next to it, and so on... I anticipate that I will need to utilize the split method to create an array. Is this the best method, and how would I paste the array into Excel so that it creates rows?
I would like the same thing to happen for the rows below the header containing the actual data where it separates into S-001, 1, [Mela] etc.
Here is a sample of the csv:
Sheet Goods
Part#,Quantity,Description,Length(L),Width(W),Thickness(T),Square Foot (per),Square Foot (total),Total Length (Feet),Material,
S-001,1, [Mela] Fridge Sides, 30",12",0 5/8",2.5,2.5,2.5,Not assigned,
S-002,1, [Mela] Fridge Sides#1,30",12",0 5/8",2.5,2.5,2.5,Not assigned,
S-003,1, [Mela] Fridge TB,32 1/4", 30",0 5/8",6.72,6.72,2.69,Not assigned,
S-004,1, [Mela] Fridge TB#1,32 1/4", 30",0 5/8",6.72,6.72,2.69,Not assigned,
S-005,1, [Mela] Fridge back,32 3/4",11 1/4",0 5/8",2.56,2.56,2.73,Not assigned,
Any help would be appreciated!
EDIT:
This is what the data should look like by the time it's done:
Sheet Goods
Part# Quantity Description Length (L) Thickness (T) Square Foot (per) Square Foot (total) Total Length (Feet) Material
S-001 1 [Mela] Fridge Sides 30 5/8 2.5 2.5 2.5 Not assigned
Where the commas are removed and the data between the commas are put into separate columns.
Mark
First, use the library made for the task: CSV. Secondly, it's pretty handy to have the rows indexed by column name (and not by a meaningless number). An example (where you'd get all widths):
require 'csv'
rows = CSV.open("data.csv")
name, headers = rows.take(2)
widths = rows.map do |row_values|
  row = Hash[headers.zip(row_values)]
  # here goes your specific processing
  row["Width(W)"]
end
As noted by Jim, your text is not valid CSV, double quotes are reserved.
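Those inch marks (30") are what breaks strict parsing. One workaround, sketched here in Python for illustration, is to disable quote handling entirely so the double quotes are treated as ordinary characters; Ruby's CSV library offers a similar escape hatch with the liberal_parsing: true option:

```python
import csv
import io

# Two sample lines from the question, with unescaped inch marks
data = '''S-001,1, [Mela] Fridge Sides, 30",12",0 5/8",2.5,2.5,2.5,Not assigned,
S-002,1, [Mela] Fridge Sides#1,30",12",0 5/8",2.5,2.5,2.5,Not assigned,'''

# quoting=csv.QUOTE_NONE treats " as a normal character instead of a quote
rows = list(csv.reader(io.StringIO(data), quoting=csv.QUOTE_NONE))
```

With quoting disabled, each line splits cleanly on the commas and the inch marks survive inside the field values.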
