How to delete lines containing a string in specific column - ruby

I have a big CSV file, and I need to delete all rows that contain a certain string ("linux") in the fourth column. Could you advise me how to do it?

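Try the code below. row[3] is the fourth column (indexes are zero-based), and to_s guards against empty (nil) cells:
require 'csv'
csv_lines = CSV.read('path/to/file.csv', headers: true).reject { |row| row[3].to_s.include?('linux') }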
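If you also need the filtered rows back in CSV form, here is a minimal sketch. It assumes the file has a header row; the inline sample data and its column names (id, name, owner, os) are made up for illustration, so swap in CSV.read with your own path.

```ruby
require 'csv'

# Hypothetical sample data; the fourth column (index 3) holds the OS name.
data = <<~CSV
  id,name,owner,os
  1,web01,alice,linux
  2,db01,bob,windows
  3,app01,carol,Arch linux
  4,mail01,dave,macos
CSV

table = CSV.parse(data, headers: true)

# Keep only rows whose fourth field does not contain "linux".
kept = table.reject { |row| row[3].to_s.include?('linux') }

# Rebuild CSV text: header line first, then the surviving rows.
output = CSV.generate do |csv|
  csv << table.headers
  kept.each { |row| csv << row.fields }
end

puts output
```

Note that include? is case-sensitive, so a cell containing "Linux" would be kept; downcase the field first if that matters for your data.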

Related

How do I sort a data range but only return one column? (Sheets)

I have a data range in Google Sheets where I want to sort the data by column B, but only return column A. If it matters, column A is a string, column B is integers.
Using =SORT(A1:B10,2,FALSE) returns both columns A and B, sorted by column B...but I only want it to return column A.
I've also tried:
=QUERY((SORT(A1:B10,2,FALSE)),"select *") <- does exactly the same as sort, tried just for testing
=QUERY((SORT(A1:B10,2,FALSE)),"select col1") <- #value error
=QUERY((SORT(A1:B10,2,FALSE)),"select A") <- #value error (also tried "select A:A" and "select A1:A10")
=QUERY((SORT(A1:B10,2,FALSE)),"select Stat") <- #value error
I've also tried all of the above, but starting with =QUERY(A1:B10,SORT(...
Am I using QUERY wrong? Is SORT not what I want? I could just use SORT in a hidden part of the sheet, then reference the column I want but that feels cheaty, I want to know if there's a way to do what I want to do.
Pass the column you want returned as the first argument, then the column to sort by, then whether the sort is ascending. You can add further sort columns after that; they don't have to be part of the returned range or adjacent to it, but they must have the same number of rows. Try this:
=SORT(A1:A10,B1:B10,FALSE)
Or use:
=INDEX(SORT(A1:B10,2,),,1)

Remove all columns to the right of a specific column

I have an Excel template file with a dynamic number of columns that represent work week dates. Some users have decided to add their own subtotal columns to the right of those columns. I need a way to identify the first blank column, and then truncate that column and all columns following it.
I had previously been using the following script to remove all columns that begin with the word "Column":
// Create a list of columns that start with "Column" and remove them.
Removed_ColumnNum_Columns = Table.RemoveColumns(PreviousStepName, List.Select(Table.ColumnNames(PreviousStepName), each Text.StartsWith(_, "Column") )),
Now that I can find the first ColumnXX column, I want to remove it and all columns after it.
You can use List.PositionOf to get your ColumnIndex instead of parsing text.
I'd put it together like this:
// [...]
ColumnList = Table.ColumnNames(#"Promoted Headers"),
ColumnXX = List.Select(ColumnList, each Text.StartsWith(_, "Column")){0},
ColumnIndex = List.PositionOf(ColumnList, ColumnXX),
ColumnsToKeep = List.FirstN(ColumnList, ColumnIndex),
FinalTable = Table.SelectColumns(#"Promoted Headers", ColumnsToKeep)
Remove Columns after ColumnXX
Find the first column whose name begins with "Column" and delete that column and all columns following it. This parses the XX as the column index, so you need to make sure you haven't deleted any columns before this step. For example, "Column35" needs to be the 35th column at this point in the code.
// Find the first ColumnXX column and remove it and all columns to the right.
ColumnXX = List.Select(Table.ColumnNames(#"Promoted Headers"), each Text.StartsWith(_, "Column")){0},
ColumnIndex = Number.FromText(Text.Middle(ColumnXX, 6,4)),
ColumnListToRemove = List.Range(Table.ColumnNames(#"Promoted Headers"),ColumnIndex-1),
RemovedTrailingColumns = Table.RemoveColumns(#"Promoted Headers", ColumnListToRemove),
To make this more robust, I would prefer a way to identify the column index of ColumnXX without parsing the digits from its name.

Sort CSV data in ascending order by the column number

I want to sort CSV data like this:
"key1","1007829"
"key2","1003196"
"key3","999604"
in ascending order of the number in the second column. So I want a CSV result like:
"key3","999604"
"key2","1003196"
"key1","1007829"
What can I do?
require 'csv'
lines = CSV.read("path/to/file.csv")
sorted_lines = lines.sort_by { |line| line[1].to_i }
This reads the entire CSV into an array of row arrays (one array per line).
It then sorts those row arrays by their second value, converted to an integer.
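To get CSV text back with the same quoting as the input, something like the following should work. It is only a sketch: it parses the three sample rows from the question inline instead of reading a file, and it uses the force_quotes option of CSV.generate so that every field stays wrapped in double quotes.

```ruby
require 'csv'

# Sample rows copied from the question.
input = <<~CSV
  "key1","1007829"
  "key2","1003196"
  "key3","999604"
CSV

rows = CSV.parse(input)

# Sort numerically by the second column; a plain string sort would put "999604" last.
sorted = rows.sort_by { |row| row[1].to_i }

# force_quotes: true keeps every field wrapped in double quotes, as in the input.
result = CSV.generate(force_quotes: true) do |csv|
  sorted.each { |row| csv << row }
end

puts result
```

To write the result to disk instead, File.write with a path of your choice is enough.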

How to split a Webix datatable column into multiple columns?

In my webix datatable, I am showing multiple values in the cells for some columns.
To identify which values belong to which header, I have separated the column headers by a '|' (pipe) and similarly the values under them as well.
Now, in place of delimiting the columns by '|' , I need to split the columns into some editable columns with the same name.
Please refer to this snippet : https://webix.com/snippet/8ce1148e
In this above snippet, for example the Scores column will be split into two more editable columns as Rank and Vote. Similarly for Place column into Type and Name.
The way the values of the first array elements are shown under each of them should remain as is.
How can this be done ?
Thanks
When creating the column configuration for Webix, you can pass an array to the header field of the first column, together with a colspan, like below:
var columns = [];
columns[0] =
{"id":"From", "header":[{"text":"Date","colspan":2},{"text":"From"}]};
columns[1] =
{"id":"To","header":[null, {"text":"To"}]};
columns[0] creates the Date and From headers, and columns[1] creates the To header.

Is there an ISNUMBER() or ISTEXT() equivalent for Power Query?

I have a column with mixed types of Number and Text and am trying to separate them into different columns using an if ... then ... else conditional. Is there an ISNUMBER() or ISTEXT equivalent for power query?
Here is how to check a value's type in Excel Power Query:
IsNumber
=Value.Is(Value.FromText([ColumnOfMixedValues]), type number)
IsText
=Value.Is(Value.FromText([ColumnOfMixedValues]), type text)
hope it helps!
That depends a bit on the nature of the data and how it is originally encoded. Power Query is more strongly typed than Excel.
For example:
Source = Table.FromRecords({[A=1],[A="1"],[A="a"]})
Creates a table with three rows. The first row's data type is number. The second and third rows are both text. But the second row's text could be interpreted as a number.
The following is a query that creates two new columns showing if each row is a text or number type. The first column checks the data type. The second column attempts to guess the data type based on the value. The guessing code assumes everything that isn't a number is text.
Example Code
Edit: Borrowing from @AlejandroLopez-Lago-MSFT's comment for the interpreted type.
let
    Source = Table.FromRecords({[A=1],[A="1"],[A="a"]}),
    #"Added Custom" = Table.AddColumn(Source, "Type", each
        let
            TypeLookup = (inputType as type) as text =>
                Table.FromRecords(
                    {
                        [Type=type text, Value="Text"],
                        [Type=type number, Value="Number"]
                    }
                ){[Type=inputType]}[Value]
        in
            TypeLookup(Value.Type([A]))
    ),
    #"Added Custom 2" = Table.AddColumn(#"Added Custom", "Interpreted Type", each
        let
            result = try Number.From([A]) otherwise "Text",
            resultType = if result = "Text" then "Text" else "Number"
        in
            resultType
    )
in
    #"Added Custom 2"
Put it in logical test format
Value.Type([Column1]) = type number
Value.Type([Column1]) = type text
The function Value.Type returns a type, so comparing it with = against a type yields true or false.
Also, equivalently,
Value.Type([Column1]) = Date.Type
Value.Type([Column1]) = Text.Type
HTH
ISTEXT() doesn't exist in any language I've worked with - typically any numeric or date value can be converted to text so what would be a false result?
For ISNUMBER, I would solve this without any code by changing the Data Type to a number type e.g. Whole Number. Any rows that don't convert will show Error - you can then apply Replace Errors or Remove Errors to handle them.
Use Duplicate Column first if you don't want to disturb the original column.
I agree with Mike Honey.
I have a SKU code that is a mix of Char and Num.
Normally the last 8 Char are Numbers but in some weird circumstances the SKU is repeated with an additional letter but given the same EAN which causes chaos.
By creating a new temp column using Text.End(SKU, 1), I get only the last character. I then convert that column to Whole Number. Any Error rows are then removed, which leaves only the rows I need. Finally I delete the temp column and am left with those rows in the format I started with.
