I have not managed to get a simple three-level nested list working with reStructuredText:
$ cat test.rst
Title
======
- aaaa
- aaaa2
- aaaa2
- aaaa3
- aaaa
- aaaa
Ok
$ rst2html test.rst > /tmp/a.html
test.rst:7: (ERROR/3) Unexpected indentation.
$
I've tried different combinations of spaces in front of aaaa3, but in all cases I get (ERROR/3) Unexpected indentation.
Nested lists are tricky: each nested level must be separated from the parent list's items by blank lines, and indented to line up with the parent item's text.
This should work:
- aaaa

  - aaaa2
  - aaaa2

    - aaaa3

- aaaa
- aaaa
I have a .csv file of character strings (about 5,400) that appear, along with many other strings, in a large .txt file of a huge corpus. I need to count the occurrences of each of the 5,400 strings in the .txt corpus file. I'm using the shell (on a MacBook Pro), and I don't know how to write a for loop that takes its input from one file and then works on another file. The input_file.csv looks like this:
A_back
A_bill
A_boy
A_businessman
A_caress
A_chat
A_con
A_concur
A_cool
A_cousin
A_discredit
A_doctor
A_drone_AP_on
A_fellow
A_flatter
A_friend
A_gay
A_giddy
A_guilty
A_harangue
A_ignore
A_indulge
A_interested
A_kind
A_laugh
A_laugh_AP_at
...
The corpus_file.txt I'm searching through is a cleaned and lemmatized corpus with one sentence per line; these are 4 lines of the text:
A_recently N_pennsylvania N_state_N_university V_launch a N_program that V_pay A_black N_student AP_for V_improve their N_grade a N_c AP_to N_c A_average V_bring 550 and N_anything A_high V_bring 1,100
A_here V_be the N_sort AP_of A_guilty N_kindness that V_kill
what N_kind AP_of N_self_N_respect V_be a A_black N_student V_go AP_to V_have AP_as PR_he or PR_she V_reach AP_out AP_to V_take 550 AP_for N_c N_work A_when A_many A_white N_student V_would V_be V_embarrass AP_by A_so A_average a N_performance
A_white N_student V_would V_be V_embarrass AP_by A_so A_average a N_performance
I am looking to count exactly how many times each of the strings in input_file.csv appears in corpus_file.txt. I can do one at a time with the following code:
grep -c A_guilty corpus_file.txt
And in a few seconds I get a count of how many times A_guilty appears in corpus_file.txt (it appears once in the bit of the corpus I have put here). However, I don't want to do that 5,400 times, so I'm trying to put it into a loop that will output each character string and its count.
I have tried to run the code below:
for input_file.csv in directory/path/folder/ do grep -c corpus_file.txt done
But it doesn't work. input_file.csv and corpus_file.txt are both in the same folder, so they share the same directory path.
I'm hoping to end up with a list of the 5,400 character strings and the number of times each string appears in the large corpus_file.txt file. Something like this:
term - count
A_back - 2093
A_bill - 873
A_boy - 1877
A_businessman - 148
A_caress - 97
A_chat - 208
A_con - 633
This might be all you need:
$ cat words
sweet_talk
white_man
hispanic_american
$ cat corpus
foo
sweet_talk
bar
hispanic_american
sweet_talk
$ grep -Fowf words corpus | sort | uniq -c
1 hispanic_american
2 sweet_talk
If not, then edit your question to clarify your requirements and provide more truly representative sample input/output.
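If you also need a line for words that never occur (the grep -Fowf | sort | uniq -c pipeline silently drops them), one option is a loop over the word list. A minimal sketch using the same demo files as above; note that grep -c counts matching lines rather than occurrences, which is why -o | wc -l is used instead:

```shell
# Demo data, matching the answer above
printf 'sweet_talk\nwhite_man\nhispanic_american\n' > words
printf 'foo\nsweet_talk\nbar\nhispanic_american\nsweet_talk\n' > corpus

# One count per word, zeros included; printf %d also strips
# any padding BSD wc puts in front of the number
while IFS= read -r word; do
    printf '%s - %d\n' "$word" "$(grep -Fow "$word" corpus | wc -l)"
done < words
```

This prints one `word - count` line per entry in words, including `white_man - 0`. It runs one grep per word, so for 5,400 words it is slower than the single grep -Fowf pass, but it keeps the zero counts.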
I have two variables in bash:
in the first variable, the field separator is (,)
in the second variable, the field separator is also (,)
in the first variable named VAR1 I have:
Maria Debbie Annie,Chewbakka Zero,Yoda One,Vader 001
in the second variable named VAR2:
"number":"11112",Maria Debbie Annie
"number":"11113",Maria Debbie Annie Lisa
"number":"33464",Chewbakka Zero
"number":"22465",Chewbakka Zero Two
"number":"34534",Christine Ashley
"number":"45233",Yoda One
"number":"45233",Yoda One One
"number":"38472",Susanne Ann
"number":"99999",Vader 001
"number":"99991",Vader 001 001
"number":"99992",Vader 001 002
The desired output in variable VAR3:
"number":"11112","number":"33464","number":"45233","number":"99999"
So basically I need to replace each name in the output with its "number":"somenumber" field, in the same order as in the first variable.
It is also important that there are very similar strings, so the matching must be exact:
Yoda One != Yoda One One, and Chewbakka Zero is not equal to Chewbakka Zero Two.
VAR2 contains much more lines than listed, I just wanted to show the script needs to find exact matches between VAR1 and VAR2.
Thank you for the help.
Check this out:
> echo "$VAR1"
Maria Debbie Annie,Chewbakka Zero,Yoda One,Vader 001
> echo "$VAR2"
"number":"11112",Maria Debbie Annie
"number":"11113",Maria Debbie Annie Lisa
"number":"33464",Chewbakka Zero
"number":"22465",Chewbakka Zero Two
"number":"34534",Christine Ashley
"number":"45233",Yoda One
"number":"45233",Yoda One One
"number":"38472",Susanne Ann
"number":"99999",Vader 001
"number":"99991",Vader 001 001
"number":"99992",Vader 001 002
> export VAR1A=$(echo $VAR1| sed 's/,/$\|/g' | sed 's/$/\$/g')
> echo "$VAR1A"
Maria Debbie Annie$|Chewbakka Zero$|Yoda One$|Vader 001$
> echo "$VAR2" | egrep "$VAR1A" | awk -F"," ' { printf("%s,",$1)} END { printf("\n") } ' | sed 's/.$//g'
"number":"11112","number":"33464","number":"45233","number":"99999"
>
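An alternative sketch (assuming bash, for the process substitution and here-string): split VAR1 into one name per line, load those names into an awk set, and print the first field of every VAR2 line whose name field matches exactly. This avoids building a regex, so no escaping is needed and the matches are exact by construction:

```shell
VAR1='Maria Debbie Annie,Chewbakka Zero,Yoda One,Vader 001'
VAR2='"number":"11112",Maria Debbie Annie
"number":"11113",Maria Debbie Annie Lisa
"number":"33464",Chewbakka Zero
"number":"22465",Chewbakka Zero Two
"number":"34534",Christine Ashley
"number":"45233",Yoda One
"number":"45233",Yoda One One
"number":"38472",Susanne Ann
"number":"99999",Vader 001
"number":"99991",Vader 001 001
"number":"99992",Vader 001 002'

# First input: the wanted names, one per line.
# Second input: VAR2; print $1 when the name field ($2) is wanted.
VAR3=$(awk -F, '
    NR == FNR { want[$0]; next }               # remember names from VAR1
    $2 in want { printf "%s%s", sep, $1; sep = "," }
' <(tr ',' '\n' <<<"$VAR1") <(printf '%s\n' "$VAR2"))
echo "$VAR3"
```

Because `$2 in want` is an exact string comparison, "Yoda One One" cannot match "Yoda One". The output order follows VAR2, which here coincides with the order of VAR1.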
I'm very new to shell scripting and wasn't sure how to go about doing this.
Suppose I have two files:
file1.csv | file2.csv
---------------------
Apples      Apples
Dogs        Dogs
Cats        Cats
Grapes      Oranges
Batman      Thor
Borgs       Daleks
Kites       Kites
Blah        Blah
xyz         xyz
How do I keep only the differences in each file, plus the 2 lines above the start of the differences and the 2 lines after? For example, the output would be:
file1.csv | file2.csv
---------------------
Dogs        Dogs
Cats        Cats
Grapes      Oranges
Batman      Thor
Borgs       Daleks
Kites       Kites
Blah        Blah
Thank you very much!
This is a job for diff.
diff -u2 file1.csv file2.csv | sed '1,3d;/@@/,+2d' > diff
The diff command will produce a patch-style difference containing meta information about the files in the form:
--- file1.csv 2017-05-12 15:21:47.564801174 -0700
+++ file2.csv 2017-05-12 15:21:52.462801174 -0700
@@ -2,7 +2,7 @@
Each block of differences will have a header like @@ -2,7 +2,7 @@. We want to throw these away using sed.
1,3d - deletes the top 3 lines (the ---, +++ and first @@ header)
/@@/,+2d - deletes any line containing @@ and the 2 lines after it. This is not needed for your case, but is good to include in case your input suddenly has multiple blocks of differences.
The result of the above commands will produce this list.
Dogs
Cats
-Grapes
-Batman
-Borgs
+Oranges
+Thor
+Daleks
Kites
Blah
The content has a one-character prefix: ' ' is common to both files, '-' appears only in file1.csv, and '+' appears only in file2.csv. Now all we need to do is distribute these lines to the two files.
sed '/^+.*/d;s/^.//' diff > file1.csv
sed '/^-.*/d;s/^.//' diff > file2.csv
The sed commands here will filter the file and write the proper contents to each of the input files.
/^+.*/d - lines starting with '+' will be deleted.
s/^.// - will remove the 1 character prefix which was added by diff.
/^-.*/d - lines starting with '-' will be deleted.
Finally, remove the transient file diff.
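The steps above can be run end to end. Here is a self-contained demo on the question's sample data (GNU sed is assumed for the /@@/,+2d address form; the transient file is called diffout here to avoid confusion with the diff command):

```shell
# Sample data from the question
printf '%s\n' Apples Dogs Cats Grapes Batman Borgs Kites Blah xyz > file1.csv
printf '%s\n' Apples Dogs Cats Oranges Thor Daleks Kites Blah xyz > file2.csv

# Unified diff with 2 context lines; strip the file header and hunk headers
diff -u2 file1.csv file2.csv | sed '1,3d;/@@/,+2d' > diffout

# Distribute the result: drop the other file's lines, strip the 1-char prefix
sed '/^+/d; s/^.//' diffout > file1.csv
sed '/^-/d; s/^.//' diffout > file2.csv
rm diffout
```

Afterwards file1.csv holds Dogs through Blah with Grapes/Batman/Borgs, and file2.csv holds the same context lines with Oranges/Thor/Daleks, matching the desired output above.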
I would like to do the following on the bash command line...
I have 2 files. File1 looks like:
585 1504 13 10000 10468 ID1
585 3612 114 10468 11447 ID2
585 437 133 11503 11675 ID1
File2 looks like:
400220 10311 10311
400220 11490 11490
400220 11923 11923
For each number in File2 column 2, I would like to know whether it falls between any of the number pairs in File1 columns 4 and 5, and create File3.txt with the output as follows:
If yes, I want to write column 2 from File2 and column 6 from File1 to File3.
If no, I want to write column 2 from File2 and the string "NoID" to File3.
So for the example data, File3.txt should look like this:
10311 ID1
11490 NoID
11923 NoID
I am used to working in Python, where I would write a script using nested for loops and if statements, but I would prefer to use Bash for this (of which I am still a relative beginner). It seems to me that a similar nested-loop approach combined with awk and other conditional statements could be the way to go. Can anyone suggest a good approach, ideally with example syntax?
NB. The actual files contain over 3 million rows of data.
Cheers muchly in advance
awk '
    NR == FNR {                  # file1: remember range starts, ends and IDs
        f[NR] = $4; l[NR] = $5; id[NR] = $6
        next
    }
    {                            # file2: test column 2 against every range
        for (i in id)
            if ($2 > f[i] && $2 < l[i]) {
                printf "%-8s%s\n", $2, id[i]
                next
            }
        printf "%-8s%s\n", $2, "NoID"
    }
' file1 file2
Output:
10311 ID1
11490 NoID
11923 NoID
When I use 'join' to merge two sorted files, the result is unexpected.
Here is an example:
//file a.dat
12
123
456
13421
123456
//file b.dat
12
123
5432
123456
execute the command:
$ join -1 1 -2 1 -o '1.1 2.1' a.dat b.dat
12 12
123 123
Note that 123456 is ignored! In fact, I tried other files, and some of them also didn't produce full results. Why does this happen?
Your input needs to be lexically sorted for join to work correctly. Your input is numerically sorted, which is wrong: all strings starting with 1 should come before all strings starting with 2, and so on.
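A minimal demonstration with the question's data, re-sorting both inputs on the fly (bash process substitution assumed):

```shell
# The question's data
printf '%s\n' 12 123 456 13421 123456 > a.dat
printf '%s\n' 12 123 5432 123456 > b.dat

# join expects lexical (not numeric) order, so sort both inputs first
join <(sort a.dat) <(sort b.dat)
```

This prints 12, 123 and 123456: once both sides are in the order join expects, 123456 is no longer dropped.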